1 00:00:09,006 --> 00:00:10,686 >> Unknown: Alright, welcome back to CS50. 2 00:00:10,746 --> 00:00:12,806 This is the start of week 7. 3 00:00:12,806 --> 00:00:14,486 For those of you that have been playing along at home, 4 00:00:14,486 --> 00:00:16,556 there was no week 6 on film at least 5 00:00:16,556 --> 00:00:17,856 because of last week's quiz. 6 00:00:17,856 --> 00:00:20,346 And now that quiz 0 is in fact behind us, 7 00:00:20,746 --> 00:00:25,126 I dare say that you are now qualified to either laugh with 8 00:00:25,516 --> 00:00:29,216 or laugh at a little something like this that went 9 00:00:29,216 --> 00:00:31,126 around the TF's list this weekend. 10 00:00:32,446 --> 00:00:34,446 [ Laughter ] 11 00:00:34,876 --> 00:00:36,876 [ Music ] 12 00:00:37,306 --> 00:00:38,406 >> Unknown: Hey. 13 00:00:38,906 --> 00:00:41,056 [ Background noise ] 14 00:00:41,556 --> 00:00:44,106 Mom, Dad, this is Ryan. 15 00:00:44,196 --> 00:00:47,336 >> Unknown: Oh, so this is the famous boy we've been hearing 16 00:00:47,336 --> 00:00:47,846 so much about. 17 00:00:47,846 --> 00:00:50,766 >> Ryan: Nice to finally meet you Missus Lead and sir. 18 00:00:51,206 --> 00:00:54,166 >> Unknown: So, Ryan, what kind of server do you play on? 19 00:00:55,186 --> 00:00:59,236 >> Ryan: Is that on, is that on a computer? 20 00:00:59,236 --> 00:00:59,303 [ Laughter ] 21 00:00:59,303 --> 00:00:59,556 You know what? 22 00:00:59,726 --> 00:01:03,486 I think I forgot the dessert in the car. 23 00:01:03,486 --> 00:01:03,553 [ Background noise ] 24 00:01:03,553 --> 00:01:05,396 >> Unknown: Who do you think you are bringing his kind 25 00:01:05,396 --> 00:01:05,706 into my house? 26 00:01:05,756 --> 00:01:08,166 >> Unknown: But dad, he's a nice guy really. 27 00:01:08,166 --> 00:01:10,836 >> Unknown: No, he's a newb Jessica, a newb. 28 00:01:11,096 --> 00:01:13,086 >> Jessica: But Ryan and I are perfect together. 29 00:01:13,086 --> 00:01:18,296 >> Missus Lead: This relationship is headed 30 00:01:18,296 --> 00:01:24,516 for an epic fail young lady. 31 00:01:24,516 --> 00:01:26,546 >> Unknown: You're elite damn it, 32 00:01:26,616 --> 00:01:40,916 we don't take newbs, we prone them. 33 00:01:41,726 --> 00:01:48,356 >> Jessica: Well maybe I don't like to be elite. 34 00:01:48,356 --> 00:01:49,886 >> Unknown: You're insolence, for a lose. 35 00:01:49,916 --> 00:01:51,086 >> Jessica: No, maybe I love that he watches VHS tapes still 36 00:01:51,116 --> 00:01:52,376 and maybe I love that his phone still has a cord. 37 00:01:52,406 --> 00:01:52,586 >> Missus Lead: You might 38 00:01:52,616 --> 00:01:53,606 as well date somebody who plays Aliant. 39 00:01:53,636 --> 00:01:54,206 >> Jessica: I'll take whoever I want. 40 00:01:54,236 --> 00:01:55,406 >> Unknown: Over my level 80 Roadster Pirelli dead body. 41 00:01:55,436 --> 00:01:55,706 >> Ryan: Mister Lead. 42 00:01:55,736 --> 00:01:56,876 I may run around in circles when I play Halo 43 00:01:56,906 --> 00:01:57,836 and I might never get a monster kill. 44 00:01:57,866 --> 00:01:59,036 I can't even find the space bar half the time. 45 00:01:59,066 --> 00:01:59,876 But I know when I found true love. 46 00:01:59,906 --> 00:02:01,346 And that is worth more than all the uber gear in the world. 47 00:02:01,346 --> 00:02:01,413 [ Music ] 48 00:02:01,413 --> 00:02:01,976 >> Unknown: Too long, did not listen. 49 00:02:02,516 --> 00:02:05,906 [ Laughter ] 50 00:02:06,406 --> 00:02:09,796 [ Background noise ] 51 00:02:10,296 --> 00:02:13,686 [ Laughter ] 52 00:02:14,186 --> 00:02:15,976 >> Unknown: It's like the fifth time I've watched that. 53 00:02:15,976 --> 00:02:16,043 [ Laughter ] 54 00:02:16,043 --> 00:02:21,426 So a couple of announcements, problem set 5 went 55 00:02:21,426 --> 00:02:22,646 out the door on Friday. 56 00:02:22,646 --> 00:02:24,646 This recall is the forensic P set 57 00:02:24,646 --> 00:02:27,596 which involves you're not only recovering a secret hidden 58 00:02:27,596 --> 00:02:30,056 message but also recovering some photographs 59 00:02:30,096 --> 00:02:31,726 that we accidentally deleted. 60 00:02:31,726 --> 00:02:33,396 And to add to the fun you'll find 61 00:02:33,396 --> 00:02:37,216 that once you've recovered these photos, they are all of persons, 62 00:02:37,296 --> 00:02:40,856 places or things on campus and the goal ultimately 63 00:02:41,036 --> 00:02:43,256 after the P set is submitted is if you can 64 00:02:43,256 --> 00:02:47,136 with your section mates find any or all of those photographs 65 00:02:47,406 --> 00:02:50,386 and then photograph yourself or someone else in your section 66 00:02:50,466 --> 00:02:53,116 in that same place or with that same person 67 00:02:53,116 --> 00:02:55,086 or thing, happy cat excluded. 68 00:02:55,296 --> 00:02:58,226 We will award you amazing well 69 00:02:58,226 --> 00:03:00,106 in some form later this semester. 70 00:03:00,106 --> 00:03:02,386 So you have a few weeks for that scavenger hunt portion. 71 00:03:02,666 --> 00:03:04,156 You'll notice on the course's website too 72 00:03:04,156 --> 00:03:07,896 that we've been soliciting beta testers for cloud.CS50.net. 73 00:03:08,266 --> 00:03:11,366 So CS50 has some super fancy hardware that we bought 74 00:03:11,366 --> 00:03:13,706 over the course of the summer that's taken us a little longer 75 00:03:13,706 --> 00:03:15,096 to configure it than we would have liked. 76 00:03:15,096 --> 00:03:17,216 But finally we debuted this weekend 77 00:03:17,466 --> 00:03:20,676 and the cloud.CS50.net is essentially a cluster 78 00:03:21,226 --> 00:03:22,686 of servers running Linux, 79 00:03:22,686 --> 00:03:24,826 virtual servers actually, virtual machines. 80 00:03:24,826 --> 00:03:25,766 More on that in the future. 81 00:03:26,136 --> 00:03:29,096 That you all will very soon have accounts on. 82 00:03:29,096 --> 00:03:31,656 This is so that we can leave nice.fas and some 83 00:03:31,656 --> 00:03:32,946 of its shortcomings behind us 84 00:03:32,946 --> 00:03:34,886 and operate our own infrastructure entirely. 85 00:03:34,886 --> 00:03:36,456 And this will be particularly useful 86 00:03:36,456 --> 00:03:39,636 for you come final projects time because inevitably many 87 00:03:39,636 --> 00:03:41,976 of you will go off and want to use a different language 88 00:03:41,976 --> 00:03:43,116 or a different piece of software. 89 00:03:43,386 --> 00:03:45,436 And because we'll finally be in our own infrastructure, 90 00:03:45,436 --> 00:03:48,626 we can just type a few magical commands and wala, you will have 91 00:03:48,726 --> 00:03:50,356 with higher probability what you need. 92 00:03:50,626 --> 00:03:54,076 Because this P set is related to forensics, I actually smiled 93 00:03:54,076 --> 00:03:56,266 when I happened to cross this this morning. 94 00:03:56,556 --> 00:03:58,596 A little something stupid as well for you. 95 00:03:59,756 --> 00:03:59,856 Oops. 96 00:04:00,061 --> 00:04:02,061 [ Laughter ] 97 00:04:02,266 --> 00:04:05,686 Ok, a few chuckles, alright. 98 00:04:05,686 --> 00:04:09,226 So that's from the popular website Icanhascheeseburger.com 99 00:04:09,226 --> 00:04:10,926 which we actually used to integrate 100 00:04:10,926 --> 00:04:12,116 into the course's own website. 101 00:04:12,116 --> 00:04:14,236 We used to have a wall catch of the day. 102 00:04:14,236 --> 00:04:16,846 Based on Q guide evaluations and a recent comment made 103 00:04:16,846 --> 00:04:19,516 in computer science 61 for those familiar, 104 00:04:19,516 --> 00:04:22,036 we decided to remove Wall Cats slightly 105 00:04:22,036 --> 00:04:23,216 from the course this semester, 106 00:04:23,456 --> 00:04:26,366 but some of you might enjoy wasting some time there. 107 00:04:26,366 --> 00:04:27,996 It's linked on the course's website. 108 00:04:28,356 --> 00:04:30,166 Alright, so final project. 109 00:04:30,316 --> 00:04:31,616 So it might be a little early, 110 00:04:31,856 --> 00:04:33,706 but actually now is pretty much a really good time 111 00:04:33,706 --> 00:04:35,696 to start thinking about final projects. 112 00:04:35,696 --> 00:04:37,896 And thinking about projects that even if you at this point 113 00:04:37,896 --> 00:04:40,016 in time have no idea how you, 114 00:04:40,236 --> 00:04:42,176 little old you would actually implement something, 115 00:04:42,176 --> 00:04:44,806 that's fine because there's much more to come in the weeks 116 00:04:44,806 --> 00:04:47,156 to come including a look at web programming and some 117 00:04:47,156 --> 00:04:48,736 of the languages, the technologies 118 00:04:48,736 --> 00:04:51,766 and the concepts underlying web based applications. 119 00:04:51,766 --> 00:04:54,826 You will find on the course's website a few relatively 120 00:04:54,826 --> 00:04:55,626 new links. 121 00:04:55,626 --> 00:04:58,196 For one, there is this link called seminars. 122 00:04:58,196 --> 00:05:00,326 And all of this is linked right now in the course's homepage. 123 00:05:00,916 --> 00:05:02,906 What the teaching fellows have done historically 124 00:05:02,906 --> 00:05:05,126 and will continue doing this year is offer a number 125 00:05:05,126 --> 00:05:08,186 of seminars on topics related to 126 00:05:08,186 --> 00:05:11,556 but nonetheless somewhat tangential to CS50 on material 127 00:05:11,556 --> 00:05:13,586 that we just haven't the time or the syllabus for, 128 00:05:13,586 --> 00:05:15,186 so that if you're interested in learning 129 00:05:15,186 --> 00:05:18,136 about developing a voice based application 130 00:05:18,136 --> 00:05:20,666 like Shuttle Boy Voice, well we'll have the infrastructure 131 00:05:20,666 --> 00:05:22,866 and the tools to make that possible for you 132 00:05:22,866 --> 00:05:25,626 and we'll offer a 60 to 90 minute seminar 133 00:05:25,626 --> 00:05:27,886 in early November led by one of the course assistants 134 00:05:27,886 --> 00:05:29,176 on how to do exactly that. 135 00:05:29,626 --> 00:05:33,006 We'll talk about Rails which is another programming environment, 136 00:05:33,006 --> 00:05:35,306 iPhone development, Android phone development. 137 00:05:35,476 --> 00:05:37,736 And we actually have access to some pretty neat stuff. 138 00:05:37,736 --> 00:05:40,936 So those of you who might like to tackle an iPhone app, 139 00:05:40,936 --> 00:05:42,326 you might be vaguely familiar 140 00:05:42,326 --> 00:05:44,306 that Apple actually charges people typically 141 00:05:44,306 --> 00:05:46,356 to download the tools necessary for such, 142 00:05:46,706 --> 00:05:49,156 99 dollars here, 299 dollars there. 143 00:05:49,376 --> 00:05:50,966 Well you won't need to pay any such things. 144 00:05:50,966 --> 00:05:53,336 We have our own CS50 account that we can make you part of. 145 00:05:53,566 --> 00:05:56,356 And also Google has very generously donated a number 146 00:05:56,356 --> 00:05:58,286 of Android cell phones to the course. 147 00:05:58,606 --> 00:06:00,866 Some of the teaching fellows are actually carrying this 148 00:06:00,866 --> 00:06:04,106 around already and have been playing with Google's API, 149 00:06:04,396 --> 00:06:06,286 application programming interface 150 00:06:06,596 --> 00:06:08,826 with which they've been writing their own software 151 00:06:08,826 --> 00:06:09,826 for these phones. 152 00:06:09,826 --> 00:06:12,036 And we'll demo at least one of those apps before long. 153 00:06:12,356 --> 00:06:16,036 But if you are a student in this course and are already 154 00:06:16,036 --> 00:06:18,686 on TMobile ideally or at least AT and T, 155 00:06:18,686 --> 00:06:20,736 both of whom uses GSM networks. 156 00:06:21,106 --> 00:06:24,666 And you would like to tackle an android based project, 157 00:06:24,756 --> 00:06:27,576 a phone based project using one of Google's new phones, 158 00:06:27,836 --> 00:06:29,276 we can provide the phone. 159 00:06:29,276 --> 00:06:31,446 And if this is something that you see through to throughition, 160 00:06:31,796 --> 00:06:33,596 you'll be welcome to keep the phone as well. 161 00:06:33,596 --> 00:06:35,656 So more on that in the time to come. 162 00:06:35,656 --> 00:06:38,006 And you'll also see that we have built up an archive 163 00:06:38,006 --> 00:06:40,186 over the past couple of years of seminars past, 164 00:06:40,456 --> 00:06:43,116 all of which were videotaped and made available online. 165 00:06:43,386 --> 00:06:45,536 So even if we're not offering something this year 166 00:06:45,796 --> 00:06:48,816 on some topic, realize that there's a couple of dozen 167 00:06:49,066 --> 00:06:50,486 of seminars from years past. 168 00:06:50,756 --> 00:06:52,576 And there's probably going to be some more to come. 169 00:06:52,856 --> 00:06:54,806 And also, I keep using this word API. 170 00:06:55,336 --> 00:06:59,406 And even though we've not used APIs at least public ones just 171 00:06:59,406 --> 00:07:01,756 yet in the course, we will for P sets 7 and 8. 172 00:07:02,196 --> 00:07:05,756 But an application programming interface is essentially, 173 00:07:05,906 --> 00:07:09,296 think of it kind of like a header file almost, a dot H file 174 00:07:09,576 --> 00:07:11,656 that has a bunch of functions declared in it 175 00:07:11,906 --> 00:07:14,726 that you can call inside of your own program. 176 00:07:14,726 --> 00:07:17,476 But you didn't have to implement the functionality defined 177 00:07:17,566 --> 00:07:18,226 in that API. 178 00:07:18,586 --> 00:07:20,006 So in P set 8 for instance, 179 00:07:20,326 --> 00:07:22,556 you guys will implement a mash up using Google Maps. 180 00:07:23,016 --> 00:07:25,636 Well you're not going to have to invent that from the ground up. 181 00:07:25,726 --> 00:07:29,166 You will be able to use Google's tiles and graphics 182 00:07:29,166 --> 00:07:30,926 for various maps around the world. 183 00:07:30,926 --> 00:07:33,406 You'll be able to use their JAVA script code that allows you 184 00:07:33,406 --> 00:07:36,356 to drop little markers on top of maps and pan 185 00:07:36,356 --> 00:07:37,806 from location to location. 186 00:07:37,806 --> 00:07:40,546 So you'll have a lot of functionality just handed to you 187 00:07:40,776 --> 00:07:43,576 in the form of an API or perhaps a library 188 00:07:43,786 --> 00:07:46,306 which just means Google has written a lot of functions 189 00:07:46,306 --> 00:07:48,466 or methods that you can call in your own code. 190 00:07:48,696 --> 00:07:51,916 And then actually wire things up in a more interesting way. 191 00:07:52,186 --> 00:07:54,306 So what we've started doing on this page here, 192 00:07:54,306 --> 00:07:56,016 which is also linked on our homepage now, 193 00:07:56,296 --> 00:07:59,186 is we've just started linking what we think are some fun APIs. 194 00:07:59,726 --> 00:08:02,206 And we don't specify on this page what you might want 195 00:08:02,206 --> 00:08:02,716 to do with them. 196 00:08:02,716 --> 00:08:04,906 The hope is that you'll skim this list at some point 197 00:08:05,116 --> 00:08:08,076 and just say oh wow, I had no idea I could do this 198 00:08:08,076 --> 00:08:10,146 with Facebook or I had no idea I could do this 199 00:08:10,146 --> 00:08:11,296 with Google or Twitter. 200 00:08:11,566 --> 00:08:12,816 And then hopefully it will start 201 00:08:12,816 --> 00:08:15,106 to instigate some ideas in your own mind. 202 00:08:15,106 --> 00:08:16,356 So we'll release formal specs 203 00:08:16,716 --> 00:08:19,316 for the final project in a few days time. 204 00:08:19,586 --> 00:08:21,006 By next week when we look at, 205 00:08:21,006 --> 00:08:22,276 start looking at web programming. 206 00:08:22,276 --> 00:08:26,366 But again the hope now is just to start fostering some thought. 207 00:08:26,436 --> 00:08:29,086 And it's funny, you've seen on CS50's website 208 00:08:29,086 --> 00:08:31,266 that we've been dabbling internally with various apps 209 00:08:31,266 --> 00:08:34,246 for events and news and shuttles and all of this. 210 00:08:34,526 --> 00:08:36,326 Partly just because we can and hopefully 211 00:08:36,326 --> 00:08:37,586 because people find it useful. 212 00:08:37,766 --> 00:08:39,366 But we actually had a staff member 213 00:08:39,366 --> 00:08:42,616 at Harvard email me the other day saying hey I've been using 214 00:08:42,616 --> 00:08:43,106 Harvard News. 215 00:08:43,106 --> 00:08:46,416 It'd be great if you did something for Harvard Twit, 216 00:08:46,876 --> 00:08:49,726 Twitters, so people who use Twitter, the Twitter audi. 217 00:08:49,836 --> 00:08:54,036 So I up until a few weeks ago had never let myself say words 218 00:08:54,036 --> 00:08:57,006 like Tweet and similar words, 219 00:08:57,006 --> 00:09:00,046 but I've kind of sunk to that now. 220 00:09:00,046 --> 00:09:01,666 I don't yet do it myself. 221 00:09:01,666 --> 00:09:03,556 Although I have a lot of followers on Twitter 222 00:09:03,556 --> 00:09:05,906 because I signed up like a year ago for a Twitter account 223 00:09:05,906 --> 00:09:08,036 because we were playing around with CS50 ideas. 224 00:09:08,406 --> 00:09:10,446 So apparently people have just been subscribing to mail 225 00:09:10,446 --> 00:09:11,926 in it post for the past 12 months. 226 00:09:12,056 --> 00:09:13,106 I've never said one thing. 227 00:09:13,106 --> 00:09:14,406 I have a lot of subscribers. 228 00:09:15,346 --> 00:09:17,666 Maybe we'll start doing something with it at some point. 229 00:09:18,076 --> 00:09:19,886 But in any case, I was inspired. 230 00:09:19,926 --> 00:09:23,106 So I was away at a workshop last week and I had some down time 231 00:09:23,106 --> 00:09:24,766 in the hotel room late at night. 232 00:09:24,766 --> 00:09:27,096 And I figured already, maybe this could be interesting 233 00:09:27,096 --> 00:09:28,326 to do something with Tweets. 234 00:09:28,766 --> 00:09:32,066 And so I figured, you know I vaguely knew what Twitter was 235 00:09:32,066 --> 00:09:34,036 and how it works and I figured, alright, they're pretty hip, 236 00:09:34,036 --> 00:09:34,996 they're pretty huge now. 237 00:09:35,296 --> 00:09:38,856 Maybe they have an API, some way of me interfacing 238 00:09:38,856 --> 00:09:42,046 with their data and with people's tweets and all of this 239 00:09:42,046 --> 00:09:43,926 so that I could present it maybe 240 00:09:43,926 --> 00:09:46,126 on a Harvard specific basis in my own site. 241 00:09:46,126 --> 00:09:51,106 So I Googled something like Twitter API, hit enter and wala, 242 00:09:51,106 --> 00:09:53,626 Twitter API Wiki or here's the documentation. 243 00:09:53,906 --> 00:09:55,786 So I spent some time that night just kind 244 00:09:55,786 --> 00:09:56,596 of reading through this. 245 00:09:56,596 --> 00:09:58,826 And a lot of this frankly I didn't really need. 246 00:09:58,916 --> 00:10:01,216 You'll find that even within Google's APIs, there's a lot 247 00:10:01,216 --> 00:10:02,586 of functionality you don't care about. 248 00:10:02,586 --> 00:10:04,676 So it need not be overwhelming but rather kind 249 00:10:04,676 --> 00:10:06,296 of exciting what you can do with this. 250 00:10:06,296 --> 00:10:09,746 And I found ultimately that I could serve through tweets 251 00:10:09,976 --> 00:10:11,666 that had been posted around the world. 252 00:10:11,936 --> 00:10:14,436 I found that I could search for a specific user names 253 00:10:14,436 --> 00:10:16,986 and get their icons and get their actual names 254 00:10:16,986 --> 00:10:18,306 and URLs and so forth. 255 00:10:18,656 --> 00:10:21,066 And so you know a few hours later that night 256 00:10:21,066 --> 00:10:22,826 and then a few hours later the next night, 257 00:10:23,146 --> 00:10:26,516 once I had touched things up, thus was born Harvard Tweets. 258 00:10:26,966 --> 00:10:30,116 So now we have another stupid little website 259 00:10:30,326 --> 00:10:33,656 that simply aggregates tweets from people who know about all 260 00:10:33,656 --> 00:10:36,796 over campus, whether person or department or group. 261 00:10:36,966 --> 00:10:39,586 And what we did was write a few scripts behind the scenes 262 00:10:39,626 --> 00:10:43,236 that synchronizes this site here, Tweets.CS50.net 263 00:10:43,546 --> 00:10:47,156 with Twitter's own dataset every 5 or so minutes. 264 00:10:47,376 --> 00:10:50,516 So it's actually fascinating since that hotel room last week, 265 00:10:50,856 --> 00:10:55,746 Harvard affiliates have tweeted 2344 times. 266 00:10:56,036 --> 00:10:57,776 And that number grows by the minute. 267 00:10:57,776 --> 00:11:00,246 This was kind of my innovation last night. 268 00:11:00,246 --> 00:11:01,706 And innovation is generous 269 00:11:01,706 --> 00:11:03,566 since I didn't really do anything here. 270 00:11:03,866 --> 00:11:06,336 But I found this on someone's blog just randomly. 271 00:11:06,336 --> 00:11:08,576 And I was like oh that's a really cool tag cloud. 272 00:11:08,576 --> 00:11:10,326 It's actually a cloud or a sphere. 273 00:11:10,696 --> 00:11:12,406 So this is actually a flash animation 274 00:11:12,406 --> 00:11:14,956 but this guy too had open sourced his project. 275 00:11:15,006 --> 00:11:17,886 He made available essentially what was an API 276 00:11:17,886 --> 00:11:21,716 for putting your own words into a 3D cloud like that. 277 00:11:21,716 --> 00:11:24,386 And so what these are, these little hash tags 278 00:11:24,386 --> 00:11:26,606 in Twitter speak that are simply key words 279 00:11:26,606 --> 00:11:27,876 that you can then search on. 280 00:11:27,876 --> 00:11:31,176 So if I go over here and I'm curious about people talking 281 00:11:31,176 --> 00:11:32,806 about baseball, I can click that. 282 00:11:33,176 --> 00:11:35,066 And all of the sudden I get all the recent tweets 283 00:11:35,066 --> 00:11:37,196 about baseball, people that have mentioned it. 284 00:11:37,276 --> 00:11:37,536 So. 285 00:11:38,036 --> 00:11:40,376 [ Laughter ] 286 00:11:40,876 --> 00:11:42,546 You would be amazed, ok. 287 00:11:42,856 --> 00:11:43,126 So. 288 00:11:43,126 --> 00:11:44,886 [ Laughter ] 289 00:11:44,886 --> 00:11:46,926 Alright, so this is ridiculous. 290 00:11:47,756 --> 00:11:53,066 You should see how many posts come from Harvard FML. 291 00:11:53,696 --> 00:11:57,216 This is since like last Tuesday night in my hotel room. 292 00:11:57,266 --> 00:11:59,416 So you guys are out of control. 293 00:11:59,416 --> 00:12:00,246 [ Laughter ] 294 00:12:00,246 --> 00:12:03,546 So, anyhow, at some point soon we might need a new filter 295 00:12:03,546 --> 00:12:04,566 of sorts because if you start skimming 296 00:12:04,566 --> 00:12:07,026 through this morning alone, I think it must depend 297 00:12:07,026 --> 00:12:09,386 on whoever's running the site when they actually wake up 298 00:12:09,386 --> 00:12:10,526 and start approving things. 299 00:12:10,526 --> 00:12:11,956 There it comes, so that's last night. 300 00:12:12,406 --> 00:12:13,686 I mean it starts to overwhelm. 301 00:12:13,686 --> 00:12:17,906 But the point here is I mean partly just to distract perhaps. 302 00:12:18,216 --> 00:12:20,936 But also it's actually really just neat what you can do. 303 00:12:20,936 --> 00:12:23,536 So now you guys have like half a semester behind you essentially 304 00:12:23,536 --> 00:12:24,856 and you have some programming chopped. 305 00:12:25,106 --> 00:12:26,806 You have hopefully conceptual understanding 306 00:12:26,806 --> 00:12:29,546 of how the world works, at least technologically. 307 00:12:29,546 --> 00:12:32,206 And there's again much more to come, but doing something 308 00:12:32,206 --> 00:12:34,686 like this is relatively now accessible. 309 00:12:34,686 --> 00:12:35,926 And the hope for a lot of you, 310 00:12:35,926 --> 00:12:38,136 especially those dubbed less comfortable, 311 00:12:38,296 --> 00:12:40,486 is that you'll be able to do come term's end 312 00:12:40,606 --> 00:12:43,376 for final projects things that you had no sense 313 00:12:43,416 --> 00:12:46,606 of your own ability to do at the start of the semester. 314 00:12:46,606 --> 00:12:49,046 So hopefully a number of you will amaze yourselves. 315 00:12:49,046 --> 00:12:51,626 And you'll find at the end of the semester that the goal is 316 00:12:51,626 --> 00:12:54,446 to exhibit precisely those accomplishments in the form 317 00:12:54,446 --> 00:12:57,536 of the CS50 fair for which we have several dozen videos 318 00:12:57,536 --> 00:12:59,096 from last year, hosted on Facebook. 319 00:12:59,326 --> 00:13:00,506 And it was actually a lot of fun. 320 00:13:00,506 --> 00:13:02,316 And we will host a similar event for you 321 00:13:02,316 --> 00:13:03,796 and your friends this year. 322 00:13:04,136 --> 00:13:06,166 Any questions, logistical or otherwise? 323 00:13:06,166 --> 00:13:06,756 [ Background noise ] 324 00:13:06,756 --> 00:13:09,606 Alright, so last note here. 325 00:13:09,856 --> 00:13:12,236 So dinner, this Wednesday, we had lunch last Friday, 326 00:13:12,236 --> 00:13:15,336 we'll do dinner this Wednesday hosted by Matherhouse and also 327 00:13:15,336 --> 00:13:18,016 in attendance will be Professional Rodica Nagpaul 328 00:13:18,016 --> 00:13:19,186 who teaches a number of course 329 00:13:19,306 --> 00:13:23,886 at SCAS including the course dubbed robo soccer. 330 00:13:23,886 --> 00:13:25,526 So last spring I had the pleasure 331 00:13:25,746 --> 00:13:27,646 of seeing some former CS50 students 332 00:13:27,646 --> 00:13:30,656 and teaching fellows compete in robo soccer, 333 00:13:30,656 --> 00:13:33,296 whereby they'd spent the entire semester using their computer 334 00:13:33,296 --> 00:13:34,916 science and engineering background. 335 00:13:35,216 --> 00:13:39,266 And to program these little robotic devices 336 00:13:39,266 --> 00:13:40,886 that were supposed to play soccer. 337 00:13:40,886 --> 00:13:43,836 And it was really neat, up in the Quad in Hillis Library, 338 00:13:44,116 --> 00:13:45,996 they had their little soccer field set up. 339 00:13:45,996 --> 00:13:47,646 They had cameras mounted in the ceiling 340 00:13:47,846 --> 00:13:50,066 and the cameras then looked down onto the field to figure 341 00:13:50,066 --> 00:13:52,316 out where the robots were in order 342 00:13:52,316 --> 00:13:55,336 to determine what algorithm should get applied 343 00:13:55,566 --> 00:13:58,066 to a particular robot at a given moment in time, 344 00:13:58,276 --> 00:14:00,526 given the locations of other robots, given the location 345 00:14:00,526 --> 00:14:03,406 of the ball, given the location of the goal. 346 00:14:03,406 --> 00:14:06,666 So Rodica will be joining us this Wednesday largely 347 00:14:06,666 --> 00:14:07,706 for a casual chat. 348 00:14:07,706 --> 00:14:10,086 She'll talk about her work and that kind of stuff. 349 00:14:10,086 --> 00:14:11,906 And then it's also an opportunity to chat with me 350 00:14:11,906 --> 00:14:13,486 and the teaching fellows and course assistants. 351 00:14:13,486 --> 00:14:16,866 So just head to CS50.net/RSVP if you are both hungry 352 00:14:16,866 --> 00:14:19,236 and interested in getting got know some of the team. 353 00:14:19,826 --> 00:14:24,406 Alright, so as much as excited as I tried to sound 354 00:14:24,406 --> 00:14:26,996 at the very start of this semester about four loops 355 00:14:26,996 --> 00:14:29,956 and vial loops and do while loops and all of this stuff, 356 00:14:30,276 --> 00:14:31,446 so we were kind of misleading. 357 00:14:31,446 --> 00:14:33,456 Like none of that stuff is really all that exciting. 358 00:14:33,576 --> 00:14:34,856 But we had to get you through it. 359 00:14:35,006 --> 00:14:38,226 But it's now after quiz 0 that we can finally dive 360 00:14:38,406 --> 00:14:41,096 into what are some more technical challenges. 361 00:14:41,406 --> 00:14:44,016 And also some more sophisticated solutions. 362 00:14:44,016 --> 00:14:45,216 So up until now, a lot 363 00:14:45,216 --> 00:14:49,076 of the problems you've been solving have evolved using you 364 00:14:49,076 --> 00:14:52,446 know an array or maybe starting last week thinking 365 00:14:52,446 --> 00:14:53,966 about things like link lists. 366 00:14:53,966 --> 00:14:57,316 But using those same basis, arrays and linked lists, 367 00:14:57,676 --> 00:15:01,236 we'll see this week can you do a lot more sophisticated 368 00:15:01,236 --> 00:15:03,356 approaches to similar problems. 369 00:15:03,356 --> 00:15:05,686 And we'll talk this week about compress. 370 00:15:05,836 --> 00:15:08,526 We'll talk this week about bit wise operations, 371 00:15:08,526 --> 00:15:10,626 manipulating bits at the very lowest levels, 372 00:15:10,726 --> 00:15:12,636 trees and things called tries and heaps. 373 00:15:13,016 --> 00:15:15,256 So data structure that you can continue a discussion 374 00:15:15,256 --> 00:15:19,616 of in a course like CS124, but generally that will empower you 375 00:15:19,616 --> 00:15:22,506 to solve actual interesting real world problems 376 00:15:23,076 --> 00:15:26,386 with a much richer tool kit under your belt. 377 00:15:26,786 --> 00:15:28,616 So first a practical tool. 378 00:15:28,616 --> 00:15:32,396 So it's been very annoying for you I'm sure to wage battle 379 00:15:32,396 --> 00:15:35,096 against seg fault and other memory related errors. 380 00:15:35,096 --> 00:15:37,616 And usually if you've triggered a seg fault you've done 381 00:15:37,616 --> 00:15:42,546 something per quiz 0 like step over the boundary of an array 382 00:15:42,546 --> 00:15:44,486 and touch memory that you don't own. 383 00:15:44,486 --> 00:15:46,706 Or you take a pointer and you try accidentally 384 00:15:46,706 --> 00:15:47,316 dereferencing it. 385 00:15:47,316 --> 00:15:50,316 Or you take null and worse yet you try dereferencing it. 386 00:15:50,316 --> 00:15:53,026 So in short, any time you guys have done something wrong, 387 00:15:53,026 --> 00:15:54,406 unintentionally with memory, 388 00:15:54,646 --> 00:15:56,736 have you possibly triggered a seg fault. 389 00:15:56,736 --> 00:15:58,056 Not always, but sometimes. 390 00:15:58,306 --> 00:16:01,086 It turns out there's help besides your own eyes 391 00:16:01,086 --> 00:16:02,296 and besides the teaching staff. 392 00:16:02,536 --> 00:16:05,696 So there are tools like this one, Val Grind which is a Linux 393 00:16:05,836 --> 00:16:09,276 or a base tool that you can now run much 394 00:16:09,276 --> 00:16:11,696 like GDB on your own binaries. 395 00:16:11,696 --> 00:16:14,256 Your own executable code and it will do its best 396 00:16:14,306 --> 00:16:17,606 to analyze your program to see did you screw up anywhere, 397 00:16:17,686 --> 00:16:19,156 at least with regard to memory. 398 00:16:19,156 --> 00:16:20,836 Might you trigger a seg fault? 399 00:16:21,116 --> 00:16:23,356 Did you allocate memory that you did not free? 400 00:16:23,356 --> 00:16:24,676 So do you have memory leaks? 401 00:16:24,966 --> 00:16:27,106 So what I did was I whipped up a little program here, 402 00:16:27,106 --> 00:16:28,626 you should have a print out of it today, 403 00:16:28,626 --> 00:16:31,786 hand outs were outside, a bit belatedly on the way in. 404 00:16:31,786 --> 00:16:32,596 If you didn't get them, 405 00:16:32,596 --> 00:16:34,516 there will be some there on during break. 406 00:16:34,706 --> 00:16:36,746 But it's a very small program and you can see it here. 407 00:16:37,096 --> 00:16:38,136 So I have a main routine 408 00:16:38,136 --> 00:16:40,846 that just calls a function called F and then it returns. 409 00:16:40,886 --> 00:16:41,976 But what does F do? 410 00:16:41,976 --> 00:16:44,006 Well I do an allocation of memory 411 00:16:44,126 --> 00:16:45,446 and then I do an assignment. 412 00:16:45,816 --> 00:16:47,806 So there's at least two bugs in this code. 413 00:16:48,196 --> 00:16:52,156 This is memory.c and whether you have it on paper or just 414 00:16:52,156 --> 00:16:55,016 in front of you, can someone identify one of the bugs 415 00:16:55,016 --> 00:16:56,526 or mistakes in this program? 416 00:16:56,901 --> 00:16:58,901 [ Background noise ] 417 00:16:59,276 --> 00:17:00,466 Say again? 418 00:17:00,966 --> 00:17:04,196 [ Background noise ] 419 00:17:04,696 --> 00:17:06,446 Ok so we don't free the malock, 420 00:17:06,606 --> 00:17:09,906 so the top of the function F I call mallock 421 00:17:09,906 --> 00:17:12,286 and I allocation 10 times 4 bytes, 422 00:17:12,286 --> 00:17:15,176 so 40 bytes assuming a 4 byte integer here. 423 00:17:15,436 --> 00:17:18,626 So I allocate 40 bytes dynamically and I retain 424 00:17:18,626 --> 00:17:21,506 that address, but then I never in fact call freeze. 425 00:17:21,506 --> 00:17:23,386 So that's arguably one bug in here. 426 00:17:23,616 --> 00:17:26,516 Now granted and to be clear, when this program quits, 427 00:17:26,516 --> 00:17:29,386 generally the operating system should take back the memory 428 00:17:29,386 --> 00:17:31,176 that was allocated for this program. 429 00:17:31,536 --> 00:17:35,636 So realize that even though with little baby programs like this, 430 00:17:35,896 --> 00:17:37,866 it might seem lame or unnecessary 431 00:17:37,866 --> 00:17:39,146 to bother freeing memory. 432 00:17:39,426 --> 00:17:41,646 Very rarely in life will your programs be 433 00:17:41,646 --> 00:17:43,156 as simple or as short as this. 434 00:17:43,156 --> 00:17:44,606 So this habit of freeing memory 435 00:17:44,606 --> 00:17:46,786 that you allocate is indeed important. 436 00:17:46,786 --> 00:17:47,736 So that's one bug there. 437 00:17:47,736 --> 00:17:49,026 So I haven't freed my memory. 438 00:17:49,306 --> 00:17:51,386 And there's another bug, maybe a little more subtle 439 00:17:51,386 --> 00:17:52,256 but even worse. 440 00:17:52,296 --> 00:17:54,766 >> Unknown: [Inaudible] the 10th element of array X. 441 00:17:55,306 --> 00:17:55,556 >> Unknown: Ok. 442 00:17:55,556 --> 00:17:57,366 >> Unknown: 0 through 9 element. 443 00:17:57,866 --> 00:17:58,266 >> Unknown: Ok, good. 444 00:17:59,476 --> 00:18:02,326 So and let me tweak your jargon. 445 00:18:02,326 --> 00:18:05,256 So I don't allocate but I assigned the 10th location 446 00:18:05,256 --> 00:18:09,396 or the 11th location of X even though there are only 10 447 00:18:09,396 --> 00:18:10,166 locations there. 448 00:18:10,456 --> 00:18:13,296 So because it's 0 indexed, I can go from X bracket 0 449 00:18:13,296 --> 00:18:16,116 through X bracket 9, but 10 is not good. 450 00:18:16,116 --> 00:18:18,756 Now not necessarily will this trigger a seg fault, 451 00:18:18,756 --> 00:18:20,866 which is why especially C and C plus, plus, 452 00:18:21,156 --> 00:18:23,106 some of these bugs are hard to chase down, 453 00:18:23,106 --> 00:18:25,826 because the word segmentation fault, 454 00:18:25,826 --> 00:18:29,326 the phrase segmentation fault actually hints at what's real, 455 00:18:29,446 --> 00:18:32,676 slightly more complicated structure underneath the hood 456 00:18:32,876 --> 00:18:34,896 which is that your computer's RAM is organized 457 00:18:34,896 --> 00:18:37,746 into different segments and it's really only once you cross 458 00:18:37,746 --> 00:18:39,966 segment boundaries that bad things happen. 459 00:18:40,216 --> 00:18:43,256 So realize that this might not trigger a seg fault 460 00:18:43,336 --> 00:18:46,336 but fortunately can you the human, or we the staff 461 00:18:46,336 --> 00:18:47,746 or programs like the one I'm 462 00:18:47,746 --> 00:18:50,926 about to demonstrate actually detect even mistakes like that. 463 00:18:51,106 --> 00:18:52,946 Because there are situations certainly 464 00:18:52,946 --> 00:18:56,636 in which bad things can happen like your program dumping core. 465 00:18:56,636 --> 00:18:57,316 So let me do this. 466 00:18:57,316 --> 00:18:58,126 Make memory. 467 00:18:58,816 --> 00:19:01,476 And that's going to execute the long GCC command. 468 00:19:01,836 --> 00:19:05,456 And I'm going to now run valgrand on memory 469 00:19:05,456 --> 00:19:07,346 and unfortunately its output is going 470 00:19:07,346 --> 00:19:08,486 to be kind of crazy, alright. 471 00:19:08,486 --> 00:19:11,316 A little overwhelming at first but let's just glance 472 00:19:11,316 --> 00:19:13,376 at the bottom on up, leak summary. 473 00:19:13,376 --> 00:19:16,166 So I definitely lost 40 bytes in one block. 474 00:19:16,286 --> 00:19:19,266 Alright, so that alone is a pretty clear warning. 475 00:19:19,266 --> 00:19:21,556 I have allocated memory that I did not free. 476 00:19:21,806 --> 00:19:23,306 It was even able to count that amount. 477 00:19:23,456 --> 00:19:26,026 Unfortunately it consists with what I predicted earlier 478 00:19:26,026 --> 00:19:27,416 when you pointed out the memory leak. 479 00:19:27,416 --> 00:19:30,416 So there's a problem there and then possibly loss, 480 00:19:30,526 --> 00:19:31,666 still reachable, suppressive. 481 00:19:31,666 --> 00:19:33,376 So this actually looks ok and you know what? 482 00:19:33,376 --> 00:19:34,686 Let me take its advice. 483 00:19:34,686 --> 00:19:37,826 I'm going to rerun the program with this switch as it's called, 484 00:19:38,236 --> 00:19:41,436 this command line argument often in Linux or UNIX or Mac OS 485 00:19:41,436 --> 00:19:44,566 when you want to tweak the behavior of a program, 486 00:19:44,566 --> 00:19:46,336 yes you can pass in command line arguments. 487 00:19:46,746 --> 00:19:49,156 But if they're sort of configuration options, 488 00:19:49,156 --> 00:19:52,756 those arguments will often begin with a single slash or a slash, 489 00:19:52,756 --> 00:19:54,856 sorry a single hyphen or two hyphens. 490 00:19:54,956 --> 00:19:56,046 This is simply convention. 491 00:19:56,316 --> 00:19:58,656 So let me do as it suggests and rerun it like this. 492 00:19:59,116 --> 00:20:00,756 Alright, so it looks the same at the bottom 493 00:20:00,756 --> 00:20:02,846 but I think I got a little more error, 494 00:20:02,986 --> 00:20:05,366 a little more reporting up top. 495 00:20:05,566 --> 00:20:07,886 So error summary, 14 errors, wow. 496 00:20:07,936 --> 00:20:09,696 I don't even have 14 lines of code. 497 00:20:09,776 --> 00:20:11,086 So something's really wrong. 498 00:20:11,526 --> 00:20:15,516 So let me scroll up and as is the case with GDB, 499 00:20:15,686 --> 00:20:18,436 don't focus your intention entirely on the bottom. 500 00:20:18,436 --> 00:20:20,386 Don't solve problems from the ground up. 501 00:20:20,626 --> 00:20:22,276 Go back to the very first message, 502 00:20:22,276 --> 00:20:24,416 because remember there's sometimes a cascading effect 503 00:20:24,416 --> 00:20:27,036 where a problem here can actually create the appearance 504 00:20:27,036 --> 00:20:29,056 of problems here even if there are none. 505 00:20:29,096 --> 00:20:30,026 So let's start at the top. 506 00:20:30,026 --> 00:20:31,246 Here's where I executed it. 507 00:20:31,926 --> 00:20:34,026 So mem check, a memory error detector, 508 00:20:34,026 --> 00:20:35,576 that's indeed what I'm running here. 509 00:20:35,876 --> 00:20:38,296 Some copyright information, which is not that useful. 510 00:20:38,296 --> 00:20:41,506 This is you know interesting but it feels a little cryptic, 511 00:20:41,506 --> 00:20:42,656 so I'm going to turn a blind eye 512 00:20:42,656 --> 00:20:44,396 for the moment until ah, here we go. 513 00:20:44,646 --> 00:20:46,286 Here is mention of my program. 514 00:20:46,286 --> 00:20:48,196 So this is shared library. 515 00:20:48,426 --> 00:20:49,236 This thing here. 516 00:20:49,456 --> 00:20:52,016 So if it's not your code or not a file you're familiar with, 517 00:20:52,186 --> 00:20:53,386 probably not a mistake. 518 00:20:53,386 --> 00:20:56,226 In almost certainly not your fault, but let's look here. 519 00:20:56,226 --> 00:20:58,276 Invalid write of size 4. 520 00:20:58,626 --> 00:21:00,756 Alright, so you can probably guess where this is coming from. 521 00:21:00,756 --> 00:21:04,416 So Valgrand is telling me in memory.C, line 2, 522 00:21:04,416 --> 00:21:06,656 I'm doing an invalid write of size 4. 523 00:21:06,916 --> 00:21:10,056 That is I'm probably trying to write 4 bytes or one integer 524 00:21:10,056 --> 00:21:11,986 to a location that it doesn't belong. 525 00:21:12,286 --> 00:21:15,656 And let's see, oh ok, so that is the result of having called F, 526 00:21:15,956 --> 00:21:18,856 the function F in line 29 of memory.c. 527 00:21:18,856 --> 00:21:20,896 So it's the output is similar in spirit 528 00:21:20,896 --> 00:21:22,376 to what GDB can do for you. 529 00:21:22,736 --> 00:21:24,796 And then let's see here. 530 00:21:24,796 --> 00:21:27,346 Alright so this is not good. 531 00:21:27,646 --> 00:21:30,936 Let's see, address 0 bytes after a block of size 40 allocated. 532 00:21:30,936 --> 00:21:34,376 So I'm calling mallock apparently in line 21 533 00:21:34,706 --> 00:21:37,576 of memory.c, right inside of my F function. 534 00:21:37,576 --> 00:21:39,286 So it's not being as clear perhaps 535 00:21:39,286 --> 00:21:40,546 as to what the problem is. 536 00:21:40,546 --> 00:21:42,436 It's referring to an allocation of memory. 537 00:21:42,716 --> 00:21:44,596 But again per the summary down at the bottom, 538 00:21:44,806 --> 00:21:47,406 I have definitely lost 40 bytes 539 00:21:47,776 --> 00:21:50,116 because of my allocation in this line. 540 00:21:50,366 --> 00:21:51,326 So let's see if I can fix this. 541 00:21:51,326 --> 00:21:54,036 Let me go ahead and load memory.c again. 542 00:21:55,056 --> 00:21:57,386 Alright, so let's fix this to do this. 543 00:21:57,486 --> 00:21:59,856 So let's change this to 9, that's probably safe. 544 00:21:59,856 --> 00:22:01,676 And let me go ahead and recompile. 545 00:22:02,186 --> 00:22:05,056 Alright and now let me rerun Valgrand. 546 00:22:05,816 --> 00:22:08,796 Oops, not vim. 547 00:22:08,796 --> 00:22:11,206 Let's rerun the memory check. 548 00:22:11,366 --> 00:22:14,886 Alright, so good, leak summary definitely lost 40 bytes 549 00:22:14,996 --> 00:22:17,996 but if I look up here that mention 550 00:22:18,306 --> 00:22:22,416 of writing 4 bytes is no longer there. 551 00:22:22,416 --> 00:22:24,276 So that's good and let's fix this other bug. 552 00:22:24,326 --> 00:22:27,556 So let me go in here and after I've allocated X, 553 00:22:27,556 --> 00:22:34,306 how do I go about freeing the memory pointed at by X? 554 00:22:34,306 --> 00:22:34,376 [ Period of silence ] 555 00:22:34,376 --> 00:22:41,516 Anyone. Free X, very difficult, alright, so. 556 00:22:41,576 --> 00:22:44,706 Notice you free the pointer, don't dereference the pointer. 557 00:22:44,706 --> 00:22:46,106 So let me recompile make memory. 558 00:22:46,626 --> 00:22:47,836 Alright, nothing's bad 559 00:22:47,836 --> 00:22:52,936 yet so let me rerun the memory check and ok, interesting. 560 00:22:52,936 --> 00:22:56,506 Error summary, 13 errors from 6 contexts but and here's 561 00:22:56,506 --> 00:22:58,786 where sometimes you should not be misled by the program 562 00:22:58,786 --> 00:23:02,576 because it's really doing a very diligent, very anal check of all 563 00:23:02,576 --> 00:23:04,126 of the code related to your program. 564 00:23:04,126 --> 00:23:06,466 But recall that almost any time you write a C program, 565 00:23:06,706 --> 00:23:09,306 you are linking in other people's code, 566 00:23:09,306 --> 00:23:11,766 which isn't necessarily buggy but maybe is 567 00:23:11,766 --> 00:23:16,376 in fact giving tools like this a little bit of cause for concern. 568 00:23:16,376 --> 00:23:18,026 But if I now look through this, 569 00:23:18,026 --> 00:23:22,916 I actually don't see any errors that seem to be mine. 570 00:23:22,916 --> 00:23:24,846 Mallock free, I did one allocation, 571 00:23:24,846 --> 00:23:26,536 one free, 40 bytes allocated. 572 00:23:26,846 --> 00:23:28,726 So in fact, no leaks are possible. 573 00:23:28,786 --> 00:23:29,866 So this is good. 574 00:23:29,866 --> 00:23:32,736 Down here, all heap blocks are free, no leaks are possible 575 00:23:32,736 --> 00:23:33,996 and let's do the more succinct one 576 00:23:34,226 --> 00:23:36,746 where we don't actually do this command line argument. 577 00:23:37,586 --> 00:23:40,046 And in fact none of these seem to be mine. 578 00:23:40,376 --> 00:23:45,046 Ok, so you might enjoy or you might be disappointed to go back 579 00:23:45,366 --> 00:23:50,016 into say p set 4 which didn't use pointers all that much 580 00:23:50,266 --> 00:23:52,216 to run this program on your own code, 581 00:23:52,216 --> 00:23:55,676 especially if you did actually use something like Mallock and 582 00:23:55,676 --> 00:23:57,086 or free or the lack thereof. 583 00:23:57,266 --> 00:23:59,106 And certainly for problem set 6 584 00:23:59,106 --> 00:24:01,506 which will be our last problem set based in C 585 00:24:01,506 --> 00:24:03,886 for which you implement that fastest spell checker possible, 586 00:24:04,076 --> 00:24:06,686 will you absolutely want to check whether 587 00:24:06,686 --> 00:24:08,756 or not you are risking memory errors. 588 00:24:08,756 --> 00:24:11,006 Because problem set 6 is the culmination 589 00:24:11,006 --> 00:24:12,496 of all these discussions we've been having 590 00:24:12,496 --> 00:24:14,066 about data structures and pointers. 591 00:24:14,296 --> 00:24:16,496 And it's dare say the most dangerous one with regard 592 00:24:16,496 --> 00:24:20,386 to memory, but also very easily solved especially 593 00:24:20,386 --> 00:24:21,896 with the help of tools like this. 594 00:24:22,066 --> 00:24:23,706 So more on that in the weeks to come. 595 00:24:24,256 --> 00:24:26,346 So this was as cute as I could be late at night with the slide. 596 00:24:26,706 --> 00:24:30,066 So hexadecimal is something we've talked about already 597 00:24:30,066 --> 00:24:30,986 and it's not all 598 00:24:30,986 --> 00:24:33,456 that complicated even though it might look a little cryptic 599 00:24:33,456 --> 00:24:35,446 and we talked about it in the context of Photoshop 600 00:24:35,446 --> 00:24:36,916 and you'll see it again when we do web stuff 601 00:24:36,916 --> 00:24:38,326 for colors and all of that. 602 00:24:38,546 --> 00:24:40,606 But you'll particularly see it this week because some 603 00:24:40,606 --> 00:24:42,606 of the tools you'll be using forensically 604 00:24:42,876 --> 00:24:47,166 or to get your program working for problem set 5 is, 605 00:24:47,856 --> 00:24:49,866 I forget how I started that sentence, 606 00:24:49,866 --> 00:24:52,516 one of the things you'll be doing this week is looking 607 00:24:52,516 --> 00:24:53,726 at hexadecimal values. 608 00:24:53,796 --> 00:24:56,346 Because of some of the tools you'll be using forensically 609 00:24:56,666 --> 00:24:59,506 to debug your code and develop your code. 610 00:24:59,506 --> 00:25:02,476 So you'll use a program for instance called xxd 611 00:25:02,756 --> 00:25:05,856 which simply dumps the contents of a file, like a JPEG 612 00:25:05,856 --> 00:25:09,426 or a BMP file or even your own binary and shows you the bits 613 00:25:09,426 --> 00:25:11,366 but it doesn't show you them as zeros and ones 614 00:25:11,366 --> 00:25:13,956 which is beyond useless for a typical human. 615 00:25:14,196 --> 00:25:16,706 It will instead show you things more succinctly 616 00:25:16,876 --> 00:25:19,426 in a more interesting way using hexadecimal. 617 00:25:19,716 --> 00:25:22,806 But we bring this up now because you'll start to realize 618 00:25:22,846 --> 00:25:25,156 that some details related to files 619 00:25:25,156 --> 00:25:28,996 on computers are very much operating system 620 00:25:28,996 --> 00:25:30,976 and or CPU specific. 621 00:25:31,326 --> 00:25:35,736 Some computers store data differently on disk than they do 622 00:25:35,736 --> 00:25:37,726 on say another computer altogether. 623 00:25:37,726 --> 00:25:39,026 So what do we mean by this? 624 00:25:39,206 --> 00:25:44,026 Well, there is this notion of endianness in the world 625 00:25:44,026 --> 00:25:47,756 of computing that refers to how numbesr are stored in memory, 626 00:25:47,906 --> 00:25:50,376 RAM, or perhaps on disk, 627 00:25:50,696 --> 00:25:52,506 depending on who is writing to disk. 628 00:25:52,866 --> 00:25:53,786 So what do I mean by this? 629 00:25:53,786 --> 00:25:56,286 Well, let me open up a silly little text editor here. 630 00:25:56,466 --> 00:25:59,696 And if I have the hexadecimal number like 1 631 00:25:59,696 --> 00:26:03,796 and I store something like this, int X gets 1. 632 00:26:04,106 --> 00:26:07,966 Well in let's see decimal that's just the number 1. 633 00:26:08,316 --> 00:26:11,736 In binary it's the number 1, 2, 3, 4, 5, 6, 7, 634 00:26:11,736 --> 00:26:14,236 8 and let me go ahead and copy this. 635 00:26:14,526 --> 00:26:17,686 It's going to be 1, 2, 3 and change this 636 00:26:18,026 --> 00:26:19,816 and now I'll shrink this down. 637 00:26:20,106 --> 00:26:22,696 So that's in binary the number and this is why humans tend not 638 00:26:22,696 --> 00:26:24,596 to use binary as a representation system. 639 00:26:24,596 --> 00:26:25,656 It's not all that useful. 640 00:26:25,986 --> 00:26:28,816 But in hexadecimal how would I write this same value? 641 00:26:29,776 --> 00:26:31,496 So that's in binary and incidentally, 642 00:26:31,496 --> 00:26:34,916 notionally some people will often add a suffix 643 00:26:34,916 --> 00:26:37,676 of lower case b just to indicate this is a binary number. 644 00:26:38,086 --> 00:26:39,376 How would I write this same thing in hex? 645 00:26:39,416 --> 00:26:40,866 0X what? 646 00:26:41,361 --> 00:26:43,361 [ Period of silence ] 647 00:26:43,856 --> 00:26:46,616 01 because each of these digits happens 648 00:26:46,616 --> 00:26:47,546 to represent how many bits? 649 00:26:49,086 --> 00:26:51,266 So 8, so each pair represents 8 650 00:26:51,266 --> 00:26:54,856 because each individual digit represents 4 bits 651 00:26:54,856 --> 00:26:56,386 and why was this? 652 00:26:56,476 --> 00:26:59,346 Just to go back to a couple weeks ago 653 00:26:59,346 --> 00:27:02,476 when we started talking about hexadecimal and we said 654 00:27:02,476 --> 00:27:07,166 that each digit 0 through 9 and A through F represented 4 bits, 655 00:27:07,276 --> 00:27:09,036 either some pattern of 0s and 1s. 656 00:27:09,036 --> 00:27:11,046 So this was the decimal number 0 657 00:27:11,266 --> 00:27:15,926 and 11111 is the hexadecimal number 658 00:27:15,926 --> 00:27:17,626 or the decimal number what? 659 00:27:17,626 --> 00:27:18,216 [ Period of silence ] 660 00:27:18,216 --> 00:27:22,656 15 or in hexadecimal we would write this as an F. So here 661 00:27:22,656 --> 00:27:25,546 in the left hand column I'll write this all out. 662 00:27:25,886 --> 00:27:28,856 So in the left hand column here I have binary 663 00:27:29,226 --> 00:27:33,176 and then I have decimal, decimal 664 00:27:33,176 --> 00:27:34,876 and then here I have hexadecimal. 665 00:27:35,176 --> 00:27:37,896 Ok, so those are my three little contrived columns there. 666 00:27:38,156 --> 00:27:39,636 So why is this relevant? 667 00:27:39,636 --> 00:27:41,906 Well, this number here, just the number one, 668 00:27:42,176 --> 00:27:45,026 it turns out when it is stored in RAM 669 00:27:45,026 --> 00:27:49,456 on a typical computer including NICE.fas.harvard.edu it's 670 00:27:49,456 --> 00:27:51,736 actually not stored in this format 671 00:27:52,116 --> 00:27:54,036 but in the little NDN format. 672 00:27:54,536 --> 00:27:57,986 So there's this notion of big Ns and little Ns to numbers. 673 00:27:57,986 --> 00:28:00,616 So here is a number, granted in hexadecimal. 674 00:28:00,816 --> 00:28:03,696 The little n is the lowest ordered bits. 675 00:28:04,026 --> 00:28:05,576 So the tiniest numbers right? 676 00:28:05,576 --> 00:28:07,146 Because if I put a one at the other side, 677 00:28:07,376 --> 00:28:09,156 this actually makes this a pretty big number. 678 00:28:09,396 --> 00:28:11,796 But because the number here is at the right hand side, 679 00:28:11,796 --> 00:28:14,226 this is the little n, this is why this is such a small value. 680 00:28:14,636 --> 00:28:16,356 So the big N is over here. 681 00:28:16,616 --> 00:28:18,346 So this is very nice and neat 682 00:28:18,346 --> 00:28:20,926 because when we've been reading binary numbers from left 683 00:28:20,926 --> 00:28:24,046 to right, or rather from right to left we have the ones column, 684 00:28:24,046 --> 00:28:26,676 the twos column, the fours column and so forth, 685 00:28:26,676 --> 00:28:28,876 everything mathematically works out nicely. 686 00:28:29,416 --> 00:28:32,756 Unfortunately a typical computer would actually store this 687 00:28:32,806 --> 00:28:37,776 number, the number 1 as this. 688 00:28:37,776 --> 00:28:37,996 [ Period of silence ] 689 00:28:37,996 --> 00:28:40,106 It would store it in little NDN format 690 00:28:40,106 --> 00:28:44,276 where the lowest ordered byte actually comes first in memory. 691 00:28:44,356 --> 00:28:47,946 So in other words the thing is essentially reversed in memory. 692 00:28:47,976 --> 00:28:48,886 Well what do we mean by that? 693 00:28:48,926 --> 00:28:51,526 Well we have this nice little picture which if any 694 00:28:51,526 --> 00:28:54,006 of this is unclear it's actually a really nice article 695 00:28:54,066 --> 00:28:55,966 on Wikipedia for this topic, NDN ness. 696 00:28:56,286 --> 00:28:59,646 Take a look at the top left box there called register. 697 00:28:59,646 --> 00:29:02,316 So inside of a CPU are a whole bunch of registers. 698 00:29:02,536 --> 00:29:05,556 These are very tiny units of memory that ultimately is 699 00:29:05,556 --> 00:29:07,616 where all the action happens, all of the additional, 700 00:29:07,616 --> 00:29:10,316 multiplication, subtraction, all of the lowest level operations 701 00:29:10,316 --> 00:29:13,476 in the CPU happen in these things called registers. 702 00:29:13,476 --> 00:29:17,306 So in that top left box there, 0A, 0B, 0C, 0D, 703 00:29:17,576 --> 00:29:19,846 person who made the picture just needed a contrived number that's 704 00:29:19,846 --> 00:29:20,556 easy to remember. 705 00:29:20,876 --> 00:29:24,266 That is a number just as I might write it by hand 706 00:29:24,266 --> 00:29:26,746 on the blackboard or on my little notepad here. 707 00:29:26,796 --> 00:29:28,806 So it's from the big end all the way down. 708 00:29:29,206 --> 00:29:33,756 So what's the case in a computer that big NDN, what happens is 709 00:29:33,756 --> 00:29:35,806 if the left hand thing there represents RAM, 710 00:29:36,346 --> 00:29:39,226 where A represents the low of, 711 00:29:39,286 --> 00:29:42,576 location A plus 1 represents the next location, 712 00:29:42,576 --> 00:29:44,856 A plus 2 is the next location. 713 00:29:45,046 --> 00:29:47,466 In other words the bytes are getting incremented from top 714 00:29:47,506 --> 00:29:48,846 to bottom in this picture. 715 00:29:49,156 --> 00:29:53,496 Notice that the big end of the number 0A does 716 00:29:53,496 --> 00:29:55,836 in fact come first in memory. 717 00:29:56,276 --> 00:29:58,396 It comes at the lowest ordered address 718 00:29:58,506 --> 00:30:02,746 which is little A. Meanwhile the lowest ordered bits 719 00:30:03,006 --> 00:30:06,206 which is 0D come later in memory. 720 00:30:06,456 --> 00:30:08,746 So in other words at least if you think like I do 721 00:30:08,746 --> 00:30:13,796 about numbers in memory, the real number 0A, 0B, 0C, 722 00:30:13,796 --> 00:30:17,606 0D is laid out in memory exactly as you the human would expect, 723 00:30:17,816 --> 00:30:20,006 from left to right or in this case from top to bottom. 724 00:30:20,416 --> 00:30:24,476 But in a little NDN architecture like nice.fas.harvard.edu 725 00:30:24,476 --> 00:30:26,206 and a lot of Intel hardware, 726 00:30:26,436 --> 00:30:28,676 what you have instead is kind of the opposite. 727 00:30:28,676 --> 00:30:31,366 So if we have that same number in the registers, 728 00:30:31,486 --> 00:30:35,296 the same number on a piece of paper, 0A,0B, 0C, 0D, 729 00:30:35,576 --> 00:30:39,336 notice that it gets laid out in memory essentially backwards. 730 00:30:39,676 --> 00:30:44,736 Where the big end of the number 0A comes last in memory. 731 00:30:45,316 --> 00:30:47,316 So this is simply unfortunate 732 00:30:47,766 --> 00:30:49,726 because I think it adds unnecessary confusion. 733 00:30:49,806 --> 00:30:51,366 But the world decided long ago 734 00:30:51,606 --> 00:30:55,006 that some CPUs would be little NDNs, some CPUs would be big NDN 735 00:30:55,176 --> 00:30:57,716 and really complicated annoying things happen 736 00:30:57,916 --> 00:30:59,896 when two computers need to talk to one another. 737 00:30:59,896 --> 00:31:03,416 So thankfully the world has decided on a network NDN format. 738 00:31:03,736 --> 00:31:05,906 So because we have this thing called the internet now, 739 00:31:05,906 --> 00:31:08,836 where data transfers from points A to B is commonplace and A 740 00:31:08,836 --> 00:31:11,586 and B absolutely might not be the same type of computer, 741 00:31:11,966 --> 00:31:15,156 thankfully the world has standardized what format is 742 00:31:15,156 --> 00:31:17,006 actually used across the wire, 743 00:31:17,006 --> 00:31:18,956 so that you don't have one computer talking 744 00:31:18,956 --> 00:31:20,686 to another in the wrong order. 745 00:31:20,916 --> 00:31:22,086 So what does this mean for you? 746 00:31:22,086 --> 00:31:25,776 Well when you take a look at BMP files, bitmap graphics 747 00:31:25,776 --> 00:31:29,456 that we've given you as part of problem set 5, you will find 748 00:31:29,456 --> 00:31:31,936 when you look at them with this program called XXD, 749 00:31:31,936 --> 00:31:36,166 it will show you the contents of bitmap files in memory. 750 00:31:36,526 --> 00:31:39,686 You're going to see that the numbers are not the way you 751 00:31:39,686 --> 00:31:43,116 would expect them if you wrote down those numbers in memory. 752 00:31:43,116 --> 00:31:44,996 So you'll be seeing and the whole spec walks you 753 00:31:44,996 --> 00:31:47,736 through this, a bitmap file can be represented 754 00:31:47,736 --> 00:31:51,106 with a whole bunch of RGB triples, red green, blue pixels, 755 00:31:51,296 --> 00:31:53,456 red, green, blue triples. 756 00:31:53,816 --> 00:31:56,646 You'll find unfortunately that these are laid out backwards 757 00:31:56,646 --> 00:31:58,016 when you actually look inside. 758 00:31:58,186 --> 00:32:01,136 So more on that in the spec itself. 759 00:32:01,416 --> 00:32:03,656 And let me go ahead and open this one file here. 760 00:32:04,176 --> 00:32:09,846 So vi of let's say NDN.c. Notice we have this little 761 00:32:09,846 --> 00:32:11,026 program here. 762 00:32:11,546 --> 00:32:14,636 So this is a quick teaser for what you'll be playing 763 00:32:14,636 --> 00:32:16,116 with probably partly in problem set 5. 764 00:32:16,376 --> 00:32:18,816 Let me go ahead and scroll to the top. 765 00:32:18,816 --> 00:32:19,926 I have one main function. 766 00:32:20,026 --> 00:32:23,006 This is NDN.c, now you got a little anal check here 767 00:32:23,006 --> 00:32:25,006 to make sure I provide a command line argument 768 00:32:25,096 --> 00:32:26,556 and if not I just return 1. 769 00:32:26,556 --> 00:32:28,386 I don't even yell at the user explicitly. 770 00:32:28,706 --> 00:32:33,006 Here recall is the notion for opening a file in read format. 771 00:32:33,086 --> 00:32:35,336 So the goal of this program is simply to run it, 772 00:32:35,816 --> 00:32:39,146 give it a BMP file, a graphic file like a Windows wallpaper, 773 00:32:39,146 --> 00:32:40,656 that's all a BMP file is. 774 00:32:40,996 --> 00:32:44,446 And then actually look inside of it and see what some 775 00:32:44,446 --> 00:32:46,276 of its fields look like in the order 776 00:32:46,276 --> 00:32:47,466 in which numbers are laid out. 777 00:32:47,806 --> 00:32:50,636 So let's see, if this thing, this file pointer is null, 778 00:32:50,866 --> 00:32:53,836 let's return immediately because that's not a legit file. 779 00:32:54,026 --> 00:32:56,796 And then there's some instructions like this, F seek. 780 00:32:56,796 --> 00:32:58,446 So you'll see this more in the spec 781 00:32:58,446 --> 00:33:00,436 and I'll defer to its explanations. 782 00:33:00,706 --> 00:33:04,176 But F seek, file seek is a function that simply starts 783 00:33:04,176 --> 00:33:06,026 at a file and then fast forwards, 784 00:33:06,026 --> 00:33:07,766 much like an old cassette player would, 785 00:33:07,956 --> 00:33:10,136 to a specific location in the file. 786 00:33:10,226 --> 00:33:13,216 So F seek seeks two bytes forward 787 00:33:13,216 --> 00:33:14,646 in the file, thanks to that 2. 788 00:33:15,076 --> 00:33:17,686 Down here I'm doing an F read and as you'll see 789 00:33:17,686 --> 00:33:20,686 in the distribution code, which again is well documented, 790 00:33:20,906 --> 00:33:22,626 notice that F read is file read. 791 00:33:22,876 --> 00:33:28,236 It's going to read into a variable called BF size a number 792 00:33:28,496 --> 00:33:31,706 from that file, thanks to the use of FP here. 793 00:33:31,966 --> 00:33:34,746 So in short what this program needs to do is given the name 794 00:33:34,746 --> 00:33:36,836 of a BMP file, it needs to open it, it then needs 795 00:33:36,836 --> 00:33:39,586 to fast forward to a specific location in that file 796 00:33:39,726 --> 00:33:43,686 and then it wants to read out a specific chunk of memory, 797 00:33:43,976 --> 00:33:47,636 specifically 32 bits which we're going to call BF size. 798 00:33:47,636 --> 00:33:48,846 Now why is this the case? 799 00:33:49,226 --> 00:33:53,516 Well I can whisk us over here to problem sets and I'm going to go 800 00:33:53,516 --> 00:33:56,216 down to just the standard edition of the forensics P set. 801 00:33:56,526 --> 00:33:59,066 I'm going to scroll down to a picture I'm looking 802 00:33:59,156 --> 00:34:03,436 for which is this one right here, which the specifics 803 00:34:03,436 --> 00:34:05,946 of which aren't terribly interesting right now. 804 00:34:06,226 --> 00:34:07,766 But what we're ultimately looking 805 00:34:07,766 --> 00:34:11,046 for in this file is a specific field. 806 00:34:11,266 --> 00:34:14,476 So long story short, files have different parts to them. 807 00:34:14,476 --> 00:34:16,516 This is what makes something a file format. 808 00:34:16,516 --> 00:34:18,666 A company like Microsoft or someone else declares 809 00:34:18,666 --> 00:34:19,966 that the first few bytes will mean this, 810 00:34:20,296 --> 00:34:22,166 the next few bytes will mean that, and so forth. 811 00:34:22,406 --> 00:34:25,576 This picture as the spec explains discusses exactly 812 00:34:25,576 --> 00:34:28,246 what's inside of a bitmap file and where. 813 00:34:28,516 --> 00:34:31,546 So the goal very simply of this code, because I know what 814 00:34:31,546 --> 00:34:35,226 that picture looks like, is to read out an integer inside 815 00:34:35,226 --> 00:34:37,476 of the file that's going to tell me how large it is. 816 00:34:37,896 --> 00:34:39,716 How large is this actual BMP? 817 00:34:40,236 --> 00:34:41,236 So what do I do next? 818 00:34:41,366 --> 00:34:44,746 After I've read in this nt, we report that the value 819 00:34:44,746 --> 00:34:47,936 of this nt, BF size is what's inside this variable. 820 00:34:48,286 --> 00:34:50,986 Then I do this, it turns out much like a cassette tape. 821 00:34:50,986 --> 00:34:53,576 There's a rewind function which rewinds the file 822 00:34:53,576 --> 00:34:54,486 to the very beginning. 823 00:34:54,816 --> 00:34:58,656 And then well I'm trying to, oh this is getting 824 00:34:58,656 --> 00:35:02,256 so much more complicated than I want it to be. 825 00:35:02,256 --> 00:35:06,196 Ok and then I do some complicated stuff which, damn. 826 00:35:06,356 --> 00:35:08,566 I don't really, let's spoil the ending. 827 00:35:10,076 --> 00:35:15,366 NDN. Ok, NDN of I have a file in here called doc.bmp. 828 00:35:15,366 --> 00:35:18,266 Ok. Let me skip some of the details 829 00:35:18,266 --> 00:35:19,986 because they're not going to be enlightening, they're just going 830 00:35:19,986 --> 00:35:21,356 to be boring at this point in time 831 00:35:21,356 --> 00:35:22,736 without P set 5 under your belt. 832 00:35:23,076 --> 00:35:25,636 So what I have just run here is a program 833 00:35:25,636 --> 00:35:29,406 that analyzes a tiny little bitmap file called doc.bmp. 834 00:35:29,406 --> 00:35:30,446 I made this with Photoshop. 835 00:35:30,716 --> 00:35:33,166 I opened up a one pixel by one pixel file 836 00:35:33,166 --> 00:35:34,716 and then I clicked my paintbrush tool 837 00:35:34,716 --> 00:35:36,056 and said make this a black dot. 838 00:35:36,386 --> 00:35:37,986 Then I save the file as doc.bmp. 839 00:35:37,986 --> 00:35:41,046 That's a bit of a white lie because making one little dot 840 00:35:41,046 --> 00:35:42,506 like that would not make such a big file. 841 00:35:43,106 --> 00:35:46,016 But BF size of this file is 58. 842 00:35:46,016 --> 00:35:48,936 This file in total is 58 bytes long. 843 00:35:49,156 --> 00:35:51,366 And simply what I did at the very end here, 844 00:35:51,366 --> 00:35:53,226 so that very complicated piece of code 845 00:35:53,226 --> 00:35:56,556 which I now regret teasing you with already is to show you 846 00:35:56,786 --> 00:36:00,236 that in memory, the size of this file is in fact laid 847 00:36:00,236 --> 00:36:03,546 out in little NDN format, even though this is an nt. 848 00:36:03,546 --> 00:36:05,426 And I would therefore expect it to be something 849 00:36:05,426 --> 00:36:10,046 like zeros then some zeros then some zeros then the number 58 850 00:36:10,046 --> 00:36:11,786 because there's 4 bytes to an integer. 851 00:36:12,056 --> 00:36:15,456 In fact I do see from left to right that the little end 852 00:36:15,456 --> 00:36:16,896 of this number comes first. 853 00:36:17,156 --> 00:36:19,646 So let me wave my hand at some of the complexity I tripped 854 00:36:19,646 --> 00:36:23,056 over there just to say that when you dive into problem set 5, 855 00:36:23,056 --> 00:36:25,846 if not already, these are precisely the encoding issues 856 00:36:26,136 --> 00:36:28,516 that you are going to encounter for yourself. 857 00:36:28,836 --> 00:36:29,846 But let's take a step back 858 00:36:29,986 --> 00:36:32,846 and keep things a little less overwhelming at first 859 00:36:33,186 --> 00:36:34,856 and just look at these building blocks. 860 00:36:34,856 --> 00:36:38,456 So even though thus far we've been only doing operations 861 00:36:38,456 --> 00:36:41,706 like addition and subtraction and multiplication and division 862 00:36:41,766 --> 00:36:44,986 in the context of C, there's some other useful operators 863 00:36:44,986 --> 00:36:47,556 as well with which you can do much more interesting things 864 00:36:47,556 --> 00:36:49,426 even though they're incredibly low level. 865 00:36:49,786 --> 00:36:52,866 So henceforth, if you ever want to take two variables, 866 00:36:52,866 --> 00:36:57,146 say X and Y, and you want to somehow combine their bits 867 00:36:57,146 --> 00:36:59,806 in an interesting way you can do that using these 868 00:36:59,806 --> 00:37:01,596 so called bit wise operators. 869 00:37:01,636 --> 00:37:05,336 Unlike the arithmetic operators like plus and minus that sort 870 00:37:05,336 --> 00:37:08,566 of operate on the number in the aggregate, bit wise operators 871 00:37:08,566 --> 00:37:09,826 when applied to this variable 872 00:37:09,826 --> 00:37:11,926 and this variable essentially line them up 873 00:37:12,216 --> 00:37:15,786 and then apply the operation one bit at a time. 874 00:37:16,036 --> 00:37:17,696 And there's the notion of anding bits, 875 00:37:17,696 --> 00:37:20,766 together oring them together, exclusively oring them together, 876 00:37:21,036 --> 00:37:24,046 complimenting them and also shifting left and right 877 00:37:24,106 --> 00:37:25,496 and by this I mean the following. 878 00:37:25,496 --> 00:37:28,296 So let me go ahead and just list 879 00:37:28,296 --> 00:37:33,936 out let's say a number X and call it 00000. 880 00:37:33,936 --> 00:37:36,026 For simplicity I'm not going to use integers 881 00:37:36,026 --> 00:37:38,186 because I'd be writing 32 digits all of the time. 882 00:37:38,496 --> 00:37:43,556 But let's say this Y is 00001 and suppose the operation I want 883 00:37:43,556 --> 00:37:49,636 to perform is X and Y. So X and Y would be represented as X 884 00:37:50,196 --> 00:37:54,296 and Y and this equals the result of applying the and operation 885 00:37:54,296 --> 00:37:57,586 to each pair of digits that line up together. 886 00:37:57,586 --> 00:37:58,956 So let's start from the left to right. 887 00:37:59,406 --> 00:38:03,816 0 and 0. So just logically that's false and false. 888 00:38:03,876 --> 00:38:06,396 So what should the answer be if you're anding them together much 889 00:38:06,396 --> 00:38:08,596 like we did with Boolean operators weeks ago. 890 00:38:08,596 --> 00:38:10,436 So it should be false. 891 00:38:10,706 --> 00:38:16,796 So 0 and 0 be clear, 0 and 0 is in fact 0. 892 00:38:16,916 --> 00:38:18,306 So that's the first of four digits. 893 00:38:18,586 --> 00:38:22,426 The next pair, this guy here and this guy here are also zeros 894 00:38:22,426 --> 00:38:25,026 so 0 and 0 gives me 0. 895 00:38:25,366 --> 00:38:32,436 0 and 0 gives me 0 and 0 and 1, false and true gives me 0. 896 00:38:32,646 --> 00:38:34,736 Alright so something that's false and true, 897 00:38:34,736 --> 00:38:36,706 if you and those two together, if you think of that 898 00:38:36,706 --> 00:38:39,726 as an ampersand ampersand operation whether it was 899 00:38:39,726 --> 00:38:42,796 in Scratch or it was in C. False 900 00:38:42,796 --> 00:38:44,946 and true gives you false as well. 901 00:38:45,186 --> 00:38:47,776 Meanwhile if we do this, this is the or operator. 902 00:38:47,776 --> 00:38:53,106 So x or y means either one bit or the other must be one. 903 00:38:53,356 --> 00:38:55,206 So what about 0 or 0? 904 00:38:55,666 --> 00:38:57,206 No. 0 or 0? 905 00:38:57,436 --> 00:38:58,866 No. 0 or 0? 906 00:38:59,086 --> 00:39:00,306 No. 0 or 1? 907 00:39:00,696 --> 00:39:03,606 Yes. And now this one's a little more interesting. 908 00:39:03,666 --> 00:39:08,466 X, X or Y, so slightly confusing name but exclusive 909 00:39:08,466 --> 00:39:11,576 or represented in the language C using the little 910 00:39:11,576 --> 00:39:12,596 carrot operator. 911 00:39:12,916 --> 00:39:15,816 This means that one of the bits and only one 912 00:39:15,816 --> 00:39:19,716 of the bits can be 1 for the result to be 1. 913 00:39:19,716 --> 00:39:23,816 So you need exclusivity, either one and zero or zero and one. 914 00:39:24,146 --> 00:39:26,486 But not 0 0 or 1 1. 915 00:39:26,486 --> 00:39:27,436 You need exclusivity. 916 00:39:27,666 --> 00:39:32,036 So 0 0, 0 X or 0, not exclusive. 917 00:39:32,036 --> 00:39:33,846 0 X or 0, not exclusive. 918 00:39:33,846 --> 00:39:35,486 0 X or 0, not exclusive. 919 00:39:35,816 --> 00:39:39,886 0 X or 1, yes, so now I have this answer here. 920 00:39:40,096 --> 00:39:41,806 Now some of them are pretty darn trivial. 921 00:39:41,856 --> 00:39:45,786 So Tilda X which here represents the compliment. 922 00:39:46,146 --> 00:39:49,176 This just means to flip the bits, 0 becomes 1 923 00:39:49,236 --> 00:39:55,896 and 1 becomes 0, so Tilda X is 1111, that's pretty simple. 924 00:39:55,896 --> 00:39:57,616 And then we had a couple of others. 925 00:39:57,616 --> 00:40:02,576 Let's say X left shift, so 2 angle brackets 1, 926 00:40:02,866 --> 00:40:04,876 let me go ahead and shrink the font a little bit. 927 00:40:04,906 --> 00:40:09,516 Actually let's do it to Y. So Y left shift, 928 00:40:09,516 --> 00:40:11,186 left shift is going to equal what? 929 00:40:11,186 --> 00:40:13,686 So this means take the bits of Y and shift them 930 00:40:13,746 --> 00:40:15,036 to the left by one place. 931 00:40:15,136 --> 00:40:16,176 So what does the output become? 932 00:40:16,176 --> 00:40:16,243 [ Period of silence ] 933 00:40:16,243 --> 00:40:21,846 001 and then what do you want to put at the left hand side? 934 00:40:22,656 --> 00:40:26,436 So 0. So the shift operators indeed pushes all of the bits 935 00:40:26,436 --> 00:40:28,586 to the left, if you're using the left shift operator 936 00:40:28,626 --> 00:40:30,616 or pushes all of the bits to the right 937 00:40:30,926 --> 00:40:32,406 if you're using the right shift operator. 938 00:40:32,606 --> 00:40:36,436 And it adds any empty spaces typically with zeros. 939 00:40:36,466 --> 00:40:38,586 Though FYI, there's some corner cases 940 00:40:38,586 --> 00:40:40,096 when you're using signed integers 941 00:40:40,346 --> 00:40:43,166 as to how you handle negative numbers in particular. 942 00:40:43,336 --> 00:40:45,866 But we'll focus just on simple positive numbers for now. 943 00:40:46,306 --> 00:40:48,766 This is kind of a funny thing and actually let's do this. 944 00:40:48,876 --> 00:40:54,996 Y right shift 1 is going to equal all zeros in this case 945 00:40:54,996 --> 00:40:57,216 because you're pushing the bit off the end of it. 946 00:40:57,616 --> 00:40:59,156 So this one's kind of interesting. 947 00:40:59,446 --> 00:41:02,796 If you shift the bits of a variable by 1 position 948 00:41:02,796 --> 00:41:05,306 like this, what's the equivalent mathematically? 949 00:41:05,306 --> 00:41:05,856 [ Period of silence ] 950 00:41:05,856 --> 00:41:09,876 Yeah, so this is actually a really neat low level way 951 00:41:09,876 --> 00:41:13,616 of multiplying or multiplying any number by 2. 952 00:41:13,866 --> 00:41:16,806 So if I had the number Y equals 1 up here, shift the bits 953 00:41:16,806 --> 00:41:19,326 to the left, you told me that we get this. 954 00:41:19,326 --> 00:41:20,106 This is the value. 955 00:41:20,226 --> 00:41:22,556 Let's see this is columns, ones columns, two. 956 00:41:22,776 --> 00:41:23,896 So that's the value 2. 957 00:41:24,296 --> 00:41:26,706 So one becomes 2, if I left shift again 958 00:41:26,706 --> 00:41:30,576 by 2 places this number becomes this which is the value 959 00:41:30,576 --> 00:41:33,406 in decimal known as 4. 960 00:41:33,406 --> 00:41:34,946 Do it again and we get 8, 961 00:41:34,946 --> 00:41:37,766 do it again if we have enough bits, 16, 32. 962 00:41:37,986 --> 00:41:41,176 So a really neat way of multiplying a value 963 00:41:41,176 --> 00:41:43,656 by 2 is simply to shift its bits to the left. 964 00:41:44,036 --> 00:41:46,266 So this is one of the underlying representations. 965 00:41:46,266 --> 00:41:48,946 Now who cares is an interesting question now, right? 966 00:41:48,996 --> 00:41:51,516 So this is all fine and good but I promised 967 00:41:51,516 --> 00:41:53,336 that we would actually be solving real problems 968 00:41:53,336 --> 00:41:55,096 and not devolving back to for loops 969 00:41:55,096 --> 00:41:56,336 and while loops and all of this. 970 00:41:56,696 --> 00:41:59,516 So it turns out there's some very interesting applications 971 00:41:59,516 --> 00:42:00,886 of these very simple ideas. 972 00:42:01,226 --> 00:42:03,086 So these, how many of you just by a show 973 00:42:03,086 --> 00:42:06,326 of hands has ever lost data because your hard drive crashed 974 00:42:06,326 --> 00:42:07,846 or got corrupted or something bad happened? 975 00:42:08,176 --> 00:42:08,806 So a lot of you. 976 00:42:09,086 --> 00:42:11,846 Now how many of you didn't actually lose data 977 00:42:11,846 --> 00:42:13,936 because you were using a raid 1 978 00:42:13,936 --> 00:42:16,146 or raid 5 set up in your machine? 979 00:42:17,006 --> 00:42:19,016 Alright, so not many, maybe just one or two geeks 980 00:42:19,096 --> 00:42:22,376 who ever awkwardly raised their hands that second time. 981 00:42:22,656 --> 00:42:23,536 So what does this mean? 982 00:42:23,536 --> 00:42:25,766 Well frankly if you have a desktop computer these days, 983 00:42:25,986 --> 00:42:29,626 there's no excuse for not having a pair of hard drives, 984 00:42:29,626 --> 00:42:33,446 two hard drives that are in a so called raid 1 configuration. 985 00:42:33,746 --> 00:42:36,856 So raid is redundant array of independent disks. 986 00:42:37,016 --> 00:42:38,186 It's just a really fancy way 987 00:42:38,186 --> 00:42:39,886 of saying you don't have one hard drive, 988 00:42:39,886 --> 00:42:41,026 you have two hard drives. 989 00:42:41,026 --> 00:42:43,496 And what's really neat about RAID 1 specifically 990 00:42:43,696 --> 00:42:45,646 and you can do this with Mac OS, you can do this 991 00:42:45,646 --> 00:42:47,586 with Windows these days, some computer companies 992 00:42:47,586 --> 00:42:50,176 like Dell offer this feature now when you use 993 00:42:50,176 --> 00:42:51,566 that little web based configurator 994 00:42:51,566 --> 00:42:52,556 when buying a computer. 995 00:42:52,866 --> 00:42:57,946 RAID 1 takes one disk of size you know let's say a terabyte 996 00:42:57,946 --> 00:43:01,286 or 500 gigabytes, then you get another identical hard drive 997 00:43:01,486 --> 00:43:04,116 and RAID 1 means that your computer treats them 998 00:43:04,116 --> 00:43:04,856 as identical. 999 00:43:04,916 --> 00:43:08,246 So any data that gets written here simultaneously gets 1000 00:43:08,246 --> 00:43:09,156 written here. 1001 00:43:09,446 --> 00:43:12,316 The upside of this is that if you are very unfortunate 1002 00:43:12,556 --> 00:43:16,056 to suffer a hard drive failure, this guy just breaks, 1003 00:43:16,426 --> 00:43:19,156 all of your data is literally still on the other drive. 1004 00:43:19,436 --> 00:43:22,006 So now you're operating in kind of a dangerous mode. 1005 00:43:22,006 --> 00:43:23,346 Your computer is smart enough, 1006 00:43:23,346 --> 00:43:25,846 if it supports this technology called RAID to keep 1007 00:43:25,846 --> 00:43:28,126 on plugging away using just the one hard drive. 1008 00:43:28,456 --> 00:43:31,806 But obviously if bad things happen to this hard drive too, 1009 00:43:31,806 --> 00:43:33,436 then your data is definitely lost. 1010 00:43:33,436 --> 00:43:36,676 But it decreases the probability that you'll lose data 1011 00:43:36,676 --> 00:43:39,016 because now you would have to have this drive 1012 00:43:39,306 --> 00:43:42,136 and this drive fail simultaneously for you 1013 00:43:42,136 --> 00:43:43,166 to actually lose data. 1014 00:43:43,166 --> 00:43:46,006 And what's really neat about RAID is that if this drive fails 1015 00:43:46,256 --> 00:43:50,346 and therefore is useless and this one keeps on plugging away, 1016 00:43:50,566 --> 00:43:52,636 you can go out, buy an identically sized 1017 00:43:52,636 --> 00:43:55,236 or larger hard drive, plug it into this slot, 1018 00:43:55,336 --> 00:43:56,676 throw the other hard drive away 1019 00:43:56,676 --> 00:43:59,046 and then the computer will automatically synchronize this 1020 00:43:59,096 --> 00:44:01,576 data over to the new drive and now you're back 1021 00:44:01,576 --> 00:44:03,726 in a stable RAID 1 configuration. 1022 00:44:03,996 --> 00:44:06,686 So this is not related at all to bit wise operators. 1023 00:44:06,686 --> 00:44:08,466 This is just very useful for consumers. 1024 00:44:08,746 --> 00:44:10,746 But that only scales so well. 1025 00:44:10,906 --> 00:44:13,646 There's other technologies like RAID 5 1026 00:44:13,996 --> 00:44:17,186 which actually lets you have multiple hard drives inside a 1027 00:44:17,186 --> 00:44:20,516 computer or more commonly inside of a server that allows you 1028 00:44:20,516 --> 00:44:23,386 to treat multiple hard drives as though they were just one. 1029 00:44:23,756 --> 00:44:26,316 So if I had 3 1 terabyte drives, 1030 00:44:26,666 --> 00:44:31,436 RAID 5 would actually let my computer think it had one 2 1031 00:44:31,436 --> 00:44:32,476 terabyte drives. 1032 00:44:32,886 --> 00:44:36,396 So if I had one terabyte, 1 terabyte 1033 00:44:36,866 --> 00:44:38,546 and 1 terabyte hard drive 1034 00:44:38,706 --> 00:44:42,756 and I'm using a RAID 5 configuration, I get a total 1035 00:44:42,756 --> 00:44:46,056 of 2 terabytes of space because one of those terabytes, 1036 00:44:46,056 --> 00:44:49,946 one hard drive is actually used to store a check sum, 1037 00:44:50,216 --> 00:44:54,756 which is a term that has came up very early on in the course 1038 00:44:54,756 --> 00:44:57,876 when talking about ISBNs and credit cards for problem set 1. 1039 00:44:58,226 --> 00:45:00,696 So if you're willing to sacrifice one hard drive 1040 00:45:00,736 --> 00:45:02,986 which frankly is pretty reasonable these days given the 1041 00:45:02,986 --> 00:45:06,286 cost, you can actually not only get this ability 1042 00:45:06,286 --> 00:45:08,316 to lose a hard drive and keep on plugging away, 1043 00:45:08,536 --> 00:45:11,706 you can also expand your storage to give the illusion 1044 00:45:11,706 --> 00:45:14,816 that multiple hard drives are just one really big hard drive. 1045 00:45:14,876 --> 00:45:15,786 Now how does this work? 1046 00:45:16,026 --> 00:45:17,706 Well I'm really going to simplify things. 1047 00:45:17,706 --> 00:45:20,586 So I don't want to write out one trillion bits here 1048 00:45:21,136 --> 00:45:22,046 or one trillion bytes. 1049 00:45:22,546 --> 00:45:23,716 So I'm going to instead say 1050 00:45:23,716 --> 00:45:27,396 that suppose this hard drive here has just 4 bits in it. 1051 00:45:27,396 --> 00:45:29,276 So I've only stored a tiny little file. 1052 00:45:29,276 --> 00:45:30,436 I've stored the number 15 1053 00:45:30,686 --> 00:45:33,146 and this hard drive simply stores this value, 1054 00:45:33,146 --> 00:45:34,526 another 4 bit value. 1055 00:45:34,926 --> 00:45:36,786 What's really neat about RAID 5 is 1056 00:45:36,786 --> 00:45:40,696 that essentially what it does it is uses the third hard drive 1057 00:45:41,126 --> 00:45:45,356 simply to store the XOR of the first two hard drives. 1058 00:45:45,356 --> 00:45:48,076 So using this very simple primitive can you do 1059 00:45:48,076 --> 00:45:48,806 the following. 1060 00:45:48,806 --> 00:45:51,046 1XOR1 is going to give me 1061 00:45:51,566 --> 00:45:54,766 if I line these left most bits up, 1XOR1 is what? 1062 00:45:55,236 --> 00:46:00,786 0, 1XOR1 is 0, 1XOR0, 1 and then 1. 1063 00:46:00,786 --> 00:46:03,936 So this third hard drive is now essentially my check sum. 1064 00:46:04,246 --> 00:46:05,746 So what does this mean exactly? 1065 00:46:05,746 --> 00:46:08,366 Well let's suppose some bad things happened 1066 00:46:08,366 --> 00:46:12,356 like this guy gets deleted or breaks. 1067 00:46:12,506 --> 00:46:14,266 So now I've lost my data, right. 1068 00:46:14,266 --> 00:46:17,026 This is really bad because some of my MP3s are over here, 1069 00:46:17,026 --> 00:46:19,066 some of them just so happen to be on this hard drive. 1070 00:46:19,276 --> 00:46:20,276 I seem to have lost it 1071 00:46:20,276 --> 00:46:22,486 and frankly this is not a back up right? 1072 00:46:22,516 --> 00:46:25,026 This third hard drive was clearly not the same 1073 00:46:25,266 --> 00:46:26,156 as the one I lost. 1074 00:46:26,156 --> 00:46:28,006 It happens coincidently to be the opposite. 1075 00:46:28,006 --> 00:46:29,276 But what does this mean? 1076 00:46:29,276 --> 00:46:32,816 Well thanks to RAID 5, XOR can also be applied in the, 1077 00:46:32,816 --> 00:46:34,756 a reverse direction conceptually. 1078 00:46:35,066 --> 00:46:38,226 If I now and let me go ahead and do something 1079 00:46:38,226 --> 00:46:40,476 like can I change colors? 1080 00:46:40,956 --> 00:46:44,276 Let's say that this guy broke entirely. 1081 00:46:44,356 --> 00:46:45,246 So red is bad. 1082 00:46:45,676 --> 00:46:46,646 So what can I do? 1083 00:46:46,646 --> 00:46:50,346 Well it turns out if I XOR this guy with this guy, 1084 00:46:50,546 --> 00:46:51,686 what do I get back out? 1085 00:46:51,896 --> 00:46:53,256 So let's call this result. 1086 00:46:53,256 --> 00:46:54,796 And now let's see. 1087 00:46:54,796 --> 00:46:56,136 So look just at the black numbers. 1088 00:46:56,236 --> 00:47:01,886 1XOR0 gives me 1, 1XOR0 gives me 1, 1089 00:47:02,206 --> 00:47:06,626 1XOR1 gives me 0, 1XOR1 gives me 0. 1090 00:47:06,876 --> 00:47:09,856 Wala, thanks god for that third hard drive 1091 00:47:09,856 --> 00:47:12,306 that really wasn't doing anything in terms of space. 1092 00:47:12,306 --> 00:47:13,956 It wasn't giving me another terabyte. 1093 00:47:14,286 --> 00:47:17,086 It was just keeping not a backup, but the result 1094 00:47:17,086 --> 00:47:19,356 of XORing the first two drives together. 1095 00:47:19,536 --> 00:47:21,626 Can my computer if designed 1096 00:47:21,626 --> 00:47:24,806 to support RAID recover the data that was lost in red? 1097 00:47:25,016 --> 00:47:27,036 Let's suppose it was the opposite situation. 1098 00:47:27,036 --> 00:47:28,456 So the take away to be clear is 1099 00:47:28,456 --> 00:47:31,386 that this result equals the lost hard drive. 1100 00:47:31,686 --> 00:47:35,246 So what is instead it's not the hard drive number 2 1101 00:47:35,246 --> 00:47:38,926 that is lost, but instead let's make him ok, 1102 00:47:38,926 --> 00:47:40,026 back to black there. 1103 00:47:40,026 --> 00:47:43,706 And now let's say it's this guy that actually disappears. 1104 00:47:44,126 --> 00:47:45,876 So now he is the dead hard drive. 1105 00:47:46,106 --> 00:47:48,866 Thankfully I've kept around my third hard drive, 1106 00:47:48,866 --> 00:47:50,116 so let's see what happens here. 1107 00:47:50,486 --> 00:47:54,986 1XOR0 gives me 1, same thing gives me 1, 1108 00:47:55,046 --> 00:47:56,566 same thing gives me 1, 1. 1109 00:47:56,846 --> 00:47:58,916 Wala, I've recovered my data as well. 1110 00:47:59,036 --> 00:48:01,736 And what's really neat is RAID 5 doesn't just support 3 hard 1111 00:48:01,736 --> 00:48:04,246 drives, it supports 4 hard drives, 5 hard drives. 1112 00:48:04,316 --> 00:48:07,636 You simply use the nth disk as this XOR disk, 1113 00:48:07,746 --> 00:48:10,576 the check sum disk and can you recover all of your data 1114 00:48:10,936 --> 00:48:13,556 in this very simple bit wise way. 1115 00:48:13,636 --> 00:48:16,536 I'm going to take a 5 minute break. 1116 00:48:16,661 --> 00:48:18,661 [ Background noise ] 1117 00:48:18,786 --> 00:48:23,416 Alright we are back, another application 1118 00:48:23,556 --> 00:48:27,716 of XOR is actually this sort of mind blowing example 1119 00:48:27,716 --> 00:48:30,656 where you can actually swap two values A and B 1120 00:48:30,916 --> 00:48:33,266 without using a temporary variable. 1121 00:48:33,526 --> 00:48:35,896 So alright, we've used this exercise multiple times 1122 00:48:35,896 --> 00:48:39,396 to demonstrate how not to swap values, how to swap values, 1123 00:48:39,396 --> 00:48:42,786 how to swap student's houses, so we kind of belabored this notion 1124 00:48:42,786 --> 00:48:46,426 of swapping, but in every case, quiz 0 and before 1125 00:48:46,426 --> 00:48:48,596 in lecture did we actually use some temporary storage 1126 00:48:48,796 --> 00:48:50,286 because just conceptually it's kind of hard 1127 00:48:50,286 --> 00:48:53,996 to imagine putting this variable here and this variable here 1128 00:48:54,216 --> 00:48:57,946 without actually clobbering this one in between those two steps. 1129 00:48:58,206 --> 00:49:00,366 But it turns out using XOR you can do this. 1130 00:49:00,746 --> 00:49:02,296 I don't think it's all that enlightening 1131 00:49:02,296 --> 00:49:05,696 to belabor the details, but this is a very quick 1132 00:49:05,696 --> 00:49:08,706 and dirt main routine that notice declares two variables, 1133 00:49:08,706 --> 00:49:11,856 X and Y. And this is called Swap2.c 1134 00:49:11,856 --> 00:49:13,446 for those following along on paper. 1135 00:49:13,696 --> 00:49:15,716 X is 1, Y is 2. 1136 00:49:15,916 --> 00:49:18,656 I print out the values of those variables then I call this 1137 00:49:18,656 --> 00:49:21,686 function swap, passing by reference or pointer, 1138 00:49:21,686 --> 00:49:24,126 not passing by value because value is very bad. 1139 00:49:24,296 --> 00:49:25,206 It's not going to work here. 1140 00:49:25,636 --> 00:49:28,956 But then this program does actually work correctly. 1141 00:49:28,956 --> 00:49:32,656 Let me go ahead and make swap2 and let me go ahead 1142 00:49:32,656 --> 00:49:35,096 and run swap2 a the command line. 1143 00:49:35,096 --> 00:49:36,186 And indeed it works. 1144 00:49:36,376 --> 00:49:38,496 The interesting question is how does it work? 1145 00:49:38,776 --> 00:49:41,056 Well notice we have these three lines of code, 1146 00:49:41,056 --> 00:49:43,896 none of which actually employs a local variable. 1147 00:49:44,356 --> 00:49:48,166 So I'll leave this I think more as an at home thought exercise, 1148 00:49:48,266 --> 00:49:50,086 but don't be distracted by the use of pointers. 1149 00:49:50,086 --> 00:49:52,146 The fact that we're using pointers is irrelevant 1150 00:49:52,196 --> 00:49:53,216 to the trick itself. 1151 00:49:53,496 --> 00:49:56,796 The pointers just allow us to actually do swaps at all. 1152 00:49:57,056 --> 00:49:58,056 So we're using pointers, 1153 00:49:58,056 --> 00:50:00,736 allows us to actually exchange the original values. 1154 00:50:01,026 --> 00:50:03,856 So don't think that the magic lies in the stars, 1155 00:50:03,986 --> 00:50:05,266 but the magic really lies 1156 00:50:05,266 --> 00:50:07,636 in this carrot, in this XOR operator. 1157 00:50:07,636 --> 00:50:11,726 So simply by XORing two values together once, and then again 1158 00:50:11,726 --> 00:50:14,006 and then again while updating the values of A, 1159 00:50:14,006 --> 00:50:17,216 B and A accordingly in each of those steps, you can literally 1160 00:50:17,216 --> 00:50:19,226 like take 2 values and I'll do it dramatically 1161 00:50:19,226 --> 00:50:22,356 with fingers this time, move them from one to the other 1162 00:50:22,356 --> 00:50:23,946 without losing any data. 1163 00:50:23,946 --> 00:50:27,126 And it's a neat trick and for those curious you can follow 1164 00:50:27,126 --> 00:50:29,426 along at home with the slides for today 1165 00:50:29,426 --> 00:50:32,266 where I belabor the point, line by line by line showing you 1166 00:50:32,266 --> 00:50:34,876 in comment exactly what's going on underneath the hood, 1167 00:50:35,086 --> 00:50:37,166 using very simple values of 1 and 4. 1168 00:50:37,166 --> 00:50:38,186 But it's really neat. 1169 00:50:38,326 --> 00:50:41,186 And if you're ever asked in some you know silly interview 1170 00:50:41,186 --> 00:50:43,966 question years from now how can you swap two values 1171 00:50:43,966 --> 00:50:45,326 without using a temporary variable? 1172 00:50:45,606 --> 00:50:48,196 This is the answer Microsoft and the like are looking for, 1173 00:50:48,276 --> 00:50:49,556 using bit wise operators. 1174 00:50:49,556 --> 00:50:50,846 So it's neat if nothing else. 1175 00:50:51,206 --> 00:50:52,366 But let's take a look at something 1176 00:50:52,366 --> 00:50:55,456 that is more enlightening and perhaps more useful longer term. 1177 00:50:55,736 --> 00:50:59,106 And that is to actually use a notion of a mask. 1178 00:50:59,636 --> 00:51:00,836 So this is a common idea. 1179 00:51:01,026 --> 00:51:03,756 And actually let's use these bit wise operations 1180 00:51:03,916 --> 00:51:06,786 at a lower level, but let's see what we can do. 1181 00:51:06,786 --> 00:51:07,846 So I'm going to do this 1182 00:51:07,846 --> 00:51:10,256 from scratch even though you have a copy in front of you 1183 00:51:10,256 --> 00:51:12,176 for reference, least I screw up on the fly. 1184 00:51:12,476 --> 00:51:15,056 But I'm just going to go ahead and start, the goal here is 1185 00:51:15,326 --> 00:51:18,386 to ask the user for an integer and then I want to convert 1186 00:51:18,386 --> 00:51:20,616 that integer which the user is going to type in decimal. 1187 00:51:20,856 --> 00:51:23,056 I want it to display as binary. 1188 00:51:23,176 --> 00:51:26,096 Right I'm really curious to see what binary numbers look like. 1189 00:51:26,386 --> 00:51:28,116 I want to see what the mapping is like. 1190 00:51:28,116 --> 00:51:32,476 So if I run this program and I type in the number 1 in decimal, 1191 00:51:32,656 --> 00:51:37,126 I want to see 31 zeros get printed and then 11 or if I type 1192 00:51:37,126 --> 00:51:39,736 in 4 billion something or other, I want to see a whole lot 1193 00:51:39,736 --> 00:51:41,456 of one's be printed across the screen. 1194 00:51:41,706 --> 00:51:43,066 Alright, so how can we do this? 1195 00:51:43,066 --> 00:51:45,866 And we'll see what the take away is in a moment. 1196 00:51:45,906 --> 00:51:47,806 So let me go ahead and include my own library. 1197 00:51:48,046 --> 00:51:54,036 Let me go ahead and also include let's say standardIO.h. Here's 1198 00:51:54,036 --> 00:51:59,846 my main into main into arc.c star arg v bracket. 1199 00:52:00,166 --> 00:52:01,356 Ok, coming along. 1200 00:52:01,756 --> 00:52:03,156 So now what do I want to do? 1201 00:52:03,156 --> 00:52:07,516 Well let's go ahead and ask the user, ask user for nt. 1202 00:52:07,516 --> 00:52:10,996 Alright, so let's see, I'm going to need to store this somewhere 1203 00:52:10,996 --> 00:52:13,466 and I'm going to need to do the following while the user doesn't 1204 00:52:13,466 --> 00:52:16,376 cooperate, so I know this construct is generally useful, 1205 00:52:16,376 --> 00:52:17,566 I'll fill in the blank in a moment. 1206 00:52:17,866 --> 00:52:18,486 What do I want to do? 1207 00:52:18,486 --> 00:52:20,636 Well let's do n get nt. 1208 00:52:21,426 --> 00:52:24,696 Alright and now how do I check that this is actually, 1209 00:52:24,696 --> 00:52:26,176 actually let's tell the user what to do. 1210 00:52:26,556 --> 00:52:31,776 So print def give me a non negative nt. 1211 00:52:31,956 --> 00:52:35,256 Alright, so now while and what do I want to put here to ensure 1212 00:52:35,256 --> 00:52:38,716 that they're giving me a value I want? 1213 00:52:38,716 --> 00:52:38,896 [ Period of silence ] 1214 00:52:38,896 --> 00:52:39,706 While n is? 1215 00:52:40,916 --> 00:52:41,456 Less than 0. 1216 00:52:41,456 --> 00:52:43,496 Alright so this will just pester the user again and again, 1217 00:52:43,496 --> 00:52:45,786 give me a non negative nt, non negative nt, non negative nt 1218 00:52:45,926 --> 00:52:48,016 until finally they give me 0 or higher, 1219 00:52:48,266 --> 00:52:49,356 then this loop will break. 1220 00:52:49,356 --> 00:52:51,376 And so now I can do something interesting with it. 1221 00:52:51,596 --> 00:52:54,126 So now I want to print the number in binary. 1222 00:52:54,536 --> 00:52:55,446 Alright, well this is 1223 00:52:55,446 --> 00:52:57,796 where things get a little more interesting. 1224 00:52:58,036 --> 00:52:59,676 Let's at least do a sanity check here. 1225 00:52:59,676 --> 00:53:03,696 Let's print out the number in integer format, 1226 00:53:03,696 --> 00:53:06,516 n. And then we'll just return. 1227 00:53:07,156 --> 00:53:09,106 So let me do, make sure I didn't screw up anywhere else. 1228 00:53:09,106 --> 00:53:11,436 Ok, so far so good, binary, 1229 00:53:11,706 --> 00:53:14,636 give me a non negative nt, 1, ok seems to work. 1230 00:53:14,636 --> 00:53:16,606 I'll give it 15. 1231 00:53:16,806 --> 00:53:18,686 Ok, seems to work, alright. 1232 00:53:18,686 --> 00:53:20,206 So let's move on now. 1233 00:53:20,236 --> 00:53:21,556 Alright print number in binary. 1234 00:53:21,876 --> 00:53:24,616 So how do I print a number in binary? 1235 00:53:24,736 --> 00:53:28,216 So there's no, unlike JAVA there's not just a function 1236 00:53:28,216 --> 00:53:29,266 that says print this in binary. 1237 00:53:29,696 --> 00:53:31,456 There's most anything in languages like that. 1238 00:53:31,456 --> 00:53:34,796 So C we have to kind of get our hands dirty or go lower level, 1239 00:53:34,796 --> 00:53:38,696 but let's see I know how a number is represented underneath 1240 00:53:38,806 --> 00:53:39,576 the hood. 1241 00:53:39,856 --> 00:53:42,436 It's just an nt, let's see it's 32 bits. 1242 00:53:42,466 --> 00:53:44,736 So I essentially just need a loop that's going 1243 00:53:44,736 --> 00:53:49,286 to iterate 32 times conceptually from the left end 1244 00:53:49,286 --> 00:53:51,036 of my number all the way to my right end, 1245 00:53:51,036 --> 00:53:53,006 because print def is only going to print from left to right. 1246 00:53:53,006 --> 00:53:55,596 I can't go backwards as we usually do on the screen. 1247 00:53:55,846 --> 00:53:59,056 So I need a loop that's going to execute 32 times. 1248 00:53:59,116 --> 00:54:01,586 So let me start off instinctively like this. 1249 00:54:01,586 --> 00:54:04,956 Nt is 0, I is less than 32, I plus plus. 1250 00:54:05,156 --> 00:54:06,366 So that's the right idea. 1251 00:54:06,366 --> 00:54:08,716 I can maybe clean this up a little bit. 1252 00:54:08,716 --> 00:54:12,066 In fact, I kind of want to do it from the left hand side, 1253 00:54:12,066 --> 00:54:12,876 but we'll come back to that. 1254 00:54:12,876 --> 00:54:13,726 I'll clean up that loop, 1255 00:54:13,766 --> 00:54:15,966 but I at least have the basic building block in place. 1256 00:54:16,476 --> 00:54:18,656 Now what do I want to do? 1257 00:54:18,656 --> 00:54:19,786 Well I essentially want 1258 00:54:19,786 --> 00:54:22,056 to ask this question on every iteration. 1259 00:54:22,056 --> 00:54:23,526 Is the current bit a 1? 1260 00:54:23,796 --> 00:54:24,876 If so print a 1. 1261 00:54:25,156 --> 00:54:26,476 Is the current bit a 0? 1262 00:54:26,756 --> 00:54:27,706 Print a 0. 1263 00:54:27,986 --> 00:54:30,416 Because we're now talking about individual bits. 1264 00:54:30,416 --> 00:54:31,426 But what I'm going to print 1265 00:54:31,476 --> 00:54:35,336 to the screen is actually quote unquote 0 or quote unquote 1. 1266 00:54:35,386 --> 00:54:37,666 So I need to essentially convert these tiny little bits 1267 00:54:37,946 --> 00:54:40,806 to an actually ASCII character so the user can understand. 1268 00:54:41,236 --> 00:54:42,356 So you know what I'm going to do? 1269 00:54:42,356 --> 00:54:44,636 I'm actually going to scroll back because I want to do this 1270 00:54:44,636 --> 00:54:47,736 from the left hand side first of the number, 1271 00:54:47,736 --> 00:54:48,966 not the right hand side. 1272 00:54:49,326 --> 00:54:50,576 So really I'm going to do this. 1273 00:54:50,576 --> 00:54:52,256 And I'm going to be a little more dynamic. 1274 00:54:52,256 --> 00:54:57,286 So nt I gets size of nt, oops, size of nt. 1275 00:54:58,006 --> 00:55:02,676 Alright, so that's going to give me a 4 bytes but I want 8 bytes. 1276 00:55:02,676 --> 00:55:05,146 Sorry, 8 bits per byte so size 1277 00:55:05,146 --> 00:55:06,986 of nt times 8 gives me what number? 1278 00:55:07,116 --> 00:55:07,776 Quick sanity check. 1279 00:55:08,826 --> 00:55:09,636 32 hopefully. 1280 00:55:09,636 --> 00:55:12,886 On the 32 bit machine, this is going to give me the value of 32 1281 00:55:12,886 --> 00:55:14,096 because it's 4 times 8. 1282 00:55:14,326 --> 00:55:17,706 But I don't want to go to the 32 bit because really 1283 00:55:17,706 --> 00:55:19,966 if the bits are 0 indexed it's 31. 1284 00:55:20,106 --> 00:55:21,526 So I'm going to be a little defensive 1285 00:55:21,526 --> 00:55:23,366 and I'm going to start at 31. 1286 00:55:23,436 --> 00:55:25,806 So the left most bit is going to be bit 31 1287 00:55:25,806 --> 00:55:28,776 because the right most bit is going to be bit number 0. 1288 00:55:29,096 --> 00:55:33,146 So I want to do this so long as I is greater than or equal to 0. 1289 00:55:33,146 --> 00:55:35,556 And then I want to decrement I on each iteration. 1290 00:55:35,556 --> 00:55:38,786 So again the goal is if I have a really long 32 bit value, 1291 00:55:39,066 --> 00:55:41,186 I conceptually just need I to point 1292 00:55:41,186 --> 00:55:45,536 at the bit 31 then decrement to 30, 29, 28, dot, dot, dot. 1293 00:55:45,746 --> 00:55:48,376 To bit 0 because what I want to do on each iteration 1294 00:55:48,376 --> 00:55:50,886 to be clear is print out in ASCII form 1295 00:55:51,136 --> 00:55:53,076 that particular bit, 0 or 1. 1296 00:55:53,376 --> 00:55:56,656 So the question then becomes if I just have a chunk 1297 00:55:56,656 --> 00:56:00,016 of 4 bits called an nt, how do I actually get 1298 00:56:00,016 --> 00:56:01,556 at individual integers? 1299 00:56:01,556 --> 00:56:04,766 Thus far we've not actually given you the ability. 1300 00:56:04,986 --> 00:56:06,636 And you've probably not discovered the ability 1301 00:56:06,636 --> 00:56:08,446 to actually manipulate individual bits. 1302 00:56:08,446 --> 00:56:10,866 The smallest unit of data that you've probably been able 1303 00:56:10,866 --> 00:56:12,606 to manipulate thus far is what? 1304 00:56:12,606 --> 00:56:16,806 How many bytes or how many bits? 1305 00:56:16,806 --> 00:56:17,006 [ Period of silence ] 1306 00:56:17,006 --> 00:56:19,686 Like what's the smallest piece of data you've actually changed 1307 00:56:19,686 --> 00:56:21,866 or read in any program you've written thus far? 1308 00:56:21,866 --> 00:56:23,046 [ Background noise ] 1309 00:56:23,046 --> 00:56:23,926 A char right? 1310 00:56:23,926 --> 00:56:27,566 8 bits or 1 byte is a char, but we want to go deeper than that. 1311 00:56:27,566 --> 00:56:30,476 And there's no data type called bit, so we actually have 1312 00:56:30,576 --> 00:56:33,746 to accept the fact that we have some limitations and figure 1313 00:56:33,746 --> 00:56:35,766 out how inside of a char or inside 1314 00:56:35,766 --> 00:56:38,566 of an nt we can nonetheless get at something we care about. 1315 00:56:38,816 --> 00:56:41,676 And this common trick, this is just a piece of jargon, 1316 00:56:41,676 --> 00:56:43,536 but it's an idea that will recur 1317 00:56:43,736 --> 00:56:45,766 in programs perhaps throughout your life 1318 00:56:45,766 --> 00:56:47,446 if you continue playing with stuff like this. 1319 00:56:47,736 --> 00:56:49,406 We can define what's called a bit mask. 1320 00:56:50,126 --> 00:56:53,846 So a mask is just a value that usually mostly zeros, 1321 00:56:53,846 --> 00:56:57,286 but it has a 1 any location that you happen to care 1322 00:56:57,286 --> 00:56:58,986 about at the current moment in time. 1323 00:56:58,986 --> 00:57:00,336 And you can actually flip the definition 1324 00:57:00,336 --> 00:57:02,766 and say it's all zeros, it's all ones except 1325 00:57:02,766 --> 00:57:03,736 for occasional zeros. 1326 00:57:04,016 --> 00:57:06,816 But I'm going to think of this mask, kind of in Photoshop form, 1327 00:57:06,816 --> 00:57:10,886 in overlay where I want a whole bunch of zeros but I want a 1 1328 00:57:11,166 --> 00:57:13,096 at the location I currently care about. 1329 00:57:13,366 --> 00:57:16,256 So in short, on every iteration now my goal I've decided is 1330 00:57:16,336 --> 00:57:20,476 to give, is to create for myself a 32 bit value, all zeros 1331 00:57:20,476 --> 00:57:23,226 but I want to turn on the bit I care about. 1332 00:57:23,426 --> 00:57:25,646 Bit 31 followed by bit 30, 1333 00:57:25,646 --> 00:57:28,486 followed by bit 30, 29, all the way down. 1334 00:57:28,486 --> 00:57:30,146 And then on each iteration do I want 1335 00:57:30,146 --> 00:57:33,126 to change all the 1s I've created back to 0s. 1336 00:57:33,176 --> 00:57:36,886 So I want a big sequence of 0s and then just a 1 that moves 1337 00:57:36,886 --> 00:57:38,676 from left to right on each iteration. 1338 00:57:38,676 --> 00:57:39,806 So how do I achieve this? 1339 00:57:40,056 --> 00:57:41,696 Well using these primitives from today. 1340 00:57:42,186 --> 00:57:46,556 If I want a mask, if I want a 1 at the ith location, 1341 00:57:46,556 --> 00:57:49,526 it turns out I can just do this. 1342 00:57:49,526 --> 00:57:52,866 So I'm going to declare a mask to be 32 bits. 1343 00:57:52,866 --> 00:57:54,286 That's why I'm using an nt. 1344 00:57:54,286 --> 00:57:56,386 I need a 1 at a certain location. 1345 00:57:56,556 --> 00:57:58,496 Well what location do I want it at at first? 1346 00:57:58,756 --> 00:58:00,436 I want it at location 31. 1347 00:58:00,846 --> 00:58:03,246 So using left shift, what this means is 1348 00:58:03,796 --> 00:58:08,146 when I simply do 1 left shift I that's really like saying 1349 00:58:08,146 --> 00:58:12,116 on the first iteration 1 left shift 31, which means ok, 1350 00:58:12,116 --> 00:58:16,356 take a 1 and then shift it to the left 31 places, 1351 00:58:16,386 --> 00:58:19,916 which essentially means, 1, 2, 3, 4, 5, 6, 7, 8, 1352 00:58:20,566 --> 00:58:23,546 let's just make this look like an actual nt. 1353 00:58:23,616 --> 00:58:26,646 And I'll shrink it so it actually fits on the screen. 1354 00:58:27,206 --> 00:58:30,366 And that gives us this, get rid of our stupid color wheel. 1355 00:58:30,836 --> 00:58:34,636 So if I have a 1 as I do at this point in time 1356 00:58:34,866 --> 00:58:39,356 and then I shift it 31 places, that means my 1 goes 1, 1357 00:58:39,356 --> 00:58:41,326 2, 3, 4, dot, dot, dot. 1358 00:58:41,326 --> 00:58:43,326 It eventually ends up over here. 1359 00:58:43,636 --> 00:58:48,176 So my mask is now this integer of all zeros but a 1 bit 1360 00:58:48,176 --> 00:58:50,006 at the location I care about. 1361 00:58:50,386 --> 00:58:54,186 And now using another bit wise operator can I actually check a 1362 00:58:54,186 --> 00:58:57,516 specific bit in the number that I got from the user. 1363 00:58:57,516 --> 00:58:58,366 So let's take a look. 1364 00:58:58,366 --> 00:59:03,806 Let me go ahead now and do if n, the number the user gave me, 1365 00:59:04,136 --> 00:59:09,856 ended with bitwise ended with my mask is true, 1366 00:59:09,856 --> 00:59:14,816 if that gives me back a positive value and rather a non 0 value, 1367 00:59:15,176 --> 00:59:16,596 what do I want to print? 1368 00:59:17,656 --> 00:59:22,466 In other words if the user's number ended 1369 00:59:22,466 --> 00:59:25,756 with my current mask gives me a non 0 value, 1370 00:59:25,756 --> 00:59:26,536 what does that mean? 1371 00:59:27,956 --> 00:59:30,786 So I want to print a 1 because it means a 1 is there, 1372 00:59:30,966 --> 00:59:34,646 else and I'll rewind in a second, I want to print a 0. 1373 00:59:35,056 --> 00:59:36,086 Now why does this work? 1374 00:59:36,086 --> 00:59:37,276 Well let's take a look at this. 1375 00:59:37,276 --> 00:59:39,206 The mask I've created is in fact this. 1376 00:59:39,736 --> 00:59:43,416 So now the user, the very, let's suppose the user types in 15. 1377 00:59:43,416 --> 00:59:45,876 Let me use the same number of bits. 1378 00:59:45,876 --> 00:59:48,536 So the user is going to type in 15 as I did 1379 00:59:48,536 --> 00:59:50,996 in my second demo there, 1, 2, 3, 4. 1380 00:59:51,166 --> 00:59:55,306 So this is the number 15 that the user has typed in. 1381 00:59:55,516 --> 00:59:57,596 Sure more sophisticated tools exist than text edit 1382 00:59:57,596 --> 00:59:59,016 for lectures, but that's ok. 1383 00:59:59,016 --> 01:00:00,286 And this is my mask. 1384 01:00:01,856 --> 01:00:04,076 And got to work on my font size. 1385 01:00:04,076 --> 01:00:07,016 Alright, so this is my mask and I'll space things accordingly. 1386 01:00:07,486 --> 01:00:09,776 Alright, this is a nice state of the world. 1387 01:00:09,776 --> 01:00:12,046 N is that value 15 that the user typed in. 1388 01:00:12,206 --> 01:00:15,396 My mask initially is a 1 all the way 1389 01:00:15,396 --> 01:00:16,766 over at the left hand location. 1390 01:00:16,946 --> 01:00:19,736 And then I do this and operator so what does 1391 01:00:19,736 --> 01:00:22,146 that actually give me at this location? 1392 01:00:22,146 --> 01:00:26,866 0 and 1 is 0 and then how about everything else? 1393 01:00:27,436 --> 01:00:29,506 If you just fast forward to the end. 1394 01:00:30,186 --> 01:00:31,176 It's all zeros. 1395 01:00:31,606 --> 01:00:34,486 So this is interesting because if I take the user's number N 1396 01:00:34,786 --> 01:00:36,486 and and it with my current mask 1397 01:00:36,826 --> 01:00:39,096 and my mask current has just a single bit set 1398 01:00:39,096 --> 01:00:40,626 at the 31 location. 1399 01:00:40,906 --> 01:00:42,606 And the answer comes back as zero. 1400 01:00:42,696 --> 01:00:43,936 Well what's the take away? 1401 01:00:43,936 --> 01:00:47,296 That that one location all the way on the left was a 0. 1402 01:00:48,006 --> 01:00:51,586 Because if it were a one, what number would I have gotten back? 1403 01:00:51,586 --> 01:00:53,986 If the user had typed in something like 4 billion, 1404 01:00:53,986 --> 01:00:57,116 something atrocious and then I had anded those two values 1405 01:00:57,186 --> 01:01:00,416 together, what then would I get as one and one? 1406 01:01:01,866 --> 01:01:04,816 So I'd actually get a value like god a 1 here 1407 01:01:04,816 --> 01:01:06,386 and then yes, a whole lot of zeros. 1408 01:01:06,386 --> 01:01:08,026 And I don't know what this number is off hand. 1409 01:01:08,026 --> 01:01:09,306 This is like 4 billion, 1410 01:01:09,356 --> 01:01:11,076 2 billion something or other right now. 1411 01:01:11,426 --> 01:01:12,386 But what's the take away? 1412 01:01:12,386 --> 01:01:17,686 2 billion is non 0 which is why my check here if N anded 1413 01:01:17,686 --> 01:01:20,516 with the mask gives me anything other than 0, 1414 01:01:20,566 --> 01:01:25,146 if it gives me any 1 bits anywhere, ie a non 0 number, 1415 01:01:25,386 --> 01:01:27,286 then I found a one at that location. 1416 01:01:27,366 --> 01:01:30,936 Otherwise it finds only a zero and so I'll print that. 1417 01:01:31,656 --> 01:01:33,536 So that's all it actually takes. 1418 01:01:33,536 --> 01:01:35,796 I now have a loop that we've already concluded is going 1419 01:01:35,796 --> 01:01:37,266 to iterate 32 times. 1420 01:01:37,746 --> 01:01:40,196 I'm creating on each iteration a temporary mask, 1421 01:01:40,626 --> 01:01:44,306 a temporary variable that just is all zeros except a 1 bit 1422 01:01:44,306 --> 01:01:47,846 at interesting locations from left to right, 1 bit at a time. 1423 01:01:48,106 --> 01:01:51,086 And the goal of that is to leverage the reality that if you 1424 01:01:51,186 --> 01:01:55,426 and a number with a mask, the only bits that are going 1425 01:01:55,426 --> 01:01:58,246 to be allowed to pass through this filter of sorts, 1426 01:01:58,346 --> 01:02:01,716 through this mask is going to be a one bit in the original value. 1427 01:02:01,716 --> 01:02:04,456 And that's kind of where the idea, the word gets its name. 1428 01:02:04,796 --> 01:02:08,436 So if I have a mask with a 1 here, the only bits 1429 01:02:08,436 --> 01:02:11,376 that will be allowed to pass through to be clear are 1430 01:02:11,376 --> 01:02:13,096 where there are ones in the mask. 1431 01:02:13,766 --> 01:02:15,506 So this number passes through. 1432 01:02:15,666 --> 01:02:18,916 These numbers all get blocked by the and operation 1433 01:02:19,146 --> 01:02:21,806 but that passing through of the left most one is enough o make 1434 01:02:21,806 --> 01:02:26,546 me realize, oh this expression N and mask is true 1435 01:02:26,846 --> 01:02:28,696 because remember true does not mean 1. 1436 01:02:29,006 --> 01:02:31,896 True means non 0, false means 0, 1437 01:02:32,196 --> 01:02:34,576 true means non 0, more generally. 1438 01:02:34,576 --> 01:02:36,666 So let's actually now compile this 1439 01:02:36,666 --> 01:02:40,446 and if I've made no mistakes, fingers crossed, ok mistakes, 1440 01:02:40,666 --> 01:02:42,966 oh I called it print instead of print def. 1441 01:02:42,966 --> 01:02:44,336 That's easy to fix. 1442 01:02:45,516 --> 01:02:48,696 Alright. Let's go ahead and recompile. 1443 01:02:48,816 --> 01:02:51,976 Better, so let me run binary in my current working directory, 1444 01:02:51,976 --> 01:02:54,356 give me a non negative nt, let's give it 1. 1445 01:02:55,126 --> 01:02:58,086 Ok, I need kind of a print def there, so let me go back in here 1446 01:02:58,086 --> 01:03:02,716 and do a little print def of backslash N. 1447 01:03:02,716 --> 01:03:05,196 And now let's recompile this. 1448 01:03:05,626 --> 01:03:06,696 Just for aesthetic reasons. 1449 01:03:06,696 --> 01:03:07,826 Ok. Now I'll give it 1. 1450 01:03:08,186 --> 01:03:09,276 Ok so that's kind of cool. 1451 01:03:09,276 --> 01:03:12,236 Underwhelming, let's give it something more interesting, 15. 1452 01:03:12,506 --> 01:03:16,596 Ok, so now again all I've done is iterate 32 times left 1453 01:03:16,596 --> 01:03:19,236 to right over that underlying 15 to get this result 1454 01:03:19,236 --> 01:03:20,626 and let's do something like 2 billion. 1455 01:03:20,626 --> 01:03:25,826 Enter, that's apparently what 2 billion 1456 01:03:25,826 --> 01:03:28,266 in decimal terms actually equals in binary. 1457 01:03:28,566 --> 01:03:29,726 So it's kind of neat. 1458 01:03:29,726 --> 01:03:32,936 And this idea, this is a problem you will not often solve simply 1459 01:03:32,936 --> 01:03:34,206 printing out numbers in binary. 1460 01:03:34,456 --> 01:03:37,846 But this idea of selecting individual bits is actually a 1461 01:03:37,846 --> 01:03:38,746 really powerful one. 1462 01:03:38,746 --> 01:03:40,846 In some of your problems, that's like problem set 4, 1463 01:03:41,116 --> 01:03:43,566 how many of you guys used an array, maybe a 1D 1464 01:03:43,566 --> 01:03:47,036 or a 2 dimensional array of boles simply to remember 1465 01:03:47,296 --> 01:03:50,076 which of the numbers had come with the bore. 1466 01:03:50,566 --> 01:03:51,636 Anyone take this approach? 1467 01:03:51,636 --> 01:03:53,396 Yeah, so from just chatter on the bulletin board, 1468 01:03:53,396 --> 01:03:55,356 this seemed to be a not uncommon approach. 1469 01:03:55,616 --> 01:03:57,846 Recall that p set 4 required that you remembered 1470 01:03:57,846 --> 01:03:59,996 which numbers came with the board and which ones did not. 1471 01:04:00,236 --> 01:04:01,626 So that you could actually prevent the user 1472 01:04:01,626 --> 01:04:02,726 from changing some of them. 1473 01:04:02,986 --> 01:04:05,226 Well, you guys, it sounds used a 2, 1474 01:04:05,226 --> 01:04:08,176 some of you used a 2 dimensional array storing a whole bunch 1475 01:04:08,176 --> 01:04:09,876 of Booleans so what is that? 1476 01:04:09,876 --> 01:04:13,486 81 Booleans to keep track of true or false. 1477 01:04:13,486 --> 01:04:15,576 That was really a waste of space, right? 1478 01:04:15,576 --> 01:04:19,266 You used 8 bits probably or you used 8 bits because a bole, 1479 01:04:19,266 --> 01:04:21,656 if you ever really looked closely is represented 1480 01:04:21,656 --> 01:04:23,006 with what data type? 1481 01:04:23,126 --> 01:04:26,236 I think it's actually represented 1482 01:04:26,236 --> 01:04:28,466 with what's called an unsigned char, but either way. 1483 01:04:28,466 --> 01:04:31,056 The smallest data type you said we have in C is a char. 1484 01:04:31,266 --> 01:04:33,846 Which means a bole underneath the hood is actually 8 bits. 1485 01:04:33,846 --> 01:04:37,576 You were using 7 bits more than you needed to for every one 1486 01:04:37,576 --> 01:04:38,826 of those true false values. 1487 01:04:38,826 --> 01:04:40,116 So a very common approach 1488 01:04:40,386 --> 01:04:43,176 to these low level bitwise operations is actually 1489 01:04:43,176 --> 01:04:45,496 to compress your program into using only 1490 01:04:45,496 --> 01:04:46,956 as much memory as it needs to. 1491 01:04:47,186 --> 01:04:48,896 Because right intuitively a much clever, 1492 01:04:48,896 --> 01:04:51,646 more clever approach would have been yes to have the notion 1493 01:04:51,646 --> 01:04:54,366 of a 2D array, but just use a single bit 1494 01:04:54,366 --> 01:04:55,716 for each of those locations. 1495 01:04:55,716 --> 01:04:57,816 And you could have used one eighth the memory 1496 01:04:58,146 --> 01:05:00,686 to remember something like which numbers came 1497 01:05:00,686 --> 01:05:01,916 and did not come with the board. 1498 01:05:02,106 --> 01:05:03,786 How do you actually implement this? 1499 01:05:03,786 --> 01:05:06,666 Well you could actually now implement just a chunk 1500 01:05:06,666 --> 01:05:09,516 of memory, if you think back to that problem set. 1501 01:05:09,856 --> 01:05:11,836 You could actually implement yourself 1502 01:05:11,916 --> 01:05:13,666 by manipulating individual bits. 1503 01:05:13,666 --> 01:05:17,236 And so what you'll find in this world of bitwise operations, 1504 01:05:17,236 --> 01:05:19,016 you can implement a function called set. 1505 01:05:19,106 --> 01:05:21,386 And you can implement a function called get 1506 01:05:21,386 --> 01:05:24,616 that simply takes a chunk of memory and flips individual bits 1507 01:05:24,616 --> 01:05:27,026 on and off and get simply checks them. 1508 01:05:27,576 --> 01:05:30,496 So we've looked at setting bits, sorry. 1509 01:05:30,776 --> 01:05:33,706 We've used getting bits using the and operation, 1510 01:05:33,986 --> 01:05:37,266 which of our several bitwise operations might be appropriate 1511 01:05:37,266 --> 01:05:38,916 for actually setting a value? 1512 01:05:39,836 --> 01:05:41,816 ANDOR, XOR, or compliment? 1513 01:05:41,816 --> 01:05:42,446 [ Background noise ] 1514 01:05:42,446 --> 01:05:48,916 How might you turn on a bit? 1515 01:05:48,916 --> 01:05:49,146 [ Background noise ] 1516 01:05:49,146 --> 01:05:51,186 Yeah, so or is actually the case. 1517 01:05:51,186 --> 01:05:52,456 And how do you do that? 1518 01:05:52,606 --> 01:05:55,736 Well if you wanted to OR two numbers together, 1519 01:05:56,066 --> 01:05:57,756 what you could do is something like this. 1520 01:05:57,756 --> 01:05:59,356 Let me get rid of my little scratch pad. 1521 01:05:59,356 --> 01:06:03,766 If I had nt X gets let's say something like 0, 1522 01:06:03,766 --> 01:06:06,496 and this actually equals 00000. 1523 01:06:06,496 --> 01:06:11,106 And then I have, suppose I want to set the one bit, 1524 01:06:11,106 --> 01:06:12,876 suppose I want to set this bit. 1525 01:06:12,876 --> 01:06:14,416 What operation can I perform? 1526 01:06:14,676 --> 01:06:19,356 Well I can actually do X GETS X OR 1 1527 01:06:20,086 --> 01:06:22,576 and that would actually set the left most bit. 1528 01:06:22,576 --> 01:06:26,936 Why? Because if you line these two numbers up, 00000 and 00001, 1529 01:06:27,006 --> 01:06:30,986 and you OR them together, well this guy here is obviously 0, 1530 01:06:31,216 --> 01:06:34,886 0 or 0, 0 or 0, 0 or 0, 0 or 1. 1531 01:06:35,186 --> 01:06:38,316 So in fact the or operator has the effect of turning bits on, 1532 01:06:38,586 --> 01:06:40,536 so long as you use something like assignment 1533 01:06:40,896 --> 01:06:42,116 to keep that value around. 1534 01:06:42,806 --> 01:06:45,296 So any questions on bitwise manipulation? 1535 01:06:45,906 --> 01:06:49,996 Alright, so let's lay the foundation here 1536 01:06:49,996 --> 01:06:52,716 for these higher level data structures that I promised. 1537 01:06:52,716 --> 01:06:55,126 So we've used arrays, we talked about link lists, 1538 01:06:55,126 --> 01:06:57,856 you'll use link lists, many of you for problem set 6. 1539 01:06:57,906 --> 01:06:59,706 You'll find that you have some design discretion 1540 01:06:59,706 --> 01:07:00,526 for that problem set. 1541 01:07:00,846 --> 01:07:05,136 But suppose the problem at hand is to actually store data 1542 01:07:05,216 --> 01:07:08,366 like words in a dictionary in some structure 1543 01:07:08,366 --> 01:07:11,166 that facilitates answering questions 1544 01:07:11,166 --> 01:07:12,776 like is this word here. 1545 01:07:12,996 --> 01:07:13,706 A data structure 1546 01:07:13,706 --> 01:07:16,796 that facilitates maybe even sorting this information faster 1547 01:07:16,796 --> 01:07:18,606 than something like a link list could. 1548 01:07:18,836 --> 01:07:20,426 So there's a problem with a linked list. 1549 01:07:20,496 --> 01:07:22,196 If we implement, we're going to have you 1550 01:07:22,246 --> 01:07:25,556 like for problem set 6, 140000 English words. 1551 01:07:25,556 --> 01:07:27,146 And we're going to hand you a really big text file 1552 01:07:27,396 --> 01:07:29,126 and you're going to be challenged with reading 1553 01:07:29,126 --> 01:07:31,706 that file into memory using things like F read 1554 01:07:31,706 --> 01:07:34,046 and other little functions we've seen or will see. 1555 01:07:34,336 --> 01:07:36,616 And you've got to somehow represent all of those strings 1556 01:07:36,616 --> 01:07:37,836 in memory because then we're going 1557 01:07:37,836 --> 01:07:41,706 to make your code spell check actual files, really big files 1558 01:07:41,766 --> 01:07:43,826 like the screenplay from Austin Powers 1559 01:07:43,826 --> 01:07:46,676 and the King James the fifth bible and really large texts 1560 01:07:46,936 --> 01:07:48,706 that will have some spelling mistakes 1561 01:07:48,706 --> 01:07:50,216 or at least unrecognized words. 1562 01:07:50,446 --> 01:07:51,806 And you're going to have to do this quickly. 1563 01:07:51,806 --> 01:07:56,256 Unfortunately if you just load all 140000 words into an array 1564 01:07:56,256 --> 01:07:58,436 or maybe being a little fancier a linked list, 1565 01:07:58,686 --> 01:08:01,936 what's the running time of your lookup function going to be? 1566 01:08:01,936 --> 01:08:03,986 What's the running time of your spell checking function going 1567 01:08:03,986 --> 01:08:04,276 to be? 1568 01:08:04,276 --> 01:08:06,186 I mean it's linear. 1569 01:08:06,286 --> 01:08:09,386 Every time you might have to on average you're going to look 1570 01:08:09,386 --> 01:08:13,436 at half of the word, so 70000 words every damn time we ask a 1571 01:08:13,436 --> 01:08:15,336 question is this word spelled correctly. 1572 01:08:15,616 --> 01:08:16,716 So can we do better? 1573 01:08:16,716 --> 01:08:19,636 Well it turned out we can with any number of data structures, 1574 01:08:19,636 --> 01:08:21,546 one of which is this thing called the hash table. 1575 01:08:21,936 --> 01:08:25,596 And it turns out you'll see this for real in PHP and JAVA script, 1576 01:08:25,906 --> 01:08:27,706 sort of the data structure that's known 1577 01:08:27,706 --> 01:08:29,366 as the universal data structure, 1578 01:08:29,366 --> 01:08:32,706 the one that some geeky computer scientist once joked 1579 01:08:32,756 --> 01:08:35,366 that if you're on a desert island what data structure would 1580 01:08:35,366 --> 01:08:36,206 you want to bring with you? 1581 01:08:36,406 --> 01:08:38,016 Well you'd bring this thing called a hash table 1582 01:08:38,016 --> 01:08:39,706 because you can do most anything with it. 1583 01:08:39,976 --> 01:08:43,756 Whether sloppily or otherwise, but a hash table is simply this. 1584 01:08:44,256 --> 01:08:47,796 It's essentially a big array and before you put something 1585 01:08:47,796 --> 01:08:50,366 into this array you ask yourself the question 1586 01:08:50,366 --> 01:08:52,876 at what location does this new element belong. 1587 01:08:53,166 --> 01:08:55,096 In other words given a word from a dictionary, 1588 01:08:55,386 --> 01:08:57,456 you ask yourself here's a big array 1589 01:08:57,456 --> 01:08:58,666 or here's a bunch of buckets. 1590 01:08:58,886 --> 01:09:01,396 Which bucket should I put this word into? 1591 01:09:01,866 --> 01:09:03,256 Alright, so there's a problem though 1592 01:09:03,256 --> 01:09:04,646 with this approach in general. 1593 01:09:04,646 --> 01:09:07,176 If you're treating a hash table as just a big array 1594 01:09:07,176 --> 01:09:09,056 or a whole bunch of buckets 1595 01:09:09,376 --> 01:09:11,896 and suppose I take in a word like apple. 1596 01:09:11,896 --> 01:09:14,576 And you decide somewhat arbitrarily that all 1597 01:09:14,576 --> 01:09:16,356 of the apple, all the words starting 1598 01:09:16,356 --> 01:09:18,466 with A should probably go where intuitively? 1599 01:09:19,956 --> 01:09:22,316 At the 0 location right, just because right? 1600 01:09:22,316 --> 01:09:23,456 A words will start at the beginning. 1601 01:09:23,456 --> 01:09:24,986 And then I get banana, where should I put 1602 01:09:24,986 --> 01:09:26,446 that into these buckets? 1603 01:09:26,986 --> 01:09:30,596 Put it at location table bracket 1 and then zoo would go 1604 01:09:30,596 --> 01:09:31,976 at the very bottom of the table. 1605 01:09:32,136 --> 01:09:33,056 But there's a problem. 1606 01:09:33,056 --> 01:09:36,186 What if then another word from the dictionary is aardvark. 1607 01:09:36,186 --> 01:09:38,476 Well where does that go? 1608 01:09:38,476 --> 01:09:40,856 If you've already got apple inside this table? 1609 01:09:40,856 --> 01:09:40,923 [ Period of silence ] 1610 01:09:40,923 --> 01:09:45,346 Well those are the only three words so far. 1611 01:09:45,526 --> 01:09:47,036 Apple, banana and zoo. 1612 01:09:47,336 --> 01:09:48,776 And now I insert aardvark. 1613 01:09:49,056 --> 01:09:51,186 Unfortunately bracket 0, bracket 1 1614 01:09:51,186 --> 01:09:53,946 and bracket 25 are already taken using our very simple heuristic, 1615 01:09:53,946 --> 01:09:55,636 so what's, we need to get the word in there. 1616 01:09:55,636 --> 01:09:57,206 We need aardvark to be in the dictionary. 1617 01:09:57,546 --> 01:10:03,126 Where can we put it? 1618 01:10:03,126 --> 01:10:03,386 [ Period of silence ] 1619 01:10:03,386 --> 01:10:04,436 Now you only have, alright, 1620 01:10:04,436 --> 01:10:08,256 so there are what 23 remaining answers. 1621 01:10:08,256 --> 01:10:10,146 You have a 1 in 23 chance of getting this right. 1622 01:10:10,436 --> 01:10:11,486 Where do you put aardvark? 1623 01:10:11,486 --> 01:10:12,836 [ Background noise ] 1624 01:10:12,836 --> 01:10:14,496 So two location, sure. 1625 01:10:14,526 --> 01:10:16,286 Right? In short, who cares, right? 1626 01:10:16,346 --> 01:10:19,556 Put it somewhere and so there's this general approach called 1627 01:10:19,556 --> 01:10:22,436 linear probing which means try to put it where you want to. 1628 01:10:22,436 --> 01:10:25,326 So in this case I'm going to try to put it at the zero location 1629 01:10:25,326 --> 01:10:26,766 because that's where the A word should go. 1630 01:10:27,036 --> 01:10:29,306 Unfortunately I have what's called a collision 1631 01:10:29,596 --> 01:10:31,836 which means I already have something in that location. 1632 01:10:31,836 --> 01:10:35,316 So probe the buckets, probe the hash table, just step by step 1633 01:10:35,316 --> 01:10:36,856 by step until you find an empty slot, 1634 01:10:37,056 --> 01:10:38,516 then plot aardvark in there. 1635 01:10:38,516 --> 01:10:41,176 So we probe bracket 0 and we find apple there. 1636 01:10:41,206 --> 01:10:43,366 We probe the next location and we find banana, ah ha! 1637 01:10:43,726 --> 01:10:46,726 There's no C words yet, let's steal the C words place 1638 01:10:46,726 --> 01:10:48,996 and put aardvark at table bracket 2. 1639 01:10:49,126 --> 01:10:50,366 And you continue in this way. 1640 01:10:50,366 --> 01:10:52,556 So now you are going to bump up against a real world limit. 1641 01:10:52,556 --> 01:10:54,116 How big can this dictionary obviously be? 1642 01:10:55,266 --> 01:10:56,606 So 26 words and that's it. 1643 01:10:56,606 --> 01:10:57,826 So we're kind of screwed 1644 01:10:57,826 --> 01:11:00,096 with our 140000 word dictionary already. 1645 01:11:00,096 --> 01:11:02,366 So hopefully there's a fundamentally better approach 1646 01:11:02,366 --> 01:11:03,856 but there's a problem fundamentally 1647 01:11:03,856 --> 01:11:05,986 with linear probing itself which is 1648 01:11:05,986 --> 01:11:08,136 that you're really hurting your look up time. 1649 01:11:08,386 --> 01:11:10,436 The amazing thing about hash tables is 1650 01:11:10,436 --> 01:11:14,176 that in the best case lookup times for words, lookup times 1651 01:11:14,176 --> 01:11:15,636 for elements are how fast? 1652 01:11:15,966 --> 01:11:19,786 In other words if I ask you is apple a word, how many steps, 1653 01:11:19,786 --> 01:11:21,936 how many units of time do you need to answer that question 1654 01:11:21,936 --> 01:11:22,936 with this data structure? 1655 01:11:22,936 --> 01:11:23,446 [ Background noise ] 1656 01:11:23,446 --> 01:11:24,246 One, right? 1657 01:11:24,946 --> 01:11:28,596 Because I hand you apple, you check the first letter, oh A, 1658 01:11:28,596 --> 01:11:30,636 let me check my 0th location. 1659 01:11:30,876 --> 01:11:33,046 Oh, there's apple, constant time, right? 1660 01:11:33,206 --> 01:11:35,646 One operation of time is required. 1661 01:11:35,816 --> 01:11:37,386 What about banana, is banana a word? 1662 01:11:38,136 --> 01:11:41,576 Check the B, oh brackets 1, yes, banana is a word. 1663 01:11:41,576 --> 01:11:42,976 But now you've got this problem. 1664 01:11:42,976 --> 01:11:44,166 What if I hand you aardvark? 1665 01:11:44,166 --> 01:11:45,166 Is aardvark a word? 1666 01:11:45,486 --> 01:11:48,466 Well, you check its first letter and say damn, apple is there, 1667 01:11:48,766 --> 01:11:50,126 but maybe I've got it elsewhere. 1668 01:11:50,176 --> 01:11:50,996 Let me look. 1669 01:11:50,996 --> 01:11:53,866 And that process of let me look now devolves 1670 01:11:53,866 --> 01:11:55,456 into how many steps in the worse case? 1671 01:11:56,646 --> 01:11:59,126 N, right? So we have this really interesting trade off. 1672 01:11:59,126 --> 01:12:02,966 The motivation for hash tables if really cleverly designed is 1673 01:12:02,966 --> 01:12:07,126 to achieve constant time lookups for pieces of information. 1674 01:12:07,126 --> 01:12:09,396 And we have yet to discover this recently 1675 01:12:09,656 --> 01:12:11,276 because we did have this with arrays. 1676 01:12:11,426 --> 01:12:13,266 But even with arrays our search, 1677 01:12:13,266 --> 01:12:16,616 our search time eventually became log N when we couldn't, 1678 01:12:16,616 --> 01:12:19,096 when we had an arbitrary number of elements in here. 1679 01:12:19,406 --> 01:12:21,376 So this kind of begs the question how big, 1680 01:12:21,376 --> 01:12:23,086 how bad is this problem, right? 1681 01:12:23,086 --> 01:12:26,116 One approach to this problem of having collisions between apple 1682 01:12:26,116 --> 01:12:29,196 and aardvark is fine, yes this was just stupid using 1683 01:12:29,196 --> 01:12:30,646 26 buckets. 1684 01:12:30,906 --> 01:12:32,136 Let's use 1000 buckets. 1685 01:12:32,376 --> 01:12:36,846 And then with use a more sophisticated hash function then 1686 01:12:36,846 --> 01:12:38,286 just checking the first letter, right? 1687 01:12:38,286 --> 01:12:40,416 Slightly more interesting then checking the first letter might 1688 01:12:40,416 --> 01:12:42,426 be why don't we check the first two letters? 1689 01:12:42,426 --> 01:12:45,186 Well that helps us with aardvark and apple 1690 01:12:45,186 --> 01:12:47,326 because we have aa and ap. 1691 01:12:47,416 --> 01:12:49,046 So those would end up in different buckets. 1692 01:12:49,096 --> 01:12:52,036 But I'm sure we could find another word that starts with AA 1693 01:12:52,036 --> 01:12:54,206 or another word that starts with AP that's going 1694 01:12:54,206 --> 01:12:55,786 to create another collision. 1695 01:12:56,006 --> 01:12:58,266 So the problem is that even though this is sort 1696 01:12:58,266 --> 01:13:00,006 of the holy grail of data structures, 1697 01:13:00,036 --> 01:13:02,746 constant time operations because we can blow 1698 01:13:02,876 --> 01:13:05,426 out of the water every other algorithm we've implemented 1699 01:13:05,426 --> 01:13:06,016 thus far. 1700 01:13:06,406 --> 01:13:08,836 We have this problem of collisions and we'll see 1701 01:13:08,836 --> 01:13:12,906 that this problem will become in a room full of N CS50 students, 1702 01:13:12,906 --> 01:13:16,016 what's the probability that 2 of you have the same birthday. 1703 01:13:16,286 --> 01:13:19,386 Turns out that happens with strikingly high probability, 1704 01:13:19,606 --> 01:13:22,526 which means in the analog of the world of apples and aardvarks, 1705 01:13:22,526 --> 01:13:24,876 it's going to happen really frequently here too. 1706 01:13:24,876 --> 01:13:27,006 And so we'll need a much more sophisticated approach 1707 01:13:27,006 --> 01:13:29,376 than this, but for that we wait till Wednesday. 1708 01:13:29,456 --> 01:13:30,686 See you then. 1709 01:13:33,516 --> 01:13:38,920 [ Music ]