1 00:00:09,416 --> 00:00:11,746 >> Alright, welcome back, this is CS50. 2 00:00:11,956 --> 00:00:14,476 This is the start of Week 7. 3 00:00:14,476 --> 00:00:18,556 So, though, this week we'll continue using the language C 4 00:00:18,556 --> 00:00:19,676 for the problems we explore. 5 00:00:19,676 --> 00:00:21,726 We'll actually begin to dive in next week 6 00:00:22,016 --> 00:00:23,566 into a bit of web programming. 7 00:00:23,566 --> 00:00:27,216 And for that we'll introduce a program HTML or XHTML, 8 00:00:27,466 --> 00:00:29,796 a little bit of CSS, both of which are not 9 00:00:29,796 --> 00:00:31,866 so much programming languages as they are markup 10 00:00:31,866 --> 00:00:35,376 or design languages, with which you can construct a web page. 11 00:00:35,376 --> 00:00:38,526 We'll introduce a bit of PHP, a language called SQL, 12 00:00:38,526 --> 00:00:40,086 with which you can query databases, 13 00:00:40,086 --> 00:00:42,846 and then a week after, another language called JavaScript, 14 00:00:42,846 --> 00:00:45,866 and this is all toward an end of applying the same kind of ideas 15 00:00:45,866 --> 00:00:48,776 that we've been exploring thus far this semester, but no longer 16 00:00:48,776 --> 00:00:51,036 at a boring black and white blinking prompt, 17 00:00:51,036 --> 00:00:53,176 but rather in a user interface 18 00:00:53,176 --> 00:00:54,836 with which you're much more familiar these days, 19 00:00:55,116 --> 00:00:56,006 namely the web. 20 00:00:56,086 --> 00:00:57,516 So, that's what's on the horizon. 21 00:00:57,736 --> 00:01:01,956 Problem Set 5, though, involves a little something like this. 22 00:01:02,126 --> 00:01:04,506 So this is one of the images that you might have seen already 23 00:01:04,506 --> 00:01:06,396 in pset5, which went out on Friday. 24 00:01:06,696 --> 00:01:10,006 Hidden inside of this image is actually a secret message, 25 00:01:10,006 --> 00:01:11,726 the answer to a particular riddle, 26 00:01:12,006 --> 00:01:13,966 and this is an example of, what is generally known 27 00:01:13,966 --> 00:01:16,996 as steganography, taking an image, taking a picture 28 00:01:17,246 --> 00:01:19,506 that might just look like red noise, or might look 29 00:01:19,506 --> 00:01:22,116 like a photograph that you took, and using a computer 30 00:01:22,116 --> 00:01:25,416 to manipulate some of the underlying bits that represent 31 00:01:25,416 --> 00:01:28,116 that image to hide some message. 32 00:01:28,116 --> 00:01:31,006 Now, we took a fairly simplistic approach, simply hiding 33 00:01:31,006 --> 00:01:35,236 in this red noise a cyan or a bluish message 34 00:01:35,236 --> 00:01:38,476 with which you can then forward, you can then write code 35 00:01:38,476 --> 00:01:39,986 to recover the bluish text, 36 00:01:40,536 --> 00:01:43,246 but more generally you can take an arbitrary photograph off 37 00:01:43,246 --> 00:01:45,966 of the web, off of your camera, and run it through a program 38 00:01:45,966 --> 00:01:47,956 that maybe you write or that you download or buy 39 00:01:48,436 --> 00:01:51,676 and actually have it hide entire paragraphs of text, 40 00:01:51,946 --> 00:01:54,066 entire books worth of texts depending 41 00:01:54,066 --> 00:01:55,306 on the resolution of the image. 42 00:01:55,306 --> 00:01:56,986 It's actually quite cool what you can do, 43 00:01:56,986 --> 00:01:58,166 and the implications, of course, 44 00:01:58,366 --> 00:01:59,676 are that you don't necessarily have 45 00:01:59,676 --> 00:02:02,336 to use cryptography per se to send a message. 46 00:02:02,336 --> 00:02:05,566 You can send someone an attachment containing an image 47 00:02:05,756 --> 00:02:07,926 that looks completely innocuous to anyone 48 00:02:07,926 --> 00:02:09,906 who might intercept it, and yet, if they run it 49 00:02:09,906 --> 00:02:12,586 through this special program, and change for instance all 50 00:02:12,586 --> 00:02:15,126 of the pixels to red or all of the pixels to blue, 51 00:02:15,126 --> 00:02:17,846 or some of the pixels to red or blue, 52 00:02:18,186 --> 00:02:19,456 something along those lines, 53 00:02:19,716 --> 00:02:22,356 do they then see the message that's revealed. 54 00:02:22,396 --> 00:02:24,366 So, if you like this sort of stuff, certainly Google 55 00:02:24,366 --> 00:02:25,886 around for something called steganography, 56 00:02:25,886 --> 00:02:28,636 and you'll see lots of really neat examples. 57 00:02:28,856 --> 00:02:31,226 But, in this Problem Set 2 we do introduce the idea 58 00:02:31,776 --> 00:02:33,826 of images themselves in file format. 59 00:02:33,826 --> 00:02:35,776 So, we introduced a week or two ago the idea 60 00:02:35,776 --> 00:02:37,726 of actually persisting your data to disc. 61 00:02:37,726 --> 00:02:39,176 Almost all of the programs we wrote 62 00:02:39,176 --> 00:02:41,536 in the first several weeks, as soon as they quit, 63 00:02:41,536 --> 00:02:43,116 as soon as main ended, that was it. 64 00:02:43,296 --> 00:02:46,446 All state, all memory was lost that you might have built 65 00:02:46,446 --> 00:02:47,616 up while running this program. 66 00:02:47,616 --> 00:02:49,946 And then we introduced this ability to save files 67 00:02:49,946 --> 00:02:53,256 with fprintf, and in this problem set we hand you some 68 00:02:53,256 --> 00:02:56,966 files, namely some BMP files for bitmap files 69 00:02:57,226 --> 00:02:58,516 that essentially look like this. 70 00:02:58,516 --> 00:03:01,126 So, if you've ever wondered how you go 71 00:03:01,126 --> 00:03:04,186 around representing an image, well, the simplest way, frankly, 72 00:03:04,396 --> 00:03:07,956 is just to assume that 1 means white, zero means black, 73 00:03:08,136 --> 00:03:11,236 arbitrarily, and then with just those building blocks can you 74 00:03:11,236 --> 00:03:12,706 implement the simplest of pictures. 75 00:03:12,706 --> 00:03:14,366 This is a little smiley face here. 76 00:03:14,616 --> 00:03:17,176 Turns out that all images, all image formats, 77 00:03:17,386 --> 00:03:19,466 are pretty much structured as rectangles. 78 00:03:19,466 --> 00:03:22,706 And so you need to have a row of pixels on the top 79 00:03:22,706 --> 00:03:25,496 and then columns of pixels going top to bottom. 80 00:03:25,736 --> 00:03:26,926 And if you kind of take a step back, 81 00:03:27,196 --> 00:03:30,186 even though these things are not spaced in the same way, 82 00:03:30,536 --> 00:03:34,196 notice how each of the ones at top left match to each 83 00:03:34,196 --> 00:03:36,266 of the white squares at top right. 84 00:03:36,596 --> 00:03:37,886 Similarly are zeros black. 85 00:03:38,136 --> 00:03:40,196 And so this file up here on the left, 86 00:03:40,376 --> 00:03:43,406 if you stored those patterns of bits inside of a file, 87 00:03:43,666 --> 00:03:46,426 you can then say this represents a smiley face. 88 00:03:46,426 --> 00:03:49,076 Now, what does it mean to say it represents a smiley face? 89 00:03:49,316 --> 00:03:51,896 Well, you have to standardize how those bits are laid 90 00:03:51,896 --> 00:03:52,646 out in a file. 91 00:03:52,646 --> 00:03:54,516 This is probably an oversimplification 92 00:03:54,746 --> 00:03:56,336 because these days anything that has 93 00:03:56,336 --> 00:03:58,386 like a three-letter file extension, 94 00:03:58,386 --> 00:04:02,796 .DOC for Microsoft Word documents, .XLS for Excel, 95 00:04:02,796 --> 00:04:07,346 .BMP for bitmap files, they are standardized in some way. 96 00:04:07,346 --> 00:04:09,656 Microsoft or Apple or someone on the Internet said, 97 00:04:09,926 --> 00:04:11,546 you know what, if you have a file that ends 98 00:04:11,546 --> 00:04:14,026 in that extension, the bits inside of it are going 99 00:04:14,026 --> 00:04:15,186 to be laid out as follows. 100 00:04:15,186 --> 00:04:18,616 And so what we do in Problem Set 5 is introduce you to one 101 00:04:18,616 --> 00:04:21,366 such format, this BMP, bitmap file format, 102 00:04:21,616 --> 00:04:25,616 which pretty much does get implemented to like this. 103 00:04:25,896 --> 00:04:28,606 But thankfully you can actually use 24-bit color, 104 00:04:28,606 --> 00:04:30,556 not 1-bit color; a zero or one. 105 00:04:30,786 --> 00:04:32,806 You can actually have 24 bits per pixel, 106 00:04:32,806 --> 00:04:34,626 which gives you millions of total colors 107 00:04:34,626 --> 00:04:35,566 as you'll see in the pset. 108 00:04:35,916 --> 00:04:37,976 But what Microsoft said years ago was 109 00:04:37,976 --> 00:04:41,216 that for this BMP file format, the first several bits, 110 00:04:41,216 --> 00:04:44,546 or the first several bytes in any such file, have to be laid 111 00:04:44,546 --> 00:04:45,776 out in the following way. 112 00:04:45,776 --> 00:04:46,636 In other words, there has 113 00:04:46,636 --> 00:04:50,226 to be what we generally call metadata inside of the file. 114 00:04:50,226 --> 00:04:52,506 It's not necessarily the data you care about, 115 00:04:52,506 --> 00:04:54,706 it's not the zeros and ones that represent the picture, 116 00:04:54,966 --> 00:04:56,796 but it's some ancillary information in, like, 117 00:04:57,026 --> 00:04:59,206 what's the width of this image and what's the height of it 118 00:04:59,206 --> 00:05:01,796 and what's the total size of this file, little clues 119 00:05:01,796 --> 00:05:04,026 that might be helpful to programs like Photoshop 120 00:05:04,026 --> 00:05:05,496 and Windows and Mac OS. 121 00:05:05,496 --> 00:05:07,736 So that when they open this file, they know what to expect 122 00:05:07,816 --> 00:05:09,016 and they know how to display it. 123 00:05:09,276 --> 00:05:11,436 So, this is explained to some detail in this fact, 124 00:05:11,436 --> 00:05:13,926 although we don't care about a lot of these fields, thankfully. 125 00:05:14,166 --> 00:05:17,686 But, you'll see fields called like bfSize, which refers 126 00:05:17,686 --> 00:05:20,726 to the size of the file, biWidth, biHeight. 127 00:05:20,986 --> 00:05:23,186 And notice even though this is laid out in the chart here, 128 00:05:23,756 --> 00:05:25,076 this is just Microsoft's way 129 00:05:25,376 --> 00:05:28,256 of defining the contents of a struct. 130 00:05:28,646 --> 00:05:31,886 So, you'll actually see in a file called bmp.h 131 00:05:32,106 --> 00:05:35,756 that we implement this specification using a little bit 132 00:05:35,756 --> 00:05:38,266 of C code and it looks a little something like this. 133 00:05:38,266 --> 00:05:40,056 Let me go ahead and scroll down, 134 00:05:40,306 --> 00:05:42,346 and notice that there is one of the structs. 135 00:05:42,776 --> 00:05:45,886 It's a struct called a bitmap file header and it seems 136 00:05:45,886 --> 00:05:48,536 to have some of those same variable names, bfSize, 137 00:05:48,536 --> 00:05:50,186 is one of the ones I just mentioned there. 138 00:05:50,186 --> 00:05:52,546 So, I won't go into great detail and lecture because some 139 00:05:52,546 --> 00:05:54,176 of this information you don't need, per se, 140 00:05:54,176 --> 00:05:55,526 and a lot of it is covered in this spec. 141 00:05:56,016 --> 00:05:58,386 But what is curious is that inside 142 00:05:58,386 --> 00:06:00,306 of these files is a whole bunch of data types 143 00:06:00,306 --> 00:06:03,126 that you might not have seen before: WORD, DWORD. 144 00:06:03,636 --> 00:06:05,856 If we scroll down further, I think we see others, 145 00:06:05,966 --> 00:06:07,506 LONG, in all capitals. 146 00:06:07,796 --> 00:06:09,646 So it turns out that in different languages, 147 00:06:09,646 --> 00:06:11,276 and sometimes different operating systems, 148 00:06:11,536 --> 00:06:12,896 there are of course, different types 149 00:06:12,896 --> 00:06:15,816 and Microsoft has standardized, at least back in the day, 150 00:06:16,026 --> 00:06:18,296 along the lines of these types, 151 00:06:18,536 --> 00:06:21,746 but thankfully they map very nicely to a Linux computer 152 00:06:21,746 --> 00:06:24,526 or a Mac OS computer, so long as you know what to expect. 153 00:06:24,646 --> 00:06:26,816 So, you'll see in the spec and in the comments here 154 00:06:27,036 --> 00:06:29,486 that we used typedef, the thing for creating synonyms 155 00:06:29,486 --> 00:06:31,656 for data types, to say that, you know what, 156 00:06:31,656 --> 00:06:35,686 Microsoft calls a word we are going to call a uint16_t. 157 00:06:35,686 --> 00:06:39,906 So, this is actually a type we haven't talked about. 158 00:06:39,906 --> 00:06:41,896 But thankfully there do exist 159 00:06:41,896 --> 00:06:46,126 in C data types whose sizes do not vary based 160 00:06:46,126 --> 00:06:47,466 on the computer you're using. 161 00:06:47,466 --> 00:06:50,466 Right? One of the little disclaimers in the quiz 0, even, 162 00:06:50,466 --> 00:06:53,396 was assume a 32-bit architecture, like the cloud. 163 00:06:53,656 --> 00:06:55,556 Now, we have to say that because if you're on a new 164 00:06:55,556 --> 00:06:57,426 or fancier machine like your own laptop, 165 00:06:57,426 --> 00:07:00,166 it might very well be 64 bits, which means some of the answers 166 00:07:00,166 --> 00:07:02,716 to quiz questions, like how big is a long, how big is an int, 167 00:07:03,006 --> 00:07:05,176 might have actually differed, depending on the machine. 168 00:07:05,346 --> 00:07:06,756 Now, this is incredibly annoying 169 00:07:06,756 --> 00:07:07,786 if you're the computer programmer, 170 00:07:07,786 --> 00:07:10,176 the computer scientist and you need to decide 171 00:07:10,176 --> 00:07:13,026 for yourself how many bits to use to represent a number. 172 00:07:13,026 --> 00:07:16,126 You don't want your program to work or not work just based 173 00:07:16,126 --> 00:07:18,866 on what kind of computer your users happen to be using. 174 00:07:19,186 --> 00:07:20,136 So, thankfully, even 175 00:07:20,136 --> 00:07:22,636 in C there's this additional header file 176 00:07:22,636 --> 00:07:26,476 that we have not used before called stdint.h 177 00:07:26,696 --> 00:07:28,056 and that gives us access 178 00:07:28,056 --> 00:07:29,836 to these cryptic-looking data types, 179 00:07:30,096 --> 00:07:32,376 but whose names pretty much say who they are. 180 00:07:32,376 --> 00:07:35,216 So, for instance, a WORD, in the world of Microsoft, 181 00:07:35,216 --> 00:07:38,096 we have defines as a uint16_t and what does the 182 00:07:38,096 --> 00:07:39,156 "u" probably stand for? 183 00:07:39,996 --> 00:07:40,806 So, unsigned. 184 00:07:40,866 --> 00:07:43,186 This just means that I'm going to use 16 bits 185 00:07:43,446 --> 00:07:44,896 to represent a piece of data, 186 00:07:45,126 --> 00:07:48,886 but know that this value should be unsigned. 187 00:07:48,886 --> 00:07:51,416 In other words, don't waste one of the bits allowing you 188 00:07:51,416 --> 00:07:52,816 to have negative numbers in there. 189 00:07:52,816 --> 00:07:55,976 I'm only going to put positive numbers, or nonnegative numbers 190 00:07:56,236 --> 00:07:58,066 in this unsigned 16-bit int. 191 00:07:58,396 --> 00:08:00,266 A LONG, in the world of Microsoft, 192 00:08:00,266 --> 00:08:02,276 if you read the documentation at these URLs, 193 00:08:02,556 --> 00:08:06,346 that's actually a signed 32-bit integer, so we looked 194 00:08:06,346 --> 00:08:09,396 in our stdint.h file and we pulled out this type, 195 00:08:09,396 --> 00:08:10,976 which is a 32-bit signed int. 196 00:08:11,246 --> 00:08:13,636 And, then, for a byte, and this is frankly, the most useful 197 00:08:13,636 --> 00:08:15,596 in the problem set, because you'll probably be able 198 00:08:15,596 --> 00:08:18,666 to use this same data type for the forensic aspect of it, 199 00:08:18,666 --> 00:08:19,786 and more on that in just a moment. 200 00:08:19,916 --> 00:08:22,896 And notice that we have defined what Microsoft calls a byte 201 00:08:22,896 --> 00:08:25,516 and what anyone in this room would now just call a chunk 202 00:08:25,516 --> 00:08:29,626 of 8 bits, as a uint8_t. 203 00:08:29,916 --> 00:08:33,666 So, you could define a byte as pretty much just a char, right? 204 00:08:33,666 --> 00:08:36,376 A char as far as we've discussed is just 8 bits, 205 00:08:36,826 --> 00:08:38,516 but that's not... 206 00:08:38,796 --> 00:08:41,146 Technically, even though we call a char a char, 207 00:08:41,146 --> 00:08:43,126 which kind of implies a character is going to go 208 00:08:43,126 --> 00:08:45,816 in here, that's not really a strict requirement. 209 00:08:45,816 --> 00:08:49,766 A char in our context is technically just an 8-bit value. 210 00:08:50,036 --> 00:08:53,486 And even though we tend to use a char to represent A's and B's 211 00:08:53,486 --> 00:08:56,186 and C's, technically you can put any 8-bit numbers 212 00:08:56,186 --> 00:08:58,936 in there you want: zero on up to 255. 213 00:08:59,326 --> 00:09:02,386 So, in this context, when you're actually working with bits 214 00:09:02,386 --> 00:09:05,576 in a file, bits inside of an image, a bitmap file, 215 00:09:05,886 --> 00:09:10,886 you really don't want to confuse 8-bit chunks, bytes, as numbers, 216 00:09:10,886 --> 00:09:13,146 as characters, because obviously an image is not going 217 00:09:13,146 --> 00:09:14,446 to have A's, B's and C's in it. 218 00:09:14,446 --> 00:09:15,886 It's going to have just zeros and ones. 219 00:09:16,196 --> 00:09:19,086 So this is our way of specifying very precisely, 220 00:09:19,326 --> 00:09:22,436 assume that a byte is going to be an unsigned value, 221 00:09:22,496 --> 00:09:25,396 that is there's no notion of negativity in this context, 222 00:09:25,626 --> 00:09:29,446 I only need 8 bits for a byte and this typedef here mandates 223 00:09:29,576 --> 00:09:33,266 that in my code for this pset, I will always be treating a byte 224 00:09:33,476 --> 00:09:36,436 as 8 bits and with no notion of negativity. 225 00:09:36,436 --> 00:09:37,766 And, this is a very useful thing, 226 00:09:37,976 --> 00:09:40,216 so that your code doesn't break and in quizzes frankly, 227 00:09:40,216 --> 00:09:42,526 you don't have to have these parentheticals saying 228 00:09:42,526 --> 00:09:44,106 on this type of architecture. 229 00:09:44,106 --> 00:09:47,856 It will just work because now we have more portable types. 230 00:09:48,286 --> 00:09:51,566 But, moving to something that's a little more fun, 231 00:09:51,566 --> 00:09:54,436 we did in fact, take our stroll across campus, 232 00:09:54,436 --> 00:09:56,096 although the weather wasn't great the past week 233 00:09:56,096 --> 00:09:58,836 or so our stroll rather devolved into a stroll along Facebook 234 00:09:58,836 --> 00:10:01,756 and Google Images where we actually found a whole bunch 235 00:10:01,756 --> 00:10:03,526 of fun pictures of computer scientists, 236 00:10:03,526 --> 00:10:05,816 one being Matt Chartier here, one of our teaching fellows. 237 00:10:06,136 --> 00:10:08,326 And what you'll find is, as promised, 238 00:10:08,326 --> 00:10:11,716 I accidentally formatted my CompactFlash card 239 00:10:11,716 --> 00:10:14,166 on which I was taking all of these photographs. 240 00:10:14,166 --> 00:10:17,726 Now, it's not really possible to share a CompactFlash card, 241 00:10:17,726 --> 00:10:19,756 or a little memory stick among 500 people, 242 00:10:19,946 --> 00:10:23,536 and so what we did was, sort of industry standard practice, 243 00:10:23,746 --> 00:10:26,846 we made a forensic image of this CompactFlash card. 244 00:10:27,036 --> 00:10:29,406 I plugged it into one of those memory card readers that most 245 00:10:29,406 --> 00:10:32,056 of you have, or I plugged it into a camera with one 246 00:10:32,096 --> 00:10:34,006 of those USB cables so that it could talk to my computer 247 00:10:34,006 --> 00:10:36,596 and then I ran a special command on my computer 248 00:10:36,596 --> 00:10:38,916 that says start grabbing all of the bits 249 00:10:38,916 --> 00:10:41,596 from this CompactFlash card until I say stop. 250 00:10:41,596 --> 00:10:44,726 And I said stop after about 4 million or so bits 251 00:10:44,726 --> 00:10:46,156 because I thought, okay that's enough. 252 00:10:46,436 --> 00:10:50,476 I didn't take, what is it, a gigabyte worth of photos, 253 00:10:50,706 --> 00:10:52,066 I just had a few megabytes worth. 254 00:10:52,066 --> 00:10:54,336 So I then packaged up a forensic image. 255 00:10:54,336 --> 00:10:57,286 It's about 4 megabytes and inside of there are all 256 00:10:57,286 --> 00:10:59,126 of the JPEGs from this problem set. 257 00:10:59,126 --> 00:11:00,576 So the program you're going to have to write 258 00:11:00,576 --> 00:11:03,776 for the latter part of the pset is a program called Recover 259 00:11:04,076 --> 00:11:07,536 that opens up a file, namely this forensic image, 260 00:11:07,536 --> 00:11:10,786 called card.raw and it's going to have to iterate for all 261 00:11:10,786 --> 00:11:12,806 of the bits in this file from top to bottom 262 00:11:13,076 --> 00:11:15,086 and every time it sees what looks 263 00:11:15,086 --> 00:11:19,026 like a JPEG you should actually start copying those bits 264 00:11:19,376 --> 00:11:21,656 to a new file called something.jpeg 265 00:11:21,656 --> 00:11:24,796 and then the next time you see what looks like another JPEG, 266 00:11:25,006 --> 00:11:26,386 you close that previous file 267 00:11:26,386 --> 00:11:29,486 and you open a new one using a function called fwrite 268 00:11:29,486 --> 00:11:30,596 and fopen, as you'll see, 269 00:11:30,766 --> 00:11:33,086 and you start writing those same bits to the next file. 270 00:11:33,126 --> 00:11:35,646 And in the end you should find yourself inside 271 00:11:35,646 --> 00:11:39,476 of your pset5 JPEG directory with 50 JPEGs, 272 00:11:39,846 --> 00:11:43,136 all of which were once on this digital camera's memory card. 273 00:11:43,456 --> 00:11:44,826 And then the real fun begins. 274 00:11:44,826 --> 00:11:47,246 On a per-section basis you are challenged over the next couple 275 00:11:47,246 --> 00:11:50,596 of weeks to find the people in these photos, and Matt being one 276 00:11:50,596 --> 00:11:52,316 of them, and photograph yourself with them. 277 00:11:52,316 --> 00:11:53,676 And the section that does this first 278 00:11:53,676 --> 00:11:57,326 and most accurately will win, as we promised an amazing prize. 279 00:11:57,636 --> 00:12:01,326 So, with that said, the Hacker Edition. 280 00:12:01,326 --> 00:12:04,056 So, the Hacker edition is mostly similar because this is, 281 00:12:04,056 --> 00:12:07,336 frankly, too much fun of a pset to deprive even the hacker types 282 00:12:07,336 --> 00:12:09,066 of doing some of these same problems. 283 00:12:09,306 --> 00:12:12,006 But in Hacker edition you will find that one of the components 284 00:12:12,006 --> 00:12:14,596 of the pset on both standard and hacker requires 285 00:12:14,596 --> 00:12:17,926 that you implement a program with which to resize an image. 286 00:12:18,136 --> 00:12:20,306 Now, resizing an image is actually relatively 287 00:12:20,306 --> 00:12:21,776 straightforward if it looks like this. 288 00:12:22,026 --> 00:12:24,966 If you think of a picture as a grid of pixels, top to bottom, 289 00:12:24,966 --> 00:12:28,946 left to right, if I want to make this smiley face twice as big, 290 00:12:29,316 --> 00:12:32,476 well, I can simply convert each of these single pixels, 291 00:12:32,476 --> 00:12:34,776 individual squares, to just four. 292 00:12:34,926 --> 00:12:37,966 Right, because if I just move one pixel and make it 2 wide 293 00:12:37,966 --> 00:12:41,046 and 2 tall that will essentially double the size of the image 294 00:12:41,046 --> 00:12:43,876 so you can relatively easily do this with a couple of for loops. 295 00:12:44,126 --> 00:12:45,456 What's a little trickier, though, 296 00:12:45,456 --> 00:12:48,106 is that if you actually have a bigger image, say a bitmap 297 00:12:48,106 --> 00:12:52,396 of Matt and we say don't blow Matt up, but shrink him down, 298 00:12:52,826 --> 00:12:56,316 right, then, you have to decide which pixels do you throw away, 299 00:12:56,316 --> 00:12:59,516 which do you compress and how do you actually take an arbitrarily 300 00:12:59,516 --> 00:13:01,996 large bitmap file and shrink it down. 301 00:13:01,996 --> 00:13:03,596 You're going to have to make some judgment calls 302 00:13:03,596 --> 00:13:05,456 in the Hacker edition of this pset. 303 00:13:05,676 --> 00:13:10,026 So, SFTP, quick demonstration of this. 304 00:13:10,026 --> 00:13:14,106 It is of course, the case that when you SSH to cloud CS50.net, 305 00:13:14,106 --> 00:13:15,956 you're sitting at your own laptop or your own desktop 306 00:13:16,206 --> 00:13:19,306 and you're connecting to our servers, cloud.cs50.net, 307 00:13:19,586 --> 00:13:20,726 where you have your home directories 308 00:13:20,726 --> 00:13:23,476 and your storage space and all of this. 309 00:13:23,476 --> 00:13:27,646 Now, this is a bit of an issue if the purpose of this pset is 310 00:13:27,646 --> 00:13:30,916 to write a program that creates JPEGs and bitmap files, 311 00:13:31,266 --> 00:13:33,556 but those bitmap files and JPEG files are now going to live 312 00:13:33,556 --> 00:13:36,776 on the cloud, which means you can't just double click on them 313 00:13:36,776 --> 00:13:40,106 because again, SSH has never been a point and click interface 314 00:13:40,356 --> 00:13:42,846 and so you need a program with which to transfer those files 315 00:13:42,846 --> 00:13:43,936 down to your own computer 316 00:13:44,186 --> 00:13:46,786 and thus there exists this other protocol SFTP, 317 00:13:46,786 --> 00:13:48,826 for Secure File Transfer Protocol. 318 00:13:49,046 --> 00:13:50,866 It's better than its predecessor, 319 00:13:50,866 --> 00:13:53,866 File Transfer Protocol, which was not secure. 320 00:13:53,866 --> 00:13:56,066 Passwords were sent in the clear. 321 00:13:56,546 --> 00:13:59,516 So, on a Mac you can run, as the pset says, 322 00:13:59,516 --> 00:14:00,956 a program called Cyberduck. 323 00:14:00,956 --> 00:14:02,586 It's a free program that you can download. 324 00:14:03,126 --> 00:14:05,496 You'll click a little icon like this, open connection, 325 00:14:05,706 --> 00:14:08,186 and there's only a couple of details to be wary of here 326 00:14:08,186 --> 00:14:10,146 and we have a how-to online so you don't have 327 00:14:10,146 --> 00:14:11,916 to retain all of this right now. 328 00:14:12,006 --> 00:14:14,606 You'll see in this dropdown menu, each program can connect 329 00:14:14,606 --> 00:14:17,486 to servers in all sorts of ways, to different protocols. 330 00:14:17,746 --> 00:14:20,056 The default tends to be this insecure one, 331 00:14:20,236 --> 00:14:22,346 so you want to make sure you're using the secure version, 332 00:14:22,346 --> 00:14:23,286 if you're a Mac user. 333 00:14:23,556 --> 00:14:25,696 For server, you of course type cloud.cs50.net. 334 00:14:25,736 --> 00:14:28,706 You want to leave the port as 22. 335 00:14:28,706 --> 00:14:30,786 It's the same number as SSH uses. 336 00:14:31,046 --> 00:14:33,426 You type in your username, you type in your password, 337 00:14:33,426 --> 00:14:35,406 you click connect and then you'll be connected 338 00:14:35,406 --> 00:14:36,096 to the cloud. 339 00:14:36,096 --> 00:14:39,116 And just as though you typed ls you'll see the contents 340 00:14:39,116 --> 00:14:40,116 of your home directory. 341 00:14:40,366 --> 00:14:43,506 But now you have a GUI, a Graphical User Interface. 342 00:14:43,766 --> 00:14:46,476 So, if you want you can do things like drag a file 343 00:14:46,646 --> 00:14:48,826 and then just over this window and let go 344 00:14:48,826 --> 00:14:50,146 and the file will be transferred. 345 00:14:50,376 --> 00:14:52,716 Now, in the context of Windows, same exact thing, 346 00:14:52,716 --> 00:14:54,266 but with a different program. 347 00:14:54,526 --> 00:14:55,976 It happens to be called WinSCP, 348 00:14:55,976 --> 00:14:58,126 and if you follow the instructions here 349 00:14:58,126 --> 00:15:00,186 with screen shots you'll see exactly the same process. 350 00:15:00,336 --> 00:15:01,766 And then it just boils down to the obvious. 351 00:15:01,766 --> 00:15:03,076 If you've got JPEGs on the server, 352 00:15:03,076 --> 00:15:04,566 click and drag them to your hard drive. 353 00:15:04,786 --> 00:15:06,426 If you want to upload something to the cloud, 354 00:15:06,766 --> 00:15:07,986 do the opposite process. 355 00:15:08,086 --> 00:15:10,586 So, that is a little something called SFTP. 356 00:15:10,856 --> 00:15:13,686 Before I forget, the "Harvard Science Review," a student group 357 00:15:13,686 --> 00:15:16,316 on campus is looking for webmasters with which 358 00:15:16,316 --> 00:15:17,386 to make their website. 359 00:15:17,646 --> 00:15:20,256 This is actually a very common request this time of year 360 00:15:20,256 --> 00:15:21,696 as we look towards final projects 361 00:15:21,696 --> 00:15:22,676 in just a few weeks time. 362 00:15:23,206 --> 00:15:26,516 If you are interested in joining them as a webmaster go ahead 363 00:15:26,516 --> 00:15:29,746 to Google Harvard Science Review Campus publication, 364 00:15:29,746 --> 00:15:32,486 click contact and let them know that you might be interested. 365 00:15:32,486 --> 00:15:35,436 In fact, these kinds of things are certainly viable options 366 00:15:35,476 --> 00:15:37,976 for final projects in the course. 367 00:15:38,016 --> 00:15:41,616 More on that on Monday when we dive into web programming. 368 00:15:42,076 --> 00:15:46,546 Alright, so a couple of whirlwind comments on quiz 0? 369 00:15:46,736 --> 00:15:49,176 So, overall it seemed to go quite well. 370 00:15:49,286 --> 00:15:52,326 We had a good time with it, but certainly do reach out to us 371 00:15:52,326 --> 00:15:54,506 if you have any questions or concerns especially if you are 372 00:15:54,506 --> 00:15:55,466 on the lower end of things. 373 00:15:55,736 --> 00:15:58,266 Realize plenty of opportunities still remain with psets, 374 00:15:58,586 --> 00:16:01,256 another quiz, and the final project to bolster any scores 375 00:16:01,256 --> 00:16:02,426 with which you might be disappointed. 376 00:16:02,756 --> 00:16:05,936 But a couple of highlights of very common mistakes 377 00:16:05,936 --> 00:16:07,996 that we all saw while grading the quiz. 378 00:16:07,996 --> 00:16:09,766 So, this was an excerpt of mine 379 00:16:10,076 --> 00:16:13,816 from the problem involving bottles whereby there was a 380 00:16:13,816 --> 00:16:14,666 scoping issue. 381 00:16:14,706 --> 00:16:17,516 We declared a variable s, assigned at the string bottles, 382 00:16:17,716 --> 00:16:18,596 and then we printed it. 383 00:16:18,736 --> 00:16:22,006 A very common answer was that the problem in this program is 384 00:16:22,006 --> 00:16:24,886 that you actually have to dereference s by doing this. 385 00:16:26,346 --> 00:16:27,986 We'll refer to the online answer key 386 00:16:27,986 --> 00:16:30,086 to see exactly what the real answers were, 387 00:16:30,306 --> 00:16:31,706 but this is in fact, not correct. 388 00:16:31,706 --> 00:16:34,396 If s is declared as a char*, 389 00:16:34,396 --> 00:16:36,626 it is of course, by nature, a pointer. 390 00:16:36,656 --> 00:16:40,526 If you say char* = "hello"; something like this, 391 00:16:40,526 --> 00:16:41,776 it is of course a pointer. 392 00:16:42,066 --> 00:16:44,206 %s means put a string here. 393 00:16:44,206 --> 00:16:45,076 Well, what's a string? 394 00:16:45,256 --> 00:16:46,926 Well, we realized in the past week or two 395 00:16:46,926 --> 00:16:50,196 that string is just a pointer to a chunk of memory. 396 00:16:50,196 --> 00:16:53,216 It's the first byte of an array of characters. 397 00:16:53,486 --> 00:16:57,456 So it's absolutely proper to just say something like s here. 398 00:16:57,866 --> 00:17:00,676 If instead you really wanted to use this * notation, 399 00:17:00,986 --> 00:17:03,156 what is that really pointing at? 400 00:17:03,156 --> 00:17:07,366 When you dereference s, what do you get back? 401 00:17:08,096 --> 00:17:09,656 The first character, right? 402 00:17:09,656 --> 00:17:11,756 Because the star operator is the dereference operator, 403 00:17:11,756 --> 00:17:14,156 and it says go there, there being an address. 404 00:17:14,556 --> 00:17:17,336 So, really if you really want to use that syntax, that's fine, 405 00:17:17,576 --> 00:17:20,136 but you have to realize that the appropriate format code is then 406 00:17:20,136 --> 00:17:23,686 going to be %c, because what's at that address, specifically 407 00:17:23,686 --> 00:17:26,446 at that byte, is of course just a character. 408 00:17:26,446 --> 00:17:28,566 So, do let us know if you do have questions 409 00:17:28,566 --> 00:17:30,496 about that question in particular. 410 00:17:30,746 --> 00:17:33,616 Another common gotcha was this one. 411 00:17:33,616 --> 00:17:37,136 So even though we talk about zero in many different ways, 412 00:17:37,136 --> 00:17:39,166 there's the zero that you type in at the keyboard, 413 00:17:39,166 --> 00:17:41,716 there's the zero which is the decimal number, there's NULL 414 00:17:41,716 --> 00:17:44,056 which is supposedly zero, there's 0, 415 00:17:44,056 --> 00:17:45,226 which is supposedly zero, 416 00:17:45,426 --> 00:17:47,796 all of these things actually have different data types 417 00:17:47,796 --> 00:17:50,116 or they're slightly different in different contexts. 418 00:17:50,356 --> 00:17:53,406 So, realize that NULL's type is a pointer. 419 00:17:53,546 --> 00:17:56,996 When we say NULL in a program you're technically saying this, 420 00:17:57,306 --> 00:18:00,186 in case you've seen this in any book. 421 00:18:00,186 --> 00:18:07,096 NULL is typedef'd in stdlib.h to be a ((void *) 0). 422 00:18:07,646 --> 00:18:09,016 So, what does this mean? 423 00:18:09,206 --> 00:18:11,746 This means yes, it's zero, but it's a pointer. 424 00:18:11,746 --> 00:18:12,646 What type of pointer? 425 00:18:12,646 --> 00:18:13,686 And I don't really know, 426 00:18:13,686 --> 00:18:15,596 so it turns out in C there exists this notion 427 00:18:15,596 --> 00:18:18,736 of a void * pointer, which just means it's a generic pointer. 428 00:18:18,736 --> 00:18:20,176 It could point at anything, I don't know, 429 00:18:20,476 --> 00:18:21,776 but it is in fact, a pointer. 430 00:18:21,776 --> 00:18:23,286 It's not in fact, zero. 431 00:18:23,566 --> 00:18:28,086 This is different now from saying '0', 432 00:18:28,406 --> 00:18:31,156 which is actually the NULL terminating character. 433 00:18:31,516 --> 00:18:35,146 Even though it's called NULL, most people write it as NULL, 434 00:18:35,146 --> 00:18:37,376 which is a stupid distinction to make admittedly, 435 00:18:37,656 --> 00:18:40,546 but it is a character so its data type is char. 436 00:18:40,796 --> 00:18:43,046 The relevance of this, of course, was in strlen. 437 00:18:43,146 --> 00:18:44,936 When some of you were implementing strlen, 438 00:18:45,296 --> 00:18:47,226 you probably did something like... 439 00:18:47,346 --> 00:18:53,556 you probably wanted to check if (s[0] == NULL). 440 00:18:53,556 --> 00:18:56,086 This was not correct because if s was a string 441 00:18:56,286 --> 00:18:58,446 and s[i] is a character, what you really wanted 442 00:18:58,446 --> 00:19:00,166 to check for was this. 443 00:19:00,426 --> 00:19:02,366 So, just do beware of that distinction there. 444 00:19:02,766 --> 00:19:05,216 And then finally this guy too. 445 00:19:05,216 --> 00:19:08,456 So n is a new line and r is a carriage return. 446 00:19:08,456 --> 00:19:11,336 This is not a sort of important question at first glance, 447 00:19:11,586 --> 00:19:13,786 but it is important in the context of portability 448 00:19:13,786 --> 00:19:15,836 of files especially as you exit the course, 449 00:19:15,836 --> 00:19:17,846 you're doing research, or projects where you have 450 00:19:17,846 --> 00:19:20,316 to import files or data sets from other people. 451 00:19:20,616 --> 00:19:23,206 The fact of the matter is the stupid distinction amongst 452 00:19:23,206 --> 00:19:25,726 operating systems does tend to create problems. 453 00:19:26,026 --> 00:19:32,426 Linux systems do use n, Mac's use, to denote the end 454 00:19:33,516 --> 00:19:36,426 of a line, r, and then Windows uses 455 00:19:36,426 --> 00:19:40,746 for good measure both r or n. 456 00:19:40,746 --> 00:19:42,476 So, that's why this is germane 457 00:19:42,476 --> 00:19:45,306 to anyone actually manipulating files in the end. 458 00:19:45,306 --> 00:19:46,276 And this one too. 459 00:19:46,496 --> 00:19:50,126 This one's pretty well explained in the quiz solution itself, 460 00:19:50,376 --> 00:19:52,236 but do make sure you understand the distinction 461 00:19:52,236 --> 00:19:54,866 between including the header file this is linking 462 00:19:54,986 --> 00:19:56,356 against a piece of code. 463 00:19:56,626 --> 00:19:58,646 So, looking ahead, you guys for the first time this year have 464 00:19:58,646 --> 00:20:01,016 to preterm plan, which means unofficially say 465 00:20:01,286 --> 00:20:03,126 to the world what courses you're probably going 466 00:20:03,126 --> 00:20:03,826 to take next year. 467 00:20:04,426 --> 00:20:06,346 So we actually thought we'd use this, and we'll discuss more 468 00:20:06,346 --> 00:20:09,916 on Wednesday, as an opportunity to consider what kind 469 00:20:09,916 --> 00:20:12,676 of mindset you should have going into final projects. 470 00:20:12,676 --> 00:20:15,336 So, we recently surveyed 1600 or so people 471 00:20:15,336 --> 00:20:17,146 that used Harvard courses this past fall. 472 00:20:17,386 --> 00:20:19,496 This again, was the CS50 app with which you can shop 473 00:20:19,496 --> 00:20:20,866 for courses a little more easily. 474 00:20:21,146 --> 00:20:23,606 We got back responses from these folks, which we've taken 475 00:20:23,606 --> 00:20:25,226 into account all of their suggestions, 476 00:20:25,226 --> 00:20:28,636 kind of considered what were the most important feature requests, 477 00:20:28,636 --> 00:20:30,686 and on Wednesday we'll release a newer version of this 478 00:20:30,716 --> 00:20:33,166 which improves upon this previous version. 479 00:20:33,166 --> 00:20:34,706 The point here is not so much to dwell 480 00:20:34,876 --> 00:20:36,246 on this particular application. 481 00:20:36,246 --> 00:20:38,526 But when it comes time to design final projects, 482 00:20:38,526 --> 00:20:40,756 especially if you have your eye on doing something 483 00:20:40,756 --> 00:20:42,806 that affects campus, affects other people, 484 00:20:43,066 --> 00:20:46,136 builds you a startup of sorts, I'm asking these kinds 485 00:20:46,136 --> 00:20:50,476 of questions, how do you design software well, how do you design 486 00:20:50,476 --> 00:20:53,776 with user interface in mind, is certainly important. 487 00:20:53,776 --> 00:20:56,156 And we'll also reveal some of the gripes some of you had, 488 00:20:56,156 --> 00:20:58,046 anonymously, with various programs 489 00:20:58,046 --> 00:21:00,016 and machines you've seen in the world, 490 00:21:00,106 --> 00:21:02,566 excluding the MBTA system. 491 00:21:03,016 --> 00:21:04,536 So, any questions, administratively 492 00:21:04,536 --> 00:21:06,106 on really, any of all that? 493 00:21:07,486 --> 00:21:09,186 That was a lot of boring stuff, I felt. 494 00:21:10,216 --> 00:21:11,786 Alright, no more boring stuff. 495 00:21:12,036 --> 00:21:12,996 Alright, so Mather House. 496 00:21:13,616 --> 00:21:17,306 So, this is where we left off right before Quiz 0, 497 00:21:17,496 --> 00:21:20,556 and it was teasing you with additional data structures. 498 00:21:20,556 --> 00:21:22,516 Thus far our toolkit's been pretty sparse. 499 00:21:22,716 --> 00:21:25,706 We introduced variables way early on in week 0 with Scratch 500 00:21:25,706 --> 00:21:26,986 so you could actually store some data. 501 00:21:27,236 --> 00:21:29,376 We introduced lists in the context of Scratch. 502 00:21:29,376 --> 00:21:30,886 You can add like, an inventory 503 00:21:31,256 --> 00:21:33,966 for the fruit-playing RPG, that kind of context. 504 00:21:33,966 --> 00:21:36,706 And then in C we had arrays, which were similar to lists, 505 00:21:36,826 --> 00:21:39,056 but we had some frustrations with arrays 506 00:21:39,056 --> 00:21:41,976 in C already whereby you have to know their size in advance, 507 00:21:42,246 --> 00:21:44,566 you can very easily segfault if you go too far, 508 00:21:44,566 --> 00:21:45,796 so there's some downsides there. 509 00:21:46,026 --> 00:21:47,676 We began to address some of those downsides 510 00:21:47,676 --> 00:21:48,716 with that data structure 511 00:21:48,716 --> 00:21:51,586 by introducing linked lists two weeks ago. 512 00:21:51,866 --> 00:21:53,266 And what was one of the selling points 513 00:21:53,266 --> 00:21:54,826 of a linked list versus an array? 514 00:21:55,646 --> 00:21:58,186 So, you could insert at any point. 515 00:21:58,246 --> 00:22:00,826 Rather than in an array, whereby if you want to put something 516 00:22:00,826 --> 00:22:02,936 in the middle of all these people, we would have to say, 517 00:22:02,936 --> 00:22:04,616 you know, hey, please move all the way down, 518 00:22:04,616 --> 00:22:07,366 and then every element that has to be moved, which is expensive. 519 00:22:07,366 --> 00:22:08,806 That's a linear time operation. 520 00:22:09,106 --> 00:22:11,856 With a linked list you find your location and then you just say, 521 00:22:11,856 --> 00:22:14,036 you know, excuse me, just move one person over 522 00:22:14,226 --> 00:22:17,406 and then update your pointers which only took two 523 00:22:17,486 --> 00:22:19,076 or three operations total. 524 00:22:19,076 --> 00:22:20,956 You, of course, still have to find that location, 525 00:22:20,956 --> 00:22:22,826 but at least you still have this dynamism. 526 00:22:22,826 --> 00:22:24,036 Where over in the linked list, 527 00:22:24,276 --> 00:22:26,196 you don't necessarily have a bound 528 00:22:26,196 --> 00:22:27,636 on how many elements you can fit. 529 00:22:27,636 --> 00:22:30,916 You're only limited by the amount of RAM in your computer, 530 00:22:30,916 --> 00:22:33,486 the amount of RAM that malloc is willing to return to you. 531 00:22:33,646 --> 00:22:35,066 But what's a downside, someone else, 532 00:22:35,066 --> 00:22:38,146 of using a linked list versus an array? 533 00:22:39,516 --> 00:22:43,206 There's never a 100 percent improvement. 534 00:22:44,156 --> 00:22:44,916 What's a downside? 535 00:22:44,916 --> 00:22:46,416 [ Inaudible audience response ] 536 00:22:46,416 --> 00:22:46,716 >> I'm sorry? 537 00:22:46,716 --> 00:22:47,796 [ Inaudible audience response ] 538 00:22:47,796 --> 00:22:49,116 >> So, you lose random access. 539 00:22:49,116 --> 00:22:52,906 Whereas, an array, you knew that everything was equidistant 540 00:22:52,906 --> 00:22:53,806 and so you could use this 541 00:22:53,806 --> 00:22:53,873 [ ] 542 00:22:53,873 --> 00:22:56,086 notation and just jump magically, 543 00:22:56,086 --> 00:22:57,876 immediately to any element in the array. 544 00:22:58,126 --> 00:23:01,216 With a linked list, you can't just jump to the middle element 545 00:23:01,416 --> 00:23:03,426 because you don't know where that middle element is. 546 00:23:03,426 --> 00:23:05,256 In fact, with a linked list, you only know 547 00:23:05,256 --> 00:23:06,676 where the first element is. 548 00:23:06,726 --> 00:23:10,386 Thankfully he or she knows where the next element is, 549 00:23:10,636 --> 00:23:12,756 so everything is chained together nicely, 550 00:23:12,986 --> 00:23:14,916 but you do in fact, lose that random access. 551 00:23:15,166 --> 00:23:15,906 Now, who cares? 552 00:23:15,906 --> 00:23:19,816 What's the downside of losing this random access? 553 00:23:19,816 --> 00:23:21,416 [ Inaudible audience response ] 554 00:23:21,416 --> 00:23:22,956 >> Linear search. 555 00:23:22,956 --> 00:23:25,156 Alright, so the searching process devolves 556 00:23:25,156 --> 00:23:26,416 into something linear again, right. 557 00:23:26,416 --> 00:23:27,766 You can't leverage binary search, 558 00:23:27,766 --> 00:23:29,936 you can't leverage divide and conquer, you can't leverage some 559 00:23:29,936 --> 00:23:32,686 of these really useful, simple ideas 560 00:23:32,686 --> 00:23:34,576 that we've been taking advantage of sometimes. 561 00:23:34,576 --> 00:23:36,256 So, again, there's a tradeoff here. 562 00:23:36,256 --> 00:23:39,036 So the data structure we introduced next was this notion 563 00:23:39,036 --> 00:23:40,676 of a stack and this was actually just 564 00:23:40,676 --> 00:23:43,406 to formalize the data structure we've been using for sometime. 565 00:23:43,716 --> 00:23:45,976 So, the stack of trays conveys this idea 566 00:23:45,976 --> 00:23:49,896 that when you put a tray on top, the one that you are going 567 00:23:49,896 --> 00:23:52,966 to take off first, unless you want to make a huge scene, 568 00:23:52,966 --> 00:23:54,256 is going to be which one, obviously? 569 00:23:54,806 --> 00:23:55,856 So, the top, right? 570 00:23:55,856 --> 00:24:01,246 So, a stack is a data structure where it is last in first out, 571 00:24:01,246 --> 00:24:04,036 or a computer scientists would typically write LIFO, 572 00:24:04,036 --> 00:24:05,406 last in first out. 573 00:24:05,676 --> 00:24:07,586 Now, this might seem a weird structure. 574 00:24:07,586 --> 00:24:09,776 For instance, it would be kind of unfair 575 00:24:09,776 --> 00:24:13,136 if an amusement park implemented a stack for the line, right? 576 00:24:13,136 --> 00:24:15,626 That would mean that the last person to get on line 577 00:24:15,626 --> 00:24:17,816 for some roller coaster is going to be the first one to get on. 578 00:24:18,006 --> 00:24:19,296 Might be great to be that person 579 00:24:19,506 --> 00:24:21,206 but it's not necessarily quite fair. 580 00:24:21,436 --> 00:24:23,256 Now, we have certainly used the stack 581 00:24:23,256 --> 00:24:24,736 in the context of computing, right? 582 00:24:24,736 --> 00:24:28,186 The whole layout of memory leverages this idea of a stack, 583 00:24:28,186 --> 00:24:30,306 which is useful because it does make sense 584 00:24:30,306 --> 00:24:32,286 that when a function gets called, and another one 585 00:24:32,286 --> 00:24:34,976 and another one, you don't want to jump back down to this guy. 586 00:24:35,226 --> 00:24:37,066 You want to slowly unravel this stack 587 00:24:37,276 --> 00:24:39,166 and pop frames off one at a time. 588 00:24:39,516 --> 00:24:42,376 But sometimes a queue makes more sense. 589 00:24:42,376 --> 00:24:44,586 So a queue is another data structure 590 00:24:44,586 --> 00:24:45,656 in computer science terms. 591 00:24:45,656 --> 00:24:47,526 We've now got linked lists, we've got arrays, 592 00:24:47,526 --> 00:24:50,166 but arrays are avery limited data structure. 593 00:24:50,416 --> 00:24:51,686 We have a notion of a stack, 594 00:24:52,056 --> 00:24:53,326 which we can certainly implement, 595 00:24:53,326 --> 00:24:55,036 and we have the notion of a queue, 596 00:24:55,346 --> 00:24:57,186 which we can certainly implement as well. 597 00:24:57,406 --> 00:25:00,426 It turns out, when we start talking about more interesting, 598 00:25:00,426 --> 00:25:02,066 more sophisticated data structures, 599 00:25:02,386 --> 00:25:05,386 many of them can be implemented in code 600 00:25:05,776 --> 00:25:07,426 with previous data structures. 601 00:25:07,426 --> 00:25:10,566 For instance, if you want to implement the notion of a queue, 602 00:25:10,836 --> 00:25:13,616 a line, the idea being for a queue 603 00:25:13,856 --> 00:25:17,846 that you implement a FIFO structure (first in first out). 604 00:25:17,846 --> 00:25:19,656 So, the first person on line at 5:00 a.m. 605 00:25:19,656 --> 00:25:21,676 in the morning is the first person to get out of line 606 00:25:21,676 --> 00:25:23,746 and be able to buy the iPhone, in this case. 607 00:25:24,116 --> 00:25:27,906 So, with what data structure could you implement a queue? 608 00:25:27,906 --> 00:25:29,836 [ Inaudible audience response ] 609 00:25:29,836 --> 00:25:34,576 >> In other words, if you want to implement a new struct in C 610 00:25:34,926 --> 00:25:37,706 and you want to call this thing a queue and it's going 611 00:25:37,706 --> 00:25:41,686 to have the properties of FIFO (first in first out), well, 612 00:25:41,686 --> 00:25:43,556 how do you represent the individual people? 613 00:25:43,586 --> 00:25:44,726 Or, let's just number them, 614 00:25:44,726 --> 00:25:47,066 how do you store integers in a queue? 615 00:25:47,066 --> 00:25:49,096 What data structure could you actually use in C? 616 00:25:50,156 --> 00:25:51,436 So, we could just use an array. 617 00:25:51,636 --> 00:25:54,366 Right? In theory, Apple, for fire code reasons 618 00:25:54,366 --> 00:25:56,276 or management reasons, they might just say, you know what, 619 00:25:56,276 --> 00:25:58,766 we're only going to let a thousand people line up outside 620 00:25:58,766 --> 00:26:00,046 of this door or a hundred people. 621 00:26:00,266 --> 00:26:02,596 So, in a sense it might be reasonable to bound a number 622 00:26:02,596 --> 00:26:04,126 of slots in your data structure. 623 00:26:04,376 --> 00:26:08,376 So a queue could just really underneath the hood be an array. 624 00:26:08,586 --> 00:26:10,226 And that array can be a fixed size. 625 00:26:10,516 --> 00:26:12,506 But what do you then have to be careful of? 626 00:26:12,536 --> 00:26:15,686 Well, I can implement a queue just sketching this out partly 627 00:26:15,686 --> 00:26:17,386 in C and partly in pseudocode here. 628 00:26:17,636 --> 00:26:19,356 I might call my struct a queue 629 00:26:19,686 --> 00:26:24,486 and my queue might actually have, let's say, int members, 630 00:26:24,486 --> 00:26:25,786 the people in this queue, and it's going 631 00:26:25,786 --> 00:26:26,926 to be bounded at a hundred. 632 00:26:27,206 --> 00:26:29,246 So, now I have a struct called a queue 633 00:26:29,426 --> 00:26:32,396 and I could then have a function like, pop. 634 00:26:33,036 --> 00:26:39,476 So, let me do int pop (queue *q). 635 00:26:39,646 --> 00:26:46,486 And the purpose of this function in C is going to be returned 636 00:26:46,486 --> 00:26:48,256 to me the next person 637 00:26:48,496 --> 00:26:50,156 that should be removed from the queue. 638 00:26:50,386 --> 00:26:51,846 So, in other words, a data structure, 639 00:26:51,846 --> 00:26:53,926 more generally is the, rather... 640 00:26:54,616 --> 00:26:59,036 A data structure generally has not only data associated 641 00:26:59,036 --> 00:27:02,036 with it inside, it also typically has some standard 642 00:27:02,036 --> 00:27:03,826 operations associated with it. 643 00:27:03,826 --> 00:27:05,186 And even though we didn't highlight this 644 00:27:05,186 --> 00:27:07,746 with linked lists, a linked list's operations are the ones 645 00:27:07,746 --> 00:27:08,346 we played with. 646 00:27:08,346 --> 00:27:11,276 Delete is an operation on a linked list data structure. 647 00:27:11,316 --> 00:27:15,256 Insert, search, sort might be another one. 648 00:27:15,256 --> 00:27:16,886 So anything we've been calling a function 649 00:27:16,886 --> 00:27:18,606 or an algorithm you can think now 650 00:27:18,606 --> 00:27:21,076 as applying very specifically to a data structure. 651 00:27:21,076 --> 00:27:23,476 So, a common one for a queue is pop, 652 00:27:23,796 --> 00:27:25,986 or you can call it different things, but the purpose here is 653 00:27:25,986 --> 00:27:28,366 that I call this function, I pass in a queue, 654 00:27:28,566 --> 00:27:31,556 I want it to return to me the number of the person 655 00:27:31,556 --> 00:27:33,106 who should be removed from that queue next. 656 00:27:33,566 --> 00:27:35,366 So, now let's just brainstorm for a moment. 657 00:27:35,366 --> 00:27:39,236 If I pass in a queue and I need to be told who the next person 658 00:27:39,236 --> 00:27:41,756 in line is that should be allowed into the store, 659 00:27:42,446 --> 00:27:44,466 how would I possibly implement that? 660 00:27:44,466 --> 00:27:47,896 Do I need additional data inside of this data structure in order 661 00:27:47,896 --> 00:27:48,956 to implement that idea? 662 00:27:49,456 --> 00:27:52,756 [ Inaudible audience response ] 663 00:27:53,256 --> 00:27:54,406 >> What is the rule? 664 00:27:54,406 --> 00:27:56,646 Alright, so the rule is again FIFO. 665 00:27:56,646 --> 00:27:59,756 So for a data structure like a queue, suppose we filled it 666 00:27:59,756 --> 00:28:02,706 up with a hundred people or a hundred integers, the rule is, 667 00:28:02,706 --> 00:28:05,476 when I call pop, I want the first person in line. 668 00:28:05,846 --> 00:28:07,896 Now, what does that mean in this context? 669 00:28:08,616 --> 00:28:11,346 What should pop return in the simplest case? 670 00:28:11,346 --> 00:28:12,236 [ Inaudible audience response ] 671 00:28:12,236 --> 00:28:14,256 >> The next element or array, or actually, 672 00:28:14,256 --> 00:28:16,026 if you want to return the first person, you know, 673 00:28:16,026 --> 00:28:18,606 why don't I just keep it simple, return [member]... 674 00:28:18,606 --> 00:28:20,966 whoops, return members[0]; right? 675 00:28:20,966 --> 00:28:23,406 So, it's just an array. 676 00:28:23,406 --> 00:28:26,126 So if you want to return the first person in line, 677 00:28:26,126 --> 00:28:27,356 just return members[0]. 678 00:28:27,356 --> 00:28:29,656 But what's the problem now? 679 00:28:30,256 --> 00:28:32,376 If again, the point here is to implement a queue, 680 00:28:33,526 --> 00:28:36,086 if the Apple employee executes the same function again, 681 00:28:36,086 --> 00:28:37,506 who's going to be allowed into the store next? 682 00:28:38,866 --> 00:28:41,426 The same guy, the same person, same person, same person. 683 00:28:41,426 --> 00:28:42,626 So, this is clearly flawed. 684 00:28:42,626 --> 00:28:45,036 So the next time around what do I return? 685 00:28:45,036 --> 00:28:47,656 Well, pop needs to know something more here. 686 00:28:47,906 --> 00:28:48,516 Members... 687 00:28:48,516 --> 00:28:50,386 Oh, actually, this is the wrong syntax. 688 00:28:50,386 --> 00:28:51,526 Remember, to get at something? 689 00:28:51,826 --> 00:28:55,146 It's q->members[0]; Take the struct called queue, go there 690 00:28:55,396 --> 00:28:57,596 and return the member's zeroth element. 691 00:28:57,866 --> 00:28:58,886 What do I really want to do? 692 00:28:59,226 --> 00:29:00,616 Well, more generally, I don't want 693 00:29:00,616 --> 00:29:02,856 to return always the first person in line. 694 00:29:03,096 --> 00:29:06,126 I probably want to return the person at the head of the line. 695 00:29:06,486 --> 00:29:08,026 But, ahead might be a variable. 696 00:29:08,086 --> 00:29:09,406 Where should I store this variable? 697 00:29:09,686 --> 00:29:12,786 How could I possibly keep track of where someone is? 698 00:29:12,786 --> 00:29:13,976 [ Inaudible audience response ] 699 00:29:13,976 --> 00:29:17,956 >> So, maybe local, but if I put a local variable inside pop, 700 00:29:17,956 --> 00:29:19,906 as soon pop returns I will have forgotten 701 00:29:19,906 --> 00:29:21,046 who I just removed from the list. 702 00:29:21,046 --> 00:29:23,336 >> Inside the struct. 703 00:29:23,486 --> 00:29:24,666 >> So, inside the struct, right? 704 00:29:24,666 --> 00:29:26,196 So, don't forget, a couple of weeks ago 705 00:29:26,196 --> 00:29:27,696 when we introduced a student struct, 706 00:29:27,886 --> 00:29:29,966 whenever we had data we wanted to keep together, 707 00:29:29,966 --> 00:29:31,926 like an ID number, a name, and a house, 708 00:29:32,146 --> 00:29:34,976 we just lumped it all together inside of the same struct. 709 00:29:35,246 --> 00:29:38,966 So, you know, I could just do this: int head; for my queue 710 00:29:39,186 --> 00:29:42,466 and the head variable is what ultimately keeps track 711 00:29:42,466 --> 00:29:43,946 of who's next in line. 712 00:29:44,206 --> 00:29:48,926 So now I might say go ahead and get the person at the head 713 00:29:48,926 --> 00:29:52,346 of the line by accessing the same queue structure, 714 00:29:52,616 --> 00:29:54,476 its head field and return it. 715 00:29:54,516 --> 00:29:55,646 But I'm not quite there yet. 716 00:29:56,266 --> 00:29:58,556 What's the problem now? 717 00:29:58,726 --> 00:30:00,056 Who's going to get pulled off the head 718 00:30:00,056 --> 00:30:02,226 of the line the next time I call pop? 719 00:30:02,226 --> 00:30:04,216 [ Inaudible audience response ] 720 00:30:04,216 --> 00:30:06,476 >> So, I do need to iterate, but who specifically is going 721 00:30:06,476 --> 00:30:07,526 to get popped off the front 722 00:30:07,526 --> 00:30:08,856 of the line the next time this is called? 723 00:30:08,856 --> 00:30:10,226 [ Inaudible audience response ] 724 00:30:10,226 --> 00:30:11,396 >> It's the same person right? 725 00:30:11,396 --> 00:30:14,026 It's not sufficient to generalize who gets popped off 726 00:30:14,236 --> 00:30:16,706 if I'm not actually going to maintain this data structure. 727 00:30:16,956 --> 00:30:18,796 So really I want to do something like this: 728 00:30:18,796 --> 00:30:21,416 int next gets the person who's currently 729 00:30:21,416 --> 00:30:22,406 at the head of the line. 730 00:30:22,656 --> 00:30:25,136 Agreed? Alright, and then ultimately I want 731 00:30:25,136 --> 00:30:27,176 to return this next person. 732 00:30:27,176 --> 00:30:28,536 But what do I need to do first? 733 00:30:28,536 --> 00:30:28,616 [ Inaudible audience response ] 734 00:30:28,616 --> 00:30:31,656 >> Right, I need to plus, plus this. 735 00:30:31,656 --> 00:30:34,446 So really I want to say the person at the head 736 00:30:34,446 --> 00:30:37,356 of the line should now be the person 737 00:30:37,356 --> 00:30:39,246 at the head of the line plus one. 738 00:30:39,636 --> 00:30:42,056 So, now I've taken in this data structure, 739 00:30:42,206 --> 00:30:43,966 I've grabbed the person who supposedly 740 00:30:43,966 --> 00:30:44,816 at the front of the line. 741 00:30:45,116 --> 00:30:47,696 I then say, okay, the next time let me plan for this, 742 00:30:47,696 --> 00:30:50,566 the next time go ahead and call the next person, 743 00:30:50,566 --> 00:30:52,706 which is equivalent to plus, plussing this. 744 00:30:52,766 --> 00:30:54,166 So, I could actually, simplify this. 745 00:30:54,216 --> 00:30:56,136 This is in fact identical to just saying this. 746 00:30:56,776 --> 00:30:59,966 And then I return not the current value of head, 747 00:30:59,966 --> 00:31:01,936 because then I'd be skipping the first person in line, 748 00:31:02,146 --> 00:31:04,116 I then return the person that I called next. 749 00:31:04,456 --> 00:31:05,526 Now, there's one bug here. 750 00:31:06,076 --> 00:31:07,726 What's going to happen eventually 751 00:31:08,846 --> 00:31:10,266 with this implementation of pop? 752 00:31:10,266 --> 00:31:10,333 [ Inaudible audience response ] 753 00:31:10,333 --> 00:31:11,866 >>At the end of the line I'm going to segfault, 754 00:31:11,866 --> 00:31:15,266 or I'm at least going to start touching memory that's not 755 00:31:15,266 --> 00:31:15,896 actually there. 756 00:31:15,896 --> 00:31:18,166 So this is as though the Apple employee is going 757 00:31:18,166 --> 00:31:22,916 to like start talking to no one at the end of the line trying 758 00:31:22,916 --> 00:31:25,256 to pull more and more people that just aren't there. 759 00:31:25,256 --> 00:31:28,366 So, we need a fix here and what operator probably saves us? 760 00:31:28,366 --> 00:31:29,846 [ Inaudible audience response ] 761 00:31:29,846 --> 00:31:32,246 >> What's that? 762 00:31:32,246 --> 00:31:32,436 [ Inaudible audience response ] 763 00:31:32,436 --> 00:31:34,656 >> Not or, but what do we do generally 764 00:31:34,656 --> 00:31:35,736 when we want to wrap around? 765 00:31:37,346 --> 00:31:44,036 Right, so we could do something this: head = q ->head + 1. 766 00:31:44,036 --> 00:31:46,536 And then we can make sure that we always stay 767 00:31:46,536 --> 00:31:49,466 within the boundaries of this array with something like this. 768 00:31:49,796 --> 00:31:51,676 But again, I'm kind of skipping a step here. 769 00:31:51,676 --> 00:31:54,376 Now, if I implement this as follows, this assumes 770 00:31:54,376 --> 00:31:56,576 that there's going to be a continual flow of people. 771 00:31:56,576 --> 00:31:59,386 I presumably now need, if I'm going to allow this 772 00:31:59,386 --> 00:32:02,046 to now back wrap around to the other side, what do I also need 773 00:32:02,046 --> 00:32:03,256 to keep track of apparently? 774 00:32:03,256 --> 00:32:03,816 [ Inaudible audience response ] 775 00:32:03,816 --> 00:32:07,936 >> What's that? 776 00:32:07,936 --> 00:32:08,306 [ Inaudible audience response ] 777 00:32:08,306 --> 00:32:09,496 >> So the end of the line, right? 778 00:32:09,496 --> 00:32:13,076 I need to keep track of how many of the one hundred integers 779 00:32:13,076 --> 00:32:16,716 that I reserved space for are actually legitimate candidates 780 00:32:16,716 --> 00:32:18,566 to be removed from this data structure. 781 00:32:18,776 --> 00:32:22,106 So, I'm probably going to need another integer in here, 782 00:32:22,396 --> 00:32:25,926 something like int size; or something along those lines. 783 00:32:26,116 --> 00:32:27,816 But here's where things start to get interesting. 784 00:32:27,816 --> 00:32:30,116 When you're designing a data structure to solve a problem, 785 00:32:30,336 --> 00:32:32,296 you typically wrap up the data you care 786 00:32:32,296 --> 00:32:33,486 about inside of a structure. 787 00:32:33,716 --> 00:32:35,456 You implement one or more functions 788 00:32:35,456 --> 00:32:38,136 with which you then start to manipulate that data, 789 00:32:38,386 --> 00:32:41,366 so that if you have a goal in mind like, let me implement 790 00:32:41,366 --> 00:32:44,826 in code this notion of a queue so that a computer program, 791 00:32:44,826 --> 00:32:48,526 so one of these employees on a little iPhone can be told 792 00:32:48,606 --> 00:32:50,726 who should be popped off the list first. 793 00:32:51,006 --> 00:32:53,336 Well, you could implement this using these kinds 794 00:32:53,336 --> 00:32:54,046 of data members. 795 00:32:54,046 --> 00:32:55,446 We'll look at this with more detail 796 00:32:55,446 --> 00:32:58,316 with problem set 6 next week, but this is just a taste 797 00:32:58,316 --> 00:32:59,706 of the thought process that goes 798 00:32:59,706 --> 00:33:02,696 into actually designing interesting data structures; 799 00:33:02,946 --> 00:33:05,536 not these most simplistic of ones, arrays and linked lists, 800 00:33:05,536 --> 00:33:07,016 that we've looked at thus far. 801 00:33:07,326 --> 00:33:08,926 But all of this generally reduces 802 00:33:08,926 --> 00:33:12,356 to using this new feature, this power called pointers 803 00:33:12,356 --> 00:33:15,356 and memory addresses in C. Unfortunately, you can do a lot 804 00:33:15,356 --> 00:33:18,656 of damage using pointers in C, both in terms of security, 805 00:33:18,656 --> 00:33:20,216 as we've discussed with buffer overflows, 806 00:33:20,436 --> 00:33:21,716 and just in terms of correctness. 807 00:33:21,716 --> 00:33:25,236 Probably almost everyone in this room has had a segfault 808 00:33:25,236 --> 00:33:28,026 at some point of some sort because you've touched memory 809 00:33:28,026 --> 00:33:29,226 that doesn't belong to you. 810 00:33:29,226 --> 00:33:31,296 And sometimes these problems can be hard to spot. 811 00:33:31,556 --> 00:33:33,376 You might have not looked at your code before submitting it 812 00:33:33,376 --> 00:33:35,826 for some pset, felt pretty good that, got it. 813 00:33:35,936 --> 00:33:37,416 There are no bugs in this code. 814 00:33:37,726 --> 00:33:39,866 And then your TF finds something, nonetheless. 815 00:33:40,366 --> 00:33:45,056 Well, this is not necessarily the easiest thing to debug 816 00:33:45,056 --> 00:33:47,466 on your own, and so thankfully some tools exist. 817 00:33:47,706 --> 00:33:51,946 So Valgrind is a program we'll start using for problem set... 818 00:33:52,436 --> 00:33:54,106 you can start using for problem set 5. 819 00:33:54,106 --> 00:33:56,616 It's particularly germane, as you'll see, to problem set 6 820 00:33:56,906 --> 00:33:59,736 that essentially analyzes your code for you. 821 00:33:59,736 --> 00:34:01,316 And if it can, tells you 822 00:34:01,316 --> 00:34:04,096 where you've made memory-related mistakes 823 00:34:04,096 --> 00:34:05,716 so that you can proactively fix them 824 00:34:05,716 --> 00:34:07,146 so that your code isn't broken, 825 00:34:07,406 --> 00:34:09,036 doesn't get downgraded certainly, 826 00:34:09,036 --> 00:34:10,866 and ultimately is better written. 827 00:34:11,116 --> 00:34:12,556 So, let's take a look at an example. 828 00:34:12,556 --> 00:34:14,066 This is among your printouts for today. 829 00:34:14,066 --> 00:34:16,326 It's a simple program called memory.c. 830 00:34:16,836 --> 00:34:20,096 But I wrote this intentionally to have a couple of flaws. 831 00:34:20,096 --> 00:34:21,326 It doesn't do all that much. 832 00:34:21,386 --> 00:34:22,736 At the bottom of this file, 833 00:34:22,736 --> 00:34:24,266 notice I have what's called main. 834 00:34:24,496 --> 00:34:26,846 It's sole purpose in life is to call the function f(); 835 00:34:27,466 --> 00:34:29,576 and then it returns 0 just by default. 836 00:34:30,126 --> 00:34:31,906 And then what does f(); do? 837 00:34:31,906 --> 00:34:32,896 It returns nothing. 838 00:34:32,926 --> 00:34:35,846 It takes in no arguments as indicated by void here. 839 00:34:36,096 --> 00:34:39,436 But it does do a couple of things, albeit uselessly. 840 00:34:39,796 --> 00:34:41,596 It first declares int *x. 841 00:34:41,596 --> 00:34:45,346 It then mallocs how many bytes? 842 00:34:47,836 --> 00:34:49,256 So 40, right? 843 00:34:49,256 --> 00:34:51,006 If we assume that an int is 4 bytes, 844 00:34:51,116 --> 00:34:53,586 it's going to allocate 40 bytes using malloc, 845 00:34:53,586 --> 00:34:56,986 storing the address in x. And then it appears to, 846 00:34:57,156 --> 00:35:00,116 at the tenth location in this chunk of memory, 847 00:35:00,396 --> 00:35:03,726 so at the location of the 11th integer, zero indexed, 848 00:35:03,726 --> 00:35:06,416 it's going to arbitrarily put the value 0. 849 00:35:06,666 --> 00:35:09,576 But this is problematic because how many ints did I 850 00:35:09,576 --> 00:35:10,476 actually allocate? 851 00:35:11,876 --> 00:35:12,706 So only 10. 852 00:35:12,856 --> 00:35:14,426 If I'm indexing into this array 853 00:35:14,426 --> 00:35:16,746 at [10] I've gone one step too far. 854 00:35:16,746 --> 00:35:18,256 Right? This is a buffer overrun. 855 00:35:18,256 --> 00:35:20,136 It's not really an exploit because you're not going 856 00:35:20,136 --> 00:35:22,366 to compromise a machine really, with just the number zero. 857 00:35:22,626 --> 00:35:24,716 But you are in fact overstepping the bound. 858 00:35:24,716 --> 00:35:26,846 So you might be able to catch this just by looking 859 00:35:26,846 --> 00:35:28,506 through the code, but not necessarily. 860 00:35:28,716 --> 00:35:31,106 So let's see what this program can actually do for us. 861 00:35:31,226 --> 00:35:34,346 So, I'm going to go ahead and run this command. 862 00:35:34,346 --> 00:35:35,166 It's valgrind. 863 00:35:35,296 --> 00:35:36,476 That's the command. 864 00:35:36,756 --> 00:35:38,946 -v is a switch that says be verbose. 865 00:35:38,946 --> 00:35:40,336 Tell me as much as you can. 866 00:35:40,626 --> 00:35:44,516 And then -- leak - check=full says, 867 00:35:44,516 --> 00:35:46,286 do all of the memory leak tests 868 00:35:46,636 --> 00:35:48,246 that the program is designed to support. 869 00:35:48,246 --> 00:35:48,906 How do I know this? 870 00:35:49,016 --> 00:35:51,736 I read the man page, or frankly, I ran just valgrind, 871 00:35:51,736 --> 00:35:53,086 and it yelled at me and it told me 872 00:35:53,086 --> 00:35:54,596 to run this other command instead. 873 00:35:54,716 --> 00:35:55,666 So that's how you can remember. 874 00:35:56,006 --> 00:35:57,556 So let me go ahead and paste this in. 875 00:35:57,896 --> 00:35:59,766 First let me make the program memory. 876 00:36:00,206 --> 00:36:01,156 Seems to compile. 877 00:36:01,156 --> 00:36:03,746 If I run it, seems to work just fine. 878 00:36:03,746 --> 00:36:05,786 If I run gdb on it, and then run it, 879 00:36:06,296 --> 00:36:08,156 seemingly a program exits normally, 880 00:36:08,156 --> 00:36:09,606 so it looks like it's okay. 881 00:36:09,986 --> 00:36:11,606 But let's see if valgrind knows better. 882 00:36:11,606 --> 00:36:15,196 So, let me run this command on the program called memory 883 00:36:15,586 --> 00:36:16,876 in my current directory. 884 00:36:16,906 --> 00:36:20,246 Enter. Wow, so a whole bunch of text flew by. 885 00:36:20,246 --> 00:36:22,346 That's what happens when you tell a program to be verbose. 886 00:36:22,346 --> 00:36:24,466 A lot of this is not all that interesting. 887 00:36:24,466 --> 00:36:26,346 It's a bit more arcane than we'd care about. 888 00:36:26,636 --> 00:36:29,386 But you can skim through it even if this is the very first time. 889 00:36:29,386 --> 00:36:32,296 And you can kind of look for the proverbial red flag. 890 00:36:32,376 --> 00:36:33,796 So, let me scroll down to... 891 00:36:33,796 --> 00:36:37,766 Oh, this is interesting: Invalid write of size 4. 892 00:36:37,766 --> 00:36:38,926 And I say it's interesting 893 00:36:39,156 --> 00:36:41,576 because it's not cryptically referring to all these files 894 00:36:41,576 --> 00:36:42,436 that I've never heard of. 895 00:36:42,596 --> 00:36:43,856 It's referring actually 896 00:36:43,856 --> 00:36:47,156 to something I wrote: memory.c line 23. 897 00:36:47,156 --> 00:36:50,596 Then let me scroll down a little further. 898 00:36:50,816 --> 00:36:52,336 Oh, and this is bad sounding: 899 00:36:52,446 --> 00:36:55,376 40 bytes in 1 blocks are definitely lost 900 00:36:55,376 --> 00:36:56,726 in loss record 1 of 1. 901 00:36:56,726 --> 00:36:58,466 Alright, so it couldn't be more verbose. 902 00:36:58,866 --> 00:37:01,776 But the point is, again it's my code, right? 903 00:37:01,776 --> 00:37:03,586 So it's finding fault with my code, 904 00:37:03,586 --> 00:37:04,776 so let me look a little closer. 905 00:37:04,776 --> 00:37:06,116 Let's do the first one first. 906 00:37:06,116 --> 00:37:09,486 So, Invalid write of size 4 apparently induced 907 00:37:09,486 --> 00:37:11,986 at this address, but that's not useful to me, the human, 908 00:37:12,176 --> 00:37:16,286 at this line number, 23, in memory.c. So, let me go ahead 909 00:37:16,286 --> 00:37:19,046 and open memory.c line 23. 910 00:37:19,116 --> 00:37:25,936 Aha! Line 23 is this thing here where I say x[10] = 0; 911 00:37:25,936 --> 00:37:28,556 and the error message being indicated there is again, this: 912 00:37:28,656 --> 00:37:30,456 Invalid write of size 4. 913 00:37:30,456 --> 00:37:32,146 So, what does it mean by saying size 4? 914 00:37:32,726 --> 00:37:34,546 So, it's an int, right? 915 00:37:34,546 --> 00:37:36,686 I'm storing an int, namely the int 0. 916 00:37:36,686 --> 00:37:38,066 It happens to be 4 bytes, 917 00:37:38,366 --> 00:37:41,176 and so that's why it's saying Invalid write. 918 00:37:41,216 --> 00:37:43,136 You know, it's not completely holding my hand 919 00:37:43,136 --> 00:37:44,666 and telling me how to fix this problem. 920 00:37:44,896 --> 00:37:46,586 But now I see, oh, line 23, 921 00:37:46,586 --> 00:37:47,616 there's definitely a problem there. 922 00:37:47,886 --> 00:37:50,236 Bam! I've now detected my buffer overrun. 923 00:37:50,396 --> 00:37:53,076 Now, valgrind can't necessarily do this in all contexts, 924 00:37:53,076 --> 00:37:55,716 especially if your program is more dynamic, takes user input 925 00:37:55,716 --> 00:37:57,526 with GetInt() or GetString(), or the command line. 926 00:37:57,826 --> 00:37:59,736 But at least in this case we catch that error. 927 00:37:59,986 --> 00:38:01,426 What else did it yell at me for? 928 00:38:01,426 --> 00:38:05,186 Let me look at the other error, which was this one. 929 00:38:05,186 --> 00:38:06,396 Let me rerun the command. 930 00:38:07,126 --> 00:38:10,396 It was toward the bottom: Invalid write of size 4. 931 00:38:10,396 --> 00:38:10,866 We saw this... 932 00:38:10,866 --> 00:38:14,236 Ooh: definitely lost: 40 bytes in 1 blocks. 933 00:38:14,266 --> 00:38:15,066 That was not good. 934 00:38:15,306 --> 00:38:18,786 And here it is: (memory.c:22). 935 00:38:19,146 --> 00:38:21,956 So, let me go into memory.c, line 22. 936 00:38:21,956 --> 00:38:22,726 It's the one above it. 937 00:38:23,106 --> 00:38:25,166 Oh, so I'm losing 40 bytes. 938 00:38:25,256 --> 00:38:26,046 Well, why is that? 939 00:38:26,046 --> 00:38:27,276 Well, apparently I'm not doing what? 940 00:38:27,276 --> 00:38:29,546 I'm not calling free. 941 00:38:29,546 --> 00:38:31,246 So, let's actually see if I can fix this. 942 00:38:31,246 --> 00:38:32,956 Again, the program is still pretty pointless, 943 00:38:32,956 --> 00:38:34,856 but let me at least knock off one of these. 944 00:38:34,856 --> 00:38:36,156 Any time you call malloc, 945 00:38:36,156 --> 00:38:38,296 henceforth you should absolutely be calling free. 946 00:38:38,606 --> 00:38:40,966 But you want to be calling free on the same pointer 947 00:38:41,116 --> 00:38:43,346 that malloc returned, which is x in this case. 948 00:38:43,346 --> 00:38:44,156 And we save that. 949 00:38:44,546 --> 00:38:45,916 Let me remake memory. 950 00:38:46,396 --> 00:38:49,396 Let me go ahead and now run valgrind again and see 951 00:38:49,396 --> 00:38:50,346 if it yells a little less. 952 00:38:51,026 --> 00:38:51,556 Okay, good. 953 00:38:51,616 --> 00:38:53,156 Now I only have one errors. 954 00:38:53,696 --> 00:38:58,906 And... So, they have such a fancy sophisticated program, 955 00:38:58,906 --> 00:39:01,366 and again, they didn't even parenthesize the s. This is 956 00:39:01,366 --> 00:39:03,156 really just an if condition they could have had. 957 00:39:03,156 --> 00:39:06,606 So even experienced programmers cut corners sometimes. 958 00:39:07,346 --> 00:39:08,556 Alright. Not that you should. 959 00:39:08,956 --> 00:39:10,456 Alright, so this problem still remains. 960 00:39:10,666 --> 00:39:12,676 So, in short, there's a whole lot of cryptic output, 961 00:39:12,676 --> 00:39:13,946 but thankfully with Valgrind, you don't have 962 00:39:13,946 --> 00:39:14,886 to care about most of it. 963 00:39:15,096 --> 00:39:16,726 Looking for any mentions of your code 964 00:39:16,726 --> 00:39:19,206 or your line numbers can absolutely start helping you 965 00:39:19,206 --> 00:39:23,116 chase down any kind of memory-related bugs. 966 00:39:23,356 --> 00:39:25,026 But we see this recurring theme here. 967 00:39:25,026 --> 00:39:28,096 So every time we look at memory, every time we look at pointers, 968 00:39:28,336 --> 00:39:29,626 we see things like this. 969 00:39:29,916 --> 00:39:31,076 And I had to use this slide 970 00:39:31,076 --> 00:39:32,406 because I thought it was the cutest thing ever. 971 00:39:32,826 --> 00:39:35,606 Alright, five of us think this is cute. 972 00:39:35,806 --> 00:39:38,256 So, this, of course, hexadecimal. 973 00:39:38,366 --> 00:39:40,136 So why do we keep using hexadecimal? 974 00:39:40,276 --> 00:39:42,536 So, we already have binary, where you've got eight bits... 975 00:39:42,536 --> 00:39:45,486 Or actually, you only have two digits: zero and one. 976 00:39:45,486 --> 00:39:47,166 We've got decimal, which we all grew up with, 977 00:39:47,166 --> 00:39:48,116 which is zero through nine. 978 00:39:48,116 --> 00:39:52,456 Hexadecimal has zero through nine and A through F, 979 00:39:52,456 --> 00:39:54,396 so this is 16 total characters. 980 00:39:54,396 --> 00:39:55,486 And why hexadecimal? 981 00:39:55,486 --> 00:39:59,056 So, someone just asked the other day after class, you know, 982 00:39:59,056 --> 00:40:00,956 why use 16 characters? 983 00:40:00,956 --> 00:40:02,536 Why not 12 and not 14? 984 00:40:02,666 --> 00:40:03,656 Well, we absolutely could. 985 00:40:03,716 --> 00:40:04,786 There is base 12. 986 00:40:04,786 --> 00:40:07,246 There is base 14, there's base 20, 987 00:40:07,246 --> 00:40:09,106 there's any base system you want to come up with. 988 00:40:09,396 --> 00:40:11,526 What's particularly nice about hexadecimal though, 989 00:40:11,796 --> 00:40:14,076 is that you're representing zero through nine, 990 00:40:14,136 --> 00:40:17,976 and a through f. That's 16 total digits. 991 00:40:18,296 --> 00:40:21,806 To represent 16 total values how many bits do you need? 992 00:40:23,076 --> 00:40:24,536 You only need four. 993 00:40:24,536 --> 00:40:28,736 So with four bits, 1, 2, 3, 4, I can represent, of course, 994 00:40:28,796 --> 00:40:30,716 the number zero pretty effectively, 995 00:40:30,716 --> 00:40:34,886 and I can represent the number 15, so this is a total 996 00:40:34,886 --> 00:40:37,126 of 16 possible values. 997 00:40:37,126 --> 00:40:40,226 So what's nice about hexadecimal is that because hex, 998 00:40:40,406 --> 00:40:43,546 implying 16, is a power of 2, what it means is 999 00:40:43,546 --> 00:40:47,636 that with one hexadecimal digit, you can represent four bits, 1000 00:40:47,636 --> 00:40:51,566 and thus with two hexadecimal digits you can represent eight 1001 00:40:51,626 --> 00:40:52,876 bits or one byte. 1002 00:40:53,006 --> 00:40:56,116 So, any time you see 0x, just saying here comes hexadecimal, 1003 00:40:56,446 --> 00:41:00,156 -01, or 02 or 03, it's just a nice human convention 1004 00:41:00,156 --> 00:41:04,336 because that zero and 1 are actually represented using 8 1005 00:41:04,336 --> 00:41:05,196 bits in total. 1006 00:41:05,196 --> 00:41:09,416 For instance, 0x01, if we write this out, let's go ahead 1007 00:41:09,416 --> 00:41:11,706 and do 0x01, well, 1008 00:41:11,706 --> 00:41:13,976 if you actually implement this is binary, well, 1009 00:41:13,976 --> 00:41:17,256 this first zero actually represents 4 bits. 1010 00:41:17,516 --> 00:41:19,986 So, that's actually the bits 1, 2, 3, 4. 1011 00:41:20,246 --> 00:41:24,116 This 1 is actually four bits, so that's the number this? 1012 00:41:24,636 --> 00:41:28,986 No, it's just the number 1, so 1, 2, 3, 4. 1013 00:41:29,176 --> 00:41:33,226 This means the value we say 0x02 is just, zero, zero, zero, zero, 1014 00:41:33,626 --> 00:41:35,596 then zero, zero, zero, 2. 1015 00:41:35,596 --> 00:41:38,796 Or, no, right, so what's 2? 1016 00:41:38,796 --> 00:41:40,436 It should really be 1 0. 1017 00:41:40,436 --> 00:41:42,916 So it's still just binary, you know, from several weeks ago. 1018 00:41:42,916 --> 00:41:48,506 This would be 1, 2, 3, 4 for the zero, then 1, 2, 3, 4 for the 3. 1019 00:41:48,776 --> 00:41:50,956 And then finally, if we keep going up in this way, 1020 00:41:50,956 --> 00:41:55,276 let's go up to ff, this means four ones, 1021 00:41:55,346 --> 00:41:57,146 now we have the number 255. 1022 00:41:57,346 --> 00:42:00,456 So, hex is just a nice way of saying, expressing a number 1023 00:42:00,656 --> 00:42:02,266 in four-bit chunks at a time. 1024 00:42:02,556 --> 00:42:05,596 So, why is this actually useful? 1025 00:42:05,596 --> 00:42:08,276 It's because with this notation it makes it really easy 1026 00:42:08,276 --> 00:42:10,856 to start manipulating data at the bit level. 1027 00:42:10,856 --> 00:42:13,756 Thus far we've only focused on things like integers 1028 00:42:13,756 --> 00:42:16,166 and maybe chars, but we've never really cared 1029 00:42:16,166 --> 00:42:18,076 about the underlying bit representation. 1030 00:42:18,316 --> 00:42:21,106 But now for this pset, now for this part of the course, 1031 00:42:21,106 --> 00:42:22,886 where we're actually talking about files 1032 00:42:22,886 --> 00:42:24,326 and representing data persistently, 1033 00:42:24,576 --> 00:42:28,196 being able to twiddle, or tweak or flip, just change a zero 1034 00:42:28,196 --> 00:42:31,906 to a one or a one to a zero, can actually give us a great deal 1035 00:42:32,226 --> 00:42:33,836 of new power, which we'll take a look 1036 00:42:33,986 --> 00:42:38,486 at in just a five-minute break. 1037 00:42:39,516 --> 00:42:42,166 Okay, so we are back and I haven't exactly vetted this, 1038 00:42:42,196 --> 00:42:45,206 but given that the subject of steganography did come up here, 1039 00:42:45,206 --> 00:42:46,976 I thought I'd at least do a quick Google search. 1040 00:42:47,216 --> 00:42:48,446 And let's assume at the moment 1041 00:42:48,446 --> 00:42:49,926 that Wikipedia has gotten this right. 1042 00:42:50,226 --> 00:42:53,556 Here is an image of what appears to be a nice sky and a tree, 1043 00:42:53,826 --> 00:42:56,896 but it turns out that a subset of the bits 1044 00:42:56,896 --> 00:43:00,546 that compose this picture have been tweaked in such a way, 1045 00:43:00,726 --> 00:43:03,886 and by tweaked I mean zeros have become ones, 1046 00:43:03,886 --> 00:43:06,196 ones have become zeros, or something along those lines. 1047 00:43:06,456 --> 00:43:09,606 It turns out that if you undo those changes 1048 00:43:09,606 --> 00:43:11,096 that what you actually get back 1049 00:43:11,096 --> 00:43:14,126 out of this picture, is that picture. 1050 00:43:14,596 --> 00:43:17,376 Okay, cuteness was not the point. 1051 00:43:17,376 --> 00:43:18,816 [ Laughing ] 1052 00:43:18,816 --> 00:43:20,796 >> So, what's powerful here is that, again, 1053 00:43:20,796 --> 00:43:21,606 you can read the article. 1054 00:43:21,606 --> 00:43:23,616 This is just the Wikipedia steganography article. 1055 00:43:23,796 --> 00:43:26,666 What's powerful here is that you can take really an arbitrary 1056 00:43:26,666 --> 00:43:29,436 image, and if you were using enough bits to represent each 1057 00:43:29,436 --> 00:43:32,126 of the pixels, each of the dots, you could hide a lot 1058 00:43:32,126 --> 00:43:34,136 of additional information, whether it's text 1059 00:43:34,136 --> 00:43:36,956 or whether it's a cat, or anything else altogether. 1060 00:43:36,956 --> 00:43:39,026 So realize that this is a very powerful technique, 1061 00:43:39,026 --> 00:43:40,226 and we're just scratching the surface 1062 00:43:40,226 --> 00:43:42,196 of it with problem set 5. 1063 00:43:42,196 --> 00:43:43,816 So we promised to do some 1064 00:43:43,816 --> 00:43:46,126 of these lower-level bit manipulations 1065 00:43:46,126 --> 00:43:47,596 and we actually have this ability. 1066 00:43:47,866 --> 00:43:50,136 So, in the past you've used the ampersand 1067 00:43:50,136 --> 00:43:53,776 and you've used the vertical bar for the purposes of conditions 1068 00:43:53,776 --> 00:43:55,416 and boolean expressions. 1069 00:43:55,416 --> 00:44:00,546 && is a boolean AND, it says, make sure this and this is true. 1070 00:44:00,706 --> 00:44:02,346 And then we had the vertical bars, which was, 1071 00:44:02,346 --> 00:44:04,436 make sure this or this was true. 1072 00:44:04,736 --> 00:44:07,546 Now, it turns out if you actually just use one ampersand 1073 00:44:07,546 --> 00:44:10,526 or one vertical bar, they have very different meanings. 1074 00:44:10,526 --> 00:44:14,236 What it allows you to do is what's called a bitwise AND 1075 00:44:14,236 --> 00:44:17,376 or a bitwise OR, or there's some other operators all together. 1076 00:44:17,606 --> 00:44:21,286 What bitwise means is that this operator will let you take an 1077 00:44:21,376 --> 00:44:25,446 8-bit char or a 32-bit int or a 64-bit long long, 1078 00:44:25,536 --> 00:44:29,086 and it will allow you to change from 1 to 0, or 0 to 1, 1079 00:44:29,396 --> 00:44:33,016 any of the 8's, or the 32-bits, or the 64 bits 1080 00:44:33,316 --> 00:44:34,616 in that particular number. 1081 00:44:34,616 --> 00:44:36,806 So, it's a step toward implementing something 1082 00:44:36,806 --> 00:44:39,276 like the tree-to-cat transformation, 1083 00:44:39,276 --> 00:44:42,136 if you can actually manipulate bits at that level. 1084 00:44:42,136 --> 00:44:44,716 And in the context of pset 5 you won't necessarily do it 1085 00:44:44,716 --> 00:44:47,526 at the bit level, you'll do it at the byte level in terms 1086 00:44:47,526 --> 00:44:49,046 of the red,green, and blue values 1087 00:44:49,106 --> 00:44:50,146 in the pictures we give you. 1088 00:44:50,446 --> 00:44:54,396 So, let's actually see an example of this in action. 1089 00:44:54,396 --> 00:44:56,246 Let's try a simple one first. 1090 00:44:56,476 --> 00:44:59,466 So, in binary.c, which is among your printouts today, 1091 00:44:59,836 --> 00:45:00,966 we have this code here. 1092 00:45:00,966 --> 00:45:01,956 It's not all that long. 1093 00:45:02,216 --> 00:45:04,796 It's got a do-while loop whose purpose in life is just 1094 00:45:04,796 --> 00:45:08,226 to ask the user again and again for a non-negative integer 1095 00:45:08,326 --> 00:45:09,656 so that we have a number to play with. 1096 00:45:09,966 --> 00:45:11,846 And then down here there's a bit of fanciness. 1097 00:45:12,016 --> 00:45:14,796 So, first let's take a look at what the program actually does 1098 00:45:14,796 --> 00:45:16,616 and then we'll delve into how it's doing that. 1099 00:45:16,616 --> 00:45:18,176 Let me go ahead and make binary. 1100 00:45:18,176 --> 00:45:20,386 I'm going to go ahead now and run binary, 1101 00:45:21,006 --> 00:45:22,826 and I need a non-negative integer, please. 1102 00:45:22,826 --> 00:45:24,246 Let's start with the number 3. 1103 00:45:24,776 --> 00:45:28,226 Enter. And it looks like I've provided an integer, 1104 00:45:28,226 --> 00:45:29,286 it's the number 3. 1105 00:45:29,636 --> 00:45:32,376 But to represent that as binary you have what appeared 1106 00:45:32,376 --> 00:45:35,706 to be 30 zeros and then a one and a one 1107 00:45:35,706 --> 00:45:38,056 and this is consistent, presumably with what we know 1108 00:45:38,056 --> 00:45:39,866 to be the binary representation of three. 1109 00:45:40,176 --> 00:45:41,396 Let's do one other. 1110 00:45:41,396 --> 00:45:44,686 So let me go ahead and do something like a 255. 1111 00:45:44,686 --> 00:45:48,216 I should get a whole bunch more ones, and sure enough, I do. 1112 00:45:48,536 --> 00:45:50,446 And let's try something even bigger. 1113 00:45:50,446 --> 00:45:53,546 Let me go ahead and type in like, big number. 1114 00:45:53,896 --> 00:45:56,436 So that happens to map to this binary number. 1115 00:45:56,436 --> 00:45:59,446 So, assuming this is correct, how is it actually doing this? 1116 00:45:59,446 --> 00:46:00,356 Well, let's take a look. 1117 00:46:00,526 --> 00:46:02,486 So, most of this code, uninteresting. 1118 00:46:02,486 --> 00:46:04,506 It's just a do-while whose purpose in life is, again, 1119 00:46:04,546 --> 00:46:05,626 to get an int from the user. 1120 00:46:05,626 --> 00:46:09,406 So the magic is in here and here's how relatively easy, 1121 00:46:09,406 --> 00:46:10,646 if it's a little cryptic at first, 1122 00:46:10,966 --> 00:46:13,406 it is to start manipulating individual bits. 1123 00:46:13,626 --> 00:46:14,856 What's my goal here? 1124 00:46:15,506 --> 00:46:18,226 Well, I know just from how computers are implemented, 1125 00:46:18,226 --> 00:46:21,456 that an integer is inside RAM 32 bits. 1126 00:46:21,916 --> 00:46:24,666 So what I really want to do is get access to each 1127 00:46:24,666 --> 00:46:27,406 of those individual bits, from the right-hand side, 1128 00:46:27,776 --> 00:46:29,256 all the way over to the left. 1129 00:46:29,386 --> 00:46:31,456 In other words, if I know conceptually 1130 00:46:31,456 --> 00:46:38,606 that the number 3 looks like this, I want to be able to print 1131 00:46:38,606 --> 00:46:41,586 out those bits, and so actually I think I misspoke, 1132 00:46:41,826 --> 00:46:45,046 I want to start at the leftmost bit in this integer, 1133 00:46:45,376 --> 00:46:47,736 and I want to print out what it is, a zero or a one. 1134 00:46:47,736 --> 00:46:50,026 I want to then move to the next bit, the next bit, the next bit. 1135 00:46:50,246 --> 00:46:52,336 So in other words, I want to take the actual integer 1136 00:46:52,336 --> 00:46:54,666 and iterate over all 32 of its bits. 1137 00:46:54,956 --> 00:46:55,696 So, how can I do this? 1138 00:46:56,336 --> 00:46:57,206 Well, let's take a look. 1139 00:46:57,616 --> 00:47:00,756 So, first I'm going to declare an int i. I'm going to declare i 1140 00:47:00,756 --> 00:47:04,466 to be the size of an int, so that's 4 times 8; 1141 00:47:04,746 --> 00:47:06,926 so that's 32, minus 1. 1142 00:47:06,926 --> 00:47:08,796 So, i is initially 31. 1143 00:47:08,796 --> 00:47:10,726 Why didn't I just say, i equals 31? 1144 00:47:10,726 --> 00:47:12,336 I wanted this code to be portable. 1145 00:47:12,336 --> 00:47:13,956 If you ran it on a newer Mac or PC, 1146 00:47:14,046 --> 00:47:15,726 I wanted to make sure the code didn't break on you, 1147 00:47:15,976 --> 00:47:18,526 so I generalized the size of an integer in case it varies. 1148 00:47:18,906 --> 00:47:22,146 So, i is initially 32, and this is kind of old school now,. 1149 00:47:22,146 --> 00:47:23,856 It looks like this loop is going to go from i... 1150 00:47:23,946 --> 00:47:25,226 Sorry, i is 31... 1151 00:47:25,516 --> 00:47:27,846 on down to zero, equal to zero. 1152 00:47:28,076 --> 00:47:30,036 So this loop is going to iterate 32 times. 1153 00:47:30,406 --> 00:47:31,936 Now, why 31 and zero? 1154 00:47:32,076 --> 00:47:34,716 Well, whenever you write a number in binary, 1155 00:47:35,336 --> 00:47:39,436 the convention is that if you write, 1, 2, 3, 4, 5, 6, 7, 8, 1156 00:47:39,846 --> 00:47:43,406 this bit here; the rightmost, is what's generally know 1157 00:47:43,406 --> 00:47:46,606 as the lowest-order digit, the rightmost digit, 1158 00:47:46,606 --> 00:47:49,086 the lowest-order digit, lowest order in the sense 1159 00:47:49,116 --> 00:47:52,566 that this guy is the least contributing number 1160 00:47:52,806 --> 00:47:53,616 in this whole number. 1161 00:47:53,616 --> 00:47:55,076 It's got the least weight in terms 1162 00:47:55,076 --> 00:47:57,576 of the grade school columns, ones columns, twos columns, 1163 00:47:57,576 --> 00:47:59,286 fours columns; this is the smallest column. 1164 00:47:59,286 --> 00:48:00,546 So, it's the least order of bits. 1165 00:48:00,806 --> 00:48:03,006 By contrast, the guy all the way over here on the left, 1166 00:48:03,316 --> 00:48:04,946 that's the highest-order bit. 1167 00:48:05,276 --> 00:48:08,876 So, he's the most powerful bit because he contributes a 128. 1168 00:48:08,876 --> 00:48:11,546 That's the 128s column in the context of binary. 1169 00:48:11,866 --> 00:48:16,286 So we generally number, then, by convention, the right-hand side. 1170 00:48:16,366 --> 00:48:19,126 This would be bit number 0, and this guy 1171 00:48:19,126 --> 00:48:21,036 over here would be bit number 7. 1172 00:48:21,476 --> 00:48:24,496 And by contrast, if we have 32 bits, 0 would be over there, 1173 00:48:24,686 --> 00:48:26,276 31-bit would be over there. 1174 00:48:26,276 --> 00:48:29,246 So, you number it in sort of the opposite way you might think. 1175 00:48:29,596 --> 00:48:31,406 Alright, so what does this mean for our code? 1176 00:48:31,616 --> 00:48:33,686 Well, I'm going to start at the 31st bit 1177 00:48:34,026 --> 00:48:35,686 and move my way down to the zero bit. 1178 00:48:35,686 --> 00:48:36,996 So, conceptually this is right. 1179 00:48:36,996 --> 00:48:39,616 I'm going to iterate over all 32 bits from left to right. 1180 00:48:39,996 --> 00:48:41,006 What do I need to do? 1181 00:48:41,256 --> 00:48:45,326 Well, when I have this integer n that the user provided, 1182 00:48:45,706 --> 00:48:48,946 that just gives me this high-level number, the number 3, 1183 00:48:48,946 --> 00:48:51,076 the number 1 million, whatever they typed in. 1184 00:48:51,286 --> 00:48:54,036 I need to dive in and actually look at individual bits. 1185 00:48:54,336 --> 00:48:58,836 So there's this notion in programming called a mask. 1186 00:48:58,836 --> 00:49:01,666 So, I've declared a variable of type int called mask. 1187 00:49:02,376 --> 00:49:04,286 I've set it equal to this fancy thing. 1188 00:49:04,286 --> 00:49:07,106 So this is literally the number 1 that we're familiar with. 1189 00:49:07,396 --> 00:49:08,776 What does that really equal? 1190 00:49:08,956 --> 00:49:12,406 Well, if it's a binary number, this is 1, 2, 3, 4, 5, 6, 7, 8, 1191 00:49:12,406 --> 00:49:19,396 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 1192 00:49:19,396 --> 00:49:23,056 25, 26, 27, 28, 29, 30, 31, 32. 1193 00:49:23,276 --> 00:49:26,126 So, that is the number very tediously represented in binary. 1194 00:49:26,126 --> 00:49:29,926 So, that's what I get if I do int x = 1;. 1195 00:49:29,926 --> 00:49:31,616 But I'm not quite doing that, or int mask = 1;. 1196 00:49:31,616 --> 00:49:32,576 But I'm not quite doing that. 1197 00:49:32,576 --> 00:49:34,006 I have another part of an expression here. 1198 00:49:34,296 --> 00:49:36,516 This is the left-shift operator. 1199 00:49:36,906 --> 00:49:39,736 So it means literally, take the bits in this number 1200 00:49:39,926 --> 00:49:41,386 and shift them over to the left. 1201 00:49:41,386 --> 00:49:42,866 Now, obviously that's going to leave you 1202 00:49:42,866 --> 00:49:44,476 with some blank spaces on the right. 1203 00:49:44,526 --> 00:49:46,626 So by default those should get filled in with zeros. 1204 00:49:46,926 --> 00:49:49,216 So the effect is really, with this left shift, 1205 00:49:49,406 --> 00:49:51,676 it's going to move all the ones and zeros that way 1206 00:49:51,676 --> 00:49:54,536 and it's going to fill the placeholders with zeros again. 1207 00:49:54,956 --> 00:49:55,876 So, what does this mean? 1208 00:49:55,876 --> 00:49:57,686 Well, i is initially 31. 1209 00:49:58,266 --> 00:50:01,636 So this means that this 1 effectively is going 1210 00:50:01,636 --> 00:50:05,456 to get shifted over a whole bunch of places. 1211 00:50:05,456 --> 00:50:07,976 And I can't quite depict this perfectly on both sides, 1212 00:50:08,016 --> 00:50:10,416 but I shifted over once, now it's there. 1213 00:50:10,416 --> 00:50:11,676 One more time, now it's there. 1214 00:50:11,956 --> 00:50:15,496 31 times total, it ends up there, and as promised, 1215 00:50:15,496 --> 00:50:21,186 this gets filled in, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 1216 00:50:21,186 --> 00:50:24,376 I've got to learn how to copy paste. 1217 00:50:24,486 --> 00:50:26,316 Alright, paste, paste, zero. 1218 00:50:26,556 --> 00:50:30,506 Okay, so that is the result of not just initializing mask to 1, 1219 00:50:30,506 --> 00:50:33,036 but saying initialize it to 1, but then shift those bits 1220 00:50:33,036 --> 00:50:37,186 over 31 places, or more generally, i = 31. 1221 00:50:37,486 --> 00:50:39,136 Now, what was the point of this exercise? 1222 00:50:39,356 --> 00:50:41,896 Well, now I have conceptually this thing called a mask. 1223 00:50:42,236 --> 00:50:43,416 And a mask is generally... 1224 00:50:43,416 --> 00:50:45,116 And actually the same idea exists in Photoshop, 1225 00:50:45,116 --> 00:50:47,096 if you guys have ever played with the notion of a mask there. 1226 00:50:47,346 --> 00:50:49,256 In that context it's usually black and white, 1227 00:50:49,256 --> 00:50:50,746 or red and white, something like that. 1228 00:50:51,216 --> 00:50:52,696 But the point is there's two states. 1229 00:50:53,346 --> 00:50:56,426 In this mask here, there's two states: zero and one. 1230 00:50:56,736 --> 00:51:00,426 The one on the left-hand side is essentially conceptually a hint 1231 00:51:00,506 --> 00:51:02,086 that I care about that bit. 1232 00:51:02,646 --> 00:51:05,666 All the other zeros mean I don't care about those bits. 1233 00:51:06,136 --> 00:51:09,996 So now what I can do is essentially line up this mask 1234 00:51:10,686 --> 00:51:14,286 with a given number and I can just look at the number I care 1235 00:51:14,286 --> 00:51:17,086 about by, as we'll see, ANDing them together. 1236 00:51:17,086 --> 00:51:17,996 So, let's look at the code. 1237 00:51:18,426 --> 00:51:20,976 I initialize mask, that long sequence of bits 1238 00:51:20,976 --> 00:51:24,066 and then I do this: if (n & mask). 1239 00:51:24,846 --> 00:51:26,896 So if (n & mask) printf("1");. 1240 00:51:27,266 --> 00:51:28,426 So, what does this mean? 1241 00:51:28,426 --> 00:51:29,906 Well, n, we recall is this. 1242 00:51:29,906 --> 00:51:33,226 This is my mask now, so let me lay this 1243 00:51:33,226 --> 00:51:35,156 out nicely, make this bigger. 1244 00:51:35,616 --> 00:51:37,416 So that's my mask. 1245 00:51:37,416 --> 00:51:39,246 n, let's say, is the number 3. 1246 00:51:39,886 --> 00:51:40,636 So n is... 1247 00:51:40,966 --> 00:51:43,976 here we go, that's the number 3. 1248 00:51:44,286 --> 00:51:47,966 So, now what I'm saying is AND these two things together. 1249 00:51:48,666 --> 00:51:51,316 So I'll just lay it out kind of like a mathematical expression. 1250 00:51:51,316 --> 00:51:52,636 So I'm going to AND them together. 1251 00:51:52,636 --> 00:51:54,186 And what that gives me, 1252 00:51:54,756 --> 00:51:58,446 it's a very sophisticated ASCII art, what? 1253 00:51:58,656 --> 00:52:01,316 So, the boolean AND operator, just like... 1254 00:52:01,316 --> 00:52:04,316 Sorry, the bitwise AND operator, the single &, 1255 00:52:04,316 --> 00:52:07,926 just like the boolean AND operator says that for an answer 1256 00:52:07,926 --> 00:52:10,736 to be true, both of the inputs must be true. 1257 00:52:10,766 --> 00:52:13,746 If you want an output of 1, the inputs must both be 1. 1258 00:52:14,046 --> 00:52:19,476 So, what this means is, let's do 1 & 0, and the answer is 0. 1259 00:52:19,716 --> 00:52:22,086 What's 0 & 0? 1260 00:52:22,606 --> 00:52:26,336 It's also 0, or again in the world of bools is false. 1261 00:52:26,336 --> 00:52:26,916 And false? 1262 00:52:27,186 --> 00:52:28,976 The answer is false so we write down a 0. 1263 00:52:28,976 --> 00:52:31,066 This is the same question again and again and again. 1264 00:52:31,066 --> 00:52:32,236 What happens when I get down here? 1265 00:52:33,006 --> 00:52:36,016 0 & 1? 0. 0 & 1? 1266 00:52:36,536 --> 00:52:42,366 0. So the answer to (n & mask) is what number? 1267 00:52:43,956 --> 00:52:47,426 It's 32 zeros, which we know to be the value false, right? 1268 00:52:47,426 --> 00:52:50,726 False is anything that's non-positive and non-negative. 1269 00:52:50,986 --> 00:52:52,496 So anything that's zero is false. 1270 00:52:52,726 --> 00:52:57,256 So (n & mask) returns the boolean expression false. 1271 00:52:57,586 --> 00:52:59,776 So does this condition print 1 or 0? 1272 00:53:00,916 --> 00:53:02,696 So, it prints 0, but this is useful 1273 00:53:02,696 --> 00:53:05,366 because again, here's n. I want... 1274 00:53:05,456 --> 00:53:07,616 but n, really when I typed it in is the number 3. 1275 00:53:07,806 --> 00:53:10,206 But underneath the hood the n is this and I want 1276 00:53:10,206 --> 00:53:11,616 to print out its leftmost bit. 1277 00:53:11,616 --> 00:53:12,976 Its leftmost bit is a zero. 1278 00:53:12,976 --> 00:53:15,196 and indeed I'm about to print out a zero 1279 00:53:15,196 --> 00:53:16,836 because of this condition here. 1280 00:53:17,276 --> 00:53:19,796 Now, we don't have to repeat this story 30 more times 1281 00:53:19,796 --> 00:53:20,976 because it's pretty much the same. 1282 00:53:21,566 --> 00:53:23,126 If I now let my loop continue, 1283 00:53:23,596 --> 00:53:26,166 the next iteration i equals what? 1284 00:53:27,376 --> 00:53:30,536 So, it's not 31; it's instead 30, right? 1285 00:53:30,536 --> 00:53:31,576 Because I'm doing i-- . 1286 00:53:31,956 --> 00:53:32,846 So, what does this mean? 1287 00:53:32,876 --> 00:53:34,806 Again, I initialize mask to 1. 1288 00:53:35,116 --> 00:53:37,236 Then I left-shifted 30 places, 1289 00:53:37,506 --> 00:53:39,756 which means again if I've got 1... 1290 00:53:39,806 --> 00:53:40,886 Let's let me cheat here. 1291 00:53:41,286 --> 00:53:46,056 If this is the value 1 and then I shift it 30 places, 1292 00:53:46,366 --> 00:53:47,876 that gives me what value? 1293 00:53:48,446 --> 00:53:52,166 It gives me this value. 1294 00:53:52,386 --> 00:53:55,466 So, now my mask on the second iteration is almost the same, 1295 00:53:55,726 --> 00:53:58,276 but the bit I care about is not the leftmost one, 1296 00:53:58,316 --> 00:54:00,216 it's the second leftmost one. 1297 00:54:00,406 --> 00:54:03,646 And now as this pattern repeats, i is going to get decremented 1298 00:54:03,646 --> 00:54:07,696 and decremented, so your mask looks the same except that one, 1299 00:54:07,726 --> 00:54:09,776 the single one, is kind of inching its way 1300 00:54:10,046 --> 00:54:11,516 down toward the end of the string. 1301 00:54:11,776 --> 00:54:14,476 Now, once you get to the end of the string, 1302 00:54:14,676 --> 00:54:16,506 what's ultimately going to get printed out? 1303 00:54:16,556 --> 00:54:18,966 Well, at the very last couple 1304 00:54:18,966 --> 00:54:20,916 of iterations my mask is going to look like this. 1305 00:54:21,506 --> 00:54:22,836 This is my second-to-last mask, 1306 00:54:23,336 --> 00:54:25,846 and my last mask is going to look like this. 1307 00:54:25,846 --> 00:54:27,786 Right? Because the 1 is going to keep moving along 1308 00:54:27,786 --> 00:54:28,756 to the right-hand side. 1309 00:54:29,036 --> 00:54:31,866 So, let's go ahead and do this second-to-last one. 1310 00:54:31,866 --> 00:54:35,066 1 & 1 is going to give me 1. 1311 00:54:35,746 --> 00:54:37,776 0 & 1 is going to give me... 1312 00:54:38,066 --> 00:54:41,116 Oh, sorry, we're not at that point of this story. 1313 00:54:41,166 --> 00:54:43,206 Alright, so now the last iteration, 1314 00:54:43,526 --> 00:54:46,456 my mask is this guy here. 1315 00:54:46,896 --> 00:54:50,246 1 & 1 is 1, and so it gets printed out as 1, 1. 1316 00:54:50,396 --> 00:54:52,366 Alright, sorry, I feel like I didn't say this very well. 1317 00:54:52,626 --> 00:54:54,836 At the end of the day what is going on? 1318 00:54:54,996 --> 00:54:58,756 We're using the mask to tell the computer we care about one bit 1319 00:54:58,756 --> 00:55:02,676 and only one bit and it's boolean 1320 00:55:02,676 --> 00:55:06,686 AND to essentially extract only that bit and print it out. 1321 00:55:06,926 --> 00:55:10,166 So the end result, after we've iterated 32 times, 1322 00:55:10,426 --> 00:55:14,756 is that we've printed out 000000 until finally our mask lines 1323 00:55:14,756 --> 00:55:17,346 up with this bit, and our mask lines up with this bit, 1324 00:55:17,396 --> 00:55:21,196 and we actually print the number that we care about. 1325 00:55:21,196 --> 00:55:23,256 Okay, so let's actually use this power 1326 00:55:23,256 --> 00:55:26,596 for something that's otherwise been done in a different way, 1327 00:55:26,596 --> 00:55:28,356 and we'll see what the implications are, 1328 00:55:28,356 --> 00:55:29,546 this new capability. 1329 00:55:29,546 --> 00:55:30,786 Let me point out this. 1330 00:55:30,866 --> 00:55:32,806 I'm going to keep using my little cheat sheet here, 1331 00:55:33,376 --> 00:55:38,076 and I'm going to remind us that the value A is the number 65. 1332 00:55:38,336 --> 00:55:39,466 Well, what is that in binary? 1333 00:55:39,546 --> 00:55:44,346 Number 65 in binary is really 2, 3, 4, 5, 6, 7, 8. 1334 00:55:44,506 --> 00:55:47,006 That's the number 65 in binary, agree or disagree? 1335 00:55:47,756 --> 00:55:50,476 Alright, so that's the 64s column, then the 1s column. 1336 00:55:50,476 --> 00:55:51,736 So that's the number 65. 1337 00:55:51,736 --> 00:55:54,426 And what was the letter little a? 1338 00:55:55,106 --> 00:55:56,926 97. Oh, too fast. 1339 00:55:57,206 --> 00:55:58,446 97, alright. 1340 00:55:58,446 --> 00:55:59,976 So how do we represent 97? 1341 00:56:00,076 --> 00:56:04,396 We need a 1 there, and then you need another 32. 1342 00:56:04,516 --> 00:56:05,116 I think that's it. 1343 00:56:06,176 --> 00:56:10,726 Agreed? 64 plus 32 plus 1, so that's the number 97. 1344 00:56:10,956 --> 00:56:13,266 Here's a curiosity that we've never really leveraged before. 1345 00:56:13,456 --> 00:56:16,736 Big A and little a are always 32 digits apart, 1346 00:56:16,736 --> 00:56:18,396 and we know this just by simple arithmetic: 1347 00:56:18,396 --> 00:56:20,866 97 minus 65 is always 32. 1348 00:56:20,866 --> 00:56:22,526 But there's an interesting side effect, 1349 00:56:22,666 --> 00:56:24,186 probably deliberate of that fact. 1350 00:56:24,776 --> 00:56:25,996 What is the only difference 1351 00:56:25,996 --> 00:56:29,016 between the binary representation of 65 and 97? 1352 00:56:29,016 --> 00:56:30,856 >> [Inaudible audience response] 1353 00:56:30,856 --> 00:56:33,286 >> There's this single 1 in, coincidence, 1354 00:56:33,386 --> 00:56:35,066 the 32's column, right? 1355 00:56:35,066 --> 00:56:38,846 This is the 1 columns, 2s, 4s, 8s, 16s, 32s. 1356 00:56:38,926 --> 00:56:40,856 They only differ in the 32s column. 1357 00:56:40,856 --> 00:56:42,326 If we do a little sanity check here. 1358 00:56:42,586 --> 00:56:44,076 What about big B and little b? 1359 00:56:44,126 --> 00:56:48,226 Well, this would be the number 66, which means that is this, 1360 00:56:48,426 --> 00:56:52,736 and the then the number little b is 98, and that just means this. 1361 00:56:52,956 --> 00:56:53,486 So you know what? 1362 00:56:53,486 --> 00:56:54,896 That pattern is going to persist. 1363 00:56:55,136 --> 00:56:58,236 So these big and little letters are always only differing 1364 00:56:58,476 --> 00:56:59,676 by one bit position. 1365 00:56:59,786 --> 00:57:00,346 So, you know what? 1366 00:57:00,346 --> 00:57:03,256 This gives us a new capability to capitalize a number 1367 00:57:03,256 --> 00:57:04,896 if we want or lowercase it. 1368 00:57:05,156 --> 00:57:06,336 Let's go ahead and look at this, 1369 00:57:06,336 --> 00:57:09,686 tolower.c. Alright, let's scroll down. 1370 00:57:10,276 --> 00:57:12,646 Again, this program is almost completely boring, 1371 00:57:12,686 --> 00:57:15,146 except for this one line of code. 1372 00:57:15,196 --> 00:57:17,416 So, you've called the function tolower(). 1373 00:57:17,836 --> 00:57:21,296 You might have looked at this on the quiz 0 itself. 1374 00:57:21,556 --> 00:57:22,716 But notice what we do here. 1375 00:57:22,826 --> 00:57:24,856 In the top here, I just ask the user for an 1376 00:57:24,856 --> 00:57:26,176 "Uppercase character, please:". 1377 00:57:26,456 --> 00:57:28,576 I use GetChar() and I insure 1378 00:57:28,576 --> 00:57:32,366 that the user gives me uppercase A through uppercase Z. 1379 00:57:32,676 --> 00:57:35,486 And then I execute one line of code that's going 1380 00:57:35,486 --> 00:57:37,116 to do something really quite neat. 1381 00:57:37,116 --> 00:57:39,096 Let me go ahead and make tolower. 1382 00:57:40,226 --> 00:57:41,106 Run tolower. 1383 00:57:41,106 --> 00:57:43,366 Enter. Uppercase character, please. 1384 00:57:43,366 --> 00:57:44,016 Let me give the letter A, 1385 00:57:44,016 --> 00:57:48,036 and I get back lowercase a. Let's do B, little b; 1386 00:57:48,036 --> 00:57:51,196 uppercase Z, little z. Well, how is that working? 1387 00:57:51,196 --> 00:57:53,036 Well, take a look at what's going on here now 1388 00:57:53,036 --> 00:57:55,306 that we understand how to manipulate individual bits. 1389 00:57:55,656 --> 00:57:57,006 printf("%c... 1390 00:57:57,006 --> 00:57:57,926 So that's uninteresting. 1391 00:57:57,926 --> 00:57:58,696 Print a character. 1392 00:57:58,696 --> 00:57:59,376 What character? 1393 00:57:59,596 --> 00:58:03,636 Well, it looks like printf("%cn", c | 0x20);. 1394 00:58:03,636 --> 00:58:10,486 Alright, well, let's just bite these off one at a time. 1395 00:58:10,486 --> 00:58:14,136 So 0x20 is actually what? 1396 00:58:14,356 --> 00:58:17,316 It's 1, 2, 3, 4, bits, followed by another 4 bits 1397 00:58:17,636 --> 00:58:19,256 as per our discussion of hexadecimal. 1398 00:58:19,256 --> 00:58:20,646 Let's do the easiest ones first. 1399 00:58:20,836 --> 00:58:25,406 The last 4 are all zeros because that's this bit here 1400 00:58:25,766 --> 00:58:26,876 and then what about these four? 1401 00:58:26,876 --> 00:58:29,146 What should I replace these four placeholders with? 1402 00:58:29,146 --> 00:58:29,666 [ Inaudible audience response ] 1403 00:58:29,666 --> 00:58:33,376 >> 0010. So, this is an interesting thing. 1404 00:58:33,376 --> 00:58:36,936 It looks like 0x20, because I know what hexadecimal notation 1405 00:58:36,936 --> 00:58:40,426 is, is just a succinct way of representing a binary number 1406 00:58:40,636 --> 00:58:42,366 that has a 1 in which position? 1407 00:58:42,366 --> 00:58:43,246 [ Inaudible audience response ] 1408 00:58:43,246 --> 00:58:44,566 >> The 32s only. 1409 00:58:44,906 --> 00:58:47,596 So now I'm saying this expression here in code. 1410 00:58:48,096 --> 00:58:51,976 Take c, which is going to be the number 65 or 66, 1411 00:58:51,976 --> 00:58:57,156 or so forth, and OR it with 0x20. 1412 00:58:57,506 --> 00:58:58,316 So, what does this mean? 1413 00:58:58,316 --> 00:59:02,046 Well, lets take A. This is A. Let's OR it 1414 00:59:02,256 --> 00:59:04,386 with this expression here. 1415 00:59:04,866 --> 00:59:07,826 And now just do our notion of ORing. 1416 00:59:08,056 --> 00:59:12,056 So, this position here: 0 | 0 is 0. 1417 00:59:12,056 --> 00:59:12,766 1 | 0 is? 1418 00:59:12,766 --> 00:59:13,666 [ Inaudible audience response ] 1419 00:59:13,666 --> 00:59:14,166 >> No, careful. 1420 00:59:14,626 --> 00:59:15,406 >> Is 1, right? 1421 00:59:15,406 --> 00:59:17,696 Because the OR, just like in the world of booleans, something 1422 00:59:17,696 --> 00:59:19,476 OR something, only one of them has to be true. 1423 00:59:19,476 --> 00:59:21,646 So true or false gives me true. 1424 00:59:22,036 --> 00:59:24,926 False or true gives me true. 1425 00:59:25,276 --> 00:59:28,996 False, false, false, false, true, so my answer 1426 00:59:29,226 --> 00:59:32,816 to c OR'd with 0x20 is this. 1427 00:59:32,816 --> 00:59:34,106 And you know what that equals? 1428 00:59:35,076 --> 00:59:35,646 This here. 1429 00:59:35,896 --> 00:59:38,486 So, now using this bit flipping or bit twiddling, 1430 00:59:38,486 --> 00:59:39,546 as people are fond of saying, 1431 00:59:39,546 --> 00:59:41,956 can we actually flip the bit individually. 1432 00:59:41,956 --> 00:59:42,936 What about the opposite? 1433 00:59:43,196 --> 00:59:45,556 If I actually wanted to force something to lowercase 1434 00:59:45,856 --> 00:59:49,086 and implement on the fly tolower, this is very easy. 1435 00:59:49,086 --> 00:59:51,356 I can tell the user to give me something different. 1436 00:59:51,746 --> 00:59:57,606 And now, how would I go about turning off that bit? 1437 00:59:57,701 --> 00:59:59,701 [ Inaudible audience response ] 1438 00:59:59,796 --> 01:00:00,676 >> I don't want to OR it. 1439 01:00:00,776 --> 01:00:03,336 So OR typically has the effect at the bitwise level 1440 01:00:03,336 --> 01:00:06,476 of turning bits on, changing zeros to ones, as we just saw. 1441 01:00:07,566 --> 01:00:09,066 So, we do want to use AND. 1442 01:00:09,546 --> 01:00:11,526 But if we just AND it together, is that going 1443 01:00:11,526 --> 01:00:12,556 to give us the right answer? 1444 01:00:13,456 --> 01:00:14,956 In fact if we AND these together, 1445 01:00:15,516 --> 01:00:16,696 notice what's going to happen. 1446 01:00:16,806 --> 01:00:19,536 I'm going to get all zeros, 1447 01:00:19,536 --> 01:00:24,826 which is definitely not an uppercase A. So, 1448 01:00:24,826 --> 01:00:25,906 what do I really want to do? 1449 01:00:25,906 --> 01:00:27,496 I don't want to AND it with this. 1450 01:00:27,566 --> 01:00:30,736 I want to AND it with... 1451 01:00:30,916 --> 01:00:31,746 What about the opposite? 1452 01:00:31,746 --> 01:00:33,646 Let me temporarily just come up with a second line. 1453 01:00:33,646 --> 01:00:37,096 Let me flip these bits: 11011111. 1454 01:00:37,096 --> 01:00:38,956 What if I do this? 1455 01:00:40,276 --> 01:00:46,266 What will that give me? 1456 01:00:46,436 --> 01:00:49,196 This will give me what? 1457 01:00:49,196 --> 01:00:49,376 [ Inaudible audience response ] 1458 01:00:49,376 --> 01:00:51,946 >> 01... Hmm. 1459 01:00:52,296 --> 01:00:54,626 Doesn't seem to be helping me, is it? 1460 01:00:55,606 --> 01:00:57,966 Or it is. That is in fact capital A, right? 1461 01:00:57,966 --> 01:01:00,286 If you look here and then look here, 1462 01:01:00,286 --> 01:01:02,616 that is in fact capital A. So, let's see this in code. 1463 01:01:02,936 --> 01:01:07,326 Let's go ahead: toupper.c. And in fact, it's as simple as this. 1464 01:01:07,616 --> 01:01:09,706 Now, how did I come up with this hexadecimal code? 1465 01:01:09,706 --> 01:01:11,496 Well, df, this one's actually pretty easy. 1466 01:01:11,526 --> 01:01:12,046 Same thing. 1467 01:01:12,326 --> 01:01:16,256 I just said df, so 0xdf equals, let's see, 1, 1468 01:01:16,256 --> 01:01:17,876 2, 3, 4; 1, 2, 3, 4... 1469 01:01:18,076 --> 01:01:20,006 the last four of these are pretty easy, 1, 2, 1470 01:01:20,006 --> 01:01:22,096 3, 4, and then what's d? 1471 01:01:22,436 --> 01:01:30,146 So, d is actually 13, because a is 10, b is 11, 12, 13... 1472 01:01:30,736 --> 01:01:32,146 So what does this represent? 1473 01:01:32,206 --> 01:01:34,206 [ Inaudible audience response ] 1474 01:01:34,266 --> 01:01:36,956 >> 1101. df. 1475 01:01:38,386 --> 01:01:41,016 Alright, now that we have this ability to manipulate bits 1476 01:01:41,016 --> 01:01:42,526 at the individual level, can we start 1477 01:01:42,526 --> 01:01:44,876 to implement much more interesting data structures. 1478 01:01:44,876 --> 01:01:46,196 So, it all comes full circle 1479 01:01:46,196 --> 01:01:48,736 in that we talked briefly conceptually about stacks 1480 01:01:48,806 --> 01:01:51,686 and queues, and these allow us to perform different operations 1481 01:01:51,766 --> 01:01:55,246 on data and also represent them in more sophisticated ways. 1482 01:01:55,516 --> 01:01:56,996 But suppose that the problems that we want 1483 01:01:56,996 --> 01:02:00,876 to start solving are not these trivial play examples, 1484 01:02:00,876 --> 01:02:02,976 of like people in line or students 1485 01:02:02,976 --> 01:02:04,206 in a queue, and so forth. 1486 01:02:04,356 --> 01:02:06,926 But suppose that we want to solve some real-world problems. 1487 01:02:06,926 --> 01:02:09,816 One of them might be implementing a spellchecker, 1488 01:02:09,866 --> 01:02:10,036 right? 1489 01:02:10,036 --> 01:02:12,106 Problem set 6, as we've teased you with before, 1490 01:02:12,106 --> 01:02:15,036 amounts to being handed a huge text file from us, 1491 01:02:15,036 --> 01:02:18,076 which has 140,000 English words in it. 1492 01:02:18,306 --> 01:02:20,376 And you are then challenged to represent all 1493 01:02:20,376 --> 01:02:21,386 of those words in memory. 1494 01:02:21,606 --> 01:02:24,336 Now, a very simplistic implementation might very well 1495 01:02:24,336 --> 01:02:27,296 be to just take those 140,000 words, 1496 01:02:27,386 --> 01:02:30,896 allocate a really big array, and store one word after the other, 1497 01:02:30,896 --> 01:02:33,446 and maybe you'd alphabetize to keep things slightly optimal, 1498 01:02:33,696 --> 01:02:36,886 and then you can perform binary search if asked the question, 1499 01:02:36,936 --> 01:02:38,426 is this word spelled correctly? 1500 01:02:38,516 --> 01:02:40,666 And I say binary search because given a word, 1501 01:02:40,856 --> 01:02:44,006 you can quickly look it up in your huge dictionary and check: 1502 01:02:44,136 --> 01:02:45,146 is it actually present? 1503 01:02:45,146 --> 01:02:47,556 If so, it's obviously spelled correctly; if not, 1504 01:02:47,766 --> 01:02:49,626 it's presumably not spelled correctly. 1505 01:02:49,886 --> 01:02:51,136 So that would give you an answer. 1506 01:02:51,406 --> 01:02:55,486 But binary search is log n, which isn't actually all 1507 01:02:55,486 --> 01:02:56,876 that impressive, relative 1508 01:02:56,966 --> 01:03:01,116 to a better running time, namely Big O of 1. 1509 01:03:01,146 --> 01:03:02,856 O(1) represents what notion of running time? 1510 01:03:02,856 --> 01:03:03,966 [ Inaudible audience response ] 1511 01:03:03,966 --> 01:03:04,906 >> Right, constant time. 1512 01:03:05,036 --> 01:03:07,406 And in fact, thus far, we've been pretty content 1513 01:03:07,686 --> 01:03:11,266 to just have log n running time, or even n running time, 1514 01:03:11,266 --> 01:03:12,606 as opposed to n squared, or the like. 1515 01:03:12,846 --> 01:03:15,266 But what if the Holy Grail really is 1516 01:03:15,316 --> 01:03:17,066 to get constant running time. 1517 01:03:17,066 --> 01:03:19,396 Or if I ask a question, I want the answer now. 1518 01:03:19,676 --> 01:03:21,836 I don't want to wait n. I don't want to wait O(n squared). 1519 01:03:21,836 --> 01:03:24,106 I want don't even want to wait log n time. 1520 01:03:24,106 --> 01:03:25,956 I want that answer now. 1521 01:03:25,956 --> 01:03:26,976 Case in point, something 1522 01:03:26,976 --> 01:03:29,706 like Google probably is not doing linear search 1523 01:03:29,706 --> 01:03:32,656 and probably not even binary search of the billions of pages 1524 01:03:32,656 --> 01:03:34,336 that it actually has in its database. 1525 01:03:34,516 --> 01:03:36,096 Hopefully there's a much faster way 1526 01:03:36,096 --> 01:03:39,246 so that I get my seemingly instantaneous reply. 1527 01:03:39,246 --> 01:03:41,266 And the ideal, of course, is one step. 1528 01:03:41,486 --> 01:03:43,216 Give me an answer in just one step. 1529 01:03:43,506 --> 01:03:45,266 So, one of the questions we'll be able to ask, 1530 01:03:45,706 --> 01:03:47,506 now that we have some of these new skills, 1531 01:03:47,506 --> 01:03:49,646 is that if I have a container for data, 1532 01:03:49,646 --> 01:03:51,606 with something we'll generally call a hash table, 1533 01:03:51,886 --> 01:03:54,086 into which I can put numbers, I can put students, 1534 01:03:54,086 --> 01:03:56,426 I can put words from a dictionary, 1535 01:03:56,736 --> 01:04:01,096 how can I put those words or those humans into these buckets 1536 01:04:01,286 --> 01:04:05,606 of a hash table in such a way that when I ask for that person 1537 01:04:05,606 --> 01:04:08,846 or that word again, you can get this answer instantly? 1538 01:04:09,026 --> 01:04:11,596 And to motivate this, ultimately with problem set 6, 1539 01:04:11,626 --> 01:04:12,756 not only will you be challenged 1540 01:04:12,756 --> 01:04:16,106 to implement the fastest spellchecker possible that's 1541 01:04:16,106 --> 01:04:18,046 correct, you will also be challenged 1542 01:04:18,046 --> 01:04:20,446 to implement the fastest spell checker vis-a-vis your 1543 01:04:20,596 --> 01:04:23,146 classmates, if you choose to opt in, whereby we'll rank 1544 01:04:23,146 --> 01:04:26,466 on the course's homepage what your running time currently is, 1545 01:04:26,506 --> 01:04:28,346 what your amount of RAM currently is, 1546 01:04:28,346 --> 01:04:30,346 so we'll have a bit of a race next week, if you choose 1547 01:04:30,346 --> 01:04:34,176 to partake, to implement the fastest spellchecker in CS50. 1548 01:04:34,486 --> 01:04:37,276 So with that little teaser we will see you on Wednesday. 1549 01:04:40,508 --> 01:04:42,508 [ Applause ]