1 00:00:00,000 --> 00:00:03,395 >> [MUSIC PLAYING] 2 00:00:03,395 --> 00:00:11,031 3 00:00:11,031 --> 00:00:13,280 DAVID J. MALAN: This is like a freshman seminar today. 4 00:00:13,280 --> 00:00:14,060 OK. 5 00:00:14,060 --> 00:00:15,024 So very rainy out. 6 00:00:15,024 --> 00:00:17,690 This tends to happen on Wednesdays, but all the more opportunity 7 00:00:17,690 --> 00:00:18,700 for questions today. 8 00:00:18,700 --> 00:00:22,210 So let's start off actually with the film in just a moment. 9 00:00:22,210 --> 00:00:24,560 But we'll start grandly as always. 10 00:00:24,560 --> 00:00:28,000 >> This is CS50, and this is the end of week 4. 11 00:00:28,000 --> 00:00:30,820 So if you've ever watched TV or a movie wherein 12 00:00:30,820 --> 00:00:34,690 there's some computer experts and the police, or FBI, or some agency 13 00:00:34,690 --> 00:00:36,930 is trying to catch some adversary, well, you've 14 00:00:36,930 --> 00:00:40,850 probably heard the expression "enhance," whereby that technician somehow 15 00:00:40,850 --> 00:00:44,750 magically zooms in infinitely far to see the criminals 16 00:00:44,750 --> 00:00:48,640 identity or the license plate number in even the shimmer of a mirror 17 00:00:48,640 --> 00:00:50,390 or the glint of someone's eye. 18 00:00:50,390 --> 00:00:55,196 So indeed, let's take a look at a few such scenes from Hollywood. 19 00:00:55,196 --> 00:00:55,862 [VIDEO PLAYBACK] 20 00:00:55,862 --> 00:00:59,243 -OK, now let's get a good look at you. 21 00:00:59,243 --> 00:01:06,488 22 00:01:06,488 --> 00:01:07,415 >> -Hold it. 23 00:01:07,415 --> 00:01:08,267 Run that back. 24 00:01:08,267 --> 00:01:09,121 >> -Wait a minute. 25 00:01:09,121 --> 00:01:11,300 Go right. 26 00:01:11,300 --> 00:01:12,209 >> -There, freeze that. 27 00:01:12,209 --> 00:01:12,750 -Full screen. 28 00:01:12,750 --> 00:01:13,558 -OK, freeze that. 29 00:01:13,558 --> 00:01:14,820 -Tighten up on that, will you? 30 00:01:14,820 --> 00:01:16,530 -Vector in on that guy by the back wheel. 31 00:01:16,530 --> 00:01:19,400 -Zoom in right here on this spot. 32 00:01:19,400 --> 00:01:22,846 -With the right equipment, the image could be enlarged and sharpened. 33 00:01:22,846 --> 00:01:24,065 -What's that? 34 00:01:24,065 --> 00:01:25,600 -It's an enhancement program. 35 00:01:25,600 --> 00:01:26,860 -Can you clear that up any? 36 00:01:26,860 --> 00:01:27,890 -I don't know. 37 00:01:27,890 --> 00:01:29,050 Let's enhance it. 38 00:01:29,050 --> 00:01:31,575 >> -Enhance section A6. 39 00:01:31,575 --> 00:01:33,642 >> -I enhanced the detail, and-- I think there's 40 00:01:33,642 --> 00:01:35,433 enough to enhance, release it to my screen. 41 00:01:35,433 --> 00:01:37,080 -I enhanced the reflection in her eye. 42 00:01:37,080 --> 00:01:38,830 >> -Let's run this through video enhancement. 43 00:01:38,830 --> 00:01:40,100 -Edgar, can you enhance this? 44 00:01:40,100 --> 00:01:41,875 >> -Hang on. 45 00:01:41,875 --> 00:01:44,010 >> -I've been working on this reflection. 46 00:01:44,010 --> 00:01:44,995 >> -Someone's reflection. 47 00:01:44,995 --> 00:01:45,495 -Reflection. 48 00:01:45,495 --> 00:01:47,399 -There's a reflection of the man's face. 49 00:01:47,399 --> 00:01:48,065 -The reflection. 50 00:01:48,065 --> 00:01:48,981 -There's a reflection. 51 00:01:48,981 --> 00:01:50,600 -Zoom in on the mirror. 52 00:01:50,600 --> 00:01:52,712 -You can see a reflection. 53 00:01:52,712 --> 00:01:54,350 -Can you enhance the image from here? 54 00:01:54,350 --> 00:01:55,370 -Can you enhance him right here? 55 00:01:55,370 --> 00:01:56,210 -Can you enhance it? 56 00:01:56,210 --> 00:01:56,900 Can you enhance it? 57 00:01:56,900 --> 00:01:57,870 >> -Can we enhance this? 58 00:01:57,870 --> 00:01:58,717 >> -Can you enhance it? 59 00:01:58,717 --> 00:02:00,050 -Hold on a second, I'll enhance. 60 00:02:00,050 --> 00:02:00,924 -Zoom in on the door. 61 00:02:00,924 --> 00:02:01,700 -Times 10. 62 00:02:01,700 --> 00:02:02,586 -Zoom. 63 00:02:02,586 --> 00:02:03,490 -Move in. 64 00:02:03,490 --> 00:02:03,990 -More. 65 00:02:03,990 --> 00:02:04,690 -Wait, stop. 66 00:02:04,690 --> 00:02:05,190 -Stop. 67 00:02:05,190 --> 00:02:05,970 -Pause it. 68 00:02:05,970 --> 00:02:09,460 -Rotate us 75 degrees around the vertical, please. 69 00:02:09,460 --> 00:02:10,962 -Stop. 70 00:02:10,962 --> 00:02:14,040 Go back to the part about the door, again. 71 00:02:14,040 --> 00:02:15,860 >> -Got an image enhancer that can bitmap? 72 00:02:15,860 --> 00:02:18,776 >> -Hey, maybe we can use the Pradeep Sen method to see into the windows. 73 00:02:18,776 --> 00:02:20,372 -This software is state of the art. 74 00:02:20,372 --> 00:02:21,845 >> -The eigenvalue is off. 75 00:02:21,845 --> 00:02:24,300 >> -With the right combination of algorithm-- 76 00:02:24,300 --> 00:02:26,755 >> -He's taken elimination algorithms to the next level, 77 00:02:26,755 --> 00:02:28,730 and I can use them to enhance this photograph. 78 00:02:28,730 --> 00:02:31,286 >> -Lock on and enlarge the z-axis. 79 00:02:31,286 --> 00:02:32,560 >> -Enhance. 80 00:02:32,560 --> 00:02:33,100 >> -Enhance. 81 00:02:33,100 --> 00:02:33,600 >> -Enhance. 82 00:02:33,600 --> 00:02:34,960 -Freeze and enhance. 83 00:02:34,960 --> 00:02:37,180 >> [END PLAYBACK] 84 00:02:37,180 --> 00:02:41,160 >> DAVID J. MALAN: All right, so all of those are actually words. 85 00:02:41,160 --> 00:02:44,450 They're just strung together in a way that's not actually sensible. 86 00:02:44,450 --> 00:02:48,400 And, in fact, CS50 and courses like it tends to ruin a lot of TV and movies 87 00:02:48,400 --> 00:02:48,900 for you. 88 00:02:48,900 --> 00:02:52,330 Because when those computer experts are rattling off terms and saying 89 00:02:52,330 --> 00:02:56,860 fancy things like eigenvectors, and the z-axis, 90 00:02:56,860 --> 00:02:59,572 and any number of other actually more technical terms, 91 00:02:59,572 --> 00:03:02,030 they're really just stringing words together all too often. 92 00:03:02,030 --> 00:03:05,020 Is that one of our hopes is that, as a side effect of taking courses 93 00:03:05,020 --> 00:03:08,245 like this, will more people in the world actually be able to weigh in 94 00:03:08,245 --> 00:03:12,040 and just ever so slightly influence the quality and accuracy of those films? 95 00:03:12,040 --> 00:03:14,350 >> In fact, let's take a look at reality. 96 00:03:14,350 --> 00:03:18,070 So here is the staff photo of Mary, one of our teaching fellows. 97 00:03:18,070 --> 00:03:20,050 And suppose she is suspected of something. 98 00:03:20,050 --> 00:03:23,730 And yet, there's a glimmer of some piece of evidence in her eye, 99 00:03:23,730 --> 00:03:25,480 or in the reflection of her eyeglasses. 100 00:03:25,480 --> 00:03:30,760 Well, if we do exactly as the films propose, wherein we zoom and "enhance", 101 00:03:30,760 --> 00:03:34,080 this is how much information is in Mary's face 102 00:03:34,080 --> 00:03:36,795 when you capture an image with that original resolution. 103 00:03:36,795 --> 00:03:39,120 >> And, in fact, you can see these dots. 104 00:03:39,120 --> 00:03:41,900 And these are what are called pixels, P-I-X-E-L-S, 105 00:03:41,900 --> 00:03:45,740 which is just a square typically that is a dot that composes an image. 106 00:03:45,740 --> 00:03:49,200 And back in the day, and actually even today with some of today's LED TVs 107 00:03:49,200 --> 00:03:51,950 or LCD TVs, if you've got one in your room or at home, 108 00:03:51,950 --> 00:03:55,100 if you go up super close to it, and especially if it's a somewhat older TV, 109 00:03:55,100 --> 00:03:58,760 you can probably even see these dots and that's what compose an image. 110 00:03:58,760 --> 00:04:00,980 >> And there is no more information than this. 111 00:04:00,980 --> 00:04:05,400 We could "enhance", in the sense of smoothing things over and sort of 112 00:04:05,400 --> 00:04:09,040 inferring kind of, sort of what color should be next to Mary's eye 113 00:04:09,040 --> 00:04:10,910 so that it's not actually so pixelated. 114 00:04:10,910 --> 00:04:14,510 But if I keep zooming in, there is the bad guy in her eye. 115 00:04:14,510 --> 00:04:16,600 Like that is all the information we have. 116 00:04:16,600 --> 00:04:18,920 You cannot create information out of nothing. 117 00:04:18,920 --> 00:04:20,790 There's only a finite number of bits there. 118 00:04:20,790 --> 00:04:22,873 >> So in Problem Set 4, where you have an opportunity 119 00:04:22,873 --> 00:04:24,580 to play with this kind of world. 120 00:04:24,580 --> 00:04:27,610 In Problem Set 4, you'll explore the world of graphics, and forensics, 121 00:04:27,610 --> 00:04:30,870 and actually write code that recovers lost images. 122 00:04:30,870 --> 00:04:33,510 You'll write code that manipulates existing images 123 00:04:33,510 --> 00:04:36,120 and ultimately understand what's going on underneath the hood. 124 00:04:36,120 --> 00:04:38,540 >> And, it turns out, it's actually not all that complicated. 125 00:04:38,540 --> 00:04:41,320 For instance, if we wanted to represent a smiley face where 126 00:04:41,320 --> 00:04:44,160 with these black pixels, or these black dots, 127 00:04:44,160 --> 00:04:47,230 well, we could simply represent them as truly a bitmap. 128 00:04:47,230 --> 00:04:50,040 And if you had ever heard that expression bitmap, perhaps 129 00:04:50,040 --> 00:04:52,330 it now starts to make a little more sense today. 130 00:04:52,330 --> 00:04:53,580 >> We already know what a bit is. 131 00:04:53,580 --> 00:04:54,160 It's 0 or 1. 132 00:04:54,160 --> 00:04:56,201 And a map is just something like a piece of paper 133 00:04:56,201 --> 00:04:59,180 that gives you directions and has maybe a grid of x- and y-coordinates. 134 00:04:59,180 --> 00:05:00,540 So here is a bitmap. 135 00:05:00,540 --> 00:05:03,680 It's a map of bits whereby a 1 is apparently 136 00:05:03,680 --> 00:05:07,857 going to represent a white pixel, and a 0 is going to represent a black pixel. 137 00:05:07,857 --> 00:05:09,440 But we could certainly flip it around. 138 00:05:09,440 --> 00:05:11,648 It doesn't really matter so long as we're consistent. 139 00:05:11,648 --> 00:05:15,570 And here is how, in binary-- inside of a computer's memory, or even inside 140 00:05:15,570 --> 00:05:18,160 of a file on your hard drive-- could you store 141 00:05:18,160 --> 00:05:20,240 the simplest of smiley face images. 142 00:05:20,240 --> 00:05:23,990 But what are we, of course, lacking in this image? 143 00:05:23,990 --> 00:05:24,610 Color, right? 144 00:05:24,610 --> 00:05:28,220 It's an obvious next step or enhancement to improve this with color. 145 00:05:28,220 --> 00:05:32,230 So unfortunately with just a single bit, 0 or 1, we could represent color. 146 00:05:32,230 --> 00:05:36,100 That could be red, or blue, or black, or white, or green, or pink, 147 00:05:36,100 --> 00:05:37,420 or any pairs of colors. 148 00:05:37,420 --> 00:05:40,860 But for simplicity's sake, we'll just assume black and white. 149 00:05:40,860 --> 00:05:45,930 >> So what logically do we need if we want to implement color in an image? 150 00:05:45,930 --> 00:05:49,080 What do we have to do? 151 00:05:49,080 --> 00:05:51,900 Like if the limiting factor here is that with one bit you can only 152 00:05:51,900 --> 00:05:55,977 represent two states, 0 or 1, white or black, what do you want to do? 153 00:05:55,977 --> 00:05:56,810 AUDIENCE: More data. 154 00:05:56,810 --> 00:05:58,813 DAVID J. MALAN: More bits, yeah more data, more bits. 155 00:05:58,813 --> 00:06:01,440 And, indeed, that's exactly how color images are represented. 156 00:06:01,440 --> 00:06:05,120 Rather than use a single bit, a 0 or 1 for each pixel, each dot, 157 00:06:05,120 --> 00:06:06,170 you just use multiple. 158 00:06:06,170 --> 00:06:09,660 Maybe use 8, maybe, more commonly use 24, and indeed, in Problem Set 159 00:06:09,660 --> 00:06:13,300 4, will you play with a file format that uses 24 bits typically. 160 00:06:13,300 --> 00:06:15,430 >> But most of you are probably familiar with JPEGs. 161 00:06:15,430 --> 00:06:17,460 If you've ever taken a photo on your phone, 162 00:06:17,460 --> 00:06:20,360 or uploaded or seen something on Facebook, or Flickr, any number 163 00:06:20,360 --> 00:06:24,882 of photo-based websites, you've probably seen a JPEG image before. 164 00:06:24,882 --> 00:06:27,840 And it turns out, this is the file format we're going to use in PSet 4, 165 00:06:27,840 --> 00:06:30,340 whereby you're going to have to recover images 166 00:06:30,340 --> 00:06:35,160 that I've accidentally deleted from a corrupted memory card in the camera, 167 00:06:35,160 --> 00:06:35,800 if you will. 168 00:06:35,800 --> 00:06:38,490 >> And it turns out that even though JPEG is pretty sophisticated-- 169 00:06:38,490 --> 00:06:40,906 it's much more sophisticated than the black and white dots 170 00:06:40,906 --> 00:06:44,480 we saw a moment ago, because there's actually fancy algorithms that 171 00:06:44,480 --> 00:06:47,410 are used to compress a JPEG, so that you can have a really nice, 172 00:06:47,410 --> 00:06:49,832 quality picture but using relatively few bits. 173 00:06:49,832 --> 00:06:51,790 And we'll come back to compression before long. 174 00:06:51,790 --> 00:06:56,280 It turns out that the first three bytes in a JPEG image-- 175 00:06:56,280 --> 00:07:02,750 no matter what you've taken a photograph of-- are the values 255, 216, 255. 176 00:07:02,750 --> 00:07:05,990 >> In other words, if you just see that pattern of bits, 177 00:07:05,990 --> 00:07:09,180 represented here as three bytes, or 24 bits total, 178 00:07:09,180 --> 00:07:13,810 with high probability you can infer that you are looking at it this first three 179 00:07:13,810 --> 00:07:15,230 bytes of a JPEG. 180 00:07:15,230 --> 00:07:18,040 And this is what's known as the signature of a JPEG. 181 00:07:18,040 --> 00:07:20,540 A lot of file formats out there tend to start 182 00:07:20,540 --> 00:07:23,735 with certain patterns of 0s and 1s, so that Windows, and Mac OS, and iOS, 183 00:07:23,735 --> 00:07:28,272 and Android know what kind of file they are, in addition to the so-called file 184 00:07:28,272 --> 00:07:29,730 extension that a lot of files have. 185 00:07:29,730 --> 00:07:32,590 If you have .jpg, that's another clue to the computer. 186 00:07:32,590 --> 00:07:35,310 >> So let's now look at this a little more technically. 187 00:07:35,310 --> 00:07:37,390 We know the decimal system is 0 through 9. 188 00:07:37,390 --> 00:07:38,740 We know binary is 0 and 1. 189 00:07:38,740 --> 00:07:41,842 And if you think back to PSet 0, we had you wrestle with, 190 00:07:41,842 --> 00:07:43,800 for a little bit, something called hexadecimal, 191 00:07:43,800 --> 00:07:47,320 where you have 16 digits, instead of 10 or instead of 2. 192 00:07:47,320 --> 00:07:50,405 And those digits, by convention, are 0 through 9 and then a 193 00:07:50,405 --> 00:07:55,040 through f, where f represents what decimal number, just as a quick sanity 194 00:07:55,040 --> 00:07:56,640 check? 195 00:07:56,640 --> 00:07:57,610 So, 15. 196 00:07:57,610 --> 00:08:01,390 And a must represent 10, just by nature of the ordering that I've given. 197 00:08:01,390 --> 00:08:04,350 It's just an arbitrary convention, but it's quite standard. 198 00:08:04,350 --> 00:08:06,870 >> So if we look at this pattern of three bytes-- let's 199 00:08:06,870 --> 00:08:09,620 just start to look at it in a manner consistent with how 200 00:08:09,620 --> 00:08:12,450 computer scientists generally look at and think about files. 201 00:08:12,450 --> 00:08:15,580 You can certainly think about files in 0s, and 1s, and decimal, 202 00:08:15,580 --> 00:08:19,340 but in reality, we tend to use binary or more typically hexadecimal-- 203 00:08:19,340 --> 00:08:20,760 back from PSet 0. 204 00:08:20,760 --> 00:08:25,857 So let me propose that 255, 216, and 255 are just these patterns of 0s and 1s. 205 00:08:25,857 --> 00:08:28,440 And you can check this if you want to do the math from Week 0. 206 00:08:28,440 --> 00:08:30,810 But, for now, just assume that this is indeed correct. 207 00:08:30,810 --> 00:08:33,850 I've just rewritten three decimal numbers as three binary values. 208 00:08:33,850 --> 00:08:36,100 Now what I'm going to do is just add some white space, 209 00:08:36,100 --> 00:08:37,266 just for readability's sake. 210 00:08:37,266 --> 00:08:39,940 And notice, I'm just going to move things apart. 211 00:08:39,940 --> 00:08:43,090 So before, after, before, after. 212 00:08:43,090 --> 00:08:46,180 I'm doing nothing interesting other than just spreading things out so 213 00:08:46,180 --> 00:08:50,380 that notice each set of eight bits is now two sets of four bits. 214 00:08:50,380 --> 00:08:54,920 This is useful because hexadecimal is particularly fashionable 215 00:08:54,920 --> 00:09:00,930 because each hexadecimal digit 0 through f, or more specifically 0 through 15, 216 00:09:00,930 --> 00:09:03,430 can be represented with exactly four bits. 217 00:09:03,430 --> 00:09:07,960 In other words, in hexadecimal if you want to represent a 0, it's just 0000, 218 00:09:07,960 --> 00:09:08,780 four zeros. 219 00:09:08,780 --> 00:09:13,997 And if you want to represent 15, it's 1111, which is four bits. 220 00:09:13,997 --> 00:09:16,080 And if you do the math, if this is the ones place, 221 00:09:16,080 --> 00:09:18,210 this is the 16s place, that's going to give you-- 222 00:09:18,210 --> 00:09:19,960 rather that's going to-- sorry, in binary, 223 00:09:19,960 --> 00:09:23,660 that's going to give you 15, ones place, twos place, fours and eights place. 224 00:09:23,660 --> 00:09:26,821 So let me propose that that set of four bits on the left 225 00:09:26,821 --> 00:09:28,070 is what we're going to call f. 226 00:09:28,070 --> 00:09:30,110 It's the biggest number you can represent with four bits. 227 00:09:30,110 --> 00:09:33,300 And we already know from hexadecimal, f is the biggest digit in hexadecimal. 228 00:09:33,300 --> 00:09:36,020 We've got another f there, two more over there. 229 00:09:36,020 --> 00:09:38,980 And for now, just take on faith that I have done the math right 230 00:09:38,980 --> 00:09:41,890 and that the left half of those bits, 1101, 231 00:09:41,890 --> 00:09:43,980 is the same thing as d in hexadecimal. 232 00:09:43,980 --> 00:09:46,490 And the right hand, 1000, is just 8. 233 00:09:46,490 --> 00:09:48,140 >> And that one's easy to see, right? 234 00:09:48,140 --> 00:09:51,670 The 8 represents-- is right underneath that eights place. 235 00:09:51,670 --> 00:09:56,040 So we have one in the eights column and nothing in the fours, twos or ones. 236 00:09:56,040 --> 00:09:59,830 So now more conventionally, humans tend to write hexadecimal digits like this, 237 00:09:59,830 --> 00:10:03,000 you just squish them together, and then you prefix them with 0x. 238 00:10:03,000 --> 00:10:05,920 It means nothing other than a visual clue to a human-- 239 00:10:05,920 --> 00:10:10,350 here comes a hexadecimal value-- because it might not otherwise be obvious. 240 00:10:10,350 --> 00:10:13,629 >> Which is to say, ultimately, that the pattern of zeros and ones, 241 00:10:13,629 --> 00:10:16,170 or the pattern of hexadecimal digits equivalently that you're 242 00:10:16,170 --> 00:10:18,990 going to start looking for in Problem Set 4 is this-- 243 00:10:18,990 --> 00:10:22,120 and the Problem Set 4 spec will walk you through this in more detail-- 244 00:10:22,120 --> 00:10:25,344 but realize as sort of arcane as this might look at first glance, 245 00:10:25,344 --> 00:10:27,010 you're going to start seeing this a lot. 246 00:10:27,010 --> 00:10:30,320 And in fact, even in GDB, the debugger we introduced on Monday 247 00:10:30,320 --> 00:10:35,440 and Dan introduces in PSet 3, is going to often show you hexadecimal values 248 00:10:35,440 --> 00:10:39,910 just because they tend to be more conventional than decimal or binary 249 00:10:39,910 --> 00:10:41,157 in the world of computers. 250 00:10:41,157 --> 00:10:42,490 Now let's put this into context. 251 00:10:42,490 --> 00:10:48,040 Many of you might remember this picture here, which came from what? 252 00:10:48,040 --> 00:10:51,240 Vista, so even earlier than that, Windows XP did this debut. 253 00:10:51,240 --> 00:10:52,620 So this is a beautiful landscape. 254 00:10:52,620 --> 00:10:55,940 And in fact, if you poke around online-- I think it's a Wikipedia article, 255 00:10:55,940 --> 00:11:00,110 wherein someone very amazingly went out found this location in the world set up 256 00:11:00,110 --> 00:11:02,240 his or her camera in precisely the right place-- 257 00:11:02,240 --> 00:11:06,510 and this today looks like-- but it's exactly the same setting. 258 00:11:06,510 --> 00:11:10,060 This image, though, is in a file format called bitmap, b-m-p. 259 00:11:10,060 --> 00:11:12,910 And we're going to take a super quick glance at what that means. 260 00:11:12,910 --> 00:11:17,770 >> But bitmap is just a different way of representing images still using pixels 261 00:11:17,770 --> 00:11:19,580 in 0s and 1s, ultimately. 262 00:11:19,580 --> 00:11:23,282 But at quick glance, it has a more interesting signature 263 00:11:23,282 --> 00:11:24,490 at the beginning of the file. 264 00:11:24,490 --> 00:11:26,670 It's not just three bytes, rather there's 265 00:11:26,670 --> 00:11:30,770 a whole bunch of patterns of bytes that have predetermined meaning. 266 00:11:30,770 --> 00:11:34,490 For instance, somewhere in the first few bytes of a bitmap image 267 00:11:34,490 --> 00:11:37,440 is going to be the size of the image, the width of the image, 268 00:11:37,440 --> 00:11:40,390 the height of the image, so useful metadata, if you will. 269 00:11:40,390 --> 00:11:43,940 Useful information that Photoshop or any graphics program you're using 270 00:11:43,940 --> 00:11:45,180 might actually care about. 271 00:11:45,180 --> 00:11:47,170 >> So more on this in Problem Set 4, but this 272 00:11:47,170 --> 00:11:49,220 is only to say that at the end of the day 273 00:11:49,220 --> 00:11:52,390 all the file formats you've been using for years-- Microsoft Word files, 274 00:11:52,390 --> 00:11:55,820 Numbers files, Excel files, any number of file formats 275 00:11:55,820 --> 00:11:57,770 that might have some known file extension 276 00:11:57,770 --> 00:12:00,130 are just 0s and 1s underneath the hood. 277 00:12:00,130 --> 00:12:02,970 And humans have decided what the conventions are, 278 00:12:02,970 --> 00:12:08,340 what patterns of 0s and 1s represent a Word file versus an Excel file, 279 00:12:08,340 --> 00:12:10,322 versus any number of other file formats. 280 00:12:10,322 --> 00:12:12,780 So in PSet 4, you'll have an opportunity to play with that. 281 00:12:12,780 --> 00:12:14,405 >> But what does it mean to have a struct. 282 00:12:14,405 --> 00:12:18,012 This is actually a nice segue now into C, which has only a couple 283 00:12:18,012 --> 00:12:20,220 of additional features that we haven't looked at yet. 284 00:12:20,220 --> 00:12:24,230 It's a pretty small language and one of the nice features about C is a struct. 285 00:12:24,230 --> 00:12:27,300 For instance, if you wanted to represent-- let's 286 00:12:27,300 --> 00:12:33,690 say you wanted to have a variable that represents a student in some program. 287 00:12:33,690 --> 00:12:37,330 Maybe you were writing a course registration program, or core shopping 288 00:12:37,330 --> 00:12:38,870 tool, or something like that. 289 00:12:38,870 --> 00:12:42,922 What are pieces of data related to a student that come to mind? 290 00:12:42,922 --> 00:12:44,880 Like a student is represented with what values? 291 00:12:44,880 --> 00:12:45,732 Yeah? 292 00:12:45,732 --> 00:12:46,940 You have a name as a student. 293 00:12:46,940 --> 00:12:48,900 What else does a typical student have? 294 00:12:48,900 --> 00:12:49,320 >> AUDIENCE: [INAUDIBLE] 295 00:12:49,320 --> 00:12:50,200 >> DAVID J. MALAN: So, sorry. 296 00:12:50,200 --> 00:12:50,660 >> AUDIENCE: Age. 297 00:12:50,660 --> 00:12:52,980 >> DAVID J. MALAN: An age or birthday equivalently, yep. 298 00:12:52,980 --> 00:12:53,557 What else? 299 00:12:53,557 --> 00:12:54,390 AUDIENCE: ID number? 300 00:12:54,390 --> 00:12:57,460 DAVID J. MALAN: So an ID number, maybe a phone number, maybe a dorm, or house, 301 00:12:57,460 --> 00:12:58,670 or college, or something like that. 302 00:12:58,670 --> 00:13:01,820 Any number of pieces of data that you might have in your contacts list 303 00:13:01,820 --> 00:13:03,890 is what might define a student. 304 00:13:03,890 --> 00:13:08,490 So if we wanted to do this, in code, we might do something simple like this. 305 00:13:08,490 --> 00:13:15,670 We might have a program so that has let's say, int main(void). 306 00:13:15,670 --> 00:13:18,920 And if I want to represent a student I might have, for instance, 307 00:13:18,920 --> 00:13:24,330 a string called name for that student, a string called dorm for that student, 308 00:13:24,330 --> 00:13:26,900 maybe an int called ID for that student. 309 00:13:26,900 --> 00:13:30,840 And because I'm using string, I need to go back and put up CS50.h. 310 00:13:30,840 --> 00:13:33,300 Maybe I'm going to need stdio.h. 311 00:13:33,300 --> 00:13:38,190 So let me preemptively do those and I'm going to call this student.c for now 312 00:13:38,190 --> 00:13:40,080 and save this. 313 00:13:40,080 --> 00:13:44,206 >> And now I can do something with these variables. 314 00:13:44,206 --> 00:13:46,830 And we're just going to write that as a comment in pseudo code, 315 00:13:46,830 --> 00:13:48,829 because it's not interesting what we do for now. 316 00:13:48,829 --> 00:13:51,242 OK, so this is a program that somehow stores a student. 317 00:13:51,242 --> 00:13:53,450 What do I want to do if I want to store two students? 318 00:13:53,450 --> 00:13:55,991 So my first instinct is going to be all right, wait a minute, 319 00:13:55,991 --> 00:14:01,920 if I have another student why don't I just do string name 2 , string dorm 2, 320 00:14:01,920 --> 00:14:04,190 int id2. 321 00:14:04,190 --> 00:14:06,540 And we've done gone down this road before 322 00:14:06,540 --> 00:14:10,890 and what was our solution to what seems to be kind of a hackish copy paste 323 00:14:10,890 --> 00:14:11,555 job here? 324 00:14:11,555 --> 00:14:12,346 AUDIENCE: An array. 325 00:14:12,346 --> 00:14:13,830 DAVID J. MALAN: Yeah, we could use an array. 326 00:14:13,830 --> 00:14:15,620 Right this very quickly becomes unwieldy. 327 00:14:15,620 --> 00:14:18,453 You have to sort of arbitrarily start naming all of these variables. 328 00:14:18,453 --> 00:14:22,190 And you, the human, have to keep track that OK name2 corresponds 329 00:14:22,190 --> 00:14:25,060 with dorm2 corresponds with id2. 330 00:14:25,060 --> 00:14:26,200 It just becomes a mess. 331 00:14:26,200 --> 00:14:29,350 So it's a lot easier, recall from a few weeks ago, 332 00:14:29,350 --> 00:14:34,300 to just having to called string names and maybe give us three of those. 333 00:14:34,300 --> 00:14:36,940 And then maybe we have string dorms and have 334 00:14:36,940 --> 00:14:41,900 three of those, or with a constant, int ids and have three of those. 335 00:14:41,900 --> 00:14:45,250 But even now this feels a little sloppy, right. 336 00:14:45,250 --> 00:14:49,440 We're talking about students and yet I'm really dwelling on the low level 337 00:14:49,440 --> 00:14:50,470 implementation details. 338 00:14:50,470 --> 00:14:52,790 The student is a name and a dorm and ID. 339 00:14:52,790 --> 00:14:59,814 >> Why can't I just declare a variable called student and call it s. 340 00:14:59,814 --> 00:15:02,230 And if I want another student, why don't I just call it t. 341 00:15:02,230 --> 00:15:05,260 Or if I want a whole bunch of students, why don't I just 342 00:15:05,260 --> 00:15:09,740 say I have a whole class of students, and it's three of them. 343 00:15:09,740 --> 00:15:12,470 In other words, why can't I come up with my own data type, called 344 00:15:12,470 --> 00:15:15,641 Students, inside of which is a name, is an ID, is a dorm, 345 00:15:15,641 --> 00:15:16,890 is any number of other fields. 346 00:15:16,890 --> 00:15:19,030 And it turns out you can do exactly that. 347 00:15:19,030 --> 00:15:21,850 >> So C has this feature called struct. 348 00:15:21,850 --> 00:15:24,700 That's a language feature that allows us to do exactly this. 349 00:15:24,700 --> 00:15:28,370 I'm going to go ahead and open up structs.h 350 00:15:28,370 --> 00:15:32,299 where we're going to see the following definition of a student. 351 00:15:32,299 --> 00:15:35,215 It turns out--and this one's even simpler than the one involving an ID 352 00:15:35,215 --> 00:15:36,080 a moment ago. 353 00:15:36,080 --> 00:15:39,120 If you want to come up with your homemade data type, 354 00:15:39,120 --> 00:15:42,750 and in addition to int, and char and float and all these others that exist, 355 00:15:42,750 --> 00:15:45,810 you can do so by literally writing typedef struct, 356 00:15:45,810 --> 00:15:47,880 then some curly braces, inside of which you 357 00:15:47,880 --> 00:15:51,460 list the variables you want to associate with this new custom data 358 00:15:51,460 --> 00:15:55,670 type like a name and a dorm, and then after the curly braces 359 00:15:55,670 --> 00:15:57,860 you give a name to the new data type. 360 00:15:57,860 --> 00:15:59,220 So, for instance, student. 361 00:15:59,220 --> 00:16:03,247 >> And what's nice about this now is that if we look at the corresponding code, 362 00:16:03,247 --> 00:16:05,080 the convention, first of all, is to put this 363 00:16:05,080 --> 00:16:08,230 in a file called something dot h, a header file, which we haven't 364 00:16:08,230 --> 00:16:09,780 started using ourselves too much. 365 00:16:09,780 --> 00:16:12,120 But we're going to start using quite a bit now. 366 00:16:12,120 --> 00:16:18,650 And what we can do with this, ultimately, in these few lines of code 367 00:16:18,650 --> 00:16:22,130 is declare exactly that data type, a student. 368 00:16:22,130 --> 00:16:23,230 And now let's use it. 369 00:16:23,230 --> 00:16:27,274 >> I'm going to now go into a file called structs1.c. 370 00:16:27,274 --> 00:16:29,440 And let's take a look at a few characteristics here. 371 00:16:29,440 --> 00:16:32,250 So the stuff up here is mostly familiar, and we'll 372 00:16:32,250 --> 00:16:35,040 come back to what is not familiar in just a moment. 373 00:16:35,040 --> 00:16:39,880 This of course is including my own header file, which is new as well, 374 00:16:39,880 --> 00:16:42,580 except for PSet 3 where, recall, we have helpers.h. 375 00:16:42,580 --> 00:16:45,150 So you might recall #include helpers.h. 376 00:16:45,150 --> 00:16:49,381 >> Why though am I using quotes instead of angled brackets? 377 00:16:49,381 --> 00:16:50,630 When do I choose between them? 378 00:16:50,630 --> 00:16:52,310 Almost always I seem to use angled brackets. 379 00:16:52,310 --> 00:16:55,040 And then, all of a sudden on line six I'm using double quotes. 380 00:16:55,040 --> 00:16:55,860 Why might that be? 381 00:16:55,860 --> 00:16:56,700 Yeah? 382 00:16:56,700 --> 00:16:57,725 >> AUDIENCE: [INAUDIBLE] 383 00:16:57,725 --> 00:16:59,350 DAVID J. MALAN: That's an actual, what? 384 00:16:59,350 --> 00:17:00,559 AUDIENCE: That's in your IDE. 385 00:17:00,559 --> 00:17:02,475 DAVID J. MALAN: Yeah, that's in my actual IDE. 386 00:17:02,475 --> 00:17:05,690 And let's not dwell on the IDE, because that's just a tool that I'm using. 387 00:17:05,690 --> 00:17:08,119 That's in my current directory, specifically. 388 00:17:08,119 --> 00:17:11,647 So structs.h is my own file not installed in the IDE, 389 00:17:11,647 --> 00:17:14,480 in the operating system itself, rather it's in my current directory. 390 00:17:14,480 --> 00:17:16,910 So the convention is if you want to include your own header file, 391 00:17:16,910 --> 00:17:18,200 you just use double quotes. 392 00:17:18,200 --> 00:17:23,290 >> What do we call this thing in line 8, generally speaking? 393 00:17:23,290 --> 00:17:25,200 This is what? 394 00:17:25,200 --> 00:17:28,220 #define something. 395 00:17:28,220 --> 00:17:31,040 This represents constants, right? 396 00:17:31,040 --> 00:17:33,140 If you want to have a value in your program 397 00:17:33,140 --> 00:17:35,110 that you use a whole bunch of times, it's 398 00:17:35,110 --> 00:17:39,330 good convention to factor it out, declare it, with the hash symbol 399 00:17:39,330 --> 00:17:43,340 define, then, by convention, in all uppercase word-- though it's not 400 00:17:43,340 --> 00:17:45,320 strictly necessary, but it's human convention 401 00:17:45,320 --> 00:17:47,210 to capitalize constants so that they jump out 402 00:17:47,210 --> 00:17:50,380 at you visually-- space and then the value you want to be 403 00:17:50,380 --> 00:17:52,250 equivalent to that constant's name. 404 00:17:52,250 --> 00:17:56,110 No semicolon, but you simply follow that pattern there. 405 00:17:56,110 --> 00:17:57,770 >> So what am I doing in this actual code. 406 00:17:57,770 --> 00:18:00,660 So let's take a look at the main program here. 407 00:18:00,660 --> 00:18:04,080 In line 12 because I have included structs.h, 408 00:18:04,080 --> 00:18:06,492 I now have magically at my disposal a new data type. 409 00:18:06,492 --> 00:18:09,200 I don't just have access to int, and char, and float, and string, 410 00:18:09,200 --> 00:18:10,060 and blue and others. 411 00:18:10,060 --> 00:18:12,470 I now have access to a student data type. 412 00:18:12,470 --> 00:18:17,740 So in line 12, I'm combining two ideas-- one a custom data type and two, 413 00:18:17,740 --> 00:18:18,940 using an array. 414 00:18:18,940 --> 00:18:21,700 And so in this program if I want to actually support 415 00:18:21,700 --> 00:18:24,320 three different students in my program, I 416 00:18:24,320 --> 00:18:30,480 can simply say give me a variable called students, each of which 417 00:18:30,480 --> 00:18:32,970 is of type students, which is my custom data type. 418 00:18:32,970 --> 00:18:35,890 And, specifically, give me three of those in my array. 419 00:18:35,890 --> 00:18:37,750 >> So now what do we do in this program? 420 00:18:37,750 --> 00:18:40,670 Here's just a for loop iterating from 0 to 3, because that's 421 00:18:40,670 --> 00:18:42,110 what the value of students is. 422 00:18:42,110 --> 00:18:44,420 I'm just prompting the user give me the student's name. 423 00:18:44,420 --> 00:18:48,090 And then in line 17, we have a mostly familiar line. 424 00:18:48,090 --> 00:18:50,370 We have our old friend GetString on the right. 425 00:18:50,370 --> 00:18:52,345 And what piece of syntax is apparently new, 426 00:18:52,345 --> 00:18:55,130 if you've never programmed in C before, and have never used the structs? 427 00:18:55,130 --> 00:18:55,510 Yeah? 428 00:18:55,510 --> 00:18:56,417 >> AUDIENCE: The .name. 429 00:18:56,417 --> 00:18:57,500 DAVID J. MALAN: The .name. 430 00:18:57,500 --> 00:19:01,220 But this isn't too much of a leap, because now students bracket i 431 00:19:01,220 --> 00:19:02,590 gives you the i-th student. 432 00:19:02,590 --> 00:19:04,730 And if you want to dive inside of that structure, 433 00:19:04,730 --> 00:19:09,490 you just use a single period and then the name of the variable inside, 434 00:19:09,490 --> 00:19:11,900 or the property inside that you want to get access to. 435 00:19:11,900 --> 00:19:14,816 Similarly then, if I then prompt the user, give me the student's dorm, 436 00:19:14,816 --> 00:19:18,390 you can similarly store that string in the dorm variable inside 437 00:19:18,390 --> 00:19:19,940 of that student structure. 438 00:19:19,940 --> 00:19:21,410 >> And now things get a little fancy. 439 00:19:21,410 --> 00:19:24,420 And this is going to look at perhaps a lot quite soon. 440 00:19:24,420 --> 00:19:27,970 But you'll see this far more in PSet 4, so let's just glance at it now. 441 00:19:27,970 --> 00:19:33,364 It turns out that in line 23 through 38, what do you think I'm perhaps doing? 442 00:19:33,364 --> 00:19:35,530 I've removed the comments for today, but the version 443 00:19:35,530 --> 00:19:38,660 of the code online for reference has all comments. 444 00:19:38,660 --> 00:19:40,171 What do I seem to be doing? 445 00:19:40,171 --> 00:19:42,530 >> AUDIENCE: Saving the file with all the information that the user entered. 446 00:19:42,530 --> 00:19:44,530 >> DAVID J. MALAN: Yeah, exactly, this is a new way 447 00:19:44,530 --> 00:19:46,370 that we're seeing two, another feature of C, 448 00:19:46,370 --> 00:19:48,700 whereby I can create my own files. 449 00:19:48,700 --> 00:19:51,580 Thus far, almost every program you've written is stateless. 450 00:19:51,580 --> 00:19:53,334 As soon as it's done running, that's it. 451 00:19:53,334 --> 00:19:55,000 There's no memory or recollection of it. 452 00:19:55,000 --> 00:19:56,110 There's no file saved. 453 00:19:56,110 --> 00:19:58,120 But if you do want to save input that has 454 00:19:58,120 --> 00:20:02,100 happened, like in a game or a program like this, it turns out we can do so. 455 00:20:02,100 --> 00:20:04,360 And you'll see this more in PSet 4 and in Section. 456 00:20:04,360 --> 00:20:08,661 But this line 23 essentially creates a file called students.csv. 457 00:20:08,661 --> 00:20:10,160 And you might have seen this before. 458 00:20:10,160 --> 00:20:14,250 Even if you've never studied CS before, CSV is comma-separated variables. 459 00:20:14,250 --> 00:20:19,000 It's like a very poor man's version of an Excel file, 460 00:20:19,000 --> 00:20:22,270 which means that it could be opened in Excel and in Apple Numbers, 461 00:20:22,270 --> 00:20:23,830 and it has rows and columns. 462 00:20:23,830 --> 00:20:26,485 But it's not a proprietary format like Microsoft or Apple's. 463 00:20:26,485 --> 00:20:29,840 It's just commas separating the values that we'll see in a moment. 464 00:20:29,840 --> 00:20:31,010 >> And just take a guess. 465 00:20:31,010 --> 00:20:33,480 In line 23, at the very end, my second argument 466 00:20:33,480 --> 00:20:37,700 to this new function called f open for file open is w. 467 00:20:37,700 --> 00:20:39,430 What might w denote? 468 00:20:39,430 --> 00:20:40,022 Yeah? 469 00:20:40,022 --> 00:20:41,260 >> AUDIENCE: It lets you write to the file? 470 00:20:41,260 --> 00:20:42,630 >> DAVID J. MALAN: It lets you write to the file. 471 00:20:42,630 --> 00:20:44,810 So there's a couple of variants that we can plug in here. 472 00:20:44,810 --> 00:20:47,184 But if you just want to read the file, that is look at it 473 00:20:47,184 --> 00:20:50,010 and read it into memory, you just use quote unquote "r". 474 00:20:50,010 --> 00:20:53,110 If you want to write to the file, you use quote unquote "w". 475 00:20:53,110 --> 00:20:55,190 There's also append and a couple of other things 476 00:20:55,190 --> 00:20:57,356 if you want to modify existing files. 477 00:20:57,356 --> 00:21:00,480 Now we're going to keep seeing this thing, then we'll come back to line 24. 478 00:21:00,480 --> 00:21:02,640 NULL, it turns out, is a special value that 479 00:21:02,640 --> 00:21:06,070 can be returned by certain functions if something has gone wrong-- 480 00:21:06,070 --> 00:21:08,490 if the file doesn't exist, if you've run out of memory, 481 00:21:08,490 --> 00:21:09,620 or a bunch of other errors. 482 00:21:09,620 --> 00:21:13,470 But for now, let's just assume that this is just conventional error checking. 483 00:21:13,470 --> 00:21:17,090 Here in line 26, I'm iterating from 0 to 3 over all my students. 484 00:21:17,090 --> 00:21:20,470 And this is kind of sort of a new function, fprintf, 485 00:21:20,470 --> 00:21:21,460 but just take a guess. 486 00:21:21,460 --> 00:21:24,370 If printf is just print a formatted string, 487 00:21:24,370 --> 00:21:26,507 what does fprintf probably mean? 488 00:21:26,507 --> 00:21:27,590 AUDIENCE: Print to a file. 489 00:21:27,590 --> 00:21:29,290 DAVID J. MALAN: Print a formatted string to a file. 490 00:21:29,290 --> 00:21:31,180 That's what the additional f means is file. 491 00:21:31,180 --> 00:21:36,420 And the new first argument has to be the variable that represents your file. 492 00:21:36,420 --> 00:21:38,866 Then we just have a format string just like printf. 493 00:21:38,866 --> 00:21:40,740 And even though this syntax is new, this just 494 00:21:40,740 --> 00:21:44,610 means plug in the student's name, plug-in the student dorm, and then 495 00:21:44,610 --> 00:21:47,160 with fclose, close the file. 496 00:21:47,160 --> 00:21:49,730 And then lastly-- this is new and we'll come back to this 497 00:21:49,730 --> 00:21:53,240 before long-- I'm freeing the student for reasons 498 00:21:53,240 --> 00:21:54,860 that happened up above there. 499 00:21:54,860 --> 00:21:56,820 But we'll come back to that before long-- 500 00:21:56,820 --> 00:21:59,820 that's because of how GetString is actually working underneath the hood. 501 00:21:59,820 --> 00:22:01,280 >> So let's take a quick look here. 502 00:22:01,280 --> 00:22:04,380 If I type ls in my directory, notice that I do not 503 00:22:04,380 --> 00:22:09,360 have a file called students.csv, just not there, does not exist. 504 00:22:09,360 --> 00:22:14,965 So if I now compile this program, make structs-1, . /structs-1, 505 00:22:14,965 --> 00:22:20,570 and I'm going to go ahead and type in Andi , who lives in Berkeley at Yale. 506 00:22:20,570 --> 00:22:26,350 We're going to have Rob who lives in Thayer these days. 507 00:22:26,350 --> 00:22:33,760 And let's come up with where is, I think, Maria is in Mather, 508 00:22:33,760 --> 00:22:35,100 if I have remembered correctly. 509 00:22:35,100 --> 00:22:36,460 >> So nothing seems to happen. 510 00:22:36,460 --> 00:22:40,680 But if I type ls now, there is students.csv. 511 00:22:40,680 --> 00:22:43,080 Let's go ahead and open students.csv. 512 00:22:43,080 --> 00:22:46,050 This is again a very lightweight file format. 513 00:22:46,050 --> 00:22:49,570 But I've simply adopted a convention that I have two rows and columns here. 514 00:22:49,570 --> 00:22:52,020 The first column is people's first names. 515 00:22:52,020 --> 00:22:55,740 The second column is the student's dorm, or college, or house, or whatnot. 516 00:22:55,740 --> 00:22:57,900 And now I've saved this permanently in a file. 517 00:22:57,900 --> 00:22:59,280 >> So it's not all that interesting. 518 00:22:59,280 --> 00:23:02,980 But this is just a stepping stone now to being able to persist information 519 00:23:02,980 --> 00:23:04,040 permanently. 520 00:23:04,040 --> 00:23:08,340 So let's see now what more we can do with these and other features. 521 00:23:08,340 --> 00:23:10,729 But first, any questions? 522 00:23:10,729 --> 00:23:12,145 That was a lot, and that was fast. 523 00:23:12,145 --> 00:23:16,131 But you'll see a lot more in PSet 4, as well. 524 00:23:16,131 --> 00:23:16,630 Yeah? 525 00:23:16,630 --> 00:23:19,360 >> AUDIENCE: Is there a way to continue adding names to that file? 526 00:23:19,360 --> 00:23:19,880 >> DAVID J. MALAN: Good question. 527 00:23:19,880 --> 00:23:21,800 Is there a way to continue adding names to that file? 528 00:23:21,800 --> 00:23:22,340 Yes. 529 00:23:22,340 --> 00:23:24,630 And, in fact, if you end up re-opening the file, 530 00:23:24,630 --> 00:23:26,780 you would use quote unquote "a" for append, 531 00:23:26,780 --> 00:23:31,090 which would just add a new line, a new line again and again, exactly. 532 00:23:31,090 --> 00:23:32,010 Good question. 533 00:23:32,010 --> 00:23:32,950 Other questions? 534 00:23:32,950 --> 00:23:33,450 Yeah? 535 00:23:33,450 --> 00:23:35,580 AUDIENCE: If you ran the program again right now, 536 00:23:35,580 --> 00:23:38,000 would it keep adding names to the file or would it open up a new file? 537 00:23:38,000 --> 00:23:38,740 >> DAVID J. MALAN: Ah, good question. 538 00:23:38,740 --> 00:23:41,448 If you ran the program again right now, maybe typed in new names, 539 00:23:41,448 --> 00:23:44,820 would it add to the file or overwrite the file? 540 00:23:44,820 --> 00:23:47,420 The latter, because I'm not using append mode. 541 00:23:47,420 --> 00:23:49,930 And because I'm just blindly opening the file for writing, 542 00:23:49,930 --> 00:23:51,310 it's just going to overwrite the file. 543 00:23:51,310 --> 00:23:54,570 So I would indeed need to do is append, if I want to actually have a long term 544 00:23:54,570 --> 00:23:55,350 database. 545 00:23:55,350 --> 00:23:58,220 >> Now CSV is useful, frankly, even for like if you're writing-- 546 00:23:58,220 --> 00:24:00,100 and we'll eventually see this later in the semester when 547 00:24:00,100 --> 00:24:01,455 we use CSVs for other purposes. 548 00:24:01,455 --> 00:24:04,920 If you want to store all of the people who have registered for some event, 549 00:24:04,920 --> 00:24:07,420 or signed up for your student group, or something like that, 550 00:24:07,420 --> 00:24:10,330 storing the data in this kind of format is super convenient. 551 00:24:10,330 --> 00:24:12,580 Because literally, if I were to download this file. 552 00:24:12,580 --> 00:24:14,540 I could double-- and let's actually try this 553 00:24:14,540 --> 00:24:16,720 if I have Excel or Numbers on here. 554 00:24:16,720 --> 00:24:19,130 >> I'm going to right-click or control-click my file. 555 00:24:19,130 --> 00:24:20,020 Whoops. 556 00:24:20,020 --> 00:24:21,830 Right-click or control-click my file. 557 00:24:21,830 --> 00:24:24,960 Come on, my mouse isn't cooperating. 558 00:24:24,960 --> 00:24:32,694 Download-- I'm going to download all the files here so 559 00:24:32,694 --> 00:24:33,860 just so I can grab this one. 560 00:24:33,860 --> 00:24:37,850 And let's see if this works students.csv-- first time 561 00:24:37,850 --> 00:24:39,310 I've activated. 562 00:24:39,310 --> 00:24:41,360 Now they want to see my contacts. 563 00:24:41,360 --> 00:24:44,310 Now, I need to register. 564 00:24:44,310 --> 00:24:47,620 See how easy it is to use CSVs? 565 00:24:47,620 --> 00:24:50,840 Yes, keep it up to date. 566 00:24:50,840 --> 00:24:52,375 OK, now we're ready for class. 567 00:24:52,375 --> 00:24:58,750 568 00:24:58,750 --> 00:25:00,370 OK, oh, what's new? 569 00:25:00,370 --> 00:25:02,920 OK, close. 570 00:25:02,920 --> 00:25:04,750 That was magical. 571 00:25:04,750 --> 00:25:07,280 OK, now we have to update. 572 00:25:07,280 --> 00:25:10,890 And now, it forgot what file I originally opened, 573 00:25:10,890 --> 00:25:13,090 but what a-- there we go. 574 00:25:13,090 --> 00:25:16,341 OK, so now we have an Excel file. 575 00:25:16,341 --> 00:25:18,290 Thank you. 576 00:25:18,290 --> 00:25:20,764 >> OK, so what I did was the easy part. 577 00:25:20,764 --> 00:25:23,930 Of course I could have pre-installed Excel, or Numbers, or whatever program. 578 00:25:23,930 --> 00:25:25,846 But this is nice, because now I can manipulate 579 00:25:25,846 --> 00:25:28,090 the data in a standard format. 580 00:25:28,090 --> 00:25:30,294 >> So now let's context switch to where we left off 581 00:25:30,294 --> 00:25:32,710 last time, which was to start to take off training wheels. 582 00:25:32,710 --> 00:25:34,543 But first, you didn't see this earlier lunch 583 00:25:34,543 --> 00:25:38,150 is again happening here at Fire and Ice in Cambridge, Sitar in New Haven. 584 00:25:38,150 --> 00:25:43,150 Sign up on CS50s website ASAP to join CS50 students and staff. 585 00:25:43,150 --> 00:25:46,090 >> So we took training wheels off on Monday as follows-- 586 00:25:46,090 --> 00:25:49,120 string has been declared in CS50s library for some time. 587 00:25:49,120 --> 00:25:52,650 And it's nice, because it allows us to talk about variables as being 588 00:25:52,650 --> 00:25:54,660 complete words and sentences and more. 589 00:25:54,660 --> 00:25:56,710 But it turns out string does not exist. 590 00:25:56,710 --> 00:26:00,200 That is just a synonym, or an alias, that we have created for something that 591 00:26:00,200 --> 00:26:03,780 actually is a little more technical called a char*. 592 00:26:03,780 --> 00:26:07,900 >> And indeed, we saw an example of a program on Monday 593 00:26:07,900 --> 00:26:11,200 that didn't behave quite as we expected. 594 00:26:11,200 --> 00:26:13,630 This was the file, compare-0. 595 00:26:13,630 --> 00:26:17,910 And recall that compare-0, if I recompile Monday's program 596 00:26:17,910 --> 00:26:22,670 and run compare-0 and type in mom in lowercase, and mom in lowercase again. 597 00:26:22,670 --> 00:26:25,320 The program insisted I type different things, 598 00:26:25,320 --> 00:26:29,210 even though mom, all in lowercase, is identical visually. 599 00:26:29,210 --> 00:26:31,990 So what was the short answer for why the computer thinks 600 00:26:31,990 --> 00:26:34,500 those two strings are different? 601 00:26:34,500 --> 00:26:35,250 Yeah? 602 00:26:35,250 --> 00:26:36,534 >> AUDIENCE: [INAUDIBLE] 603 00:26:36,534 --> 00:26:37,450 DAVID J. MALAN: Right. 604 00:26:37,450 --> 00:26:39,600 So, mom, the first time I type it in, is being 605 00:26:39,600 --> 00:26:42,710 stored somewhere in my computer's memory but in a different location 606 00:26:42,710 --> 00:26:44,690 than the second time I type in mom. 607 00:26:44,690 --> 00:26:46,580 Now it certainly could be optimized. 608 00:26:46,580 --> 00:26:49,205 The computer could be smart and realize these two strings, hey, 609 00:26:49,205 --> 00:26:49,954 they're identical. 610 00:26:49,954 --> 00:26:51,520 Let me not redundantly store it. 611 00:26:51,520 --> 00:26:54,229 But computers don't do that optimization unless you tell them to. 612 00:26:54,229 --> 00:26:56,061 So, by default, they're just going to end up 613 00:26:56,061 --> 00:26:57,670 in two different places in memory. 614 00:26:57,670 --> 00:27:01,570 And so to be more clear, when we compared the two strings, 615 00:27:01,570 --> 00:27:03,950 the first was called s, the second was called 616 00:27:03,950 --> 00:27:08,530 t, what specifically was I comparing here on line 13? 617 00:27:08,530 --> 00:27:09,494 Yeah. 618 00:27:09,494 --> 00:27:12,390 >> AUDIENCE: It's the place in memory that the variable will point to. 619 00:27:12,390 --> 00:27:14,900 >> DAVID J. MALAN: Exactly, I was comparing the place in memory 620 00:27:14,900 --> 00:27:16,300 that those variables pointed to. 621 00:27:16,300 --> 00:27:20,560 So specifically, if mom was at byte number 1, and 2, and 3, 622 00:27:20,560 --> 00:27:24,020 and 4-- because remember the backslash 0 needs to be all the way at the end. 623 00:27:24,020 --> 00:27:29,420 And the other instance of mom, m-o-m, was at address 10, 11, 12, and 13. 624 00:27:29,420 --> 00:27:33,100 I was comparing 1, that address, that location in memory, 625 00:27:33,100 --> 00:27:35,160 against 10, which is obviously not the same. 626 00:27:35,160 --> 00:27:36,260 1 is not 10. 627 00:27:36,260 --> 00:27:39,620 >> So this is nice in that it's pretty straightforward. 628 00:27:39,620 --> 00:27:42,870 But it's problematic insofar as we can't seem to compare strings. 629 00:27:42,870 --> 00:27:44,930 So fundamentally-- and at this low level, 630 00:27:44,930 --> 00:27:47,300 if you wanted to implement a program to compare 631 00:27:47,300 --> 00:27:50,270 two separate words that the user has typed in for quality, 632 00:27:50,270 --> 00:27:53,944 do they line up char for char, just in general terms, 633 00:27:53,944 --> 00:27:55,360 what do we need to do, apparently? 634 00:27:55,360 --> 00:27:57,940 It's not sufficient just to look at those two addresses. 635 00:27:57,940 --> 00:27:58,860 What do we need to do? 636 00:27:58,860 --> 00:27:59,360 Yeah? 637 00:27:59,360 --> 00:28:01,120 >> AUDIENCE: Iterate through the string [INAUDIBLE]. 638 00:28:01,120 --> 00:28:02,600 >> DAVID J. MALAN: Yeah, let's iterate through the string. 639 00:28:02,600 --> 00:28:05,808 Let's use a for loop, a while loop, or whatever you're most comfortable with. 640 00:28:05,808 --> 00:28:08,840 And if we've got two strings somewhere in memory, let's look at each's 641 00:28:08,840 --> 00:28:11,770 first character, then each's second character, then third, and fourth, 642 00:28:11,770 --> 00:28:15,206 and fifth, until we hit what special sentinel value? 643 00:28:15,206 --> 00:28:16,080 AUDIENCE: [INAUDIBLE] 644 00:28:16,080 --> 00:28:18,800 DAVID J. MALAN: Yeah, the backslash zero, at which point in either string 645 00:28:18,800 --> 00:28:20,100 we can decide that's it. 646 00:28:20,100 --> 00:28:21,970 Have we matched every single character? 647 00:28:21,970 --> 00:28:22,990 If not, return false. 648 00:28:22,990 --> 00:28:24,770 If so, return true. 649 00:28:24,770 --> 00:28:28,800 And so that's exactly what this version of the program compare-1.c does. 650 00:28:28,800 --> 00:28:31,677 It is identical to what we looked at Monday except that I've 651 00:28:31,677 --> 00:28:34,760 gotten rid of the word string-- though that has no functional impact-- all 652 00:28:34,760 --> 00:28:37,450 I'm doing now is removing some visual training wheels, 653 00:28:37,450 --> 00:28:40,880 but to see clearly that s and t are addresses. 654 00:28:40,880 --> 00:28:43,020 And that's what the star, the asterisk, represents 655 00:28:43,020 --> 00:28:46,690 is an address, otherwise known more technically as a pointer. 656 00:28:46,690 --> 00:28:49,880 >> So when I declare s on line 9 and say char* s, 657 00:28:49,880 --> 00:28:52,160 that doesn't mean give me a string. 658 00:28:52,160 --> 00:28:56,360 That means give me a variable whose purpose in life is to store an address. 659 00:28:56,360 --> 00:29:00,400 Because I am about to put the address of a string into it. 660 00:29:00,400 --> 00:29:03,500 And indeed, GetString, to be clear, does not return a string. 661 00:29:03,500 --> 00:29:06,110 It does not return mom backslash zero, per se. 662 00:29:06,110 --> 00:29:10,005 What does GetString specifically and precisely return? 663 00:29:10,005 --> 00:29:10,880 AUDIENCE: [INAUDIBLE] 664 00:29:10,880 --> 00:29:14,080 DAVID J. MALAN: An address, the address of the first character 665 00:29:14,080 --> 00:29:16,070 in some string it has gotten. 666 00:29:16,070 --> 00:29:19,250 And so now we're seeing a special keyword again. 667 00:29:19,250 --> 00:29:20,640 And, I alluded to this earlier. 668 00:29:20,640 --> 00:29:23,620 This is going to be good convention that we'll see again and again now. 669 00:29:23,620 --> 00:29:27,540 I'm checking to make sure that s is not null and t is not null. 670 00:29:27,540 --> 00:29:30,100 Because based on my really quick mention earlier, 671 00:29:30,100 --> 00:29:35,510 what might mean if GetString returns not an address but N-U-L-L, which is again, 672 00:29:35,510 --> 00:29:36,990 some special value? 673 00:29:36,990 --> 00:29:37,890 >> AUDIENCE: Error. 674 00:29:37,890 --> 00:29:38,600 >> DAVID J. MALAN: It's an error. 675 00:29:38,600 --> 00:29:39,550 Something went wrong. 676 00:29:39,550 --> 00:29:41,341 And what typically might happen, especially 677 00:29:41,341 --> 00:29:45,162 with strings-- which might be of unknown length in advance-- 678 00:29:45,162 --> 00:29:46,870 maybe the computers' out of memory, maybe 679 00:29:46,870 --> 00:29:49,280 you typed in such a long word or sentence 680 00:29:49,280 --> 00:29:51,880 or pasted such a huge essay there's just not enough memory. 681 00:29:51,880 --> 00:29:55,340 And so GetString can't return the address of the whole thing, 682 00:29:55,340 --> 00:29:56,620 so it just returns nothing. 683 00:29:56,620 --> 00:30:00,580 And it says an error has happened by returning the special NULL value. 684 00:30:00,580 --> 00:30:02,890 It's the zero address, so to speak. 685 00:30:02,890 --> 00:30:06,157 >> Now it turns out C comes with a function that does that iteration. 686 00:30:06,157 --> 00:30:09,240 We don't have to implement this with a for loop or a while loop ourselves. 687 00:30:09,240 --> 00:30:11,150 We can use a function, called succinctly, 688 00:30:11,150 --> 00:30:15,400 stir comp, or string compare, whose purpose in life is to do exactly that. 689 00:30:15,400 --> 00:30:19,990 You give it two pointers, two addresses, and it will go to those addresses 690 00:30:19,990 --> 00:30:23,130 and then compare letter for letter for letter for quality, 691 00:30:23,130 --> 00:30:26,610 stopping only when what is true? 692 00:30:26,610 --> 00:30:31,540 When intuitively should stir comp stop iterating, just to be clear? 693 00:30:31,540 --> 00:30:35,400 When it hits a backslash 0 in either string, at which point it can decide 694 00:30:35,400 --> 00:30:38,910 has everything matched, or has there been a discrepancy? 695 00:30:38,910 --> 00:30:42,740 >> So, if we run this now and try our little capitalization game, 696 00:30:42,740 --> 00:30:49,260 so make compare-1, ./compare-1, and type mom in lowercase both times. 697 00:30:49,260 --> 00:30:50,560 Now it's the same thing. 698 00:30:50,560 --> 00:30:54,080 And if I do it again with lowercase and then maybe uppercase. 699 00:30:54,080 --> 00:30:56,720 Now it indeed distinguishes between upper and lowercase. 700 00:30:56,720 --> 00:31:00,440 So not all that hard or magical, but it does now explain 701 00:31:00,440 --> 00:31:03,140 what's going on underneath the hood. 702 00:31:03,140 --> 00:31:07,640 >> So what more can we extract from this kind of lesson? 703 00:31:07,640 --> 00:31:08,980 So let's take a look at this. 704 00:31:08,980 --> 00:31:15,380 I'm going to go ahead and write a quick program here called copy-0. 705 00:31:15,380 --> 00:31:21,594 And now let's go ahead and actually let's do this-- with copy-0, 706 00:31:21,594 --> 00:31:23,010 take a look at what I've got here. 707 00:31:23,010 --> 00:31:24,712 I first tell the user, say something. 708 00:31:24,712 --> 00:31:26,420 Then I get a string and I stored it in s. 709 00:31:26,420 --> 00:31:29,810 Then I check if s equals equals NULL, just return 1. 710 00:31:29,810 --> 00:31:31,590 So this is just standard error checking. 711 00:31:31,590 --> 00:31:33,112 Nothing interesting has happened. 712 00:31:33,112 --> 00:31:36,320 And in fact, if we get rid of the error checking, this looks like week 1 code 713 00:31:36,320 --> 00:31:36,985 at the moment. 714 00:31:36,985 --> 00:31:39,110 But I've started to get a little better about that. 715 00:31:39,110 --> 00:31:43,340 >> Now in line 16, a week ago, maybe even a couple days or minutes ago, 716 00:31:43,340 --> 00:31:46,720 you might say line 16 is creating a variable called t 717 00:31:46,720 --> 00:31:48,219 and copying s into it. 718 00:31:48,219 --> 00:31:50,010 And that's a perfectly reasonable takeaway. 719 00:31:50,010 --> 00:31:51,560 But be more precise now. 720 00:31:51,560 --> 00:31:54,190 What is happening in line 16? 721 00:31:54,190 --> 00:31:56,170 What is getting copied from right to left? 722 00:31:56,170 --> 00:31:56,669 Yeah? 723 00:31:56,669 --> 00:31:58,490 AUDIENCE: Is t getting an address of s? 724 00:31:58,490 --> 00:32:01,220 >> DAVID J. MALAN: Exactly, t is getting the address of s. 725 00:32:01,220 --> 00:32:05,170 So to be clear now, if I go back to that earlier example 726 00:32:05,170 --> 00:32:08,520 and I draw out the thing I've typed in. 727 00:32:08,520 --> 00:32:11,640 And what I've typed in-- here's s, and here 728 00:32:11,640 --> 00:32:15,830 is what I've typed in somewhere in memory, mom and then a backslash 729 00:32:15,830 --> 00:32:17,840 0 that's added for me. 730 00:32:17,840 --> 00:32:23,060 What I stored in here, recall, this is at location 1, 2, 3, 4, 731 00:32:23,060 --> 00:32:24,655 this is what's currently in s. 732 00:32:24,655 --> 00:32:29,220 So if on line 16, I say give me another variable called t and store 733 00:32:29,220 --> 00:32:33,590 in at the value of s, what gets stored here will not mom 734 00:32:33,590 --> 00:32:35,480 but rather just the number 1. 735 00:32:35,480 --> 00:32:38,520 >> So if we look ahead in this program now, what's going to happen? 736 00:32:38,520 --> 00:32:40,690 So notice that there's this function you might 737 00:32:40,690 --> 00:32:44,410 have used this some time ago for Caesar, or Vigenere, or maybe not at all. 738 00:32:44,410 --> 00:32:48,170 I claim with my printf, I'm going to capitalize the copy t. 739 00:32:48,170 --> 00:32:51,616 First in line 19, quick sanity check, strlen checks the length of t. 740 00:32:51,616 --> 00:32:53,740 Because I don't want to try to capitalize something 741 00:32:53,740 --> 00:32:55,104 if there's no string there. 742 00:32:55,104 --> 00:32:57,520 If the user just hit Enter, there's nothing to capitalize. 743 00:32:57,520 --> 00:33:01,100 So I don't want to do line 21. 744 00:33:01,100 --> 00:33:05,758 So line 21 is capitalizing which letter, apparently, in t? 745 00:33:05,758 --> 00:33:06,514 >> AUDIENCE: m? 746 00:33:06,514 --> 00:33:08,722 DAVID J. MALAN: It looks like it's copying which one? 747 00:33:08,722 --> 00:33:09,486 AUDIENCE: m. 748 00:33:09,486 --> 00:33:10,450 DAVID J. MALAN: Uh, m. 749 00:33:10,450 --> 00:33:12,685 OK, so the first m, because notice that I'm 750 00:33:12,685 --> 00:33:14,935 passing to toupper, which if you've never seen it it's 751 00:33:14,935 --> 00:33:16,980 just a function to capitalize as its input. 752 00:33:16,980 --> 00:33:20,240 t bracket zero means give me the zero character of t. 753 00:33:20,240 --> 00:33:22,550 And so how does this picture change, to be clear? 754 00:33:22,550 --> 00:33:25,490 755 00:33:25,490 --> 00:33:29,160 What needs to get rewritten or changed with respect to s and t and mom 756 00:33:29,160 --> 00:33:30,097 backslash zero. 757 00:33:30,097 --> 00:33:31,470 >> AUDIENCE: [INAUDIBLE] 758 00:33:31,470 --> 00:33:34,030 >> DAVID J. MALAN: Yeah, so this one here simply 759 00:33:34,030 --> 00:33:40,860 needs to get changed to-- fix this-- needs to get changed to a capital m. 760 00:33:40,860 --> 00:33:44,330 But now, look later in the program, if I print out 761 00:33:44,330 --> 00:33:49,800 s and t as I clean here, watch what's going to happen printing out s and t. 762 00:33:49,800 --> 00:33:54,310 So make copy-0, ./copy-0. 763 00:33:54,310 --> 00:33:57,140 Let me go ahead and type in mom in all lowercase. 764 00:33:57,140 --> 00:34:00,140 Notice both the original and the copy have been capitalized. 765 00:34:00,140 --> 00:34:00,850 Why? 766 00:34:00,850 --> 00:34:04,431 Well, s and t are both pointing to, if you will, the same chunk of memory. 767 00:34:04,431 --> 00:34:06,930 And frankly, this is getting really uninteresting-- the fact 768 00:34:06,930 --> 00:34:09,150 that we're using address zero here. 769 00:34:09,150 --> 00:34:11,719 I mean, I don't really care where stuff is in memory. 770 00:34:11,719 --> 00:34:13,550 Sorry I'm erasing a little too much. 771 00:34:13,550 --> 00:34:15,674 But I don't really care where things are in memory. 772 00:34:15,674 --> 00:34:18,510 And so, indeed what programmers tend to think about 773 00:34:18,510 --> 00:34:21,080 is that when you talk about an address, or a pointer, 774 00:34:21,080 --> 00:34:22,679 who cares where it is in memory. 775 00:34:22,679 --> 00:34:24,989 I don't care if it's at byte one or one billion. 776 00:34:24,989 --> 00:34:27,920 I just care that this variable is effectively 777 00:34:27,920 --> 00:34:29,620 pointing at that chunk of memory. 778 00:34:29,620 --> 00:34:33,350 And so, henceforth, rather than quibble over arbitrary memory addresses, let's 779 00:34:33,350 --> 00:34:36,710 just start to draw pointers as pointers, as arrows. 780 00:34:36,710 --> 00:34:39,340 So what s and t really are, according to this program, 781 00:34:39,340 --> 00:34:42,130 because of how I created t, it's just two separate variables 782 00:34:42,130 --> 00:34:43,840 pointing at the same chunk of memory. 783 00:34:43,840 --> 00:34:45,215 And we don't care where they are. 784 00:34:45,215 --> 00:34:47,130 So we can abstract away that detail. 785 00:34:47,130 --> 00:34:48,780 >> So how do I fix this? 786 00:34:48,780 --> 00:34:54,120 If I want to write a version of the copy program that actually copies the string 787 00:34:54,120 --> 00:34:56,840 and capitalizes only the copy, just intuitively, 788 00:34:56,840 --> 00:34:59,766 what's got to be an ingredient to our solution? 789 00:34:59,766 --> 00:35:00,640 AUDIENCE: [INAUDIBLE] 790 00:35:00,640 --> 00:35:01,420 DAVID J. MALAN: We need a what? 791 00:35:01,420 --> 00:35:01,820 AUDIENCE: Chunk of memory. 792 00:35:01,820 --> 00:35:03,280 DAVID J. MALAN: We need another chunk of memory, right? 793 00:35:03,280 --> 00:35:05,360 We don't know how to do it yet, necessarily. 794 00:35:05,360 --> 00:35:11,330 But I kind of need this to happen so that the original mom in lower case 795 00:35:11,330 --> 00:35:14,170 ends up in that extra chunk of memory. 796 00:35:14,170 --> 00:35:19,770 And then when I change the copy, I don't want to change this copy here. 797 00:35:19,770 --> 00:35:26,020 I instead want to change only this copy so that the original is unchanged. 798 00:35:26,020 --> 00:35:27,980 >> So, let's see how we might do this. 799 00:35:27,980 --> 00:35:31,800 In copy-1, which has already been stripped of comment, 800 00:35:31,800 --> 00:35:33,250 but is commented online. 801 00:35:33,250 --> 00:35:36,710 We instead do the following-- these lines are identical, get me a string 802 00:35:36,710 --> 00:35:38,340 and call it s. 803 00:35:38,340 --> 00:35:43,500 But now let's look at one of our most complex but the last of the complexity 804 00:35:43,500 --> 00:35:47,340 for awhile, line 16 does exactly this. 805 00:35:47,340 --> 00:35:49,400 So if your comfy with the picture we just drew-- 806 00:35:49,400 --> 00:35:51,790 give me a new chunk of memory, copy everything into it, 807 00:35:51,790 --> 00:35:53,730 let's see how we translate that to code. 808 00:35:53,730 --> 00:35:59,400 >> So line 16, on the left hand side, char* t gives me this box over here. 809 00:35:59,400 --> 00:36:00,230 That's all it does. 810 00:36:00,230 --> 00:36:03,240 On the right hand side, m alloc, or malloc, 811 00:36:03,240 --> 00:36:06,480 is memory allocation, super fancy, a cryptic way of just saying 812 00:36:06,480 --> 00:36:07,640 give me a chunk of memory. 813 00:36:07,640 --> 00:36:09,290 How much memory do we need? 814 00:36:09,290 --> 00:36:10,910 Well, is kind of a big expression. 815 00:36:10,910 --> 00:36:12,570 But let's see what it says here. 816 00:36:12,570 --> 00:36:15,940 So this, of course, is give me the string length of s. 817 00:36:15,940 --> 00:36:19,094 So, mom it should be what? 818 00:36:19,094 --> 00:36:21,010 So just three, right? mom is three characters. 819 00:36:21,010 --> 00:36:22,830 You don't count the backslash zero when you 820 00:36:22,830 --> 00:36:25,960 talk about the length of a string it's actually the human visible letters. 821 00:36:25,960 --> 00:36:28,020 So mom, so this gives me 3. 822 00:36:28,020 --> 00:36:31,170 But wait a minute, I'm now adding 1. 823 00:36:31,170 --> 00:36:34,861 Why do I actually want to allocate 4 bytes and not just 3? 824 00:36:34,861 --> 00:36:35,360 Yeah? 825 00:36:35,360 --> 00:36:36,910 >> AUDIENCE: For the sentinel value? 826 00:36:36,910 --> 00:36:38,951 >> DAVID J. MALAN: Exactly, for that sentinel value. 827 00:36:38,951 --> 00:36:40,840 For the backslash zero, I need 4 bytes total. 828 00:36:40,840 --> 00:36:42,870 So I need the length of the string plus 1. 829 00:36:42,870 --> 00:36:45,400 And then just for good measure-- even though on this system, 830 00:36:45,400 --> 00:36:49,390 it's always going to be 1-- I'm saying multiply this by the size of a char. 831 00:36:49,390 --> 00:36:51,552 Turns out sizeof is an operator in C that 832 00:36:51,552 --> 00:36:53,260 just tells you the number of bytes that's 833 00:36:53,260 --> 00:36:54,700 required for a certain data type. 834 00:36:54,700 --> 00:36:57,740 It doesn't work for arrays, typically, sometimes it does. 835 00:36:57,740 --> 00:36:59,210 But in the general case, no. 836 00:36:59,210 --> 00:37:02,330 But it will tell me how many bytes a char is, which turns out is always 1. 837 00:37:02,330 --> 00:37:04,080 So this is like multiplying by 1. 838 00:37:04,080 --> 00:37:05,900 >> So super cryptic looking line of code. 839 00:37:05,900 --> 00:37:09,320 But all it does is gives me a chunk of memory. 840 00:37:09,320 --> 00:37:13,590 But does it seem to be copying anything into that memory? 841 00:37:13,590 --> 00:37:14,560 Not yet. 842 00:37:14,560 --> 00:37:22,040 And so what do I on line 22, and 23, 24, 25, well, I simply do this. 843 00:37:22,040 --> 00:37:23,760 And this is sort of old school stuff now. 844 00:37:23,760 --> 00:37:26,010 This is like PSet 2, where you're just moving things 845 00:37:26,010 --> 00:37:28,620 around in memory, or rather in strings. 846 00:37:28,620 --> 00:37:31,920 >> So I'm iterating from 0 to the length of the string s. 847 00:37:31,920 --> 00:37:37,820 And I'm copying the i-th character in s into the i-th character in t. 848 00:37:37,820 --> 00:37:41,820 And because I, the programmer, made sure to allocate exactly as many bytes 849 00:37:41,820 --> 00:37:44,600 as I need, it's perfect one-to-one relationship. 850 00:37:44,600 --> 00:37:47,060 And I copy mom in lowercase to the new one. 851 00:37:47,060 --> 00:37:50,170 And then lastly, I do this line. 852 00:37:50,170 --> 00:37:54,637 And so the effect is only to capitalize this t here. 853 00:37:54,637 --> 00:37:56,470 So a lot to absorb, but if you just consider 854 00:37:56,470 --> 00:37:58,220 what's really going on underneath the hood 855 00:37:58,220 --> 00:38:00,880 is just moving these bytes around, all that 856 00:38:00,880 --> 00:38:06,617 is needed to solve this problem is just to give us this chunk of memory. 857 00:38:06,617 --> 00:38:08,450 Now at the risk of overwhelming, let me show 858 00:38:08,450 --> 00:38:13,200 one other example that's almost identical, except for this one 859 00:38:13,200 --> 00:38:14,350 line of code. 860 00:38:14,350 --> 00:38:18,870 So this is the hacker version of this program, if you will. 861 00:38:18,870 --> 00:38:21,050 But let's just distill it into what's going on. 862 00:38:21,050 --> 00:38:28,920 Line 24 used to be this t bracket i gets s bracket i. 863 00:38:28,920 --> 00:38:33,370 Now, I'm changing this to the much more cryptic star t 864 00:38:33,370 --> 00:38:36,280 plus 1 equals star s plus 1. 865 00:38:36,280 --> 00:38:38,702 >> So what's happening and why do we have a star character? 866 00:38:38,702 --> 00:38:41,410 We've seen the star before, and it's being used differently here. 867 00:38:41,410 --> 00:38:45,490 We previously saw char*, now I'm seeing a star at the beginning, and that's OK. 868 00:38:45,490 --> 00:38:48,190 Because it turns out we can kind of infer just 869 00:38:48,190 --> 00:38:50,280 from those first principles what's going on. 870 00:38:50,280 --> 00:38:53,860 So just to be clear, what is s? 871 00:38:53,860 --> 00:38:55,052 Last week, it was a string. 872 00:38:55,052 --> 00:38:56,260 That doesn't suffice anymore. 873 00:38:56,260 --> 00:38:57,690 What is s, specifically? 874 00:38:57,690 --> 00:38:58,590 >> AUDIENCE: [INAUDIBLE] 875 00:38:58,590 --> 00:38:59,881 >> DAVID J. MALAN: It's a pointer. 876 00:38:59,881 --> 00:39:02,610 It's the address of the first character we typed in. 877 00:39:02,610 --> 00:39:04,780 OK, what is t? 878 00:39:04,780 --> 00:39:05,660 >> AUDIENCE: [INAUDIBLE] 879 00:39:05,660 --> 00:39:07,950 >> DAVID J. MALAN: The address of the first byte 880 00:39:07,950 --> 00:39:10,490 in t, that chunk of memory reallocated. 881 00:39:10,490 --> 00:39:14,720 So it turns out that when we iterate from 0 on up to the string 882 00:39:14,720 --> 00:39:17,424 length-- first of all, i starts off at 0, because 883 00:39:17,424 --> 00:39:18,840 of this old school for loop thing. 884 00:39:18,840 --> 00:39:22,400 So just for simplicity, let's assume that the first line of code 885 00:39:22,400 --> 00:39:23,760 is really just this, right. 886 00:39:23,760 --> 00:39:26,080 If i is zero, adding zero to something presumably 887 00:39:26,080 --> 00:39:27,540 is not going to have an effect. 888 00:39:27,540 --> 00:39:28,560 >> So what is this saying? 889 00:39:28,560 --> 00:39:31,600 It turns out that the star operator in this context 890 00:39:31,600 --> 00:39:33,700 is the dereference operator, which is just 891 00:39:33,700 --> 00:39:37,530 a fancy way of saying go to the following address. 892 00:39:37,530 --> 00:39:42,080 So if s is the address of the first character in this chunk of memory, 893 00:39:42,080 --> 00:39:43,630 *s means go there. 894 00:39:43,630 --> 00:39:45,630 And because we've drawn the picture in this way, 895 00:39:45,630 --> 00:39:47,430 you can adopt the following mental model. 896 00:39:47,430 --> 00:39:51,030 If this is s, and you say *s, *s kind of like chutes and ladders, 897 00:39:51,030 --> 00:39:54,540 if you remember the game from childhood, is like follow that arrow and go 898 00:39:54,540 --> 00:39:55,570 to the address. 899 00:39:55,570 --> 00:39:57,080 >> *t is the same thing. 900 00:39:57,080 --> 00:39:59,855 So start here, go to its chunk. 901 00:39:59,855 --> 00:40:03,350 I can't just draw on this screen that way. 902 00:40:03,350 --> 00:40:05,560 *t means to go here. 903 00:40:05,560 --> 00:40:08,830 And then, the for loop is just saying move this character here, 904 00:40:08,830 --> 00:40:11,330 move this character here, move this character here. 905 00:40:11,330 --> 00:40:12,890 But how do I do that incrementation? 906 00:40:12,890 --> 00:40:15,430 I need to undo what I just deleted. 907 00:40:15,430 --> 00:40:18,140 This is what's generally called pointer arithmetic, which 908 00:40:18,140 --> 00:40:20,040 means math with addresses. 909 00:40:20,040 --> 00:40:22,460 >> If, in this for loop, I keep incrementing i, 910 00:40:22,460 --> 00:40:26,880 and s is an address and t is an address, if I just keep adding 1, 911 00:40:26,880 --> 00:40:31,406 that just means keep moving forward, and forward, and forward in the memory. 912 00:40:31,406 --> 00:40:34,030 It's like Oxford Street, the street that the CS building is on. 913 00:40:34,030 --> 00:40:36,490 The CS buildings is at 33 Oxford Street. 914 00:40:36,490 --> 00:40:39,870 So if you were to do 33 Oxford Street plus 1, 915 00:40:39,870 --> 00:40:42,870 that brings you to 34 Oxford Street, then 35 Oxford Street, 916 00:40:42,870 --> 00:40:46,380 then 36 Oxford Street, whatever those buildings actually are--if they exist. 917 00:40:46,380 --> 00:40:50,540 And so, that's all we're doing here with pointer arithmetic. 918 00:40:50,540 --> 00:40:53,820 >> So it's a super arcane way of expressing ourselves. 919 00:40:53,820 --> 00:40:56,160 But all that's happening underneath the hood 920 00:40:56,160 --> 00:40:59,330 is just following these addresses, like following a map, if you will, 921 00:40:59,330 --> 00:41:02,692 or following arrows like we've drawn on the screen. 922 00:41:02,692 --> 00:41:04,910 OK, a lot to digest. 923 00:41:04,910 --> 00:41:10,410 Any question on syntax, concepts, pointers, malloc, or the like. 924 00:41:10,410 --> 00:41:11,480 Yeah, over here first. 925 00:41:11,480 --> 00:41:13,755 >> AUDIENCE: So where that says *t equals toupper *t, 926 00:41:13,755 --> 00:41:15,575 is that going to capitalize all the letters or just-- 927 00:41:15,575 --> 00:41:17,283 >> DAVID J. MALAN: Ah, really good question. 928 00:41:17,283 --> 00:41:19,805 So in this line here, 31, is this going to capitalize 929 00:41:19,805 --> 00:41:21,430 the first letter or all of the letters. 930 00:41:21,430 --> 00:41:23,460 So let's answer that by going back to first principles. 931 00:41:23,460 --> 00:41:26,168 And first principles here I mean just go to the basic definitions 932 00:41:26,168 --> 00:41:27,000 of what's involved. 933 00:41:27,000 --> 00:41:29,770 So toupper's a function that capitalizes a char. 934 00:41:29,770 --> 00:41:30,530 That's all. 935 00:41:30,530 --> 00:41:36,740 *t means go to the first-- go to the address in t. 936 00:41:36,740 --> 00:41:40,350 So, in the picture, if this is the chunk of memory we allocated with malloc, 937 00:41:40,350 --> 00:41:43,310 and this is t, *t means go here. 938 00:41:43,310 --> 00:41:46,710 >> Meanwhile, you're passing that value, lowercase m 939 00:41:46,710 --> 00:41:50,040 to toupper, you're getting back capital M, where are you putting it? 940 00:41:50,040 --> 00:41:52,410 You're putting it in that same location. 941 00:41:52,410 --> 00:41:55,540 And so by that logic of those basic definitions it's only 942 00:41:55,540 --> 00:41:58,792 capitalizing the first letter unless you iterate with i or a 943 00:41:58,792 --> 00:42:02,000 for loop or a while loop, it's not going to do anything more than you ask it. 944 00:42:02,000 --> 00:42:02,583 Good question. 945 00:42:02,583 --> 00:42:03,237 Yeah? 946 00:42:03,237 --> 00:42:05,369 >> AUDIENCE: Why did you use the dereference method rather than 947 00:42:05,369 --> 00:42:05,979 the array? 948 00:42:05,979 --> 00:42:07,395 >> DAVID J. MALAN: Ah, good question. 949 00:42:07,395 --> 00:42:10,672 Why would you use the dereference method instead of the array method? 950 00:42:10,672 --> 00:42:12,130 No particular reason, to be honest. 951 00:42:12,130 --> 00:42:15,290 And, in fact, for this kind of example, right, 952 00:42:15,290 --> 00:42:17,556 I'm just arguing making the program more complicated, 953 00:42:17,556 --> 00:42:19,680 more eyes are glazing over, people are checking out 954 00:42:19,680 --> 00:42:22,830 because this looks super arcane, but even though it's doing the same thing. 955 00:42:22,830 --> 00:42:26,695 And so, frankly, this is an unnecessarily visually complex solution 956 00:42:26,695 --> 00:42:27,320 to the problem. 957 00:42:27,320 --> 00:42:29,580 >> It's still good design, five out of five for design, 958 00:42:29,580 --> 00:42:33,140 whether it's in the bracket notation or the pointer notation. 959 00:42:33,140 --> 00:42:36,299 But-- especially when we get later in the course in PSet 5 960 00:42:36,299 --> 00:42:39,340 when we implement that dictionary that I've mentioned a couple of times-- 961 00:42:39,340 --> 00:42:42,300 we'll actually care about the low level memory addresses 962 00:42:42,300 --> 00:42:44,140 that we really understand what's going on. 963 00:42:44,140 --> 00:42:48,300 >> But, for now, it turns out that this line of code here square brackets 964 00:42:48,300 --> 00:42:49,900 don't really exist. 965 00:42:49,900 --> 00:42:52,230 They are what's called syntactic sugar, which 966 00:42:52,230 --> 00:42:58,390 is just a weirdly cool way of saying the compiler converts square brackets to be 967 00:42:58,390 --> 00:43:00,420 that mathematical expression. 968 00:43:00,420 --> 00:43:02,660 So it's a human convention to be able to just write 969 00:43:02,660 --> 00:43:04,220 these very user-friendly brackets. 970 00:43:04,220 --> 00:43:06,850 But what the compiler, clang, is really doing any time 971 00:43:06,850 --> 00:43:10,970 you write what's highlighted in line 24, underneath the hood it's really 972 00:43:10,970 --> 00:43:12,330 converting it to this. 973 00:43:12,330 --> 00:43:16,200 It's just more pleasurable as a human to read and write code like line 24. 974 00:43:16,200 --> 00:43:18,530 But eventually those training wheels too come off 975 00:43:18,530 --> 00:43:21,780 when one's own comfort gets stronger. 976 00:43:21,780 --> 00:43:27,240 >> All right, so recall then that this was the sort of biggest problem 977 00:43:27,240 --> 00:43:27,807 we ran into. 978 00:43:27,807 --> 00:43:30,640 And that's what sparked this whole damn conversation about pointers, 979 00:43:30,640 --> 00:43:32,340 and addresses, and copying things. 980 00:43:32,340 --> 00:43:35,410 It was because we tripped over this stupid, stupid issue, whereby 981 00:43:35,410 --> 00:43:38,830 I implemented logically-- with Lauren up here on the demo and the orange juice 982 00:43:38,830 --> 00:43:43,770 in the milk-- a perfectly algorithmically correct function 983 00:43:43,770 --> 00:43:47,010 for swapping two variables' values, but the damn thing 984 00:43:47,010 --> 00:43:50,550 didn't have any persistent, or permanent, effect on my code. 985 00:43:50,550 --> 00:43:51,820 >> And why was that? 986 00:43:51,820 --> 00:43:54,650 In a nutshell, why is this implementation of swap 987 00:43:54,650 --> 00:43:58,740 logically correct, but has no impact on the variables that are passed to it, 988 00:43:58,740 --> 00:44:01,119 like x and y for main? 989 00:44:01,119 --> 00:44:02,410 What was the gist of the issue? 990 00:44:02,410 --> 00:44:02,909 Yeah? 991 00:44:02,909 --> 00:44:05,532 AUDIENCE: Because variable made copies of variable in the pass 992 00:44:05,532 --> 00:44:06,240 through function. 993 00:44:06,240 --> 00:44:09,060 >> DAVID J. MALAN: Exactly, when you pass variables into a function, or arguments 994 00:44:09,060 --> 00:44:11,030 into a function, they're passed by copy, which 995 00:44:11,030 --> 00:44:14,770 means you get an identical looking pattern of bits for both x and y, 996 00:44:14,770 --> 00:44:15,955 called here a and b. 997 00:44:15,955 --> 00:44:18,080 And you can do anything you want with those copies, 998 00:44:18,080 --> 00:44:20,657 but they're going to have no effect on the calling function. 999 00:44:20,657 --> 00:44:22,990 And, in fact, we drew that picture on the screen, recall 1000 00:44:22,990 --> 00:44:25,520 last time, whereby if you really think about what's 1001 00:44:25,520 --> 00:44:28,570 going on underneath the hood-- if this is your computer's memory, 1002 00:44:28,570 --> 00:44:31,650 and down here is the chunk of memory being used for main, 1003 00:44:31,650 --> 00:44:34,020 this is the chunk of memory being used for swap, 1004 00:44:34,020 --> 00:44:37,090 and so even if main has two variables, x and y, 1005 00:44:37,090 --> 00:44:41,840 swap might have identical looking values, both of which are 1 and 2, 1006 00:44:41,840 --> 00:44:44,520 but they're completely different chunks of memory. 1007 00:44:44,520 --> 00:44:46,130 >> So we need a solution to this. 1008 00:44:46,130 --> 00:44:51,580 And frankly, it would seem that we now have a solution to this problem, right. 1009 00:44:51,580 --> 00:44:55,760 If we now have the ability to manipulate things by way of addresses 1010 00:44:55,760 --> 00:44:59,310 and, sort of chutes and ladders style, follow these arrows 1011 00:44:59,310 --> 00:45:02,820 and go anywhere we want in memory, couldn't we 1012 00:45:02,820 --> 00:45:06,220 solve this problem by passing from main to swap 1013 00:45:06,220 --> 00:45:09,650 not the values we want to swap, but just intuitively 1014 00:45:09,650 --> 00:45:11,630 what could we pass to swap instead? 1015 00:45:11,630 --> 00:45:12,620 >> [INTERPOSING VOICES] 1016 00:45:12,620 --> 00:45:15,244 >> DAVID J. MALAN: Why don't we just pass it the addresses, right? 1017 00:45:15,244 --> 00:45:17,470 Why don't we give swap a treasure map, if you will, 1018 00:45:17,470 --> 00:45:20,950 that leads it to the actual values x and y. 1019 00:45:20,950 --> 00:45:24,340 Let's swap, actually change those original bits, rather than 1020 00:45:24,340 --> 00:45:26,797 just passing copies of the bits. 1021 00:45:26,797 --> 00:45:29,130 And so, in fact, that's what's going to be the solution. 1022 00:45:29,130 --> 00:45:31,899 This version here is clearly bad and flawed. 1023 00:45:31,899 --> 00:45:35,190 And now, at first glance, it just looks like we added a bunch of stars randomly 1024 00:45:35,190 --> 00:45:37,106 and crossed our fingers that it would compile. 1025 00:45:37,106 --> 00:45:38,460 But, it would now compile. 1026 00:45:38,460 --> 00:45:40,090 >> But let's see what these things mean. 1027 00:45:40,090 --> 00:45:43,990 And, unfortunately, the authors of C could have chosen another symbol 1028 00:45:43,990 --> 00:45:46,380 to make this a little clearer, but the star operator 1029 00:45:46,380 --> 00:45:48,610 has different meaning in two different contexts. 1030 00:45:48,610 --> 00:45:50,890 And we've seen both, but let's distinguish. 1031 00:45:50,890 --> 00:45:55,310 >> So up at the top there, when I have changed a and b 1032 00:45:55,310 --> 00:46:00,470 from being int's in the bad version to int stars, a and b, 1033 00:46:00,470 --> 00:46:01,740 previously, were integers. 1034 00:46:01,740 --> 00:46:05,752 What are a and b now in the good, green version? 1035 00:46:05,752 --> 00:46:06,900 They're addresses. 1036 00:46:06,900 --> 00:46:09,610 Addresses of what, to be clear? 1037 00:46:09,610 --> 00:46:10,770 Addresses of integers. 1038 00:46:10,770 --> 00:46:12,520 So the fact that I'm saying int star means 1039 00:46:12,520 --> 00:46:15,440 this is the address of an integer, specifically. 1040 00:46:15,440 --> 00:46:19,120 >> So now notice in the lines of code, something else has changed too. 1041 00:46:19,120 --> 00:46:22,770 tmp stays the same, because it's just the temporary integer, 1042 00:46:22,770 --> 00:46:24,110 no memory magic there. 1043 00:46:24,110 --> 00:46:26,370 But a now needs a star. 1044 00:46:26,370 --> 00:46:28,560 And, in fact, every other mention of a and b, 1045 00:46:28,560 --> 00:46:31,780 notice that all that's changing from red to green 1046 00:46:31,780 --> 00:46:34,209 is that I'm prefixing those variables with stars. 1047 00:46:34,209 --> 00:46:35,750 Because I don't want to copy a and b. 1048 00:46:35,750 --> 00:46:40,350 Because if I just copy a and b and swap a and b, what am I actually swapping? 1049 00:46:40,350 --> 00:46:43,760 Just addresses, I want to swap what's at those addresses. 1050 00:46:43,760 --> 00:46:44,860 I want to go there. 1051 00:46:44,860 --> 00:46:48,000 And so the star operator inside of my function, 1052 00:46:48,000 --> 00:46:51,700 not inside of the parameter list, means you go to those addresses 1053 00:46:51,700 --> 00:46:54,490 and actually change those values. 1054 00:46:54,490 --> 00:46:56,500 >> So what does the picture now look like instead. 1055 00:46:56,500 --> 00:47:03,250 Well, if instead I'm passing in for a and b not 1 and 2-- 1056 00:47:03,250 --> 00:47:05,790 I actually need to add one other definition here. 1057 00:47:05,790 --> 00:47:09,030 So suppose that this chunk of memory is at location 10. 1058 00:47:09,030 --> 00:47:12,960 >> This is at location 11, but this is a bit of a simplification, 1059 00:47:12,960 --> 00:47:18,900 I now have two choices do I pass x and y or do I pass their addresses? 1060 00:47:18,900 --> 00:47:22,500 If I pass their addresses like this, I just 1061 00:47:22,500 --> 00:47:25,390 now need to implement swap per the green code 1062 00:47:25,390 --> 00:47:29,080 so that when it sees a and when it sees b, it doesn't just copy a and b 1063 00:47:29,080 --> 00:47:30,540 and move the milk and orange juice. 1064 00:47:30,540 --> 00:47:32,664 The milk and orange juice metaphor now breaks down, 1065 00:47:32,664 --> 00:47:35,060 because those are cups of liquid and not maps. 1066 00:47:35,060 --> 00:47:37,750 We instead need to go to address 10 and we 1067 00:47:37,750 --> 00:47:42,420 need to go to address 11, and then perform that swapping logic. 1068 00:47:42,420 --> 00:47:45,580 >> So the logic is the same, but we need a slightly different way 1069 00:47:45,580 --> 00:47:47,160 of accessing those variables. 1070 00:47:47,160 --> 00:47:52,400 And so in the end, what the program has to look like is this. 1071 00:47:52,400 --> 00:47:56,610 In swap.c literally copied and pasted the green version. 1072 00:47:56,610 --> 00:47:58,450 But I need to make one change. 1073 00:47:58,450 --> 00:48:00,180 It's not sufficient just to change swap. 1074 00:48:00,180 --> 00:48:03,830 What other line of code do I need to change? 1075 00:48:03,830 --> 00:48:04,330 Yeah? 1076 00:48:04,330 --> 00:48:05,770 >> AUDIENCE: Where it takes the arguments. 1077 00:48:05,770 --> 00:48:07,603 >> DAVID J. MALAN: Where it takes its argument. 1078 00:48:07,603 --> 00:48:09,985 So if I scroll up to main, I can't just pass in x and y, 1079 00:48:09,985 --> 00:48:12,820 and, I promise, the last piece of new syntax today. 1080 00:48:12,820 --> 00:48:17,200 I need to pass in not x and y but the address of x and y. 1081 00:48:17,200 --> 00:48:20,400 And it turns out, the symbol that the authors of C chose 1082 00:48:20,400 --> 00:48:23,860 is if you use an ampersand here, not to be confused with the bitwise ampersand, 1083 00:48:23,860 --> 00:48:27,130 if you use an ampersand here and an ampersand here, 1084 00:48:27,130 --> 00:48:29,570 this figures out for you, what's the address of x, 1085 00:48:29,570 --> 00:48:31,740 maybe it's 10, what's the address of y, maybe it's 1086 00:48:31,740 --> 00:48:35,400 11, and passes those in instead. 1087 00:48:35,400 --> 00:48:37,210 >> So a lot to absorb all at once. 1088 00:48:37,210 --> 00:48:40,190 But let's see now quickly in our remaining four minutes 1089 00:48:40,190 --> 00:48:42,150 where things can go awry. 1090 00:48:42,150 --> 00:48:45,120 And as an aside, actually I took this picture, 1091 00:48:45,120 --> 00:48:46,920 TF took this picture a year or two ago. 1092 00:48:46,920 --> 00:48:49,190 So this is the back corner of Eliot Dining Hall. 1093 00:48:49,190 --> 00:48:52,310 Pointers are perhaps the hardest topic that we cover in CS50. 1094 00:48:52,310 --> 00:48:54,810 So if you worry the sort of slope is like maybe it's 1095 00:48:54,810 --> 00:48:56,770 more of a hockey stick like this, realize 1096 00:48:56,770 --> 00:49:00,160 we're kind of nearing a peak in terms of the conceptual complexity. 1097 00:49:00,160 --> 00:49:02,300 >> And I bring up this photo, because I swear 1098 00:49:02,300 --> 00:49:05,920 to god, in fall 1996, when I took CS50 with my teaching fellow, 1099 00:49:05,920 --> 00:49:09,620 Nishat Mehta, he sat me down in the corner of the Eliot D. Hall over lunch, 1100 00:49:09,620 --> 00:49:12,330 or dinner, or something to try to help me understand pointers. 1101 00:49:12,330 --> 00:49:16,520 And this is where I was weeks after it was introduced in lecture when 1102 00:49:16,520 --> 00:49:18,170 I finally understood pointers. 1103 00:49:18,170 --> 00:49:20,590 And I'm hopeful that this will click far sooner for you. 1104 00:49:20,590 --> 00:49:23,540 But realize this absolutely among the more sophisticated topics 1105 00:49:23,540 --> 00:49:24,420 we've looked at. 1106 00:49:24,420 --> 00:49:25,819 But it's among the most powerful. 1107 00:49:25,819 --> 00:49:28,860 And when you get it, it's really all just going to finally come together. 1108 00:49:28,860 --> 00:49:31,460 So rest assured it doesn't need to all sink in today. 1109 00:49:31,460 --> 00:49:32,980 >> So here's the last program we're going to look at. 1110 00:49:32,980 --> 00:49:35,605 And we're going to end with a quick three minutes of claymation 1111 00:49:35,605 --> 00:49:37,030 made by our friend, Nick Parlante. 1112 00:49:37,030 --> 00:49:41,440 Here's a program, that on the top two lines declares a variable x and y. 1113 00:49:41,440 --> 00:49:44,780 Both of which are addresses of integers, AKA pointers. 1114 00:49:44,780 --> 00:49:48,125 We then allocate enough memory to store an int 1115 00:49:48,125 --> 00:49:51,344 and store the address of that memory in x. 1116 00:49:51,344 --> 00:49:53,260 So, it's even simpler than the example before. 1117 00:49:53,260 --> 00:49:56,100 Give me four bytes of memory, that's the size of an int, 1118 00:49:56,100 --> 00:49:58,000 and put that address in x. 1119 00:49:58,000 --> 00:50:01,070 This line here means go to the address in x 1120 00:50:01,070 --> 00:50:05,270 and put the meaning of life, the number 42 there. 1121 00:50:05,270 --> 00:50:07,710 But this line worries me. 1122 00:50:07,710 --> 00:50:12,620 Star y means go to the address in y, and put the unlucky number 13 there. 1123 00:50:12,620 --> 00:50:15,780 Why is it dangerous, at this point in the story-- albeit rapidly told 1124 00:50:15,780 --> 00:50:17,980 in our waning minutes here-- why is it bad 1125 00:50:17,980 --> 00:50:19,660 for me to say, go to the address in y? 1126 00:50:19,660 --> 00:50:21,077 >> AUDIENCE: You haven't [INAUDIBLE]. 1127 00:50:21,077 --> 00:50:22,910 DAVID J. MALAN: I haven't put anything in y. 1128 00:50:22,910 --> 00:50:25,520 So what is the value of y, at this point in the story? 1129 00:50:25,520 --> 00:50:26,570 We have no idea. 1130 00:50:26,570 --> 00:50:29,190 It's some garbage value and nor does Binky know. 1131 00:50:29,190 --> 00:50:32,532 If we could end on this note. 1132 00:50:32,532 --> 00:50:34,832 >> [VIDEO PLAYBACK] 1133 00:50:34,832 --> 00:50:36,500 >> -Hey, Binky, wake up. 1134 00:50:36,500 --> 00:50:39,140 It's time for pointer fun. 1135 00:50:39,140 --> 00:50:40,210 >> -What's that? 1136 00:50:40,210 --> 00:50:41,690 Learn about pointers? 1137 00:50:41,690 --> 00:50:43,570 Oh, goody. 1138 00:50:43,570 --> 00:50:46,600 >> -Well, to get started, I guess we're going to need a couple pointers. 1139 00:50:46,600 --> 00:50:47,380 >> -OK. 1140 00:50:47,380 --> 00:50:51,120 This code allocates two pointers which can point to integers. 1141 00:50:51,120 --> 00:50:53,557 >> -OK, well I see the two pointers, but they 1142 00:50:53,557 --> 00:50:55,140 don't seem to be pointing to anything. 1143 00:50:55,140 --> 00:50:55,970 >> -That's right. 1144 00:50:55,970 --> 00:50:58,100 Initially pointers don't point to anything. 1145 00:50:58,100 --> 00:51:00,950 The things they point to are called pointees and setting them up 1146 00:51:00,950 --> 00:51:02,330 is a separate step. 1147 00:51:02,330 --> 00:51:03,210 >> -Oh, right, right. 1148 00:51:03,210 --> 00:51:03,940 I knew that. 1149 00:51:03,940 --> 00:51:05,730 The pointees are separate. 1150 00:51:05,730 --> 00:51:08,310 So how do you allocate a pointee? 1151 00:51:08,310 --> 00:51:11,960 >> -OK, well this code allocates a new integer pointee, 1152 00:51:11,960 --> 00:51:15,050 and this part sets x to point to it. 1153 00:51:15,050 --> 00:51:16,240 >> -Hey, that looks better. 1154 00:51:16,240 --> 00:51:17,743 So make it do something. 1155 00:51:17,743 --> 00:51:23,580 >> -OK, I'll dereference the pointer x to store the number 42 into its pointee. 1156 00:51:23,580 --> 00:51:27,130 For this trick, I'll need my magic wand of dereferencing. 1157 00:51:27,130 --> 00:51:30,200 >> -Your magic wand of dereferencing? 1158 00:51:30,200 --> 00:51:32,310 Uh, that, that's great. 1159 00:51:32,310 --> 00:51:34,270 >> -This is what the code looks like. 1160 00:51:34,270 --> 00:51:35,970 I'll just set up the number and-- 1161 00:51:35,970 --> 00:51:37,070 >> [POP SOUND] 1162 00:51:37,070 --> 00:51:39,140 >> -Hey, look there it goes. 1163 00:51:39,140 --> 00:51:43,980 So, doing a dereference on x follows the arrow to access its pointee. 1164 00:51:43,980 --> 00:51:46,150 In this case, to store 42 in there. 1165 00:51:46,150 --> 00:51:50,700 Hey, try using it to store the number 13 through the other pointer, y. 1166 00:51:50,700 --> 00:51:51,840 >> -OK. 1167 00:51:51,840 --> 00:51:56,270 I'll just go over here to y, and get the number 13 set up. 1168 00:51:56,270 --> 00:52:00,380 And then take the wand of dereferencing and just-- 1169 00:52:00,380 --> 00:52:01,646 >> [BUZZER SOUND] 1170 00:52:01,646 --> 00:52:04,080 >> -Oh, hey that didn't work. 1171 00:52:04,080 --> 00:52:06,470 Say, uh, Binky, I don't think dereferencing 1172 00:52:06,470 --> 00:52:10,850 y is a good idea, because setting up the pointee is a separate step. 1173 00:52:10,850 --> 00:52:12,480 And I don't think we ever did it. 1174 00:52:12,480 --> 00:52:14,620 >> -Hmm, good point. 1175 00:52:14,620 --> 00:52:19,810 >> -Yeah, we allocated the pointer, y, but we never set it to point to a pointee. 1176 00:52:19,810 --> 00:52:21,590 >> -Hmm, very observant. 1177 00:52:21,590 --> 00:52:23,215 -Hey, you're looking good there, Binky. 1178 00:52:23,215 --> 00:52:26,390 Can you fix it so that y points to the same pointee as x. 1179 00:52:26,390 --> 00:52:29,290 >> -Sure, I use my magic wand of pointer assignment. 1180 00:52:29,290 --> 00:52:31,970 >> -Is that going to be a problem, like before? 1181 00:52:31,970 --> 00:52:33,790 >> -No, this doesn't touch the pointees. 1182 00:52:33,790 --> 00:52:35,840 It just changes one pointer to point to the same thing-- 1183 00:52:35,840 --> 00:52:36,465 >> [POPPING SOUND] 1184 00:52:36,465 --> 00:52:37,450 --as another. 1185 00:52:37,450 --> 00:52:38,440 >> -Oh, I see. 1186 00:52:38,440 --> 00:52:41,200 Now y points to the same place as x. 1187 00:52:41,200 --> 00:52:42,950 So, wait, now y is fixed. 1188 00:52:42,950 --> 00:52:44,110 It has a pointee. 1189 00:52:44,110 --> 00:52:47,779 So you can try the wand of dereferencing again to send the 13 over. 1190 00:52:47,779 --> 00:52:51,110 >> -Oh, OK, here goes. 1191 00:52:51,110 --> 00:52:52,330 >> -Hey, look at that. 1192 00:52:52,330 --> 00:52:53,570 Now dereferencing works on y. 1193 00:52:53,570 --> 00:52:57,900 And because the pointers are sharing that one pointee, they both see the 13. 1194 00:52:57,900 --> 00:52:59,952 >> -Yeah, sharing, uh, whatever. 1195 00:52:59,952 --> 00:53:01,535 So, are we going to switch places now? 1196 00:53:01,535 --> 00:53:03,730 >> -Oh, look we're out of time. 1197 00:53:03,730 --> 00:53:04,660 >> -But-- 1198 00:53:04,660 --> 00:53:06,520 >> -Just remember the three pointer rules. 1199 00:53:06,520 --> 00:53:09,550 Number 1, the basic structure is that you have a pointer, 1200 00:53:09,550 --> 00:53:11,630 and it points over to a pointee. 1201 00:53:11,630 --> 00:53:13,740 But the pointer and pointee are separate. 1202 00:53:13,740 --> 00:53:15,620 And the common error is to set up a pointer 1203 00:53:15,620 --> 00:53:18,000 but to forget to give it a pointee. 1204 00:53:18,000 --> 00:53:21,170 >> Number 2, pointer dereferencing starts at the pointer 1205 00:53:21,170 --> 00:53:24,020 and follows its arrow over to access its pointee. 1206 00:53:24,020 --> 00:53:27,815 As we all know, this only works if there is a pointee, which kind of gets back 1207 00:53:27,815 --> 00:53:29,260 to rule number 1. 1208 00:53:29,260 --> 00:53:31,990 >> Number 3, pointer assignment takes one pointer 1209 00:53:31,990 --> 00:53:35,330 and changes it to point to the same pointee as another pointer. 1210 00:53:35,330 --> 00:53:37,150 So after the assignment, the two pointers 1211 00:53:37,150 --> 00:53:40,927 will point to the same pointee, sometimes that's called sharing. 1212 00:53:40,927 --> 00:53:42,510 And that's all there is to it, really. 1213 00:53:42,510 --> 00:53:43,130 Bye-bye now. 1214 00:53:43,130 --> 00:53:43,475 >> [END PLAYBACK] 1215 00:53:43,475 --> 00:53:44,830 >> DAVID J. MALAN: That's it for CS50. 1216 00:53:44,830 --> 00:53:46,246 Thanks to Professor Nick Parlante. 1217 00:53:46,246 --> 00:53:47,730 We'll see you next week. 1218 00:53:47,730 --> 00:53:51,706 1219 00:53:51,706 --> 00:53:56,435 >> [ELECTRONIC MUSIC PLAYING] 1220 00:53:56,435 --> 00:57:22,775