1 00:00:00,000 --> 00:00:01,996 2 00:00:01,996 --> 00:00:07,485 [MUSIC PLAYING] 3 00:00:07,485 --> 00:01:13,297 4 00:01:13,297 --> 00:01:14,380 DAVID J. MALAN: All right. 5 00:01:14,380 --> 00:01:15,940 This is CS50. 6 00:01:15,940 --> 00:01:17,260 And this is week four. 7 00:01:17,260 --> 00:01:19,872 And if you think back a few weeks ago already, in week zero, 8 00:01:19,872 --> 00:01:21,580 we started talking about what images are, 9 00:01:21,580 --> 00:01:25,690 and we talked about representation of images as this grid of pixels. 10 00:01:25,690 --> 00:01:28,923 And each pixel has some pattern of bits that defines its color. 11 00:01:28,923 --> 00:01:31,840 Well, it turns out today, we'll take a deeper look underneath the hood 12 00:01:31,840 --> 00:01:34,360 at how things like images, and so much more, 13 00:01:34,360 --> 00:01:37,240 are actually implemented using just these zeros and ones, 14 00:01:37,240 --> 00:01:40,300 and how now as a programmer, you can actually 15 00:01:40,300 --> 00:01:43,750 harness that, for better or for worse, to better understand and better 16 00:01:43,750 --> 00:01:46,090 manipulate what's going on inside of a computer's memory 17 00:01:46,090 --> 00:01:47,590 using a language like C. 18 00:01:47,590 --> 00:01:50,200 In fact, even this bowl of stress balls that we keep happening 19 00:01:50,200 --> 00:01:51,567 is just a photograph of course. 20 00:01:51,567 --> 00:01:54,400 But if you think back to week zero, if you sort of enhance, enhance, 21 00:01:54,400 --> 00:01:56,860 enhance this image, like they do in the movies, 22 00:01:56,860 --> 00:02:00,310 it actually doesn't work out the way you would think from Hollywood. 23 00:02:00,310 --> 00:02:04,900 As I continue to zoom in, and zoom in, and zoom in on a screen like this, 24 00:02:04,900 --> 00:02:06,470 you'll see that yes, it gets bigger. 25 00:02:06,470 --> 00:02:09,190 But if it gets too big, what do you start to notice?
26 00:02:09,190 --> 00:02:10,539 The so-called pixelation. 27 00:02:10,539 --> 00:02:12,550 And indeed, you can see the individual dots. 28 00:02:12,550 --> 00:02:16,810 So next time you watch some show or movie on TV 29 00:02:16,810 --> 00:02:19,030 that has this sort of notion of enhancing, 30 00:02:19,030 --> 00:02:20,680 there's actually a finite limit there. 31 00:02:20,680 --> 00:02:23,680 You can only enhance so far as there's actually information there. 32 00:02:23,680 --> 00:02:27,487 But once you zoom in to a certain level like this, that's all that's there. 33 00:02:27,487 --> 00:02:30,070 You're not going to see the glint of the suspect in some crime 34 00:02:30,070 --> 00:02:33,260 drama in their eye just because you've enhanced the image. 35 00:02:33,260 --> 00:02:36,130 There's only a finite amount of information actually there. 36 00:02:36,130 --> 00:02:39,085 But we'll see today too that by understanding 37 00:02:39,085 --> 00:02:40,960 what's going on inside of a computer's memory 38 00:02:40,960 --> 00:02:43,150 we can start to represent and even create and code 39 00:02:43,150 --> 00:02:44,180 more interesting things. 40 00:02:44,180 --> 00:02:49,060 So for instance, here is a bitmap, if you will, which is a term of art. 41 00:02:49,060 --> 00:02:51,037 A bitmap is a type of image. 42 00:02:51,037 --> 00:02:52,870 And it's a map of bits in the sense that you 43 00:02:52,870 --> 00:02:54,912 have this coordinate system of a top, down, left, 44 00:02:54,912 --> 00:02:57,530 right at least in this artist's representation here. 45 00:02:57,530 --> 00:03:02,050 And suppose that maybe we all decide as the world 46 00:03:02,050 --> 00:03:05,080 that one shall represent the color white and zero 47 00:03:05,080 --> 00:03:06,850 shall represent the color black. 48 00:03:06,850 --> 00:03:12,160 What might this map of bits, this bitmap, actually be? 49 00:03:12,160 --> 00:03:13,330 Can you see through it? 50 00:03:13,330 --> 00:03:13,830 Yeah. 
51 00:03:13,830 --> 00:03:14,740 AUDIENCE: [INAUDIBLE] 52 00:03:14,740 --> 00:03:16,900 DAVID J. MALAN: It is indeed a smiley face. 53 00:03:16,900 --> 00:03:18,070 So an amazing eye. 54 00:03:18,070 --> 00:03:21,430 If I actually turn all of the ones to white just to visualize this, 55 00:03:21,430 --> 00:03:23,595 you'll see indeed, this is what was embedded there. 56 00:03:23,595 --> 00:03:25,720 But of course, on our computer monitors and phones, 57 00:03:25,720 --> 00:03:28,390 we have this grid of squares, this grid of pixels. 58 00:03:28,390 --> 00:03:30,970 So indeed, if you were to actually see on your screen 59 00:03:30,970 --> 00:03:34,152 a smiley face, like a black and white one at that, what's probably going on 60 00:03:34,152 --> 00:03:36,610 underneath the hood is just some pattern of zeros and ones, 61 00:03:36,610 --> 00:03:39,520 and maybe single bits, one bit color, if you will, 62 00:03:39,520 --> 00:03:43,210 where one here represents white and zero represents black. 63 00:03:43,210 --> 00:03:45,370 So if you kind of like this thing, it turns out 64 00:03:45,370 --> 00:03:49,150 you can do pretty beautiful, pretty interesting, pretty artistically 65 00:03:49,150 --> 00:03:50,150 inclined things. 66 00:03:50,150 --> 00:03:53,568 If you go to this URL at your leisure, cs50.ly/art, 67 00:03:53,568 --> 00:03:56,860 it'll actually redirect you to a Google spreadsheet that we've made in advance. 68 00:03:56,860 --> 00:03:58,870 And we've kind of shrunk the rows and columns 69 00:03:58,870 --> 00:04:02,470 to resemble a grid of pixels, tiny little squares, all of which 70 00:04:02,470 --> 00:04:06,010 are white by default, not unlike this easel here 71 00:04:06,010 --> 00:04:08,500 that we have a couple of volunteers working away at. 72 00:04:08,500 --> 00:04:10,875 In fact, would you guys like to come forward for a moment 73 00:04:10,875 --> 00:04:13,040 and say a quick hello before we come back to you?
74 00:04:13,040 --> 00:04:13,630 DANIEL: Hello. 75 00:04:13,630 --> 00:04:14,530 My name is Daniel. 76 00:04:14,530 --> 00:04:15,542 I'm from Chicago. 77 00:04:15,542 --> 00:04:16,959 DAVID J. MALAN: Welcome to Daniel. 78 00:04:16,959 --> 00:04:17,320 And-- 79 00:04:17,320 --> 00:04:18,112 ADAM: Hi, everyone. 80 00:04:18,112 --> 00:04:19,000 I'm Adam. 81 00:04:19,000 --> 00:04:20,950 And I'm from Trinidad and Tobago. 82 00:04:20,950 --> 00:04:21,250 DAVID J. MALAN: Nice. 83 00:04:21,250 --> 00:04:22,333 Well, welcome to you both. 84 00:04:22,333 --> 00:04:23,020 Thank you. 85 00:04:23,020 --> 00:04:24,880 You'll see that in their hands are actually 86 00:04:24,880 --> 00:04:27,922 a whole bunch of pixels, post-it notes that we've handed them in advance. 87 00:04:27,922 --> 00:04:30,713 So if you don't mind, we'll come back to you in a couple of minutes 88 00:04:30,713 --> 00:04:33,640 and see what they've created, if you will, on this grid of white paper 89 00:04:33,640 --> 00:04:35,890 much like you could create on this Google spreadsheet. 90 00:04:35,890 --> 00:04:39,790 In fact, feel free to send us your creations if so inclined via the URL 91 00:04:39,790 --> 00:04:42,400 you'll get at cs50.ly/art. 92 00:04:42,400 --> 00:04:45,700 Now let's come back to week zero where we defined some of the building 93 00:04:45,700 --> 00:04:46,480 blocks for images. 94 00:04:46,480 --> 00:04:49,240 We talked about RGB, which is just red, green, blue. 95 00:04:49,240 --> 00:04:51,550 And it's just one of the systems, a popular system, 96 00:04:51,550 --> 00:04:53,890 via which you can represent any color of the rainbow 97 00:04:53,890 --> 00:04:57,950 using some combination of red, and green, and blue.
98 00:04:57,950 --> 00:05:00,190 And if any of you are artistically inclined 99 00:05:00,190 --> 00:05:02,920 or have used Photoshop or similar programs, 100 00:05:02,920 --> 00:05:05,560 you might typically have some means of selecting 101 00:05:05,560 --> 00:05:07,460 a color via some grid like this. 102 00:05:07,460 --> 00:05:09,910 But down here, notice there's explicit mentions 103 00:05:09,910 --> 00:05:11,800 of the types of color systems in use. 104 00:05:11,800 --> 00:05:13,150 RGB. 105 00:05:13,150 --> 00:05:15,820 And in fact, here, you see zero, zero, zero. 106 00:05:15,820 --> 00:05:18,460 And up here under New, you see the color black. 107 00:05:18,460 --> 00:05:21,340 And that implies that if you have no red, no green, no blue, well, 108 00:05:21,340 --> 00:05:24,340 that indeed would represent by convention the color black. 109 00:05:24,340 --> 00:05:27,910 By contrast, if we play around with Photoshop or any similar program, 110 00:05:27,910 --> 00:05:31,360 if you have a lot of red, a lot of green, and a lot of blue, 111 00:05:31,360 --> 00:05:36,880 for instance, 255, 255, 255, really crank it up to the max value 112 00:05:36,880 --> 00:05:40,720 you can represent with 8 bits, per week zero, well then it turns out 113 00:05:40,720 --> 00:05:42,490 you get the color white here. 114 00:05:42,490 --> 00:05:44,650 And we can play with these numbers endlessly. 115 00:05:44,650 --> 00:05:50,290 For instance, if we use 255 of red, but zero green and zero blue, 116 00:05:50,290 --> 00:05:54,100 not surprisingly, the square at the top of the screen becomes of course red 117 00:05:54,100 --> 00:05:57,020 entirely because it's all red and no green, no blue. 118 00:05:57,020 --> 00:06:01,790 If we change it instead to 255 for green but zero for red and blue, of course, 119 00:06:01,790 --> 00:06:02,500 we get green.
120 00:06:02,500 --> 00:06:06,130 And then lastly, if we crank up the blue but leave red and green as zero, 121 00:06:06,130 --> 00:06:07,360 we of course get blue. 122 00:06:07,360 --> 00:06:09,730 But all this while, down here highlighted 123 00:06:09,730 --> 00:06:12,210 is something that maybe some of you have seen before, 124 00:06:12,210 --> 00:06:14,250 like some combination of numbers and letters. 125 00:06:14,250 --> 00:06:18,130 If any of you have made personal web pages or used programs like Photoshop, 126 00:06:18,130 --> 00:06:20,130 you might have used these so-called color codes. 127 00:06:20,130 --> 00:06:25,290 So indeed, the world has this convention whereby using six digits, or sometimes 128 00:06:25,290 --> 00:06:29,170 three, you can represent a little more succinctly some amount of red, 129 00:06:29,170 --> 00:06:30,330 green, blue. 130 00:06:30,330 --> 00:06:34,840 And you'll see here, maybe by inference, that if RGB is zero, 131 00:06:34,840 --> 00:06:38,430 zero, 255 respectively, perhaps where we're going with this 132 00:06:38,430 --> 00:06:43,200 is that zero, zero, zero, zero, FF is just an alternative way of expressing 133 00:06:43,200 --> 00:06:44,250 the exact same idea. 134 00:06:44,250 --> 00:06:46,710 No red, no green, and a lot of blue. 135 00:06:46,710 --> 00:06:48,210 But why is that? 136 00:06:48,210 --> 00:06:51,390 And in fact, we'll come full circle here to introducing something 137 00:06:51,390 --> 00:06:53,220 that we could have done in week zero, but it doesn't really 138 00:06:53,220 --> 00:06:54,240 solve a problem then. 139 00:06:54,240 --> 00:06:57,900 But today, as we focus more on images and on memory itself, 140 00:06:57,900 --> 00:07:00,520 turns out understanding these patterns is pretty useful. 141 00:07:00,520 --> 00:07:03,390 So back in week zero, we talked, of course, about binary. 142 00:07:03,390 --> 00:07:07,560 And binary by implying two, only gives you two digits, zero and one. 
143 00:07:07,560 --> 00:07:10,530 You and I as humans almost always use the decimal system 144 00:07:10,530 --> 00:07:12,810 in normal conversation, dec meaning 10. 145 00:07:12,810 --> 00:07:15,270 So we have zero through nine instead. 146 00:07:15,270 --> 00:07:20,340 If a human like us wants to count up as high as 10, or 11, or 12, 147 00:07:20,340 --> 00:07:23,220 we don't have a digit per se for 10, 11, and 12. 148 00:07:23,220 --> 00:07:24,850 We start reusing digits. 149 00:07:24,850 --> 00:07:27,580 So it's one zero, one one, one two, and so forth. 150 00:07:27,580 --> 00:07:30,960 But in other systems, not binary, not decimal, 151 00:07:30,960 --> 00:07:34,950 but systems called hexadecimal, hex implying 16, 152 00:07:34,950 --> 00:07:39,240 there are actually more digits than these which might come as a surprise. 153 00:07:39,240 --> 00:07:42,403 It's not pairs of digits, like in decimal, single digits. 154 00:07:42,403 --> 00:07:44,820 And frankly, it doesn't really matter what the digits are. 155 00:07:44,820 --> 00:07:46,020 Because at the end of the day, these are just 156 00:07:46,020 --> 00:07:49,080 symbols that you and I immediately associate with some notion of math, 157 00:07:49,080 --> 00:07:52,980 but just strokes on the screen that represent some-- 158 00:07:52,980 --> 00:07:54,670 represent some actual value. 159 00:07:54,670 --> 00:07:58,680 So it turns out that by convention, when you want more than nine-- 160 00:07:58,680 --> 00:08:01,380 10 digits, zero through nine, you start using 161 00:08:01,380 --> 00:08:06,090 letters of the English alphabet, A, B, C, D, E, and F. 162 00:08:06,090 --> 00:08:07,770 And you can represent them in lowercase. 163 00:08:07,770 --> 00:08:08,820 It's case insensitive. 164 00:08:08,820 --> 00:08:09,690 So it doesn't really matter. 165 00:08:09,690 --> 00:08:11,482 You might see it in uppercase or lowercase. 
166 00:08:11,482 --> 00:08:14,190 But this is how you can count beyond nine not 167 00:08:14,190 --> 00:08:17,490 using decimal but using, indeed, something called hexadecimal. 168 00:08:17,490 --> 00:08:20,700 If we get really technical, this is also known as base-16. 169 00:08:20,700 --> 00:08:22,410 And it's the same idea as week zero where 170 00:08:22,410 --> 00:08:25,770 instead of using base two for binary, base-10 for decimal, 171 00:08:25,770 --> 00:08:28,500 you use 16 as the base for hexadecimal. 172 00:08:28,500 --> 00:08:31,740 And so if we run through just some simple examples here 173 00:08:31,740 --> 00:08:35,789 in the world of hexadecimal, your columns are just powers of 16. 174 00:08:35,789 --> 00:08:40,419 16 to the 0, 16 to the 1, 16 to the 2, and so forth. 175 00:08:40,419 --> 00:08:43,919 But in the world of hex, we usually, at least thus far, and today, we'll 176 00:08:43,919 --> 00:08:45,760 see just pairs of digits like this. 177 00:08:45,760 --> 00:08:49,260 So here, for instance, is the ones column, and the 16's column 178 00:08:49,260 --> 00:08:50,500 if we multiply that out. 179 00:08:50,500 --> 00:08:52,560 So if you wanted to represent the number you 180 00:08:52,560 --> 00:08:56,820 and I know in the real world as zero in hexadecimal, 181 00:08:56,820 --> 00:08:58,530 it would just be zero, zero. 182 00:08:58,530 --> 00:09:01,500 If you want to represent the number one, it would be zero one. 183 00:09:01,500 --> 00:09:05,466 And from there, we get zero two, zero three, zero four, zero five, zero six, 184 00:09:05,466 --> 00:09:10,230 zero seven, zero eight, zero nine, now things get potentially interesting. 185 00:09:10,230 --> 00:09:12,330 In decimal, it would obviously become 10.
186 00:09:12,330 --> 00:09:17,010 But in hexadecimal, it just becomes zero a then zero 187 00:09:17,010 --> 00:09:19,980 b, which is to say, if I rewind, after nine 188 00:09:19,980 --> 00:09:23,160 comes in hexadecimal, if I pronounce it in decimal, 189 00:09:23,160 --> 00:09:24,930 this is how you'd represent 10. 190 00:09:24,930 --> 00:09:30,450 This is how you'd represent 11, 12, 13, 14, and then lastly in hexadecimal, 191 00:09:30,450 --> 00:09:35,950 the 16th value is F, which is just always going to represent 15. 192 00:09:35,950 --> 00:09:39,210 So where-- how do we connect this to some of the past math? 193 00:09:39,210 --> 00:09:42,150 Well, once you get to zero F, in hexadecimal, 194 00:09:42,150 --> 00:09:45,060 if F is the highest you can count, just like in decimal, 195 00:09:45,060 --> 00:09:48,780 nine is the highest you can count, what comes next? 196 00:09:48,780 --> 00:09:53,250 If this is 15 I claim, how do I represent 16 in hexadecimal, 197 00:09:53,250 --> 00:09:56,520 with what pattern of symbols? 198 00:09:56,520 --> 00:09:58,600 What pattern of symbols for hexadecimal? 199 00:09:58,600 --> 00:09:59,100 Yeah. 200 00:09:59,100 --> 00:09:59,892 AUDIENCE: One zero. 201 00:09:59,892 --> 00:10:02,760 DAVID J. MALAN: So one zero, not 10, even though you might read it 202 00:10:02,760 --> 00:10:04,080 like that as a typical human. 203 00:10:04,080 --> 00:10:05,400 But one zero. 204 00:10:05,400 --> 00:10:06,207 Because why? 205 00:10:06,207 --> 00:10:08,040 Well, even if this is completely new to you, 206 00:10:08,040 --> 00:10:11,340 the whole column system, the places, are exactly the same intuitively. 207 00:10:11,340 --> 00:10:15,360 So you need one in the 16's place and a zero in the ones place. 
208 00:10:15,360 --> 00:10:17,612 And we won't count all the way up to 255, 209 00:10:17,612 --> 00:10:19,320 but if we count a little higher, 210 00:10:19,320 --> 00:10:24,180 this would be one zero, AKA 16 in decimal, this would be one one, 211 00:10:24,180 --> 00:10:30,510 AKA 17 in decimal, and then 18, 19, 20, and so forth, dot, dot, dot. 212 00:10:30,510 --> 00:10:33,300 And we can count all the way up to FF. 213 00:10:33,300 --> 00:10:36,180 Because if F is the biggest digit in hexadecimal, 214 00:10:36,180 --> 00:10:38,610 FF is indeed as high as we can count. 215 00:10:38,610 --> 00:10:42,570 And if each F represents 15, well, let's just do the math like in week zero. 216 00:10:42,570 --> 00:10:48,510 So 16 times F plus 1 times F is how all of us learn to do math in grade school, 217 00:10:48,510 --> 00:10:50,250 even though not in hexadecimal. 218 00:10:50,250 --> 00:10:54,480 That's of course 16 times 15 plus 1 times 15. 219 00:10:54,480 --> 00:10:57,810 Multiply that out, you get 240, plus 15. 220 00:10:57,810 --> 00:11:04,887 And ergo, you can count as high as 255 using two hexadecimal digits. 221 00:11:04,887 --> 00:11:06,720 Now this is not the kind of thing where this 222 00:11:06,720 --> 00:11:09,930 is going to be an interesting exercise mentally to ever convert in your head. 223 00:11:09,930 --> 00:11:12,740 Generally, you'll get used to the fact that after nine comes 224 00:11:12,740 --> 00:11:15,063 A and the biggest digit is F. And you'll just 225 00:11:15,063 --> 00:11:17,480 start to see patterns like this in the world of Photoshop, 226 00:11:17,480 --> 00:11:19,520 web pages in a few weeks, and beyond. 227 00:11:19,520 --> 00:11:22,760 But why is hexadecimal useful? 228 00:11:22,760 --> 00:11:25,670 Why are we complicating the world and adding 229 00:11:25,670 --> 00:11:27,690 on top of decimals something else?
230 00:11:27,690 --> 00:11:30,260 Well, it turns out that a single hexadecimal digit, like F, 231 00:11:30,260 --> 00:11:32,960 the biggest one for instance, is 15. 232 00:11:32,960 --> 00:11:35,360 And here, let me just propose a bit of mental math. 233 00:11:35,360 --> 00:11:41,270 How many bits do you need to represent the number 15 in binary? 234 00:11:41,270 --> 00:11:45,630 If you've got the ones place, twos place, 4s and so forth, 235 00:11:45,630 --> 00:11:47,166 how many bits total? 236 00:11:47,166 --> 00:11:47,980 AUDIENCE: Five. 237 00:11:47,980 --> 00:11:51,610 DAVID J. MALAN: So fewer than five to count as high as 15 I think. 238 00:11:51,610 --> 00:11:53,350 But close. 239 00:11:53,350 --> 00:11:55,960 Someone else? 240 00:11:55,960 --> 00:11:56,720 I'm seeing a hand. 241 00:11:56,720 --> 00:11:57,220 Yeah. 242 00:11:57,220 --> 00:11:57,845 AUDIENCE: Four. 243 00:11:57,845 --> 00:12:00,340 DAVID J. MALAN: So four bits I think suffice. 244 00:12:00,340 --> 00:12:04,000 Because if you want to count as high as F, that is to say 15, 245 00:12:04,000 --> 00:12:06,910 I think if you have four bits, you can do that. 246 00:12:06,910 --> 00:12:10,280 Because if over here is the ones place from week zero for binary, 247 00:12:10,280 --> 00:12:13,750 this is the twos place, this is the fours place, this is the eights place. 248 00:12:13,750 --> 00:12:14,710 Do out some quick math. 249 00:12:14,710 --> 00:12:19,370 So 8 plus 4 is 12, plus 2 is 14, plus 1 is 15. 250 00:12:19,370 --> 00:12:22,690 So it turns out that by convenience, hexadecimal digits 251 00:12:22,690 --> 00:12:26,360 can just be represented consistently with four bits or fewer. 252 00:12:26,360 --> 00:12:27,190 But four. 253 00:12:27,190 --> 00:12:29,140 And four, of course, is half of eight. 254 00:12:29,140 --> 00:12:32,270 And eight is everywhere, like 8 bits is a byte, which is, again, 255 00:12:32,270 --> 00:12:33,590 just a convention we've seen.
256 00:12:33,590 --> 00:12:36,940 And so the reason that you see hexadecimal in the world of Photoshop, 257 00:12:36,940 --> 00:12:39,910 and eventually web pages, is it actually just maps 258 00:12:39,910 --> 00:12:43,330 really nicely to expressing binary numbers more 259 00:12:43,330 --> 00:12:45,800 succinctly with a fixed number of digits. 260 00:12:45,800 --> 00:12:52,677 So for instance, any time you see 11111111 in the world as binary, 261 00:12:52,677 --> 00:12:53,260 you know what? 262 00:12:53,260 --> 00:12:55,600 That's a little tedious to both say and write. 263 00:12:55,600 --> 00:13:02,230 You can represent more succinctly any group of four 1 bits more succinctly 264 00:13:02,230 --> 00:13:09,310 in hexadecimal as just F. So 11111111 in binary more succinctly and more 265 00:13:09,310 --> 00:13:13,000 commonly now in the world of Photoshop, memory, images, and the like 266 00:13:13,000 --> 00:13:14,950 is represented more succinctly as FF. 267 00:13:14,950 --> 00:13:18,350 And that's why because it just maps really nicely to 4 bits. 268 00:13:18,350 --> 00:13:20,630 And so we can be a little more succinct. 269 00:13:20,630 --> 00:13:23,710 So any questions on hexadecimal, which is just 270 00:13:23,710 --> 00:13:27,110 another way of representing information but using the same grade school 271 00:13:27,110 --> 00:13:27,610 approach? 272 00:13:27,610 --> 00:13:28,110 Yeah. 273 00:13:28,110 --> 00:13:28,780 AUDIENCE: So-- 274 00:13:28,780 --> 00:13:30,030 DAVID J. MALAN: Good question. 275 00:13:30,030 --> 00:13:33,130 If you represent 15 with F, it would use 4 bits. 276 00:13:33,130 --> 00:13:37,450 So base systems are really just a way for us humans on paper or on screens 277 00:13:37,450 --> 00:13:38,830 to represent information. 278 00:13:38,830 --> 00:13:42,820 If F represents the decimal number 15, the computer underneath the hood 279 00:13:42,820 --> 00:13:45,610 has to use 4 bits to represent it. 
280 00:13:45,610 --> 00:13:48,700 So one hexadecimal digit by convention always 281 00:13:48,700 --> 00:13:51,160 implies 4 bits underneath the hood. 282 00:13:51,160 --> 00:13:53,680 So therefore, if you have two hexadecimal digits, 283 00:13:53,680 --> 00:13:57,430 like zero, zero, that means eight zero bits underneath the hood 284 00:13:57,430 --> 00:13:59,200 like for red or for green. 285 00:13:59,200 --> 00:14:03,460 If you see FF, now we know that's 4 one bits and another 4 one bits. 286 00:14:03,460 --> 00:14:05,650 And if we do out the math, that's 255. 287 00:14:05,650 --> 00:14:14,320 That's why in Photoshop, 0000FF means no red, no green, and 255 of blue. 288 00:14:14,320 --> 00:14:17,050 And it's just way more succinct than writing out what, 8 plus 8, 289 00:14:17,050 --> 00:14:19,090 plus 8, 24 zeros and ones. 290 00:14:19,090 --> 00:14:21,370 And it's just cleaner than even using decimal 291 00:14:21,370 --> 00:14:25,273 when you're using units of eight, which again computers just use everywhere. 292 00:14:25,273 --> 00:14:26,440 So it's just another system. 293 00:14:26,440 --> 00:14:28,273 It's not one you need to dwell on very much. 294 00:14:28,273 --> 00:14:32,110 But again, it's fundamentally no different from binary or decimal. 295 00:14:32,110 --> 00:14:34,977 We're just using a slightly different base. 296 00:14:34,977 --> 00:14:35,560 Now all right. 297 00:14:35,560 --> 00:14:37,720 Well, we had this blank canvas here. 298 00:14:37,720 --> 00:14:40,600 And I think, are you two perhaps ready to reveal 299 00:14:40,600 --> 00:14:42,140 for the world what you've created? 300 00:14:42,140 --> 00:14:43,348 Do you want to go ahead and-- 301 00:14:43,348 --> 00:14:45,170 I'll swivel it around for you. 302 00:14:45,170 --> 00:14:45,670 All right. 303 00:14:45,670 --> 00:14:46,180 Here we go. 304 00:14:46,180 --> 00:14:46,870 Big reveal. 305 00:14:46,870 --> 00:14:51,760 And today's pixel art, a round of applause if we could. 
306 00:14:51,760 --> 00:14:54,280 307 00:14:54,280 --> 00:14:55,325 Very nicely done. 308 00:14:55,325 --> 00:14:56,200 Well, thank you both. 309 00:14:56,200 --> 00:14:58,600 If you want to come up after, and tear this off, and bring it home, 310 00:14:58,600 --> 00:15:00,430 you're welcome to, and keep the post-it notes too. 311 00:15:00,430 --> 00:15:02,140 Well, thank you to our volunteers there. 312 00:15:02,140 --> 00:15:05,793 Let's now translate this to really more technical world 313 00:15:05,793 --> 00:15:07,960 where we're going to see and consider it more often. 314 00:15:07,960 --> 00:15:10,570 Because in fact, sometimes, when you've had error messages 315 00:15:10,570 --> 00:15:13,000 over the past few weeks from clang, the compiler, 316 00:15:13,000 --> 00:15:15,483 you might have even seen evidence of hexadecimal. 317 00:15:15,483 --> 00:15:16,400 We didn't call it out. 318 00:15:16,400 --> 00:15:17,980 It wasn't useful to know at the time. 319 00:15:17,980 --> 00:15:21,880 But it turns out a lot of programs use, and a lot of code, 320 00:15:21,880 --> 00:15:25,490 uses hexadecimal for those reasons of more precise-- 321 00:15:25,490 --> 00:15:26,930 more succinct representation. 322 00:15:26,930 --> 00:15:28,840 So for instance, where else might we see it? 323 00:15:28,840 --> 00:15:31,990 Well, here's that picture we keep pulling up of our computer's memory. 324 00:15:31,990 --> 00:15:34,330 And each of these squares in this grid represents 325 00:15:34,330 --> 00:15:37,210 a byte, sort of top left to bottom right in the computer's memory. 326 00:15:37,210 --> 00:15:39,730 But again, just an artist's representation. 327 00:15:39,730 --> 00:15:43,570 A few weeks ago, I claimed that each of these bytes can be numbered of course. 328 00:15:43,570 --> 00:15:46,300 Like this is byte 0 at top left, then byte one, then 329 00:15:46,300 --> 00:15:49,760 byte two, then byte two billion if you have 2 gigabytes of memory. 
330 00:15:49,760 --> 00:15:54,760 And so we could just number them like this, zero through 15 on up. 331 00:15:54,760 --> 00:15:56,480 16, 17, 18, and so forth. 332 00:15:56,480 --> 00:16:00,500 But per the reasons earlier, it's just more common in computer systems 333 00:16:00,500 --> 00:16:03,340 and in software to actually use hexadecimal just 334 00:16:03,340 --> 00:16:07,030 to describe the locations of, the addresses, of things in memory. 335 00:16:07,030 --> 00:16:10,120 So instead, a typical programmer, or a computer scientist, 336 00:16:10,120 --> 00:16:14,230 would call these first 16 bytes zero through F just because. 337 00:16:14,230 --> 00:16:17,000 But that's because it's a predictable number of bits. 338 00:16:17,000 --> 00:16:21,670 So if we keep going beyond that, you would get not 10, not 11, not 12, 339 00:16:21,670 --> 00:16:25,900 but in hexadecimal, one, zero, one, one, one, two, and so forth, 340 00:16:25,900 --> 00:16:30,520 all the way down on the screen to one F. And if I shrunk this down or had 341 00:16:30,520 --> 00:16:34,450 a bigger monitor, we would see eventually 255 bytes later 342 00:16:34,450 --> 00:16:37,400 from the start 255 as well. 343 00:16:37,400 --> 00:16:40,840 But there's a potential problem here with using hexadecimal in this way. 344 00:16:40,840 --> 00:16:42,730 There's an ambiguity. 345 00:16:42,730 --> 00:16:49,180 Can anyone imagine what can go wrong if we use hex to just simply describe 346 00:16:49,180 --> 00:16:52,960 locations in memory like this? 347 00:16:52,960 --> 00:16:53,650 Yeah. 348 00:16:53,650 --> 00:16:55,285 AUDIENCE: One zero might also be 10. 349 00:16:55,285 --> 00:16:56,160 DAVID J. MALAN: Yeah. 350 00:16:56,160 --> 00:16:57,960 One zero might also be 10. 351 00:16:57,960 --> 00:17:01,090 And maybe if you're really thorough, OK, wait a minute. 352 00:17:01,090 --> 00:17:02,950 It can't be 10 because here's F over here. 353 00:17:02,950 --> 00:17:04,200 So it's obviously not decimal. 
354 00:17:04,200 --> 00:17:07,079 But why create potential confusion, especially when you're collaborating, 355 00:17:07,079 --> 00:17:08,412 building something with someone? 356 00:17:08,412 --> 00:17:09,760 We want to avoid that ambiguity. 357 00:17:09,760 --> 00:17:12,359 And so the convention humans decided on years ago 358 00:17:12,359 --> 00:17:16,380 is that if you want to make clear that a number is in hexadecimal just 359 00:17:16,380 --> 00:17:20,790 by convention, you prefix all of the digits with 0x. 360 00:17:20,790 --> 00:17:22,950 The X is not another character. 361 00:17:22,950 --> 00:17:24,720 It's not a 17th character. 362 00:17:24,720 --> 00:17:29,700 It's just a human convention of putting 0x to imply, here comes hexadecimal. 363 00:17:29,700 --> 00:17:31,020 And now it's unambiguous. 364 00:17:31,020 --> 00:17:35,760 So now we see 0x10 obviously is not 10 as we know it in decimal. 365 00:17:35,760 --> 00:17:39,060 But rather it's the number that comes after a single F. 366 00:17:39,060 --> 00:17:41,430 So it's really the number in decimal 16. 367 00:17:41,430 --> 00:17:46,620 So 0x, any time you see it, that's just a visual cue that what is ahead 368 00:17:46,620 --> 00:17:48,940 is actually hexadecimal. 369 00:17:48,940 --> 00:17:52,480 So let's now start playing around with this information. 370 00:17:52,480 --> 00:17:54,750 So here's a super simple line of code from week one 371 00:17:54,750 --> 00:17:59,445 where I'm just declaring a variable n, and I'm defining it to be the value 50. 372 00:17:59,445 --> 00:18:00,570 And this is out of context. 373 00:18:00,570 --> 00:18:02,612 We probably need a main function and all of that. 374 00:18:02,612 --> 00:18:05,820 But let's just rewind to week one where we actually saw code like this 375 00:18:05,820 --> 00:18:08,530 and do something useful with a line of code like this. 376 00:18:08,530 --> 00:18:10,500 So let me go over here to VS Code. 
377 00:18:10,500 --> 00:18:14,070 And in VS Code, I'll create a program called, how about addresses? 378 00:18:14,070 --> 00:18:15,900 Since the goal of this-- 379 00:18:15,900 --> 00:18:20,310 the goal here is to just play around, ultimately, with a variable like n. 380 00:18:20,310 --> 00:18:21,750 And let me go ahead and do this. 381 00:18:21,750 --> 00:18:24,510 I'll include, how about standard I/O.h? 382 00:18:24,510 --> 00:18:25,770 I'll do int main void. 383 00:18:25,770 --> 00:18:28,260 So no command line arguments for now. 384 00:18:28,260 --> 00:18:30,090 Int n gets 50. 385 00:18:30,090 --> 00:18:32,950 And now so that we can do something mildly useful with it, 386 00:18:32,950 --> 00:18:37,950 let's just go use printf and print out with %i and then a new line whatever 387 00:18:37,950 --> 00:18:38,970 that value of n is. 388 00:18:38,970 --> 00:18:41,020 So this is not going to be interesting per se. 389 00:18:41,020 --> 00:18:43,620 It's just week one stuff where I'm defining a variable 390 00:18:43,620 --> 00:18:45,610 and printing it out to the screen. 391 00:18:45,610 --> 00:18:49,290 So let me go down to my terminal window and do make addresses. 392 00:18:49,290 --> 00:18:50,405 No errors. 393 00:18:50,405 --> 00:18:51,030 So that's good. 394 00:18:51,030 --> 00:18:52,440 I'll do dot slash addresses. 395 00:18:52,440 --> 00:18:55,500 And of course, I should see the number 50 here. 396 00:18:55,500 --> 00:18:57,360 Now what's going on underneath the hood? 397 00:18:57,360 --> 00:19:00,900 Let's translate now code to really what's going 398 00:19:00,900 --> 00:19:03,430 on underneath the hood of the computer. 399 00:19:03,430 --> 00:19:05,793 So if this is our grid of memory, I don't necessarily 400 00:19:05,793 --> 00:19:07,710 know as the programmer, and I definitely don't 401 00:19:07,710 --> 00:19:10,440 care as the programmer, where exactly it's ending up in memory. 402 00:19:10,440 --> 00:19:11,670 That's the whole point of using code. 
403 00:19:11,670 --> 00:19:13,140 Let the computer figure this out. 404 00:19:13,140 --> 00:19:17,430 But at least conceptually, I know that by declaring a line of code like that, 405 00:19:17,430 --> 00:19:21,250 the number 50 ends up somewhere in the computer's memory. 406 00:19:21,250 --> 00:19:25,980 And it's assigned the name n, a symbol n, by which I, the programmer, 407 00:19:25,980 --> 00:19:26,890 can refer to it. 408 00:19:26,890 --> 00:19:33,810 And I very deliberately used four of these squares for what reason? 409 00:19:33,810 --> 00:19:37,260 What might be the reason for using four squares specifically? 410 00:19:37,260 --> 00:19:38,100 Yeah. 411 00:19:38,100 --> 00:19:39,660 Yeah, so an integer is 4 bytes. 412 00:19:39,660 --> 00:19:42,870 At least most of the time on modern systems, an integer is 4 bytes. 413 00:19:42,870 --> 00:19:44,760 On an older computer, it might just use one. 414 00:19:44,760 --> 00:19:46,890 Or maybe even 2 bytes. 415 00:19:46,890 --> 00:19:49,710 But here, by convention, we're almost always going to see 4 bytes. 416 00:19:49,710 --> 00:19:51,190 I don't know if it's going to end up here. 417 00:19:51,190 --> 00:19:52,330 It might end up over here. 418 00:19:52,330 --> 00:19:53,550 But for now, who cares? 419 00:19:53,550 --> 00:19:56,340 I just know that the computer can store the information 420 00:19:56,340 --> 00:19:58,450 in this way underneath the hood. 421 00:19:58,450 --> 00:20:01,560 So let's now introduce another feature of C 422 00:20:01,560 --> 00:20:04,260 that we haven't had occasion to use just yet that's 423 00:20:04,260 --> 00:20:07,560 going to allow us to start poking around the computer's memory 424 00:20:07,560 --> 00:20:08,640 for better or for worse. 425 00:20:08,640 --> 00:20:10,348 And this is one of those situations where 426 00:20:10,348 --> 00:20:14,850 you're about to learn, acquire a skill, a power, that can actually come back 427 00:20:14,850 --> 00:20:15,420 to bite you. 
428 00:20:15,420 --> 00:20:18,510 Because once you know how to start poking around a computer's memory, 429 00:20:18,510 --> 00:20:20,050 you can do very powerful things. 430 00:20:20,050 --> 00:20:22,920 And next week, we'll see what you can build in a computer's memory, 431 00:20:22,920 --> 00:20:24,997 but you can also screw up pretty easily and cause 432 00:20:24,997 --> 00:20:28,080 more of those segmentation faults that a few of you have already suffered. 433 00:20:28,080 --> 00:20:31,350 So with that said, let's just stipulate that you know what? 434 00:20:31,350 --> 00:20:34,510 I don't care necessarily where the 50 is in memory. 435 00:20:34,510 --> 00:20:37,230 But I know it exists at some address in memory. 436 00:20:37,230 --> 00:20:39,300 And just so I have an easy address to pronounce, 437 00:20:39,300 --> 00:20:42,060 let's just suppose it lives at 0x123. 438 00:20:42,060 --> 00:20:45,180 So that's the address in memory in hexadecimal by convention. 439 00:20:45,180 --> 00:20:48,730 And that just happens to be where it ends up when I write that line of code. 440 00:20:48,730 --> 00:20:52,770 But it turns out, C has some other operators we can use. 441 00:20:52,770 --> 00:20:55,290 We've seen the asterisk before, the star, and we've 442 00:20:55,290 --> 00:20:56,588 used it for multiplication. 443 00:20:56,588 --> 00:20:59,130 But today, we're going to use it for something more powerful. 444 00:20:59,130 --> 00:21:01,338 And we're also going to introduce an ampersand, which 445 00:21:01,338 --> 00:21:02,970 allows us to do something as well. 446 00:21:02,970 --> 00:21:06,270 The ampersand operator is going to allow us 447 00:21:06,270 --> 00:21:11,970 to get the address of a piece of data in memory, like by literally putting 448 00:21:11,970 --> 00:21:14,520 ampersand before the name of a variable, C 449 00:21:14,520 --> 00:21:18,210 will tell us, tell you, what address that variable lives at.
450 00:21:18,210 --> 00:21:20,730 Maybe it's 0x123, maybe it's 0x456. 451 00:21:20,730 --> 00:21:21,270 Who knows? 452 00:21:21,270 --> 00:21:23,610 But that will give you back the answer. 453 00:21:23,610 --> 00:21:25,360 The star does the opposite. 454 00:21:25,360 --> 00:21:26,710 It sort of means, go there. 455 00:21:26,710 --> 00:21:30,090 So using the star, otherwise known as the dereference operator, 456 00:21:30,090 --> 00:21:33,000 I can actually go to a specific address if I want. 457 00:21:33,000 --> 00:21:35,230 And we'll see what this means in code. 458 00:21:35,230 --> 00:21:39,510 So how can I leverage this in some mildly interesting way 459 00:21:39,510 --> 00:21:40,470 to start poking around? 460 00:21:40,470 --> 00:21:44,650 But eventually, we'll use this primitive to build more interesting things. 461 00:21:44,650 --> 00:21:47,520 So let me go back to say, VS Code here. 462 00:21:47,520 --> 00:21:49,350 And let me go ahead and do this. 463 00:21:49,350 --> 00:21:51,210 I'll clear my terminal to start fresh. 464 00:21:51,210 --> 00:21:55,430 And I'll introduce another format code for printf, %p. 465 00:21:55,430 --> 00:21:59,630 And for now, just take it on faith that it is %p, just because. 466 00:21:59,630 --> 00:22:05,420 But %p is going to allow me to print the address of a variable if I additionally 467 00:22:05,420 --> 00:22:08,150 tell C, get the address of n. 468 00:22:08,150 --> 00:22:10,340 So I'm changing %i to %p. 469 00:22:10,340 --> 00:22:13,700 And that's just something you have to do when printing addresses for now. 470 00:22:13,700 --> 00:22:17,310 But I need to add an ampersand in front of the variable name. 471 00:22:17,310 --> 00:22:19,220 So I don't print n, the number 50. 472 00:22:19,220 --> 00:22:21,178 I print out something like 0x123. 473 00:22:21,178 --> 00:22:22,970 And it's not going to be as simple as that.
474 00:22:22,970 --> 00:22:24,970 We'll see on the screen though where it actually 475 00:22:24,970 --> 00:22:27,090 ended up in my codespace's memory. 476 00:22:27,090 --> 00:22:28,490 So here we go. 477 00:22:28,490 --> 00:22:32,720 Dot-- down in my terminal, make addresses again to recompile. 478 00:22:32,720 --> 00:22:37,610 And now, dot slash addresses should reveal not the value of 50, 479 00:22:37,610 --> 00:22:40,310 but the address of 50. 480 00:22:40,310 --> 00:22:41,570 And there it is. 481 00:22:41,570 --> 00:22:42,890 It's pretty long. 482 00:22:42,890 --> 00:22:45,230 It's not quite as simple and pretty as 0x123. 483 00:22:45,230 --> 00:22:47,720 But there's the 0x, meaning here's a hexadecimal address. 484 00:22:47,720 --> 00:22:52,070 And it's 7ffcc784a04c. 485 00:22:52,070 --> 00:22:55,310 Suffice it to say your codespace, and even your Macs and PCs nowadays, 486 00:22:55,310 --> 00:22:56,760 have a lot of memory. 487 00:22:56,760 --> 00:23:00,320 That's why, in part, this address is so big, not as small 488 00:23:00,320 --> 00:23:02,040 as the thing on my slide. 489 00:23:02,040 --> 00:23:05,840 So this at the moment isn't that useful yet. 490 00:23:05,840 --> 00:23:09,800 But it introduces us to a concept that we'll now call pointers. 491 00:23:09,800 --> 00:23:14,420 And pointers are admittedly one of the more challenging aspects of C. 492 00:23:14,420 --> 00:23:18,643 And if in future life, you tell friends that, oh, I took a class called CS50, 493 00:23:18,643 --> 00:23:20,810 and we learned C, you'll probably get kind of a look 494 00:23:20,810 --> 00:23:22,460 from people like, why did you learn C? 495 00:23:22,460 --> 00:23:23,900 Or like, oh, C was hard. 496 00:23:23,900 --> 00:23:27,380 And it's largely because of this topic, which 497 00:23:27,380 --> 00:23:30,812 isn't to say that it's that hard to wrap your mind around. 498 00:23:30,812 --> 00:23:32,270 But it's definitely very different.
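The change from %i to %p with an ampersand, as described above, can be sketched roughly like this. The function name `print_value_and_address` is hypothetical, and the cast to `void *` is an addition the lecture does not mention; it is there because `%p` formally expects that type:

```c
#include <assert.h>
#include <stdio.h>

// Hypothetical helper: prints n, then the address where n happens to live.
// Returns n so a caller can confirm that taking its address did not change it.
int print_value_and_address(void)
{
    int n = 50;
    printf("%i\n", n);            // the value, as in week one
    printf("%p\n", (void *) &n);  // the address, printed in hexadecimal
    assert(n == 50);              // &n only looks up where n is; n is untouched
    return n;
}
```

The address printed will vary from machine to machine and possibly from run to run, so no particular output is promised beyond a leading 50.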
499 00:23:32,270 --> 00:23:36,080 And it's not a feature that you can harness in higher level languages 500 00:23:36,080 --> 00:23:39,800 that we'll see in class too, like Python, and Java, and the like. 501 00:23:39,800 --> 00:23:42,290 C is about as close to the computer's hardware, 502 00:23:42,290 --> 00:23:45,350 so to speak, as you can get before things get actually scary, 503 00:23:45,350 --> 00:23:48,770 the so-called assembly language we saw in week two when we had linking, 504 00:23:48,770 --> 00:23:50,900 and compiling, and assembling, and all of that. 505 00:23:50,900 --> 00:23:52,100 That gets really low level. 506 00:23:52,100 --> 00:23:55,310 And you really have to be an expert with the computer's CPU, or brain, 507 00:23:55,310 --> 00:23:56,340 to understand that. 508 00:23:56,340 --> 00:23:59,730 But with C, you can actually poke around the computer's memory 509 00:23:59,730 --> 00:24:01,130 and do powerful things with that. 510 00:24:01,130 --> 00:24:03,470 But again, with great power comes responsibility. 511 00:24:03,470 --> 00:24:07,670 It's very easy to break programs by misusing memory or just having a bug 512 00:24:07,670 --> 00:24:11,220 that touches memory in some way that you don't intend. 513 00:24:11,220 --> 00:24:16,370 So pointers, at the end of the day, are pretty much what we just saw. 514 00:24:16,370 --> 00:24:22,910 A pointer is really just a variable that contains the address of some value. 515 00:24:22,910 --> 00:24:25,790 A pointer is a variable that contains the address of some value, 516 00:24:25,790 --> 00:24:28,280 or more simply, it's fine to think of it as an address. 517 00:24:28,280 --> 00:24:31,650 A pointer is an address of something in the computer's memory. 518 00:24:31,650 --> 00:24:35,880 Now, what might we do to actualize this? 519 00:24:35,880 --> 00:24:37,440 Well, here's two lines of code.
520 00:24:37,440 --> 00:24:41,600 It turns out by using our two new operators today, I can declare an int, 521 00:24:41,600 --> 00:24:45,770 call it n, and assign it a value like 50, just like before. 522 00:24:45,770 --> 00:24:49,400 If I want to store the address of n in a variable, 523 00:24:49,400 --> 00:24:51,680 and not just print it immediately via printf, 524 00:24:51,680 --> 00:24:54,458 I can declare a variable, for instance, called p. 525 00:24:54,458 --> 00:24:56,750 But I could call it anything I want, like any variable. 526 00:24:56,750 --> 00:24:59,960 But because it's an address, it's not int p. 527 00:24:59,960 --> 00:25:03,410 It has to be int star p, so to speak. 528 00:25:03,410 --> 00:25:06,590 And the star here on the left hand side of the equal sign 529 00:25:06,590 --> 00:25:10,790 is just a clue to C that means p is going to be a pointer. 530 00:25:10,790 --> 00:25:13,520 That is, p is going to be the address of what? 531 00:25:13,520 --> 00:25:15,450 The address of an integer. 532 00:25:15,450 --> 00:25:17,540 Now technically, it's still an integer itself 533 00:25:17,540 --> 00:25:21,230 because an address is just a number whether it's 1, 2, 3, or 0x123. 534 00:25:21,230 --> 00:25:23,310 So this is really just a semantic difference. 535 00:25:23,310 --> 00:25:26,450 So int star p just means that this variable doesn't 536 00:25:26,450 --> 00:25:28,580 contain any old number, like 50. 537 00:25:28,580 --> 00:25:33,570 It specifically contains a number that is the address of something else. 538 00:25:33,570 --> 00:25:35,670 So how can I now use this? 539 00:25:35,670 --> 00:25:37,400 Well, let me go back to VS Code. 540 00:25:37,400 --> 00:25:41,670 And let me propose that we add a line of code like that.
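The two lines just described, declaring n and then storing its address in p, might look like the sketch below. The wrapper function `pointer_holds_address` is hypothetical and exists only so the idea can be checked; in the lecture these lines sit directly inside `main`:

```c
#include <assert.h>
#include <stdio.h>

// Hypothetical helper: stores the address of n in p and prints p.
// Returns 1 if p really holds the address of n, 0 otherwise.
int pointer_holds_address(void)
{
    int n = 50;    // an ordinary integer
    int *p = &n;   // the star says p is a pointer; &n is the address of n

    assert(p == &n);             // p and &n are the same address
    printf("%p\n", (void *) p);  // no star, no ampersand: just the value of p
    return p == &n;
}
```

Note that printing `p` uses the bare variable name, since the value of p is already the address.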
541 00:25:41,670 --> 00:25:44,220 So instead of just directly printing out that value, 542 00:25:44,220 --> 00:25:48,170 let's go ahead and define a second variable called p that's of type int 543 00:25:48,170 --> 00:25:53,480 star p, set it equal to ampersand n, and then this time, 544 00:25:53,480 --> 00:25:55,460 let's not just print out ampersand n. 545 00:25:55,460 --> 00:25:57,530 Let's actually print out the value of p. 546 00:25:57,530 --> 00:25:59,810 So the only two new things here if I zoom in 547 00:25:59,810 --> 00:26:04,160 are I've used not only the ampersand on the right to get the address of n. 548 00:26:04,160 --> 00:26:07,130 I'm now using the star on the left to tell C 549 00:26:07,130 --> 00:26:10,190 that p is still a variable as always. 550 00:26:10,190 --> 00:26:11,450 But it's a pointer. 551 00:26:11,450 --> 00:26:14,897 It is the address of some other value like this. 552 00:26:14,897 --> 00:26:17,480 And I'm still going to print it with the same format code, %p. 553 00:26:17,480 --> 00:26:18,660 So that doesn't change. 554 00:26:18,660 --> 00:26:24,680 So let me go ahead and zoom out and do make addresses, and ./addresses. 555 00:26:24,680 --> 00:26:27,620 And there it is, exactly the same thing. 556 00:26:27,620 --> 00:26:30,500 Now in and of itself, not that useful yet. 557 00:26:30,500 --> 00:26:34,280 But the fact that you can now access the addresses of things in memory 558 00:26:34,280 --> 00:26:38,360 means that we'll be able to build things, and construct things, and link 559 00:26:38,360 --> 00:26:41,690 things together by knowing where they live, so to speak. 560 00:26:41,690 --> 00:26:44,570 So any questions on this technique thus far? 561 00:26:44,570 --> 00:26:45,511 Yeah. 562 00:26:45,511 --> 00:26:48,767 AUDIENCE: I guess I'm a little confused about the [INAUDIBLE].. 563 00:26:48,767 --> 00:26:50,100 DAVID J. MALAN: A good question. 564 00:26:50,100 --> 00:26:52,320 On line six, must it be star p and ampersand? 
565 00:26:52,320 --> 00:26:54,210 And in this case, yes. 566 00:26:54,210 --> 00:26:55,290 Because what am I doing? 567 00:26:55,290 --> 00:26:58,050 On the left, and I'll get rid of the equal sign for now, 568 00:26:58,050 --> 00:27:02,640 this would give me a variable called p that's not an integer per se, 569 00:27:02,640 --> 00:27:04,710 but that's the address of an integer. 570 00:27:04,710 --> 00:27:08,110 But without the equal sign, I'm not storing anything in that variable. 571 00:27:08,110 --> 00:27:12,390 So by adding the equal sign and then ampersand n, 572 00:27:12,390 --> 00:27:16,650 I am explicitly figuring out with ampersand what the address of n 573 00:27:16,650 --> 00:27:20,100 is, which already exists per line five and tucking it away 574 00:27:20,100 --> 00:27:23,310 in this new variable called p. 575 00:27:23,310 --> 00:27:24,100 Other questions? 576 00:27:24,100 --> 00:27:24,600 Yeah. 577 00:27:24,600 --> 00:27:26,570 AUDIENCE: [INAUDIBLE] 578 00:27:26,570 --> 00:27:27,820 DAVID J. MALAN: Good question. 579 00:27:27,820 --> 00:27:30,695 Every time I run the program, it uses up a different piece of memory? 580 00:27:30,695 --> 00:27:31,770 Short answer, yes. 581 00:27:31,770 --> 00:27:33,628 Computers, though, long story short, also 582 00:27:33,628 --> 00:27:35,170 have something called virtual memory. 583 00:27:35,170 --> 00:27:37,337 So if you run it again and again, you might actually 584 00:27:37,337 --> 00:27:40,590 see the same addresses on the same Mac, or PC, or cloud-based server. 585 00:27:40,590 --> 00:27:44,580 But we'll see in a bit where at a high level it's laid out. 586 00:27:44,580 --> 00:27:47,447 But it will always exist at some address. 587 00:27:47,447 --> 00:27:48,030 Good question. 588 00:27:48,030 --> 00:27:48,530 Yeah. 589 00:27:48,530 --> 00:27:50,120 AUDIENCE: [INAUDIBLE] 590 00:27:50,120 --> 00:27:51,120 DAVID J. MALAN: Correct. 591 00:27:51,120 --> 00:27:52,980 Ampersand n is the address of n. 
592 00:27:52,980 --> 00:27:57,660 And int star p is a pointer called p. 593 00:27:57,660 --> 00:28:02,550 And honestly, in an ideal world, if C were made today and not decades ago 594 00:28:02,550 --> 00:28:05,030 when humans were first creating languages, 595 00:28:05,030 --> 00:28:08,220 ideally, we would just have a data type called pointer. 596 00:28:08,220 --> 00:28:11,370 And then this would be a little less complicated because it would literally 597 00:28:11,370 --> 00:28:12,480 be what it says. 598 00:28:12,480 --> 00:28:14,290 The humans who invented C didn't do this. 599 00:28:14,290 --> 00:28:15,580 But this is the idea. 600 00:28:15,580 --> 00:28:18,060 So pointer is not a legitimate word in the code. 601 00:28:18,060 --> 00:28:20,070 It is a term of art in English. 602 00:28:20,070 --> 00:28:21,780 But this is really just the idea. 603 00:28:21,780 --> 00:28:25,140 But the way you express pointer as a data type 604 00:28:25,140 --> 00:28:29,530 is a little more cryptic as int star p here. 605 00:28:29,530 --> 00:28:34,643 But notice in line seven, when I print out p, I don't use a star. 606 00:28:34,643 --> 00:28:35,685 I don't use an ampersand. 607 00:28:35,685 --> 00:28:36,090 Why? 608 00:28:36,090 --> 00:28:38,040 I literally just want to print the value of p. 609 00:28:38,040 --> 00:28:39,748 And we've been doing that since week one. 610 00:28:39,748 --> 00:28:42,930 If you want to print a variable, just describe the variable by its name. 611 00:28:42,930 --> 00:28:44,400 No special syntax. 612 00:28:44,400 --> 00:28:46,690 Any other questions on this thus far? 613 00:28:46,690 --> 00:28:48,148 AUDIENCE: [INAUDIBLE] 614 00:28:48,148 --> 00:28:50,440 DAVID J. MALAN: What's the advantage of using pointers?
615 00:28:50,440 --> 00:28:53,560 With pointers, we'll see today some applications of them, 616 00:28:53,560 --> 00:28:55,572 really the idea is going to come to fruition 617 00:28:55,572 --> 00:28:57,280 next week when we're going to create what 618 00:28:57,280 --> 00:29:01,330 are called data structures in memory, where we can build not just, 619 00:29:01,330 --> 00:29:04,548 for instance, one dimensional data structures like an array. 620 00:29:04,548 --> 00:29:06,340 We'll see next week, we can actually create 621 00:29:06,340 --> 00:29:08,500 the equivalent of two dimensional data structures, 622 00:29:08,500 --> 00:29:10,250 or even three dimensional data structures, 623 00:29:10,250 --> 00:29:12,852 by using these addresses and sort of linking things together. 624 00:29:12,852 --> 00:29:14,810 And we'll see the beginnings of that this week. 625 00:29:14,810 --> 00:29:18,160 But for now, focus at least for now on just really the syntax 626 00:29:18,160 --> 00:29:20,485 and what these building blocks can do for us. 627 00:29:20,485 --> 00:29:22,960 AUDIENCE: Does the p pointer have to be an integer? 628 00:29:22,960 --> 00:29:25,150 DAVID J. MALAN: Does the p integer-- 629 00:29:25,150 --> 00:29:27,922 does the p pointer have to be an-- point to an integer? 630 00:29:27,922 --> 00:29:28,630 Short answer, no. 631 00:29:28,630 --> 00:29:29,500 And we'll come back to this. 632 00:29:29,500 --> 00:29:31,250 For now, for the sake of discussion, we're 633 00:29:31,250 --> 00:29:33,460 only dealing with integers like the number 50. 634 00:29:33,460 --> 00:29:35,410 You mentioned strings, or characters. 635 00:29:35,410 --> 00:29:36,010 Absolutely. 636 00:29:36,010 --> 00:29:38,060 We're about to go there soon. 637 00:29:38,060 --> 00:29:41,510 So you can use the address of anything you want in the computer's memory. 
638 00:29:41,510 --> 00:29:44,643 So in fact, let's translate this now to just the same picture just 639 00:29:44,643 --> 00:29:47,560 to help you wrap your minds around what these two lines of code really 640 00:29:47,560 --> 00:29:49,000 fundamentally are doing. 641 00:29:49,000 --> 00:29:51,460 So if I come back to my grid of memory here, 642 00:29:51,460 --> 00:29:54,910 let's plop the number 50 in the variable n at the bottom right, 643 00:29:54,910 --> 00:29:56,000 like it was before. 644 00:29:56,000 --> 00:29:58,040 So this is that first line of code as before. 645 00:29:58,040 --> 00:30:03,080 But with the new second line of code, as soon as I create p, what do I do? 646 00:30:03,080 --> 00:30:07,177 Well, first, remember that n lives somewhere in the computer's memory. 647 00:30:07,177 --> 00:30:09,010 Usually, I don't care precisely where it is. 648 00:30:09,010 --> 00:30:10,885 But for the sake of discussion, let's suppose 649 00:30:10,885 --> 00:30:14,480 it's at 0x123, which is easier to say than where it actually ended up. 650 00:30:14,480 --> 00:30:15,820 And now what is p? 651 00:30:15,820 --> 00:30:17,630 Well, p is just another variable. 652 00:30:17,630 --> 00:30:19,280 And variables live in memory too. 653 00:30:19,280 --> 00:30:22,550 So let me just hypothesize that p lives up here. 654 00:30:22,550 --> 00:30:28,630 And it turns out that p once you assign it, the value of ampersand n 655 00:30:28,630 --> 00:30:32,110 means that C will take a look at the variable n, 656 00:30:32,110 --> 00:30:37,240 realize, oh it lives at 0x123, and what goes in the value of p 657 00:30:37,240 --> 00:30:39,550 is literally 0x123. 658 00:30:39,550 --> 00:30:42,440 So again, it's still an integer, which is confusing. 659 00:30:42,440 --> 00:30:45,520 But it's technically an integer being used as an address. 660 00:30:45,520 --> 00:30:49,960 And now just a prompt here, notice that this pointer is pretty darn big. 
661 00:30:49,960 --> 00:30:51,940 It's like eight squares. 662 00:30:51,940 --> 00:30:53,350 What's the implication of that? 663 00:30:53,350 --> 00:30:55,000 Because I did that deliberately. 664 00:30:55,000 --> 00:30:59,205 How big must a pointer apparently be in most modern systems, would you say? 665 00:30:59,205 --> 00:31:00,080 AUDIENCE: [INAUDIBLE] 666 00:31:00,080 --> 00:31:00,490 DAVID J. MALAN: OK, good. 667 00:31:00,490 --> 00:31:01,480 Computers today are very big. 668 00:31:01,480 --> 00:31:03,310 You have gigabytes of RAM in your computer. 669 00:31:03,310 --> 00:31:05,477 You therefore need big pointers to be able to point 670 00:31:05,477 --> 00:31:07,640 at memory that's conceptually pretty far away. 671 00:31:07,640 --> 00:31:10,840 So to be clear, how many bytes does a pointer apparently take up? 672 00:31:10,840 --> 00:31:12,820 Well, it seems to take up 8 in total. 673 00:31:12,820 --> 00:31:15,280 Integers by convention nowadays are usually 4. 674 00:31:15,280 --> 00:31:18,393 Pointers though nowadays are typically 8 in this case. 675 00:31:18,393 --> 00:31:20,810 So I'm drawing it in a manner consistent with the reality, 676 00:31:20,810 --> 00:31:23,260 even though at the end of the day, it's not really that interesting 677 00:31:23,260 --> 00:31:24,590 what values are in here. 678 00:31:24,590 --> 00:31:27,012 In fact, let's emerge from these weeds. 679 00:31:27,012 --> 00:31:28,720 I don't really care what else is going on 680 00:31:28,720 --> 00:31:31,012 in my computer's memory at the moment because I've only 681 00:31:31,012 --> 00:31:34,450 got those two lines of juicy code defining n and defining p. 682 00:31:34,450 --> 00:31:36,370 So let's hide all of the other squares.
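The sizes mentioned above (4 bytes for an int, 8 for a pointer) are typical rather than guaranteed, and a program can ask the compiler directly with `sizeof`. A minimal sketch, with hypothetical function names, that makes no assumption about the exact numbers:

```c
#include <assert.h>
#include <stdio.h>

// Hypothetical helpers: report sizes in bytes on whatever system compiles this.
size_t int_size(void)
{
    return sizeof(int);
}

size_t pointer_size(void)
{
    return sizeof(int *);
}

// Hypothetical helper: prints both sizes for the n and p from the lecture.
void show_sizes(void)
{
    int n = 50;
    int *p = &n;
    assert(sizeof(p) == sizeof(int *));  // a pointer variable has its type's size
    printf("int: %zu bytes\n", sizeof(n));
    printf("int *: %zu bytes\n", sizeof(p));
}
```

On a typical 64-bit system this would be expected to report 4 and 8, matching the four squares and eight squares in the picture, but an older or smaller machine may report something else.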
683 00:31:36,370 --> 00:31:39,190 And honestly, I mean it when I say that programmers 684 00:31:39,190 --> 00:31:43,490 need to know that a variable exists somewhere in memory, 685 00:31:43,490 --> 00:31:46,660 and need to be able to get that address using the ampersand, 686 00:31:46,660 --> 00:31:50,680 but you're never going to printf like I did, the actual address. 687 00:31:50,680 --> 00:31:53,680 It's not generally interesting, unless you're debugging your code. 688 00:31:53,680 --> 00:31:57,250 But you're not going to start typing out crazy 0x numbers in your code 689 00:31:57,250 --> 00:31:58,370 to move things around. 690 00:31:58,370 --> 00:32:01,720 You just need to know that the computer can figure out where things are. 691 00:32:01,720 --> 00:32:05,800 So frankly, by that logic, who cares that it's 0x123? 692 00:32:05,800 --> 00:32:08,510 Tomorrow, it could be 0x456 or something else. 693 00:32:08,510 --> 00:32:12,610 So one of the ways to think of a pointer is literally as 694 00:32:12,610 --> 00:32:15,890 a variable that points at something else. 695 00:32:15,890 --> 00:32:19,750 And indeed, in this case, p, yeah, technically it has an address. 696 00:32:19,750 --> 00:32:22,030 And yeah, technically it's 0x123 in this story. 697 00:32:22,030 --> 00:32:23,140 But honestly, who cares? 698 00:32:23,140 --> 00:32:28,900 I just need to know that using p, I can get to the value n. 699 00:32:28,900 --> 00:32:30,370 And so what are these addresses? 700 00:32:30,370 --> 00:32:33,430 And in fact, if Carter wouldn't mind joining me up here for a moment, 701 00:32:33,430 --> 00:32:34,480 what are these addresses? 702 00:32:34,480 --> 00:32:36,760 Well, just like in our human world we have mailboxes, 703 00:32:36,760 --> 00:32:39,260 even though you might not check it very frequently nowadays, 704 00:32:39,260 --> 00:32:43,060 but to get physical mail, every home, every business has a unique address.
705 00:32:43,060 --> 00:32:47,080 The Science and Engineering Complex is 150 Western Avenue Allston, 706 00:32:47,080 --> 00:32:49,930 Massachusetts, 02134 USA. 707 00:32:49,930 --> 00:32:53,830 And theoretically, that uniquely identifies that building in the world. 708 00:32:53,830 --> 00:32:56,170 Well, here we have two mailboxes. 709 00:32:56,170 --> 00:33:00,160 Over here, we have a value n that happens to live, I'll claim, 710 00:33:00,160 --> 00:33:01,960 at address 0x123. 711 00:33:01,960 --> 00:33:07,092 And then over here, I claim there's another address called by name p. 712 00:33:07,092 --> 00:33:10,300 I don't actually care where it is, even though it definitely exists somewhere 713 00:33:10,300 --> 00:33:11,540 in the computer's memory. 714 00:33:11,540 --> 00:33:16,090 But if this is p, which is a variable, and that's n, another variable, 715 00:33:16,090 --> 00:33:18,178 ideally, this mailbox would be twice as big 716 00:33:18,178 --> 00:33:19,720 because of the number of bytes it's using. 717 00:33:19,720 --> 00:33:22,060 But Home Depot only had identical sized mailboxes. 718 00:33:22,060 --> 00:33:23,830 But here is p, one variable. 719 00:33:23,830 --> 00:33:25,690 There is n, another variable. 720 00:33:25,690 --> 00:33:30,790 If I open up this mailbox, what should I find inside of it 721 00:33:30,790 --> 00:33:33,250 based on our story thus far? 722 00:33:33,250 --> 00:33:36,970 What value will I pull out dramatically in just a moment? 723 00:33:36,970 --> 00:33:37,810 Yeah, I think. 724 00:33:37,810 --> 00:33:39,640 0x123. 725 00:33:39,640 --> 00:33:42,160 Now using this, you can kind of think of this as like X 726 00:33:42,160 --> 00:33:44,770 marks the spot, no pun intended, where I can now 727 00:33:44,770 --> 00:33:48,737 walk around the computer's memory and find my way to that location 728 00:33:48,737 --> 00:33:50,320 by sort of following the treasure map.
729 00:33:50,320 --> 00:33:53,770 Or if I want it more dramatically, thanks to our little Yale foam 730 00:33:53,770 --> 00:33:58,385 finger here, you can think of it more abstractly as p is just pointing at n. 731 00:33:58,385 --> 00:33:59,510 That's not going over well. 732 00:33:59,510 --> 00:34:01,177 So let's switch over to the Harvard one. 733 00:34:01,177 --> 00:34:02,775 So p is pointing-- 734 00:34:02,775 --> 00:34:03,400 AUDIENCE: Whoo. 735 00:34:03,400 --> 00:34:06,100 736 00:34:06,100 --> 00:34:07,740 DAVID J. MALAN: So p is pointing at n. 737 00:34:07,740 --> 00:34:10,620 And so it turns out we will be able to write code now 738 00:34:10,620 --> 00:34:13,252 that will do the equivalent of me walking over to n. 739 00:34:13,252 --> 00:34:15,960 But for now, Carter, if you want to reveal what's in the mailbox, 740 00:34:15,960 --> 00:34:19,170 we should see indeed the number 50. 741 00:34:19,170 --> 00:34:20,878 So that's really all that-- 742 00:34:20,878 --> 00:34:22,170 Carter is waiting for applause. 743 00:34:22,170 --> 00:34:24,300 So really, nicely done. 744 00:34:24,300 --> 00:34:28,270 745 00:34:28,270 --> 00:34:28,929 Thank you. 746 00:34:28,929 --> 00:34:31,900 So that's just a physical metaphor of what's going on here. 747 00:34:31,900 --> 00:34:33,760 In one variable, we have an address. 748 00:34:33,760 --> 00:34:36,070 And that variable by convention is called a pointer. 749 00:34:36,070 --> 00:34:39,550 In the other variable per week one, we just have a value like n. 750 00:34:39,550 --> 00:34:43,105 And you can, yes, follow the map and walk yourself 751 00:34:43,105 --> 00:34:44,230 to that particular address. 752 00:34:44,230 --> 00:34:45,772 And we'll see how to do that in code. 753 00:34:45,772 --> 00:34:49,389 But what's really interesting is this abstraction, that pointers literally, 754 00:34:49,389 --> 00:34:54,310 or really I guess, figuratively, point at some other value in memory. 
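Walking over to the mailbox that p points at is what the star, used as the dereference operator, does in code. The lecture gets to this shortly; as a look ahead, here is a hedged sketch in which the function name `follow_pointer` is made up for illustration:

```c
#include <assert.h>
#include <stdio.h>

// Hypothetical helper: follows p to the integer it points at.
// Returns whatever value is found at that address.
int follow_pointer(void)
{
    int n = 50;
    int *p = &n;         // p holds the address of n, like the slip in the mailbox

    assert(*p == n);     // *p means "go there": the value stored at that address
    printf("%i\n", *p);  // prints 50, the same as printing n directly
    return *p;
}
```

Here the star in `int *p` declares a pointer, while the star in `*p` on its own follows one; the same symbol is doing two different jobs.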
755 00:34:54,310 --> 00:34:56,658 All right, questions, then, on pointers in this form. 756 00:34:56,658 --> 00:34:58,450 AUDIENCE: Can pointers point to each other? 757 00:34:58,450 --> 00:34:59,990 DAVID J. MALAN: Can pointers point to each other? 758 00:34:59,990 --> 00:35:00,790 So yes. 759 00:35:00,790 --> 00:35:02,590 There are things called double pointers. 760 00:35:02,590 --> 00:35:04,298 We're not going to see them anytime soon. 761 00:35:04,298 --> 00:35:08,380 But using star, star, you can express an address of an address. 762 00:35:08,380 --> 00:35:10,370 But we won't see that just yet. 763 00:35:10,370 --> 00:35:13,920 Other questions on pointers? 764 00:35:13,920 --> 00:35:15,401 Yeah, in front. 765 00:35:15,401 --> 00:35:18,630 AUDIENCE: [INAUDIBLE] 766 00:35:18,630 --> 00:35:21,900 DAVID J. MALAN: Are array-- so to summarize, are arrays then pointers? 767 00:35:21,900 --> 00:35:23,767 So short answer, there's a relationship. 768 00:35:23,767 --> 00:35:25,600 And we'll come back to that in a little bit. 769 00:35:25,600 --> 00:35:28,020 But arrays are technically different from pointers. 770 00:35:28,020 --> 00:35:30,965 But we're going to be able to blur the lines a little bit by using one 771 00:35:30,965 --> 00:35:31,590 like the other. 772 00:35:31,590 --> 00:35:34,090 But let me come back to that in just a bit of time. 773 00:35:34,090 --> 00:35:34,590 All right. 774 00:35:34,590 --> 00:35:38,350 So if we have now this mental model, if you will, 775 00:35:38,350 --> 00:35:41,460 of what a pointer is in memory, I think we 776 00:35:41,460 --> 00:35:45,000 can start to peel back a layer of simplification 777 00:35:45,000 --> 00:35:47,700 that we've been assuming for the past few weeks since week one. 778 00:35:47,700 --> 00:35:50,380 So a string, recall, is a sequence of characters. 
779 00:35:50,380 --> 00:35:52,380 And so if you want to create a string that says, 780 00:35:52,380 --> 00:35:54,870 hi, in all caps and an exclamation point, 781 00:35:54,870 --> 00:35:57,480 we do string s equals quote unquote "hi". 782 00:35:57,480 --> 00:36:00,070 And we can hard code it like this, or we could use get string. 783 00:36:00,070 --> 00:36:01,903 But for now, just assume that I hardcoded it 784 00:36:01,903 --> 00:36:06,097 into my code to always say, hi, in all caps with an exclamation point. 785 00:36:06,097 --> 00:36:08,430 Well, what does that look like in the computer's memory? 786 00:36:08,430 --> 00:36:10,650 Well, let's stop looking at the entire memory 787 00:36:10,650 --> 00:36:12,630 and let's just focus on really what's going on. 788 00:36:12,630 --> 00:36:17,430 Once you create a string called S and store in it hi, 789 00:36:17,430 --> 00:36:19,740 you know that a couple of things are happening. 790 00:36:19,740 --> 00:36:24,030 H, and I, and the exclamation point are ending up in the computer's memory. 791 00:36:24,030 --> 00:36:29,670 We know from week two that this thing, the so-called NUL character, NUL, AKA 792 00:36:29,670 --> 00:36:32,632 backslash zero, is also being added for you. 793 00:36:32,632 --> 00:36:33,840 And it's somewhere in memory. 794 00:36:33,840 --> 00:36:36,360 At the moment, I don't really care where I drew it at the bottom right. 795 00:36:36,360 --> 00:36:37,435 Yes, it has an address. 796 00:36:37,435 --> 00:36:39,060 But for now, it just ends up somewhere. 797 00:36:39,060 --> 00:36:43,470 And in fact, here's a little visual cue as to how this happens. 798 00:36:43,470 --> 00:36:47,370 In C, any time you use double quotes to give you a string, 799 00:36:47,370 --> 00:36:50,580 you can imagine that the double quotes are like a clue 800 00:36:50,580 --> 00:36:55,200 to not only store HI exclamation point, but also put the NUL character there 801 00:36:55,200 --> 00:36:55,720 for you. 
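The extra NUL byte that double quotes add can be observed by comparing how many bytes the text occupies with how many characters `strlen` counts. A small sketch, with hypothetical function names and assuming an ASCII system for the numbers mentioned afterward:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

// Hypothetical helper: bytes the compiler sets aside for "HI!", NUL included.
size_t bytes_stored(void)
{
    return sizeof("HI!");
}

// Hypothetical helper: characters before the NUL, which is what strlen counts.
size_t visible_length(void)
{
    return strlen("HI!");
}

// Hypothetical helper: prints every stored byte as a number.
void show_bytes(void)
{
    const char text[] = "HI!";
    assert(text[3] == '\0');  // the fourth byte is the NUL terminator
    for (size_t i = 0; i < sizeof(text); i++)
    {
        printf("%i ", text[i]);
    }
    printf("\n");
}
```

Three visible characters take four bytes of storage. On an ASCII system `show_bytes()` would be expected to print 72 73 33 0, the last number being the NUL character.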
802 00:36:55,720 --> 00:36:59,670 And this is in contrast to what chars, if you want individual characters, what 803 00:36:59,670 --> 00:37:01,545 syntax did we use instead? 804 00:37:01,545 --> 00:37:02,420 AUDIENCE: [INAUDIBLE] 805 00:37:02,420 --> 00:37:03,795 DAVID J. MALAN: So single quotes. 806 00:37:03,795 --> 00:37:06,190 Single quotes do not add magically a backslash zero. 807 00:37:06,190 --> 00:37:08,000 They literally just store one character. 808 00:37:08,000 --> 00:37:10,167 So again, strings have always been a little special. 809 00:37:10,167 --> 00:37:11,110 You get some extra-- 810 00:37:11,110 --> 00:37:13,720 an extra byte for free so that you know where 811 00:37:13,720 --> 00:37:17,270 the string ends, and functions like STR compare can then find their way there. 812 00:37:17,270 --> 00:37:20,630 So in memory, it might indeed look a little like this. 813 00:37:20,630 --> 00:37:24,068 And if we assume that there's going to be somewhere in memory, 814 00:37:24,068 --> 00:37:26,110 these things are going to be somewhere in memory, 815 00:37:26,110 --> 00:37:29,270 we can address them per week two by way of the name of the variable. 816 00:37:29,270 --> 00:37:31,578 So if S is the name of the variable, S bracket 0 817 00:37:31,578 --> 00:37:33,370 is how you would refer to the first letter. 818 00:37:33,370 --> 00:37:34,630 S bracket 1, S bracket 2. 819 00:37:34,630 --> 00:37:39,100 And if you really want, S bracket 3 would get you at the NUL character 820 00:37:39,100 --> 00:37:40,370 at the very end. 821 00:37:40,370 --> 00:37:41,830 But what is S? 822 00:37:41,830 --> 00:37:45,430 So technically in this line of code here, not only 823 00:37:45,430 --> 00:37:51,150 is the computer giving you memory for HI exclamation point backslash zero, we-- 824 00:37:51,150 --> 00:37:54,430 it turns out that S itself must take up some amount of space 825 00:37:54,430 --> 00:37:55,570 because S is the variable. 
826 00:37:55,570 --> 00:37:57,778 And every time we've talked about variables thus far, 827 00:37:57,778 --> 00:38:00,980 I've given you a rectangle on the screen in which to store its value. 828 00:38:00,980 --> 00:38:05,650 So let's assume for the sake of discussion that the H is at 0x123 829 00:38:05,650 --> 00:38:09,520 and I is at 0x124 exclamation point is at 0x125, 830 00:38:09,520 --> 00:38:11,740 and the NUL character is at 0x126. 831 00:38:11,740 --> 00:38:13,390 Well, what then is S? 832 00:38:13,390 --> 00:38:16,107 Well, s is just going to be some other variable. 833 00:38:16,107 --> 00:38:18,940 And I'll draw it somewhat abstractly without all of the other boxes, 834 00:38:18,940 --> 00:38:19,630 up here. 835 00:38:19,630 --> 00:38:22,270 And I'll claim that the name of this variable is s. 836 00:38:22,270 --> 00:38:25,000 But it turns out, what is s really? 837 00:38:25,000 --> 00:38:27,130 How do strings really work? 838 00:38:27,130 --> 00:38:30,640 Well, s is a variable, and has been since week one. 839 00:38:30,640 --> 00:38:33,730 But when you define it, what the computer is doing for you automatically 840 00:38:33,730 --> 00:38:36,940 is when it knows you want to store HI exclamation point, 841 00:38:36,940 --> 00:38:38,440 it puts that somewhere in memory. 842 00:38:38,440 --> 00:38:40,720 The computer then figures out for you, what's 843 00:38:40,720 --> 00:38:42,880 the address of the very first character? 844 00:38:42,880 --> 00:38:46,120 And it stores that address, and only that address, 845 00:38:46,120 --> 00:38:50,080 in the variable you created on the left hand side of the equal sign. 846 00:38:50,080 --> 00:38:51,370 And that's enough. 847 00:38:51,370 --> 00:38:55,120 To represent a string with three letters of the alphabet or punctuation, 848 00:38:55,120 --> 00:38:57,590 you don't need three variables. 849 00:38:57,590 --> 00:38:58,450 You just need one. 
850 00:38:58,450 --> 00:39:02,050 You just need to know the beginning of the string. 851 00:39:02,050 --> 00:39:02,810 Why? 852 00:39:02,810 --> 00:39:06,790 Why is it sufficient for a variable to only store the first byte's address, 853 00:39:06,790 --> 00:39:09,456 and not all of the bytes' addresses? 854 00:39:09,456 --> 00:39:11,380 AUDIENCE: [INAUDIBLE] 855 00:39:11,380 --> 00:39:12,380 DAVID J. MALAN: Exactly. 856 00:39:12,380 --> 00:39:17,360 Because of the design of strings per week two, we always NUL terminate them. 857 00:39:17,360 --> 00:39:20,013 So it suffices to only remember the first byte's address. 858 00:39:20,013 --> 00:39:22,430 Because from there, you can sort of follow the breadcrumbs 859 00:39:22,430 --> 00:39:23,690 byte, after byte, after byte. 860 00:39:23,690 --> 00:39:27,140 And until you see the new line, sorry, the NUL character, 861 00:39:27,140 --> 00:39:31,230 you know that all of those characters are apparently part of the same string. 862 00:39:31,230 --> 00:39:35,600 So this is what's been going on in the computer's memory all since week one. 863 00:39:35,600 --> 00:39:37,430 And in fact, if we abstract this away, you 864 00:39:37,430 --> 00:39:42,590 can really think of S as being just this, really a pointer 865 00:39:42,590 --> 00:39:44,400 to that chunk of memory. 866 00:39:44,400 --> 00:39:46,920 So in fact, what do we have here? 867 00:39:46,920 --> 00:39:51,360 Well, in the left to recap on the code here, on the left hand side string, 868 00:39:51,360 --> 00:39:54,170 that's what ensures that we'll actually be able to store 869 00:39:54,170 --> 00:39:55,880 a string in a variable called s. 870 00:39:55,880 --> 00:39:59,760 We're going to have on the right hand side, though, the actual value. 871 00:39:59,760 --> 00:40:01,460 So let me switch back to VS Code here. 872 00:40:01,460 --> 00:40:04,490 And let me change my code to no longer involve integers alone.
873 00:40:04,490 --> 00:40:09,260 So I'm going to add the CS50 library just so 874 00:40:09,260 --> 00:40:11,000 that I can use some shortcuts in there. 875 00:40:11,000 --> 00:40:11,783 CS50.h. 876 00:40:11,783 --> 00:40:14,450 And then in my main function, I'm going to go ahead and do this. 877 00:40:14,450 --> 00:40:18,270 String s equals quote unquote "HI" in all caps, exclamation point. 878 00:40:18,270 --> 00:40:22,220 And then I'm going to go ahead and print out using %s as always backslash n 879 00:40:22,220 --> 00:40:23,270 the value of s. 880 00:40:23,270 --> 00:40:25,530 So this program at the moment, not interesting at all. 881 00:40:25,530 --> 00:40:29,690 It's just week one stuff again. ./addresses indeed prints out hi. 882 00:40:29,690 --> 00:40:33,380 But it turns out that now that I know this, 883 00:40:33,380 --> 00:40:36,530 what's really been going on underneath the hood all this time? 884 00:40:36,530 --> 00:40:40,640 Well, here's that same line of code that defines the variable called S. 885 00:40:40,640 --> 00:40:46,880 And it turns out anyone, want to guess what string is actually a synonym for? 886 00:40:46,880 --> 00:40:50,540 String, it turns out, is kind of a white lie we've been telling since week one. 887 00:40:50,540 --> 00:40:55,190 There is no such thing as string as a keyword in C. 888 00:40:55,190 --> 00:40:57,110 It's technically a CS50 thing. 889 00:40:57,110 --> 00:40:57,650 Yeah. 890 00:40:57,650 --> 00:40:58,733 AUDIENCE: [INAUDIBLE] 891 00:40:58,733 --> 00:41:00,650 DAVID J. MALAN: It's a pointer to a character. 892 00:41:00,650 --> 00:41:03,410 So really, all this time, we've kind of been lying to you. 893 00:41:03,410 --> 00:41:05,510 There is no "string" quote unquote. 894 00:41:05,510 --> 00:41:07,430 It's actually char star. 895 00:41:07,430 --> 00:41:12,890 And if I may dramatically here, go, the training wheels. 896 00:41:12,890 --> 00:41:14,310 That didn't land very well.
897 00:41:14,310 --> 00:41:16,490 So what have we been doing? 898 00:41:16,490 --> 00:41:19,310 Well, it turns out that string is a much easier way conceptually 899 00:41:19,310 --> 00:41:21,227 to think about what a string of characters is. 900 00:41:21,227 --> 00:41:24,053 My God, if we had to start in week one by having you type char star, 901 00:41:24,053 --> 00:41:25,220 yeah, you might get past it. 902 00:41:25,220 --> 00:41:28,300 But this is just way too much ugly syntax, not intellectually interesting 903 00:41:28,300 --> 00:41:28,800 at all. 904 00:41:28,800 --> 00:41:30,020 So we abstract away 905 00:41:30,020 --> 00:41:33,740 what a char star was in the first week of C, by telling you 906 00:41:33,740 --> 00:41:35,210 it's actually called string. 907 00:41:35,210 --> 00:41:37,400 Now string is a term of art. C programmers, 908 00:41:37,400 --> 00:41:39,800 programmers in any language will use the word string 909 00:41:39,800 --> 00:41:41,430 to mean a sequence of characters. 910 00:41:41,430 --> 00:41:45,510 But in C, it's not technically a word unto itself. 911 00:41:45,510 --> 00:41:49,080 It's rather a synonym that we ourselves created in some form. 912 00:41:49,080 --> 00:41:50,960 So in fact, how did we do this? 913 00:41:50,960 --> 00:41:52,550 Well, think back to just last week. 914 00:41:52,550 --> 00:41:54,830 Last week, I proposed that it'd be really nice 915 00:41:54,830 --> 00:41:58,040 if we had a person data type, which the creators of C 916 00:41:58,040 --> 00:41:59,617 did not think of decades ago. 917 00:41:59,617 --> 00:42:00,200 But that's OK. 918 00:42:00,200 --> 00:42:01,520 We can define it ourselves. 919 00:42:01,520 --> 00:42:02,940 What did we do here? 920 00:42:02,940 --> 00:42:05,400 Well, we're using syntax like this. 921 00:42:05,400 --> 00:42:08,210 Recall that we defined a person to be what? 922 00:42:08,210 --> 00:42:09,800 To be this structure.
923 00:42:09,800 --> 00:42:12,080 This structure, using the new keyword last week, 924 00:42:12,080 --> 00:42:14,600 struct, means that a person is just a name and a number. 925 00:42:14,600 --> 00:42:16,100 And it could have been other things. 926 00:42:16,100 --> 00:42:17,180 We just kept it simple. 927 00:42:17,180 --> 00:42:22,640 But how did I associate person with that structure? 928 00:42:22,640 --> 00:42:25,070 Well, we claimed that it was this keyword here, 929 00:42:25,070 --> 00:42:28,770 typedef, which as you might expect, defines a data type. 930 00:42:28,770 --> 00:42:33,020 So what did we do as CS50 back in week one without telling you? 931 00:42:33,020 --> 00:42:36,950 Well, we could have done something like this. 932 00:42:36,950 --> 00:42:38,318 Int itself is a little cryptic. 933 00:42:38,318 --> 00:42:41,360 And maybe we should have, to keep things even simpler, said, hey, everyone. 934 00:42:41,360 --> 00:42:44,990 Turns out you can define integers in C. And if you wanted to do this, well, 935 00:42:44,990 --> 00:42:47,700 if you want to create the keyword integer as a data type, 936 00:42:47,700 --> 00:42:49,760 you can just typedef it to int. 937 00:42:49,760 --> 00:42:53,570 So typedef creates the word on the far right, integer, 938 00:42:53,570 --> 00:42:57,320 and makes it a synonym for, in this case, int. 939 00:42:57,320 --> 00:43:00,590 So what did we do in week one without telling you? 940 00:43:00,590 --> 00:43:04,490 We have a line of code like this in the CS50 library 941 00:43:04,490 --> 00:43:10,490 that associates quote unquote "string" with more cryptically char star. 942 00:43:10,490 --> 00:43:15,110 And this is why in week one onward, any time you use the CS50 library, 943 00:43:15,110 --> 00:43:18,297 you can write the word string as though it's a real C data type.
944 00:43:18,297 --> 00:43:21,380 And that's just because we wanted to have this abstraction, these training 945 00:43:21,380 --> 00:43:23,060 wheels on for the first weeks, so we don't 946 00:43:23,060 --> 00:43:25,393 have to get in the weeds of all this crazy memory stuff. 947 00:43:25,393 --> 00:43:27,740 We can sort of talk about strings at a higher level. 948 00:43:27,740 --> 00:43:29,120 But that's all they are. 949 00:43:29,120 --> 00:43:31,970 Strings are the address of the first character 950 00:43:31,970 --> 00:43:34,760 in that sequence of characters. 951 00:43:34,760 --> 00:43:38,220 Questions now on any of these details? 952 00:43:38,220 --> 00:43:38,720 Yeah. 953 00:43:38,720 --> 00:43:41,460 AUDIENCE: What about the strings libraries that [INAUDIBLE]?? 954 00:43:41,460 --> 00:43:42,710 DAVID J. MALAN: Good question. 955 00:43:42,710 --> 00:43:45,910 What about the strings library, which we have used? 956 00:43:45,910 --> 00:43:46,600 Unrelated. 957 00:43:46,600 --> 00:43:48,520 So it does not define the word string. 958 00:43:48,520 --> 00:43:51,430 Everything in there actually relates to char stars. 959 00:43:51,430 --> 00:43:56,320 And so in fact, if you've used the CS50 manual, which is just 960 00:43:56,320 --> 00:44:00,340 our user-friendly version of the actual manual pages for the official language, 961 00:44:00,340 --> 00:44:03,820 C, you'll see throughout that now if you start poking around or turning off 962 00:44:03,820 --> 00:44:05,830 less comfortable mode, you'll actually see 963 00:44:05,830 --> 00:44:08,350 that we changed any mentions of char star 964 00:44:08,350 --> 00:44:10,690 in the official documentation for these first weeks 965 00:44:10,690 --> 00:44:12,610 to just string to simplify it. 966 00:44:12,610 --> 00:44:17,350 But underneath the hood, C does not know the word string per se as a keyword. 967 00:44:17,350 --> 00:44:21,320 But it's absolutely a concept that every program in the world knows about. 
968 00:44:21,320 --> 00:44:23,750 And in fact, in other languages, in Python for instance, 969 00:44:23,750 --> 00:44:26,227 there will actually be a proper string, although it's not 970 00:44:26,227 --> 00:44:27,310 going to be called string. 971 00:44:27,310 --> 00:44:30,340 It's going to be called str, str for short. 972 00:44:30,340 --> 00:44:33,470 Questions on these strings here? 973 00:44:33,470 --> 00:44:36,520 Well, let me propose there's one other feature of this syntax 974 00:44:36,520 --> 00:44:38,960 that we can now leverage as follows. 975 00:44:38,960 --> 00:44:43,060 Let me propose that if we go back to the previous version of my code 976 00:44:43,060 --> 00:44:46,760 here, wherein, let me switch back to VS Code in just a moment, 977 00:44:46,760 --> 00:44:51,800 I'm going to rewind in VS Code to the integer version of my code from before. 978 00:44:51,800 --> 00:44:55,540 And most recently, it looked like this, before when we were using integers 979 00:44:55,540 --> 00:44:58,570 only and not, in fact, strings at all. 980 00:44:58,570 --> 00:45:01,270 Let me propose that there's this other feature of C 981 00:45:01,270 --> 00:45:04,360 that we can use that actually allows us to go to an address. 982 00:45:04,360 --> 00:45:07,090 So at the moment, let me just rewind and do, make addresses, 983 00:45:07,090 --> 00:45:10,660 to remind you what this program did when it was using integers alone. 984 00:45:10,660 --> 00:45:12,010 And there's that address. 985 00:45:12,010 --> 00:45:12,670 Why? 986 00:45:12,670 --> 00:45:15,880 Because on line seven, notice, I'm printing out 987 00:45:15,880 --> 00:45:17,710 the value of p, which is a pointer. 988 00:45:17,710 --> 00:45:20,060 So of course, it's going to look like an address. 989 00:45:20,060 --> 00:45:22,660 But let me zoom out now and make one change.
990 00:45:22,660 --> 00:45:27,730 Instead of printing out p, how can I use today's second new operator, 991 00:45:27,730 --> 00:45:32,020 not the ampersand, but the star, to actually go to that address? 992 00:45:32,020 --> 00:45:35,170 Well, what I can actually do on this line of code, is this. 993 00:45:35,170 --> 00:45:39,250 If I want to print out the actual integer 50 that's in that variable, 994 00:45:39,250 --> 00:45:44,858 or equivalently at that address, I can go to p here and not print p literally, 995 00:45:44,858 --> 00:45:46,150 because that's just an address. 996 00:45:46,150 --> 00:45:48,910 I can now say, star p. 997 00:45:48,910 --> 00:45:51,310 And star p means go there. 998 00:45:51,310 --> 00:45:53,200 More technically, de-reference p. 999 00:45:53,200 --> 00:45:56,560 That is, follow the treasure map to the actual address and do what Carter did. 1000 00:45:56,560 --> 00:46:00,260 Open the mailbox and print whatever was in the mailbox, which recall, 1001 00:46:00,260 --> 00:46:02,380 was the actual number 50. 1002 00:46:02,380 --> 00:46:03,680 So let me try this. 1003 00:46:03,680 --> 00:46:05,270 Let me recompile the code. 1004 00:46:05,270 --> 00:46:08,020 So make addresses. 1005 00:46:08,020 --> 00:46:09,520 OK, let me clear my terminal window. 1006 00:46:09,520 --> 00:46:10,510 Dot slash addresses. 1007 00:46:10,510 --> 00:46:12,640 This time, I shouldn't see the 0x anything. 1008 00:46:12,640 --> 00:46:16,550 I should see just the number 50 in this case. 1009 00:46:16,550 --> 00:46:19,990 And here too is kind of a unfortunate design decision, certainly 1010 00:46:19,990 --> 00:46:23,290 pedagogically I would say in C. If I zoom in on this code, 1011 00:46:23,290 --> 00:46:26,335 star is unfortunately being used in two different ways. 1012 00:46:26,335 --> 00:46:28,960 In an ideal world, they would have used three different symbols 1013 00:46:28,960 --> 00:46:30,550 to make this more semantically clear. 
1014 00:46:30,550 --> 00:46:32,030 But this is what we're stuck with. 1015 00:46:32,030 --> 00:46:35,370 So in line six, when you declare a pointer, 1016 00:46:35,370 --> 00:46:37,120 that is a variable that stores an address, 1017 00:46:37,120 --> 00:46:39,520 you put the type of variable that you want 1018 00:46:39,520 --> 00:46:42,865 to point at, then a star just because, and then the name of the variable. 1019 00:46:42,865 --> 00:46:44,740 And then on the right hand side, you actually 1020 00:46:44,740 --> 00:46:47,590 get the address of whatever using ampersand. 1021 00:46:47,590 --> 00:46:51,790 But when you want to go to an address, you want to de-reference a pointer, 1022 00:46:51,790 --> 00:46:53,380 you don't use int again. 1023 00:46:53,380 --> 00:46:54,640 And we've never done that. 1024 00:46:54,640 --> 00:46:57,100 Once you declare a variable, you never again mention the data type. 1025 00:46:57,100 --> 00:46:58,975 But in the world of pointers now, if you want 1026 00:46:58,975 --> 00:47:03,340 to not print out p but go to whatever address p is storing, 1027 00:47:03,340 --> 00:47:05,180 you use star p here. 1028 00:47:05,180 --> 00:47:07,570 So a good visual indicator would be when you 1029 00:47:07,570 --> 00:47:10,850 declare a pointer, that is make it exist in your program, 1030 00:47:10,850 --> 00:47:13,900 you have to declare the data type with the star. 1031 00:47:13,900 --> 00:47:16,405 But when you use a pointer, you just use the star. 1032 00:47:16,405 --> 00:47:19,030 In an ideal world, this would be a completely different symbol. 1033 00:47:19,030 --> 00:47:21,310 But again, this is what we have. 1034 00:47:21,310 --> 00:47:23,920 Questions now on that syntax. 1035 00:47:23,920 --> 00:47:25,077 Yeah. 1036 00:47:25,077 --> 00:47:27,850 AUDIENCE: [INAUDIBLE] 1037 00:47:27,850 --> 00:47:30,850 DAVID J. MALAN: Why can't we just do the ampersand here, are you saying? 1038 00:47:30,850 --> 00:47:32,260 It was still a little quiet. 
1039 00:47:32,260 --> 00:47:34,600 So strictly speaking, we do not need line six. 1040 00:47:34,600 --> 00:47:36,730 So this is really for pedagogical sake that I 1041 00:47:36,730 --> 00:47:41,320 am defining a separate variable p and then printing it out. 1042 00:47:41,320 --> 00:47:44,500 At this point though, I'm just kind of going in circles, if you will. 1043 00:47:44,500 --> 00:47:46,480 Because more simple would have been what I 1044 00:47:46,480 --> 00:47:49,150 would have done in week one, which would be get rid of p 1045 00:47:49,150 --> 00:47:52,660 altogether, get rid of p here, and just print out n. 1046 00:47:52,660 --> 00:47:56,710 But today, we're just giving you this new building block, this new syntax, 1047 00:47:56,710 --> 00:47:59,170 via which you can figure out the address of something, 1048 00:47:59,170 --> 00:48:04,450 and then reverse the process later and actually go to it as well. 1049 00:48:04,450 --> 00:48:08,200 Other questions on what we've done here with these pointers. 1050 00:48:08,200 --> 00:48:08,700 All right. 1051 00:48:08,700 --> 00:48:10,770 Well, let's context switch back to the string 1052 00:48:10,770 --> 00:48:15,780 now and see what more we can do with this here in the case of our strings 1053 00:48:15,780 --> 00:48:16,800 here. 1054 00:48:16,800 --> 00:48:22,710 Let me refine this to zoom out, let me delete the integer-related code here, 1055 00:48:22,710 --> 00:48:27,480 let me do string s equals quote unquote "HI" in all caps, let me go ahead 1056 00:48:27,480 --> 00:48:30,930 and for the moment include CS50.h at the top 1057 00:48:30,930 --> 00:48:36,390 so that indeed I can use the key word s, string rather, and let me go ahead now 1058 00:48:36,390 --> 00:48:38,500 and do something more than I did last time. 1059 00:48:38,500 --> 00:48:43,020 Last time, I did printf of %s backslash n, and then I printed out s. 
1060 00:48:43,020 --> 00:48:45,990 And again, I'll recompile this just for clarity, make addresses, 1061 00:48:45,990 --> 00:48:46,920 dot slash addresses. 1062 00:48:46,920 --> 00:48:47,920 That just prints out hi. 1063 00:48:47,920 --> 00:48:49,450 So that's, again, week one stuff. 1064 00:48:49,450 --> 00:48:52,260 But now that we have this other bit of syntax, 1065 00:48:52,260 --> 00:48:54,630 we can do some interesting things too. 1066 00:48:54,630 --> 00:48:58,440 So for instance, suppose I want to print out not s itself, 1067 00:48:58,440 --> 00:49:00,870 but what if I want to print out the address of s? 1068 00:49:00,870 --> 00:49:03,210 At what memory location is s? 1069 00:49:03,210 --> 00:49:07,350 Well, I can change my %s to %p, which now we know p is for pointer. 1070 00:49:07,350 --> 00:49:10,200 So %p means print out the value of a pointer. 1071 00:49:10,200 --> 00:49:11,280 That is an address. 1072 00:49:11,280 --> 00:49:15,150 And here, I can actually print out s itself. 1073 00:49:15,150 --> 00:49:17,200 But why that is, we'll see in a moment. 1074 00:49:17,200 --> 00:49:19,200 Let me do this. 1075 00:49:19,200 --> 00:49:20,850 Here go the training wheels. 1076 00:49:20,850 --> 00:49:25,350 String does not technically exist, but it does if I'm using the CS50 library. 1077 00:49:25,350 --> 00:49:28,380 But if I get rid of the CS50 library, as I'm metaphorically 1078 00:49:28,380 --> 00:49:31,690 doing by taking off the training wheels, I can't use the word string anymore. 1079 00:49:31,690 --> 00:49:33,773 And in fact, let me make this mistake deliberately 1080 00:49:33,773 --> 00:49:35,880 as you might have accidentally in past weeks. 1081 00:49:35,880 --> 00:49:40,470 Here is the error message I get if I forget the CS50 library, use 1082 00:49:40,470 --> 00:49:42,420 of undeclared identifier string. 1083 00:49:42,420 --> 00:49:43,830 Did you mean standard in? 
1084 00:49:43,830 --> 00:49:46,872 It's trying to be helpful, but it's not because I didn't mean standard in. 1085 00:49:46,872 --> 00:49:48,870 So indeed, this is confirmation that C does not 1086 00:49:48,870 --> 00:49:51,780 know the word string exists, at least as a keyword. 1087 00:49:51,780 --> 00:49:54,130 Exists as a concept, but not a keyword. 1088 00:49:54,130 --> 00:49:57,515 So I could fix this by adding back the CS50 library. 1089 00:49:57,515 --> 00:50:00,390 But that's kind of a step backwards, educationally, instead of a step 1090 00:50:00,390 --> 00:50:00,930 forward. 1091 00:50:00,930 --> 00:50:05,605 What could I do instead to fix this now if the training wheels are now off? 1092 00:50:05,605 --> 00:50:06,105 Yeah. 1093 00:50:06,105 --> 00:50:09,210 AUDIENCE: [INAUDIBLE] 1094 00:50:09,210 --> 00:50:10,210 DAVID J. MALAN: Exactly. 1095 00:50:10,210 --> 00:50:13,808 Replace "string" quote unquote with char star instead. 1096 00:50:13,808 --> 00:50:15,850 So I'm going to go ahead and change this to char. 1097 00:50:15,850 --> 00:50:18,280 Technically, you can put the literal star here, 1098 00:50:18,280 --> 00:50:21,310 the asterisk, or you can put it there, or you can put it here. 1099 00:50:21,310 --> 00:50:23,710 The convention is to do what I've done from the beginning, 1100 00:50:23,710 --> 00:50:28,180 put the star next to the name of the variable as opposed to anywhere else. 1101 00:50:28,180 --> 00:50:29,930 Let me go ahead now and-- 1102 00:50:29,930 --> 00:50:30,447 or sorry. 1103 00:50:30,447 --> 00:50:31,780 I meant to add the spaces there. 1104 00:50:31,780 --> 00:50:32,560 You could do this too. 1105 00:50:32,560 --> 00:50:34,480 But this would be the most normal convention. 1106 00:50:34,480 --> 00:50:35,590 So now let's do this. 1107 00:50:35,590 --> 00:50:39,850 Make addresses, compiles OK now, dot slash addresses. 1108 00:50:39,850 --> 00:50:40,990 What should I see? 1109 00:50:40,990 --> 00:50:44,660 Hi or something else?
1110 00:50:44,660 --> 00:50:45,970 Feel free to just call it out. 1111 00:50:45,970 --> 00:50:46,918 AUDIENCE: [INAUDIBLE] 1112 00:50:46,918 --> 00:50:48,460 DAVID J. MALAN: So still hi, you say? 1113 00:50:48,460 --> 00:50:50,307 Someone else? 1114 00:50:50,307 --> 00:50:51,390 AUDIENCE: Memory location? 1115 00:50:51,390 --> 00:50:52,410 DAVID J. MALAN: A memory location. 1116 00:50:52,410 --> 00:50:54,510 All right, so it could be one of the two options. 1117 00:50:54,510 --> 00:50:57,552 Either I'm going to see the string, or I'm going to see a memory address. 1118 00:50:57,552 --> 00:50:59,400 Though I do, in fact, see a memory address. 1119 00:50:59,400 --> 00:51:01,608 And this one is quite different from the integer one. 1120 00:51:01,608 --> 00:51:04,950 But does anyone now want to explain why you were correct? 1121 00:51:04,950 --> 00:51:09,000 Why am I seeing the address down here and not hi? 1122 00:51:09,000 --> 00:51:09,640 It's subtle. 1123 00:51:09,640 --> 00:51:10,140 Yeah. 1124 00:51:10,140 --> 00:51:12,500 AUDIENCE: [INAUDIBLE] 1125 00:51:12,500 --> 00:51:13,500 DAVID J. MALAN: Exactly. 1126 00:51:13,500 --> 00:51:17,490 Because I left my %p there, which means, hey, printf, show me a pointer. 1127 00:51:17,490 --> 00:51:21,030 But this is where printf is smart and has been smart since week zero. 1128 00:51:21,030 --> 00:51:26,475 Humans who invented printf decades ago wrote code that notices that OK, 1129 00:51:26,475 --> 00:51:31,590 %s means to treat the following value, not as just an address per se that gets 1130 00:51:31,590 --> 00:51:34,890 printed literally, but print it as with the mailbox demo, 1131 00:51:34,890 --> 00:51:39,300 as sort of a treasure map that leads you to the address of a character. 
1132 00:51:39,300 --> 00:51:42,750 So simply by changing one character, %p to %s, 1133 00:51:42,750 --> 00:51:45,840 and if I now do make addresses again and dot slash addresses, 1134 00:51:45,840 --> 00:51:49,710 this now is identical to week one, but hopefully makes sense. 1135 00:51:49,710 --> 00:51:56,100 Because %s is just a clue to printf that means, go to this address in s. 1136 00:51:56,100 --> 00:52:00,990 Print out every character there and thereafter until you see, what? 1137 00:52:00,990 --> 00:52:01,890 The NUL character. 1138 00:52:01,890 --> 00:52:04,060 And then stop printing anything more. 1139 00:52:04,060 --> 00:52:07,050 And this is why hi has printed since week one. 1140 00:52:07,050 --> 00:52:09,690 Today, we can see the address %p. 1141 00:52:09,690 --> 00:52:13,920 But this combination of having access to addresses and the NUL terminator 1142 00:52:13,920 --> 00:52:16,740 is all the information printf needs to actually do something more 1143 00:52:16,740 --> 00:52:21,180 useful by printing the actual strings. 1144 00:52:21,180 --> 00:52:26,250 Any questions now on this approach to %s? 1145 00:52:26,250 --> 00:52:28,102 Yeah, in back. 1146 00:52:28,102 --> 00:52:29,363 AUDIENCE: [INAUDIBLE] 1147 00:52:29,363 --> 00:52:32,280 DAVID J. MALAN: Oh, so why is it traditionally being used in this way? 1148 00:52:32,280 --> 00:52:34,662 Honestly, the word string has been around for decades. 1149 00:52:34,662 --> 00:52:37,620 It's not a key word you should be able to type in C unless you're using 1150 00:52:37,620 --> 00:52:39,450 a library like CS50's. 1151 00:52:39,450 --> 00:52:41,190 And so s just means string. 1152 00:52:41,190 --> 00:52:45,060 So even though it doesn't exist as a key word, %s connotes string. 1153 00:52:45,060 --> 00:52:48,250 And humans decades ago, like today, just kind of know what that means. 1154 00:52:48,250 --> 00:52:50,458 So they could have chosen any letter of the alphabet. 
1155 00:52:50,458 --> 00:52:52,420 But s sort of makes the most sense. 1156 00:52:52,420 --> 00:52:52,920 All right. 1157 00:52:52,920 --> 00:52:54,150 Well, let's-- in back. 1158 00:52:54,150 --> 00:52:54,930 Other question? 1159 00:52:54,930 --> 00:52:56,310 AUDIENCE: [INAUDIBLE] 1160 00:52:56,310 --> 00:52:57,560 DAVID J. MALAN: Good question. 1161 00:52:57,560 --> 00:52:58,560 Before-- let me zoom in. 1162 00:52:58,560 --> 00:53:01,010 I did not use a star before the s. 1163 00:53:01,010 --> 00:53:01,760 Why? 1164 00:53:01,760 --> 00:53:03,290 Well, it's subtle here. 1165 00:53:03,290 --> 00:53:07,340 But printf was invented years ago to know, 1166 00:53:07,340 --> 00:53:11,430 given an address like in the variable s, printf knows to go there. 1167 00:53:11,430 --> 00:53:14,810 So if we looked at the source code that some human wrote years ago for C, 1168 00:53:14,810 --> 00:53:18,620 we would likely see the actual asterisk that you're referring to. 1169 00:53:18,620 --> 00:53:21,650 Printf is taking on the responsibility for going to s. 1170 00:53:21,650 --> 00:53:26,510 If you were to do star s here instead, an asterisk, and an s, 1171 00:53:26,510 --> 00:53:29,570 that would now be literally a character. 1172 00:53:29,570 --> 00:53:33,345 Because if I say star s, that means go to the address in s. 1173 00:53:33,345 --> 00:53:35,720 And all you're going to find there is a single character. 1174 00:53:35,720 --> 00:53:39,680 What printf wants to know is not, what is the character there? 1175 00:53:39,680 --> 00:53:41,430 What is the address of that character? 1176 00:53:41,430 --> 00:53:41,930 Why? 1177 00:53:41,930 --> 00:53:45,890 Because printf needs to walk through the rest of those characters 1178 00:53:45,890 --> 00:53:48,770 looking for the final NUL character. 1179 00:53:48,770 --> 00:53:51,140 And in fact, let me see, with a bit more syntax, 1180 00:53:51,140 --> 00:53:52,830 if we can highlight this a bit more. 
1181 00:53:52,830 --> 00:53:53,570 Let me do this. 1182 00:53:53,570 --> 00:53:56,930 In addition to printing s, let's try out our syntax in another way. 1183 00:53:56,930 --> 00:54:01,040 Let me print out with %p how about not s here, 1184 00:54:01,040 --> 00:54:05,840 but let's print out some addresses. %p backslash n, close quote, 1185 00:54:05,840 --> 00:54:08,030 and then let's print out, how about this? 1186 00:54:08,030 --> 00:54:12,440 The first character in the string s would be called s bracket 0. 1187 00:54:12,440 --> 00:54:16,425 But how do I get the address of the first character in s? 1188 00:54:16,425 --> 00:54:18,800 Well, I could technically just use today's new primitive. 1189 00:54:18,800 --> 00:54:19,967 I can just add an ampersand. 1190 00:54:19,967 --> 00:54:22,620 That always gives me the address of some value. 1191 00:54:22,620 --> 00:54:27,200 So when I end this thought and clear my terminal window 1192 00:54:27,200 --> 00:54:29,660 and run make addresses, still compiles, when 1193 00:54:29,660 --> 00:54:32,090 I run addresses in just a moment, any guesses 1194 00:54:32,090 --> 00:54:34,950 as to what I will see line by line? 1195 00:54:34,950 --> 00:54:37,083 This will print out two things. 1196 00:54:37,083 --> 00:54:39,500 And you don't have to remember what the actual number was. 1197 00:54:39,500 --> 00:54:42,440 But at a high level, what will be printed now? 1198 00:54:42,440 --> 00:54:44,120 The same thing twice. 1199 00:54:44,120 --> 00:54:44,700 Why? 1200 00:54:44,700 --> 00:54:48,080 Well, when I run this, what I'm printing here, and let me zoom in at the bottom, 1201 00:54:48,080 --> 00:54:50,780 I indeed see two really long addresses. 1202 00:54:50,780 --> 00:54:52,100 But they're, in fact, the same. 1203 00:54:52,100 --> 00:54:52,650 Why?
1204 00:54:52,650 --> 00:54:57,830 Well, that's because, again, if s is the address of a character, as implied now 1205 00:54:57,830 --> 00:55:02,750 by either the CS50 word string, or the actual phrase char star, well, 1206 00:55:02,750 --> 00:55:04,370 then s is just an address. 1207 00:55:04,370 --> 00:55:09,050 By contrast per week two, s bracket 0 is a char. 1208 00:55:09,050 --> 00:55:11,230 Always has been a char, specific char. 1209 00:55:11,230 --> 00:55:14,105 But if you want the address of that char, you just add the ampersand. 1210 00:55:14,105 --> 00:55:18,198 Well, it turns out that strings, per the definition we keep emphasizing, 1211 00:55:18,198 --> 00:55:20,490 is just the address of the first character in a string. 1212 00:55:20,490 --> 00:55:23,550 So of course, if you do this, you're going to see the exact same thing. 1213 00:55:23,550 --> 00:55:26,450 And if I do this a bit more, generally, you don't want to copy paste. 1214 00:55:26,450 --> 00:55:28,940 But this is just for visualization sake. 1215 00:55:28,940 --> 00:55:30,440 Let me print out all the characters. 1216 00:55:30,440 --> 00:55:32,330 So another, another, another. 1217 00:55:32,330 --> 00:55:36,200 And let me change this to print out the address of bracket one, bracket 1218 00:55:36,200 --> 00:55:37,520 two, and bracket three. 1219 00:55:37,520 --> 00:55:40,310 So all four characters, H, I, exclamation point, 1220 00:55:40,310 --> 00:55:41,480 and the NUL character. 1221 00:55:41,480 --> 00:55:43,920 Notice I'm using %p for all of them. 1222 00:55:43,920 --> 00:55:47,330 So if I now do make addresses and dot slash addresses, 1223 00:55:47,330 --> 00:55:49,590 now notice, and this is kind of cool. 1224 00:55:49,590 --> 00:55:51,530 The first two are indeed still the same. 1225 00:55:51,530 --> 00:55:54,575 But what's noteworthy about the other values on the screen? 1226 00:55:54,575 --> 00:55:57,180 1227 00:55:57,180 --> 00:55:58,380 Yeah, they're consecutive. 
1228 00:55:58,380 --> 00:55:59,822 Each of these is just 1 byte away. 1229 00:55:59,822 --> 00:56:03,030 Even if you're not good at hex yet and there's a crazy number of digits here, 1230 00:56:03,030 --> 00:56:03,670 who cares? 1231 00:56:03,670 --> 00:56:07,420 They're all the same except for the last ones, four, four, and then five, 1232 00:56:07,420 --> 00:56:07,920 six, seven. 1233 00:56:07,920 --> 00:56:11,280 And this confirms what I've been claiming for weeks, which is that in an array, 1234 00:56:11,280 --> 00:56:16,380 all of the characters are back to back to back, contiguous, 1 byte away. 1235 00:56:16,380 --> 00:56:18,960 So with just this ampersand, with just this star, 1236 00:56:18,960 --> 00:56:21,000 it's actually a pretty cool tool in the toolkit 1237 00:56:21,000 --> 00:56:24,630 to have, because you can start to poke around what's actually going 1238 00:56:24,630 --> 00:56:27,220 on inside of the computer's memory. 1239 00:56:27,220 --> 00:56:31,710 And in fact, if we do this, I can introduce one other cool trick here, 1240 00:56:31,710 --> 00:56:32,490 if you will. 1241 00:56:32,490 --> 00:56:38,160 Let me propose that we can actually now do arithmetic on pointers. 1242 00:56:38,160 --> 00:56:39,090 And you don't have to. 1243 00:56:39,090 --> 00:56:40,780 You'll see a simpler way to do this. 1244 00:56:40,780 --> 00:56:44,790 But now that you have perhaps this underlying understanding of where 1245 00:56:44,790 --> 00:56:46,710 things are in memory and it's just addresses, 1246 00:56:46,710 --> 00:56:48,850 we can actually do something kind of neat. 1247 00:56:48,850 --> 00:56:50,950 We can do something like this. 1248 00:56:50,950 --> 00:56:55,470 Let me go back to how about the string version of this with hi. 1249 00:56:55,470 --> 00:56:57,120 And let me do this instead. 1250 00:56:57,120 --> 00:57:01,390 Let me clean this up a bit, get rid of some of these lines of code. 1251 00:57:01,390 --> 00:57:02,290 And let me do this.
1252 00:57:02,290 --> 00:57:04,975 Let me print out %c, %c, %c. 1253 00:57:04,975 --> 00:57:06,600 Let me get rid of all these ampersands. 1254 00:57:06,600 --> 00:57:09,210 We're going to roll back to week two stuff. 1255 00:57:09,210 --> 00:57:13,110 Just to be clear, when I compile and run this version of the program, 1256 00:57:13,110 --> 00:57:17,100 and I'll zoom in, what should get printed on the screen? 1257 00:57:17,100 --> 00:57:19,380 This is just week two stuff now. 1258 00:57:19,380 --> 00:57:20,864 No pointers per se. 1259 00:57:20,864 --> 00:57:21,364 Yeah. 1260 00:57:21,364 --> 00:57:23,800 AUDIENCE: [INAUDIBLE] 1261 00:57:23,800 --> 00:57:26,290 DAVID J. MALAN: Just HI exclamation point, one per line, 1262 00:57:26,290 --> 00:57:28,040 because I have all of these backslash n's. 1263 00:57:28,040 --> 00:57:29,150 So let me do that. 1264 00:57:29,150 --> 00:57:32,285 Let me go down here, make addresses, Enter. 1265 00:57:32,285 --> 00:57:32,960 OK, pretty good. 1266 00:57:32,960 --> 00:57:33,980 Dot slash addresses. 1267 00:57:33,980 --> 00:57:36,140 And indeed HI exclamation point. 1268 00:57:36,140 --> 00:57:38,468 But now if you're getting a little more comfortable, 1269 00:57:38,468 --> 00:57:41,510 and it's fine if you're not yet today, but over the coming week or weeks, 1270 00:57:41,510 --> 00:57:45,050 as you get a little more comfortable with the equivalence of addresses 1271 00:57:45,050 --> 00:57:48,720 with our definition in the past of arrays, and strings, and all of this, 1272 00:57:48,720 --> 00:57:50,180 you can start to play around. 1273 00:57:50,180 --> 00:57:51,680 And I can do this instead. 1274 00:57:51,680 --> 00:57:56,060 If I want to print out the first character in the string, 1275 00:57:56,060 --> 00:57:58,023 I could do, like week two, s bracket 0. 1276 00:57:58,023 --> 00:57:58,940 That will always work. 1277 00:57:58,940 --> 00:57:59,960 And you can keep using that. 
1278 00:57:59,960 --> 00:58:00,960 That's not a CS50 thing. 1279 00:58:00,960 --> 00:58:05,570 It's just a convenience in C. But I could technically print out not s, 1280 00:58:05,570 --> 00:58:07,650 because s is an address. 1281 00:58:07,650 --> 00:58:13,240 But what would be the syntax I could use to say, print out the character at s? 1282 00:58:13,240 --> 00:58:15,070 Any instincts? 1283 00:58:15,070 --> 00:58:19,050 How can I say, go to the address in s? 1284 00:58:19,050 --> 00:58:21,840 It's one of two possible answers today. 1285 00:58:21,840 --> 00:58:24,030 So of our two new-- 1286 00:58:24,030 --> 00:58:27,300 of our two new operators today, we have the ampersand and the star. 1287 00:58:27,300 --> 00:58:30,405 Which one will lead us to what is at an address? 1288 00:58:30,405 --> 00:58:31,540 AUDIENCE: [INAUDIBLE] 1289 00:58:31,540 --> 00:58:32,707 DAVID J. MALAN: So the star. 1290 00:58:32,707 --> 00:58:37,200 So in fact, if I want to print out, what is at address zero, at the address s, 1291 00:58:37,200 --> 00:58:39,390 I can just do star s. 1292 00:58:39,390 --> 00:58:42,060 And if you really want to get fancy, how do you print out 1293 00:58:42,060 --> 00:58:45,300 the second character that's immediately to the right of it, so to speak? 1294 00:58:45,300 --> 00:58:48,424 Well, you can go to, with the de-reference operator-- 1295 00:58:48,424 --> 00:58:49,882 and do you want to answer this one? 1296 00:58:49,882 --> 00:58:50,826 AUDIENCE: [INAUDIBLE] 1297 00:58:50,826 --> 00:58:52,380 DAVID J. MALAN: S plus 1. 1298 00:58:52,380 --> 00:58:54,300 Ergo, pointer arithmetic. 1299 00:58:54,300 --> 00:58:56,430 You can do math, simple addition, subtraction, 1300 00:58:56,430 --> 00:58:58,140 whatever, on pointers if you want. 1301 00:58:58,140 --> 00:58:59,850 And you can do this here too. 1302 00:58:59,850 --> 00:59:01,980 So star, if you want to pluck this one off too, 1303 00:59:01,980 --> 00:59:04,914 how do I print out the last character, the third? 
1304 00:59:04,914 --> 00:59:05,870 AUDIENCE: s plus 2? 1305 00:59:05,870 --> 00:59:07,227 DAVID J. MALAN: s plus 2. 1306 00:59:07,227 --> 00:59:09,560 Because if you know and understand that a string is just 1307 00:59:09,560 --> 00:59:11,935 a sequence of characters, every character is just a byte, 1308 00:59:11,935 --> 00:59:14,160 and these bytes are back to back to back, 1309 00:59:14,160 --> 00:59:17,160 you can just go wherever you want in the computer's memory. 1310 00:59:17,160 --> 00:59:20,150 And here, I can do make addresses again, dot slash addresses. 1311 00:59:20,150 --> 00:59:23,030 And voila, we now have hi exclamation point. 1312 00:59:23,030 --> 00:59:25,100 So we haven't printed out anything new. 1313 00:59:25,100 --> 00:59:28,430 But again, just by using these two new operators, the ampersand and the star, 1314 00:59:28,430 --> 00:59:30,360 you can figure out the address of something, 1315 00:59:30,360 --> 00:59:33,020 and you can go to the address of something. 1316 00:59:33,020 --> 00:59:34,250 OK, question in back. 1317 00:59:34,250 --> 00:59:35,352 AUDIENCE: [INAUDIBLE] 1318 00:59:35,352 --> 00:59:36,310 DAVID J. MALAN: Indeed. 1319 00:59:36,310 --> 00:59:37,770 It ends up being the exact same. 1320 00:59:37,770 --> 00:59:39,520 And so I might have used this term before. 1321 00:59:39,520 --> 00:59:41,890 The ampersand technique-- sorry. 1322 00:59:41,890 --> 00:59:45,625 The square bracket technique where you do s bracket zero, s bracket one, 1323 00:59:45,625 --> 00:59:49,240 s bracket two, that's actually what we would really call syntactic sugar. 1324 00:59:49,240 --> 00:59:49,780 It works. 1325 00:59:49,780 --> 00:59:50,572 And you can use it. 1326 00:59:50,572 --> 00:59:51,322 You should use it. 1327 00:59:51,322 --> 00:59:52,210 It's nice and simple. 1328 00:59:52,210 --> 00:59:55,090 But the square bracket notation underneath the hood 1329 00:59:55,090 --> 00:59:58,107 is essentially being converted to this, which this is not fun. 
1330 00:59:58,107 --> 01:00:00,190 This is when you want to show off to your friends, 1331 01:00:00,190 --> 01:00:01,773 you know how to do cool stuff in code. 1332 01:00:01,773 --> 01:00:05,313 But this is not as readable as just s bracket zero, and one, and two. 1333 01:00:05,313 --> 01:00:07,480 But that's all that's happening underneath the hood. 1334 01:00:07,480 --> 01:00:09,745 And so again, this is why in CS50 we spend time 1335 01:00:09,745 --> 01:00:11,620 on some of these lower level building blocks. 1336 01:00:11,620 --> 01:00:14,290 Because if you assume that indeed your computer's memory is just 1337 01:00:14,290 --> 01:00:19,510 this grid of bytes and you now have the ability in code to get an address 1338 01:00:19,510 --> 01:00:22,600 and go to an address, you can start doing anything you want. 1339 01:00:22,600 --> 01:00:25,210 And you can poke around a computer's memory at any location. 1340 01:00:25,210 --> 01:00:26,950 And herein lies the danger. 1341 01:00:26,950 --> 01:00:28,660 I'm kind of on the honor system right now 1342 01:00:28,660 --> 01:00:32,140 that if my string is hi exclamation point, it's kind of up to me 1343 01:00:32,140 --> 01:00:34,930 to go to the first byte, the second, and the third. 1344 01:00:34,930 --> 01:00:36,910 But I could get kind of crazy now. 1345 01:00:36,910 --> 01:00:40,090 And if I want to see what's going on in the computer's memory, I mean, 1346 01:00:40,090 --> 01:00:43,570 there's nothing stopping me from doing like s plus 50. 1347 01:00:43,570 --> 01:00:44,780 And let's see what's there. 1348 01:00:44,780 --> 01:00:49,540 So make addresses, dot slash addresses, hi, and then, OK, nothing it seems. 1349 01:00:49,540 --> 01:00:51,522 Well, how about 5,000 bytes away? 1350 01:00:51,522 --> 01:00:52,480 Let's just poke around. 1351 01:00:52,480 --> 01:00:54,105 What's inside of the computer's memory? 1352 01:00:54,105 --> 01:00:59,020 So make addresses again, make addresses, dot slash addresses, Enter.
1353 01:00:59,020 --> 01:01:00,400 OK, still nothing there. 1354 01:01:00,400 --> 01:01:02,470 Let's try 50,000. 1355 01:01:02,470 --> 01:01:03,190 All right. 1356 01:01:03,190 --> 01:01:05,860 Make addresses, dot slash addresses. 1357 01:01:05,860 --> 01:01:07,530 OK, there we see it. 1358 01:01:07,530 --> 01:01:09,280 So you've probably done this, some of you, 1359 01:01:09,280 --> 01:01:12,322 by accident because you probably went too far to the left or to the right 1360 01:01:12,322 --> 01:01:14,710 in an array touching memory that you shouldn't. 1361 01:01:14,710 --> 01:01:19,180 Suffice it to say I should not go blindly touching 50,000 bytes away. 1362 01:01:19,180 --> 01:01:20,650 Because who knows what's there? 1363 01:01:20,650 --> 01:01:23,320 And indeed, in your computer, when a program is running, 1364 01:01:23,320 --> 01:01:26,810 the computer segments it into different segments of memory. 1365 01:01:26,810 --> 01:01:30,790 And if you get a little too greedy and you touch another segment of memory 1366 01:01:30,790 --> 01:01:34,510 that technically was not allocated to you by Mac OS, or Windows, or Linux, 1367 01:01:34,510 --> 01:01:36,820 or the operating system, bad things happen. 1368 01:01:36,820 --> 01:01:38,320 And you get a segmentation fault. 1369 01:01:38,320 --> 01:01:40,040 And that means it's a bug in your code. 1370 01:01:40,040 --> 01:01:41,500 So you can now do this. 1371 01:01:41,500 --> 01:01:44,410 And this means hackers too can do things like this. 1372 01:01:44,410 --> 01:01:47,440 If they can somehow inject code into your C program, 1373 01:01:47,440 --> 01:01:50,240 maybe they can poke around the computer's memory. 
1374 01:01:50,240 --> 01:01:52,360 And indeed, this is kind of the technique whereby 1375 01:01:52,360 --> 01:01:55,990 maybe a really sophisticated hacker can jump to this memory, this memory, 1376 01:01:55,990 --> 01:01:58,780 this memory looking for something like your password, 1377 01:01:58,780 --> 01:02:01,960 or your financial information, or anything that's in the program 1378 01:02:01,960 --> 01:02:03,280 but at some other address. 1379 01:02:03,280 --> 01:02:06,520 There's nothing stopping an adversary, at least right now, 1380 01:02:06,520 --> 01:02:09,610 from poking around if they can execute code on your computer 1381 01:02:09,610 --> 01:02:11,120 from doing this kind of thing. 1382 01:02:11,120 --> 01:02:13,750 So there and again is the power of C, but also the danger. 1383 01:02:13,750 --> 01:02:16,910 And you'll absolutely suffer more seg faults in the coming days. 1384 01:02:16,910 --> 01:02:19,240 But ultimately, the goal is going to be to help you 1385 01:02:19,240 --> 01:02:22,490 solve them ultimately and fix things. 1386 01:02:22,490 --> 01:02:27,170 But for now, I think that was quite a bit. 1387 01:02:27,170 --> 01:02:30,640 So let me propose that we go ahead and take our longer break here, 1388 01:02:30,640 --> 01:02:33,730 maybe 10 minutes, and have ourselves some whoopie pies in the transept. 1389 01:02:33,730 --> 01:02:35,710 We'll be back in 10. 1390 01:02:35,710 --> 01:02:37,090 All right. 1391 01:02:37,090 --> 01:02:38,470 So we're back. 1392 01:02:38,470 --> 01:02:42,760 And to recap where we left off, you now have this new capability in code 1393 01:02:42,760 --> 01:02:45,220 to do pointer arithmetic like treat addresses 1394 01:02:45,220 --> 01:02:47,320 as numbers, which they really are in hexadecimal 1395 01:02:47,320 --> 01:02:49,720 or otherwise, and add them together and kind 1396 01:02:49,720 --> 01:02:51,440 of poke around a computer's memory. 
1397 01:02:51,440 --> 01:02:55,180 And it was asked during break actually how we might further 1398 01:02:55,180 --> 01:02:56,980 harness this in the context of string. 1399 01:02:56,980 --> 01:02:59,650 So I didn't change the code we wrote just before break. 1400 01:02:59,650 --> 01:03:04,960 Recall that we last broke the program by checking out bytes 50,000 bytes away. 1401 01:03:04,960 --> 01:03:06,070 But let's not do that. 1402 01:03:06,070 --> 01:03:09,760 And let's actually try printing out not individual characters, like I did, 1403 01:03:09,760 --> 01:03:14,710 per the %c, but why don't we try printing out strings and substrings 1404 01:03:14,710 --> 01:03:15,290 if you will? 1405 01:03:15,290 --> 01:03:16,810 So let me clear my terminal window. 1406 01:03:16,810 --> 01:03:21,490 Let me change all of these %c's to %s, %s, %s. 1407 01:03:21,490 --> 01:03:24,520 And then let me rewind to what we've been doing since week one 1408 01:03:24,520 --> 01:03:26,930 with strings, which is just print them out, 1409 01:03:26,930 --> 01:03:28,720 for instance, with that first line. 1410 01:03:28,720 --> 01:03:31,592 And the only difference at the moment is that now, I 1411 01:03:31,592 --> 01:03:32,800 took off the training wheels. 1412 01:03:32,800 --> 01:03:38,950 I got rid of CS50.h wherein string is typedef to char star for you. 1413 01:03:38,950 --> 01:03:39,920 Got rid of that. 1414 01:03:39,920 --> 01:03:42,640 So now on line five, I'm declaring s as being a char star, which 1415 01:03:42,640 --> 01:03:44,223 just means the address of a character. 1416 01:03:44,223 --> 01:03:46,720 And printf is smart enough to know that the end of a string 1417 01:03:46,720 --> 01:03:48,560 is wherever that NUL character is. 1418 01:03:48,560 --> 01:03:50,740 But now that I can do pointer arithmetic, 1419 01:03:50,740 --> 01:03:52,910 notice that I could do something like this. 1420 01:03:52,910 --> 01:03:55,390 If I want to print out s, I just print out s. 
1421 01:03:55,390 --> 01:04:02,080 Suppose I do s plus 1 here and s plus 2 here, again, after changing %c to %s. 1422 01:04:02,080 --> 01:04:10,240 Any intuition around what this code will now print on the screen line by line? 1423 01:04:10,240 --> 01:04:11,180 Yeah, thoughts? 1424 01:04:11,180 --> 01:04:12,260 AUDIENCE: [INAUDIBLE] 1425 01:04:12,260 --> 01:04:14,010 DAVID J. MALAN: OK, reasonable conjecture. 1426 01:04:14,010 --> 01:04:17,500 Maybe the memory address of h, that of i, that of exclamation point. 1427 01:04:17,500 --> 01:04:18,550 But other thoughts? 1428 01:04:18,550 --> 01:04:20,105 AUDIENCE: [INAUDIBLE] 1429 01:04:20,105 --> 01:04:20,980 DAVID J. MALAN: Yeah. 1430 01:04:20,980 --> 01:04:22,510 I think it's actually going to do the latter. 1431 01:04:22,510 --> 01:04:24,430 It's going to print, hi, in the usual way. 1432 01:04:24,430 --> 01:04:28,900 Because honestly, line five is this-- rather line six is the same as week one 1433 01:04:28,900 --> 01:04:31,240 stuff, except we took off the training wheel of string 1434 01:04:31,240 --> 01:04:32,620 and we're calling it char star. 1435 01:04:32,620 --> 01:04:35,770 But I think line seven is indeed going to print out I, exclamation point. 1436 01:04:35,770 --> 01:04:38,440 And line eight is just going to print out a single character 1437 01:04:38,440 --> 01:04:40,480 because it will be just the exclamation point. 1438 01:04:40,480 --> 01:04:45,040 Printf will still be smart enough to know where each of those substrings, 1439 01:04:45,040 --> 01:04:47,860 portions of the strings, end by the same logic as always. 1440 01:04:47,860 --> 01:04:51,370 But let me go ahead and zoom out, run make addresses, Enter, 1441 01:04:51,370 --> 01:04:53,950 compiles OK, dot slash addresses. 1442 01:04:53,950 --> 01:04:57,340 And now indeed, this is all a string is. 1443 01:04:57,340 --> 01:05:00,340 It's a sequence of characters identified by its first byte.
1444 01:05:00,340 --> 01:05:03,550 If you then start poking around and tell printf 1445 01:05:03,550 --> 01:05:06,505 to print what's at the next byte, or the next, next byte, 1446 01:05:06,505 --> 01:05:08,380 it's going to do its same thing, printing out 1447 01:05:08,380 --> 01:05:12,500 that character and everything after it up until that NUL character. 1448 01:05:12,500 --> 01:05:14,810 So again, even though there's a lot going on, 1449 01:05:14,810 --> 01:05:16,630 we've introduced these two new operators, 1450 01:05:16,630 --> 01:05:19,840 there's nothing that's happening today that hasn't been happening for weeks. 1451 01:05:19,840 --> 01:05:23,367 But hopefully, through this week, this week's lecture, this week's problem 1452 01:05:23,367 --> 01:05:25,450 set, and beyond, you'll start to realize that now, 1453 01:05:25,450 --> 01:05:28,570 you just have more tools by which to harness those lower 1454 01:05:28,570 --> 01:05:30,680 level implementation details. 1455 01:05:30,680 --> 01:05:34,420 So last week too, recall one other implementation detail. 1456 01:05:34,420 --> 01:05:39,250 I claimed that you could not compare two strings quite as easily as you could 1457 01:05:39,250 --> 01:05:42,820 compare two integers for instance. 1458 01:05:42,820 --> 01:05:45,850 And I told you to use a different function instead 1459 01:05:45,850 --> 01:05:49,540 that you probably used one or more times with the past problem set. 1460 01:05:49,540 --> 01:05:52,395 How are you supposed to compare strings apparently? 1461 01:05:52,395 --> 01:05:53,270 AUDIENCE: [INAUDIBLE] 1462 01:05:53,270 --> 01:05:54,260 DAVID J. MALAN: Yeah, so string compare. 1463 01:05:54,260 --> 01:05:54,825 strcmp. 1464 01:05:54,825 --> 01:05:57,950 That additional function that we said, eh, you just have to use it for now.
1465 01:05:57,950 --> 01:06:00,590 But you might have a little intuition already 1466 01:06:00,590 --> 01:06:03,410 as to why we have to use STR compare and we can't just 1467 01:06:03,410 --> 01:06:06,170 use equals equals to compare strings. 1468 01:06:06,170 --> 01:06:07,970 Any intuition for this already? 1469 01:06:07,970 --> 01:06:10,190 Why was STR compare necessary last week? 1470 01:06:10,190 --> 01:06:11,380 AUDIENCE: [INAUDIBLE] 1471 01:06:11,380 --> 01:06:12,380 DAVID J. MALAN: Perfect. 1472 01:06:12,380 --> 01:06:14,600 Equals, equals would compare literally the two memory 1473 01:06:14,600 --> 01:06:18,080 addresses instead of the actual strings character by character. 1474 01:06:18,080 --> 01:06:21,180 And unless the memory addresses are literally the same, 1475 01:06:21,180 --> 01:06:24,260 so you compare that exact same memory address, 1476 01:06:24,260 --> 01:06:26,240 two different strings probably are not going 1477 01:06:26,240 --> 01:06:29,640 to be considered equal even if, to us humans, they indeed look equal. 1478 01:06:29,640 --> 01:06:30,570 So let's see this. 1479 01:06:30,570 --> 01:06:32,750 Let me go ahead and close addresses.c. 1480 01:06:32,750 --> 01:06:35,150 And actually, before I do one last mention, 1481 01:06:35,150 --> 01:06:39,120 one of the powerful things about pointer arithmetic, as an aside, 1482 01:06:39,120 --> 01:06:42,560 is that C, and really the compiler, is smart enough 1483 01:06:42,560 --> 01:06:45,958 to know how many bytes to keep adding and adding. 1484 01:06:45,958 --> 01:06:47,000 And by that, I mean this. 1485 01:06:47,000 --> 01:06:49,875 Right now, we got lucky because a string is a sequence of characters. 1486 01:06:49,875 --> 01:06:52,220 And by definition, every character is a single byte. 1487 01:06:52,220 --> 01:06:56,270 You can poke around and do s plus 1 to get the next byte, s plus 2 1488 01:06:56,270 --> 01:06:58,040 to get the third byte.
1489 01:06:58,040 --> 01:07:00,020 However, if we weren't dealing with strings, 1490 01:07:00,020 --> 01:07:03,350 suppose we were dealing with integers that were in an array 1491 01:07:03,350 --> 01:07:06,800 back to back to back, if you wanted to get at the next integer, 1492 01:07:06,800 --> 01:07:10,077 you could still do plus 1, or plus 2 to get 1493 01:07:10,077 --> 01:07:11,660 at the next or the next, next integer. 1494 01:07:11,660 --> 01:07:16,310 You would not start to get into the weeds of doing plus 4, and then plus 8. 1495 01:07:16,310 --> 01:07:19,460 You don't have to know or care how big the data types are in the computer. 1496 01:07:19,460 --> 01:07:21,652 C and the compiler will figure that out for you 1497 01:07:21,652 --> 01:07:23,110 based on the data type in question. 1498 01:07:23,110 --> 01:07:27,510 So keep that in mind if ever doing this on a different data type than chars. 1499 01:07:27,510 --> 01:07:29,510 All right, so let me go ahead and open up a file 1500 01:07:29,510 --> 01:07:31,850 that I wrote in advance most of. 1501 01:07:31,850 --> 01:07:34,380 And let me hide my terminal window and show you this. 1502 01:07:34,380 --> 01:07:37,160 So here is a program called compare.c, whose purpose in life 1503 01:07:37,160 --> 01:07:39,200 is to compare two strings. 1504 01:07:39,200 --> 01:07:41,032 I'm back to using the CS50 library. 1505 01:07:41,032 --> 01:07:43,490 Because at least for now, and probably a couple more weeks, 1506 01:07:43,490 --> 01:07:47,390 it is so much easier to get input from the user using CS50's function, 1507 01:07:47,390 --> 01:07:47,930 get int. 1508 01:07:47,930 --> 01:07:51,480 But we'll conclude today by taking off those training wheels as well. 1509 01:07:51,480 --> 01:07:55,280 So you can see how you can actually get user input with nothing CS50 specific. 1510 01:07:55,280 --> 01:07:57,890 So line six and seven, pretty boring. 1511 01:07:57,890 --> 01:07:58,820 Week one stuff. 
1512 01:07:58,820 --> 01:08:01,190 Get an int called i, get an int called j, 1513 01:08:01,190 --> 01:08:03,710 and store them in two variables, i and j respectively. 1514 01:08:03,710 --> 01:08:07,235 If i equals equals j, print out the same, else print 1515 01:08:07,235 --> 01:08:08,360 out that they're different. 1516 01:08:08,360 --> 01:08:11,750 Let me just stipulate for time's sake, I'm pretty sure this code is correct. 1517 01:08:11,750 --> 01:08:13,550 This will get two integers from the human. 1518 01:08:13,550 --> 01:08:15,800 It will compare them and tell me correctly 1519 01:08:15,800 --> 01:08:17,450 if they're the same or different. 1520 01:08:17,450 --> 01:08:23,180 And I'll prove as much by running make compare dot slash compare. 1521 01:08:23,180 --> 01:08:25,910 And I'll type in 50 for i, 50 for j. 1522 01:08:25,910 --> 01:08:27,029 And they're the same. 1523 01:08:27,029 --> 01:08:30,859 And now I'll do, how about 50, and say 13. 1524 01:08:30,859 --> 01:08:31,920 And those are different. 1525 01:08:31,920 --> 01:08:34,189 So let me just stipulate this code is indeed correct. 1526 01:08:34,189 --> 01:08:37,370 Would have worked in week one, also works now in week four. 1527 01:08:37,370 --> 01:08:40,310 But let me now change it to compare not two integers, 1528 01:08:40,310 --> 01:08:43,620 but as I hinted, maybe two strings instead. 1529 01:08:43,620 --> 01:08:46,220 So let me go ahead and change this line of code 1530 01:08:46,220 --> 01:08:52,220 to maybe be string s equals get string, asking the user for s. 1531 01:08:52,220 --> 01:08:55,223 Then let's change this second line here to be string t, 1532 01:08:55,223 --> 01:08:57,140 just to keep the variable names short for now. 1533 01:08:57,140 --> 01:09:00,680 And t is a good choice after s for something like this. 1534 01:09:00,680 --> 01:09:02,960 Get string, prompt the human for t. 
1535 01:09:02,960 --> 01:09:06,890 And then let's change our i and j here to do the wrong thing, 1536 01:09:06,890 --> 01:09:08,450 per the intuition earlier. 1537 01:09:08,450 --> 01:09:11,720 If s equals equals t, then print out the same, 1538 01:09:11,720 --> 01:09:13,399 else, print out that they're different. 1539 01:09:13,399 --> 01:09:16,274 Now if I want, I could take off at least some of the training wheels. 1540 01:09:16,274 --> 01:09:17,689 I could change this to char star. 1541 01:09:17,689 --> 01:09:19,430 I could change this to char star. 1542 01:09:19,430 --> 01:09:20,055 Either is fine. 1543 01:09:20,055 --> 01:09:22,805 I still need the CS50 library though because I'm using get string, 1544 01:09:22,805 --> 01:09:25,880 because it's actually hard, as we'll see today, to get strings manually 1545 01:09:25,880 --> 01:09:26,930 without using a library. 1546 01:09:26,930 --> 01:09:30,470 But I'll keep it using string just for now with the library. 1547 01:09:30,470 --> 01:09:33,890 All right, make compare again, dot slash compare. 1548 01:09:33,890 --> 01:09:38,240 And now let me go ahead and type in, for instance, hi, exclamation point, Enter, 1549 01:09:38,240 --> 01:09:40,220 and hi, exclamation point, Enter. 1550 01:09:40,220 --> 01:09:42,640 And oh, they're different. 1551 01:09:42,640 --> 01:09:44,390 All right, they're obviously not visually. 1552 01:09:44,390 --> 01:09:45,765 But they are underneath the hood. 1553 01:09:45,765 --> 01:09:47,840 And you probably do have the intuition for this 1554 01:09:47,840 --> 01:09:50,660 already, whereby what's going on underneath the hood 1555 01:09:50,660 --> 01:09:54,149 is that we're comparing accidentally the two memory addresses. 1556 01:09:54,149 --> 01:09:55,400 So in fact, let's go there. 1557 01:09:55,400 --> 01:09:56,848 Let's consider the memory. 1558 01:09:56,848 --> 01:09:59,640 And let me zoom out now so I can just have more bytes to play with. 
1559 01:09:59,640 --> 01:10:03,020 So the squares are a little smaller than before just so we can fit more in them. 1560 01:10:03,020 --> 01:10:08,950 And let me propose that when I declare s on what was line six a moment ago, 1561 01:10:08,950 --> 01:10:11,450 it ends up somewhere in memory like the top left hand corner 1562 01:10:11,450 --> 01:10:13,010 of my picture for discussion's sake? 1563 01:10:13,010 --> 01:10:18,930 And when I execute that same line of code, and get string is called, 1564 01:10:18,930 --> 01:10:21,560 and I type in hi exclamation point, we know 1565 01:10:21,560 --> 01:10:24,990 from week one that get string puts it somewhere in the computer's memory. 1566 01:10:24,990 --> 01:10:28,580 And I'll propose that it's in the bottom left hand corner of the screen here. 1567 01:10:28,580 --> 01:10:29,970 What happens after that? 1568 01:10:29,970 --> 01:10:32,120 Well, I know, even though I don't generally care, 1569 01:10:32,120 --> 01:10:34,730 that H, I, exclamation point, and the NUL character 1570 01:10:34,730 --> 01:10:40,310 exist at some address, like 0x123, 124, 125, 126 for discussion's sake. 1571 01:10:40,310 --> 01:10:41,480 And what's in s? 1572 01:10:41,480 --> 01:10:44,600 Same as before break, 0x123. 1573 01:10:44,600 --> 01:10:48,080 So that's all that's happening again on line six, which 1574 01:10:48,080 --> 01:10:51,170 is pretty much the same as when we were getting an s earlier. 1575 01:10:51,170 --> 01:10:55,030 But notice now with line seven, when I get a second variable called t 1576 01:10:55,030 --> 01:10:56,830 and I call get string again. 1577 01:10:56,830 --> 01:10:59,980 And by coincidence, as the human, I type the same thing. 1578 01:10:59,980 --> 01:11:02,530 Well, what happens here? t gets its own chunk of memory, 1579 01:11:02,530 --> 01:11:04,120 maybe at the top right. 1580 01:11:04,120 --> 01:11:07,643 That second version of hi gets somewhere else in memory. 
1581 01:11:07,643 --> 01:11:10,060 The computer could be smart and notice that it's the same. 1582 01:11:10,060 --> 01:11:11,970 But C doesn't generally do that for you. 1583 01:11:11,970 --> 01:11:13,720 It just plops it somewhere else in memory. 1584 01:11:13,720 --> 01:11:18,873 And maybe it's at address 0x456, 457, 458, 459, or wherever. 1585 01:11:18,873 --> 01:11:21,040 But you can perhaps see where this is going already. 1586 01:11:21,040 --> 01:11:23,770 t now, of course, contains the address of that first byte. 1587 01:11:23,770 --> 01:11:29,530 And so in my code, on line nine, when I compare s and t for equality, 1588 01:11:29,530 --> 01:11:33,190 suffice it to say they are not equal because of the way 1589 01:11:33,190 --> 01:11:36,460 the strings are laid out in the computer's memory. 1590 01:11:36,460 --> 01:11:38,960 It indeed looks the same, the same values are there. 1591 01:11:38,960 --> 01:11:43,990 But if we abstract away further, you can really see that s and t are not the same 1592 01:11:43,990 --> 01:11:45,290 themselves. 1593 01:11:45,290 --> 01:11:46,870 And so how did we fix this? 1594 01:11:46,870 --> 01:11:50,125 Or really, how did we avoid this last week without spilling the beans 1595 01:11:50,125 --> 01:11:52,000 and going down this rabbit hole of explaining 1596 01:11:52,000 --> 01:11:53,890 why you have to use STR compare? 1597 01:11:53,890 --> 01:11:57,760 Well, if I go back to my code here, let's do it now the right way. 1598 01:11:57,760 --> 01:12:00,640 Let me go ahead and include a line of code 1599 01:12:00,640 --> 01:12:04,480 that says string compare of s comma t, both as inputs. 1600 01:12:04,480 --> 01:12:08,530 And then if you recall, what does STR compare return 1601 01:12:08,530 --> 01:12:10,040 when two strings are equal? 1602 01:12:10,040 --> 01:12:11,625 There's three possible return values. 1603 01:12:11,625 --> 01:12:12,500 AUDIENCE: [INAUDIBLE] 1604 01:12:12,500 --> 01:12:13,500 DAVID J. MALAN: So zero.
1605 01:12:13,500 --> 01:12:17,120 So one is for if it comes alphabetically or ASCIIabetically first or second. 1606 01:12:17,120 --> 01:12:18,700 But for now, I just want zero. 1607 01:12:18,700 --> 01:12:22,450 If I want to use STR compare, I do need string.h. 1608 01:12:22,450 --> 01:12:24,130 So string.h does exist. 1609 01:12:24,130 --> 01:12:25,390 That's not a CS50 thing. 1610 01:12:25,390 --> 01:12:28,000 There's no keyword string as a data type. 1611 01:12:28,000 --> 01:12:29,080 That's a CS50 thing. 1612 01:12:29,080 --> 01:12:30,620 But string.h does exist. 1613 01:12:30,620 --> 01:12:34,030 So I think now with that change on line 10, if I do make 1614 01:12:34,030 --> 01:12:38,320 compare, and dot slash compare, and then run again, 1615 01:12:38,320 --> 01:12:42,162 type again, hi exclamation point, hi exclamation point, 1616 01:12:42,162 --> 01:12:43,370 I think now they're the same. 1617 01:12:43,370 --> 01:12:48,760 And just as a second check, HI in all caps, maybe hi in lowercase, 1618 01:12:48,760 --> 01:12:50,440 those are, in fact, different. 1619 01:12:50,440 --> 01:12:51,110 Why? 1620 01:12:51,110 --> 01:12:54,400 Well, STR compare, which was written by some other human decades 1621 01:12:54,400 --> 01:12:59,890 ago is just smart enough to know that it should go to s and go to t, 1622 01:12:59,890 --> 01:13:04,450 start comparing them left to right, stopping once it hits one or both NUL 1623 01:13:04,450 --> 01:13:07,720 characters, and return zero only if everything in s 1624 01:13:07,720 --> 01:13:11,140 and in t are exactly the same. 1625 01:13:11,140 --> 01:13:15,580 Are any questions then on this here? 1626 01:13:15,580 --> 01:13:18,680 Any questions on why we're using STR compare? 1627 01:13:18,680 --> 01:13:19,180 All right. 1628 01:13:19,180 --> 01:13:20,530 If no-- yeah, oh. 1629 01:13:20,530 --> 01:13:22,120 In the middle. 1630 01:13:22,120 --> 01:13:24,270 AUDIENCE: Why do [INAUDIBLE] integers? 
1631 01:13:24,270 --> 01:13:25,437 Why [INAUDIBLE]? 1632 01:13:25,437 --> 01:13:26,270 DAVID J. MALAN: Yes. 1633 01:13:26,270 --> 01:13:28,610 So why-- why is it not the case with integers? 1634 01:13:28,610 --> 01:13:30,920 So it turns out it's not the case with integers, 1635 01:13:30,920 --> 01:13:35,060 with floats, with bools, with doubles, with longs. 1636 01:13:35,060 --> 01:13:37,790 Literally every other data type works correctly. 1637 01:13:37,790 --> 01:13:39,530 Strings though are special. 1638 01:13:39,530 --> 01:13:42,500 They're useful enough in programming and have been for decades 1639 01:13:42,500 --> 01:13:44,900 that the authors of printf, and the authors of STR 1640 01:13:44,900 --> 01:13:47,840 compare, and bunches of other functions, strlen for that matter, 1641 01:13:47,840 --> 01:13:51,650 just kind of treat strings special because they're just useful. 1642 01:13:51,650 --> 01:13:54,660 We humans interact using language, be it English or anything else. 1643 01:13:54,660 --> 01:13:58,040 And so it's just useful to have into the language C 1644 01:13:58,040 --> 01:14:02,900 just sort of first class support for this notion of strings of human text. 1645 01:14:02,900 --> 01:14:05,330 So the short answer is just because. 1646 01:14:05,330 --> 01:14:08,300 It just is necessary-- strings are different. 1647 01:14:08,300 --> 01:14:11,480 They're implemented with this address and the NUL character. 1648 01:14:11,480 --> 01:14:14,060 Everything else, though, is just a value. 1649 01:14:14,060 --> 01:14:15,680 But a string again is a white lie. 1650 01:14:15,680 --> 01:14:16,490 It's an address. 1651 01:14:16,490 --> 01:14:19,550 It's not a thing unto itself. 1652 01:14:19,550 --> 01:14:20,180 Good question. 1653 01:14:20,180 --> 01:14:21,020 Yeah, in front. 1654 01:14:21,020 --> 01:14:23,333 AUDIENCE: How come [INAUDIBLE]? 1655 01:14:23,333 --> 01:14:25,000 DAVID J. MALAN: Oh really good question. 
1656 01:14:25,000 --> 01:14:30,310 So in my code here in VS Code, what if I do this? 1657 01:14:30,310 --> 01:14:33,600 Instead of STR compare, and instead of if s equals 1658 01:14:33,600 --> 01:14:39,060 equals t, what if I start playing around using star s and star t? 1659 01:14:39,060 --> 01:14:41,020 Really interesting case to consider. 1660 01:14:41,020 --> 01:14:43,330 Let's go back to our sort of deductive logic here. 1661 01:14:43,330 --> 01:14:46,720 So star, the asterisk operator today, means go there. 1662 01:14:46,720 --> 01:14:49,860 So when I've typed in HI once and then HI again, 1663 01:14:49,860 --> 01:14:53,850 both uppercase for instance, what is at the address s literally? 1664 01:14:53,850 --> 01:14:56,410 Someone else. 1665 01:14:56,410 --> 01:14:57,640 What is at the address s? 1666 01:14:57,640 --> 01:14:59,080 Yeah. 1667 01:14:59,080 --> 01:15:00,100 So not quite. 1668 01:15:00,100 --> 01:15:01,645 At the address. 1669 01:15:01,645 --> 01:15:02,830 So not, what is the address? 1670 01:15:02,830 --> 01:15:04,535 What is at the address 0x123? 1671 01:15:04,535 --> 01:15:05,410 AUDIENCE: [INAUDIBLE] 1672 01:15:05,410 --> 01:15:06,340 DAVID J. MALAN: h. 1673 01:15:06,340 --> 01:15:08,590 And what is at the address 0x456? 1674 01:15:08,590 --> 01:15:09,550 AUDIENCE: [INAUDIBLE] 1675 01:15:09,550 --> 01:15:10,690 DAVID J. MALAN: h also. 1676 01:15:10,690 --> 01:15:12,850 And so here, you're kind of cheating. 1677 01:15:12,850 --> 01:15:17,800 You're comparing the first character of both strings, but not every other one. 1678 01:15:17,800 --> 01:15:19,570 Now you could be really pedantic. 1679 01:15:19,570 --> 01:15:22,510 And here, again, this is not a good use of code. 1680 01:15:22,510 --> 01:15:23,600 But you could do this. 1681 01:15:23,600 --> 01:15:26,480 If that, and how about this craziness? 1682 01:15:26,480 --> 01:15:32,140 So star s plus 1 equals equals star t plus 1. 
1683 01:15:32,140 --> 01:15:34,420 And you could do this for every character manually. 1684 01:15:34,420 --> 01:15:35,960 But that's why STR compare exists. 1685 01:15:35,960 --> 01:15:37,130 It does all of this for you. 1686 01:15:37,130 --> 01:15:37,798 But that's why. 1687 01:15:37,798 --> 01:15:38,840 And that's the intuition. 1688 01:15:38,840 --> 01:15:41,965 So I would encourage you too, anytime there's something kind of weird going 1689 01:15:41,965 --> 01:15:42,670 on, there's-- 1690 01:15:42,670 --> 01:15:45,160 I realize we might be straining credibility now, 1691 01:15:45,160 --> 01:15:46,990 we haven't told you that many white lies. 1692 01:15:46,990 --> 01:15:50,140 And so most everything that we've seen thus far 1693 01:15:50,140 --> 01:15:53,350 can explain pretty much all of the behavior up until now 1694 01:15:53,350 --> 01:15:56,830 from week one onward in C. So let me revert this back to the right way. 1695 01:15:56,830 --> 01:15:59,830 If s STR compare of s and t equals equals zero, 1696 01:15:59,830 --> 01:16:01,608 this now is the right version of the code. 1697 01:16:01,608 --> 01:16:03,400 And now here is, again, where you can play. 1698 01:16:03,400 --> 01:16:04,250 So let me do this. 1699 01:16:04,250 --> 01:16:07,690 Let me clear my terminal window just to tidy things up. 1700 01:16:07,690 --> 01:16:09,670 Let me get rid of all of this comparison stuff. 1701 01:16:09,670 --> 01:16:12,795 And let's just see what's going on, as you are welcome to in your own code. 1702 01:16:12,795 --> 01:16:14,800 Let's print out, for instance, as we might 1703 01:16:14,800 --> 01:16:18,850 have in week one, the value of s itself on a new line, comma s. 1704 01:16:18,850 --> 01:16:21,460 And then let's just print out t just to make sure it compiles 1705 01:16:21,460 --> 01:16:22,910 and I'm not doing anything wrong. 1706 01:16:22,910 --> 01:16:24,785 But this is not going to be that interesting. 
1707 01:16:24,785 --> 01:16:27,820 And frankly, I don't need string.h anymore 1708 01:16:27,820 --> 01:16:29,320 because I'm not using STR compare. 1709 01:16:29,320 --> 01:16:34,660 So make addresses dot slash addresses, there's my-- 1710 01:16:34,660 --> 01:16:35,290 oh, sorry. 1711 01:16:35,290 --> 01:16:36,430 That's fun. 1712 01:16:36,430 --> 01:16:39,130 Not %t, %s here too. 1713 01:16:39,130 --> 01:16:39,797 Ignore that. 1714 01:16:39,797 --> 01:16:40,630 Let's do this again. 1715 01:16:40,630 --> 01:16:43,720 Make a-- oh, and that's the wrong program. 1716 01:16:43,720 --> 01:16:47,770 Dot slash-- let's do make compare dot slash compare. 1717 01:16:47,770 --> 01:16:49,990 And let's type in hi again and hi again. 1718 01:16:49,990 --> 01:16:51,490 And now we just see the two strings. 1719 01:16:51,490 --> 01:16:52,360 I'm not comparing. 1720 01:16:52,360 --> 01:16:54,160 But now we can kind of play around. 1721 01:16:54,160 --> 01:16:57,850 Instead of printing out %s, which prints the string, 1722 01:16:57,850 --> 01:17:01,960 how do I print the address in s? 1723 01:17:01,960 --> 01:17:04,240 I just need to make a slight change. 1724 01:17:04,240 --> 01:17:09,400 If I want to see not what's at s, but I want to see s, the address-- 1725 01:17:09,400 --> 01:17:10,356 Yeah. 1726 01:17:10,356 --> 01:17:13,050 AUDIENCE: Change %s to %p? 1727 01:17:13,050 --> 01:17:14,050 DAVID J. MALAN: Perfect. 1728 01:17:14,050 --> 01:17:17,500 So change %s in both places here to %p. 1729 01:17:17,500 --> 01:17:20,522 So now, printf will treat it literally as an address. 1730 01:17:20,522 --> 01:17:22,480 It's not going to do any fancy this with a loop 1731 01:17:22,480 --> 01:17:24,522 from left to right looking for the NUL character. 1732 01:17:24,522 --> 01:17:26,140 It's just going to print out s and t. 1733 01:17:26,140 --> 01:17:29,170 So let me clear my terminal, run make compare, whoops. 1734 01:17:29,170 --> 01:17:31,450 Let's do make compare dot slash compare. 
1735 01:17:31,450 --> 01:17:32,050 Enter. 1736 01:17:32,050 --> 01:17:34,100 Type in hi, type in hi again. 1737 01:17:34,100 --> 01:17:37,618 And now you see, oh, so this is interesting. 1738 01:17:37,618 --> 01:17:40,660 It's not quite as straightforward as the other values which were slight-- 1739 01:17:40,660 --> 01:17:41,950 1 byte away. 1740 01:17:41,950 --> 01:17:43,220 They're almost the same. 1741 01:17:43,220 --> 01:17:44,950 But this one ends in b0. 1742 01:17:44,950 --> 01:17:46,540 This one ends in f0. 1743 01:17:46,540 --> 01:17:49,630 So they're indeed separated by some number 1744 01:17:49,630 --> 01:17:51,430 of bytes, not just one, but a few. 1745 01:17:51,430 --> 01:17:54,000 Because these strings are indeed longer. 1746 01:17:54,000 --> 01:17:54,500 All right. 1747 01:17:54,500 --> 01:17:58,240 So once you've seen this here, how can we now maybe leverage 1748 01:17:58,240 --> 01:18:00,250 this to solve other problems? 1749 01:18:00,250 --> 01:18:01,880 Well, let me propose that we do this. 1750 01:18:01,880 --> 01:18:05,770 Let me zoom out here, let me close compare. 1751 01:18:05,770 --> 01:18:10,150 And let me open up another program I wrote part of in advance called copy.c. 1752 01:18:10,150 --> 01:18:14,230 So copy.c in theory makes a copy of a string. 1753 01:18:14,230 --> 01:18:14,920 How? 1754 01:18:14,920 --> 01:18:17,920 On line eight, I'm doing the same thing as before. 1755 01:18:17,920 --> 01:18:22,360 Get string, storing it in a string, or char star, and asking the user for it. 1756 01:18:22,360 --> 01:18:24,980 Then I'm not asking get string again. 1757 01:18:24,980 --> 01:18:30,610 I'm just making a copy super simply with line 10 here, string t equals s. 1758 01:18:30,610 --> 01:18:33,820 Now intuitively, I think that's how I would copy a variable. 1759 01:18:33,820 --> 01:18:37,090 That's how we've copied variables every week thus far in C. 1760 01:18:37,090 --> 01:18:39,160 But something is going to go wrong. 
1761 01:18:39,160 --> 01:18:41,830 In line 12, in English, does someone want 1762 01:18:41,830 --> 01:18:44,470 to explain what you think line 12 does? 1763 01:18:44,470 --> 01:18:46,450 Don't worry about finding any bugs or mistakes. 1764 01:18:46,450 --> 01:18:50,290 But what does line 12 seem to be doing using toupper, which 1765 01:18:50,290 --> 01:18:54,080 is thanks to the ctype library, which I've included the header file for? 1766 01:18:54,080 --> 01:18:54,580 Yeah. 1767 01:18:54,580 --> 01:18:56,593 AUDIENCE: [INAUDIBLE] 1768 01:18:56,593 --> 01:18:57,760 DAVID J. MALAN: Yeah, right? 1769 01:18:57,760 --> 01:18:59,350 It's kind of like ugly syntax. 1770 01:18:59,350 --> 01:19:02,710 But this would seem to be capitalizing the first letter of t 1771 01:19:02,710 --> 01:19:04,880 specifically and just changing it. 1772 01:19:04,880 --> 01:19:07,720 So we have t bracket 0 here, because we want to save the change. 1773 01:19:07,720 --> 01:19:10,430 And we're passing to toupper the first character here. 1774 01:19:10,430 --> 01:19:12,460 So this is how we did uppercase in the past. 1775 01:19:12,460 --> 01:19:16,930 And now I print out s and t respectively using %s. 1776 01:19:16,930 --> 01:19:18,460 So this feels like it should work. 1777 01:19:18,460 --> 01:19:21,790 I copied s and stored it in t on line 10. 1778 01:19:21,790 --> 01:19:25,450 And then I change t and only t on line 12. 1779 01:19:25,450 --> 01:19:27,970 But you can perhaps, if you're comfy thus far, 1780 01:19:27,970 --> 01:19:32,860 see where this is going if I do make copy, dot slash copy. 1781 01:19:32,860 --> 01:19:37,220 And let me type in lowercase hi exclamation point this time, just once. 1782 01:19:37,220 --> 01:19:38,410 So I'm going to hit Enter. 1783 01:19:38,410 --> 01:19:43,510 And watch what we see for the value of s and t. 1784 01:19:43,510 --> 01:19:48,640 The new value of s and t at the end of my program seems to be what?
1785 01:19:48,640 --> 01:19:52,030 It seems to be the same. 1786 01:19:52,030 --> 01:19:54,165 Hi is capitalized both times. 1787 01:19:54,165 --> 01:19:56,620 So what's the intuition then for this? 1788 01:19:56,620 --> 01:20:00,100 Why did this just happen? 1789 01:20:00,100 --> 01:20:01,030 Yeah, in back. 1790 01:20:01,030 --> 01:20:02,582 AUDIENCE: [INAUDIBLE] 1791 01:20:02,582 --> 01:20:05,290 DAVID J. MALAN: Yeah, I assigned s and t the same memory address. 1792 01:20:05,290 --> 01:20:07,590 So it did copy s into t. 1793 01:20:07,590 --> 01:20:09,540 But C takes this very literally. 1794 01:20:09,540 --> 01:20:10,073 What is s? 1795 01:20:10,073 --> 01:20:10,740 It's an address. 1796 01:20:10,740 --> 01:20:11,280 What is t? 1797 01:20:11,280 --> 01:20:12,790 It's a copy of that address. 1798 01:20:12,790 --> 01:20:15,990 If you want to copy the whole string like a normal human would expect, 1799 01:20:15,990 --> 01:20:18,180 hey, you or someone has to do a lot more work. 1800 01:20:18,180 --> 01:20:21,690 You have to go to that address, copy this character, this one, this one, 1801 01:20:21,690 --> 01:20:24,390 this one, and copy it to a new location and memory. 1802 01:20:24,390 --> 01:20:26,745 That does not happen automatically here for you in C. 1803 01:20:26,745 --> 01:20:28,620 It does in some other languages, those of you 1804 01:20:28,620 --> 01:20:30,660 who've programmed in certain higher level languages. 1805 01:20:30,660 --> 01:20:32,090 This just works as you would hope. 1806 01:20:32,090 --> 01:20:34,590 And that's one of the benefits of Python and other languages 1807 01:20:34,590 --> 01:20:35,520 that we'll soon see. 1808 01:20:35,520 --> 01:20:38,880 But for now, it literally takes at face value what this is. 1809 01:20:38,880 --> 01:20:40,800 Copy the address into this address. 
1810 01:20:40,800 --> 01:20:44,470 And I'll make that more clear by getting rid of the string keyword, which, 1811 01:20:44,470 --> 01:20:45,960 again, is just a typedef. 1812 01:20:45,960 --> 01:20:47,890 This is technically an address here. 1813 01:20:47,890 --> 01:20:49,570 This is technically an address here. 1814 01:20:49,570 --> 01:20:54,090 So what's being copied is the value of that address, not all of the characters 1815 01:20:54,090 --> 01:20:55,750 that might very well follow it. 1816 01:20:55,750 --> 01:20:58,530 So I should make one note too here. 1817 01:20:58,530 --> 01:21:01,290 I'm going to start getting more in the habit of trying 1818 01:21:01,290 --> 01:21:05,070 to avoid segmentation faults because things could go wrong here. 1819 01:21:05,070 --> 01:21:09,930 For instance, on line 12 previously, I was kind of blindly, naively, 1820 01:21:09,930 --> 01:21:14,265 dangerously assuming that there will be at least one character in s or t. 1821 01:21:14,265 --> 01:21:15,390 That might not be the case. 1822 01:21:15,390 --> 01:21:18,390 If the user just hits Enter, there's no characters to uppercase. 1823 01:21:18,390 --> 01:21:21,720 And so this is reckless of me and could theoretically create a seg fault. 1824 01:21:21,720 --> 01:21:25,510 So I should probably start to be smarter and say something like this. 1825 01:21:25,510 --> 01:21:29,130 If the length of t is greater than zero, OK, 1826 01:21:29,130 --> 01:21:32,310 now it's safe to actually capitalize the first letter. 1827 01:21:32,310 --> 01:21:36,270 And that will decrease the probability now of those segmentation faults 1828 01:21:36,270 --> 01:21:40,050 by just not making any assumptions about what the human does. 1829 01:21:40,050 --> 01:21:43,770 Almost always, your programs will crash when you've made a mistake, 1830 01:21:43,770 --> 01:21:49,210 yes, but the user gives you an input that you yourself did not expect. 
1831 01:21:49,210 --> 01:21:51,060 So what does this all look like in memory? 1832 01:21:51,060 --> 01:21:53,370 Well, let's go back to the big grid, this time 1833 01:21:53,370 --> 01:21:55,210 focusing on the copying of values. 1834 01:21:55,210 --> 01:21:56,040 And let's do this. 1835 01:21:56,040 --> 01:22:01,110 Here's s as in this new program just declared to be a char star. 1836 01:22:01,110 --> 01:22:04,560 Here is where my lowercase hi maybe ended up in the computer's memory. 1837 01:22:04,560 --> 01:22:08,860 That's probably at 0x123, 124, 125, whatever, something like that. 1838 01:22:08,860 --> 01:22:12,180 And that's, of course, what ends up in s as a value. 1839 01:22:12,180 --> 01:22:16,860 When I declare t, I do get a second variable called t just like before. 1840 01:22:16,860 --> 01:22:21,330 But when I copy s into t, what happens? 1841 01:22:21,330 --> 01:22:24,150 It's really just literally 0x123. 1842 01:22:24,150 --> 01:22:27,060 Whatever the value of s is is now also the value of t. 1843 01:22:27,060 --> 01:22:29,160 And so if we abstract this away at a high level, 1844 01:22:29,160 --> 01:22:33,240 get rid of all of those extra squares, this is what s and t now are. 1845 01:22:33,240 --> 01:22:36,090 They're indeed copies, but copies of each other, not 1846 01:22:36,090 --> 01:22:38,070 copies of the underlying characters. 1847 01:22:38,070 --> 01:22:41,160 And so if you follow those arrows and try 1848 01:22:41,160 --> 01:22:43,930 to print them both out after capitalizing one or the other, 1849 01:22:43,930 --> 01:22:47,790 you're going to unfortunately end up capitalizing not just one of them, s, 1850 01:22:47,790 --> 01:22:50,310 but both of them, s and t. 1851 01:22:50,310 --> 01:22:52,830 Because literally, it's the same address. 1852 01:22:52,830 --> 01:22:56,130 Any questions, then, on this visualization? 1853 01:22:56,130 --> 01:22:56,730 Yeah.
1854 01:22:56,730 --> 01:22:58,330 AUDIENCE: [INAUDIBLE] 1855 01:22:58,330 --> 01:22:59,580 DAVID J. MALAN: Good question. 1856 01:22:59,580 --> 01:23:01,200 Is this pass by reference? 1857 01:23:01,200 --> 01:23:07,170 We haven't-- we have not seen in detail an example like that. 1858 01:23:07,170 --> 01:23:09,102 Right now, you're copying by value. 1859 01:23:09,102 --> 01:23:10,560 But references will come into play. 1860 01:23:10,560 --> 01:23:12,852 And remind me in a bit if I haven't used that term yet. 1861 01:23:12,852 --> 01:23:14,925 But this is just copying things by-- 1862 01:23:14,925 --> 01:23:17,370 that could have ended poorly, value. 1863 01:23:17,370 --> 01:23:19,840 Other questions. 1864 01:23:19,840 --> 01:23:20,390 No? 1865 01:23:20,390 --> 01:23:25,960 All right, so with this in mind, how do we actually copy things properly? 1866 01:23:25,960 --> 01:23:28,340 For this, we actually need another building block. 1867 01:23:28,340 --> 01:23:30,080 So today, we give you two functions. 1868 01:23:30,080 --> 01:23:32,860 One of which is called malloc, one of which is called free. 1869 01:23:32,860 --> 01:23:35,590 And these are used all of the time by like every piece 1870 01:23:35,590 --> 01:23:38,460 of software you and I use on our Macs, PCs, and phones, 1871 01:23:38,460 --> 01:23:40,960 whether it's written in C or some equivalent other language. 1872 01:23:40,960 --> 01:23:43,360 Malloc is for memory allocation. 1873 01:23:43,360 --> 01:23:47,960 It's a function that you can use to ask the operating system, MacOS, Linux, 1874 01:23:47,960 --> 01:23:51,250 Windows, anything, for some number of bytes, 1 byte, 100 1875 01:23:51,250 --> 01:23:52,600 bytes, a gigabyte of memory. 1876 01:23:52,600 --> 01:23:55,750 You can ask malloc for however much memory you want in advance. 1877 01:23:55,750 --> 01:24:00,340 It will return to you the address of the first byte of memory 1878 01:24:00,340 --> 01:24:02,110 that it found free for you. 
1879 01:24:02,110 --> 01:24:04,940 Unlike a string, it is not NUL terminated. 1880 01:24:04,940 --> 01:24:07,960 And so the danger with malloc is that it's on the honor system. 1881 01:24:07,960 --> 01:24:12,220 If you ask it for 1 byte or 10 bytes, you, the programmer, in a variable, 1882 01:24:12,220 --> 01:24:16,090 have to remember how many bytes you requested, 1, or 10, or the like. 1883 01:24:16,090 --> 01:24:19,032 Strings do that for you, not when we're getting now to this low level. 1884 01:24:19,032 --> 01:24:22,240 Malloc is just going to give you some memory and it's up to you to manage it. 1885 01:24:22,240 --> 01:24:23,500 Free does the opposite. 1886 01:24:23,500 --> 01:24:24,790 When you're done with some chunk of memory, 1887 01:24:24,790 --> 01:24:28,090 you can free it by passing in that same address and just hand it back to Mac 1888 01:24:28,090 --> 01:24:30,230 OS, Windows, or Linux, and say I'm done with this, 1889 01:24:30,230 --> 01:24:33,130 you can let me use this for something else later. 1890 01:24:33,130 --> 01:24:38,198 As an aside, if your computer has ever frozen, or hung, 1891 01:24:38,198 --> 01:24:40,240 the whole thing maybe just spontaneously reboots, 1892 01:24:40,240 --> 01:24:42,280 yet another reason for a bug like that might 1893 01:24:42,280 --> 01:24:46,300 be if you write a program with a bug that keeps mallocing, mallocing, 1894 01:24:46,300 --> 01:24:49,360 mallocing that is asking for more and more and more memory, 1895 01:24:49,360 --> 01:24:51,850 but you make a mistake and you never free it, 1896 01:24:51,850 --> 01:24:54,742 well eventually, the computer is going to literally run out of memory 1897 01:24:54,742 --> 01:24:56,200 and something is going to go wrong. 1898 01:24:56,200 --> 01:24:58,825 And that's often when computers freeze. 1899 01:24:58,825 --> 01:24:59,950 They're just out of memory. 
1900 01:24:59,950 --> 01:25:03,430 It has the memory there, but the program was trying to use too much of it 1901 01:25:03,430 --> 01:25:04,150 endlessly. 1902 01:25:04,150 --> 01:25:06,160 So this too will be a mistake that some of us 1903 01:25:06,160 --> 01:25:07,820 will surely make in the coming weeks. 1904 01:25:07,820 --> 01:25:09,890 But hopefully, you'll now see the solution. 1905 01:25:09,890 --> 01:25:12,820 So let me go back to VS Code here. 1906 01:25:12,820 --> 01:25:15,120 And let me propose that we do the following. 1907 01:25:15,120 --> 01:25:16,870 I'll hide my terminal window for a moment. 1908 01:25:16,870 --> 01:25:19,698 And I'm going to introduce another header file up here. 1909 01:25:19,698 --> 01:25:22,240 And I promise there's not going to be too many more of these. 1910 01:25:22,240 --> 01:25:26,860 But this one is called stdlib.h for standard library. 1911 01:25:26,860 --> 01:25:31,060 And in this file are the declarations, the prototypes for malloc, and free, 1912 01:25:31,060 --> 01:25:32,530 and a bunch of other stuff as well. 1913 01:25:32,530 --> 01:25:35,270 It lets me now manage my own memory. 1914 01:25:35,270 --> 01:25:37,360 So let's focus now on line 11. 1915 01:25:37,360 --> 01:25:39,400 Line 11 is where I went wrong before. 1916 01:25:39,400 --> 01:25:41,650 Because conceptually, I want to copy the whole string. 1917 01:25:41,650 --> 01:25:45,530 But of course, I'm only copying modestly the individual address. 1918 01:25:45,530 --> 01:25:47,680 So how do I copy the whole darned thing? 1919 01:25:47,680 --> 01:25:49,400 Well, what I need to do is this. 1920 01:25:49,400 --> 01:25:53,290 When I declare t to be the address of something in memory, 1921 01:25:53,290 --> 01:25:56,780 why don't I set t to be the address of a free chunk of memory? 1922 01:25:56,780 --> 01:25:59,620 So let me ask the operating system, give me this many bytes. 1923 01:25:59,620 --> 01:26:00,820 Tell me what the address is.
1924 01:26:00,820 --> 01:26:03,190 And I'm going to store that in t initially just so I 1925 01:26:03,190 --> 01:26:04,850 know where there's free space for me. 1926 01:26:04,850 --> 01:26:06,020 So how do I do that? 1927 01:26:06,020 --> 01:26:09,250 Well, quite simply, I call malloc, and then I pass in the number of bytes 1928 01:26:09,250 --> 01:26:09,850 that I need. 1929 01:26:09,850 --> 01:26:12,850 Now for HI exclamation point, I think I need three. 1930 01:26:12,850 --> 01:26:13,770 Although wait, no. 1931 01:26:13,770 --> 01:26:16,620 I really need four because of the NUL character. 1932 01:26:16,620 --> 01:26:19,120 But I don't think I should be hard coding numbers like this. 1933 01:26:19,120 --> 01:26:21,328 Because who knows what the human is going to type in? 1934 01:26:21,328 --> 01:26:25,630 So I can actually use strlen of s, and then plus 1. 1935 01:26:25,630 --> 01:26:28,870 This will ask malloc then for however many bytes 1936 01:26:28,870 --> 01:26:32,380 corresponds to the number of characters the human typed in plus 1, 1937 01:26:32,380 --> 01:26:33,970 for again, the NUL character. 1938 01:26:33,970 --> 01:26:37,550 So it's just being smart and defensive rather than choosing a number myself. 1939 01:26:37,550 --> 01:26:41,330 But now all t is is a pointer, if you will, 1940 01:26:41,330 --> 01:26:43,520 to some random chunk of free space. 1941 01:26:43,520 --> 01:26:45,010 So there's nothing there yet. 1942 01:26:45,010 --> 01:26:45,993 Or there's bits there. 1943 01:26:45,993 --> 01:26:47,410 But who knows what value they are? 1944 01:26:47,410 --> 01:26:49,870 They're certainly not identical to what the human typed in. 1945 01:26:49,870 --> 01:26:51,430 I now have to do this. 1946 01:26:51,430 --> 01:26:55,090 So how can I copy one string into the other? 1947 01:26:55,090 --> 01:26:56,450 Well, let me do this. 1948 01:26:56,450 --> 01:27:00,650 Instead of capitalizing something just yet, let me do this. 
1949 01:27:00,650 --> 01:27:08,020 How about for int i gets 0, i is less than the length of s. 1950 01:27:08,020 --> 01:27:09,262 And then i plus plus. 1951 01:27:09,262 --> 01:27:11,720 So I'm going to iterate for the whole length of the string. 1952 01:27:11,720 --> 01:27:13,630 And in here, I'm just going to do this. 1953 01:27:13,630 --> 01:27:18,640 The ith character in t should be identical to the ith character in s. 1954 01:27:18,640 --> 01:27:22,870 So I'm just literally copying from right to left each and every character in s. 1955 01:27:22,870 --> 01:27:24,980 And I can trust that there's enough memory in t. 1956 01:27:24,980 --> 01:27:25,480 Why? 1957 01:27:25,480 --> 01:27:27,670 Because I asked for that many bytes plus 1. 1958 01:27:27,670 --> 01:27:29,410 Now there's technically a bug here. 1959 01:27:29,410 --> 01:27:31,240 I actually should probably do this. 1960 01:27:31,240 --> 01:27:34,480 I should do plus 1 here. 1961 01:27:34,480 --> 01:27:39,100 Or if you prefer, I should do less than or equal to the strlen. 1962 01:27:39,100 --> 01:27:41,440 But I think it's a little clearer to do the plus 1. 1963 01:27:41,440 --> 01:27:46,360 Why do I for the first time want to go just beyond the boundary of s 1964 01:27:46,360 --> 01:27:48,130 and copy 1 more byte? 1965 01:27:48,130 --> 01:27:49,005 AUDIENCE: [INAUDIBLE] 1966 01:27:49,005 --> 01:27:50,963 DAVID J. MALAN: Yeah, I need the NUL character. 1967 01:27:50,963 --> 01:27:53,895 I could technically manually add it with some additional line of code. 1968 01:27:53,895 --> 01:27:55,270 But I might as well just copy it. 1969 01:27:55,270 --> 01:27:57,580 Because backslash zero is backslash zero.
1970 01:27:57,580 --> 01:27:59,920 So this time, and probably only this time, 1971 01:27:59,920 --> 01:28:03,340 it's reasonable and correct to go just beyond the boundary of your string 1972 01:28:03,340 --> 01:28:06,640 so you copy the NUL terminating character so that the computer also 1973 01:28:06,640 --> 01:28:08,050 knows where t ends. 1974 01:28:08,050 --> 01:28:12,800 And now I think what I can do a little more safely is this. 1975 01:28:12,800 --> 01:28:18,100 Let me go down here and say, t bracket 0 equals toupper 1976 01:28:18,100 --> 01:28:21,368 of t, of toupper of t bracket 0. 1977 01:28:21,368 --> 01:28:22,660 So same line of code as before. 1978 01:28:22,660 --> 01:28:25,327 If I actually want to be really safe, I should probably do this. 1979 01:28:25,327 --> 01:28:28,540 So if the strlen of t is greater than zero. 1980 01:28:28,540 --> 01:28:30,010 So there's at least 1 byte there. 1981 01:28:30,010 --> 01:28:33,700 OK, now it's safe to blindly capitalize the first character. 1982 01:28:33,700 --> 01:28:36,290 And I think that now puts me in better shape. 1983 01:28:36,290 --> 01:28:37,270 So let me try this now. 1984 01:28:37,270 --> 01:28:43,300 Let me open up my terminal, make copy, dot slash copy. 1985 01:28:43,300 --> 01:28:46,690 I'm going to type in hi exclamation point in all lowercase 1986 01:28:46,690 --> 01:28:48,280 crossing my fingers this time. 1987 01:28:48,280 --> 01:28:53,260 And now if I zoom in, it indeed capitalized only t 1988 01:28:53,260 --> 01:28:55,570 and not s in this case. 1989 01:28:55,570 --> 01:28:57,610 So pictorially, let me switch over here. 1990 01:28:57,610 --> 01:29:02,890 Here is, as before, the variable s pointing at hi in all lowercase. 1991 01:29:02,890 --> 01:29:07,000 When I call malloc though, that gives me a chunk of memory 1992 01:29:07,000 --> 01:29:09,430 that I'm going to store the address in t of.
1993 01:29:09,430 --> 01:29:12,245 So if t is some other variable, as it is in my code, 1994 01:29:12,245 --> 01:29:15,370 and there's some other available chunk of memory, I don't know where it is. 1995 01:29:15,370 --> 01:29:19,660 But let's assume as always it's at 0x456, 457, 458, 459. 1996 01:29:19,660 --> 01:29:20,980 So 4 bytes total. 1997 01:29:20,980 --> 01:29:22,360 What is now happening? 1998 01:29:22,360 --> 01:29:24,760 Well, t is defined as pointing to that. 1999 01:29:24,760 --> 01:29:26,950 Because that's what malloc gives us, the address 2000 01:29:26,950 --> 01:29:29,260 of the first byte of the free memory. 2001 01:29:29,260 --> 01:29:33,070 And now with for loop, I'm just iterating over it, copying the h, 2002 01:29:33,070 --> 01:29:36,700 then the i, then the exclamation point, and then for good measure, 2003 01:29:36,700 --> 01:29:39,790 the backslash 0 instead. 2004 01:29:39,790 --> 01:29:43,476 Questions then on this process here? 2005 01:29:43,476 --> 01:29:44,370 AUDIENCE: [INAUDIBLE] 2006 01:29:44,370 --> 01:29:45,995 DAVID J. MALAN: A really good question. 2007 01:29:45,995 --> 01:29:52,350 If I omitted in my code the plus 1 and I didn't do less than 2008 01:29:52,350 --> 01:29:56,130 or equal to so that I'm copying the fourth byte, odds are in this program, 2009 01:29:56,130 --> 01:29:59,280 because it's so short, you wouldn't notice that there's an actual error. 2010 01:29:59,280 --> 01:30:04,650 But what could happen is when I call printf on t, 2011 01:30:04,650 --> 01:30:09,720 if there's no NUL byte there, it might print h, i, exclamation point, 2012 01:30:09,720 --> 01:30:12,960 some random values, some random values, some random values, some random value 2013 01:30:12,960 --> 01:30:16,380 until it gets lucky and there happens to be a 0 byte, a NUL 2014 01:30:16,380 --> 01:30:18,340 byte by chance for instance. 2015 01:30:18,340 --> 01:30:22,800 So if you don't include the backslash zero some way, that's going to happen. 
2016 01:30:22,800 --> 01:30:23,970 And I say some way. 2017 01:30:23,970 --> 01:30:25,030 I could even do this. 2018 01:30:25,030 --> 01:30:29,520 I could technically just copy the length of the string s, and at the very bottom 2019 01:30:29,520 --> 01:30:33,030 here, I could do something like t bracket i-- 2020 01:30:33,030 --> 01:30:38,010 sorry, t bracket strlen of s. 2021 01:30:38,010 --> 01:30:39,520 I could do this. 2022 01:30:39,520 --> 01:30:41,010 But this is just not necessary. 2023 01:30:41,010 --> 01:30:43,540 I could manually add it at the end of the string. 2024 01:30:43,540 --> 01:30:46,170 But again, I'd claim that it's just simpler to borrow, 2025 01:30:46,170 --> 01:30:48,450 that is copy, the one that's already in s because it's 2026 01:30:48,450 --> 01:30:50,370 the same thing at the end of the day. 2027 01:30:50,370 --> 01:30:51,130 Good question. 2028 01:30:51,130 --> 01:30:53,960 Other questions on this copying correctly now? 2029 01:30:53,960 --> 01:30:57,040 2030 01:30:57,040 --> 01:30:57,540 All right. 2031 01:30:57,540 --> 01:31:00,070 Is there any room for improvements here? 2032 01:31:00,070 --> 01:31:02,310 Well, let me propose a slight optimization. 2033 01:31:02,310 --> 01:31:04,860 This is kind of a throwback now to week one. 2034 01:31:04,860 --> 01:31:09,810 Turns out that arguably, my line 13 here, wherein I have this for loop, 2035 01:31:09,810 --> 01:31:12,750 now that I'm doing things in loops again and again 2036 01:31:12,750 --> 01:31:15,210 and using a function like strlen, this is correct. 2037 01:31:15,210 --> 01:31:21,510 It will iterate from zero on up to the length of s plus 1. 2038 01:31:21,510 --> 01:31:26,910 But it's kind of stupid of me to write this for loop in this way. 2039 01:31:26,910 --> 01:31:27,438 Why? 2040 01:31:27,438 --> 01:31:29,230 Well, here's my initialization on the left. 2041 01:31:29,230 --> 01:31:30,930 Here's my condition in the middle.
2042 01:31:30,930 --> 01:31:35,190 And in general, calling a function inside of your condition 2043 01:31:35,190 --> 01:31:38,400 is probably not very good design. 2044 01:31:38,400 --> 01:31:39,000 Why? 2045 01:31:39,000 --> 01:31:43,260 Why is it bad for me to be calling a function like strlen in this condition 2046 01:31:43,260 --> 01:31:44,610 in the middle of my for loop? 2047 01:31:44,610 --> 01:31:45,150 Yeah. 2048 01:31:45,150 --> 01:31:48,430 AUDIENCE: [INAUDIBLE] 2049 01:31:48,430 --> 01:31:50,930 DAVID J. MALAN: Yeah, you're just calling it again and again 2050 01:31:50,930 --> 01:31:51,650 for no reason. 2051 01:31:51,650 --> 01:31:53,040 The length of s never changes. 2052 01:31:53,040 --> 01:31:55,820 So why are you wasting everyone's time by calling strlen of s 2053 01:31:55,820 --> 01:32:00,110 again, again, again, again just to check this inequality, whether i 2054 01:32:00,110 --> 01:32:01,350 is less than that value? 2055 01:32:01,350 --> 01:32:03,260 So it turns out if you haven't discovered this already, 2056 01:32:03,260 --> 01:32:05,093 there's a slight optimization we can do here 2057 01:32:05,093 --> 01:32:08,570 that has nothing to do fundamentally with strings, or pointers, just 2058 01:32:08,570 --> 01:32:09,770 with better design. 2059 01:32:09,770 --> 01:32:12,260 I can actually define two variables at once. 2060 01:32:12,260 --> 01:32:13,350 I could do this. 2061 01:32:13,350 --> 01:32:15,230 Let me remove this whole condition. 2062 01:32:15,230 --> 01:32:20,660 And let me add a comma after i equals 0, set n, or any variable, 2063 01:32:20,660 --> 01:32:24,590 equal to the strlen of s plus 1. 2064 01:32:24,590 --> 01:32:30,020 And then after the semicolon, just ask the question while i is less than n. 2065 01:32:30,020 --> 01:32:31,620 So it's almost the same. 2066 01:32:31,620 --> 01:32:35,090 But notice now my condition in the very middle of this loop 2067 01:32:35,090 --> 01:32:37,730 is at least comparing two static values. 
2068 01:32:37,730 --> 01:32:38,700 n never changes. 2069 01:32:38,700 --> 01:32:39,200 Sorry. 2070 01:32:39,200 --> 01:32:41,390 One static value. n never changes. 2071 01:32:41,390 --> 01:32:42,500 All that changes is i. 2072 01:32:42,500 --> 01:32:45,810 But I'm not foolishly calling strlen, strlen, strlen again and again. 2073 01:32:45,810 --> 01:32:46,310 Why? 2074 01:32:46,310 --> 01:32:47,600 Well, how does strlen work? 2075 01:32:47,600 --> 01:32:52,547 Similar in spirit to printf, strlen, given the name of a string, 2076 01:32:52,547 --> 01:32:54,380 looks at the first character and then starts 2077 01:32:54,380 --> 01:32:57,590 looking through the entire string looking for the NUL character. 2078 01:32:57,590 --> 01:33:01,320 And we saw this in week two counting up how many characters are there. 2079 01:33:01,320 --> 01:33:03,820 So it's just a waste of time again and again. 2080 01:33:03,820 --> 01:33:09,330 AUDIENCE: [INAUDIBLE] all the way at the top so that way, [INAUDIBLE]? 2081 01:33:09,330 --> 01:33:10,330 DAVID J. MALAN: Totally. 2082 01:33:10,330 --> 01:33:12,220 If you wanted to use n multiple times, you 2083 01:33:12,220 --> 01:33:16,330 could absolutely take it out of the for loop, put it right after s is defined, 2084 01:33:16,330 --> 01:33:17,770 and reuse n again and again. 2085 01:33:17,770 --> 01:33:18,430 Absolutely. 2086 01:33:18,430 --> 01:33:19,990 But in general, consider this. 2087 01:33:19,990 --> 01:33:23,500 When designing for loops, even though modern compilers like Clang 2088 01:33:23,500 --> 01:33:26,200 can actually fix this problem, this inefficiency for you, 2089 01:33:26,200 --> 01:33:29,320 good practice would be don't call functions unnecessarily, 2090 01:33:29,320 --> 01:33:33,020 especially if the answer is always going to be the same. 2091 01:33:33,020 --> 01:33:33,520 All right. 2092 01:33:33,520 --> 01:33:37,100 So what else should I perhaps refine here?
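The loop shape just described might look roughly like the following. This is only a sketch: `copy_bytes` is a made-up name, and it assumes the destination already has room for strlen of the source plus 1 bytes. The point is that strlen is called once, in the initialization, and the condition merely compares i to n.

```c
#include <stddef.h>
#include <string.h>

// Hypothetical helper: copies src (terminator included) into dst.
// Assumes dst has room for at least strlen(src) + 1 bytes.
void copy_bytes(char *dst, const char *src)
{
    // strlen runs once, in the initialization; the condition only compares i to n
    for (size_t i = 0, n = strlen(src) + 1; i < n; i++)
    {
        dst[i] = src[i];
    }
}
```

Declaring n next to i keeps it scoped to the loop; if the length were needed elsewhere, it could instead be computed once before the loop, as suggested in the exchange above.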
2093 01:33:37,100 --> 01:33:41,380 Well, how about I do one last thing and just comment on what exactly 2094 01:33:41,380 --> 01:33:42,890 could go wrong here. 2095 01:33:42,890 --> 01:33:44,320 Well, a couple of things. 2096 01:33:44,320 --> 01:33:46,480 Well, actually, this is just silly too. 2097 01:33:46,480 --> 01:33:50,290 Surely, someone before me in the world has had to copy a string before. 2098 01:33:50,290 --> 01:33:53,380 Surely, there's a function called strcpy maybe, 2099 01:33:53,380 --> 01:33:55,000 like strcmp, like strlen. 2100 01:33:55,000 --> 01:33:55,900 And indeed there is. 2101 01:33:55,900 --> 01:33:58,960 So let me propose that we actually get rid of this whole for loop 2102 01:33:58,960 --> 01:34:03,880 and we actually just call a function called strcpy, no O, just strcpy. 2103 01:34:03,880 --> 01:34:08,320 And pass in the destination, which is t first, and then the source 2104 01:34:08,320 --> 01:34:10,300 that you want to copy into the destination. 2105 01:34:10,300 --> 01:34:13,810 And that takes the place entirely of that whole loop. 2106 01:34:13,810 --> 01:34:17,140 So again, I demonstrated the loop first just to be very pedantic about it. 2107 01:34:17,140 --> 01:34:18,320 But that's wasting time. 2108 01:34:18,320 --> 01:34:20,820 You're wasting time writing lines of code you don't need to. 2109 01:34:20,820 --> 01:34:24,020 strcpy is what you can use here instead. 2110 01:34:24,020 --> 01:34:25,720 And so this has always existed. 2111 01:34:25,720 --> 01:34:26,930 And what more can I do? 2112 01:34:26,930 --> 01:34:30,520 Well as one final point, it turns out that there's actually 2113 01:34:30,520 --> 01:34:33,940 things that can go wrong in this code even besides the string 2114 01:34:33,940 --> 01:34:34,785 being too short.
2115 01:34:34,785 --> 01:34:37,160 If the human just hits Enter and there are no characters, 2116 01:34:37,160 --> 01:34:40,243 I don't want to blindly capitalize the first character that doesn't exist. 2117 01:34:40,243 --> 01:34:41,950 That's why I added that if condition. 2118 01:34:41,950 --> 01:34:43,840 But there's other things that can go wrong. 2119 01:34:43,840 --> 01:34:45,520 And we introduce those to you today. 2120 01:34:45,520 --> 01:34:52,330 It turns out that functions like get string and functions like malloc return 2121 01:34:52,330 --> 01:34:54,190 potentially a special value. 2122 01:34:54,190 --> 01:34:58,320 And wonderfully confusingly, it's also called NULL, but with two L's. 2123 01:34:58,320 --> 01:34:58,820 All right? 2124 01:34:58,820 --> 01:35:01,780 So left hand and right hand weren't talking so well decades ago. 2125 01:35:01,780 --> 01:35:04,480 NUL is a backslash zero. 2126 01:35:04,480 --> 01:35:08,950 It's a single character as it always has been for a couple of weeks now. 2127 01:35:08,950 --> 01:35:12,070 NULL is technically a pointer. 2128 01:35:12,070 --> 01:35:14,650 It's an address, but it's address zero. 2129 01:35:14,650 --> 01:35:18,550 It's like the top left hand corner, if you will, of your computer's memory 2130 01:35:18,550 --> 01:35:21,490 that just nothing is ever supposed to go in by convention. 2131 01:35:21,490 --> 01:35:24,790 So NULL is a synonym for zero. 2132 01:35:24,790 --> 01:35:26,260 But it's specifically an address. 2133 01:35:26,260 --> 01:35:27,500 Now why is this useful? 2134 01:35:27,500 --> 01:35:30,622 Well, suppose that in my code here, something goes wrong with get string. 2135 01:35:30,622 --> 01:35:33,830 Suppose you're being a little crazy and you type in way too long of a string. 2136 01:35:33,830 --> 01:35:36,223 It's not just hi, but it's like an entire essay of text. 2137 01:35:36,223 --> 01:35:38,140 And there's not enough memory in the computer. 
2138 01:35:38,140 --> 01:35:41,350 How does get string signal to the programmer, whoa, 2139 01:35:41,350 --> 01:35:44,290 that's way too big of a string, I can't fit it in memory? 2140 01:35:44,290 --> 01:35:45,860 Well, we never told you this. 2141 01:35:45,860 --> 01:35:49,120 But all of this time, it turns out that get 2142 01:35:49,120 --> 01:35:53,600 string will return this special value called NULL if something goes wrong. 2143 01:35:53,600 --> 01:35:57,080 So to be really careful now, you should do something like this. 2144 01:35:57,080 --> 01:36:03,160 If s equals equals literally NULL, then you better exit the program entirely 2145 01:36:03,160 --> 01:36:06,400 and return like one, or two, or three to signify that something went wrong. 2146 01:36:06,400 --> 01:36:08,320 Don't go any further. 2147 01:36:08,320 --> 01:36:12,002 Similarly with malloc, it's possible if you ask for way too much memory, that 2148 01:36:12,002 --> 01:36:14,710 could fail, especially if you're asking now for double the memory 2149 01:36:14,710 --> 01:36:16,168 after the human typed something in. 2150 01:36:16,168 --> 01:36:18,760 So if t equals equals NULL, then you know what? 2151 01:36:18,760 --> 01:36:20,860 Let's also return one, or some other value, 2152 01:36:20,860 --> 01:36:25,220 to just get out before something crashes or freezes on the human as well. 2153 01:36:25,220 --> 01:36:28,377 So honestly, I tend not to do this always in class because the code just 2154 01:36:28,377 --> 01:36:29,710 gets so bloated and complicated. 2155 01:36:29,710 --> 01:36:32,780 But you absolutely in practice need to start doing this. 2156 01:36:32,780 --> 01:36:36,040 Otherwise, you will be responsible for the freezes, and the crashes, 2157 01:36:36,040 --> 01:36:38,140 and the reboots that users in the real world 2158 01:36:38,140 --> 01:36:40,450 might actually encounter otherwise. 
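Putting the last few points together, a careful version could look something like this sketch. The function name `copy_and_print` is hypothetical: it stands in for the body of main, with the parameter s playing the role of whatever get string returned (possibly NULL). It also hands the copy back with free once it is done, so the example does not leak.

```c
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

// Hypothetical stand-in for the body of main: s plays the role of the value
// returned by get_string, which may be NULL if something went wrong.
// Returns 0 on success and 1 on failure, mirroring main's exit status.
int copy_and_print(const char *s)
{
    if (s == NULL)
    {
        return 1;
    }

    char *t = malloc(strlen(s) + 1);
    if (t == NULL)
    {
        return 1;
    }

    // strcpy(destination, source) copies the characters and the NUL terminator
    strcpy(t, s);

    // Only capitalize if there is at least one character
    if (strlen(t) > 0)
    {
        t[0] = toupper((unsigned char) t[0]);
    }

    printf("s: %s\n", s);
    printf("t: %s\n", t);

    // Hand the memory back once we are done with it
    free(t);
    return 0;
}
```

Checking for NULL before touching s or t means a failed allocation ends the program with a nonzero status instead of a crash.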
2159 01:36:40,450 --> 01:36:43,300 Of course, if we get to the bottom of this program now, 2160 01:36:43,300 --> 01:36:46,960 I should probably return zero explicitly, or implicitly, to just 2161 01:36:46,960 --> 01:36:50,260 signify that everything is successful. 2162 01:36:50,260 --> 01:36:52,660 But there's one other thing I haven't done. 2163 01:36:52,660 --> 01:36:53,890 We introduced malloc. 2164 01:36:53,890 --> 01:36:55,725 But what did I claim also existed? 2165 01:36:55,725 --> 01:36:56,350 AUDIENCE: Free. 2166 01:36:56,350 --> 01:36:57,220 DAVID J. MALAN: So free. 2167 01:36:57,220 --> 01:36:58,762 I'm also being a little reckless now. 2168 01:36:58,762 --> 01:37:00,850 Here I am not practicing what I'm preaching. 2169 01:37:00,850 --> 01:37:03,370 I'm asking the computer for memory via get string, 2170 01:37:03,370 --> 01:37:05,830 I'm asking the computer for more memory via malloc, 2171 01:37:05,830 --> 01:37:08,210 and I'm never technically handing it back. 2172 01:37:08,210 --> 01:37:11,770 So really what I should be doing at the very bottom of my program 2173 01:37:11,770 --> 01:37:16,120 too is freeing the memory I've asked for. 2174 01:37:16,120 --> 01:37:19,540 So henceforth, it is a rule, a law, if you will in C, 2175 01:37:19,540 --> 01:37:23,380 whenever you allocate memory with malloc, or certain other functions 2176 01:37:23,380 --> 01:37:27,670 as well, you, the programmer, must free it when you're all done with it. 2177 01:37:27,670 --> 01:37:30,250 Now this is a bit of an overstatement because technically, 2178 01:37:30,250 --> 01:37:32,800 when programs quit, they'll free the memory automatically. 2179 01:37:32,800 --> 01:37:35,410 So you're not going to break someone's Mac or PC because you necessarily 2180 01:37:35,410 --> 01:37:35,980 have this bug. 
2181 01:37:35,980 --> 01:37:38,480 But for programs that are running all the time, like someone 2182 01:37:38,480 --> 01:37:41,890 keeps Chrome, their browser, open, Microsoft Word, or the like, bad things 2183 01:37:41,890 --> 01:37:44,960 will happen if over time you never, never, never call free 2184 01:37:44,960 --> 01:37:46,210 and the program keeps running. 2185 01:37:46,210 --> 01:37:48,250 So always get into this habit here. 2186 01:37:48,250 --> 01:37:52,540 You do not need to free memory that comes from get string because the CS50 2187 01:37:52,540 --> 01:37:54,680 library automatically frees it for you. 2188 01:37:54,680 --> 01:37:58,840 But you, any time you use malloc henceforth, as you did or I did here, 2189 01:37:58,840 --> 01:38:04,150 you must free that by just passing in the same address you got back. 2190 01:38:04,150 --> 01:38:09,740 Questions now on malloc and free? 2191 01:38:09,740 --> 01:38:11,000 Questions? 2192 01:38:11,000 --> 01:38:11,890 Yeah. 2193 01:38:11,890 --> 01:38:17,878 AUDIENCE: [INAUDIBLE] 2194 01:38:17,878 --> 01:38:19,420 DAVID J. MALAN: Really good question. 2195 01:38:19,420 --> 01:38:22,260 So free just-- so what does free do? 2196 01:38:22,260 --> 01:38:26,072 So free just lets the computer know that you 2197 01:38:26,072 --> 01:38:27,780 are done with that chunk of memory, which 2198 01:38:27,780 --> 01:38:29,572 means that if you have another line of code 2199 01:38:29,572 --> 01:38:32,430 elsewhere, that same memory might be reused, 2200 01:38:32,430 --> 01:38:34,020 and can be used again and again. 2201 01:38:34,020 --> 01:38:36,280 And that's going to be necessary certainly for any long running program. 2202 01:38:36,280 --> 01:38:37,827 You can't ask for memory constantly. 2203 01:38:37,827 --> 01:38:38,910 You'll eventually run out. 2204 01:38:38,910 --> 01:38:40,368 So you need to free it in this way. 2205 01:38:40,368 --> 01:38:41,555 Other languages as an aside.
2206 01:38:41,555 --> 01:38:43,680 Python, yet another motivation in a couple of weeks 2207 01:38:43,680 --> 01:38:46,013 for it is going to be that Python and certain other languages 2208 01:38:46,013 --> 01:38:47,890 manage all this headache for you. 2209 01:38:47,890 --> 01:38:52,600 But in C, the goal here is to really harness these capabilities ourselves. 2210 01:38:52,600 --> 01:38:53,100 All right. 2211 01:38:53,100 --> 01:38:56,140 So it turns out almost everyone in the room, everyone in the room, 2212 01:38:56,140 --> 01:38:57,420 myself included, you're going to screw up 2213 01:38:57,420 --> 01:39:00,090 when it comes to anything memory related if you haven't already. 2214 01:39:00,090 --> 01:39:01,630 Seg faults are in your future. 2215 01:39:01,630 --> 01:39:04,560 But hopefully, there's tools via which you can detect these things 2216 01:39:04,560 --> 01:39:09,750 and fix them proactively, and not just use printf, or debug50, or rubber duck. 2217 01:39:09,750 --> 01:39:12,397 We actually have another tool we can equip you with now 2218 01:39:12,397 --> 01:39:13,980 that will help you find some mistakes. 2219 01:39:13,980 --> 01:39:14,920 So let me do this. 2220 01:39:14,920 --> 01:39:16,800 Let me close copy.c. 2221 01:39:16,800 --> 01:39:19,980 Let me open a program I wrote in advance called memory.c 2222 01:39:19,980 --> 01:39:22,120 that doesn't do anything really interesting. 2223 01:39:22,120 --> 01:39:24,070 But it's going to have two bugs in it. 2224 01:39:24,070 --> 01:39:27,090 Notice that I've included stdio.h as always. 2225 01:39:27,090 --> 01:39:30,270 I've also included stdlib.h, which is necessary now 2226 01:39:30,270 --> 01:39:33,690 for anything related to malloc and/or free and the like. 2227 01:39:33,690 --> 01:39:34,890 Line six. 2228 01:39:34,890 --> 01:39:36,730 It's a little weird what I've done here. 2229 01:39:36,730 --> 01:39:42,300 But this is the manual way of asking for enough memory for an array.
2230 01:39:42,300 --> 01:39:45,690 In week two, how do we ask for memory for an array? 2231 01:39:45,690 --> 01:39:49,470 You very simply say, int x bracket 3. 2232 01:39:49,470 --> 01:39:52,140 And that gives you an array called x of size three. 2233 01:39:52,140 --> 01:39:55,770 But if you do it manually now using malloc, what you have to do 2234 01:39:55,770 --> 01:39:57,780 is use syntax like this. 2235 01:39:57,780 --> 01:40:02,173 You call malloc, you ask for three things times however big an int is. 2236 01:40:02,173 --> 01:40:03,090 Now we know it's four. 2237 01:40:03,090 --> 01:40:04,650 So you could literally write 12 here. 2238 01:40:04,650 --> 01:40:06,250 But this is more generic. 2239 01:40:06,250 --> 01:40:09,930 So three times the size of an integer will give you 12 dynamically. 2240 01:40:09,930 --> 01:40:11,280 And what does malloc return? 2241 01:40:11,280 --> 01:40:14,400 The address of the first byte you get back. 2242 01:40:14,400 --> 01:40:15,940 Where do I want to put that? 2243 01:40:15,940 --> 01:40:17,560 Well, I want to put it in a variable. 2244 01:40:17,560 --> 01:40:20,620 Now the variable can't just be int x because that's a number. 2245 01:40:20,620 --> 01:40:22,260 It's not an address per se. 2246 01:40:22,260 --> 01:40:25,530 If I want to store this address in a variable, I could call it x, 2247 01:40:25,530 --> 01:40:26,490 I could call it p. 2248 01:40:26,490 --> 01:40:31,200 But int star x just means that x is now the address of a chunk of memory, 2249 01:40:31,200 --> 01:40:33,420 specifically a chunk of memory that's big enough not 2250 01:40:33,420 --> 01:40:36,660 for one, but for three ints in total. 2251 01:40:36,660 --> 01:40:39,930 All right, now, I'm just sort of naively putting 2252 01:40:39,930 --> 01:40:43,210 our old friend 72, 73, and 33 at the first, second, 2253 01:40:43,210 --> 01:40:44,940 and third locations in memory.
2254 01:40:44,940 --> 01:40:47,340 But perhaps based on week two or week four, 2255 01:40:47,340 --> 01:40:49,530 I'm clearly screwing up here in a couple of ways. 2256 01:40:49,530 --> 01:40:52,200 Someone want to identify at least one bug? 2257 01:40:52,200 --> 01:40:53,193 What did I do wrong? 2258 01:40:53,193 --> 01:40:55,110 AUDIENCE: You start at zero instead of at one. 2259 01:40:55,110 --> 01:40:58,410 DAVID J. MALAN: Yeah, this is now amateur stuff. 2260 01:40:58,410 --> 01:41:00,790 I should be zero indexing not one indexing. 2261 01:41:00,790 --> 01:41:03,210 So this has got to be zero, one, two ultimately. 2262 01:41:03,210 --> 01:41:05,775 And other bugs that are maybe more week four specific? 2263 01:41:05,775 --> 01:41:08,340 2264 01:41:08,340 --> 01:41:09,180 Other bugs. 2265 01:41:09,180 --> 01:41:09,930 It's more subtle. 2266 01:41:09,930 --> 01:41:10,430 Yeah. 2267 01:41:10,430 --> 01:41:11,310 AUDIENCE: [INAUDIBLE] 2268 01:41:11,310 --> 01:41:12,390 DAVID J. MALAN: I'm not freeing the memory, right? 2269 01:41:12,390 --> 01:41:14,970 So I'm not practicing what I'm preaching by freeing this memory. 2270 01:41:14,970 --> 01:41:16,470 Now suppose these are non-obvious. 2271 01:41:16,470 --> 01:41:20,070 And honestly, after an hour or two of this, this shouldn't be obvious yet. 2272 01:41:20,070 --> 01:41:21,420 It will be over time. 2273 01:41:21,420 --> 01:41:25,830 How could I find these bugs with software as opposed 2274 01:41:25,830 --> 01:41:28,530 to just staring at the thing, or asking someone for help? 2275 01:41:28,530 --> 01:41:29,980 Well, let me propose this. 2276 01:41:29,980 --> 01:41:33,930 Let me first go ahead and run make memory to compile the program. 2277 01:41:33,930 --> 01:41:35,970 And it seems to work-- look fine. 2278 01:41:35,970 --> 01:41:37,800 There's no syntax errors at least. 2279 01:41:37,800 --> 01:41:40,838 Dot slash memory, notice, seems to work fine too. 
2280 01:41:40,838 --> 01:41:42,880 Now this program doesn't do anything interesting. 2281 01:41:42,880 --> 01:41:44,610 There's no printf or anything like that. 2282 01:41:44,610 --> 01:41:45,760 But it didn't crash. 2283 01:41:45,760 --> 01:41:48,000 There's no segmentation fault. But that doesn't 2284 01:41:48,000 --> 01:41:51,060 mean there aren't bugs latent in the software. 2285 01:41:51,060 --> 01:41:54,090 And this is true, sadly, of all of today's software. 2286 01:41:54,090 --> 01:41:56,340 Chrome, and Microsoft Word, and other programs 2287 01:41:56,340 --> 01:42:00,120 surely have memory-related bugs that people at Google and Microsoft 2288 01:42:00,120 --> 01:42:01,080 haven't yet found. 2289 01:42:01,080 --> 01:42:04,410 But there are tools at least to find the most obvious of those bugs. 2290 01:42:04,410 --> 01:42:07,620 And we're going to introduce you now to a program called valgrind. 2291 01:42:07,620 --> 01:42:09,900 So valgrind, it's a fairly fancy program. 2292 01:42:09,900 --> 01:42:11,700 But we'll use it in very simple ways. 2293 01:42:11,700 --> 01:42:15,840 It'll look at your code and find memory errors as it's executing 2294 01:42:15,840 --> 01:42:18,060 and try to help you understand where they are. 2295 01:42:18,060 --> 01:42:20,190 So let me go back to VS Code here. 2296 01:42:20,190 --> 01:42:21,515 Memory seems to be fine. 2297 01:42:21,515 --> 01:42:23,640 I feel like, OK, I'm going to submit this homework. 2298 01:42:23,640 --> 01:42:24,180 All is good. 2299 01:42:24,180 --> 01:42:25,170 No error messages. 2300 01:42:25,170 --> 01:42:26,620 That's no longer the case. 2301 01:42:26,620 --> 01:42:28,740 Now you need to poke a little more at your code 2302 01:42:28,740 --> 01:42:30,820 to see if maybe there's still some bug there. 2303 01:42:30,820 --> 01:42:35,560 So let me do this. valgrind and then space, dot slash memory. 2304 01:42:35,560 --> 01:42:38,700 So just like debug50, you run it on a program you already compiled.
2305 01:42:38,700 --> 01:42:41,550 valgrind, I'm going to run it on a program I already compiled. 2306 01:42:41,550 --> 01:42:44,760 Let me zoom in on my terminal window so we can see more at once. 2307 01:42:44,760 --> 01:42:46,110 And Enter. 2308 01:42:46,110 --> 01:42:49,273 All right, the output is crazy cryptic for no good reason. 2309 01:42:49,273 --> 01:42:50,940 There's lots of numbers and equal signs. 2310 01:42:50,940 --> 01:42:52,200 It's a lot of clutter. 2311 01:42:52,200 --> 01:42:54,250 But there is some juicy information here. 2312 01:42:54,250 --> 01:42:55,950 And let me start from the top down. 2313 01:42:55,950 --> 01:42:58,470 Invalid write of size four. 2314 01:42:58,470 --> 01:43:02,400 So write means to change a value, read means to access a value. 2315 01:43:02,400 --> 01:43:06,000 And this is, again, esoteric, like a lot of our error messages are. 2316 01:43:06,000 --> 01:43:11,580 But it looks like after a block of size 12 alloc'd, and then there's 2317 01:43:11,580 --> 01:43:13,200 this weird hex notation. 2318 01:43:13,200 --> 01:43:14,580 There's some mention of malloc. 2319 01:43:14,580 --> 01:43:18,120 But honestly, the juicy part here is memory.c, line six. 2320 01:43:18,120 --> 01:43:21,610 That's probably my fault. So let's look at line six per that output. 2321 01:43:21,610 --> 01:43:24,300 Let me shrink the terminal window, look at line six. 2322 01:43:24,300 --> 01:43:26,160 OK, 12 is now germane. 2323 01:43:26,160 --> 01:43:29,000 If you did the mental math of the size of an int times 3, 2324 01:43:29,000 --> 01:43:31,170 12 is somehow involved here. 2325 01:43:31,170 --> 01:43:36,090 But line six is now happening next here. 2326 01:43:36,090 --> 01:43:37,650 That's where the memory came from. 2327 01:43:37,650 --> 01:43:38,500 What is this? 2328 01:43:38,500 --> 01:43:39,480 Let me zoom back in. 2329 01:43:39,480 --> 01:43:45,150 Where is there invalid write of size four?
2330 01:43:45,150 --> 01:43:47,820 What's perhaps going wrong here? 2331 01:43:47,820 --> 01:43:49,830 Invalid write of size four. 2332 01:43:49,830 --> 01:43:50,880 What does that mean? 2333 01:43:50,880 --> 01:43:53,850 It's like a very technical way of explaining. 2334 01:43:53,850 --> 01:43:57,270 The bug is actually one line later, on line seven, as we already identified. 2335 01:43:57,270 --> 01:43:57,945 Yeah. 2336 01:43:57,945 --> 01:43:58,820 AUDIENCE: [INAUDIBLE] 2337 01:43:58,820 --> 01:43:59,300 DAVID J. MALAN: Indeed. 2338 01:43:59,300 --> 01:44:00,467 And I misspoke a moment ago. 2339 01:44:00,467 --> 01:44:02,420 The bug actually arises here with line nine. 2340 01:44:02,420 --> 01:44:06,575 So after the allocation of memory, I'm somehow writing 4 bytes incorrectly. 2341 01:44:06,575 --> 01:44:08,450 And unfortunately, the onus is kind of on you 2342 01:44:08,450 --> 01:44:11,420 to sort of think through deductively what could that mean. 2343 01:44:11,420 --> 01:44:14,960 But I'm clearly touching 4 bytes of memory in these few lines of code 2344 01:44:14,960 --> 01:44:15,797 that I shouldn't be. 2345 01:44:15,797 --> 01:44:18,380 And hopefully here as the light bulb already went off earlier, 2346 01:44:18,380 --> 01:44:20,150 oh, I'm not zero indexing. 2347 01:44:20,150 --> 01:44:22,800 OK, that must mean that x bracket three, as you know, 2348 01:44:22,800 --> 01:44:25,170 is just too far past the chunk of memory. 2349 01:44:25,170 --> 01:44:28,670 So I'm invalidly writing to 4 bytes that I shouldn't be. 2350 01:44:28,670 --> 01:44:30,200 So again, it's not super obvious. 2351 01:44:30,200 --> 01:44:31,920 This is not super user friendly. 2352 01:44:31,920 --> 01:44:35,120 But at least it does give you a clue as to where that bug is. 2353 01:44:35,120 --> 01:44:38,690 So the fix there is going to be quite simply to change the one 2354 01:44:38,690 --> 01:44:42,020 to a zero, the two to a one, and the three to a two. 
2355 01:44:42,020 --> 01:44:42,775 That'll fix that. 2356 01:44:42,775 --> 01:44:44,150 But there's still a second error. 2357 01:44:44,150 --> 01:44:46,250 And let me look at the cryptic output again. 2358 01:44:46,250 --> 01:44:50,180 Heap summary, some stuff there, OK, this does not sound good down here. 2359 01:44:50,180 --> 01:44:54,740 12 bytes in one blocks are definitely lost in loss record one of one. 2360 01:44:54,740 --> 01:44:56,510 Very arcane output too. 2361 01:44:56,510 --> 01:44:59,970 But clearly related to line six again, our allocation of memory. 2362 01:44:59,970 --> 01:45:02,300 Now here too, it's not obvious what the solution is. 2363 01:45:02,300 --> 01:45:04,010 But memory is lost. 2364 01:45:04,010 --> 01:45:05,900 AKA, this is a memory leak. 2365 01:45:05,900 --> 01:45:08,910 And now the deduction is kind of up to you. 2366 01:45:08,910 --> 01:45:09,720 What is leaking? 2367 01:45:09,720 --> 01:45:10,220 Oh, wait. 2368 01:45:10,220 --> 01:45:11,480 I didn't call free. 2369 01:45:11,480 --> 01:45:13,760 And so the second solution here is probably 2370 01:45:13,760 --> 01:45:16,242 to free x at the very end of the program. 2371 01:45:16,242 --> 01:45:18,950 And if you really want to be pedantic, you should probably check, 2372 01:45:18,950 --> 01:45:21,590 like I proposed earlier, if x is NULL, just 2373 01:45:21,590 --> 01:45:24,260 get out now while you still can and don't even 2374 01:45:24,260 --> 01:45:25,650 touch those other lines of code. 2375 01:45:25,650 --> 01:45:27,410 But if you get to the bottom, return zero. 2376 01:45:27,410 --> 01:45:30,740 But really, the takeaways are, I fixed my zero indexing 2377 01:45:30,740 --> 01:45:33,530 of the array to avoid the invalid write of size four. 2378 01:45:33,530 --> 01:45:36,390 And now, I'm freeing the memory that I asked for. 2379 01:45:36,390 --> 01:45:37,927 So there should be no leak. 2380 01:45:37,927 --> 01:45:39,260 All right, let's try this again.
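With both fixes applied, the program might look roughly like this. It is a sketch rather than the lecture's file: the function name `store_three` is invented here and stands in for main, so line numbers will not match the valgrind output discussed above.

```c
#include <stdlib.h>

// Hypothetical function standing in for main in the corrected memory.c.
// Returns 0 on success, 1 if malloc fails.
int store_three(void)
{
    // Room for three ints: 3 * sizeof(int) bytes (12 where an int is 4 bytes)
    int *x = malloc(3 * sizeof(int));
    if (x == NULL)
    {
        return 1;
    }

    // Valid indices are 0, 1, and 2; x[3] would be past the end of the block
    x[0] = 72;
    x[1] = 73;
    x[2] = 33;

    // Give the memory back so nothing is reported as lost
    free(x);
    return 0;
}
```

Writing to x[0] through x[2] stays inside the 12 allocated bytes, which addresses the invalid write, and the call to free addresses the leak.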
2381 01:45:39,260 --> 01:45:41,900 Make memory, dot slash memory. 2382 01:45:41,900 --> 01:45:43,370 No visible errors yet. 2383 01:45:43,370 --> 01:45:45,980 But let me now increase my terminal window again, do 2384 01:45:45,980 --> 01:45:49,010 valgrind of dot slash memory, crossing my fingers, 2385 01:45:49,010 --> 01:45:53,598 and now all heap blocks were freed, no leaks are possible. 2386 01:45:53,598 --> 01:45:54,890 I don't see any invalid writes. 2387 01:45:54,890 --> 01:45:56,150 There's still a crazy amount of output. 2388 01:45:56,150 --> 01:45:57,350 But none of it is erroneous. 2389 01:45:57,350 --> 01:45:58,310 It's not bad. 2390 01:45:58,310 --> 01:46:00,020 Now I fixed my memory bugs. 2391 01:46:00,020 --> 01:46:01,970 And so now my TA, my TF, they're not going 2392 01:46:01,970 --> 01:46:03,890 to find them either because at least valgrind 2393 01:46:03,890 --> 01:46:05,990 has proactively done that for me. 2394 01:46:05,990 --> 01:46:08,717 Questions then on valgrind? 2395 01:46:08,717 --> 01:46:11,300 Generally, it's those two types of errors you might trip over. 2396 01:46:11,300 --> 01:46:14,720 There's not too much else in the way of arcane output. 2397 01:46:14,720 --> 01:46:17,550 Questions then on this? 2398 01:46:17,550 --> 01:46:18,050 No? 2399 01:46:18,050 --> 01:46:20,640 All right, well, what else might be going on? 2400 01:46:20,640 --> 01:46:22,530 So someone alluded to this earlier. 2401 01:46:22,530 --> 01:46:26,780 What happens when you, for instance, forget the NUL terminator 2402 01:46:26,780 --> 01:46:30,740 or you generally start poking around memory that you yourself didn't ask for 2403 01:46:30,740 --> 01:46:33,440 or looking at values you didn't put there? 2404 01:46:33,440 --> 01:46:34,940 Well, let me go ahead and open this. 2405 01:46:34,940 --> 01:46:39,140 Code of garbage.c, in honor of Oscar the Grouch here of sorts.
2406 01:46:39,140 --> 01:46:42,740 And here is a simple program if I hide my terminal window that 2407 01:46:42,740 --> 01:46:44,390 just does something kind of arbitrary. 2408 01:46:44,390 --> 01:46:47,870 I first declare an array called scores. 2409 01:46:47,870 --> 01:46:50,630 But I made it crazy big, like 1024. 2410 01:46:50,630 --> 01:46:52,520 That's a lot of integers. 2411 01:46:52,520 --> 01:46:53,430 But so be it. 2412 01:46:53,430 --> 01:46:55,520 And then I iterate over those integers. 2413 01:46:55,520 --> 01:46:57,350 And I print each of those scores out. 2414 01:46:57,350 --> 01:46:59,570 So I'm using week two syntax here. 2415 01:46:59,570 --> 01:47:02,630 But based on this program, what have I clearly not done that I did 2416 01:47:02,630 --> 01:47:04,400 do back in week two? 2417 01:47:04,400 --> 01:47:07,355 I've allocated the array, I'm printing the array, but, but, but-- 2418 01:47:07,355 --> 01:47:09,600 AUDIENCE: [INAUDIBLE] 2419 01:47:09,600 --> 01:47:12,600 DAVID J. MALAN: Yeah, I didn't initialize any values for that array. 2420 01:47:12,600 --> 01:47:14,160 Back in week two, we didn't do 1024. 2421 01:47:14,160 --> 01:47:14,910 We did like three. 2422 01:47:14,910 --> 01:47:17,310 And I typed in three test scores or something like that. 2423 01:47:17,310 --> 01:47:20,310 Here, I'm allocating memory even more than that just because I really 2424 01:47:20,310 --> 01:47:22,310 want to be dramatic with what I'm demonstrating. 2425 01:47:22,310 --> 01:47:24,760 But I'm not initializing those values to anything.
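For contrast, here is a sketch of the same kind of loop with the missing step added, so every element has a known value before it is read. The function name `print_scores`, the COUNT macro, and the returned sum are assumptions made for illustration; they are not part of garbage.c.

```c
#include <stdio.h>

#define COUNT 1024 // same size as the array in the demo

// Hypothetical variant of the demo: every element gets a value before being read.
// Returns the sum of the elements so the result can be checked.
int print_scores(void)
{
    // "= {0}" sets the first element to 0 and zero-initializes all the rest
    int scores[COUNT] = {0};

    int sum = 0;
    for (int i = 0; i < COUNT; i++)
    {
        printf("%i\n", scores[i]);
        sum += scores[i];
    }
    return sum;
}
```

Without the initializer, reading scores[i] would print whatever bytes happened to be in that memory already.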
2426 01:47:24,760 --> 01:47:27,660 And so here, it turns out in C, generally, 2427 01:47:27,660 --> 01:47:30,300 if you do not initialize a variable, or you do not 2428 01:47:30,300 --> 01:47:32,490 initialize an array with explicit values, 2429 01:47:32,490 --> 01:47:35,220 there are going to be garbage values there, so to speak, 2430 01:47:35,220 --> 01:47:39,210 remnants of that memory having been used before probably 2431 01:47:39,210 --> 01:47:42,540 by some other function of yours, some library function, or something else 2432 01:47:42,540 --> 01:47:43,800 while your program is running. 2433 01:47:43,800 --> 01:47:46,480 Not a huge deal with a super small program like this. 2434 01:47:46,480 --> 01:47:49,140 But for anything sizable, memory is going to be used, 2435 01:47:49,140 --> 01:47:52,560 and unused, and used, and unused that is malloced and freed again and again. 2436 01:47:52,560 --> 01:47:55,650 There's going to be lots of garbage values in the computer's memory 2437 01:47:55,650 --> 01:47:59,460 by default. So if I open my terminal window here, let 2438 01:47:59,460 --> 01:48:04,230 me do make garbage, let me zoom in on my terminal so we can see the output. 2439 01:48:04,230 --> 01:48:06,960 When I run dot slash garbage, theoretically, I 2440 01:48:06,960 --> 01:48:11,445 should see 1,024 integers, but none of which have been initialized. 2441 01:48:11,445 --> 01:48:13,320 Now I'm going to get lucky with some of them. 2442 01:48:13,320 --> 01:48:16,470 And it looks like, wow, OK, a lot of them are initialized to zero. 2443 01:48:16,470 --> 01:48:19,260 And C does in some contexts initialize memory for you 2444 01:48:19,260 --> 01:48:22,770 to zero, at least at the beginning, but not again and again typically. 2445 01:48:22,770 --> 01:48:27,870 But if I start scrolling backwards in time at this array of size 1024, 2446 01:48:27,870 --> 01:48:30,780 where did these values come from? 
2447 01:48:30,780 --> 01:48:34,920 So just random positive and negative numbers interspersed among the zeros? 2448 01:48:34,920 --> 01:48:38,848 Well, that's because I'm literally poking around on the random 1,024 2449 01:48:38,848 --> 01:48:40,140 bytes of the computer's memory. 2450 01:48:40,140 --> 01:48:41,440 Who knows what's there? 2451 01:48:41,440 --> 01:48:43,960 So the lesson here is that garbage values are indeed 2452 01:48:43,960 --> 01:48:47,910 this term of art. It means that for a variable that you might have 2453 01:48:47,910 --> 01:48:49,980 defined, that you might have declared, 2454 01:48:49,980 --> 01:48:53,690 if you don't give it an explicit value, who knows what's going to be there? 2455 01:48:53,690 --> 01:48:55,440 And the lesson here is just don't do that. 2456 01:48:55,440 --> 01:48:58,200 Always initialize variables to something, 2457 01:48:58,200 --> 01:49:01,950 either yourself, or by prompting the human for it. 2458 01:49:01,950 --> 01:49:05,650 Questions about garbage values? 2459 01:49:05,650 --> 01:49:09,540 You'll see them sometimes if you print things you shouldn't or touch arrays 2460 01:49:09,540 --> 01:49:11,590 beyond their boundaries. 2461 01:49:11,590 --> 01:49:12,090 All right. 2462 01:49:12,090 --> 01:49:15,330 So maybe to make this a little visual too, it turns out that a lot of things 2463 01:49:15,330 --> 01:49:16,860 can go wrong unfortunately with pointers. 2464 01:49:16,860 --> 01:49:17,910 And we've seen some of them. 2465 01:49:17,910 --> 01:49:20,118 And here's another program that's a little contrived. 2466 01:49:20,118 --> 01:49:20,860 It's very simple. 2467 01:49:20,860 --> 01:49:23,618 And it just is about manipulating values. 2468 01:49:23,618 --> 01:49:25,410 It doesn't do anything useful per se except 2469 01:49:25,410 --> 01:49:26,980 demonstrate some of today's concepts.
2470 01:49:26,980 --> 01:49:29,580 So in main here, let me propose that we declare 2471 01:49:29,580 --> 01:49:32,970 a pointer called x that's going to store eventually the address of an integer 2472 01:49:32,970 --> 01:49:33,630 apparently. 2473 01:49:33,630 --> 01:49:36,420 Here's another one called y that's going to store the address of an integer 2474 01:49:36,420 --> 01:49:36,990 as well. 2475 01:49:36,990 --> 01:49:40,995 Here now, I'm allocating enough memory to fit one integer. 2476 01:49:40,995 --> 01:49:42,120 Now technically, it's four. 2477 01:49:42,120 --> 01:49:42,662 We know that. 2478 01:49:42,662 --> 01:49:45,720 But size of int just gives me that answer dynamically. 2479 01:49:45,720 --> 01:49:47,760 So it will work on all systems. 2480 01:49:47,760 --> 01:49:52,230 And I'm going to store the address that malloc finds for me in x. 2481 01:49:52,230 --> 01:49:56,610 Then I go to x and put the number 42 there. 2482 01:49:56,610 --> 01:49:57,690 All right, why? 2483 01:49:57,690 --> 01:50:00,630 The sort of meaning of life, the universe, and everything here, 2484 01:50:00,630 --> 01:50:05,020 but star x, again, just means go to that address and put a value there. 2485 01:50:05,020 --> 01:50:05,670 So why? 2486 01:50:05,670 --> 01:50:06,240 I don't know. 2487 01:50:06,240 --> 01:50:08,610 But it's just correct at this point. 2488 01:50:08,610 --> 01:50:10,170 But what about this line here? 2489 01:50:10,170 --> 01:50:12,900 Star y equals 13. 2490 01:50:12,900 --> 01:50:14,280 Unlucky in this case. 2491 01:50:14,280 --> 01:50:17,362 What's bad about this line here, star y? 2492 01:50:17,362 --> 01:50:20,070 It's a combination now of today's primitives and that point here. 2493 01:50:20,070 --> 01:50:20,922 Yeah. 2494 01:50:20,922 --> 01:50:21,822 AUDIENCE: [INAUDIBLE] 2495 01:50:21,822 --> 01:50:24,780 DAVID J. MALAN: Yeah, we didn't ask the computer to allocate any space. 
2496 01:50:24,780 --> 01:50:28,750 So y was not initialized with an equal sign at any point to anything. 2497 01:50:28,750 --> 01:50:31,410 And so what is inside y so to speak? 2498 01:50:31,410 --> 01:50:32,077 A garbage value. 2499 01:50:32,077 --> 01:50:35,077 Maybe it's zero, which isn't bad, because at least it's nice and simple. 2500 01:50:35,077 --> 01:50:37,090 But maybe it's some crazy large positive number, 2501 01:50:37,090 --> 01:50:38,650 or some crazy large negative number. 2502 01:50:38,650 --> 01:50:40,710 Either way, odds are if I go to this address 2503 01:50:40,710 --> 01:50:44,562 or that address randomly with star y, bad things are going to happen. 2504 01:50:44,562 --> 01:50:46,020 And so let me go ahead and propose. 2505 01:50:46,020 --> 01:50:47,200 Well, let's not do that. 2506 01:50:47,200 --> 01:50:50,655 Let's actually do this instead, assign y equal to x. 2507 01:50:50,655 --> 01:50:51,780 And we've done that before. 2508 01:50:51,780 --> 01:50:56,580 And then I can go to y now and change what was a 42 to a 13. 2509 01:50:56,580 --> 01:50:57,120 Again, why? 2510 01:50:57,120 --> 01:50:58,840 This is just for educational sake. 2511 01:50:58,840 --> 01:51:04,020 But for now, this does not crash because I only de-reference y with star y 2512 01:51:04,020 --> 01:51:05,550 after actually giving it a value. 2513 01:51:05,550 --> 01:51:09,570 Albeit, a duplicate value similar to our copy example earlier. 2514 01:51:09,570 --> 01:51:12,333 So our friends at Stanford have put together a wonderful visual. 2515 01:51:12,333 --> 01:51:13,500 It's about two minutes long. 2516 01:51:13,500 --> 01:51:15,960 Allow me to dramatically dim the lights, if we could, 2517 01:51:15,960 --> 01:51:20,550 and play with what happens with memory when you do bad things like this. 2518 01:51:20,550 --> 01:51:21,334 [VIDEO PLAYBACK] 2519 01:51:21,334 --> 01:51:23,923 [MUSIC PLAYING] 2520 01:51:23,923 --> 01:51:24,840 SPEAKER 1: Hey, Binky. 
2521 01:51:24,840 --> 01:51:25,620 Wake up. 2522 01:51:25,620 --> 01:51:28,230 It's time for pointer fun. 2523 01:51:28,230 --> 01:51:29,310 BINKY: What's that? 2524 01:51:29,310 --> 01:51:30,960 Learn about pointers? 2525 01:51:30,960 --> 01:51:32,253 Oh, goody. 2526 01:51:32,253 --> 01:51:34,170 SPEAKER 1: Well, to get started, I guess we're 2527 01:51:34,170 --> 01:51:35,760 going to need a couple of pointers. 2528 01:51:35,760 --> 01:51:40,370 BINKY: OK, this code allocates two pointers which can point to integers. 2529 01:51:40,370 --> 01:51:42,352 SPEAKER 1: OK, well I see the two pointers. 2530 01:51:42,352 --> 01:51:44,310 But they don't seem to be pointing to anything. 2531 01:51:44,310 --> 01:51:45,143 BINKY: That's right. 2532 01:51:45,143 --> 01:51:47,220 Initially, pointers don't point to anything. 2533 01:51:47,220 --> 01:51:49,500 The things they point to are called pointees. 2534 01:51:49,500 --> 01:51:51,243 And setting them up is a separate step. 2535 01:51:51,243 --> 01:51:52,410 SPEAKER 1: Oh, right, right. 2536 01:51:52,410 --> 01:51:53,130 I knew that. 2537 01:51:53,130 --> 01:51:54,990 The pointees are separate. 2538 01:51:54,990 --> 01:51:57,390 So how do you allocate a pointee? 2539 01:51:57,390 --> 01:52:01,020 BINKY: OK, well, this code allocates a new integer pointee 2540 01:52:01,020 --> 01:52:04,043 and this part sets x to point to it. 2541 01:52:04,043 --> 01:52:05,460 SPEAKER 1: Hey, that looks better. 2542 01:52:05,460 --> 01:52:07,040 So make it do something. 2543 01:52:07,040 --> 01:52:10,520 BINKY: OK, I'll de-reference the pointer x to store the number 2544 01:52:10,520 --> 01:52:12,620 42 into its pointee. 2545 01:52:12,620 --> 01:52:16,250 For this trick, I'll need my magic wand of de-referencing. 2546 01:52:16,250 --> 01:52:19,940 SPEAKER 1: Your magic wand of de-referencing? 2547 01:52:19,940 --> 01:52:21,520 That's great. 2548 01:52:21,520 --> 01:52:23,200 BINKY: This is what the code looks like. 
2549 01:52:23,200 --> 01:52:26,025 I'll just set up the number and-- 2550 01:52:26,025 --> 01:52:26,900 SPEAKER 1: Hey, look. 2551 01:52:26,900 --> 01:52:28,190 There it goes. 2552 01:52:28,190 --> 01:52:31,790 So doing a de-reference on x follows the arrow 2553 01:52:31,790 --> 01:52:35,240 to access its pointee, in this case, to store 42 in there. 2554 01:52:35,240 --> 01:52:39,770 Hey, try using it to store the number 13 through the other pointer, y. 2555 01:52:39,770 --> 01:52:41,180 BINKY: OK. 2556 01:52:41,180 --> 01:52:45,290 Just go over here to y and get the number 13 set up, 2557 01:52:45,290 --> 01:52:49,550 and then take the wand of de-referencing and just-- 2558 01:52:49,550 --> 01:52:50,948 [BUZZER SOUND] whoa! 2559 01:52:50,948 --> 01:52:51,740 SPEAKER 1: Oh, hey. 2560 01:52:51,740 --> 01:52:53,120 That didn't work. 2561 01:52:53,120 --> 01:52:58,580 Say, Binky, I don't think de-referencing y is a good idea because setting up 2562 01:52:58,580 --> 01:53:00,210 the pointee is a separate step. 2563 01:53:00,210 --> 01:53:02,600 And I don't think we ever did it. 2564 01:53:02,600 --> 01:53:03,650 BINKY: Good point. 2565 01:53:03,650 --> 01:53:06,110 SPEAKER 1: Yeah, we allocated the pointer y, 2566 01:53:06,110 --> 01:53:09,320 but we never set it to point to a pointee. 2567 01:53:09,320 --> 01:53:10,458 BINKY: Very observant. 2568 01:53:10,458 --> 01:53:12,500 SPEAKER 1: Hey, you're looking good there, Binky. 2569 01:53:12,500 --> 01:53:15,440 Can you fix it so that y points to the same pointee as x? 2570 01:53:15,440 --> 01:53:16,160 BINKY: Sure. 2571 01:53:16,160 --> 01:53:18,782 I'll use my magic wand of pointer assignment. 2572 01:53:18,782 --> 01:53:20,990 SPEAKER 1: Is that going to be a problem like before? 2573 01:53:20,990 --> 01:53:22,910 BINKY: No, this doesn't touch the pointees. 2574 01:53:22,910 --> 01:53:26,540 It just changes one pointer to point to the same thing as another. 2575 01:53:26,540 --> 01:53:27,590 SPEAKER 1: Oh, I see.
2576 01:53:27,590 --> 01:53:30,575 Now y points to the same place as x. 2577 01:53:30,575 --> 01:53:32,150 So wait, now y is fixed. 2578 01:53:32,150 --> 01:53:33,230 It has a pointee. 2579 01:53:33,230 --> 01:53:35,210 So you can try the wand of de-referencing again 2580 01:53:35,210 --> 01:53:37,790 to send the 13 over. 2581 01:53:37,790 --> 01:53:40,182 BINKY: OK, here it goes. 2582 01:53:40,182 --> 01:53:41,390 SPEAKER 1: Hey, look at that. 2583 01:53:41,390 --> 01:53:43,190 Now de-referencing works on y. 2584 01:53:43,190 --> 01:53:47,210 And because the pointers are sharing that one pointee, they both see the 13. 2585 01:53:47,210 --> 01:53:48,380 BINKY: Yeah, sharing. 2586 01:53:48,380 --> 01:53:48,920 Whatever. 2587 01:53:48,920 --> 01:53:50,917 So are we going to switch places now? 2588 01:53:50,917 --> 01:53:51,750 SPEAKER 1: Oh, look. 2589 01:53:51,750 --> 01:53:52,575 We're out of time. 2590 01:53:52,575 --> 01:53:53,075 BINKY: But-- 2591 01:53:53,075 --> 01:53:53,270 [END PLAYBACK] 2592 01:53:53,270 --> 01:53:54,980 DAVID J. MALAN: Our thanks to Professor Nick Parlante 2593 01:53:54,980 --> 01:53:57,290 of Stanford for spending a huge amount of time 2594 01:53:57,290 --> 01:53:59,180 doing stop motion animation for that. 2595 01:53:59,180 --> 01:54:02,120 But hopefully now, you have a sense, too, of what can go wrong 2596 01:54:02,120 --> 01:54:04,612 when you misuse memory in this way. 2597 01:54:04,612 --> 01:54:06,320 But at the end of the day, we really only 2598 01:54:06,320 --> 01:54:08,070 have these four new building blocks today, 2599 01:54:08,070 --> 01:54:11,090 like the star operator, the ampersand operator, malloc, and free.
2600 01:54:11,090 --> 01:54:13,340 And really with that, and the underlying understanding 2601 01:54:13,340 --> 01:54:15,810 of what your computer is doing underneath the hood, 2602 01:54:15,810 --> 01:54:18,242 we have this way now to really manipulate things 2603 01:54:18,242 --> 01:54:19,700 in memory, for better or for worse. 2604 01:54:19,700 --> 01:54:21,960 And eventually, we'll see how we can build things. 2605 01:54:21,960 --> 01:54:23,930 But we can also now use today's primitives 2606 01:54:23,930 --> 01:54:26,390 to better explain some things that we've been 2607 01:54:26,390 --> 01:54:29,130 asking you to take for granted over the past several weeks. 2608 01:54:29,130 --> 01:54:33,200 So for instance, let me propose that we do-- 2609 01:54:33,200 --> 01:54:35,180 one volunteer up here if we could. 2610 01:54:35,180 --> 01:54:37,392 Could we get one volunteer who-- 2611 01:54:37,392 --> 01:54:38,600 you want to come straight up? 2612 01:54:38,600 --> 01:54:39,810 Yep, right in the middle. 2613 01:54:39,810 --> 01:54:40,070 Come on. 2614 01:54:40,070 --> 01:54:41,903 You'll have to take a left or a right there. 2615 01:54:41,903 --> 01:54:47,760 2616 01:54:47,760 --> 01:54:48,360 All right. 2617 01:54:48,360 --> 01:54:52,380 So we have two empty glasses here and two colors of liquid. 2618 01:54:52,380 --> 01:54:57,150 And we have, let me give you the mic, if you'd like to say hello to the group. 2619 01:54:57,150 --> 01:54:57,840 MOINE: Hello. 2620 01:54:57,840 --> 01:54:58,950 I'm Moine. 2621 01:54:58,950 --> 01:55:00,780 I'm in [INAUDIBLE] and first year. 2622 01:55:00,780 --> 01:55:01,170 DAVID J. MALAN: All right. 2623 01:55:01,170 --> 01:55:01,670 Welcome. 2624 01:55:01,670 --> 01:55:02,520 Well, welcome here. 2625 01:55:02,520 --> 01:55:06,390 I'm going to go ahead and fill these two glasses with this colored liquid, 2626 01:55:06,390 --> 01:55:08,280 purple here on my right. 2627 01:55:08,280 --> 01:55:11,430 Let's fill up a glass here. 
2628 01:55:11,430 --> 01:55:12,690 MOINE: It's ominous. 2629 01:55:12,690 --> 01:55:14,460 DAVID J. MALAN: Yes, don't drink. 2630 01:55:14,460 --> 01:55:18,250 And now we'll put some orange in here. 2631 01:55:18,250 --> 01:55:21,660 And what we'd like you to do for the audience, if you don't mind, 2632 01:55:21,660 --> 01:55:23,583 is swap the two values. 2633 01:55:23,583 --> 01:55:25,500 You've got a purple value and an orange value. 2634 01:55:25,500 --> 01:55:28,770 And I'd like the purple liquid in this glass and the orange liquid 2635 01:55:28,770 --> 01:55:29,655 in that glass please. 2636 01:55:29,655 --> 01:55:32,652 2637 01:55:32,652 --> 01:55:34,010 MOINE: Can I have another glass? 2638 01:55:34,010 --> 01:55:34,550 DAVID J. MALAN: Oh, OK. 2639 01:55:34,550 --> 01:55:35,390 Good intuition. 2640 01:55:35,390 --> 01:55:37,267 But for the microphone-- 2641 01:55:37,267 --> 01:55:38,600 MOINE: Can I have another glass? 2642 01:55:38,600 --> 01:55:39,440 DAVID J. MALAN: So you can. 2643 01:55:39,440 --> 01:55:41,552 And just in fact, I brought one here for you. 2644 01:55:41,552 --> 01:55:43,010 Why are you asking for this though? 2645 01:55:43,010 --> 01:55:45,620 MOINE: Because if I just pour this into this, then it'll get mixed up. 2646 01:55:45,620 --> 01:55:46,537 DAVID J. MALAN: Right. 2647 01:55:46,537 --> 01:55:49,410 So obviously we need like a temporary variable, if you will. 2648 01:55:49,410 --> 01:55:52,765 So here is your temporary variable. 2649 01:55:52,765 --> 01:55:53,640 MOINE: And you want-- 2650 01:55:53,640 --> 01:55:54,330 OK. 2651 01:55:54,330 --> 01:55:55,205 DAVID J. MALAN: Yeah. 2652 01:55:55,205 --> 01:55:56,640 There's-- yeah. 2653 01:55:56,640 --> 01:55:59,730 All right so pouring the value of the orange glass 2654 01:55:59,730 --> 01:56:03,390 into this temporary variable, if you will. 2655 01:56:03,390 --> 01:56:04,080 All right. 
2656 01:56:04,080 --> 01:56:09,240 And now pouring the value of the purple glass into the former orange glass. 2657 01:56:09,240 --> 01:56:12,740 2658 01:56:12,740 --> 01:56:14,270 And now-- 2659 01:56:14,270 --> 01:56:15,570 MOINE: And now this goes back. 2660 01:56:15,570 --> 01:56:19,965 DAVID J. MALAN: The temporary value goes back into the original purple glass. 2661 01:56:19,965 --> 01:56:21,840 And now I think we give you a round of applause 2662 01:56:21,840 --> 01:56:23,132 for having done that very well. 2663 01:56:23,132 --> 01:56:25,970 [INAUDIBLE] 2664 01:56:25,970 --> 01:56:27,120 MOINE: Thank you. 2665 01:56:27,120 --> 01:56:28,280 DAVID J. MALAN: All right. 2666 01:56:28,280 --> 01:56:31,680 So it should go without saying that in the real world, that's how you do this. 2667 01:56:31,680 --> 01:56:34,430 And in fact, in code, that's pretty much how you have to do this, 2668 01:56:34,430 --> 01:56:37,632 although ask us sometime for a super fancy way of doing it 2669 01:56:37,632 --> 01:56:38,840 without a temporary variable. 2670 01:56:38,840 --> 01:56:41,090 It turns out that is possible using bits. 2671 01:56:41,090 --> 01:56:43,703 But for now, let's suppose that, indeed, this demonstrates 2672 01:56:43,703 --> 01:56:44,870 what is the reality in code. 2673 01:56:44,870 --> 01:56:46,910 If you want to swap two values, you need to have 2674 01:56:46,910 --> 01:56:48,990 something like a temporary variable. 2675 01:56:48,990 --> 01:56:52,820 So for instance, on the screen here is a-- the beginning of a function 2676 01:56:52,820 --> 01:56:56,420 called swap, whose purpose in life is to, as you just did, swap two values, 2677 01:56:56,420 --> 01:56:59,270 call it A and B. So orange and purple respectively 2678 01:56:59,270 --> 01:57:01,730 are now just A and B and integers to keep things simple. 2679 01:57:01,730 --> 01:57:03,688 Well, here is the corresponding code, if I may, 2680 01:57:03,688 --> 01:57:05,300 to what you just enacted as a human.
2681 01:57:05,300 --> 01:57:08,700 You declared a temporary variable, call it temp in this case, 2682 01:57:08,700 --> 01:57:10,640 which was like me handing you the empty glass. 2683 01:57:10,640 --> 01:57:14,000 And you stored the orange liquid in it, AKA A. You then 2684 01:57:14,000 --> 01:57:19,190 changed the value of the formerly orange glass to be equal to the purple 2685 01:57:19,190 --> 01:57:20,480 by pouring one into the other. 2686 01:57:20,480 --> 01:57:22,040 And then you did the opposite there. 2687 01:57:22,040 --> 01:57:25,500 Now at the end of this, you still have a temporary variable that's now empty. 2688 01:57:25,500 --> 01:57:27,260 So it's temporary in literally that sense. 2689 01:57:27,260 --> 01:57:28,552 You just don't need it anymore. 2690 01:57:28,552 --> 01:57:30,570 But it was necessary along the way. 2691 01:57:30,570 --> 01:57:33,770 So I dare say this code is correct logically. 2692 01:57:33,770 --> 01:57:39,440 This will swap two values A and B thanks to the use of that temporary variable. 2693 01:57:39,440 --> 01:57:42,348 Unfortunately though, if I actually do this in practice, 2694 01:57:42,348 --> 01:57:44,390 let me go over to VS Code here and open a program 2695 01:57:44,390 --> 01:57:48,920 I wrote in advance called swap.c, which does this as follows. 2696 01:57:48,920 --> 01:57:52,790 In here, notice I have my prototype for a swap function at the very top. 2697 01:57:52,790 --> 01:57:54,590 And let me scroll down to the very bottom. 2698 01:57:54,590 --> 01:57:56,210 There is that exact same code. 2699 01:57:56,210 --> 01:58:00,050 So I'm-- the same code for swapping two values A and B, 2700 01:58:00,050 --> 01:58:02,253 which I'm claiming for now is correct. 2701 01:58:02,253 --> 01:58:04,670 Now if I go back up here, what is main going to do for us? 2702 01:58:04,670 --> 01:58:06,628 Main is really just meant to be a demonstration 2703 01:58:06,628 --> 01:58:08,220 of the correctness of your algorithm.
2704 01:58:08,220 --> 01:58:11,990 So here I declare on line seven and eight, two variables, x and y, 2705 01:58:11,990 --> 01:58:14,660 being one and two arbitrarily respectively. 2706 01:58:14,660 --> 01:58:18,230 I then on line 10 just print out what the value of x is and y 2707 01:58:18,230 --> 01:58:20,210 is just so I can see it on the screen. 2708 01:58:20,210 --> 01:58:22,940 I then call the swap function on line 11, 2709 01:58:22,940 --> 01:58:26,780 and then I literally print the exact same thing again, I print x and y. 2710 01:58:26,780 --> 01:58:29,190 Hopefully, it'll obviously be the opposite. 2711 01:58:29,190 --> 01:58:31,640 So I think logically, swap is indeed correct. 2712 01:58:31,640 --> 01:58:34,520 Let me do make swap and then dot slash swap. 2713 01:58:34,520 --> 01:58:40,100 And I should see x is 1, y is 2, and then hopefully x is 2, y is 1. 2714 01:58:40,100 --> 01:58:41,630 Enter. 2715 01:58:41,630 --> 01:58:42,980 But I don't. 2716 01:58:42,980 --> 01:58:46,800 And it did work in the sense that the code compiled, the code ran. 2717 01:58:46,800 --> 01:58:49,010 So it's not like some bug in that sense. 2718 01:58:49,010 --> 01:58:52,850 But because I don't quite understand what's going on underneath the hood, 2719 01:58:52,850 --> 01:58:55,370 at least as of right now, or prior weeks, 2720 01:58:55,370 --> 01:58:59,330 this code here is indeed buggy in some way. 2721 01:58:59,330 --> 01:59:02,660 But does anyone have an intuition, perhaps based on today's discussion, 2722 01:59:02,660 --> 01:59:06,500 as to why this code, while logically correct, and clearly working in reality, 2723 01:59:06,500 --> 01:59:09,670 apparently does not work in C? 2724 01:59:09,670 --> 01:59:10,780 Any intuition? 2725 01:59:10,780 --> 01:59:11,320 Yeah. 2726 01:59:11,320 --> 01:59:13,980 AUDIENCE: [INAUDIBLE] 2727 01:59:13,980 --> 01:59:14,980 DAVID J. MALAN: Perfect. 2728 01:59:14,980 --> 01:59:17,147 And to summarize, here's that term of art I promised.
2729 01:59:17,147 --> 01:59:20,680 When you call a function and pass in two arguments, like a and b, 2730 01:59:20,680 --> 01:59:23,120 you're passing those arguments by value. 2731 01:59:23,120 --> 01:59:25,850 So copies of those values effectively. 2732 01:59:25,850 --> 01:59:28,610 And so when swap is actually called here-- 2733 01:59:28,610 --> 01:59:29,110 sorry. 2734 01:59:29,110 --> 01:59:31,682 When you pass in x and y, we call them a and b. 2735 01:59:31,682 --> 01:59:32,890 But that's just a convention. 2736 01:59:32,890 --> 01:59:35,410 We could call the parameters anything we want. 2737 01:59:35,410 --> 01:59:39,950 What a and b are are indeed the values of x and y respectively, 2738 01:59:39,950 --> 01:59:41,810 but copies of the values. 2739 01:59:41,810 --> 01:59:45,220 So this code here is very successfully, in VS Code too, 2740 01:59:45,220 --> 01:59:47,110 swapping the values of a and b. 2741 01:59:47,110 --> 01:59:51,610 But as you note, because I'm passing them in by value, literally one, 2742 01:59:51,610 --> 01:59:55,450 literally two, and not by another term of art, by reference, AKA 2743 01:59:55,450 --> 01:59:59,650 by their addresses, swap has no capability in C 2744 01:59:59,650 --> 02:00:02,740 to go to those locations, swap the actual values, 2745 02:00:02,740 --> 02:00:04,990 just like we did successfully in reality. 2746 02:00:04,990 --> 02:00:07,300 But I think we really have the syntax already 2747 02:00:07,300 --> 02:00:10,690 for solving this if we consider that really, this is just an issue of scope. 2748 02:00:10,690 --> 02:00:12,790 And we've talked a bit about scope in the past, 2749 02:00:12,790 --> 02:00:16,150 whereby scope refers to the context in which a variable lives. 2750 02:00:16,150 --> 02:00:18,310 And generally, I've claimed that a variable exists 2751 02:00:18,310 --> 02:00:20,290 between the most recent curly braces.
2752 02:00:20,290 --> 02:00:24,010 And that's pretty much true for the swap function because a and b, 2753 02:00:24,010 --> 02:00:27,670 I now claim again, exist only in the context of these curly braces. 2754 02:00:27,670 --> 02:00:32,048 They have no effect on main up top, which has different variables x and y. 2755 02:00:32,048 --> 02:00:34,840 But we can consider now what's really going on underneath the hood. 2756 02:00:34,840 --> 02:00:37,360 And here's that same picture of memory, as we've seen in the past. 2757 02:00:37,360 --> 02:00:39,550 If we zoom in and see on these little black chips, 2758 02:00:39,550 --> 02:00:41,200 this is a bunch of bytes of memory. 2759 02:00:41,200 --> 02:00:43,780 If I create a grid out of it just to kind of highlight 2760 02:00:43,780 --> 02:00:47,500 that we can address each of these bytes, throw away the plastic circuit board, 2761 02:00:47,500 --> 02:00:51,430 and focus only on those bytes, what's going on underneath the hood 2762 02:00:51,430 --> 02:00:55,600 when functions are called in C, which you've been doing for weeks now? 2763 02:00:55,600 --> 02:00:59,350 Well, this rectangle of memory, if we kind of abstracted away further, 2764 02:00:59,350 --> 02:01:02,870 is generally broken up into different regions or segments, 2765 02:01:02,870 --> 02:01:04,000 like I called them earlier. 2766 02:01:04,000 --> 02:01:06,490 And different things get put in different parts 2767 02:01:06,490 --> 02:01:07,760 of the computer's memory. 2768 02:01:07,760 --> 02:01:10,330 And without getting too into the weeds, when 2769 02:01:10,330 --> 02:01:12,490 you double click a program on your Mac or PC, 2770 02:01:12,490 --> 02:01:15,940 or when you do dot slash something on a Linux, 2771 02:01:15,940 --> 02:01:19,630 you are loading your machine code into the computer's memory 2772 02:01:19,630 --> 02:01:21,410 from the computer's hard drive. 
2773 02:01:21,410 --> 02:01:24,550 So all the zeros and ones that compose Microsoft Word, or Chrome, 2774 02:01:24,550 --> 02:01:27,970 or whatever are loaded into the computer's memory or RAM. 2775 02:01:27,970 --> 02:01:31,360 And by convention, it's put up top in the so-called machine code area. 2776 02:01:31,360 --> 02:01:34,660 And that's how the CPU has access to them quickly at that. 2777 02:01:34,660 --> 02:01:37,450 Below that are what are going to be our globals. 2778 02:01:37,450 --> 02:01:40,360 So global variables, which we haven't used very much in C. 2779 02:01:40,360 --> 02:01:44,080 But you can declare them outside of main at the very top of your files. 2780 02:01:44,080 --> 02:01:47,320 If you have globals, they end up up there as well, just FYI. 2781 02:01:47,320 --> 02:01:49,180 And then there's this big chunk of memory 2782 02:01:49,180 --> 02:01:52,480 that we saw valgrind mention indirectly earlier called the heap. 2783 02:01:52,480 --> 02:01:54,340 And it's kind of like heap, literally. 2784 02:01:54,340 --> 02:01:57,550 It's a heap of memory that you can use as you see fit. 2785 02:01:57,550 --> 02:02:01,030 And the heap is where malloc grabs memory from. 2786 02:02:01,030 --> 02:02:02,920 So initially, there's nothing in the heap. 2787 02:02:02,920 --> 02:02:04,480 It's just a big chunk of free space. 2788 02:02:04,480 --> 02:02:08,680 Any time you call malloc, malloc kind of carves out from the heap area 2789 02:02:08,680 --> 02:02:09,790 more and more bytes. 2790 02:02:09,790 --> 02:02:11,920 And malloc keeps track of, essentially, which 2791 02:02:11,920 --> 02:02:13,480 bytes have already been allocated. 2792 02:02:13,480 --> 02:02:14,890 So initially, it looks empty. 2793 02:02:14,890 --> 02:02:17,290 But different bytes, squares if you will, 2794 02:02:17,290 --> 02:02:20,560 keep getting requested again and again as a program runs thanks to functions 2795 02:02:20,560 --> 02:02:21,340 like malloc. 
2796 02:02:21,340 --> 02:02:23,900 And it grows, if you will, conceptually down. 2797 02:02:23,900 --> 02:02:27,173 So the more and more memory you request from malloc, it starts up here. 2798 02:02:27,173 --> 02:02:29,590 But then the next chunk you get is down here conceptually. 2799 02:02:29,590 --> 02:02:31,250 The next chunk is down here, down here. 2800 02:02:31,250 --> 02:02:35,170 So it kind of fills the available space in the computer's overall memory. 2801 02:02:35,170 --> 02:02:38,740 But there's this other chunk of memory called the stack. 2802 02:02:38,740 --> 02:02:42,250 And just like a stack of trays in Annenberg or a cafeteria, 2803 02:02:42,250 --> 02:02:45,340 kind of grow upward, so does a stack of memory. 2804 02:02:45,340 --> 02:02:50,620 And it turns out the stack is where functions have variables, 2805 02:02:50,620 --> 02:02:53,530 and have arguments stored temporarily. 2806 02:02:53,530 --> 02:02:57,100 So whenever you call a function and it has variables inside of it, 2807 02:02:57,100 --> 02:03:00,010 or has arguments there too, this is the chunk of memory, 2808 02:03:00,010 --> 02:03:03,610 and the computer's overall block of memory, that are used for functions. 2809 02:03:03,610 --> 02:03:06,425 But any time you call malloc, it's memory up here. 2810 02:03:06,425 --> 02:03:08,800 At the end of the day, they just had to pick a direction. 2811 02:03:08,800 --> 02:03:10,660 Top, bottom, and technically it's an artist's rendition. 2812 02:03:10,660 --> 02:03:13,310 You could circle this thing around any orientation you want. 2813 02:03:13,310 --> 02:03:16,540 But you're just using a finite amount of memory in this conventional way. 2814 02:03:16,540 --> 02:03:19,360 Malloc starts here, functions start here. 2815 02:03:19,360 --> 02:03:22,415 Now you can kind of see where bad things can happen. 
2816 02:03:22,415 --> 02:03:24,790 And indeed, one of the other reasons programs, computers, 2817 02:03:24,790 --> 02:03:27,910 can crash is if you ask for way too much memory from the heap 2818 02:03:27,910 --> 02:03:30,040 by calling malloc many, many, many times, 2819 02:03:30,040 --> 02:03:33,730 or if you call way too many functions, or accidentally per last week, 2820 02:03:33,730 --> 02:03:37,570 you recurse infinitely many times, you might have a segmentation fault. 2821 02:03:37,570 --> 02:03:40,100 And that's because you're using too much stack memory. 2822 02:03:40,100 --> 02:03:42,460 So this is bound to be a problem eventually. 2823 02:03:42,460 --> 02:03:45,550 And the onus is on the programmer to just minimize 2824 02:03:45,550 --> 02:03:49,270 the probability of doing that and really avoid the possibility of doing that 2825 02:03:49,270 --> 02:03:53,770 by just checking return values, checking if malloc or get string return NULL. 2826 02:03:53,770 --> 02:03:56,260 Because you can proactively with conditionals 2827 02:03:56,260 --> 02:03:59,830 make sure that these two things do not collide by just making sure 2828 02:03:59,830 --> 02:04:02,060 that you get back non-NULL values. 2829 02:04:02,060 --> 02:04:04,840 So let's consider the stack in the context of swap 2830 02:04:04,840 --> 02:04:06,400 and what's really happening here. 2831 02:04:06,400 --> 02:04:09,233 And Carter, if you wouldn't mind helping me animate the screen here, 2832 02:04:09,233 --> 02:04:13,000 when I call the main function of any program, 2833 02:04:13,000 --> 02:04:17,483 it is allocated a slice of memory called a frame at the bottom of this stack. 2834 02:04:17,483 --> 02:04:19,650 So if Carter, you want to go ahead and advance here, 2835 02:04:19,650 --> 02:04:22,260 here's the first slice of memory that will always 2836 02:04:22,260 --> 02:04:26,520 be used by main whether it has command line arguments, or local variables. 
2837 02:04:26,520 --> 02:04:28,260 It just ends up here in memory. 2838 02:04:28,260 --> 02:04:32,980 Suppose now per our swap.c program that main calls swap. 2839 02:04:32,980 --> 02:04:35,190 Well, where does the memory for swap end up? 2840 02:04:35,190 --> 02:04:35,950 Right up here. 2841 02:04:35,950 --> 02:04:39,390 So swap had two variables-- two arguments a and b. 2842 02:04:39,390 --> 02:04:41,140 And it also had a temporary variable. 2843 02:04:41,140 --> 02:04:42,993 So all of those end up in here in memory. 2844 02:04:42,993 --> 02:04:44,910 And if you want to go ahead and advance again, 2845 02:04:44,910 --> 02:04:48,330 Carter, once swap is done executing, whether it just 2846 02:04:48,330 --> 02:04:51,780 returns because there's no more lines of code, or you explicitly return, 2847 02:04:51,780 --> 02:04:54,577 this memory is just freed up automatically. 2848 02:04:54,577 --> 02:04:55,410 You don't call free. 2849 02:04:55,410 --> 02:04:56,610 You don't undo malloc. 2850 02:04:56,610 --> 02:04:58,110 This just all happens automatically. 2851 02:04:58,110 --> 02:04:59,880 It has been since week one. 2852 02:04:59,880 --> 02:05:02,820 Now technically, it's still there even though we've 2853 02:05:02,820 --> 02:05:04,200 removed it from the picture. 2854 02:05:04,200 --> 02:05:06,870 And there's your first hint of garbage values. 2855 02:05:06,870 --> 02:05:08,340 There's still zeros and ones there. 2856 02:05:08,340 --> 02:05:11,340 And they're left in the original-- the previous configuration. 2857 02:05:11,340 --> 02:05:13,967 And so the reason you get random values in the memory 2858 02:05:13,967 --> 02:05:16,050 is because even though we haven't drawn swap here, 2859 02:05:16,050 --> 02:05:17,730 there was stuff there a moment ago. 2860 02:05:17,730 --> 02:05:20,400 It's going to be there the next time you use that same memory. 2861 02:05:20,400 --> 02:05:23,250 Now let's go ahead and step through this a little more methodically. 
2862 02:05:23,250 --> 02:05:27,210 Main has two variables called x and y, one and two. 2863 02:05:27,210 --> 02:05:30,330 So let's advance and represent x as one, y as two 2864 02:05:30,330 --> 02:05:31,950 taking up these two chunks of memory. 2865 02:05:31,950 --> 02:05:35,130 When we call swap now, swap gets a new slice of memory 2866 02:05:35,130 --> 02:05:40,050 that then gives us three variables, a and b, technically the arguments, 2867 02:05:40,050 --> 02:05:40,800 and temp. 2868 02:05:40,800 --> 02:05:41,820 So what happens? 2869 02:05:41,820 --> 02:05:45,900 Well, because functions automatically pass in values by value, 2870 02:05:45,900 --> 02:05:48,930 or rather pass in arguments by value, x gets 2871 02:05:48,930 --> 02:05:53,130 copied into a, y gets copied into b, and then once we 2872 02:05:53,130 --> 02:05:55,920 start executing the algorithm, a la the water glasses, well, 2873 02:05:55,920 --> 02:05:57,040 what happens here? 2874 02:05:57,040 --> 02:06:01,710 So if I execute the first line of code, temp equals a, temp gets a copy of a. 2875 02:06:01,710 --> 02:06:03,645 What happens next? a equals b. 2876 02:06:03,645 --> 02:06:05,880 So a takes on a copy of b. 2877 02:06:05,880 --> 02:06:09,030 And now we do the final swap in the glass, is b equals temp. 2878 02:06:09,030 --> 02:06:10,890 b gets a copy of temp. 2879 02:06:10,890 --> 02:06:13,872 Now we don't have to change temp because it's essentially empty, 2880 02:06:13,872 --> 02:06:15,330 although there's the garbage value. 2881 02:06:15,330 --> 02:06:18,390 One is always now going to be there until we reuse that memory. 2882 02:06:18,390 --> 02:06:21,240 The important thing, though, is that a and b have been swapped. 2883 02:06:21,240 --> 02:06:26,430 But what obviously has not been swapped, as is manifest as when swap returns, x 2884 02:06:26,430 --> 02:06:27,660 and y are untouched. 2885 02:06:27,660 --> 02:06:29,830 Because copies thereof were passed in.
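The walk-through above can be sketched as a by-value swap. The name swap_copies and the printf line are assumptions added for illustration; the three assignments are the ones stepped through in the lecture.

```c
#include <stdio.h>

// Receives copies of the caller's values, so only the copies get swapped.
void swap_copies(int a, int b)
{
    int temp = a; // temp gets a copy of a
    a = b;        // a takes on a copy of b
    b = temp;     // b gets a copy of temp
    // a and b are swapped here, but they live in this function's own frame
    printf("inside swap_copies: a is %i, b is %i\n", a, b);
}
```

If main holds x as 1 and y as 2 and calls swap_copies(x, y), then x and y are still 1 and 2 afterward, because the function only ever touched its own copies.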
2886 02:06:29,830 --> 02:06:31,690 So we need a solution to this problem. 2887 02:06:31,690 --> 02:06:34,290 And if we advance one more time, if you don't mind, let me step over here 2888 02:06:34,290 --> 02:06:35,748 but then call you back in a second. 2889 02:06:35,748 --> 02:06:38,310 This code here is logically correct. 2890 02:06:38,310 --> 02:06:39,540 This is what you did. 2891 02:06:39,540 --> 02:06:41,460 But this is now a detail of C. 2892 02:06:41,460 --> 02:06:44,670 You can't just swap the things by value, because you're only changing it 2893 02:06:44,670 --> 02:06:46,260 in the scope of the swap function. 2894 02:06:46,260 --> 02:06:50,610 But I think if we change it to this and add some annoying syntax, 2895 02:06:50,610 --> 02:06:52,500 we can solve the problem. 2896 02:06:52,500 --> 02:06:55,740 Just like you can declare variables as storing addresses, 2897 02:06:55,740 --> 02:07:00,360 you can declare arguments to functions, AKA parameters, as taking addresses. 2898 02:07:00,360 --> 02:07:04,950 This new version of swap means that a shall be the address of an integer. 2899 02:07:04,950 --> 02:07:07,050 b shall be the address of an integer. 2900 02:07:07,050 --> 02:07:09,150 And now it gets a little cryptic here. 2901 02:07:09,150 --> 02:07:12,270 Temp is the same because it's just an integer like it was in week one. 2902 02:07:12,270 --> 02:07:14,040 Nothing special about temp. 2903 02:07:14,040 --> 02:07:18,135 But if you want to get the value at a, you do star a. 2904 02:07:18,135 --> 02:07:21,150 And that goes to the address, grabs the number one presumably. 2905 02:07:21,150 --> 02:07:24,268 If you want to change the value of a, you go to that address, 2906 02:07:24,268 --> 02:07:26,310 you follow the treasure map to the other mailbox, 2907 02:07:26,310 --> 02:07:29,310 and you set it equal to whatever is at the value of b. 2908 02:07:29,310 --> 02:07:30,900 You go to b as well.
2909 02:07:30,900 --> 02:07:33,450 Last line, you go to b now and change it to be 2910 02:07:33,450 --> 02:07:36,990 whatever the temporary variable was, which happened to be the same as a. 2911 02:07:36,990 --> 02:07:39,300 So that's where the final value gets swapped. 2912 02:07:39,300 --> 02:07:41,810 But here, there's a lot more crisscrossing metaphorically 2913 02:07:41,810 --> 02:07:43,560 across the stage where you're going to all 2914 02:07:43,560 --> 02:07:46,950 of these different addresses in the swap function to make these changes. 2915 02:07:46,950 --> 02:07:49,200 So if we advance now to the pictorial version of this, 2916 02:07:49,200 --> 02:07:51,480 here's the same story as before with main. 2917 02:07:51,480 --> 02:07:53,700 And x and y are one and two respectively. 2918 02:07:53,700 --> 02:07:57,540 When swap gets called now, notice, and I'll do it with arrows here, 2919 02:07:57,540 --> 02:08:01,680 a is effectively pointing to x, b is effectively pointing to y. 2920 02:08:01,680 --> 02:08:04,290 If we really get into the weeds, these are actually addresses. 2921 02:08:04,290 --> 02:08:05,910 But who cares about the specifics? 2922 02:08:05,910 --> 02:08:07,540 It's really just the concept here. 2923 02:08:07,540 --> 02:08:08,700 So now what happens? 2924 02:08:08,700 --> 02:08:10,500 Int temp gets star a. 2925 02:08:10,500 --> 02:08:12,840 Star a means start at a and go there. 2926 02:08:12,840 --> 02:08:14,160 Follow the arrow, if you will. 2927 02:08:14,160 --> 02:08:15,570 Sort of chutes and ladders style. 2928 02:08:15,570 --> 02:08:16,620 And then that's one. 2929 02:08:16,620 --> 02:08:18,010 So we put one in temp. 2930 02:08:18,010 --> 02:08:18,510 All right. 2931 02:08:18,510 --> 02:08:20,123 Star a equals star b. 2932 02:08:20,123 --> 02:08:21,540 So let's do it from right to left. 2933 02:08:21,540 --> 02:08:23,220 Star b means follow the arrow. 2934 02:08:23,220 --> 02:08:24,050 It's two. 
2935 02:08:24,050 --> 02:08:25,050 And then what do you do? 2936 02:08:25,050 --> 02:08:26,040 Follow the arrow. 2937 02:08:26,040 --> 02:08:29,790 It's now two because you copy one to the other from right to left. 2938 02:08:29,790 --> 02:08:31,740 And then lastly, star b gets temp. 2939 02:08:31,740 --> 02:08:33,270 So start at b, go to b. 2940 02:08:33,270 --> 02:08:36,330 And now store whatever the value is in temp. 2941 02:08:36,330 --> 02:08:39,930 So just by having this basic new syntax of like ampersands, and stars, 2942 02:08:39,930 --> 02:08:42,270 and so forth, we can actually now go to places 2943 02:08:42,270 --> 02:08:44,790 and circumvent what is otherwise a feature of C, 2944 02:08:44,790 --> 02:08:47,160 that these variables are locally scoped. 2945 02:08:47,160 --> 02:08:50,265 But you can still access things in other functions as well. 2946 02:08:50,265 --> 02:08:52,390 So thank you so much for helping step through this. 2947 02:08:52,390 --> 02:08:55,440 So we now have a application of this that 2948 02:08:55,440 --> 02:08:59,830 explains why now in this version of the C code this would actually now work. 2949 02:08:59,830 --> 02:09:03,030 So in fact, let me go back to my swap code here. 2950 02:09:03,030 --> 02:09:06,030 And let me change the function ever so slightly in VS Code. 2951 02:09:06,030 --> 02:09:08,800 So let me scroll down, leaving main the same. 2952 02:09:08,800 --> 02:09:13,590 And let me change swaps prototype to taking in addresses. 2953 02:09:13,590 --> 02:09:15,060 Let me go to a here. 2954 02:09:15,060 --> 02:09:16,320 Let me go to a here. 2955 02:09:16,320 --> 02:09:17,760 Let me go to b here. 2956 02:09:17,760 --> 02:09:19,590 And let me go to b here as well. 2957 02:09:19,590 --> 02:09:20,930 But nothing else changes. 
2958 02:09:20,930 --> 02:09:23,540 This change here in particular is enough of a clue 2959 02:09:23,540 --> 02:09:27,410 to see that means when you call swap and pass in two values, 2960 02:09:27,410 --> 02:09:31,040 I'm expecting addresses now, not integers. 2961 02:09:31,040 --> 02:09:34,640 But now that I've made this change, I do need to go up to main 2962 02:09:34,640 --> 02:09:37,490 and make one change. 2963 02:09:37,490 --> 02:09:40,190 Does anyone have the intuition for what now needs change 2964 02:09:40,190 --> 02:09:45,590 in main so that I pass in x and y by reference, that is by address rather 2965 02:09:45,590 --> 02:09:48,290 than by value or copy? 2966 02:09:48,290 --> 02:09:49,652 Yeah, in back. 2967 02:09:49,652 --> 02:09:52,198 AUDIENCE: [INAUDIBLE] 2968 02:09:52,198 --> 02:09:53,240 DAVID J. MALAN: So close. 2969 02:09:53,240 --> 02:09:57,170 So on the swap line, it's not star that I want in front of the x and the y. 2970 02:09:57,170 --> 02:09:59,834 It's instead-- 2971 02:09:59,834 --> 02:10:00,720 AUDIENCE: [INAUDIBLE] 2972 02:10:00,720 --> 02:10:01,560 DAVID J. MALAN: What's the other one? 2973 02:10:01,560 --> 02:10:02,380 AUDIENCE: Ampersand. 2974 02:10:02,380 --> 02:10:03,235 DAVID J. MALAN: It's the ampersand. 2975 02:10:03,235 --> 02:10:03,735 Why? 2976 02:10:03,735 --> 02:10:06,840 Because if I want to enable swap to go somewhere, just like Carter 2977 02:10:06,840 --> 02:10:08,590 and I played this game with the mailboxes, 2978 02:10:08,590 --> 02:10:12,220 I need to inform swap of the address of x and the address of y. 2979 02:10:12,220 --> 02:10:14,470 And again, per the beginning of today's class, 2980 02:10:14,470 --> 02:10:16,720 ampersand is the syntax via which we do that. 2981 02:10:16,720 --> 02:10:19,840 So I add an ampersand here to get the address of x, ampersand here 2982 02:10:19,840 --> 02:10:20,860 to get the address of y. 
2983 02:10:20,860 --> 02:10:23,530 And now this code lines up with the picture 2984 02:10:23,530 --> 02:10:25,280 that Carter just helped us walk through. 2985 02:10:25,280 --> 02:10:29,360 And so when I run make swap here, I have a mistake. 2986 02:10:29,360 --> 02:10:30,670 Oh, what did I do wrong? 2987 02:10:30,670 --> 02:10:31,600 Not intentional. 2988 02:10:31,600 --> 02:10:34,240 But I guess worth pointing out. 2989 02:10:34,240 --> 02:10:35,230 I screwed up here. 2990 02:10:35,230 --> 02:10:40,000 It doesn't like ampersand x because of something 2991 02:10:40,000 --> 02:10:43,240 on line three, which is way early into the code. 2992 02:10:43,240 --> 02:10:44,470 What did I screw up? 2993 02:10:44,470 --> 02:10:45,490 Yeah, in the middle. 2994 02:10:45,490 --> 02:10:47,757 AUDIENCE: [INAUDIBLE] 2995 02:10:47,757 --> 02:10:50,090 DAVID J. MALAN: Yeah, so this is why we-- you should not 2996 02:10:50,090 --> 02:10:53,215 copy paste, even though it's necessary for things like function prototypes. 2997 02:10:53,215 --> 02:10:56,240 If I changed swap at the bottom, I need to change its prototype. 2998 02:10:56,240 --> 02:10:59,360 So let me add the star there, add the star there, or just re-copy paste 2999 02:10:59,360 --> 02:11:00,780 it at the top of the file. 3000 02:11:00,780 --> 02:11:02,490 Now let me do make swap again. 3001 02:11:02,490 --> 02:11:03,980 Let me now do dot slash swap. 3002 02:11:03,980 --> 02:11:06,530 And I should now see x is 1, y is 2. 3003 02:11:06,530 --> 02:11:10,760 And hopefully, x is 2, y is 1, which I now do. 3004 02:11:10,760 --> 02:11:12,017 So the logic is the same. 3005 02:11:12,017 --> 02:11:13,100 The algorithm is the same. 3006 02:11:13,100 --> 02:11:14,810 All the week zero stuff is the same. 
3007 02:11:14,810 --> 02:11:17,960 Except now in week four, you just have a bit more expressiveness 3008 02:11:17,960 --> 02:11:22,310 via which you can tell the computer exactly what you want to manipulate 3009 02:11:22,310 --> 02:11:24,070 and how. 3010 02:11:24,070 --> 02:11:29,250 Any questions then on this technique here? 3011 02:11:29,250 --> 02:11:29,750 No? 3012 02:11:29,750 --> 02:11:30,250 All right. 3013 02:11:30,250 --> 02:11:33,072 Well, when we fix this, there's still going to be problems. 3014 02:11:33,072 --> 02:11:35,030 And just so you've seen some terms of art here, 3015 02:11:35,030 --> 02:11:38,113 this is bad whenever you have two arrows pointing at one another certainly 3016 02:11:38,113 --> 02:11:40,400 if you might use and reuse more and more memory. 3017 02:11:40,400 --> 02:11:43,320 And it turns out there are some terms of art that might suddenly now make sense, 3018 02:11:43,320 --> 02:11:44,945 especially if you've programmed before. 3019 02:11:44,945 --> 02:11:47,210 Bad things can happen by this design. 3020 02:11:47,210 --> 02:11:49,045 But there's really only this kind of design 3021 02:11:49,045 --> 02:11:50,670 because it's a finite amount of memory. 3022 02:11:50,670 --> 02:11:52,370 So at some point, bad things are going to happen no matter 3023 02:11:52,370 --> 02:11:54,070 what if a computer runs out of memory. 3024 02:11:54,070 --> 02:11:55,820 So it's not that this was a poor decision. 3025 02:11:55,820 --> 02:11:59,480 It's just sort of a necessary one given finite amounts of memory in a computer. 3026 02:11:59,480 --> 02:12:01,790 But a heap overflow, so to speak, is when 3027 02:12:01,790 --> 02:12:05,300 you actually overflow the heap and touch memory that you shouldn't up there. 3028 02:12:05,300 --> 02:12:08,300 Stack overflow is when you somehow overflow the stack and touch 3029 02:12:08,300 --> 02:12:10,030 memory that you shouldn't down there. 
3030 02:12:10,030 --> 02:12:12,780 So with that said, these are really just problems that can happen. 3031 02:12:12,780 --> 02:12:14,488 And they're specific incarnations of what 3032 02:12:14,488 --> 02:12:16,610 are generally called buffer overflows. 3033 02:12:16,610 --> 02:12:19,670 A buffer, like in the YouTube sense, is just a chunk of memory, 3034 02:12:19,670 --> 02:12:23,030 that in the case of YouTube, stores the next few seconds or minutes of video. 3035 02:12:23,030 --> 02:12:25,910 But generally speaking, a buffer is just a chunk of memory 3036 02:12:25,910 --> 02:12:27,860 that the computer is using for some purpose, 3037 02:12:27,860 --> 02:12:31,430 be it the stack, be it the heap, be it an array in the computer. 3038 02:12:31,430 --> 02:12:34,280 And so buffer overflows are what happens when you just 3039 02:12:34,280 --> 02:12:37,100 have logical bugs in your code. 3040 02:12:37,100 --> 02:12:39,710 But with these primitives now in mind, we 3041 02:12:39,710 --> 02:12:42,050 wanted to conclude with a final revelation. 3042 02:12:42,050 --> 02:12:44,940 And that's how some functions like these here work. 3043 02:12:44,940 --> 02:12:48,140 The other thing in the CS50 library, besides the typedef for quote unquote 3044 02:12:48,140 --> 02:12:50,240 "string" is, of course, all of these functions. 3045 02:12:50,240 --> 02:12:51,573 And we give you these functions. 3046 02:12:51,573 --> 02:12:55,370 Because honestly in C, it is hard, it's annoying, it's painful, 3047 02:12:55,370 --> 02:12:58,430 it's difficult to get user input correctly. 3048 02:12:58,430 --> 02:13:01,670 It's very easy when you don't know how much the human is 3049 02:13:01,670 --> 02:13:04,280 going to type to write buggy code when it comes to it. 3050 02:13:04,280 --> 02:13:06,800 And indeed, it's really hard to store it correctly 3051 02:13:06,800 --> 02:13:09,930 without accidentally having some kind of buffer overflow. 
3052 02:13:09,930 --> 02:13:12,150 So for instance, let me show you a program here. 3053 02:13:12,150 --> 02:13:14,400 I'm going to go ahead and write this one from scratch. 3054 02:13:14,400 --> 02:13:17,060 So let me go ahead and open a file called get.c, 3055 02:13:17,060 --> 02:13:20,480 wherein I'm going to go ahead and mimic the idea of getting integers manually 3056 02:13:20,480 --> 02:13:21,950 without the CS50 library. 3057 02:13:21,950 --> 02:13:24,470 So I'm going to include stdio.h only, 3058 02:13:24,470 --> 02:13:27,395 I'm going to define main as not taking any command line arguments, 3059 02:13:27,395 --> 02:13:29,270 and then I'm going to do something like this. 3060 02:13:29,270 --> 02:13:31,850 Give me a variable x with no value yet. 3061 02:13:31,850 --> 02:13:34,250 And normally, I would do something like get int. 3062 02:13:34,250 --> 02:13:35,450 But let me take that away. 3063 02:13:35,450 --> 02:13:37,490 No more training wheels for get int either. 3064 02:13:37,490 --> 02:13:39,890 So let me just define the int x. 3065 02:13:39,890 --> 02:13:44,270 Let me then just print out something like a prompt. 3066 02:13:44,270 --> 02:13:47,090 And I'll just do x colon just to make it obvious to the human what 3067 02:13:47,090 --> 02:13:48,330 we're waiting for. 3068 02:13:48,330 --> 02:13:51,800 And now I'm going to use a built-in C function to get user input. 3069 02:13:51,800 --> 02:13:54,920 I'm going to call a function called scanf, which sort of scans 3070 02:13:54,920 --> 02:13:56,600 the user's keyboard for input. 3071 02:13:56,600 --> 02:13:58,400 I'm going to scan it for an integer. 3072 02:13:58,400 --> 02:14:01,850 So just like printf, I'm going to use %i because I expect an int. 3073 02:14:01,850 --> 02:14:04,760 And then I want to tell scanf where to put 3074 02:14:04,760 --> 02:14:07,640 the human's integer from the keyboard. 3075 02:14:07,640 --> 02:14:09,950 It is not correct though to say x.
3076 02:14:09,950 --> 02:14:12,770 Because if I say x, I run into the same swap problem. 3077 02:14:12,770 --> 02:14:13,490 Scanf. 3078 02:14:13,490 --> 02:14:16,250 No function can change the value of x unless I pass it 3079 02:14:16,250 --> 02:14:20,200 not by value, but by reference. 3080 02:14:20,200 --> 02:14:22,020 So we're back to our ampersand friend. 3081 02:14:22,020 --> 02:14:26,700 And now, it has a treasure map to the actual location of x, 3082 02:14:26,700 --> 02:14:27,998 and can therefore change it. 3083 02:14:27,998 --> 02:14:29,790 And so now at the very end of this program, 3084 02:14:29,790 --> 02:14:34,890 let me do something simple like, let's just go ahead and print out with printf 3085 02:14:34,890 --> 02:14:40,575 the value of x, using %i as always plugging in x, not ampersand x. 3086 02:14:40,575 --> 02:14:41,700 This is now week one stuff. 3087 02:14:41,700 --> 02:14:44,430 I want to print the actual integer value of x. 3088 02:14:44,430 --> 02:14:47,400 So the only change here is that instead of using get int, 3089 02:14:47,400 --> 02:14:51,540 I'm now using this new function that as of today exists called scanf. 3090 02:14:51,540 --> 02:14:54,600 So let me go ahead and run get. 3091 02:14:54,600 --> 02:14:56,400 Make get to create this program. 3092 02:14:56,400 --> 02:14:57,632 Dot slash get. 3093 02:14:57,632 --> 02:14:59,340 Let's go ahead and type in a value for x. 3094 02:14:59,340 --> 02:15:00,240 50. 3095 02:15:00,240 --> 02:15:01,080 Enter. 3096 02:15:01,080 --> 02:15:02,160 And it just works. 3097 02:15:02,160 --> 02:15:05,710 So it turns out get int is pretty simple to implement. 3098 02:15:05,710 --> 02:15:07,380 However, notice what does not work. 3099 02:15:07,380 --> 02:15:11,010 If I type in cat, for instance, cat gets converted to zero. 3100 02:15:11,010 --> 02:15:14,610 And meanwhile, get int, recall, will re-prompt the user. 
3101 02:15:14,610 --> 02:15:16,890 If a human does not type an actual integer, 3102 02:15:16,890 --> 02:15:18,360 you get automatically re-prompted. 3103 02:15:18,360 --> 02:15:20,160 So that's one of the features we for CS50 3104 02:15:20,160 --> 02:15:22,960 added to get int just to make your programs more user friendly. 3105 02:15:22,960 --> 02:15:25,710 But otherwise, get int is pretty straightforward 3106 02:15:25,710 --> 02:15:27,600 to re-implement using scanf. 3107 02:15:27,600 --> 02:15:30,210 Unfortunately, that's not true for strings. 3108 02:15:30,210 --> 02:15:33,393 Because how do you know when you write your code what word the human is 3109 02:15:33,393 --> 02:15:34,560 going to eventually type in? 3110 02:15:34,560 --> 02:15:37,590 How long their greeting, like hi is? 3111 02:15:37,590 --> 02:15:40,140 If their name is David, or Carter, or anything else, 3112 02:15:40,140 --> 02:15:42,670 you just don't know in advance how much memory you need. 3113 02:15:42,670 --> 02:15:45,360 So how might we do this with strings? 3114 02:15:45,360 --> 02:15:48,170 Well, let me go ahead and declare a string s. 3115 02:15:48,170 --> 02:15:49,170 Although, you know what? 3116 02:15:49,170 --> 02:15:50,400 There's no CS50 library. 3117 02:15:50,400 --> 02:15:53,230 So we do char star s today instead. 3118 02:15:53,230 --> 02:15:56,850 And that gives me not a string per se, but a pointer 3119 02:15:56,850 --> 02:15:59,670 that will point presumably to a string. 3120 02:15:59,670 --> 02:16:00,990 Ideally, I would use this. 3121 02:16:00,990 --> 02:16:01,770 Get string. 3122 02:16:01,770 --> 02:16:04,030 But again, we've taken that training wheel away. 3123 02:16:04,030 --> 02:16:08,040 So now that I have a pointer s, suppose I prompt the human for a value for s, 3124 02:16:08,040 --> 02:16:09,090 just like before.
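One way to get the re-prompting behavior mentioned above is to check what scanf returns. The sketch below is not the CS50 library's code. The name read_int and the FILE * parameter are assumptions, chosen so the same function works on the keyboard (stdin) or on any other stream; it relies on fscanf returning the number of values it managed to store.

```c
#include <stdbool.h>
#include <stdio.h>

// Tries to read one int from `in` into *out.
// Returns true only if a number was actually parsed and stored.
bool read_int(FILE *in, int *out)
{
    // fscanf returns how many items it successfully assigned
    if (fscanf(in, "%i", out) == 1)
    {
        return true;
    }

    // Nothing was stored in *out. Discard the rest of the offending line
    // so that a retry starts with fresh input.
    int c;
    while ((c = fgetc(in)) != '\n' && c != EOF)
    {
    }
    return false;
}
```

A caller could loop, printing the prompt and calling read_int(stdin, &x) until it returns true. A real program would also want to stop if the input runs out entirely.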
3125 02:16:09,090 --> 02:16:13,575 Let me use scanf now and tell the user that I expect to read a string, 3126 02:16:13,575 --> 02:16:18,150 %s from the keyboard, and store it in s. 3127 02:16:18,150 --> 02:16:19,470 Now this is subtle. 3128 02:16:19,470 --> 02:16:24,240 I don't technically need an ampersand here, even though I did for an int. 3129 02:16:24,240 --> 02:16:28,020 And I would for a float, and a double, and a long, and a bool, and a char. 3130 02:16:28,020 --> 02:16:34,260 Why do I not need an ampersand in this story to pass by reference? 3131 02:16:34,260 --> 02:16:34,879 Because s is-- 3132 02:16:34,879 --> 02:16:36,129 AUDIENCE: Already [INAUDIBLE]. 3133 02:16:36,129 --> 02:16:37,299 DAVID J. MALAN: It's already an address. 3134 02:16:37,299 --> 02:16:38,830 Again, strings are just special. 3135 02:16:38,830 --> 02:16:40,809 Strings now are always addresses. 3136 02:16:40,809 --> 02:16:43,809 So you don't need to additionally add an ampersand here. 3137 02:16:43,809 --> 02:16:45,575 That's the only subtle difference here. 3138 02:16:45,575 --> 02:16:48,700 But now, if I go ahead and print out at the very end what the value of s is 3139 02:16:48,700 --> 02:16:54,709 using %s as before, this program looks like it's almost the same as the int 3140 02:16:54,709 --> 02:16:55,209 version. 3141 02:16:55,209 --> 02:16:57,190 But let's do make get. 3142 02:16:57,190 --> 02:16:59,500 And OK, so this is not good. 3143 02:16:59,500 --> 02:17:01,910 All right, so it doesn't like an uninitialized value. 3144 02:17:01,910 --> 02:17:02,980 So let me make it happy. 3145 02:17:02,980 --> 02:17:05,138 I said earlier to always initialize my variable. 3146 02:17:05,138 --> 02:17:07,930 So let's initialize it to NULL so that at least something is there. 3147 02:17:07,930 --> 02:17:09,910 That's your good default value nowadays. 3148 02:17:09,910 --> 02:17:12,400 Now if I do dot slash get, now we're good. 
3149 02:17:12,400 --> 02:17:15,969 And let me type in something like cat. 3150 02:17:15,969 --> 02:17:17,870 OK, cat is not x. 3151 02:17:17,870 --> 02:17:19,120 Well, let me try another word. 3152 02:17:19,120 --> 02:17:20,379 Maybe it's just cat is wrong. 3153 02:17:20,379 --> 02:17:21,160 Dog. 3154 02:17:21,160 --> 02:17:22,910 OK, let me try David. 3155 02:17:22,910 --> 02:17:24,430 It just doesn't seem to be working. 3156 02:17:24,430 --> 02:17:27,040 Moreover, it's printing it as a zero. 3157 02:17:27,040 --> 02:17:30,719 What logically, though, is the bug here? 3158 02:17:30,719 --> 02:17:32,790 Scanf worked a moment ago for integers. 3159 02:17:32,790 --> 02:17:34,320 But it's not working for strings. 3160 02:17:34,320 --> 02:17:37,170 And it seems to be forgetting C-A-T. It's forgetting D-O-G. 3161 02:17:37,170 --> 02:17:40,820 It's forgetting D-A-V-I-D. Why? 3162 02:17:40,820 --> 02:17:44,090 What's happening here? 3163 02:17:44,090 --> 02:17:47,920 Think back to our yellow pictures of memory. 3164 02:17:47,920 --> 02:17:49,090 When I-- yeah. 3165 02:17:49,090 --> 02:17:50,267 AUDIENCE: [INAUDIBLE] 3166 02:17:50,267 --> 02:17:52,600 DAVID J. MALAN: It might be reading just the NULL itself 3167 02:17:52,600 --> 02:17:54,790 because s is being initialized to NULL. 3168 02:17:54,790 --> 02:17:58,090 And what step have I forgotten from just a few minutes ago? 3169 02:17:58,090 --> 02:18:01,760 What did I not actually request of the computer? 3170 02:18:01,760 --> 02:18:06,290 Actual memory to store the C-A-T, the D-O-G, the D-A-V-I-D. 3171 02:18:06,290 --> 02:18:10,010 There's nowhere have I asked the computer for some amount of memory. 3172 02:18:10,010 --> 02:18:14,600 And so technically, it might be reading it into some garbage location. 3173 02:18:14,600 --> 02:18:18,020 And that's really the problem here. s is initialized to NULL now. 3174 02:18:18,020 --> 02:18:20,278 And so in fact, it is printing zero as NULL. 
3175 02:18:20,278 --> 02:18:22,070 But I'm not seeing any of the other letters 3176 02:18:22,070 --> 02:18:23,653 because there was nowhere to put them. 3177 02:18:23,653 --> 02:18:27,920 C-A-T, D-O-G, D-A-V-I-D because I didn't ask for 3 bytes, 4 bytes, 5 bytes, 3178 02:18:27,920 --> 02:18:28,610 100 bytes. 3179 02:18:28,610 --> 02:18:29,840 There's no use of malloc. 3180 02:18:29,840 --> 02:18:31,100 There's no use of an array. 3181 02:18:31,100 --> 02:18:35,370 There's no memory allocated for anything other than the pointer itself. 3182 02:18:35,370 --> 02:18:38,209 And this is where, honestly, life gets hard with scanf. 3183 02:18:38,209 --> 02:18:40,709 I could solve this problem in a couple of ways. 3184 02:18:40,709 --> 02:18:41,910 Let me go ahead and do this. 3185 02:18:41,910 --> 02:18:44,150 Instead of declaring s to be a pointer, let 3186 02:18:44,150 --> 02:18:48,469 me declare s to actually be an array of four chars. 3187 02:18:48,469 --> 02:18:51,540 And now let me go ahead and recompile the code. 3188 02:18:51,540 --> 02:18:55,380 So make get dot slash get, and I'll type in cat now. 3189 02:18:55,380 --> 02:18:56,780 That now works. 3190 02:18:56,780 --> 02:18:57,590 Why? 3191 02:18:57,590 --> 02:19:00,680 Well, I'm allocating an explicit array of size four, 3192 02:19:00,680 --> 02:19:03,930 enough for a one, two, three letters, plus a NULL character. 3193 02:19:03,930 --> 02:19:06,379 Here's where to someone's question earlier, it 3194 02:19:06,379 --> 02:19:09,469 turns out that in some contexts, you can treat arrays 3195 02:19:09,469 --> 02:19:12,000 as though they are pointers themselves. 3196 02:19:12,000 --> 02:19:14,010 So C will sort of do the conversion for you. 3197 02:19:14,010 --> 02:19:17,540 But for now, just assume that s is just an array of size four.
3198 02:19:17,540 --> 02:19:20,719 And if you pass it into scanf, that's like a treasure 3199 02:19:20,719 --> 02:19:24,559 map that leads to those 4 bytes so scanf can now successfully fill it 3200 02:19:24,559 --> 02:19:29,309 with C-A-T, D-O-G. But let's try this again. 3201 02:19:29,309 --> 02:19:30,440 Let's type in David. 3202 02:19:30,440 --> 02:19:32,940 And here, OK, we got lucky. 3203 02:19:32,940 --> 02:19:35,459 But I technically touched memory that I should not. 3204 02:19:35,459 --> 02:19:38,209 And in fact, if I typed in a long enough string, and I don't think 3205 02:19:38,209 --> 02:19:40,910 I could do it very easily by-- without typing 3206 02:19:40,910 --> 02:19:42,830 this thousands or hundreds of times. 3207 02:19:42,830 --> 02:19:43,910 Still OK. 3208 02:19:43,910 --> 02:19:47,298 But you'll notice that it's forgotten the rest of it now. 3209 02:19:47,298 --> 02:19:49,590 So somewhere, we went beyond the boundary of the array. 3210 02:19:49,590 --> 02:19:52,230 And we just don't have enough storage space for that entire thing. 3211 02:19:52,230 --> 02:19:53,647 So what do you do in your program? 3212 02:19:53,647 --> 02:19:57,050 If you don't know how long the person's name or the animal name is going to be, 3213 02:19:57,050 --> 02:19:57,530 what do you do? 3214 02:19:57,530 --> 02:19:58,030 40? 3215 02:19:58,030 --> 02:19:59,180 400? 3216 02:19:59,180 --> 02:19:59,960 4,000? 3217 02:19:59,960 --> 02:20:00,785 40,000? 3218 02:20:00,785 --> 02:20:02,910 At some point, you have to draw a line in the sand. 3219 02:20:02,910 --> 02:20:06,980 And that's why getting user input is so annoying in a language like C. 3220 02:20:06,980 --> 02:20:08,900 And that's why get string exists. 3221 02:20:08,900 --> 02:20:12,500 What we do, if you're curious, is we look at the user's input 3222 02:20:12,500 --> 02:20:13,760 and we take baby steps. 3223 02:20:13,760 --> 02:20:15,980 We look at it one character at a time. 
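One hedge against walking past the end of that four-byte array is to give scanf a maximum field width. This is a sketch and not what the lecture typed; the name read_short_word is an assumption. The 3 in %3s tells fscanf to store at most three characters, and it then adds the terminating NUL itself, which is why the buffer needs four bytes.

```c
#include <stdio.h>

// Reads at most 3 non-whitespace characters into the 4-byte buffer s.
// Returns what fscanf returns: 1 if a word was stored, otherwise 0 or EOF.
int read_short_word(FILE *in, char s[4])
{
    // s is already an address, so no ampersand is needed here
    return fscanf(in, "%3s", s);
}
```

With this limit, typing David stores only Dav and leaves the remaining letters waiting in the input, so the array boundary is respected but the name is still cut short.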
3224 02:20:15,980 --> 02:20:18,380 And every time we see another character, we actually 3225 02:20:18,380 --> 02:20:19,750 call malloc again and say, no. 3226 02:20:19,750 --> 02:20:20,750 I need more than 1 byte. 3227 02:20:20,750 --> 02:20:21,470 I need 2. 3228 02:20:21,470 --> 02:20:23,012 Oh wait, they typed in three letters. 3229 02:20:23,012 --> 02:20:23,930 I need 3 instead of 2. 3230 02:20:23,930 --> 02:20:25,340 Oh, I need 4 instead of 2. 3231 02:20:25,340 --> 02:20:27,470 And we have this crazy loop essentially that 3232 02:20:27,470 --> 02:20:30,500 keeps asking for more and more memory but by taking baby steps. 3233 02:20:30,500 --> 02:20:33,380 And honestly, if you all had to do that in week one, my God. 3234 02:20:33,380 --> 02:20:35,640 We couldn't even write, hello, world anymore. 3235 02:20:35,640 --> 02:20:38,450 And so that's why these training wheels exist, at least early on. 3236 02:20:38,450 --> 02:20:42,050 And that's why in higher level languages like in Python, 3237 02:20:42,050 --> 02:20:43,820 you don't have to do this at all. 3238 02:20:43,820 --> 02:20:45,960 It just works as you'd expect. 3239 02:20:45,960 --> 02:20:47,640 So what more can we do? 3240 02:20:47,640 --> 02:20:51,050 Well, you'll see in problem set four this coming week, if I open up 3241 02:20:51,050 --> 02:20:53,750 an example like this, phonebook.c, you'll 3242 02:20:53,750 --> 02:20:56,510 see that you can manipulate files now, that you 3243 02:20:56,510 --> 02:20:57,950 have a vocabulary for pointers. 3244 02:20:57,950 --> 02:20:59,570 It's going to be new quickly. 3245 02:20:59,570 --> 02:21:02,000 But here we have an example of how. 3246 02:21:02,000 --> 02:21:04,490 I have a program using some familiar libraries here. 3247 02:21:04,490 --> 02:21:08,010 But as I claim in my comment, this saves names and numbers to a CSV file. 
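The baby-steps idea can be sketched as follows. This is only an illustration of the approach described above, not the CS50 library's actual source: the name read_line is an assumption, and the sketch grows the buffer with realloc rather than calling malloc afresh each time.

```c
#include <stdio.h>
#include <stdlib.h>

// Reads one line from `in`, growing the buffer one character at a time.
// Returns a heap-allocated string the caller must free,
// or NULL on allocation failure or when there is no more input.
char *read_line(FILE *in)
{
    // Start with room for just the terminating NUL
    char *line = malloc(1);
    if (line == NULL)
    {
        return NULL;
    }
    line[0] = '\0';
    size_t length = 0;

    int c;
    while ((c = fgetc(in)) != EOF && c != '\n')
    {
        // Ask for one more byte: the existing chars, this char, and the NUL
        char *bigger = realloc(line, length + 2);
        if (bigger == NULL)
        {
            free(line);
            return NULL;
        }
        line = bigger;
        line[length] = (char) c;
        length++;
        line[length] = '\0';
    }

    // Input ended before any character arrived: nothing to return
    if (c == EOF && length == 0)
    {
        free(line);
        return NULL;
    }
    return line;
}
```

Growing by one byte per character keeps the sketch close to the description; a production version would more likely grow in larger chunks to call realloc less often.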
3248 02:21:08,010 --> 02:21:10,235 All of my examples thus far, I type in some words, 3249 02:21:10,235 --> 02:21:12,110 I type in some names, and some phone numbers, 3250 02:21:12,110 --> 02:21:14,550 and they disappear because we only store them in memory. 3251 02:21:14,550 --> 02:21:18,290 But if you want to store data in like a CSV file, Comma Separated Values, which 3252 02:21:18,290 --> 02:21:21,200 is like a simple spreadsheet like Excel, and Apple Numbers, 3253 02:21:21,200 --> 02:21:23,910 and Google Sheets can open, you can actually do this yourself. 3254 02:21:23,910 --> 02:21:26,660 So just as a teaser for this week, here on line nine, 3255 02:21:26,660 --> 02:21:27,950 I'm using a new data type. 3256 02:21:27,950 --> 02:21:28,820 Not a CS50 thing. 3257 02:21:28,820 --> 02:21:30,575 This is a C thing called FILE. 3258 02:21:30,575 --> 02:21:32,450 But if you want to manipulate files, you need 3259 02:21:32,450 --> 02:21:34,250 to use addresses, that is pointers. 3260 02:21:34,250 --> 02:21:37,160 So here is me creating a variable called file 3261 02:21:37,160 --> 02:21:40,070 that's going to point to an actual file on the hard drive, 3262 02:21:40,070 --> 02:21:42,140 on the server, or your Mac, or PC. 3263 02:21:42,140 --> 02:21:45,560 fopen is going to be a new function you'll use that will open a file. 3264 02:21:45,560 --> 02:21:49,160 And it will return effectively a pointer thereto in memory. 3265 02:21:49,160 --> 02:21:51,560 The file name I want to open is phonebook.csv. 3266 02:21:51,560 --> 02:21:54,230 And in this example, it's going to be append mode. 3267 02:21:54,230 --> 02:21:57,447 It will keep allowing me to add more and more names and numbers to this file. 3268 02:21:57,447 --> 02:21:59,780 Here's some old get_string stuff because I'm not going 3269 02:21:59,780 --> 02:22:01,460 to reinvent get_string with scanf. 3270 02:22:01,460 --> 02:22:03,930 But down here is a slightly new function.
3271 02:22:03,930 --> 02:22:05,260 It's not printf, but fprintf. 3272 02:22:05,260 --> 02:22:08,010 And it turns out it's very easy to print things not to the screen, 3273 02:22:08,010 --> 02:22:09,800 but to a file with fprintf. 3274 02:22:09,800 --> 02:22:11,900 And it takes an additional argument. Instead 3275 02:22:11,900 --> 02:22:14,000 of starting with the quoted string, you'll 3276 02:22:14,000 --> 02:22:16,400 have to say what file you want to write to. 3277 02:22:16,400 --> 02:22:20,240 And fprintf will figure out how to get the bits into that 3278 02:22:20,240 --> 02:22:23,520 file, passing in something like name, comma, number. 3279 02:22:23,520 --> 02:22:26,940 So if I run this somewhat quickly here, let me do this. 3280 02:22:26,940 --> 02:22:31,400 Let me pre-create a file called phonebook.csv. 3281 02:22:31,400 --> 02:22:34,940 And in phonebook.csv, I'm going to create a temporary row here, name 3282 02:22:34,940 --> 02:22:37,820 comma number just so that there's something in this file. 3283 02:22:37,820 --> 02:22:41,810 And now let me go ahead and do this and split my screen here. 3284 02:22:41,810 --> 02:22:46,410 If I have phonebook.csv on the right and phonebook.c on the left, 3285 02:22:46,410 --> 02:22:51,345 let me compile, make phonebook, which is the C version, dot slash phonebook. 3286 02:22:51,345 --> 02:22:53,220 And now I'm prompted for a name and a number. 3287 02:22:53,220 --> 02:22:58,080 So I'll type in David, and then for instance plus 1-949-- 3288 02:22:58,080 --> 02:22:58,580 what is it? 3289 02:22:58,580 --> 02:23:01,190 4682750. 3290 02:23:01,190 --> 02:23:02,390 Enter. 3291 02:23:02,390 --> 02:23:03,110 Oh, damn it. 3292 02:23:03,110 --> 02:23:03,995 Bug. 3293 02:23:03,995 --> 02:23:05,120 Pretend that didn't happen. 3294 02:23:05,120 --> 02:23:06,990 I forgot to hit Enter in the file. 3295 02:23:06,990 --> 02:23:08,630 So let's do this again.
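A minimal sketch of the fopen and fprintf steps described above, not the lecture's actual phonebook.c. The lecture's version prompts with get_string; here the name and number arrive as parameters so the sketch has no dependency on the CS50 library. The function name `append_entry` and its return convention are assumptions made for illustration.

```c
#include <stdio.h>

// Hypothetical helper: appends one "name,number" row to the CSV file at `path`.
// Returns 0 on success, 1 if the file could not be opened or written.
int append_entry(const char *path, const char *name, const char *number)
{
    // "a" is append mode: existing rows are kept and new rows go at the end
    FILE *file = fopen(path, "a");
    if (file == NULL)
    {
        return 1;
    }

    // Like printf, except the first argument says which file to write to
    int written = fprintf(file, "%s,%s\n", name, number);

    // fclose flushes any buffered output; nonzero means something went wrong
    if (fclose(file) != 0 || written < 0)
    {
        return 1;
    }
    return 0;
}
```

A call such as `append_entry("phonebook.csv", name, number)` adds one row per run. Because append mode starts writing exactly where the file currently ends, a hand-made header row that lacks a trailing newline would end up sharing its line with the first appended row, which may be what went wrong in the first run above.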
3296 02:23:08,630 --> 02:23:17,900 If I run the program again, David, and plus 1-949-4682750, 3297 02:23:17,900 --> 02:23:20,970 Enter, it's been saved now to the file. 3298 02:23:20,970 --> 02:23:27,230 And if I close this file and I reopen code of phonebook.csv, 3299 02:23:27,230 --> 02:23:29,990 you'll see that the file is persisting. 3300 02:23:29,990 --> 02:23:32,053 And if I downloaded this to my Mac, or my PC, 3301 02:23:32,053 --> 02:23:33,470 I could double click the CSV file. 3302 02:23:33,470 --> 02:23:36,260 And voila, Excel would open up, or Apple Numbers, or the like. 3303 02:23:36,260 --> 02:23:38,527 And I've actually created an actual CSV file. 3304 02:23:38,527 --> 02:23:41,360 If you're smiling because I keep repeating my phone number out loud, 3305 02:23:41,360 --> 02:23:43,910 I would encourage you to call or text that number sometime. 3306 02:23:43,910 --> 02:23:46,100 It might very well be an Easter egg of sorts. 3307 02:23:46,100 --> 02:23:49,160 But via these functions here do we have now the ability 3308 02:23:49,160 --> 02:23:52,130 to write files input and output. 3309 02:23:52,130 --> 02:23:54,442 And among the goals then for this week, as we'll see, 3310 02:23:54,442 --> 02:23:56,900 are to actually play with images in the spirit of something 3311 02:23:56,900 --> 02:23:58,670 like Instagram filters or the like. 3312 02:23:58,670 --> 02:24:01,490 And we'll introduce you, for instance, to a file format called 3313 02:24:01,490 --> 02:24:05,990 BMPs, which to come full circle to the start of class, are just maps of bits, 3314 02:24:05,990 --> 02:24:09,830 but more than just single bits for white and black, but rather colorful patterns 3315 02:24:09,830 --> 02:24:10,500 as well. 3316 02:24:10,500 --> 02:24:12,500 And we'll give you images like this of the Weeks Bridge 3317 02:24:12,500 --> 02:24:13,875 here across the river at Harvard.
3318 02:24:13,875 --> 02:24:16,490 And you run, after writing your own code in C, 3319 02:24:16,490 --> 02:24:19,925 and understanding how the data is stored in the computer's memory, 3320 02:24:19,925 --> 02:24:22,550 you'll be able to apply your own Instagram-like filters to make 3321 02:24:22,550 --> 02:24:25,830 things grayscale instead, or sepia in this case. 3322 02:24:25,830 --> 02:24:28,820 You can even flip the bits around so that the thing is a mirror image. 3323 02:24:28,820 --> 02:24:30,530 You can blur things further. 3324 02:24:30,530 --> 02:24:32,510 Or if you really are feeling more comfortable, 3325 02:24:32,510 --> 02:24:35,570 you can even write code that finds the edges of the image 3326 02:24:35,570 --> 02:24:37,520 and creates works of art like these. 3327 02:24:37,520 --> 02:24:39,740 So all that and more in problem set four. 3328 02:24:39,740 --> 02:24:42,650 We will see you next time. 3329 02:24:42,650 --> 02:24:47,200 [MUSIC PLAYING] 3330 02:24:47,200 --> 02:25:19,000