WEBVTT X-TIMESTAMP-MAP=LOCAL:00:00:00.000,MPEGTS:900000 00:00:01.996 --> 00:00:07.485 [MUSIC PLAYING] 00:01:13.297 --> 00:01:14.380 DAVID J. MALAN: All right. 00:01:14.380 --> 00:01:15.940 This is CS50. 00:01:15.940 --> 00:01:17.260 And this is week four. 00:01:17.260 --> 00:01:19.872 And if you think back a few weeks ago already, in week zero, 00:01:19.872 --> 00:01:21.580 we started talking about what images are, 00:01:21.580 --> 00:01:25.690 and we talked about representation of images as this grid of pixels. 00:01:25.690 --> 00:01:28.923 And each pixel has some pattern of bits that defines its color. 00:01:28.923 --> 00:01:31.840 Well, it turns out today, we'll take a deeper look underneath the hood 00:01:31.840 --> 00:01:34.360 at how things like images, and so much more, 00:01:34.360 --> 00:01:37.240 are actually implemented using just these zeros and ones, 00:01:37.240 --> 00:01:40.300 and how now as a programmer, you can actually 00:01:40.300 --> 00:01:43.750 harness that, for better or for worse, to better understand and better 00:01:43.750 --> 00:01:46.090 manipulate what's going on inside of a computer's memory 00:01:46.090 --> 00:01:47.590 using a language like C. 00:01:47.590 --> 00:01:50.200 In fact, even this bowl of stress balls that we keep happening 00:01:50.200 --> 00:01:51.567 is just a photograph of course. 00:01:51.567 --> 00:01:54.400 But if you think back to week zero, if you sort of enhance, enhance, 00:01:54.400 --> 00:01:56.860 enhance this image, like they do in the movies, 00:01:56.860 --> 00:02:00.310 it actually doesn't work out the way you would think from Hollywood. 00:02:00.310 --> 00:02:04.900 As I continue to zoom in, and zoom in, and zoom in on a screen like this, 00:02:04.900 --> 00:02:06.470 you'll see that yes, it gets bigger. 00:02:06.470 --> 00:02:09.190 But if it gets too big, what do you start to notice? 00:02:09.190 --> 00:02:10.539 The so-called pixelation.
00:02:10.539 --> 00:02:12.550 And indeed, you can see the individual dots. 00:02:12.550 --> 00:02:16.810 So next time you watch some show or movie on TV 00:02:16.810 --> 00:02:19.030 that has this sort of notion of enhancing, 00:02:19.030 --> 00:02:20.680 there's actually a finite limit there. 00:02:20.680 --> 00:02:23.680 You can only enhance so far as there's actually information there. 00:02:23.680 --> 00:02:27.487 But once you zoom in to a certain level like this, that's all that's there. 00:02:27.487 --> 00:02:30.070 You're not going to see the glint of the suspect in some crime 00:02:30.070 --> 00:02:33.260 drama in their eye just because you've enhanced the image. 00:02:33.260 --> 00:02:36.130 There's only a finite amount of information actually there. 00:02:36.130 --> 00:02:39.085 But we'll see today too that by understanding 00:02:39.085 --> 00:02:40.960 what's going on inside of a computer's memory 00:02:40.960 --> 00:02:43.150 we can start to represent and even create and code 00:02:43.150 --> 00:02:44.180 more interesting things. 00:02:44.180 --> 00:02:49.060 So for instance, here is a bitmap, if you will, which is a term of art. 00:02:49.060 --> 00:02:51.037 A bitmap is a type of image. 00:02:51.037 --> 00:02:52.870 And it's a map of bits in the sense that you 00:02:52.870 --> 00:02:54.912 have this coordinate system of a top, down, left, 00:02:54.912 --> 00:02:57.530 right at least in this artist's representation here. 00:02:57.530 --> 00:03:02.050 And suppose that maybe we all decide as the world 00:03:02.050 --> 00:03:05.080 that one shall represent the color white and zero 00:03:05.080 --> 00:03:06.850 shall represent the color black. 00:03:06.850 --> 00:03:12.160 What might this map of bits, this bitmap, actually be? 00:03:12.160 --> 00:03:13.330 Can you see through it? 00:03:13.330 --> 00:03:13.830 Yeah. 00:03:13.830 --> 00:03:14.740 AUDIENCE: [INAUDIBLE] 00:03:14.740 --> 00:03:16.900 DAVID J. MALAN: It is indeed a smiley face. 
00:03:16.900 --> 00:03:18.070 So an amazing eye. 00:03:18.070 --> 00:03:21.430 If I actually turn all of the ones to white just to visualize this, 00:03:21.430 --> 00:03:23.595 you'll see indeed, this is what was embedded there. 00:03:23.595 --> 00:03:25.720 But of course, on our computer monitors and phones, 00:03:25.720 --> 00:03:28.390 we have this grid of squares, this grid of pixels. 00:03:28.390 --> 00:03:30.970 So indeed, if you were to actually see on your screen 00:03:30.970 --> 00:03:34.152 a smiley face, like a black and white one at that, what's probably going on 00:03:34.152 --> 00:03:36.610 underneath the hood is just some pattern of zeros and ones, 00:03:36.610 --> 00:03:39.520 and maybe single bits, one bit color, if you will, 00:03:39.520 --> 00:03:43.210 where one here represents white and zero represents black. 00:03:43.210 --> 00:03:45.370 So if you kind of like this thing, it turns out 00:03:45.370 --> 00:03:49.150 you can do pretty beautiful, pretty interesting, pretty artistically 00:03:49.150 --> 00:03:50.150 inclined things. 00:03:50.150 --> 00:03:53.568 If you go to this URL at your leisure, cs50.ly/art, 00:03:53.568 --> 00:03:56.860 it'll actually redirect you to a Google spreadsheet that we've made in advance. 00:03:56.860 --> 00:03:58.870 And we've kind of shrunk the rows and columns 00:03:58.870 --> 00:04:02.470 to resemble a grid of pixels, tiny little squares, all of which 00:04:02.470 --> 00:04:06.010 are white by default, not unlike this easel here 00:04:06.010 --> 00:04:08.500 that we have a couple of volunteers working away at. 00:04:08.500 --> 00:04:10.875 In fact, would you guys like to come forward for a moment 00:04:10.875 --> 00:04:13.040 and say a quick hello before we come back to you? 00:04:13.040 --> 00:04:13.630 DANIEL: Hello. 00:04:13.630 --> 00:04:14.530 My name is Daniel. 00:04:14.530 --> 00:04:15.542 I'm from Chicago. 00:04:15.542 --> 00:04:16.959 DAVID J. MALAN: Welcome, Daniel.
00:04:16.959 --> 00:04:17.320 And-- 00:04:17.320 --> 00:04:18.112 ADAM: Hi, everyone. 00:04:18.112 --> 00:04:19.000 I'm Adam. 00:04:19.000 --> 00:04:20.950 And I'm from Trinidad and Tobago. 00:04:20.950 --> 00:04:21.250 DAVID J. MALAN: Nice. 00:04:21.250 --> 00:04:22.333 Well, welcome to you both. 00:04:22.333 --> 00:04:23.020 Thank you. 00:04:23.020 --> 00:04:24.880 You'll see that in their hands are actually 00:04:24.880 --> 00:04:27.922 a whole bunch of pixels, post-it notes that we've handed them in advance. 00:04:27.922 --> 00:04:30.713 So if you don't mind, we'll come back to you in a couple of minutes 00:04:30.713 --> 00:04:33.640 and see what they've created, if you will, on this grid of white paper 00:04:33.640 --> 00:04:35.890 much like you could create on this Google spreadsheet. 00:04:35.890 --> 00:04:39.790 In fact, feel free to send us your creations if so inclined via the URL 00:04:39.790 --> 00:04:42.400 you'll get at cs50.ly/art. 00:04:42.400 --> 00:04:45.700 Now let's come back to week zero where we defined some of the building 00:04:45.700 --> 00:04:46.480 blocks for images. 00:04:46.480 --> 00:04:49.240 We talked about RGB, which is just red, green, blue. 00:04:49.240 --> 00:04:51.550 And it's just one of the systems, a popular system, 00:04:51.550 --> 00:04:53.890 via which you can represent any color of the rainbow 00:04:53.890 --> 00:04:57.950 using some combination of red, and green, and blue. 00:04:57.950 --> 00:05:00.190 And if any of you are artistically inclined 00:05:00.190 --> 00:05:02.920 or have used Photoshop or similar programs, 00:05:02.920 --> 00:05:05.560 you might typically have some means of selecting 00:05:05.560 --> 00:05:07.460 a color via some grid like this. 00:05:07.460 --> 00:05:09.910 But down here, notice there are explicit mentions 00:05:09.910 --> 00:05:11.800 of the types of color systems in use. 00:05:11.800 --> 00:05:13.150 RGB. 00:05:13.150 --> 00:05:15.820 And in fact, here, you see zero, zero, zero.
00:05:15.820 --> 00:05:18.460 And up here under New, you see the color black. 00:05:18.460 --> 00:05:21.340 And that implies that if you have no red, no green, no blue, well, 00:05:21.340 --> 00:05:24.340 that indeed would represent by convention the color black. 00:05:24.340 --> 00:05:27.910 By contrast, if we play around with Photoshop or any similar program, 00:05:27.910 --> 00:05:31.360 if you have a lot of red, a lot of green, and a lot of blue, 00:05:31.360 --> 00:05:36.880 for instance, 255, 255, 255, really crank it up to the max value, 00:05:36.880 --> 00:05:40.720 you can represent with 8 bits per week zero, well then it turns out 00:05:40.720 --> 00:05:42.490 you get the color white here. 00:05:42.490 --> 00:05:44.650 And we can play with these numbers endlessly. 00:05:44.650 --> 00:05:50.290 For instance, if we use 255 of red, but zero green and zero blue, 00:05:50.290 --> 00:05:54.100 not surprisingly, the square at the top of the screen becomes of course red 00:05:54.100 --> 00:05:57.020 entirely because it's all red and no green, no blue. 00:05:57.020 --> 00:06:01.790 If we change it instead to 255 for green but zero for red and blue, of course, 00:06:01.790 --> 00:06:02.500 we get green. 00:06:02.500 --> 00:06:06.130 And then lastly, if we crank up the blue but leave red and green as zero, 00:06:06.130 --> 00:06:07.360 we of course get blue. 00:06:07.360 --> 00:06:09.730 But all this while, down here highlighted 00:06:09.730 --> 00:06:12.210 is something that maybe some of you have seen before, 00:06:12.210 --> 00:06:14.250 like some combination of numbers and letters. 00:06:14.250 --> 00:06:18.130 If any of you have made personal web pages or used programs like Photoshop, 00:06:18.130 --> 00:06:20.130 you might have used these so-called color codes. 
00:06:20.130 --> 00:06:25.290 So indeed, the world has this convention whereby using six digits, or sometimes 00:06:25.290 --> 00:06:29.170 three, you can represent a little more succinctly some amount of red, 00:06:29.170 --> 00:06:30.330 green, blue. 00:06:30.330 --> 00:06:34.840 And you'll see here, maybe by inference, that if RGB is zero, 00:06:34.840 --> 00:06:38.430 zero, 255 respectively, perhaps where we're going with this 00:06:38.430 --> 00:06:43.200 is that zero, zero, zero, zero, FF is just an alternative way of expressing 00:06:43.200 --> 00:06:44.250 the exact same idea. 00:06:44.250 --> 00:06:46.710 No red, no green, and a lot of blue. 00:06:46.710 --> 00:06:48.210 But why is that? 00:06:48.210 --> 00:06:51.390 And in fact, we'll come full circle here to introducing something 00:06:51.390 --> 00:06:53.220 that we could have done in week zero, but it didn't really 00:06:53.220 --> 00:06:54.240 solve a problem then. 00:06:54.240 --> 00:06:57.900 But today, as we focus more on images and on memory itself, 00:06:57.900 --> 00:07:00.520 it turns out understanding these patterns is pretty useful. 00:07:00.520 --> 00:07:03.390 So back in week zero, we talked, of course, about binary. 00:07:03.390 --> 00:07:07.560 And binary, bi implying two, only gives you two digits, zero and one. 00:07:07.560 --> 00:07:10.530 You and I as humans almost always use the decimal system 00:07:10.530 --> 00:07:12.810 in normal conversation, dec meaning 10. 00:07:12.810 --> 00:07:15.270 So we have zero through nine instead. 00:07:15.270 --> 00:07:20.340 If a human like us wants to count up as high as 10, or 11, or 12, 00:07:20.340 --> 00:07:23.220 we don't have a digit per se for 10, 11, and 12. 00:07:23.220 --> 00:07:24.850 We start reusing digits. 00:07:24.850 --> 00:07:27.580 So it's one zero, one one, one two, and so forth.
00:07:27.580 --> 00:07:30.960 But in other systems, not binary, not decimal, 00:07:30.960 --> 00:07:34.950 but systems called hexadecimal, hex implying 16, 00:07:34.950 --> 00:07:39.240 there are actually more digits than these, which might come as a surprise. 00:07:39.240 --> 00:07:42.403 They're not pairs of digits, like 10 in decimal, but single digits. 00:07:42.403 --> 00:07:44.820 And frankly, it doesn't really matter what the digits are. 00:07:44.820 --> 00:07:46.020 Because at the end of the day, these are just 00:07:46.020 --> 00:07:49.080 symbols that you and I immediately associate with some notion of math, 00:07:49.080 --> 00:07:52.980 but just strokes on the screen that represent some-- 00:07:52.980 --> 00:07:54.670 represent some actual value. 00:07:54.670 --> 00:07:58.680 So it turns out that by convention, when you want more than nine-- 00:07:58.680 --> 00:08:01.380 10 digits, zero through nine, you start using 00:08:01.380 --> 00:08:06.090 letters of the English alphabet, A, B, C, D, E, and F. 00:08:06.090 --> 00:08:07.770 And you can represent them in lowercase. 00:08:07.770 --> 00:08:08.820 It's case insensitive. 00:08:08.820 --> 00:08:09.690 So it doesn't really matter. 00:08:09.690 --> 00:08:11.482 You might see it in uppercase or lowercase. 00:08:11.482 --> 00:08:14.190 But this is how you can count beyond nine not 00:08:14.190 --> 00:08:17.490 using decimal but using, indeed, something called hexadecimal. 00:08:17.490 --> 00:08:20.700 If we get really technical, this is also known as base-16. 00:08:20.700 --> 00:08:22.410 And it's the same idea as week zero where 00:08:22.410 --> 00:08:25.770 instead of using base-2 for binary, base-10 for decimal, 00:08:25.770 --> 00:08:28.500 you use 16 as the base for hexadecimal. 00:08:28.500 --> 00:08:31.740 And so if we run through just some simple examples here 00:08:31.740 --> 00:08:35.789 in the world of hexadecimal, your columns are just powers of 16.
00:08:35.789 --> 00:08:40.419 16 to the 0, 16 to the 1, 16 to the 2, and so forth. 00:08:40.419 --> 00:08:43.919 But in the world of hex, we usually, at least thus far, and today, we'll 00:08:43.919 --> 00:08:45.760 see just pairs of digits like this. 00:08:45.760 --> 00:08:49.260 So here, for instance, is the ones column, and the 16's column 00:08:49.260 --> 00:08:50.500 if we multiply that out. 00:08:50.500 --> 00:08:52.560 So if you wanted to represent the number you 00:08:52.560 --> 00:08:56.820 and I know in the real world as zero in hexadecimal, 00:08:56.820 --> 00:08:58.530 it would just be zero, zero. 00:08:58.530 --> 00:09:01.500 If you want to represent the number one, it would be zero one. 00:09:01.500 --> 00:09:05.466 And from there, we get zero two, zero three, zero four, zero five, zero six, 00:09:05.466 --> 00:09:10.230 zero seven, zero eight, zero nine, now things get potentially interesting. 00:09:10.230 --> 00:09:12.330 In decimal, it would obviously become 10. 00:09:12.330 --> 00:09:17.010 But in hexadecimal, it just becomes zero a then zero 00:09:17.010 --> 00:09:19.980 b, which is to say, if I rewind, after nine 00:09:19.980 --> 00:09:23.160 comes in hexadecimal, if I pronounce it in decimal, 00:09:23.160 --> 00:09:24.930 this is how you'd represent 10. 00:09:24.930 --> 00:09:30.450 This is how you'd represent 11, 12, 13, 14, and then lastly in hexadecimal, 00:09:30.450 --> 00:09:35.950 the 16th value is F, which is just always going to represent 15. 00:09:35.950 --> 00:09:39.210 So where-- how do we connect this to some of the past math? 00:09:39.210 --> 00:09:42.150 Well, once you get to zero F, in hexadecimal, 00:09:42.150 --> 00:09:45.060 if F is the highest you can count, just like in decimal, 00:09:45.060 --> 00:09:48.780 nine is the highest you can count, what comes next? 00:09:48.780 --> 00:09:53.250 If this is 15 I claim, how do I represent 16 in hexadecimal, 00:09:53.250 --> 00:09:56.520 with what pattern of symbols? 
00:09:56.520 --> 00:09:58.600 What pattern of symbols for hexadecimal? 00:09:58.600 --> 00:09:59.100 Yeah. 00:09:59.100 --> 00:09:59.892 AUDIENCE: One zero. 00:09:59.892 --> 00:10:02.760 DAVID J. MALAN: So one zero, not 10, even though you might read it 00:10:02.760 --> 00:10:04.080 like that as a typical human. 00:10:04.080 --> 00:10:05.400 But one zero. 00:10:05.400 --> 00:10:06.207 Because why? 00:10:06.207 --> 00:10:08.040 Well, even if this is completely new to you, 00:10:08.040 --> 00:10:11.340 the whole column system, the places, are exactly the same intuitively. 00:10:11.340 --> 00:10:15.360 So you need one in the 16's place and a zero in the ones place. 00:10:15.360 --> 00:10:17.612 And we won't count all the way up to 255, 00:10:17.612 --> 00:10:19.320 but if we count a little higher, 00:10:19.320 --> 00:10:24.180 this would be one zero, AKA 16 in decimal, this would be one one, 00:10:24.180 --> 00:10:30.510 AKA 17 in decimal, and then 18, 19, 20, and so forth, dot, dot, dot. 00:10:30.510 --> 00:10:33.300 And we can count all the way up to FF. 00:10:33.300 --> 00:10:36.180 Because if F is the biggest digit in hexadecimal, 00:10:36.180 --> 00:10:38.610 FF is indeed as high as we can count with two digits. 00:10:38.610 --> 00:10:42.570 And if each F represents 15, well, let's just do the math like in week zero. 00:10:42.570 --> 00:10:48.510 So 16 times F plus 1 times F is how all of us learned to do math in grade school, 00:10:48.510 --> 00:10:50.250 even though not in hexadecimal. 00:10:50.250 --> 00:10:54.480 That's of course 16 times 15 plus 1 times 15. 00:10:54.480 --> 00:10:57.810 Multiply that out, you get 240, plus 15. 00:10:57.810 --> 00:11:04.887 And ergo, you can count as high as 255 using two hexadecimal digits. 00:11:04.887 --> 00:11:06.720 Now this is not the kind of conversion that 00:11:06.720 --> 00:11:09.930 is going to be an interesting exercise to ever do mentally in your head.
00:11:09.930 --> 00:11:12.740 Generally, you'll get used to the fact that after nine comes 00:11:12.740 --> 00:11:15.063 A and the biggest digit is F. And you'll just 00:11:15.063 --> 00:11:17.480 start to see patterns like this in the world of Photoshop, 00:11:17.480 --> 00:11:19.520 web pages in a few weeks, and beyond. 00:11:19.520 --> 00:11:22.760 But why is hexadecimal useful? 00:11:22.760 --> 00:11:25.670 Why are we complicating the world and adding 00:11:25.670 --> 00:11:27.690 on top of decimal something else? 00:11:27.690 --> 00:11:30.260 Well, it turns out that a single hexadecimal digit, like F, 00:11:30.260 --> 00:11:32.960 the biggest one for instance, is 15. 00:11:32.960 --> 00:11:35.360 And here, let me just propose a bit of mental math. 00:11:35.360 --> 00:11:41.270 How many bits do you need to represent the number 15 in binary? 00:11:41.270 --> 00:11:45.630 If you've got the ones place, twos place, fours place, and so forth, 00:11:45.630 --> 00:11:47.166 how many bits total? 00:11:47.166 --> 00:11:47.980 AUDIENCE: Five. 00:11:47.980 --> 00:11:51.610 DAVID J. MALAN: So fewer than five to count as high as 15 I think. 00:11:51.610 --> 00:11:53.350 But close. 00:11:53.350 --> 00:11:55.960 Someone else? 00:11:55.960 --> 00:11:56.720 I'm seeing a hand. 00:11:56.720 --> 00:11:57.220 Yeah. 00:11:57.220 --> 00:11:57.845 AUDIENCE: Four. 00:11:57.845 --> 00:12:00.340 DAVID J. MALAN: So four bits I think suffice. 00:12:00.340 --> 00:12:04.000 Because if you want to count as high as F, that is to say 15, 00:12:04.000 --> 00:12:06.910 I think if you have four bits, you can do that. 00:12:06.910 --> 00:12:10.280 Because if over here is the ones place from week zero for binary, 00:12:10.280 --> 00:12:13.750 this is the twos place, this is the fours place, this is the eights place. 00:12:13.750 --> 00:12:14.710 Do some quick math. 00:12:14.710 --> 00:12:19.370 So 8 plus 4 is 12, plus 2 is 14, plus 1 is 15.
00:12:19.370 --> 00:12:22.690 So it turns out that, conveniently, hexadecimal digits 00:12:22.690 --> 00:12:26.360 can just be represented consistently with four bits or fewer. 00:12:26.360 --> 00:12:27.190 But four. 00:12:27.190 --> 00:12:29.140 And four, of course, is half of eight. 00:12:29.140 --> 00:12:32.270 And eight is everywhere, like 8 bits is a byte, which is, again, 00:12:32.270 --> 00:12:33.590 just a convention we've seen. 00:12:33.590 --> 00:12:36.940 And so the reason that you see hexadecimal in the world of Photoshop, 00:12:36.940 --> 00:12:39.910 and eventually web pages, is that it actually just maps 00:12:39.910 --> 00:12:43.330 really nicely to expressing binary numbers more 00:12:43.330 --> 00:12:45.800 succinctly with a fixed number of digits. 00:12:45.800 --> 00:12:52.677 So for instance, any time you see 11111111 in the world as binary, 00:12:52.677 --> 00:12:53.260 you know what? 00:12:53.260 --> 00:12:55.600 That's a little tedious to both say and write. 00:12:55.600 --> 00:13:02.230 You can represent any group of four 1 bits more succinctly 00:13:02.230 --> 00:13:09.310 in hexadecimal as just F. So 11111111 in binary, more commonly now 00:13:09.310 --> 00:13:13.000 in the world of Photoshop, memory, images, and the like, 00:13:13.000 --> 00:13:14.950 is represented more succinctly as FF. 00:13:14.950 --> 00:13:18.350 And that's why: it just maps really nicely to 4 bits. 00:13:18.350 --> 00:13:20.630 And so we can be a little more succinct. 00:13:20.630 --> 00:13:23.710 So any questions on hexadecimal, which is just 00:13:23.710 --> 00:13:27.110 another way of representing information but using the same grade school 00:13:27.110 --> 00:13:27.610 approach? 00:13:27.610 --> 00:13:28.110 Yeah. 00:13:28.110 --> 00:13:28.780 AUDIENCE: So-- 00:13:28.780 --> 00:13:30.030 DAVID J. MALAN: Good question. 00:13:30.030 --> 00:13:33.130 If you represent 15 with F, it would use 4 bits.
00:13:33.130 --> 00:13:37.450 So base systems are really just a way for us humans on paper or on screens 00:13:37.450 --> 00:13:38.830 to represent information. 00:13:38.830 --> 00:13:42.820 If F represents the decimal number 15, the computer underneath the hood 00:13:42.820 --> 00:13:45.610 has to use 4 bits to represent it. 00:13:45.610 --> 00:13:48.700 So one hexadecimal digit by convention always 00:13:48.700 --> 00:13:51.160 implies 4 bits underneath the hood. 00:13:51.160 --> 00:13:53.680 So therefore, if you have two hexadecimal digits, 00:13:53.680 --> 00:13:57.430 like zero, zero, that means eight zero bits underneath the hood 00:13:57.430 --> 00:13:59.200 like for red or for green. 00:13:59.200 --> 00:14:03.460 If you see FF, now we know that's 4 one bits and another 4 one bits. 00:14:03.460 --> 00:14:05.650 And if we do out the math, that's 255. 00:14:05.650 --> 00:14:14.320 That's why in Photoshop, 0000FF means no red, no green, and 255 of blue. 00:14:14.320 --> 00:14:17.050 And it's just way more succinct than writing out what, 8 plus 8, 00:14:17.050 --> 00:14:19.090 plus 8, 24 zeros and ones. 00:14:19.090 --> 00:14:21.370 And it's just cleaner than even using decimal 00:14:21.370 --> 00:14:25.273 when you're using units of eight, which again computers just use everywhere. 00:14:25.273 --> 00:14:26.440 So it's just another system. 00:14:26.440 --> 00:14:28.273 It's not one you need to dwell on very much. 00:14:28.273 --> 00:14:32.110 But again, it's fundamentally no different from binary or decimal. 00:14:32.110 --> 00:14:34.977 We're just using a slightly different base. 00:14:34.977 --> 00:14:35.560 Now all right. 00:14:35.560 --> 00:14:37.720 Well, we had this blank canvas here. 00:14:37.720 --> 00:14:40.600 And I think, are you two perhaps ready to reveal 00:14:40.600 --> 00:14:42.140 for the world what you've created? 00:14:42.140 --> 00:14:43.348 Do you want to go ahead and-- 00:14:43.348 --> 00:14:45.170 I'll swivel it around for you. 
00:14:45.170 --> 00:14:45.670 All right. 00:14:45.670 --> 00:14:46.180 Here we go. 00:14:46.180 --> 00:14:46.870 Big reveal. 00:14:46.870 --> 00:14:51.760 And today's pixel art, a round of applause if we could. 00:14:54.280 --> 00:14:55.325 Very nicely done. 00:14:55.325 --> 00:14:56.200 Well, thank you both. 00:14:56.200 --> 00:14:58.600 If you want to come up after, and tear this off, and bring it home, 00:14:58.600 --> 00:15:00.430 you're welcome to, and keep the post-it notes too. 00:15:00.430 --> 00:15:02.140 Well, thank you to our volunteers there. 00:15:02.140 --> 00:15:05.793 Let's now translate this to the more technical world 00:15:05.793 --> 00:15:07.960 where we're going to see and consider it more often. 00:15:07.960 --> 00:15:10.570 Because in fact, sometimes, when you've had error messages 00:15:10.570 --> 00:15:13.000 over the past few weeks from clang, the compiler, 00:15:13.000 --> 00:15:15.483 you might have even seen evidence of hexadecimal. 00:15:15.483 --> 00:15:16.400 We didn't call it out. 00:15:16.400 --> 00:15:17.980 It wasn't useful to know at the time. 00:15:17.980 --> 00:15:21.880 But it turns out a lot of programs, and a lot of code, 00:15:21.880 --> 00:15:25.490 use hexadecimal for those reasons of more precise-- 00:15:25.490 --> 00:15:26.930 more succinct representation. 00:15:26.930 --> 00:15:28.840 So for instance, where else might we see it? 00:15:28.840 --> 00:15:31.990 Well, here's that picture we keep pulling up of our computer's memory. 00:15:31.990 --> 00:15:34.330 And each of these squares in this grid represents 00:15:34.330 --> 00:15:37.210 a byte, sort of top left to bottom right in the computer's memory. 00:15:37.210 --> 00:15:39.730 But again, just an artist's representation. 00:15:39.730 --> 00:15:43.570 A few weeks ago, I claimed that each of these bytes can be numbered of course.
00:15:43.570 --> 00:15:46.300 Like this is byte 0 at top left, then byte one, then 00:15:46.300 --> 00:15:49.760 byte two, then byte two billion if you have 2 gigabytes of memory. 00:15:49.760 --> 00:15:54.760 And so we could just number them like this, zero through 15 on up. 00:15:54.760 --> 00:15:56.480 16, 17, 18, and so forth. 00:15:56.480 --> 00:16:00.500 But per the reasons earlier, it's just more common in computer systems 00:16:00.500 --> 00:16:03.340 and in software to actually use hexadecimal just 00:16:03.340 --> 00:16:07.030 to describe the locations of, the addresses, of things in memory. 00:16:07.030 --> 00:16:10.120 So instead, a typical programmer, or a computer scientist, 00:16:10.120 --> 00:16:14.230 would call these first 16 bytes zero through F, and not just arbitrarily. 00:16:14.230 --> 00:16:17.000 It's because each hex digit is a predictable number of bits. 00:16:17.000 --> 00:16:21.670 So if we keep going beyond that, you would get not 10, not 11, not 12, 00:16:21.670 --> 00:16:25.900 but in hexadecimal, one, zero, one, one, one, two, and so forth, 00:16:25.900 --> 00:16:30.520 all the way down on the screen to one F. And if I shrunk this down or had 00:16:30.520 --> 00:16:34.450 a bigger monitor, we would eventually see, 255 bytes later 00:16:34.450 --> 00:16:37.400 from the start, FF as well. 00:16:37.400 --> 00:16:40.840 But there's a potential problem here with using hexadecimal in this way. 00:16:40.840 --> 00:16:42.730 There's an ambiguity. 00:16:42.730 --> 00:16:49.180 Can anyone imagine what can go wrong if we use hex to just simply describe 00:16:49.180 --> 00:16:52.960 locations in memory like this? 00:16:52.960 --> 00:16:53.650 Yeah. 00:16:53.650 --> 00:16:55.285 AUDIENCE: One zero might also be 10. 00:16:55.285 --> 00:16:56.160 DAVID J. MALAN: Yeah. 00:16:56.160 --> 00:16:57.960 One zero might also be 10. 00:16:57.960 --> 00:17:01.090 And maybe if you're really thorough, OK, wait a minute.
00:17:01.090 --> 00:17:02.950 It can't be 10 because here's F over here. 00:17:02.950 --> 00:17:04.200 So it's obviously not decimal. 00:17:04.200 --> 00:17:07.079 But why create potential confusion, especially when you're collaborating, 00:17:07.079 --> 00:17:08.412 building something with someone? 00:17:08.412 --> 00:17:09.760 We want to avoid that ambiguity. 00:17:09.760 --> 00:17:12.359 And so the convention humans decided on years ago 00:17:12.359 --> 00:17:16.380 is that if you want to make clear that a number is in hexadecimal just 00:17:16.380 --> 00:17:20.790 by convention, you prefix all of the digits with 0x. 00:17:20.790 --> 00:17:22.950 The X is not another character. 00:17:22.950 --> 00:17:24.720 It's not a 17th character. 00:17:24.720 --> 00:17:29.700 It's just a human convention of putting 0x to imply, here comes hexadecimal. 00:17:29.700 --> 00:17:31.020 And now it's unambiguous. 00:17:31.020 --> 00:17:35.760 So now we see 0x10 obviously is not 10 as we know it in decimal. 00:17:35.760 --> 00:17:39.060 But rather it's the number that comes after a single F. 00:17:39.060 --> 00:17:41.430 So it's really the number in decimal 16. 00:17:41.430 --> 00:17:46.620 So 0x, any time you see it, that's just a visual cue that what is ahead 00:17:46.620 --> 00:17:48.940 is actually hexadecimal. 00:17:48.940 --> 00:17:52.480 So let's now start playing around with this information. 00:17:52.480 --> 00:17:54.750 So here's a super simple line of code from week one 00:17:54.750 --> 00:17:59.445 where I'm just declaring a variable n, and I'm defining it to be the value 50. 00:17:59.445 --> 00:18:00.570 And this is out of context. 00:18:00.570 --> 00:18:02.612 We probably need a main function and all of that. 00:18:02.612 --> 00:18:05.820 But let's just rewind to week one where we actually saw code like this 00:18:05.820 --> 00:18:08.530 and do something useful with a line of code like this. 00:18:08.530 --> 00:18:10.500 So let me go over here to VS Code. 
00:18:10.500 --> 00:18:14.070 And in VS Code, I'll create a program called, how about addresses? 00:18:14.070 --> 00:18:15.900 Since the goal of this-- 00:18:15.900 --> 00:18:20.310 the goal here is to just play around, ultimately, with a variable like n. 00:18:20.310 --> 00:18:21.750 And let me go ahead and do this. 00:18:21.750 --> 00:18:24.510 I'll include, how about stdio.h? 00:18:24.510 --> 00:18:25.770 I'll do int main void. 00:18:25.770 --> 00:18:28.260 So no command line arguments for now. 00:18:28.260 --> 00:18:30.090 Int n gets 50. 00:18:30.090 --> 00:18:32.950 And now so that we can do something mildly useful with it, 00:18:32.950 --> 00:18:37.950 let's just go use printf and print out with %i and then a new line whatever 00:18:37.950 --> 00:18:38.970 that value of n is. 00:18:38.970 --> 00:18:41.020 So this is not going to be interesting per se. 00:18:41.020 --> 00:18:43.620 It's just week one stuff where I'm defining a variable 00:18:43.620 --> 00:18:45.610 and printing it out to the screen. 00:18:45.610 --> 00:18:49.290 So let me go down to my terminal window and do make addresses. 00:18:49.290 --> 00:18:50.405 No errors. 00:18:50.405 --> 00:18:51.030 So that's good. 00:18:51.030 --> 00:18:52.440 I'll do dot slash addresses. 00:18:52.440 --> 00:18:55.500 And of course, I should see the number 50 here. 00:18:55.500 --> 00:18:57.360 Now what's going on underneath the hood? 00:18:57.360 --> 00:19:00.900 Let's now translate this code to what's really going 00:19:00.900 --> 00:19:03.430 on underneath the hood of the computer. 00:19:03.430 --> 00:19:05.793 So if this is our grid of memory, I don't necessarily 00:19:05.793 --> 00:19:07.710 know as the programmer, and I definitely don't 00:19:07.710 --> 00:19:10.440 care as the programmer, where exactly it's ending up in memory. 00:19:10.440 --> 00:19:11.670 That's the whole point of using code. 00:19:11.670 --> 00:19:13.140 Let the computer figure this out.
00:19:13.140 --> 00:19:17.430 But at least conceptually, I know that by declaring a line of code like that, 00:19:17.430 --> 00:19:21.250 the number 50 ends up somewhere in the computer's memory. 00:19:21.250 --> 00:19:25.980 And it's assigned the name n, a symbol n, by which I, the programmer, 00:19:25.980 --> 00:19:26.890 can refer to it. 00:19:26.890 --> 00:19:33.810 And I very deliberately used four of these squares for what reason? 00:19:33.810 --> 00:19:37.260 What might be the reason for using four squares specifically? 00:19:37.260 --> 00:19:38.100 Yeah. 00:19:38.100 --> 00:19:39.660 Yeah, so an integer is 4 bytes. 00:19:39.660 --> 00:19:42.870 At least most of the time on modern systems, an integer is 4 bytes. 00:19:42.870 --> 00:19:44.760 On an older computer, it might use fewer, 00:19:44.760 --> 00:19:46.890 maybe just 2 bytes. 00:19:46.890 --> 00:19:49.710 But here, by convention, we're almost always going to see 4 bytes. 00:19:49.710 --> 00:19:51.190 I don't know if it's going to end up here. 00:19:51.190 --> 00:19:52.330 It might end up over here. 00:19:52.330 --> 00:19:53.550 But for now, who cares? 00:19:53.550 --> 00:19:56.340 I just know that the computer can store the information 00:19:56.340 --> 00:19:58.450 in this way underneath the hood. 00:19:58.450 --> 00:20:01.560 So let's now introduce another feature of C 00:20:01.560 --> 00:20:04.260 that we haven't had occasion to use just yet that's 00:20:04.260 --> 00:20:07.560 going to allow us to start poking around the computer's memory 00:20:07.560 --> 00:20:08.640 for better or for worse. 00:20:08.640 --> 00:20:10.348 And this is one of those situations where 00:20:10.348 --> 00:20:14.850 you're about to learn, acquire a skill, a power, that can actually come back 00:20:14.850 --> 00:20:15.420 to bite you. 00:20:15.420 --> 00:20:18.510 Because once you know how to start poking around a computer's memory, 00:20:18.510 --> 00:20:20.050 you can do very powerful things.
00:20:20.050 --> 00:20:22.920 And next week, we'll see what you can build in a computer's memory, 00:20:22.920 --> 00:20:24.997 but you can also screw up pretty easily and cause 00:20:24.997 --> 00:20:28.080 more of those segmentation faults that a few of you have already suffered. 00:20:28.080 --> 00:20:31.350 So with that said, let's just stipulate that you know what? 00:20:31.350 --> 00:20:34.510 I don't care necessarily where the 50 is in memory. 00:20:34.510 --> 00:20:37.230 But I know it exists at some address in memory. 00:20:37.230 --> 00:20:39.300 And just so I have an easy address to pronounce, 00:20:39.300 --> 00:20:42.060 let's just suppose it lives at 0x123. 00:20:42.060 --> 00:20:45.180 So that's the address in memory in hexadecimal by convention. 00:20:45.180 --> 00:20:48.730 And that just happens to be where it ends up when I write that line of code. 00:20:48.730 --> 00:20:52.770 But it turns out, C has some other operators we can use. 00:20:52.770 --> 00:20:55.290 We've seen the asterisk before, the star, and we've 00:20:55.290 --> 00:20:56.588 used it for multiplication. 00:20:56.588 --> 00:20:59.130 But today, we're going to use it for something more powerful. 00:20:59.130 --> 00:21:01.338 And we're also going to introduce an ampersand, which 00:21:01.338 --> 00:21:02.970 allows us to do something else as well. 00:21:02.970 --> 00:21:06.270 The ampersand operator is going to allow us 00:21:06.270 --> 00:21:11.970 to get the address of a piece of data in memory. Literally by putting 00:21:11.970 --> 00:21:14.520 an ampersand before the name of a variable, C 00:21:14.520 --> 00:21:18.210 will tell you what address that variable lives at. 00:21:18.210 --> 00:21:20.730 Maybe it's 0x123, maybe it's 0x456. 00:21:20.730 --> 00:21:21.270 Who knows? 00:21:21.270 --> 00:21:23.610 But that will give you back the answer. 00:21:23.610 --> 00:21:25.360 The star does the opposite. 00:21:25.360 --> 00:21:26.710 It sort of means, go there.
00:21:26.710 --> 00:21:30.090 So using the star, otherwise known as the dereference operator, 00:21:30.090 --> 00:21:33.000 I can actually go to a specific address if I want. 00:21:33.000 --> 00:21:35.230 And we'll see what this means in code. 00:21:35.230 --> 00:21:39.510 So how can I leverage this in some mildly interesting way 00:21:39.510 --> 00:21:40.470 to start poking around? 00:21:40.470 --> 00:21:44.650 But eventually, we'll use this primitive to build more interesting things. 00:21:44.650 --> 00:21:47.520 So let me go back to, say, VS Code here. 00:21:47.520 --> 00:21:49.350 And let me go ahead and do this. 00:21:49.350 --> 00:21:51.210 I'll clear my terminal to start fresh. 00:21:51.210 --> 00:21:55.430 And I'll introduce another format code for printf, %p. 00:21:55.430 --> 00:21:59.630 And for now, just take it on faith that it's %p, as in pointer. 00:21:59.630 --> 00:22:05.420 But %p is going to allow me to print the address of a variable if I additionally 00:22:05.420 --> 00:22:08.150 tell C, get the address of n. 00:22:08.150 --> 00:22:10.340 So I'm changing %i to %p. 00:22:10.340 --> 00:22:13.700 And that's just something you have to do when printing addresses for now. 00:22:13.700 --> 00:22:17.310 But I also need to add an ampersand in front of the variable name. 00:22:17.310 --> 00:22:19.220 So I don't print n, the number 50. 00:22:19.220 --> 00:22:21.178 I print out something like 0x123. 00:22:21.178 --> 00:22:22.970 And it's not going to be as simple as that. 00:22:22.970 --> 00:22:24.970 We'll see on the screen, though, where it actually 00:22:24.970 --> 00:22:27.090 ended up in my codespace's memory. 00:22:27.090 --> 00:22:28.490 So here we go. 00:22:28.490 --> 00:22:32.720 Down in my terminal, make addresses again to recompile. 00:22:32.720 --> 00:22:37.610 And now, dot slash addresses should reveal not the value of 50, 00:22:37.610 --> 00:22:40.310 but the address of 50. 00:22:40.310 --> 00:22:41.570 And there it is.
00:22:41.570 --> 00:22:42.890 It's pretty long. 00:22:42.890 --> 00:22:45.230 It's not quite as simple and pretty as 0x123. 00:22:45.230 --> 00:22:47.720 But there's the 0x, meaning here's a hexadecimal address. 00:22:47.720 --> 00:22:52.070 And it's 7ffcc784a04c. 00:22:52.070 --> 00:22:55.310 Suffice it to say your codespace, and even your Macs and PCs nowadays, 00:22:55.310 --> 00:22:56.760 have a lot of memory. 00:22:56.760 --> 00:23:00.320 That's why, in part, this address is so big, not as small 00:23:00.320 --> 00:23:02.040 as the thing on my slide. 00:23:02.040 --> 00:23:05.840 So this at the moment isn't that useful yet. 00:23:05.840 --> 00:23:09.800 But it introduces us to a concept that we'll now call pointers. 00:23:09.800 --> 00:23:14.420 And pointers are admittedly one of the more challenging aspects of C. 00:23:14.420 --> 00:23:18.643 And if in future life, you tell friends that, oh, I took a class called CS50, 00:23:18.643 --> 00:23:20.810 and we learned C, you'll probably get kind of a look 00:23:20.810 --> 00:23:22.460 from people like, why did you learn C? 00:23:22.460 --> 00:23:23.900 Or like, oh, C was hard. 00:23:23.900 --> 00:23:27.380 And it's largely because of this topic, which 00:23:27.380 --> 00:23:30.812 isn't to say that it's that hard to wrap your mind around. 00:23:30.812 --> 00:23:32.270 But it's definitely very different. 00:23:32.270 --> 00:23:36.080 And it's not a feature that you can harness in higher level languages 00:23:36.080 --> 00:23:39.800 that we'll see in class too, like Python, and Java, and the like. 00:23:39.800 --> 00:23:42.290 C is about as close to the computer's hardware, 00:23:42.290 --> 00:23:45.350 so to speak, as you can get before things get actually scary, 00:23:45.350 --> 00:23:48.770 the so-called assembly language we saw in week two when we discussed compiling, 00:23:48.770 --> 00:23:50.900 assembling, linking, and all of that. 00:23:50.900 --> 00:23:52.100 That gets really low level.
00:23:52.100 --> 00:23:55.310 And you really have to be an expert with the computer's CPU, or brain, 00:23:55.310 --> 00:23:56.340 to understand that. 00:23:56.340 --> 00:23:59.730 But with C, you can actually poke around the computer's memory 00:23:59.730 --> 00:24:01.130 and do powerful things with that. 00:24:01.130 --> 00:24:03.470 But again, with great power comes responsibility. 00:24:03.470 --> 00:24:07.670 It's very easy to break programs by misusing memory or just having a bug 00:24:07.670 --> 00:24:11.220 that touches memory in some way that you don't intend. 00:24:11.220 --> 00:24:16.370 So pointers, at the end of the day, are pretty much what we just saw. 00:24:16.370 --> 00:24:22.910 A pointer is really just a variable that contains the address of some value. 00:24:22.910 --> 00:24:25.790 A pointer is a variable that contains the address of some value, 00:24:25.790 --> 00:24:28.280 or more simply, it's fine to think of it as an address. 00:24:28.280 --> 00:24:31.650 A pointer is an address of something in the computer's memory. 00:24:31.650 --> 00:24:35.880 Now, what might we do to actualize this? 00:24:35.880 --> 00:24:37.440 Well, here's two lines of code. 00:24:37.440 --> 00:24:41.600 It turns out by using our two new operators today, I can declare an int, 00:24:41.600 --> 00:24:45.770 call it n, and assign it a value like 50, just like before. 00:24:45.770 --> 00:24:49.400 If I want to store the address of n in a variable, 00:24:49.400 --> 00:24:51.680 and not just print it immediately via printf, 00:24:51.680 --> 00:24:54.458 I can declare a variable, for instance, called p. 00:24:54.458 --> 00:24:56.750 But I could call it anything I want, like any variable. 00:24:56.750 --> 00:24:59.960 But because it's an address, it's not int p. 00:24:59.960 --> 00:25:03.410 It has to be int star p, so to speak. 
00:25:03.410 --> 00:25:06.590 And the star here on the left hand side of the equal sign 00:25:06.590 --> 00:25:10.790 is just a clue to C that p is going to be a pointer. 00:25:10.790 --> 00:25:13.520 That is, p is going to be the address of what? 00:25:13.520 --> 00:25:15.450 The address of an integer. 00:25:15.450 --> 00:25:17.540 Now technically, it's still an integer itself 00:25:17.540 --> 00:25:21.230 because an address is just a number whether it's 1, 2, 3, or 0x123. 00:25:21.230 --> 00:25:23.310 So this is really just a semantic difference. 00:25:23.310 --> 00:25:26.450 So int star p just means that this variable doesn't 00:25:26.450 --> 00:25:28.580 contain any old number, like 50. 00:25:28.580 --> 00:25:33.570 It specifically contains a number that is the address of something else. 00:25:33.570 --> 00:25:35.670 So how can I now use this? 00:25:35.670 --> 00:25:37.400 Well, let me go back to VS Code. 00:25:37.400 --> 00:25:41.670 And let me propose that we add a line of code like that. 00:25:41.670 --> 00:25:44.220 So instead of just directly printing out that value, 00:25:44.220 --> 00:25:48.170 let's go ahead and define a second variable called p that's of type int 00:25:48.170 --> 00:25:53.480 star, set it equal to ampersand n, and then this time, 00:25:53.480 --> 00:25:55.460 let's not just print out ampersand n. 00:25:55.460 --> 00:25:57.530 Let's actually print out the value of p. 00:25:57.530 --> 00:25:59.810 So the only two new things here, if I zoom in, 00:25:59.810 --> 00:26:04.160 are that I've used not only the ampersand on the right to get the address of n, 00:26:04.160 --> 00:26:07.130 but also the star on the left to tell C 00:26:07.130 --> 00:26:10.190 that p is still a variable as always. 00:26:10.190 --> 00:26:11.450 But it's a pointer. 00:26:11.450 --> 00:26:14.897 It is the address of some other value like this. 00:26:14.897 --> 00:26:17.480 And I'm still going to print it with the same format code, %p.
00:26:17.480 --> 00:26:18.660 So that doesn't change. 00:26:18.660 --> 00:26:24.680 So let me go ahead and zoom out and do make addresses, and ./addresses. 00:26:24.680 --> 00:26:27.620 And there it is, exactly the same thing. 00:26:27.620 --> 00:26:30.500 Now in and of itself, not that useful yet. 00:26:30.500 --> 00:26:34.280 But the fact that you can now access the addresses of things in memory 00:26:34.280 --> 00:26:38.360 means that we'll be able to build things, and construct things, and link 00:26:38.360 --> 00:26:41.690 things together by knowing where they live, so to speak. 00:26:41.690 --> 00:26:44.570 So any questions on this technique thus far? 00:26:44.570 --> 00:26:45.511 Yeah. 00:26:45.511 --> 00:26:48.767 AUDIENCE: I guess I'm a little confused about the [INAUDIBLE]. 00:26:48.767 --> 00:26:50.100 DAVID J. MALAN: A good question. 00:26:50.100 --> 00:26:52.320 On line six, must it be star p and ampersand? 00:26:52.320 --> 00:26:54.210 And in this case, yes. 00:26:54.210 --> 00:26:55.290 Because what am I doing? 00:26:55.290 --> 00:26:58.050 On the left, and I'll get rid of the equal sign for now, 00:26:58.050 --> 00:27:02.640 this would give me a variable called p that's not an integer per se, 00:27:02.640 --> 00:27:04.710 but that's the address of an integer. 00:27:04.710 --> 00:27:08.110 But without the equal sign, I'm not storing anything in that variable. 00:27:08.110 --> 00:27:12.390 So by adding the equal sign and then ampersand n, 00:27:12.390 --> 00:27:16.650 I am explicitly figuring out with ampersand what the address of n 00:27:16.650 --> 00:27:20.100 is, which already exists per line five, and tucking it away 00:27:20.100 --> 00:27:23.310 in this new variable called p. 00:27:23.310 --> 00:27:24.100 Other questions? 00:27:24.100 --> 00:27:24.600 Yeah. 00:27:24.600 --> 00:27:26.570 AUDIENCE: [INAUDIBLE] 00:27:26.570 --> 00:27:27.820 DAVID J. MALAN: Good question.
00:27:27.820 --> 00:27:30.695 Every time I run the program, it uses up a different piece of memory? 00:27:30.695 --> 00:27:31.770 Short answer, yes. 00:27:31.770 --> 00:27:33.628 Computers, though, long story short, also 00:27:33.628 --> 00:27:35.170 have something called virtual memory. 00:27:35.170 --> 00:27:37.337 So if you run it again and again, you might actually 00:27:37.337 --> 00:27:40.590 see the same addresses on the same Mac, or PC, or cloud-based server. 00:27:40.590 --> 00:27:44.580 But we'll see in a bit how, at a high level, memory is laid out. 00:27:44.580 --> 00:27:47.447 But it will always exist at some address. 00:27:47.447 --> 00:27:48.030 Good question. 00:27:48.030 --> 00:27:48.530 Yeah. 00:27:48.530 --> 00:27:50.120 AUDIENCE: [INAUDIBLE] 00:27:50.120 --> 00:27:51.120 DAVID J. MALAN: Correct. 00:27:51.120 --> 00:27:52.980 Ampersand n is the address of n. 00:27:52.980 --> 00:27:57.660 And int star p is a pointer called p. 00:27:57.660 --> 00:28:02.550 And honestly, in an ideal world, if C were made today and not decades ago 00:28:02.550 --> 00:28:05.030 when humans were first creating languages, 00:28:05.030 --> 00:28:08.220 ideally, we would just have a data type called pointer. 00:28:08.220 --> 00:28:11.370 And then this would be a little less complicated because it would literally 00:28:11.370 --> 00:28:12.480 be what it says. 00:28:12.480 --> 00:28:14.290 The humans who invented C didn't do this. 00:28:14.290 --> 00:28:15.580 But this is the idea. 00:28:15.580 --> 00:28:18.060 So pointer is not a legitimate word in the code. 00:28:18.060 --> 00:28:20.070 It is a term of art in English. 00:28:20.070 --> 00:28:21.780 But this is really just the idea. 00:28:21.780 --> 00:28:25.140 But the way you express pointer as a data type 00:28:25.140 --> 00:28:29.530 is a little more cryptic: int star p here. 00:28:29.530 --> 00:28:34.643 But notice in line seven, when I print out p, I don't use a star. 00:28:34.643 --> 00:28:35.685 I don't use an ampersand.
00:28:35.685 --> 00:28:36.090 Why? 00:28:36.090 --> 00:28:38.040 I literally just want to print the value of p. 00:28:38.040 --> 00:28:39.748 And we've been doing that since week one. 00:28:39.748 --> 00:28:42.930 If you want to print a variable, just describe the variable by its name. 00:28:42.930 --> 00:28:44.400 No special syntax. 00:28:44.400 --> 00:28:46.690 Any other questions on this thus far? 00:28:46.690 --> 00:28:48.148 AUDIENCE: [INAUDIBLE] 00:28:48.148 --> 00:28:50.440 DAVID J. MALAN: What's the advantage of using pointers? 00:28:50.440 --> 00:28:53.560 With pointers, we'll see today some applications of them, 00:28:53.560 --> 00:28:55.572 really the idea is going to come to fruition 00:28:55.572 --> 00:28:57.280 next week when we're going to create what 00:28:57.280 --> 00:29:01.330 are called data structures in memory, where we can build not just, 00:29:01.330 --> 00:29:04.548 for instance, one dimensional data structures like an array. 00:29:04.548 --> 00:29:06.340 We'll see next week, we can actually create 00:29:06.340 --> 00:29:08.500 the equivalent of two dimensional data structures, 00:29:08.500 --> 00:29:10.250 or even three dimensional data structures, 00:29:10.250 --> 00:29:12.852 by using these addresses and sort of linking things together. 00:29:12.852 --> 00:29:14.810 And we'll see the beginnings of that this week. 00:29:14.810 --> 00:29:18.160 But for now, focus at least for now on just really the syntax 00:29:18.160 --> 00:29:20.485 and what these building blocks can do for us. 00:29:20.485 --> 00:29:22.960 AUDIENCE: Does the p pointer have to be an integer? 00:29:22.960 --> 00:29:25.150 DAVID J. MALAN: Does the p integer-- 00:29:25.150 --> 00:29:27.922 does the p pointer have to be an-- point to an integer? 00:29:27.922 --> 00:29:28.630 Short answer, no. 00:29:28.630 --> 00:29:29.500 And we'll come back to this. 
00:29:29.500 --> 00:29:31.250 For now, for the sake of discussion, we're 00:29:31.250 --> 00:29:33.460 only dealing with integers like the number 50. 00:29:33.460 --> 00:29:35.410 You mentioned strings, or characters. 00:29:35.410 --> 00:29:36.010 Absolutely. 00:29:36.010 --> 00:29:38.060 We're about to go there soon. 00:29:38.060 --> 00:29:41.510 So you can use the address of anything you want in the computer's memory. 00:29:41.510 --> 00:29:44.643 So in fact, let's translate this now to just the same picture just 00:29:44.643 --> 00:29:47.560 to help you wrap your minds around what these two lines of code really 00:29:47.560 --> 00:29:49.000 fundamentally are doing. 00:29:49.000 --> 00:29:51.460 So if I come back to my grid of memory here, 00:29:51.460 --> 00:29:54.910 let's plop the number 50 in the variable n at the bottom right, 00:29:54.910 --> 00:29:56.000 like it was before. 00:29:56.000 --> 00:29:58.040 So this is that first line of code as before. 00:29:58.040 --> 00:30:03.080 But with the new second line of code, as soon as I create p, what do I do? 00:30:03.080 --> 00:30:07.177 Well, first, remember that n lives somewhere in the computer's memory. 00:30:07.177 --> 00:30:09.010 Usually, I don't care precisely where it is. 00:30:09.010 --> 00:30:10.885 But for the sake of discussion, let's suppose 00:30:10.885 --> 00:30:14.480 it's at 0x123, which is easier to say than where it actually ended up. 00:30:14.480 --> 00:30:15.820 And now what is p? 00:30:15.820 --> 00:30:17.630 Well, p is just another variable. 00:30:17.630 --> 00:30:19.280 And variables live in memory too. 00:30:19.280 --> 00:30:22.550 So let me just hypothesize that p lives up here. 
00:30:22.550 --> 00:30:28.630 And it turns out that once you assign p the value of ampersand n, 00:30:28.630 --> 00:30:32.110 C will take a look at the variable n, 00:30:32.110 --> 00:30:37.240 realize, oh, it lives at 0x123, and what goes in the value of p 00:30:37.240 --> 00:30:39.550 is literally 0x123. 00:30:39.550 --> 00:30:42.440 So again, it's still an integer, which is confusing. 00:30:42.440 --> 00:30:45.520 But it's technically an integer being used as an address. 00:30:45.520 --> 00:30:49.960 And now just a prompt here, notice that this pointer is pretty darn big. 00:30:49.960 --> 00:30:51.940 It's like eight squares. 00:30:51.940 --> 00:30:53.350 What's the implication of that? 00:30:53.350 --> 00:30:55.000 Because I did that deliberately. 00:30:55.000 --> 00:30:59.205 How big must a pointer apparently be in most modern systems, would you say? 00:30:59.205 --> 00:31:00.080 AUDIENCE: [INAUDIBLE] 00:31:00.080 --> 00:31:00.490 DAVID J. MALAN: OK, good. 00:31:00.490 --> 00:31:01.480 Computers today are very big. 00:31:01.480 --> 00:31:03.310 You have gigabytes of RAM in your computer. 00:31:03.310 --> 00:31:05.477 You therefore need big pointers to be able to point 00:31:05.477 --> 00:31:07.640 at memory that's conceptually pretty far away. 00:31:07.640 --> 00:31:10.840 So to be clear, how many bytes does a pointer apparently take up? 00:31:10.840 --> 00:31:12.820 Well, it seems to take up 8 in total. 00:31:12.820 --> 00:31:15.280 Integers by convention nowadays are usually 4. 00:31:15.280 --> 00:31:18.393 Pointers nowadays, though, are typically 8. 00:31:18.393 --> 00:31:20.810 So I'm drawing it in a manner consistent with the reality, 00:31:20.810 --> 00:31:23.260 even though at the end of the day, it's not really that interesting 00:31:23.260 --> 00:31:24.590 what values are in here. 00:31:24.590 --> 00:31:27.012 In fact, let's emerge from these weeds.
00:31:27.012 --> 00:31:28.720 I don't really care what else is going on 00:31:28.720 --> 00:31:31.012 in my computer's memory at the moment because I've only 00:31:31.012 --> 00:31:34.450 got those two lines of juicy code defining n and defining p. 00:31:34.450 --> 00:31:36.370 So let's hide all of the other squares. 00:31:36.370 --> 00:31:39.190 And honestly, I mean it when I say that programmers 00:31:39.190 --> 00:31:43.490 need to know that a variable exists somewhere in memory, 00:31:43.490 --> 00:31:46.660 and need to be able to get that address using the ampersand, 00:31:46.660 --> 00:31:50.680 but you're rarely going to printf the actual address like I did. 00:31:50.680 --> 00:31:53.680 It's not generally interesting, unless you're debugging your code. 00:31:53.680 --> 00:31:57.250 But you're not going to start typing out crazy 0x numbers in your code 00:31:57.250 --> 00:31:58.370 to move things around. 00:31:58.370 --> 00:32:01.720 You just need to know that the computer can figure out where things are. 00:32:01.720 --> 00:32:05.800 So frankly, by that logic, who cares that it's 0x123? 00:32:05.800 --> 00:32:08.510 Tomorrow, it could be 0x456 or something else. 00:32:08.510 --> 00:32:12.610 So one of the ways to think of a pointer is literally as 00:32:12.610 --> 00:32:15.890 a variable that points at something else. 00:32:15.890 --> 00:32:19.750 And indeed, in this case, p, yeah, technically it contains an address. 00:32:19.750 --> 00:32:22.030 And yeah, technically it's 0x123 in this story. 00:32:22.030 --> 00:32:23.140 But honestly, who cares? 00:32:23.140 --> 00:32:28.900 I just need to know that using p, I can get to the value n. 00:32:28.900 --> 00:32:30.370 And so what are these addresses? 00:32:30.370 --> 00:32:33.430 And in fact, if Carter wouldn't mind joining me up here for a moment, 00:32:33.430 --> 00:32:34.480 what are these addresses?
00:32:34.480 --> 00:32:36.760 Well, just like in our human world we have mailboxes, 00:32:36.760 --> 00:32:39.260 even though you might not check them very frequently nowadays, 00:32:39.260 --> 00:32:43.060 and to get physical mail, every home, every business has a unique address. 00:32:43.060 --> 00:32:47.080 The Science and Engineering Complex is 150 Western Avenue, Allston, 00:32:47.080 --> 00:32:49.930 Massachusetts, 02134 USA. 00:32:49.930 --> 00:32:53.830 And theoretically, that uniquely identifies that building in the world. 00:32:53.830 --> 00:32:56.170 Well, here we have two mailboxes. 00:32:56.170 --> 00:33:00.160 Over here, we have a value n that happens to live, I'll claim, 00:33:00.160 --> 00:33:01.960 at address 0x123. 00:33:01.960 --> 00:33:07.092 And then over here, I claim there's another variable called by name p. 00:33:07.092 --> 00:33:10.300 I don't actually care where it is, even though it definitely exists somewhere 00:33:10.300 --> 00:33:11.540 in the computer's memory. 00:33:11.540 --> 00:33:16.090 But if this is p, which is a variable, and that's n, another variable, 00:33:16.090 --> 00:33:18.178 ideally, this mailbox would be twice as big 00:33:18.178 --> 00:33:19.720 because of the number of bytes it's using. 00:33:19.720 --> 00:33:22.060 But Home Depot only had identically sized mailboxes. 00:33:22.060 --> 00:33:23.830 But here is p, one variable. 00:33:23.830 --> 00:33:25.690 There is n, another variable. 00:33:25.690 --> 00:33:30.790 If I open up this mailbox, what should I find inside of it 00:33:30.790 --> 00:33:33.250 based on our story thus far? 00:33:33.250 --> 00:33:36.970 What value will I pull out dramatically in just a moment? 00:33:36.970 --> 00:33:37.810 Yeah, I think. 00:33:37.810 --> 00:33:39.640 0x123.
00:33:39.640 --> 00:33:42.160 Now using this, you can kind of think of this as like X 00:33:42.160 --> 00:33:44.770 marks the spot, no pun intended, where I can now 00:33:44.770 --> 00:33:48.737 walk around the computer's memory and find my way to that location 00:33:48.737 --> 00:33:50.320 by sort of following the treasure map. 00:33:50.320 --> 00:33:53.770 Or if I want it more dramatically, thanks to our little Yale foam 00:33:53.770 --> 00:33:58.385 finger here, you can think of it more abstractly as p is just pointing at n. 00:33:58.385 --> 00:33:59.510 That's not going over well. 00:33:59.510 --> 00:34:01.177 So let's switch over to the Harvard one. 00:34:01.177 --> 00:34:02.775 So p is pointing-- 00:34:02.775 --> 00:34:03.400 AUDIENCE: Whoo. 00:34:06.100 --> 00:34:07.740 DAVID J. MALAN: So p is pointing at n. 00:34:07.740 --> 00:34:10.620 And so it turns out we will be able to write code now 00:34:10.620 --> 00:34:13.252 that will do the equivalent of me walking over to n. 00:34:13.252 --> 00:34:15.960 But for now, Carter, if you want to reveal what's in the mailbox, 00:34:15.960 --> 00:34:19.170 we should see indeed the number 50. 00:34:19.170 --> 00:34:20.878 So that's really all that-- 00:34:20.878 --> 00:34:22.170 Carter is waiting for applause. 00:34:22.170 --> 00:34:24.300 So really, nicely done. 00:34:28.270 --> 00:34:28.929 Thank you. 00:34:28.929 --> 00:34:31.900 So that's just a physical metaphor of what's going on here. 00:34:31.900 --> 00:34:33.760 In one variable, we have an address. 00:34:33.760 --> 00:34:36.070 And that variable by convention is called a pointer. 00:34:36.070 --> 00:34:39.550 In the other variable per week one, we just have a value like n. 00:34:39.550 --> 00:34:43.105 And you can, yes, follow the map and walk yourself 00:34:43.105 --> 00:34:44.230 to that particular address. 00:34:44.230 --> 00:34:45.772 And we'll see how to do that in code. 
00:34:45.772 --> 00:34:49.389 But what's really interesting is this abstraction, that pointers literally, 00:34:49.389 --> 00:34:54.310 or really I guess, figuratively, point at some other value in memory. 00:34:54.310 --> 00:34:56.658 All right, questions, then, on pointers in this form. 00:34:56.658 --> 00:34:58.450 AUDIENCE: Can pointers point to each other? 00:34:58.450 --> 00:34:59.990 DAVID J. MALAN: Can pointers point to each other? 00:34:59.990 --> 00:35:00.790 So yes. 00:35:00.790 --> 00:35:02.590 There are things called double pointers. 00:35:02.590 --> 00:35:04.298 We're not going to see them anytime soon. 00:35:04.298 --> 00:35:08.380 But using star, star, you can express an address of an address. 00:35:08.380 --> 00:35:10.370 But we won't see that just yet. 00:35:10.370 --> 00:35:13.920 Other questions on pointers? 00:35:13.920 --> 00:35:15.401 Yeah, in front. 00:35:15.401 --> 00:35:18.630 AUDIENCE: [INAUDIBLE] 00:35:18.630 --> 00:35:21.900 DAVID J. MALAN: Are array-- so to summarize, are arrays then pointers? 00:35:21.900 --> 00:35:23.767 So short answer, there's a relationship. 00:35:23.767 --> 00:35:25.600 And we'll come back to that in a little bit. 00:35:25.600 --> 00:35:28.020 But arrays are technically different from pointers. 00:35:28.020 --> 00:35:30.965 But we're going to be able to blur the lines a little bit by using one 00:35:30.965 --> 00:35:31.590 like the other. 00:35:31.590 --> 00:35:34.090 But let me come back to that in just a bit of time. 00:35:34.090 --> 00:35:34.590 All right. 00:35:34.590 --> 00:35:38.350 So if we have now this mental model, if you will, 00:35:38.350 --> 00:35:41.460 of what a pointer is in memory, I think we 00:35:41.460 --> 00:35:45.000 can start to peel back a layer of simplification 00:35:45.000 --> 00:35:47.700 that we've been assuming for the past few weeks since week one. 00:35:47.700 --> 00:35:50.380 So a string, recall, is a sequence of characters. 
00:35:50.380 --> 00:35:52.380 And so if you want to create a string that says, 00:35:52.380 --> 00:35:54.870 hi, in all caps and an exclamation point, 00:35:54.870 --> 00:35:57.480 we do string s equals quote unquote "hi". 00:35:57.480 --> 00:36:00.070 And we can hard code it like this, or we could use get string. 00:36:00.070 --> 00:36:01.903 But for now, just assume that I hardcoded it 00:36:01.903 --> 00:36:06.097 into my code to always say, hi, in all caps with an exclamation point. 00:36:06.097 --> 00:36:08.430 Well, what does that look like in the computer's memory? 00:36:08.430 --> 00:36:10.650 Well, let's stop looking at the entire memory 00:36:10.650 --> 00:36:12.630 and let's just focus on really what's going on. 00:36:12.630 --> 00:36:17.430 Once you create a string called S and store in it hi, 00:36:17.430 --> 00:36:19.740 you know that a couple of things are happening. 00:36:19.740 --> 00:36:24.030 H, and I, and the exclamation point are ending up in the computer's memory. 00:36:24.030 --> 00:36:29.670 We know from week two that this thing, the so-called NUL character, NUL, AKA 00:36:29.670 --> 00:36:32.632 backslash zero, is also being added for you. 00:36:32.632 --> 00:36:33.840 And it's somewhere in memory. 00:36:33.840 --> 00:36:36.360 At the moment, I don't really care where I drew it at the bottom right. 00:36:36.360 --> 00:36:37.435 Yes, it has an address. 00:36:37.435 --> 00:36:39.060 But for now, it just ends up somewhere. 00:36:39.060 --> 00:36:43.470 And in fact, here's a little visual cue as to how this happens. 00:36:43.470 --> 00:36:47.370 In C, any time you use double quotes to give you a string, 00:36:47.370 --> 00:36:50.580 you can imagine that the double quotes are like a clue 00:36:50.580 --> 00:36:55.200 to not only store HI exclamation point, but also put the NUL character there 00:36:55.200 --> 00:36:55.720 for you. 
00:36:55.720 --> 00:36:59.670 And this is in contrast to chars. If you want individual characters, what 00:36:59.670 --> 00:37:01.545 syntax did we use instead? 00:37:01.545 --> 00:37:02.420 AUDIENCE: [INAUDIBLE] 00:37:02.420 --> 00:37:03.795 DAVID J. MALAN: So single quotes. 00:37:03.795 --> 00:37:06.190 Single quotes do not magically add a backslash zero. 00:37:06.190 --> 00:37:08.000 They literally just store one character. 00:37:08.000 --> 00:37:10.167 So again, strings have always been a little special. 00:37:10.167 --> 00:37:11.110 You get an extra 00:37:11.110 --> 00:37:13.720 byte for free so that you know where 00:37:13.720 --> 00:37:17.270 the string ends, and functions like strcmp can then find their way there. 00:37:17.270 --> 00:37:20.630 So in memory, it might indeed look a little like this. 00:37:20.630 --> 00:37:24.068 And since these characters are going to be 00:37:24.068 --> 00:37:26.110 somewhere in memory, 00:37:26.110 --> 00:37:29.270 we can address them per week two by way of the name of the variable. 00:37:29.270 --> 00:37:31.578 So if s is the name of the variable, s bracket 0 00:37:31.578 --> 00:37:33.370 is how you would refer to the first letter. 00:37:33.370 --> 00:37:34.630 s bracket 1, s bracket 2. 00:37:34.630 --> 00:37:39.100 And if you really want, s bracket 3 would get you the NUL character 00:37:39.100 --> 00:37:40.370 at the very end. 00:37:40.370 --> 00:37:41.830 But what is s? 00:37:41.830 --> 00:37:45.430 So technically in this line of code here, not only 00:37:45.430 --> 00:37:51.150 is the computer giving you memory for HI exclamation point backslash zero, 00:37:51.150 --> 00:37:54.430 but s itself must also take up some amount of space 00:37:54.430 --> 00:37:55.570 because s is a variable.
00:37:55.570 --> 00:37:57.778 And every time we've talked about variables thus far, 00:37:57.778 --> 00:38:00.980 I've given you a rectangle on the screen in which to store its value. 00:38:00.980 --> 00:38:05.650 So let's assume for the sake of discussion that the H is at 0x123, 00:38:05.650 --> 00:38:09.520 the I is at 0x124, the exclamation point is at 0x125, 00:38:09.520 --> 00:38:11.740 and the NUL character is at 0x126. 00:38:11.740 --> 00:38:13.390 Well, what then is s? 00:38:13.390 --> 00:38:16.107 Well, s is just going to be some other variable. 00:38:16.107 --> 00:38:18.940 And I'll draw it somewhat abstractly without all of the other boxes, 00:38:18.940 --> 00:38:19.630 up here. 00:38:19.630 --> 00:38:22.270 And I'll claim that the name of this variable is s. 00:38:22.270 --> 00:38:25.000 But it turns out, what is s really? 00:38:25.000 --> 00:38:27.130 How do strings really work? 00:38:27.130 --> 00:38:30.640 Well, s is a variable, and has been since week one. 00:38:30.640 --> 00:38:33.730 But when you define it, what the computer is doing for you automatically, 00:38:33.730 --> 00:38:36.940 when it knows you want to store HI exclamation point, 00:38:36.940 --> 00:38:38.440 is putting that somewhere in memory. 00:38:38.440 --> 00:38:40.720 The computer then figures out for you, what's 00:38:40.720 --> 00:38:42.880 the address of the very first character? 00:38:42.880 --> 00:38:46.120 And it stores that address, and only that address, 00:38:46.120 --> 00:38:50.080 in the variable you created on the left hand side of the equal sign. 00:38:50.080 --> 00:38:51.370 And that's enough. 00:38:51.370 --> 00:38:55.120 To represent a string with three letters of the alphabet or punctuation, 00:38:55.120 --> 00:38:57.590 you don't need three variables. 00:38:57.590 --> 00:38:58.450 You just need one. 00:38:58.450 --> 00:39:02.050 You just need to know the beginning of the string.
00:39:02.810 --> 00:39:06.790 Why is it sufficient for a variable to only store the first byte's address, 00:39:06.790 --> 00:39:09.456 and not all of the bytes' addresses? 00:39:09.456 --> 00:39:11.380 AUDIENCE: [INAUDIBLE] 00:39:11.380 --> 00:39:12.380 DAVID J. MALAN: Exactly. 00:39:12.380 --> 00:39:17.360 Because of the design of strings per week two, we always NUL-terminate them. 00:39:17.360 --> 00:39:20.013 So it suffices to only remember the first byte's address. 00:39:20.013 --> 00:39:22.430 Because from there, you can sort of follow the breadcrumbs 00:39:22.430 --> 00:39:23.690 byte after byte after byte. 00:39:23.690 --> 00:39:27.140 And until you see the NUL character, 00:39:27.140 --> 00:39:31.230 you know that all of those characters are part of the same string. 00:39:31.230 --> 00:39:35.600 So this is what's been going on in the computer's memory all along since week one. 00:39:35.600 --> 00:39:37.430 And in fact, if we abstract this away, you 00:39:37.430 --> 00:39:42.590 can really think of s as being just this, a pointer 00:39:42.590 --> 00:39:44.400 to that chunk of memory. 00:39:44.400 --> 00:39:46.920 So in fact, what do we have here? 00:39:46.920 --> 00:39:51.360 Well, to recap the code here, on the left-hand side, string 00:39:51.360 --> 00:39:54.170 is what ensures that we'll actually be able to store 00:39:54.170 --> 00:39:55.880 a string in a variable called s. 00:39:55.880 --> 00:39:59.760 We're going to have on the right-hand side, though, the actual value. 00:39:59.760 --> 00:40:01.460 So let me switch back to VS Code here. 00:40:01.460 --> 00:40:04.490 And let me change my code to no longer involve integers alone. 00:40:04.490 --> 00:40:09.260 So I'm going to add the CS50 library just so 00:40:09.260 --> 00:40:11.000 that I can use some shortcuts in there. 00:40:11.000 --> 00:40:11.783 cs50.h.
00:40:11.783 --> 00:40:14.450 And then in my main function, I'm going to go ahead and do this. 00:40:14.450 --> 00:40:18.270 String s equals quote unquote "HI" in all caps, exclamation point. 00:40:18.270 --> 00:40:22.220 And then I'm going to go ahead and print out, using %s as always, backslash n, 00:40:22.220 --> 00:40:23.270 the value of s. 00:40:23.270 --> 00:40:25.530 So this program, at the moment, is not interesting at all. 00:40:25.530 --> 00:40:29.690 It's just week one stuff again. ./addresses indeed prints out HI. 00:40:29.690 --> 00:40:33.380 But now that I know this, 00:40:33.380 --> 00:40:36.530 what's really been going on underneath the hood all this time? 00:40:36.530 --> 00:40:40.640 Well, here's that same line of code that defines the variable called s. 00:40:40.640 --> 00:40:46.880 And does anyone want to guess what string is actually a synonym for? 00:40:46.880 --> 00:40:50.540 String, it turns out, is kind of a white lie we've been telling since week one. 00:40:50.540 --> 00:40:55.190 There is no such thing as string as a keyword in C. 00:40:55.190 --> 00:40:57.110 It's technically a CS50 thing. 00:40:57.110 --> 00:40:57.650 Yeah. 00:40:57.650 --> 00:40:58.733 AUDIENCE: [INAUDIBLE] 00:40:58.733 --> 00:41:00.650 DAVID J. MALAN: It's a pointer to a character. 00:41:00.650 --> 00:41:03.410 So really, all this time, we've kind of been lying to you. 00:41:03.410 --> 00:41:05.510 There is no "string" quote unquote. 00:41:05.510 --> 00:41:07.430 It's actually char star. 00:41:07.430 --> 00:41:12.890 And if I may, dramatically here: there go the training wheels. 00:41:12.890 --> 00:41:14.310 That didn't land very well. 00:41:14.310 --> 00:41:16.490 So what have we been doing? 00:41:16.490 --> 00:41:19.310 Well, it turns out that string is a much easier way conceptually 00:41:19.310 --> 00:41:21.227 to think about what a string of characters is.
00:41:21.227 --> 00:41:24.053 My God, if we had had to start in week one by having you type char star, 00:41:24.053 --> 00:41:25.220 yeah, you might get past it. 00:41:25.220 --> 00:41:28.300 But this is just way too much ugly syntax, not intellectually interesting 00:41:28.300 --> 00:41:28.800 at all. 00:41:28.800 --> 00:41:30.020 So we abstracted away 00:41:30.020 --> 00:41:33.740 what a char star was in the first week of C by telling you 00:41:33.740 --> 00:41:35.210 it's actually called string. 00:41:35.210 --> 00:41:37.400 Now, string is a term of art. C programmers, 00:41:37.400 --> 00:41:39.800 programmers in any language, will use the word string 00:41:39.800 --> 00:41:41.430 to mean a sequence of characters. 00:41:41.430 --> 00:41:45.510 But in C, it's not technically a word unto itself. 00:41:45.510 --> 00:41:49.080 It's rather a synonym that we ourselves created. 00:41:49.080 --> 00:41:50.960 So in fact, how did we do this? 00:41:50.960 --> 00:41:52.550 Well, think back to just last week. 00:41:52.550 --> 00:41:54.830 Last week, I proposed that it'd be really nice 00:41:54.830 --> 00:41:58.040 if we had a person data type, which the creators of C 00:41:58.040 --> 00:41:59.617 did not think of decades ago. 00:41:59.617 --> 00:42:00.200 But that's OK. 00:42:00.200 --> 00:42:01.520 We can define it ourselves. 00:42:01.520 --> 00:42:02.940 What did we do here? 00:42:02.940 --> 00:42:05.400 Well, we used syntax like this. 00:42:05.400 --> 00:42:08.210 Recall that we defined a person to be what? 00:42:08.210 --> 00:42:09.800 To be this structure. 00:42:09.800 --> 00:42:12.080 This structure, using last week's new keyword, 00:42:12.080 --> 00:42:14.600 struct, means that a person is just a name and a number. 00:42:14.600 --> 00:42:16.100 And it could have been other things. 00:42:16.100 --> 00:42:17.180 We just kept it simple. 00:42:17.180 --> 00:42:22.640 But how did I associate person with that structure?
00:42:22.640 --> 00:42:25.070 Well, recall that it was this keyword here, 00:42:25.070 --> 00:42:28.770 typedef, which, as you might expect, defines a data type. 00:42:28.770 --> 00:42:33.020 So what did we do as CS50 back in week one without telling you? 00:42:33.020 --> 00:42:36.950 Well, we could have done something like this. 00:42:36.950 --> 00:42:38.318 Int itself is a little cryptic. 00:42:38.318 --> 00:42:41.360 And maybe, to keep things even simpler, we should have said, hey, everyone. 00:42:41.360 --> 00:42:44.990 Turns out you can define integers in C. And if you wanted to do this, well, 00:42:44.990 --> 00:42:47.700 if you want to create the keyword integer as a data type, 00:42:47.700 --> 00:42:49.760 you can just typedef it to int. 00:42:49.760 --> 00:42:53.570 So typedef takes the word on the far right, integer, 00:42:53.570 --> 00:42:57.320 and makes it a synonym for, in this case, int. 00:42:57.320 --> 00:43:00.590 So what did we do in week one without telling you? 00:43:00.590 --> 00:43:04.490 We have a line of code like this in the CS50 library 00:43:04.490 --> 00:43:10.490 that associates quote unquote "string" with, more cryptically, char star. 00:43:10.490 --> 00:43:15.110 And this is why, from week one onward, any time you use the CS50 library, 00:43:15.110 --> 00:43:18.297 you can write the word string as though it's a real C data type. 00:43:18.297 --> 00:43:21.380 And that's just because we wanted to have this abstraction, these training 00:43:21.380 --> 00:43:23.060 wheels, on for the first weeks, so we don't 00:43:23.060 --> 00:43:25.393 have to get in the weeds of all this crazy memory stuff. 00:43:25.393 --> 00:43:27.740 We can sort of talk about strings at a higher level. 00:43:27.740 --> 00:43:29.120 But that's all they are. 00:43:29.120 --> 00:43:31.970 A string is the address of the first character 00:43:31.970 --> 00:43:34.760 in that sequence of characters. 00:43:34.760 --> 00:43:38.220 Questions now on any of these details?
00:43:38.220 --> 00:43:38.720 Yeah. 00:43:38.720 --> 00:43:41.460 AUDIENCE: What about the strings libraries that [INAUDIBLE]? 00:43:41.460 --> 00:43:42.710 DAVID J. MALAN: Good question. 00:43:42.710 --> 00:43:45.910 What about the string library, string.h, which we have used? 00:43:45.910 --> 00:43:46.600 Unrelated. 00:43:46.600 --> 00:43:48.520 It does not define the word string. 00:43:48.520 --> 00:43:51.430 Everything in there actually relates to char stars. 00:43:51.430 --> 00:43:56.320 And so in fact, if you've used the CS50 manual, which is just 00:43:56.320 --> 00:44:00.340 our user-friendly version of the actual manual pages for the official language, 00:44:00.340 --> 00:44:03.820 C, then if you start poking around or turn off 00:44:03.820 --> 00:44:05.830 less comfortable mode, you'll actually see 00:44:05.830 --> 00:44:08.350 that we changed any mentions of char star 00:44:08.350 --> 00:44:10.690 in the official documentation for these first weeks 00:44:10.690 --> 00:44:12.610 to just string to simplify it. 00:44:12.610 --> 00:44:17.350 But underneath the hood, C does not know the word string per se as a keyword. 00:44:17.350 --> 00:44:21.320 But it's absolutely a concept that every programmer in the world knows about. 00:44:21.320 --> 00:44:23.750 And in fact, in other languages, in Python for instance, 00:44:23.750 --> 00:44:26.227 there will actually be a proper string type, although it's not 00:44:26.227 --> 00:44:27.310 going to be called string. 00:44:27.310 --> 00:44:30.340 It's going to be called str, for short. 00:44:30.340 --> 00:44:33.470 Questions on these strings here? 00:44:33.470 --> 00:44:36.520 Well, let me propose there's one other feature of this syntax 00:44:36.520 --> 00:44:38.960 that we can now leverage, as follows.
00:44:38.960 --> 00:44:43.060 Let me propose that we go back to the previous version of my code 00:44:43.060 --> 00:44:46.760 here. Let me switch back to VS Code in just a moment. 00:44:46.760 --> 00:44:51.800 I'm going to rewind in VS Code to the integer version of my code from before. 00:44:51.800 --> 00:44:55.540 And most recently, it looked like this, from when we were using integers 00:44:55.540 --> 00:44:58.570 only and not, in fact, strings at all. 00:44:58.570 --> 00:45:01.270 Let me propose that there's this other feature of C 00:45:01.270 --> 00:45:04.360 that we can use that actually allows us to go to an address. 00:45:04.360 --> 00:45:07.090 So at the moment, let me just rewind and do make addresses 00:45:07.090 --> 00:45:10.660 to remind you what this program did when it was using integers alone. 00:45:10.660 --> 00:45:12.010 And there's that address. 00:45:12.010 --> 00:45:12.670 Why? 00:45:12.670 --> 00:45:15.880 Because on line seven, notice, I'm printing out 00:45:15.880 --> 00:45:17.710 the value of p, which is a pointer. 00:45:17.710 --> 00:45:20.060 So of course, it's going to look like an address. 00:45:20.060 --> 00:45:22.660 But let me zoom out now and make one change. 00:45:22.660 --> 00:45:27.730 Instead of printing out p, how can I use today's second new operator, 00:45:27.730 --> 00:45:32.020 not the ampersand, but the star, to actually go to that address? 00:45:32.020 --> 00:45:35.170 Well, what I can actually do on this line of code is this. 00:45:35.170 --> 00:45:39.250 If I want to print out the actual integer 50 that's in that variable, 00:45:39.250 --> 00:45:44.858 or equivalently at that address, I can go to p here and not print p literally, 00:45:44.858 --> 00:45:46.150 because that's just an address. 00:45:46.150 --> 00:45:48.910 I can now say star p. 00:45:48.910 --> 00:45:51.310 And star p means go there. 00:45:51.310 --> 00:45:53.200 More technically, dereference p.
00:45:53.200 --> 00:45:56.560 That is, follow the treasure map to the actual address and do what Carter did. 00:45:56.560 --> 00:46:00.260 Open the mailbox and print whatever was in the mailbox, which, recall, 00:46:00.260 --> 00:46:02.380 was the actual number 50. 00:46:02.380 --> 00:46:03.680 So let me try this. 00:46:03.680 --> 00:46:05.270 Let me recompile the code. 00:46:05.270 --> 00:46:08.020 So make addresses. 00:46:08.020 --> 00:46:09.520 OK, let me clear my terminal window. 00:46:09.520 --> 00:46:10.510 Dot slash addresses. 00:46:10.510 --> 00:46:12.640 This time, I shouldn't see 0x anything. 00:46:12.640 --> 00:46:16.550 I should see just the number 50 in this case. 00:46:16.550 --> 00:46:19.990 And here too is kind of an unfortunate design decision, certainly 00:46:19.990 --> 00:46:23.290 pedagogically, I would say, in C. If I zoom in on this code, 00:46:23.290 --> 00:46:26.335 star is unfortunately being used in two different ways. 00:46:26.335 --> 00:46:28.960 In an ideal world, they would have used a different symbol for each 00:46:28.960 --> 00:46:30.550 to make this more semantically clear. 00:46:30.550 --> 00:46:32.030 But this is what we're stuck with. 00:46:32.030 --> 00:46:35.370 So in line six, when you declare a pointer, 00:46:35.370 --> 00:46:37.120 that is, a variable that stores an address, 00:46:37.120 --> 00:46:39.520 you put the type of variable that you want 00:46:39.520 --> 00:46:42.865 to point at, then a star just because, and then the name of the variable. 00:46:42.865 --> 00:46:44.740 And then on the right-hand side, you actually 00:46:44.740 --> 00:46:47.590 get the address of something using the ampersand. 00:46:47.590 --> 00:46:51.790 But when you want to go to an address, that is, dereference a pointer, 00:46:51.790 --> 00:46:53.380 you don't use int again. 00:46:53.380 --> 00:46:54.640 And we've never done that. 00:46:54.640 --> 00:46:57.100 Once you declare a variable, you never again mention the data type.
00:46:57.100 --> 00:46:58.975 But in the world of pointers now, if you want 00:46:58.975 --> 00:47:03.340 to not print out p but go to whatever address p is storing, 00:47:03.340 --> 00:47:05.180 you use star p here. 00:47:05.180 --> 00:47:07.570 So a good visual indicator: when you 00:47:07.570 --> 00:47:10.850 declare a pointer, that is, make it exist in your program, 00:47:10.850 --> 00:47:13.900 you have to declare the data type with the star. 00:47:13.900 --> 00:47:16.405 But when you use a pointer, you just use the star. 00:47:16.405 --> 00:47:19.030 In an ideal world, this would be a completely different symbol. 00:47:19.030 --> 00:47:21.310 But again, this is what we have. 00:47:21.310 --> 00:47:23.920 Questions now on that syntax? 00:47:23.920 --> 00:47:25.077 Yeah. 00:47:25.077 --> 00:47:27.850 AUDIENCE: [INAUDIBLE] 00:47:27.850 --> 00:47:30.850 DAVID J. MALAN: Why can't we just do the ampersand here, are you saying? 00:47:30.850 --> 00:47:32.260 It was still a little quiet. 00:47:32.260 --> 00:47:34.600 So strictly speaking, we do not need line six. 00:47:34.600 --> 00:47:36.730 So it's really for pedagogical purposes that I 00:47:36.730 --> 00:47:41.320 am defining a separate variable p and then printing it out. 00:47:41.320 --> 00:47:44.500 At this point, though, I'm just kind of going in circles, if you will. 00:47:44.500 --> 00:47:46.480 Because simpler would have been what I 00:47:46.480 --> 00:47:49.150 would have done in week one, which would be to get rid of p 00:47:49.150 --> 00:47:52.660 altogether, get rid of p here, and just print out n. 00:47:52.660 --> 00:47:56.710 But today, we're just giving you this new building block, this new syntax, 00:47:56.710 --> 00:47:59.170 via which you can figure out the address of something, 00:47:59.170 --> 00:48:04.450 and then reverse the process later and actually go to it as well. 00:48:04.450 --> 00:48:08.200 Other questions on what we've done here with these pointers?
00:48:08.200 --> 00:48:08.700 All right. 00:48:08.700 --> 00:48:10.770 Well, let's context switch back to strings 00:48:10.770 --> 00:48:15.780 now and see what more we can do with this syntax in the case of our strings 00:48:15.780 --> 00:48:16.800 here. 00:48:16.800 --> 00:48:22.710 Let me zoom out, let me delete the integer-related code here, 00:48:22.710 --> 00:48:27.480 let me do string s equals quote unquote "HI" in all caps, let me go ahead 00:48:27.480 --> 00:48:30.930 and for the moment include cs50.h at the top 00:48:30.930 --> 00:48:36.390 so that indeed I can use the word string, and let me go ahead now 00:48:36.390 --> 00:48:38.500 and do something more than I did last time. 00:48:38.500 --> 00:48:43.020 Last time, I did printf of %s backslash n, and then I printed out s. 00:48:43.020 --> 00:48:45.990 And again, I'll recompile this just for clarity, make addresses, 00:48:45.990 --> 00:48:46.920 dot slash addresses. 00:48:46.920 --> 00:48:47.920 That just prints out HI. 00:48:47.920 --> 00:48:49.450 So that's, again, week one stuff. 00:48:49.450 --> 00:48:52.260 But now that we have this other bit of syntax, 00:48:52.260 --> 00:48:54.630 we can do some interesting things too. 00:48:54.630 --> 00:48:58.440 So for instance, suppose I want to print out not the string itself, 00:48:58.440 --> 00:49:00.870 but the address that s stores. 00:49:00.870 --> 00:49:03.210 At what memory location does the string begin? 00:49:03.210 --> 00:49:07.350 Well, I can change my %s to %p, where p, as we now know, is for pointer. 00:49:07.350 --> 00:49:10.200 So %p means print out the value of a pointer. 00:49:10.200 --> 00:49:11.280 That is, an address. 00:49:11.280 --> 00:49:15.150 And here, I can actually print out s itself. 00:49:15.150 --> 00:49:17.200 But why that is, we'll see in a moment. 00:49:17.200 --> 00:49:19.200 Let me do this. 00:49:19.200 --> 00:49:20.850 Here go the training wheels.
00:49:20.850 --> 00:49:25.350 String does not technically exist, but it does if I'm using the CS50 library. 00:49:25.350 --> 00:49:28.380 But if I get rid of the CS50 library, as I'm metaphorically 00:49:28.380 --> 00:49:31.690 doing by taking off the training wheels, I can't use the word string anymore. 00:49:31.690 --> 00:49:33.773 And in fact, let me make this mistake deliberately, 00:49:33.773 --> 00:49:35.880 as you might have done accidentally in past weeks. 00:49:35.880 --> 00:49:40.470 Here is the error message I get if I forget the CS50 library: use 00:49:40.470 --> 00:49:42.420 of undeclared identifier string. 00:49:42.420 --> 00:49:43.830 Did you mean stdin? 00:49:43.830 --> 00:49:46.872 It's trying to be helpful, but it's not, because I didn't mean stdin. 00:49:46.872 --> 00:49:48.870 So indeed, this is confirmation that C does not 00:49:48.870 --> 00:49:51.780 know the word string exists, at least as a keyword. 00:49:51.780 --> 00:49:54.130 It exists as a concept, but not as a keyword. 00:49:54.130 --> 00:49:57.515 So I could fix this by adding back the CS50 library. 00:49:57.515 --> 00:50:00.390 But that's kind of a step backwards, educationally, instead of a step 00:50:00.390 --> 00:50:00.930 forward. 00:50:00.930 --> 00:50:05.605 What could I do instead to fix this, now that the training wheels are off? 00:50:05.605 --> 00:50:06.105 Yeah. 00:50:06.105 --> 00:50:09.210 AUDIENCE: [INAUDIBLE] 00:50:09.210 --> 00:50:10.210 DAVID J. MALAN: Exactly. 00:50:10.210 --> 00:50:13.808 Replace "string" quote unquote with char star instead. 00:50:13.808 --> 00:50:15.850 So I'm going to go ahead and change this to char. 00:50:15.850 --> 00:50:18.280 Technically, you can put the literal star here, 00:50:18.280 --> 00:50:21.310 the asterisk, or you can put it there, or you can put it here.
00:50:21.310 --> 00:50:23.710 The convention is to do what I've done from the beginning: 00:50:23.710 --> 00:50:28.180 put the star next to the name of the variable, as opposed to anywhere else. 00:50:28.180 --> 00:50:29.930 Let me go ahead now and-- 00:50:29.930 --> 00:50:30.447 or sorry. 00:50:30.447 --> 00:50:31.780 I meant to add the spaces there. 00:50:31.780 --> 00:50:32.560 You could do this too. 00:50:32.560 --> 00:50:34.480 But this would be the most normal convention. 00:50:34.480 --> 00:50:35.590 So now let's do this. 00:50:35.590 --> 00:50:39.850 Make addresses, compiles OK now, dot slash addresses. 00:50:39.850 --> 00:50:40.990 What should I see? 00:50:40.990 --> 00:50:44.660 HI or something else? 00:50:44.660 --> 00:50:45.970 Feel free to just call it out. 00:50:45.970 --> 00:50:46.918 AUDIENCE: [INAUDIBLE] 00:50:46.918 --> 00:50:48.460 DAVID J. MALAN: So still HI, you say? 00:50:48.460 --> 00:50:50.307 Someone else? 00:50:50.307 --> 00:50:51.390 AUDIENCE: Memory location? 00:50:51.390 --> 00:50:52.410 DAVID J. MALAN: A memory location. 00:50:52.410 --> 00:50:54.510 All right, so it could be one of two options. 00:50:54.510 --> 00:50:57.552 Either I'm going to see the string, or I'm going to see a memory address. 00:50:57.552 --> 00:50:59.400 And I do, in fact, see a memory address. 00:50:59.400 --> 00:51:01.608 And this one is quite different from the integer one. 00:51:01.608 --> 00:51:04.950 But does anyone now want to explain why you were correct? 00:51:04.950 --> 00:51:09.000 Why am I seeing the address down here and not HI? 00:51:09.000 --> 00:51:09.640 It's subtle. 00:51:09.640 --> 00:51:10.140 Yeah. 00:51:10.140 --> 00:51:12.500 AUDIENCE: [INAUDIBLE] 00:51:12.500 --> 00:51:13.500 DAVID J. MALAN: Exactly. 00:51:13.500 --> 00:51:17.490 Because I left my %p there, which means, hey, printf, show me a pointer. 00:51:17.490 --> 00:51:21.030 But this is where printf is smart and has been smart since week one.
00:51:21.030 --> 00:51:26.475 Humans who invented printf decades ago wrote code that notices that, OK, 00:51:26.475 --> 00:51:31.590 %s means to treat the following value not just as an address per se that gets 00:51:31.590 --> 00:51:34.890 printed literally, but, as with the mailbox demo, 00:51:34.890 --> 00:51:39.300 as sort of a treasure map that leads you to a character. 00:51:39.300 --> 00:51:42.750 So simply by changing one character, %p to %s, 00:51:42.750 --> 00:51:45.840 if I now do make addresses again and dot slash addresses, 00:51:45.840 --> 00:51:49.710 this now is identical to week one, but hopefully now makes sense. 00:51:49.710 --> 00:51:56.100 Because %s is just a clue to printf that means, go to this address in s. 00:51:56.100 --> 00:52:00.990 Print out every character there and thereafter until you see, what? 00:52:00.990 --> 00:52:01.890 The NUL character. 00:52:01.890 --> 00:52:04.060 And then stop printing anything more. 00:52:04.060 --> 00:52:07.050 And this is why HI has printed since week one. 00:52:07.050 --> 00:52:09.690 Today, we can see the address with %p. 00:52:09.690 --> 00:52:13.920 But this combination of having access to addresses and the NUL terminator 00:52:13.920 --> 00:52:16.740 is all the information printf needs to actually do something more 00:52:16.740 --> 00:52:21.180 useful by printing the actual strings. 00:52:21.180 --> 00:52:26.250 Any questions now on this approach to %s? 00:52:26.250 --> 00:52:28.102 Yeah, in back. 00:52:28.102 --> 00:52:29.363 AUDIENCE: [INAUDIBLE] 00:52:29.363 --> 00:52:32.280 DAVID J. MALAN: Oh, so why is %s traditionally used in this way? 00:52:32.280 --> 00:52:34.662 Honestly, the word string has been around for decades. 00:52:34.662 --> 00:52:37.620 It's not a keyword you can type in C unless you're using 00:52:37.620 --> 00:52:39.450 a library like CS50's. 00:52:39.450 --> 00:52:41.190 And so s just means string.
00:52:41.190 --> 00:52:45.060 So even though it doesn't exist as a keyword, %s connotes string. 00:52:45.060 --> 00:52:48.250 And humans decades ago, like today, just kind of know what that means. 00:52:48.250 --> 00:52:50.458 So they could have chosen any letter of the alphabet. 00:52:50.458 --> 00:52:52.420 But s sort of makes the most sense. 00:52:52.420 --> 00:52:52.920 All right. 00:52:52.920 --> 00:52:54.150 Well, let's-- in back. 00:52:54.150 --> 00:52:54.930 Other question? 00:52:54.930 --> 00:52:56.310 AUDIENCE: [INAUDIBLE] 00:52:56.310 --> 00:52:57.560 DAVID J. MALAN: Good question. 00:52:57.560 --> 00:52:58.560 Before-- let me zoom in. 00:52:58.560 --> 00:53:01.010 I did not use a star before the s. 00:53:01.010 --> 00:53:01.760 Why? 00:53:01.760 --> 00:53:03.290 Well, it's subtle here. 00:53:03.290 --> 00:53:07.340 But printf was designed years ago so that, 00:53:07.340 --> 00:53:11.430 given an address like the one in the variable s, printf knows to go there. 00:53:11.430 --> 00:53:14.810 So if we looked at the source code that some human wrote years ago for printf, 00:53:14.810 --> 00:53:18.620 we would likely see the actual asterisk that you're referring to. 00:53:18.620 --> 00:53:21.650 printf is taking on the responsibility for going to s. 00:53:21.650 --> 00:53:26.510 If you were to do star s here instead, an asterisk and an s, 00:53:26.510 --> 00:53:29.570 that would now be literally a character. 00:53:29.570 --> 00:53:33.345 Because if I say star s, that means go to the address in s. 00:53:33.345 --> 00:53:35.720 And all you're going to find there is a single character. 00:53:35.720 --> 00:53:39.680 What printf wants to know is not what is the character there, 00:53:39.680 --> 00:53:41.430 but what is the address of that character? 00:53:41.430 --> 00:53:41.930 Why? 00:53:41.930 --> 00:53:45.890 Because printf needs to walk through the rest of those characters 00:53:45.890 --> 00:53:48.770 looking for the final NUL character.
00:53:48.770 --> 00:53:51.140 And in fact, let me see, with a bit more syntax, 00:53:51.140 --> 00:53:52.830 if we can highlight this a bit more. 00:53:52.830 --> 00:53:53.570 Let me do this. 00:53:53.570 --> 00:53:56.930 In addition to printing s, let's try out our syntax in another way. 00:53:56.930 --> 00:54:01.040 Let me print out with %p, how about, not s here, 00:54:01.040 --> 00:54:05.840 but some other address. %p backslash n, close quote, 00:54:05.840 --> 00:54:08.030 and then let's print out, how about this? 00:54:08.030 --> 00:54:12.440 The first character in the string s would be called s bracket 0. 00:54:12.440 --> 00:54:16.425 But how do I get the address of the first character in s? 00:54:16.425 --> 00:54:18.800 Well, I could technically just use today's new primitive. 00:54:18.800 --> 00:54:19.967 I can just add an ampersand. 00:54:19.967 --> 00:54:22.620 That always gives me the address of some value. 00:54:22.620 --> 00:54:27.200 So when I end this thought, clear my terminal window, 00:54:27.200 --> 00:54:29.660 and run make addresses, it still compiles. When 00:54:29.660 --> 00:54:32.090 I run addresses in just a moment, any guesses 00:54:32.090 --> 00:54:34.950 as to what I will see line by line? 00:54:34.950 --> 00:54:37.083 This will print out two things. 00:54:37.083 --> 00:54:39.500 And you don't have to remember what the actual number was. 00:54:39.500 --> 00:54:42.440 But at a high level, what will be printed now? 00:54:42.440 --> 00:54:44.120 The same thing twice. 00:54:44.120 --> 00:54:44.700 Why? 00:54:44.700 --> 00:54:48.080 Well, when I run this, and let me zoom in at the bottom, 00:54:48.080 --> 00:54:50.780 I indeed see two really long addresses. 00:54:50.780 --> 00:54:52.100 But they're, in fact, the same. 00:54:52.100 --> 00:54:52.650 Why?
00:54:52.650 --> 00:54:57.830 Well, that's because, again, if s is the address of a character, as implied now 00:54:57.830 --> 00:55:02.750 by either the CS50 word string or the actual phrase char star, 00:55:02.750 --> 00:55:04.370 then s is just an address. 00:55:04.370 --> 00:55:09.050 By contrast, per week two, s bracket 0 is a char. 00:55:09.050 --> 00:55:11.230 It always has been a char, a specific char. 00:55:11.230 --> 00:55:14.105 But if you want the address of that char, you just add the ampersand. 00:55:14.105 --> 00:55:18.198 Well, it turns out that a string, per the definition we keep emphasizing, 00:55:18.198 --> 00:55:20.490 is just the address of the first character in a string. 00:55:20.490 --> 00:55:23.550 So of course, if you do this, you're going to see the exact same thing. 00:55:23.550 --> 00:55:26.450 And let me do this a bit more. Generally, you don't want to copy and paste. 00:55:26.450 --> 00:55:28.940 But this is just for visualization's sake. 00:55:28.940 --> 00:55:30.440 Let me print out the addresses of all the characters. 00:55:30.440 --> 00:55:32.330 So another, another, another. 00:55:32.330 --> 00:55:36.200 And let me change this to print out the address of bracket one, bracket 00:55:36.200 --> 00:55:37.520 two, and bracket three. 00:55:37.520 --> 00:55:40.310 So all four characters, H, I, exclamation point, 00:55:40.310 --> 00:55:41.480 and the NUL character. 00:55:41.480 --> 00:55:43.920 Notice I'm using %p for all of them. 00:55:43.920 --> 00:55:47.330 So if I now do make addresses and dot slash addresses, 00:55:47.330 --> 00:55:49.590 notice, and this is kind of cool, 00:55:49.590 --> 00:55:51.530 the first two are indeed still the same. 00:55:51.530 --> 00:55:54.575 But what's noteworthy about the other values on the screen? 00:55:57.180 --> 00:55:58.380 Yeah, they're consecutive. 00:55:58.380 --> 00:55:59.822 Each of these is just 1 byte away.
00:55:59.822 --> 00:56:03.030 Even if you're not good at hex yet and there's a crazy number of digits here, 00:56:03.030 --> 00:56:03.670 who cares? 00:56:03.670 --> 00:56:07.420 They're all the same except for the last digits: four, four, and then five, 00:56:07.420 --> 00:56:07.920 six, seven. 00:56:07.920 --> 00:56:11.280 And this confirms what I've been claiming for weeks, that in an array, 00:56:11.280 --> 00:56:16.380 all of the characters are back to back to back, contiguous, 1 byte apart. 00:56:16.380 --> 00:56:18.960 So with just this ampersand and just this star, 00:56:18.960 --> 00:56:21.000 you have a pretty cool tool in the toolkit, 00:56:21.000 --> 00:56:24.630 because you can start to poke around what's actually going 00:56:24.630 --> 00:56:27.220 on inside of the computer's memory. 00:56:27.220 --> 00:56:31.710 And in fact, if we do this, I can introduce one other cool trick here, 00:56:31.710 --> 00:56:32.490 if you will. 00:56:32.490 --> 00:56:38.160 Let me propose that we can actually now do arithmetic on pointers. 00:56:38.160 --> 00:56:39.090 And you don't have to. 00:56:39.090 --> 00:56:40.780 You'll see a simpler way to do this. 00:56:40.780 --> 00:56:44.790 But now that you have perhaps this underlying understanding of where 00:56:44.790 --> 00:56:46.710 things are in memory and that it's all just addresses, 00:56:46.710 --> 00:56:48.850 we can actually do something kind of neat. 00:56:48.850 --> 00:56:50.950 We can do something like this. 00:56:50.950 --> 00:56:55.470 Let me go back to, how about, the string version of this with HI. 00:56:55.470 --> 00:56:57.120 And let me do this instead. 00:56:57.120 --> 00:57:01.390 Let me clean this up a bit, get rid of some of these lines of code. 00:57:01.390 --> 00:57:02.290 And let me do this. 00:57:02.290 --> 00:57:04.975 Let me print out %c, %c, %c. 00:57:04.975 --> 00:57:06.600 Let me get rid of all these ampersands.
00:57:09.210 --> 00:57:13.110 Just to be clear, when I compile and run this version of the program, 00:57:13.110 --> 00:57:17.100 and I'll zoom in, what should get printed on the screen? 00:57:17.100 --> 00:57:19.380 This is just week two stuff now. 00:57:19.380 --> 00:57:20.864 No pointers per se. 00:57:20.864 --> 00:57:21.364 Yeah. 00:57:21.364 --> 00:57:23.800 AUDIENCE: [INAUDIBLE] 00:57:23.800 --> 00:57:26.290 DAVID J. MALAN: Just HI exclamation point, one per line, 00:57:26.290 --> 00:57:28.040 because I have all of these backslash n's. 00:57:28.040 --> 00:57:29.150 So let me do that. 00:57:29.150 --> 00:57:32.285 Let me go down here, make addresses, Enter. 00:57:32.285 --> 00:57:32.960 OK, pretty good. 00:57:32.960 --> 00:57:33.980 Dot slash addresses. 00:57:33.980 --> 00:57:36.140 And indeed HI exclamation point. 00:57:36.140 --> 00:57:38.468 But now if you're getting a little more comfortable, 00:57:38.468 --> 00:57:41.510 and it's fine if you're not yet today, but over the coming week or weeks, 00:57:41.510 --> 00:57:45.050 as you get a little more comfortable with the equivalence of addresses 00:57:45.050 --> 00:57:48.720 with our definition in the past of arrays, and strings, and all of this, 00:57:48.720 --> 00:57:50.180 you can start to play around. 00:57:50.180 --> 00:57:51.680 And I can do this instead. 00:57:51.680 --> 00:57:56.060 If I want to print out the first character in the string, 00:57:56.060 --> 00:57:58.023 I could do, like week two, s bracket 0. 00:57:58.023 --> 00:57:58.940 That will always work. 00:57:58.940 --> 00:57:59.960 And you can keep using that. 00:57:59.960 --> 00:58:00.960 That's not a CS50 thing. 00:58:00.960 --> 00:58:05.570 It's just a convenience in C. But I could technically print out not s, 00:58:05.570 --> 00:58:07.650 because s is an address. 00:58:07.650 --> 00:58:13.240 But what would be the syntax I could use to say, print out the character at s? 00:58:13.240 --> 00:58:15.070 Any instincts? 
00:58:15.070 --> 00:58:19.050 How can I say, go to the address in s? 00:58:19.050 --> 00:58:21.840 It's one of two possible answers today. 00:58:21.840 --> 00:58:24.030 So of our two new-- 00:58:24.030 --> 00:58:27.300 of our two new operators today, we have the ampersand and the star. 00:58:27.300 --> 00:58:30.405 Which one will lead us to what is at an address? 00:58:30.405 --> 00:58:31.540 AUDIENCE: [INAUDIBLE] 00:58:31.540 --> 00:58:32.707 DAVID J. MALAN: So the star. 00:58:32.707 --> 00:58:37.200 So in fact, if I want to print out what is at location zero, that is, at the address s, 00:58:37.200 --> 00:58:39.390 I can just do star s. 00:58:39.390 --> 00:58:42.060 And if you really want to get fancy, how do you print out 00:58:42.060 --> 00:58:45.300 the second character that's immediately to the right of it, so to speak? 00:58:45.300 --> 00:58:48.424 Well, you can go to, with the dereference operator-- 00:58:48.424 --> 00:58:49.882 and do you want to answer this one? 00:58:49.882 --> 00:58:50.826 AUDIENCE: [INAUDIBLE] 00:58:50.826 --> 00:58:52.380 DAVID J. MALAN: s plus 1. 00:58:52.380 --> 00:58:54.300 Ergo, pointer arithmetic. 00:58:54.300 --> 00:58:56.430 You can do math, simple addition, subtraction, 00:58:56.430 --> 00:58:58.140 whatever, on pointers if you want. 00:58:58.140 --> 00:58:59.850 And you can do this here too. 00:58:59.850 --> 00:59:01.980 So star, if you want to pluck this one off too, 00:59:01.980 --> 00:59:04.914 how do I print out the last character, the third? 00:59:04.914 --> 00:59:05.870 AUDIENCE: s plus 2? 00:59:05.870 --> 00:59:07.227 DAVID J. MALAN: s plus 2. 00:59:07.227 --> 00:59:09.560 Because if you know and understand that a string is just 00:59:09.560 --> 00:59:11.935 a sequence of characters, every character is just a byte, 00:59:11.935 --> 00:59:14.160 and these bytes are back to back to back, 00:59:14.160 --> 00:59:17.160 you can just go wherever you want in the computer's memory.
00:59:17.160 --> 00:59:20.150 And here, I can do make addresses again, dot slash addresses. 00:59:20.150 --> 00:59:23.030 And voila, we now have hi exclamation point. 00:59:23.030 --> 00:59:25.100 So we haven't printed out anything new. 00:59:25.100 --> 00:59:28.430 But again, just by using these two new operators, the ampersand and the star, 00:59:28.430 --> 00:59:30.360 you can figure out the address of something, 00:59:30.360 --> 00:59:33.020 and you can go to the address of something. 00:59:33.020 --> 00:59:34.250 OK, question in back. 00:59:34.250 --> 00:59:35.352 AUDIENCE: [INAUDIBLE] 00:59:35.352 --> 00:59:36.310 DAVID J. MALAN: Indeed. 00:59:36.310 --> 00:59:37.770 It ends up being the exact same. 00:59:37.770 --> 00:59:39.520 And so I might have used this term before. 00:59:39.520 --> 00:59:41.890 The ampersand technique-- sorry. 00:59:41.890 --> 00:59:45.625 The square bracket technique where you do s bracket zero, s bracket one, 00:59:45.625 --> 00:59:49.240 s bracket two, that's actually what we would really call syntactic sugar. 00:59:49.240 --> 00:59:49.780 It works. 00:59:49.780 --> 00:59:50.572 And you can use it. 00:59:50.572 --> 00:59:51.322 You should use it. 00:59:51.322 --> 00:59:52.210 It's nice and simple. 00:59:52.210 --> 00:59:55.090 But the square bracket notation underneath the hood 00:59:55.090 --> 00:59:58.107 is essentially being converted to this, which is not fun. 00:59:58.107 --> 01:00:00.190 This is when you want to show off to your friends 01:00:00.190 --> 01:00:01.773 that you know how to do cool stuff in code. 01:00:01.773 --> 01:00:05.313 But this is not as readable as just s bracket zero, and one, and two. 01:00:05.313 --> 01:00:07.480 But that's all that's happening underneath the hood. 01:00:07.480 --> 01:00:09.745 And so again, this is why in CS50 we spend time 01:00:09.745 --> 01:00:11.620 on some of these lower-level building blocks.
01:00:11.620 --> 01:00:14.290 Because if you assume that indeed your computer's memory is just 01:00:14.290 --> 01:00:19.510 this grid of bytes and you now have the ability in code to get an address 01:00:19.510 --> 01:00:22.600 and go to an address, you can start doing anything you want. 01:00:22.600 --> 01:00:25.210 And you can poke around a computer's memory at any location. 01:00:25.210 --> 01:00:26.950 And herein lies the danger. 01:00:26.950 --> 01:00:28.660 I'm kind of on the honor system right now 01:00:28.660 --> 01:00:32.140 that if my string is hi exclamation point, it's kind of up to me 01:00:32.140 --> 01:00:34.930 to go to the first byte, the second, and the third. 01:00:34.930 --> 01:00:36.910 But I could get kind of crazy now. 01:00:36.910 --> 01:00:40.090 And if I want to see what's going on in the computer's memory, I mean, 01:00:40.090 --> 01:00:43.570 there's nothing stopping me from doing like s plus 50. 01:00:43.570 --> 01:00:44.780 And let's see what's there. 01:00:44.780 --> 01:00:49.540 So make addresses, dot slash addresses, hi, and then, OK, nothing it seems. 01:00:49.540 --> 01:00:51.522 Well, how about 5,000 bytes away? 01:00:51.522 --> 01:00:52.480 Let's just poke around. 01:00:52.480 --> 01:00:54.105 What's inside of the computer's memory? 01:00:54.105 --> 01:00:59.020 So make addresses again, make addresses, dot slash addresses, Enter. 01:00:59.020 --> 01:01:00.400 OK, still nothing there. 01:01:00.400 --> 01:01:02.470 Let's try 50,000. 01:01:02.470 --> 01:01:03.190 All right. 01:01:03.190 --> 01:01:05.860 Make addresses, dot slash addresses. 01:01:05.860 --> 01:01:07.530 OK, there we see it. 01:01:07.530 --> 01:01:09.280 So you've probably done this, some of you, 01:01:09.280 --> 01:01:12.322 by accident, because you probably went too far to the left or to the right 01:01:12.322 --> 01:01:14.710 in an array, touching memory that you shouldn't.
01:01:14.710 --> 01:01:19.180 Suffice it to say I should not go blindly touching 50,000 bytes away. 01:01:19.180 --> 01:01:20.650 Because who knows what's there? 01:01:20.650 --> 01:01:23.320 And indeed, in your computer, when a program is running, 01:01:23.320 --> 01:01:26.810 the computer divides its memory into different segments. 01:01:26.810 --> 01:01:30.790 And if you get a little too greedy and you touch another segment of memory 01:01:30.790 --> 01:01:34.510 that technically was not allocated to you by macOS, or Windows, or Linux, 01:01:34.510 --> 01:01:36.820 or the operating system, bad things happen. 01:01:36.820 --> 01:01:38.320 And you get a segmentation fault. 01:01:38.320 --> 01:01:40.040 And that means it's a bug in your code. 01:01:40.040 --> 01:01:41.500 So you can now do this. 01:01:41.500 --> 01:01:44.410 And this means hackers too can do things like this. 01:01:44.410 --> 01:01:47.440 If they can somehow inject code into your C program, 01:01:47.440 --> 01:01:50.240 maybe they can poke around the computer's memory. 01:01:50.240 --> 01:01:52.360 And indeed, this is kind of the technique whereby 01:01:52.360 --> 01:01:55.990 maybe a really sophisticated hacker can jump to this memory, this memory, 01:01:55.990 --> 01:01:58.780 this memory looking for something like your password, 01:01:58.780 --> 01:02:01.960 or your financial information, or anything that's in the program 01:02:01.960 --> 01:02:03.280 but at some other address. 01:02:03.280 --> 01:02:06.520 There's nothing stopping an adversary, at least right now, 01:02:06.520 --> 01:02:09.610 from poking around, if they can execute code on your computer, 01:02:09.610 --> 01:02:11.120 and doing this kind of thing. 01:02:11.120 --> 01:02:13.750 So there again is the power of C, but also the danger. 01:02:13.750 --> 01:02:16.910 And you'll absolutely suffer more seg faults in the coming days.
01:02:16.910 --> 01:02:19.240 But ultimately, the goal is going to be to help you 01:02:19.240 --> 01:02:22.490 solve them and fix things. 01:02:22.490 --> 01:02:27.170 But for now, I think that was quite a bit. 01:02:27.170 --> 01:02:30.640 So let me propose that we go ahead and take our longer break here, 01:02:30.640 --> 01:02:33.730 maybe 10 minutes, and have ourselves some whoopie pies in the transept. 01:02:33.730 --> 01:02:35.710 We'll be back in 10. 01:02:35.710 --> 01:02:37.090 All right. 01:02:37.090 --> 01:02:38.470 So we're back. 01:02:38.470 --> 01:02:42.760 And to recap where we left off, you now have this new capability in code 01:02:42.760 --> 01:02:45.220 to do pointer arithmetic, like treating addresses 01:02:45.220 --> 01:02:47.320 as numbers, which they really are in hexadecimal 01:02:47.320 --> 01:02:49.720 or otherwise, and adding them together and kind 01:02:49.720 --> 01:02:51.440 of poking around a computer's memory. 01:02:51.440 --> 01:02:55.180 And it was asked during break actually how we might further 01:02:55.180 --> 01:02:56.980 harness this in the context of strings. 01:02:56.980 --> 01:02:59.650 So I didn't change the code we wrote just before break. 01:02:59.650 --> 01:03:04.960 Recall that we last broke the program by checking out the byte 50,000 bytes away. 01:03:04.960 --> 01:03:06.070 But let's not do that. 01:03:06.070 --> 01:03:09.760 And let's actually try printing out not individual characters, like I did, 01:03:09.760 --> 01:03:14.710 per the %c, but why don't we try printing out strings and substrings 01:03:14.710 --> 01:03:15.290 if you will? 01:03:15.290 --> 01:03:16.810 So let me clear my terminal window. 01:03:16.810 --> 01:03:21.490 Let me change all of these %c's to %s, %s, %s. 01:03:21.490 --> 01:03:24.520 And then let me rewind to what we've been doing since week one 01:03:24.520 --> 01:03:26.930 with strings, which is just print them out, 01:03:26.930 --> 01:03:28.720 for instance, with that first line.
01:03:28.720 --> 01:03:31.592 And the only difference at the moment is that now, I 01:03:31.592 --> 01:03:32.800 took off the training wheels. 01:03:32.800 --> 01:03:38.950 I got rid of cs50.h, wherein string is typedef'd to char star for you. 01:03:38.950 --> 01:03:39.920 Got rid of that. 01:03:39.920 --> 01:03:42.640 So now on line five, I'm declaring s as being a char star, which 01:03:42.640 --> 01:03:44.223 just means the address of a character. 01:03:44.223 --> 01:03:46.720 And printf is smart enough to know that the end of a string 01:03:46.720 --> 01:03:48.560 is wherever that NUL character is. 01:03:48.560 --> 01:03:50.740 But now that I can do pointer arithmetic, 01:03:50.740 --> 01:03:52.910 notice that I could do something like this. 01:03:52.910 --> 01:03:55.390 If I want to print out s, I just print out s. 01:03:55.390 --> 01:04:02.080 Suppose I do s plus 1 here and s plus 2 here, again, after changing %c to %s. 01:04:02.080 --> 01:04:10.240 Any intuition around what this code will now print on the screen line by line? 01:04:10.240 --> 01:04:11.180 Yeah, thoughts? 01:04:11.180 --> 01:04:12.260 AUDIENCE: [INAUDIBLE] 01:04:12.260 --> 01:04:14.010 DAVID J. MALAN: OK, reasonable conjecture. 01:04:14.010 --> 01:04:17.500 Maybe the memory address of h, that of i, that of exclamation point. 01:04:17.500 --> 01:04:18.550 But other thoughts? 01:04:18.550 --> 01:04:20.105 AUDIENCE: [INAUDIBLE] 01:04:20.105 --> 01:04:20.980 DAVID J. MALAN: Yeah. 01:04:20.980 --> 01:04:22.510 I think it's actually going to do the latter. 01:04:22.510 --> 01:04:24.430 It's going to print, hi, in the usual way. 01:04:24.430 --> 01:04:28.900 Because honestly, line five is this-- rather line six is the same as week one 01:04:28.900 --> 01:04:31.240 stuff, except we took off the training wheel of string 01:04:31.240 --> 01:04:32.620 and we're calling it char star. 01:04:32.620 --> 01:04:35.770 But I think line seven is indeed going to print out i, exclamation point.
01:04:35.770 --> 01:04:38.440 And line eight is just going to print out 01:04:38.440 --> 01:04:40.480 a single character, just the exclamation point. 01:04:40.480 --> 01:04:45.040 printf will still be smart enough to know where each of those substrings, 01:04:45.040 --> 01:04:47.860 portions of the strings, end by the same logic as always. 01:04:47.860 --> 01:04:51.370 But let me go ahead and zoom out, run make addresses, Enter, 01:04:51.370 --> 01:04:53.950 compiles OK, dot slash addresses. 01:04:53.950 --> 01:04:57.340 And now indeed, this is all a string is. 01:04:57.340 --> 01:05:00.340 It's a sequence of characters identified by its first byte. 01:05:00.340 --> 01:05:03.550 If you then start poking around and tell printf 01:05:03.550 --> 01:05:06.505 to print what's at the next byte, or the next, next byte, 01:05:06.505 --> 01:05:08.380 it's going to do its same thing, printing out 01:05:08.380 --> 01:05:12.500 that character and everything after it up until that NUL character. 01:05:12.500 --> 01:05:14.810 So again, even though there's a lot going on, 01:05:14.810 --> 01:05:16.630 and we've introduced these two new operators, 01:05:16.630 --> 01:05:19.840 there's nothing that's happening today that hasn't been happening for weeks. 01:05:19.840 --> 01:05:23.367 But hopefully, through this week, this week's lecture, this week's problem 01:05:23.367 --> 01:05:25.450 set, and beyond, you'll start to realize that now, 01:05:25.450 --> 01:05:28.570 you just have more tools by which to harness those lower-level 01:05:28.570 --> 01:05:30.680 implementation details. 01:05:30.680 --> 01:05:34.420 So back in week two, recall one other implementation detail. 01:05:34.420 --> 01:05:39.250 I claimed that you could not compare two strings quite as easily as you could 01:05:39.250 --> 01:05:42.820 compare two integers, for instance.
01:05:42.820 --> 01:05:45.850 And I told you to use a different function instead 01:05:45.850 --> 01:05:49.540 that you probably used one or more times with the past problem set. 01:05:49.540 --> 01:05:52.395 How are you supposed to compare strings apparently? 01:05:52.395 --> 01:05:53.270 AUDIENCE: [INAUDIBLE] 01:05:53.270 --> 01:05:54.260 DAVID J. MALAN: Yeah, so string compare. 01:05:54.260 --> 01:05:54.825 strcmp. 01:05:54.825 --> 01:05:57.950 That additional function that we said, eh, you just have to use it for now. 01:05:57.950 --> 01:06:00.590 But you might have a little intuition already 01:06:00.590 --> 01:06:03.410 as to why we have to use STR compare and we can't just 01:06:03.410 --> 01:06:06.170 use equals equals to compare strings. 01:06:06.170 --> 01:06:07.970 Any intuition for this already? 01:06:07.970 --> 01:06:10.190 Why was STR compare necessary last week? 01:06:10.190 --> 01:06:11.380 AUDIENCE: [INAUDIBLE] 01:06:11.380 --> 01:06:12.380 DAVID J. MALAN: Perfect. 01:06:12.380 --> 01:06:14.600 Equals, equals would compare literally the two memory 01:06:14.600 --> 01:06:18.080 addresses instead of the actual strings character by character. 01:06:18.080 --> 01:06:21.180 And unless the memory addresses are literally the same, 01:06:21.180 --> 01:06:24.260 so that you're comparing the exact same memory address, 01:06:24.260 --> 01:06:26.240 two different strings probably are not going 01:06:26.240 --> 01:06:29.640 to be considered equal even if, to us humans, they indeed look equal. 01:06:29.640 --> 01:06:30.570 So let's see this. 01:06:30.570 --> 01:06:32.750 Let me go ahead and close addresses.c. 01:06:32.750 --> 01:06:35.150 And actually, before I do, one last mention: 01:06:35.150 --> 01:06:39.120 one of the powerful things about pointer arithmetic, as an aside, 01:06:39.120 --> 01:06:42.560 is that C, and really the compiler, is smart enough 01:06:42.560 --> 01:06:45.958 to know how many bytes to keep adding and adding.
01:06:45.958 --> 01:06:47.000 And by that, I mean this. 01:06:47.000 --> 01:06:49.875 Right now, we got lucky because a string is a sequence of characters. 01:06:49.875 --> 01:06:52.220 And by definition, every character is a single byte. 01:06:52.220 --> 01:06:56.270 You can poke around and do s plus 1 to get the next byte, s plus 2 01:06:56.270 --> 01:06:58.040 to get the third byte. 01:06:58.040 --> 01:07:00.020 However, if we weren't dealing with strings, 01:07:00.020 --> 01:07:03.350 suppose we were dealing with integers that were in an array 01:07:03.350 --> 01:07:06.800 back to back to back, if you wanted to get at the next integer, 01:07:06.800 --> 01:07:10.077 you could still do plus 1, or plus 2 to get 01:07:10.077 --> 01:07:11.660 at the next or the next, next integer. 01:07:11.660 --> 01:07:16.310 You would not start to get into the weeds of doing plus 4, and then plus 8. 01:07:16.310 --> 01:07:19.460 You don't have to know or care how big the data types are in the computer. 01:07:19.460 --> 01:07:21.652 C and the compiler will figure that out for you 01:07:21.652 --> 01:07:23.110 based on the data type in question. 01:07:23.110 --> 01:07:27.510 So keep that in mind if ever doing this on a different data type than chars. 01:07:27.510 --> 01:07:29.510 All right, so let me go ahead and open up a file 01:07:29.510 --> 01:07:31.850 that I wrote in advance most of. 01:07:31.850 --> 01:07:34.380 And let me hide my terminal window and show you this. 01:07:34.380 --> 01:07:37.160 So here is a program called compare.c, whose purpose in life 01:07:37.160 --> 01:07:39.200 is to compare two strings. 01:07:39.200 --> 01:07:41.032 I'm back to using the CS50 library. 01:07:41.032 --> 01:07:43.490 Because at least for now, and probably a couple more weeks, 01:07:43.490 --> 01:07:47.390 it is so much easier to get input from the user using CS50's function, 01:07:47.390 --> 01:07:47.930 get int. 
01:07:47.930 --> 01:07:51.480 But we'll conclude today by taking off those training wheels as well. 01:07:51.480 --> 01:07:55.280 So you can see how you can actually get user input with nothing CS50 specific. 01:07:55.280 --> 01:07:57.890 So line six and seven, pretty boring. 01:07:57.890 --> 01:07:58.820 Week one stuff. 01:07:58.820 --> 01:08:01.190 Get an int called i, get an int called j, 01:08:01.190 --> 01:08:03.710 and store them in two variables, i and j respectively. 01:08:03.710 --> 01:08:07.235 If i equals equals j, print out the same, else print 01:08:07.235 --> 01:08:08.360 out that they're different. 01:08:08.360 --> 01:08:11.750 Let me just stipulate for time's sake, I'm pretty sure this code is correct. 01:08:11.750 --> 01:08:13.550 This will get two integers from the human. 01:08:13.550 --> 01:08:15.800 It will compare them and tell me correctly 01:08:15.800 --> 01:08:17.450 if they're the same or different. 01:08:17.450 --> 01:08:23.180 And I'll prove as much by running make compare dot slash compare. 01:08:23.180 --> 01:08:25.910 And I'll type in 50 for i, 50 for j. 01:08:25.910 --> 01:08:27.029 And they're the same. 01:08:27.029 --> 01:08:30.859 And now I'll do, how about 50, and say 13. 01:08:30.859 --> 01:08:31.920 And those are different. 01:08:31.920 --> 01:08:34.189 So let me just stipulate this code is indeed correct. 01:08:34.189 --> 01:08:37.370 Would have worked in week one, also works now in week four. 01:08:37.370 --> 01:08:40.310 But let me now change it to compare not two integers, 01:08:40.310 --> 01:08:43.620 but as I hinted, maybe two strings instead. 01:08:43.620 --> 01:08:46.220 So let me go ahead and change this line of code 01:08:46.220 --> 01:08:52.220 to maybe be string s equals get string, asking the user for s. 01:08:52.220 --> 01:08:55.223 Then let's change this second line here to be string t, 01:08:55.223 --> 01:08:57.140 just to keep the variable names short for now. 
01:08:57.140 --> 01:09:00.680 And t is a good choice after s for something like this. 01:09:00.680 --> 01:09:02.960 Get string, prompt the human for t. 01:09:02.960 --> 01:09:06.890 And then let's change our i and j here to do the wrong thing, 01:09:06.890 --> 01:09:08.450 per the intuition earlier. 01:09:08.450 --> 01:09:11.720 If s equals equals t, then print out the same, 01:09:11.720 --> 01:09:13.399 else, print out that they're different. 01:09:13.399 --> 01:09:16.274 Now if I want, I could take off at least some of the training wheels. 01:09:16.274 --> 01:09:17.689 I could change this to char star. 01:09:17.689 --> 01:09:19.430 I could change this to char star. 01:09:19.430 --> 01:09:20.055 Either is fine. 01:09:20.055 --> 01:09:22.805 I still need the CS50 library though because I'm using get string, 01:09:22.805 --> 01:09:25.880 because it's actually hard, as we'll see today, to get strings manually 01:09:25.880 --> 01:09:26.930 without using a library. 01:09:26.930 --> 01:09:30.470 But I'll keep it using string just for now with the library. 01:09:30.470 --> 01:09:33.890 All right, make compare again, dot slash compare. 01:09:33.890 --> 01:09:38.240 And now let me go ahead and type in, for instance, hi, exclamation point, Enter, 01:09:38.240 --> 01:09:40.220 and hi, exclamation point, Enter. 01:09:40.220 --> 01:09:42.640 And oh, they're different. 01:09:42.640 --> 01:09:44.390 All right, they're obviously not visually. 01:09:44.390 --> 01:09:45.765 But they are underneath the hood. 01:09:45.765 --> 01:09:47.840 And you probably do have the intuition for this 01:09:47.840 --> 01:09:50.660 already, whereby what's going on underneath the hood 01:09:50.660 --> 01:09:54.149 is that we're comparing accidentally the two memory addresses. 01:09:54.149 --> 01:09:55.400 So in fact, let's go there. 01:09:55.400 --> 01:09:56.848 Let's consider the memory. 01:09:56.848 --> 01:09:59.640 And let me zoom out now so I can just have more bytes to play with. 
01:09:59.640 --> 01:10:03.020 So the squares are a little smaller than before just so we can fit more in them. 01:10:03.020 --> 01:10:08.950 And let me propose that when I declare s on what was line six a moment ago, 01:10:08.950 --> 01:10:11.450 it ends up somewhere in memory like the top left hand corner 01:10:11.450 --> 01:10:13.010 of my picture for discussion's sake. 01:10:13.010 --> 01:10:18.930 And when I execute that same line of code, and get string is called, 01:10:18.930 --> 01:10:21.560 and I type in hi exclamation point, we know 01:10:21.560 --> 01:10:24.990 from week one that get string puts it somewhere in the computer's memory. 01:10:24.990 --> 01:10:28.580 And I'll propose that it's in the bottom left hand corner of the screen here. 01:10:28.580 --> 01:10:29.970 What happens after that? 01:10:29.970 --> 01:10:32.120 Well, I know, even though I don't generally care, 01:10:32.120 --> 01:10:34.730 that H, I, exclamation point, and the NUL character 01:10:34.730 --> 01:10:40.310 exist at some address, like 0x123, 124, 125, 126 for discussion's sake. 01:10:40.310 --> 01:10:41.480 And what's in s? 01:10:41.480 --> 01:10:44.600 Same as before break, 0x123. 01:10:44.600 --> 01:10:48.080 So that's all that's happening again on line six, which 01:10:48.080 --> 01:10:51.170 is pretty much the same as when we were getting an s earlier. 01:10:51.170 --> 01:10:55.030 But notice what happens now with line seven, when I get a second variable called t 01:10:55.030 --> 01:10:56.830 and I call get string again. 01:10:56.830 --> 01:10:59.980 And by coincidence, as the human, I type the same thing. 01:10:59.980 --> 01:11:02.530 Well, what happens here? t gets its own chunk of memory, 01:11:02.530 --> 01:11:04.120 maybe at the top right. 01:11:04.120 --> 01:11:07.643 That second version of hi ends up somewhere else in memory. 01:11:07.643 --> 01:11:10.060 The computer could be smart and notice that it's the same. 01:11:10.060 --> 01:11:11.970 But C doesn't generally do that for you.
01:11:11.970 --> 01:11:13.720 It just plops it somewhere else in memory. 01:11:13.720 --> 01:11:18.873 And maybe it's at address 0x456, 457, 458, 459, or wherever. 01:11:18.873 --> 01:11:21.040 But you can perhaps see where this is going already. 01:11:21.040 --> 01:11:23.770 t now, of course, contains the address of that first byte. 01:11:23.770 --> 01:11:29.530 And so in my code, on line nine, when I compare s and t for equality, 01:11:29.530 --> 01:11:33.190 suffice it to say they are not equal. Because of the way 01:11:33.190 --> 01:11:36.460 the strings are laid out in the computer's memory, 01:11:36.460 --> 01:11:38.960 it indeed looks the same; the same values are there. 01:11:38.960 --> 01:11:43.990 But if we abstract away further, you can really see that s and t are not the same 01:11:43.990 --> 01:11:45.290 themselves. 01:11:45.290 --> 01:11:46.870 And so how did we fix this? 01:11:46.870 --> 01:11:50.125 Or really, how did we avoid this last week without spilling the beans 01:11:50.125 --> 01:11:52.000 and going down this rabbit hole of explaining 01:11:52.000 --> 01:11:53.890 why you have to use STR compare? 01:11:53.890 --> 01:11:57.760 Well, if I go back to my code here, let's do it now the right way. 01:11:57.760 --> 01:12:00.640 Let me go ahead and include a line of code 01:12:00.640 --> 01:12:04.480 that says string compare of s comma t, both as inputs. 01:12:04.480 --> 01:12:08.530 And then if you recall, what does STR compare return 01:12:08.530 --> 01:12:10.040 when two strings are equal? 01:12:10.040 --> 01:12:11.625 There are three possible kinds of return value. 01:12:11.625 --> 01:12:12.500 AUDIENCE: [INAUDIBLE] 01:12:12.500 --> 01:12:13.500 DAVID J. MALAN: So zero. 01:12:13.500 --> 01:12:17.120 A negative or a positive number is for if s comes alphabetically, or ASCIIbetically, first or second. 01:12:17.120 --> 01:12:18.700 But for now, I just want zero. 01:12:18.700 --> 01:12:22.450 If I want to use STR compare, I do need string.h.
01:12:22.450 --> 01:12:24.130 So string.h does exist. 01:12:24.130 --> 01:12:25.390 That's not a CS50 thing. 01:12:25.390 --> 01:12:28.000 There's no keyword string as a data type. 01:12:28.000 --> 01:12:29.080 That's a CS50 thing. 01:12:29.080 --> 01:12:30.620 But string.h does exist. 01:12:30.620 --> 01:12:34.030 So I think now with that change on line 10, if I do make 01:12:34.030 --> 01:12:38.320 compare, and dot slash compare, and then run again, 01:12:38.320 --> 01:12:42.162 type again, hi exclamation point, hi exclamation point, 01:12:42.162 --> 01:12:43.370 I think now they're the same. 01:12:43.370 --> 01:12:48.760 And just as a second check, HI in all caps, maybe hi in lowercase, 01:12:48.760 --> 01:12:50.440 those are, in fact, different. 01:12:50.440 --> 01:12:51.110 Why? 01:12:51.110 --> 01:12:54.400 Well, STR compare, which was written by some other human decades 01:12:54.400 --> 01:12:59.890 ago, is just smart enough to know that it should go to s and go to t, 01:12:59.890 --> 01:13:04.450 start comparing them left to right, stopping once it hits one or both NUL 01:13:04.450 --> 01:13:07.720 characters, and return zero only if everything in s 01:13:07.720 --> 01:13:11.140 and in t is exactly the same. 01:13:11.140 --> 01:13:15.580 Any questions then on this here? 01:13:15.580 --> 01:13:18.680 Any questions on why we're using STR compare? 01:13:18.680 --> 01:13:19.180 All right. 01:13:19.180 --> 01:13:20.530 If no-- yeah, oh. 01:13:20.530 --> 01:13:22.120 In the middle. 01:13:22.120 --> 01:13:24.270 AUDIENCE: Why do [INAUDIBLE] integers? 01:13:24.270 --> 01:13:25.437 Why [INAUDIBLE]? 01:13:25.437 --> 01:13:26.270 DAVID J. MALAN: Yes. 01:13:26.270 --> 01:13:28.610 So why-- why is it not the case with integers? 01:13:28.610 --> 01:13:30.920 So it turns out it's not the case with integers, 01:13:30.920 --> 01:13:35.060 with floats, with bools, with doubles, with longs. 01:13:35.060 --> 01:13:37.790 Literally every other data type we've seen so far works correctly.
01:13:37.790 --> 01:13:39.530 Strings, though, are special. 01:13:39.530 --> 01:13:42.500 They're useful enough in programming and have been for decades 01:13:42.500 --> 01:13:44.900 that the authors of printf, and the authors of STR 01:13:44.900 --> 01:13:47.840 compare, and bunches of other functions, strlen for that matter, 01:13:47.840 --> 01:13:51.650 just kind of treat strings specially because they're just useful. 01:13:51.650 --> 01:13:54.660 We humans interact using language, be it English or anything else. 01:13:54.660 --> 01:13:58.040 And so it's just useful to have in the language C 01:13:58.040 --> 01:14:02.900 sort of first-class support for this notion of strings of human text. 01:14:02.900 --> 01:14:05.330 So the short answer is just because. 01:14:05.330 --> 01:14:08.300 It just is necessary-- strings are different. 01:14:08.300 --> 01:14:11.480 They're implemented with this address and the NUL character. 01:14:11.480 --> 01:14:14.060 Everything else, though, is just a value. 01:14:14.060 --> 01:14:15.680 But a string again is a white lie. 01:14:15.680 --> 01:14:16.490 It's an address. 01:14:16.490 --> 01:14:19.550 It's not a thing unto itself. 01:14:19.550 --> 01:14:20.180 Good question. 01:14:20.180 --> 01:14:21.020 Yeah, in front. 01:14:21.020 --> 01:14:23.333 AUDIENCE: How come [INAUDIBLE]? 01:14:23.333 --> 01:14:25.000 DAVID J. MALAN: Oh, really good question. 01:14:25.000 --> 01:14:30.310 So in my code here in VS Code, what if I do this? 01:14:30.310 --> 01:14:33.600 Instead of STR compare, and instead of if s equals 01:14:33.600 --> 01:14:39.060 equals t, what if I start playing around using star s and star t? 01:14:39.060 --> 01:14:41.020 Really interesting case to consider. 01:14:41.020 --> 01:14:43.330 Let's go back to our sort of deductive logic here. 01:14:43.330 --> 01:14:46.720 So star, the asterisk operator today, means go there.
01:14:46.720 --> 01:14:49.860 So when I've typed in HI once and then HI again, 01:14:49.860 --> 01:14:53.850 both uppercase for instance, what is at the address s literally? 01:14:53.850 --> 01:14:56.410 Someone else. 01:14:56.410 --> 01:14:57.640 What is at the address s? 01:14:57.640 --> 01:14:59.080 Yeah. 01:14:59.080 --> 01:15:00.100 So not quite. 01:15:00.100 --> 01:15:01.645 At the address. 01:15:01.645 --> 01:15:02.830 So not, what is the address? 01:15:02.830 --> 01:15:04.535 What is at the address 0x123? 01:15:04.535 --> 01:15:05.410 AUDIENCE: [INAUDIBLE] 01:15:05.410 --> 01:15:06.340 DAVID J. MALAN: H. 01:15:06.340 --> 01:15:08.590 And what is at the address 0x456? 01:15:08.590 --> 01:15:09.550 AUDIENCE: [INAUDIBLE] 01:15:09.550 --> 01:15:10.690 DAVID J. MALAN: H also. 01:15:10.690 --> 01:15:12.850 And so here, you're kind of cheating. 01:15:12.850 --> 01:15:17.800 You're comparing the first character of both strings, but not every other one. 01:15:17.800 --> 01:15:19.570 Now you could be really pedantic. 01:15:19.570 --> 01:15:22.510 And here, again, this is not a good use of code. 01:15:22.510 --> 01:15:23.600 But you could do this. 01:15:23.600 --> 01:15:26.480 If that, and how about this craziness? 01:15:26.480 --> 01:15:32.140 So star of s plus 1, in parentheses, equals equals star of t plus 1, in parentheses. 01:15:32.140 --> 01:15:34.420 And you could do this for every character manually. 01:15:34.420 --> 01:15:35.960 But that's why STR compare exists. 01:15:35.960 --> 01:15:37.130 It does all of this for you. 01:15:37.130 --> 01:15:37.798 But that's why. 01:15:37.798 --> 01:15:38.840 And that's the intuition. 01:15:38.840 --> 01:15:41.965 So I would encourage you too, anytime there's something kind of weird going 01:15:41.965 --> 01:15:42.670 on, there's-- 01:15:42.670 --> 01:15:45.160 I realize we might be straining credibility now, 01:15:45.160 --> 01:15:46.990 but we haven't told you that many white lies.
01:15:46.990 --> 01:15:50.140 And so most everything that we've seen thus far 01:15:50.140 --> 01:15:53.350 can explain pretty much all of the behavior up until now 01:15:53.350 --> 01:15:56.830 from week one onward in C. So let me revert this back to the right way. 01:15:56.830 --> 01:15:59.830 If STR compare of s and t equals equals zero, 01:15:59.830 --> 01:16:01.608 this now is the right version of the code. 01:16:01.608 --> 01:16:03.400 And now here is, again, where you can play. 01:16:03.400 --> 01:16:04.250 So let me do this. 01:16:04.250 --> 01:16:07.690 Let me clear my terminal window just to tidy things up. 01:16:07.690 --> 01:16:09.670 Let me get rid of all of this comparison stuff. 01:16:09.670 --> 01:16:12.795 And let's just see what's going on, as you are welcome to in your own code. 01:16:12.795 --> 01:16:14.800 Let's print out, for instance, as we might 01:16:14.800 --> 01:16:18.850 have in week one, the value of s itself on a new line, comma s. 01:16:18.850 --> 01:16:21.460 And then let's just print out t just to make sure it compiles 01:16:21.460 --> 01:16:22.910 and I'm not doing anything wrong. 01:16:22.910 --> 01:16:24.785 But this is not going to be that interesting. 01:16:24.785 --> 01:16:27.820 And frankly, I don't need string.h anymore 01:16:27.820 --> 01:16:29.320 because I'm not using STR compare. 01:16:29.320 --> 01:16:34.660 So make addresses dot slash addresses, there's my-- 01:16:34.660 --> 01:16:35.290 oh, sorry. 01:16:35.290 --> 01:16:36.430 That's fun. 01:16:36.430 --> 01:16:39.130 Not %t, %s here too. 01:16:39.130 --> 01:16:39.797 Ignore that. 01:16:39.797 --> 01:16:40.630 Let's do this again. 01:16:40.630 --> 01:16:43.720 Make a-- oh, and that's the wrong program. 01:16:43.720 --> 01:16:47.770 Dot slash-- let's do make compare dot slash compare. 01:16:47.770 --> 01:16:49.990 And let's type in hi again and hi again. 01:16:49.990 --> 01:16:51.490 And now we just see the two strings.
01:16:51.490 --> 01:16:52.360 I'm not comparing. 01:16:52.360 --> 01:16:54.160 But now we can kind of play around. 01:16:54.160 --> 01:16:57.850 Instead of printing out %s, which prints the string, 01:16:57.850 --> 01:17:01.960 how do I print the address in s? 01:17:01.960 --> 01:17:04.240 I just need to make a slight change. 01:17:04.240 --> 01:17:09.400 If I want to see not what's at s, but I want to see s, the address-- 01:17:09.400 --> 01:17:10.356 Yeah. 01:17:10.356 --> 01:17:13.050 AUDIENCE: Change %s to %p? 01:17:13.050 --> 01:17:14.050 DAVID J. MALAN: Perfect. 01:17:14.050 --> 01:17:17.500 So change %s in both places here to %p. 01:17:17.500 --> 01:17:20.522 So now, printf will treat it literally as an address. 01:17:20.522 --> 01:17:22.480 It's not going to do any fancy thing with a loop 01:17:22.480 --> 01:17:24.522 from left to right looking for the NUL character. 01:17:24.522 --> 01:17:26.140 It's just going to print out s and t. 01:17:26.140 --> 01:17:29.170 So let me clear my terminal, run make compare, whoops. 01:17:29.170 --> 01:17:31.450 Let's do make compare dot slash compare. 01:17:31.450 --> 01:17:32.050 Enter. 01:17:32.050 --> 01:17:34.100 Type in hi, type in hi again. 01:17:34.100 --> 01:17:37.618 And now you see, oh, so this is interesting. 01:17:37.618 --> 01:17:40.660 It's not quite as straightforward as the other values which were slight-- 01:17:40.660 --> 01:17:41.950 1 byte away. 01:17:41.950 --> 01:17:43.220 They're almost the same. 01:17:43.220 --> 01:17:44.950 But this one ends in b0. 01:17:44.950 --> 01:17:46.540 This one ends in f0. 01:17:46.540 --> 01:17:49.630 So they're indeed separated by some number 01:17:49.630 --> 01:17:51.430 of bytes, not just one, but a few. 01:17:51.430 --> 01:17:54.000 Because these strings are indeed longer. 01:17:54.000 --> 01:17:54.500 All right. 01:17:54.500 --> 01:17:58.240 So once you've seen this here, how can we now maybe leverage 01:17:58.240 --> 01:18:00.250 this to solve other problems?
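Here is a sketch of the %s-to-%p change. It assumes two character arrays stand in for the two strings get_string returned, and the helper name print_addresses is hypothetical. The exact addresses printed will differ from machine to machine and from run to run. Strictly speaking, %p expects a void *, hence the casts.

```c
#include <stdio.h>

// Print where s and t point, rather than the characters stored there.
// %p expects a void *, so each char * is cast explicitly.
void print_addresses(char *s, char *t)
{
    printf("%p\n", (void *) s);
    printf("%p\n", (void *) t);
}
```

With two separate arrays that both hold "HI!", strcmp reports them equal even though the two printed addresses differ.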
01:18:00.250 --> 01:18:01.880 Well, let me propose that we do this. 01:18:01.880 --> 01:18:05.770 Let me zoom out here, let me close compare. 01:18:05.770 --> 01:18:10.150 And let me open up another program I wrote part of in advance called copy.c. 01:18:10.150 --> 01:18:14.230 So copy.c in theory makes a copy of a string. 01:18:14.230 --> 01:18:14.920 How? 01:18:14.920 --> 01:18:17.920 On line eight, I'm doing the same thing as before. 01:18:17.920 --> 01:18:22.360 Get string, storing it in a string, or char star, and asking the user for it. 01:18:22.360 --> 01:18:24.980 Then I'm not asking get string again. 01:18:24.980 --> 01:18:30.610 I'm just making a copy super simply with line 10 here, string t equals s. 01:18:30.610 --> 01:18:33.820 Now intuitively, I think that's how I would copy a variable. 01:18:33.820 --> 01:18:37.090 That's how we've copied variables every week thus far in C. 01:18:37.090 --> 01:18:39.160 But something is going to go wrong. 01:18:39.160 --> 01:18:41.830 In line 12, in English, does someone want 01:18:41.830 --> 01:18:44.470 to explain what you think line 12 does? 01:18:44.470 --> 01:18:46.450 Don't worry about finding any bugs or mistakes. 01:18:46.450 --> 01:18:50.290 But what does line 12 seem to be doing using toupper, which 01:18:50.290 --> 01:18:54.080 is thanks to the ctype library, which I've included the header file for? 01:18:54.080 --> 01:18:54.580 Yeah. 01:18:54.580 --> 01:18:56.593 AUDIENCE: [INAUDIBLE] 01:18:56.593 --> 01:18:57.760 DAVID J. MALAN: Yeah, right? 01:18:57.760 --> 01:18:59.350 It's kind of like ugly syntax. 01:18:59.350 --> 01:19:02.710 But this would seem to be capitalizing the first letter of t 01:19:02.710 --> 01:19:04.880 specifically and just changing it. 01:19:04.880 --> 01:19:07.720 So we have t bracket 0 here, because we want to save the change. 01:19:07.720 --> 01:19:10.430 And we're passing to toupper the first character here.
01:19:10.430 --> 01:19:12.460 So this is how we did uppercase in the past. 01:19:12.460 --> 01:19:16.930 And now I print out s and t respectively using %s. 01:19:16.930 --> 01:19:18.460 So this feels like it should work. 01:19:18.460 --> 01:19:21.790 I copied s and stored it in t on line 10. 01:19:21.790 --> 01:19:25.450 And then I change t and only t on line 12. 01:19:25.450 --> 01:19:27.970 But you can perhaps, if you're comfy thus far, 01:19:27.970 --> 01:19:32.860 see where this is going if I do make copy, dot slash copy. 01:19:32.860 --> 01:19:37.220 And let me type in lowercase hi exclamation point this time, just once. 01:19:37.220 --> 01:19:38.410 So I'm going to hit Enter. 01:19:38.410 --> 01:19:43.510 And watch what we see for the value of s and t. 01:19:43.510 --> 01:19:48.640 The new value of s and t at the end of my program seems to be what? 01:19:48.640 --> 01:19:52.030 It seems to be the same. 01:19:52.030 --> 01:19:54.165 Hi is capitalized both times. 01:19:54.165 --> 01:19:56.620 So what's the intuition then for this? 01:19:56.620 --> 01:20:00.100 Why did this just happen? 01:20:00.100 --> 01:20:01.030 Yeah, in back. 01:20:01.030 --> 01:20:02.582 AUDIENCE: [INAUDIBLE] 01:20:02.582 --> 01:20:05.290 DAVID J. MALAN: Yeah, I assigned s and t the same memory address. 01:20:05.290 --> 01:20:07.590 So it did copy s into t. 01:20:07.590 --> 01:20:09.540 But C takes this very literally. 01:20:09.540 --> 01:20:10.073 What is s? 01:20:10.073 --> 01:20:10.740 It's an address. 01:20:10.740 --> 01:20:11.280 What is t? 01:20:11.280 --> 01:20:12.790 It's a copy of that address. 01:20:12.790 --> 01:20:15.990 If you want to copy the whole string like a normal human would expect, 01:20:15.990 --> 01:20:18.180 hey, you or someone has to do a lot more work. 01:20:18.180 --> 01:20:21.690 You have to go to that address, copy this character, this one, this one, 01:20:21.690 --> 01:20:24.390 this one, and copy it to a new location in memory.
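The bug can be reproduced without get_string. In the minimal sketch below, a character array stands in for the user's lowercase "hi!", and the helper name alias_and_capitalize is hypothetical. Assigning one char * to another copies only the address, so capitalizing through t also changes what s shows.

```c
#include <ctype.h>

// Hypothetical helper reproducing copy.c's bug: t receives a copy of the
// address stored in s, not a copy of the characters, so writing through
// t modifies the very same bytes that s points to.
char *alias_and_capitalize(char *s)
{
    char *t = s;  // copies the address only

    if (t[0] != '\0')
    {
        t[0] = toupper((unsigned char) t[0]);
    }
    return t;
}
```

After the call, printing either pointer with %s shows "Hi!", because there is only one string in memory.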
01:20:24.390 --> 01:20:26.745 That does not happen automatically here for you in C. 01:20:26.745 --> 01:20:28.620 It does in some other languages, those of you 01:20:28.620 --> 01:20:30.660 who've programmed in certain higher level languages. 01:20:30.660 --> 01:20:32.090 This just works as you would hope. 01:20:32.090 --> 01:20:34.590 And that's one of the benefits of Python and other languages 01:20:34.590 --> 01:20:35.520 that we'll soon see. 01:20:35.520 --> 01:20:38.880 But for now, it literally takes at face value what this is. 01:20:38.880 --> 01:20:40.800 Copy the address into this address. 01:20:40.800 --> 01:20:44.470 And I'll make that more clear by getting rid of the string keyword, which, 01:20:44.470 --> 01:20:45.960 again, is just a typedef. 01:20:45.960 --> 01:20:47.890 This is technically an address here. 01:20:47.890 --> 01:20:49.570 This is technically an address here. 01:20:49.570 --> 01:20:54.090 So what's being copied is the value of that address, not all of the characters 01:20:54.090 --> 01:20:55.750 that might very well follow it. 01:20:55.750 --> 01:20:58.530 So I should make one note too here. 01:20:58.530 --> 01:21:01.290 I'm going to start getting more in the habit of trying 01:21:01.290 --> 01:21:05.070 to avoid segmentation faults because things could go wrong here. 01:21:05.070 --> 01:21:09.930 For instance, on line 12 previously, I was kind of blindly, naively, 01:21:09.930 --> 01:21:14.265 dangerously assuming that there will be at least one character in s or t. 01:21:14.265 --> 01:21:15.390 That might not be the case. 01:21:15.390 --> 01:21:18.390 If the user just hits Enter, there's no characters to uppercase. 01:21:18.390 --> 01:21:21.720 And so this is reckless of me and could theoretically create a seg fault. 01:21:21.720 --> 01:21:25.510 So I should probably start to be smarter and say something like this. 
01:21:25.510 --> 01:21:29.130 If the length of t is greater than zero, OK, 01:21:29.130 --> 01:21:32.310 now it's safe to actually capitalize the first letter. 01:21:32.310 --> 01:21:36.270 And that will decrease the probability now of those segmentation faults 01:21:36.270 --> 01:21:40.050 by just not making any assumptions about what the human does. 01:21:40.050 --> 01:21:43.770 Almost always, your programs will crash when you've made a mistake, 01:21:43.770 --> 01:21:49.210 yes, but also when the user gives you an input that you yourself did not expect. 01:21:49.210 --> 01:21:51.060 So what does this all look like in memory? 01:21:51.060 --> 01:21:53.370 Well, let's go back to the big grid, this time 01:21:53.370 --> 01:21:55.210 focusing on the copying of values. 01:21:55.210 --> 01:21:56.040 And let's do this. 01:21:56.040 --> 01:22:01.110 Here's s as in this new program just declared to be a char star. 01:22:01.110 --> 01:22:04.560 Here is where my lowercase hi maybe ended up in the computer's memory. 01:22:04.560 --> 01:22:08.860 That's probably at 0x123, 124, 125, whatever, something like that. 01:22:08.860 --> 01:22:12.180 And that's, of course, what ends up in s as a value. 01:22:12.180 --> 01:22:16.860 When I declare t, I do get a second variable called t just like before. 01:22:16.860 --> 01:22:21.330 But when I copy s into t, what happens? 01:22:21.330 --> 01:22:24.150 It's really just literally 0x123. 01:22:24.150 --> 01:22:27.060 Whatever the value of s is is now also the value of t. 01:22:27.060 --> 01:22:29.160 And so if we abstract this away at a high level, 01:22:29.160 --> 01:22:33.240 get rid of all of those extra squares, this is what s and t now are. 01:22:33.240 --> 01:22:36.090 They're indeed copies, but copies of each other, not 01:22:36.090 --> 01:22:38.070 copies of the underlying characters.
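The defensive check described above can be wrapped in a tiny function. This is a sketch; the name capitalize_first is hypothetical, and the cast to unsigned char is a common precaution, since toupper expects a value representable as unsigned char (or EOF).

```c
#include <ctype.h>
#include <string.h>

// Capitalize the first character only if there is one, making no
// assumption about what the user typed (they may have just hit Enter).
void capitalize_first(char *t)
{
    if (strlen(t) > 0)
    {
        t[0] = toupper((unsigned char) t[0]);
    }
}
```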
01:22:38.070 --> 01:22:41.160 And so if you follow those arrows and try 01:22:41.160 --> 01:22:43.930 to print them both out after capitalizing one or the other, 01:22:43.930 --> 01:22:47.790 you're going to unfortunately end up capitalizing not just one of them, s, 01:22:47.790 --> 01:22:50.310 but both of them, s and t. 01:22:50.310 --> 01:22:52.830 Because literally, it's the same address. 01:22:52.830 --> 01:22:56.130 Any questions, then, on this visualization? 01:22:56.130 --> 01:22:56.730 Yeah. 01:22:56.730 --> 01:22:58.330 AUDIENCE: [INAUDIBLE] 01:22:58.330 --> 01:22:59.580 DAVID J. MALAN: Good question. 01:22:59.580 --> 01:23:01.200 Is this pass by reference? 01:23:01.200 --> 01:23:07.170 We haven't-- we have not seen in detail an example like that. 01:23:07.170 --> 01:23:09.102 Right now, you're copying by value. 01:23:09.102 --> 01:23:10.560 But references will come into play. 01:23:10.560 --> 01:23:12.852 And remind me in a bit if I haven't used that term yet. 01:23:12.852 --> 01:23:14.925 But this is just copying things by-- 01:23:14.925 --> 01:23:17.370 that could have ended poorly, value. 01:23:17.370 --> 01:23:19.840 Other questions. 01:23:19.840 --> 01:23:20.390 No? 01:23:20.390 --> 01:23:25.960 All right, so with this in mind, how do we actually copy things properly? 01:23:25.960 --> 01:23:28.340 For this, we actually need another building block. 01:23:28.340 --> 01:23:30.080 So today, we give you two functions. 01:23:30.080 --> 01:23:32.860 One of which is called malloc, one of which is called free. 01:23:32.860 --> 01:23:35.590 And these are used all of the time by like every piece 01:23:35.590 --> 01:23:38.460 of software you and I use on our Macs, PCs, and phones, 01:23:38.460 --> 01:23:40.960 whether it's written in C or some equivalent other language. 01:23:40.960 --> 01:23:43.360 Malloc is for memory allocation. 
01:23:43.360 --> 01:23:47.960 It's a function that you can use to ask the operating system, MacOS, Linux, 01:23:47.960 --> 01:23:51.250 Windows, anything, for some number of bytes, 1 byte, 100 01:23:51.250 --> 01:23:52.600 bytes, a gigabyte of memory. 01:23:52.600 --> 01:23:55.750 You can ask malloc for however much memory you want in advance. 01:23:55.750 --> 01:24:00.340 It will return to you the address of the first byte of memory 01:24:00.340 --> 01:24:02.110 that it found free for you. 01:24:02.110 --> 01:24:04.940 Unlike a string, it is not NUL terminated. 01:24:04.940 --> 01:24:07.960 And so the danger with malloc is that it's on the honor system. 01:24:07.960 --> 01:24:12.220 If you ask it for 1 byte or 10 bytes, you, the programmer, in a variable, 01:24:12.220 --> 01:24:16.090 have to remember how many bytes you requested, 1, or 10, or the like. 01:24:16.090 --> 01:24:19.032 Strings do that for you, not when we're getting now to this low level. 01:24:19.032 --> 01:24:22.240 Malloc is just going to give you some memory and it's up to you to manage it. 01:24:22.240 --> 01:24:23.500 Free does the opposite. 01:24:23.500 --> 01:24:24.790 When you're done with some chunk of memory, 01:24:24.790 --> 01:24:28.090 you can free it by passing in that same address and just hand it back to Mac 01:24:28.090 --> 01:24:30.230 OS, Windows, or Linux, and say I'm done with this, 01:24:30.230 --> 01:24:33.130 you can let me use this for something else later. 
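The contract described above goes like this: ask for some bytes, get back the address of the first one (or NULL), keep track of the size yourself, and hand the bytes back with free. Here is a minimal sketch of it; the helper name make_filled and the choice to fill the bytes with a character are illustrative assumptions, not code from the lecture.

```c
#include <stdlib.h>
#include <string.h>

// Hypothetical helper: ask malloc for n bytes and fill them with c.
// malloc returns the address of the first byte, or NULL on failure.
// Nothing here is NUL-terminated, so the caller must remember n,
// and must eventually hand the memory back with free.
char *make_filled(size_t n, char c)
{
    char *p = malloc(n);
    if (p == NULL)
    {
        return NULL;
    }
    memset(p, c, n);
    return p;
}
```

A caller would use the bytes and then call free on the same address that make_filled returned.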
01:24:33.130 --> 01:24:38.198 As an aside, if your computer has ever frozen, or hung, 01:24:38.198 --> 01:24:40.240 the whole thing maybe just spontaneously reboots, 01:24:40.240 --> 01:24:42.280 yet another reason for a bug like that might 01:24:42.280 --> 01:24:46.300 be if you write a program with a bug that keeps mallocing, mallocing, 01:24:46.300 --> 01:24:49.360 mallocing, that is, asking for more and more and more memory, 01:24:49.360 --> 01:24:51.850 but you make a mistake and you never free it, 01:24:51.850 --> 01:24:54.742 well eventually, the computer is going to literally run out of memory 01:24:54.742 --> 01:24:56.200 and something is going to go wrong. 01:24:56.200 --> 01:24:58.825 And that's often when computers freeze. 01:24:58.825 --> 01:24:59.950 They're just out of memory. 01:24:59.950 --> 01:25:03.430 It has the memory there, but the program was trying to use too much of it 01:25:03.430 --> 01:25:04.150 endlessly. 01:25:04.150 --> 01:25:06.160 So this too will be a mistake that some of us 01:25:06.160 --> 01:25:07.820 will surely make in the coming weeks. 01:25:07.820 --> 01:25:09.890 But hopefully, you'll now see the solution. 01:25:09.890 --> 01:25:12.820 So let me go back to VS Code here. 01:25:12.820 --> 01:25:15.120 And let me propose that we do the following. 01:25:15.120 --> 01:25:16.870 I'll hide my terminal window for a moment. 01:25:16.870 --> 01:25:19.698 And I'm going to introduce another header file up here. 01:25:19.698 --> 01:25:22.240 And I promise there's not going to be too many more of these. 01:25:22.240 --> 01:25:26.860 But this one is called stdlib.h, for standard library. 01:25:26.860 --> 01:25:31.060 And in this file are the declarations, the prototypes for malloc, and free, 01:25:31.060 --> 01:25:32.530 and a bunch of other stuff as well. 01:25:32.530 --> 01:25:35.270 It lets me now manage my own memory. 01:25:35.270 --> 01:25:37.360 So let's focus now on line 11.
01:25:37.360 --> 01:25:39.400 Line 11 is where I went wrong before. 01:25:39.400 --> 01:25:41.650 Because conceptually, I want to copy the whole string. 01:25:41.650 --> 01:25:45.530 But of course, I'm only copying modestly the individual address. 01:25:45.530 --> 01:25:47.680 So how do I copy the whole darned thing? 01:25:47.680 --> 01:25:49.400 Well, what I need to do is this. 01:25:49.400 --> 01:25:53.290 When I declare t to be the address of something in memory, 01:25:53.290 --> 01:25:56.780 why don't I set t to be the address of a free chunk of memory? 01:25:56.780 --> 01:25:59.620 So let me ask the operating system, give me this many bytes. 01:25:59.620 --> 01:26:00.820 Tell me what the address is. 01:26:00.820 --> 01:26:03.190 And I'm going to store that in t initially just so I 01:26:03.190 --> 01:26:04.850 know where there's free space for me. 01:26:04.850 --> 01:26:06.020 So how do I do that? 01:26:06.020 --> 01:26:09.250 Well, quite simply, I call malloc, and then I pass in the number of bytes 01:26:09.250 --> 01:26:09.850 that I need. 01:26:09.850 --> 01:26:12.850 Now for HI exclamation point, I think I need three. 01:26:12.850 --> 01:26:13.770 Although wait, no. 01:26:13.770 --> 01:26:16.620 I really need four because of the NUL character. 01:26:16.620 --> 01:26:19.120 But I don't think I should be hard coding numbers like this. 01:26:19.120 --> 01:26:21.328 Because who knows what the human is going to type in? 01:26:21.328 --> 01:26:25.630 So I can actually use strlen of s, and then plus 1. 01:26:25.630 --> 01:26:28.870 This will ask malloc then for however many bytes 01:26:28.870 --> 01:26:32.380 corresponds to the number of characters the human typed in plus 1, 01:26:32.380 --> 01:26:33.970 for again, the NUL character. 01:26:33.970 --> 01:26:37.550 So it's just being smart and defensive rather than choosing a number myself. 
01:26:37.550 --> 01:26:41.330 But now all t is is a pointer, if you will, 01:26:41.330 --> 01:26:43.520 to some random chunk of free space. 01:26:43.520 --> 01:26:45.010 So there's nothing there yet. 01:26:45.010 --> 01:26:45.993 Or there are bits there. 01:26:45.993 --> 01:26:47.410 But who knows what value they are? 01:26:47.410 --> 01:26:49.870 They're certainly not identical to what the human typed in. 01:26:49.870 --> 01:26:51.430 I now have to do this. 01:26:51.430 --> 01:26:55.090 So how can I copy one string into the other? 01:26:55.090 --> 01:26:56.450 Well, let me do this. 01:26:56.450 --> 01:27:00.650 Instead of capitalizing something just yet, let me do this. 01:27:00.650 --> 01:27:08.020 How about for int i gets 0, i is less than the length of s. 01:27:08.020 --> 01:27:09.262 And then i plus plus. 01:27:09.262 --> 01:27:11.720 So I'm going to iterate for the whole length of the string. 01:27:11.720 --> 01:27:13.630 And in here, I'm just going to do this. 01:27:13.630 --> 01:27:18.640 The ith character in t should be identical to the ith character in s. 01:27:18.640 --> 01:27:22.870 So I'm just literally copying from right to left each and every character in s. 01:27:22.870 --> 01:27:24.980 And I can trust that there's enough memory in t. 01:27:24.980 --> 01:27:25.480 Why? 01:27:25.480 --> 01:27:27.670 Because I asked for that many bytes plus 1. 01:27:27.670 --> 01:27:29.410 Now there's technically a bug here. 01:27:29.410 --> 01:27:31.240 I actually should probably do this. 01:27:31.240 --> 01:27:34.480 I should do plus 1 here. 01:27:34.480 --> 01:27:39.100 Or if you prefer, I should do less than or equal to the strlen. 01:27:39.100 --> 01:27:41.440 But I think it's a little clearer to do the plus 1. 01:27:41.440 --> 01:27:46.360 Why do I for the first time want to go just beyond the boundary of s 01:27:46.360 --> 01:27:48.130 and copy 1 more byte? 01:27:48.130 --> 01:27:49.005 AUDIENCE: [INAUDIBLE] 01:27:49.005 --> 01:27:50.963 DAVID J.
MALAN: Yeah, I need the NUL character. 01:27:50.963 --> 01:27:53.895 I could technically manually add it with some additional line of code. 01:27:53.895 --> 01:27:55.270 But I might as well just copy it. 01:27:55.270 --> 01:27:57.580 Because backslash zero is backslash zero. 01:27:57.580 --> 01:27:59.920 So this time, and probably only this time, 01:27:59.920 --> 01:28:03.340 it's reasonable and correct to go just beyond the boundary of your string 01:28:03.340 --> 01:28:06.640 so you copy the NUL terminating character so that the computer also 01:28:06.640 --> 01:28:08.050 knows where t ends. 01:28:08.050 --> 01:28:12.800 And now I think what I can do a little more safely is this. 01:28:12.800 --> 01:28:18.100 Let me go down here and say, t bracket 0 equals toupper 01:28:18.100 --> 01:28:21.368 of t, of toupper of t bracket 0. 01:28:21.368 --> 01:28:22.660 So same line of code as before. 01:28:22.660 --> 01:28:25.327 If I actually want to be really safe, I should probably do this. 01:28:25.327 --> 01:28:28.540 So if the strlen of t is greater than zero. 01:28:28.540 --> 01:28:30.010 So there's at least 1 byte there. 01:28:30.010 --> 01:28:33.700 OK, now it's safe to blindly capitalize the first character. 01:28:33.700 --> 01:28:36.290 And I think that now puts me in better shape. 01:28:36.290 --> 01:28:37.270 So let me try this now. 01:28:37.270 --> 01:28:43.300 Let me open up my terminal, make copy, dot slash copy. 01:28:43.300 --> 01:28:46.690 I'm going to type in hi exclamation point in all lowercase 01:28:46.690 --> 01:28:48.280 crossing my fingers this time. 01:28:48.280 --> 01:28:53.260 And now if I zoom in, it indeed capitalized only t 01:28:53.260 --> 01:28:55.570 and not s in this case. 01:28:55.570 --> 01:28:57.610 So pictorially, let me switch over here. 01:28:57.610 --> 01:29:02.890 Here is, as before, the variable s pointing at hi in all lowercase.
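The copy logic just described can be sketched as a standalone function: allocate strlen(s) + 1 bytes, then loop one position past the last visible character so the NUL terminator comes along too. The name copy_string is hypothetical, and the NULL check goes a bit beyond the code at this point in the lecture.

```c
#include <stdlib.h>
#include <string.h>

// Hypothetical helper mirroring copy.c's loop: ask for strlen(s) + 1 bytes,
// then copy every character, including the NUL terminator at index
// strlen(s), so that t knows where it ends.
char *copy_string(const char *s)
{
    char *t = malloc(strlen(s) + 1);
    if (t == NULL)
    {
        return NULL;
    }

    for (size_t i = 0; i < strlen(s) + 1; i++)
    {
        t[i] = s[i];
    }
    return t;
}
```

Because t now points at its own bytes, capitalizing t[0] leaves s untouched. The caller should free t when done.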
01:29:02.890 --> 01:29:07.000 When I call malloc though, that gives me a chunk of memory 01:29:07.000 --> 01:29:09.430 whose address I'm going to store in t. 01:29:09.430 --> 01:29:12.245 So if t is some other variable, as it is in my code, 01:29:12.245 --> 01:29:15.370 and there's some other available chunk of memory, I don't know where it is. 01:29:15.370 --> 01:29:19.660 But let's assume as always it's at 0x456, 457, 458, 459. 01:29:19.660 --> 01:29:20.980 So 4 bytes total. 01:29:20.980 --> 01:29:22.360 What is now happening? 01:29:22.360 --> 01:29:24.760 Well, t is defined as pointing to that. 01:29:24.760 --> 01:29:26.950 Because that's what malloc gives us, the address 01:29:26.950 --> 01:29:29.260 of the first byte of the free memory. 01:29:29.260 --> 01:29:33.070 And now with the for loop, I'm just iterating over it, copying the h, 01:29:33.070 --> 01:29:36.700 then the i, then the exclamation point, and then for good measure, 01:29:36.700 --> 01:29:39.790 the backslash 0 as well. 01:29:39.790 --> 01:29:43.476 Questions then on this process here? 01:29:43.476 --> 01:29:44.370 AUDIENCE: [INAUDIBLE] 01:29:44.370 --> 01:29:45.995 DAVID J. MALAN: A really good question. 01:29:45.995 --> 01:29:52.350 If I omitted in my code the plus 1 and I didn't do less than 01:29:52.350 --> 01:29:56.130 or equal to so that I'm copying the fourth byte, odds are in this program, 01:29:56.130 --> 01:29:59.280 because it's so short, you wouldn't notice that there's an actual error. 01:29:59.280 --> 01:30:04.650 But what could happen is when I call printf on t, 01:30:04.650 --> 01:30:09.720 if there's no NUL byte there, it might print h, i, exclamation point, 01:30:09.720 --> 01:30:12.960 some random values, some random values, some random values, some random value 01:30:12.960 --> 01:30:16.380 until it gets lucky and there happens to be a 0 byte, a NUL 01:30:16.380 --> 01:30:18.340 byte by chance for instance.
01:30:18.340 --> 01:30:22.800 So if you don't include the backslash zero some way, that's going to happen. 01:30:22.800 --> 01:30:23.970 And I say some way. 01:30:23.970 --> 01:30:25.030 I could even do this. 01:30:25.030 --> 01:30:29.520 I could technically just copy the length of the string s, and at the very bottom 01:30:29.520 --> 01:30:33.030 here, I could do something like t bracket i-- 01:30:33.030 --> 01:30:38.010 sorry, t bracket strlen of s gets backslash zero. 01:30:38.010 --> 01:30:39.520 I could do this. 01:30:39.520 --> 01:30:41.010 But this is just not necessary. 01:30:41.010 --> 01:30:43.540 I could manually add it at the end of the string. 01:30:43.540 --> 01:30:46.170 But again, I'd claim that it's just simpler to borrow, 01:30:46.170 --> 01:30:48.450 that is copy, the one that's already in s because it's 01:30:48.450 --> 01:30:50.370 the same thing at the end of the day. 01:30:50.370 --> 01:30:51.130 Good question. 01:30:51.130 --> 01:30:53.960 Other questions on this copying correctly now? 01:30:57.040 --> 01:30:57.540 All right. 01:30:57.540 --> 01:31:00.070 Is there any room for improvement here? 01:31:00.070 --> 01:31:02.310 Well, let me propose a slight optimization. 01:31:02.310 --> 01:31:04.860 This is kind of a throwback now to week one. 01:31:04.860 --> 01:31:09.810 Turns out that arguably, my line 13 here, wherein I have this for loop, 01:31:09.810 --> 01:31:12.750 now that I'm doing things in loops again and again 01:31:12.750 --> 01:31:15.210 and using a function like strlen, this is correct. 01:31:15.210 --> 01:31:21.510 It will iterate from zero on up to the length of i, length of s plus 1. 01:31:21.510 --> 01:31:26.910 But it's kind of stupid of me to write this for loop in this way. 01:31:26.910 --> 01:31:27.438 Why? 01:31:27.438 --> 01:31:29.230 Well, here's my initialization on the left. 01:31:29.230 --> 01:31:30.930 Here's my condition in the middle.
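The alternative mentioned in the answer can be sketched as follows: copy only the visible characters, then write the terminator by hand. The helper name copy_then_terminate is hypothetical. The index comes from the length of s, because t has no terminator yet, so strlen(t) would not be meaningful at that point.

```c
#include <stdlib.h>
#include <string.h>

// Hypothetical variant: copy only the visible characters, then write the
// NUL terminator by hand at index len, the length of s.
char *copy_then_terminate(const char *s)
{
    size_t len = strlen(s);
    char *t = malloc(len + 1);
    if (t == NULL)
    {
        return NULL;
    }

    for (size_t i = 0; i < len; i++)
    {
        t[i] = s[i];
    }
    t[len] = '\0';  // without this, printf("%s", t) could run past the end
    return t;
}
```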
01:31:30.930 --> 01:31:35.190 And in general, calling a function inside of your condition 01:31:35.190 --> 01:31:38.400 is probably not very good design. 01:31:38.400 --> 01:31:39.000 Why? 01:31:39.000 --> 01:31:43.260 Why is it bad for me to be calling a function like strlen in this condition 01:31:43.260 --> 01:31:44.610 in the middle of my for loop? 01:31:44.610 --> 01:31:45.150 Yeah. 01:31:45.150 --> 01:31:48.430 AUDIENCE: [INAUDIBLE] 01:31:48.430 --> 01:31:50.930 DAVID J. MALAN: Yeah, you're just calling it again and again 01:31:50.930 --> 01:31:51.650 for no reason. 01:31:51.650 --> 01:31:53.040 The length of s never changes. 01:31:53.040 --> 01:31:55.820 So why are you wasting everyone's time by calling strlen of s 01:31:55.820 --> 01:32:00.110 again, again, again, again just to check this inequality, whether i 01:32:00.110 --> 01:32:01.350 is less than that value? 01:32:01.350 --> 01:32:03.260 So it turns out if you haven't discovered this already, 01:32:03.260 --> 01:32:05.093 there's a slight optimization we can do here 01:32:05.093 --> 01:32:08.570 that has nothing to do fundamentally with strings, or pointers, just 01:32:08.570 --> 01:32:09.770 with better design. 01:32:09.770 --> 01:32:12.260 I can actually define two variables at once. 01:32:12.260 --> 01:32:13.350 I could do this. 01:32:13.350 --> 01:32:15.230 Let me remove this whole condition. 01:32:15.230 --> 01:32:20.660 And let me add a comma after i equals 0, set n, or any variable, 01:32:20.660 --> 01:32:24.590 equal to the strlen of s plus 1. 01:32:24.590 --> 01:32:30.020 And then after the semicolon, just ask the question while i is less than n. 01:32:30.020 --> 01:32:31.620 So it's almost the same. 01:32:31.620 --> 01:32:35.090 But notice now my condition in the very middle of this loop 01:32:35.090 --> 01:32:37.730 is at least comparing two static values. 01:32:37.730 --> 01:32:38.700 n never changes. 01:32:38.700 --> 01:32:39.200 Sorry. 01:32:39.200 --> 01:32:41.390 One static value. 
n never changes. 01:32:41.390 --> 01:32:42.500 All that changes is i. 01:32:42.500 --> 01:32:45.810 But I'm not foolishly calling strlen, strlen, strlen again and again. 01:32:45.810 --> 01:32:46.310 Why? 01:32:46.310 --> 01:32:47.600 Well, how does strlen work? 01:32:47.600 --> 01:32:52.547 Similar in spirit to printf, strlen, given the name of a string, 01:32:52.547 --> 01:32:54.380 looks at the first character and then starts 01:32:54.380 --> 01:32:57.590 looking through the entire string looking for the NUL character. 01:32:57.590 --> 01:33:01.320 And we saw this in week two counting up how many characters are there. 01:33:01.320 --> 01:33:03.820 So it's just a waste of time again and again. 01:33:03.820 --> 01:33:09.330 AUDIENCE: [INAUDIBLE] all the way at the top so that way, [INAUDIBLE]? 01:33:09.330 --> 01:33:10.330 DAVID J. MALAN: Totally. 01:33:10.330 --> 01:33:12.220 If you wanted to use n multiple times, you 01:33:12.220 --> 01:33:16.330 could absolutely take it out of the for loop, put it right after s is defined, 01:33:16.330 --> 01:33:17.770 and reuse n again and again. 01:33:17.770 --> 01:33:18.430 Absolutely. 01:33:18.430 --> 01:33:19.990 But in general, consider this. 01:33:19.990 --> 01:33:23.500 When designing for loops, even though modern compilers, like Clang, 01:33:23.500 --> 01:33:26.200 can sometimes fix this problem, this inefficiency for you, 01:33:26.200 --> 01:33:29.320 good practice would be: don't call functions unnecessarily, 01:33:29.320 --> 01:33:33.020 especially if the answer is always going to be the same. 01:33:33.020 --> 01:33:33.520 All right. 01:33:33.520 --> 01:33:37.100 So what else should I perhaps refine here? 01:33:37.100 --> 01:33:41.380 Well, how about I do one last thing and just comment on what exactly 01:33:41.380 --> 01:33:42.890 could go wrong here. 01:33:42.890 --> 01:33:44.320 Well, a couple of things. 01:33:44.320 --> 01:33:46.480 Well, actually, this is just silly too.
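Here is the same copy with strlen hoisted into the loop's initialization, as described above, so the condition compares i against a value computed once. The helper name copy_string_hoisted is hypothetical. The lecture declares int i = 0, n = strlen(s) + 1; this sketch uses size_t instead, which is an assumption on my part: it matches strlen's return type and avoids a signed/unsigned comparison.

```c
#include <stdlib.h>
#include <string.h>

// Same copy as before, but strlen(s) + 1 is computed once, in the loop's
// initialization, instead of re-scanning s on every check of the condition.
char *copy_string_hoisted(const char *s)
{
    char *t = malloc(strlen(s) + 1);
    if (t == NULL)
    {
        return NULL;
    }

    for (size_t i = 0, n = strlen(s) + 1; i < n; i++)
    {
        t[i] = s[i];  // includes the NUL terminator on the last pass
    }
    return t;
}
```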
01:33:46.480 --> 01:33:50.290 Surely, someone before me in the world has had to copy a string before. 01:33:50.290 --> 01:33:53.380 Surely, there's a function called strcpy maybe, 01:33:53.380 --> 01:33:55.000 like strcmp, like strlen. 01:33:55.000 --> 01:33:55.900 And indeed there is. 01:33:55.900 --> 01:33:58.960 So let me propose that we actually get rid of this whole for loop 01:33:58.960 --> 01:34:03.880 and we actually just call a function called strcpy, no O, just strcpy. 01:34:03.880 --> 01:34:08.320 And pass in the destination, which is t first, and then the source 01:34:08.320 --> 01:34:10.300 that you want to copy into the destination. 01:34:10.300 --> 01:34:13.810 And that takes the place entirely of that whole loop. 01:34:13.810 --> 01:34:17.140 So again, I demonstrated the loop first just to be very pedantic about it. 01:34:17.140 --> 01:34:18.320 But that's wasting time. 01:34:18.320 --> 01:34:20.820 You're wasting time writing lines of code you don't need to. 01:34:20.820 --> 01:34:24.020 strcpy is what you can use here instead. 01:34:24.020 --> 01:34:25.720 And so this has always existed. 01:34:25.720 --> 01:34:26.930 And what more can I do? 01:34:26.930 --> 01:34:30.520 Well as one final point, it turns out that there are actually 01:34:30.520 --> 01:34:33.940 things that can go wrong in this code even besides the string 01:34:33.940 --> 01:34:34.785 being too short. 01:34:34.785 --> 01:34:37.160 If the human just hits Enter and there are no characters, 01:34:37.160 --> 01:34:40.243 I don't want to blindly capitalize the first character that doesn't exist. 01:34:40.243 --> 01:34:41.950 That's why I added that if condition. 01:34:41.950 --> 01:34:43.840 But there are other things that can go wrong. 01:34:43.840 --> 01:34:45.520 And we introduce those to you today. 01:34:45.520 --> 01:34:52.330 It turns out that functions like get string and functions like malloc return 01:34:52.330 --> 01:34:54.190 potentially a special value.
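With strcpy from string.h, the loop collapses to a single call. Its signature is strcpy(destination, source): destination comes first, just as described above. Here is a sketch; the helper name copy_with_strcpy is hypothetical.

```c
#include <stdlib.h>
#include <string.h>

// Allocate room for s plus its NUL, then let strcpy(destination, source)
// copy everything, terminator included. strcpy does not check sizes;
// the strlen(s) + 1 allocation is what makes this call safe.
char *copy_with_strcpy(const char *s)
{
    char *t = malloc(strlen(s) + 1);
    if (t == NULL)
    {
        return NULL;
    }
    strcpy(t, s);
    return t;
}
```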
01:34:54.190 --> 01:34:58.320 And wonderfully confusingly, it's also called NULL, but with two L's. 01:34:58.320 --> 01:34:58.820 All right? 01:34:58.820 --> 01:35:01.780 So left hand and right hand weren't talking so well decades ago. 01:35:01.780 --> 01:35:04.480 NUL is a backslash zero. 01:35:04.480 --> 01:35:08.950 It's a single character as it always has been for a couple of weeks now. 01:35:08.950 --> 01:35:12.070 NULL is technically a pointer. 01:35:12.070 --> 01:35:14.650 It's an address, but it's address zero. 01:35:14.650 --> 01:35:18.550 It's like the top left hand corner, if you will, of your computer's memory 01:35:18.550 --> 01:35:21.490 that just nothing is ever supposed to go in by convention. 01:35:21.490 --> 01:35:24.790 So NULL is a synonym for zero. 01:35:24.790 --> 01:35:26.260 But it's specifically an address. 01:35:26.260 --> 01:35:27.500 Now why is this useful? 01:35:27.500 --> 01:35:30.622 Well, suppose that in my code here, something goes wrong with get string. 01:35:30.622 --> 01:35:33.830 Suppose you're being a little crazy and you type in way too long of a string. 01:35:33.830 --> 01:35:36.223 It's not just hi, but it's like an entire essay of text. 01:35:36.223 --> 01:35:38.140 And there's not enough memory in the computer. 01:35:38.140 --> 01:35:41.350 How does get string signal to the programmer, whoa, 01:35:41.350 --> 01:35:44.290 that's way too big of a string, I can't fit it in memory? 01:35:44.290 --> 01:35:45.860 Well, we never told you this. 01:35:45.860 --> 01:35:49.120 But all of this time, it turns out that get 01:35:49.120 --> 01:35:53.600 string will return this special value called NULL if something goes wrong. 01:35:53.600 --> 01:35:57.080 So to be really careful now, you should do something like this. 01:35:57.080 --> 01:36:03.160 If s equals equals literally NULL, then you better exit the program entirely 01:36:03.160 --> 01:36:06.400 and return like one, or two, or three to signify that something went wrong. 
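To keep the two "nulls" apart, here is a small sketch that uses both. NUL is the character '\0' that ends a string. NULL is a pointer that points at no string at all. The helper name safe_length and its -1 convention are hypothetical choices made for illustration.

```c
#include <stddef.h>

// Hypothetical helper contrasting the two "nulls":
//   NUL  is the character '\0' that ends every C string;
//   NULL is the null pointer, an address that points at nothing.
// Returns -1 for a NULL pointer so the caller can tell it apart from "".
long safe_length(const char *s)
{
    if (s == NULL)
    {
        return -1;  // no string at all
    }

    long n = 0;
    while (s[n] != '\0')  // stop at the NUL terminator
    {
        n++;
    }
    return n;
}
```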
01:36:06.400 --> 01:36:08.320 Don't go any further. 01:36:08.320 --> 01:36:12.002 Similarly with malloc, it's possible if you ask for way too much memory, that 01:36:12.002 --> 01:36:14.710 could fail, especially if you're asking now for double the memory 01:36:14.710 --> 01:36:16.168 after the human typed something in. 01:36:16.168 --> 01:36:18.760 So if t equals equals NULL, then you know what? 01:36:18.760 --> 01:36:20.860 Let's also return one, or some other value, 01:36:20.860 --> 01:36:25.220 to just get out before something crashes or freezes on the human as well. 01:36:25.220 --> 01:36:28.377 So honestly, I tend not to do this always in class because the code just 01:36:28.377 --> 01:36:29.710 gets so bloated and complicated. 01:36:29.710 --> 01:36:32.780 But you absolutely in practice need to start doing this. 01:36:32.780 --> 01:36:36.040 Otherwise, you will be responsible for the freezes, and the crashes, 01:36:36.040 --> 01:36:38.140 and the reboots that users in the real world 01:36:38.140 --> 01:36:40.450 might actually encounter otherwise. 01:36:40.450 --> 01:36:43.300 Of course, if we get to the bottom of this program now, 01:36:43.300 --> 01:36:46.960 I should probably return zero explicitly, or implicitly, to just 01:36:46.960 --> 01:36:50.260 signify that everything is successful. 01:36:50.260 --> 01:36:52.660 But there's one other thing I haven't done. 01:36:52.660 --> 01:36:53.890 We introduced malloc. 01:36:53.890 --> 01:36:55.725 But what did I claim also existed? 01:36:55.725 --> 01:36:56.350 AUDIENCE: Free. 01:36:56.350 --> 01:36:57.220 DAVID J. MALAN: So free. 01:36:57.220 --> 01:36:58.762 I'm also being a little reckless now. 01:36:58.762 --> 01:37:00.850 Here I am not practicing what I'm preaching. 01:37:00.850 --> 01:37:03.370 I'm asking the computer for memory via get string, 01:37:03.370 --> 01:37:05.830 I'm asking the computer for more memory via malloc, 01:37:05.830 --> 01:37:08.210 and I'm never technically handing it back. 
01:37:08.210 --> 01:37:11.770 So really what I should be doing at the very bottom of my program 01:37:11.770 --> 01:37:16.120 too is freeing the memory I've asked for. 01:37:16.120 --> 01:37:19.540 So henceforth, it is a rule, a law, if you will, in C: 01:37:19.540 --> 01:37:23.380 whenever you allocate memory with malloc, or certain other functions 01:37:23.380 --> 01:37:27.670 as well, you, the programmer, must free it when you're all done with it. 01:37:27.670 --> 01:37:30.250 Now this is a bit of an overstatement because technically, 01:37:30.250 --> 01:37:32.800 when programs quit, their memory is freed automatically. 01:37:32.800 --> 01:37:35.410 So you're not going to break someone's Mac or PC just because you 01:37:35.410 --> 01:37:35.980 have this bug. 01:37:35.980 --> 01:37:38.480 But for programs that are running all the time, like when someone 01:37:38.480 --> 01:37:41.890 keeps Chrome, their browser, open, or Microsoft Word, or the like, bad things 01:37:41.890 --> 01:37:44.960 will happen if over time you never, never, never call free 01:37:44.960 --> 01:37:46.210 and the program keeps running. 01:37:46.210 --> 01:37:48.250 So always get into this habit here. 01:37:48.250 --> 01:37:52.540 You do not need to free memory that comes from get string because the CS50 01:37:52.540 --> 01:37:54.680 library automatically frees it for you. 01:37:54.680 --> 01:37:58.840 But any time you use malloc henceforth, as you did or I did here, 01:37:58.840 --> 01:38:04.150 you must free that memory by just passing in the same address you got back. 01:38:04.150 --> 01:38:09.740 Questions now on malloc and free? 01:38:09.740 --> 01:38:11.000 Questions? 01:38:11.000 --> 01:38:11.890 Yeah. 01:38:11.890 --> 01:38:17.878 AUDIENCE: [INAUDIBLE] 01:38:17.878 --> 01:38:19.420 DAVID J. MALAN: Really good question. 01:38:19.420 --> 01:38:22.260 So free just-- so what does free do?
01:38:22.260 --> 01:38:26.072 So free just lets the computer know that you 01:38:26.072 --> 01:38:27.780 are done with that chunk of memory, which 01:38:27.780 --> 01:38:29.572 means that if you have another line of code 01:38:29.572 --> 01:38:32.430 elsewhere, that same memory might be reused, 01:38:32.430 --> 01:38:34.020 and can be used again and again. 01:38:34.020 --> 01:38:36.280 And that's certainly going to be necessary for any long-running program. 01:38:36.280 --> 01:38:37.827 You can't ask for memory constantly. 01:38:37.827 --> 01:38:38.910 You'll eventually run out. 01:38:38.910 --> 01:38:40.368 So you need to free it in this way. 01:38:40.368 --> 01:38:41.555 Other languages, as an aside: 01:38:41.555 --> 01:38:43.680 Python, yet another motivation in a couple of weeks 01:38:43.680 --> 01:38:46.013 for learning it, and certain other languages 01:38:46.013 --> 01:38:47.890 manage all this headache for you. 01:38:47.890 --> 01:38:52.600 But in C, the goal here is to really harness these capabilities ourselves. 01:38:52.600 --> 01:38:53.100 All right. 01:38:53.100 --> 01:38:56.140 So it turns out almost everyone in the room, 01:38:56.140 --> 01:38:57.420 myself included, is going to screw up 01:38:57.420 --> 01:39:00.090 when it comes to anything memory-related, if you haven't already. 01:39:00.090 --> 01:39:01.630 Seg faults are in your future. 01:39:01.630 --> 01:39:04.560 But hopefully, there are tools via which you can detect these things 01:39:04.560 --> 01:39:09.750 and fix them proactively, and not just use printf, or debug50, or a rubber duck. 01:39:09.750 --> 01:39:12.397 We actually have another tool we can equip you with now 01:39:12.397 --> 01:39:13.980 that will help you find some mistakes. 01:39:13.980 --> 01:39:14.920 So let me do this. 01:39:14.920 --> 01:39:16.800 Let me close copy.c.
01:39:16.800 --> 01:39:19.980 Let me open a program I wrote in advance called memory.c 01:39:19.980 --> 01:39:22.120 that doesn't do anything really interesting. 01:39:22.120 --> 01:39:24.070 But it's going to have two bugs in it. 01:39:24.070 --> 01:39:27.090 Notice that I've included stdio.h as always. 01:39:27.090 --> 01:39:30.270 I've also included stdlib.h, which is necessary now 01:39:30.270 --> 01:39:33.690 for anything related to malloc and/or free and the like. 01:39:33.690 --> 01:39:34.890 Line six. 01:39:34.890 --> 01:39:36.730 It's a little weird what I've done here. 01:39:36.730 --> 01:39:42.300 But this is the manual way of asking for enough memory for an array. 01:39:42.300 --> 01:39:45.690 In week two, how did we ask for memory for an array? 01:39:45.690 --> 01:39:49.470 You very simply say, int x[3]. 01:39:49.470 --> 01:39:52.140 And that gives you an array called x of size three. 01:39:52.140 --> 01:39:55.770 But if you do it manually now using malloc, what you have to do 01:39:55.770 --> 01:39:57.780 is use syntax like this. 01:39:57.780 --> 01:40:02.173 You call malloc, you ask for three things times however big an int is. 01:40:02.173 --> 01:40:03.090 Now we know it's four here. 01:40:03.090 --> 01:40:04.650 So you could literally write 12 here. 01:40:04.650 --> 01:40:06.250 But this is more generic. 01:40:06.250 --> 01:40:09.930 So three times the size of an integer will give you 12 dynamically. 01:40:09.930 --> 01:40:11.280 And what does malloc return? 01:40:11.280 --> 01:40:14.400 The address of the first byte you get back. 01:40:14.400 --> 01:40:15.940 Where do I want to put that? 01:40:15.940 --> 01:40:17.560 Well, I want to put it in a variable. 01:40:17.560 --> 01:40:20.620 Now the variable can't just be int x because that's a number. 01:40:20.620 --> 01:40:22.260 It's not an address per se. 01:40:22.260 --> 01:40:25.530 If I want to store this address in a variable, I could call it x, 01:40:25.530 --> 01:40:26.490 I could call it p.
01:40:26.490 --> 01:40:31.200 But int star x just means that x is now the address of a chunk of memory, 01:40:31.200 --> 01:40:33.420 specifically a chunk of memory that's big enough not 01:40:33.420 --> 01:40:36.660 for one, but for three ints in total. 01:40:36.660 --> 01:40:39.930 All right, now, I'm just sort of naively putting 01:40:39.930 --> 01:40:43.210 our old friend 72, 73, and 33 at the first, second, 01:40:43.210 --> 01:40:44.940 and third locations in memory. 01:40:44.940 --> 01:40:47.340 But perhaps based on week two or week four, 01:40:47.340 --> 01:40:49.530 I'm clearly screwing up here in a couple of ways. 01:40:49.530 --> 01:40:52.200 Someone want to identify at least one bug? 01:40:52.200 --> 01:40:53.193 What did I do wrong? 01:40:53.193 --> 01:40:55.110 AUDIENCE: You start at zero instead of at one. 01:40:55.110 --> 01:40:58.410 DAVID J. MALAN: Yeah, this is now amateur stuff. 01:40:58.410 --> 01:41:00.790 I should be zero indexing not one indexing. 01:41:00.790 --> 01:41:03.210 So this has got to be zero, one, two ultimately. 01:41:03.210 --> 01:41:05.775 And other bugs that are maybe more week four specific? 01:41:08.340 --> 01:41:09.180 Other bugs. 01:41:09.180 --> 01:41:09.930 It's more subtle. 01:41:09.930 --> 01:41:10.430 Yeah. 01:41:10.430 --> 01:41:11.310 AUDIENCE: [INAUDIBLE] 01:41:11.310 --> 01:41:12.390 DAVID J. MALAN: I'm not freeing the memory, right? 01:41:12.390 --> 01:41:14.970 So I'm not practicing what I'm preaching by freeing this memory. 01:41:14.970 --> 01:41:16.470 Now suppose these are non-obvious. 01:41:16.470 --> 01:41:20.070 And honestly, after an hour or two of this, this shouldn't be obvious yet. 01:41:20.070 --> 01:41:21.420 It will be over time. 01:41:21.420 --> 01:41:25.830 How could I find these bugs with software as opposed 01:41:25.830 --> 01:41:28.530 to just staring at the thing, or asking someone for help? 01:41:28.530 --> 01:41:29.980 Well, let me propose this. 
01:41:29.980 --> 01:41:33.930 Let me first go ahead and run make memory to compile the program. 01:41:33.930 --> 01:41:35.970 And it seems to look fine. 01:41:35.970 --> 01:41:37.800 There are no syntax errors, at least. 01:41:37.800 --> 01:41:40.838 Dot slash memory, notice, seems to work fine too. 01:41:40.838 --> 01:41:42.880 Now this program doesn't do anything interesting. 01:41:42.880 --> 01:41:44.610 There's no printf or anything like that. 01:41:44.610 --> 01:41:45.760 But it didn't crash. 01:41:45.760 --> 01:41:48.000 There's no segmentation fault. But that doesn't 01:41:48.000 --> 01:41:51.060 mean there aren't bugs latent in the software. 01:41:51.060 --> 01:41:54.090 And this is true, sadly, of much of today's software. 01:41:54.090 --> 01:41:56.340 Chrome, and Microsoft Word, and other programs 01:41:56.340 --> 01:42:00.120 surely have memory-related bugs that people at Google and Microsoft 01:42:00.120 --> 01:42:01.080 haven't yet found. 01:42:01.080 --> 01:42:04.410 But there are tools, at least, to find the most obvious of those bugs. 01:42:04.410 --> 01:42:07.620 And we're going to introduce you now to a program called valgrind. 01:42:07.620 --> 01:42:09.900 So valgrind, it's a fairly fancy program. 01:42:09.900 --> 01:42:11.700 But we'll use it in very simple ways. 01:42:11.700 --> 01:42:15.840 It'll look at your code and find memory errors as it's executing 01:42:15.840 --> 01:42:18.060 and try to help you understand where they are. 01:42:18.060 --> 01:42:20.190 So let me go back to VS Code here. 01:42:20.190 --> 01:42:21.515 Memory seems to be fine. 01:42:21.515 --> 01:42:23.640 I feel like, OK, I'm going to submit this homework. 01:42:23.640 --> 01:42:24.180 All is good. 01:42:24.180 --> 01:42:25.170 No error messages. 01:42:25.170 --> 01:42:26.620 That's no longer enough. 01:42:26.620 --> 01:42:28.740 Now you need to poke a little more at your code 01:42:28.740 --> 01:42:30.820 to see if maybe there's still some bug there.
01:42:30.820 --> 01:42:35.560 So let me do this: valgrind, then a space, then dot slash memory. 01:42:35.560 --> 01:42:38.700 So just like debug50, you run it on a program you already compiled. 01:42:38.700 --> 01:42:41.550 valgrind, I'm going to run it on a program I already compiled. 01:42:41.550 --> 01:42:44.760 Let me zoom in on my terminal window so we can see more at once. 01:42:44.760 --> 01:42:46.110 And Enter. 01:42:46.110 --> 01:42:49.273 All right, the output is crazy cryptic for no good reason. 01:42:49.273 --> 01:42:50.940 There are lots of numbers and equal signs. 01:42:50.940 --> 01:42:52.200 It's a lot of clutter. 01:42:52.200 --> 01:42:54.250 But there is some juicy information here. 01:42:54.250 --> 01:42:55.950 And let me start from the top down. 01:42:55.950 --> 01:42:58.470 Invalid write of size four. 01:42:58.470 --> 01:43:02.400 So write means to change a value, read means to access a value. 01:43:02.400 --> 01:43:06.000 And this is, again, esoteric, like a lot of our error messages are. 01:43:06.000 --> 01:43:11.580 But it says something about an address after a block of size 12 alloc'd, and then there's 01:43:11.580 --> 01:43:13.200 this weird hex notation. 01:43:13.200 --> 01:43:14.580 There's some mention of malloc. 01:43:14.580 --> 01:43:18.120 But honestly, the juicy part here is memory.c, line six. 01:43:18.120 --> 01:43:21.610 That's probably my fault. So let's look at line six per that output. 01:43:21.610 --> 01:43:24.300 Let me shrink the terminal window, look at line six. 01:43:24.300 --> 01:43:26.160 OK, 12 is now germane. 01:43:26.160 --> 01:43:29.000 If you did the mental math of the size of an int times 3, 01:43:29.000 --> 01:43:31.170 12 is clearly involved here. 01:43:31.170 --> 01:43:36.090 But line six is only part of the story here. 01:43:36.090 --> 01:43:37.650 That's where the memory came from. 01:43:37.650 --> 01:43:38.500 What is this? 01:43:38.500 --> 01:43:39.480 Let me zoom back in.
01:43:39.480 --> 01:43:45.150 Where is there invalid write of size four? 01:43:45.150 --> 01:43:47.820 What's perhaps going wrong here? 01:43:47.820 --> 01:43:49.830 Invalid write of size four. 01:43:49.830 --> 01:43:50.880 What does that mean? 01:43:50.880 --> 01:43:53.850 It's like a very technical way of explaining. 01:43:53.850 --> 01:43:57.270 The bug is actually one line later, on line seven, as we already identified. 01:43:57.270 --> 01:43:57.945 Yeah. 01:43:57.945 --> 01:43:58.820 AUDIENCE: [INAUDIBLE] 01:43:58.820 --> 01:43:59.300 DAVID J. MALAN: Indeed. 01:43:59.300 --> 01:44:00.467 And I misspoke a moment ago. 01:44:00.467 --> 01:44:02.420 The bug actually arises here with line nine. 01:44:02.420 --> 01:44:06.575 So after the allocation of memory, I'm somehow writing 4 bytes incorrectly. 01:44:06.575 --> 01:44:08.450 And unfortunately, the onus is kind of on you 01:44:08.450 --> 01:44:11.420 to sort of think through deductively what could that mean. 01:44:11.420 --> 01:44:14.960 But I'm clearly touching 4 bytes of memory in these few lines of code 01:44:14.960 --> 01:44:15.797 that I shouldn't be. 01:44:15.797 --> 01:44:18.380 And hopefully here as the light bulb already went off earlier, 01:44:18.380 --> 01:44:20.150 oh, I'm not zero indexing. 01:44:20.150 --> 01:44:22.800 OK, that must mean that x bracket three, as you know, 01:44:22.800 --> 01:44:25.170 is just too far past the chunk of memory. 01:44:25.170 --> 01:44:28.670 So I'm invalidly writing to 4 bytes that I shouldn't be. 01:44:28.670 --> 01:44:30.200 So again, it's not super obvious. 01:44:30.200 --> 01:44:31.920 This is not super user friendly. 01:44:31.920 --> 01:44:35.120 But at least it does give you a clue as to where that bug is. 01:44:35.120 --> 01:44:38.690 So the fix there is going to be quite simply to change the one 01:44:38.690 --> 01:44:42.020 to a zero, the two to a one, and the three to a two. 01:44:42.020 --> 01:44:42.775 That'll fix that. 
01:44:42.775 --> 01:44:44.150 But there's still a second error. 01:44:44.150 --> 01:44:46.250 And let me look at the cryptic output again. 01:44:46.250 --> 01:44:50.180 Heap summary, some stuff there, OK, this does not sound good down here. 01:44:50.180 --> 01:44:54.740 "12 bytes in 1 blocks are definitely lost in loss record 1 of 1." 01:44:54.740 --> 01:44:56.510 Very arcane output too. 01:44:56.510 --> 01:44:59.970 But clearly related to line six again, our allocation of memory. 01:44:59.970 --> 01:45:02.300 Now here too, it's not obvious what the solution is. 01:45:02.300 --> 01:45:04.010 But memory is lost. 01:45:04.010 --> 01:45:05.900 AKA, this is a memory leak. 01:45:05.900 --> 01:45:08.910 And now the deduction is kind of up to you. 01:45:08.910 --> 01:45:09.720 What is leaking? 01:45:09.720 --> 01:45:10.220 Oh, wait. 01:45:10.220 --> 01:45:11.480 I didn't call free. 01:45:11.480 --> 01:45:13.760 And so the second solution here is probably 01:45:13.760 --> 01:45:16.242 to free x at the very end of the program. 01:45:16.242 --> 01:45:18.950 And if you really want to be pedantic, you should probably check, 01:45:18.950 --> 01:45:21.590 like I proposed earlier, if x is NULL, and if so, just 01:45:21.590 --> 01:45:24.260 get out now while you still can and don't even 01:45:24.260 --> 01:45:25.650 touch those other lines of code. 01:45:25.650 --> 01:45:27.410 But if you get to the bottom, return zero. 01:45:27.410 --> 01:45:30.740 But really, the takeaways are, I fixed my zero indexing 01:45:30.740 --> 01:45:33.530 of the array to avoid the invalid write of size four. 01:45:33.530 --> 01:45:36.390 And now, I'm freeing the memory that I asked for. 01:45:36.390 --> 01:45:37.927 So there should be no leak. 01:45:37.927 --> 01:45:39.260 All right, let's try this again. 01:45:39.260 --> 01:45:41.900 Make memory, dot slash memory. 01:45:41.900 --> 01:45:43.370 No visible errors yet.
01:45:43.370 --> 01:45:45.980 But let me now increase my terminal window again, do 01:45:45.980 --> 01:45:49.010 valgrind of dot slash memory, crossing my fingers, 01:45:49.010 --> 01:45:53.598 and now: all heap blocks were freed, no leaks are possible. 01:45:53.598 --> 01:45:54.890 I don't see any invalid writes. 01:45:54.890 --> 01:45:56.150 There's still a crazy amount of output. 01:45:56.150 --> 01:45:57.350 But none of it is erroneous. 01:45:57.350 --> 01:45:58.310 It's not bad. 01:45:58.310 --> 01:46:00.020 Now I fixed my memory bugs. 01:46:00.020 --> 01:46:01.970 And so now my TA, my TF, they're not going 01:46:01.970 --> 01:46:03.890 to find them either because at least valgrind 01:46:03.890 --> 01:46:05.990 has proactively done that for me. 01:46:05.990 --> 01:46:08.717 Questions then on valgrind? 01:46:08.717 --> 01:46:11.300 Generally, it's those two types of errors you might trip over. 01:46:11.300 --> 01:46:14.720 There's not too much else in the way of arcane output. 01:46:14.720 --> 01:46:17.550 Questions then on this? 01:46:17.550 --> 01:46:18.050 No? 01:46:18.050 --> 01:46:20.640 All right, well, what else might be going on? 01:46:20.640 --> 01:46:22.530 So someone alluded to this earlier. 01:46:22.530 --> 01:46:26.780 What happens when you, for instance, forget the NUL terminator 01:46:26.780 --> 01:46:30.740 or you generally start poking around memory that you yourself didn't ask for 01:46:30.740 --> 01:46:33.440 or looking at values you didn't put there? 01:46:33.440 --> 01:46:34.940 Well, let me go ahead and open this: 01:46:34.940 --> 01:46:39.140 code garbage.c, in honor of Oscar the Grouch here, of sorts. 01:46:39.140 --> 01:46:42.740 And here is a simple program, if I hide my terminal window, that 01:46:42.740 --> 01:46:44.390 just does something kind of arbitrary. 01:46:44.390 --> 01:46:47.870 I first declare an array called scores. 01:46:47.870 --> 01:46:50.630 But I made it crazy big, like 1024.
01:46:50.630 --> 01:46:52.520 That's a lot of integers. 01:46:52.520 --> 01:46:53.430 But so be it. 01:46:53.430 --> 01:46:55.520 And then I iterate over those integers. 01:46:55.520 --> 01:46:57.350 And I print each of those scores out. 01:46:57.350 --> 01:46:59.570 So I'm using week two syntax here. 01:46:59.570 --> 01:47:02.630 But based on this program, what have I clearly not done that I did 01:47:02.630 --> 01:47:04.400 do back in week two? 01:47:04.400 --> 01:47:07.355 I've allocated the array, I'm printing the array, but, but, but-- 01:47:07.355 --> 01:47:09.600 AUDIENCE: [INAUDIBLE] 01:47:09.600 --> 01:47:12.600 DAVID J. MALAN: Yeah, I didn't initialize any values for that array. 01:47:12.600 --> 01:47:14.160 Back in week two, we didn't do 1024. 01:47:14.160 --> 01:47:14.910 We did like three. 01:47:14.910 --> 01:47:17.310 And I typed in three test scores or something like that. 01:47:17.310 --> 01:47:20.310 Here, I'm allocating even more memory than that just because I really 01:47:20.310 --> 01:47:22.310 want to be dramatic with what I'm demonstrating. 01:47:22.310 --> 01:47:24.760 But I'm not initializing those values to anything. 01:47:24.760 --> 01:47:27.660 And so here, it turns out in C, generally, 01:47:27.660 --> 01:47:30.300 if you do not initialize a variable, or you do not 01:47:30.300 --> 01:47:32.490 initialize an array with explicit values, 01:47:32.490 --> 01:47:35.220 there are going to be garbage values there, so to speak, 01:47:35.220 --> 01:47:39.210 remnants of that memory having been used before, probably 01:47:39.210 --> 01:47:42.540 by some other function of yours, some library function, or something else 01:47:42.540 --> 01:47:43.800 while your program is running. 01:47:43.800 --> 01:47:46.480 Not a huge deal with a super small program like this.
01:47:46.480 --> 01:47:49.140 But for anything sizable, memory is going to be used, 01:47:49.140 --> 01:47:52.560 and unused, and used, and unused, malloced and freed again and again. 01:47:52.560 --> 01:47:55.650 There are going to be lots of garbage values in the computer's memory 01:47:55.650 --> 01:47:59.460 by default. So if I open my terminal window here, let 01:47:59.460 --> 01:48:04.230 me do make garbage, let me zoom in on my terminal so we can see the output. 01:48:04.230 --> 01:48:06.960 When I run dot slash garbage, theoretically, I 01:48:06.960 --> 01:48:11.445 should see 1,024 integers, none of which have been initialized. 01:48:11.445 --> 01:48:13.320 Now I'm going to get lucky with some of them. 01:48:13.320 --> 01:48:16.470 And it looks like, wow, OK, a lot of them are initialized to zero. 01:48:16.470 --> 01:48:19.260 And C does, in some contexts, initialize memory for you 01:48:19.260 --> 01:48:22.770 to zero, at least at the beginning, but not again and again typically. 01:48:22.770 --> 01:48:27.870 But if I start scrolling backwards in time at this array of size 1024, 01:48:27.870 --> 01:48:30.780 where did these values come from? 01:48:30.780 --> 01:48:34.920 Why are there just random positive and negative numbers interspersed among the zeros? 01:48:34.920 --> 01:48:38.848 Well, that's because I'm literally poking around in 1,024 integers' worth 01:48:38.848 --> 01:48:40.140 of the computer's memory. 01:48:40.140 --> 01:48:41.440 Who knows what's there? 01:48:41.440 --> 01:48:43.960 So the lesson here is that garbage values are indeed 01:48:43.960 --> 01:48:47.910 this term of art. It refers to a variable that you might have 01:48:47.910 --> 01:48:49.980 defined or declared: 01:48:49.980 --> 01:48:53.690 if you don't give it an explicit value, who knows what's going to be there? 01:48:53.690 --> 01:48:55.440 And the lesson here is just don't do that.
01:48:55.440 --> 01:48:58.200 Always initialize variables to something, 01:48:58.200 --> 01:49:01.950 either yourself, or prompting the human for it. 01:49:01.950 --> 01:49:05.650 Questions about garbage values. 01:49:05.650 --> 01:49:09.540 You'll see them sometimes if you print things you shouldn't or touch arrays 01:49:09.540 --> 01:49:11.590 beyond their boundaries. 01:49:11.590 --> 01:49:12.090 All right. 01:49:12.090 --> 01:49:15.330 So maybe to make this a little visual too, it turns out that a lot of things 01:49:15.330 --> 01:49:16.860 can go wrong unfortunately with pointers. 01:49:16.860 --> 01:49:17.910 And we've seen some of them. 01:49:17.910 --> 01:49:20.118 And here's another program that's a little contrived. 01:49:20.118 --> 01:49:20.860 It's very simple. 01:49:20.860 --> 01:49:23.618 And it just is about manipulating values. 01:49:23.618 --> 01:49:25.410 It doesn't do anything useful per se except 01:49:25.410 --> 01:49:26.980 demonstrate some of today's concepts. 01:49:26.980 --> 01:49:29.580 So in main here, let me propose that we declare 01:49:29.580 --> 01:49:32.970 a pointer called x that's going to store eventually the address of an integer 01:49:32.970 --> 01:49:33.630 apparently. 01:49:33.630 --> 01:49:36.420 Here's another one called y that's going to store the address of an integer 01:49:36.420 --> 01:49:36.990 as well. 01:49:36.990 --> 01:49:40.995 Here now, I'm allocating enough memory to fit one integer. 01:49:40.995 --> 01:49:42.120 Now technically, it's four. 01:49:42.120 --> 01:49:42.662 We know that. 01:49:42.662 --> 01:49:45.720 But size of int just gives me that answer dynamically. 01:49:45.720 --> 01:49:47.760 So it will work on all systems. 01:49:47.760 --> 01:49:52.230 And I'm going to store the address that malloc finds for me in x. 01:49:52.230 --> 01:49:56.610 Then I go to x and put the number 42 there. 01:49:56.610 --> 01:49:57.690 All right, why? 
01:49:57.690 --> 01:50:00.630 The sort of meaning of life, the universe, and everything here, 01:50:00.630 --> 01:50:05.020 but star x, again, just means go to that address and put a value there. 01:50:05.020 --> 01:50:05.670 So why? 01:50:05.670 --> 01:50:06.240 I don't know. 01:50:06.240 --> 01:50:08.610 But it's just correct at this point. 01:50:08.610 --> 01:50:10.170 But what about this line here? 01:50:10.170 --> 01:50:12.900 Star y equals 13. 01:50:12.900 --> 01:50:14.280 Unlucky in this case. 01:50:14.280 --> 01:50:17.362 What's bad about this line here, star y? 01:50:17.362 --> 01:50:20.070 It's a combination now of today's primitives and that point here. 01:50:20.070 --> 01:50:20.922 Yeah. 01:50:20.922 --> 01:50:21.822 AUDIENCE: [INAUDIBLE] 01:50:21.822 --> 01:50:24.780 DAVID J. MALAN: Yeah, we didn't ask the computer to allocate any space. 01:50:24.780 --> 01:50:28.750 So y was not initialized with an equal sign at any point to anything. 01:50:28.750 --> 01:50:31.410 And so what is inside y so to speak? 01:50:31.410 --> 01:50:32.077 A garbage value. 01:50:32.077 --> 01:50:35.077 Maybe it's zero, which isn't bad, because at least it's nice and simple. 01:50:35.077 --> 01:50:37.090 But maybe it's some crazy large positive number, 01:50:37.090 --> 01:50:38.650 or some crazy large negative number. 01:50:38.650 --> 01:50:40.710 Either way, odds are if I go to this address 01:50:40.710 --> 01:50:44.562 or that address randomly with star y, bad things are going to happen. 01:50:44.562 --> 01:50:46.020 And so let me go ahead and propose. 01:50:46.020 --> 01:50:47.200 Well, let's not do that. 01:50:47.200 --> 01:50:50.655 Let's actually do this instead, assign y equal to x. 01:50:50.655 --> 01:50:51.780 And we've done that before. 01:50:51.780 --> 01:50:56.580 And then I can go to y now and change what was a 42 to a 13. 01:50:56.580 --> 01:50:57.120 Again, why? 01:50:57.120 --> 01:50:58.840 This is just for educational sake. 
01:50:58.840 --> 01:51:04.020 But for now, this does not crash because I only de-reference y with star y 01:51:04.020 --> 01:51:05.550 after actually giving it a value. 01:51:05.550 --> 01:51:09.570 Albeit, a duplicate value similar to our copy example earlier. 01:51:09.570 --> 01:51:12.333 So our friends at Stanford have put together a wonderful visual. 01:51:12.333 --> 01:51:13.500 It's about two minutes long. 01:51:13.500 --> 01:51:15.960 Allow me to dramatically dim the lights, if we could, 01:51:15.960 --> 01:51:20.550 and play with what happens with memory when you do bad things like this. 01:51:20.550 --> 01:51:21.334 [VIDEO PLAYBACK] 01:51:21.334 --> 01:51:23.923 [MUSIC PLAYING] 01:51:23.923 --> 01:51:24.840 SPEAKER 1: Hey, Binky. 01:51:24.840 --> 01:51:25.620 Wake up. 01:51:25.620 --> 01:51:28.230 It's time for pointer fun. 01:51:28.230 --> 01:51:29.310 BINKY: What's that? 01:51:29.310 --> 01:51:30.960 Learn about pointers? 01:51:30.960 --> 01:51:32.253 Oh, goody. 01:51:32.253 --> 01:51:34.170 SPEAKER 1: Well, to get started, I guess we're 01:51:34.170 --> 01:51:35.760 going to need a couple of pointers. 01:51:35.760 --> 01:51:40.370 BINKY: OK, this code allocates two pointers which can point to integers. 01:51:40.370 --> 01:51:42.352 SPEAKER 1: OK, well I see the two pointers. 01:51:42.352 --> 01:51:44.310 But they don't seem to be pointing to anything. 01:51:44.310 --> 01:51:45.143 BINKY: That's right. 01:51:45.143 --> 01:51:47.220 Initially, pointers don't point to anything. 01:51:47.220 --> 01:51:49.500 The things they point to are called pointees. 01:51:49.500 --> 01:51:51.243 And setting them up is a separate step. 01:51:51.243 --> 01:51:52.410 SPEAKER 1: Oh, right, right. 01:51:52.410 --> 01:51:53.130 I knew that. 01:51:53.130 --> 01:51:54.990 The pointees are separate. 01:51:54.990 --> 01:51:57.390 So how do you allocate a pointee? 
01:51:57.390 --> 01:52:01.020 BINKY: OK, well, this code allocates a new integer pointee 01:52:01.020 --> 01:52:04.043 and this part sets x to point to it. 01:52:04.043 --> 01:52:05.460 SPEAKER 1: Hey, that looks better. 01:52:05.460 --> 01:52:07.040 So make it do something. 01:52:07.040 --> 01:52:10.520 BINKY: OK, I'll de-reference the pointer x to store the number 01:52:10.520 --> 01:52:12.620 42 into its pointee. 01:52:12.620 --> 01:52:16.250 For this trick, I'll need my magic wand of de-referencing. 01:52:16.250 --> 01:52:19.940 SPEAKER 1: Your magic wand of de-referencing? 01:52:19.940 --> 01:52:21.520 That's great. 01:52:21.520 --> 01:52:23.200 BINKY: This is what the code looks like. 01:52:23.200 --> 01:52:26.025 I'll just set up the number and-- 01:52:26.025 --> 01:52:26.900 SPEAKER 1: Hey, look. 01:52:26.900 --> 01:52:28.190 There it goes. 01:52:28.190 --> 01:52:31.790 So doing a de-reference on x follows the arrow 01:52:31.790 --> 01:52:35.240 to access its pointee, in this case, to store 42 in there. 01:52:35.240 --> 01:52:39.770 Hey, try using it to store the number 13 through the other pointer, y. 01:52:39.770 --> 01:52:41.180 BINKY: OK. 01:52:41.180 --> 01:52:45.290 Just go over here to y and get the number 13 set up, 01:52:45.290 --> 01:52:49.550 and then take the wand of de-referencing and just-- 01:52:49.550 --> 01:52:50.948 [BUZZER SOUND] whoa! 01:52:50.948 --> 01:52:51.740 SPEAKER 1: Oh, hey. 01:52:51.740 --> 01:52:53.120 That didn't work. 01:52:53.120 --> 01:52:58.580 Say, Binky, I don't think de-referencing y is a good idea because setting up 01:52:58.580 --> 01:53:00.210 the pointee is a separate step. 01:53:00.210 --> 01:53:02.600 And I don't think we ever did it. 01:53:02.600 --> 01:53:03.650 BINKY: Good point. 01:53:03.650 --> 01:53:06.110 SPEAKER 1: Yeah, we allocated the pointer y, 01:53:06.110 --> 01:53:09.320 but we never set it to point to a pointee. 01:53:09.320 --> 01:53:10.458 BINKY: Very observant.
01:53:10.458 --> 01:53:12.500 SPEAKER 1: Hey, you're looking good there, Binky. 01:53:12.500 --> 01:53:15.440 Can you fix it so that y points to the same pointee as x? 01:53:15.440 --> 01:53:16.160 BINKY: Sure. 01:53:16.160 --> 01:53:18.782 I'll use my magic wand of pointer assignment. 01:53:18.782 --> 01:53:20.990 SPEAKER 1: Is that going to be a problem like before? 01:53:20.990 --> 01:53:22.910 BINKY: No, this doesn't touch the pointees. 01:53:22.910 --> 01:53:26.540 It just changes one pointer to point to the same thing as another. 01:53:26.540 --> 01:53:27.590 SPEAKER 1: Oh, I see. 01:53:27.590 --> 01:53:30.575 Now y points to the same place as x. 01:53:30.575 --> 01:53:32.150 So wait, now y is fixed. 01:53:32.150 --> 01:53:33.230 It has a pointee. 01:53:33.230 --> 01:53:35.210 So you can try the wand of de-referencing again 01:53:35.210 --> 01:53:37.790 to send the 13 over. 01:53:37.790 --> 01:53:40.182 BINKY: OK, here it goes. 01:53:40.182 --> 01:53:41.390 SPEAKER 1: Hey, look at that. 01:53:41.390 --> 01:53:43.190 Now de-referencing works on y. 01:53:43.190 --> 01:53:47.210 And because the pointers are sharing that one pointee, they both see the 13. 01:53:47.210 --> 01:53:48.380 BINKY: Yeah, sharing. 01:53:48.380 --> 01:53:48.920 Whatever. 01:53:48.920 --> 01:53:50.917 So are we going to switch places now? 01:53:50.917 --> 01:53:51.750 SPEAKER 1: Oh, look. 01:53:51.750 --> 01:53:52.575 We're out of time. 01:53:52.575 --> 01:53:53.075 BINKY: But-- 01:53:53.075 --> 01:53:53.270 [END PLAYBACK] 01:53:53.270 --> 01:53:54.980 DAVID J. MALAN: Our thanks to Professor Nick Parlante 01:53:54.980 --> 01:53:57.290 of Stanford for spending a huge amount of time 01:53:57.290 --> 01:53:59.180 doing stop-motion animation for that. 01:53:59.180 --> 01:54:02.120 But hopefully now, you have a sense of what, too, can go wrong 01:54:02.120 --> 01:54:04.612 when you misuse memory in this way.
01:54:04.612 --> 01:54:06.320 But at the end of the day, we really only 01:54:06.320 --> 01:54:08.070 have these four new building blocks today, 01:54:08.070 --> 01:54:11.090 like the star operator, the ampersand operator, malloc, and free. 01:54:11.090 --> 01:54:13.340 And really with that, and the underlying understanding 01:54:13.340 --> 01:54:15.810 of what your computer is doing underneath the hood, 01:54:15.810 --> 01:54:18.242 we have this way now to really manipulate things 01:54:18.242 --> 01:54:19.700 in memory, for better or for worse. 01:54:19.700 --> 01:54:21.960 And eventually, we'll see how we can build things. 01:54:21.960 --> 01:54:23.930 But we can also now use today's primitives 01:54:23.930 --> 01:54:26.390 to better explain some things that we've been 01:54:26.390 --> 01:54:29.130 asking you to take for granted over the past several weeks. 01:54:29.130 --> 01:54:33.200 So for instance, let me propose that we do-- 01:54:33.200 --> 01:54:35.180 one volunteer up here if we could. 01:54:35.180 --> 01:54:37.392 Could we get one volunteer who-- 01:54:37.392 --> 01:54:38.600 you want to come straight up? 01:54:38.600 --> 01:54:39.810 Yep, right in the middle. 01:54:39.810 --> 01:54:40.070 Come on. 01:54:40.070 --> 01:54:41.903 You'll have to take a left or a right there. 01:54:47.760 --> 01:54:48.360 All right. 01:54:48.360 --> 01:54:52.380 So we have two empty glasses here and two colors of liquid. 01:54:52.380 --> 01:54:57.150 And we have, let me give you the mic, if you'd like to say hello to the group. 01:54:57.150 --> 01:54:57.840 MOINE: Hello. 01:54:57.840 --> 01:54:58.950 I'm Moine. 01:54:58.950 --> 01:55:00.780 I'm in [INAUDIBLE] and first year. 01:55:00.780 --> 01:55:01.170 DAVID J. MALAN: All right. 01:55:01.170 --> 01:55:01.670 Welcome. 01:55:01.670 --> 01:55:02.520 Well, welcome here. 
01:55:02.520 --> 01:55:06.390 I'm going to go ahead and fill these two glasses with this colored liquid, 01:55:06.390 --> 01:55:08.280 purple here on my right. 01:55:08.280 --> 01:55:11.430 Let's fill up a glass here. 01:55:11.430 --> 01:55:12.690 MOINE: It's ominous. 01:55:12.690 --> 01:55:14.460 DAVID J. MALAN: Yes, don't drink. 01:55:14.460 --> 01:55:18.250 And now we'll put some orange in here. 01:55:18.250 --> 01:55:21.660 And what we'd like you to do for the audience, if you don't mind, 01:55:21.660 --> 01:55:23.583 is swap the two values. 01:55:23.583 --> 01:55:25.500 You've got a purple value and an orange value. 01:55:25.500 --> 01:55:28.770 And I'd like the purple liquid in this glass and the orange liquid 01:55:28.770 --> 01:55:29.655 in that glass please. 01:55:32.652 --> 01:55:34.010 MOINE: Can I have another glass? 01:55:34.010 --> 01:55:34.550 DAVID J. MALAN: Oh, OK. 01:55:34.550 --> 01:55:35.390 Good intuition. 01:55:35.390 --> 01:55:37.267 But for the microphone-- 01:55:37.267 --> 01:55:38.600 MOINE: Can I have another glass? 01:55:38.600 --> 01:55:39.440 DAVID J. MALAN: So you can. 01:55:39.440 --> 01:55:41.552 And just in fact, I brought one here for you. 01:55:41.552 --> 01:55:43.010 Why are you asking for this though? 01:55:43.010 --> 01:55:45.620 MOINE: Because if I just pour this into this, then it'll get mixed up. 01:55:45.620 --> 01:55:46.537 DAVID J. MALAN: Right. 01:55:46.537 --> 01:55:49.410 So obviously we need like a temporary variable, if you will. 01:55:49.410 --> 01:55:52.765 So here is your temporary variable. 01:55:52.765 --> 01:55:53.640 MOINE: And you want-- 01:55:53.640 --> 01:55:54.330 OK. 01:55:54.330 --> 01:55:55.205 DAVID J. MALAN: Yeah. 01:55:55.205 --> 01:55:56.640 There's-- yeah. 01:55:56.640 --> 01:55:59.730 All right so pouring the value of the orange glass 01:55:59.730 --> 01:56:03.390 into this temporary variable, if you will. 01:56:03.390 --> 01:56:04.080 All right. 
01:56:04.080 --> 01:56:09.240 And now pouring the value of the purple glass into the former orange glass. 01:56:12.740 --> 01:56:14.270 And now-- 01:56:14.270 --> 01:56:15.570 MOINE: And now this goes back. 01:56:15.570 --> 01:56:19.965 DAVID J. MALAN: The temporary value goes back into the original purple glass. 01:56:19.965 --> 01:56:21.840 And now I think we give you a round of applause 01:56:21.840 --> 01:56:23.132 for having done that very well. 01:56:23.132 --> 01:56:25.970 [INAUDIBLE] 01:56:25.970 --> 01:56:27.120 MOINE: Thank you. 01:56:27.120 --> 01:56:28.280 DAVID J. MALAN: All right. 01:56:28.280 --> 01:56:31.680 So it should go without saying that in the real world, that's how you do this. 01:56:31.680 --> 01:56:34.430 And in fact, in code, that's pretty much how you have to do this, 01:56:34.430 --> 01:56:37.632 although ask us sometime for a super fancy way of doing it 01:56:37.632 --> 01:56:38.840 without a temporary variable. 01:56:38.840 --> 01:56:41.090 It turns out that is possible using bits. 01:56:41.090 --> 01:56:43.703 But for now, let's suppose that, indeed, this demonstrates 01:56:43.703 --> 01:56:44.870 what the reality is in code. 01:56:44.870 --> 01:56:46.910 If you want to swap two values, you need to have 01:56:46.910 --> 01:56:48.990 something like a temporary variable. 01:56:48.990 --> 01:56:52.820 So for instance, on the screen here is a-- the beginning of a function 01:56:52.820 --> 01:56:56.420 called swap, whose purpose in life is to, as you just did, swap two values, 01:56:56.420 --> 01:56:59.270 call them a and b. So orange and purple respectively 01:56:59.270 --> 01:57:01.730 are now just a and b, and they're integers to keep things simple. 01:57:01.730 --> 01:57:03.688 Well, here is the corresponding code, if I may, 01:57:03.688 --> 01:57:05.300 to what you just enacted as a human.
01:57:05.300 --> 01:57:08.700 You declared a temporary variable, which I call temp in this case, 01:57:08.700 --> 01:57:10.640 which was like me handing you the empty glass. 01:57:10.640 --> 01:57:14.000 And you stored the orange liquid in it, AKA a; you then 01:57:14.000 --> 01:57:19.190 changed the value of the formerly orange glass to be equal to the purple 01:57:19.190 --> 01:57:20.480 by pouring one into the other. 01:57:20.480 --> 01:57:22.040 And then you did the opposite there. 01:57:22.040 --> 01:57:25.500 Now at the end of this, you still have a temporary variable that's now empty. 01:57:25.500 --> 01:57:27.260 So it's temporary in literally that sense. 01:57:27.260 --> 01:57:28.552 You just don't need it anymore. 01:57:28.552 --> 01:57:30.570 But it was necessary along the way. 01:57:30.570 --> 01:57:33.770 So I dare say this code is correct logically. 01:57:33.770 --> 01:57:39.440 This will swap two values a and b thanks to the use of that temporary variable. 01:57:39.440 --> 01:57:42.348 Unfortunately though, if I actually do this in practice, 01:57:42.348 --> 01:57:44.390 let me go over to VS Code here and open a program 01:57:44.390 --> 01:57:48.920 I wrote in advance called swap.c, which does this as follows. 01:57:48.920 --> 01:57:52.790 In here, notice I have my prototype for a swap function at the very top. 01:57:52.790 --> 01:57:54.590 And let me scroll down to the very bottom. 01:57:54.590 --> 01:57:56.210 There is that exact same code. 01:57:56.210 --> 01:58:00.050 So I'm-- the same code for swapping two values a and b, 01:58:00.050 --> 01:58:02.253 which I'm claiming for now is correct. 01:58:02.253 --> 01:58:04.670 Now if I go back up here, what is main going to do for us? 01:58:04.670 --> 01:58:06.628 Main is really just meant to be a demonstration 01:58:06.628 --> 01:58:08.220 of the correctness of your algorithm.
01:58:08.220 --> 01:58:11.990 So here I declare on lines seven and eight, two variables, x and y, 01:58:11.990 --> 01:58:14.660 being one and two arbitrarily respectively. 01:58:14.660 --> 01:58:18.230 I then on line 10 just print out what the value of x is and y 01:58:18.230 --> 01:58:20.210 is just so I can see it on the screen. 01:58:20.210 --> 01:58:22.940 I then call the swap function on line 11, 01:58:22.940 --> 01:58:26.780 and then I literally print the exact same thing again, I print x and y. 01:58:26.780 --> 01:58:29.190 Hopefully, it'll obviously be the opposite. 01:58:29.190 --> 01:58:31.640 So I think logically, swap is indeed correct. 01:58:31.640 --> 01:58:34.520 Let me do make swap and then dot slash swap. 01:58:34.520 --> 01:58:40.100 And I should see x is 1, y is 2, and then hopefully x is 2, y is 1. 01:58:40.100 --> 01:58:41.630 Enter. 01:58:41.630 --> 01:58:42.980 But I don't. 01:58:42.980 --> 01:58:46.800 And it did work in the sense that the code compiled, the code ran. 01:58:46.800 --> 01:58:49.010 So it's not like some bug in that sense. 01:58:49.010 --> 01:58:52.850 But because I don't quite understand what's going on underneath the hood, 01:58:52.850 --> 01:58:55.370 at least as of right now, or prior weeks, 01:58:55.370 --> 01:58:59.330 this code here is indeed buggy in some way. 01:58:59.330 --> 01:59:02.660 But does anyone have an intuition, perhaps based on today's discussion, 01:59:02.660 --> 01:59:06.500 as to why this code, while logically correct and clearly workable in reality, 01:59:06.500 --> 01:59:09.670 apparently does not work in C? 01:59:09.670 --> 01:59:10.780 Any intuition? 01:59:10.780 --> 01:59:11.320 Yeah. 01:59:11.320 --> 01:59:13.980 AUDIENCE: [INAUDIBLE] 01:59:13.980 --> 01:59:14.980 DAVID J. MALAN: Perfect. 01:59:14.980 --> 01:59:17.147 And to summarize, here's that term of art I promised.
01:59:17.147 --> 01:59:20.680 When you call a function and pass in two arguments, like a and b, 01:59:20.680 --> 01:59:23.120 you're passing those arguments by value. 01:59:23.120 --> 01:59:25.850 So copies of those values effectively. 01:59:25.850 --> 01:59:28.610 And so when swap is actually called here-- 01:59:28.610 --> 01:59:29.110 sorry. 01:59:29.110 --> 01:59:31.682 When you pass in x and y, we call them a and b. 01:59:31.682 --> 01:59:32.890 But that's just a convention. 01:59:32.890 --> 01:59:35.410 We could call the parameters anything we want. 01:59:35.410 --> 01:59:39.950 a and b are indeed the values of x and y respectively, 01:59:39.950 --> 01:59:41.810 but copies of the values. 01:59:41.810 --> 01:59:45.220 So this code here is very successfully, in VS Code too, 01:59:45.220 --> 01:59:47.110 swapping the values of a and b. 01:59:47.110 --> 01:59:51.610 But as you note, because I'm passing them in by value, literally one, 01:59:51.610 --> 01:59:55.450 literally two, and not by another term of art, by reference, AKA 01:59:55.450 --> 01:59:59.650 by their addresses, swap has no capability in C 01:59:59.650 --> 02:00:02.740 to go to those locations and swap the values actually stored there, 02:00:02.740 --> 02:00:04.990 just like we did successfully in reality. 02:00:04.990 --> 02:00:07.300 But I think we really have the syntax already 02:00:07.300 --> 02:00:10.690 for solving this if we consider that really, this is just an issue of scope. 02:00:10.690 --> 02:00:12.790 And we've talked a bit about scope in the past, 02:00:12.790 --> 02:00:16.150 whereby scope refers to the context in which a variable lives. 02:00:16.150 --> 02:00:18.310 And generally, I've claimed that a variable exists 02:00:18.310 --> 02:00:20.290 between the nearest enclosing curly braces. 02:00:20.290 --> 02:00:24.010 And that's pretty much true for the swap function because a and b, 02:00:24.010 --> 02:00:27.670 I now claim again, exist only in the context of these curly braces.
02:00:27.670 --> 02:00:32.048 They have no effect on main up top, which has different variables x and y. 02:00:32.048 --> 02:00:34.840 But we can consider now what's really going on underneath the hood. 02:00:34.840 --> 02:00:37.360 And here's that same picture of memory, as we've seen in the past. 02:00:37.360 --> 02:00:39.550 If we zoom in on these little black chips, we see 02:00:39.550 --> 02:00:41.200 a bunch of bytes of memory. 02:00:41.200 --> 02:00:43.780 If I create a grid out of it just to kind of highlight 02:00:43.780 --> 02:00:47.500 that we can address each of these bytes, throw away the plastic circuit board, 02:00:47.500 --> 02:00:51.430 and focus only on those bytes, what's going on underneath the hood 02:00:51.430 --> 02:00:55.600 when functions are called in C, which you've been doing for weeks now? 02:00:55.600 --> 02:00:59.350 Well, this rectangle of memory, if we abstract it away further, 02:00:59.350 --> 02:01:02.870 is generally broken up into different regions or segments, 02:01:02.870 --> 02:01:04.000 as I called them earlier. 02:01:04.000 --> 02:01:06.490 And different things get put in different parts 02:01:06.490 --> 02:01:07.760 of the computer's memory. 02:01:07.760 --> 02:01:10.330 And without getting too into the weeds, when 02:01:10.330 --> 02:01:12.490 you double click a program on your Mac or PC, 02:01:12.490 --> 02:01:15.940 or when you do dot slash something on Linux, 02:01:15.940 --> 02:01:19.630 you are loading your machine code into the computer's memory 02:01:19.630 --> 02:01:21.410 from the computer's hard drive. 02:01:21.410 --> 02:01:24.550 So all the zeros and ones that compose Microsoft Word, or Chrome, 02:01:24.550 --> 02:01:27.970 or whatever are loaded into the computer's memory or RAM. 02:01:27.970 --> 02:01:31.360 And by convention, it's put up top in the so-called machine code area. 02:01:31.360 --> 02:01:34.660 And that's how the CPU gets quick access to them.
02:01:34.660 --> 02:01:37.450 Below that are what are going to be our globals. 02:01:37.450 --> 02:01:40.360 So global variables, which we haven't used very much in C. 02:01:40.360 --> 02:01:44.080 But you can declare them outside of main at the very top of your files. 02:01:44.080 --> 02:01:47.320 If you have globals, they end up up there as well, just FYI. 02:01:47.320 --> 02:01:49.180 And then there's this big chunk of memory 02:01:49.180 --> 02:01:52.480 that we saw valgrind mention indirectly earlier called the heap. 02:01:52.480 --> 02:01:54.340 And it's kind of like a heap, literally. 02:01:54.340 --> 02:01:57.550 It's a heap of memory that you can use as you see fit. 02:01:57.550 --> 02:02:01.030 And the heap is where malloc grabs memory from. 02:02:01.030 --> 02:02:02.920 So initially, there's nothing in the heap. 02:02:02.920 --> 02:02:04.480 It's just a big chunk of free space. 02:02:04.480 --> 02:02:08.680 Any time you call malloc, malloc kind of carves out from the heap area 02:02:08.680 --> 02:02:09.790 more and more bytes. 02:02:09.790 --> 02:02:11.920 And malloc keeps track of, essentially, which 02:02:11.920 --> 02:02:13.480 bytes have already been allocated. 02:02:13.480 --> 02:02:14.890 So initially, it looks empty. 02:02:14.890 --> 02:02:17.290 But different bytes, squares if you will, 02:02:17.290 --> 02:02:20.560 keep getting requested again and again as a program runs thanks to functions 02:02:20.560 --> 02:02:21.340 like malloc. 02:02:21.340 --> 02:02:23.900 And it grows, if you will, conceptually down. 02:02:23.900 --> 02:02:27.173 So as you request more and more memory from malloc, it starts up here. 02:02:27.173 --> 02:02:29.590 But then the next chunk you get is down here conceptually. 02:02:29.590 --> 02:02:31.250 The next chunk is down here, down here. 02:02:31.250 --> 02:02:35.170 So it kind of fills the available space in the computer's overall memory.
02:02:35.170 --> 02:02:38.740 But there's this other chunk of memory called the stack. 02:02:38.740 --> 02:02:42.250 And just like a stack of trays in Annenberg or any cafeteria, 02:02:42.250 --> 02:02:45.340 which kind of grows upward, so does this stack of memory. 02:02:45.340 --> 02:02:50.620 And it turns out the stack is where functions' variables 02:02:50.620 --> 02:02:53.530 and arguments are stored temporarily. 02:02:53.530 --> 02:02:57.100 So whenever you call a function and it has variables inside of it, 02:02:57.100 --> 02:03:00.010 or has arguments too, this is the chunk of memory, 02:03:00.010 --> 02:03:03.610 within the computer's overall block of memory, that's used for functions. 02:03:03.610 --> 02:03:06.425 But any time you call malloc, it's memory up here. 02:03:06.425 --> 02:03:08.800 At the end of the day, they just had to pick a direction. 02:03:08.800 --> 02:03:10.660 Top or bottom; technically it's an artist's rendition. 02:03:10.660 --> 02:03:13.310 You could rotate this thing to any orientation you want. 02:03:13.310 --> 02:03:16.540 But you're just using a finite amount of memory in this conventional way. 02:03:16.540 --> 02:03:19.360 Malloc starts here, functions start here. 02:03:19.360 --> 02:03:22.415 Now you can kind of see where bad things can happen. 02:03:22.415 --> 02:03:24.790 And indeed, one of the other reasons programs, computers, 02:03:24.790 --> 02:03:27.910 can crash is if you ask for way too much memory from the heap 02:03:27.910 --> 02:03:30.040 by calling malloc many, many, many times, 02:03:30.040 --> 02:03:33.730 or if you call way too many functions, or, as per last week, accidentally 02:03:33.730 --> 02:03:37.570 recurse infinitely many times, in which case you might get a segmentation fault. 02:03:37.570 --> 02:03:40.100 And that's because you're using too much stack memory. 02:03:40.100 --> 02:03:42.460 So this is bound to be a problem eventually.
02:03:42.460 --> 02:03:45.550 And the onus is on the programmer to just minimize 02:03:45.550 --> 02:03:49.270 the probability of doing that and really avoid the possibility of doing that 02:03:49.270 --> 02:03:53.770 by just checking return values, checking if malloc or get string return NULL. 02:03:53.770 --> 02:03:56.260 Because you can proactively with conditionals 02:03:56.260 --> 02:03:59.830 make sure that these two things do not collide by just making sure 02:03:59.830 --> 02:04:02.060 that you get back non-NULL values. 02:04:02.060 --> 02:04:04.840 So let's consider the stack in the context of swap 02:04:04.840 --> 02:04:06.400 and what's really happening here. 02:04:06.400 --> 02:04:09.233 And Carter, if you wouldn't mind helping me animate the screen here, 02:04:09.233 --> 02:04:13.000 when I call the main function of any program, 02:04:13.000 --> 02:04:17.483 it is allocated a slice of memory called a frame at the bottom of this stack. 02:04:17.483 --> 02:04:19.650 So if Carter, you want to go ahead and advance here, 02:04:19.650 --> 02:04:22.260 here's the first slice of memory that will always 02:04:22.260 --> 02:04:26.520 be used by main whether it has command line arguments, or local variables. 02:04:26.520 --> 02:04:28.260 It just ends up here in memory. 02:04:28.260 --> 02:04:32.980 Suppose now per our swap.c program that main calls swap. 02:04:32.980 --> 02:04:35.190 Well, where does the memory for swap end up? 02:04:35.190 --> 02:04:35.950 Right up here. 02:04:35.950 --> 02:04:39.390 So swap had two variables-- two arguments a and b. 02:04:39.390 --> 02:04:41.140 And it also had a temporary variable. 02:04:41.140 --> 02:04:42.993 So all of those end up in here in memory. 
02:04:42.993 --> 02:04:44.910 And if you want to go ahead and advance again, 02:04:44.910 --> 02:04:48.330 Carter, once swap is done executing, whether it just 02:04:48.330 --> 02:04:51.780 returns because there are no more lines of code, or you explicitly return, 02:04:51.780 --> 02:04:54.577 this memory is just freed up automatically. 02:04:54.577 --> 02:04:55.410 You don't call free. 02:04:55.410 --> 02:04:56.610 You don't undo malloc. 02:04:56.610 --> 02:04:58.110 This just all happens automatically. 02:04:58.110 --> 02:04:59.880 It has been happening since week one. 02:04:59.880 --> 02:05:02.820 Now technically, it's still there even though we've 02:05:02.820 --> 02:05:04.200 removed it from the picture. 02:05:04.200 --> 02:05:06.870 And there's your first hint of garbage values. 02:05:06.870 --> 02:05:08.340 There's still zeros and ones there. 02:05:08.340 --> 02:05:11.340 And they're left in the original-- the previous configuration. 02:05:11.340 --> 02:05:13.967 And so the reason you get garbage values in memory 02:05:13.967 --> 02:05:16.050 is because even though we haven't drawn swap here, 02:05:16.050 --> 02:05:17.730 there was stuff there a moment ago. 02:05:17.730 --> 02:05:20.400 It's going to be there the next time you use that same memory. 02:05:20.400 --> 02:05:23.250 Now let's go ahead and step through this a little more methodically. 02:05:23.250 --> 02:05:27.210 Main has two variables called x and y, with values one and two. 02:05:27.210 --> 02:05:30.330 So let's advance and represent x as one, y as two 02:05:30.330 --> 02:05:31.950 taking up these two chunks of memory. 02:05:31.950 --> 02:05:35.130 When we call swap now, swap gets a new slice of memory 02:05:35.130 --> 02:05:40.050 that then gives us three variables, a and b, technically the arguments, 02:05:40.050 --> 02:05:40.800 and temp. 02:05:40.800 --> 02:05:41.820 So what happens?
02:05:41.820 --> 02:05:45.900 Well, because functions automatically pass in values by value, 02:05:45.900 --> 02:05:48.930 or rather pass in arguments by value, x gets 02:05:48.930 --> 02:05:53.130 copied into a, y gets copied into b, and then once we 02:05:53.130 --> 02:05:55.920 start executing the algorithm, a la the water glasses, well, 02:05:55.920 --> 02:05:57.040 what happens here? 02:05:57.040 --> 02:06:01.710 So if I execute the first line of code, temp equals a, temp gets a copy of a. 02:06:01.710 --> 02:06:03.645 What happens next? a equals b. 02:06:03.645 --> 02:06:05.880 So a takes on a copy of b. 02:06:05.880 --> 02:06:09.030 And now we do the final pour between glasses: b equals temp. 02:06:09.030 --> 02:06:10.890 b gets a copy of temp. 02:06:10.890 --> 02:06:13.872 Now we don't have to change temp because it's essentially empty, 02:06:13.872 --> 02:06:15.330 although there's the garbage value. 02:06:15.330 --> 02:06:18.390 That one is now going to stay there until we reuse that memory. 02:06:18.390 --> 02:06:21.240 The important thing, though, is that a and b have been swapped. 02:06:21.240 --> 02:06:26.430 But what obviously has not been swapped, as is manifest when swap returns: x 02:06:26.430 --> 02:06:27.660 and y are untouched. 02:06:27.660 --> 02:06:29.830 Because copies thereof were passed in. 02:06:29.830 --> 02:06:31.690 So we need a solution to this problem. 02:06:31.690 --> 02:06:34.290 And if we advance one more time, if you don't mind, let me step over here 02:06:34.290 --> 02:06:35.748 but then call you back in a second. 02:06:35.748 --> 02:06:38.310 This code here is logically correct. 02:06:38.310 --> 02:06:39.540 This is what you did. 02:06:39.540 --> 02:06:41.460 But this is now a detail of C. 02:06:41.460 --> 02:06:44.670 You can't just swap values passed by value, because you're only changing them 02:06:44.670 --> 02:06:46.260 in the scope of the swap function.
02:06:46.260 --> 02:06:50.610 But I think if we change it to this and add some annoying syntax, 02:06:50.610 --> 02:06:52.500 we can solve the problem. 02:06:52.500 --> 02:06:55.740 Just like you can declare variables as storing addresses, 02:06:55.740 --> 02:07:00.360 you can declare arguments to functions, AKA parameters, as taking addresses. 02:07:00.360 --> 02:07:04.950 This new version of swap means that a shall be the address of an integer. 02:07:04.950 --> 02:07:07.050 b shall be the address of an integer. 02:07:07.050 --> 02:07:09.150 And now it gets a little cryptic here. 02:07:09.150 --> 02:07:12.270 Temp is the same because it's just an integer like it was in week one. 02:07:12.270 --> 02:07:14.040 Nothing special about temp. 02:07:14.040 --> 02:07:18.135 But if you want to get the value at a, you do star a. 02:07:18.135 --> 02:07:21.150 And that goes to the address, grabs the number one presumably. 02:07:21.150 --> 02:07:24.268 If you want to change the value at a, you go to that address, 02:07:24.268 --> 02:07:26.310 you follow the treasure map to the other mailbox, 02:07:26.310 --> 02:07:29.310 and you set it equal to whatever value is at b. 02:07:29.310 --> 02:07:30.900 You go to b as well. 02:07:30.900 --> 02:07:33.450 Last line, you go to b now and change it to be 02:07:33.450 --> 02:07:36.990 whatever the temporary variable was, which holds the original value at a. 02:07:36.990 --> 02:07:39.300 So that's where the final value gets swapped. 02:07:39.300 --> 02:07:41.810 But here, there's a lot more crisscrossing metaphorically 02:07:41.810 --> 02:07:43.560 across the stage where you're going to all 02:07:43.560 --> 02:07:46.950 of these different addresses in the swap function to make these changes. 02:07:46.950 --> 02:07:49.200 So if we advance now to the pictorial version of this, 02:07:49.200 --> 02:07:51.480 here's the same story as before with main. 02:07:51.480 --> 02:07:53.700 And x and y are one and two respectively.
02:07:53.700 --> 02:07:57.540 When swap gets called now, notice, and I'll do it with arrows here, 02:07:57.540 --> 02:08:01.680 a is effectively pointing to x, b is effectively pointing to y. 02:08:01.680 --> 02:08:04.290 If we really get into the weeds, these are actually addresses. 02:08:04.290 --> 02:08:05.910 But who cares about the specifics? 02:08:05.910 --> 02:08:07.540 It's really just the concept here. 02:08:07.540 --> 02:08:08.700 So now what happens? 02:08:08.700 --> 02:08:10.500 Int temp gets star a. 02:08:10.500 --> 02:08:12.840 Star a means start at a and go there. 02:08:12.840 --> 02:08:14.160 Follow the arrow, if you will. 02:08:14.160 --> 02:08:15.570 Sort of chutes and ladders style. 02:08:15.570 --> 02:08:16.620 And then that's one. 02:08:16.620 --> 02:08:18.010 So we put one in temp. 02:08:18.010 --> 02:08:18.510 All right. 02:08:18.510 --> 02:08:20.123 Star a equals star b. 02:08:20.123 --> 02:08:21.540 So let's do it from right to left. 02:08:21.540 --> 02:08:23.220 Star b means follow the arrow. 02:08:23.220 --> 02:08:24.050 It's two. 02:08:24.050 --> 02:08:25.050 And then what do you do? 02:08:25.050 --> 02:08:26.040 Follow the arrow. 02:08:26.040 --> 02:08:29.790 It's now two because you copy one to the other from right to left. 02:08:29.790 --> 02:08:31.740 And then lastly, star b gets temp. 02:08:31.740 --> 02:08:33.270 So start at b, go to b. 02:08:33.270 --> 02:08:36.330 And now store whatever the value is in temp. 02:08:36.330 --> 02:08:39.930 So just by having this basic new syntax of like ampersands, and stars, 02:08:39.930 --> 02:08:42.270 and so forth, we can actually now go to places 02:08:42.270 --> 02:08:44.790 and circumvent what is otherwise a feature of C, 02:08:44.790 --> 02:08:47.160 that these variables are locally scoped. 02:08:47.160 --> 02:08:50.265 But you can still access things in other functions as well. 02:08:50.265 --> 02:08:52.390 So thank you so much for helping step through this. 
02:08:52.390 --> 02:08:55.440 So we now have an application of this that 02:08:55.440 --> 02:08:59.830 explains why, in this version of the C code, this would actually work. 02:08:59.830 --> 02:09:03.030 So in fact, let me go back to my swap code here. 02:09:03.030 --> 02:09:06.030 And let me change the function ever so slightly in VS Code. 02:09:06.030 --> 02:09:08.800 So let me scroll down, leaving main the same. 02:09:08.800 --> 02:09:13.590 And let me change swap's definition to take in addresses. 02:09:13.590 --> 02:09:15.060 Let me go to a here. 02:09:15.060 --> 02:09:16.320 Let me go to a here. 02:09:16.320 --> 02:09:17.760 Let me go to b here. 02:09:17.760 --> 02:09:19.590 And let me go to b here as well. 02:09:19.590 --> 02:09:20.930 But nothing else changes. 02:09:20.930 --> 02:09:23.540 This change here in particular is enough of a clue 02:09:23.540 --> 02:09:27.410 to signal that when you call swap and pass in two values, 02:09:27.410 --> 02:09:31.040 I'm expecting addresses now, not integers. 02:09:31.040 --> 02:09:34.640 But now that I've made this change, I do need to go up to main 02:09:34.640 --> 02:09:37.490 and make one change. 02:09:37.490 --> 02:09:40.190 Does anyone have the intuition for what now needs to change 02:09:40.190 --> 02:09:45.590 in main so that I pass in x and y by reference, that is by address rather 02:09:45.590 --> 02:09:48.290 than by value or copy? 02:09:48.290 --> 02:09:49.652 Yeah, in back. 02:09:49.652 --> 02:09:52.198 AUDIENCE: [INAUDIBLE] 02:09:52.198 --> 02:09:53.240 DAVID J. MALAN: So close. 02:09:53.240 --> 02:09:57.170 So on the swap line, it's not star that I want in front of the x and the y. 02:09:57.170 --> 02:09:59.834 It's instead-- 02:09:59.834 --> 02:10:00.720 AUDIENCE: [INAUDIBLE] 02:10:00.720 --> 02:10:01.560 DAVID J. MALAN: What's the other one? 02:10:01.560 --> 02:10:02.380 AUDIENCE: Ampersand. 02:10:02.380 --> 02:10:03.235 DAVID J. MALAN: It's the ampersand. 02:10:03.235 --> 02:10:03.735 Why?
02:10:03.735 --> 02:10:06.840 Because if I want to enable swap to go somewhere, just like Carter 02:10:06.840 --> 02:10:08.590 and I played this game with the mailboxes, 02:10:08.590 --> 02:10:12.220 I need to inform swap of the address of x and the address of y. 02:10:12.220 --> 02:10:14.470 And again, per the beginning of today's class, 02:10:14.470 --> 02:10:16.720 ampersand is the syntax via which we do that. 02:10:16.720 --> 02:10:19.840 So I add an ampersand here to get the address of x, ampersand here 02:10:19.840 --> 02:10:20.860 to get the address of y. 02:10:20.860 --> 02:10:23.530 And now this code lines up with the picture 02:10:23.530 --> 02:10:25.280 that Carter just helped us walk through. 02:10:25.280 --> 02:10:29.360 And so when I run make swap here, I have a mistake. 02:10:29.360 --> 02:10:30.670 Oh, what did I do wrong? 02:10:30.670 --> 02:10:31.600 Not intentional. 02:10:31.600 --> 02:10:34.240 But I guess worth pointing out. 02:10:34.240 --> 02:10:35.230 I screwed up here. 02:10:35.230 --> 02:10:40.000 It doesn't like ampersand x because of something 02:10:40.000 --> 02:10:43.240 on line three, which is way early into the code. 02:10:43.240 --> 02:10:44.470 What did I screw up? 02:10:44.470 --> 02:10:45.490 Yeah, in the middle. 02:10:45.490 --> 02:10:47.757 AUDIENCE: [INAUDIBLE] 02:10:47.757 --> 02:10:50.090 DAVID J. MALAN: Yeah, so this is why we-- you should not 02:10:50.090 --> 02:10:53.215 copy paste, even though it's necessary for things like function prototypes. 02:10:53.215 --> 02:10:56.240 If I changed swap at the bottom, I need to change its prototype. 02:10:56.240 --> 02:10:59.360 So let me add the star there, add the star there, or just re-copy paste 02:10:59.360 --> 02:11:00.780 it at the top of the file. 02:11:00.780 --> 02:11:02.490 Now let me do make swap again. 02:11:02.490 --> 02:11:03.980 Let me now do dot slash swap. 02:11:03.980 --> 02:11:06.530 And I should now see x is 1, y is 2. 
02:11:06.530 --> 02:11:10.760 And hopefully, x is 2, y is 1, which I now do. 02:11:10.760 --> 02:11:12.017 So the logic is the same. 02:11:12.017 --> 02:11:13.100 The algorithm is the same. 02:11:13.100 --> 02:11:14.810 All the week zero stuff is the same. 02:11:14.810 --> 02:11:17.960 Except now in week four, you just have a bit more expressiveness 02:11:17.960 --> 02:11:22.310 via which you can tell the computer exactly what you want to manipulate 02:11:22.310 --> 02:11:24.070 and how. 02:11:24.070 --> 02:11:29.250 Any questions then on this technique here? 02:11:29.250 --> 02:11:29.750 No? 02:11:29.750 --> 02:11:30.250 All right. 02:11:30.250 --> 02:11:33.072 Well, even when we fix this, there's still going to be problems. 02:11:33.072 --> 02:11:35.030 And just so you've seen some terms of art here, 02:11:35.030 --> 02:11:38.113 this is bad whenever you have two arrows, the heap's and the stack's, pointing at one another, certainly 02:11:38.113 --> 02:11:40.400 if you use and reuse more and more memory. 02:11:40.400 --> 02:11:43.320 And it turns out there are some terms of art that might suddenly now make sense, 02:11:43.320 --> 02:11:44.945 especially if you've programmed before. 02:11:44.945 --> 02:11:47.210 Bad things can happen by this design. 02:11:47.210 --> 02:11:49.045 But there's really no way around this kind of design 02:11:49.045 --> 02:11:50.670 because it's a finite amount of memory. 02:11:50.670 --> 02:11:52.370 So at some point, bad things are going to happen no matter 02:11:52.370 --> 02:11:54.070 what if a computer runs out of memory. 02:11:54.070 --> 02:11:55.820 So it's not that this was a poor decision. 02:11:55.820 --> 02:11:59.480 It's just sort of a necessary one given finite amounts of memory in a computer. 02:11:59.480 --> 02:12:01.790 But a heap overflow, so to speak, is when 02:12:01.790 --> 02:12:05.300 you actually overflow the heap and touch memory that you shouldn't up there.
02:12:05.300 --> 02:12:08.300 Stack overflow is when you somehow overflow the stack and touch 02:12:08.300 --> 02:12:10.030 memory that you shouldn't down there. 02:12:10.030 --> 02:12:12.780 So with that said, these are really just problems that can happen. 02:12:12.780 --> 02:12:14.488 And they're specific incarnations of what 02:12:14.488 --> 02:12:16.610 are generally called buffer overflows. 02:12:16.610 --> 02:12:19.670 A buffer, like in the YouTube sense, is just a chunk of memory 02:12:19.670 --> 02:12:23.030 that, in the case of YouTube, stores the next few seconds or minutes of video. 02:12:23.030 --> 02:12:25.910 But generally speaking, a buffer is just a chunk of memory 02:12:25.910 --> 02:12:27.860 that the computer is using for some purpose, 02:12:27.860 --> 02:12:31.430 be it the stack, be it the heap, be it an array in the computer. 02:12:31.430 --> 02:12:34.280 And so buffer overflows are what happen when logical bugs 02:12:34.280 --> 02:12:37.100 in your code touch memory past the boundary of such a chunk. 02:12:37.100 --> 02:12:39.710 But with these primitives now in mind, we 02:12:39.710 --> 02:12:42.050 wanted to conclude with a final revelation. 02:12:42.050 --> 02:12:44.940 And that's how some functions like these here work. 02:12:44.940 --> 02:12:48.140 The other thing in the CS50 library, besides the typedef for quote unquote 02:12:48.140 --> 02:12:50.240 "string," is, of course, all of these functions. 02:12:50.240 --> 02:12:51.573 And we give you these functions. 02:12:51.573 --> 02:12:55.370 Because honestly in C, it is hard, it's annoying, it's painful, 02:12:55.370 --> 02:12:58.430 it's difficult to get user input correctly. 02:12:58.430 --> 02:13:01.670 When you don't know how much the human is 02:13:01.670 --> 02:13:04.280 going to type, it's very easy to write buggy code. 02:13:04.280 --> 02:13:06.800 And indeed, it's really hard to store it correctly 02:13:06.800 --> 02:13:09.930 without accidentally having some kind of buffer overflow.
02:13:09.930 --> 02:13:12.150 So for instance, let me show you a program here. 02:13:12.150 --> 02:13:14.400 I'm going to go ahead and write this one from scratch. 02:13:14.400 --> 02:13:17.060 So let me go ahead and open a file called get.c, 02:13:17.060 --> 02:13:20.480 wherein I'm going to go ahead and mimic the idea of getting integers manually 02:13:20.480 --> 02:13:21.950 without the CS50 library. 02:13:21.950 --> 02:13:24.470 So I'm going to include stdio.h only, 02:13:24.470 --> 02:13:27.395 I'm going to define main as not taking any command line arguments, 02:13:27.395 --> 02:13:29.270 and then I'm going to do something like this. 02:13:29.270 --> 02:13:31.850 Give me a variable x with no value yet. 02:13:31.850 --> 02:13:34.250 And normally, I would do something like get int. 02:13:34.250 --> 02:13:35.450 But let me take that away. 02:13:35.450 --> 02:13:37.490 No more training wheels for get int either. 02:13:37.490 --> 02:13:39.890 So let me just define the int x. 02:13:39.890 --> 02:13:44.270 Let me then just print out something like a prompt. 02:13:44.270 --> 02:13:47.090 And I'll just do x colon just to make it obvious to the human what 02:13:47.090 --> 02:13:48.330 we're waiting for. 02:13:48.330 --> 02:13:51.800 And now I'm going to use a built-in C function to get user input. 02:13:51.800 --> 02:13:54.920 I'm going to call a function called scanf, which sort of scans 02:13:54.920 --> 02:13:56.600 the user's keyboard for input. 02:13:56.600 --> 02:13:58.400 I'm going to scan it for an integer. 02:13:58.400 --> 02:14:01.850 So just like printf, I'm going to use %i because I expect an int. 02:14:01.850 --> 02:14:04.760 And then I want to tell scanf where to put 02:14:04.760 --> 02:14:07.640 the human's integer from the keyboard. 02:14:07.640 --> 02:14:09.950 It is not correct though to say x. 02:14:09.950 --> 02:14:12.770 Because if I say x, I run into the same swap problem. 02:14:12.770 --> 02:14:13.490 Scanf is no exception.
02:14:13.490 --> 02:14:16.250 No function can change the value of x unless I pass it 02:14:16.250 --> 02:14:20.200 not by value, but by reference. 02:14:20.200 --> 02:14:22.020 So we're back to our ampersand friend. 02:14:22.020 --> 02:14:26.700 And now, it has a treasure map to the actual location of x, 02:14:26.700 --> 02:14:27.998 and can therefore change it. 02:14:27.998 --> 02:14:29.790 And so now at the very end of this program, 02:14:29.790 --> 02:14:34.890 let me do something simple like, let's just go ahead and print out with printf 02:14:34.890 --> 02:14:40.575 the value of x, using %i as always, plugging in x, not ampersand x. 02:14:40.575 --> 02:14:41.700 This is now week one stuff. 02:14:41.700 --> 02:14:44.430 I want to print the actual integer value of x. 02:14:44.430 --> 02:14:47.400 So the only change here is that instead of using get int, 02:14:47.400 --> 02:14:51.540 I'm now using this new function that as of today exists called scanf. 02:14:51.540 --> 02:14:54.600 So let me go ahead and run get. 02:14:54.600 --> 02:14:56.400 Make get to create this program. 02:14:56.400 --> 02:14:57.632 Dot slash get. 02:14:57.632 --> 02:14:59.340 Let's go ahead and type in a value for x. 02:14:59.340 --> 02:15:00.240 50. 02:15:00.240 --> 02:15:01.080 Enter. 02:15:01.080 --> 02:15:02.160 And it just works. 02:15:02.160 --> 02:15:05.710 So it turns out get int is pretty simple to implement. 02:15:05.710 --> 02:15:07.380 However, notice what does not work. 02:15:07.380 --> 02:15:11.010 If I type in cat, for instance, cat gets converted to zero. 02:15:11.010 --> 02:15:14.610 And meanwhile, get int, recall, will re-prompt the user. 02:15:14.610 --> 02:15:16.890 If a human does not type an actual integer, 02:15:16.890 --> 02:15:18.360 you get automatically re-prompted. 02:15:18.360 --> 02:15:20.160 So that's one of the features we at CS50 02:15:20.160 --> 02:15:22.960 added to get int just to make your programs more user friendly.
02:15:22.960 --> 02:15:25.710 But otherwise, get int is pretty straightforward 02:15:25.710 --> 02:15:27.600 to re-implement using scanf. 02:15:27.600 --> 02:15:30.210 Unfortunately, that's not true for strings. 02:15:30.210 --> 02:15:33.393 Because how do you know when you write your code what word the human is 02:15:33.393 --> 02:15:34.560 going to eventually type in? 02:15:34.560 --> 02:15:37.590 How long their greeting, like hi, is? 02:15:37.590 --> 02:15:40.140 If their name is David, or Carter, or anything else, 02:15:40.140 --> 02:15:42.670 you just don't know in advance how much memory you need. 02:15:42.670 --> 02:15:45.360 So how might we do this with strings? 02:15:45.360 --> 02:15:48.170 Well, let me go ahead and declare a string s. 02:15:48.170 --> 02:15:49.170 Although, you know what? 02:15:49.170 --> 02:15:50.400 There's no CS50 library. 02:15:50.400 --> 02:15:53.230 So we do char star s today instead. 02:15:53.230 --> 02:15:56.850 And that gives me not a string per se, but a pointer 02:15:56.850 --> 02:15:59.670 that will point presumably to a string. 02:15:59.670 --> 02:16:00.990 Ideally, I would use this. 02:16:00.990 --> 02:16:01.770 Get string. 02:16:01.770 --> 02:16:04.030 But again, we've taken that training wheel away. 02:16:04.030 --> 02:16:08.040 So now that I have a pointer s, suppose I prompt the human for a value for s, 02:16:08.040 --> 02:16:09.090 just like before. 02:16:09.090 --> 02:16:13.575 Let me use scanf now and tell scanf that I expect to read a string, 02:16:13.575 --> 02:16:18.150 %s from the keyboard, and store it in s. 02:16:18.150 --> 02:16:19.470 Now this is subtle. 02:16:19.470 --> 02:16:24.240 I don't technically need an ampersand here, even though I did for an int. 02:16:24.240 --> 02:16:28.020 And I would for a float, and a double, and a long, and a bool, and a char. 02:16:28.020 --> 02:16:34.260 Why do I not need an ampersand in this story to pass by reference?
02:16:34.260 --> 02:16:34.879 Because s is-- 02:16:34.879 --> 02:16:36.129 AUDIENCE: Already [INAUDIBLE]. 02:16:36.129 --> 02:16:37.299 DAVID J. MALAN: It's already an address. 02:16:37.299 --> 02:16:38.830 Again, strings are just special. 02:16:38.830 --> 02:16:40.809 Strings now are always addresses. 02:16:40.809 --> 02:16:43.809 So you don't need to additionally add an ampersand here. 02:16:43.809 --> 02:16:45.575 That's the only subtle difference here. 02:16:45.575 --> 02:16:48.700 But now, if I go ahead and print out at the very end what the value of s is 02:16:48.700 --> 02:16:54.709 using %s as before, this program looks like it's almost the same as the int 02:16:54.709 --> 02:16:55.209 version. 02:16:55.209 --> 02:16:57.190 But let's do make get. 02:16:57.190 --> 02:16:59.500 And OK, so this is not good. 02:16:59.500 --> 02:17:01.910 All right, so it doesn't like an uninitialized value. 02:17:01.910 --> 02:17:02.980 So let me make it happy. 02:17:02.980 --> 02:17:05.138 I said earlier to always initialize my variable. 02:17:05.138 --> 02:17:07.930 So let's initialize it to NULL so that at least something is there. 02:17:07.930 --> 02:17:09.910 That's your good default value nowadays. 02:17:09.910 --> 02:17:12.400 Now if I do dot slash get, now we're good. 02:17:12.400 --> 02:17:15.969 And let me type in something like cat. 02:17:15.969 --> 02:17:17.870 OK, cat is not x. 02:17:17.870 --> 02:17:19.120 Well, let me try another word. 02:17:19.120 --> 02:17:20.379 Maybe it's just that cat is wrong. 02:17:20.379 --> 02:17:21.160 Dog. 02:17:21.160 --> 02:17:22.910 OK, let me try David. 02:17:22.910 --> 02:17:24.430 It just doesn't seem to be working. 02:17:24.430 --> 02:17:27.040 Moreover, it's printing it as (null). 02:17:27.040 --> 02:17:30.719 What logically, though, is the bug here? 02:17:30.719 --> 02:17:32.790 Scanf worked a moment ago for integers. 02:17:32.790 --> 02:17:34.320 But it's not working for strings.
02:17:34.320 --> 02:17:37.170 And it seems to be forgetting C-A-T. It's forgetting D-O-G. 02:17:37.170 --> 02:17:40.820 It's forgetting D-A-V-I-D. Why? 02:17:40.820 --> 02:17:44.090 What's happening here? 02:17:44.090 --> 02:17:47.920 Think back to our yellow pictures of memory. 02:17:47.920 --> 02:17:49.090 When I-- yeah. 02:17:49.090 --> 02:17:50.267 AUDIENCE: [INAUDIBLE] 02:17:50.267 --> 02:17:52.600 DAVID J. MALAN: It might be reading just the NULL itself 02:17:52.600 --> 02:17:54.790 because s is being initialized to NULL. 02:17:54.790 --> 02:17:58.090 And what step have I forgotten from just a few minutes ago? 02:17:58.090 --> 02:18:01.760 What did I not actually request of the computer? 02:18:01.760 --> 02:18:06.290 Actual memory to store the C-A-T, the D-O-G, the D-A-V-I-D. 02:18:06.290 --> 02:18:10.010 Nowhere have I asked the computer for some amount of memory. 02:18:10.010 --> 02:18:14.600 And so technically, it might be reading it into some garbage location. 02:18:14.600 --> 02:18:18.020 And that's really the problem here. s is initialized to NULL now. 02:18:18.020 --> 02:18:20.278 And so in fact, it is printing that NULL as (null). 02:18:20.278 --> 02:18:22.070 But I'm not seeing any of the other letters 02:18:22.070 --> 02:18:23.653 because there was nowhere to put them. 02:18:23.653 --> 02:18:27.920 C-A-T, D-O-G, D-A-V-I-D because I didn't ask for 3 bytes, 4 bytes, 5 bytes, 02:18:27.920 --> 02:18:28.610 100 bytes. 02:18:28.610 --> 02:18:29.840 There's no use of malloc. 02:18:29.840 --> 02:18:31.100 There's no use of an array. 02:18:31.100 --> 02:18:35.370 There's no memory allocated for anything other than the pointer itself. 02:18:35.370 --> 02:18:38.209 And this is where, honestly, life gets hard with scanf. 02:18:38.209 --> 02:18:40.709 I could solve this problem in a couple of ways. 02:18:40.709 --> 02:18:41.910 Let me go ahead and do this.
02:18:41.910 --> 02:18:44.150 Instead of declaring s to be a pointer, let 02:18:44.150 --> 02:18:48.469 me declare s to actually be an array of four chars. 02:18:48.469 --> 02:18:51.540 And now let me go ahead and recompile the code. 02:18:51.540 --> 02:18:55.380 So make get, dot slash get, and I'll type in cat now. 02:18:55.380 --> 02:18:56.780 That now works. 02:18:56.780 --> 02:18:57.590 Why? 02:18:57.590 --> 02:19:00.680 Well, I'm allocating an explicit array of size four, 02:19:00.680 --> 02:19:03.930 enough for one, two, three letters, plus a NUL character. 02:19:03.930 --> 02:19:06.379 Here's where, to someone's question earlier, it 02:19:06.379 --> 02:19:09.469 turns out that in some contexts, you can treat arrays 02:19:09.469 --> 02:19:12.000 as though they are pointers themselves. 02:19:12.000 --> 02:19:14.010 So C will sort of do the conversion for you. 02:19:14.010 --> 02:19:17.540 But for now, just assume that s is just an array of size four. 02:19:17.540 --> 02:19:20.719 And if you pass it into scanf, that's like a treasure 02:19:20.719 --> 02:19:24.559 map that leads to those 4 bytes so scanf can now successfully fill it 02:19:24.559 --> 02:19:29.309 with C-A-T, D-O-G. But let's try this again. 02:19:29.309 --> 02:19:30.440 Let's type in David. 02:19:30.440 --> 02:19:32.940 And here, OK, we got lucky. 02:19:32.940 --> 02:19:35.459 But I technically touched memory that I should not. 02:19:35.459 --> 02:19:38.209 And in fact, if I typed in a long enough string, and I don't think 02:19:38.209 --> 02:19:40.910 I could do it very easily by-- without typing 02:19:40.910 --> 02:19:42.830 this thousands or hundreds of times. 02:19:42.830 --> 02:19:43.910 Still OK. 02:19:43.910 --> 02:19:47.298 But you'll notice that it's forgotten the rest of it now. 02:19:47.298 --> 02:19:49.590 So somewhere, we went beyond the boundary of the array. 02:19:49.590 --> 02:19:52.230 And we just don't have enough storage space for that entire thing.
02:19:52.230 --> 02:19:53.647 So what do you do in your program? 02:19:53.647 --> 02:19:57.050 If you don't know how long the person's name or the animal name is going to be, 02:19:57.050 --> 02:19:57.530 what do you do? 02:19:57.530 --> 02:19:58.030 40? 02:19:58.030 --> 02:19:59.180 400? 02:19:59.180 --> 02:19:59.960 4,000? 02:19:59.960 --> 02:20:00.785 40,000? 02:20:00.785 --> 02:20:02.910 At some point, you have to draw a line in the sand. 02:20:02.910 --> 02:20:06.980 And that's why getting user input is so annoying in a language like C. 02:20:06.980 --> 02:20:08.900 And that's why get string exists. 02:20:08.900 --> 02:20:12.500 What we do, if you're curious, is we look at the user's input 02:20:12.500 --> 02:20:13.760 and we take baby steps. 02:20:13.760 --> 02:20:15.980 We look at it one character at a time. 02:20:15.980 --> 02:20:18.380 And every time we see another character, we actually 02:20:18.380 --> 02:20:19.750 call malloc again and say, no. 02:20:19.750 --> 02:20:20.750 I need more than 1 byte. 02:20:20.750 --> 02:20:21.470 I need 2. 02:20:21.470 --> 02:20:23.012 Oh wait, they typed in three letters. 02:20:23.012 --> 02:20:23.930 I need 3 instead of 2. 02:20:23.930 --> 02:20:25.340 Oh, I need 4 instead of 3. 02:20:25.340 --> 02:20:27.470 And we have this crazy loop essentially that 02:20:27.470 --> 02:20:30.500 keeps asking for more and more memory but by taking baby steps. 02:20:30.500 --> 02:20:33.380 And honestly, if you all had to do that in week one, my God. 02:20:33.380 --> 02:20:35.640 We couldn't even write hello, world anymore. 02:20:35.640 --> 02:20:38.450 And so that's why these training wheels exist, at least early on. 02:20:38.450 --> 02:20:42.050 And that's why in higher level languages like in Python, 02:20:42.050 --> 02:20:43.820 you don't have to do this at all. 02:20:43.820 --> 02:20:45.960 It just works as you'd expect. 02:20:45.960 --> 02:20:47.640 So what more can we do?
02:20:47.640 --> 02:20:51.050 Well, you'll see in problem set four this coming week, if I open up 02:20:51.050 --> 02:20:53.750 an example like this, phonebook.c, you'll 02:20:53.750 --> 02:20:56.510 see that you can manipulate files now that you 02:20:56.510 --> 02:20:57.950 have a vocabulary for pointers. 02:20:57.950 --> 02:20:59.570 It's going to go quickly. 02:20:59.570 --> 02:21:02.000 But here we have an example of how. 02:21:02.000 --> 02:21:04.490 I have a program using some familiar libraries here. 02:21:04.490 --> 02:21:08.010 But as I claim in my comment, this saves names and numbers to a CSV file. 02:21:08.010 --> 02:21:10.235 In all of my examples thus far, I type in some words, 02:21:10.235 --> 02:21:12.110 I type in some names, and some phone numbers, 02:21:12.110 --> 02:21:14.550 and they disappear because we only store them in memory. 02:21:14.550 --> 02:21:18.290 But if you want to store data in like a CSV file, Comma Separated Values, which 02:21:18.290 --> 02:21:21.200 is like a simple spreadsheet that Excel, and Apple Numbers, 02:21:21.200 --> 02:21:23.910 and Google Sheets can open, you can actually do this yourself. 02:21:23.910 --> 02:21:26.660 So just as a teaser for this week, here on line nine, 02:21:26.660 --> 02:21:27.950 I'm using a new data type. 02:21:27.950 --> 02:21:28.820 Not a CS50 thing. 02:21:28.820 --> 02:21:30.575 This is a C thing called FILE, in all caps. 02:21:30.575 --> 02:21:32.450 But if you want to manipulate files, you need 02:21:32.450 --> 02:21:34.250 to use addresses, that is pointers. 02:21:34.250 --> 02:21:37.160 So here is me creating a variable called file 02:21:37.160 --> 02:21:40.070 that's going to point to an actual file on the hard drive, 02:21:40.070 --> 02:21:42.140 on the server, or your Mac, or PC. 02:21:42.140 --> 02:21:45.560 fopen is going to be a new function you'll use that will open a file. 02:21:45.560 --> 02:21:49.160 And it will return effectively a pointer to it in memory.
02:21:49.160 --> 02:21:51.560 The file name I want to open is phonebook.csv. 02:21:51.560 --> 02:21:54.230 And in this example, it's going to be append mode. 02:21:54.230 --> 02:21:57.447 It will keep allowing me to add more and more names and numbers to this file. 02:21:57.447 --> 02:21:59.780 Here's some old get string stuff because I'm not going 02:21:59.780 --> 02:22:01.460 to reinvent get string with scanf. 02:22:01.460 --> 02:22:03.930 But down here is a slightly new function. 02:22:03.930 --> 02:22:05.260 It's not printf, but fprintf. 02:22:05.260 --> 02:22:08.010 And it turns out it's very easy to print things not to the screen, 02:22:08.010 --> 02:22:09.800 but to a file with fprintf. 02:22:09.800 --> 02:22:11.900 And it takes an additional argument: instead 02:22:11.900 --> 02:22:14.000 of starting with the quoted string, you'll 02:22:14.000 --> 02:22:16.400 have to say what file you want to write to. 02:22:16.400 --> 02:22:20.240 And fprintf will figure out how to get the bits into that 02:22:20.240 --> 02:22:23.520 file, passing in something like name, comma, number. 02:22:23.520 --> 02:22:26.940 So if I run this somewhat quickly here, let me do this. 02:22:26.940 --> 02:22:31.400 Let me pre-create a file called phonebook.csv. 02:22:31.400 --> 02:22:34.940 And in phonebook.csv, I'm going to create a temporary row here, name 02:22:34.940 --> 02:22:37.820 comma number just so that there's something in this file. 02:22:37.820 --> 02:22:41.810 And now let me go ahead and do this and split my screen here. 02:22:41.810 --> 02:22:46.410 If I have phonebook.csv on the right and phonebook.c on the left, 02:22:46.410 --> 02:22:51.345 let me compile, make phonebook, which is the C version, dot slash phonebook. 02:22:51.345 --> 02:22:53.220 And now I'm prompted for a name and a number. 02:22:53.220 --> 02:22:58.080 So I'll type in David, and then for instance plus 1-949-- 02:22:58.080 --> 02:22:58.580 what is it? 02:22:58.580 --> 02:23:01.190 4682750.
02:23:01.190 --> 02:23:02.390 Enter. 02:23:02.390 --> 02:23:03.110 Oh, damn it. 02:23:03.110 --> 02:23:03.995 Bug. 02:23:03.995 --> 02:23:05.120 Pretend that didn't happen. 02:23:05.120 --> 02:23:06.990 I forgot to hit Enter in the file. 02:23:06.990 --> 02:23:08.630 So let's do this again. 02:23:08.630 --> 02:23:17.900 If I run the program again, David, and plus 1-949-4682750, 02:23:17.900 --> 02:23:20.970 Enter, it's been saved now to the file. 02:23:20.970 --> 02:23:27.230 And if I close this file and I reopen it with code phonebook.csv, 02:23:27.230 --> 02:23:29.990 you'll see that the file is persisting. 02:23:29.990 --> 02:23:32.053 And if I downloaded this to my Mac, or my PC, 02:23:32.053 --> 02:23:33.470 I could double click the CSV file. 02:23:33.470 --> 02:23:36.260 And voila, Excel would open up, or Apple Numbers, or the like. 02:23:36.260 --> 02:23:38.527 And I've actually created an actual CSV file. 02:23:38.527 --> 02:23:41.360 If you're smiling because I keep repeating my phone number out loud, 02:23:41.360 --> 02:23:43.910 I would encourage you to call or text that number sometime. 02:23:43.910 --> 02:23:46.100 It might very well be an Easter egg of sorts. 02:23:46.100 --> 02:23:49.160 But via these functions here do we have now the ability 02:23:49.160 --> 02:23:52.130 to do file input and output. 02:23:52.130 --> 02:23:54.442 And among the goals then for this week, as we'll see, 02:23:54.442 --> 02:23:56.900 are to actually play with images in the spirit of something 02:23:56.900 --> 02:23:58.670 like Instagram filters or the like. 02:23:58.670 --> 02:24:01.490 And we'll introduce you, for instance, to a file format called 02:24:01.490 --> 02:24:05.990 BMPs, which, to come full circle to the start of class, are just maps of bits, 02:24:05.990 --> 02:24:09.830 not just single bits for white and black, but rather colorful patterns 02:24:09.830 --> 02:24:10.500 as well.
02:24:10.500 --> 02:24:12.500 And we'll give you images like this of the Weeks Bridge 02:24:12.500 --> 02:24:13.875 here across the river at Harvard. 02:24:13.875 --> 02:24:16.490 And after writing your own code in C, 02:24:16.490 --> 02:24:19.925 and understanding how the data is stored in the computer's memory, 02:24:19.925 --> 02:24:22.550 you'll be able to apply your own Instagram-like filters to make 02:24:22.550 --> 02:24:25.830 things grayscale instead, or sepia in this case. 02:24:25.830 --> 02:24:28.820 You can even flip the pixels around so that the thing is a mirror image. 02:24:28.820 --> 02:24:30.530 You can blur things further. 02:24:30.530 --> 02:24:32.510 Or if you really are feeling more comfortable, 02:24:32.510 --> 02:24:35.570 you can even write code that finds the edges of the image 02:24:35.570 --> 02:24:37.520 and creates works of art like these. 02:24:37.520 --> 02:24:39.740 So all that and more in problem set four. 02:24:39.740 --> 02:24:42.650 We will see you next time. 02:24:42.650 --> 02:24:47.200 [MUSIC PLAYING]