WEBVTT X-TIMESTAMP-MAP=LOCAL:00:00:00.000,MPEGTS:900000 00:00:00.000 --> 00:00:01.988 [MUSIC PLAYING] 00:01:23.070 --> 00:01:26.050 DAVID J. MALAN: All right, this is CS50 And this 00:01:26.050 --> 00:01:29.020 is the day we take off the proverbial training wheels, namely 00:01:29.020 --> 00:01:30.850 the CS50 library. 00:01:30.850 --> 00:01:33.370 You'll recall last week as we focused on algorithms, 00:01:33.370 --> 00:01:37.540 we started focusing on lots of comparisons and lots of swapping. 00:01:37.540 --> 00:01:41.930 And we did that fairly algorithmically, fairly conceptually last week. 00:01:41.930 --> 00:01:43.930 but today we're going to focus on actually doing 00:01:43.930 --> 00:01:48.360 that a little more mechanically, a little more methodically. 00:01:48.360 --> 00:01:53.050 And I thought this would be easier to take the training wheels off, 00:01:53.050 --> 00:01:54.520 hopefully not a metaphor for today. 00:01:54.520 --> 00:01:55.020 OK. 00:01:55.020 --> 00:01:57.790 So [CHUCKLE] what we'll do first though, is learn how 00:01:57.790 --> 00:01:59.350 to count in a slightly different way. 00:01:59.350 --> 00:02:01.527 You'll recall in Week 0 we did this already 00:02:01.527 --> 00:02:04.360 whereby we introduced not only the human decimal system-- with which 00:02:04.360 --> 00:02:06.240 everyone's familiar --but also binary. 00:02:06.240 --> 00:02:08.770 It turns out there's other base systems where you don't just 00:02:08.770 --> 00:02:12.850 use powers of 10 or 2, you use other base systems entirely as well. 
00:02:12.850 --> 00:02:15.520 And this is useful because today when we focus really 00:02:15.520 --> 00:02:18.280 on the computer's memory, and later today on files-- 00:02:18.280 --> 00:02:21.040 the actual creation of and editing of files, 00:02:21.040 --> 00:02:23.770 like images you might have on your own phones or computers 00:02:23.770 --> 00:02:27.070 --it turns out it's very useful to be able to address the memory 00:02:27.070 --> 00:02:29.200 inside of our computers or phones-- that is assign 00:02:29.200 --> 00:02:31.660 a number, a unique identifier, to every byte 00:02:31.660 --> 00:02:34.490 so that we can just talk about where things are in memory. 00:02:34.490 --> 00:02:41.030 Now you might think we would do 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 00:02:41.030 --> 00:02:44.247 14, 15, but it turns out that's not actually human convention. 00:02:44.247 --> 00:02:45.580 There's nothing wrong with this. 00:02:45.580 --> 00:02:48.747 It's correct but you're about to see today a slightly different syntax where 00:02:48.747 --> 00:02:54.730 we do count from 0 to 1, to 2, to 3, to 4, to 5, to 6, to 7, to 8, to 9, 00:02:54.730 --> 00:02:59.680 but in the world of not decimal, not binary, but hexadecimal-- hex 00:02:59.680 --> 00:03:01.090 meaning 16. 00:03:01.090 --> 00:03:03.250 Can you actually count higher than nine? 00:03:03.250 --> 00:03:08.480 There is the letter A, B, C, D, E and F. Why? 00:03:08.480 --> 00:03:11.020 While using these individual alphabetical letters, can 00:03:11.020 --> 00:03:14.020 you effectively count not only from 0 through 9-- 00:03:14.020 --> 00:03:19.780 using single digits --but also 10, 11, 12, 13, 14, 15-- 00:03:19.780 --> 00:03:21.130 F, representing 15. 
00:03:21.130 --> 00:03:24.675 And so I introduce this because we'll see this pattern throughout today 00:03:24.675 --> 00:03:27.550 and throughout the coming weeks programs where the computer will just 00:03:27.550 --> 00:03:31.000 very conventionally display to you numbers not in decimal, not in binary, 00:03:31.000 --> 00:03:32.440 but sometimes in hexadecimal. 00:03:32.440 --> 00:03:34.660 But we'll see why that is in just a moment. 00:03:34.660 --> 00:03:36.820 Indeed, in binary we had the digits 0 and 1, 00:03:36.820 --> 00:03:39.100 decimal we had 0 through 9, in hexadecimal-- 00:03:39.100 --> 00:03:43.420 to recap --we have 0 through F, where again, F is 15. 00:03:43.420 --> 00:03:44.980 So how does this actually work? 00:03:44.980 --> 00:03:48.430 Just a quick whirlwind tour, this was our notation in binary. 00:03:48.430 --> 00:03:52.330 And I had eight 0 bits here, bit meaning binary digit. 00:03:52.330 --> 00:03:54.700 And based on the columns there, we had powers of 2, 00:03:54.700 --> 00:03:57.550 or if we multiplied that out, the ones place over there, 00:03:57.550 --> 00:03:59.650 the 128's place over here. 00:03:59.650 --> 00:04:04.070 This of course, if you do the math, is what number in decimal? 00:04:04.070 --> 00:04:07.460 So just 0-- right --if you multiply the columns by the numbers they're in. 00:04:07.460 --> 00:04:08.610 But what about this? 00:04:08.610 --> 00:04:10.970 If I change all those 0s to 1s, what was the highest 00:04:10.970 --> 00:04:13.070 we could count in binary if we had eight bits? 00:04:13.070 --> 00:04:14.690 AUDIENCE: 255 00:04:14.690 --> 00:04:18.110 DAVID J. MALAN: Yeah, 255 was the highest we can count. 00:04:18.110 --> 00:04:21.320 You might say 256 but again, if you start counting at 0, 00:04:21.320 --> 00:04:23.930 you sort of spend one of those numbers as the 0. 00:04:23.930 --> 00:04:27.548 So 255 is the highest you can count with eight bits. 00:04:27.548 --> 00:04:29.090 And we could do the math if we cared. 
00:04:29.090 --> 00:04:32.960 128 times 1 plus 64 times 1, and so forth. 00:04:32.960 --> 00:04:35.412 But let me just stipulate, that's indeed 255. 00:04:35.412 --> 00:04:38.120 And indeed, in decimal we would represent the columns 00:04:38.120 --> 00:04:41.910 as powers of 10, or ones place, tens place, hundreds place, and so forth. 00:04:41.910 --> 00:04:44.000 So that's all Week 0 stuff. 00:04:44.000 --> 00:04:48.290 It turns out, though, that there's another way of representing 255 00:04:48.290 --> 00:04:53.720 in decimal using hexadecimal, except now instead of powers of 2 or powers of 10, 00:04:53.720 --> 00:04:55.580 we're just going to use powers of 16. 00:04:55.580 --> 00:05:00.170 And it turns out this is convenient for reasons related to computing. 00:05:00.170 --> 00:05:03.930 So the rightmost column will be our 16 to the zeroth, or the ones place. 00:05:03.930 --> 00:05:06.000 The second column will be our 16s place. 00:05:06.000 --> 00:05:09.750 And remember, F individually represents 15 in decimal. 00:05:09.750 --> 00:05:11.300 So we can count quite similarly. 00:05:11.300 --> 00:05:13.850 So this in hexadecimal would just be 0. 00:05:13.850 --> 00:05:17.630 16 times 0, plus 1 times 0, is of course 0. 00:05:17.630 --> 00:05:20.640 This of course, easy one, is what number? 00:05:20.640 --> 00:05:21.140 AUDIENCE: 1 00:05:21.140 --> 00:05:22.380 DAVID J. MALAN: 1 in decimal. 00:05:22.380 --> 00:05:26.870 This is going to be 2, 3, 4, 5, 6, 7, 8, 9. 00:05:26.870 --> 00:05:30.170 And whereas in the decimal world you would want to say 10-- 00:05:30.170 --> 00:05:37.880 or 1, 0 --here we can actually count a little higher to A, B, C, D, E, F-- 00:05:37.880 --> 00:05:39.560 and that represents 15. 00:05:39.560 --> 00:05:40.130 Why? 00:05:40.130 --> 00:05:44.450 16 times 0, plus 1 times F-- which again, F is 15. 00:05:44.450 --> 00:05:46.060 So 1 times F-- 00:05:46.060 --> 00:05:47.600 or 15-- gives you 15.
00:05:47.600 --> 00:05:49.667 Now how do you count as high as 16? 00:05:49.667 --> 00:05:51.750 Well, you can probably envision it already, right? 00:05:51.750 --> 00:05:54.710 You kind of carry the 1 just like in decimal and binary. 00:05:54.710 --> 00:05:58.775 So in hexadecimal, 1, 0 is the number 16. 00:05:58.775 --> 00:06:00.650 And here's where you just have to be careful. 00:06:00.650 --> 00:06:02.270 You shouldn't say 10 anymore. 00:06:02.270 --> 00:06:03.620 That's a decimal number. 00:06:03.620 --> 00:06:06.160 This is 1, 0 in hexadecimal. 00:06:06.160 --> 00:06:07.160 But we can count higher. 00:06:07.160 --> 00:06:17.780 If this is 16, this is 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 00:06:17.780 --> 00:06:19.490 31. 00:06:19.490 --> 00:06:23.360 And once you need 32, that's going to require another digit, if you will. 00:06:23.360 --> 00:06:24.380 So very low level. 00:06:24.380 --> 00:06:27.240 And none of us really on staff sort of think in hexadecimal, 00:06:27.240 --> 00:06:29.210 you'll just see things in hexadecimal. 00:06:29.210 --> 00:06:31.640 And all this is to say is that it can be converted back 00:06:31.640 --> 00:06:35.630 to the more familiar decimal or any other system as well. 00:06:35.630 --> 00:06:39.690 Higher than that we would go 2, 0, which of course, is 16 times 2-- 00:06:39.690 --> 00:06:42.530 which is 32 --plus 0. 00:06:42.530 --> 00:06:48.410 So it turns out that if you have four 1s and four 1s that it can be represented 00:06:48.410 --> 00:06:53.480 as FF, you've actually seen FF and probably 00 and other alphabetical 00:06:53.480 --> 00:06:54.360 characters before. 00:06:54.360 --> 00:06:58.340 How many of you have ever done web design using HTML, CSS? 00:06:58.340 --> 00:07:00.200 So like at least a third or so of the class. 
00:07:00.200 --> 00:07:01.880 And for those unfamiliar, we'll get to that 00:07:01.880 --> 00:07:04.130 if you want to pursue that track later in the semester 00:07:04.130 --> 00:07:06.020 but recall RGB from Week 0. 00:07:06.020 --> 00:07:08.720 Red, green, blue refers to how computers can represent 00:07:08.720 --> 00:07:12.410 the colors of every pixel using some amount of red, some amount of green, 00:07:12.410 --> 00:07:13.460 some amount of blue. 00:07:13.460 --> 00:07:15.950 Well it turns out it's just human convention 00:07:15.950 --> 00:07:20.270 to describe the amounts of red, green, and blue in a color in terms 00:07:20.270 --> 00:07:25.592 of hexadecimal digits-- where this means give me no red, no green, no blue. 00:07:25.592 --> 00:07:28.550 And if you think back to Week 0 that's actually going to give us black. 00:07:28.550 --> 00:07:31.400 If you have none of those three colors, it's just the absence of those colors 00:07:31.400 --> 00:07:32.450 and you get black. 00:07:32.450 --> 00:07:35.090 If however, you have FF-- 00:07:35.090 --> 00:07:36.240 which is what? 00:07:36.240 --> 00:07:41.940 --255 amount of red, that's a lot of red, and 0 green, 0 blue. 00:07:41.940 --> 00:07:45.980 So if a computer were to represent a pixel on your screen as red 00:07:45.980 --> 00:07:48.230 it would store FF0000. 00:07:48.230 --> 00:07:51.500 That is a lot of red, no green, no blue. 00:07:51.500 --> 00:07:53.310 Meanwhile, if you had this representation, 00:07:53.310 --> 00:07:54.638 this is why this is green. 00:07:54.638 --> 00:07:55.430 This would be blue. 00:07:55.430 --> 00:07:57.710 And if you combine all three colors a lot-- 00:07:57.710 --> 00:07:59.870 a lot of red, lot of green, lot of blue --this 00:07:59.870 --> 00:08:01.780 is how a computer would represent white. 
00:08:01.780 --> 00:08:04.280 And so we'll come back to this later on in game development, 00:08:04.280 --> 00:08:05.690 and web development, and mobile-- 00:08:05.690 --> 00:08:09.330 if of interest-- but notice that this is just a common convention as well. 00:08:09.330 --> 00:08:11.480 So if we reconsider what our memory looks like, 00:08:11.480 --> 00:08:13.010 it's just this big grid of bytes. 00:08:13.010 --> 00:08:16.980 And we might describe the top one is 0 and the bottom one in this case as 1F. 00:08:16.980 --> 00:08:18.230 And we can just keep counting. 00:08:18.230 --> 00:08:21.300 However, at first glance it might be a little ambiguous. 00:08:21.300 --> 00:08:22.700 Am I looking at decimal? 00:08:22.700 --> 00:08:23.990 Am I looking at hexadecimal? 00:08:23.990 --> 00:08:25.970 Am I looking at something else altogether? 00:08:25.970 --> 00:08:29.540 So humans years ago decided that just to avoid ambiguity, 00:08:29.540 --> 00:08:32.510 if you are using hexadecimal, the human convention 00:08:32.510 --> 00:08:38.240 is to prefix every digit on the screen with 0x, just arbitrarily. 00:08:38.240 --> 00:08:40.220 The 0x means nothing mathematically. 00:08:40.220 --> 00:08:42.870 It just means here comes a hexadecimal value. 00:08:42.870 --> 00:08:48.386 So you can disambiguate it from something like decimal itself. 00:08:48.386 --> 00:08:49.300 Whew. 00:08:49.300 --> 00:08:49.827 OK. 00:08:49.827 --> 00:08:50.660 That was a mouthful. 00:08:50.660 --> 00:08:51.960 And that's it for base systems. 00:08:51.960 --> 00:08:56.130 There's no more something decimals here on out this term. 00:08:56.130 --> 00:08:58.130 Slight white lie, there's something called octal 00:08:58.130 --> 00:08:59.588 but we probably won't look at that. 00:08:59.588 --> 00:09:02.290 Are there any questions at all? 00:09:02.290 --> 00:09:03.550 No, all right. 00:09:03.550 --> 00:09:05.660 So how can we actually use this information? 
00:09:05.660 --> 00:09:07.930 Well let's now see some examples of what's 00:09:07.930 --> 00:09:09.960 going on truly inside of your computer's memory. 00:09:09.960 --> 00:09:11.710 And we'll see where hexadecimal is germane 00:09:11.710 --> 00:09:15.100 and how we can now start manipulating things more carefully inside 00:09:15.100 --> 00:09:16.240 of the computer's memory. 00:09:16.240 --> 00:09:19.690 This of course, is just a line of code involving creation of a variable called 00:09:19.690 --> 00:09:20.260 n. 00:09:20.260 --> 00:09:24.370 And that variable is having stored in it, the value 50. 00:09:24.370 --> 00:09:27.332 So let's go ahead and whip up a quick program that does exactly this. 00:09:27.332 --> 00:09:29.290 I'm going to go ahead and call this address dot 00:09:29.290 --> 00:09:33.040 c, just to convey that we're going to be playing with addresses 00:09:33.040 --> 00:09:34.580 in the computer's memory. 00:09:34.580 --> 00:09:36.830 And I'm going to go ahead and keep it simple at first, 00:09:36.830 --> 00:09:40.120 include standard I/O dot h and then int main void. 00:09:40.120 --> 00:09:43.145 And then down here, super simple, int n gets 50. 00:09:43.145 --> 00:09:45.020 And then I'm going to go ahead and print out, 00:09:45.020 --> 00:09:48.700 percent i comma n, thereby printing this value. 00:09:48.700 --> 00:09:52.480 So this too is sort of Week 1 stuff, whereby when I run this program now 00:09:52.480 --> 00:09:55.060 after saving it, make address-- 00:09:55.060 --> 00:10:00.512 seems to compile OK --dot slash address, I should see of course, 50. 00:10:00.512 --> 00:10:02.470 All right, just the number 50 in that variable. 00:10:02.470 --> 00:10:02.970 All right. 00:10:02.970 --> 00:10:06.190 So you're probably comfortable with these kinds of exercises thus far. 00:10:06.190 --> 00:10:09.490 But it turns out that we can now kind of infer what's 00:10:09.490 --> 00:10:11.170 going on inside the computer's memory. 
00:10:11.170 --> 00:10:13.930 If this again is my computer's memory and somewhere in there 00:10:13.930 --> 00:10:17.470 I have a variable n, it might take up four bytes down there. 00:10:17.470 --> 00:10:20.050 An int recall is four bytes so I'm going to go ahead and use 00:10:20.050 --> 00:10:21.370 four squares on the screen. 00:10:21.370 --> 00:10:24.610 For consistency, I'm going to call it n and just put the number 50. 00:10:24.610 --> 00:10:27.670 Now if you really look underneath the hood, that's not 50 per se, 00:10:27.670 --> 00:10:31.148 it's like 32 bits, 0s and 1s that represent the number 50. 00:10:31.148 --> 00:10:33.940 But again, we don't care about transistors in that low level detail 00:10:33.940 --> 00:10:34.628 now. 00:10:34.628 --> 00:10:36.670 But when I go ahead and print this, all I'm doing 00:10:36.670 --> 00:10:40.630 is printing the contents of that variable called n. 00:10:40.630 --> 00:10:45.700 But that variable technically does exist at a specific address in memory. 00:10:45.700 --> 00:10:46.200 Right? 00:10:46.200 --> 00:10:49.150 If the top left hand corner was 0 and the bottom right hand corner 00:10:49.150 --> 00:10:51.442 was a bigger number-- and maybe this is out of context. 00:10:51.442 --> 00:10:53.710 I'm sort of zoomed out because you might have billions 00:10:53.710 --> 00:10:55.300 of bytes of memory in your computer. 00:10:55.300 --> 00:10:59.380 Suppose for the sake of discussion that that variable n and the value therein, 00:10:59.380 --> 00:11:05.890 50 is technically at address 0x meaning hexadecimal 12345678, wherever that is. 00:11:05.890 --> 00:11:07.780 It's a big arbitrary number. 00:11:07.780 --> 00:11:10.450 But it indeed exists somewhere in your computer's memory so long 00:11:10.450 --> 00:11:14.170 as you have that many bytes of hardware to use. 00:11:14.170 --> 00:11:17.920 Well it turns out that using C we can actually-- 00:11:17.920 --> 00:11:20.840 no pun intended --see this value as well. 
00:11:20.840 --> 00:11:23.560 Let me go ahead and tweak this code slightly. 00:11:23.560 --> 00:11:25.920 I'm not going to go ahead and print out n this time, 00:11:25.920 --> 00:11:29.140 I'm going to go ahead and print out ampersand n, which 00:11:29.140 --> 00:11:31.390 happens to be a new piece of syntax for C. 00:11:31.390 --> 00:11:34.840 But it quite simply means the AddressOf operator. 00:11:34.840 --> 00:11:38.440 So wherever n is, go ahead and figure out what its address is, 00:11:38.440 --> 00:11:39.910 its location in memory. 00:11:39.910 --> 00:11:42.490 And it turns out C has a special format code for this. 00:11:42.490 --> 00:11:46.570 Instead of percent i, it's percent p, where percent p 00:11:46.570 --> 00:11:48.760 is going to print that address for us. 00:11:48.760 --> 00:11:54.280 So let me go ahead and save that, make address again to recompile, and then do 00:11:54.280 --> 00:11:56.980 dot slash address, enter. 00:11:56.980 --> 00:11:57.910 And voila. 00:11:57.910 --> 00:12:02.260 Now it just so happens that in CS50 IDE running on this cloud server, 00:12:02.260 --> 00:12:04.810 it's not address 0x12345678. 00:12:04.810 --> 00:12:06.850 I just made that up for the sake of discussion. 00:12:06.850 --> 00:12:14.410 It's technically at 0x7FFE00B3ADBC, which has no meaning to us 00:12:14.410 --> 00:12:17.290 here in class but it is all hexadecimal because every digit there 00:12:17.290 --> 00:12:19.150 is 0 through F. 00:12:19.150 --> 00:12:20.530 So it's kind of cool. 00:12:20.530 --> 00:12:22.840 This doesn't seem like useful information yet 00:12:22.840 --> 00:12:27.670 but you can in fact see where values are inside of your computer's memory. 00:12:27.670 --> 00:12:28.970 Well, what is that value? 00:12:28.970 --> 00:12:31.120 Well it turns out that as soon as you ask 00:12:31.120 --> 00:12:34.210 the computer for the address of some value, 00:12:34.210 --> 00:12:37.450 you are getting what's called a pointer to that value.
00:12:37.450 --> 00:12:40.640 A pointer is effectively an address in the computer's memory. 00:12:40.640 --> 00:12:42.100 And that's why it's percent p. 00:12:42.100 --> 00:12:44.590 This is telling printf, go ahead and print for me 00:12:44.590 --> 00:12:47.080 a pointer, the address of some value. 00:12:47.080 --> 00:12:51.250 And by convention again, it's displayed in hexadecimal like that. 00:12:51.250 --> 00:12:53.500 Well, it turns out we can actually undo these effects. 00:12:53.500 --> 00:12:55.330 Let me go ahead and make one change here. 00:12:55.330 --> 00:12:59.410 Suppose that now I want to go ahead and print out 50 again. 00:12:59.410 --> 00:13:02.140 I can actually reverse the effects of this operator. 00:13:02.140 --> 00:13:06.610 So ampersand n means to go get the address of n. 00:13:06.610 --> 00:13:09.070 But it turns out there's another operator in C that's 00:13:09.070 --> 00:13:12.550 quite useful around now and that's this one here. 00:13:12.550 --> 00:13:15.970 So whereas ampersand is our so-called AddressOf operator, --star, 00:13:15.970 --> 00:13:17.020 or an asterisk-- 00:13:17.020 --> 00:13:18.730 we've seen before in multiplication. 00:13:18.730 --> 00:13:21.550 And today it has a different meaning in a different context. 00:13:21.550 --> 00:13:25.630 The star is the opposite of the AddressOf operator, 00:13:25.630 --> 00:13:28.450 it says go to a specific address. 00:13:28.450 --> 00:13:31.280 So whereas, an ampersand means what's the address, 00:13:31.280 --> 00:13:33.560 star means go to an address. 00:13:33.560 --> 00:13:36.760 So if I want to print out now, not the address per se, 00:13:36.760 --> 00:13:39.880 but I literally want to print out the value in n, 00:13:39.880 --> 00:13:44.980 ergo using percent i, I can actually undo what I literally did, 00:13:44.980 --> 00:13:48.760 stupidly-- but for the sake of demonstration --by doing star 00:13:48.760 --> 00:13:49.370 ampersand n. 00:13:49.370 --> 00:13:49.870 Why? 
00:13:49.870 --> 00:13:51.495 The ampersand says, what's the address? 00:13:51.495 --> 00:13:53.002 The star says, go to that address. 00:13:53.002 --> 00:13:54.835 So it effectively just undoes the operation. 00:13:54.835 --> 00:13:56.770 So you wouldn't want to use this in practice 00:13:56.770 --> 00:14:00.800 but it just speaks to the sort of basic operations that we're doing here. 00:14:00.800 --> 00:14:06.090 So make address, let me go ahead and say now, dot slash address, enter. 00:14:06.090 --> 00:14:08.990 And what should I see this time? 00:14:08.990 --> 00:14:11.650 50, because I'm not even showing the address. 00:14:11.650 --> 00:14:15.490 I'm getting the address and going to the address, thereby defeating the point. 00:14:15.490 --> 00:14:16.730 I again see 50. 00:14:16.730 --> 00:14:19.960 But this is only to say quite simply that even though things might seem 00:14:19.960 --> 00:14:22.510 a little cryptic today at first glance, syntactically, 00:14:22.510 --> 00:14:26.990 ampersand is get the address, star is go to that address, one way or the other. 00:14:26.990 --> 00:14:27.490 Yeah? 00:14:27.490 --> 00:14:30.885 AUDIENCE: Can you [INAUDIBLE] by typing the address 00:14:30.885 --> 00:14:34.730 in [INAUDIBLE] like a [INAUDIBLE]? 00:14:34.730 --> 00:14:36.480 DAVID J. MALAN: Really good question, yes. 00:14:36.480 --> 00:14:40.650 So if I had remembered the address, maybe it was 0x12345678, 00:14:40.650 --> 00:14:43.590 I could actually hard code that address in my program 00:14:43.590 --> 00:14:45.585 and tell the computer to go there. 00:14:45.585 --> 00:14:46.960 The syntax is a little different. 00:14:46.960 --> 00:14:50.040 I would have to coerce it using a cast but I could make that happen, yes. 00:14:50.040 --> 00:14:50.540 Yeah. 00:14:50.540 --> 00:14:54.851 AUDIENCE: What happens if you don't know even the type of the variable? 00:14:54.851 --> 00:14:56.767 Can you [INAUDIBLE] without knowing that? 00:14:56.767 --> 00:14:58.010 DAVID J. 
MALAN: Ah, really good question. 00:14:58.010 --> 00:15:00.190 What if you don't know the type of the variable, 00:15:00.190 --> 00:15:02.650 what format code would you therefore use? 00:15:02.650 --> 00:15:04.270 Short answer, you have to decide. 00:15:04.270 --> 00:15:07.480 To a computer, everything in memory is just bits, 0s and 1s; how 00:15:07.480 --> 00:15:09.590 you display them is entirely up to you. 00:15:09.590 --> 00:15:11.710 So if you don't know what they are, you can only 00:15:11.710 --> 00:15:13.880 guess, or tell the computer arbitrarily to say 00:15:13.880 --> 00:15:15.880 it's a char, a float, an int, or something else. 00:15:15.880 --> 00:15:18.890 It can't figure that out for you, at least in C. 00:15:18.890 --> 00:15:19.390 All right. 00:15:19.390 --> 00:15:22.060 So let's just go ahead now and make more clear 00:15:22.060 --> 00:15:23.912 where we can store information here. 00:15:23.912 --> 00:15:26.120 Let me go ahead and change this code now as follows. 00:15:26.120 --> 00:15:29.287 It turns out that you can actually store addresses in variables themselves. 00:15:29.287 --> 00:15:31.880 I don't have to just do this ampersand thing here. 00:15:31.880 --> 00:15:34.070 Let me go ahead and change the program as follows. 00:15:34.070 --> 00:15:38.530 Let me go ahead and declare another variable called p and store in it 00:15:38.530 --> 00:15:40.390 the address of n. 00:15:40.390 --> 00:15:45.550 So again, nothing new here, just says, ampersand n, go get the address of n. 00:15:45.550 --> 00:15:47.798 But I do have to do something different here. 00:15:47.798 --> 00:15:49.840 On the left hand side is the name of my variable. 00:15:49.840 --> 00:15:51.430 I've called it p, for pointer.
00:15:51.430 --> 00:15:54.910 But if you want to store the address of some value 00:15:54.910 --> 00:15:59.980 in a variable you have to specify not just the type of value that's 00:15:59.980 --> 00:16:02.290 in that other variable, you have to specify 00:16:02.290 --> 00:16:06.310 with this star operator in a very confusing, unfortunate, different 00:16:06.310 --> 00:16:08.590 context, that this is a pointer. 00:16:08.590 --> 00:16:14.420 So whereas n has a data type of int-- just as it has since Week 0 00:16:14.420 --> 00:16:18.880 --the only thing new now is that it turns out there's another type of data 00:16:18.880 --> 00:16:20.980 that you can describe as a pointer. 00:16:20.980 --> 00:16:24.880 And a pointer is denoted with this star and the int just 00:16:24.880 --> 00:16:28.930 means this is the pointer to an int or it is the address of an int. 00:16:28.930 --> 00:16:31.990 And we'll see later we can do floats and-- 00:16:31.990 --> 00:16:34.720 floats, and chars, and bunches of other data types too. 00:16:34.720 --> 00:16:36.890 This just means that p is a variable that's going 00:16:36.890 --> 00:16:39.790 to contain a pointer to an int, a.k.a. 00:16:39.790 --> 00:16:41.920 The address of an int. 00:16:41.920 --> 00:16:42.430 All right. 00:16:42.430 --> 00:16:45.645 So what can I do now with this information? 00:16:45.645 --> 00:16:47.770 Well let me go ahead and print out either of these. 00:16:47.770 --> 00:16:51.490 If I want to go ahead and print out now, for instance, that address, 00:16:51.490 --> 00:16:56.330 I can go ahead and print % p and print out p just like this. 00:16:56.330 --> 00:16:58.810 Let me go ahead and make address, enter-- 00:16:58.810 --> 00:17:00.610 seems to compile OK --run address. 
00:17:00.610 --> 00:17:06.032 And I'm going to see something cryptic again, 0x 7FFF3977662C, which 00:17:06.032 --> 00:17:07.990 is different from before but that's because one 00:17:07.990 --> 00:17:09.948 of the features of modern computers is actually 00:17:09.948 --> 00:17:12.970 to move things around in memory for you, which is a security feature. 00:17:12.970 --> 00:17:14.740 But more on that perhaps, later on. 00:17:14.740 --> 00:17:17.710 But it's still a big cryptic hexadecimal address. 00:17:17.710 --> 00:17:20.290 What if though, just for the sake of demonstration, 00:17:20.290 --> 00:17:23.440 I didn't want to print out the address because rarely after today 00:17:23.440 --> 00:17:27.040 are we going to care about the specific addresses where things are? 00:17:27.040 --> 00:17:32.680 How could I change line 7 here to print out, not the value of p, 00:17:32.680 --> 00:17:37.180 but what is at the location p? 00:17:37.180 --> 00:17:40.850 How do I go to the location in p? 00:17:40.850 --> 00:17:41.350 OK. 00:17:41.350 --> 00:17:42.610 Star p, I heard. 00:17:42.610 --> 00:17:46.050 So instead of printing p itself, I say star p. 00:17:46.050 --> 00:17:48.850 I change the format code just to be an int. 00:17:48.850 --> 00:17:49.350 OK. 00:17:49.350 --> 00:17:51.190 Siri is trying to be helpful here. 00:17:51.190 --> 00:17:55.580 But now I'm saying, go ahead and print me an integer. 00:17:55.580 --> 00:17:58.090 And the integer I want you to print is the one at p. 00:17:58.090 --> 00:18:00.570 Star means go to that address, which is p. 00:18:00.570 --> 00:18:03.190 So let me save this, make address. 00:18:03.190 --> 00:18:04.750 All right, seems to compile. 00:18:04.750 --> 00:18:06.910 Dot slash address, let's see what happens. 00:18:06.910 --> 00:18:08.453 And back to 50. 00:18:08.453 --> 00:18:10.870 So we're just kind of jumping through hoops at the moment, 00:18:10.870 --> 00:18:12.460 accomplishing nothing real yet. 
00:18:12.460 --> 00:18:15.040 But again, just demonstrating, and applying, and reversing 00:18:15.040 --> 00:18:18.820 the effects of these two operators. 00:18:18.820 --> 00:18:25.428 Any questions thus far on these addresses, or pointers, or the like? 00:18:25.428 --> 00:18:26.880 Yeah. 00:18:26.880 --> 00:18:34.634 AUDIENCE: So there's six lines where you stored the address of n-- 00:18:34.634 --> 00:18:35.592 DAVID J. MALAN: Mm hmm. 00:18:35.592 --> 00:18:37.060 AUDIENCE: --pointer of p. 00:18:37.060 --> 00:18:41.710 DAVID J. MALAN: You stored the address of n in p and p 00:18:41.710 --> 00:18:45.790 is a pointer, specifically a pointer to an integer. 00:18:45.790 --> 00:18:49.130 Put another way, p is the address of an integer. 00:18:49.130 --> 00:18:50.470 Which integer? 00:18:50.470 --> 00:18:51.732 n 00:18:51.732 --> 00:18:54.149 AUDIENCE: Could I just write-- what would happen if I just 00:18:54.149 --> 00:18:55.920 write int p instead of int star p? 00:18:55.920 --> 00:18:57.170 DAVID J. MALAN: Good question. 00:18:57.170 --> 00:19:02.135 If you said int p equals ampersand n semicolon, instead of int star p, 00:19:02.135 --> 00:19:05.260 Clang-- the compiler --would actually yell at you because it realizes that, 00:19:05.260 --> 00:19:08.302 wait a minute, you're trying to store an address, not an integer like you 00:19:08.302 --> 00:19:10.212 and I know it, 12345678. 00:19:10.212 --> 00:19:11.920 Even though technically they are numbers, 00:19:11.920 --> 00:19:14.042 Clang is smart enough to realize that if you're 00:19:14.042 --> 00:19:16.750 getting the address of something, you must store it in a pointer. 00:19:16.750 --> 00:19:20.150 You cannot store it in just an integer. 00:19:20.150 --> 00:19:20.650 All right. 00:19:20.650 --> 00:19:22.320 So let's make this a little more visual. 00:19:22.320 --> 00:19:24.490 So if this is again my computer's memory, 00:19:24.490 --> 00:19:26.620 let me go ahead and pull up the slide from before. 
00:19:26.620 --> 00:19:29.470 And the goal at hand is to visualize really these two lines of code. 00:19:29.470 --> 00:19:31.810 Give me a variable called n and store in it 50-- 00:19:31.810 --> 00:19:36.820 just like Week 1 --then also give me a variable called p and store in it 00:19:36.820 --> 00:19:39.280 the address of n. 00:19:39.280 --> 00:19:40.558 That's now in Week 4. 00:19:40.558 --> 00:19:41.600 What does this look like? 00:19:41.600 --> 00:19:42.725 Well, my computer's memory. 00:19:42.725 --> 00:19:44.800 Let's go ahead and put n on the screen again. 00:19:44.800 --> 00:19:47.560 And n might be down there arbitrarily somewhere in memory. 00:19:47.560 --> 00:19:50.230 And it's called n, the value is 50. 00:19:50.230 --> 00:19:52.220 Technically, that 50 is somewhere. 00:19:52.220 --> 00:19:55.270 And let's just arbitrarily, for discussion's sake, say it's at address 00:19:55.270 --> 00:19:58.600 0x12345678, so somewhere arbitrary. 00:19:58.600 --> 00:20:00.940 What does p look like in this picture? 00:20:00.940 --> 00:20:04.840 Well p is a variable, which means it's a bunch of bits 00:20:04.840 --> 00:20:06.190 that can store information. 00:20:06.190 --> 00:20:09.220 And let's just propose that they're up here in the middle. 00:20:09.220 --> 00:20:10.960 This variable is called p. 00:20:10.960 --> 00:20:12.610 What value is p storing? 00:20:12.610 --> 00:20:21.220 It's literally storing 0x12345678, which is again, the address of the value n. 00:20:21.220 --> 00:20:22.690 So that's all that's going on here. 00:20:22.690 --> 00:20:24.280 But honestly, this is getting so low level. 00:20:24.280 --> 00:20:26.100 And even my sort of eyes are glazing over 00:20:26.100 --> 00:20:28.183 as we start talking about these low level details. 00:20:28.183 --> 00:20:30.680 Turns out that pointers lend themselves to abstraction. 00:20:30.680 --> 00:20:32.600 And in fact, we can start to do that already.
00:20:32.600 --> 00:20:36.190 Let's just focus now in the absence of memory, just on these two values. 00:20:36.190 --> 00:20:39.220 This big rectangle here represents a variable 00:20:39.220 --> 00:20:41.350 called p, which stores an address. 00:20:41.350 --> 00:20:43.480 This rectangle here represents another variable 00:20:43.480 --> 00:20:46.150 called n that's storing the number 50. 00:20:46.150 --> 00:20:48.100 Technically speaking, I don't really want 00:20:48.100 --> 00:20:52.300 to care moving forward what the address of n is. 00:20:52.300 --> 00:20:54.308 I just want you to know that I can access it. 00:20:54.308 --> 00:20:56.350 And so what a computer scientist would typically 00:20:56.350 --> 00:20:59.320 do is never talk about specific addresses-- 00:20:59.320 --> 00:21:02.410 certainly never write them down like I have thus far --but instead, just 00:21:02.410 --> 00:21:04.690 literally draw an arrow that conceptually 00:21:04.690 --> 00:21:09.220 says that this variable p is pointing at the number 50. 00:21:09.220 --> 00:21:11.530 And we can very quickly start to move away 00:21:11.530 --> 00:21:15.520 from the actual addresses in question. 00:21:15.520 --> 00:21:18.640 And in fact, we can visualize this even a little metaphorically. 00:21:18.640 --> 00:21:20.860 So for instance, here is, for instance, a mailbox. 00:21:20.860 --> 00:21:23.890 And suppose that this is address 123. 00:21:23.890 --> 00:21:25.390 What is in address 123? 00:21:25.390 --> 00:21:29.470 Well it's a variable of type int, called n, 00:21:29.470 --> 00:21:31.150 looks like it's storing the number 50. 00:21:31.150 --> 00:21:31.650 Right? 00:21:31.650 --> 00:21:32.700 We saw these letters-- 00:21:32.700 --> 00:21:33.700 these numbers last week. 00:21:33.700 --> 00:21:37.480 So here's the number 50, which is an integer inside of this variable, today, 00:21:37.480 --> 00:21:40.720 represented as a mailbox instead of as a locker.
00:21:40.720 --> 00:21:45.220 Well suppose that this mailbox over here is not n but suppose this is p. 00:21:45.220 --> 00:21:47.200 And it happens to be at address 456. 00:21:47.200 --> 00:21:48.970 But who really cares? 00:21:48.970 --> 00:21:55.720 If this variable p is a pointer to an integer, namely that one over there, 00:21:55.720 --> 00:21:58.210 when I open this door, what am I going to find? 00:21:58.210 --> 00:22:00.400 Well I'm hoping I find the equivalent of-- we 00:22:00.400 --> 00:22:02.620 picked these up at the Coop earlier --the equivalent 00:22:02.620 --> 00:22:07.580 of a conceptual pointer saying the number n is over there. 00:22:07.580 --> 00:22:11.350 But what specifically, at a lower level, is actually inside this mailbox 00:22:11.350 --> 00:22:15.520 if that variable n is at location 0x123? 00:22:15.520 --> 00:22:17.590 What's probably inside this mailbox? 00:22:17.590 --> 00:22:19.310 AUDIENCE: [INAUDIBLE] 00:22:19.310 --> 00:22:21.680 DAVID J. MALAN: Yeah, the address, indeed, 123. 00:22:21.680 --> 00:22:23.680 So it's sort of like a treasure map if you will. 00:22:23.680 --> 00:22:25.960 Oh, I have to go to 123 to get this value. 00:22:25.960 --> 00:22:28.817 Oh, the integer in question is indeed 50. 00:22:28.817 --> 00:22:30.400 And that's the fundamental difference. 00:22:30.400 --> 00:22:34.240 This is the int that happens to be inside of this variable of type int. 00:22:34.240 --> 00:22:38.980 This is the address that's a pointer that's in this other variable, p, 00:22:38.980 --> 00:22:42.340 but that is conceptually, simply pointing from one variable 00:22:42.340 --> 00:22:45.610 to another, thereby giving us a sort of conceptual breadcrumbs. 00:22:45.610 --> 00:22:49.193 And we'll see-- frankly, in one week --how amazingly powerful it is.
00:22:49.193 --> 00:22:51.610 When you can have one piece of memory pointing at another, 00:22:51.610 --> 00:22:53.350 pointing at another, pointing at another, 00:22:53.350 --> 00:22:56.860 you can start to construct very sophisticated data structures, 00:22:56.860 --> 00:22:58.940 as they're called, things like family trees, 00:22:58.940 --> 00:23:01.690 and lists, and other data structures that you might have heard of. 00:23:01.690 --> 00:23:04.810 Or even if you haven't, these will be the underpinnings next week 00:23:04.810 --> 00:23:07.572 of all of today's fanciest algorithms used by, 00:23:07.572 --> 00:23:09.280 certainly the Googles, and the Facebooks, 00:23:09.280 --> 00:23:11.830 and the Microsofts of the world to manage large data sets. 00:23:11.830 --> 00:23:15.340 That's where we're going next week, in terms of application. 00:23:15.340 --> 00:23:18.317 So questions about that representation? 00:23:18.317 --> 00:23:19.150 Yeah, in the middle. 00:23:19.150 --> 00:23:22.380 AUDIENCE: Does that mean that your memory has to be twice as big? 00:23:22.380 --> 00:23:23.390 DAVID J. MALAN: Sorry can you say it once more? 00:23:23.390 --> 00:23:26.640 AUDIENCE: Is that to say your memory has to be twice as big to store pointers? 00:23:26.640 --> 00:23:28.348 DAVID J. MALAN: Ah, really good question. 00:23:28.348 --> 00:23:30.850 Is it the case that your pointers need to be twice as big? 00:23:30.850 --> 00:23:34.240 Not necessarily, just, this is the way life is these days. 00:23:34.240 --> 00:23:39.460 On most modern Macs and PCs, pointers use 64 bits-- the equivalent of a long, 00:23:39.460 --> 00:23:41.860 if you recall that brief discussion in Week 1. 00:23:41.860 --> 00:23:44.110 So I deliberately drew my pointer on the screen 00:23:44.110 --> 00:23:47.440 here as taking up 8 bytes or 64 bits. 00:23:47.440 --> 00:23:52.060 I've deliberately drawn my integer n as taking up 4 bytes or 32 bits. 
00:23:52.060 --> 00:23:54.400 That is convention these days on modern hardware. 00:23:54.400 --> 00:23:56.900 But it's not necessarily the case. 00:23:56.900 --> 00:23:59.602 Frankly, I could not find a bigger mailbox at Home Depot, 00:23:59.602 --> 00:24:01.810 so we went with two identical different colored ones. 00:24:01.810 --> 00:24:03.880 So metaphor is imperfect. 00:24:03.880 --> 00:24:04.630 All right. 00:24:04.630 --> 00:24:09.070 So moving from this to something more familiar now, if you will. 00:24:09.070 --> 00:24:12.970 Recall that we've been talking about strings for quite some time. 00:24:12.970 --> 00:24:15.880 And in fact, most of the interesting programs we've written thus far 00:24:15.880 --> 00:24:19.630 involve maybe input from the human and some form of text 00:24:19.630 --> 00:24:21.280 that you are then manipulating. 00:24:21.280 --> 00:24:24.728 But string we said in Week 1 is a bit of a white lie. 00:24:24.728 --> 00:24:26.770 I mean, it is the training wheels that I promised 00:24:26.770 --> 00:24:28.360 we would start taking off today. 00:24:28.360 --> 00:24:32.990 So let's consider what a string actually is now in this new context. 00:24:32.990 --> 00:24:36.040 So if we have a string like EMMA here, declared in a variable 00:24:36.040 --> 00:24:39.920 called s, and quote unquote, EMMA in all caps, as we've done a couple of times 00:24:39.920 --> 00:24:40.420 now. 00:24:40.420 --> 00:24:42.795 What does this actually look like inside of the computer? 00:24:42.795 --> 00:24:47.380 Well somewhere in my computer's memory there are four, nay, five bytes, 00:24:47.380 --> 00:24:52.610 storing E-M-M-A, and then additionally, that null terminating character that 00:24:52.610 --> 00:24:55.390 demarcates where the end of the string is. 00:24:55.390 --> 00:24:58.150 This is just eight individual 0 bits. 00:24:58.150 --> 00:25:01.030 So that's where EMMA might be represented in the computer's memory. 
00:25:01.030 --> 00:25:04.240 But recall that the variable in question was s. 00:25:04.240 --> 00:25:05.320 That was my string. 00:25:05.320 --> 00:25:07.330 And so that's why over the past few weeks 00:25:07.330 --> 00:25:11.050 any time you want to manipulate a string, you use its name, like s. 00:25:11.050 --> 00:25:13.870 And you can access bracket 0, bracket 1, bracket 2, bracket 3, 00:25:13.870 --> 00:25:19.450 to get at the individual characters in that string like EMMA, E-M-M-A, 00:25:19.450 --> 00:25:20.810 respectively. 00:25:20.810 --> 00:25:26.050 But of course it's the case, especially per today's revelation, that really, 00:25:26.050 --> 00:25:28.280 all of those bytes have their own addresses. 00:25:28.280 --> 00:25:28.780 Right? 00:25:28.780 --> 00:25:31.780 We're not going to care after this week what those addresses are 00:25:31.780 --> 00:25:32.920 but they certainly exist. 00:25:32.920 --> 00:25:36.160 For instance, E might be at 0x123. 00:25:36.160 --> 00:25:38.020 M might be at 0x124-- 00:25:38.020 --> 00:25:42.675 1 byte away --0x125, 0x126, 0x127. 00:25:42.675 --> 00:25:45.550 They're deliberately 1 byte away because remember a string is defined 00:25:45.550 --> 00:25:47.930 by characters back-to-back-to-back. 00:25:47.930 --> 00:25:51.700 So let's say for the sake of discussion that Emma's name in memory 00:25:51.700 --> 00:25:54.670 happens to start at 0x123. 00:25:54.670 --> 00:25:58.270 Well, what then really is that variable s? 00:25:58.270 --> 00:26:01.960 Well, I dare say that s is really just a pointer. 00:26:01.960 --> 00:26:02.830 Right? 00:26:02.830 --> 00:26:06.790 It can be a variable, depicted here just as before, called s. 00:26:06.790 --> 00:26:08.920 And it stores the value 0x123. 00:26:08.920 --> 00:26:09.700 Why? 00:26:09.700 --> 00:26:11.590 That's where Emma's name begins.
00:26:11.590 --> 00:26:14.680 But of course, we don't really have to care about this level of precision, 00:26:14.680 --> 00:26:15.472 the actual numbers. 00:26:15.472 --> 00:26:17.140 Let's just draw it as a picture. 00:26:17.140 --> 00:26:21.980 s is, if you will, a pointer to Emma's actual name in memory, 00:26:21.980 --> 00:26:23.230 which might be down over here. 00:26:23.230 --> 00:26:24.147 It might be over here. 00:26:24.147 --> 00:26:27.040 It might be over here, depending on where in the computer's memory 00:26:27.040 --> 00:26:28.390 it ended up by chance. 00:26:28.390 --> 00:26:32.830 But this arrow just suggests that s is pointing to Emma, specifically 00:26:32.830 --> 00:26:35.020 at the first letter in her name. 00:26:35.020 --> 00:26:36.670 But that's sufficient though, right? 00:26:36.670 --> 00:26:41.377 Because how-- if s stores the beginning of Emma's name, 0x123. 00:26:41.377 --> 00:26:43.210 And that's indeed where the E is but we just 00:26:43.210 --> 00:26:45.940 draw this pictorially with an arrow. 00:26:45.940 --> 00:26:48.550 How does the computer know where Emma's name 00:26:48.550 --> 00:26:52.392 ends if all it's technically remembering is the beginning? 00:26:52.392 --> 00:26:54.100 AUDIENCE: The null terminating character. 00:26:54.100 --> 00:26:55.300 DAVID J. MALAN: The null terminating character. 00:26:55.300 --> 00:26:58.130 And we stipulated a couple of weeks ago that that is important. 00:26:58.130 --> 00:27:00.610 But now it's all the more important because it turns out 00:27:00.610 --> 00:27:03.640 that s, this thing we've been calling a string, 00:27:03.640 --> 00:27:08.530 has no familiarity with MMA or the null terminator. 00:27:08.530 --> 00:27:11.500 All s is pointing at technically, as of today, 00:27:11.500 --> 00:27:16.090 is the first letter in her name, which happens to be in this story at 0x123. 
00:27:16.090 --> 00:27:19.570 But the computer is smart enough to know that if you just point it 00:27:19.570 --> 00:27:22.630 at the first letter in a string, it can figure out 00:27:22.630 --> 00:27:25.150 where the string ends by just looking-- 00:27:25.150 --> 00:27:29.440 as with a loop --for that null terminating character. 00:27:29.440 --> 00:27:35.590 So this is to say ultimately, that there is no such thing as string. 00:27:35.590 --> 00:27:37.870 And we'll see if this strikes a chord. 00:27:37.870 --> 00:27:39.740 There is no such thing as a string. 00:27:39.740 --> 00:27:42.160 This was a little white lie we began telling in Week 1 00:27:42.160 --> 00:27:46.190 just so that we could get interesting, real work done, manipulating text. 00:27:46.190 --> 00:27:51.306 But what is string most likely implemented as would you say? 00:27:51.306 --> 00:27:52.970 AUDIENCE: An array of characters. 00:27:52.970 --> 00:27:54.140 DAVID J. MALAN: An array of characters, yes. 00:27:54.140 --> 00:27:55.515 But that was Week 1's definition. 00:27:55.515 --> 00:27:58.070 What technically now, as of today, must a string be? 00:27:58.070 --> 00:27:59.520 AUDIENCE: [INAUDIBLE] 00:27:59.520 --> 00:28:00.000 DAVID J. MALAN: Sorry, over here. 00:28:00.000 --> 00:28:00.800 AUDIENCE: A pointer. 00:28:00.800 --> 00:28:01.883 DAVID J. MALAN: A pointer. 00:28:01.883 --> 00:28:04.170 Right? s, the variable in which I was storing 00:28:04.170 --> 00:28:08.790 Emma's name would seem to manifest a pattern just 00:28:08.790 --> 00:28:11.430 like we saw with the numbers a moment ago, the number 50. 00:28:11.430 --> 00:28:14.640 s seems to be storing the address of the first character 00:28:14.640 --> 00:28:16.170 in that sequence of characters. 00:28:16.170 --> 00:28:18.417 And so indeed, it would seem to be a string. 00:28:18.417 --> 00:28:20.250 Well, how do we actually connect these dots? 
00:28:20.250 --> 00:28:22.500 Well suppose that we have this line of code 00:28:22.500 --> 00:28:24.910 again where we had int n equals 50. 00:28:24.910 --> 00:28:27.160 And then we had this other line of code where we said, 00:28:27.160 --> 00:28:31.170 go ahead and create a variable called p and store in it the address of n. 00:28:31.170 --> 00:28:33.210 That's where we left off earlier. 00:28:33.210 --> 00:28:36.990 But it turns out that this thing here is our data type from Week 1. 00:28:36.990 --> 00:28:40.650 This thing here, int star, is a new data type as of today. 00:28:40.650 --> 00:28:43.860 The variable stores, not an int, but the address of an int. 00:28:43.860 --> 00:28:47.910 It turns out that something like this line of code, with Emma's name, 00:28:47.910 --> 00:28:51.850 is synonymous with char star. 00:28:51.850 --> 00:28:52.350 Right? 00:28:52.350 --> 00:28:58.200 If a star represents an address and char represents the type of address being 00:28:58.200 --> 00:29:02.760 pointed at, just as int star can let you point at a value like n-- 00:29:02.760 --> 00:29:05.550 which stored 50 --so could a char star-- 00:29:05.550 --> 00:29:09.390 by that same logic --allow you to store the address of and therefore 00:29:09.390 --> 00:29:12.030 point at a character. 00:29:12.030 --> 00:29:14.790 And of course, as you said, from Week 1, a string 00:29:14.790 --> 00:29:16.890 is just a sequence of characters. 00:29:16.890 --> 00:29:21.210 So a string would seem to be just the address of the first byte 00:29:21.210 --> 00:29:23.010 in the sequence of characters. 00:29:23.010 --> 00:29:27.540 And the last byte happens to be all 0s by convention, to help us find the end. 00:29:27.540 --> 00:29:29.340 So what then more technically is a string 00:29:29.340 --> 00:29:31.800 and what is the CS50 library that we're now going 00:29:31.800 --> 00:29:34.440 to start taking off as training wheels? 
00:29:34.440 --> 00:29:36.900 Well last week we introduced you to the notion of typedef, 00:29:36.900 --> 00:29:40.890 where you can create your own customized data type that does not exist in C 00:29:40.890 --> 00:29:42.810 but does exist in your own program. 00:29:42.810 --> 00:29:44.675 And we introduced this keyword, typedef. 00:29:44.675 --> 00:29:47.550 We proposed last week that this was useful because you could actually 00:29:47.550 --> 00:29:50.340 declare a fancy structure that encapsulates 00:29:50.340 --> 00:29:52.860 multiple variables, like name and number, 00:29:52.860 --> 00:29:56.160 and then we called this data structure, last week, a person. 00:29:56.160 --> 00:29:58.050 That was the new data type we invented. 00:29:58.050 --> 00:30:01.410 Well it turns out you can use typedef in exactly the same way 00:30:01.410 --> 00:30:04.740 even more simply than we did last week by saying this. 00:30:04.740 --> 00:30:08.460 If you say typedef char star string-- 00:30:08.460 --> 00:30:11.910 typedef means give me a new data type, just for my own use. 00:30:11.910 --> 00:30:17.610 Char star means the type of value is going to be the address of a character. 00:30:17.610 --> 00:30:21.480 And the name I want to give to that data type is going to be string. 00:30:21.480 --> 00:30:24.630 And so literally, this line of code here, this 00:30:24.630 --> 00:30:28.230 is one of the lines of code in CS50 dot h-- the header 00:30:28.230 --> 00:30:30.480 file you've been including for several weeks, 00:30:30.480 --> 00:30:33.540 where we are creating a data type called string 00:30:33.540 --> 00:30:35.792 to make it a synonym for char star. 00:30:35.792 --> 00:30:37.500 So that if you will, it's an abstraction, 00:30:37.500 --> 00:30:42.090 a simplification on top of the idea of a sequence of characters 00:30:42.090 --> 00:30:45.257 being pointed at by an address. 00:30:45.257 --> 00:30:45.840 Any questions? 
00:30:45.840 --> 00:30:47.910 And honestly, this is why-- and maybe those sort 00:30:47.910 --> 00:30:51.270 of blank stares --this is why we introduced strings in Week 1 00:30:51.270 --> 00:30:55.080 as being an actual type as opposed to not existing at all. 00:30:55.080 --> 00:30:57.630 Because who really cares about addresses and pointers 00:30:57.630 --> 00:30:59.670 and all of that when all you want to do is like, 00:30:59.670 --> 00:31:04.950 print, hello world, or hello, so and so's name? 00:31:04.950 --> 00:31:05.730 Yeah, question. 00:31:05.730 --> 00:31:10.450 AUDIENCE: What other-- what other functions are created-- 00:31:10.450 --> 00:31:13.568 major functions are created by CS50 are not intrinsic to-- 00:31:13.568 --> 00:31:15.110 DAVID J. MALAN: Really good question. 00:31:15.110 --> 00:31:16.610 We'll come back to this later today. 00:31:16.610 --> 00:31:19.410 But other functions that are defined in the CS50 library that 00:31:19.410 --> 00:31:21.660 are training wheels that come off today are get_string, 00:31:21.660 --> 00:31:24.630 get_int, get_float, and the other get functions as well. 00:31:24.630 --> 00:31:27.630 But that's about it that we do for you. 00:31:27.630 --> 00:31:30.072 Other questions? 00:31:30.072 --> 00:31:31.060 Yeah. 00:31:31.060 --> 00:31:34.024 AUDIENCE: Can you define all of these words again? 00:31:34.024 --> 00:31:38.964 Like, it's-- so string is like a character pointer which points-- 00:31:38.964 --> 00:31:40.107 I was confused about that. 00:31:40.107 --> 00:31:40.940 Can you repeat that? 00:31:40.940 --> 00:31:42.030 DAVID J. MALAN: Sure. 00:31:42.030 --> 00:31:47.707 A string, per this definition, is a char star, as a programmer would say. 00:31:47.707 --> 00:31:48.540 What does that mean? 00:31:48.540 --> 00:31:53.850 A string is quite simply a variable that contains the address of a character.
00:31:53.850 --> 00:31:56.970 By our human convention, that character might be the beginning 00:31:56.970 --> 00:31:59.400 of a multi character sequence. 00:31:59.400 --> 00:32:01.590 But that's what we called strings in Week 1. 00:32:01.590 --> 00:32:05.273 So a string is just the address of a single character. 00:32:05.273 --> 00:32:08.190 And we leave it to human convention to know that the end of the string 00:32:08.190 --> 00:32:12.000 will just be demarcated by eight 0 bits, a.k.a. 00:32:12.000 --> 00:32:13.088 the null terminator. 00:32:13.088 --> 00:32:14.880 And this is the sense in which-- especially 00:32:14.880 --> 00:32:16.755 if you have some prior programming experience 00:32:16.755 --> 00:32:18.570 --that C is much more low level. 00:32:18.570 --> 00:32:20.700 In Python, as you'll soon see in a few weeks, 00:32:20.700 --> 00:32:22.867 everything just works so splendidly easily. 00:32:22.867 --> 00:32:24.700 If you want a string, you can have a string. 00:32:24.700 --> 00:32:27.242 You don't have to worry about any of these low level details. 00:32:27.242 --> 00:32:30.090 But that's because Python is built here, conceptually, 00:32:30.090 --> 00:32:33.780 where C is built down here-- so to speak --closer to the computer's memory. 00:32:33.780 --> 00:32:34.740 But there's no magic. 00:32:34.740 --> 00:32:36.060 If you want a string, fine. 00:32:36.060 --> 00:32:38.310 Just remember where it starts, remember where it ends. 00:32:38.310 --> 00:32:39.630 And boom, you're done. 00:32:39.630 --> 00:32:45.130 The star in the syntax today is just a way of expressing those ideas in code. 00:32:45.130 --> 00:32:47.520 So let's go ahead then and experiment with this string, 00:32:47.520 --> 00:32:51.660 just as we did a moment ago using Emma's name now instead of an int. 00:32:51.660 --> 00:32:53.730 So let me go ahead and erase those lines earlier.
00:32:53.730 --> 00:32:57.930 And let me go back to Week 1 style stuff, where I just say string s 00:32:57.930 --> 00:32:59.340 equals quote unquote, Emma. 00:32:59.340 --> 00:33:03.750 And then of course, if I want to print this, I can simply say this as before. 00:33:03.750 --> 00:33:07.950 So just as a quick safety check, let me go ahead and make address again. 00:33:07.950 --> 00:33:09.390 Whoops. 00:33:09.390 --> 00:33:11.570 What did I do wrong? 00:33:11.570 --> 00:33:13.620 Let me scroll up to the first-- 00:33:13.620 --> 00:33:15.964 of many it seems --errors. 00:33:15.964 --> 00:33:17.410 Yeah. 00:33:17.410 --> 00:33:19.873 AUDIENCE: You're using string, [INAUDIBLE] 00:33:19.873 --> 00:33:22.040 DAVID J. MALAN: Yeah, I kind of shouldn't have taken 00:33:22.040 --> 00:33:23.750 off all the training wheels just yet. 00:33:23.750 --> 00:33:25.087 I'm still using string. 00:33:25.087 --> 00:33:27.170 So let me go ahead and put that back just for now. 00:33:27.170 --> 00:33:29.870 That will give me access to that typedef for string. 00:33:29.870 --> 00:33:31.960 Let me recompile it as make address. 00:33:31.960 --> 00:33:32.480 That worked. 00:33:32.480 --> 00:33:33.980 So that was the solution, thank you. 00:33:33.980 --> 00:33:35.360 And then address again. 00:33:35.360 --> 00:33:36.470 We just see Emma. 00:33:36.470 --> 00:33:39.810 So what can we now do that's a little bit different here? 00:33:39.810 --> 00:33:42.350 Well, one, you know what I can actually do? 00:33:42.350 --> 00:33:45.230 I can get rid of this-- the solution a moment ago --and say, 00:33:45.230 --> 00:33:46.440 I don't need string anymore. 00:33:46.440 --> 00:33:47.898 I don't need those training wheels. 00:33:47.898 --> 00:33:51.483 If s is going to represent a string, technically, s 00:33:51.483 --> 00:33:53.900 is just going to store the address of the first character. 00:33:53.900 --> 00:33:57.170 And it suffices actually, just to write this.
00:33:57.170 --> 00:34:00.170 So literally instead of string, you write char star. 00:34:00.170 --> 00:34:01.730 Technically, you don't need-- 00:34:01.730 --> 00:34:03.680 you can have extra space to the left or right. 00:34:03.680 --> 00:34:08.150 But most programmers write it just as I have here, char star variable name. 00:34:08.150 --> 00:34:11.045 That looks scarier now but it's no different from what 00:34:11.045 --> 00:34:12.170 we've been doing for weeks. 00:34:12.170 --> 00:34:14.780 If I now do make address without the CS50 library, 00:34:14.780 --> 00:34:17.670 still works, because C knows what I'm talking about. 00:34:17.670 --> 00:34:20.780 And if I run address now, I still see Emma. 00:34:20.780 --> 00:34:22.270 But now I can start to play around. 00:34:22.270 --> 00:34:22.770 Right? 00:34:22.770 --> 00:34:26.480 If s is the address of a character, what was the format code 00:34:26.480 --> 00:34:29.570 I can use to print an address? 00:34:29.570 --> 00:34:30.947 Not percent i, but-- 00:34:30.947 --> 00:34:31.780 AUDIENCE: Percent p. 00:34:31.780 --> 00:34:33.650 DAVID J. MALAN: Percent p, a pointer. 00:34:33.650 --> 00:34:35.600 So let me go ahead and recompile this now. 00:34:35.600 --> 00:34:38.512 Make address, that compiles too. 00:34:38.512 --> 00:34:41.179 And when I run dot slash address, I'm not going to see Emma now. 00:34:41.179 --> 00:34:44.427 What should I see instead? 00:34:44.427 --> 00:34:45.260 Some address, right? 00:34:45.260 --> 00:34:46.409 I have no idea what it is. 00:34:46.409 --> 00:34:50.060 It looks like Emma's name is stored at 0x42A9F2, 00:34:50.060 --> 00:34:52.489 whatever that number translates to decimal, somewhere 00:34:52.489 --> 00:34:53.870 in the computer's memory. 00:34:53.870 --> 00:34:57.482 But it turns out then too, what about this? 00:34:57.482 --> 00:34:59.690 Let me go ahead and add another line of code and say, 00:34:59.690 --> 00:35:01.850 you know what, I'm really curious now. 
00:35:01.850 --> 00:35:06.470 What is the address of the first letter in Emma's name? 00:35:06.470 --> 00:35:10.400 How do I express in C, the first letter only of Emma's name 00:35:10.400 --> 00:35:12.066 if Emma is stored in s? 00:35:12.066 --> 00:35:13.930 AUDIENCE: [INAUDIBLE] 00:35:13.930 --> 00:35:17.085 DAVID J. MALAN: s bracket zero, right? 00:35:17.085 --> 00:35:18.210 That would seem to be that. 00:35:18.210 --> 00:35:18.960 But that is what? 00:35:18.960 --> 00:35:21.510 That's a char. s bracket 0 is a char. 00:35:21.510 --> 00:35:23.850 How do I get the address of s bracket 0? 00:35:23.850 --> 00:35:24.770 AUDIENCE: Ampersand. 00:35:24.770 --> 00:35:26.728 DAVID J. MALAN: Yeah, I can just say ampersand. 00:35:26.728 --> 00:35:27.228 Right? 00:35:27.228 --> 00:35:29.180 So it's ugly looking but that's fine for now. 00:35:29.180 --> 00:35:30.620 Make address, enter. 00:35:30.620 --> 00:35:31.940 Whoops. 00:35:31.940 --> 00:35:34.130 It's uglier because I forgot my semicolon. 00:35:34.130 --> 00:35:37.160 Let me go ahead and make address again, enter. 00:35:37.160 --> 00:35:38.270 Seems to compile. 00:35:38.270 --> 00:35:41.850 And when I run dot slash address now, notice I get the same thing. 00:35:41.850 --> 00:35:43.880 And this is because C is taking me literally. 00:35:43.880 --> 00:35:46.972 When you print out s, a string, it's technically just the address 00:35:46.972 --> 00:35:47.930 of the first character. 00:35:47.930 --> 00:35:50.330 And indeed, I can corroborate as much by running s 00:35:50.330 --> 00:35:53.570 bracket zero then get the address of the first character. 00:35:53.570 --> 00:35:55.560 And they are indeed one and the same. 00:35:55.560 --> 00:35:59.660 So a string is this sort of abstraction on top of a bunch of characters. 00:35:59.660 --> 00:36:01.880 But again, s is just an address. 00:36:01.880 --> 00:36:03.520 And that's all we're emphasizing now.
00:36:03.520 --> 00:36:06.020 And if I get really curious-- not that you would necessarily 00:36:06.020 --> 00:36:08.090 do this in a real program --what if I print 00:36:08.090 --> 00:36:13.310 out a few more characters in Emma's name, like s bracket 1, 2, and 3? 00:36:13.310 --> 00:36:16.700 Let me go ahead, just out of curiosity and make this program and dot slash 00:36:16.700 --> 00:36:17.750 address. 00:36:17.750 --> 00:36:23.960 Now notice what I see, is again, s's address is at 42AB52. 00:36:23.960 --> 00:36:26.150 The first character in s is at the same thing, 00:36:26.150 --> 00:36:28.220 by definition of what a string is. 00:36:28.220 --> 00:36:31.280 And then notice what's kind of neat-- if this is-- 00:36:31.280 --> 00:36:35.960 if-- for some definition of neat --53, 54, 55 is noteworthy. 00:36:35.960 --> 00:36:36.670 Why? 00:36:36.670 --> 00:36:39.270 They're one byte apart. 00:36:39.270 --> 00:36:42.590 So this whole time, whenever you implemented Caesar, or substitution, 00:36:42.590 --> 00:36:45.053 or some other cipher in problem set two, anytime 00:36:45.053 --> 00:36:47.720 you were manipulating individual characters-- you didn't know it 00:36:47.720 --> 00:36:49.940 --but you were just visiting different mailboxes. 00:36:49.940 --> 00:36:53.330 You were just visiting different addresses in the computer's memory 00:36:53.330 --> 00:36:57.630 in order to manipulate them somehow. 00:36:57.630 --> 00:36:58.130 All right. 00:36:58.130 --> 00:37:00.530 Can I do one last demo that's a little arcane and then 00:37:00.530 --> 00:37:02.820 we'll make things more-- more real? 00:37:02.820 --> 00:37:03.320 All right. 00:37:03.320 --> 00:37:06.950 So it turns out if all that's going on underneath the hood 00:37:06.950 --> 00:37:10.940 is just addresses, watch what I can do here. 00:37:10.940 --> 00:37:16.370 If I want to go ahead and print out what is at the address s, 00:37:16.370 --> 00:37:21.695 what will I find in memory if I go to the address in s? 
00:37:21.695 --> 00:37:23.030 AUDIENCE: [INAUDIBLE] 00:37:23.030 --> 00:37:23.750 DAVID J. MALAN: Sorry, a little louder. 00:37:23.750 --> 00:37:25.010 AUDIENCE: The first letter. 00:37:25.010 --> 00:37:27.302 DAVID J. MALAN: The first letter in Emma's name, right? 00:37:27.302 --> 00:37:29.840 If we can all agree-- even if it's a little unfamiliar still 00:37:29.840 --> 00:37:33.290 --that s is just the address of a character, and I say, go to s, 00:37:33.290 --> 00:37:34.670 what should I see specifically? 00:37:34.670 --> 00:37:35.900 AUDIENCE: [INAUDIBLE] 00:37:35.900 --> 00:37:38.510 DAVID J. MALAN: Probably E in Emma. Right? 00:37:38.510 --> 00:37:41.070 If s is the address of the first character of her name, 00:37:41.070 --> 00:37:44.300 star s would mean go to that character. 00:37:44.300 --> 00:37:46.220 So let me go ahead and print that as a char. 00:37:46.220 --> 00:37:51.770 So let me go ahead now and make address dot slash address, enter. 00:37:51.770 --> 00:37:56.030 There is the E because I can say, go to that address and print what's there. 00:37:56.030 --> 00:37:59.060 And I can actually do this for all of her letters in her name. 00:37:59.060 --> 00:38:01.250 Let me go ahead and print out another one here. 00:38:01.250 --> 00:38:03.980 So how do I get at the second letter in Emma's name? 00:38:03.980 --> 00:38:07.050 Previous-- normally, like last week, we would have done this. 00:38:07.050 --> 00:38:10.218 And that just magically gets you to the second letter in her name. 00:38:10.218 --> 00:38:11.760 But I can do it a little differently. 00:38:11.760 --> 00:38:16.010 What if I go to s and then, from where do I want 00:38:16.010 --> 00:38:18.200 to go from s to get the second letter? 00:38:18.200 --> 00:38:19.087 AUDIENCE: Plus one. 00:38:19.087 --> 00:38:20.420 DAVID J. MALAN: Plus one, right? 00:38:20.420 --> 00:38:22.730 I mean, maybe we can literally just do arithmetic here.
00:38:22.730 --> 00:38:26.120 If s is the address of her first letter, it stands to reason that s plus 1 00:38:26.120 --> 00:38:27.830 is the address of her second letter. 00:38:27.830 --> 00:38:31.940 So make address now dot slash address. 00:38:31.940 --> 00:38:34.700 And I should see EM. 00:38:34.700 --> 00:38:39.710 And I can do this twice more maybe and go ahead and do this and then this. 00:38:39.710 --> 00:38:44.570 But this time add 2 and this time add 3, just doing some simple arithmetic. 00:38:44.570 --> 00:38:50.760 Make address dot slash address, there is Emma but in a much lower level detail. 00:38:50.760 --> 00:38:52.370 So what is this bracket symbol? 00:38:52.370 --> 00:38:55.307 In computer science, this is what's called syntactic sugar. 00:38:55.307 --> 00:38:56.390 It's kind of a silly name. 00:38:56.390 --> 00:39:00.800 But it just refers to a handy feature so that you, the programmer, can say, 00:39:00.800 --> 00:39:03.230 s bracket 0 or bracket 1. 00:39:03.230 --> 00:39:06.500 But what the computer is actually doing underneath the hood-- the compiler, 00:39:06.500 --> 00:39:10.520 Clang --it's actually converting all of your uses of square brackets since Week 00:39:10.520 --> 00:39:13.790 1 to this format here. 00:39:13.790 --> 00:39:15.980 It's just doing arithmetic underneath the hood. 00:39:15.980 --> 00:39:18.120 Now you don't have to do this moving forward. 00:39:18.120 --> 00:39:21.740 But I point out this low level detail just to give you a sense of, 00:39:21.740 --> 00:39:23.090 there really is no magic. 00:39:23.090 --> 00:39:25.460 When you say, go print an address or go do this, 00:39:25.460 --> 00:39:29.010 the computer is taking you literally. 00:39:29.010 --> 00:39:29.990 Whew. 00:39:29.990 --> 00:39:30.920 OK, that was a lot. 00:39:30.920 --> 00:39:32.046 Yes, question. 00:39:32.046 --> 00:39:37.006 AUDIENCE: So [INAUDIBLE] 00:39:39.500 --> 00:39:42.830 DAVID J. MALAN: Star s would mean go to the address in s. 
00:39:42.830 --> 00:39:52.708 AUDIENCE: So why for instance, if you [INAUDIBLE] character [INAUDIBLE] 00:39:52.708 --> 00:39:54.250 DAVID J. MALAN: Really good question. 00:39:54.250 --> 00:39:58.360 Why, when you print out s, does it print out the whole string and not 00:39:58.360 --> 00:39:59.410 just the character? 00:39:59.410 --> 00:40:02.020 That's what the printf format code is doing for you. 00:40:02.020 --> 00:40:06.160 When you tell printf to use percent s, that has special meaning to printf. 00:40:06.160 --> 00:40:08.980 And it knows to go to the first address and not just print 00:40:08.980 --> 00:40:12.280 the second-- the first char, but print every character thereafter 00:40:12.280 --> 00:40:13.880 until it sees what? 00:40:13.880 --> 00:40:15.130 AUDIENCE: The null terminator. 00:40:15.130 --> 00:40:17.088 DAVID J. MALAN: The null terminating character. 00:40:17.088 --> 00:40:21.370 So printf and percent s are special and have been special since Week 1. 00:40:21.370 --> 00:40:24.580 They just know to do exactly what you've described. 00:40:24.580 --> 00:40:27.730 So pointer arithmetic, to be clear, is just taking addresses and like, 00:40:27.730 --> 00:40:30.040 doing arithmetic with them, adding 1, adding 2, adding 00:40:30.040 --> 00:40:33.890 3, or any other manipulation like that. 00:40:33.890 --> 00:40:34.390 All right. 00:40:34.390 --> 00:40:38.004 So [CHUCKLE] let's take another stab at a meme here. 00:40:38.004 --> 00:40:38.790 [CHUCKLE] 00:40:38.790 --> 00:40:39.700 OK, a few of us. 00:40:39.700 --> 00:40:41.397 All right. 00:40:41.397 --> 00:40:42.730 All right, it's trying too hard. 00:40:42.730 --> 00:40:43.230 All right. 00:40:43.230 --> 00:40:45.850 So what then do we have when it comes to strings? 00:40:45.850 --> 00:40:48.430 Well, let's now try to learn from these primitives 00:40:48.430 --> 00:40:51.490 and actually trip over some mistakes that we might otherwise make.
00:40:51.490 --> 00:40:53.840 I'm going to go ahead and open up a new file. 00:40:53.840 --> 00:40:56.590 I'm going to go ahead and call this one, compare. 00:40:56.590 --> 00:40:58.965 So we'll save this as compare dot c. 00:40:58.965 --> 00:41:01.840 And this will be reminiscent of something we started doing last week. 00:41:01.840 --> 00:41:04.420 And you've done this past week, particularly for implementing 00:41:04.420 --> 00:41:05.715 voting and comparing strings. 00:41:05.715 --> 00:41:07.840 I'm going to go ahead and make a quick program that 00:41:07.840 --> 00:41:09.225 just compares two integers. 00:41:09.225 --> 00:41:11.600 I'm going to put the training wheels back on temporarily, 00:41:11.600 --> 00:41:13.683 just so that we can get some numbers from the user 00:41:13.683 --> 00:41:17.900 pretty easily, including CS50 dot h and standard I/O dot h. 00:41:17.900 --> 00:41:21.080 I'm going to do int main void as my program. 00:41:21.080 --> 00:41:25.850 I'm going to get an integer called i and ask the human for that. 00:41:25.850 --> 00:41:29.440 I'm going to get another integer called j, ask the human for that. 00:41:29.440 --> 00:41:32.740 And then I'm going to go ahead and say if i equals equals j, 00:41:32.740 --> 00:41:38.570 then go ahead and print with printf that they're the same. 00:41:38.570 --> 00:41:43.510 Else, if i does not equal j, I'm going to go ahead quite simply and print out 00:41:43.510 --> 00:41:45.430 different backslash n. 00:41:45.430 --> 00:41:48.310 So if i equals equals j, it should say, same. 00:41:48.310 --> 00:41:50.620 Else, if it's different, it should say different. 00:41:50.620 --> 00:41:54.460 So let me go ahead and make compare dot slash compare. 00:41:54.460 --> 00:42:00.130 And I should see, hopefully, if I type in say, 1, 2, they're different. 00:42:00.130 --> 00:42:02.900 And if I instead do 1, 1, they're the same. 00:42:02.900 --> 00:42:03.400 All right. 
00:42:03.400 --> 00:42:06.580 So it stands to reason that logically this is pretty straightforward when 00:42:06.580 --> 00:42:07.858 you want to compare things. 00:42:07.858 --> 00:42:10.400 So instead of using numbers, let me go ahead and change this. 00:42:10.400 --> 00:42:15.890 Let me go ahead and do, say, string s gets getString, just as before 00:42:15.890 --> 00:42:18.250 but using getString instead and ask the human for s. 00:42:18.250 --> 00:42:21.700 Then give me another string, t, just because it's alphabetically next. 00:42:21.700 --> 00:42:24.010 And I'll ask the human for t. 00:42:24.010 --> 00:42:26.860 And then I'm going to go ahead and ask this question, if s equals 00:42:26.860 --> 00:42:30.340 equals t, print same, else, print different. 00:42:30.340 --> 00:42:33.400 So now let me go ahead and make compare again. 00:42:33.400 --> 00:42:36.160 I'm going to go ahead and type in dot slash compare. 00:42:36.160 --> 00:42:37.540 We'll type in Emma. 00:42:37.540 --> 00:42:39.160 We'll then type in Rodrigo. 00:42:39.160 --> 00:42:41.230 And of course, it's going to say different. 00:42:41.230 --> 00:42:44.230 But if I instead run it again and type in Emma and all right, 00:42:44.230 --> 00:42:45.680 I'll type Emma again-- 00:42:45.680 --> 00:42:47.080 hmm, different. 00:42:47.080 --> 00:42:49.900 Maybe it's a capitalization thing? 00:42:49.900 --> 00:42:51.430 No. 00:42:51.430 --> 00:42:54.230 But why as of today, are they indeed different? 00:42:54.230 --> 00:42:56.980 Last week we kind of waved our hands and said, ah, they're arrays, 00:42:56.980 --> 00:42:57.770 you have to do some stuff. 00:42:57.770 --> 00:42:58.895 But why are they different? 00:42:58.895 --> 00:43:00.895 AUDIENCE: They're stored in different locations. 00:43:00.895 --> 00:43:03.520 DAVID J. MALAN: Exactly, they're stored in different locations. 
00:43:03.520 --> 00:43:06.400 So when you get a string with getString and call it s, and then you 00:43:06.400 --> 00:43:09.100 get another string with getString and call it t, you're 00:43:09.100 --> 00:43:10.780 getting two different chunks of memory. 00:43:10.780 --> 00:43:14.650 And yes, maybe the human has typed the same thing into the keyboard, 00:43:14.650 --> 00:43:17.220 but that doesn't necessarily mean that they're going 00:43:17.220 --> 00:43:19.000 to be stored in the exact same place. 00:43:19.000 --> 00:43:22.420 In fact, what we really have here is a picture not unlike this. 00:43:22.420 --> 00:43:26.410 If I have a variable called s-- and I'm just going to draw it as a box there 00:43:26.410 --> 00:43:28.180 --and if I have a variable called t-- 00:43:28.180 --> 00:43:31.510 I'll draw it as another box here --and I typed in Emma-- 00:43:31.510 --> 00:43:35.260 E-M-M-A --that's going to give me somewhere in memory, 00:43:35.260 --> 00:43:40.390 E-M-M-A backslash 0. 00:43:40.390 --> 00:43:43.720 And I'll draw it as an actual array, albeit a little messily. 00:43:43.720 --> 00:43:46.900 And then here, if I type EMMA again in all caps, 00:43:46.900 --> 00:43:48.700 it's going to end up-- thanks to getString 00:43:48.700 --> 00:43:50.590 --at a different location in memory. 00:43:50.590 --> 00:43:54.470 By nature of how getString works, it's going to store anything you type in it. 00:43:54.470 --> 00:43:56.403 And what's going to get stored in s and t? 00:43:56.403 --> 00:43:58.570 Well, for the sake of discussion, let's suppose that 00:43:58.570 --> 00:44:01.540 this chunk of memory with the first input-- 00:44:01.540 --> 00:44:06.250 sorry --happens to be at 0x123. 00:44:06.250 --> 00:44:10.960 And the second chunk of memory happens to be at 0x456, just by chance. 00:44:10.960 --> 00:44:13.920 Well, what am I technically storing in s? 00:44:13.920 --> 00:44:16.110 0x123. 00:44:16.110 --> 00:44:17.250 And what am I storing in t?
00:44:17.250 --> 00:44:19.530 0x456. 00:44:19.530 --> 00:44:22.080 So when you say, is s equal equal t. 00:44:22.080 --> 00:44:23.230 Is it? 00:44:23.230 --> 00:44:23.730 Well, no. 00:44:23.730 --> 00:44:27.243 You're literally comparing 123 versus 456. 00:44:27.243 --> 00:44:29.160 The computer is not going to presumptuously go 00:44:29.160 --> 00:44:32.670 to that address for you unless you somehow tell it to. 00:44:32.670 --> 00:44:35.130 Put another way, if I instead draw these boxes, 00:44:35.130 --> 00:44:39.990 not as actual numbers, what we really have-- sorry --what we really have is 00:44:39.990 --> 00:44:42.000 what we'll draw as an arrow more generally, 00:44:42.000 --> 00:44:44.220 just a pointer to that value. 00:44:44.220 --> 00:44:46.400 Who really cares where the address is? 00:44:46.400 --> 00:44:48.900 So this is why last week we kind of waved our hand and said, 00:44:48.900 --> 00:44:51.840 eh, you can't just compare two strings because you probably 00:44:51.840 --> 00:44:53.310 have to compare every character. 00:44:53.310 --> 00:44:54.750 And that was true. 00:44:54.750 --> 00:44:58.140 But what you're technically comparing is indeed 00:44:58.140 --> 00:45:02.220 the addresses stored in those two variables. 00:45:02.220 --> 00:45:06.828 Any questions then on this here? 00:45:06.828 --> 00:45:09.220 Yeah. 00:45:09.220 --> 00:45:10.290 Sure, yes. 00:45:10.290 --> 00:45:14.190 AUDIENCE: So you said earlier that the, I 00:45:14.190 --> 00:45:17.190 guess, the pointer, and the actual thing it's 00:45:17.190 --> 00:45:22.080 pointing are like kind of somewhere in the memory not in a specific-- 00:45:22.080 --> 00:45:23.330 they're just somewhere, right? 00:45:23.330 --> 00:45:24.122 DAVID J. MALAN: OK. 00:45:24.122 --> 00:45:27.100 AUDIENCE: So do you need something that points to the point-- 00:45:27.100 --> 00:45:29.250 how does the computer know where the pointer is? 00:45:29.250 --> 00:45:32.250 DAVID J.
MALAN: Oh, how does the computer know where these pointers are? 00:45:32.250 --> 00:45:33.690 So that's a really good question. 00:45:33.690 --> 00:45:35.610 And let's answer it right here. 00:45:35.610 --> 00:45:39.572 All this time when you've been calling getString to get a string, 00:45:39.572 --> 00:45:42.780 you've probably been assigning it to a variable like I have here on line six, 00:45:42.780 --> 00:45:44.130 with string s. 00:45:44.130 --> 00:45:49.440 But we know as of today that if we get rid of the CS50 library, technically, 00:45:49.440 --> 00:45:52.530 string is just synonymous with char star. 00:45:52.530 --> 00:45:58.110 And so both here and with t, do you technically have char star, right? 00:45:58.110 --> 00:46:01.200 It's just a find and replace if we get rid of that training wheel. 00:46:01.200 --> 00:46:05.310 Char star just means s is storing the address of a character. 00:46:05.310 --> 00:46:07.980 And char star t means t is storing the address of a character. 00:46:07.980 --> 00:46:14.610 Ergo, all this time since Week 1 of CS50, what type of value has getString 00:46:14.610 --> 00:46:19.030 been returning, even though we never described it as such? 00:46:19.030 --> 00:46:22.040 What must getString be returning? 00:46:22.040 --> 00:46:22.752 Yeah. 00:46:22.752 --> 00:46:24.695 AUDIENCE: The index of the first letter. 00:46:24.695 --> 00:46:27.153 DAVID J. MALAN: Not even the index per se, but rather the-- 00:46:27.153 --> 00:46:28.350 AUDIENCE: It houses the memory of that. 00:46:28.350 --> 00:46:30.860 DAVID J. MALAN: The address of the first character. 00:46:30.860 --> 00:46:33.670 So anytime you called getString, the getString code we wrote 00:46:33.670 --> 00:46:35.920 is finding in your computer's memory some free space, 00:46:35.920 --> 00:46:39.460 enough bytes to fit whatever the word was that got typed in.
00:46:39.460 --> 00:46:41.230 getString then, if we looked at its code, 00:46:41.230 --> 00:46:46.840 is designed to return the address of the first byte of that chunk of memory. 00:46:46.840 --> 00:46:49.960 So getString, this whole time, has been returning, if you will, 00:46:49.960 --> 00:46:51.700 what's called a pointer. 00:46:51.700 --> 00:46:54.970 But again, nuances that we didn't want to get into in the very first week 00:46:54.970 --> 00:46:58.040 certainly, of C programming. 00:46:58.040 --> 00:46:58.540 All right. 00:46:58.540 --> 00:47:00.520 Well, let's go ahead and make this a little more concrete. 00:47:00.520 --> 00:47:02.770 If I pull up this code, I don't have to just check 00:47:02.770 --> 00:47:05.687 if they're same or different, let me just go ahead and print them out. 00:47:05.687 --> 00:47:09.790 If I do percent p backslash n, I can literally print out s. 00:47:09.790 --> 00:47:14.590 And if I go ahead and print out the same thing for t using percent p, 00:47:14.590 --> 00:47:16.560 I can print out the value of t. 00:47:16.560 --> 00:47:18.400 So let me go ahead and make compare. 00:47:18.400 --> 00:47:19.233 Seems to compile OK. 00:47:19.233 --> 00:47:21.358 And I don't know what the addresses are in advance. 00:47:21.358 --> 00:47:23.980 But let me go ahead and type in, for instance, Emma and Emma. 00:47:23.980 --> 00:47:26.890 So even though those strings look the same notice, 00:47:26.890 --> 00:47:31.690 it's a little subtle this time, the first Emma's at 0xED76A0. 00:47:31.690 --> 00:47:39.668 The second Emma's at 0xED76E0, which is a few numbers away from the first Emma. 00:47:39.668 --> 00:47:41.710 So that just corroborates the instincts last week 00:47:41.710 --> 00:47:44.330 that we can't just compare them like that. 00:47:44.330 --> 00:47:46.300 So what are the implications then? 00:47:46.300 --> 00:47:48.100 Let's do one other example here. 00:47:48.100 --> 00:47:51.110 Let me go ahead and save this as copy dot C. 
00:47:51.110 --> 00:47:52.660 And let's try a very reasonable goal. 00:47:52.660 --> 00:47:56.230 If I want to go ahead and get the user's input and actually copy a string 00:47:56.230 --> 00:47:58.883 and capitalize the string from the user, let's see this. 00:47:58.883 --> 00:48:02.050 So let me go ahead and give myself the temporary training wheels again, just 00:48:02.050 --> 00:48:03.880 so I can get a string from the human. 00:48:03.880 --> 00:48:08.710 Let me go ahead and include standard I/O dot h and then an int main void. 00:48:08.710 --> 00:48:10.690 Let me do a simple example, the goal of which 00:48:10.690 --> 00:48:16.430 now, is to get a string from the user and capitalize a copy thereof. 00:48:16.430 --> 00:48:20.620 So I'm going to go ahead and do string s gets getString and call it s, 00:48:20.620 --> 00:48:21.640 as before. 00:48:21.640 --> 00:48:24.820 I'm going to go ahead and then do string t equals 00:48:24.820 --> 00:48:26.767 s to make a copy of the variable. 00:48:26.767 --> 00:48:28.600 And then I'm going to go ahead and say what? 00:48:28.600 --> 00:48:30.640 Let me go ahead and capitalize the copy. 00:48:30.640 --> 00:48:34.840 And to capitalize the copy, I can just change the first character 00:48:34.840 --> 00:48:38.290 in t, so t bracket 0, to what? 00:48:38.290 --> 00:48:41.316 I think we had toupper a while back. 00:48:41.316 --> 00:48:42.940 Does this seem familiar? 00:48:42.940 --> 00:48:44.473 You can call the toupper function. 00:48:44.473 --> 00:48:46.390 And the toupper function, if you don't recall, 00:48:46.390 --> 00:48:48.940 you technically have to use ctype dot h. 00:48:48.940 --> 00:48:51.040 This might be reminiscent of the second C problem 00:48:51.040 --> 00:48:54.650 set, where you might have used this in Caesar, or substitution, or the like. 00:48:54.650 --> 00:48:55.150 All right. 00:48:55.150 --> 00:48:57.525 And now, let me go ahead and print out these two strings.
00:48:57.525 --> 00:49:00.640 Let me go ahead and print out s. 00:49:00.640 --> 00:49:04.390 And let me go ahead and print out t. 00:49:04.390 --> 00:49:08.470 So again, all I've done in this program is get a string from the user, 00:49:08.470 --> 00:49:13.060 copy that string, capitalize the copy called t. 00:49:13.060 --> 00:49:15.260 And let's just print out the end results. 00:49:15.260 --> 00:49:17.060 So let me go ahead and save the file. 00:49:17.060 --> 00:49:19.360 Let me go ahead and make copy. 00:49:19.360 --> 00:49:20.410 Seems to compile OK. 00:49:20.410 --> 00:49:21.868 Let me go ahead and run copy. 00:49:21.868 --> 00:49:24.160 And let me go ahead and type in emma, in all lowercase, 00:49:24.160 --> 00:49:28.746 deliberately, because I want to see that t is capitalized but not s. 00:49:28.746 --> 00:49:30.480 Hmm. 00:49:30.480 --> 00:49:34.050 But somehow they're both capitalized. 00:49:34.050 --> 00:49:39.060 Notice, that emma in all lowercase ended up being both capitalized in s 00:49:39.060 --> 00:49:42.660 and capitalized in t per the two lines of output. 00:49:42.660 --> 00:49:43.270 That's a bug? 00:49:43.270 --> 00:49:43.770 Right? 00:49:43.770 --> 00:49:46.020 I only capitalized t, how did I accidentally 00:49:46.020 --> 00:49:48.300 also capitalize s do you think? 00:49:50.873 --> 00:49:51.415 Any thoughts? 00:49:54.940 --> 00:49:57.743 Doesn't matter if I avert the lights, I still can't see any hands. 00:49:57.743 --> 00:49:58.910 OK, how about here in front? 00:49:58.910 --> 00:49:59.590 Yeah. 00:49:59.590 --> 00:50:05.280 AUDIENCE: So when you say t equal s you have to [INAUDIBLE] 00:50:05.280 --> 00:50:06.280 DAVID J. MALAN: Exactly. 00:50:06.280 --> 00:50:11.350 When I say t equals s on this line, I am getting a second variable called t. 00:50:11.350 --> 00:50:12.790 And I am copying s. 00:50:12.790 --> 00:50:15.070 But I'm copying s literally. 00:50:15.070 --> 00:50:17.380 s as of today, is an address. 
00:50:17.380 --> 00:50:22.840 After all, string is the same thing as char star for both s and t. 00:50:22.840 --> 00:50:25.760 And so technically, all I'm doing is copying an address. 00:50:25.760 --> 00:50:28.810 So if I go back to my picture from before, this time, 00:50:28.810 --> 00:50:33.670 if I've gone ahead and typed in an array of emma, with all lowercase-- 00:50:33.670 --> 00:50:39.250 e-m-m-a --and then a backslash 0, somewhere in memory using getString, 00:50:39.250 --> 00:50:42.685 and I've gone ahead initially and stored that in a variable called s-- 00:50:42.685 --> 00:50:44.560 and I don't care about the addresses anymore. 00:50:44.560 --> 00:50:47.170 I'm just going to use arrows now to depict it graphically. 00:50:47.170 --> 00:50:53.590 When I created a second variable called t and I set t equal to s, 00:50:53.590 --> 00:50:57.220 that's like literally copying the arrow that's in s 00:50:57.220 --> 00:51:01.750 and storing it in t, which means t is also pointing at the same thing. 00:51:01.750 --> 00:51:04.990 Because again, if I didn't do this hand wavy arrow notation, 00:51:04.990 --> 00:51:07.640 I literally wrote out 0x123. 00:51:07.640 --> 00:51:11.830 I would have just written out 0x123 in both s and t. 00:51:11.830 --> 00:51:15.220 So when, in my code, I go ahead and say, you 00:51:15.220 --> 00:51:19.780 know what, go to the first character in t and then go ahead and uppercase it. 00:51:19.780 --> 00:51:22.390 Guess what the first character in t is? 00:51:22.390 --> 00:51:23.380 Well, it's this e. 00:51:23.380 --> 00:51:28.090 But guess what the first character in s is, literally that same e. 00:51:28.090 --> 00:51:30.820 So this does not suffice to copy a string 00:51:30.820 --> 00:51:35.480 by just saying t equals s, as it has up until now with every other variable. 00:51:35.480 --> 00:51:38.230 Any time you've needed a temporary variable or a copy of something 00:51:38.230 --> 00:51:39.040 this worked. 
00:51:39.040 --> 00:51:42.070 Intuitively, what do we have to do probably instead 00:51:42.070 --> 00:51:45.890 to truly copy Emma into two different places in memory? 00:51:45.890 --> 00:51:46.390 Yeah. 00:51:46.390 --> 00:51:50.730 AUDIENCE: Probably create a char or create a variable exactly the same size 00:51:50.730 --> 00:51:52.535 and copy each character individually. 00:51:52.535 --> 00:51:53.410 DAVID J. MALAN: Nice. 00:51:53.410 --> 00:51:55.452 So maybe we should give ourselves a variable that 00:51:55.452 --> 00:51:58.840 has more memory, the same amount of memory being stored 00:51:58.840 --> 00:52:02.530 for the original Emma, and then copy the characters from s 00:52:02.530 --> 00:52:05.470 into the space we've allocated for t. 00:52:05.470 --> 00:52:06.890 And so we can actually do this. 00:52:06.890 --> 00:52:10.210 Let me go ahead and get rid of all but that first line, where 00:52:10.210 --> 00:52:11.930 I've gotten s as before. 00:52:11.930 --> 00:52:15.107 And I'm going to go ahead and do this, I'm going to say that t is a string-- 00:52:15.107 --> 00:52:17.440 but you know, we don't need that training wheel anymore. 00:52:17.440 --> 00:52:20.020 String, char star, even though it looks uglier. 00:52:20.020 --> 00:52:22.450 Let me go ahead and allocate more memory for myself. 00:52:22.450 --> 00:52:23.292 How do I do that? 00:52:23.292 --> 00:52:26.500 Well, it turns out-- we've not used this before --there's a C function called 00:52:26.500 --> 00:52:28.660 malloc, for memory allocate. 00:52:28.660 --> 00:52:32.320 And all it asks as input is how many bytes you want. 00:52:32.320 --> 00:52:36.130 So how many bytes do I want for Emma to store her name? 00:52:39.422 --> 00:52:40.297 AUDIENCE: [INAUDIBLE] 00:52:40.297 --> 00:52:41.905 DAVID J. MALAN: I heard 4, 5. 00:52:41.905 --> 00:52:42.460 Why, 5? 00:52:42.460 --> 00:52:43.850 AUDIENCE: [INAUDIBLE] 00:52:43.850 --> 00:52:46.267 DAVID J.
MALAN: So we need the null terminating character, 00:52:46.267 --> 00:52:47.660 e-m-m-a and then backslash 0. 00:52:47.660 --> 00:52:48.532 So that's 5. 00:52:48.532 --> 00:52:50.240 So I could literally hard code this here. 00:52:50.240 --> 00:52:52.580 Of course, this feels a little fragile because I'm 00:52:52.580 --> 00:52:54.572 asking for any string via getString. 00:52:54.572 --> 00:52:56.030 I don't know it's going to be Emma. 00:52:56.030 --> 00:52:58.238 So you know what, let me go ahead and ask a question. 00:52:58.238 --> 00:53:01.940 Whatever the length is of the human's input in s, 00:53:01.940 --> 00:53:04.970 go ahead and add 1 to it for the null character 00:53:04.970 --> 00:53:06.580 and then allocate that many bytes. 00:53:06.580 --> 00:53:08.570 So now my program's more dynamic. 00:53:08.570 --> 00:53:11.520 And once I have this, well, how can I go ahead and copy this? 00:53:11.520 --> 00:53:13.170 Well, let me just do old school loop. 00:53:13.170 --> 00:53:18.980 So for int i gets 0, i is less than the string length of s, i plus plus-- 00:53:18.980 --> 00:53:22.610 so this is just a standard for loop iterating over a string 00:53:22.610 --> 00:53:28.880 --and I think I can just do t bracket i equals s bracket i in order 00:53:28.880 --> 00:53:31.520 to copy the two strings. 00:53:31.520 --> 00:53:36.200 There's a subtle bug and a subtle inefficiency though. 00:53:36.200 --> 00:53:42.050 Anyone want to critique how I've gone about copying s into t? 00:53:42.050 --> 00:53:42.656 Yeah. 00:53:42.656 --> 00:53:44.790 AUDIENCE: [INAUDIBLE] getString [INAUDIBLE]. 00:53:44.790 --> 00:53:45.090 DAVID J. MALAN: Yeah. 00:53:45.090 --> 00:53:45.840 This was inefficient. 00:53:45.840 --> 00:53:47.730 We said a couple of weeks ago this is bad design 00:53:47.730 --> 00:53:49.860 to just keep asking the question, what's the length of s? 00:53:49.860 --> 00:53:51.000 What's the length of s?
00:53:51.000 --> 00:53:54.070 So remember that we had a little optimization a couple of weeks ago. 00:53:54.070 --> 00:53:56.790 Let's just declare n to equal the string length of s 00:53:56.790 --> 00:53:59.460 and then do a condition of i is less than n. 00:53:59.460 --> 00:54:01.000 So we've improved the design there. 00:54:01.000 --> 00:54:02.208 It's a little more efficient. 00:54:02.208 --> 00:54:03.330 We're wasting less time. 00:54:03.330 --> 00:54:06.745 There's still a subtle bug here. 00:54:06.745 --> 00:54:07.620 How many byte-- yeah. 00:54:07.620 --> 00:54:09.960 AUDIENCE: Aren't you not copying the null terminator? 00:54:09.960 --> 00:54:12.127 DAVID J. MALAN: I'm not copying the null terminator. 00:54:12.127 --> 00:54:15.840 So every other time we've iterated over a string, this has been correct. 00:54:15.840 --> 00:54:20.850 Iterate up to the length but not through the length of that string. 00:54:20.850 --> 00:54:23.880 But I technically do want to go one more step 00:54:23.880 --> 00:54:26.790 this time. 00:54:26.790 --> 00:54:30.810 Because I also want to copy not just e-m-m-a, which is str length 4-- 00:54:30.810 --> 00:54:35.620 e-m-m-a is 4 --I also want to do it a fifth time for the null character. 00:54:35.620 --> 00:54:37.860 So in this case, I'm deliberately going one step 00:54:37.860 --> 00:54:42.090 past where I usually want to go to make sure I copy 5 bytes for Emma, 00:54:42.090 --> 00:54:42.910 not just 4. 00:54:42.910 --> 00:54:43.410 All right. 00:54:43.410 --> 00:54:45.035 Let's go ahead now and capitalize Emma. 00:54:45.035 --> 00:54:51.060 So t bracket 0 gets toupper of Emma's first character in the copy. 00:54:51.060 --> 00:54:54.900 And now let's go ahead and print out both strings s 00:54:54.900 --> 00:54:59.550 and t, just as before, with percent s of t. 00:54:59.550 --> 00:55:02.278 And let me make one change, I use strlen now.
00:55:02.278 --> 00:55:04.570 So I know I'm going to get an error if I don't do this. 00:55:04.570 --> 00:55:07.760 I need to use string dot h-- recall --anytime you use string length. 00:55:07.760 --> 00:55:09.600 So I'm going to go proactively add that. 00:55:09.600 --> 00:55:10.710 So what's different? 00:55:10.710 --> 00:55:12.340 This line is the same as before. 00:55:12.340 --> 00:55:14.110 I'm getting a string from the user. 00:55:14.110 --> 00:55:15.790 This line is the same as before. 00:55:15.790 --> 00:55:17.615 I'm capitalizing the first letter. 00:55:17.615 --> 00:55:18.990 And these two lines are the same. 00:55:18.990 --> 00:55:20.610 I'm just printing out s and t. 00:55:20.610 --> 00:55:24.960 So the new idea here is, with my malloc, am I allocating as many bytes 00:55:24.960 --> 00:55:28.020 as I need to store a copy of Emma, and then with this for loop 00:55:28.020 --> 00:55:31.440 am I actually doing the actual copy? 00:55:31.440 --> 00:55:34.350 Let me go ahead and do make copy again. 00:55:34.350 --> 00:55:35.190 Seems to run OK. 00:55:35.190 --> 00:55:37.080 Run dot slash copy. 00:55:37.080 --> 00:55:39.240 Type e-m-m-a in all lowercase. 00:55:39.240 --> 00:55:44.570 And voila, now I've capitalized t but not s. 00:55:44.570 --> 00:55:45.070 Yeah? 00:55:45.070 --> 00:55:49.920 AUDIENCE: When you use malloc, it's just allocating number of bytes, 00:55:49.920 --> 00:55:51.377 it doesn't matter where? 00:55:51.377 --> 00:55:53.960 DAVID J. MALAN: It is just allocating that many bytes for you. 00:55:53.960 --> 00:55:55.200 It does not matter where. 00:55:55.200 --> 00:55:58.950 You indeed should not care where it is because you're just 00:55:58.950 --> 00:56:01.180 being handed the address and using C code, 00:56:01.180 --> 00:56:04.000 can you just go there as you want. 00:56:04.000 --> 00:56:04.500 All right. 00:56:04.500 --> 00:56:05.710 Let's clean this up too. 00:56:05.710 --> 00:56:07.930 Surely, people have been copying strings for years.
00:56:07.930 --> 00:56:10.470 And in fact, we don't need to do this for loop ourself. 00:56:10.470 --> 00:56:14.160 It turns out we can simplify this code a little bit by enhancing this 00:56:14.160 --> 00:56:14.920 as follows. 00:56:14.920 --> 00:56:17.700 It turns out, if you look in the manual page for strings, 00:56:17.700 --> 00:56:20.070 you can actually use something called strcopy-- 00:56:20.070 --> 00:56:22.350 no-- without any vowels. 00:56:22.350 --> 00:56:25.980 And you can copy into t, the contents of s. 00:56:25.980 --> 00:56:29.220 strcpy is a function written a long time ago by some other human. 00:56:29.220 --> 00:56:32.850 And they went ahead and implemented, probably, that loop for us. 00:56:32.850 --> 00:56:35.670 And it tightens up our code here a little bit more. 00:56:35.670 --> 00:56:36.570 AUDIENCE: Professor? 00:56:36.570 --> 00:56:37.445 DAVID J. MALAN: Yeah. 00:56:37.445 --> 00:56:41.338 AUDIENCE: What if I forgot to copy in the null character at the end? 00:56:41.338 --> 00:56:42.880 DAVID J. MALAN: Really good question. 00:56:42.880 --> 00:56:47.160 What if you forgot to copy in the null character at the end? 00:56:47.160 --> 00:56:49.140 It is unclear what would happen. 00:56:49.140 --> 00:56:52.770 If there just happened to be some bits in that location in memory 00:56:52.770 --> 00:56:55.260 from earlier-- from some other part of your program 00:56:55.260 --> 00:56:57.450 --and you try printing out s and printing out t, 00:56:57.450 --> 00:56:59.880 you might print out many more characters than you actually 00:56:59.880 --> 00:57:03.742 intended-- if there's no backslash 0 actually there. 00:57:03.742 --> 00:57:04.950 We'll see this more and more. 00:57:04.950 --> 00:57:07.380 Anytime you don't initialize the value of a variable, 00:57:07.380 --> 00:57:09.600 it's what's called a garbage value, which means 00:57:09.600 --> 00:57:11.550 who knows what 0s and 1s are there. 
00:57:11.550 --> 00:57:13.260 You might get lucky and it's all 0s. 00:57:13.260 --> 00:57:17.950 But most likely it's going to print some garbage value instead. 00:57:17.950 --> 00:57:18.450 All right. 00:57:18.450 --> 00:57:20.843 Any questions on this? 00:57:20.843 --> 00:57:21.809 Yeah. 00:57:21.809 --> 00:57:25.060 AUDIENCE: Is the string length function only in the CS50 library? 00:57:25.060 --> 00:57:26.060 DAVID J. MALAN: Is the-- 00:57:26.060 --> 00:57:26.685 which function? 00:57:26.685 --> 00:57:27.840 AUDIENCE: String length. 00:57:27.840 --> 00:57:30.132 DAVID J. MALAN: Oh, strlen, no, that's in string dot h. 00:57:30.132 --> 00:57:31.966 That is a standard C thing. 00:57:31.966 --> 00:57:32.918 AUDIENCE: OK. 00:57:32.918 --> 00:57:39.097 If string length is a standard function but strings are not-- 00:57:39.097 --> 00:57:41.180 DAVID J. MALAN: So what's the dichotomy here then? 00:57:41.180 --> 00:57:43.970 If strings don't exist-- 00:57:43.970 --> 00:57:45.320 as I've noted multiple times. 00:57:45.320 --> 00:57:49.010 And yet, there's functions like strcpy and strlen --what's going on? 00:57:49.010 --> 00:57:50.990 C calls them char stars. 00:57:50.990 --> 00:57:52.970 It is C that does not call them strings. 00:57:52.970 --> 00:57:56.750 We, CS50, and the world in general, calls addresses 00:57:56.750 --> 00:57:59.690 of sequences of characters, strings. 00:57:59.690 --> 00:58:02.450 So the only training wheel here, really is the semantics. 00:58:02.450 --> 00:58:06.620 We gave you a data type called string so that in the first week of C and CS50, 00:58:06.620 --> 00:58:09.830 you don't have to see or type char star, which would arguably 00:58:09.830 --> 00:58:11.720 be a lot more cryptic so early on. 00:58:11.720 --> 00:58:14.390 It's arguably a bit cryptic today too. 00:58:14.390 --> 00:58:15.950 Other questions? 00:58:15.950 --> 00:58:16.991 All right, yeah.
00:58:16.991 --> 00:58:21.027 AUDIENCE: So is char star ID type [INAUDIBLE] 00:58:21.027 --> 00:58:21.860 DAVID J. MALAN: Is-- 00:58:21.860 --> 00:58:23.090 say that once more. 00:58:23.090 --> 00:58:26.498 AUDIENCE: Char star ID type [INAUDIBLE]. 00:58:26.498 --> 00:58:29.540 DAVID J. MALAN: Not all of them, but any of them that take a string, yes. 00:58:29.540 --> 00:58:33.890 In fact, any time you have seen us or TF in CS50 say string, 00:58:33.890 --> 00:58:37.730 you can literally, starting today, change that expression to char star 00:58:37.730 --> 00:58:39.966 and it will be one and the same. 00:58:39.966 --> 00:58:40.500 Phew. 00:58:40.500 --> 00:58:41.000 OK. 00:58:41.000 --> 00:58:41.625 That was a lot. 00:58:41.625 --> 00:58:44.530 Let's take our five minute break here with cookies outside. 00:58:44.530 --> 00:58:46.550 All right. 00:58:46.550 --> 00:58:48.560 So we are back. 00:58:48.560 --> 00:58:50.870 That was a lot. 00:58:50.870 --> 00:58:54.650 Let me draw our attention to what the newest feature was just 00:58:54.650 --> 00:58:58.340 a moment ago, this notion of malloc, memory allocation. 00:58:58.340 --> 00:59:02.240 So recall that getString I claim as of today, all this time, 00:59:02.240 --> 00:59:05.030 it's just returning to you the address of the string 00:59:05.030 --> 00:59:06.980 that was gotten from the human. 00:59:06.980 --> 00:59:09.950 malloc, similarly, has a return value. 00:59:09.950 --> 00:59:13.490 And when you ask malloc for this many bytes-- maybe it's five, for emma, 00:59:13.490 --> 00:59:16.580 plus the null terminator, malloc's purpose in life 00:59:16.580 --> 00:59:21.000 is to return to you the address of the first byte of that memory as well. 00:59:21.000 --> 00:59:24.860 So memory alloc means, go get me a chunk of memory somewhere, 00:59:24.860 --> 00:59:27.410 hand me back a pointer there too. 
00:59:27.410 --> 00:59:29.990 And the onus is on me to remember that address, 00:59:29.990 --> 00:59:32.750 as I'm doing here, by storing it in t. 00:59:32.750 --> 00:59:35.480 But it turns out, now that we're taking the training wheels off, 00:59:35.480 --> 00:59:38.210 unfortunately, we have to kind of do a bit more work ourselves. 00:59:38.210 --> 00:59:41.390 And there's actually a latent bug in this program. 00:59:41.390 --> 00:59:45.470 It turns out that I am malloc-ing memory with this 00:59:45.470 --> 00:59:47.150 but I'm never actually freeing it. 00:59:47.150 --> 00:59:50.900 The opposite of malloc is a function called free, whose purpose in life 00:59:50.900 --> 00:59:54.740 is to hand back the memory that you asked for so that you 00:59:54.740 --> 00:59:57.828 have plenty of memory available for other parts of your program 00:59:57.828 --> 00:59:58.370 and so forth. 00:59:58.370 --> 01:00:01.435 And long story short, if you've ever-- on your Mac or PC 01:00:01.435 --> 01:00:03.560 --been running a program that maybe is a little bit 01:00:03.560 --> 01:00:06.500 buggy --you might notice your computer is getting slower, and slower, 01:00:06.500 --> 01:00:08.540 or maybe it even runs out of memory explicitly, 01:00:08.540 --> 01:00:10.670 per some error message --that might be quite 01:00:10.670 --> 01:00:14.960 simply, that the programmer of that program kept using malloc, 01:00:14.960 --> 01:00:18.600 and malloc, and malloc to grow, and grow, and grow their use of memory, 01:00:18.600 --> 01:00:21.080 but they never got around to freeing any of that memory. 01:00:21.080 --> 01:00:23.210 So programs can run out of memory. 01:00:23.210 --> 01:00:24.780 Your computer can run out of memory. 01:00:24.780 --> 01:00:28.670 So it's good practice, therefore, to free any memory you're not using. 01:00:28.670 --> 01:00:30.340 However, how do you find this mistake? 01:00:30.340 --> 01:00:33.110 So we've got one final debugging tool for you.
01:00:33.110 --> 01:00:35.720 This one's not CS50 specific like debug50. 01:00:35.720 --> 01:00:37.130 This one is called Valgrind. 01:00:37.130 --> 01:00:41.178 Unfortunately, it's not the easiest thing to understand at first glance. 01:00:41.178 --> 01:00:42.720 So I'm going to go ahead and do this. 01:00:42.720 --> 01:00:48.820 I'm going to run Valgrind on this program, dot slash copy, and hit Enter. 01:00:48.820 --> 01:00:49.320 And unfort-- 01:00:49.320 --> 01:00:50.195 AUDIENCE: [INAUDIBLE] 01:00:50.195 --> 01:00:51.320 [CHUCKLE] 01:00:51.320 --> 01:00:52.022 [COUGH] 01:00:52.022 --> 01:00:52.980 DAVID J. MALAN: Gotcha. 01:00:52.980 --> 01:00:53.480 OK. 01:00:53.480 --> 01:00:56.420 I'm going to go ahead and-- 01:00:56.420 --> 01:00:57.450 there we go. 01:00:57.450 --> 01:00:58.370 AUDIENCE: [INAUDIBLE] 01:00:58.370 --> 01:01:00.780 DAVID J. MALAN: So what you missed was a very scary message. 01:01:00.780 --> 01:01:03.680 So I'm going to go ahead and run Valgrind on dot slash copy. 01:01:03.680 --> 01:01:06.990 We see this esoteric output up top and then my prompt for s-- 01:01:06.990 --> 01:01:08.240 because it's the same program. 01:01:08.240 --> 01:01:11.448 It's prompting me for a string --so I'm going to give it emma, all lowercase, 01:01:11.448 --> 01:01:12.200 and enter. 01:01:12.200 --> 01:01:16.220 And you'll notice now, that there's some summary going on here 01:01:16.220 --> 01:01:17.810 but also some mention of error. 01:01:17.810 --> 01:01:21.860 So heap summary-- we'll come back to that in a bit --5 bytes in 1 blocks 01:01:21.860 --> 01:01:24.950 are definitely lost in loss record 1 of 2. 01:01:24.950 --> 01:01:27.767 Leak summary, I've got 5 bytes leaking in 1 blocks. 01:01:27.767 --> 01:01:30.350 I mean, this is one of these programs in Linux-- the operating 01:01:30.350 --> 01:01:34.100 system that we use, that's quite common in industry too --I mean, my god.
01:01:34.100 --> 01:01:37.625 There's so-- there's so many more characters on the screen than 01:01:37.625 --> 01:01:39.000 are actually enlightening for me. 01:01:39.000 --> 01:01:41.510 Let's see if we can focus our attention on what matters. 01:01:41.510 --> 01:01:43.220 Memory leaking, bad. 01:01:43.220 --> 01:01:47.060 So how do we go about chasing down where memory is leaking? 01:01:47.060 --> 01:01:49.740 Well, as before, we can use help50. 01:01:49.740 --> 01:01:52.670 And in fact, help50 will analyze the output of Valgrind-- it's still 01:01:52.670 --> 01:01:53.690 going to prompt me for a string. 01:01:53.690 --> 01:01:56.220 So I'm going to again, type in emma --it's going to look at that. 01:01:56.220 --> 01:01:57.095 It's going to ask for help. 01:01:57.095 --> 01:02:01.460 And voila, highlighted in yellow, is a message that we, help50, recognize. 01:02:01.460 --> 01:02:05.540 And notice our advice is, looks like your program leaked 5 bytes of memory. 01:02:05.540 --> 01:02:08.510 Did you forget to free memory that you allocated via malloc? 01:02:08.510 --> 01:02:11.443 Take a closer look at line 10 of copy dot C. 01:02:11.443 --> 01:02:14.360 Now once you've done this a couple of times and made the same mistake, 01:02:14.360 --> 01:02:18.350 you can probably scroll up and glean for yourself where the error is. 01:02:18.350 --> 01:02:21.560 We're not revealing any more information than is right in front of you. 01:02:21.560 --> 01:02:26.330 And in fact, you can see here, ah, in main on copy dot C, line 10, 01:02:26.330 --> 01:02:30.290 there's some kind of 5 bytes in 1 blocks are definitely lost. 01:02:30.290 --> 01:02:33.690 So there's a lot of words there but it does draw attention to the right place. 01:02:33.690 --> 01:02:36.890 So let me go ahead and scroll down, focus on line 10. 01:02:36.890 --> 01:02:39.560 And indeed, line 10 is where I allocated the memory. 01:02:39.560 --> 01:02:42.680 So it turns out the solution for this is quite simple.
01:02:42.680 --> 01:02:45.770 Down here, I'm just going to go ahead and free 01:02:45.770 --> 01:02:51.170 t, the address of the chunk of memory that malloc returned to me. 01:02:51.170 --> 01:02:54.770 So I'm undoing the effects of allocating memory by de-allocating memory. 01:02:54.770 --> 01:02:56.920 So now let me go ahead and run copy. 01:02:56.920 --> 01:03:00.050 And if I run copy, it's not going to seem to run any differently. 01:03:00.050 --> 01:03:01.700 It's still going to work correctly. 01:03:01.700 --> 01:03:06.200 But now if I analyze it for mistakes with Valgrind, so Valgrind of dot 01:03:06.200 --> 01:03:07.510 slash copy-- 01:03:07.510 --> 01:03:10.520 I'm going to again type in emma in all lowercase 01:03:10.520 --> 01:03:13.430 and I cross my fingers --that indeed now, leak 01:03:13.430 --> 01:03:15.975 summary, 0 bytes in 0 blocks. 01:03:15.975 --> 01:03:19.190 So unfortunately, even when all is well, it still spits out a mouthful. 01:03:19.190 --> 01:03:23.010 But now I see no mention of blocks that are actually 01:03:23.010 --> 01:03:25.477 leaked, at least in the top part here. 01:03:25.477 --> 01:03:27.810 And we'll see more of this over the next couple of weeks 01:03:27.810 --> 01:03:30.750 as we use it to chase down more complicated bugs. 01:03:30.750 --> 01:03:33.480 But it's just another tool in the toolkit that allows 01:03:33.480 --> 01:03:35.820 us to detect these kinds of errors. 01:03:35.820 --> 01:03:37.320 Let me try one other thing actually. 01:03:37.320 --> 01:03:39.720 This is a program that I wrote in advance. 01:03:39.720 --> 01:03:43.320 This one is called memory dot C. And as always, these are all on the course's 01:03:43.320 --> 01:03:45.220 website if you'd like to tinker after. 01:03:45.220 --> 01:03:46.950 And it's a little pointless. 01:03:46.950 --> 01:03:49.030 It's just meant for demonstration purposes. 01:03:49.030 --> 01:03:50.160 So here is a program.
01:03:50.160 --> 01:03:54.642 And it's copied from this online manual for Valgrind, the tool I just used. 01:03:54.642 --> 01:03:55.850 So let's see what's going on. 01:03:55.850 --> 01:03:57.900 Here I have main, at the bottom of my code. 01:03:57.900 --> 01:03:58.410 I copied it. 01:03:58.410 --> 01:03:59.452 I didn't use a prototype. 01:03:59.452 --> 01:04:00.750 I just copied what they did. 01:04:00.750 --> 01:04:04.980 And see here, it calls a function called f and then returns 0. 01:04:04.980 --> 01:04:07.140 Well what does f do? f is this random function 01:04:07.140 --> 01:04:10.410 up here that takes no inputs per the void. 01:04:10.410 --> 01:04:14.640 And in English, how would you describe what's happening in line 7 now-- 01:04:14.640 --> 01:04:18.420 that we've introduced malloc and stars-- 01:04:18.420 --> 01:04:19.840 or pointers? 01:04:19.840 --> 01:04:20.590 What's this doing? 01:04:20.590 --> 01:04:21.233 Yeah. 01:04:21.233 --> 01:04:26.695 AUDIENCE: It's allocating enough memory in [INAUDIBLE] for [INAUDIBLE].. 01:04:26.695 --> 01:04:27.570 DAVID J. MALAN: Good. 01:04:27.570 --> 01:04:31.080 Allocate enough memory for 10 integers-- and then 01:04:31.080 --> 01:04:33.210 let me add-- elaborate on your words --and then 01:04:33.210 --> 01:04:37.860 store the address of that chunk of memory 01:04:37.860 --> 01:04:40.020 in a pointer called x, if you will. 01:04:40.020 --> 01:04:41.102 So sizeof is new. 01:04:41.102 --> 01:04:42.560 But it literally does what it says. 01:04:42.560 --> 01:04:44.400 If you say sizeof open paren, close paren, 01:04:44.400 --> 01:04:47.563 and then the name of a data type, it will tell you that an int is 4 bytes. 01:04:47.563 --> 01:04:49.230 It will tell you that a long is 8 bytes. 01:04:49.230 --> 01:04:51.060 It will tell you that a char is one byte. 01:04:51.060 --> 01:04:53.070 It's just a dynamic way of avoiding having 01:04:53.070 --> 01:04:54.940 to memorize those kinds of things. 
01:04:54.940 --> 01:04:56.910 So this just means give me 10 times the size 01:04:56.910 --> 01:04:58.830 of an int, which happens to be 4 bytes. 01:04:58.830 --> 01:05:02.310 So that means give me 10 times 4, or 40 bytes of memory. 01:05:02.310 --> 01:05:06.090 That's effectively an array of memory that I can store integers in. 01:05:06.090 --> 01:05:09.420 And malloc, per its definition, is going to return to me the address 01:05:09.420 --> 01:05:12.030 of the first byte of that memory. 01:05:12.030 --> 01:05:20.170 What is now scary about line 8, relatively speaking? 01:05:20.170 --> 01:05:25.240 What might worry you with line 8, which is buggy, unfortunately? 01:05:25.240 --> 01:05:26.436 Yeah. 01:05:26.436 --> 01:05:30.970 AUDIENCE: [INAUDIBLE] 01:05:30.970 --> 01:05:31.970 DAVID J. MALAN: Exactly. 01:05:31.970 --> 01:05:35.490 I'm doing x bracket 10 and just arbitrarily storing the number 0. 01:05:35.490 --> 01:05:35.990 Why? 01:05:35.990 --> 01:05:36.980 Just because. 01:05:36.980 --> 01:05:38.340 But 10 does not exist. 01:05:38.340 --> 01:05:38.840 Right? 01:05:38.840 --> 01:05:44.390 If I have 10 int, it's bracket 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, not bracket 10. 01:05:44.390 --> 01:05:47.870 So this is an example of overflowing a buffer, so to speak. 01:05:47.870 --> 01:05:50.153 Anytime you're talking about memory, any time 01:05:50.153 --> 01:05:52.820 you're talking about an array of memory-- which this effectively 01:05:52.820 --> 01:05:55.280 is, 10 integers, room for 10 integers back to back 01:05:55.280 --> 01:06:01.040 to back --if you go one step too far, that's what's called a buffer overflow, 01:06:01.040 --> 01:06:03.150 whereby the buffer is the array. 01:06:03.150 --> 01:06:05.150 And in fact, this would make it even more clear. 01:06:05.150 --> 01:06:08.510 Suppose I tried to go there, bracket 10,000. 01:06:08.510 --> 01:06:11.640 That is definitely not among the bytes of memory I allocated. 
01:06:11.640 --> 01:06:14.240 That's definitely going beyond the boundaries of my array. 01:06:14.240 --> 01:06:18.590 But so is it true that bracket 10 is one step too far. 01:06:18.590 --> 01:06:20.880 So what's nice about Valgrind is this. 01:06:20.880 --> 01:06:24.910 Let me go ahead and rerun Valgrind after compiling this memory program-- 01:06:24.910 --> 01:06:27.050 whoops --in my source directory. 01:06:27.050 --> 01:06:29.390 Let me go ahead and make memory. 01:06:29.390 --> 01:06:29.890 All right. 01:06:29.890 --> 01:06:31.280 It compiled OK. 01:06:31.280 --> 01:06:34.190 Valgrind dot slash memory-- 01:06:34.190 --> 01:06:37.040 and unfortunately, we're going to see some crazy arcane error 01:06:37.040 --> 01:06:38.000 messages for a moment. 01:06:38.000 --> 01:06:39.920 But let's see what it says. 01:06:39.920 --> 01:06:43.040 Notice here, invalid write of size 4-- 01:06:43.040 --> 01:06:46.968 that sounds bad --and 40 bytes in one blocks are-- 01:06:46.968 --> 01:06:49.260 OK, they didn't really add an if condition in Valgrind. 01:06:49.260 --> 01:06:50.990 --40 bytes in 1 blocks-- 01:06:50.990 --> 01:06:52.910 plural --are definitely lost. 01:06:52.910 --> 01:06:54.830 So let's fix the second of those first. 01:06:54.830 --> 01:06:57.615 Why am I leaking 40 bytes exactly? 01:06:57.615 --> 01:06:58.490 AUDIENCE: [INAUDIBLE] 01:06:58.490 --> 01:07:00.032 DAVID J. MALAN: I'm never freeing it. 01:07:00.032 --> 01:07:03.770 So I think I can get away with just doing this here, just free the memory 01:07:03.770 --> 01:07:06.560 after I'm done using it-- even though I'm not really 01:07:06.560 --> 01:07:08.938 using it for anything purposeful here. 01:07:08.938 --> 01:07:09.980 So let me try this again. 01:07:09.980 --> 01:07:16.040 Make memory, now let me do Valgrind dot slash memory. 01:07:16.040 --> 01:07:18.200 And-- OK, better. 01:07:18.200 --> 01:07:20.450 I don't see 40 bytes lost anymore. 01:07:20.450 --> 01:07:21.080 So that's good. 
01:07:21.080 --> 01:07:22.760 But I do still have this issue. 01:07:22.760 --> 01:07:25.850 But here's where it's sometimes useful to understand the various data 01:07:25.850 --> 01:07:27.230 types and their sizes. 01:07:27.230 --> 01:07:29.300 Invalid write of size 4. 01:07:29.300 --> 01:07:32.340 Writing in a program just means changing a value. 01:07:32.340 --> 01:07:34.160 And it mentioned line 8 here. 01:07:34.160 --> 01:07:37.580 In what sense is this an invalid write of size 4? 01:07:37.580 --> 01:07:39.510 Well, how big is an int? 01:07:39.510 --> 01:07:40.520 Four bytes. 01:07:40.520 --> 01:07:42.530 You're trying to change it arbitrarily to 0. 01:07:42.530 --> 01:07:44.900 But I could have made that 50 or any other number. 01:07:44.900 --> 01:07:48.080 But I'm trying to touch an int that should not 01:07:48.080 --> 01:07:51.470 be within the memory I have allocated for myself. 01:07:51.470 --> 01:07:57.080 I asked for 40 bytes, or 10 ints, but again, because arrays are zero indexed, 01:07:57.080 --> 01:07:59.510 this is like going one beyond the boundary. 01:07:59.510 --> 01:08:03.750 So let me fix this and just arbitrarily say, let's go touch that part of it. 01:08:03.750 --> 01:08:06.080 Let me go here and do make memory. 01:08:06.080 --> 01:08:10.250 Let me go ahead and do Valgrind dot slash memory. 01:08:10.250 --> 01:08:16.520 And now, arcane output aside, notice that that error message went away too. 01:08:16.520 --> 01:08:19.370 So this will be helpful over the coming couple of weeks 01:08:19.370 --> 01:08:22.890 as we continue to use C to implement a number of programs 01:08:22.890 --> 01:08:24.390 that now start to manipulate memory. 
01:08:24.390 --> 01:08:26.479 It's just a tool that helps you spot errors 01:08:26.479 --> 01:08:29.240 that certainly, your TF might otherwise, or that 01:08:29.240 --> 01:08:33.140 might be causing your program to crash, or to freeze, or to segfault-- 01:08:33.140 --> 01:08:35.899 if you've seen that yourselves before. 01:08:35.899 --> 01:08:36.399 All right. 01:08:36.399 --> 01:08:37.410 So that's just a tool. 01:08:37.410 --> 01:08:41.899 Let's go ahead and transition now to some actual use cases here. 01:08:41.899 --> 01:08:46.350 Recall from last week that it was pretty useful to be able to swap values. 01:08:46.350 --> 01:08:46.850 Right? 01:08:46.850 --> 01:08:50.758 With bubble sort, with selection sort, we needed to be able to exchange values 01:08:50.758 --> 01:08:52.800 so that we could put things into the right place. 01:08:52.800 --> 01:08:54.720 Turns out this is pretty straightforward. 01:08:54.720 --> 01:08:54.960 Right? 01:08:54.960 --> 01:08:57.002 And we can actually mimic this in the real world. 01:08:57.002 --> 01:09:01.585 We just have opportunity for one volunteer today, one volunteer. 01:09:01.585 --> 01:09:02.460 Can we get a little-- 01:09:02.460 --> 01:09:03.170 OK, over here. 01:09:03.170 --> 01:09:03.670 Yeah. 01:09:03.670 --> 01:09:05.264 What's your name? 01:09:05.264 --> 01:09:06.240 FARRAH: Farrah. 01:09:06.240 --> 01:09:06.479 DAVID J. MALAN: Sorry. 01:09:06.479 --> 01:09:07.161 FARRAH: Farrah. 01:09:07.161 --> 01:09:08.036 DAVID J. MALAN: Vera. 01:09:08.036 --> 01:09:08.990 FARRAH: Farrah. 01:09:08.990 --> 01:09:10.532 DAVID J. MALAN: Oh, here, come on up. 01:09:10.532 --> 01:09:12.085 Then I can hear you up here. 01:09:12.085 --> 01:09:12.960 OK, what's your name? 01:09:12.960 --> 01:09:13.950 FARRAH: Farrah. 01:09:13.950 --> 01:09:14.825 DAVID J. MALAN: Vera. 01:09:14.825 --> 01:09:15.630 FARRAH: With an F. 01:09:15.630 --> 01:09:16.080 DAVID J. MALAN: Fera. 01:09:16.080 --> 01:09:16.680 FARRAH: Farrah. 
01:09:16.680 --> 01:09:17.340 DAVID J. MALAN: Farrah. 01:09:17.340 --> 01:09:17.609 FARRAH: Yes. 01:09:17.609 --> 01:09:17.880 DAVID J. MALAN: Farrah. 01:09:17.880 --> 01:09:18.430 Yes, OK. 01:09:18.430 --> 01:09:18.930 Good. 01:09:18.930 --> 01:09:19.439 Come on up. 01:09:19.439 --> 01:09:20.022 Still come up. 01:09:20.022 --> 01:09:20.880 [CHUCKLE] Thank you. 01:09:20.880 --> 01:09:21.420 Thank you. 01:09:21.420 --> 01:09:22.334 [APPLAUSE] 01:09:22.334 --> 01:09:24.619 [CHEERS] 01:09:24.619 --> 01:09:25.680 OK, nice to meet you. 01:09:25.680 --> 01:09:26.700 FARRAH: Hi, nice to meet you. 01:09:26.700 --> 01:09:27.149 DAVID J. MALAN: Farrah. 01:09:27.149 --> 01:09:27.649 FARRAH: Yes. 01:09:27.649 --> 01:09:28.620 DAVID J. MALAN: OK. 01:09:28.620 --> 01:09:29.700 So let's go ahead here. 01:09:29.700 --> 01:09:32.100 Let me give you this so that you can be mic'd as well. 01:09:32.100 --> 01:09:36.479 OK, so the goal at hand is here, I have two glasses of colored water. 01:09:36.479 --> 01:09:37.830 So we have some purple here. 01:09:37.830 --> 01:09:39.324 [WATER POURING] 01:09:39.324 --> 01:09:40.819 OK. 01:09:40.819 --> 01:09:42.622 And we've got some green here. 01:09:42.622 --> 01:09:43.770 [WATER POURING] 01:09:43.770 --> 01:09:47.189 And the only goal at hand is to do a very simple operation 01:09:47.189 --> 01:09:48.990 like we needed to do quite a bit last week, 01:09:48.990 --> 01:09:51.819 which is to swap two variables just like we swap two numbers. 01:09:51.819 --> 01:09:54.240 So if you could go ahead and get the purple liquid in here 01:09:54.240 --> 01:09:56.250 and the green liquid in here, go. 01:09:59.950 --> 01:10:00.450 [CHUCKLE] 01:10:00.450 --> 01:10:01.993 FARRAH: Is it OK if they overlap? 01:10:01.993 --> 01:10:03.160 DAVID J. MALAN: Ideally, no. 01:10:03.160 --> 01:10:05.670 We want to put only purple here, and only green here, 01:10:05.670 --> 01:10:07.460 and no temporary store. 
01:10:07.460 --> 01:10:07.960 [LAUGHTER] 01:10:07.960 --> 01:10:09.050 FARRAH: Oh. 01:10:09.050 --> 01:10:10.800 DAVID J. MALAN: OK, but you're hesitating. 01:10:10.800 --> 01:10:12.005 Why? 01:10:12.005 --> 01:10:14.630 FARRAH: Because you told me they couldn't touch [CHUCKLE] and-- 01:10:14.630 --> 01:10:16.630 DAVID J. MALAN: Well, you can touch the glasses. 01:10:16.630 --> 01:10:18.360 But you're hesitating to swap them, why? 01:10:18.360 --> 01:10:19.020 [CLINK] 01:10:19.020 --> 01:10:20.240 OK, that's just cheating. 01:10:20.240 --> 01:10:21.707 [LAUGHTER] 01:10:21.707 --> 01:10:22.685 [APPLAUSE] 01:10:22.685 --> 01:10:25.130 [CHEERS] 01:10:25.130 --> 01:10:27.840 OK, very clever. 01:10:27.840 --> 01:10:29.970 Supposing you can't just move things around 01:10:29.970 --> 01:10:32.890 in memory, what if I gave you a temporary variable. 01:10:32.890 --> 01:10:33.390 FARRAH: OK. 01:10:33.390 --> 01:10:35.310 DAVID J. MALAN: Does this help? 01:10:35.310 --> 01:10:37.140 FARRAH: Yes. 01:10:37.140 --> 01:10:40.573 DAVID J. MALAN: So how can we now get purple in there and green in there? 01:10:40.573 --> 01:10:42.052 [CHUCKLE] 01:10:42.052 --> 01:10:44.098 FARRAH: Can I put purple in here first? 01:10:44.098 --> 01:10:44.973 DAVID J. MALAN: Sure. 01:10:44.973 --> 01:10:45.720 FARRAH: I'm going to spill it. 01:10:45.720 --> 01:10:47.056 DAVID J. MALAN: It's OK. 01:10:47.056 --> 01:10:48.050 [WATER POURING] 01:10:48.050 --> 01:10:49.090 OK. 01:10:49.090 --> 01:10:51.340 So purple goes into the temporary, very nice. 01:10:51.340 --> 01:10:51.840 [APPLAUSE] 01:10:51.840 --> 01:10:55.040 FARRAH: Thank you. 01:10:55.040 --> 01:10:57.290 DAVID J. MALAN: Green goes into what was purple. 01:10:57.290 --> 01:10:58.410 FARRAH: Yes 01:10:58.410 --> 01:11:00.920 DAVID J. MALAN: OK, good. 01:11:00.920 --> 01:11:03.650 And then purple goes in-- from the temporary variable 01:11:03.650 --> 01:11:06.140 into the original green glass. 
01:11:06.140 --> 01:11:08.070 Now, a proper round of applause if we could. 01:11:08.070 --> 01:11:08.836 OK. 01:11:08.836 --> 01:11:09.336 [APPLAUSE] 01:11:09.336 --> 01:11:09.836 Thank you. 01:11:09.836 --> 01:11:10.800 FARRAH: Thank you. 01:11:10.800 --> 01:11:13.730 DAVID J. MALAN: OK. 01:11:13.730 --> 01:11:17.780 So suffice it to say, that is the correct way of swapping two values. 01:11:17.780 --> 01:11:21.960 But the key detail there was that Farrah had access to a temporary variable. 01:11:21.960 --> 01:11:25.310 And so you would think that this idea, simple as it is in reality, 01:11:25.310 --> 01:11:28.650 would translate pretty naturally to code as well. 01:11:28.650 --> 01:11:30.957 But it turns out that's not necessarily the case. 01:11:30.957 --> 01:11:33.290 So it turns out that if we wanted to swap two variables, 01:11:33.290 --> 01:11:35.510 you might implement a function called swap 01:11:35.510 --> 01:11:38.120 and just take in two integers, a and b, the goal of which 01:11:38.120 --> 01:11:39.170 is to do the switcheroo. 01:11:39.170 --> 01:11:41.210 Purple becomes green, green becomes purple, 01:11:41.210 --> 01:11:43.580 just as a becomes b, b becomes a. 01:11:43.580 --> 01:11:47.480 And you would think that we just need a temporary variable inside of that code 01:11:47.480 --> 01:11:48.900 in order to make that happen. 01:11:48.900 --> 01:11:51.960 So I would argue that the equivalent to what Farrah did in 01:11:51.960 --> 01:11:54.980 person, in code in C, might look like this. 
01:11:54.980 --> 01:11:57.050 Give me a temporary variable called temp-- 01:11:57.050 --> 01:12:00.950 or anything you want --store in it, a-- just as she stored one of the colors 01:12:00.950 --> 01:12:05.000 in the temporary glass first, purple --then go ahead and change the value 01:12:05.000 --> 01:12:07.310 of a to equal the value of b-- 01:12:07.310 --> 01:12:10.940 because you've already kept a copy of a around in a temporary variable 01:12:10.940 --> 01:12:13.730 --then finally, store in b what is in temp. 01:12:13.730 --> 01:12:18.230 So that is the code equivalent of what Farrah did using these colored liquids. 01:12:18.230 --> 01:12:22.430 Unfortunately, it's not quite as simple as it would seem. 01:12:22.430 --> 01:12:26.090 I'm going to go ahead and open up, say, a program that I wrote in advance here 01:12:26.090 --> 01:12:27.980 too, called-- 01:12:27.980 --> 01:12:29.930 intentionally --no swap. 01:12:29.930 --> 01:12:33.270 Even though you would like to think that it does exactly that. 01:12:33.270 --> 01:12:36.140 So notice that in this code we have-- 01:12:36.140 --> 01:12:39.170 including standard I/O dot h --we have a prototype 01:12:39.170 --> 01:12:41.390 for the function I just proposed we make, swap, 01:12:41.390 --> 01:12:43.220 that takes two ints a and b. 01:12:43.220 --> 01:12:44.660 Here's my main function. 01:12:44.660 --> 01:12:47.750 And I'm just going to arbitrarily initialize x to 1 and y 01:12:47.750 --> 01:12:52.370 to 2, just as I initialized one glass to purple and one glass to green. 01:12:52.370 --> 01:12:55.590 Then, just so that we can see what's going on inside our code, 01:12:55.590 --> 01:12:59.390 I'm just going to print out x is such and such, y is such and such-- 01:12:59.390 --> 01:13:02.510 printing x and y --then I'm going to call that swap function, 01:13:02.510 --> 01:13:03.710 swapping x and y. 01:13:03.710 --> 01:13:06.008 And then I'm going to literally print the same phrase.
01:13:06.008 --> 01:13:09.050 But I'm hoping that it's going to say the opposite the second time around 01:13:09.050 --> 01:13:10.970 if x and y are indeed swapped. 01:13:10.970 --> 01:13:12.680 So how do I implement swap? 01:13:12.680 --> 01:13:14.900 Well, it would seem to be, with this same code, 01:13:14.900 --> 01:13:17.420 using a temporary variable-- or temporary glass, 01:13:17.420 --> 01:13:21.170 just as Farrah did for the two liquids. 01:13:21.170 --> 01:13:24.860 Unfortunately, when I go ahead and run this program, no swap-- 01:13:24.860 --> 01:13:27.890 and its name alone is a bit of a spoiler --if I go ahead 01:13:27.890 --> 01:13:33.380 and run dot slash no swap with x and y hardcoded to 1 and 2 respectively, 01:13:33.380 --> 01:13:37.060 you'll see that it runs, and says, x is 1, y is 2, x is 1, 01:13:37.060 --> 01:13:42.860 y is 2, thereby clearly failing to swap. 01:13:42.860 --> 01:13:47.330 But if you're in agreement with me, this feels like it's correct. 01:13:47.330 --> 01:13:49.490 I didn't get any compiler errors. 01:13:49.490 --> 01:13:54.920 Yet, this line of code, which uses swap, seems to have no effect. 01:13:54.920 --> 01:13:57.710 So what might the intuition here or hunch 01:13:57.710 --> 01:14:01.640 be for why this program indeed does not swap? 01:14:01.640 --> 01:14:05.584 AUDIENCE: So when it takes the [INAUDIBLE] in the-- 01:14:05.584 --> 01:14:09.270 when it takes the [INAUDIBLE] whole new variable that [INAUDIBLE].. 01:14:09.270 --> 01:14:10.520 DAVID J. MALAN: Yeah, exactly. 01:14:10.520 --> 01:14:14.150 When you pass inputs to a function, you are effectively 01:14:14.150 --> 01:14:17.850 passing copies of your own values to that function. 01:14:17.850 --> 01:14:22.310 And so when you have two variables, x and y-- initialized to 1 and 2 --yes, 01:14:22.310 --> 01:14:24.350 you're passing them as input to swap. 01:14:24.350 --> 01:14:29.570 But swap is not getting actually x and y, it's getting copies of x and y. 
01:14:29.570 --> 01:14:34.020 And per its prototype, is calling them a and b, respectively. 01:14:34.020 --> 01:14:36.740 So it turns out this swap function actually does work. 01:14:36.740 --> 01:14:38.240 It swaps a and b. 01:14:38.240 --> 01:14:42.800 But it does not swap x and y because those are copies. 01:14:42.800 --> 01:14:48.470 Now this seems especially worrisome now in so far as I cannot seem to implement 01:14:48.470 --> 01:14:52.640 a function called swap that can even implement bubble sort or selection 01:14:52.640 --> 01:14:53.150 sort. 01:14:53.150 --> 01:14:55.233 And frankly, you might have run into this yourself 01:14:55.233 --> 01:14:57.830 if trying to implement this for one of your voting algorithms. 01:14:57.830 --> 01:14:59.990 If you needed to do a swap, if you had a helper function, 01:14:59.990 --> 01:15:02.810 you might have had to think about it in a somewhat different way. 01:15:02.810 --> 01:15:04.738 So what's the explanation for all of this? 01:15:04.738 --> 01:15:06.530 Well, this version of swap doesn't actually 01:15:06.530 --> 01:15:09.470 work because again, if we go back to first principles, 01:15:09.470 --> 01:15:12.410 go inside of the computer's memory and consider 01:15:12.410 --> 01:15:15.980 our memory is just a grid of bytes, top to bottom, left to right. 01:15:15.980 --> 01:15:17.300 What's really going on? 01:15:17.300 --> 01:15:19.190 Well, it turns out that all this time we've 01:15:19.190 --> 01:15:22.910 been using C, my computer isn't just arbitrarily putting things in memory 01:15:22.910 --> 01:15:24.470 over here, over here, over here. 01:15:24.470 --> 01:15:27.990 It actually uses your computer's memory in a methodical way. 01:15:27.990 --> 01:15:29.690 Certain types of data go down here. 01:15:29.690 --> 01:15:32.250 Certain types of data go up here, and so forth. 01:15:32.250 --> 01:15:33.830 So what is that methodology?
01:15:33.830 --> 01:15:37.050 Well, if we consider it just abstractly as a big rectangle, 01:15:37.050 --> 01:15:39.470 it turns out that if this is your computer's memory, 01:15:39.470 --> 01:15:43.430 at the very top of it, conceptually, goes all of the 0s and 1s 01:15:43.430 --> 01:15:45.500 that Clang compiled for you. 01:15:45.500 --> 01:15:49.460 The so-called machine code, is literally loaded into your computer's RAM when 01:15:49.460 --> 01:15:53.390 you run dot slash something, or in a Mac or PC, when you double click an icon, 01:15:53.390 --> 01:15:57.350 those 0s and 1s-- the compiled code --is loaded into your computer's memory up 01:15:57.350 --> 01:15:57.870 here-- 01:15:57.870 --> 01:16:00.980 let's say --and it might take up this much space for a small program, 01:16:00.980 --> 01:16:02.900 this much space for a big program. 01:16:02.900 --> 01:16:07.250 Below that, if your program uses any global variables or other type of data, 01:16:07.250 --> 01:16:10.650 those will go just below, so to speak, the machine 01:16:10.650 --> 01:16:11.900 code in the computer's memory. 01:16:11.900 --> 01:16:12.290 Why? 01:16:12.290 --> 01:16:14.540 Just because humans needed to decide when implementing 01:16:14.540 --> 01:16:17.720 compilers where to put stuff in the computer's memory. 01:16:17.720 --> 01:16:21.080 Below that is a special chunk of memory called the heap. 01:16:21.080 --> 01:16:22.150 And Valgrind gave it-- 01:16:22.150 --> 01:16:24.620 a teaser of this word a moment ago. 01:16:24.620 --> 01:16:30.410 The heap is a big chunk of memory where you can allocate memory from. 
01:16:30.410 --> 01:16:32.210 And in fact, if you call malloc-- 01:16:32.210 --> 01:16:34.530 as I did once before --that memory is going 01:16:34.530 --> 01:16:37.460 to come from this region of the computer's memory, 01:16:37.460 --> 01:16:40.670 below the global variables, below the machine code, 01:16:40.670 --> 01:16:44.230 because that's where Clang and compiler designers decided to draw memory from. 01:16:44.230 --> 01:16:47.870 So every time you call malloc, you're carving out more and more bytes 01:16:47.870 --> 01:16:49.250 for your program to use. 01:16:49.250 --> 01:16:51.500 And that heap grows, conceptually, downward. 01:16:51.500 --> 01:16:53.510 The more memory you use, the lower, lower, 01:16:53.510 --> 01:16:56.450 lower it gets in this artist's rendition. 01:16:56.450 --> 01:17:00.290 However, there's a different portion of memory here down below that's 01:17:00.290 --> 01:17:02.450 used for a very different purpose. 01:17:02.450 --> 01:17:06.410 Anytime you call a function in your program, 01:17:06.410 --> 01:17:10.340 it turns out that that function's local variables end up 01:17:10.340 --> 01:17:14.010 going at the bottom of your computer's memory on what's called a stack. 01:17:14.010 --> 01:17:17.090 So if you have main, the default function, 01:17:17.090 --> 01:17:20.300 and it has one or more arguments, or one or more local variables, 01:17:20.300 --> 01:17:23.360 those variables just go down here, conceptually, in memory. 01:17:23.360 --> 01:17:26.990 And if you call a function like swap, or anything else, 01:17:26.990 --> 01:17:29.670 it just keeps using more and more memory above that. 01:17:29.670 --> 01:17:33.050 So the heap is where malloc gets you bytes from. 01:17:33.050 --> 01:17:36.230 And the stack is where your local variables go when 01:17:36.230 --> 01:17:38.640 functions are called, bottom to top. 01:17:38.640 --> 01:17:40.400 So let's see this in action here.
01:17:40.400 --> 01:17:44.300 If we consider the stack alone in the context of swapping variables 01:17:44.300 --> 01:17:47.730 unsuccessfully, what's really happening with code like this? 01:17:47.730 --> 01:17:50.120 Well, on the bottom of my memory when I call main, 01:17:50.120 --> 01:17:54.530 I am given-- by nature of how C programs work when compiled --a slice of memory 01:17:54.530 --> 01:17:56.860 called a frame, a stack frame. 01:17:56.860 --> 01:18:00.890 And this is just some number of bytes that store maybe argv, argc, 01:18:00.890 --> 01:18:03.140 it stores x and y, my local variables. 01:18:03.140 --> 01:18:07.250 Any variables I have in main get stored in this chunk of memory here. 01:18:07.250 --> 01:18:11.480 If main calls a function, like this swap function, that function gets 01:18:11.480 --> 01:18:15.140 its own frame of memory, its own slice of memory, that conceptually, 01:18:15.140 --> 01:18:16.490 is above main. 01:18:16.490 --> 01:18:19.400 So swap has two variables-- right-- 01:18:19.400 --> 01:18:21.470 two arguments, right, a and b. 01:18:21.470 --> 01:18:23.440 And it also had one other variable. 01:18:23.440 --> 01:18:24.260 AUDIENCE: Temp. 01:18:24.260 --> 01:18:25.135 DAVID J. MALAN: Temp. 01:18:25.135 --> 01:18:28.700 So those three values are going to be in this frame of memory. 01:18:28.700 --> 01:18:32.557 X and y are on the bottom, a, b, and temp are above it in there. 01:18:32.557 --> 01:18:33.890 So let's actually focus on this. 01:18:33.890 --> 01:18:36.200 If we focus on main, when my program first runs, 01:18:36.200 --> 01:18:37.590 I have two variables, x and y. 01:18:37.590 --> 01:18:40.730 And I initialize those to 1 and 2, respectively. 01:18:40.730 --> 01:18:42.740 Then the swap function gets called. 01:18:42.740 --> 01:18:46.340 So another frame gets used on the stack, just another bunch of bytes 01:18:46.340 --> 01:18:48.320 are being allocated by the computer for me. 
01:18:48.320 --> 01:18:51.383 And swap had three variables, a, b, and temp. 01:18:51.383 --> 01:18:54.050 The first two were its inputs, its arguments, the third of which 01:18:54.050 --> 01:18:56.960 was an explicit temporary variable I gave it. 01:18:56.960 --> 01:19:02.180 With those lines of code from before, I initialized a and b to 1 and 2, 01:19:02.180 --> 01:19:02.900 respectively. 01:19:02.900 --> 01:19:07.640 And notice, their values are literally identical to x and y but they are copies of x and y. 01:19:07.640 --> 01:19:10.200 And then if we consider the code, what happens next? 01:19:10.200 --> 01:19:12.570 Well, temp is assigned a. 01:19:12.570 --> 01:19:14.696 So temp should take on what value? 01:19:14.696 --> 01:19:15.590 AUDIENCE: 1. 01:19:15.590 --> 01:19:16.550 DAVID J. MALAN: Just 1. 01:19:16.550 --> 01:19:18.910 And then second line of code, a equals b. 01:19:18.910 --> 01:19:23.120 So a should take on the value of b, which means it's now 2. 01:19:23.120 --> 01:19:28.730 And meanwhile, b equals temp means that b should take on the value of 1. 01:19:28.730 --> 01:19:31.040 And so now we have successfully swapped, it 01:19:31.040 --> 01:19:34.892 seems-- with these three lines of code taken from my actual program --a and b. 01:19:34.892 --> 01:19:37.850 Unfortunately, the thing about a stack is it's just like in the dining hall. 01:19:37.850 --> 01:19:41.432 You have the stacks of Harvard trays in the dining halls and you 01:19:41.432 --> 01:19:43.640 keep putting new trays on top, on top, but then they 01:19:43.640 --> 01:19:45.950 keep getting taken from the top as well. 01:19:45.950 --> 01:19:50.360 So just when swap is done with its third line of code, 01:19:50.360 --> 01:19:54.273 it's like someone has taken the tray away and that frame disappears. 01:19:54.273 --> 01:19:56.190 So the memory technically doesn't go anywhere. 01:19:56.190 --> 01:19:57.540 It's still a physical device.
01:19:57.540 --> 01:20:01.460 But it's just no longer allocated for my own program. 01:20:01.460 --> 01:20:05.360 So main is still intact after the swap function returns. 01:20:05.360 --> 01:20:10.010 But of course, x and y have not actually been affected. 01:20:10.010 --> 01:20:14.960 So what's the fundamental solution to this problem? 01:20:14.960 --> 01:20:18.070 Swap did not work because it was passed copies. 01:20:18.070 --> 01:20:20.390 It was passed by value, so to speak. When 01:20:20.390 --> 01:20:26.510 main calls swap, passing in x and y, I get copies of x and y called a and b. 01:20:26.510 --> 01:20:28.176 What could I do instead? 01:20:28.176 --> 01:20:29.054 AUDIENCE: [INAUDIBLE] 01:20:29.054 --> 01:20:30.387 DAVID J. MALAN: A little louder. 01:20:30.387 --> 01:20:31.610 AUDIENCE: Pass by reference. 01:20:31.610 --> 01:20:34.027 DAVID J. MALAN: Pass by reference, and what's a reference? 01:20:34.027 --> 01:20:35.290 AUDIENCE: Make a pointer. 01:20:35.290 --> 01:20:35.680 DAVID J. MALAN: Yeah. 01:20:35.680 --> 01:20:38.180 So a reference is synonymous, for our purposes, with pointer. 01:20:38.180 --> 01:20:41.020 So yeah, that's actually kind of the germ of an idea from before. 01:20:41.020 --> 01:20:44.260 If we now have the ability to address things-- like slap some addresses 01:20:44.260 --> 01:20:45.520 on mailboxes-- 01:20:45.520 --> 01:20:49.570 you know what, let's not just pass from main to swap, literally, x 01:20:49.570 --> 01:20:54.520 and y. Why don't we tell swap what the address of x is and the address of y 01:20:54.520 --> 01:20:58.910 so that my swap code can go to x and y and change them? 01:20:58.910 --> 01:21:00.820 And then even when the swap function returns, 01:21:00.820 --> 01:21:03.760 that's fine because it went to the right locations. 01:21:03.760 --> 01:21:06.320 So pictorially, what I really want to do is this.
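The unsuccessful attempt traced above, where the function receives copies, can be sketched as follows. The name `swap_copies` is hypothetical (the lecture's function is simply called swap), and the printf line is added only to show that the swap does happen inside the function's own frame.

```c
#include <stdio.h>

// a and b are copies of the caller's values, stored in this
// function's own stack frame.
void swap_copies(int a, int b)
{
    int temp = a;
    a = b;
    b = temp;

    // The copies are swapped here, but the frame is discarded on
    // return, so the caller's variables never change.
    printf("inside: a is %i, b is %i\n", a, b);
}
```

Calling `swap_copies(x, y)` with x equal to 1 and y equal to 2 leaves x and y as 1 and 2 in the caller.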
01:21:06.320 --> 01:21:09.220 If I take another stab at this, I'm going to go ahead now 01:21:09.220 --> 01:21:12.580 and reinitialize main to have x and y equal to 1 and 2. 01:21:12.580 --> 01:21:14.230 I'm now going to call swap. 01:21:14.230 --> 01:21:16.750 But what I really want to do, using pictures this time, 01:21:16.750 --> 01:21:20.950 is I want a to point to x and b to point to y. 01:21:20.950 --> 01:21:24.430 I don't want them to equal x and y because now I 01:21:24.430 --> 01:21:27.220 can sort of follow the breadcrumbs, or the Chutes and Ladders idea, 01:21:27.220 --> 01:21:28.553 whatever metaphor works for you. 01:21:28.553 --> 01:21:34.122 You can go from a to x, you can go from b to y, and do the switcheroo there. 01:21:34.122 --> 01:21:35.830 So the code I'm actually going to use now 01:21:35.830 --> 01:21:37.788 is a little scary looking but it just goes back 01:21:37.788 --> 01:21:40.540 to those first principles from the very start today. 01:21:40.540 --> 01:21:44.140 I need to put, unfortunately, some asterisks all over the place here. 01:21:44.140 --> 01:21:45.310 But let's see why. 01:21:45.310 --> 01:21:47.860 First, let me actually back up for just a moment 01:21:47.860 --> 01:21:53.440 and propose that the swap code I'm going to use now is not that in no swap dot c 01:21:53.440 --> 01:21:58.060 but in a program called swap dot c. 01:21:58.060 --> 01:22:02.830 So in swap dot c I have almost the same code, except this. 01:22:02.830 --> 01:22:06.730 First of all, on line 13, I'm no longer passing in x and y, 01:22:06.730 --> 01:22:10.330 I'm passing in the address of x and the address of y. 01:22:10.330 --> 01:22:12.880 That was the key detail from earlier today when we first 01:22:12.880 --> 01:22:13.870 introduced ampersand. 01:22:13.870 --> 01:22:16.245 So this means, here's the address of x, the address of y. 01:22:16.245 --> 01:22:19.300 It's like providing a map to swap so that it can go there.
01:22:19.300 --> 01:22:23.110 The syntax for defining a function that accepts addresses 01:22:23.110 --> 01:22:28.060 is unfortunately a little cryptic, but it's the name of the function, like swap, 01:22:28.060 --> 01:22:30.880 the type of pointer, and the type of pointer. 01:22:30.880 --> 01:22:36.820 So, int star a means, I accept the address of an int and call it a. 01:22:36.820 --> 01:22:40.330 I also accept the address of another int and I call it b. 01:22:40.330 --> 01:22:42.650 So that's all the star means in this context. 01:22:42.650 --> 01:22:43.780 It's a pointer to an int. 01:22:43.780 --> 01:22:46.600 It's a pointer to an int, both b and a. 01:22:46.600 --> 01:22:50.560 Down here it just gets a little scary looking but it's the same exact thing. 01:22:50.560 --> 01:22:51.790 What does star a mean? 01:22:51.790 --> 01:22:55.010 Well, star means go to that address. 01:22:55.010 --> 01:22:59.180 So star a means follow the arrow to whatever a is pointing at. 01:22:59.180 --> 01:23:02.160 And what was a pointing at? 01:23:02.160 --> 01:23:03.610 It was pointing at x. 01:23:03.610 --> 01:23:07.180 So this means go to the address in a and that will reach-- 01:23:07.180 --> 01:23:10.110 that will lead you to x, whose value I think is 1. 01:23:10.110 --> 01:23:12.400 And that's going to store the number 1 in temp. 01:23:12.400 --> 01:23:14.230 The second line of code means go to b. 01:23:14.230 --> 01:23:19.106 So if you follow the address in b, where does it lead you? 01:23:19.106 --> 01:23:21.810 It should lead you to what we called y. 01:23:21.810 --> 01:23:24.060 And that y was a 2. 01:23:24.060 --> 01:23:27.690 And star a means go to the address in a and put whatever 01:23:27.690 --> 01:23:30.692 was at the address in b there as well. 01:23:30.692 --> 01:23:32.400 And then lastly, go ahead and take temp-- 01:23:32.400 --> 01:23:35.010 which is just the number 1, I claim --and go ahead and put it 01:23:35.010 --> 01:23:36.450 at the address in b.
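The three lines just walked through can be written out as below. This is a reconstruction from the spoken description rather than the exact file shown on screen, so the comments and layout are assumptions.

```c
// a and b hold the addresses of two ints that live in the
// caller's stack frame.
void swap(int *a, int *b)
{
    int temp = *a; // go to the address in a and copy the value there
    *a = *b;       // go to the address in b, store its value at a
    *b = temp;     // store the saved value at the address in b
}
```

The caller passes addresses with the ampersand, as in `swap(&x, &y);`, after which x and y themselves hold the exchanged values.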
01:23:36.450 --> 01:23:38.020 It's hard to see this in code. 01:23:38.020 --> 01:23:39.300 So let's instead visualize it. 01:23:39.300 --> 01:23:43.560 Instead, if I go back here to these three lines of code, 01:23:43.560 --> 01:23:45.540 here now is a correct version. 01:23:45.540 --> 01:23:48.190 The first line of code here says go to-- 01:23:48.190 --> 01:23:51.390 whatever-- go to the address in a and store what's there in temp. 01:23:51.390 --> 01:23:53.730 So in a moment I'm going to go to the address in a 01:23:53.730 --> 01:23:56.040 by following this arrow down to x. 01:23:56.040 --> 01:23:59.700 And I'm going to store in temp the number 1. 01:23:59.700 --> 01:24:02.520 Second line of code, I'm going to go to the address in b. 01:24:02.520 --> 01:24:05.520 So that's like following the arrow, which leads me to the 2. 01:24:05.520 --> 01:24:09.660 I then follow the address in a, which leads me to x. 01:24:09.660 --> 01:24:12.750 And I put 2 in x. 01:24:12.750 --> 01:24:14.598 Last line, I go to temp. 01:24:14.598 --> 01:24:15.390 That's an easy one. 01:24:15.390 --> 01:24:16.620 It's just the number 1. 01:24:16.620 --> 01:24:20.620 Then I say, go to the address in b and store temp there. 01:24:20.620 --> 01:24:23.730 So let's go to the address in b by following the arrow 01:24:23.730 --> 01:24:26.160 and change it to temp. 01:24:26.160 --> 01:24:28.530 And so now I've still called another function. 01:24:28.530 --> 01:24:31.440 I'm still using local variables but these local variables 01:24:31.440 --> 01:24:35.490 are by definition now, pointers, addresses, or sort of treasure maps 01:24:35.490 --> 01:24:36.760 that are leading me-- 01:24:36.760 --> 01:24:41.890 a la these arrows --to the values in memory that I actually care about.
01:24:41.890 --> 01:24:43.650 And so now when the swap function returns, 01:24:43.650 --> 01:24:46.710 it doesn't matter that a and b and temp go away, 01:24:46.710 --> 01:24:53.660 I have actually changed fundamentally, what x and y themselves were. 01:24:53.660 --> 01:24:57.100 Any questions then on that code? 01:24:57.100 --> 01:24:57.746 Yeah. 01:24:57.746 --> 01:25:01.218 AUDIENCE: [INAUDIBLE] 01:25:07.840 --> 01:25:09.090 DAVID J. MALAN: Good question. 01:25:09.090 --> 01:25:12.830 So in this case, there is nothing to free because we did not use malloc. 01:25:12.830 --> 01:25:15.000 So you can use addresses without using malloc. 01:25:15.000 --> 01:25:17.000 In this case, I'm using the address of operator, 01:25:17.000 --> 01:25:19.750 which just tells me where x and y is-- 01:25:19.750 --> 01:25:20.418 or-- 01:25:20.418 --> 01:25:22.460 AUDIENCE: Not with this [INAUDIBLE],, in general, 01:25:22.460 --> 01:25:25.128 would you use malloc [INAUDIBLE] 01:25:25.128 --> 01:25:26.670 DAVID J. MALAN: Really good question. 01:25:26.670 --> 01:25:31.190 So if you're using malloc in a function and it returns some chunk of memory, 01:25:31.190 --> 01:25:32.540 how do you deal with that? 01:25:32.540 --> 01:25:35.600 The onus is on you to remember to somehow call free 01:25:35.600 --> 01:25:37.340 on that same block of memory. 01:25:37.340 --> 01:25:38.990 Case in point, getString does this. 01:25:38.990 --> 01:25:42.140 Long story short, getString allocates memory using malloc. 01:25:42.140 --> 01:25:45.230 And you, up to this date have never had to call 01:25:45.230 --> 01:25:48.760 free on strings, that's actually because one of the features of the CS50 library 01:25:48.760 --> 01:25:50.635 is something called garbage collection, where 01:25:50.635 --> 01:25:54.050 we notice if your program quits without freeing memory from getString. 01:25:54.050 --> 01:25:55.700 We do it for you magically. 
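As a sketch of the situation raised in that question, here is a function that calls malloc and hands the memory back to its caller, who then owns the job of freeing it. `copy_string` is a hypothetical helper written for illustration; it is not how the CS50 library is implemented.

```c
#include <stdlib.h>
#include <string.h>

// Returns a heap-allocated copy of s, or NULL if malloc fails.
// The caller is responsible for calling free on the result.
char *copy_string(const char *s)
{
    char *t = malloc(strlen(s) + 1); // +1 for the terminating '\0'
    if (t == NULL)
    {
        return NULL;
    }
    strcpy(t, s);
    return t;
}
```

A caller would write something like `char *t = copy_string("Emma");`, use t, and then call `free(t)` once it is no longer needed.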
01:25:55.700 --> 01:25:57.830 But you can see in the CS50 library how you can 01:25:57.830 --> 01:25:59.560 do exactly what you're asking about. 01:25:59.560 --> 01:26:02.250 And, or just ask me after as well. 01:26:02.250 --> 01:26:02.750 All right. 01:26:02.750 --> 01:26:06.530 So this is only to say that, OK, after all of last week's presumption 01:26:06.530 --> 01:26:09.510 that we could actually swap values, we can in fact do it. 01:26:09.510 --> 01:26:13.382 So how can we go about now solving more interesting, more real world problems? 01:26:13.382 --> 01:26:15.590 Well, let's transition from here to some of the power 01:26:15.590 --> 01:26:18.410 now that we gain by understanding these kinds of primitives. 01:26:18.410 --> 01:26:22.280 First of all, you might have noticed or anticipated this wasn't necessarily 01:26:22.280 --> 01:26:23.440 the best design. 01:26:23.440 --> 01:26:23.990 Right? 01:26:23.990 --> 01:26:26.640 What strikes you as worrisome about this picture at the moment? 01:26:26.640 --> 01:26:27.890 AUDIENCE: They're gonna crash. 01:26:27.890 --> 01:26:29.360 DAVID J. MALAN: Right, they're going to collide with each other. 01:26:29.360 --> 01:26:29.860 Right? 01:26:29.860 --> 01:26:32.698 If I keep calling malloc, malloc, malloc, malloc, per the arrow, 01:26:32.698 --> 01:26:35.240 I claim that you're going to keep using more and more memory. 01:26:35.240 --> 01:26:37.657 But it turns out you're going to keep using the stack too. 01:26:37.657 --> 01:26:40.040 If you call function, function, function, function, 01:26:40.040 --> 01:26:44.240 you're going to collide or somehow overrun each of these chunks of memory. 01:26:44.240 --> 01:26:46.530 And in fact, recall recursion from last week. 01:26:46.530 --> 01:26:49.670 If you don't have that base case and a function calls itself forever, 01:26:49.670 --> 01:26:52.718 you have what's actually called a stack overflow. 
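To make the role of the base case concrete, here is a small recursive function, assuming a hypothetical name `sum_to`. Each call gets its own stack frame; the base case is what lets those frames be released instead of piling up.

```c
// Adds the integers from 1 through n recursively.
int sum_to(int n)
{
    // Base case: stops the recursion. Without it, every call
    // would make another call and keep adding frames until the
    // stack runs out of room.
    if (n <= 0)
    {
        return 0;
    }
    return n + sum_to(n - 1);
}
```

With the base case in place, `sum_to(4)` uses a handful of frames and returns 10.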
01:26:52.718 --> 01:26:55.760 And those of you familiar with the popular website for programmers, stack 01:26:55.760 --> 01:26:58.940 overflow derives its name from exactly that idea, the fact 01:26:58.940 --> 01:27:02.300 that a computer if running a program that has some bug-- 01:27:02.300 --> 01:27:06.260 whereby, function calls itself again, and again, and again, and again, 01:27:06.260 --> 01:27:09.110 and never stopping --you might overflow the stack. 01:27:09.110 --> 01:27:11.090 And there's other incarnations of that as well. 01:27:11.090 --> 01:27:14.330 But that's one of the forms from which the website gets its name. 01:27:14.330 --> 01:27:15.650 Heap overflow is the opposite. 01:27:15.650 --> 01:27:18.192 When you keep calling malloc, malloc, malloc, malloc, and you 01:27:18.192 --> 01:27:21.260 just ask for so much memory that you overwrite memory that's 01:27:21.260 --> 01:27:22.820 being used by some of your functions. 01:27:22.820 --> 01:27:24.920 Unfortunately, this is just the way life is. 01:27:24.920 --> 01:27:28.290 If you have a finite amount of memory, there is this risk. 01:27:28.290 --> 01:27:32.600 And this is why computers can only use so much memory before they indeed 01:27:32.600 --> 01:27:35.870 can't oh, load more files for you, can't open more images for you, 01:27:35.870 --> 01:27:40.520 or simply crash or freeze if the problem wasn't anticipated. 01:27:40.520 --> 01:27:43.220 Those are generally known as buffer overflows. 01:27:43.220 --> 01:27:46.790 So let's take off one final set of training wheels, if you will, 01:27:46.790 --> 01:27:49.700 all of these functions that you asked about earlier today. 
01:27:49.700 --> 01:27:52.010 All of these functions, getFloat, getString, getDouble, 01:27:52.010 --> 01:27:57.380 and so forth-- from the CS50 library --actually deal with pointers for you 01:27:57.380 --> 01:27:59.990 and deal with memory addresses in a way that allows you not 01:27:59.990 --> 01:28:01.610 to have to worry about them. 01:28:01.610 --> 01:28:05.408 Let me go ahead and implement the same idea as getInt, 01:28:05.408 --> 01:28:08.450 but the low level way that you would have to do it if you didn't actually 01:28:08.450 --> 01:28:09.948 have CS50's library. 01:28:09.948 --> 01:28:11.990 I'm going to go ahead and create a program called 01:28:11.990 --> 01:28:14.500 scan f for formatted scan. 01:28:14.500 --> 01:28:18.510 And I'm going to go ahead and implement the following logic. 01:28:18.510 --> 01:28:22.490 Let me go ahead and first give myself include standard I/O dot 01:28:22.490 --> 01:28:24.740 h-- because I'm not going to use the CS50 library here 01:28:24.740 --> 01:28:28.550 at all --int main void-- so I have a default function --let me give myself 01:28:28.550 --> 01:28:29.900 a variable x. 01:28:29.900 --> 01:28:33.620 And let me go ahead and ask the human for a value of x. 01:28:33.620 --> 01:28:36.620 And then normally, I would have done this, 01:28:36.620 --> 01:28:39.380 getInt and get the int from the user. 01:28:39.380 --> 01:28:42.310 If we're taking away the CS50 library, we need an alternative. 01:28:42.310 --> 01:28:44.060 And it turns out there's a function called 01:28:44.060 --> 01:28:48.140 scanf and scanf is kind of similar to printf, 01:28:48.140 --> 01:28:52.160 where you give it a format code, which signifies what it is you want to scan 01:28:52.160 --> 01:28:54.620 from the user's keyboard, so to speak. 01:28:54.620 --> 01:28:58.370 And you specify the address of a chunk of memory 01:28:58.370 --> 01:29:01.550 that you want to put the user's input in. 
01:29:01.550 --> 01:29:04.250 And then I'm going to go ahead, just arbitrarily, and print out 01:29:04.250 --> 01:29:08.370 that the human here typed in, for instance, that value. 01:29:08.370 --> 01:29:09.530 So what's new here? 01:29:09.530 --> 01:29:10.880 It's this line here. 01:29:10.880 --> 01:29:15.270 If we did not have the CS50 library and in turn, the getInt function, 01:29:15.270 --> 01:29:18.800 this is the line of code you would instead have been using since Week 1 01:29:18.800 --> 01:29:20.870 to get an integer from the user. 01:29:20.870 --> 01:29:24.890 It's up to you on line 5 to declare the variable, like x, an int. 01:29:24.890 --> 01:29:28.910 It's then up to you on line 7 to pass the address of that variable 01:29:28.910 --> 01:29:33.440 to scanf because scanf's purpose in life is to give the human a blinking prompt. 01:29:33.440 --> 01:29:36.260 And provided the human types in a number and hits enter, 01:29:36.260 --> 01:29:40.590 that number will get stored at that address for you. 01:29:40.590 --> 01:29:44.330 And the reason why you need to call a function like scanf here-- 01:29:44.330 --> 01:29:47.900 or rather, the reason that you need to pass to scanf the address of x 01:29:47.900 --> 01:29:50.210 is the same reason as with swapping. 01:29:50.210 --> 01:29:53.660 If you want to use a helper function, something you wrote or someone else 01:29:53.660 --> 01:29:57.500 wrote, and you want it to change the value of a variable, 01:29:57.500 --> 01:29:59.180 you cannot pass it by value. 01:29:59.180 --> 01:30:02.240 You can't just pass in x because it will get a copy. 01:30:02.240 --> 01:30:03.770 And that will not persist. 01:30:03.770 --> 01:30:06.650 You have to instead use ampersand x to pass the address 01:30:06.650 --> 01:30:09.320 of x so that the function, swap-- 01:30:09.320 --> 01:30:11.990 or in this case, scanf --can go to that address 01:30:11.990 --> 01:30:14.720 and put some value there for you.
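The same pass-the-address idea can be sketched without waiting on a keyboard by using sscanf, the standard sibling of scanf that reads from a string instead. The helper name `parse_int` is hypothetical, and checking the return value is an addition not covered in the lecture; with scanf itself the call would be `scanf("%i", &x)`.

```c
#include <stdio.h>

// Tries to read an integer from text into the int that out
// points to. Returns 1 if a value was stored, 0 otherwise.
int parse_int(const char *text, int *out)
{
    // Like scanf, sscanf needs the address of the variable so it
    // can go there and store the result. It returns the number of
    // items it managed to store.
    return sscanf(text, "%i", out) == 1;
}
```

A caller declares `int x;` and passes `&x`, exactly as with scanf; if the text is not a number, the function returns 0 and leaves x untouched.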
01:30:14.720 --> 01:30:17.000 Unfortunately, what scanf does not do is if the user 01:30:17.000 --> 01:30:19.520 types in Emma instead of an int, it's quite 01:30:19.520 --> 01:30:21.440 possible the program will choke, or crash, 01:30:21.440 --> 01:30:23.270 or behave in some unpredictable way. 01:30:23.270 --> 01:30:26.480 There's no error checking built in to scanf in this case. 01:30:26.480 --> 01:30:27.690 But let's try another thing. 01:30:27.690 --> 01:30:29.732 It's not that interesting to read in just an int. 01:30:29.732 --> 01:30:31.830 Let's try to read in something like a string. 01:30:31.830 --> 01:30:33.560 So I could give myself a string s-- 01:30:33.560 --> 01:30:35.990 although we know that there is no such thing as string. 01:30:35.990 --> 01:30:38.420 That's technically a char star or the address 01:30:38.420 --> 01:30:42.720 of a character called s --let me go ahead and prompt the human for string 01:30:42.720 --> 01:30:43.970 s here. 01:30:43.970 --> 01:30:47.150 And let me go ahead and read into that string using 01:30:47.150 --> 01:30:50.870 the percent s format code, the value s. 01:30:50.870 --> 01:30:55.670 And then let me go ahead and print out what the human typed for us, s colon 01:30:55.670 --> 01:30:56.660 that. 01:30:56.660 --> 01:30:59.270 So what am I doing here? 01:30:59.270 --> 01:31:03.320 Line 5 is saying, give me a variable called s 01:31:03.320 --> 01:31:06.650 that's going to store the address of a character. 01:31:06.650 --> 01:31:08.570 Line 6 just says, s colon, like print. 01:31:08.570 --> 01:31:11.150 It's a prompt for the human, nothing too interesting there. 01:31:11.150 --> 01:31:13.840 scanf is this function that takes the format code 01:31:13.840 --> 01:31:19.010 so it knows what to read from the user's keyboard and the address of a place 01:31:19.010 --> 01:31:19.977 to put it. 
01:31:19.977 --> 01:31:22.310 And char star-- this is an address --I don't need to use 01:31:22.310 --> 01:31:25.610 ampersand because unlike an int, char star is already, 01:31:25.610 --> 01:31:28.580 by definition, a pointer or an address. 01:31:28.580 --> 01:31:31.850 And then lastly, I just print out whatever the human typed in. 01:31:31.850 --> 01:31:33.770 Unfortunately, let's see what happens here. 01:31:33.770 --> 01:31:37.640 Let me go ahead and save this. 01:31:37.640 --> 01:31:42.740 Make scanf-- give myself a bigger terminal window --enter. 01:31:42.740 --> 01:31:43.590 Oh, my goodness. 01:31:43.590 --> 01:31:44.090 All right. 01:31:44.090 --> 01:31:44.965 So what's wrong here? 01:31:44.965 --> 01:31:47.645 Variable s is uninitialized when used here. 01:31:47.645 --> 01:31:49.520 So Clang is trying to protect me from myself. 01:31:49.520 --> 01:31:52.070 I haven't initialized s to an address. 01:31:52.070 --> 01:31:53.810 Where do we want to put Emma's name? 01:31:53.810 --> 01:31:57.680 Well, maybe we could do like 0x123, or something like this, 01:31:57.680 --> 01:31:58.992 or in the absence of that-- 01:31:58.992 --> 01:32:00.950 if you don't know the address in advance --null 01:32:00.950 --> 01:32:02.783 is the convention to which it's alluding to. 01:32:02.783 --> 01:32:08.310 N-U-L-L is a special pointer that means there is no pointer there. 01:32:08.310 --> 01:32:09.140 It's all 0s. 01:32:09.140 --> 01:32:12.230 Let me try this again, make scanf-- 01:32:12.230 --> 01:32:15.080 OK, it seemed to work --dot slash scanf. 01:32:15.080 --> 01:32:17.374 Let me go ahead and type in Emma. 01:32:17.374 --> 01:32:18.230 Hmm. 01:32:18.230 --> 01:32:19.230 Emma is null. 01:32:19.230 --> 01:32:20.990 Let me try that again. 01:32:20.990 --> 01:32:25.400 So Emma is the Head CA for CS50-- 01:32:25.400 --> 01:32:27.590 let's type a longer string --null. 01:32:27.590 --> 01:32:31.880 So nothing even seems to fit, not even the first letter of her name. 
01:32:31.880 --> 01:32:33.050 So why is that? 01:32:33.050 --> 01:32:35.990 And actually, sometimes we can get the program to crash. 01:32:35.990 --> 01:32:40.136 Let's see, a little weird but, let's do this. 01:32:40.136 --> 01:32:41.585 [CHUCKLES] 01:32:45.940 --> 01:32:47.402 So a longer string-- 01:32:47.402 --> 01:32:48.610 slightly creepy now, perhaps. 01:32:48.610 --> 01:32:50.710 But, OK. 01:32:50.710 --> 01:32:51.690 --enter. 01:32:51.690 --> 01:32:52.390 Dammit. 01:32:52.390 --> 01:32:53.380 Emma not found. 01:32:53.380 --> 01:32:56.080 OK, not what I intended. 01:32:56.080 --> 01:32:58.160 Let's do this once more. 01:32:58.160 --> 01:32:58.660 Oh, my god. 01:32:58.660 --> 01:33:06.110 Now, my histor-- OK, dot slash scanf, Emma, Emma, Emma, Emma, enter. 01:33:06.110 --> 01:33:07.016 Dammit. 01:33:07.016 --> 01:33:09.200 [LAUGHTER] 01:33:09.200 --> 01:33:12.500 OK, well, either way it's broken, which was the only point I'm trying to make. 01:33:12.500 --> 01:33:13.000 [LAUGHTER] 01:33:13.000 --> 01:33:15.370 So why is this not actually working? 01:33:15.370 --> 01:33:18.220 Well, you have to remember what char star s means. 01:33:18.220 --> 01:33:20.740 This means, give me a variable in which I can 01:33:20.740 --> 01:33:23.590 store the address of a chunk of memory. 01:33:23.590 --> 01:33:26.980 Null, at the moment is a symbol that means, 01:33:26.980 --> 01:33:29.050 like, there is no memory allocated yet. 01:33:29.050 --> 01:33:33.310 So technically speaking, I've not actually allocated any memory for Emma 01:33:33.310 --> 01:33:34.810 to actually be stored in. 01:33:34.810 --> 01:33:37.240 So really what I should be doing is something like this. 01:33:37.240 --> 01:33:39.157 If I know in advance, a little presumptuously, 01:33:39.157 --> 01:33:40.990 that the human's going to type in Emma, let 01:33:40.990 --> 01:33:45.520 me go ahead and give myself an array called s of size 5 01:33:45.520 --> 01:33:48.670 and then pass this in on line 7. 
01:33:48.670 --> 01:33:50.240 So in short, there's this-- 01:33:50.240 --> 01:33:52.865 there's this relationship between arrays and pointers 01:33:52.865 --> 01:33:55.240 that's sort of been latent throughout today's discussion. 01:33:55.240 --> 01:33:58.390 An array is just a chunk of memory back-to-back-to-back. 01:33:58.390 --> 01:34:02.170 A string is just a sequence of characters back-to-back-to-back. 01:34:02.170 --> 01:34:05.890 A string is technically an address of the first byte of that memory. 01:34:05.890 --> 01:34:08.110 And so sort of by transitivity, a pointer 01:34:08.110 --> 01:34:12.560 can be viewed as the same thing as an array, at least in this context. 01:34:12.560 --> 01:34:15.940 So let me go ahead and allocate myself an array of five characters. 01:34:15.940 --> 01:34:22.030 It turns out that Clang will treat the name of an array just like a pointer 01:34:22.030 --> 01:34:25.240 if you use it in this context to scanf, passing 01:34:25.240 --> 01:34:28.670 in the address of the first byte in that array. 01:34:28.670 --> 01:34:31.630 So now if I go ahead and make scanf with this third version 01:34:31.630 --> 01:34:34.960 and do dot slash scanf and type in Emma-- 01:34:34.960 --> 01:34:35.980 that's four characters. 01:34:35.980 --> 01:34:39.220 I know safely I'm leaving room for the null terminator --now 01:34:39.220 --> 01:34:41.140 it's storing Emma's name successfully. 01:34:41.140 --> 01:34:45.260 And if I go ahead and do this here, emma, in lower case, that works. 01:34:45.260 --> 01:34:49.780 And if I get a little greedy and do like Emma Humphrey, first name, last name, 01:34:49.780 --> 01:34:50.740 Hmm. 01:34:50.740 --> 01:34:52.000 It didn't work. 01:34:52.000 --> 01:34:52.930 But why might that be? 01:34:52.930 --> 01:34:54.847 I haven't allocated enough space for her name. 01:34:54.847 --> 01:34:57.290 I'm lucky frankly, that the program's not crashing. 
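One way to guard a five-byte array like this against longer input is a field width in the format code. This is an addition beyond what the lecture shows: `%4s` tells the scanf family to store at most four characters plus the terminating null. The sketch uses sscanf so it can run without a keyboard, and `read_short_word` is a hypothetical name.

```c
#include <stdio.h>

// Reads one word of at most 4 characters from text into buf.
// buf must have room for 5 chars: 4 letters plus the '\0'.
// Returns 1 if a word was stored, 0 otherwise.
int read_short_word(const char *text, char *buf)
{
    // The array's name is passed without an ampersand because it
    // already acts as the address of its first byte.
    return sscanf(text, "%4s", buf) == 1;
}
```

With `char s[5];`, reading "Emma" stores the whole name, while a longer word is cut off after four characters instead of spilling past the end of the array.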
01:34:57.290 --> 01:35:00.130 But if I loaded as I was trying to do, a big enough paragraph 01:35:00.130 --> 01:35:03.980 of text, my program outright might crash or segfault, 01:35:03.980 --> 01:35:06.670 so to speak-- an error message that you'll likely see this week 01:35:06.670 --> 01:35:09.850 or next as we continue to use memory. 01:35:09.850 --> 01:35:11.860 Let me do one final example now because there's 01:35:11.860 --> 01:35:14.890 one sort of power we now get that we have the ability 01:35:14.890 --> 01:35:17.860 to talk in terms of memory addresses. 01:35:17.860 --> 01:35:21.070 I'm going to go ahead and make a program here, reminiscent of last week, 01:35:21.070 --> 01:35:23.860 called phone book dot C, whose purpose in life 01:35:23.860 --> 01:35:27.720 is going to be to store some information in a file-- 01:35:27.720 --> 01:35:28.720 for the very first time. 01:35:28.720 --> 01:35:31.137 I'm going to use the CS50 library just to put the training 01:35:31.137 --> 01:35:34.870 wheels back on briefly so I can get input from the human easily. 01:35:34.870 --> 01:35:38.410 But I'm going to go ahead then and use the string library and standard I/O, 01:35:38.410 --> 01:35:39.820 int main void. 01:35:39.820 --> 01:35:41.990 And I'm going to go ahead and do the following. 01:35:41.990 --> 01:35:45.640 I'm going to go ahead and open a file called 01:35:45.640 --> 01:35:52.030 file, using a new function called fopen, phone book dot CSV, a. 01:35:52.030 --> 01:35:53.720 Now what is going on here? 01:35:53.720 --> 01:35:56.770 Well it turns out, now that we know pointers-- or starting 01:35:56.770 --> 01:35:59.410 to get comfortable with pointers over the next couple of weeks 01:35:59.410 --> 01:36:03.490 --notice that I can actually use a new data type-- it's weirdly capitalized-- 01:36:03.490 --> 01:36:05.020 all caps, FILE. 01:36:05.020 --> 01:36:09.370 But I can say, give me a pointer to a file and call it lower case file. 
01:36:09.370 --> 01:36:13.090 So this is just a variable called file, that effectively, for today's purposes, 01:36:13.090 --> 01:36:15.687 is going to store the contents of a file for me. 01:36:15.687 --> 01:36:18.520 It's not technically doing that but that's a reasonable mental model 01:36:18.520 --> 01:36:19.270 for now. 01:36:19.270 --> 01:36:23.830 fopen takes, as its first argument, the name of the file you want to open. 01:36:23.830 --> 01:36:29.320 And the second argument is either r, or w, or a-- r, for read, w, for write, 01:36:29.320 --> 01:36:30.340 a, for append-- 01:36:30.340 --> 01:36:32.082 to just keep adding to a file. 01:36:32.082 --> 01:36:33.790 The goal at hand is to write a phone book 01:36:33.790 --> 01:36:36.610 program that lets me type in a human's name and number 01:36:36.610 --> 01:36:38.360 and just keep appending it to a text file, 01:36:38.360 --> 01:36:41.443 like a database that I can store if I want to keep track of people's phone 01:36:41.443 --> 01:36:41.980 numbers. 01:36:41.980 --> 01:36:46.790 fopen, by definition, is going to return a pointer to that file. 01:36:46.790 --> 01:36:49.523 So let me go ahead now and do the following. 01:36:49.523 --> 01:36:52.690 First, I'm going to go ahead and give myself a name, although I don't really 01:36:52.690 --> 01:36:53.890 need to use string per se. 01:36:53.890 --> 01:36:55.300 I'll use char star name. 01:36:55.300 --> 01:36:58.450 But I am going to use getString just to save myself some trouble here, 01:36:58.450 --> 01:37:00.070 asking the human for their name. 01:37:00.070 --> 01:37:03.530 I am going to then ask the human for their number using getString as well. 01:37:03.530 --> 01:37:05.350 But again I could use scanf if I want. 01:37:05.350 --> 01:37:08.770 But it's going to require more error checking today. 01:37:08.770 --> 01:37:10.550 And now I'm going to go ahead and do this.
01:37:10.550 --> 01:37:12.970 It turns out that besides the function printf, 01:37:12.970 --> 01:37:17.290 there's another function called fprintf, which means file printf. 01:37:17.290 --> 01:37:19.400 You can print literally to a file. 01:37:19.400 --> 01:37:23.740 So I'm going to go ahead here and now do print to this file, 01:37:23.740 --> 01:37:28.160 print a string, and a comma, and another string, and then a new line. 01:37:28.160 --> 01:37:31.840 And I'm going to go ahead and print out someone's name and then their number. 01:37:31.840 --> 01:37:34.660 And then down here I'm going to close the file. 01:37:34.660 --> 01:37:37.620 So a bunch of new lines, but this one in short-- 01:37:37.620 --> 01:37:45.280 I'll comment it --open file, get strings from user, print-- 01:37:45.280 --> 01:37:49.720 that is write --strings to file, and then close file. 01:37:49.720 --> 01:37:51.970 So new functions but pretty straightforward at least, 01:37:51.970 --> 01:37:53.137 conceptually, I would argue. 01:37:53.137 --> 01:37:56.320 In terms of what's happening, even though the syntax is a little strange. 01:37:56.320 --> 01:37:59.260 But I did deliberately choose this file name, phone book dot CSV. 01:37:59.260 --> 01:38:02.100 Does anyone know what a CSV is? 01:38:02.100 --> 01:38:03.850 Yeah, comma separated variables. 01:38:03.850 --> 01:38:04.900 It's like a very-- 01:38:04.900 --> 01:38:07.990 comma separated values, it's a very simple spreadsheet format 01:38:07.990 --> 01:38:11.410 that you can open in Excel, or Apple Numbers, or other tools like that. 01:38:11.410 --> 01:38:14.390 So I can actually make my own CSV files kind of like this. 01:38:14.390 --> 01:38:16.078 Let me go ahead and make phone book. 01:38:16.078 --> 01:38:17.370 All right, that seemed to work. 01:38:17.370 --> 01:38:19.630 Let me go ahead and do dot slash phone book. 01:38:19.630 --> 01:38:22.360 And now it's asking for a name, so I'll do Emma.
01:38:22.360 --> 01:38:27.340 And then I think her number last week was 555-0100, enter. 01:38:27.340 --> 01:38:30.490 But notice this, if I type ls, besides all of the programs 01:38:30.490 --> 01:38:34.430 we've written today, there's also this phone book dot CSV file. 01:38:34.430 --> 01:38:37.270 And in fact, let me open up phone book dot CSV. 01:38:37.270 --> 01:38:40.310 And there's Emma's name and number in a file. 01:38:40.310 --> 01:38:42.490 Let me go ahead and run it once more and this time 01:38:42.490 --> 01:38:48.280 do Rodrigo, like last week, 617-555-0101, enter. 01:38:48.280 --> 01:38:50.950 And voila, his name just appeared in the file. 01:38:50.950 --> 01:38:51.820 We'll do one more. 01:38:51.820 --> 01:38:56.320 So Brian was 617-555-0102, enter. 01:38:56.320 --> 01:38:58.870 And the CSV file is getting updated in real time. 01:38:58.870 --> 01:39:02.290 And now if I actually go and download this file from the IDE 01:39:02.290 --> 01:39:04.300 by control clicking or right clicking on it, 01:39:04.300 --> 01:39:05.930 that ends up in my downloads folder. 01:39:05.930 --> 01:39:08.930 And if I go ahead and click on this-- if you have something like Numbers 01:39:08.930 --> 01:39:12.400 or Microsoft Excel installed and you use it for the very first time-- 01:39:12.400 --> 01:39:16.300 you'll see that it's opened up a spreadsheet containing 01:39:16.300 --> 01:39:17.783 those names and those numbers. 01:39:17.783 --> 01:39:20.950 So if you've ever needed to do a sort of data science-like analysis of data, 01:39:20.950 --> 01:39:23.470 you can actually write code that generates the data for you 01:39:23.470 --> 01:39:29.470 in a CSV format and gives you these, perhaps, familiar, rows and columns. 01:39:29.470 --> 01:39:34.360 But let me do one final example now that will motivate this coming week's 01:39:34.360 --> 01:39:35.620 problem set challenges. 
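The phone book steps described above (open file, write strings to file, close file) can be sketched roughly as follows. This is a minimal sketch, not the code typed in lecture: the function name append_entry is hypothetical, and the name and number arrive as parameters rather than through the CS50 library's input function, so the block compiles without that library.

```c
#include <stdio.h>

// Hypothetical helper (not from the lecture): appends one "name,number"
// row to the CSV file at path.
// Returns 0 on success, 1 if the file could not be opened or written.
int append_entry(const char *path, const char *name, const char *number)
{
    // Open file in append mode ("a"), so rows already in the file are kept
    FILE *file = fopen(path, "a");
    if (file == NULL)
    {
        return 1;
    }

    // Print (that is, write) strings to file: string, comma, string, newline
    int written = fprintf(file, "%s,%s\n", name, number);

    // Close file; fclose returns EOF if the final flush to disk fails
    if (fclose(file) == EOF || written < 0)
    {
        return 1;
    }
    return 0;
}
```

Calling append_entry("phonebook.csv", "Rodrigo", "617-555-0101") from main once per run would add one row each time, which is why the file keeps growing across runs instead of being overwritten, as it would be with "w".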
01:39:35.620 --> 01:39:39.430 So I'm going to go ahead now and write a final program that-- whose 01:39:39.430 --> 01:39:41.410 purpose in life is to detect this. 01:39:41.410 --> 01:39:48.310 I have here in front of me a picture of Brian [LAUGHTER] in JPEG format. 01:39:48.310 --> 01:39:52.660 And I have a cat in GIF format-- which doesn't work in the IDE 01:39:52.660 --> 01:39:56.020 but let me go ahead and download it locally --does look like this. 01:39:56.020 --> 01:39:58.060 So it's this guy from a couple of weeks ago. 01:39:58.060 --> 01:40:00.240 But both-- one is in GIF format, one is in JPEG, 01:40:00.240 --> 01:40:01.990 which if you're familiar from file formats 01:40:01.990 --> 01:40:04.030 are just different types of images. 01:40:04.030 --> 01:40:10.000 Let me go ahead and write a program real quick that is called JPEG dot c. 01:40:10.000 --> 01:40:15.430 And its purpose in life is just to check if a file passed by its name 01:40:15.430 --> 01:40:18.430 at the command line is a JPEG or not. 01:40:18.430 --> 01:40:22.270 I'm going to go ahead and include standard I/O dot h. 01:40:22.270 --> 01:40:24.850 I'm going to call my function int main, but not void. 01:40:24.850 --> 01:40:27.820 This time I'm going to use int argc, like last week, 01:40:27.820 --> 01:40:31.633 and string argv open paren-- 01:40:31.633 --> 01:40:32.800 open bracket closed bracket. 01:40:32.800 --> 01:40:33.550 But you know what? 01:40:33.550 --> 01:40:35.050 We don't need strings anymore. 01:40:35.050 --> 01:40:38.858 This is actually what you've been typing sort of unknowingly the past week when 01:40:38.858 --> 01:40:41.650 you were using command line arguments, or the past couple of weeks. 01:40:41.650 --> 01:40:44.020 Now I'm going to go ahead and do a quick error check. 01:40:44.020 --> 01:40:46.900 If argc does not equal 2, I'm just going to quit.
01:40:46.900 --> 01:40:49.510 I want the human to type, not just the program's name, 01:40:49.510 --> 01:40:51.250 but one other word as well. 01:40:51.250 --> 01:40:54.340 I then want to go ahead and open up the file 01:40:54.340 --> 01:40:56.300 that the human typed in at the prompt-- 01:40:56.300 --> 01:40:59.890 which I claim is going to be the second word they type --so argv 1. 01:40:59.890 --> 01:41:02.490 And I want to read it this time, not append line-by-line, 01:41:02.490 --> 01:41:04.240 I just want to read it from the beginning. 01:41:04.240 --> 01:41:07.270 And the key-- keyword for that is r. 01:41:07.270 --> 01:41:10.660 And then I'm going to go ahead and actually do a little error check. 01:41:10.660 --> 01:41:12.940 If file equals equals null-- 01:41:12.940 --> 01:41:16.090 we haven't seen this before --but if fopen, if malloc, 01:41:16.090 --> 01:41:19.300 if getString return error conditions, they actually 01:41:19.300 --> 01:41:20.550 return the special value null. 01:41:20.550 --> 01:41:23.217 But for now, let me just go ahead and say, something went wrong. 01:41:23.217 --> 01:41:24.160 I'm going to return 1. 01:41:24.160 --> 01:41:26.770 But we won't worry too much more about it for now. 01:41:26.770 --> 01:41:30.850 So at this point I have opened file. 01:41:30.850 --> 01:41:36.310 I have ensure user ran program with two words 01:41:36.310 --> 01:41:39.310 at prompt, that's our argc use there. 01:41:39.310 --> 01:41:41.180 Now let's go ahead and do this. 01:41:41.180 --> 01:41:45.610 I'm going to go ahead and give myself an array of 3 bytes. 01:41:45.610 --> 01:41:48.755 And I'm going to go ahead and use a function called fread-- 01:41:48.755 --> 01:41:50.630 And we'll see more of this in the assignment. 01:41:50.630 --> 01:41:52.080 So this is deliberately quick. 
01:41:52.080 --> 01:41:56.350 --I pass in, as arguments, the array, the number of bytes I want to read, 01:41:56.350 --> 01:41:59.410 how many times I want to read those bytes, and then the file 01:41:59.410 --> 01:42:01.210 from which I want to read those bytes. 01:42:01.210 --> 01:42:02.500 So that was a mouthful. 01:42:02.500 --> 01:42:08.420 But collectively, these two lines of code read 3 bytes from file. 01:42:08.420 --> 01:42:13.180 It just literally reads the first 24 bits, or 3 bytes-- 01:42:13.180 --> 01:42:15.940 each of which is 8 bits --from the file. 01:42:15.940 --> 01:42:17.420 And why am I doing this? 01:42:17.420 --> 01:42:24.247 Well, it turns out, check if bytes are 0xFF, 0xD8, 0xFF. 01:42:24.247 --> 01:42:26.080 So again, coming full circle to hexadecimal, 01:42:26.080 --> 01:42:30.190 it turns out that in the documentation for the JPEG image format, 01:42:30.190 --> 01:42:32.925 the first 3 bytes of any JPEG in the world-- 01:42:32.925 --> 01:42:35.050 any photograph you've ever taken with your camera-- 01:42:35.050 --> 01:42:38.200 start with FF, then D8, then FF. 01:42:38.200 --> 01:42:41.770 This is a so-called magic number that the designers of the JPEG format 01:42:41.770 --> 01:42:45.330 just decided, use this as a sort of clue at the beginning of the file that hey, 01:42:45.330 --> 01:42:48.130 here comes a JPEG image. 01:42:48.130 --> 01:42:49.210 So how do I do this? 01:42:49.210 --> 01:42:53.970 It's actually pretty simple, if bytes 0 equals equals 0xFF-- 01:42:53.970 --> 01:42:57.840 I can literally type hexadecimal in C --or byte-- 01:42:57.840 --> 01:43:06.870 rather, and bytes 1 equals equals 0xD8, and bytes 2 equals equals 0xFF, 01:43:06.870 --> 01:43:10.947 then it turns out, it's probably a JPEG. 01:43:10.947 --> 01:43:12.030 There are some conditions. 01:43:12.030 --> 01:43:13.405 We'll explore in the problem set. 01:43:13.405 --> 01:43:16.380 So I'm just going to say maybe it's a JPEG.
01:43:16.380 --> 01:43:20.220 But if that's not true, I am going to say with confidence, 01:43:20.220 --> 01:43:25.110 no, it's not a JPEG if those first 3 bytes are not that. 01:43:25.110 --> 01:43:27.060 And then for arcane reasons, I technically 01:43:27.060 --> 01:43:28.920 need to make this what's called unsigned, 01:43:28.920 --> 01:43:33.660 which means it's a number from 0 to 255, instead of negative 128 to 127. 01:43:33.660 --> 01:43:37.170 But let me wave my hands at that, just so that we get this code right for now. 01:43:37.170 --> 01:43:41.550 I'm going to go ahead and run JPEG and fail miserably. 01:43:41.550 --> 01:43:43.400 What did I do wrong? 01:43:43.400 --> 01:43:46.410 fopen is the name of that function-- sorry-- 01:43:46.410 --> 01:43:48.390 make JPEG, good. 01:43:48.390 --> 01:43:52.710 And now I'm going to run JPEG on my Brian 01:43:52.710 --> 01:43:55.740 image, which is in my source for directory on the course's website. 01:43:55.740 --> 01:43:57.420 He is maybe a JPEG. 01:43:57.420 --> 01:44:01.770 And then I'm going to go ahead and do JPEG on source for cat dot GIF, which 01:44:01.770 --> 01:44:06.720 is no, not a GIF, which is to say that once you have the ability to express 01:44:06.720 --> 01:44:10.770 pointers, we now have the programmatic capabilities, not only to write files, 01:44:10.770 --> 01:44:12.370 but read them as well. 01:44:12.370 --> 01:44:15.210 Now what can we actually use that information for? 01:44:15.210 --> 01:44:19.500 Well it turns out what we'll be doing now, this coming week and beyond, 01:44:19.500 --> 01:44:25.470 is exploring a number of features here of what's 01:44:25.470 --> 01:44:29.010 called file I/O. Long story short, if you've ever wondered really 01:44:29.010 --> 01:44:32.290 what an image is-- we talked briefly about this in Week 0 01:44:32.290 --> 01:44:33.540 --this is an image. 01:44:33.540 --> 01:44:36.090 But it's in binary, 0s and 1s. 
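Looking back at the JPEG check walked through above, the core of it can be sketched roughly like this. This is a sketch under stated assumptions rather than the lecture's exact program: the function name maybe_jpeg and its return codes are hypothetical, and the file name arrives as a parameter where the lecture reads it from argv[1]. The mode "rb" is used here instead of the lecture's "r"; on POSIX systems the two behave the same.

```c
#include <stdio.h>

// Hypothetical helper (not from the lecture): checks the first 3 bytes of
// the file at path against the JPEG magic number 0xFF 0xD8 0xFF.
// Returns 1 for "maybe a JPEG", 0 for "no", and -1 if the file cannot be opened.
int maybe_jpeg(const char *path)
{
    // Open file for reading; fopen returns NULL on error
    FILE *file = fopen(path, "rb");
    if (file == NULL)
    {
        return -1;
    }

    // Read 3 bytes from file: one item of size 3.
    // unsigned char holds 0 to 255, so a byte of 0xFF compares equal to 0xff;
    // with a signed char it could be stored as -1 and the comparison would fail.
    unsigned char bytes[3];
    size_t items = fread(bytes, 3, 1, file);
    fclose(file);

    // fread returns the number of whole items read; 0 means fewer than 3 bytes
    if (items != 1)
    {
        return 0;
    }

    // Check if bytes are 0xFF, 0xD8, 0xFF
    return bytes[0] == 0xff && bytes[1] == 0xd8 && bytes[2] == 0xff;
}
```

A main that printed "Maybe" when maybe_jpeg(argv[1]) returns 1 and "No" otherwise would match the behavior shown in the demo. The answer stays hedged at "maybe" because three matching bytes are necessary for a JPEG but do not prove the rest of the file is a valid image.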
01:44:36.090 --> 01:44:37.950 Does anyone know what this image is of? 01:44:37.950 --> 01:44:38.982 AUDIENCE: A smiley face. 01:44:38.982 --> 01:44:42.190 DAVID J. MALAN: Well, how did you-- are a nonzero number of you looking ahead 01:44:42.190 --> 01:44:42.773 on the slides? 01:44:42.773 --> 01:44:44.470 Because yes, it's a smiley face. 01:44:44.470 --> 01:44:47.980 And you would only know this by assuming that 1 represents 01:44:47.980 --> 01:44:50.630 a white pixel, 0 represents a black pixel, 01:44:50.630 --> 01:44:53.440 and if we effectively have a grid of bits-- 01:44:53.440 --> 01:44:57.410 1's and 0's --this from far back kind of looks like the simplest possible smiley 01:44:57.410 --> 01:44:57.910 face. 01:44:57.910 --> 01:45:01.420 So that's an image, or a bitmap, a map of bits, 01:45:01.420 --> 01:45:03.725 that represent the pixels in an image. 01:45:03.725 --> 01:45:06.100 So with problem set four, what we're going to start to do 01:45:06.100 --> 01:45:08.858 is explore the world of forensics, first and foremost. 01:45:08.858 --> 01:45:10.150 And we have a few minutes left. 01:45:10.150 --> 01:45:13.100 And we're going to spend one of them on this little teaser here, 01:45:13.100 --> 01:45:17.870 which is something that you might see typically on your typical CSI type 01:45:17.870 --> 01:45:18.370 shows. 01:45:18.370 --> 01:45:20.570 And let's motivate it as follows. 01:45:20.570 --> 01:45:22.533 If we could dim the lights for this clip. 01:45:22.533 --> 01:45:23.360 [VIDEO PLAYBACK] 01:45:23.360 --> 01:45:25.520 - --we know? 01:45:25.520 --> 01:45:28.730 - That at 9:15, Ray Santoya was at the ATM. 01:45:28.730 --> 01:45:32.600 - OK, so the question is, what was he doing at 9:16? 01:45:32.600 --> 01:45:35.370 - Shooting the 9 millimeter at something. 01:45:35.370 --> 01:45:37.040 Maybe he saw the sniper. 01:45:37.040 --> 01:45:38.443 - Or was working with him? 01:45:38.443 --> 01:45:39.130 [BEEPS] 01:45:39.130 --> 01:45:41.460 - Wait, go back one. 
01:45:41.460 --> 01:45:42.582 - What do you see? 01:45:42.582 --> 01:45:43.082 [TYPING] 01:45:43.082 --> 01:45:44.054 [BEEPS] 01:45:49.890 --> 01:45:51.900 - Bring his face up, full screen. 01:45:51.900 --> 01:45:53.700 [BEEPS] 01:45:53.700 --> 01:45:54.672 - His glasses. 01:45:54.672 --> 01:45:55.630 - There's a reflection. 01:45:57.088 --> 01:45:58.546 [TYPING] 01:45:58.546 --> 01:46:00.004 [BEEPS] 01:46:00.004 --> 01:46:02.920 [CHUCKLE] 01:46:02.920 --> 01:46:05.336 [BEEPS] 01:46:05.336 --> 01:46:05.836 [LAUGHTER] 01:46:05.836 --> 01:46:07.820 - [INAUDIBLE] baseball team. 01:46:07.820 --> 01:46:08.890 That's their logo. 01:46:08.890 --> 01:46:11.190 - And he's talking to whoever's wearing that jacket. 01:46:11.190 --> 01:46:13.360 - We may have a witness. 01:46:13.360 --> 01:46:15.017 - To both shootings. 01:46:15.017 --> 01:46:15.600 [END PLAYBACK] 01:46:15.600 --> 01:46:18.183 DAVID J. MALAN: So at the risk of ruining a lot of TV for you, 01:46:18.183 --> 01:46:19.270 this is not a thing. 01:46:19.270 --> 01:46:22.170 You can't just say enhance and things get enhanced. 01:46:22.170 --> 01:46:22.680 Why? 01:46:22.680 --> 01:46:24.380 Well, here's that same picture of Brian. 01:46:24.380 --> 01:46:27.480 And let's [LAUGHTER] look at this glint in his eye. 01:46:27.480 --> 01:46:28.480 Let's see what's there. 01:46:28.480 --> 01:46:31.680 If we could zoom in on this, and then zoom in on this, and then 01:46:31.680 --> 01:46:32.520 zoom in on this. 01:46:32.520 --> 01:46:35.640 This is all of the data that is in Brian's eye. 01:46:35.640 --> 01:46:37.920 There is no enhance at that point, when you're 01:46:37.920 --> 01:46:42.360 looking at just pixels represented by colors, a la Week 0. 01:46:42.360 --> 01:46:44.917 So what you'll do for this coming week in fact-- 01:46:44.917 --> 01:46:46.750 in fact, let's actually make this more real. 
01:46:46.750 --> 01:46:50.040 If we could go back to the clip here for just 20 seconds, 01:46:50.040 --> 01:46:52.213 if we could dim the lights once more. 01:46:52.213 --> 01:46:52.880 [VIDEO PLAYBACK] 01:46:52.880 --> 01:46:54.950 - Magnify that death sphere. 01:46:54.950 --> 01:46:57.150 [BEEPS] 01:46:57.150 --> 01:46:58.610 Why is it still blurry? 01:46:58.610 --> 01:47:00.570 - That's all the resolution we have. 01:47:00.570 --> 01:47:02.940 Making it bigger doesn't make it clearer. 01:47:02.940 --> 01:47:04.757 - It does on CSI Miami. 01:47:04.757 --> 01:47:05.340 [END PLAYBACK] 01:47:05.340 --> 01:47:07.215 DAVID J. MALAN: So with that said, this week, 01:47:07.215 --> 01:47:09.390 we will understand all the more how images work. 01:47:09.390 --> 01:47:11.430 And here for instance, is a shot of the Charles River. 01:47:11.430 --> 01:47:14.640 And for the first part of the problem set, we implement a number of Instagram 01:47:14.640 --> 01:47:17.070 like filters, understanding how an image is represented 01:47:17.070 --> 01:47:19.170 and how you therefore can transform it. 01:47:19.170 --> 01:47:22.230 For instance, first, into grayscale, by writing your own grayscale 01:47:22.230 --> 01:47:24.720 filter, into sepia, into-- 01:47:24.720 --> 01:47:28.440 reflecting it on the opposite from left to right, blurring an image, 01:47:28.440 --> 01:47:28.980 even still. 01:47:28.980 --> 01:47:31.272 And if you're feeling more comfortable, to do something 01:47:31.272 --> 01:47:33.810 called edge detection, which finds all of the edges 01:47:33.810 --> 01:47:36.480 within a particular picture. 01:47:36.480 --> 01:47:38.520 More than that, you will actually implement 01:47:38.520 --> 01:47:41.160 code that recovers JPEG files. 01:47:41.160 --> 01:47:43.900 We've been taking some photographs of people, places, and things.
01:47:43.900 --> 01:47:46.530 Unfortunately, we accidentally deleted those photos 01:47:46.530 --> 01:47:50.548 but first made a forensic image of the memory card from the camera, which 01:47:50.548 --> 01:47:52.590 we will then provide to you so that you can write 01:47:52.590 --> 01:47:55.470 code in C that recovers all of the seemingly 01:47:55.470 --> 01:47:58.950 lost JPEGs from that forensic image. 01:47:58.950 --> 01:48:01.740 And last but not least, it would not be a CS class 01:48:01.740 --> 01:48:03.180 without a little bit of CS humor. 01:48:03.180 --> 01:48:09.218 We thought we'd end on this one note, a joke that you will perhaps now get. 01:48:09.218 --> 01:48:11.130 [LAUGHTER] 01:48:11.130 --> 01:48:11.630 All right. 01:48:11.630 --> 01:48:12.890 That's it for CS50. 01:48:12.890 --> 01:48:13.890 We'll see you next time. 01:48:13.890 --> 01:48:16.040 [MUSIC PLAYING]