WEBVTT X-TIMESTAMP-MAP=LOCAL:00:00:00.000,MPEGTS:900000 00:00:00.000 --> 00:00:04.482 [MUSIC PLAYING] 00:00:49.370 --> 00:00:53.270 DAVID MALAN: All right, this is CS50, and this is week four. 00:00:53.270 --> 00:00:55.190 And for the past several weeks, we've had 00:00:55.190 --> 00:00:58.217 training wheels of sorts on, while using this language known as C. 00:00:58.217 --> 00:01:01.050 And those training wheels have been in the form of the CS50 library. 00:01:01.050 --> 00:01:05.580 And you use this library, of course, by selecting and including cs50.h 00:01:05.580 --> 00:01:06.650 atop your code. 00:01:06.650 --> 00:01:08.733 And then if you think about how clang works, 00:01:08.733 --> 00:01:12.080 you've been linking your code via dash L CS50. 00:01:12.080 --> 00:01:15.290 But all of that has been automated for you up until now, using make. 00:01:15.290 --> 00:01:17.900 Today, we'll transition from last week's focus 00:01:17.900 --> 00:01:21.290 on algorithms to a little more focus on machines 00:01:21.290 --> 00:01:24.980 and on the machines we now use to implement these algorithms all the more 00:01:24.980 --> 00:01:27.410 powerfully, as we begin to take off these training wheels 00:01:27.410 --> 00:01:30.840 and look at what's really going on underneath the hood of your computer. 00:01:30.840 --> 00:01:33.740 And as complicated as some aspects of C have been, 00:01:33.740 --> 00:01:36.320 as new is programming may very well be to you, 00:01:36.320 --> 00:01:39.710 realize that there's not all that much going on underneath the hood 00:01:39.710 --> 00:01:42.350 that we need to understand to now move onward 00:01:42.350 --> 00:01:45.920 and start solving far more interesting and more sophisticated and more 00:01:45.920 --> 00:01:46.820 fun problems. 00:01:46.820 --> 00:01:49.170 We just need a few additional building blocks. 00:01:49.170 --> 00:01:52.340 And so today, we'll do this, first, by relearning how to count. 
00:01:52.340 --> 00:01:55.080 Here, for instance, is what we'll call the computer's memory. 00:01:55.080 --> 00:01:56.420 And we've seen this grid before. 00:01:56.420 --> 00:01:59.420 And we can number, recall, all of the bytes in your computer's memory. 00:01:59.420 --> 00:02:04.550 We might call this byte number 0, 1, 2, 3, 4, all the way up to byte 15, 00:02:04.550 --> 00:02:05.610 and so forth. 00:02:05.610 --> 00:02:08.240 But it turns out, when talking about computers' memories, 00:02:08.240 --> 00:02:10.610 computers and computer scientists and programmers 00:02:10.610 --> 00:02:13.070 actually don't tend to use decimal. 00:02:13.070 --> 00:02:15.830 They definitely don't tend to use binary at that low level. 00:02:15.830 --> 00:02:19.010 Instead, they tend to use, just for convention's sake, 00:02:19.010 --> 00:02:21.020 something called hexadecimal. 00:02:21.020 --> 00:02:23.210 Hexadecimal is a different base system that, 00:02:23.210 --> 00:02:27.120 instead of using 10 digits or 2 digits, uses 16 instead. 00:02:27.120 --> 00:02:29.360 And so a computer scientist, when numbering things 00:02:29.360 --> 00:02:33.980 like bytes in a computer memory, would still do 0, 1, 2, 3, 4, 5, 6, 7, 8, 9. 00:02:33.980 --> 00:02:37.350 But after that, instead of going onward with decimal to, say, 10, 00:02:37.350 --> 00:02:40.970 11, 12, 13, 14, 15, they instead, conventionally, 00:02:40.970 --> 00:02:43.260 would start using a few letters of the alphabet. 00:02:43.260 --> 00:02:47.270 And so, in hexadecimal, this different base system, base 16, 00:02:47.270 --> 00:02:48.980 you start counting at 0 still. 00:02:48.980 --> 00:02:51.130 You count up to and through 9. 00:02:51.130 --> 00:02:52.880 But when you want to keep counting higher, 00:02:52.880 --> 00:02:57.440 you then go to A, B, C, D, E, and F. 
00:02:57.440 --> 00:03:02.630 And the upside of this is that, within hexadecimal-- and that hex implies 16-- 00:03:02.630 --> 00:03:08.630 you have 16 total individual digits, 0 through 9, and also now, A through F. 00:03:08.630 --> 00:03:12.300 So we don't have to introduce second digits just to count up as high as 16. 00:03:12.300 --> 00:03:14.480 We can use individual digits 0 through F. 00:03:14.480 --> 00:03:18.650 And we can keep counting up further by using multiple hexadecimal digits. 00:03:18.650 --> 00:03:21.150 But to get there, let's introduce this vocabulary. 00:03:21.150 --> 00:03:23.540 So in binary, of course, we use 0's and 1's. 00:03:23.540 --> 00:03:25.690 In decimal, of course, we use 0 through 9's. 00:03:25.690 --> 00:03:29.360 And in hexadecimal, to be clear, we're going to use 0 through F's, otherwise 00:03:29.360 --> 00:03:30.860 known as base-16. 00:03:30.860 --> 00:03:33.320 And it's just a convention that we use A through F. We 00:03:33.320 --> 00:03:35.450 could have used any other six symbols. 00:03:35.450 --> 00:03:37.560 But these are what humans have chosen. 00:03:37.560 --> 00:03:41.090 So hexadecimal works quite similarly to our familiar decimal system. 00:03:41.090 --> 00:03:45.110 And it's even familiar to, now, what you know as the binary system, as follows. 00:03:45.110 --> 00:03:49.370 Let's consider a two-digit value using hexadecimal instead of decimal 00:03:49.370 --> 00:03:50.600 and instead of binary. 00:03:50.600 --> 00:03:54.680 Well, just like in the world of decimal, we used base-10, 00:03:54.680 --> 00:03:57.080 or in the world of binary, we used base-2. 00:03:57.080 --> 00:04:01.170 We're just going to use, now, base-16, ergo, hexadecimal. 00:04:01.170 --> 00:04:02.360 So this is 16 to the first. 00:04:02.360 --> 00:04:03.590 This is 16 to the-- 00:04:03.590 --> 00:04:05.090 sorry 16 to the 0. 00:04:05.090 --> 00:04:06.590 This is 16 to the first. 
00:04:06.590 --> 00:04:09.570 And of course, if we multiply that out, it's just the ones column 00:04:09.570 --> 00:04:11.280 and now the 16's column. 00:04:11.280 --> 00:04:13.550 And so if you want to count up in hexadecimal, 00:04:13.550 --> 00:04:21.290 you still start with 0 as usual, then 01, 02, 03, 04, 05, 06, 07, 08, 09. 00:04:21.290 --> 00:04:22.910 And then things get interesting. 00:04:22.910 --> 00:04:26.660 Now, you don't go to 10, because that would be incorrect. 00:04:26.660 --> 00:04:31.880 10, in this base system, would be like 16 times 1 plus 1 times 0. 00:04:31.880 --> 00:04:32.960 That's not what we want. 00:04:32.960 --> 00:04:38.930 After the number we know as 9, we now count up to A, B, C, D, E, F. 00:04:38.930 --> 00:04:40.670 And now, things get interesting again. 00:04:40.670 --> 00:04:43.580 But just like in the decimal system, when you count up to, like, 99, 00:04:43.580 --> 00:04:46.550 you have to start carrying the 1, same thing here. 00:04:46.550 --> 00:04:49.820 If you want to count past F, you carry the 1. 00:04:49.820 --> 00:04:55.340 And so now, to represent one value greater than F, we use 10, 00:04:55.340 --> 00:04:57.350 which looks like ten, but is not ten. 00:04:57.350 --> 00:04:59.675 In hexadecimal, it is one-zero. 00:04:59.675 --> 00:05:01.880 16 times 1 gives us 16. 00:05:01.880 --> 00:05:03.680 1 times 0 gives us 0. 00:05:03.680 --> 00:05:07.050 And of course, that gives us the decimal number we now know as 16. 00:05:07.050 --> 00:05:09.980 So we will no longer introduce more and more base systems. 00:05:09.980 --> 00:05:12.607 But let me stipulate that just by using these columns 00:05:12.607 --> 00:05:14.690 that you learned back in grade school, presumably, 00:05:14.690 --> 00:05:16.940 you can implement any base system now. 
00:05:16.940 --> 00:05:19.310 It just so happens that in the world of computers, 00:05:19.310 --> 00:05:22.295 and today in the world of memory, and soon, also files, 00:05:22.295 --> 00:05:24.170 it's just going to be very conventional to be 00:05:24.170 --> 00:05:26.990 able to recognize and use hexadecimal. 00:05:26.990 --> 00:05:29.530 And in fact, there's a reason humans like hexadecimal, 00:05:29.530 --> 00:05:30.530 or at least some humans, 00:05:30.530 --> 00:05:36.827 computer scientists. Recall that if we count up as high as FF, in this case, 00:05:36.827 --> 00:05:38.160 we would still do the same math. 00:05:38.160 --> 00:05:44.060 So 16 times 15 plus 1 times 15 is going to give us, really, this, 00:05:44.060 --> 00:05:49.210 or of course, 240 plus 15, or 255. 00:05:49.210 --> 00:05:50.460 And I did that pretty quickly. 00:05:50.460 --> 00:05:53.000 But that's just the sort of grade school math of multiplying 00:05:53.000 --> 00:05:55.730 the column by the value that's in it, where again, 00:05:55.730 --> 00:06:00.140 each of these F's is how we now express 15 using a single digit. 00:06:00.140 --> 00:06:02.480 But recall that we've seen 255 before. 00:06:02.480 --> 00:06:04.610 Back when we talked about binary a few weeks ago, 00:06:04.610 --> 00:06:12.450 255 also happened to be the pattern that we see here, eight 1 bits using binary. 00:06:12.450 --> 00:06:15.278 And so the reason that computer scientists tend to like hexadecimal, 00:06:15.278 --> 00:06:17.570 is that, you know what, in eight bits, there's actually 00:06:17.570 --> 00:06:20.000 two groups here, like four on the left, four on the right. 00:06:20.000 --> 00:06:22.340 If we sort of scooch these things over, it 00:06:22.340 --> 00:06:25.520 turns out that because hexadecimal allows 00:06:25.520 --> 00:06:28.730 you to represent 16 possible values, it's 00:06:28.730 --> 00:06:32.750 a perfect system for representing four bits at a time. 
00:06:32.750 --> 00:06:36.980 After all, if you've got four bits here, each of which can be a 0 or 1, 00:06:36.980 --> 00:06:42.020 that's 2 times 2 times 2 times 2 possible values for each of those, 00:06:42.020 --> 00:06:45.740 or 16 total values, which is to say that in the world of computers, 00:06:45.740 --> 00:06:48.560 if you ever want to talk in units of four bits, 00:06:48.560 --> 00:06:51.590 it's wonderfully convenient to use hexadecimal instead, 00:06:51.590 --> 00:06:56.270 only because, conveniently, one hexadecimal digit happens to be 00:06:56.270 --> 00:07:00.590 equivalent to four binary digits, 0's and 1's. 00:07:00.590 --> 00:07:05.160 So 0, 0, 0, 0, all the way up through 1, 1, 1, 1. 00:07:05.160 --> 00:07:06.320 So why do humans do this? 00:07:06.320 --> 00:07:09.240 It's just now the human convention because of that convenience. 00:07:09.240 --> 00:07:11.760 Now, some of you may very well have seen hexadecimal before. 00:07:11.760 --> 00:07:14.660 In fact, recall our discussion in week 0 of RGB, 00:07:14.660 --> 00:07:17.660 where we discussed the representation of colors using 00:07:17.660 --> 00:07:19.860 some amount of red, green, and blue. 00:07:19.860 --> 00:07:21.720 And at the time, we used this example. 00:07:21.720 --> 00:07:24.080 We took our example out of context. 00:07:24.080 --> 00:07:27.560 And instead of using hi as a string of text, 00:07:27.560 --> 00:07:33.410 we reinterpreted 72, 73, and 33 as a sequence of colors. 00:07:33.410 --> 00:07:34.550 How much red do you want? 00:07:34.550 --> 00:07:35.720 How much green do you want? 00:07:35.720 --> 00:07:36.860 How much blue do you want? 00:07:36.860 --> 00:07:37.820 And that's fine. 00:07:37.820 --> 00:07:41.060 It's perfectly fine to think and express yourself in terms of decimal. 00:07:41.060 --> 00:07:44.270 But computer scientists tend not to do it that way in the context of colors 00:07:44.270 --> 00:07:45.790 and in the context of memory. 
00:07:45.790 --> 00:07:49.160 Instead, they tend to use something called hexadecimal. 00:07:49.160 --> 00:07:51.590 And hexadecimal, here, would actually just 00:07:51.590 --> 00:07:57.860 have you change these values from 72, 73, 33, to the equivalent hexadecimal 00:07:57.860 --> 00:07:58.533 representation. 00:07:58.533 --> 00:08:00.200 And we won't bother doing the math here. 00:08:00.200 --> 00:08:04.340 But let me just stipulate that 72, 73, 33 in decimal 00:08:04.340 --> 00:08:10.262 is the same thing as 48, 49, 21 in hexadecimal. 00:08:10.262 --> 00:08:12.470 Now, obviously, if you glance at these three numbers, 00:08:12.470 --> 00:08:15.980 it's not at all obvious if you're looking at hexadecimal digits 00:08:15.980 --> 00:08:21.080 or decimal digits, because they do use the same subset, 0's through 9's. 00:08:21.080 --> 00:08:23.240 And so a convention, too, in the computing world, 00:08:23.240 --> 00:08:25.850 is any time you represent hexadecimal digits, 00:08:25.850 --> 00:08:29.300 you tend to prefix them, just because, with 0x. 00:08:29.300 --> 00:08:32.179 And there's no mathematical meaning to the 0 or the x. 00:08:32.179 --> 00:08:35.419 It's just a prefix you put there to make clear to the viewer 00:08:35.419 --> 00:08:38.299 that these are hexadecimal digits, even if they might otherwise 00:08:38.299 --> 00:08:40.490 look like decimal digits. 00:08:40.490 --> 00:08:41.940 So where are we going with this? 00:08:41.940 --> 00:08:43.857 Well, those of you who might have experimented 00:08:43.857 --> 00:08:46.850 in the past with making your own web pages and making them colorful, 00:08:46.850 --> 00:08:50.450 or those of you who are artists and have used programs like Photoshop, odds 00:08:50.450 --> 00:08:53.190 are, you've seen these codes before. 00:08:53.190 --> 00:08:55.940 In fact, here are a few screenshots of Photoshop itself. 
00:08:55.940 --> 00:08:59.190 If you click on a color in Photoshop and you pull up this window, 00:08:59.190 --> 00:09:02.300 you can change the color that you're drawing on the screen 00:09:02.300 --> 00:09:04.970 to be any of the colors of the rainbow. 00:09:04.970 --> 00:09:07.470 But more arcanely, if you look down here, 00:09:07.470 --> 00:09:09.620 you can actually see these hexadecimal codes, 00:09:09.620 --> 00:09:11.990 because it's become human convention over the years 00:09:11.990 --> 00:09:15.630 to use hexadecimal to represent different amounts of red, green, 00:09:15.630 --> 00:09:16.320 and blue. 00:09:16.320 --> 00:09:23.435 So if you have no red, no green, no blue, otherwise represented as 000000, 00:09:23.435 --> 00:09:26.060 well, that's going to give you the color we know here as black. 00:09:26.060 --> 00:09:29.510 It's sort of the absence of any wavelengths of light there. 00:09:29.510 --> 00:09:33.470 If by contrast, though, you change all of those six digits 00:09:33.470 --> 00:09:38.810 to the highest possible value, which, again, is F. The range in hexadecimal 0 00:09:38.810 --> 00:09:42.890 through F, otherwise in decimal, being 0 through 15, well, 00:09:42.890 --> 00:09:46.800 with FFFFFF, that's a lot of red, a lot of green, a lot of blue. 00:09:46.800 --> 00:09:48.800 And when you combine those wavelengths of light, 00:09:48.800 --> 00:09:51.200 you get the color we see here as white. 00:09:51.200 --> 00:09:53.480 And you can imagine, now, combining different amounts 00:09:53.480 --> 00:09:54.930 of red or green or blue. 00:09:54.930 --> 00:10:00.740 So for instance, in hexadecimal, FF0000, is the color we know as red. 00:10:00.740 --> 00:10:05.270 00FF00 is the color we know as green. 
00:10:05.270 --> 00:10:09.630 And finally, 0000FF is the color we know as blue, because again, 00:10:09.630 --> 00:10:14.240 the system that programmers and artists often but don't always use, is indeed, 00:10:14.240 --> 00:10:17.710 this system of RGB for red, green, and blue. 00:10:17.710 --> 00:10:19.460 So we introduced this here not because you 00:10:19.460 --> 00:10:21.810 have to start thinking any differently, because again, 00:10:21.810 --> 00:10:24.560 the mathematical mechanism is the same as week 0. 00:10:24.560 --> 00:10:28.970 But you're going to start seeing numbers in examples, in programs, 00:10:28.970 --> 00:10:32.900 as just appearing in hexadecimal by convention, as opposed to actually 00:10:32.900 --> 00:10:35.550 being interpreted as decimal. 00:10:35.550 --> 00:10:37.880 So if we consider, now, our computer's memory, 00:10:37.880 --> 00:10:40.610 we'll now start thinking of this whole canvas of memory, 00:10:40.610 --> 00:10:43.010 all of these bytes inside of our computer's memory, 00:10:43.010 --> 00:10:46.700 as being enumerable as 0, 1, 2, all the way through F. 00:10:46.700 --> 00:10:53.750 And then if we keep counting, we can go to 10, 11, 12, 13, 14, 15, 16, 17, 18, 00:10:53.750 --> 00:10:58.850 19, 1A, 1B, 1C, 1D, and so forth. 00:10:58.850 --> 00:11:00.790 And it's fine if it's not nearly that obvious, 00:11:00.790 --> 00:11:03.670 as you look at these things, what the decimal equivalents are. 00:11:03.670 --> 00:11:04.690 That's not a problem. 00:11:04.690 --> 00:11:09.130 It's just a different way of thinking about the locations, in this case, 00:11:09.130 --> 00:11:13.480 of a computer's memory, or the representation of one color or another. 00:11:13.480 --> 00:11:19.480 All right, well, let's now use this as an example of an opportunity, 00:11:19.480 --> 00:11:22.690 rather, to consider what's actually being stored in our computer's memory. 
00:11:22.690 --> 00:11:26.320 And to be clear, I'll start prefixing all of these memory addresses, 00:11:26.320 --> 00:11:29.890 so to speak, with 0x, just to make clear that we're now talking, indeed, 00:11:29.890 --> 00:11:31.480 in terms of hexadecimal. 00:11:31.480 --> 00:11:32.980 So here's a simple line of code. 00:11:32.980 --> 00:11:35.147 Out of context, we would need to, actually, put this 00:11:35.147 --> 00:11:37.910 in main or some other program to actually do anything with it. 00:11:37.910 --> 00:11:39.702 But we've seen this before many times, now, 00:11:39.702 --> 00:11:42.760 where you declare a variable, for instance, n for number. 00:11:42.760 --> 00:11:44.830 Declare it as an int for its type. 00:11:44.830 --> 00:11:47.170 And then, perhaps, even assign it a value. 00:11:47.170 --> 00:11:51.520 Well, what's actually going on when we use this kind of code in our computer? 00:11:51.520 --> 00:11:54.760 Well, let's go ahead and whip this thing up in an actual program. 00:11:54.760 --> 00:11:57.970 Let me create a file called address.c because I 00:11:57.970 --> 00:12:01.300 want to start experimenting with some addresses in the computer's memory. 00:12:01.300 --> 00:12:04.180 I'm going to go ahead and include standard io dot h. 00:12:04.180 --> 00:12:06.460 I'm going to give myself int main void. 00:12:06.460 --> 00:12:08.890 And down here, I'm going to go ahead and declare exactly 00:12:08.890 --> 00:12:10.915 that variable, int n equals 50. 00:12:10.915 --> 00:12:15.820 And then I'm going to go ahead and print out, with percent i and a backslash n, 00:12:15.820 --> 00:12:17.230 the value of n. 00:12:17.230 --> 00:12:19.930 So nothing interesting there, nothing too complicated. 00:12:19.930 --> 00:12:21.790 I'm going to go ahead and make address. 00:12:21.790 --> 00:12:24.123 And then I'm going to go ahead and do dot slash address. 
00:12:24.123 --> 00:12:26.380 And of course, as per week one, we should hopefully 00:12:26.380 --> 00:12:27.930 see just the number 50. 00:12:27.930 --> 00:12:31.570 But today, we're going to give you some more tools with which you can actually 00:12:31.570 --> 00:12:33.880 start poking around your computer's memory. 00:12:33.880 --> 00:12:35.950 But let's first consider this line of code 00:12:35.950 --> 00:12:38.240 in the context of your computer's hardware. 00:12:38.240 --> 00:12:41.200 So if you're writing a program with a line of code like this, 00:12:41.200 --> 00:12:44.500 that n needs to be somewhere in your computer's memory. 00:12:44.500 --> 00:12:47.870 That 50 needs to be put somewhere in your computer's memory. 00:12:47.870 --> 00:12:51.010 So if we, again, consider this to be just part of our computer's 00:12:51.010 --> 00:12:55.000 memory, a few dozen bytes, well, suppose that that variable, n, 00:12:55.000 --> 00:12:57.130 happens to end up down here. 00:12:57.130 --> 00:13:01.570 I've deliberately drawn n as taking up four bytes, four squares, because we 00:13:01.570 --> 00:13:05.830 call that an integer, typically, at least on CS50 IDE and modern systems, 00:13:05.830 --> 00:13:07.370 tends to be four bytes. 00:13:07.370 --> 00:13:10.630 So I made sure to have it fill four complete boxes. 00:13:10.630 --> 00:13:13.940 And then value might be 50 that's actually stored there. 00:13:13.940 --> 00:13:17.890 Well, it turns out that within your computer's memory, again, 00:13:17.890 --> 00:13:20.660 there are these addresses that are implicitly there. 00:13:20.660 --> 00:13:23.530 So even though, yes, we can refer to this variable, n, 00:13:23.530 --> 00:13:26.620 based on the variable name I gave it in my code, 00:13:26.620 --> 00:13:30.940 surely this variable exists at a specific location in memory. 00:13:30.940 --> 00:13:32.530 I don't know offhand where it is. 
00:13:32.530 --> 00:13:38.410 But let me just propose that maybe it's at location 0x12345678, just 00:13:38.410 --> 00:13:39.550 an arbitrary address. 00:13:39.550 --> 00:13:41.690 I have no idea, in actuality, where it is. 00:13:41.690 --> 00:13:44.860 But it certainly does have an address, because every one of these squares 00:13:44.860 --> 00:13:49.540 inside of your computer's memory has an address, a unique identifier like 0, 1, 00:13:49.540 --> 00:13:50.750 2, and so forth. 00:13:50.750 --> 00:13:56.710 Maybe the 50 ended up at memory address 0x12345678. 00:13:56.710 --> 00:14:01.750 Well, that's kind of cool about C, is that we can actually begin to see this, 00:14:01.750 --> 00:14:03.020 no pun intended. 00:14:03.020 --> 00:14:05.080 So let me go ahead and modify this program 00:14:05.080 --> 00:14:07.480 and introduce a little bit of new syntax that 00:14:07.480 --> 00:14:11.510 will allow us to start poking around the inside of your computer's memory 00:14:11.510 --> 00:14:14.830 so we can actually see what's going on underneath. 00:14:14.830 --> 00:14:17.710 So I'm going to go ahead and change this program to do this instead. 00:14:17.710 --> 00:14:19.585 I'm going to go ahead and say, you know what? 00:14:19.585 --> 00:14:23.590 Don't just print out the value, n, which, of course, is 50. 00:14:23.590 --> 00:14:28.060 Let me see, just out of curiosity, what is the actual address of n. 00:14:28.060 --> 00:14:31.300 And to do that today, we're going to introduce one new piece of syntax, 00:14:31.300 --> 00:14:33.070 which happens to be this here. 00:14:33.070 --> 00:14:37.360 There's two new operators, today, in C. The first is an ampersand, which 00:14:37.360 --> 00:14:39.580 does not represent a logical and. 00:14:39.580 --> 00:14:42.100 Recall a couple of weeks ago, we did see that if you 00:14:42.100 --> 00:14:46.840 want to combine Boolean expressions, this and that, you use two ampersands. 
00:14:46.840 --> 00:14:51.040 It's an unfortunate coincidence that an ampersand, solo like this, 00:14:51.040 --> 00:14:52.630 will mean something different today. 00:14:52.630 --> 00:14:56.830 Specifically, this ampersand is going to be our address of operator. 00:14:56.830 --> 00:15:02.590 By simply prefixing any variable name with an ampersand, we can tell C, 00:15:02.590 --> 00:15:06.520 please tell me what address this variable is stored in. 00:15:06.520 --> 00:15:10.180 And this star, not to be confused with multiplication, 00:15:10.180 --> 00:15:12.880 also has another meaning in today's context. 00:15:12.880 --> 00:15:15.310 When you use this asterisk, you can actually 00:15:15.310 --> 00:15:19.910 tell your program to look inside of a particular memory address. 00:15:19.910 --> 00:15:23.500 So the ampersand tells you what address a variable is at. 00:15:23.500 --> 00:15:27.310 The star operator, otherwise known as the dereference operator, 00:15:27.310 --> 00:15:30.190 means, go to the following address. 00:15:30.190 --> 00:15:32.050 So they sort of are reverse operations. 00:15:32.050 --> 00:15:33.400 One figures out the address. 00:15:33.400 --> 00:15:35.240 One goes to the address. 00:15:35.240 --> 00:15:37.850 And so let's see this for real here. 00:15:37.850 --> 00:15:43.070 Let me go ahead and change my n in my program here to ampersand n. 00:15:43.070 --> 00:15:48.980 So I want to print out, not the number in n, but the address of n. 00:15:48.980 --> 00:15:50.870 And now, how do I print out an address? 00:15:50.870 --> 00:15:52.170 Well, it is just a number. 00:15:52.170 --> 00:15:56.690 But actually, printf supports a different format code for addresses. 00:15:56.690 --> 00:15:59.840 You can do percent p, for reasons we'll soon see, 00:15:59.840 --> 00:16:02.510 that says to print out the address of this variable 00:16:02.510 --> 00:16:05.375 and interpret it as hexadecimal, again, by convention. 
00:16:05.375 --> 00:16:07.250 So I'm going to go ahead and make address now 00:16:07.250 --> 00:16:10.530 after only making two changes to this file. 00:16:10.530 --> 00:16:12.350 Everything seems to compile OK. 00:16:12.350 --> 00:16:14.150 Now, I'm going to go ahead and run address. 00:16:14.150 --> 00:16:17.210 And we will see that, in this particular program, 00:16:17.210 --> 00:16:21.620 address.c, for whatever reason, that variable, n, 00:16:21.620 --> 00:16:30.110 ended up at crazy location 0x7ffd80792f7c. 00:16:30.110 --> 00:16:31.160 Now, is that useful? 00:16:31.160 --> 00:16:32.870 Not in practice, necessarily. 00:16:32.870 --> 00:16:36.530 We're going to make this become useful by leveraging these addresses. 00:16:36.530 --> 00:16:38.900 But the specific address is not interesting. 00:16:38.900 --> 00:16:40.070 I'm glancing at this number. 00:16:40.070 --> 00:16:41.993 I have no idea what that number is in decimal. 00:16:41.993 --> 00:16:44.660 I would have to do the math, or frankly, just Google a converter 00:16:44.660 --> 00:16:45.660 and do it for me. 00:16:45.660 --> 00:16:47.420 So again, that's not the interesting part. 00:16:47.420 --> 00:16:50.420 The fact that this is in hexadecimal is just an implementation detail. 00:16:50.420 --> 00:16:54.450 It happens to represent the location of this variable. 00:16:54.450 --> 00:16:58.230 And again, we won't want to do this, necessarily. 00:16:58.230 --> 00:17:00.830 But just to be clear that one of these operators, 00:17:00.830 --> 00:17:02.330 the ampersand gets the address. 00:17:02.330 --> 00:17:05.089 And the star operator goes to an address. 00:17:05.089 --> 00:17:07.160 We can actually undo the effects of these things. 00:17:07.160 --> 00:17:13.010 For instance, if I print out now, not ampersand n, but just out of curiosity, 00:17:13.010 --> 00:17:18.170 star ampersand n, I can kind of undo the effects of this operator. 
00:17:18.170 --> 00:17:21.170 Ampersand n is going to say, what is the address of n? 00:17:21.170 --> 00:17:25.349 Star ampersand n is going to say, go to that address. 00:17:25.349 --> 00:17:29.360 So this is kind of a pointless exercise, because if I just want what's in n, 00:17:29.360 --> 00:17:32.120 I can just, obviously, print n like we began. 00:17:32.120 --> 00:17:34.560 But again, just as an intellectual exercise, 00:17:34.560 --> 00:17:38.750 if I prefix n with the address of operator, and then use the asterisk 00:17:38.750 --> 00:17:42.830 and say, go to that address, it's the same exact thing 00:17:42.830 --> 00:17:44.280 as just printing n itself. 00:17:44.280 --> 00:17:46.640 So let me change the format code back to an integer. 00:17:46.640 --> 00:17:50.060 Instead of percent p, let me go ahead and make address now, 00:17:50.060 --> 00:17:52.100 seems to compile OK, and run address. 00:17:52.100 --> 00:17:53.885 And voila, we're back at the 50. 00:17:53.885 --> 00:17:57.050 So as weird as the syntax today might start to feel, 00:17:57.050 --> 00:17:59.330 realize that these operators, at the end of the day, 00:17:59.330 --> 00:18:01.833 are relatively simple in what they do. 00:18:01.833 --> 00:18:05.000 And if you understand that one just kind of undoes the effects of the other, 00:18:05.000 --> 00:18:08.360 we can start to build up some pretty interesting programs with them. 00:18:08.360 --> 00:18:11.870 And we're going to do so by leveraging a special type of variable, 00:18:11.870 --> 00:18:13.910 a variable called a pointer. 00:18:13.910 --> 00:18:16.670 And there is that p in percent p. 00:18:16.670 --> 00:18:22.240 A pointer is a variable that contains the address of some other value. 00:18:22.240 --> 00:18:23.790 So we've seen integers before. 00:18:23.790 --> 00:18:27.770 We've seen floats and chars and strings and other types as well. 
00:18:27.770 --> 00:18:31.430 Pointers, now, are just a different type of variable 00:18:31.430 --> 00:18:34.640 that store the address of some value. 00:18:34.640 --> 00:18:40.250 And you can have pointers to integers, pointers to chars, pointers to bools, 00:18:40.250 --> 00:18:41.870 or any other data type. 00:18:41.870 --> 00:18:45.980 A pointer references the specific type of the value 00:18:45.980 --> 00:18:48.223 that it actually is referring to. 00:18:48.223 --> 00:18:49.640 So let's see this more concretely. 00:18:49.640 --> 00:18:51.620 Let me go back, now, to my program here. 00:18:51.620 --> 00:18:53.840 And let me introduce another variable here. 00:18:53.840 --> 00:18:58.430 Instead of immediately printing out something like n, let me go ahead 00:18:58.430 --> 00:19:02.870 and introduce a second variable that is of type int star. 00:19:02.870 --> 00:19:06.860 And this, I will admit, is probably the most confusing piece of C syntax 00:19:06.860 --> 00:19:09.860 that we'll, in general, see, just because, my god, star is now 00:19:09.860 --> 00:19:13.220 used for multiplication, for going to an address, and also, now, 00:19:13.220 --> 00:19:14.610 declaring a variable. 00:19:14.610 --> 00:19:17.120 This is, arguably, not the best design decision. 00:19:17.120 --> 00:19:18.350 But it was made decades ago. 00:19:18.350 --> 00:19:19.730 So this is what we have. 00:19:19.730 --> 00:19:26.240 But if I do int star p equals ampersand n, now, what I can do down here, 00:19:26.240 --> 00:19:31.770 is print out the address of n by temporarily storing it in a variable. 00:19:31.770 --> 00:19:33.830 So I'm not doing anything new just yet. 00:19:33.830 --> 00:19:36.020 I'm still declaring on line 5, an integer 00:19:36.020 --> 00:19:37.910 called n, assigning it the value 50. 00:19:37.910 --> 00:19:42.260 What's new now on line 6, is that I'm introducing a new type of variable. 00:19:42.260 --> 00:19:44.210 This type of variable is known as a pointer. 
00:19:44.210 --> 00:19:48.410 A pointer, again, is just a variable that stores the address of some value. 00:19:48.410 --> 00:19:53.240 And the syntax, admittedly weird, for declaring a pointer to an integer, 00:19:53.240 --> 00:19:57.560 is to literally say int, because that's the type you're pointing to, 00:19:57.560 --> 00:20:00.350 star, and then the name of the variable you want to create. 00:20:00.350 --> 00:20:03.320 And I could call this anything, but I'll call it p to keep it succinct. 00:20:03.320 --> 00:20:05.120 And again, on the right hand side of the equals sign 00:20:05.120 --> 00:20:06.620 is the same operator as before. 00:20:06.620 --> 00:20:10.040 If you want to figure out what is the address of n, it's just ampersand n. 00:20:10.040 --> 00:20:14.450 And so we can store that address, now, somewhere longer-term. 00:20:14.450 --> 00:20:18.110 Before, I just passed in ampersand n and printf did its thing. 00:20:18.110 --> 00:20:23.120 Now, I'm temporarily, on line 6, storing that address in a new variable 00:20:23.120 --> 00:20:24.470 called p. 00:20:24.470 --> 00:20:28.910 And its type is technically int star, is what a programmer might say. 00:20:28.910 --> 00:20:33.680 So it would be incorrect to say int p equals ampersand n. 00:20:33.680 --> 00:20:35.780 And indeed, our compiler, Clang, won't like that. 00:20:35.780 --> 00:20:38.370 It won't let you compile the code, most likely. 00:20:38.370 --> 00:20:43.160 And so, instead, I do int star p to make clear that I know what I'm doing. 00:20:43.160 --> 00:20:48.450 I am storing the address of an int, not an integer, per se. 00:20:48.450 --> 00:20:53.040 So if I go ahead, now, and save this, recompile with make address. 00:20:53.040 --> 00:20:55.530 And notice, I changed one line of code, too, earlier. 00:20:55.530 --> 00:20:59.400 I went back to percent p to print a pointer that is an address. 00:20:59.400 --> 00:21:02.490 And I'm printing out the value of p, no longer the value of n. 
00:21:02.490 --> 00:21:07.050 If I now run dot slash address, voila, there's that cryptic address. 00:21:07.050 --> 00:21:09.300 And these addresses may very well change over time. 00:21:09.300 --> 00:21:11.640 Depending on what's going on inside of your program 00:21:11.640 --> 00:21:15.390 or other things on the system, these addresses might be different each time. 00:21:15.390 --> 00:21:18.060 And that's to be expected and not something to be relied on. 00:21:18.060 --> 00:21:20.250 But it's clearly some random cryptic address, 00:21:20.250 --> 00:21:24.400 similar to my arbitrary 0x12345678 before. 00:21:24.400 --> 00:21:26.310 But now, let's just undo this operation. 00:21:26.310 --> 00:21:30.120 Just so we can come full circle here, let me now propose 00:21:30.120 --> 00:21:33.495 how I can print out the value of n. 00:21:33.495 --> 00:21:35.370 And let me call on someone for this if I can. 00:21:35.370 --> 00:21:41.640 If my goal, now, on line 7, is no longer to print the address of n, but to print 00:21:41.640 --> 00:21:43.972 n itself using p. 00:21:43.972 --> 00:21:45.930 I'm going to go ahead and change, preemptively, 00:21:45.930 --> 00:21:47.820 the format code to percent i. 00:21:47.820 --> 00:21:51.660 And a shorthand notation would, obviously, be just print n. 00:21:51.660 --> 00:21:53.610 But suppose I don't want to print n for this 00:21:53.610 --> 00:22:02.880 exercise, how can I now print the value in n by referring to it by way of p? 00:22:02.880 --> 00:22:05.910 What should I literally type as printf's second argument 00:22:05.910 --> 00:22:12.530 to print out the value of n by using this new variable, p, in some way. 00:22:12.530 --> 00:22:16.290 Yeah, let's call on Joshua. 00:22:16.290 --> 00:22:19.860 AUDIENCE: I believe, if you use the ampersand before the p, 00:22:19.860 --> 00:22:21.642 it will probably do it. 00:22:21.642 --> 00:22:24.100 DAVID MALAN: OK, ampersand p, let me go ahead and try that. 
00:22:24.100 --> 00:22:27.700 Let's try ampersand p to print out this value. 00:22:27.700 --> 00:22:30.370 So ampersand p, I'm going to save the file. 00:22:30.370 --> 00:22:32.610 I'm going to do make address and enter. 00:22:32.610 --> 00:22:34.415 And it doesn't seem to be the case. 00:22:34.415 --> 00:22:35.790 Notice that I'm getting an error. 00:22:35.790 --> 00:22:36.720 It's a little cryptic. 00:22:36.720 --> 00:22:40.920 Format specifies type int, but the argument has type int star star, 00:22:40.920 --> 00:22:42.090 more on that another time. 00:22:42.090 --> 00:22:43.570 So it turns out this was incorrect. 00:22:43.570 --> 00:22:47.430 Let's take one other suggestion, because the ampersand, recall, 00:22:47.430 --> 00:22:49.170 gets the address of something. 00:22:49.170 --> 00:22:50.880 But p is already an address. 00:22:50.880 --> 00:22:52.590 So Joshua, what you technically proposed, 00:22:52.590 --> 00:22:54.300 was get me the address of the address. 00:22:54.300 --> 00:22:56.190 And that's not the direction we want to go. 00:22:56.190 --> 00:22:58.170 We want to go to what is at that address. 00:22:58.170 --> 00:23:00.740 Sophia, what do you think? 00:23:00.740 --> 00:23:02.640 AUDIENCE: We want to add a percent-- 00:23:02.640 --> 00:23:06.820 or a star p when we print it. 00:23:06.820 --> 00:23:07.570 DAVID MALAN: Yeah. 00:23:07.570 --> 00:23:09.380 So I had a little trouble hearing you. 00:23:09.380 --> 00:23:12.370 But I think if we instead use not the ampersand operator, 00:23:12.370 --> 00:23:14.710 but the star operator, that's going to be, 00:23:14.710 --> 00:23:17.170 indeed, the dereference operator, which essentially means, 00:23:17.170 --> 00:23:19.120 go to the value in p. 00:23:19.120 --> 00:23:23.530 And if the value in p is an address, I think, let's try this, make address. 00:23:23.530 --> 00:23:25.490 Yep, that compiled OK this time. 
00:23:25.490 --> 00:23:27.550 Now, if I do dot slash address, hopefully, I 00:23:27.550 --> 00:23:30.400 will now see, indeed, the number 50. 00:23:30.400 --> 00:23:33.010 So again, we don't seem to have made any fundamental progress. 00:23:33.010 --> 00:23:36.070 At the end of the day, I'm still just printing out the value of n. 00:23:36.070 --> 00:23:39.100 But we've introduced this new primitive, this new puzzle piece, 00:23:39.100 --> 00:23:41.440 if you will, that allows you, programmatically, 00:23:41.440 --> 00:23:44.390 to figure out the address of something in the computer's memory 00:23:44.390 --> 00:23:46.540 and to actually go to that address. 00:23:46.540 --> 00:23:52.070 And we'll soon exercise more sophisticated control over it as well. 00:23:52.070 --> 00:23:56.050 But let's come back to a pictorial representation of this 00:23:56.050 --> 00:23:59.290 and consider what it is we just did in the context, now, of this code. 00:23:59.290 --> 00:24:02.080 So inside of my main, the two interesting lines of code, 00:24:02.080 --> 00:24:05.320 really, were these two lines first before we made Sophia's addition 00:24:05.320 --> 00:24:07.990 and actually dereferenced p and printed it out with printf. 00:24:07.990 --> 00:24:10.810 But let's consider, for a moment, what these values now 00:24:10.810 --> 00:24:12.280 look like in a computer's memory. 00:24:12.280 --> 00:24:14.440 And again, the syntax is a little cryptic 00:24:14.440 --> 00:24:16.475 because we now have a star and an ampersand. 00:24:16.475 --> 00:24:18.850 But again, that just means, now, we get to start thinking 00:24:18.850 --> 00:24:20.405 in terms of the computer's memory. 00:24:20.405 --> 00:24:23.030 So for instance, here's a grid of memory inside of my computer. 00:24:23.030 --> 00:24:26.980 And maybe, for instance, the 50 and the n end up down there. 00:24:26.980 --> 00:24:29.980 They could end up anywhere, not even pictured on the screen here.
00:24:29.980 --> 00:24:34.090 They end up somewhere in the computer's memory, for our purposes thus far. 00:24:34.090 --> 00:24:36.100 But it technically lives in an address. 00:24:36.100 --> 00:24:38.950 And let me simplify the address just so it's quicker to say. 00:24:38.950 --> 00:24:42.310 This 50, now, stored in the variable n, maybe it actually 00:24:42.310 --> 00:24:44.590 lives at address 0x123. 00:24:44.590 --> 00:24:46.480 I have no idea where it is, but we've clearly 00:24:46.480 --> 00:24:50.200 seen that it can live in a seemingly random address like that. 00:24:50.200 --> 00:24:51.640 Now, what about p? 00:24:51.640 --> 00:24:54.520 p is technically a variable itself. 00:24:54.520 --> 00:24:57.190 It's a variable that stores the address of something else. 00:24:57.190 --> 00:25:00.190 But it's still a variable, which means, when you declare p 00:25:00.190 --> 00:25:04.660 with the code earlier, it actually does take up some bytes of memory 00:25:04.660 --> 00:25:05.660 on the screen. 00:25:05.660 --> 00:25:10.420 And so let me go ahead and propose that p happens to end up in memory here. 00:25:10.420 --> 00:25:13.450 Now, p is deliberately drawn to be longer here. 00:25:13.450 --> 00:25:15.700 I'm consuming eight total bytes this time, 00:25:15.700 --> 00:25:20.470 because it turns out, on modern computer systems, including CS50 IDE, 00:25:20.470 --> 00:25:23.500 pointers tend to take up eight bytes. 00:25:23.500 --> 00:25:27.190 So not one, not four, but eight bytes, so I've simply drawn it to be bigger. 00:25:27.190 --> 00:25:31.240 So what is actually stored in the variable p? 00:25:31.240 --> 00:25:35.600 Well, it turns out that, again, it's just storing the address of some value. 
00:25:35.600 --> 00:25:42.460 So if the integer n, which itself is storing 50, is at location 0x123, 00:25:42.460 --> 00:25:47.080 and pointer p is being assigned that address, it's just like saying, 00:25:47.080 --> 00:25:50.620 well, stored in this variable p, is literally just a number 00:25:50.620 --> 00:25:54.190 represented here in hexadecimal notation, 0x123. 00:25:54.190 --> 00:25:56.650 So that's all that's going on inside the computer's memory 00:25:56.650 --> 00:25:57.858 with those two lines of code. 00:25:57.858 --> 00:26:00.040 There's nothing fundamentally new, except the fact 00:26:00.040 --> 00:26:04.430 that we have new syntax with which to refer to these addresses explicitly. 00:26:04.430 --> 00:26:06.100 This is n down here. 00:26:06.100 --> 00:26:07.720 This is p up here. 00:26:07.720 --> 00:26:12.160 And the value of p just happens to be an address. 00:26:12.160 --> 00:26:15.205 Now, I keep saying that these addresses are a little cryptic. 00:26:15.205 --> 00:26:16.330 They're a little arbitrary. 00:26:16.330 --> 00:26:16.872 And they are. 00:26:16.872 --> 00:26:20.530 And honestly, it is rarely, if ever, going to be enlightening to know, 00:26:20.530 --> 00:26:25.030 as a human, what address this integer n is actually at. 00:26:25.030 --> 00:26:28.550 Who cares if it's at 0x123 or 0x456? 00:26:28.550 --> 00:26:29.800 Generally, we don't. 00:26:29.800 --> 00:26:33.070 And so computer scientists, when talking about computers' memory, 00:26:33.070 --> 00:26:38.010 tend not to talk at these low level details, in terms of actual numbers. 00:26:38.010 --> 00:26:40.600 Instead, they tend to simplify the picture, 00:26:40.600 --> 00:26:44.230 sort of abstract away all of the other memory, which frankly, is not 00:26:44.230 --> 00:26:46.690 relevant to the discussion thus far, and just 00:26:46.690 --> 00:26:50.290 say, you know what, I know that p is storing an address.
00:26:50.290 --> 00:26:53.740 And that address happens to be that of 50 down here. 00:26:53.740 --> 00:26:56.830 But I really don't care, in my everyday programming life, 00:26:56.830 --> 00:26:58.360 what these specific addresses are. 00:26:58.360 --> 00:26:59.230 So you know what? 00:26:59.230 --> 00:27:01.730 Let's just abstract it away as an arrow. 00:27:01.730 --> 00:27:06.250 And again, abstraction is all about simplifying lower level details 00:27:06.250 --> 00:27:09.250 that you may very well need to understand but you don't necessarily 00:27:09.250 --> 00:27:10.520 need to keep thinking about. 00:27:10.520 --> 00:27:11.950 You don't need to keep thinking at this level. 00:27:11.950 --> 00:27:13.730 It suffices to think at this level. 00:27:13.730 --> 00:27:16.600 So we might as well draw a pointer, pictorially, 00:27:16.600 --> 00:27:20.710 as pointing at some value and irrespective of what 00:27:20.710 --> 00:27:22.330 the actual address is. 00:27:22.330 --> 00:27:25.150 And so this is very much the case in our human world. 00:27:25.150 --> 00:27:29.200 We have very similar conventions whether or not 00:27:29.200 --> 00:27:31.750 it might be obvious at first glance, such 00:27:31.750 --> 00:27:37.310 that we may very well be using these same mechanisms in our everyday lives. 00:27:37.310 --> 00:27:40.690 So for instance, if you happen to have a mailbox out in the street on your home 00:27:40.690 --> 00:27:43.768 or down in the basement of Harvard Science Center when on campus, it 00:27:43.768 --> 00:27:46.810 may very well look like something like this, at least more residentially. 00:27:46.810 --> 00:27:51.100 And suppose that this mailbox here is representing, in this case, p, 00:27:51.100 --> 00:27:51.790 in the story. 00:27:51.790 --> 00:27:55.490 It's storing a pointer, that is, the address of something else. 
00:27:55.490 --> 00:27:58.360 Well, if there's a whole bunch of other mailboxes on the street, 00:27:58.360 --> 00:28:01.510 well, we can put anything we want in these mailboxes. 00:28:01.510 --> 00:28:04.840 We can put postcards, letters, packages even. 00:28:04.840 --> 00:28:08.250 And just as in the real world, can we do the same in the virtual. 00:28:08.250 --> 00:28:12.890 I can store chars or integers or other things, including addresses. 00:28:12.890 --> 00:28:17.100 So for instance, Brian, I think you have your own mailbox somewhere else. 00:28:17.100 --> 00:28:20.660 And Brian, of course, has a mailbox that itself has a unique address. 00:28:20.660 --> 00:28:23.600 So Brian, for instance, what happens to be the unique address 00:28:23.600 --> 00:28:26.030 of the mailbox on your street there? 00:28:26.030 --> 00:28:27.600 BRIAN: Yeah, so here is my mailbox. 00:28:27.600 --> 00:28:28.370 It's labeled n. 00:28:28.370 --> 00:28:29.750 And its address is over here. 00:28:29.750 --> 00:28:33.200 The address of my mailbox appears to be 0x123. 00:28:33.200 --> 00:28:35.450 DAVID MALAN: Yeah, so my mailbox, too, has an address. 00:28:35.450 --> 00:28:37.200 Frankly, again, I don't really care about it. 00:28:37.200 --> 00:28:39.033 So I've not even put it on the mailbox here. 00:28:39.033 --> 00:28:43.070 But if my mailbox represents p, a pointer, and Brian's mailbox 00:28:43.070 --> 00:28:45.920 represents n, an integer, well, it should 00:28:45.920 --> 00:28:49.260 mean that if I look inside the contents of my pointer 00:28:49.260 --> 00:28:53.690 and I see the value 0x123, that is now my clue, 00:28:53.690 --> 00:28:57.560 a breadcrumb of sorts, that can now let me go look inside of Brian's mailbox. 00:28:57.560 --> 00:29:00.320 And Brian, if you wouldn't mind doing that for us, 00:29:00.320 --> 00:29:02.430 what do you have at that address? 
00:29:02.430 --> 00:29:05.540 BRIAN: And if I look in my mailbox at address 0x123, 00:29:05.540 --> 00:29:07.727 I have the number 50 inside of this mailbox. 00:29:07.727 --> 00:29:08.810 DAVID MALAN: Yeah, indeed. 00:29:08.810 --> 00:29:10.400 So in this case, he happens to be storing an int. 00:29:10.400 --> 00:29:11.650 But it could be anything else. 00:29:11.650 --> 00:29:14.480 And again, we don't typically care about these specific addresses. 00:29:14.480 --> 00:29:17.450 Once you understand the metaphor, really, we can do something silly 00:29:17.450 --> 00:29:20.630 and really just think of this mailbox as storing a value that's 00:29:20.630 --> 00:29:23.180 pointing at Brian's mailbox. 00:29:23.180 --> 00:29:26.510 It's some kind of direction drawn there, pictorially as an arrow, 00:29:26.510 --> 00:29:29.000 here as a silly foam finger. 00:29:29.000 --> 00:29:34.750 Or if you prefer, a foam Yale finger pointing, instead, at Brian's mailbox, 00:29:34.750 --> 00:29:38.720 just as a sort of breadcrumb leading us to some other value on the screen. 00:29:38.720 --> 00:29:41.408 So when we talk today and beyond about addresses, 00:29:41.408 --> 00:29:42.700 that's all we're talking about. 00:29:42.700 --> 00:29:45.790 We humans in the real world have been using addresses for eons, now, 00:29:45.790 --> 00:29:49.030 to uniquely identify our homes or businesses or the like. 00:29:49.030 --> 00:29:51.520 Computers do the exact same thing at a lower level 00:29:51.520 --> 00:29:53.440 using their computer's memory. 00:29:53.440 --> 00:29:58.330 So let me pause here to see if there are any questions on pointers, variables 00:29:58.330 --> 00:30:00.760 that store addresses, or on these new operators, 00:30:00.760 --> 00:30:02.890 like the ampersand or the asterisk, which 00:30:02.890 --> 00:30:06.310 now has a new meaning today onward. 00:30:06.310 --> 00:30:06.968 Nothing yet. 
00:30:06.968 --> 00:30:09.010 All right, seeing none, well, let's consider now, 00:30:09.010 --> 00:30:12.250 the same story in the context of a completely different data type. 00:30:12.250 --> 00:30:15.310 Thus far, we've played only with ints. 00:30:15.310 --> 00:30:16.630 But consider strings. 00:30:16.630 --> 00:30:20.950 We've spent a lot of time on strings, using encryption with them 00:30:20.950 --> 00:30:25.880 and solving implementing electoral algorithms using user's input. 00:30:25.880 --> 00:30:27.940 So let's consider a fundamentally different data 00:30:27.940 --> 00:30:31.940 type that stores, not individual integers, but strings of text instead. 00:30:31.940 --> 00:30:34.150 So for instance, in any program involving a string, 00:30:34.150 --> 00:30:38.245 you might have a line of code that looks like this. string s equals, quote 00:30:38.245 --> 00:30:40.090 unquote, "HI!" 00:30:40.090 --> 00:30:41.852 in all caps with an exclamation point. 00:30:41.852 --> 00:30:44.560 So that may very well be a line of code that we've seen thus far. 00:30:44.560 --> 00:30:46.935 What's actually going on inside of the computer's memory? 00:30:46.935 --> 00:30:51.340 Well, let me propose that when you type in quote unquote, "HI!" in a computer, 00:30:51.340 --> 00:30:53.780 it ends up somewhere in your computer's memory. 00:30:53.780 --> 00:30:58.840 So HI exclamation point, plus, per last week, a backslash 0-- or two weeks ago, 00:30:58.840 --> 00:31:04.040 a backslash 0, which is how a computer represents the end of that string. 00:31:04.040 --> 00:31:06.100 But let's look a little more carefully at 00:31:06.100 --> 00:31:08.350 what is going on underneath this hood here. 00:31:08.350 --> 00:31:12.190 Technically speaking, I could address those individual characters 00:31:12.190 --> 00:31:16.280 we have seen as of week two, by using bracket notation like s bracket 0, 00:31:16.280 --> 00:31:18.910 s bracket 1, s bracket 2, and s bracket 3. 
00:31:18.910 --> 00:31:22.427 We use the square bracket notation to treat a string 00:31:22.427 --> 00:31:24.010 as though it's an array of characters. 00:31:24.010 --> 00:31:26.900 And it is, it was, and it still is. 00:31:26.900 --> 00:31:32.230 But it turns out, strings can also be manipulated by way of their addresses 00:31:32.230 --> 00:31:32.960 as well. 00:31:32.960 --> 00:31:36.640 And so for instance, maybe this same exact string, HI, 00:31:36.640 --> 00:31:43.480 is stored at memory address 0x123 and then 0x124, 0x125, and 0x126. 00:31:43.480 --> 00:31:46.150 Notice that they're deliberately contiguous 00:31:46.150 --> 00:31:47.560 addresses, back to back to back. 00:31:47.560 --> 00:31:50.870 And they're only one byte apart, because each of these chars, of course, 00:31:50.870 --> 00:31:53.140 is just one byte in C. 00:31:53.140 --> 00:31:56.920 So those numbers are not important, specifically. 00:31:56.920 --> 00:31:59.530 But the fact that they're one byte apart from each other 00:31:59.530 --> 00:32:02.350 is important, because that's the definition of a string, 00:32:02.350 --> 00:32:05.470 and indeed, an array, to have memory back to back to back. 00:32:05.470 --> 00:32:08.140 Now, what exactly, though, is S? 00:32:08.140 --> 00:32:11.530 S was the name of the variable I gave a moment ago to go to that line of code, 00:32:11.530 --> 00:32:13.840 string S equals quote unquote, "HI." 00:32:13.840 --> 00:32:14.710 well, what is S? 00:32:14.710 --> 00:32:18.950 S is a variable that has to go somewhere in the computer's memory. 00:32:18.950 --> 00:32:24.880 And suppose that S is, indeed, HI with an exclamation point. 00:32:24.880 --> 00:32:28.600 And the HI happens to live at this location here. 00:32:28.600 --> 00:32:31.390 You know what you can think of S as being now, 00:32:31.390 --> 00:32:34.840 isn't, at a high level, a string, but at a lower level, 00:32:34.840 --> 00:32:37.300 it's just the address of a string. 
00:32:37.300 --> 00:32:40.780 More specifically, let's start thinking about a string 00:32:40.780 --> 00:32:46.297 as technically being just the address of the first character in the string. 00:32:46.297 --> 00:32:48.130 Now, that might give you pause for a moment, 00:32:48.130 --> 00:32:49.810 because why the first character? 00:32:49.810 --> 00:32:53.710 How are you going to remember that, wait a minute, this string isn't at and only 00:32:53.710 --> 00:32:54.940 at 0x123. 00:32:54.940 --> 00:33:00.110 It also continues at 0x124, 0x125, and so forth. 00:33:00.110 --> 00:33:02.950 But let me pause and ask the group here, why 00:33:02.950 --> 00:33:06.110 might it very well be sufficient for a computer 00:33:06.110 --> 00:33:12.550 and us programmers to just think of strings in terms of being 00:33:12.550 --> 00:33:15.460 the address of the very first byte. 00:33:15.460 --> 00:33:18.220 Like, why is it sufficient, no matter how long 00:33:18.220 --> 00:33:20.830 the string is, even if it's a whole paragraph of text, 00:33:20.830 --> 00:33:25.360 why is it very cleverly sufficient to think of a string like S 00:33:25.360 --> 00:33:31.420 as just being identical to the address of the first byte? 00:33:31.420 --> 00:33:33.718 Ginni, is it? 00:33:33.718 --> 00:33:37.480 AUDIENCE: Possibly because it happens that strings, whenever we are defining 00:33:37.480 --> 00:33:39.490 a new string, that is altogether. 00:33:39.490 --> 00:33:44.410 Suppose, if I'm writing my name, Ginni, so it will be G-I-N-N-I altogether. 00:33:44.410 --> 00:33:46.810 So it will be sufficient if something is pointed 00:33:46.810 --> 00:33:50.560 towards just first character of my name, so that I can just 00:33:50.560 --> 00:33:55.895 follow up for the first character and then get all the characters afterwards. 00:33:55.895 --> 00:33:56.770 DAVID MALAN: Perfect. 
00:33:56.770 --> 00:33:59.800 So all of these basic definitions we had over the past couple of weeks 00:33:59.800 --> 00:34:00.790 now come together. 00:34:00.790 --> 00:34:02.812 If a string is just an array of characters-- 00:34:02.812 --> 00:34:05.020 and by definition of array, those characters are back 00:34:05.020 --> 00:34:09.280 to back to back, and per two weeks ago, every string 00:34:09.280 --> 00:34:13.300 ends with this conventional backslash zero or nul character. 00:34:13.300 --> 00:34:15.550 All you need to do when thinking about a string 00:34:15.550 --> 00:34:17.530 is just to know where does the string begin, 00:34:17.530 --> 00:34:19.719 because you can use a for loop or a while loop 00:34:19.719 --> 00:34:22.540 or some other heuristic with a condition and a Boolean expression 00:34:22.540 --> 00:34:25.929 to figure out where the string ends without even knowing, 00:34:25.929 --> 00:34:27.710 in advance, its length. 00:34:27.710 --> 00:34:30.159 So that is to say, let's start, for the moment, 00:34:30.159 --> 00:34:32.679 thinking about strings as being quite simply 00:34:32.679 --> 00:34:37.969 that, just the address of the first character in the string. 00:34:37.969 --> 00:34:40.989 And if we then take that as fact, let's go ahead, now, 00:34:40.989 --> 00:34:43.989 and start playing with a program that doesn't use integers, but instead, 00:34:43.989 --> 00:34:46.570 uses strings, using this basic primitive. 00:34:46.570 --> 00:34:49.929 So let me go ahead and delete the code I'd written before, in address.c. 00:34:49.929 --> 00:34:54.580 Let me just change it up to be string s equals quote unquote, "HI" semicolon. 00:34:54.580 --> 00:34:57.700 And notice, I'm not manually typing any backslash 0's. 00:34:57.700 --> 00:34:59.560 C does that for us automatically. 00:34:59.560 --> 00:35:02.260 When you close the quote, the compiler takes care 00:35:02.260 --> 00:35:04.158 of adding that backslash 0 for you.
00:35:04.158 --> 00:35:05.950 Now, I'm going to go ahead on the next line 00:35:05.950 --> 00:35:10.042 and go ahead and print out percent s backslash n comma s, 00:35:10.042 --> 00:35:11.500 if I want to print out that string. 00:35:11.500 --> 00:35:13.968 Now, this program is not at all interesting anymore. 00:35:13.968 --> 00:35:15.760 Back in week one, we wrote something like-- 00:35:15.760 --> 00:35:18.730 OK, yes it is interesting because I screwed up. 00:35:18.730 --> 00:35:19.780 So five errors. 00:35:19.780 --> 00:35:22.450 I've written seven lines of code and five errors. 00:35:22.450 --> 00:35:24.070 And let's see what's going on. 00:35:24.070 --> 00:35:27.430 As always, always go to the top, because odds are, 00:35:27.430 --> 00:35:29.650 there's just some confusing cascading effect. 00:35:29.650 --> 00:35:34.090 The very first error I see is use of undeclared identifier string. 00:35:34.090 --> 00:35:35.230 Did I mean standard in? 00:35:35.230 --> 00:35:37.900 I didn't mean standard in, string, string, string. 00:35:37.900 --> 00:35:40.780 So I could run help 50 as my frontier, but honestly, I 00:35:40.780 --> 00:35:43.150 make this mistake often enough that I kind of know now 00:35:43.150 --> 00:35:46.690 that I forgot to include cs50.h. 00:35:46.690 --> 00:35:49.960 And indeed, if I now do this and recompile make address-- 00:35:49.960 --> 00:35:53.080 OK, all five errors are gone just by that one simple change. 00:35:53.080 --> 00:35:56.200 And if I run address now, it's just going to, quite simply, say HI. 00:35:56.200 --> 00:35:59.020 But let's now start to consider what's going 00:35:59.020 --> 00:36:00.650 on underneath the hood of this program. 00:36:00.650 --> 00:36:06.040 Suppose I am curious and want to print out what is actually 00:36:06.040 --> 00:36:08.170 the address at which this string lives. 00:36:08.170 --> 00:36:09.520 Well, it turns out-- 00:36:09.520 --> 00:36:10.690 let me be clever here.
00:36:10.690 --> 00:36:14.830 Let me print out, not a format code of percent s, but percent p. 00:36:14.830 --> 00:36:18.290 Show me this same string as an address. 00:36:18.290 --> 00:36:22.060 Let me go ahead and recompile, make address, seems to compile OK. 00:36:22.060 --> 00:36:23.560 Let me run dot slash address. 00:36:23.560 --> 00:36:26.350 And again, I'm still printing s, but I'm asking printf 00:36:26.350 --> 00:36:30.260 to present it as though it's a pointer. 00:36:30.260 --> 00:36:32.430 And interesting, it's not the same as before. 00:36:32.430 --> 00:36:35.060 But again, that's reasonable because the memory addresses 00:36:35.060 --> 00:36:36.540 aren't going to always be the same. 00:36:36.540 --> 00:36:37.940 But it doesn't matter what it is. 00:36:37.940 --> 00:36:39.232 But that's kind of interesting. 00:36:39.232 --> 00:36:41.750 All this time, any time you've been using strings, 00:36:41.750 --> 00:36:44.300 had you just changed your percent s to a percent p, 00:36:44.300 --> 00:36:48.290 you could have seen where, in memory, that string actually starts. 00:36:48.290 --> 00:36:50.780 It's not functionally useful to us just yet. 00:36:50.780 --> 00:36:52.700 But it's been there this whole time. 00:36:52.700 --> 00:36:54.800 And let me go ahead and do the following now. 00:36:54.800 --> 00:36:58.950 Suppose I get a little curious further, and I do printf. 00:36:58.950 --> 00:37:02.390 Let me go ahead and print out another address followed by a new line. 00:37:02.390 --> 00:37:07.035 And let me go ahead and print out the address of the first character. 00:37:07.035 --> 00:37:08.660 So again, this is a little weird to do. 00:37:08.660 --> 00:37:10.220 And we wouldn't typically do this that often. 00:37:10.220 --> 00:37:13.430 But again, just to make the point that these operators give us very simple 00:37:13.430 --> 00:37:16.850 answers to questions like, what is the address of this thing? 
00:37:16.850 --> 00:37:23.960 If s bracket 1, as of week two in CS50, represented the second character in s, 00:37:23.960 --> 00:37:28.190 because 0 index means s bracket 0 is the first, s bracket 1 is the second. 00:37:28.190 --> 00:37:30.410 If I play around with today's new operator, 00:37:30.410 --> 00:37:36.020 this ampersand, I bet I can see the address of that second character. 00:37:36.020 --> 00:37:38.390 And in fact, let me go ahead and be more explicit. 00:37:38.390 --> 00:37:43.160 Let me change this first s to be s bracket 0 and put an ampersand here. 00:37:43.160 --> 00:37:46.430 And let me go ahead, now, and make this program, make address. 00:37:46.430 --> 00:37:48.170 OK, a little funky-- 00:37:48.170 --> 00:37:49.680 I just missed a semicolon. 00:37:49.680 --> 00:37:51.060 So easy fix there. 00:37:51.060 --> 00:37:53.600 Let me go ahead and recompile with make address. 00:37:53.600 --> 00:37:55.880 Let me go ahead and run dot slash address. 00:37:55.880 --> 00:37:58.970 And interesting, well, maybe-- 00:37:58.970 --> 00:38:00.320 interesting to me. 00:38:00.320 --> 00:38:02.780 So you see, now, two addresses, the first of which 00:38:02.780 --> 00:38:08.900 is 0x4006a4, which apparently, is the address of the first character in s. 00:38:08.900 --> 00:38:10.880 But notice what's curious about the next one. 00:38:10.880 --> 00:38:15.720 It's almost the same except the byte is one further away. 00:38:15.720 --> 00:38:18.380 And I bet if I do this, not just for the h and the i, 00:38:18.380 --> 00:38:20.330 but also the exclamation point-- let me do 00:38:20.330 --> 00:38:23.210 one more line of almost identical code, just 00:38:23.210 --> 00:38:26.240 to make the point that all this time it's, indeed, 00:38:26.240 --> 00:38:30.560 been the case that all characters in a string are back to back to back. 00:38:30.560 --> 00:38:32.540 And you can now see it in code. 00:38:32.540 --> 00:38:37.610 b4, b5, b6, are just one byte apart.
00:38:37.610 --> 00:38:40.940 So we see some visual confirmation, now, that strings are indeed 00:38:40.940 --> 00:38:42.990 laid out in memory just like this. 00:38:42.990 --> 00:38:46.130 Now, again, this is not a very useful programmatic exercise 00:38:46.130 --> 00:38:48.500 to look at the address of individual characters. 00:38:48.500 --> 00:38:51.350 But again, this is just to emphasize that underneath the hood, 00:38:51.350 --> 00:38:53.960 some relatively simple operations are being 00:38:53.960 --> 00:38:58.562 enabled by way of this new ampersand, and in turn, star operator. 00:38:58.562 --> 00:39:00.770 So let's consider for a moment what this really looks 00:39:00.770 --> 00:39:02.390 like inside the computer's memory. 00:39:02.390 --> 00:39:05.660 At a low level, yes, s is technically an address. 00:39:05.660 --> 00:39:08.540 And yes, it's technically the address of the first byte, 00:39:08.540 --> 00:39:10.880 which in the actual computer, looked different. 00:39:10.880 --> 00:39:13.100 But in my slide here, I just arbitrarily proposed 00:39:13.100 --> 00:39:17.210 that it's at 0x123, 0x124, 0x125. 00:39:17.210 --> 00:39:20.300 But again, let's not care about that level of detail. 00:39:20.300 --> 00:39:23.210 Let's just kind of wave our hands and abstract away these addresses 00:39:23.210 --> 00:39:30.950 and just now start thinking of s, that is a string, as technically just being 00:39:30.950 --> 00:39:32.450 a pointer. 00:39:32.450 --> 00:39:33.260 A pointer. 00:39:33.260 --> 00:39:36.463 So it turns out that even though it's very useful and very common 00:39:36.463 --> 00:39:39.380 to think of strings as, obviously, just being sequences of characters. 00:39:39.380 --> 00:39:41.240 And that's been true since week one. 00:39:41.240 --> 00:39:43.130 And you can also think of them as arrays, 00:39:43.130 --> 00:39:44.990 back to back sequences of characters. 
00:39:44.990 --> 00:39:47.330 You can also, it turns out, starting today, 00:39:47.330 --> 00:39:51.290 think of them as just being pointers, that is, 00:39:51.290 --> 00:39:54.900 the address of a character somewhere in the computer's memory. 00:39:54.900 --> 00:39:58.550 And as Ginni notes, because all of the characters in a string 00:39:58.550 --> 00:40:00.770 are, by definition, back to back to back, 00:40:00.770 --> 00:40:05.720 and because, by definition, all strings end with a backslash 0, that 00:40:05.720 --> 00:40:08.750 is literally the smallest and only amount of information 00:40:08.750 --> 00:40:12.920 you need to keep around in a computer to know where all of your strings are. 00:40:12.920 --> 00:40:16.340 Just remember the address of the very first character 00:40:16.340 --> 00:40:19.430 therein, because you can find your way to the end 00:40:19.430 --> 00:40:24.320 by remembering that this backslash 0 is, really, just eight 0 00:40:24.320 --> 00:40:27.080 bits, otherwise represented as backslash 0. 00:40:27.080 --> 00:40:29.617 And so we could certainly have an if condition, 00:40:29.617 --> 00:40:31.700 much like we did two weeks ago when playing around 00:40:31.700 --> 00:40:36.230 with the lengths of strings, that allows us to check for precisely that. 00:40:36.230 --> 00:40:41.030 And so when I say we're taking off some training wheels, here they go. 00:40:41.030 --> 00:40:44.330 So up until now, we've been using, again, the CS50 library, 00:40:44.330 --> 00:40:47.470 which gives us, conveniently, functions like get string and get int 00:40:47.470 --> 00:40:49.650 and get float and so forth. 00:40:49.650 --> 00:40:54.650 But all this time, the CS50 library, specifically the file, cs50.h, 00:40:54.650 --> 00:40:58.070 had a little bit of a pedagogical simplification in it. 00:40:58.070 --> 00:41:02.510 Recall last week, that you can define your own custom data types. 
00:41:02.510 --> 00:41:06.955 Well, it turns out that all this time, we've been claiming that strings exist 00:41:06.955 --> 00:41:09.080 and they're something you can use in your programs. 00:41:09.080 --> 00:41:14.420 And strings do exist in C. They do exist in Python, in JavaScript, in Java, 00:41:14.420 --> 00:41:16.980 and C++, in many, many, many other languages. 00:41:16.980 --> 00:41:18.860 This is not a CS50 term. 00:41:18.860 --> 00:41:25.190 But string, technically, does not exist as a data type in C. It instead, 00:41:25.190 --> 00:41:31.180 is more cryptically and more low-level known as char star. 00:41:31.180 --> 00:41:33.080 Char star, now what does that mean? 00:41:33.080 --> 00:41:37.180 Well, char star, much like our int star a few minutes ago, 00:41:37.180 --> 00:41:40.840 just represents the address of a character, much like int star 00:41:40.840 --> 00:41:43.210 represents the address of an int. 00:41:43.210 --> 00:41:46.210 And if, again, you kind of agree with me now, 00:41:46.210 --> 00:41:49.450 that you can think of strings as sequences of characters, 00:41:49.450 --> 00:41:52.660 or more specifically, arrays of characters, or more specifically, 00:41:52.660 --> 00:41:56.920 as of today, the address of just the first character, 00:41:56.920 --> 00:41:59.680 then it's, indeed, the case that we now can 00:41:59.680 --> 00:42:02.800 apply this new terminology, today, of pointer, 00:42:02.800 --> 00:42:06.040 to our old familiar friends, strings. 00:42:06.040 --> 00:42:10.690 String is the same thing as a synonym, if you will, for char star. 00:42:10.690 --> 00:42:14.200 And it's in the CS50 library that we, essentially, have a line of code 00:42:14.200 --> 00:42:18.348 that simplifies or abstracts away char star, which honestly, no one wants 00:42:18.348 --> 00:42:20.890 to think about or struggle with in the first week of a class, 00:42:20.890 --> 00:42:23.260 let alone the first two or three weeks of a class. 
00:42:23.260 --> 00:42:28.475 It's a simplification, a custom data type, that we name string, 00:42:28.475 --> 00:42:30.850 just so you don't have to think about, what is this star? 00:42:30.850 --> 00:42:32.017 What is it to the character? 00:42:32.017 --> 00:42:33.100 What is it an address of? 00:42:33.100 --> 00:42:37.450 But today, we can remove those training wheels and reveal that, all this time, 00:42:37.450 --> 00:42:40.720 you've just been manipulating characters at specific addresses. 00:42:40.720 --> 00:42:43.180 And we've used this kind of technique before, 00:42:43.180 --> 00:42:45.550 abstracting away these lower level details. 00:42:45.550 --> 00:42:48.310 For instance, recall last week, that we introduced 00:42:48.310 --> 00:42:52.630 this notion of a struct, a data type that you can customize to be your own. 00:42:52.630 --> 00:42:56.200 We implemented a better phone book by wrapping together 00:42:56.200 --> 00:42:58.630 a name and a number inside of a custom data type, 00:42:58.630 --> 00:43:01.960 encapsulating them if you will, inside of something we called person. 00:43:01.960 --> 00:43:05.650 And every person we claimed had a structure 00:43:05.650 --> 00:43:07.580 that contains a name and a number. 00:43:07.580 --> 00:43:11.410 And by the way of this feature of C, typedef, we can define a new type. 00:43:11.410 --> 00:43:15.200 And the name of that type, last week, was just person. 00:43:15.200 --> 00:43:18.100 So we're using, already, and we have been sort of secretly 00:43:18.100 --> 00:43:22.750 using since the first week of C in the class, a line of code that 00:43:22.750 --> 00:43:24.020 actually looks like this. 00:43:24.020 --> 00:43:28.090 And this is, indeed, one of the lines of code inside of cs50.h. 00:43:28.090 --> 00:43:31.000 It says typedef, which means give me a custom type. 00:43:31.000 --> 00:43:35.770 And it creates a synonym for char star called string. 
00:43:35.770 --> 00:43:39.700 And it's just a way where we can hide the funky char star. 00:43:39.700 --> 00:43:42.070 We can hide the asterisk, in particular, which would not 00:43:42.070 --> 00:43:43.990 be fun to play with in the first few days, 00:43:43.990 --> 00:43:47.200 but without changing the definition of what a string is. 00:43:47.200 --> 00:43:51.850 So strings exist in C. But there's no data type called string in C 00:43:51.850 --> 00:43:56.020 until you use a library like CS50's, which makes it exist 00:43:56.020 --> 00:43:58.930 by way of that kind of definition. 00:43:58.930 --> 00:44:01.450 All right, let me pause here to see if there's 00:44:01.450 --> 00:44:03.760 any questions, then, about what strings are 00:44:03.760 --> 00:44:09.360 or these new ways of thinking about them. 00:44:09.360 --> 00:44:13.390 Any questions about strings or char stars? 00:44:13.390 --> 00:44:15.140 All right, well, if no questions here, why 00:44:15.140 --> 00:44:17.515 don't we go ahead and take our 5 minute break here first. 00:44:17.515 --> 00:44:19.790 And we'll be back in 5 and take another look 00:44:19.790 --> 00:44:22.040 at what we can now do with these new primitives. 00:44:22.040 --> 00:44:23.480 All right, we're back. 00:44:23.480 --> 00:44:27.680 And we have, now, this ability in code to get the address of some variable 00:44:27.680 --> 00:44:30.140 and also to go to an address using ampersand 00:44:30.140 --> 00:44:31.850 and the asterisk, respectively. 00:44:31.850 --> 00:44:36.530 We've thought about strings as being not only contiguous sequences 00:44:36.530 --> 00:44:38.150 of characters, but also arrays. 00:44:38.150 --> 00:44:42.477 And then of course, as of today now, actual addresses, 00:44:42.477 --> 00:44:44.810 the address of the first character and then, from there, 00:44:44.810 --> 00:44:46.940 can we find our way, programmatically, to the end, 00:44:46.940 --> 00:44:48.380 thanks to that nul character. 
00:44:48.380 --> 00:44:52.220 But it turns out there's one other thing we can do with these addresses 00:44:52.220 --> 00:44:53.840 or with pointers more generally. 00:44:53.840 --> 00:44:55.550 And that's known as pointer arithmetic. 00:44:55.550 --> 00:44:58.577 So anything that's a number, of course, we can do math on. 00:44:58.577 --> 00:45:00.410 And the math is not going to be complicated, 00:45:00.410 --> 00:45:03.390 but it is going to be powerful for us here. 00:45:03.390 --> 00:45:07.040 So I'm going to go back to my most recent state of address.c. 00:45:07.040 --> 00:45:11.480 And let me go ahead, now, and reiterate that we can print out 00:45:11.480 --> 00:45:15.800 the individual characters in a string, just like we did back in week two, 00:45:15.800 --> 00:45:18.270 as by using our square bracket notation. 00:45:18.270 --> 00:45:21.170 So I'm getting rid of all evidence of those addresses for now. 00:45:21.170 --> 00:45:23.420 I'm recompiling this program as make address. 00:45:23.420 --> 00:45:25.650 And then I'm going to run dot slash address now. 00:45:25.650 --> 00:45:29.690 And I see HI exclamation point, one character per line. 00:45:29.690 --> 00:45:34.290 But now, consider that there doesn't need to be a string data type. 00:45:34.290 --> 00:45:36.320 In fact, we can take this training wheel off. 00:45:36.320 --> 00:45:38.690 And while it might feel a little uncomfortable at first, 00:45:38.690 --> 00:45:42.620 if I delete this first line altogether, as I've accidentally omitted anyway 00:45:42.620 --> 00:45:45.660 sometimes, I don't need to keep calling things strings. 00:45:45.660 --> 00:45:47.570 I can describe them as strings verbally. 00:45:47.570 --> 00:45:49.790 I can think of them as strings, because string 00:45:49.790 --> 00:45:53.150 is a thing in many different programming languages. 00:45:53.150 --> 00:45:56.070 But by default, in C, it just doesn't exist as a type. 
00:45:56.070 --> 00:45:59.750 Instead, the type is somewhat cryptically named, char star. 00:45:59.750 --> 00:46:02.840 But again, all that means is that the star means here's 00:46:02.840 --> 00:46:04.010 the address of something. 00:46:04.010 --> 00:46:06.140 Char means it's the address of a char. 00:46:06.140 --> 00:46:09.950 So char star gives you a pointer variable 00:46:09.950 --> 00:46:12.720 that's going to point to a character. 00:46:12.720 --> 00:46:16.080 So now, if s is that, I can actually treat it the same. 00:46:16.080 --> 00:46:20.960 There's no reason I can't keep using s like a string was back in week two, 00:46:20.960 --> 00:46:22.400 using our square bracket notation. 00:46:22.400 --> 00:46:24.770 And I can keep printing out HI exclamation point 00:46:24.770 --> 00:46:27.320 using that same square bracket syntax. 00:46:27.320 --> 00:46:30.170 But there's one other way I can do this. 00:46:30.170 --> 00:46:35.150 If I now know that s is really just an address, 00:46:35.150 --> 00:46:37.760 I can get rid of this square bracket notation. 00:46:37.760 --> 00:46:42.860 And I can actually just do star s, because recall that star, in addition 00:46:42.860 --> 00:46:47.270 to being the new symbol that we use when declaring a pointer up here, 00:46:47.270 --> 00:46:50.990 it's also the same symbol, confusingly, admittedly, 00:46:50.990 --> 00:46:53.310 that we used to go to an address. 00:46:53.310 --> 00:46:57.650 So if s is storing an address, which it is by definition of being a pointer, 00:46:57.650 --> 00:46:59.900 star s means go to that address. 00:46:59.900 --> 00:47:02.000 And per my picture earlier, it would seem 00:47:02.000 --> 00:47:08.060 to be the case that s is most likely at an address beginning at 0x123. 00:47:08.060 --> 00:47:10.250 It's not going to be the same in my actual IDE here. 00:47:10.250 --> 00:47:12.167 It will be whatever the computer has ordained. 00:47:12.167 --> 00:47:14.610 But it's going to be the same exact idea. 
00:47:14.610 --> 00:47:17.150 So let me go ahead and go to star s. 00:47:17.150 --> 00:47:20.130 And just for kicks, let me leave it as just that one line. 00:47:20.130 --> 00:47:23.870 So let me go ahead and rerun this as make address. 00:47:23.870 --> 00:47:25.470 All right, and now dot slash address. 00:47:25.470 --> 00:47:30.710 I should see, hopefully, a capital H and only an H. But watch this. 00:47:30.710 --> 00:47:34.400 If I know that s, a string, is technically just an address, 00:47:34.400 --> 00:47:35.960 I can actually now do math on it. 00:47:35.960 --> 00:47:39.470 And I can go ahead and print out another character, followed by a new line. 00:47:39.470 --> 00:47:44.090 And I can go to, not s, but how about s plus 1. 00:47:44.090 --> 00:47:47.600 So I can do some very simple arithmetic, if you will, on that pointer. 00:47:47.600 --> 00:47:49.920 And let me go ahead and now recompile this. 00:47:49.920 --> 00:47:54.800 So make address, compiles OK, dot slash address. 00:47:54.800 --> 00:47:56.570 And I should see HI. 00:47:56.570 --> 00:48:01.790 And if I do one more line of code like this, printf, percent c, backslash n, 00:48:01.790 --> 00:48:07.130 star s plus 2, I can now go to the character 00:48:07.130 --> 00:48:10.770 that is two bytes away from whatever s is, 00:48:10.770 --> 00:48:12.480 which again, is the start of the string. 00:48:12.480 --> 00:48:15.890 So now, I've reprinted HI with the exclamation point character 00:48:15.890 --> 00:48:19.280 by character, but not by using this fancy square bracket 00:48:19.280 --> 00:48:24.710 notation, fancy only in the sense that it was sort of an abstraction for us, 00:48:24.710 --> 00:48:25.670 if you will. 00:48:25.670 --> 00:48:28.885 I'm instead, manipulating s for what it really is, which is just an address. 
00:48:28.885 --> 00:48:31.010 And so here, too, and I've used this phrase before, 00:48:31.010 --> 00:48:33.710 that square bracket notation that we introduced in week two, 00:48:33.710 --> 00:48:36.410 is technically just syntactic sugar. 00:48:36.410 --> 00:48:39.500 It's not doing anything fundamentally different 00:48:39.500 --> 00:48:42.770 from these asterisks and these addresses. 00:48:42.770 --> 00:48:45.440 It's just doing it, honestly, in a much more user-friendly way. 00:48:45.440 --> 00:48:49.160 I still prefer, personally, the square bracket notation from week two. 00:48:49.160 --> 00:48:54.680 But it's the same thing as using the star and doing this math yourself. 00:48:54.680 --> 00:48:57.020 So C is just providing us with this handy feature 00:48:57.020 --> 00:49:00.200 of using square brackets that does all of this so-called pointer 00:49:00.200 --> 00:49:02.360 arithmetic for you. 00:49:02.360 --> 00:49:04.290 But again, we're going to this low level just 00:49:04.290 --> 00:49:10.310 to emphasize what it is that's going on ultimately underneath the hood here. 00:49:10.310 --> 00:49:13.070 All right, let me pause here for any questions. 00:49:13.070 --> 00:49:17.290 And Brian, please do feel free to verbalize any on your end. 00:49:17.290 --> 00:49:19.790 BRIAN: I see a question that came in about what would happen 00:49:19.790 --> 00:49:22.233 if you tried to print star s plus 3. 00:49:22.233 --> 00:49:25.400 DAVID MALAN: So I'm pretty sure that's going to print out the nul character. 00:49:25.400 --> 00:49:27.233 But let's go ahead and confirm as much here, 00:49:27.233 --> 00:49:31.760 percent c backslash n star s plus 3. 00:49:31.760 --> 00:49:35.120 All right, I'm getting a little adventurous here 00:49:35.120 --> 00:49:38.060 by looking at things I maybe shouldn't be looking at, because that's 00:49:38.060 --> 00:49:39.545 a low level implementation detail. 00:49:39.545 --> 00:49:40.670 But let's see what happens. 
00:49:40.670 --> 00:49:43.130 It compiles OK, dot slash address. 00:49:43.130 --> 00:49:44.780 And it seems to be blank. 00:49:44.780 --> 00:49:46.730 Now, maybe that's the nul character. 00:49:46.730 --> 00:49:48.980 Honestly, it's not meant to be a printable character. 00:49:48.980 --> 00:49:52.770 It's this special sentinel value that indicates the end of the string. 00:49:52.770 --> 00:49:54.020 But I could do this. 00:49:54.020 --> 00:49:57.170 I know from week two that chars are integers 00:49:57.170 --> 00:49:59.670 and integers are chars if I want to think of them that way. 00:49:59.670 --> 00:50:01.880 So let me change only the very last character 00:50:01.880 --> 00:50:03.950 to use the format code percent i. 00:50:03.950 --> 00:50:05.690 Let me recompile my code. 00:50:05.690 --> 00:50:07.940 Let me go ahead and run address. 00:50:07.940 --> 00:50:11.540 And voila, HI exclamation 0. 00:50:11.540 --> 00:50:16.400 And there is the all 0 bits represented here as one single decimal digit thanks 00:50:16.400 --> 00:50:17.570 to percent i. 00:50:17.570 --> 00:50:19.970 Now, I can get really crazy here. 00:50:19.970 --> 00:50:23.420 And why don't we go ahead and print out not just what characters 00:50:23.420 --> 00:50:28.580 are right after this sequence, HI exclamation point nul character, 00:50:28.580 --> 00:50:33.770 why don't we go to-- oh heck, how about address 1,000 bytes away, 00:50:33.770 --> 00:50:35.990 and really get nosy inside of my computer? 00:50:35.990 --> 00:50:38.450 Let me recompile that, dot slash address. 00:50:38.450 --> 00:50:40.460 OK, nothing really going on over there. 00:50:40.460 --> 00:50:42.620 How about 10,000 bytes away? 00:50:42.620 --> 00:50:44.270 Let me go ahead and make address. 00:50:44.270 --> 00:50:47.990 Let me go ahead and run this. Segmentation fault. All right, 00:50:47.990 --> 00:50:49.010 that's bad.
00:50:49.010 --> 00:50:53.030 And you might be among the fortunate few who have seen this error before 00:50:53.030 --> 00:50:54.440 by touching memory you shouldn't. 00:50:54.440 --> 00:50:56.607 And we're going to deliberately consider this today. 00:50:56.607 --> 00:50:59.540 But a segmentation fault, indeed, means that you have done something 00:50:59.540 --> 00:51:01.430 wrong somewhere in your code. 00:51:01.430 --> 00:51:04.000 And it tends to mean that you touched a segment of memory 00:51:04.000 --> 00:51:05.000 that you shouldn't have. 00:51:05.000 --> 00:51:08.750 And I have no business, honestly, looking 10,000 bytes away 00:51:08.750 --> 00:51:11.420 from the memory that I know belongs to the string. 00:51:11.420 --> 00:51:14.670 That's like arbitrarily looking anywhere in your computer's memory, 00:51:14.670 --> 00:51:16.890 which probably, it seems, is not a good idea. 00:51:16.890 --> 00:51:19.000 But more on that in just a bit. 00:51:19.000 --> 00:51:21.470 So let's consider, now, some of the implications 00:51:21.470 --> 00:51:25.130 of these underlying implementation details 00:51:25.130 --> 00:51:28.580 and consider, now, from last week, why we did a few things the way 00:51:28.580 --> 00:51:30.590 we did in the past few weeks, in fact. 00:51:30.590 --> 00:51:32.360 So string is just a char star. 00:51:32.360 --> 00:51:33.860 And let's, now, consider an example. 00:51:33.860 --> 00:51:37.260 Let me zoom out on my memory, just so I can cram more in at once. 00:51:37.260 --> 00:51:39.620 Let's consider an example where I might want to write 00:51:39.620 --> 00:51:42.570 a program that compares two strings. 00:51:42.570 --> 00:51:45.830 Let me go ahead and write some new code here in a new file this time, 00:51:45.830 --> 00:51:48.350 called, for instance, compare.c. 
00:51:48.350 --> 00:51:50.480 My goal with this program, quite simply, is 00:51:50.480 --> 00:51:55.580 going to be to print out the contents of-- or rather to compare 00:51:55.580 --> 00:51:57.590 two strings that the user might input. 00:51:57.590 --> 00:52:00.040 I'm going to go ahead and include cs50.h, 00:52:00.040 --> 00:52:02.810 not because I want string, per se, anymore, 00:52:02.810 --> 00:52:05.750 but because I want to use get string just for convenience. 00:52:05.750 --> 00:52:08.180 But we'll take that training wheel off in a bit, too. 00:52:08.180 --> 00:52:10.520 And in this program, I'm going to go ahead and first 00:52:10.520 --> 00:52:11.690 use, not get string yet. 00:52:11.690 --> 00:52:14.450 Let me go ahead and keep it simple and start with get int. 00:52:14.450 --> 00:52:16.910 And I'll ask the user for a value for i. 00:52:16.910 --> 00:52:19.340 And let me do another one of these with get int and ask 00:52:19.340 --> 00:52:21.270 the user for a value for j. 00:52:21.270 --> 00:52:24.665 And then let me go ahead and quite simply say, if i equals equals j, 00:52:24.665 --> 00:52:28.790 then go ahead and print out same. Else, 00:52:28.790 --> 00:52:31.770 let me go ahead and print out different. 00:52:31.770 --> 00:52:35.930 So this is week one stuff, where I'm using a couple of variables. 00:52:35.930 --> 00:52:38.300 I'm using a condition with two branches, and I'm 00:52:38.300 --> 00:52:42.990 using printf to print out whether those two variables, i and j, are the same. 00:52:42.990 --> 00:52:44.930 So let's go ahead and compile this. 00:52:44.930 --> 00:52:45.950 All is well. 00:52:45.950 --> 00:52:49.310 Run compare, and let me give it digits 1 and 2. 00:52:49.310 --> 00:52:50.630 And indeed, they're different. 00:52:50.630 --> 00:52:53.400 And let me go ahead and give it 1 and 1, and they're the same.
00:52:53.400 --> 00:52:56.270 So I think, logically, proof by example, if you will, 00:52:56.270 --> 00:52:57.860 this program looks correct. 00:52:57.860 --> 00:53:02.630 But let me quickly make it seemingly incorrect, by not using integers, 00:53:02.630 --> 00:53:05.840 but, how about, by using strings instead. 00:53:05.840 --> 00:53:07.988 Let me go ahead and give myself a string. 00:53:07.988 --> 00:53:10.280 Although, no, I don't need that training wheel anymore. 00:53:10.280 --> 00:53:15.300 Let's just do char star s equals get string of s. 00:53:15.300 --> 00:53:17.300 But again, even though I'm calling it char star, 00:53:17.300 --> 00:53:19.580 it's still a string like it was weeks ago. 00:53:19.580 --> 00:53:23.510 Let me give myself another string called t, just to keep the name short. 00:53:23.510 --> 00:53:25.100 And s will get-- 00:53:25.100 --> 00:53:26.730 t will get that value there. 00:53:26.730 --> 00:53:30.140 And let me just, very naively but kind of reasonably, 00:53:30.140 --> 00:53:34.310 say if s equals equals t, let's go ahead and print out same. 00:53:34.310 --> 00:53:38.000 And otherwise, let's go ahead and print out different. 00:53:38.000 --> 00:53:41.240 So same exact code, just different data types, and using 00:53:41.240 --> 00:53:42.830 get string instead of get int. 00:53:42.830 --> 00:53:47.360 Let me go ahead and make compare, seems to compile OK, dot slash compare. 00:53:47.360 --> 00:53:51.770 Let me go ahead and type in HI!-- 00:53:51.770 --> 00:53:53.570 whoops, HI!. 00:53:53.570 --> 00:53:55.220 Let me go ahead and type in HI! again. 00:53:55.220 --> 00:53:57.500 And voila, different. 00:53:57.500 --> 00:54:01.010 And I forgot my backslash n's, but that seems to be the least of my problems. 00:54:01.010 --> 00:54:05.240 Let me recompile this, make compare, and now, let me run it again. 00:54:05.240 --> 00:54:07.130 How about, let's do a quick test. 00:54:07.130 --> 00:54:09.010 David, Brian, these are definitely different.
00:54:09.010 --> 00:54:09.580 OK, good. 00:54:09.580 --> 00:54:11.240 So the program seems to work. 00:54:11.240 --> 00:54:13.150 How about David, David? 00:54:13.150 --> 00:54:14.140 Also different. 00:54:14.140 --> 00:54:15.370 Huh, let me try again. 00:54:15.370 --> 00:54:18.600 Brian, Brian, also different. 00:54:18.600 --> 00:54:21.570 But I'm pretty sure those strings are the same. 00:54:21.570 --> 00:54:24.180 Why might this program be flawed? 00:54:24.180 --> 00:54:28.582 What is wrong with this program right now? 00:54:28.582 --> 00:54:30.290 BRIAN: A couple of people in the chat are 00:54:30.290 --> 00:54:32.750 saying that we're not actually comparing the characters, 00:54:32.750 --> 00:54:34.370 we're comparing the addresses. 00:54:34.370 --> 00:54:37.377 DAVID MALAN: Yeah, so that's sort of the logical conclusion from today's 00:54:37.377 --> 00:54:38.960 definition of what a string really is. 00:54:38.960 --> 00:54:41.750 If a string is just the address of its first character, 00:54:41.750 --> 00:54:44.450 then if you're literally doing s equals equals t, 00:54:44.450 --> 00:54:46.697 you're comparing those two addresses. 00:54:46.697 --> 00:54:48.530 And they are probably going to be different, 00:54:48.530 --> 00:54:50.990 even if I type in the same thing, because every time we've 00:54:50.990 --> 00:54:55.010 called get int or get string, it's kind of plopped the user's input 00:54:55.010 --> 00:54:56.750 somewhere in my computer's memory. 00:54:56.750 --> 00:55:00.560 But we now have the tools, honestly, to answer this or vet this answer 00:55:00.560 --> 00:55:01.130 ourselves. 00:55:01.130 --> 00:55:03.230 Let me go ahead and simplify this program. 00:55:03.230 --> 00:55:06.050 And let's, just as a quick sanity check, print out s. 00:55:06.050 --> 00:55:10.610 And let's go ahead and print out t using a new line after each, 00:55:10.610 --> 00:55:12.350 just so we can see what the strings are. 
00:55:12.350 --> 00:55:16.830 So let me go ahead and do this again, make compare, compiles OK, dot slash 00:55:16.830 --> 00:55:17.330 compare. 00:55:17.330 --> 00:55:19.310 Let me type in HI, HI. 00:55:19.310 --> 00:55:21.710 And they seem to be visually the same. 00:55:21.710 --> 00:55:24.770 But recall that, now, I have this other format code, 00:55:24.770 --> 00:55:27.080 such that I can now start treating strings 00:55:27.080 --> 00:55:29.330 as the addresses they technically are. 00:55:29.330 --> 00:55:33.140 So let me change percent s to percent p in both places. 00:55:33.140 --> 00:55:37.610 Let me then recompile the program, and now, rerun compare with both HI and HI 00:55:37.610 --> 00:55:38.690 identically typed. 00:55:38.690 --> 00:55:43.100 But notice, they've ended up at slightly different memory locations. 00:55:43.100 --> 00:55:46.820 Even though I have coincidentally typed the same thing, C and my computer 00:55:46.820 --> 00:55:52.097 are not going to be so presumptuous as to use the same bytes for both strings. 00:55:52.097 --> 00:55:53.930 That's not going to give me much flexibility 00:55:53.930 --> 00:55:55.490 if I want to change one or the other. 00:55:55.490 --> 00:55:58.490 It's going to very simplistically put one in this chunk of memory 00:55:58.490 --> 00:56:00.240 and the other in this chunk of memory. 00:56:00.240 --> 00:56:03.680 And indeed, those addresses are respectively, but arbitrarily, 00:56:03.680 --> 00:56:07.220 0x22fe670 and 0x22fe6b0. 00:56:09.770 --> 00:56:12.500 So they are spread apart some distance. 00:56:12.500 --> 00:56:15.810 But again, it's up to the computer to decide where to actually put those. 00:56:15.810 --> 00:56:18.310 So what's actually going on inside of the computer's memory? 00:56:18.310 --> 00:56:22.010 Well, let's consider if, for instance, this is s, my pointer, or really, 00:56:22.010 --> 00:56:22.640 my string. 00:56:22.640 --> 00:56:23.810 But it's just a pointer now. 
00:56:23.810 --> 00:56:25.060 It's the address of something. 00:56:25.060 --> 00:56:28.250 Notice that I've drawn it as taking up eight squares, 00:56:28.250 --> 00:56:31.680 because again, a pointer on modern systems is eight bytes. 00:56:31.680 --> 00:56:33.320 So that's why this thing is so big. 00:56:33.320 --> 00:56:37.100 Meanwhile, when I type in something like HI with the exclamation point, 00:56:37.100 --> 00:56:38.720 then it ends up somewhere in memory. 00:56:38.720 --> 00:56:40.440 We don't really know or care where it is. 00:56:40.440 --> 00:56:42.773 So let's just arbitrarily say it happens to end up there 00:56:42.773 --> 00:56:43.850 in my computer's memory. 00:56:43.850 --> 00:56:46.730 Now, each of those bytes, of course, has an address. 00:56:46.730 --> 00:56:48.950 I don't necessarily know or care what they are. 00:56:48.950 --> 00:56:52.040 But for explanation's sake, let's just number them again like before, 00:56:52.040 --> 00:56:56.810 0x123, 0x124, 0x125, 0x126. 00:56:56.810 --> 00:57:02.960 When I then assign s on the left the value from get string on the right, 00:57:02.960 --> 00:57:04.670 get string, what is it going to do? 00:57:04.670 --> 00:57:07.640 Well, all of this time since week one, since you've been using it, 00:57:07.640 --> 00:57:11.970 it is, yes, getting a string and handing it back to you as a return value. 00:57:11.970 --> 00:57:13.680 But what does that really mean? 00:57:13.680 --> 00:57:18.200 Well, if a string is just an address, the return value of a function 00:57:18.200 --> 00:57:23.030 like get string is to return to you, not the string per se, because that's 00:57:23.030 --> 00:57:24.740 kind of a high level concept. 00:57:24.740 --> 00:57:27.050 What get string has always been doing for us 00:57:27.050 --> 00:57:29.810 is returning the address of the string, or more 00:57:29.810 --> 00:57:33.410 specifically, the address of the first character in the string.
00:57:33.410 --> 00:57:39.740 And so what is technically stored in s, to be clear, is that address, 0x123. 00:57:39.740 --> 00:57:43.400 It's not returning to you the whole string, the H, the I, the exclamation point. 00:57:43.400 --> 00:57:46.040 Rather, it's returning just one value to you. 00:57:46.040 --> 00:57:50.990 It's returning to you only the address of the first character of that string. 00:57:50.990 --> 00:57:54.500 But again, this is all very good for just s. 00:57:54.500 --> 00:57:55.880 What's going on with t? 00:57:55.880 --> 00:57:58.910 t is kind of the same story, because I'm calling get string again. 00:57:58.910 --> 00:58:02.390 t is going to get assigned the address of the first character 00:58:02.390 --> 00:58:03.500 of this version of HI. 00:58:03.500 --> 00:58:13.160 And let's just arbitrarily say it's at 0x456, 0x457, 0x458, and 0x459. 00:58:13.160 --> 00:58:16.873 And at this point, t is going to take on the value of 0x456. 00:58:16.873 --> 00:58:19.790 And now, at this point, honestly, we're really getting into the weeds. 00:58:19.790 --> 00:58:21.665 Let's just start abstracting all of this away 00:58:21.665 --> 00:58:23.870 and use arrows to point at the values. 00:58:23.870 --> 00:58:26.720 And indeed, these arrows just represent pointers 00:58:26.720 --> 00:58:29.190 when we stop caring about the particular addresses. 00:58:29.190 --> 00:58:32.300 So s is really just a pointer, a variable pointing 00:58:32.300 --> 00:58:34.070 at the first character of HI here. 00:58:34.070 --> 00:58:38.490 t is just a variable pointing at the first character of HI there. 00:58:38.490 --> 00:58:41.540 And so when you are comparing two strings 00:58:41.540 --> 00:58:45.440 as I was before in the earlier version of my program, 00:58:45.440 --> 00:58:53.540 where I was checking if s equals equals t, I was, indeed, comparing s and t. 00:58:53.540 --> 00:58:55.130 What are s and t?
00:58:55.130 --> 00:59:01.640 s and t, respectively, are 0x123 and 0x456, 00:59:01.640 --> 00:59:03.770 or whatever the actual values happen to be, 00:59:03.770 --> 00:59:06.320 which are not going to be the same because they happen 00:59:06.320 --> 00:59:09.920 to point to different chunks of memory. 00:59:09.920 --> 00:59:12.110 All right, well who cares? 00:59:12.110 --> 00:59:14.630 This is all kind of a nice intellectual exercise. 00:59:14.630 --> 00:59:15.512 But who cares? 00:59:15.512 --> 00:59:16.970 Well, how do we solve this problem? 00:59:16.970 --> 00:59:20.480 Let's consider what I actually did in a previous demo. 00:59:20.480 --> 00:59:23.955 I sort of preemptively mentioned that there's this function, string compare, 00:59:23.955 --> 00:59:25.580 that allows you to compare two strings. 00:59:25.580 --> 00:59:28.040 And I promised that we would eventually explain 00:59:28.040 --> 00:59:31.573 why we use str compare as opposed to just using the equal equal sign. 00:59:31.573 --> 00:59:33.740 Well, to use this function, I'm going to need to add 00:59:33.740 --> 00:59:37.910 in string.h up here per last time. 00:59:37.910 --> 00:59:40.790 But if string compare s t, let me go ahead and recompile this, 00:59:40.790 --> 00:59:43.160 compare, dot slash compare. 00:59:43.160 --> 00:59:45.710 Now, let me type HI! and HI! identically. 00:59:45.710 --> 00:59:47.870 Now, they still seem to be different. 00:59:47.870 --> 00:59:51.680 And dammit, I made the same stupid mistake as I did last time. 00:59:51.680 --> 00:59:57.170 Does anyone know what mistake I made when comparing two strings? 00:59:57.170 --> 01:00:00.590 Somehow I seem to be very good at making this mistake. 01:00:00.590 --> 01:00:03.440 BRIAN: Ibrahim is suggesting that you add an equal equal zero. 01:00:03.440 --> 01:00:04.398 DAVID MALAN: Thank you. 01:00:04.398 --> 01:00:05.390 Ibrahim is quite right.
01:00:05.390 --> 01:00:08.000 The return value, recall, of str compare, 01:00:08.000 --> 01:00:13.040 is to return 0 if they're the same, a negative number if one comes 01:00:13.040 --> 01:00:16.430 before the other, and a positive number if one comes after the other, 01:00:16.430 --> 01:00:18.600 as in ASCIIbetical order. 01:00:18.600 --> 01:00:21.440 So what I should have done, both last time and this time, 01:00:21.440 --> 01:00:23.600 is check for equality with 0. 01:00:23.600 --> 01:00:26.220 Let me go ahead and recompile this program. 01:00:26.220 --> 01:00:27.050 OK, good. 01:00:27.050 --> 01:00:29.090 Now, let me rerun this program with HI! 01:00:29.090 --> 01:00:30.230 twice. 01:00:30.230 --> 01:00:31.940 Voila, they're the same. 01:00:31.940 --> 01:00:34.580 And just to make sure, let me do one other check. 01:00:34.580 --> 01:00:38.810 Let me do David and Brian, which should be, indeed, different. 01:00:38.810 --> 01:00:42.050 So now, again, I haven't really done anything different from that last time. 01:00:42.050 --> 01:00:47.420 But I'm now thinking about these strings as being fundamentally just 01:00:47.420 --> 01:00:48.173 their addresses. 01:00:48.173 --> 01:00:50.090 And so, now, let's make this actually germane. 01:00:50.090 --> 01:00:52.160 Let me go ahead and create a new file altogether. 01:00:52.160 --> 01:00:56.590 And let's, pretty reasonably, try to copy one string and make changes to it. 01:00:56.590 --> 01:00:57.840 So I'm going to go ahead here. 01:00:57.840 --> 01:01:00.230 And just for convenience, I'm going to still use the CS50 library, 01:01:00.230 --> 01:01:02.300 not for the string data type, but just for the 01:01:02.300 --> 01:01:06.200 get string function, which we'll see is more handy than other things-- 01:01:06.200 --> 01:01:07.790 than other ways of doing things. 01:01:07.790 --> 01:01:11.630 And I'm going to go ahead and include standard io dot h. 
01:01:11.630 --> 01:01:17.450 And I'm going to go ahead and include, how about, string.h. 01:01:17.450 --> 01:01:20.000 Let me go ahead and do int main void. 01:01:20.000 --> 01:01:22.790 And let me go ahead, in this program, and get myself a string. 01:01:22.790 --> 01:01:24.540 But note, we won't call it string anymore. 01:01:24.540 --> 01:01:26.030 We'll just call it char star. 01:01:26.030 --> 01:01:28.380 So again, start taking off that training wheel. 01:01:28.380 --> 01:01:31.312 And I'm going to go ahead and get a string called s. 01:01:31.312 --> 01:01:33.020 And then I'm going to get another string. 01:01:33.020 --> 01:01:34.062 But I won't call it that. 01:01:34.062 --> 01:01:36.230 I'll call it char star t. 01:01:36.230 --> 01:01:37.400 And I want to copy s. 01:01:37.400 --> 01:01:40.790 And so you might think, based on week one, week two, and since, that OK, 01:01:40.790 --> 01:01:42.890 if you want to copy a variable, just do it. 01:01:42.890 --> 01:01:44.690 I mean, we've used the assignment operator 01:01:44.690 --> 01:01:48.530 to copy a variable from right to left for integers, for chars, 01:01:48.530 --> 01:01:50.600 and for other data types, perhaps, too. 01:01:50.600 --> 01:01:54.690 I'm going to go ahead, now, and make a change to the original string. 01:01:54.690 --> 01:01:56.270 So let me go ahead and do this. 01:01:56.270 --> 01:02:01.280 Let me go ahead and say, let's change the first character of t 01:02:01.280 --> 01:02:02.780 to be uppercase. 01:02:02.780 --> 01:02:04.940 Recall that there's this function, to upper, 01:02:04.940 --> 01:02:09.170 which takes, as input, a character, like the first character in t, 01:02:09.170 --> 01:02:11.120 and returns the uppercase version. 01:02:11.120 --> 01:02:14.240 Now, to use to upper, I need another header file, 01:02:14.240 --> 01:02:17.990 which I recall from a couple of weeks ago now, I need ctype.h. 01:02:17.990 --> 01:02:20.750 So let me preemptively go back and put that there. 
01:02:20.750 --> 01:02:23.280 And now, let me go ahead and print these two strings. 01:02:23.280 --> 01:02:27.500 Let me go ahead and print out s as being this percent s. 01:02:27.500 --> 01:02:33.990 And let me go ahead and print out the value of t with percent s as follows. 01:02:33.990 --> 01:02:36.680 So again, what I'm doing is I'm getting a string from the user. 01:02:36.680 --> 01:02:40.490 And the only new thing here is char star today, which is synonymous with string. 01:02:40.490 --> 01:02:44.270 On line 10 here, I'm copying the string from right to left. 01:02:44.270 --> 01:02:47.330 And then I'm capitalizing only the first letter 01:02:47.330 --> 01:02:49.640 in the copy, otherwise known as t. 01:02:49.640 --> 01:02:51.140 And then I'm just printing both out. 01:02:51.140 --> 01:02:54.290 So let me go ahead and make copy, compiles OK. 01:02:54.290 --> 01:02:56.510 Make cop-- dot slash copy. 01:02:56.510 --> 01:03:00.020 Let me go ahead and type in hi! in lowercase, all lowercase, 01:03:00.020 --> 01:03:00.920 and then enter. 01:03:00.920 --> 01:03:03.830 And voila, huh. 01:03:03.830 --> 01:03:10.760 It would seem that I somehow capitalized both S and T, even though I only 01:03:10.760 --> 01:03:17.080 called to upper on T. Brian, any thoughts 01:03:17.080 --> 01:03:24.820 from the group on why I've accidentally and erroneously capitalized 01:03:24.820 --> 01:03:26.260 both somehow? 01:03:26.260 --> 01:03:29.735 BRIAN: A couple of people are saying that t is just an alias of s. 01:03:29.735 --> 01:03:32.860 DAVID MALAN: Just an alias of s, that's a reasonable way of thinking of it, 01:03:32.860 --> 01:03:33.360 sure. 01:03:33.360 --> 01:03:38.320 And more precisely, any other thoughts on why this is incorrect somehow? 01:03:38.320 --> 01:03:41.540 BRIAN: Peter is now suggesting that they have the same address. 01:03:41.540 --> 01:03:45.880 DAVID MALAN: So yeah, more specifically, all I've done is copy s into t. 
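The surprise just described can be reproduced in a few lines. This is a sketch under one assumption: the caller passes a modifiable buffer (standing in for what get string returns), and `capitalize_via_alias` is a hypothetical name used only for illustration.

```c
#include <ctype.h>

// Hypothetical helper that mimics the buggy copy.c: it "copies" s by
// assignment and then capitalizes the copy.
char *capitalize_via_alias(char *s)
{
    char *t = s; // copies the address only, not the characters

    if (t[0] != '\0')
    {
        // This writes through t, but t and s name the same bytes,
        // so the original string changes as well.
        t[0] = (char) toupper((unsigned char) t[0]);
    }
    return t;
}
```

After the call, the returned pointer equals the one passed in, and the original buffer starts with an uppercase letter.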
01:03:45.880 --> 01:03:48.040 But again, what is s as of today? 01:03:48.040 --> 01:03:49.390 It's just an address. 01:03:49.390 --> 01:03:51.040 So yes, I have copied s. 01:03:51.040 --> 01:03:54.820 But I've copied it literally, which means copying its address, 0x123, 01:03:54.820 --> 01:03:55.820 or whatever it is. 01:03:55.820 --> 01:04:01.180 And then on line 12, notice that I'm changing t by uppercasing it. 01:04:01.180 --> 01:04:04.130 But t is at the same address as s. 01:04:04.130 --> 01:04:08.130 So really, I'm changing one and the same string. 01:04:08.130 --> 01:04:10.630 So if we think about this in terms of the computer's memory, 01:04:10.630 --> 01:04:12.088 let's consider what I've just done. 01:04:12.088 --> 01:04:13.570 Let me clear the computer's memory. 01:04:13.570 --> 01:04:15.290 Let me put s down as before. 01:04:15.290 --> 01:04:18.250 Let me put hi! down as before, but all lowercase this time. 01:04:18.250 --> 01:04:23.320 And recall that it might be at addresses 0x123, 124, 125, and 126. 01:04:23.320 --> 01:04:26.350 And now, if we consider that s technically 01:04:26.350 --> 01:04:29.740 contains the address of that first character, 0x123, 01:04:29.740 --> 01:04:34.960 and I proceed to create a new variable, t, and assign t the value of s, 01:04:34.960 --> 01:04:36.970 I've got to take that statement literally. 01:04:36.970 --> 01:04:39.670 I'm literally just putting 0x123 here. 01:04:39.670 --> 01:04:41.770 And if we now abstract away these details just 01:04:41.770 --> 01:04:44.020 to make it more clear visually what's going on, 01:04:44.020 --> 01:04:48.070 that's pretty much like saying that both s and t point 01:04:48.070 --> 01:04:49.750 to the same location in memory. 01:04:49.750 --> 01:04:52.297 So yes, in that sense, t is just an alias for s, 01:04:52.297 --> 01:04:54.130 which is a reasonable way of thinking of it. 01:04:54.130 --> 01:04:56.920 But really, t is just identical to s.
01:04:56.920 --> 01:04:59.110 So when you use the square bracket notation 01:04:59.110 --> 01:05:02.290 to go to the first character of t, you are equivalently 01:05:02.290 --> 01:05:04.750 going to the first character in s. 01:05:04.750 --> 01:05:06.200 They are one and the same. 01:05:06.200 --> 01:05:10.390 So when I call to upper, I'm calling it on this character, which of course, is 01:05:10.390 --> 01:05:12.970 the one and only h in the story. 01:05:12.970 --> 01:05:16.240 And when I print s and I print t, printf is 01:05:16.240 --> 01:05:18.610 following those same breadcrumbs, if you will, 01:05:18.610 --> 01:05:24.070 and ultimately displaying the same value as having changed. 01:05:24.070 --> 01:05:27.220 So we would seem to need to fundamentally rethink 01:05:27.220 --> 01:05:28.990 how we are copying strings. 01:05:28.990 --> 01:05:34.300 And let me ask, if this is the wrong way to copy one string into the other, what 01:05:34.300 --> 01:05:35.350 is the right way? 01:05:35.350 --> 01:05:39.340 Even if you don't have the functions in mind or the right vocabulary, 01:05:39.340 --> 01:05:43.750 just intuitively, if we want to copy a string in the way that a human would 01:05:43.750 --> 01:05:50.020 think of copying one into the other, like a photograph or a photocopy, 01:05:50.020 --> 01:05:52.610 how do we want to do this? 01:05:52.610 --> 01:05:54.460 Any thoughts, Brian? 01:05:54.460 --> 01:05:57.430 BRIAN: Yeah, Sophia suggested we would want to somehow loop over 01:05:57.430 --> 01:05:59.948 the elements in s and put them into t. 01:05:59.948 --> 01:06:01.240 DAVID MALAN: Yeah, I like that. 01:06:01.240 --> 01:06:04.120 So loop over the elements of s and put them into t. 01:06:04.120 --> 01:06:05.800 So it sounds like more work.
01:06:05.800 --> 01:06:07.660 But that's, again, what we're going to have 01:06:07.660 --> 01:06:09.582 to do if we want to think of these-- 01:06:09.582 --> 01:06:12.790 if we want to accept the fact that these things, s and t, are just addresses, 01:06:12.790 --> 01:06:15.550 we're going to now have to go and follow those breadcrumbs. 01:06:15.550 --> 01:06:18.790 So let's go ahead and consider a variant of this program. 01:06:18.790 --> 01:06:24.520 Let me go ahead, here, and change this such that I'm still getting a string s. 01:06:24.520 --> 01:06:28.390 But now, let me go ahead and propose exactly that, 01:06:28.390 --> 01:06:30.340 that we copy the individual characters. 01:06:30.340 --> 01:06:32.320 But I need to copy them somewhere. 01:06:32.320 --> 01:06:35.200 So I feel like another step in this process of copying a string 01:06:35.200 --> 01:06:37.750 has to be to give myself some additional memory. 01:06:37.750 --> 01:06:40.840 If I have H i exclamation point in nul character, 01:06:40.840 --> 01:06:43.150 I need to, now, somehow take control of this situation 01:06:43.150 --> 01:06:48.320 and tell the computer somehow, in code, give me four more bytes of memory 01:06:48.320 --> 01:06:53.390 so that I have location for t in which to copy those characters. 01:06:53.390 --> 01:06:55.360 So here's a new function today. 01:06:55.360 --> 01:06:59.470 If I want to create a string t, otherwise known today as a char star, 01:06:59.470 --> 01:07:02.680 there is a new function we can use called malloc, which 01:07:02.680 --> 01:07:04.720 represents memory allocation. 01:07:04.720 --> 01:07:08.200 This is a pretty fancy function that, fortunately, is pretty simple to use. 01:07:08.200 --> 01:07:10.390 It takes, as input, just a number. 01:07:10.390 --> 01:07:14.480 How many bytes of memory do you want to ask the computer for? 01:07:14.480 --> 01:07:16.000 So how do I do this? 
01:07:16.000 --> 01:07:20.110 Well, H i exclamation point backslash 0, I could literally just say four. 01:07:20.110 --> 01:07:21.850 But this doesn't feel very dynamic. 01:07:21.850 --> 01:07:26.410 I think I can programmatically implement this a little more elegantly. 01:07:26.410 --> 01:07:30.370 Let me go ahead and say, give me as many bytes 01:07:30.370 --> 01:07:35.200 as there are characters in s plus 1. 01:07:35.200 --> 01:07:37.090 Plus 1, why am I doing this? 01:07:37.090 --> 01:07:40.773 Well, H i exclamation point nul character, that's technically 01:07:40.773 --> 01:07:42.190 what's stored underneath the hood. 01:07:42.190 --> 01:07:45.250 But what do you and I think of the length of Hi! as being? 01:07:45.250 --> 01:07:48.070 Well, odds are, in the human world, it's H i exclamation point. 01:07:48.070 --> 01:07:50.710 And who cares about this low level detail, this nul terminator. 01:07:50.710 --> 01:07:53.800 You don't include that in the length of an English word or any word. 01:07:53.800 --> 01:07:56.290 You only think of the actual characters you can see. 01:07:56.290 --> 01:08:00.580 So the length of H, i, exclamation point is 3. 01:08:00.580 --> 01:08:08.110 But I do need to cleverly add one more byte, a fourth, for the nul character, 01:08:08.110 --> 01:08:10.580 because I'm going to have to copy that over as well. 01:08:10.580 --> 01:08:13.270 Otherwise, if I don't have an identical nul character, 01:08:13.270 --> 01:08:15.830 t is not going to have an obvious ending. 01:08:15.830 --> 01:08:17.872 So how do I copy, now, one string into the other? 01:08:17.872 --> 01:08:20.538 Well, let me go ahead and take out our old friend, the for loop, 01:08:20.538 --> 01:08:21.380 from week one. 01:08:21.380 --> 01:08:24.050 And say, for i equals 0-- 01:08:24.050 --> 01:08:26.810 how about, actually, n equals string length of s. 01:08:26.810 --> 01:08:28.279 We've done this trick before. 01:08:28.279 --> 01:08:33.080 i is less than n, i++.
01:08:33.080 --> 01:08:38.689 Let me go ahead and, quite simply, say t bracket i gets s bracket i. 01:08:38.689 --> 01:08:43.939 So this will literally copy, from s, each of the characters one at a time 01:08:43.939 --> 01:08:45.020 into t. 01:08:45.020 --> 01:08:46.640 But I need to be a little smarter now. 01:08:46.640 --> 01:08:49.130 Even though we almost always do i less than n, 01:08:49.130 --> 01:08:55.660 I'm actually going to very aggressively say i less than or equal to n. 01:08:55.660 --> 01:08:56.830 Why? 01:08:56.830 --> 01:09:00.250 Why am I going one step further than I feel we normally 01:09:00.250 --> 01:09:03.310 do when iterating over strings, and one step further than you 01:09:03.310 --> 01:09:07.149 probably did when iterating over a caesar cipher or a string 01:09:07.149 --> 01:09:09.130 in that context? 01:09:09.130 --> 01:09:10.939 Brian, any thoughts here? 01:09:10.939 --> 01:09:16.569 Why am I going from i less than or equal to n kind of for the first time here? 01:09:16.569 --> 01:09:19.779 BRIAN: Celina is suggesting that we need to include the nul character. 01:09:19.779 --> 01:09:22.843 DAVID MALAN: Yeah, so if I-- and now I understand how strings work. 01:09:22.843 --> 01:09:25.510 So it's not sufficient to just copy the H, I, exclamation point. 01:09:25.510 --> 01:09:29.020 I need to go one step further, one more than the length of the string. 01:09:29.020 --> 01:09:32.290 And the easiest way to do that would be less than or equal to n. 01:09:32.290 --> 01:09:34.450 Or I could just do a plus 1 there. 01:09:34.450 --> 01:09:35.950 Or I can do this any number of ways. 01:09:35.950 --> 01:09:37.399 Doesn't matter how you do it. 01:09:37.399 --> 01:09:40.899 But I think a less than or equal to is one reasonable way to do it. 01:09:40.899 --> 01:09:43.540 And now, let's go down to the bottom here and now actually 01:09:43.540 --> 01:09:44.590 do this capitalization.
01:09:44.590 --> 01:09:47.710 Let's now change the first character in t 01:09:47.710 --> 01:09:52.750 to be the result of calling to upper on the first character of t. 01:09:52.750 --> 01:09:56.770 And then, as before, let's go ahead and print out whatever s is. 01:09:56.770 --> 01:09:59.080 And like before, let's go ahead and print out 01:09:59.080 --> 01:10:05.110 whatever t is and hope now that only t has been capitalized. 01:10:05.110 --> 01:10:07.330 But I do need to make one change now. 01:10:07.330 --> 01:10:10.690 It turns out that this function, malloc, comes 01:10:10.690 --> 01:10:12.897 in a file called standard lib dot h. 01:10:12.897 --> 01:10:15.730 And again, this is the kind of thing that you can jot down in notes. 01:10:15.730 --> 01:10:17.563 You can always Google these kinds of things. 01:10:17.563 --> 01:10:20.740 Even I forget what header files these functions are sometimes declared in. 01:10:20.740 --> 01:10:24.310 But it happens to be a new one called standard lib for library 01:10:24.310 --> 01:10:26.110 that gives you access to malloc. 01:10:26.110 --> 01:10:29.800 So let me go ahead, now, and make compare. 01:10:29.800 --> 01:10:31.210 All right, so far so good. 01:10:31.210 --> 01:10:34.360 Dot slash compare-- sorry, this is not compare. 01:10:34.360 --> 01:10:35.680 The old program works fine. 01:10:35.680 --> 01:10:38.630 Make copy-- oh my god, seven mistakes. 01:10:38.630 --> 01:10:40.460 What'd I do wrong here? 01:10:40.460 --> 01:10:44.560 Oh, it looks like I forgot the type of i and n. 01:10:44.560 --> 01:10:47.440 So let me go into my for loop and add the int. 01:10:47.440 --> 01:10:49.870 That was my fault. Let me make copy again. 01:10:49.870 --> 01:10:51.910 OK, all seven errors, thankfully, went away. 01:10:51.910 --> 01:10:56.710 Make copy, let's go ahead and type in hi! in lower case and hit Enter. 01:10:56.710 --> 01:11:02.860 And voila, now I have capitalized only the copy of s, a.k.a. 01:11:02.860 --> 01:11:03.580 t. 
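The copying step just demonstrated can be sketched as a standalone function. `copy_by_loop` is a hypothetical name rather than the lecture's actual copy.c, and the early return on a failed allocation is an addition that the loop itself does not depend on.

```c
#include <stdlib.h>
#include <string.h>

// Hypothetical helper: returns a separate copy of s in newly allocated
// memory, or NULL if no memory is available. The caller owns the result.
char *copy_by_loop(const char *s)
{
    size_t n = strlen(s);

    // n visible characters plus 1 byte for the terminating '\0'
    char *t = malloc(n + 1);
    if (t == NULL)
    {
        return NULL;
    }

    // i <= n (not i < n) so that the '\0' at s[n] is copied as well
    for (size_t i = 0; i <= n; i++)
    {
        t[i] = s[i];
    }
    return t;
}
```

Because t now has its own bytes, changing t[0] afterward leaves s untouched.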
01:11:03.580 --> 01:11:06.010 And just to be clear, I've kind of regressed back 01:11:06.010 --> 01:11:09.140 to my square bracket notation, honestly, because it's perfectly acceptable. 01:11:09.140 --> 01:11:10.360 It's very readable. 01:11:10.360 --> 01:11:12.640 But notice, if I really want to show off, 01:11:12.640 --> 01:11:19.190 I could say something like, well, go to t's plus i location. 01:11:19.190 --> 01:11:23.078 And then do this, which again, I don't necessarily recommend for readability. 01:11:23.078 --> 01:11:24.620 But again, there is this equivalence. 01:11:24.620 --> 01:11:28.640 The square bracket notation is the same thing as pointer arithmetic. 01:11:28.640 --> 01:11:34.160 So if you want to go to the address at t plus whatever i is to offset yourself 01:11:34.160 --> 01:11:36.570 one or more bytes, you can totally do that. 01:11:36.570 --> 01:11:39.920 And if I want to be fancy, I can go down here and say, 01:11:39.920 --> 01:11:45.350 go to the first character in t and capitalize it. 01:11:45.350 --> 01:11:48.170 But again, I would argue that even though, yes, you're very clever 01:11:48.170 --> 01:11:50.420 and that you understand pointers and addresses at this point 01:11:50.420 --> 01:11:51.795 if you're writing code like this. 01:11:51.795 --> 01:11:53.990 Honestly, it's not necessarily as readable. 01:11:53.990 --> 01:11:57.800 So sticking with week two syntax of the square bracket notation, totally 01:11:57.800 --> 01:12:03.110 reasonable, totally correct, totally well-designed, and perhaps preferable, 01:12:03.110 --> 01:12:04.890 though I should be careful here. 01:12:04.890 --> 01:12:07.550 This line of code is a little bit risky for me 01:12:07.550 --> 01:12:10.310 because what if the user just hits Enter and they don't type hi 01:12:10.310 --> 01:12:11.540 or David or Brian. 01:12:11.540 --> 01:12:13.580 What if they type nothing except Enter? 01:12:13.580 --> 01:12:16.130 In that case, the length of the string might be 0. 
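The equivalence between square brackets and pointer arithmetic mentioned a moment ago can be sketched like this. Both function names are hypothetical, and the sketch assumes t already points at enough memory for n + 1 bytes.

```c
#include <ctype.h>
#include <stddef.h>

// Copies n characters plus the terminating '\0' from s into t.
// *(t + i) means "go to the address i bytes past t", which is exactly
// what t[i] does.
void copy_with_pointers(char *t, const char *s, size_t n)
{
    for (size_t i = 0; i <= n; i++)
    {
        *(t + i) = *(s + i);
    }
}

// Capitalizes the first character; *t is the same location as t[0].
void capitalize_with_pointer(char *t)
{
    *t = (char) toupper((unsigned char) *t);
}
```

The bracket form remains the more readable of the two; this version only makes the underlying address arithmetic visible.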
01:12:16.130 --> 01:12:19.220 And then I probably shouldn't be capitalizing the first character 01:12:19.220 --> 01:12:22.230 in a string that doesn't really even exist. 01:12:22.230 --> 01:12:25.250 So I should probably have some error checking, 01:12:25.250 --> 01:12:32.450 like if, for instance, the string length of t is at least greater than 0, 01:12:32.450 --> 01:12:34.960 then go ahead and safely do that. 01:12:34.960 --> 01:12:37.550 But again, this is just one example of some additional error 01:12:37.550 --> 01:12:39.200 checking I can add to the program. 01:12:39.200 --> 01:12:41.300 There's actually one more piece of error checking 01:12:41.300 --> 01:12:43.520 I should really do in a fully correct program, 01:12:43.520 --> 01:12:45.170 as you should do in problem sets. 01:12:45.170 --> 01:12:47.010 Sometimes things can go wrong. 01:12:47.010 --> 01:12:50.270 And if your program is so big, so fancy, and so memory-hungry 01:12:50.270 --> 01:12:52.187 that you're mallocing lots and lots of memory, 01:12:52.187 --> 01:12:54.062 which you won't do in a program this small, 01:12:54.062 --> 01:12:56.270 but over time you might need more and more memory, 01:12:56.270 --> 01:13:01.490 we should also make sure that t actually has a valid address. 01:13:01.490 --> 01:13:04.670 It turns out that malloc, most of the time, 01:13:04.670 --> 01:13:08.090 is going to return to you the address of a chunk of memory 01:13:08.090 --> 01:13:09.470 it has allocated for you. 01:13:09.470 --> 01:13:11.300 Just like get string, it will return to you 01:13:11.300 --> 01:13:14.900 the address of the first byte of the chunk of memory 01:13:14.900 --> 01:13:16.820 that it has found space for. 01:13:16.820 --> 01:13:18.740 However, sometimes things can go wrong. 01:13:18.740 --> 01:13:20.630 Sometimes your computer can be out of memory. 01:13:20.630 --> 01:13:24.320 You've probably seen your Mac or PC freeze or hang or reboot itself.
01:13:24.320 --> 01:13:26.910 That is very often the result of memory errors. 01:13:26.910 --> 01:13:29.000 So we should actually check something like this. 01:13:29.000 --> 01:13:32.570 If t equals equals this special value NULL, 01:13:32.570 --> 01:13:35.360 then I'm going to go ahead and just bail out and return one, 01:13:35.360 --> 01:13:37.280 quit, let's get out of the program. 01:13:37.280 --> 01:13:38.760 It's not going to work. 01:13:38.760 --> 01:13:41.610 This might only happen one out of a million times. 01:13:41.610 --> 01:13:44.220 But it's more correct to check for NULL. 01:13:44.220 --> 01:13:48.350 Now, unfortunately, the designers of C kind of used-- or programmers 01:13:48.350 --> 01:13:53.210 more generally, use this word, which is almost the same as N-U-L, 01:13:53.210 --> 01:13:54.980 otherwise known as backslash 0. 01:13:54.980 --> 01:13:57.290 Unfortunately, this is a different value. 01:13:57.290 --> 01:14:01.370 N-U-L-L represents a null pointer. 01:14:01.370 --> 01:14:02.870 It is a bogus address. 01:14:02.870 --> 01:14:04.580 It is the absence of an address. 01:14:04.580 --> 01:14:06.950 Technically, it's address 0. 01:14:06.950 --> 01:14:09.230 It is different from backslash 0. 01:14:09.230 --> 01:14:14.000 You use N-U-L-L in the context of pointers, as we are doing today. 01:14:14.000 --> 01:14:17.390 You use backslash 0, otherwise known verbally, 01:14:17.390 --> 01:14:21.210 as an N-U-L, or nul, in the context of characters. 01:14:21.210 --> 01:14:23.810 So backslash 0 is for characters. 01:14:23.810 --> 01:14:26.750 N-U-L-L in all caps is for pointers. 01:14:26.750 --> 01:14:29.750 And it's just a new symbol we're introducing today 01:14:29.750 --> 01:14:34.520 that comes with this standard lib dot h file. 01:14:34.520 --> 01:14:38.190 All right, so it turns out, honestly, I don't need to do some of this work.
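The two checks just described, one for a missing address and one for an empty string, might be combined as follows. `capitalize_first` is a hypothetical helper, and its return value is an illustrative choice rather than anything from the lecture.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

// Hypothetical helper: capitalizes the first character of t when that is
// safe. Returns 1 if a character was capitalized, 0 if nothing was done.
int capitalize_first(char *t)
{
    // NULL is for pointers: there is no address here at all,
    // for instance because malloc could not find any memory.
    if (t == NULL)
    {
        return 0;
    }

    // '\0' (NUL) is for characters: a string of length 0 holds only the
    // terminator, so there is no first letter to capitalize.
    if (strlen(t) > 0)
    {
        t[0] = (char) toupper((unsigned char) t[0]);
        return 1;
    }
    return 0;
}
```

In a full program, the NULL check would typically sit right after the call to malloc, with main returning 1 as described.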
01:14:38.190 --> 01:14:41.610 It turns out that if I want to copy one string to another, 01:14:41.610 --> 01:14:43.170 there is a function for that. 01:14:43.170 --> 01:14:45.920 And increasingly, you will not have to write as many lines of code 01:14:45.920 --> 01:14:49.520 as you previously did, because if you look up in the manual pages 01:14:49.520 --> 01:14:52.730 or you've heard about or find online that there's another function, like one 01:14:52.730 --> 01:14:56.790 called strcpy, you can actually, more simply, do something like this. 01:14:56.790 --> 01:15:00.410 So even though I really liked the idea, and it was correct to use a for loop 01:15:00.410 --> 01:15:04.950 to copy all of the characters from s into t, there's a function for that. 01:15:04.950 --> 01:15:06.200 It's called strcpy. 01:15:06.200 --> 01:15:09.830 It takes two arguments, the destination followed by the source. 01:15:09.830 --> 01:15:12.200 And it will just handle all of the looping 01:15:12.200 --> 01:15:15.890 for us, all of the copying for us, including the backslash 0, 01:15:15.890 --> 01:15:18.830 so that I can focus on what I want to do, which in this case, 01:15:18.830 --> 01:15:21.300 is actually capitalize things. 01:15:21.300 --> 01:15:26.497 So if we consider, now, this example, in the context of my computer's memory, 01:15:26.497 --> 01:15:28.580 we'll see that it's laid out a little differently. 01:15:28.580 --> 01:15:31.050 But there's one more bug I do want to fix first. 01:15:31.050 --> 01:15:33.230 And this is something we've not had to do yet. 01:15:33.230 --> 01:15:37.850 It turns out that any time you allocate memory with malloc, 01:15:37.850 --> 01:15:41.330 you ask the computer for memory, the onus is on you, the programmer, 01:15:41.330 --> 01:15:43.160 to eventually give it back. 
01:15:43.160 --> 01:15:46.070 And by that, I mean if you allocate four bytes, 01:15:46.070 --> 01:15:49.430 or who knows, four million bytes of memory for an even bigger program, 01:15:49.430 --> 01:15:52.160 you'd better give it back to the computer, more specifically, 01:15:52.160 --> 01:15:55.252 the operating system, be it Linux or Mac OS or Windows, 01:15:55.252 --> 01:15:57.710 so that your computer eventually doesn't run out of memory. 01:15:57.710 --> 01:16:00.418 If all you ever do is ask for more memory, ask for more memory, 01:16:00.418 --> 01:16:03.710 it stands to reason that eventually your computer will run out, because it only 01:16:03.710 --> 01:16:05.370 has a finite amount of memory. 01:16:05.370 --> 01:16:07.910 It's got a finite amount of hardware, recall. 01:16:07.910 --> 01:16:11.780 So when you're done with memory, it should be your best practice 01:16:11.780 --> 01:16:14.970 to free it afterward as well. 01:16:14.970 --> 01:16:18.950 And the opposite of malloc is just a function called free, which takes, 01:16:18.950 --> 01:16:22.040 as its input, whatever the output of malloc was. 01:16:22.040 --> 01:16:25.070 And recall that the output of malloc, the return value of malloc, 01:16:25.070 --> 01:16:30.210 is just the address of the first byte of memory that it has allocated for you. 01:16:30.210 --> 01:16:34.010 So if you ask it for four bytes, like I did a few lines ago with malloc, 01:16:34.010 --> 01:16:37.100 you're going to get back the address of the first of those bytes. 01:16:37.100 --> 01:16:41.150 And it's up to you to remember how many bytes you asked for. 01:16:41.150 --> 01:16:43.760 In the case of free, all you have to do is 01:16:43.760 --> 01:16:49.820 tell free via its input what the address was that malloc gave you. 01:16:49.820 --> 01:16:53.210 So if you stored that address as I did, in this variable called t, 01:16:53.210 --> 01:16:58.190 it suffices when you're done with that memory to just call free t.
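Putting strcpy and free together, the whole allocate, copy, use, release cycle might look like this. It is a sketch rather than the lecture's exact copy.c: `print_capitalized_copy` is a hypothetical name, and it takes its string as a parameter instead of calling get string.

```c
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

// Hypothetical helper: prints s alongside a capitalized copy of it.
// Returns 0 on success, 1 if memory could not be allocated.
int print_capitalized_copy(const char *s)
{
    char *t = malloc(strlen(s) + 1);
    if (t == NULL)
    {
        return 1;
    }

    // Destination first, then source; strcpy also copies the '\0'
    strcpy(t, s);

    if (strlen(t) > 0)
    {
        t[0] = (char) toupper((unsigned char) t[0]);
    }

    printf("s: %s\n", s);
    printf("t: %s\n", t);

    // Hand back exactly the address that malloc returned
    free(t);
    return 0;
}
```

After free(t), the pointer t must not be used again; the memory behind it may be reused for something else.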
01:16:58.190 --> 01:17:02.360 And the computer will go about freeing up that memory for you. 01:17:02.360 --> 01:17:04.880 And you might very well get it back later on. 01:17:04.880 --> 01:17:07.400 But at least your computer won't run out of memory 01:17:07.400 --> 01:17:13.490 as quickly, because it can now reuse that space for something else. 01:17:13.490 --> 01:17:15.410 All right, let me go ahead, then, and propose 01:17:15.410 --> 01:17:17.870 that we draw a picture of this-- 01:17:17.870 --> 01:17:20.942 now new program's memory, where we copy things. 01:17:20.942 --> 01:17:23.900 So recall, this is where we left off before when comparing two strings. 01:17:23.900 --> 01:17:29.010 If this was s and s was pointing to h, i, exclamation point in lowercase, 01:17:29.010 --> 01:17:32.510 this new version of my code in copy.c, notice, 01:17:32.510 --> 01:17:34.550 still gives me another pointer called t. 01:17:34.550 --> 01:17:36.530 So that part of the story hasn't changed. 01:17:36.530 --> 01:17:37.970 But I call malloc now. 01:17:37.970 --> 01:17:40.790 And malloc is going to return to me some new chunk of memory. 01:17:40.790 --> 01:17:42.440 I don't know in advance where it is. 01:17:42.440 --> 01:17:45.740 But malloc's return value is going to be the address 01:17:45.740 --> 01:17:47.920 of the first byte of that memory. 01:17:47.920 --> 01:17:51.050 So for instance, 0x456 or whatever it is. 01:17:51.050 --> 01:17:54.230 And the subsequent bytes are going to be increasing by one 01:17:54.230 --> 01:17:59.630 byte at a time, 0x457, 0x458, 0x459. 01:17:59.630 --> 01:18:03.800 So what is, ultimately, stored in t when I assign it the return value of malloc? 01:18:03.800 --> 01:18:05.570 It's whatever that address is. 01:18:05.570 --> 01:18:07.980 Again, I could technically write 0x456 up here. 01:18:07.980 --> 01:18:09.800 But again, we're kind of past that. 01:18:09.800 --> 01:18:10.970 That's very 30 minutes ago.
01:18:10.970 --> 01:18:14.300 Let's now focus on just the abstraction that is a pointer. 01:18:14.300 --> 01:18:17.690 A pointer is just an arrow pointing from the variable 01:18:17.690 --> 01:18:19.980 to the actual location in memory. 01:18:19.980 --> 01:18:26.720 So now, if I go about copying s into t using strcpy, or more manually, 01:18:26.720 --> 01:18:28.670 using my for loop, what happens? 01:18:28.670 --> 01:18:31.610 Well, I'm copying the h over from s into t. 01:18:31.610 --> 01:18:36.110 I'm copying the i over from s into t, the exclamation point from s into t. 01:18:36.110 --> 01:18:40.530 And then lastly, the terminating nul character from s into t. 01:18:40.530 --> 01:18:42.740 So the picture is now fundamentally different. 01:18:42.740 --> 01:18:45.020 t is not pointing at the same thing. 01:18:45.020 --> 01:18:50.570 It's pointing at its own chunk of memory that has now, one step at a time, 01:18:50.570 --> 01:18:56.210 been duplicating whatever was at the address s. 01:18:56.210 --> 01:18:59.600 And so this is what you and I as humans would consider, presumably, 01:18:59.600 --> 01:19:04.080 to be a proper copy of the program. 01:19:04.080 --> 01:19:09.660 Any questions, then, on what we've just done by introducing malloc and free? 01:19:09.660 --> 01:19:11.910 The first of which allocates memory and gives you 01:19:11.910 --> 01:19:15.750 the address of the first byte of memory that you can now use, 01:19:15.750 --> 01:19:19.260 the latter of which hands it back to your operating system and says, 01:19:19.260 --> 01:19:20.700 I'm done with this. 01:19:20.700 --> 01:19:24.360 It can now be reused for something else, some other variable, 01:19:24.360 --> 01:19:27.090 maybe, down the road, if our program were longer. 01:19:27.090 --> 01:19:31.530 Brian, any questions or confusion I can help with? 
01:19:31.530 --> 01:19:33.870 BRIAN: Someone asked, even if you're using strcpy 01:19:33.870 --> 01:19:37.470 to copy the string instead of copying the characters one at a time yourself, 01:19:37.470 --> 01:19:39.420 do you still need to free the memory? 01:19:39.420 --> 01:19:40.545 DAVID MALAN: Good question. 01:19:40.545 --> 01:19:43.320 Even if you're using strcpy, you do need to still use free. 01:19:43.320 --> 01:19:48.120 Yes, anytime you use malloc henceforth, you must use free. 01:19:48.120 --> 01:19:52.470 Anytime you use malloc, you must use free in order to free up that memory. 01:19:52.470 --> 01:19:56.370 strcpy is copying the contents of one chunk of memory to the other. 01:19:56.370 --> 01:19:59.220 It is not allocating or managing that memory for you. 01:19:59.220 --> 01:20:02.520 It is just implementing, essentially, that for loop. 01:20:02.520 --> 01:20:05.520 And it's, perhaps, time too, where I can take off another training wheel 01:20:05.520 --> 01:20:06.020 verbally. 01:20:06.020 --> 01:20:10.410 It turns out that get string, all this time, is kind of magical. 01:20:10.410 --> 01:20:13.470 One of the things that get string does from the CS50 library 01:20:13.470 --> 01:20:16.080 is it itself uses malloc. 01:20:16.080 --> 01:20:19.800 Consider, after all, when we, the staff, wrote get string years ago, 01:20:19.800 --> 01:20:22.830 we have no idea how long your names are going to be this year. 01:20:22.830 --> 01:20:24.690 We have no idea what sentences you're going 01:20:24.690 --> 01:20:28.350 to type, what paragraphs you're going to type, what text you're going to analyze 01:20:28.350 --> 01:20:30.240 for a program like readability. 01:20:30.240 --> 01:20:32.610 So we had to implement get string in such a way 01:20:32.610 --> 01:20:35.730 that you can type as few or as many characters at your keyboard 01:20:35.730 --> 01:20:36.420 as you want. 01:20:36.420 --> 01:20:40.150 And we will make sure there's enough memory for that string. 
01:20:40.150 --> 01:20:43.530 So get string, underneath the hood, if you look at the code we, the staff, 01:20:43.530 --> 01:20:46.530 wrote someday, you'll see that we use malloc. 01:20:46.530 --> 01:20:51.390 And we call malloc in order to get enough memory to fit that string. 01:20:51.390 --> 01:20:54.600 And then, what the CS50 library is also secretly doing, 01:20:54.600 --> 01:20:57.060 is it is also calling free for you. 01:20:57.060 --> 01:20:59.130 There's, essentially, a fancy way where you 01:20:59.130 --> 01:21:03.690 can write a program that, as soon as main is about to quit or return 01:21:03.690 --> 01:21:06.480 to your blinking prompt, some special code 01:21:06.480 --> 01:21:10.860 we wrote swoops in at that final moment, frees any of the memory 01:21:10.860 --> 01:21:14.130 that we, the library, allocated so that you 01:21:14.130 --> 01:21:17.190 don't run out of memory because of us. 01:21:17.190 --> 01:21:19.590 But you all, when using malloc, will have 01:21:19.590 --> 01:21:23.700 to call free, because the library is not going to do that for you. 01:21:23.700 --> 01:21:26.400 And indeed, the goal of today and next week and beyond 01:21:26.400 --> 01:21:30.833 is to stop using the CS50 library, ultimately, altogether. 01:21:30.833 --> 01:21:33.000 All right, well let's-- it would be unfair, I think, 01:21:33.000 --> 01:21:36.000 if we introduced all of these fancy new techniques but don't necessarily 01:21:36.000 --> 01:21:40.620 provide you with any sort of tools with which to determine to chase down bugs 01:21:40.620 --> 01:21:43.245 in your new fancy code or solve problems, now, 01:21:43.245 --> 01:21:44.370 that are related to memory. 01:21:44.370 --> 01:21:46.860 And thankfully, there are programs via which 01:21:46.860 --> 01:21:49.560 you can chase down memory-related bugs. 
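The idea of cleanup code that runs just as main is about to finish can be sketched with the standard atexit function. This is an assumption about the general mechanism, not a reproduction of the CS50 library's source; every name here (`library_copy`, `teardown`, `allocations`, `count`) is hypothetical.

```c
#include <stdlib.h>
#include <string.h>

// Hypothetical bookkeeping: every string this "library" has handed out
static char **allocations = NULL;
static size_t count = 0;

// Runs automatically when the program exits and frees what we allocated
static void teardown(void)
{
    for (size_t i = 0; i < count; i++)
    {
        free(allocations[i]);
    }
    free(allocations);
    allocations = NULL;
    count = 0;
}

// Returns a heap copy of text that the caller does not need to free
char *library_copy(const char *text)
{
    static int registered = 0;
    if (!registered)
    {
        // atexit asks C to call teardown when main returns or exit runs
        if (atexit(teardown) != 0)
        {
            return NULL;
        }
        registered = 1;
    }

    char *copy = malloc(strlen(text) + 1);
    if (copy == NULL)
    {
        return NULL;
    }

    // Grow the bookkeeping list by one slot to remember this allocation
    char **grown = realloc(allocations, (count + 1) * sizeof(char *));
    if (grown == NULL)
    {
        free(copy);
        return NULL;
    }
    allocations = grown;

    strcpy(copy, text);
    allocations[count++] = copy;
    return copy;
}
```

Memory you obtain yourself from malloc is not tracked by anything like this, which is why calling free on it remains your job.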
01:21:49.560 --> 01:21:52.080 This is in addition to printf, that function, 01:21:52.080 --> 01:21:56.550 and help50 and check50 and debug50 and debuggers more generally. 01:21:56.550 --> 01:21:59.940 This program-- and it's really the last of the new tools we'll introduce you 01:21:59.940 --> 01:22:01.920 to in C-- is called valgrind. 01:22:01.920 --> 01:22:04.830 And this is a program that exists in CS50 IDE. 01:22:04.830 --> 01:22:07.620 But it exists on Macs and PCs and Linux computers 01:22:07.620 --> 01:22:10.050 anywhere, where you can run it on your own code 01:22:10.050 --> 01:22:12.870 to detect if you're doing anything wrong with memory. 01:22:12.870 --> 01:22:14.370 What might you do wrong with memory? 01:22:14.370 --> 01:22:17.037 Well, previously, remember, I triggered that segmentation fault. 01:22:17.037 --> 01:22:19.320 I touched memory that I should not have. 01:22:19.320 --> 01:22:22.080 Valgrind is a tool that can help you figure out, 01:22:22.080 --> 01:22:25.000 where did you touch memory that you shouldn't have, 01:22:25.000 --> 01:22:27.960 so as to focus your own human attention on whatever lines of code 01:22:27.960 --> 01:22:28.830 might be buggy. 01:22:28.830 --> 01:22:32.610 Valgrind can also detect if you forget to call free. 01:22:32.610 --> 01:22:36.240 If you call malloc one or more times, but don't call free 01:22:36.240 --> 01:22:38.280 a corresponding number of times, valgrind 01:22:38.280 --> 01:22:40.890 is a program that can notice that and tell you that you have 01:22:40.890 --> 01:22:42.580 what's called a memory leak. 01:22:42.580 --> 01:22:44.760 And indeed, this is germane to our own Macs and PCs.
01:22:44.760 --> 01:22:47.100 Again, if you've been using your Mac or PC or sometimes 01:22:47.100 --> 01:22:50.070 even your phone for a long, long time, and maybe 01:22:50.070 --> 01:22:53.340 running lots of different programs at once, lots of browser tabs 01:22:53.340 --> 01:22:55.680 open, lots of different programs open at once, 01:22:55.680 --> 01:22:59.370 your Mac or PC might very well have begun to slow to a crawl. 01:22:59.370 --> 01:23:01.920 It might be annoying, if not impossible to use, 01:23:01.920 --> 01:23:03.960 because everything is so darn slow. 01:23:03.960 --> 01:23:07.920 That may very well be because one or more of the programs you're using 01:23:07.920 --> 01:23:12.480 has some bug in it whereby a programmer kept allocating memory 01:23:12.480 --> 01:23:15.210 and never got around to calling free. 01:23:15.210 --> 01:23:17.273 Maybe it's a bug, maybe it was deliberate, 01:23:17.273 --> 01:23:19.440 they didn't expect you to have so many windows open. 01:23:19.440 --> 01:23:21.360 But valgrind can detect errors like that. 01:23:21.360 --> 01:23:23.730 And honestly, some of you, if you're like me, 01:23:23.730 --> 01:23:29.370 you might very well have 10, 20, 50 different browser tabs open at once, 01:23:29.370 --> 01:23:32.910 thinking oh, I'm going to come back to that someday, even though we never do. 01:23:32.910 --> 01:23:34.950 Each of those tabs takes up memory. 01:23:34.950 --> 01:23:37.320 Literally, any time you open a browser tab, think of it, 01:23:37.320 --> 01:23:41.580 really, as Chrome or Edge or Firefox or whatever 01:23:41.580 --> 01:23:43.920 you're using, underneath the hood, they're 01:23:43.920 --> 01:23:46.320 probably calling a function on Mac OS or Windows 01:23:46.320 --> 01:23:50.670 like malloc to give you more memory to contain the contents of that web page 01:23:50.670 --> 01:23:51.480 temporarily. 
01:23:51.480 --> 01:23:54.310 And if you keep opening more and more browser tabs, 01:23:54.310 --> 01:23:56.190 it's like calling malloc, malloc, malloc. 01:23:56.190 --> 01:23:57.840 Eventually, you're going to run out. 01:23:57.840 --> 01:23:59.700 And computers can be smart these days. 01:23:59.700 --> 01:24:03.060 They can kind of temporarily remove things from memory to free up space. 01:24:03.060 --> 01:24:04.477 This is called virtual memory. 01:24:04.477 --> 01:24:06.310 But eventually, something is going to break. 01:24:06.310 --> 01:24:08.520 And it might very well be your user experience 01:24:08.520 --> 01:24:11.700 when things get so slow that you literally have to quit the program 01:24:11.700 --> 01:24:14.140 or maybe even reboot your computer. 01:24:14.140 --> 01:24:15.240 So how do we use valgrind? 01:24:15.240 --> 01:24:17.430 Well, let me go ahead and write a short program 01:24:17.430 --> 01:24:20.040 that doesn't do anything useful, but demonstrates 01:24:20.040 --> 01:24:22.080 multiple memory-related mistakes. 01:24:22.080 --> 01:24:24.060 I'll call this file memory.c. 01:24:24.060 --> 01:24:27.550 I'm going to go ahead and open up the file memory.c 01:24:27.550 --> 01:24:30.842 and include at the top standard io dot h. 01:24:30.842 --> 01:24:32.550 And then I'm going to also, preemptively, 01:24:32.550 --> 01:24:37.290 include standard lib dot h, which, recall, is where malloc lives. int main void. 01:24:37.290 --> 01:24:39.070 And I'm going to keep this one simple. 01:24:39.070 --> 01:24:42.370 I'm going to go ahead and just give myself a whole bunch of integers. 01:24:42.370 --> 01:24:43.810 So this is actually kind of cool. 01:24:43.810 --> 01:24:46.480 It turns out that-- 01:24:46.480 --> 01:24:47.880 well, let's go ahead. 01:24:47.880 --> 01:24:48.910 Yeah, I can do this. 01:24:48.910 --> 01:24:50.035 Let's go ahead and do this. 01:24:50.035 --> 01:24:52.650 Char star s gets malloc.
01:24:52.650 --> 01:24:57.630 And let me go ahead and give myself, how about three of these. 01:24:57.630 --> 01:25:01.050 Let me go ahead and allocate space for three chars. 01:25:01.050 --> 01:25:03.640 Or actually, let's give me four, just like before. 01:25:03.640 --> 01:25:08.340 Now, I'm going to go ahead and say s bracket 0 equals 72. 01:25:08.340 --> 01:25:12.220 s bracket 1-- actually, I'll just do this manually. 01:25:12.220 --> 01:25:14.160 Let's do h. 01:25:14.160 --> 01:25:16.350 Let's do i. 01:25:16.350 --> 01:25:19.960 Let's do our usual exclamation point. 01:25:19.960 --> 01:25:22.170 And then just for good measure, s bracket 3 gets 01:25:22.170 --> 01:25:24.120 quote unquote, backslash 0. 01:25:24.120 --> 01:25:29.340 This is the very manual way of actually-- 01:25:29.340 --> 01:25:32.430 this is the very manual way of actually building up a string. 01:25:32.430 --> 01:25:34.060 But let me introduce a mistake. 01:25:34.060 --> 01:25:37.320 Let me accidentally allocate only three bytes, 01:25:37.320 --> 01:25:40.440 even though I clearly need a fourth for that terminating nul character. 01:25:40.440 --> 01:25:42.510 And notice too, the absence of free. 01:25:42.510 --> 01:25:45.720 I'm going to, very sloppily, not bother calling free. 01:25:45.720 --> 01:25:49.590 Now, I'm going to go ahead and compile this program, make memory. 01:25:49.590 --> 01:25:53.430 OK, it compiles OK, so that's good, dot slash memory. 01:25:53.430 --> 01:25:55.413 OK, nothing happens, but that kind of makes 01:25:55.413 --> 01:25:57.330 sense because I didn't tell it to do anything. 01:25:57.330 --> 01:26:01.360 Just for kicks, let's print out that string just like we always do. 01:26:01.360 --> 01:26:04.500 Let me now recompile memory, still compiles. 01:26:04.500 --> 01:26:06.360 Let me run dot slash memory. 01:26:06.360 --> 01:26:07.570 OK, it seems to work. 01:26:07.570 --> 01:26:10.000 So at first glance, you might be really proud of yourself. 
01:26:10.000 --> 01:26:12.910 You've written another correct program, seems to pass check50. 01:26:12.910 --> 01:26:13.410 You submit. 01:26:13.410 --> 01:26:14.327 You go about your day. 01:26:14.327 --> 01:26:16.380 And you're very disappointed some days later 01:26:16.380 --> 01:26:19.920 when you realize, dammit, I did not get full credit on this because there's 01:26:19.920 --> 01:26:21.780 actually a latent bug. 01:26:21.780 --> 01:26:24.780 So sometimes, indeed, there are bugs in your code 01:26:24.780 --> 01:26:27.420 that you don't necessarily see visually, you don't necessarily 01:26:27.420 --> 01:26:30.990 experience when running it yourself, but eventually, there 01:26:30.990 --> 01:26:33.443 might be an error when running it enough times. 01:26:33.443 --> 01:26:36.360 Eventually, a computer might notice that you're doing something wrong. 01:26:36.360 --> 01:26:38.460 And thankfully, tools exist like valgrind, 01:26:38.460 --> 01:26:40.098 that can allow you to detect that. 01:26:40.098 --> 01:26:43.140 So let me go ahead and just increase the size of my terminal window here. 01:26:43.140 --> 01:26:48.090 And let me go ahead and run valgrind on dot slash memory. 01:26:48.090 --> 01:26:49.290 So it's just like debug50. 01:26:49.290 --> 01:26:53.040 Instead of running debug50 and then dot slash whatever the program is, 01:26:53.040 --> 01:26:55.813 you run valgrind dot slash memory. 01:26:55.813 --> 01:26:58.230 This one, unfortunately, is only a command line interface. 01:26:58.230 --> 01:27:00.480 There's no graphical user interface like debug50. 01:27:00.480 --> 01:27:04.530 And honestly, it's a hideous sequence of output. 01:27:04.530 --> 01:27:06.630 This should overwhelm you at first glance. 01:27:06.630 --> 01:27:08.190 There's crazy cryptic-ness here. 01:27:08.190 --> 01:27:09.690 It's not the best-designed program. 01:27:09.690 --> 01:27:12.520 It really was meant for the most comfortable people. 
01:27:12.520 --> 01:27:15.180 But there are some useful tidbits we can take away from it. 01:27:15.180 --> 01:27:17.490 As always, let me scroll all the way to the top 01:27:17.490 --> 01:27:19.260 to the very first line of output. 01:27:19.260 --> 01:27:21.600 And I'll draw your attention to a couple of things 01:27:21.600 --> 01:27:23.070 that will start to jump out to you. 01:27:23.070 --> 01:27:24.960 And help50 can help you with this. 01:27:24.960 --> 01:27:28.020 If you're confused by valgrind's output, rerun it. 01:27:28.020 --> 01:27:29.520 But put help50 at the beginning. 01:27:29.520 --> 01:27:32.120 And just like I will do now verbally, so can help50 01:27:32.120 --> 01:27:36.510 help you notice the important things in this crazy mess of output. 01:27:36.510 --> 01:27:37.770 This is worrisome. 01:27:37.770 --> 01:27:41.880 Valgrind is noting on this line here, invalid write of size 1. 01:27:41.880 --> 01:27:44.370 And that's on line 10 of memory.c. 01:27:44.370 --> 01:27:46.510 So we'll look at that in a moment. 01:27:46.510 --> 01:27:50.530 If I scroll down further, invalid read of size 1. 01:27:50.530 --> 01:27:55.810 And that also seems to be on here, it looks like, on line 11 of memory.c. 01:27:55.810 --> 01:27:59.070 And then if I keep scrolling, keep scrolling, keep scrolling, 01:27:59.070 --> 01:28:00.990 I'm not liking this. 01:28:00.990 --> 01:28:05.910 3 bytes in 1 blocks are definitely lost in loss record, whatever that is. 01:28:05.910 --> 01:28:10.170 But 3 bytes in 1 blocks are definitely lost. 01:28:10.170 --> 01:28:15.240 And then down here, leak summary, definitely lost, 3 bytes in 1 blocks. 01:28:15.240 --> 01:28:17.703 Incidentally, 1 blocks, obviously not correct grammar. 01:28:17.703 --> 01:28:19.620 This is what happens when your program doesn't 01:28:19.620 --> 01:28:24.210 have an if condition that checks if the number is 1 or positive or 0.
01:28:24.210 --> 01:28:27.300 You could fix this, grammatically, honestly, with a simple if condition. 01:28:27.300 --> 01:28:29.770 They did not when writing this program years ago. 01:28:29.770 --> 01:28:32.110 So there's two or three mistakes here. 01:28:32.110 --> 01:28:34.620 One is some kind of invalid read or write. 01:28:34.620 --> 01:28:35.953 And another is this leak. 01:28:35.953 --> 01:28:36.870 Well, what is a write? 01:28:36.870 --> 01:28:38.940 A write just refers to changing a value. 01:28:38.940 --> 01:28:43.150 A read just refers to reading or using or printing a value. 01:28:43.150 --> 01:28:44.730 So let's focus on line 10. 01:28:44.730 --> 01:28:48.060 If I scroll back down to my code and look on line 10, 01:28:48.060 --> 01:28:51.760 this was an invalid write, invalid write. 01:28:51.760 --> 01:28:52.950 Well, why is it invalid? 01:28:52.950 --> 01:28:57.180 Well, per today's definition, if you are allocating 3 bytes, 01:28:57.180 --> 01:29:01.710 you are welcome to touch the first byte, the second byte, and the third byte. 01:29:01.710 --> 01:29:04.500 But you have no business touching the fourth byte 01:29:04.500 --> 01:29:06.420 if you've only asked for three. 01:29:06.420 --> 01:29:11.070 This is like a small scale version of the very adventurous and inappropriate 01:29:11.070 --> 01:29:14.100 poking around I did when I looked at 10,000 bytes away. 01:29:14.100 --> 01:29:16.680 Even looking one byte away is a potential bug 01:29:16.680 --> 01:29:18.780 and can cause a program to crash. 01:29:18.780 --> 01:29:21.720 Meanwhile, line 11 is also problematic, which 01:29:21.720 --> 01:29:25.470 is an invalid read, because now, you're saying go print out this string. 01:29:25.470 --> 01:29:28.043 But that string contains a memory address 01:29:28.043 --> 01:29:30.210 that you should not have touched in the first place. 
01:29:30.210 --> 01:29:34.080 And the memory leak, the third problem, stems from the fact 01:29:34.080 --> 01:29:36.520 that I didn't free that memory. 01:29:36.520 --> 01:29:40.380 So again, it'll take some practice and experience, some mistakes of your own, 01:29:40.380 --> 01:29:42.480 to notice and understand these bugs. 01:29:42.480 --> 01:29:44.670 But let me fix the first two like this. 01:29:44.670 --> 01:29:46.530 Let me just give myself four bytes. 01:29:46.530 --> 01:29:48.990 And let me fix the second one or the third one, 01:29:48.990 --> 01:29:53.820 really, by freeing s at the very end, because again, any time you use malloc 01:29:53.820 --> 01:29:55.590 you must use free. 01:29:55.590 --> 01:29:59.310 Let me go ahead and recompile memory, seems to compile. 01:29:59.310 --> 01:30:02.130 Let me rerun it, still works the same, visually. 01:30:02.130 --> 01:30:05.670 But now, let's rerun valgrind on it and see if there are any errors now, 01:30:05.670 --> 01:30:08.710 so valgrind dot slash memory, Enter. 01:30:08.710 --> 01:30:10.710 The output's still going to look pretty cryptic. 01:30:10.710 --> 01:30:15.300 But notice all heap blocks were freed, whatever that means. 01:30:15.300 --> 01:30:16.217 No leaks are possible. 01:30:16.217 --> 01:30:18.133 It doesn't really get more explicit than that. 01:30:18.133 --> 01:30:19.090 That's a good thing. 01:30:19.090 --> 01:30:23.100 And if I scroll up, I see no mention of those invalid reads or writes. 01:30:23.100 --> 01:30:26.168 So starting with this week's problems and next week's in C, 01:30:26.168 --> 01:30:27.960 not only are you going to want to use tools 01:30:27.960 --> 01:30:31.590 like help50 and printf and debug50 and check50, 01:30:31.590 --> 01:30:35.710 but even if you think your code's right, the output looks right, 01:30:35.710 --> 01:30:37.050 you might have a latent bug. 01:30:37.050 --> 01:30:40.200 And even when your programs are small, they might not crash the computer. 
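The corrected memory.c described above (four bytes instead of three, and a matching free) can be sketched roughly as follows. The helper name build_hi is hypothetical, and the uppercase letters assume the "HI!" string implied by the value 72; the lecture's file does all of this directly in main.

```c
#include <stdlib.h>

// Hypothetical helper mirroring the corrected program: builds "HI!" by hand.
// Returns NULL if malloc fails; otherwise the caller must free the result.
char *build_hi(void)
{
    // Four bytes: 'H', 'I', '!' and the terminating '\0'
    char *s = malloc(4);
    if (s == NULL)
    {
        return NULL;
    }

    s[0] = 'H'; // same as 72 in ASCII
    s[1] = 'I';
    s[2] = '!';
    s[3] = '\0'; // valid now, because a fourth byte was requested

    return s;
}
```

In main, the equivalent of the lecture's program would call build_hi, print the result with printf("%s\n", s), and then call free(s) before returning, which is the step that addresses the leak valgrind reported.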
01:30:40.200 --> 01:30:43.500 They might not cause that segmentation fault. Eventually, they will. 01:30:43.500 --> 01:30:47.850 And you do want to use tools like this to chase down any such mistakes. 01:30:47.850 --> 01:30:50.460 Otherwise, bad things can happen. 01:30:50.460 --> 01:30:51.600 And what might happen? 01:30:51.600 --> 01:30:54.900 Well, let me go ahead and reveal an example here 01:30:54.900 --> 01:30:57.840 that presents some code that's a little dangerous. 01:30:57.840 --> 01:31:00.600 So here, for instance, is an example where 01:31:00.600 --> 01:31:05.202 I'm declaring at the top of the function, int star x and int star y. 01:31:05.202 --> 01:31:06.160 So what does that mean? 01:31:06.160 --> 01:31:08.700 Well, per today's parlance, this just means give me 01:31:08.700 --> 01:31:11.550 a pointer to an integer called x. 01:31:11.550 --> 01:31:13.800 Give me a pointer to an integer called y. 01:31:13.800 --> 01:31:16.650 Put another way, give me a variable called x that I 01:31:16.650 --> 01:31:18.900 can store the address of an int in. 01:31:18.900 --> 01:31:23.640 Give me a variable called y that I can store the address of another int in. 01:31:23.640 --> 01:31:27.880 But notice what I am not doing on these first two lines. 01:31:27.880 --> 01:31:31.950 I'm not actually assigning them a value until line 3. 01:31:31.950 --> 01:31:36.000 On line 3, even though this is weird-- this is not how we've allocated space 01:31:36.000 --> 01:31:37.530 for integers before-- 01:31:37.530 --> 01:31:41.130 there's no reason that you can't use malloc 01:31:41.130 --> 01:31:45.550 and say, give me enough space for the size of an integer. 01:31:45.550 --> 01:31:46.370 sizeof is new. 01:31:46.370 --> 01:31:50.150 It's just an operator in C that tells you the size of a data type, 01:31:50.150 --> 01:31:51.500 like a size of an int. 01:31:51.500 --> 01:31:53.480 So maybe you forgot that an int is 4. 
01:31:53.480 --> 01:31:56.450 And indeed, an int is usually 4, but not always 4 in all systems. 01:31:56.450 --> 01:32:00.020 So size of int just makes sure that it will always give you the right answer, 01:32:00.020 --> 01:32:02.630 whether you're using a modern computer or an old one. 01:32:02.630 --> 01:32:07.190 So this just means, really, allocate 4 bytes to me on a modern system. 01:32:07.190 --> 01:32:11.370 And it stores the address of the first byte in x. 01:32:11.370 --> 01:32:15.360 Would someone mind translating to layman's terms, what 01:32:15.360 --> 01:32:18.480 is star x equal 42 doing? 01:32:18.480 --> 01:32:20.880 Star, again, is the dereference operator. 01:32:20.880 --> 01:32:23.430 It means go to the address. 01:32:23.430 --> 01:32:24.375 And do what? 01:32:24.375 --> 01:32:27.510 How would you describe, with a verbal comment, 01:32:27.510 --> 01:32:30.450 what star x equals 42 is doing? 01:32:30.450 --> 01:32:33.630 Brian, would you mind verbalizing any thoughts? 01:32:33.630 --> 01:32:37.555 BRIAN: Yeah, so Sophia suggested that at that address, we are going to place 42. 01:32:37.555 --> 01:32:38.430 DAVID MALAN: Perfect. 01:32:38.430 --> 01:32:40.080 At that address put 42. 01:32:40.080 --> 01:32:44.640 Equivalently, go to that address in x and put the number 42 there. 01:32:44.640 --> 01:32:48.870 It's like going to Brian's mailbox and putting the 42 in his mailbox, 01:32:48.870 --> 01:32:52.035 instead of what we previously had there, which was the number 50. 01:32:52.035 --> 01:32:57.180 How about this next fifth line, star y equals 13? 01:32:57.180 --> 01:32:59.670 Brian, could you verbalize someone else? 01:32:59.670 --> 01:33:03.500 What does star y equals 13 do for us? 01:33:03.500 --> 01:33:07.850 And it's not an accident that 13 tends to be unlucky. 01:33:07.850 --> 01:33:10.530 BRIAN: Peter says, put 13 at the address y. 01:33:10.530 --> 01:33:12.710 DAVID MALAN: Good, put 13 at the address in y. 
01:33:12.710 --> 01:33:16.860 Or put another way, go to the address in y and put 13 there. 01:33:16.860 --> 01:33:19.070 But there's a logical problem here. 01:33:19.070 --> 01:33:20.870 What is in y? 01:33:20.870 --> 01:33:24.860 If I rewind, I never actually assign y a value. 01:33:24.860 --> 01:33:27.050 I don't initially, and I don't eventually. 01:33:27.050 --> 01:33:30.500 At least with x, even though I didn't give it a value in declaring it up here 01:33:30.500 --> 01:33:34.850 as a variable, I eventually got around to storing in it the actual address. 01:33:34.850 --> 01:33:38.060 Now, just to be really nitpicky, I should probably even, in this program, 01:33:38.060 --> 01:33:40.495 check for NULL just in case anything went wrong. 01:33:40.495 --> 01:33:41.870 But that's a whole other problem. 01:33:41.870 --> 01:33:46.470 It is a more damning problem that I haven't even given y a value. 01:33:46.470 --> 01:33:49.610 And here's where we can reveal one other detail about a computer. 01:33:49.610 --> 01:33:53.750 Thus far, we've been taking for granted that you and I almost always initialize 01:33:53.750 --> 01:33:54.360 our memory. 01:33:54.360 --> 01:33:56.900 If we want to give ourselves a char, an int, a string, 01:33:56.900 --> 01:33:59.900 we literally type it out into the program 01:33:59.900 --> 01:34:02.150 itself so that it's there when we want it. 01:34:02.150 --> 01:34:04.070 But if we consider this picture here, which 01:34:04.070 --> 01:34:07.370 is now just a physical incarnation of some of the contents of your computer's 01:34:07.370 --> 01:34:11.750 memory, playfully labeled with a lot of Oscar the Grouches, 01:34:11.750 --> 01:34:16.250 this is because you should never trust the contents of your computer's memory 01:34:16.250 --> 01:34:18.500 if you yourself have not put something there. 01:34:18.500 --> 01:34:21.560 There's a term of art in programming called garbage values.
01:34:21.560 --> 01:34:26.180 If you yourself have not put a value somewhere in memory, 01:34:26.180 --> 01:34:30.210 you should assume, to be safe, that it is a quote unquote, "garbage value." 01:34:30.210 --> 01:34:31.440 It's not a weird value. 01:34:31.440 --> 01:34:34.580 It's just a 1, a 2, an A, a B, a C, you just 01:34:34.580 --> 01:34:38.510 don't know what it is, because if your program is running over time 01:34:38.510 --> 01:34:40.890 and you're calling functions and functions are returning. 01:34:40.890 --> 01:34:43.348 You're calling other functions and functions are returning. 01:34:43.348 --> 01:34:46.970 These values in your computer's memory are constantly changing, 01:34:46.970 --> 01:34:48.740 and your memory gets reused. 01:34:48.740 --> 01:34:53.180 When you free memory, that doesn't erase it or set it all back to 0's or set it 01:34:53.180 --> 01:34:53.990 all back to 1's. 01:34:53.990 --> 01:34:56.600 It just leaves it alone so that you can reuse 01:34:56.600 --> 01:34:59.810 it, which means over time, your computer contains remnants 01:34:59.810 --> 01:35:03.960 of all of the variables you've ever used in your program over here, over here, 01:35:03.960 --> 01:35:04.730 over there. 01:35:04.730 --> 01:35:10.850 And so in a program like this, where you have not explicitly initialized y 01:35:10.850 --> 01:35:14.000 to anything, you should assume that Oscar the Grouch, so to speak, 01:35:14.000 --> 01:35:15.020 is at that location. 01:35:15.020 --> 01:35:20.570 It is a garbage value that looks like an address but is not a valid address. 01:35:20.570 --> 01:35:25.040 And so when you say star y equals 13, that means go to that address. 01:35:25.040 --> 01:35:28.910 But really, go to that bogus address and put something there. 01:35:28.910 --> 01:35:31.850 And odds are, your program is going to crash. 
01:35:31.850 --> 01:35:33.650 You are going to get a segmentation fault, 01:35:33.650 --> 01:35:37.562 because by going to some arbitrary garbage value address, 01:35:37.562 --> 01:35:40.520 it would be like picking up a random piece of paper with a number on it 01:35:40.520 --> 01:35:42.030 and then going to that mailbox. 01:35:42.030 --> 01:35:42.530 Why? 01:35:42.530 --> 01:35:44.300 It doesn't belong to you. 01:35:44.300 --> 01:35:47.930 If you try to dereference an uninitialized variable, 01:35:47.930 --> 01:35:49.850 your program may very well crash. 01:35:49.850 --> 01:35:51.890 And this is, perhaps, no better-presented 01:35:51.890 --> 01:35:55.970 than by one of our friends, Nick Parlante, a professor at Stanford 01:35:55.970 --> 01:36:02.510 University who has breathed life into a character in claymation known as Binky. 01:36:02.510 --> 01:36:06.140 We have just a 2 minute clip from this that paints the picture of bad things 01:36:06.140 --> 01:36:09.020 indeed happening when you touch memory that you shouldn't. 01:36:09.020 --> 01:36:13.340 So hopefully, a helpful reminder as to what to do and not to do with pointers. 01:36:13.340 --> 01:36:14.790 Here we go. 01:36:14.790 --> 01:36:16.610 [VIDEO PLAYBACK] 01:36:16.610 --> 01:36:17.540 - Hey, Binky. 01:36:17.540 --> 01:36:20.890 Wake up, it's time for pointer fun. 01:36:20.890 --> 01:36:22.060 - What's that? 01:36:22.060 --> 01:36:23.620 Learn about pointers? 01:36:23.620 --> 01:36:25.390 Oh, goody! 01:36:25.390 --> 01:36:28.430 - Well, to get started, I guess we're going to need a couple pointers. 01:36:28.430 --> 01:36:32.940 - OK, this code allocates two pointers which can point to integers. 01:36:32.940 --> 01:36:35.042 - OK, well I see the two pointers. 01:36:35.042 --> 01:36:37.000 But they don't seem to be pointing to anything. 01:36:37.000 --> 01:36:37.780 - That's right. 01:36:37.780 --> 01:36:39.970 Initially, pointers don't point to anything.
01:36:39.970 --> 01:36:42.190 The things they point to are called pointees. 01:36:42.190 --> 01:36:44.110 And setting them up's a separate step. 01:36:44.110 --> 01:36:45.100 - Oh, right, right. 01:36:45.100 --> 01:36:45.790 I knew that. 01:36:45.790 --> 01:36:47.750 The pointees are separate. 01:36:47.750 --> 01:36:50.050 So how do you allocate a pointee? 01:36:50.050 --> 01:36:53.800 - OK, well, this code allocates a new integer pointee. 01:36:53.800 --> 01:36:56.880 And this part sets x to point to it. 01:36:56.880 --> 01:36:58.180 - Hey, that looks better. 01:36:58.180 --> 01:36:59.700 So make it do something. 01:36:59.700 --> 01:37:05.460 - OK, I'll dereference the pointer x to store the number 42 into its pointee. 01:37:05.460 --> 01:37:08.970 For this trick, I'll need my magic wand of dereferencing. 01:37:08.970 --> 01:37:12.660 - Your magic wand of dereferencing? 01:37:12.660 --> 01:37:14.170 That's great. 01:37:14.170 --> 01:37:15.910 - This is what the code looks like. 01:37:15.910 --> 01:37:17.800 I'll just set up the number and-- 01:37:17.800 --> 01:37:18.900 [POP] 01:37:18.900 --> 01:37:21.000 - Hey, look, there it goes. 01:37:21.000 --> 01:37:25.830 So doing a dereference on x follows the arrow to access its pointee. 01:37:25.830 --> 01:37:28.020 In this case, to store 42 in there. 01:37:28.020 --> 01:37:32.450 Hey, try using it to store the number 13 through the other pointer, y. 01:37:32.450 --> 01:37:33.570 - OK. 01:37:33.570 --> 01:37:38.100 I'll just go over here to y and get the number 13 set up 01:37:38.100 --> 01:37:41.970 and then take the wand of dereferencing and just-- 01:37:41.970 --> 01:37:43.580 [HORN] whoa! 01:37:43.580 --> 01:37:45.930 - Oh, hey, that didn't work. 01:37:45.930 --> 01:37:51.370 Say, Binky, I don't think dereferencing y is a good idea, because setting up 01:37:51.370 --> 01:37:52.840 the pointee is a separate step. 01:37:52.840 --> 01:37:54.815 And I don't think we ever did it.
01:37:54.815 --> 01:37:56.430 - Hmm, good point. 01:37:56.430 --> 01:37:58.800 - Yeah, we allocated the pointer y. 01:37:58.800 --> 01:38:01.570 But we never set it to point to a pointee. 01:38:01.570 --> 01:38:03.480 - Hmm, very observant. 01:38:03.480 --> 01:38:05.310 - Hey, you're looking good there, Binky. 01:38:05.310 --> 01:38:08.250 Can you fix it so that y points to the same pointee as x? 01:38:08.250 --> 01:38:11.620 - Sure, I'll use my magic wand of pointer assignment. 01:38:11.620 --> 01:38:13.800 - Is that going to be a problem like before? 01:38:13.800 --> 01:38:15.630 - No, this doesn't touch the pointees. 01:38:15.630 --> 01:38:19.170 It just changes one pointer to point to the same thing as another. 01:38:19.170 --> 01:38:20.310 - Oh, I see. 01:38:20.310 --> 01:38:23.040 Now, y points to the same place as x. 01:38:23.040 --> 01:38:24.840 So wait, now y is fixed. 01:38:24.840 --> 01:38:25.950 It has a pointee. 01:38:25.950 --> 01:38:29.760 So you can try the wand of dereferencing again to send the 13 over. 01:38:29.760 --> 01:38:31.093 - Oh, OK. 01:38:31.093 --> 01:38:31.635 Here it goes. 01:38:31.635 --> 01:38:32.900 [POP] 01:38:32.900 --> 01:38:34.160 - Hey, look at that. 01:38:34.160 --> 01:38:35.870 Now, dereferencing works on y. 01:38:35.870 --> 01:38:39.980 And because the pointers are sharing that one pointee, they both see the 13. 01:38:39.980 --> 01:38:41.720 - Yeah, sharing, whatever. 01:38:41.720 --> 01:38:43.610 So are we going to switch places now? 01:38:43.610 --> 01:38:45.270 - Oh look, we're out of time. 01:38:45.270 --> 01:38:45.770 - But-- 01:38:45.770 --> 01:38:46.040 [END PLAYBACK] 01:38:46.040 --> 01:38:47.570 DAVID MALAN: All right, so we are not quite out of time. 01:38:47.570 --> 01:38:50.028 But let's go ahead and take our second 5 minute break here. 01:38:50.028 --> 01:38:52.910 And when we return, we'll take a closer look at Oscar and more. 01:38:52.910 --> 01:38:54.260 Back in 5. 
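The sequence in the clip, including the fix of pointing y at x's pointee before dereferencing it, might look roughly like this in C. The function name pointer_fun and the -1 return value on allocation failure are assumptions made for illustration; the clip itself only shows the individual statements.

```c
#include <stdlib.h>

// Hypothetical helper walking through the clip's steps.
// Returns the value both pointers see at the end, or -1 if malloc fails.
int pointer_fun(void)
{
    // Two pointers, initially pointing at nothing in particular
    int *x;
    int *y;

    // Allocate a pointee and set x to point to it
    x = malloc(sizeof(int));
    if (x == NULL)
    {
        return -1;
    }

    // Dereference x to store 42 in its pointee
    *x = 42;

    // Writing *y = 13; at this point would dereference an uninitialized
    // pointer, which is the mistake from the clip.

    // Pointer assignment: y now shares x's pointee
    y = x;

    // Dereferencing y is now fine; x sees the 13 as well
    *y = 13;

    int result = *x;

    // One malloc, one free, even though two pointers refer to the block
    free(x);
    return result;
}
```

Because x and y share a single pointee, only one call to free is needed; freeing through both pointers would release the same block twice.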
01:38:54.260 --> 01:38:57.380 All right, so I claim that there's all these garbage 01:38:57.380 --> 01:38:58.950 values in your computer's memory. 01:38:58.950 --> 01:39:00.860 But how can you see them? 01:39:00.860 --> 01:39:04.400 What Binky did was, of course, try to dereference a garbage value 01:39:04.400 --> 01:39:05.817 when bad things happen. 01:39:05.817 --> 01:39:07.900 But we can actually see this with code of our own. 01:39:07.900 --> 01:39:10.970 So let me go ahead, quickly, and whip up a little program here, 01:39:10.970 --> 01:39:15.290 just like something we did in week one or week two, 01:39:15.290 --> 01:39:17.090 but without doing it very well. 01:39:17.090 --> 01:39:21.410 Let me go ahead and include standard io dot h as usual, int main void. 01:39:21.410 --> 01:39:24.290 And then let me go ahead and give myself an array of scores. 01:39:24.290 --> 01:39:26.000 How about an array of three scores? 01:39:26.000 --> 01:39:28.715 And we've done this before where we collected scores from a user. 01:39:28.715 --> 01:39:30.590 But this time, I'm going to deliberately make 01:39:30.590 --> 01:39:33.170 the mistake of not actually initializing those scores 01:39:33.170 --> 01:39:35.450 or even asking the human for those scores. 01:39:35.450 --> 01:39:41.060 I'm just going to blindly go about iterating from i equals 0 on up to 3. 01:39:41.060 --> 01:39:46.070 And on each iteration, I'm just going to presumptuously print whatever is 01:39:46.070 --> 01:39:49.220 at that location in scores bracket i. 01:39:49.220 --> 01:39:52.430 So logically, my code is correct in what it's trying to do, 01:39:52.430 --> 01:39:54.230 print out the values in scores. 01:39:54.230 --> 01:39:57.170 But notice that I have deliberately not initialized any 01:39:57.170 --> 01:40:00.147 of the 1, 2, 3 scores in that array. 01:40:00.147 --> 01:40:01.730 So who knows what's going to be there? 
01:40:01.730 --> 01:40:04.650 Indeed, it should be garbage values of some sort 01:40:04.650 --> 01:40:06.650 that we couldn't necessarily predict in advance. 01:40:06.650 --> 01:40:10.050 So let me go ahead and make garbage, since this program 01:40:10.050 --> 01:40:11.300 is in a file called garbage.c. 01:40:11.300 --> 01:40:15.140 Compiles OK, but when I now run garbage, we 01:40:15.140 --> 01:40:21.230 should see three scores, which are cryptically negative, 833060864. 01:40:21.230 --> 01:40:23.780 Another one is 32765. 01:40:23.780 --> 01:40:25.760 And the third just happens to be 0. 01:40:25.760 --> 01:40:28.490 So there are those garbage values, because again, the computer 01:40:28.490 --> 01:40:31.800 is not going to initialize any of those values for you. 01:40:31.800 --> 01:40:33.570 Now, there are exceptions. 01:40:33.570 --> 01:40:36.320 We have, on occasion, used a global variable, 01:40:36.320 --> 01:40:40.490 a constant that is outside the context of main and all of my other functions. 01:40:40.490 --> 01:40:42.860 Global variables, if you do not set them, 01:40:42.860 --> 01:40:47.210 are conventionally initialized to 0 or NULL for you. 01:40:47.210 --> 01:40:50.000 But you should generally not rely on that kind of behavior. 01:40:50.000 --> 01:40:53.120 Your instinct should be to always initialize values 01:40:53.120 --> 01:40:56.630 before thinking of touching or reading them 01:40:56.630 --> 01:40:59.030 as via printf or some other mechanism. 01:40:59.030 --> 01:41:02.720 All right, well, let's see how this understanding, now, of memory, 01:41:02.720 --> 01:41:06.350 can lead us to solve problems, but also encounter new types of problems, 01:41:06.350 --> 01:41:08.960 but problems that we can now hopefully understand. 01:41:08.960 --> 01:41:11.250 I'm going to go ahead and create a new program here. 01:41:11.250 --> 01:41:14.390 And recall from last week that it was very common 01:41:14.390 --> 01:41:15.890 for us to want to swap values.
01:41:15.890 --> 01:41:19.010 When Brian was doing our sorts for us, whether it was selection or bubble 01:41:19.010 --> 01:41:21.710 sort, there was a lot of swapping going on. 01:41:21.710 --> 01:41:24.440 And yet, we didn't really write any code for those algorithms. 01:41:24.440 --> 01:41:25.232 And that's fine. 01:41:25.232 --> 01:41:27.440 But let's consider that very simple primitive of just 01:41:27.440 --> 01:41:30.440 swapping two values, for instance, swapping two integers. 01:41:30.440 --> 01:41:34.160 Let me go ahead and give myself the start of a program and swap.c here. 01:41:34.160 --> 01:41:38.630 I'm going to include standard io dot h, int main void. 01:41:38.630 --> 01:41:41.370 And inside of main, I'm going to give myself two integers. 01:41:41.370 --> 01:41:44.960 Let's just give myself an int called x and assign it 1, an int called y 01:41:44.960 --> 01:41:46.140 and assign it 2. 01:41:46.140 --> 01:41:48.890 And then let me go ahead and just print out what those values are. 01:41:48.890 --> 01:41:55.520 I'll just say, literally, x is percent i comma y is percent i backslash n. 01:41:55.520 --> 01:41:59.490 And then I'm going to go ahead and print out x comma y, respectively. 01:41:59.490 --> 01:42:02.930 And then I'm eventually going to write a function called 01:42:02.930 --> 01:42:04.613 swap that swaps x and y. 01:42:04.613 --> 01:42:06.530 But let's assume, for the moment, that exists. 01:42:06.530 --> 01:42:08.870 It doesn't, because what I then want to do right 01:42:08.870 --> 01:42:13.340 after that is just reprint the same thing, x is now percent i, 01:42:13.340 --> 01:42:17.690 y is percent i, my presumption being that the values of x and y 01:42:17.690 --> 01:42:18.870 will be swapped. 01:42:18.870 --> 01:42:20.480 So how might I swap these two values? 01:42:20.480 --> 01:42:23.120 Well, let me go ahead and implement my own function. 
01:42:23.120 --> 01:42:25.110 I don't think it needs to return anything, 01:42:25.110 --> 01:42:27.110 so I'm going to say void is the return type. 01:42:27.110 --> 01:42:28.340 I'll call it swap. 01:42:28.340 --> 01:42:30.830 It's going to take two arguments as input. 01:42:30.830 --> 01:42:33.320 We'll call it a and b, both integers. 01:42:33.320 --> 01:42:34.820 But I could call it anything I want. 01:42:34.820 --> 01:42:36.800 But a and b seems reasonable. 01:42:36.800 --> 01:42:39.350 And now, I want to go ahead and swap two values. 01:42:39.350 --> 01:42:42.140 Now, Brian was kind of doing this with his two hands last week. 01:42:42.140 --> 01:42:45.830 And that's fine, but we should probably consider this a little more closely. 01:42:45.830 --> 01:42:48.050 In fact, Brian, instead of numbers, let's 01:42:48.050 --> 01:42:49.920 do something a little more real world. 01:42:49.920 --> 01:42:53.080 I think you have a couple of beverages in front of you. 01:42:53.080 --> 01:42:53.580 BRIAN: Yeah. 01:42:53.580 --> 01:42:56.220 So right here, I have a red glass and a blue glass, 01:42:56.220 --> 01:42:58.970 which I guess we can use to represent two variables, for instance. 01:42:58.970 --> 01:42:59.180 DAVID MALAN: Yeah. 01:42:59.180 --> 01:43:00.198 Now, let me suppose-- 01:43:00.198 --> 01:43:01.490 I wish I'd told you in advance. 01:43:01.490 --> 01:43:03.920 I'd actually prefer that the red liquid be 01:43:03.920 --> 01:43:07.050 in the blue glass and the blue liquid be in the red glass. 01:43:07.050 --> 01:43:08.780 So do you mind swapping those two values, 01:43:08.780 --> 01:43:11.310 just like you swapped numbers last week? 01:43:11.310 --> 01:43:12.060 BRIAN: Yeah, sure. 01:43:12.060 --> 01:43:14.810 So I can just take the two glasses, and I can switch their places. 01:43:14.810 --> 01:43:17.717 DAVID MALAN: OK, wait, OK, that's not exactly-- 01:43:17.717 --> 01:43:18.800 you took me too literally. 
01:43:18.800 --> 01:43:22.760 I think here, if we think of the glasses, now, as specific locations 01:43:22.760 --> 01:43:24.980 in memory, you can't just physically move 01:43:24.980 --> 01:43:27.540 the chips of memory inside of your computer to swap things. 01:43:27.540 --> 01:43:30.410 So I think I literally need you to move the blue liquid 01:43:30.410 --> 01:43:33.350 into the red glass and the red liquid into the blue glass 01:43:33.350 --> 01:43:36.100 so that it's more like a computer's memory. 01:43:36.100 --> 01:43:37.657 BRIAN: OK, I can try to do that. 01:43:37.657 --> 01:43:40.240 I'm a little nervous, though, because I feel like I can't just 01:43:40.240 --> 01:43:43.270 pour the blue liquid into the red glass, because the red liquid's already 01:43:43.270 --> 01:43:43.640 in there. 01:43:43.640 --> 01:43:45.730 DAVID MALAN: Yeah, so this probably doesn't end well, 01:43:45.730 --> 01:43:48.220 if he's got to do some kind of switcheroo between the two glasses. 01:43:48.220 --> 01:43:49.240 So any thoughts here? 01:43:49.240 --> 01:43:54.100 Like what is the real world solution to this weird but real problem, where 01:43:54.100 --> 01:43:57.490 we want to swap the contents of these two locations, 01:43:57.490 --> 01:44:01.180 just like Brian was swapping the contents of two memory locations 01:44:01.180 --> 01:44:02.290 last week? 01:44:02.290 --> 01:44:04.900 Brian, if you have your eye on the chat in parallel, 01:44:04.900 --> 01:44:08.480 might anyone have ideas on how we could swap these two liquids? 01:44:08.480 --> 01:44:11.620 BRIAN: Yeah, a couple of people are saying that I need a third glass. 01:44:11.620 --> 01:44:13.370 DAVID MALAN: All right, well Brian, do you 01:44:13.370 --> 01:44:16.370 happen to have a third glass with you back there behind back stage? 01:44:16.370 --> 01:44:18.040 BRIAN: In fact, I think I do. 01:44:18.040 --> 01:44:21.190 So I have a third glass here that just so happens to be empty. 
01:44:21.190 --> 01:44:22.100 DAVID MALAN: OK. 01:44:22.100 --> 01:44:25.610 And how would you, now, go about swapping these two things? 01:44:25.610 --> 01:44:28.870 BRIAN: All right, so I want to put the blue liquid inside the red glass. 01:44:28.870 --> 01:44:30.578 So the first thing I need to do, I think, 01:44:30.578 --> 01:44:34.040 is just to empty out the red glass to make space for the blue liquid. 01:44:34.040 --> 01:44:36.310 So I'm going to take the red liquid, and I'm just 01:44:36.310 --> 01:44:38.470 going to pour it into this extra glass. 01:44:38.470 --> 01:44:39.520 DAVID MALAN: Temporarily though, right? 01:44:39.520 --> 01:44:39.870 BRIAN: Temporarily, yeah. 01:44:39.870 --> 01:44:40.570 DAVID MALAN: OK. 01:44:40.570 --> 01:44:42.620 BRIAN: Just to keep it to store it there. 01:44:42.620 --> 01:44:45.100 And now, I think I can just pour the blue liquid 01:44:45.100 --> 01:44:48.942 into the original red glass, because now I'm free to do so. 01:44:48.942 --> 01:44:50.400 So I'll pour the blue liquid there. 01:44:53.230 --> 01:44:56.220 And I think the last thing I need to do now is, now this blue-- 01:44:56.220 --> 01:44:59.680 this glass that originally held the blue liquid is now empty. 01:44:59.680 --> 01:45:03.130 So the red liquid, which was inside of this temporary glass over here, 01:45:03.130 --> 01:45:07.350 I can take the red liquid and just pour it into this glass here. 01:45:07.350 --> 01:45:10.290 And now, I didn't swap the positions of the glasses. 01:45:10.290 --> 01:45:12.390 But the liquids have actually switched places. 01:45:12.390 --> 01:45:15.355 Now, the blue liquid is on the left and the red liquid is on the right. 01:45:15.355 --> 01:45:16.230 DAVID MALAN: Awesome. 01:45:16.230 --> 01:45:18.660 Yeah, I think that is a more literal implementation 01:45:18.660 --> 01:45:21.150 of what you were doing and taking for granted last week, 01:45:21.150 --> 01:45:24.182 swapping the two values in two separate locations. 
01:45:24.182 --> 01:45:25.640 So it seems pretty straightforward. 01:45:25.640 --> 01:45:27.210 I just need a little more space. 01:45:27.210 --> 01:45:29.670 I need a temporary variable in code, if you will. 01:45:29.670 --> 01:45:31.545 And it seems I need three steps. 01:45:31.545 --> 01:45:34.670 I need to pour one out, pour the other one out, pour the other one back in. 01:45:34.670 --> 01:45:37.122 So I think I can translate that into code here. 01:45:37.122 --> 01:45:39.330 Let me go ahead and give myself a temporary variable, 01:45:39.330 --> 01:45:40.840 like a glass, like Brian did. 01:45:40.840 --> 01:45:43.650 And I'll call it tmp, T-M-P, which is pretty conventional when 01:45:43.650 --> 01:45:45.180 you want to swap two things in code. 01:45:45.180 --> 01:45:47.850 And I'm going to assign it, temporarily, the value of a. 01:45:47.850 --> 01:45:51.550 I'm going to then change the contents of a to equal whatever the contents of b 01:45:51.550 --> 01:45:52.050 are. 01:45:52.050 --> 01:45:56.010 And then I'm going to change b to be whatever the contents of tmp were. 01:45:56.010 --> 01:45:58.650 So this feels pretty reasonable and pretty correct, 01:45:58.650 --> 01:46:01.230 because it's just a literal translation into code, 01:46:01.230 --> 01:46:03.700 now, of what Brian did in the real world. 01:46:03.700 --> 01:46:05.610 And I think this will compile. 01:46:05.610 --> 01:46:08.040 So let's start there, make swap. 01:46:08.040 --> 01:46:09.690 It does-- oh, doesn't compile. 01:46:09.690 --> 01:46:13.410 OK, previous implicit declaration, oh, so many errors, my god. 01:46:13.410 --> 01:46:15.687 Implicit declaration of function swap-- 01:46:15.687 --> 01:46:16.270 wait a minute. 01:46:16.270 --> 01:46:17.230 I've seen that before. 01:46:17.230 --> 01:46:18.480 I've made this mistake before. 01:46:18.480 --> 01:46:20.050 You might have as well. 01:46:20.050 --> 01:46:23.293 Anytime you see this, recall it's just that you're missing your prototype.
01:46:23.293 --> 01:46:25.710 Remember that the compiler is going to take you literally. 01:46:25.710 --> 01:46:28.500 And if it doesn't know the word swap exists when it sees it, 01:46:28.500 --> 01:46:30.310 it's not going to compile successfully. 01:46:30.310 --> 01:46:33.030 So we need to include my prototype at the top of my file. 01:46:33.030 --> 01:46:35.460 Now, let me try this again, make swap. 01:46:35.460 --> 01:46:36.780 OK, that compiles. 01:46:36.780 --> 01:46:40.950 Let me go ahead now and run swap and recall that, in main, what I did 01:46:40.950 --> 01:46:43.380 was initialize x to 1, y to 2. 01:46:43.380 --> 01:46:45.900 I then print out what x is and what y is. 01:46:45.900 --> 01:46:50.040 I call swap, and then I print out what x is and y is again. 01:46:50.040 --> 01:46:52.770 So I should see 1, 2, and then 2, 1. 01:46:52.770 --> 01:46:55.430 So lets hit Enter. 01:46:55.430 --> 01:46:58.800 Huh, it does not seem to be working. 01:46:58.800 --> 01:47:01.740 Well, let's try it again, just in case-- 01:47:01.740 --> 01:47:04.020 no, not working. 01:47:04.020 --> 01:47:05.530 Well, let me try this. 01:47:05.530 --> 01:47:07.590 Let me add some-- printf is my friend. 01:47:07.590 --> 01:47:10.971 Let me go ahead and say a is percent i. 01:47:10.971 --> 01:47:14.460 b is percent i backslash n, a, b. 01:47:14.460 --> 01:47:15.510 So let's print that out. 01:47:15.510 --> 01:47:16.650 And let's print that out twice. 01:47:16.650 --> 01:47:18.480 So this would be a reasonable debugging technique. 01:47:18.480 --> 01:47:21.605 If you want to know what's going on underneath the hood, add some printf's. 01:47:21.605 --> 01:47:23.760 Let me go ahead and make swap. 01:47:23.760 --> 01:47:26.520 That compiles, dot slash swap. 01:47:26.520 --> 01:47:32.880 And let's see, a is 1, b is 2, a is 2, b is 1. 01:47:32.880 --> 01:47:35.470 But then x and y are unchanged. 01:47:35.470 --> 01:47:37.170 So I feel like my logic is right. 
01:47:37.170 --> 01:47:38.550 It's switching a and b. 01:47:38.550 --> 01:47:41.490 But it's not actually switching x and y. 01:47:41.490 --> 01:47:43.340 And I could confirm as much, right? 01:47:43.340 --> 01:47:45.510 The more powerful way to debug this would 01:47:45.510 --> 01:47:49.890 be to run debug50, set a break point, for instance, at line 17, 01:47:49.890 --> 01:47:54.270 step through my code, step by step, stepping into the swap function. 01:47:54.270 --> 01:47:57.030 But for now, it seems clear that swap works. 01:47:57.030 --> 01:48:00.250 But main isn't really seeing those results. 01:48:00.250 --> 01:48:01.450 So what's actually going on? 01:48:01.450 --> 01:48:04.170 Well, let's consider this real world incarnation of what my memory is 01:48:04.170 --> 01:48:05.712 so I can actually move things around. 01:48:05.712 --> 01:48:08.820 And this is all thanks to our friends in the theater's prop shop in back. 01:48:08.820 --> 01:48:10.830 If we think of this as my computer's memory, 01:48:10.830 --> 01:48:12.540 initially, it's all garbage values. 01:48:12.540 --> 01:48:16.080 But I can use this as a canvas to start laying things out in memory. 01:48:16.080 --> 01:48:19.020 But calling functions is something we've taken for granted thus far. 01:48:19.020 --> 01:48:22.200 And it turns out, when you call functions, the computer, by default, 01:48:22.200 --> 01:48:25.500 uses this memory in kind of a standard way. 01:48:25.500 --> 01:48:29.850 In fact, let me go ahead and draw a more pictorial picture. 01:48:29.850 --> 01:48:33.440 Let me draw a more literal picture here, if you will, of the computer's memory 01:48:33.440 --> 01:48:33.940 again. 01:48:33.940 --> 01:48:36.660 So if this is the computer's memory and we zoom in on one of the chips, 01:48:36.660 --> 01:48:39.120 and we think of the chip as having a whole bunch of bytes like this. 01:48:39.120 --> 01:48:42.390 Let's abstract away the actual hardware and think of it as we have been. 
01:48:42.390 --> 01:48:45.720 It's just this big rectangular region of memory, not unlike all of those Oscar 01:48:45.720 --> 01:48:47.520 the Grouches a moment ago. 01:48:47.520 --> 01:48:51.150 But by convention, your computer does not just plop things 01:48:51.150 --> 01:48:52.710 in random locations in memory. 01:48:52.710 --> 01:48:55.710 It has certain rules of thumb that it adheres to. 01:48:55.710 --> 01:48:59.460 In particular, it treats different portions of your computer's memory 01:48:59.460 --> 01:49:00.330 in different ways. 01:49:00.330 --> 01:49:03.570 It uses it in a standard way so that it's not completely random. 01:49:03.570 --> 01:49:08.910 For instance, when you run a program by doing dot slash something on CS50 IDE 01:49:08.910 --> 01:49:12.270 or on Linux more generally, or you double click an icon on Mac OS 01:49:12.270 --> 01:49:16.590 or Windows, that triggers the computer's-- 01:49:16.590 --> 01:49:21.030 the program's 0's and 1's stored on your hard drive to be loaded up here, 01:49:21.030 --> 01:49:23.742 to what we'll call machine code, which again, is the 0's and 1's. 01:49:23.742 --> 01:49:25.950 So if you think again, metaphorically, as your memory 01:49:25.950 --> 01:49:29.730 is this rectangular region, then the machine code, 01:49:29.730 --> 01:49:35.732 the 0's and 1's composing your program are loaded into the top part of memory. 01:49:35.732 --> 01:49:38.940 And again, top, bottom, left, right, it has no fundamental technical meaning. 01:49:38.940 --> 01:49:40.470 It's just an artist's rendition. 01:49:40.470 --> 01:49:42.960 But it does go into a standard location. 01:49:42.960 --> 01:49:45.700 Below that are all of your global variables. 01:49:45.700 --> 01:49:48.250 So are your constants that you put outside of your functions. 01:49:48.250 --> 01:49:50.500 Those are going to end up just below the machine code, 01:49:50.500 --> 01:49:53.340 so again, at the top of your computer's memory. 
01:49:53.340 --> 01:49:55.200 Below that is what's called the heap. 01:49:55.200 --> 01:49:56.940 And this is a technical term. 01:49:56.940 --> 01:50:00.780 And it refers to a big chunk of memory that malloc 01:50:00.780 --> 01:50:03.640 uses to get you some spare memory. 01:50:03.640 --> 01:50:09.270 Any time you call malloc, you are given the address of some chunk of memory 01:50:09.270 --> 01:50:13.200 up in this region, below the machine code, below your global variables. 01:50:13.200 --> 01:50:15.270 And it's kind of a big zone. 01:50:15.270 --> 01:50:19.120 But the catch is that other parts of your memory are used differently. 01:50:19.120 --> 01:50:24.570 In fact, whereas the heap is considered to be here on down, somewhat 01:50:24.570 --> 01:50:28.830 worrisomely, the stack is considered to be here on up. 01:50:28.830 --> 01:50:32.070 This is to say, when you call malloc and ask for memory, 01:50:32.070 --> 01:50:35.670 that gets allocated up here. 01:50:35.670 --> 01:50:39.540 When you call a function, though, those functions 01:50:39.540 --> 01:50:42.900 use what's called stack space instead of heap space. 01:50:42.900 --> 01:50:48.450 So any time you call a function, main or swap or strlen or string compare 01:50:48.450 --> 01:50:51.330 or any of the functions you've used thus far, 01:50:51.330 --> 01:50:54.150 your computer will automatically store any 01:50:54.150 --> 01:50:58.860 of the local variables or parameters from those functions down here. 01:50:58.860 --> 01:51:00.840 Now, this is not necessarily the best design, 01:51:00.840 --> 01:51:02.550 because you can see the two arrows pointing at one 01:51:02.550 --> 01:51:05.383 another is like two trains barreling down the tracks at one another. 01:51:05.383 --> 01:51:07.268 Bad things can eventually happen. 01:51:07.268 --> 01:51:09.060 Thankfully, we typically have enough memory 01:51:09.060 --> 01:51:12.370 that these two things don't collide, but more on that in just a bit.
01:51:12.370 --> 01:51:15.570 So again, when you call functions, memory down here is used. 01:51:15.570 --> 01:51:17.710 When you use malloc, memory up here is used. 01:51:17.710 --> 01:51:19.710 Now, for my swap function, I'm not using malloc. 01:51:19.710 --> 01:51:21.690 So I don't think I have to worry about heap. 01:51:21.690 --> 01:51:23.283 And I don't have any global variables. 01:51:23.283 --> 01:51:25.200 And I don't really care about my machine code. 01:51:25.200 --> 01:51:27.240 I just need to know that it's stored somewhere. 01:51:27.240 --> 01:51:30.210 But let's consider, then, what the stack is all about. 01:51:30.210 --> 01:51:32.670 The stack, indeed, is this sort of dynamic place 01:51:32.670 --> 01:51:34.860 where memory keeps getting used and reused. 01:51:34.860 --> 01:51:40.440 So for instance, when you call main, as you might when this swap program is 01:51:40.440 --> 01:51:45.010 run, main uses a sliver of memory at the bottom of this picture, if you will. 01:51:45.010 --> 01:51:47.910 So the local variables in main, like x and y, 01:51:47.910 --> 01:51:49.920 end up at this bottom portion of memory. 01:51:49.920 --> 01:51:53.790 When you call swap, swap uses a chunk of memory just above main, 01:51:53.790 --> 01:51:58.350 pictorially, in this diagram, such as variables a and b and tmp, 01:51:58.350 --> 01:51:59.410 for that matter. 01:51:59.410 --> 01:52:04.680 And then, once swap returns and is done executing, that sliver of memory 01:52:04.680 --> 01:52:06.010 essentially goes away. 01:52:06.010 --> 01:52:07.230 Now, it doesn't disappear. 01:52:07.230 --> 01:52:09.610 Obviously, there's still physical memory there. 01:52:09.610 --> 01:52:12.810 But that's when we get into the discussion of garbage values again. 01:52:12.810 --> 01:52:15.540 They're still like Oscar the Grouches all over the place. 01:52:15.540 --> 01:52:18.600 You just don't know, or at this point care, what the values are.
01:52:18.600 --> 01:52:20.010 But there are values there. 01:52:20.010 --> 01:52:23.640 And that's why, a moment ago, when I printed out that uninitialized score's 01:52:23.640 --> 01:52:26.970 array, I did see some bogus values, because there's still 01:52:26.970 --> 01:52:30.510 going to be 0's and 1's there that are left over from before. 01:52:30.510 --> 01:52:31.750 The problem, though, is this. 01:52:31.750 --> 01:52:35.070 Let me go over to this physical incarnation of our memory 01:52:35.070 --> 01:52:38.010 and consider this as being our stack, so it's growing on up. 01:52:38.010 --> 01:52:42.060 And in fact, if I want to have two local variables like I do, x and y, 01:52:42.060 --> 01:52:47.400 let's go ahead and think of this row of memory here as being main, 01:52:47.400 --> 01:52:48.870 for instance, here. 01:52:48.870 --> 01:52:51.630 And I'm going to go ahead and replace all these garbage values 01:52:51.630 --> 01:52:53.790 with an actual value that I care about. 01:52:53.790 --> 01:52:57.660 And the actual values that I care about, we're going to call x and y, just as 01:52:57.660 --> 01:52:58.480 before. 01:52:58.480 --> 01:53:01.020 So each of these Oscars happens to be one byte. 01:53:01.020 --> 01:53:02.068 But an int is 4 bytes. 01:53:02.068 --> 01:53:04.110 So thankfully, from our friends in the prop shop, 01:53:04.110 --> 01:53:06.178 we have these bigger integer-sized blocks. 01:53:06.178 --> 01:53:08.220 And I'm going to go ahead and slide this in here. 01:53:08.220 --> 01:53:10.740 And we're going to think of this, in a moment, as x. 01:53:10.740 --> 01:53:14.340 And indeed, I'm going to go ahead and call this x with a marker. 01:53:14.340 --> 01:53:17.760 And then I'm going to go ahead and give myself another integer, a size 4, 01:53:17.760 --> 01:53:19.300 and put it down here. 01:53:19.300 --> 01:53:21.300 And we're going to think of this as y. 01:53:21.300 --> 01:53:23.940 And recall, what do I initialize these values to? 
01:53:23.940 --> 01:53:27.690 Well, the value 1, initially, and the value 2. 01:53:27.690 --> 01:53:29.370 But then I called the swap function. 01:53:29.370 --> 01:53:32.160 And the swap function has two arguments, a and b. 01:53:32.160 --> 01:53:38.400 And those, by design, become copies of x and y, because I passed in x comma y. 01:53:38.400 --> 01:53:41.280 And I defined swap as taking a comma b. 01:53:41.280 --> 01:53:44.970 So I think what I need to do, physically here, is now 01:53:44.970 --> 01:53:50.170 think of this second row of memory as now belonging to the swap function, 01:53:50.170 --> 01:53:51.210 not to main. 01:53:51.210 --> 01:53:54.090 And inside of this second row of memory, I'll 01:53:54.090 --> 01:53:57.540 think of this as belonging to swap. 01:53:57.540 --> 01:54:02.100 And within the swap row, I'm going to have another integer of size 4. 01:54:02.100 --> 01:54:07.500 And we're going to call this one a, as down there, a. 01:54:07.500 --> 01:54:10.350 And then I'm going to have another chunk of size 4. 01:54:10.350 --> 01:54:12.600 And we're going to call this b. 01:54:12.600 --> 01:54:16.050 And again, because those are just the arguments, x comma y, otherwise 01:54:16.050 --> 01:54:20.760 now known as a comma b, I copy 1 and 2 into those values. 01:54:20.760 --> 01:54:22.770 But swap has a third variable. 01:54:22.770 --> 01:54:24.730 Brian proposed a temporary variable. 01:54:24.730 --> 01:54:27.480 So I'm going to go ahead and give myself four more bytes, 01:54:27.480 --> 01:54:30.210 thereby getting rid of whatever garbage values are there 01:54:30.210 --> 01:54:34.260 and actually setting it to an integer called tmp. 01:54:34.260 --> 01:54:39.030 So I'm going to go ahead and call this thing tmp, T-M-P. 01:54:39.030 --> 01:54:40.440 And what did I do first? 01:54:40.440 --> 01:54:43.845 I set tmp equal to a. 01:54:43.845 --> 01:54:45.120 So tmp equals a. 01:54:45.120 --> 01:54:47.520 So if a is 1, tmp is 1.
01:54:47.520 --> 01:54:48.750 Then what did I do? 01:54:48.750 --> 01:54:51.780 I then did a equals b. 01:54:51.780 --> 01:54:55.150 So b is 2. 01:54:55.150 --> 01:54:57.800 a is 2 as well. 01:54:57.800 --> 01:55:00.030 And then lastly, what did I do? 01:55:00.030 --> 01:55:02.145 I did b gets tmp. 01:55:02.145 --> 01:55:05.020 So I have to go ahead and change this to be whatever the value of tmp 01:55:05.020 --> 01:55:07.630 is, which is now the number 1. 01:55:07.630 --> 01:55:10.150 So you can see that swap is correct insofar 01:55:10.150 --> 01:55:12.655 as it is swapping the values of a and b. 01:55:12.655 --> 01:55:16.690 But the moment swap returns, these return 01:55:16.690 --> 01:55:19.000 to being thought of as garbage values. 01:55:19.000 --> 01:55:20.860 Main is still in the middle of running. 01:55:20.860 --> 01:55:22.300 Swap is no longer running. 01:55:22.300 --> 01:55:23.743 But these values stay there. 01:55:23.743 --> 01:55:24.910 So those are garbage values. 01:55:24.910 --> 01:55:27.850 We happen to know what they are, but they're no longer valid, 01:55:27.850 --> 01:55:32.560 because when I go to print out x and y for the second time, what are x and y? 01:55:32.560 --> 01:55:33.820 They're still the same. 01:55:33.820 --> 01:55:37.870 And so this is to say, when you actually write code that takes arguments 01:55:37.870 --> 01:55:40.750 and you pass arguments from one function to another, 01:55:40.750 --> 01:55:43.930 those arguments are copied from one function to another. 01:55:43.930 --> 01:55:47.140 And indeed, x and y are copied into a and b. 01:55:47.140 --> 01:55:51.670 So your code may very well look correct in that it's swapping correctly. 01:55:51.670 --> 01:55:55.750 But it's only swapping correctly in the context of swap, 01:55:55.750 --> 01:55:58.370 not touching the original values.
01:55:58.370 --> 01:56:00.730 So what I think we need to do, fundamentally, 01:56:00.730 --> 01:56:06.130 is reimplement swap in such a way that we actually 01:56:06.130 --> 01:56:10.450 change the values of x and y. 01:56:10.450 --> 01:56:11.500 But how can we do this? 01:56:11.500 --> 01:56:13.810 Brian, if we could call in someone here. 01:56:13.810 --> 01:56:18.340 How could I conceptually change my implementation of swap 01:56:18.340 --> 01:56:26.110 so that it somehow empowers me to change x and y, not change copies of x and y? 01:56:26.110 --> 01:56:28.570 What could I pass into swap, Brian? 01:56:28.570 --> 01:56:31.150 BRIAN: Igor is suggesting that we use pointers instead. 01:56:31.150 --> 01:56:33.733 DAVID MALAN: Yeah, so perhaps the leading question here today. 01:56:33.733 --> 01:56:36.010 But pointers would seem to give us a solution. 01:56:36.010 --> 01:56:38.170 If pointers are essentially like a treasure 01:56:38.170 --> 01:56:41.500 map to a specific address in your computer's memory, what I should really 01:56:41.500 --> 01:56:45.940 do from main to swap is pass in not x and y literally, 01:56:45.940 --> 01:56:49.630 but why don't I pass in the address of x and the address of y, 01:56:49.630 --> 01:56:53.230 so that swap can now go to those addresses 01:56:53.230 --> 01:56:57.460 and actually do the sort of swap that Brian enacted in person. 01:56:57.460 --> 01:57:02.050 So give the function a sort of map to those values, pointers to those values, 01:57:02.050 --> 01:57:03.560 and then go to those values. 01:57:03.560 --> 01:57:04.580 So how might I do this? 01:57:04.580 --> 01:57:06.580 Well, the code has to be a little different now. 01:57:06.580 --> 01:57:09.640 When I call swap this time, what I really need to do 01:57:09.640 --> 01:57:12.710 is pass in the addresses of these two variables. 01:57:12.710 --> 01:57:14.950 So I don't necessarily know what those addresses are. 
01:57:14.950 --> 01:57:16.900 But for the sake of the story, we can just 01:57:16.900 --> 01:57:21.340 assume that this address, for instance, is like, 0x123. 01:57:21.340 --> 01:57:25.142 And then four bytes away from that might be 0x127, for instance. 01:57:25.142 --> 01:57:27.100 But again, it doesn't really matter what it is. 01:57:27.100 --> 01:57:29.440 But they do have addresses, x and y. 01:57:29.440 --> 01:57:31.562 So a pointer recall tends to be pretty big. 01:57:31.562 --> 01:57:33.520 So we needed to get out a bigger piece of wood, 01:57:33.520 --> 01:57:35.590 eight bytes that represents a pointer. 01:57:35.590 --> 01:57:38.830 And I actually need to use a bit more memory in swap now. 01:57:38.830 --> 01:57:42.490 If I now declare a to be, not an integer, 01:57:42.490 --> 01:57:47.020 but a pointer to an int, that is a int star variable, 01:57:47.020 --> 01:57:49.330 I could call this thing a now. 01:57:49.330 --> 01:57:54.340 And I could store, in it, the address of x, like 0x123. 01:57:54.340 --> 01:57:57.640 If I then change the definition of b to be 01:57:57.640 --> 01:58:01.390 not an integer, but a pointer to an integer, 01:58:01.390 --> 01:58:04.810 that is another int star, which happens to be eight bytes. 01:58:04.810 --> 01:58:07.780 I'm going to use a little more memory for this thing, but that's OK. 01:58:07.780 --> 01:58:10.030 And its name is going to be b now. 01:58:10.030 --> 01:58:13.600 And it's going to contain 0x127. 01:58:13.600 --> 01:58:15.820 I still need a temporary variable. 01:58:15.820 --> 01:58:18.650 I still need a temporary variable, but that's fine. 01:58:18.650 --> 01:58:20.980 I just need four bytes for that, because the variable 01:58:20.980 --> 01:58:25.990 itself just needs to store an int, like Brian temporarily stored it in a glass. 01:58:25.990 --> 01:58:29.260 So I just need an additional four bytes, like before, for that. 01:58:29.260 --> 01:58:31.720 And now, let's just consider the logic. 
01:58:31.720 --> 01:58:32.710 Here's main. 01:58:32.710 --> 01:58:34.990 And swap is now using these 3-- 01:58:34.990 --> 01:58:36.550 2 and 1/2 rows of memory. 01:58:36.550 --> 01:58:37.240 And that's fine. 01:58:37.240 --> 01:58:39.640 It's growing upward as I proposed. 01:58:39.640 --> 01:58:41.860 x is at address 0x123. 01:58:41.860 --> 01:58:44.560 y is at address 0x127. 01:58:44.560 --> 01:58:48.370 Therefore, a and b, I propose conceptually, like Igor proposed, 01:58:48.370 --> 01:58:52.280 store the addresses of x and y, respectively. 01:58:52.280 --> 01:58:55.060 And now my code, I think, needs to say this. 01:58:55.060 --> 01:59:00.025 Go and store, in the variable tmp, whatever is at the address a. 01:59:00.025 --> 01:59:02.650 So you can kind of think of this as being an arrow down here. 01:59:02.650 --> 01:59:03.910 Follow the arrow, OK. 01:59:03.910 --> 01:59:06.010 What is at address 0x123? 01:59:06.010 --> 01:59:06.910 The number 1. 01:59:06.910 --> 01:59:09.250 So we put one in tmp, just like before. 01:59:09.250 --> 01:59:10.310 Then what do we do? 01:59:10.310 --> 01:59:13.540 Well, now, I'm going to go ahead and change, not the value of a, 01:59:13.540 --> 01:59:18.010 but I'm going to change what is at the location in a to be 01:59:18.010 --> 01:59:24.790 whatever is at the location in b, which is an arrow pointing down here, 0x127. 01:59:24.790 --> 01:59:27.850 So I'm going to change this 1, now, to be a 2. 01:59:27.850 --> 01:59:30.910 And the third and final step, recall, is for me, now, 01:59:30.910 --> 01:59:37.150 to go, not to b, but to go where b points to, which happens to be y, 01:59:37.150 --> 01:59:42.440 and change that to be the value of tmp, which of course, is up here. 01:59:42.440 --> 01:59:45.430 And at this point in the story, it's still just three lines of code. 01:59:45.430 --> 01:59:47.380 They're different types of lines of code. 01:59:47.380 --> 01:59:48.950 It's three lines of code.
01:59:48.950 --> 01:59:52.180 But when swap is done executing, notice what we've done. 01:59:52.180 --> 01:59:55.190 We have successfully swapped x and y by letting 01:59:55.190 --> 01:59:59.270 swap go to those addresses as opposed to just naively getting 01:59:59.270 --> 02:00:02.180 copies of the values therein. 02:00:02.180 --> 02:00:05.150 Now, even though this code is going to look a little cryptic, 02:00:05.150 --> 02:00:10.820 it's, frankly, just an application of the logic we've seen thus far. 02:00:10.820 --> 02:00:13.860 I'm going to go ahead and go back to my old buggy version. 02:00:13.860 --> 02:00:15.860 And I'm going to change the definition of swap 02:00:15.860 --> 02:00:19.190 to say that it doesn't take two integers, a and b, but two 02:00:19.190 --> 02:00:20.810 pointers to integers a and b. 02:00:20.810 --> 02:00:24.080 And the way you declare a pointer recall is the type of variable 02:00:24.080 --> 02:00:26.767 you point at followed by a star and then the name of it. 02:00:26.767 --> 02:00:28.850 And we haven't seen it, admittedly, in the context 02:00:28.850 --> 02:00:31.550 of a function taking parameters yet. 02:00:31.550 --> 02:00:33.170 But it's quite simply that. 02:00:33.170 --> 02:00:34.610 I added the stars. 02:00:34.610 --> 02:00:40.040 Down here, I need to say, store in tmp, whatever is at a. 02:00:40.040 --> 02:00:41.870 How do I express go to a? 02:00:41.870 --> 02:00:43.520 Just add a star here. 02:00:43.520 --> 02:00:46.880 How do I express go to a and put whatever is at b? 02:00:46.880 --> 02:00:48.500 I add stars there. 02:00:48.500 --> 02:00:51.560 How do I say, go to b and store whatever is at tmp? 02:00:51.560 --> 02:00:53.190 I add one star there. 02:00:53.190 --> 02:00:55.520 So tmp is just a simple integer. 02:00:55.520 --> 02:00:57.380 It's just an empty glass like Brian had. 02:00:57.380 --> 02:00:58.620 There's nothing fancy there. 02:00:58.620 --> 02:01:00.650 So we don't need stars around tmp. 
02:01:00.650 --> 02:01:04.970 But I do, now, need to change how I'm using a and b, 02:01:04.970 --> 02:01:08.330 because now they are addresses that I actually want to go to. 02:01:08.330 --> 02:01:12.140 There's no need for the address of operator in this context. 02:01:12.140 --> 02:01:14.330 But up here, I'm going to need to make a change. 02:01:14.330 --> 02:01:16.380 I do need to change the prototype to match. 02:01:16.380 --> 02:01:18.200 So that's just a copy paste. 02:01:18.200 --> 02:01:23.120 But I bet you can imagine what, lastly, needs to change. 02:01:23.120 --> 02:01:26.750 When calling swap, I don't want to pass in naively x and y, because again, 02:01:26.750 --> 02:01:27.980 they're going to get copied. 02:01:27.980 --> 02:01:32.000 I want to pass in the address of x and the address of y, 02:01:32.000 --> 02:01:35.690 so that swap now has sort of special access 02:01:35.690 --> 02:01:38.750 to the contents of those locations in memory 02:01:38.750 --> 02:01:42.740 so that it actually can make some changes therein. 02:01:42.740 --> 02:01:47.780 And that, indeed, if I now recompile this program, make swap, and I do 02:01:47.780 --> 02:01:50.390 dot slash swap and cross my fingers, voila. 02:01:50.390 --> 02:01:53.855 Now, I have successfully swapped x and y. 02:01:53.855 --> 02:01:55.730 So last week, if you were wondering, perhaps, 02:01:55.730 --> 02:01:58.250 why we didn't show you how to do swap, we could have. 02:01:58.250 --> 02:01:59.900 And we didn't need a special function. 02:01:59.900 --> 02:02:03.200 You don't necessarily need pointers if we did all of this in main. 02:02:03.200 --> 02:02:06.470 But I'm trying to introduce an abstraction, this function that 02:02:06.470 --> 02:02:09.740 does swap just like Brian swapped those glasses for us.
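For reference, the pointer-based swap just described can be sketched as below. This is a minimal sketch reconstructed from the spoken walkthrough, not a copy of the on-screen file, and it shows only the function itself.

```c
// Swap the two integers that a and b point to.
// a and b hold addresses; *a and *b mean "go to that address".
void swap(int *a, int *b)
{
    int tmp = *a;   // store in tmp whatever is at address a
    *a = *b;        // go to a, and put there whatever is at b
    *b = tmp;       // go to b, and store the old value from tmp
}
```

From main, the call passes addresses rather than copies: `swap(&x, &y);`. With x as 1 and y as 2 beforehand, x is 2 and y is 1 afterward.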
02:02:09.740 --> 02:02:12.650 And to pass values from one function to another, 02:02:12.650 --> 02:02:15.990 you do need to understand what's going on in your computer's memory 02:02:15.990 --> 02:02:18.830 so that you can actually pass in little breadcrumbs again, 02:02:18.830 --> 02:02:23.330 treasure maps to those locations in memory, again, thanks to these things 02:02:23.330 --> 02:02:25.100 called pointers. 02:02:25.100 --> 02:02:27.770 All right, well let me propose and emphasize, 02:02:27.770 --> 02:02:30.770 then, that this design of the heap being up at the top, 02:02:30.770 --> 02:02:33.200 where malloc uses memory and the stack being 02:02:33.200 --> 02:02:35.540 at the bottom where your own functions use memory, 02:02:35.540 --> 02:02:37.730 this is a problem clearly waiting to happen. 02:02:37.730 --> 02:02:39.460 And those problems actually have names. 02:02:39.460 --> 02:02:41.210 And some of you who have programmed before 02:02:41.210 --> 02:02:45.230 might know some of these terms, either heap overflow or stack overflow. 02:02:45.230 --> 02:02:48.650 And in fact, many of you might know stackoverflow.com as just a website. 02:02:48.650 --> 02:02:50.840 Well, there is an origin story to its name. 02:02:50.840 --> 02:02:56.240 A stack overflow refers to the process of calling a function so many times 02:02:56.240 --> 02:02:58.550 that the stack overflows into the heap. 02:02:58.550 --> 02:03:00.320 That is, every time you call the function, 02:03:00.320 --> 02:03:04.950 like I did here, you use more and more rows, so to speak, of memory. 02:03:04.950 --> 02:03:07.730 And if you call so many functions again and again, 02:03:07.730 --> 02:03:11.690 eventually, you may very well run over the area of memory called heap. 02:03:11.690 --> 02:03:14.090 And at that point, your program will crash. 02:03:14.090 --> 02:03:18.950 There is no fundamental solution to that problem other than don't do that. 02:03:18.950 --> 02:03:20.420 Don't use too much memory.
02:03:20.420 --> 02:03:21.680 But that can be hard to do. 02:03:21.680 --> 02:03:24.138 And indeed, that's one of the dangers of programming today. 02:03:24.138 --> 02:03:27.800 And we can actually induce this a little bit deliberately ourselves. 02:03:27.800 --> 02:03:30.620 And in fact, I thought we could revisit, for instance, 02:03:30.620 --> 02:03:34.220 where we left off with Mario last time, which was this picture here. 02:03:34.220 --> 02:03:37.580 Recall that this was a pyramid, of course, 02:03:37.580 --> 02:03:40.400 simpler than the one you might have played with for problems at 0. 02:03:40.400 --> 02:03:44.360 But it's a recursive pyramid in that you can define a pyramid of height 4, 02:03:44.360 --> 02:03:47.690 in terms of a pyramid of height 3, in terms of a pyramid of height 2 02:03:47.690 --> 02:03:48.380 and a height 1. 02:03:48.380 --> 02:03:52.580 And indeed, I built that last week using these very blocks. 02:03:52.580 --> 02:03:56.180 Well, you can implement Mario's pyramid like this 02:03:56.180 --> 02:03:57.660 in a couple of different ways. 02:03:57.660 --> 02:04:01.160 One is just using week one style iteration, using a loop. 02:04:01.160 --> 02:04:03.890 And in fact, let me go ahead and whip up a quick solution that 02:04:03.890 --> 02:04:05.340 does exactly that. 02:04:05.340 --> 02:04:07.730 Let me go ahead and call this mario.c. 02:04:07.730 --> 02:04:10.610 And I'm going to go ahead and include cs50.h. 02:04:10.610 --> 02:04:12.290 So we can use one of our get functions. 02:04:12.290 --> 02:04:14.300 I'm going to use standard io dot h. 02:04:14.300 --> 02:04:16.160 And I'm going to do int main void. 02:04:16.160 --> 02:04:18.590 And all I want to do is print out this pyramid. 02:04:18.590 --> 02:04:20.340 But I want to ask the user for the height. 02:04:20.340 --> 02:04:23.090 So I'm going to say int height equals get int. 02:04:23.090 --> 02:04:26.870 And we'll ask the user for the height, just like you did for problem set 1. 
02:04:26.870 --> 02:04:30.000 And then I'm going to go ahead and draw a pyramid of that height. 02:04:30.000 --> 02:04:31.340 Now, draw doesn't exist. 02:04:31.340 --> 02:04:32.030 But that's fine. 02:04:32.030 --> 02:04:34.735 I'm going to go ahead and draw this now, implement draw myself. 02:04:34.735 --> 02:04:36.860 It doesn't need to return a value, because I'm just 02:04:36.860 --> 02:04:38.273 printing stuff on the screen. 02:04:38.273 --> 02:04:40.190 Function's called draw, and it's going to take 02:04:40.190 --> 02:04:42.710 an input called h, for instance. h for height, 02:04:42.710 --> 02:04:45.080 but I could call its argument anything I want. 02:04:45.080 --> 02:04:48.650 And then I'm just going to do this, for int i gets 1, 02:04:48.650 --> 02:04:52.850 i less than or equal to h, i++. 02:04:52.850 --> 02:04:56.170 And then inside of this, this is where you might recall, from problem set one, 02:04:56.170 --> 02:04:58.700 have found a nested loop to be useful. 02:04:58.700 --> 02:05:04.150 Let me do int j gets 1, j less than or equal to i, j++. 02:05:04.150 --> 02:05:08.178 This will be similar but not identical to either the less comfortable or more 02:05:08.178 --> 02:05:09.970 comfortable version of Mario from the past, 02:05:09.970 --> 02:05:13.240 because this pyramid is shaped in a different direction. 02:05:13.240 --> 02:05:15.610 Now, you print a hash there. 02:05:15.610 --> 02:05:17.830 And then let me go ahead and print a new line here. 02:05:17.830 --> 02:05:19.570 So I did this super quickly. 02:05:19.570 --> 02:05:21.880 But logically, what I'm doing is iterating 02:05:21.880 --> 02:05:29.710 over every row, so from 1 through h, so row 1, 2, 3, 4, for instance. 02:05:29.710 --> 02:05:34.210 And then on each row, I'm deliberately iterating from 1 through i. 02:05:34.210 --> 02:05:37.870 So I print 1, then 2, then 3, then 4. 02:05:37.870 --> 02:05:39.640 And again, I could zero index if I want. 
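The iterative draw function dictated above can be sketched roughly as follows. This is a reconstruction from the narration, showing only draw itself; the main function that asks for the height with get_int is left out so that the sketch does not depend on the CS50 library.

```c
#include <stdio.h>

// Print a left-aligned pyramid of height h using nested loops.
void draw(int h)
{
    // One pass of the outer loop per row, counting rows from 1
    for (int i = 1; i <= h; i++)
    {
        // Row i gets exactly i hashes
        for (int j = 1; j <= i; j++)
        {
            printf("#");
        }
        printf("\n");
    }
}
```

If main is defined above draw in the same file, the prototype `void draw(int h);` has to appear before main.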
02:05:39.640 --> 02:05:44.170 I find that in this context, more user friendly, more intelligible to me 02:05:44.170 --> 02:05:46.660 to index from 1, totally reasonable if you think 02:05:46.660 --> 02:05:48.310 there's a compelling design argument. 02:05:48.310 --> 02:05:50.030 So let me go ahead and make Mario. 02:05:50.030 --> 02:05:51.520 Ah, darn it. 02:05:51.520 --> 02:05:53.980 Oh, I missed my prototype. 02:05:53.980 --> 02:05:55.870 So notice, it's not understanding draw. 02:05:55.870 --> 02:05:58.900 So the fix for that is to either move the whole function 02:05:58.900 --> 02:06:02.980 or, as we've preached instead, to just put your prototype up top. 02:06:02.980 --> 02:06:05.050 Let me recompile Mario. 02:06:05.050 --> 02:06:06.430 OK, now successful. 02:06:06.430 --> 02:06:08.710 Mario, let's do a height of 4, and voila. 02:06:08.710 --> 02:06:11.350 Now, I have a relatively simple-- though I certainly 02:06:11.350 --> 02:06:13.600 did it faster than you might without some practice-- 02:06:13.600 --> 02:06:15.760 implementation of Mario's pyramid. 02:06:15.760 --> 02:06:17.980 But here's where things get kind of cool. 02:06:17.980 --> 02:06:20.800 Let me stipulate that that is a correct iterative solution, even 02:06:20.800 --> 02:06:24.970 if it might take you some number of steps or trial and error 02:06:24.970 --> 02:06:28.180 to get that iterative loop-based code correct. 02:06:28.180 --> 02:06:30.580 Let me change this, now, to be recursive. 02:06:30.580 --> 02:06:34.510 And recall, a recursive function is one that calls itself. 02:06:34.510 --> 02:06:37.660 How do you print a pyramid of height h? 02:06:37.660 --> 02:06:41.980 Well, recall that you print a pyramid of height h minus 1, 02:06:41.980 --> 02:06:45.340 and then you proceed to print one more row of blocks. 02:06:45.340 --> 02:06:48.970 So let me take that literally. for int i gets zero. 02:06:48.970 --> 02:06:51.550 i is less than h, i++. 
02:06:51.550 --> 02:06:54.550 Let me go ahead and just print that extra row of bricks 02:06:54.550 --> 02:06:58.480 like this, followed by a new line. 02:06:58.480 --> 02:07:00.260 So now, I did this kind of fast. 02:07:00.260 --> 02:07:01.340 But what am I doing here? 02:07:01.340 --> 02:07:06.520 Well, if the height equals 1, I want this loop to iterate one time. 02:07:06.520 --> 02:07:10.760 If the height equals 2, I wanted to iterate two times, 3, and so forth. 02:07:10.760 --> 02:07:14.260 So I think, using my zero-indexing technique here, this will work too. 02:07:14.260 --> 02:07:17.080 But if you prefer, I could certainly just change this to a 1 02:07:17.080 --> 02:07:18.638 and change this 2. 02:07:18.638 --> 02:07:19.930 But I'm going to go ahead and-- 02:07:19.930 --> 02:07:20.500 actually, no. 02:07:20.500 --> 02:07:23.350 In this case, I want to leave it as such, zero index, 02:07:23.350 --> 02:07:25.450 just like we typically do. 02:07:25.450 --> 02:07:29.200 All right, let me go ahead and compile this, make Mario. 02:07:29.200 --> 02:07:31.870 OK, oops, interesting. 02:07:31.870 --> 02:07:34.940 All paths through this function will call itself. 02:07:34.940 --> 02:07:37.780 So clang is being kind of smart here, whereby, 02:07:37.780 --> 02:07:42.260 it's noticing that in my draw function, I'm calling my draw function. 02:07:42.260 --> 02:07:44.358 And that's a process that never changes. 02:07:44.358 --> 02:07:46.150 In fact, let me see if I can override that. 02:07:46.150 --> 02:07:51.310 Let me use clang manually and compile a program called mario using mario.c. 02:07:51.310 --> 02:07:53.140 And let me go ahead and link in cs50. 02:07:53.140 --> 02:07:55.960 So I'm using our old school syntax from week two. 02:07:55.960 --> 02:07:56.980 OK, that compiled. 02:07:56.980 --> 02:07:58.270 And why did that compile? 02:07:58.270 --> 02:08:01.872 Well, make is, again, a program that uses your compiler clang. 
02:08:01.872 --> 02:08:05.080 And we've configured make to be a little more user-friendly and a little more 02:08:05.080 --> 02:08:07.450 protective of you by turning on special features 02:08:07.450 --> 02:08:09.250 where we detect problems like that. 02:08:09.250 --> 02:08:12.730 By using clang directly now, I'm disabling those special checks. 02:08:12.730 --> 02:08:16.840 And watch what happens when I run Mario now for height of 4, for instance. 02:08:16.840 --> 02:08:18.730 Boom, it crashed. 02:08:18.730 --> 02:08:20.500 It didn't even print anything. 02:08:20.500 --> 02:08:21.953 It crashed pretty quickly. 02:08:21.953 --> 02:08:25.120 And again, a segmentation fault means you touched memory that you shouldn't. 02:08:25.120 --> 02:08:26.200 So what's going on? 02:08:26.200 --> 02:08:30.302 Well, if you think of this memory as representing main still, but then draw, 02:08:30.302 --> 02:08:33.610 draw, draw, draw, draw, draw. 02:08:33.610 --> 02:08:37.540 If every one of your calls to draw just cause draw again, 02:08:37.540 --> 02:08:39.070 why would it ever stop? 02:08:39.070 --> 02:08:41.590 It wouldn't seem to stop here, necessarily. 02:08:41.590 --> 02:08:45.070 So it seems that I'm missing a key detail in my recursive version. 02:08:45.070 --> 02:08:45.670 You know what? 02:08:45.670 --> 02:08:51.130 If there's nothing to draw, if height equals equals 0, let me go ahead, then, 02:08:51.130 --> 02:08:54.260 and just return immediately. 02:08:54.260 --> 02:08:57.250 Otherwise, I'll go ahead and draw part of the pyramid 02:08:57.250 --> 02:08:59.260 and then add the new row. 02:08:59.260 --> 02:09:02.110 So you need this so-called base case, which you literally 02:09:02.110 --> 02:09:05.410 choose to equal some simple value, like height of 0, height of 1, 02:09:05.410 --> 02:09:10.880 any hardcoded value, so that eventually, draw does not call itself. 02:09:10.880 --> 02:09:15.040 So let me go ahead and recompile this with clang or make. 
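The recursive version with its base case might look like the sketch below. One deliberate difference from the narration is an assumption of this sketch: the lecture checks for a height equal to 0, while this version checks for less than or equal to 0, so that a negative height also stops the recursion instead of recursing until the program crashes.

```c
#include <stdio.h>

// Print a pyramid of height h by first printing one of height h - 1.
void draw(int h)
{
    // Base case: nothing to draw, so stop calling draw.
    // (The lecture tests h == 0; h <= 0 is this sketch's extra guard.)
    if (h <= 0)
    {
        return;
    }

    // Recursive step: draw the smaller pyramid first
    draw(h - 1);

    // Then add one more row of h bricks
    for (int i = 0; i < h; i++)
    {
        printf("#");
    }
    printf("\n");
}
```

Each call adds a frame to the stack until the base case is reached, which is why a very large height can still crash even though the recursion is finite.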
02:09:15.040 --> 02:09:18.430 Let me rerun it, height of 4, and voila. 02:09:18.430 --> 02:09:20.680 It's still working just like the iterative version, 02:09:20.680 --> 02:09:22.340 but it's now using recursion. 02:09:22.340 --> 02:09:24.250 So here's a sort of design question. 02:09:24.250 --> 02:09:26.020 Is iteration better than recursion? 02:09:26.020 --> 02:09:26.680 It depends. 02:09:26.680 --> 02:09:28.270 Iteration will always work. 02:09:28.270 --> 02:09:32.290 When using the iterative version, I will never overflow the stack 02:09:32.290 --> 02:09:33.140 and hit the heap. 02:09:33.140 --> 02:09:33.640 Why? 02:09:33.640 --> 02:09:35.723 Because I'm not calling functions again and again. 02:09:35.723 --> 02:09:38.410 There's only main and one invocation of draw. 02:09:38.410 --> 02:09:42.550 But with the recursive version, it's kind of a cool, powerful way 02:09:42.550 --> 02:09:43.270 to do things. 02:09:43.270 --> 02:09:45.610 Like, oh, I can draw you a pyramid of height h. 02:09:45.610 --> 02:09:48.370 Let me just have you draw me a pyramid of height h minus 1, 02:09:48.370 --> 02:09:49.750 and then I'll add a row. 02:09:49.750 --> 02:09:54.950 It's kind of this clever, cyclical argument that does work very elegantly. 02:09:54.950 --> 02:09:56.150 But there's a danger. 02:09:56.150 --> 02:10:00.830 And in fact, even though this base case ensures that it doesn't go forever, 02:10:00.830 --> 02:10:05.180 it could go on so long-- maybe let's try 10,000 invocations. 02:10:05.180 --> 02:10:06.290 So that worked OK. 02:10:06.290 --> 02:10:07.820 It's a little slow. 02:10:07.820 --> 02:10:09.320 I'm losing control over my keyboard. 02:10:09.320 --> 02:10:10.730 So Control C is your friend. 02:10:10.730 --> 02:10:12.050 Let me try this once more. 02:10:12.050 --> 02:10:16.700 Let me go ahead and do something like 2 billion and see if that works. 02:10:16.700 --> 02:10:17.540 Boom. 02:10:17.540 --> 02:10:19.110 So even that doesn't work.
02:10:19.110 --> 02:10:21.710 So there's this inherent danger with recursion, whereby, 02:10:21.710 --> 02:10:25.010 even though it empowered us last week to solve a problem even more efficiently 02:10:25.010 --> 02:10:29.810 with merge sort, we kind of got lucky, in that we weren't trying to sort crazy big 02:10:29.810 --> 02:10:33.080 things on Brian's shelf, because it would seem if you use recursion 02:10:33.080 --> 02:10:35.330 and call yourself again and again and again and again, 02:10:35.330 --> 02:10:40.340 even finitely many times, you might eventually touch memory you shouldn't. 02:10:40.340 --> 02:10:42.290 And what's the solution here? 02:10:42.290 --> 02:10:44.510 Unfortunately, it's don't do that. 02:10:44.510 --> 02:10:48.020 Design your algorithms, choose your inputs in such a way 02:10:48.020 --> 02:10:49.560 that there just isn't that risk. 02:10:49.560 --> 02:10:51.800 And we'll use recursion again in a few weeks 02:10:51.800 --> 02:10:54.800 time when we look at more sophisticated data structures. 02:10:54.800 --> 02:10:56.600 But again, there's always this trade off. 02:10:56.600 --> 02:10:58.725 Just because you can design something a little more 02:10:58.725 --> 02:11:03.120 elegantly doesn't necessarily mean that it's always going to work for you. 02:11:03.120 --> 02:11:06.560 But more commonly, you are likely to run into other problems as well. 02:11:06.560 --> 02:11:08.540 There's something called a buffer overflow. 02:11:08.540 --> 02:11:10.880 And this you will surely trip over in the coming weeks. 02:11:10.880 --> 02:11:13.610 A buffer overflow is when you allocate an array 02:11:13.610 --> 02:11:15.590 and go too far past the end of it. 02:11:15.590 --> 02:11:18.650 Or you use malloc and you, nonetheless, go farther 02:11:18.650 --> 02:11:21.020 than the end of the chunk of memory that you allocated.
02:11:21.020 --> 02:11:25.010 A buffer is just a chunk of memory, so to speak, that you can use as you see 02:11:25.010 --> 02:11:25.550 fit. 02:11:25.550 --> 02:11:30.230 Buffer overflow means going beyond the boundaries of that array. 02:11:30.230 --> 02:11:32.930 You might use-- you're using, right now, video. 02:11:32.930 --> 02:11:35.125 You might know the phrase buffering from videos, 02:11:35.125 --> 02:11:37.250 like sort of buffering and annoying you on Netflix, 02:11:37.250 --> 02:11:39.050 because there's a spinning icon or whatnot. 02:11:39.050 --> 02:11:40.700 Well, that means exactly this. 02:11:40.700 --> 02:11:44.090 A buffer, in the context of YouTube or Zoom or Netflix, 02:11:44.090 --> 02:11:46.910 means some chunk of memory that was retrieved 02:11:46.910 --> 02:11:49.880 via malloc or some similar tool that gets filled 02:11:49.880 --> 02:11:52.580 with bytes comprising your video. 02:11:52.580 --> 02:11:56.210 And it's finite, which is why you can only buffer so many seconds 02:11:56.210 --> 02:11:59.520 or minutes of video before, eventually, if you're offline, 02:11:59.520 --> 02:12:01.220 you run out of video content to watch. 02:12:01.220 --> 02:12:02.930 And the stupid icon comes up, and you can 02:12:02.930 --> 02:12:07.680 watch no more, because a buffer is just a chunk of memory, an array of memory. 02:12:07.680 --> 02:12:12.830 And if Netflix or Google or others were to implement their code unsafely, 02:12:12.830 --> 02:12:16.740 they might very well go too far past that boundary as well. 02:12:16.740 --> 02:12:22.070 So with all this said, let's consider, in some of our final minutes 02:12:22.070 --> 02:12:26.000 here today, just what else we've been getting from these training wheels, 02:12:26.000 --> 02:12:28.830 because we do want to take them mostly off for you.
02:12:28.830 --> 02:12:30.890 So the CS50 library not only provides you 02:12:30.890 --> 02:12:33.855 with this abstraction of a string type, which again, 02:12:33.855 --> 02:12:35.480 doesn't give you any new functionality. 02:12:35.480 --> 02:12:38.600 Strings in C exist, just not by that name. 02:12:38.600 --> 02:12:40.850 They're known more properly as char stars. 02:12:40.850 --> 02:12:43.730 But all of these functions in the CS50 library 02:12:43.730 --> 02:12:49.490 can be implemented with other actual C functions that weren't from CS50, 02:12:49.490 --> 02:12:51.740 namely using one called scanf. 02:12:51.740 --> 02:12:54.260 But you're going to see, immediately, some of the dangers 02:12:54.260 --> 02:12:57.980 of using something like scanf, which is an old school function. 02:12:57.980 --> 02:13:01.280 It was not designed to be self-defensive like CS50's library. 02:13:01.280 --> 02:13:03.510 And so it's very easy to make mistakes. 02:13:03.510 --> 02:13:06.650 Let me go ahead, for instance, and create a file 02:13:06.650 --> 02:13:09.860 called scanf.c, just to demonstrate this function. 02:13:09.860 --> 02:13:13.200 I'm not going to use the CS50 library, just standard io dot h. 02:13:13.200 --> 02:13:15.470 And I'm going to give myself int main void. 02:13:15.470 --> 02:13:18.110 And I'm going to go ahead and give myself a variable x. 02:13:18.110 --> 02:13:21.260 And I'm going to go ahead and print out quote unquote, "x:" 02:13:21.260 --> 02:13:24.060 just like CS50's get int function does. 02:13:24.060 --> 02:13:25.940 And then I'm going to call scanf. 02:13:25.940 --> 02:13:30.170 And I'm going to go ahead and say, scan from the user's keyboard, an integer, 02:13:30.170 --> 02:13:33.708 and store it in the location of x. 02:13:33.708 --> 02:13:35.750 Then, I'm going to go ahead and print out, again, 02:13:35.750 --> 02:13:40.340 x, and a colon and a backslash percent i backslash n. 02:13:40.340 --> 02:13:41.420 And I'm going to print x. 
02:13:41.420 --> 02:13:42.830 So what's going on here? 02:13:42.830 --> 02:13:46.580 In line 5, I'm declaring a variable called x, just like in week one. 02:13:46.580 --> 02:13:49.220 Line 6, just using printf, like in week one. 02:13:49.220 --> 02:13:52.460 The interesting stuff seems to be in line 7. 02:13:52.460 --> 02:13:56.870 Scanf is a function that takes input from the user, just like get int, get 02:13:56.870 --> 02:13:58.500 string, get float, and so forth. 02:13:58.500 --> 02:14:02.630 But it does it only by you having to understand pointers, 02:14:02.630 --> 02:14:07.790 because recall from our swap example, if you want to have a function, 02:14:07.790 --> 02:14:12.110 change the contents of a variable, as we did with a and b 02:14:12.110 --> 02:14:15.920 and x and y, you have to pass in the address of the variable, whose 02:14:15.920 --> 02:14:17.060 value you want to change. 02:14:17.060 --> 02:14:19.200 You can't just pass in x itself. 02:14:19.200 --> 02:14:22.263 So if we didn't use the CS50 library in week one, 02:14:22.263 --> 02:14:25.430 you would have been writing code like this just to get an int from the user. 02:14:25.430 --> 02:14:27.347 And you would have had to understand pointers. 02:14:27.347 --> 02:14:30.170 And you would have to understand ampersand and stars and so forth. 02:14:30.170 --> 02:14:32.712 It's just too much, when all we care about in the first weeks 02:14:32.712 --> 02:14:35.990 are loops and variables and conditions and sort of the fundamentals. 02:14:35.990 --> 02:14:39.230 But here, we now have the ability to call scanf, tell it 02:14:39.230 --> 02:14:41.150 to scan from the user's keyboard, so to speak, 02:14:41.150 --> 02:14:45.380 an integer, or percent f would give us a float or other such codes, 02:14:45.380 --> 02:14:49.040 and pass in the address of x so that scanf can go to that address 02:14:49.040 --> 02:14:51.440 and put the integer from the user's keyboard there. 
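To make the address-passing concrete, here is a small sketch built on fscanf, the sibling of scanf that reads from a given stream. The helper name read_int is hypothetical and not part of the lecture's scanf.c. It also checks the return value, since the scanf family returns the number of items it successfully stored, which is one way to notice input that is not a number.

```c
#include <stdio.h>

// Hypothetical helper (not from the lecture): read one integer from `in`
// and store it at the address in `out`.
// Returns 1 on success, 0 if no integer could be read.
int read_int(FILE *in, int *out)
{
    // out is already an address, so no & is needed here.
    // fscanf returns how many items it stored (or EOF at end of input).
    return fscanf(in, "%i", out) == 1;
}
```

Reading from the keyboard as in the lecture would be `read_int(stdin, &x)`, where `&x` hands over the address of x so the function can write the user's number there.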
02:14:51.440 --> 02:14:53.030 Line 8 is like week one stuff. 02:14:53.030 --> 02:14:54.680 I'm just printing out the value. 02:14:54.680 --> 02:14:55.950 And this is pretty safe. 02:14:55.950 --> 02:14:57.800 I'm going to go ahead and make scanf. 02:14:57.800 --> 02:14:58.495 It compiles OK. 02:14:58.495 --> 02:14:59.870 I'm going to go ahead and run it. 02:14:59.870 --> 02:15:00.980 I'm going to type in 50. 02:15:00.980 --> 02:15:03.180 And voila, it prints out a 50. 02:15:03.180 --> 02:15:06.920 But there's some weirdness, because if you run this program too 02:15:06.920 --> 02:15:09.410 and type in cat, well then x is 0. 02:15:09.410 --> 02:15:10.940 And there's no error checking. 02:15:10.940 --> 02:15:12.767 So immediately, you should glimpse that one 02:15:12.767 --> 02:15:14.600 of the features of the CS50 library, recall, 02:15:14.600 --> 02:15:17.630 is that we keep prompting the user again and again if they're not 02:15:17.630 --> 02:15:19.310 cooperating and giving you an int. 02:15:19.310 --> 02:15:21.740 So that's one feature you get from the library. 02:15:21.740 --> 02:15:26.120 But it turns out that get string is even more powerful, 02:15:26.120 --> 02:15:29.000 because if I go and change this program now, not to get an int, 02:15:29.000 --> 02:15:30.710 but something fancier like a string-- 02:15:30.710 --> 02:15:33.223 or wait, we're calling it char star now. 02:15:33.223 --> 02:15:35.390 I'm going to go ahead and do something very similar. 02:15:35.390 --> 02:15:37.640 I'm going to prompt the user for string s. 02:15:37.640 --> 02:15:39.020 And I'm going to use scanf. 02:15:39.020 --> 02:15:42.320 And I'm going to use percent s, just like printf uses percent s. 02:15:42.320 --> 02:15:44.510 And I'm going to pass in s. 02:15:44.510 --> 02:15:48.890 Now, to be clear, I don't need to do ampersand s here, 02:15:48.890 --> 02:15:53.010 because now, we all know that s is fundamentally an address. 
02:15:53.010 --> 02:15:56.270 So it suffices just to pass in the address that you already have. 02:15:56.270 --> 02:16:01.280 Now, I'm going to go ahead and print out s colon, percent s backslash n, 02:16:01.280 --> 02:16:02.930 and print out s. 02:16:02.930 --> 02:16:07.730 But when I compile this, make scanf, it doesn't like it 02:16:07.730 --> 02:16:10.970 when I compile: variable s is uninitialized when used here. 02:16:10.970 --> 02:16:14.390 All right, well if I really want to be sort of adventurous, 02:16:14.390 --> 02:16:16.350 I can override make's protections. 02:16:16.350 --> 02:16:19.880 And I can just compile this manually myself using scanf-- 02:16:19.880 --> 02:16:21.260 using clang directly. 02:16:21.260 --> 02:16:23.600 That worked, dot slash scanf. 02:16:23.600 --> 02:16:26.870 Let me go ahead and type in, for instance, "HI!" 02:16:26.870 --> 02:16:29.000 and you see weirdness, null. 02:16:29.000 --> 02:16:31.190 Well, fortunately, make, and in turn clang, 02:16:31.190 --> 02:16:33.830 were kind of helping us help ourselves there. 02:16:33.830 --> 02:16:35.840 It was pointing out that you declared s. 02:16:35.840 --> 02:16:38.660 So you declared 8 bytes for a pointer. 02:16:38.660 --> 02:16:39.860 But there's nothing there. 02:16:39.860 --> 02:16:41.459 It's a garbage value. 02:16:41.459 --> 02:16:43.170 And so there's nowhere to put this. 02:16:43.170 --> 02:16:45.889 And thankfully, printf and scanf are being smart enough 02:16:45.889 --> 02:16:48.870 by not just blindly going there and plopping H, I, 02:16:48.870 --> 02:16:50.760 exclamation point, and a NUL character. 02:16:50.760 --> 02:16:52.010 They're just leaving it alone. 02:16:52.010 --> 02:16:55.910 And this parenthetical null is just a printf feature saying, you screwed up. 02:16:55.910 --> 02:16:58.100 If you see null, you've done something wrong. 02:16:58.100 --> 02:17:00.830 It's just being generous and not crashing on you.
02:17:00.830 --> 02:17:04.879 If I actually want to get user's input, I need to be smarter than this. 02:17:04.879 --> 02:17:10.040 And I need to either allocate myself 4 bytes, as we've done earlier today. 02:17:10.040 --> 02:17:14.209 Or I could go back to week two stuff and say something like, give me 4 bytes. 02:17:14.209 --> 02:17:18.830 This, though, gives me 4 bytes on the stack somewhere 02:17:18.830 --> 02:17:21.410 down here in main's frame, so to speak. 02:17:21.410 --> 02:17:23.270 These rows are called frames. 02:17:23.270 --> 02:17:27.260 If I use malloc instead, it comes from the so-called heap, 02:17:27.260 --> 02:17:29.780 which not pictured, is sort of up here. 02:17:29.780 --> 02:17:34.309 And the only difference is that if I'm using malloc, I have to use free. 02:17:34.309 --> 02:17:38.930 If I'm using the stack, as I did in week two, I don't have to use free. 02:17:38.930 --> 02:17:40.730 It's automatically managed for me. 02:17:40.730 --> 02:17:42.590 So frankly, there's so much new stuff today. 02:17:42.590 --> 02:17:46.280 I like the idea of sticking with the old school arrays. 02:17:46.280 --> 02:17:51.379 So now, though, if I go ahead and make scanf, now it compiles with make. 02:17:51.379 --> 02:17:55.610 If I then run scanf and type in, HI!, voila, it seems to work. 02:17:55.610 --> 02:17:58.549 But that's because I was smart and anticipated that H-I, 02:17:58.549 --> 02:17:59.660 OK four characters. 02:17:59.660 --> 02:18:00.980 I gave myself 4 bytes. 02:18:00.980 --> 02:18:06.110 But what if the user types in, HI THERE, DAVID, HOW ARE YOU? 02:18:06.110 --> 02:18:08.059 Clearly, more than four bytes. 02:18:08.059 --> 02:18:11.959 And I hit Enter now, something weird there happened. 02:18:11.959 --> 02:18:13.790 The rest is just lost. 02:18:13.790 --> 02:18:16.670 And this would really be annoying and very frustrating 02:18:16.670 --> 02:18:19.520 if you-- trying to get user input in the first week of the class. 
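One common way to keep scanf-style input from running past a small array is to put a maximum field width in the format string. The sketch below is an illustration added here, not the code typed in the lecture, and the helper name read_word is hypothetical. With a 4-byte array, a width of 3 leaves room for the terminating NUL character.

```c
#include <stdio.h>

// Hypothetical helper: read one whitespace-delimited word of at most
// 3 characters from `in` into a 4-byte array.
// Returns 1 if a word was read, 0 otherwise.
int read_word(FILE *in, char s[4])
{
    // "%3s" stops after 3 characters, so the '\0' that fscanf appends
    // always fits in the fourth byte.
    return fscanf(in, "%3s", s) == 1;
}
```

Anything the user types beyond the third character simply stays unread in the input, so the string is truncated rather than written past the end of the array. Growing the memory to fit the input, as get_string does, takes more work than this.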
02:18:19.520 --> 02:18:21.500 Get string avoids this for you. 02:18:21.500 --> 02:18:23.719 Get string calls malloc for you. 02:18:23.719 --> 02:18:27.200 And it calls it for as big a chunk of memory as the string 02:18:27.200 --> 02:18:28.070 the human types in. 02:18:28.070 --> 02:18:30.980 Long story short, we sort of watch what they're typing character 02:18:30.980 --> 02:18:32.209 by character by character. 02:18:32.209 --> 02:18:34.340 And we make sure to allocate or reallocate 02:18:34.340 --> 02:18:38.879 just enough memory to fit whatever it is the human has typed in. 02:18:38.879 --> 02:18:42.107 So scanf is, essentially, how a function like the CS50 library 02:18:42.107 --> 02:18:43.190 works underneath the hood. 02:18:43.190 --> 02:18:46.650 But it is doing all of this for you. 02:18:46.650 --> 02:18:49.549 And as soon as you take away training wheels like that, or frankly, 02:18:49.549 --> 02:18:52.469 libraries like that, which it really is at the end of the day. 02:18:52.469 --> 02:18:53.719 It's not just a teaching tool. 02:18:53.719 --> 02:18:55.070 It's a useful library. 02:18:55.070 --> 02:18:58.469 You have to start implementing more of this low-level stuff yourself. 02:18:58.469 --> 02:18:59.810 So again, there is a trade off. 02:18:59.810 --> 02:19:02.727 If you don't want to use something like the CS50 library, that's fine. 02:19:02.727 --> 02:19:08.400 Now, the onus is on you to avoid all of these possible error conditions. 02:19:08.400 --> 02:19:11.209 All right, with that said, we have one final feature 02:19:11.209 --> 02:19:14.270 to give you in order to motivate this week's problems, wherein 02:19:14.270 --> 02:19:18.230 you'll actually explore and manipulate and write code to change files. 02:19:18.230 --> 02:19:22.790 And for that, we need one final topic of file I/O. File I/O 02:19:22.790 --> 02:19:27.350 is the term of art that describes taking input and output from files. 
02:19:27.350 --> 02:19:30.980 Pretty much every program we've written thus far just uses memory, like this 02:19:30.980 --> 02:19:32.924 here, whereby, you can put stuff in memory. 02:19:32.924 --> 02:19:34.549 But as soon as your program ends, boom. 02:19:34.549 --> 02:19:35.330 It's gone. 02:19:35.330 --> 02:19:37.070 The contents of memory are gone. 02:19:37.070 --> 02:19:39.770 Files, of course, are where you and I in the computing world 02:19:39.770 --> 02:19:42.020 save our essays and documents and resumes 02:19:42.020 --> 02:19:44.629 and all of that permanently on your computer. 02:19:44.629 --> 02:19:48.590 In C, you have the ability, certainly, to write code yourself that 02:19:48.590 --> 02:19:50.730 saves files long term. 02:19:50.730 --> 02:19:53.450 So for instance, let me go ahead and write my own program here, 02:19:53.450 --> 02:19:59.260 a phonebook program that stores names and numbers in a file. 02:19:59.260 --> 02:20:02.380 I'm going to go ahead and include, just for convenience, the CS50 library 02:20:02.380 --> 02:20:04.480 again, because I don't want to deal with scanf. 02:20:04.480 --> 02:20:08.200 I'm going to go ahead and save this, incidentally, as phonebook.c. 02:20:08.200 --> 02:20:12.370 I'm going to go ahead and include, not just the CS50 library, but standard io. 02:20:12.370 --> 02:20:18.373 And preemptively, I'm going to go ahead and include string.h as well. 02:20:18.373 --> 02:20:20.290 And I'm going to go ahead in my main function. 02:20:20.290 --> 02:20:23.990 And I'm going to use a few new functions that we'll see only briefly here. 02:20:23.990 --> 02:20:27.260 But in the next problem set, you will explore these in more detail. 02:20:27.260 --> 02:20:29.980 I'm going to give myself a pointer to a file. 02:20:29.980 --> 02:20:33.820 It turns out, weirdly, that in all caps, FILE, 02:20:33.820 --> 02:20:38.540 this is a new data type that does come with C that represents a file.
02:20:38.540 --> 02:20:42.383 So I'm going to go ahead and give myself a pointer to a file, 02:20:42.383 --> 02:20:43.300 the address of a file. 02:20:43.300 --> 02:20:44.800 And I'm going to call the variable file. 02:20:44.800 --> 02:20:46.300 I could call it f. I could call it x. 02:20:46.300 --> 02:20:49.130 I'm going to call it lowercase file, just to be clear. 02:20:49.130 --> 02:20:52.180 And I'm going to use a new function called fopen, which means file open. 02:20:52.180 --> 02:20:54.077 And file open takes two arguments. 02:20:54.077 --> 02:20:57.160 It takes the first argument, which is the name of a file you want to open. 02:20:57.160 --> 02:20:59.638 I'm going to open a file called phonebook.csv. 02:20:59.638 --> 02:21:02.680 And then I'm going to go ahead and open it, specifically, in append mode. 02:21:02.680 --> 02:21:05.050 Long story short, you can open files in different ways, 02:21:05.050 --> 02:21:08.450 to read them, that is just look at their contents, to write them, 02:21:08.450 --> 02:21:10.780 which is to change their contents entirely, 02:21:10.780 --> 02:21:15.730 or to append to them, a, which means to add row by row to them, 02:21:15.730 --> 02:21:18.370 so to keep tacking on more information to them. 02:21:18.370 --> 02:21:20.210 I'm going to go ahead and, just to be safe, 02:21:20.210 --> 02:21:23.650 I'm going to say if file equals equals NULL, 02:21:23.650 --> 02:21:26.180 because recall that NULL signifies something went wrong, 02:21:26.180 --> 02:21:27.280 let's just return now. 02:21:27.280 --> 02:21:28.960 Maybe I mistyped the name of the file. 02:21:28.960 --> 02:21:29.950 Maybe it doesn't exist. 02:21:29.950 --> 02:21:31.420 Something went wrong, potentially. 02:21:31.420 --> 02:21:34.660 I'm going to check for that by saying, if file equals equals NULL, just 02:21:34.660 --> 02:21:36.178 quit out of the program now. 02:21:36.178 --> 02:21:38.470 But after that, I'm going to go ahead and get a string.
02:21:38.470 --> 02:21:41.920 But we can call that char star now, called name. 02:21:41.920 --> 02:21:44.440 And I'm going to ask the user for a name. 02:21:44.440 --> 02:21:45.820 And we've done this before. 02:21:45.820 --> 02:21:48.610 I'm going to go ahead and ask them for a number, phone number. 02:21:48.610 --> 02:21:49.970 And we've done this before. 02:21:49.970 --> 02:21:52.690 The only difference, now, is I'm calling string char star. 02:21:52.690 --> 02:21:54.400 And now, here's the cool part. 02:21:54.400 --> 02:21:56.830 It turns out, if I want to save this name and number 02:21:56.830 --> 02:21:58.990 to that file permanently in a CSV-- 02:21:58.990 --> 02:22:02.170 if unfamiliar, popular in the consulting world, the analytics world. 02:22:02.170 --> 02:22:04.900 It's just a spreadsheet, a comma-separated value 02:22:04.900 --> 02:22:08.470 file that you can open in Excel or Numbers or Google Sheets. 02:22:08.470 --> 02:22:13.660 I'm going to go ahead and, not printf, but fprintf to that file, 02:22:13.660 --> 02:22:18.580 a string followed by a comma, followed by a string, followed by a new line, 02:22:18.580 --> 02:22:21.070 plugging in the name and the number. 02:22:21.070 --> 02:22:25.280 And then down here, I'm going to close the file. 02:22:25.280 --> 02:22:28.570 So this is new. fprintf is not printf, which prints to your screen. 02:22:28.570 --> 02:22:30.307 fprintf prints to a file. 02:22:30.307 --> 02:22:32.890 So you have to pass in one more argument, the first one, which 02:22:32.890 --> 02:22:37.150 is the pointer to the file that you want to send these new strings to. 02:22:37.150 --> 02:22:40.180 Then you still provide a format string, which says, hey fprintf, 02:22:40.180 --> 02:22:43.060 this is the kind of data I want to print to the file. 02:22:43.060 --> 02:22:46.930 And then you plug in the variables, just like we've always done with printf. 02:22:46.930 --> 02:22:49.610 And then lastly, we close the file.
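The fopen, fprintf, and fclose steps just described can be sketched roughly as follows. This is not the phonebook.c typed on screen: the helper name append_entry and its return codes are assumptions made for illustration, and the name and number are passed in as arguments rather than read with get_string, so the sketch does not depend on the CS50 library.

```c
#include <stdio.h>

// Hypothetical helper (the name is not from the lecture): appends one
// "name,number" row to the CSV file at path.
// Returns 0 on success and 1 if the file could not be opened.
int append_entry(const char *path, const char *name, const char *number)
{
    // "a" opens the file in append mode; it is created if it does not exist
    FILE *file = fopen(path, "a");
    if (file == NULL)
    {
        return 1;
    }

    // Like printf, but the first argument is the file to print to
    fprintf(file, "%s,%s\n", name, number);

    // Close the file so the new row is flushed to disk
    fclose(file);
    return 0;
}
```

From a main function, a call such as append_entry("phonebook.csv", name, number) after prompting for the two strings would mirror what the lecture's program does each time it is run.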
02:22:49.610 --> 02:22:53.200 So in short, this program would seem to prompt a human for a name and number. 02:22:53.200 --> 02:22:55.420 And then it's going to go ahead and write those names 02:22:55.420 --> 02:22:56.990 and numbers to the file. 02:22:56.990 --> 02:22:59.035 So let me go ahead and make phonebook. 02:22:59.035 --> 02:23:07.810 OK, no mistake so far, dot slash phonebook, David, 949-468-2750. 02:23:07.810 --> 02:23:11.140 OK, let me run it once more, even though nothing seems to have happened. 02:23:11.140 --> 02:23:15.730 Brian, how about 617-495-1000, Enter. 02:23:15.730 --> 02:23:17.950 Let me check my file browser here. 02:23:17.950 --> 02:23:22.240 Notice, all of the files we've created today, including, if I zoom in, 02:23:22.240 --> 02:23:25.390 not just phonebook.c, but phonebook.csv. 02:23:25.390 --> 02:23:29.290 And if I double click that, notice what's inside of this. 02:23:29.290 --> 02:23:33.700 Voila, David's name, Brian's name, and each of our numbers. 02:23:33.700 --> 02:23:36.280 And even cooler than that, let me go ahead and close this. 02:23:36.280 --> 02:23:40.213 Let me go ahead and download this file using the IDE. 02:23:40.213 --> 02:23:42.380 And that's going to put it into my Downloads folder. 02:23:42.380 --> 02:23:43.420 Let me go ahead and click on it. 02:23:43.420 --> 02:23:45.545 And it's going to open Excel or Numbers or whatever 02:23:45.545 --> 02:23:47.470 you happen to have on your Mac or PC. 02:23:47.470 --> 02:23:50.740 I'm going to go ahead and just proceed. 02:23:50.740 --> 02:23:54.400 And voila, looks a little stupid in this formatting here. 02:23:54.400 --> 02:23:57.160 But I've opened up a spreadsheet that I, myself, generated 02:23:57.160 --> 02:24:01.390 using fopen, fprintf, and fclose. 02:24:01.390 --> 02:24:04.180 So already, now that we have pointers at our disposal, 02:24:04.180 --> 02:24:08.292 can we actually manipulate things like files, which is quite cool. 
02:24:08.292 --> 02:24:10.000 But we're going to do that this week, not 02:24:10.000 --> 02:24:12.940 with text, but with actual specific types of files. 02:24:12.940 --> 02:24:16.840 And indeed, recall this kind of thinking here. 02:24:16.840 --> 02:24:19.150 If you glance at this, it's probably pretty cryptic. 02:24:19.150 --> 02:24:21.400 It looks like machine code, but it's not. 02:24:21.400 --> 02:24:24.070 This is, perhaps, the simplest representation 02:24:24.070 --> 02:24:26.410 of a smiley face inside of a file. 02:24:26.410 --> 02:24:31.000 If you have a bitmap file, a map of bits, a grid of bits, those bits, 02:24:31.000 --> 02:24:33.130 quite simply, could literally be 0's and 1's. 02:24:33.130 --> 02:24:37.240 And if you assign the color black to 0 and the color white to 1, 02:24:37.240 --> 02:24:40.660 you could actually think of this same grid of 0's and 1's as representing, 02:24:40.660 --> 02:24:41.930 indeed, a smiley face. 02:24:41.930 --> 02:24:43.690 In other words, here are some pixels. 02:24:43.690 --> 02:24:45.520 We talked about pixels in week zero. 02:24:45.520 --> 02:24:49.567 Pixels are just the dots that compose a graphic file on your computer. 02:24:49.567 --> 02:24:50.650 And pixels are everywhere. 02:24:50.650 --> 02:24:53.320 All of us, now, tuning in live via Zoom or YouTube or the like, 02:24:53.320 --> 02:24:56.800 we're watching streams of pixels, which compose multiple images and multiple 02:24:56.800 --> 02:25:02.290 images compose video that appears to be moving at, like, 20 something or 30 02:25:02.290 --> 02:25:04.670 frames per second, images per second. 02:25:04.670 --> 02:25:08.530 Now, of course, there's only so much fidelity in these kinds of images. 
02:25:08.530 --> 02:25:11.097 And it's quite common in the case on TV and in movies, 02:25:11.097 --> 02:25:13.930 if there's some bad guy that's been picked up with some surveillance 02:25:13.930 --> 02:25:17.050 footage or the like, invariably, the folks on Law & Order and the like 02:25:17.050 --> 02:25:19.930 can just kind of enhance the video and zoom in and see 02:25:19.930 --> 02:25:24.710 exactly the glint in the person's eye that reveals who committed some crime. 02:25:24.710 --> 02:25:26.140 Well, that's all kind of nonsense. 02:25:26.140 --> 02:25:29.367 And it derives from some of the primitives we introduced in week zero. 02:25:29.367 --> 02:25:31.450 In fact, just to poke fun at this, let me go ahead 02:25:31.450 --> 02:25:34.990 and play on a few seconds of this TV show here in the US 02:25:34.990 --> 02:25:39.670 called CSI, just to give you a sense of just how commonplace this kind of logic 02:25:39.670 --> 02:25:40.180 is. 02:25:40.180 --> 02:25:41.140 [VIDEO PLAYBACK] 02:25:41.140 --> 02:25:43.330 - We know. 02:25:43.330 --> 02:25:46.930 - That at 9:15, Ray Santoya was at the ATM. 02:25:46.930 --> 02:25:50.380 - So the question is, what was he doing at 9:16? 02:25:50.380 --> 02:25:53.180 - Shooting the 9 millimeter at something. 02:25:53.180 --> 02:25:54.820 Maybe he saw the sniper. 02:25:54.820 --> 02:25:56.920 - Or was working with him. 02:25:56.920 --> 02:25:59.490 - Wait, go back one. 02:25:59.490 --> 02:26:00.481 - What do you see? 02:26:00.481 --> 02:26:05.291 [CLICKING] 02:26:07.700 --> 02:26:11.420 - Bring his face up, full screen. 02:26:11.420 --> 02:26:12.530 - His glasses. 02:26:12.530 --> 02:26:13.982 - There's a reflection. 02:26:13.982 --> 02:26:17.426 [TYPING] 02:26:23.840 --> 02:26:25.620 - That's Neuvitas baseball team. 02:26:25.620 --> 02:26:26.630 That's their logo. 02:26:26.630 --> 02:26:29.075 - And he's talking to whoever's wearing that jacket. 02:26:29.075 --> 02:26:31.160 - We may have a witness. 
02:26:31.160 --> 02:26:32.700 - To both shootings. 02:26:32.700 --> 02:26:33.283 [END PLAYBACK] 02:26:33.283 --> 02:26:36.408 DAVID MALAN: So unfortunately, today will rather ruin a lot of TV and movie 02:26:36.408 --> 02:26:38.650 for you, because you can't just zoom in infinitely 02:26:38.650 --> 02:26:41.250 and see more information if that information is not there. 02:26:41.250 --> 02:26:43.750 At the end of the day, there's only a finite number of bits. 02:26:43.750 --> 02:26:46.120 And case in point, here's a photograph of Brian. 02:26:46.120 --> 02:26:48.580 And you might see that, oh, there's a glint in his eye. 02:26:48.580 --> 02:26:50.930 Let's see what was being reflected in his eye there. 02:26:50.930 --> 02:26:53.410 And so if we zoom in on this image here of Brian, 02:26:53.410 --> 02:26:57.730 and maybe we zoom in a little further, that's all that's actually there. 02:26:57.730 --> 02:27:00.160 You can't just click the enhance button and see more, 02:27:00.160 --> 02:27:02.368 because at the end of the day, these are just pixels. 02:27:02.368 --> 02:27:06.310 And pixels, per week zero, are just 0's and 1's, and finitely many so. 02:27:06.310 --> 02:27:08.470 So what you see is what you get. 02:27:08.470 --> 02:27:12.190 Now, with that said-- and actually, we can poke fun at this, too, here. 02:27:12.190 --> 02:27:14.830 Let me just play one other short clip from Futurama, 02:27:14.830 --> 02:27:18.423 which kind of hammers home this point as well, but more playfully so. 02:27:18.423 --> 02:27:19.090 [VIDEO PLAYBACK] 02:27:19.090 --> 02:27:23.250 - Magnify that death sphere. 02:27:23.250 --> 02:27:24.770 Why is it still blurry? 02:27:24.770 --> 02:27:26.710 - That's all the resolution we have. 02:27:26.710 --> 02:27:29.050 Making it bigger doesn't make it clearer. 02:27:29.050 --> 02:27:31.220 - It does on CSI: Miami.
02:27:31.220 --> 02:27:32.020 - [SIGH] 02:27:32.020 --> 02:27:32.170 [END PLAYBACK] 02:27:32.170 --> 02:27:35.212 DAVID MALAN: So there, we have two clips talking, rather, to one another. 02:27:35.212 --> 02:27:37.330 But I have to update things for 2020. 02:27:37.330 --> 02:27:41.972 You can't really pick up the internet these days or magazine these days, 02:27:41.972 --> 02:27:43.930 if you even would, that doesn't somehow mention 02:27:43.930 --> 02:27:45.850 machine learning and artificial intelligence 02:27:45.850 --> 02:27:48.005 and fancy algorithms via which you can do things 02:27:48.005 --> 02:27:49.630 that previously weren't quite possible. 02:27:49.630 --> 02:27:51.460 And that's actually kinda sorta the case. 02:27:51.460 --> 02:27:56.290 You might recall from week zero, that we found this beautiful watercolor 02:27:56.290 --> 02:28:00.250 painting in the Harvard archives that's only about 11 inches tall total. 02:28:00.250 --> 02:28:03.700 And yet somehow, it's 13 feet tall here behind me. 02:28:03.700 --> 02:28:06.533 Now, normally, if you were to just enhance this watercolor painting, 02:28:06.533 --> 02:28:08.658 it would start to look pretty stupid pretty quickly 02:28:08.658 --> 02:28:10.570 with lots and lots of pixelation, even if you 02:28:10.570 --> 02:28:12.940 used a very fancy camera, as the archives do, 02:28:12.940 --> 02:28:14.440 to capture the original image. 02:28:14.440 --> 02:28:16.810 But we wanted to blow it up to 13 feet tall 02:28:16.810 --> 02:28:21.110 so that it would stand at high quality behind us this whole time. 02:28:21.110 --> 02:28:24.790 And there, we actually did use enhance, in some sense. 
02:28:24.790 --> 02:28:28.640 So using, long story short, fancier algorithms than those last week, 02:28:28.640 --> 02:28:31.690 you can use artificial intelligence, machine learning, 02:28:31.690 --> 02:28:36.130 to actually analyze data and find patterns where there weren't-- 02:28:36.130 --> 02:28:38.280 that aren't necessarily visible to the human eye. 02:28:38.280 --> 02:28:41.590 So for instance, if we take the original here and start to zoom in, 02:28:41.590 --> 02:28:43.600 it looks pretty good at this resolution. 02:28:43.600 --> 02:28:44.720 But it's pretty smooth. 02:28:44.720 --> 02:28:48.730 You don't really see the fact that this was paint on an actual canvas. 02:28:48.730 --> 02:28:50.707 So this was just zooming in on Photoshop. 02:28:50.707 --> 02:28:52.540 But when you actually run an image like this 02:28:52.540 --> 02:28:55.990 through fancy machine learning-based software, artificial intelligence, 02:28:55.990 --> 02:28:58.570 you can begin to improve it and actually see, 02:28:58.570 --> 02:29:01.390 not just this window from the top of one of the buildings, which 02:29:01.390 --> 02:29:03.520 is pretty glossed over here in Photoshop, 02:29:03.520 --> 02:29:05.480 you can start to see more detail. 02:29:05.480 --> 02:29:08.750 So this is literally the before, just zooming in Photoshop. 02:29:08.750 --> 02:29:12.572 This is after, actually applying fancy artificial intelligence algorithms 02:29:12.572 --> 02:29:15.280 that notice, wait a minute, there's a little discoloration there. 02:29:15.280 --> 02:29:17.072 Wait, there's a little discoloration there. 02:29:17.072 --> 02:29:20.830 And nowadays, enhance is increasingly becoming a thing. 02:29:20.830 --> 02:29:22.450 It's still inferring. 02:29:22.450 --> 02:29:25.270 It's not resurrecting information that was necessarily there. 02:29:25.270 --> 02:29:28.240 It's doing its best guess, really, algorithmically, 02:29:28.240 --> 02:29:30.487 to reconstruct what the image actually was. 
02:29:30.487 --> 02:29:32.320 And if we zoom in further, you can, perhaps, 02:29:32.320 --> 02:29:35.440 see that this is really starting to get blurry if you just use Photoshop 02:29:35.440 --> 02:29:36.578 and keep zooming in. 02:29:36.578 --> 02:29:38.620 But if you run it through fancy enough algorithms 02:29:38.620 --> 02:29:40.780 and start to notice slight discolorations that 02:29:40.780 --> 02:29:44.920 aren't super visible to the human eye, we can enhance that even further. 02:29:44.920 --> 02:29:46.540 And you can't do it infinitely so. 02:29:46.540 --> 02:29:48.550 And in some sense, we're creating information 02:29:48.550 --> 02:29:51.282 where there isn't necessarily that information there. 02:29:51.282 --> 02:29:54.490 So whether or not these kinds of things hold up in court is another question. 02:29:54.490 --> 02:29:56.920 But it can improve the fidelity of images like this. 02:29:56.920 --> 02:30:02.570 And indeed, it allowed us to zoom in from 11 inches to 13 feet instead. 02:30:02.570 --> 02:30:05.920 So when it comes to manipulating images, ultimately, we 02:30:05.920 --> 02:30:10.030 do have some programmatic capabilities, including this file pointer, 02:30:10.030 --> 02:30:13.280 like we just saw, and also, a few other functions as well. 02:30:13.280 --> 02:30:15.550 And our final examples, here, will lay the foundation 02:30:15.550 --> 02:30:17.380 for what you'll do this coming week, which 02:30:17.380 --> 02:30:21.250 is manipulate your very own graphical files with a newfound understanding 02:30:21.250 --> 02:30:25.270 of pointers and addresses and now files and input and output. 02:30:25.270 --> 02:30:30.010 For instance, I'm going to go ahead and open up a program here called-- 02:30:30.010 --> 02:30:32.110 give me just one second. 02:30:32.110 --> 02:30:37.660 I'm going to open up a program here called jpeg.c. 
02:30:37.660 --> 02:30:40.610 And this program, jpeg.c, which I wrote in advance, 02:30:40.610 --> 02:30:43.400 which is on the course's website, does the following. 02:30:43.400 --> 02:30:46.510 It first declares a type called byte. 02:30:46.510 --> 02:30:49.990 It turns out, in C, there's no common definition of what a byte is. 02:30:49.990 --> 02:30:51.610 A byte, as we know it, is 8 bits. 02:30:51.610 --> 02:30:53.680 And it turns out, the simplest way to create 02:30:53.680 --> 02:30:57.250 a byte is to define our own, just like we've defined a string, 02:30:57.250 --> 02:31:01.840 just like we've defined other types too, like a student, in order-- 02:31:01.840 --> 02:31:04.640 a person, rather, in order to give us a byte. 02:31:04.640 --> 02:31:07.210 So this first line of code just declares a data type 02:31:07.210 --> 02:31:11.830 called byte, using another, more arcane data type called uint8_t. 02:31:11.830 --> 02:31:13.330 But more on that in the problem set. 02:31:13.330 --> 02:31:15.820 But this just did invent something called byte. 02:31:15.820 --> 02:31:17.928 Notice, in this program, I'm resurrecting the idea 02:31:17.928 --> 02:31:21.220 from week two of command line arguments, where we can take input from the user. 02:31:21.220 --> 02:31:23.860 Notice that I'm checking if the user typed in two arguments. 02:31:23.860 --> 02:31:27.520 And if not, I'm returning one immediately to signify error. 02:31:27.520 --> 02:31:30.490 In line 17, I'm using my new technique. 02:31:30.490 --> 02:31:34.210 I'm opening a file using the name of the file 02:31:34.210 --> 02:31:36.050 that the human typed at the command line. 02:31:36.050 --> 02:31:40.270 And this time, I'm opening it to read it with quote unquote, r instead of a.
02:31:40.270 --> 02:31:41.660 But if there's not a file-- 02:31:41.660 --> 02:31:44.920 so if bang file, that is, if exclamation point file, 02:31:44.920 --> 02:31:47.990 or if file equals equals NULL, those mean the same thing. 02:31:47.990 --> 02:31:51.040 I can go ahead and return one, signifying an error. 02:31:51.040 --> 02:31:53.710 Down here, I'm doing something a little clever. 02:31:53.710 --> 02:31:56.890 It turns out that with very high probability, 02:31:56.890 --> 02:32:01.640 you can determine if any file is a jpeg by looking only at its first three 02:32:01.640 --> 02:32:02.140 bytes. 02:32:02.140 --> 02:32:04.720 A lot of file formats have what are called magic numbers 02:32:04.720 --> 02:32:06.350 at the beginning of their files. 02:32:06.350 --> 02:32:10.990 And these are industry standard numbers, 1 or 2 or 3 or more of them, 02:32:10.990 --> 02:32:13.910 that is just commonly expected to be at the beginning of a file, 02:32:13.910 --> 02:32:16.240 so that a program can quickly check, is this a jpeg? 02:32:16.240 --> 02:32:16.960 Is this a gif? 02:32:16.960 --> 02:32:18.070 Is this a Word document? 02:32:18.070 --> 02:32:19.300 Is this an Excel file? 02:32:19.300 --> 02:32:21.910 They tend to have these numbers at the beginning of them. 02:32:21.910 --> 02:32:26.020 And jpegs have a sequence of bytes that we're about to see. 02:32:26.020 --> 02:32:29.770 This line of code 24 here, as you'll see in the next problem set, 02:32:29.770 --> 02:32:33.070 is how you might give yourself a buffer of bytes, specifically 02:32:33.070 --> 02:32:35.320 an array of three bytes. 02:32:35.320 --> 02:32:38.380 This next line of code, as you'll see this coming week, is called fread. 02:32:38.380 --> 02:32:40.720 fread, as the name suggests, reads from a file. 02:32:40.720 --> 02:32:42.940 That is, it grabs bytes from a file. 
02:32:42.940 --> 02:32:45.790 And it's a little fancy to use, but you'll get more comfortable 02:32:45.790 --> 02:32:47.140 with this over time. 02:32:47.140 --> 02:32:52.060 It reads into this buffer, its first argument, the size of this data type, 02:32:52.060 --> 02:32:53.050 the size of a byte. 02:32:53.050 --> 02:32:58.250 And it reads in this many of those data types from this file. 02:32:58.250 --> 02:33:01.480 So again, it's four arguments, which is kind of a lot from what we've seen. 02:33:01.480 --> 02:33:08.230 But it reads from this file, three bytes into this array, 02:33:08.230 --> 02:33:09.770 a.k.a. buffer, called bytes. 02:33:09.770 --> 02:33:13.460 So this is just how you write code that doesn't put data in a file, 02:33:13.460 --> 02:33:14.650 but reads it from it. 02:33:14.650 --> 02:33:16.700 And then here, notice our hexadecimal. 02:33:16.700 --> 02:33:18.190 So we've come full circle. 02:33:18.190 --> 02:33:23.110 If bytes bracket 0 equals equals 0xff and bytes 02:33:23.110 --> 02:33:27.080 bracket 1 equals equals 0xd8 and bytes bracket 2 equals equals 0xff, 02:33:27.080 --> 02:33:28.960 this definitely looks cryptic to you. 02:33:28.960 --> 02:33:31.570 But that's just because I looked up in the manual for jpegs, 02:33:31.570 --> 02:33:34.900 and it turns out that almost any jpeg, rather, 02:33:34.900 --> 02:33:39.430 must start with 0xff, 0xd8, 0xff. 02:33:39.430 --> 02:33:43.450 Those are the first three bytes of any jpeg on your Mac, your PC, 02:33:43.450 --> 02:33:44.350 on the internet. 02:33:44.350 --> 02:33:46.300 There are always those three bytes. 02:33:46.300 --> 02:33:50.500 It turns out, the fourth byte further decides whether or not 02:33:50.500 --> 02:33:51.730 a file is actually a jpeg. 02:33:51.730 --> 02:33:54.640 But the algorithm for that's a little fancier, so I kept it simple. 02:33:54.640 --> 02:33:59.020 If the first three bytes of a file are those, maybe you have a jpeg.
02:33:59.020 --> 02:34:01.150 But if you don't have exactly those three bytes, 02:34:01.150 --> 02:34:02.920 you definitely don't have a jpeg. 02:34:02.920 --> 02:34:05.270 And so what I can do, here, is as follows. 02:34:05.270 --> 02:34:09.700 In today's code-- let me go ahead and grab two other files 02:34:09.700 --> 02:34:11.620 that I brought with me. 02:34:11.620 --> 02:34:16.210 And one happens to be a photograph again. 02:34:16.210 --> 02:34:18.160 Give me one second. 02:34:18.160 --> 02:34:24.010 I brought with me a few files, one of which is called brian.jpeg, 02:34:24.010 --> 02:34:25.870 which is the same photo of Brian. 02:34:25.870 --> 02:34:28.030 And then I have a gif, which of course, is not 02:34:28.030 --> 02:34:31.210 a jpeg, that is this cat typing here. 02:34:31.210 --> 02:34:33.250 And what I, effectively, have in front of me now 02:34:33.250 --> 02:34:37.870 is a program that if I do make jpeg, because this file is jpeg.c, 02:34:37.870 --> 02:34:43.360 and I run dot slash jpeg, I can type in something like cat.gif 02:34:43.360 --> 02:34:46.990 at the command line as an argument, hit Enter, and I should see no. 02:34:46.990 --> 02:34:51.550 By contrast, if I pass in Brian's jpeg at the command line as an argument, 02:34:51.550 --> 02:34:52.630 I see maybe. 02:34:52.630 --> 02:34:54.430 And again, maybe only because the algorithm 02:34:54.430 --> 02:34:56.638 for actually adjudicating whether something is a jpeg 02:34:56.638 --> 02:34:58.550 is a little more complicated than that. 02:34:58.550 --> 02:35:02.590 But indeed, I can now access the individual bytes, and therefore pixels, 02:35:02.590 --> 02:35:06.310 it would seem, of an image file. 02:35:06.310 --> 02:35:08.575 And in fact, we can even do this. 
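The three-byte check walked through above might look something like this. It is a sketch, not the jpeg.c from the course's website: the function name looks_like_jpeg and its return codes are assumptions, and the file is opened with "rb" rather than "r" (the two behave the same on Linux; the b only matters on systems that treat text and binary files differently).

```c
#include <stdint.h>
#include <stdio.h>

// Our own byte type, built on the standard 8-bit unsigned integer
typedef uint8_t BYTE;

// Hypothetical helper (the name is not from the lecture).
// Returns  1 if the file starts with 0xff 0xd8 0xff ("maybe" a JPEG),
//          0 if it does not ("no"),
//         -1 if the file could not be opened.
int looks_like_jpeg(const char *path)
{
    FILE *file = fopen(path, "rb");
    if (!file)
    {
        return -1;
    }

    // Buffer of three bytes; fread returns how many items it actually read
    BYTE bytes[3];
    size_t count = fread(bytes, sizeof(BYTE), 3, file);
    fclose(file);

    // A file shorter than three bytes cannot match the signature
    if (count == 3 && bytes[0] == 0xff && bytes[1] == 0xd8 && bytes[2] == 0xff)
    {
        return 1;
    }
    return 0;
}
```

A main function could call this on argv[1] and print "maybe" or "no" accordingly. Checking fread's return value is an addition here, so that a very short file is not compared against uninitialized bytes.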
02:35:08.575 --> 02:35:10.450 Let me go ahead and show you one last program 02:35:10.450 --> 02:35:13.960 that we wrote deliberately in advance, just to give you a taste of what's 02:35:13.960 --> 02:35:15.790 coming with the next problem set. 02:35:15.790 --> 02:35:19.480 This program is a reimplementation of the program you've probably 02:35:19.480 --> 02:35:21.820 used one or more times called cp. 02:35:21.820 --> 02:35:25.570 Recall that cp is a program in the IDE and in Linux, 02:35:25.570 --> 02:35:27.730 more generally, that allows you to copy a file. 02:35:27.730 --> 02:35:31.660 You do cp, space, the filename, space, the new filename. 02:35:31.660 --> 02:35:32.650 How does this work? 02:35:32.650 --> 02:35:37.090 I now have all of the building blocks with which to copy files myself. 02:35:37.090 --> 02:35:39.100 So again, I'm defining a byte up here. 02:35:39.100 --> 02:35:41.930 I'm defining main as taking command line arguments here. 02:35:41.930 --> 02:35:43.000 And notice one change. 02:35:43.000 --> 02:35:44.800 I'm not using the CS50 library. 02:35:44.800 --> 02:35:52.090 So even what was previously string in week two is now char star. 02:35:52.090 --> 02:35:55.450 Even here for argv, I'm making sure that the human types 02:35:55.450 --> 02:36:00.580 in three words, the program's name and the source file and the destination 02:36:00.580 --> 02:36:01.180 file. 02:36:01.180 --> 02:36:02.410 I'm using fopen again. 02:36:02.410 --> 02:36:06.100 I'm opening the source file here from argv[1]. 02:36:06.100 --> 02:36:07.358 I'm making sure it's not NULL. 02:36:07.358 --> 02:36:08.650 And then I'm quitting if it is. 02:36:08.650 --> 02:36:13.030 I'm then-- here's something new, opening the destination file here, also 02:36:13.030 --> 02:36:13.870 with fopen. 02:36:13.870 --> 02:36:15.700 But I'm using quote unquote, "w."
02:36:15.700 --> 02:36:19.630 I'm opening one file with r, one file for w, because I want to read from one 02:36:19.630 --> 02:36:21.160 and write to the other. 02:36:21.160 --> 02:36:25.360 And then down here, this loop is a clever way 02:36:25.360 --> 02:36:27.370 of copying one file to another. 02:36:27.370 --> 02:36:30.790 I'm giving myself a buffer of one byte, so just a temporary variable, just 02:36:30.790 --> 02:36:33.090 like Brian's temp or empty glass. 02:36:33.090 --> 02:36:35.160 And I'm using this function, fread. 02:36:35.160 --> 02:36:39.750 I'm reading into that buffer via its address, the size of a byte, 02:36:39.750 --> 02:36:42.870 specifically one byte from the source file. 02:36:42.870 --> 02:36:47.940 And then, in that same loop, I'm writing from that buffer, the size of a byte, 02:36:47.940 --> 02:36:50.950 specifically one byte, to the destination. 02:36:50.950 --> 02:36:53.760 So literally, the cp program you might have seen me use 02:36:53.760 --> 02:36:57.090 or you yourself have used to copy files, is literally doing this. 02:36:57.090 --> 02:36:59.790 It's opening one file, iterating over all of its bytes, 02:36:59.790 --> 02:37:02.010 and copying them from source to destination. 02:37:02.010 --> 02:37:04.260 And then lastly, it's closing the file. 02:37:04.260 --> 02:37:06.360 And these last two examples were deliberately fast, 02:37:06.360 --> 02:37:11.130 because this whole week will be spent diving into file I/O and images 02:37:11.130 --> 02:37:11.890 thereof. 02:37:11.890 --> 02:37:16.560 But all that we've done is use these fread, fopen, and fwrite and fclose, 02:37:16.560 --> 02:37:18.610 to manipulate those very files. 02:37:18.610 --> 02:37:21.975 So for instance, if I now do this, let me do make cp. 02:37:21.975 --> 02:37:25.800 OK, seems to compile, dot slash cp, brian.jpeg. 02:37:25.800 --> 02:37:27.750 How about brian2.jpeg? 02:37:27.750 --> 02:37:28.680 And hit Enter. 02:37:28.680 --> 02:37:29.880 Nothing seems to happen.
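The byte-by-byte copy loop described here can be sketched as below. It is an approximation of the idea rather than the course's cp.c: the function name copy_file and its return codes are assumptions, and the files are opened with "rb" and "wb" instead of "r" and "w" (equivalent on Linux).

```c
#include <stdint.h>
#include <stdio.h>

// Our own byte type, built on the standard 8-bit unsigned integer
typedef uint8_t BYTE;

// Hypothetical helper (the name is not from the lecture): copies the file
// at src to dst one byte at a time.
// Returns 0 on success and 1 if either file could not be opened.
int copy_file(const char *src, const char *dst)
{
    FILE *source = fopen(src, "rb");
    if (source == NULL)
    {
        return 1;
    }

    FILE *destination = fopen(dst, "wb");
    if (destination == NULL)
    {
        fclose(source);
        return 1;
    }

    // One-byte temporary buffer, like the empty glass used for swapping.
    // fread returns 1 while a byte was read and 0 at the end of the file.
    BYTE buffer;
    while (fread(&buffer, sizeof(BYTE), 1, source) == 1)
    {
        fwrite(&buffer, sizeof(BYTE), 1, destination);
    }

    fclose(source);
    fclose(destination);
    return 0;
}
```

A main function that checks for three command-line words and then calls copy_file(argv[1], argv[2]) would behave like the program demonstrated. Copying a single byte per call keeps the loop easy to follow; a larger buffer would typically be faster.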
02:37:29.880 --> 02:37:33.240 But if I go in here and double click on brian2, 02:37:33.240 --> 02:37:37.420 we see that we have a second copy of Brian's actual file. 02:37:37.420 --> 02:37:41.560 So this coming week, you'll experiment with multiple file formats for images. 02:37:41.560 --> 02:37:42.580 The first is jpegs. 02:37:42.580 --> 02:37:45.000 And we will give you a so-called forensic image 02:37:45.000 --> 02:37:47.938 of a whole bunch of photographs from a digital memory card. 02:37:47.938 --> 02:37:50.730 In fact, it's very common these days, certainly in law enforcement, 02:37:50.730 --> 02:37:53.580 to take forensic copies of hard drives, of media sticks, 02:37:53.580 --> 02:37:55.920 of phones and other devices, and then analyze them 02:37:55.920 --> 02:37:58.650 for data that's been lost or corrupted or deleted. 02:37:58.650 --> 02:38:01.980 We'll do exactly that, whereby, you'll write a program that recovers 02:38:01.980 --> 02:38:05.850 jpegs that have been accidentally deleted from a digital memory card. 02:38:05.850 --> 02:38:08.100 And we'll give you all copies of that memory card 02:38:08.100 --> 02:38:11.220 by making a forensic image of it, that is copying all of the 0's and 1's 02:38:11.220 --> 02:38:13.710 from a camera and giving them to you in a file 02:38:13.710 --> 02:38:16.710 that you can fread and then fwrite from. 02:38:16.710 --> 02:38:18.930 We'll also introduce you to bitmap files, 02:38:18.930 --> 02:38:22.290 BMPs, popularized by the Windows operating 02:38:22.290 --> 02:38:24.160 system for wallpapers and the like. 02:38:24.160 --> 02:38:28.470 But we'll use them to implement using pointers and using file I/O, 02:38:28.470 --> 02:38:30.550 your very own Instagram-like filter. 02:38:30.550 --> 02:38:33.540 So we'll take this picture, here, of the Weeks footbridge 02:38:33.540 --> 02:38:35.578 here in Cambridge, Massachusetts by Harvard.
02:38:35.578 --> 02:38:37.620 And we'll have you implement a number of filters, 02:38:37.620 --> 02:38:39.328 taking this original image, for instance, 02:38:39.328 --> 02:38:41.910 and desaturating it, making it black and white, 02:38:41.910 --> 02:38:45.210 by iterating over all of the pixels top to bottom, left to right, 02:38:45.210 --> 02:38:49.350 and recognizing any colors, like red or green or blue or anything in between, 02:38:49.350 --> 02:38:53.467 and changing them to some shade of gray, doing a sepia filter, 02:38:53.467 --> 02:38:55.800 making things look old school, like this photo was taken 02:38:55.800 --> 02:39:00.810 many years ago, by similarly applying a heuristic that alters the colors of all 02:39:00.810 --> 02:39:02.345 of the pixels in this picture. 02:39:02.345 --> 02:39:05.220 We'll have you flip it around so you have to put this pixel over here 02:39:05.220 --> 02:39:06.630 and this pixel over there. 02:39:06.630 --> 02:39:09.690 And you'll appreciate exactly how files are implemented 02:39:09.690 --> 02:39:12.180 within your own hard drive and phone. 02:39:12.180 --> 02:39:17.580 And you'll even implement, for instance, a blur filter, which no accident, 02:39:17.580 --> 02:39:20.010 makes it harder to see what's going on here, 02:39:20.010 --> 02:39:23.700 because you're starting to, now, average together pixels that are nearby 02:39:23.700 --> 02:39:27.090 each other to kind of gloss things over and deliberately 02:39:27.090 --> 02:39:28.990 make it harder to see here. 02:39:28.990 --> 02:39:30.733 And so we'll even, if you so choose, have 02:39:30.733 --> 02:39:33.150 you implement edge detection, if feeling more comfortable, 02:39:33.150 --> 02:39:37.020 where you find the edges of all of the physical objects in these pictures, 02:39:37.020 --> 02:39:43.350 in order to actually detect them in code and create visual art like this. 02:39:43.350 --> 02:39:44.220 Now, this was a lot. 
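To make the desaturating filter a bit more concrete, here is one possible sketch of iterating over every pixel and replacing its color with a shade of gray. The lecture does not say how the gray value is computed; averaging the red, green, and blue values is an assumption made here, and the Pixel type and grayscale function are hypothetical names, not the problem set's own definitions.

```c
#include <stdint.h>

// Hypothetical pixel type: one byte each for red, green, and blue
typedef struct
{
    uint8_t red;
    uint8_t green;
    uint8_t blue;
} Pixel;

// Replace every pixel with the rounded average of its three channels.
// Assumes a compiler that supports C99 variable-length array parameters.
void grayscale(int height, int width, Pixel image[height][width])
{
    // Top to bottom, left to right
    for (int i = 0; i < height; i++)
    {
        for (int j = 0; j < width; j++)
        {
            int sum = image[i][j].red + image[i][j].green + image[i][j].blue;

            // (sum + 1) / 3 rounds the average to the nearest integer
            uint8_t gray = (uint8_t) ((sum + 1) / 3);

            image[i][j].red = gray;
            image[i][j].green = gray;
            image[i][j].blue = gray;
        }
    }
}
```

Setting all three channels to the same value is what makes a pixel a shade of gray; a sepia or blur filter would follow the same double loop but compute each new value differently.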
02:39:44.220 --> 02:39:45.960 And I know pointers are generally considered 02:39:45.960 --> 02:39:47.820 to be among the more challenging features of C, 02:39:47.820 --> 02:39:49.403 and certainly, programming in general. 02:39:49.403 --> 02:39:52.140 So if you're feeling like it's been quite a bit, it was. 02:39:52.140 --> 02:39:55.290 But you do now have the ability, either today 02:39:55.290 --> 02:39:59.040 or in the very near term, to understand even XKCD comics like this that most 02:39:59.040 --> 02:40:00.990 any computer scientist out there has seen. 02:40:00.990 --> 02:40:05.130 So our final look for you, today, is on this joke here. 02:40:05.130 --> 02:40:10.050 And even though I can't necessarily hear you from afar, 02:40:10.050 --> 02:40:12.690 I'll just assume, in our final moments today, 02:40:12.690 --> 02:40:16.650 that everyone is breaking out into a very geeky laughter. 02:40:16.650 --> 02:40:19.530 And I see some smiles, at least, which is reassuring. 02:40:19.530 --> 02:40:21.480 This was, then, CS50. 02:40:21.480 --> 02:40:23.010 We'll see you next time. 02:40:23.010 --> 02:40:26.360 [MUSIC PLAYING]