1 00:00:00,000 --> 00:00:04,482 [MUSIC PLAYING] 2 00:00:04,482 --> 00:00:49,370 3 00:00:49,370 --> 00:00:53,270 DAVID MALAN: All right, this is CS50, and this is week four. 4 00:00:53,270 --> 00:00:55,190 And for the past several weeks, we've had 5 00:00:55,190 --> 00:00:58,217 training wheels of sorts on, while using this language known as C. 6 00:00:58,217 --> 00:01:01,050 And those training wheels have been in the form of the CS50 library. 7 00:01:01,050 --> 00:01:05,580 And you use this library, of course, by selecting and including cs50.h 8 00:01:05,580 --> 00:01:06,650 atop your code. 9 00:01:06,650 --> 00:01:08,733 And then if you think about how clang works, 10 00:01:08,733 --> 00:01:12,080 you've been linking your code via dash l cs50. 11 00:01:12,080 --> 00:01:15,290 But all of that has been automated for you up until now, using make. 12 00:01:15,290 --> 00:01:17,900 Today, we'll transition from last week's focus 13 00:01:17,900 --> 00:01:21,290 on algorithms to a little more focus on machines 14 00:01:21,290 --> 00:01:24,980 and on the machines we now use to implement these algorithms all the more 15 00:01:24,980 --> 00:01:27,410 powerfully, as we begin to take off these training wheels 16 00:01:27,410 --> 00:01:30,840 and look at what's really going on underneath the hood of your computer. 17 00:01:30,840 --> 00:01:33,740 And as complicated as some aspects of C have been, 18 00:01:33,740 --> 00:01:36,320 as new as programming may very well be to you, 19 00:01:36,320 --> 00:01:39,710 realize that there's not all that much going on underneath the hood 20 00:01:39,710 --> 00:01:42,350 that we need to understand to now move onward 21 00:01:42,350 --> 00:01:45,920 and start solving far more interesting and more sophisticated and more 22 00:01:45,920 --> 00:01:46,820 fun problems. 23 00:01:46,820 --> 00:01:49,170 We just need a few additional building blocks. 24 00:01:49,170 --> 00:01:52,340 And so today, we'll do this, first, by relearning how to count. 
25 00:01:52,340 --> 00:01:55,080 Here, for instance, is what we'll call the computer's memory. 26 00:01:55,080 --> 00:01:56,420 And we've seen this grid before. 27 00:01:56,420 --> 00:01:59,420 And we can number, recall, all of the bytes in your computer's memory. 28 00:01:59,420 --> 00:02:04,550 We might call this byte number 0, 1, 2, 3, 4, all the way up to byte 15, 29 00:02:04,550 --> 00:02:05,610 and so forth. 30 00:02:05,610 --> 00:02:08,240 But it turns out, when talking about computers' memories, 31 00:02:08,240 --> 00:02:10,610 computers and computer scientists and programmers 32 00:02:10,610 --> 00:02:13,070 actually don't tend to use decimal. 33 00:02:13,070 --> 00:02:15,830 They definitely don't tend to use binary at that low level. 34 00:02:15,830 --> 00:02:19,010 Instead, they tend to use, just for convention's sake, 35 00:02:19,010 --> 00:02:21,020 something called hexadecimal. 36 00:02:21,020 --> 00:02:23,210 Hexadecimal is a different base system that, 37 00:02:23,210 --> 00:02:27,120 instead of using 10 digits or 2 digits, uses 16 instead. 38 00:02:27,120 --> 00:02:29,360 And so a computer scientist, when numbering things 39 00:02:29,360 --> 00:02:33,980 like bytes in a computer's memory, would still do 0, 1, 2, 3, 4, 5, 6, 7, 8, 9. 40 00:02:33,980 --> 00:02:37,350 But after that, instead of going onward with decimal to, say, 10, 41 00:02:37,350 --> 00:02:40,970 11, 12, 13, 14, 15, they instead, conventionally, 42 00:02:40,970 --> 00:02:43,260 would start using a few letters of the alphabet. 43 00:02:43,260 --> 00:02:47,270 And so, in hexadecimal, this different base system, base-16, 44 00:02:47,270 --> 00:02:48,980 you start counting at 0 still. 45 00:02:48,980 --> 00:02:51,130 You count up to and through 9. 46 00:02:51,130 --> 00:02:52,880 But when you want to keep counting higher, 47 00:02:52,880 --> 00:02:57,440 you then go to A, B, C, D, E, and F. 
48 00:02:57,440 --> 00:03:02,630 And the upside of this is that, within hexadecimal-- and that hex implies 16-- 49 00:03:02,630 --> 00:03:08,630 you have 16 total individual digits, 0 through 9, and also now, A through F. 50 00:03:08,630 --> 00:03:12,300 So we don't have to introduce second digits just to count up as high as 15. 51 00:03:12,300 --> 00:03:14,480 We can use individual digits 0 through F. 52 00:03:14,480 --> 00:03:18,650 And we can keep counting up further by using multiple hexadecimal digits. 53 00:03:18,650 --> 00:03:21,150 But to get there, let's introduce this vocabulary. 54 00:03:21,150 --> 00:03:23,540 So in binary, of course, we use 0's and 1's. 55 00:03:23,540 --> 00:03:25,690 In decimal, of course, we use 0 through 9's. 56 00:03:25,690 --> 00:03:29,360 And in hexadecimal, to be clear, we're going to use 0 through F's, otherwise 57 00:03:29,360 --> 00:03:30,860 known as base-16. 58 00:03:30,860 --> 00:03:33,320 And it's just a convention that we use A through F. We 59 00:03:33,320 --> 00:03:35,450 could have used any other six symbols. 60 00:03:35,450 --> 00:03:37,560 But these are what humans have chosen. 61 00:03:37,560 --> 00:03:41,090 So hexadecimal works quite similarly to our familiar decimal system. 62 00:03:41,090 --> 00:03:45,110 And it's even similar to, now, what you know as the binary system, as follows. 63 00:03:45,110 --> 00:03:49,370 Let's consider a two-digit value using hexadecimal instead of decimal 64 00:03:49,370 --> 00:03:50,600 and instead of binary. 65 00:03:50,600 --> 00:03:54,680 Well, just like in the world of decimal, we used base-10, 66 00:03:54,680 --> 00:03:57,080 or in the world of binary, we used base-2. 67 00:03:57,080 --> 00:04:01,170 We're just going to use, now, base-16, ergo, hexadecimal. 68 00:04:01,170 --> 00:04:02,360 So this is 16 to the first. 69 00:04:02,360 --> 00:04:03,590 This is 16 to the-- 70 00:04:03,590 --> 00:04:05,090 sorry 16 to the 0. 
71 00:04:05,090 --> 00:04:06,590 This is 16 to the first. 72 00:04:06,590 --> 00:04:09,570 And of course, if we multiply that out, it's just the ones column 73 00:04:09,570 --> 00:04:11,280 and now the 16's column. 74 00:04:11,280 --> 00:04:13,550 And so if you want to count up in hexadecimal, 75 00:04:13,550 --> 00:04:21,290 you still start with 0 as usual, then 01, 02, 03, 04, 05, 06, 07, 08, 09. 76 00:04:21,290 --> 00:04:22,910 And then things get interesting. 77 00:04:22,910 --> 00:04:26,660 Now, you don't go to 10, because that would be incorrect. 78 00:04:26,660 --> 00:04:31,880 10, in this base system, would be like 16 times 1 plus 1 times 0. 79 00:04:31,880 --> 00:04:32,960 That's not what we want. 80 00:04:32,960 --> 00:04:38,930 After the number we know as 9, we now count up to A, B, C, D, E, F. 81 00:04:38,930 --> 00:04:40,670 And now, things get interesting again. 82 00:04:40,670 --> 00:04:43,580 But just like in the decimal system, when you count up to, like, 99, 83 00:04:43,580 --> 00:04:46,550 you have to start carrying the 1, same thing here. 84 00:04:46,550 --> 00:04:49,820 If you want to count past F, you carry the 1. 85 00:04:49,820 --> 00:04:55,340 And so now, to represent one value greater than F, we use 10, 86 00:04:55,340 --> 00:04:57,350 which looks like ten, but is not ten. 87 00:04:57,350 --> 00:04:59,675 In hexadecimal, it is one-zero. 88 00:04:59,675 --> 00:05:01,880 16 times 1 gives us 16. 89 00:05:01,880 --> 00:05:03,680 1 times 0 gives us 0. 90 00:05:03,680 --> 00:05:07,050 And of course, that gives us the decimal number we now know as 16. 91 00:05:07,050 --> 00:05:09,980 So we will no longer introduce more and more base systems. 92 00:05:09,980 --> 00:05:12,607 But let me stipulate that just by using these columns 93 00:05:12,607 --> 00:05:14,690 that you learned back in grade school, presumably, 94 00:05:14,690 --> 00:05:16,940 can you implement any base system now. 
95 00:05:16,940 --> 00:05:19,310 It just so happens that in the world of computers, 96 00:05:19,310 --> 00:05:22,295 and today in the world of memory, and soon, also files, 97 00:05:22,295 --> 00:05:24,170 it's just going to be very conventional to be 98 00:05:24,170 --> 00:05:26,990 able to recognize and use hexadecimal. 99 00:05:26,990 --> 00:05:29,530 And in fact, there's a reason humans like hexadecimal, 100 00:05:29,530 --> 00:05:30,530 or at least some humans, 101 00:05:30,530 --> 00:05:36,827 computer scientists. Recall that if we count up as high as FF, in this case, 102 00:05:36,827 --> 00:05:38,160 we would still do the same math. 103 00:05:38,160 --> 00:05:44,060 So 16 times 15 plus 1 times 15 is going to give us, really, this, 104 00:05:44,060 --> 00:05:49,210 or of course, 240 plus 15, or 255. 105 00:05:49,210 --> 00:05:50,460 And I did that pretty quickly. 106 00:05:50,460 --> 00:05:53,000 But that's just the sort of grade school math of multiplying 107 00:05:53,000 --> 00:05:55,730 the column by the value that's in it, where again, 108 00:05:55,730 --> 00:06:00,140 each of these F's is how we now express 15 using a single digit. 109 00:06:00,140 --> 00:06:02,480 But recall that we've seen 255 before. 110 00:06:02,480 --> 00:06:04,610 Back when we talked about binary a few weeks ago, 111 00:06:04,610 --> 00:06:12,450 255 also happened to be the pattern that we see here, eight 1 bits using binary. 112 00:06:12,450 --> 00:06:15,278 And so the reason that computer scientists tend to like hexadecimal, 113 00:06:15,278 --> 00:06:17,570 is that, you know what, in eight bits, there's actually 114 00:06:17,570 --> 00:06:20,000 two pairs here, like four on the left, four on the right. 
115 00:06:20,000 --> 00:06:22,340 If we sort of scooch these things over, it 116 00:06:22,340 --> 00:06:25,520 turns out that because hexadecimal allows 117 00:06:25,520 --> 00:06:28,730 you to represent 16 possible values, it's 118 00:06:28,730 --> 00:06:32,750 a perfect system for representing four bits at a time. 119 00:06:32,750 --> 00:06:36,980 After all, if you've got four bits here, each of which can be a 0 or 1, 120 00:06:36,980 --> 00:06:42,020 that's 2 times 2 times 2 times 2, two possible values for each of those, 121 00:06:42,020 --> 00:06:45,740 or 16 total values, which is to say that in the world of computers, 122 00:06:45,740 --> 00:06:48,560 if you ever want to talk in units of four bits, 123 00:06:48,560 --> 00:06:51,590 it's wonderfully convenient to use hexadecimal instead, 124 00:06:51,590 --> 00:06:56,270 only because, conveniently, one hexadecimal digit happens to be 125 00:06:56,270 --> 00:07:00,590 equivalent to four binary digits, 0's and 1's. 126 00:07:00,590 --> 00:07:05,160 So 0, 0, 0, 0, all the way up through 1, 1, 1, 1. 127 00:07:05,160 --> 00:07:06,320 So why do humans do this? 128 00:07:06,320 --> 00:07:09,240 It's just now the human convention because of that convenience. 129 00:07:09,240 --> 00:07:11,760 Now, some of you may very well have seen hexadecimal before. 130 00:07:11,760 --> 00:07:14,660 In fact, recall our discussion in week 0 of RGB, 131 00:07:14,660 --> 00:07:17,660 where we discussed the representation of colors using 132 00:07:17,660 --> 00:07:19,860 some amount of red, green, and blue. 133 00:07:19,860 --> 00:07:21,720 And at the time, we used this example. 134 00:07:21,720 --> 00:07:24,080 We took our example out of context. 135 00:07:24,080 --> 00:07:27,560 And instead of using HI! as a string of text, 136 00:07:27,560 --> 00:07:33,410 we reinterpreted 72, 73, and 33 as a sequence of colors. 137 00:07:33,410 --> 00:07:34,550 How much red do you want? 138 00:07:34,550 --> 00:07:35,720 How much green do you want? 
139 00:07:35,720 --> 00:07:36,860 How much blue do you want? 140 00:07:36,860 --> 00:07:37,820 And that's fine. 141 00:07:37,820 --> 00:07:41,060 It's perfectly fine to think and express yourself in terms of decimal. 142 00:07:41,060 --> 00:07:44,270 But computer scientists tend not to do it that way in the context of colors 143 00:07:44,270 --> 00:07:45,790 and in the context of memory. 144 00:07:45,790 --> 00:07:49,160 Instead, they tend to use something called hexadecimal. 145 00:07:49,160 --> 00:07:51,590 And hexadecimal, here, would actually just 146 00:07:51,590 --> 00:07:57,860 have you change these values from 72, 73, 33, to the equivalent hexadecimal 147 00:07:57,860 --> 00:07:58,533 representation. 148 00:07:58,533 --> 00:08:00,200 And we won't bother doing the math here. 149 00:08:00,200 --> 00:08:04,340 But let me just stipulate that 72, 73, 33 in decimal 150 00:08:04,340 --> 00:08:10,262 is the same thing as 48, 49, 21 in hexadecimal. 151 00:08:10,262 --> 00:08:12,470 Now, obviously, if you glance at these three numbers, 152 00:08:12,470 --> 00:08:15,980 it's not at all obvious if you're looking at hexadecimal digits 153 00:08:15,980 --> 00:08:21,080 or decimal digits, because they do use the same subset, 0's through 9's. 154 00:08:21,080 --> 00:08:23,240 And so a convention, too, in the computing world, 155 00:08:23,240 --> 00:08:25,850 is any time you represent hexadecimal digits, 156 00:08:25,850 --> 00:08:29,300 you tend to prefix them, just because, with 0x. 157 00:08:29,300 --> 00:08:32,179 And there's no mathematical meaning to the 0 or the x. 158 00:08:32,179 --> 00:08:35,419 It's just a prefix you put there to make clear to the viewer 159 00:08:35,419 --> 00:08:38,299 that these are hexadecimal digits, even if they might otherwise 160 00:08:38,299 --> 00:08:40,490 look like decimal digits. 161 00:08:40,490 --> 00:08:41,940 So where are we going with this? 
162 00:08:41,940 --> 00:08:43,857 Well, those of you who might have experimented 163 00:08:43,857 --> 00:08:46,850 in the past with making your own web pages and making them colorful, 164 00:08:46,850 --> 00:08:50,450 or those of you who are artists and have used programs like Photoshop, odds 165 00:08:50,450 --> 00:08:53,190 are, you've seen these codes before. 166 00:08:53,190 --> 00:08:55,940 In fact, here are a few screenshots of Photoshop itself. 167 00:08:55,940 --> 00:08:59,190 If you click on a color in Photoshop and you pull up this window, 168 00:08:59,190 --> 00:09:02,300 you can change the color that you're drawing on the screen 169 00:09:02,300 --> 00:09:04,970 to be any of the colors of the rainbow. 170 00:09:04,970 --> 00:09:07,470 But more arcanely, if you look down here, 171 00:09:07,470 --> 00:09:09,620 you can actually see these hexadecimal codes, 172 00:09:09,620 --> 00:09:11,990 because it's become human convention over the years 173 00:09:11,990 --> 00:09:15,630 to use hexadecimal to represent different amounts of red, green, 174 00:09:15,630 --> 00:09:16,320 and blue. 175 00:09:16,320 --> 00:09:23,435 So if you have no red, no green, no blue, otherwise represented as 000000, 176 00:09:23,435 --> 00:09:26,060 well, that's going to give you the color we know here as black. 177 00:09:26,060 --> 00:09:29,510 It's sort of the absence of any wavelengths of light there. 178 00:09:29,510 --> 00:09:33,470 If by contrast, though, you change all of those six digits 179 00:09:33,470 --> 00:09:38,810 to the highest possible value, which, again, is F. The range in hexadecimal 0 180 00:09:38,810 --> 00:09:42,890 through F, otherwise in decimal, being 0 through 15, well, 181 00:09:42,890 --> 00:09:46,800 with FFFFFF, that's a lot of red, a lot of green, a lot of blue. 182 00:09:46,800 --> 00:09:48,800 And when you combine those wavelengths of light, 183 00:09:48,800 --> 00:09:51,200 you get the color we see here as white. 
184 00:09:51,200 --> 00:09:53,480 And you can imagine, now, combining different amounts 185 00:09:53,480 --> 00:09:54,930 of red or green or blue. 186 00:09:54,930 --> 00:10:00,740 So for instance, in hexadecimal, FF0000, is the color we know as red. 187 00:10:00,740 --> 00:10:05,270 00FF00 is the color we know as green. 188 00:10:05,270 --> 00:10:09,630 And finally, 0000FF is the color we know as blue, because again, 189 00:10:09,630 --> 00:10:14,240 the system that programmers and artists often but don't always use, is indeed, 190 00:10:14,240 --> 00:10:17,710 this system of RGB for red, green, and blue. 191 00:10:17,710 --> 00:10:19,460 So we introduced this here not because you 192 00:10:19,460 --> 00:10:21,810 have to start thinking any differently, because again, 193 00:10:21,810 --> 00:10:24,560 the mathematical mechanism is the same as week 0. 194 00:10:24,560 --> 00:10:28,970 But you're going to start seeing numbers in examples, in programs, 195 00:10:28,970 --> 00:10:32,900 as just appearing in hexadecimal by convention, as opposed to actually 196 00:10:32,900 --> 00:10:35,550 being interpreted as decimal. 197 00:10:35,550 --> 00:10:37,880 So if we consider, now, our computer's memory, 198 00:10:37,880 --> 00:10:40,610 we'll now start thinking of this whole canvas of memory, 199 00:10:40,610 --> 00:10:43,010 all of these bytes inside of our computer's memory, 200 00:10:43,010 --> 00:10:46,700 as being enumerable as 0, 1, 2, all the way through F. 201 00:10:46,700 --> 00:10:53,750 And then if we keep counting, we can go to 10, 11, 12, 13, 14, 15, 16, 17, 18, 202 00:10:53,750 --> 00:10:58,850 19, 1A, 1B, 1C, 1D, and so forth. 203 00:10:58,850 --> 00:11:00,790 And it's fine if it's not nearly that obvious, 204 00:11:00,790 --> 00:11:03,670 as you look at these things, what the decimal equivalents are. 205 00:11:03,670 --> 00:11:04,690 That's not a problem. 
206 00:11:04,690 --> 00:11:09,130 It's just a different way of thinking about the locations, in this case, 207 00:11:09,130 --> 00:11:13,480 of a computer's memory, or the representation of one color or another. 208 00:11:13,480 --> 00:11:19,480 All right, well, let's now use this as an example of an opportunity, 209 00:11:19,480 --> 00:11:22,690 rather, to consider what's actually being stored in our computer's memory. 210 00:11:22,690 --> 00:11:26,320 And to be clear, I'll start prefixing all of these memory addresses, 211 00:11:26,320 --> 00:11:29,890 so to speak, with 0x, just to make clear that we're now talking, indeed, 212 00:11:29,890 --> 00:11:31,480 in terms of hexadecimal. 213 00:11:31,480 --> 00:11:32,980 So here's a simple line of code. 214 00:11:32,980 --> 00:11:35,147 Out of context, we would need to, actually, put this 215 00:11:35,147 --> 00:11:37,910 in main or some other program to actually do anything with it. 216 00:11:37,910 --> 00:11:39,702 But we've seen this before many times, now, 217 00:11:39,702 --> 00:11:42,760 where you declare a variable, for instance, n for number. 218 00:11:42,760 --> 00:11:44,830 Declare it as an int for its type. 219 00:11:44,830 --> 00:11:47,170 And then, perhaps, even assign it a value. 220 00:11:47,170 --> 00:11:51,520 Well, what's actually going on when we use this kind of code in our computer? 221 00:11:51,520 --> 00:11:54,760 Well, let's go ahead and whip this thing up in an actual program. 222 00:11:54,760 --> 00:11:57,970 Let me create a file called address.c because I 223 00:11:57,970 --> 00:12:01,300 want to start experimenting with some addresses in the computer's memory. 224 00:12:01,300 --> 00:12:04,180 I'm going to go ahead and include standard io dot h. 225 00:12:04,180 --> 00:12:06,460 I'm going to give myself int main void. 226 00:12:06,460 --> 00:12:08,890 And down here, I'm going to go ahead and declare exactly 227 00:12:08,890 --> 00:12:10,915 that variable, int n equals 50. 
228 00:12:10,915 --> 00:12:15,820 And then I'm going to go ahead and print out, with percent i and a backslash n, 229 00:12:15,820 --> 00:12:17,230 the value of n. 230 00:12:17,230 --> 00:12:19,930 So nothing interesting there, nothing too complicated. 231 00:12:19,930 --> 00:12:21,790 I'm going to go ahead and make address. 232 00:12:21,790 --> 00:12:24,123 And then I'm going to go ahead and do dot slash address. 233 00:12:24,123 --> 00:12:26,380 And of course, as per week one, we should hopefully 234 00:12:26,380 --> 00:12:27,930 see just the number 50. 235 00:12:27,930 --> 00:12:31,570 But today, we're going to give you some more tools with which you can actually 236 00:12:31,570 --> 00:12:33,880 start poking around your computer's memory. 237 00:12:33,880 --> 00:12:35,950 But let's first consider this line of code 238 00:12:35,950 --> 00:12:38,240 in the context of your computer's hardware. 239 00:12:38,240 --> 00:12:41,200 So if you're writing a program with a line of code like this, 240 00:12:41,200 --> 00:12:44,500 that n needs to be somewhere in your computer's memory. 241 00:12:44,500 --> 00:12:47,870 That 50 needs to be put somewhere in your computer's memory. 242 00:12:47,870 --> 00:12:51,010 So if we, again, consider this to be just part of our computer's 243 00:12:51,010 --> 00:12:55,000 memory, a few dozen bytes, well, suppose that that variable, n, 244 00:12:55,000 --> 00:12:57,130 happens to end up down here. 245 00:12:57,130 --> 00:13:01,570 I've deliberately drawn n as taking up four bytes, four squares, because, 246 00:13:01,570 --> 00:13:05,830 recall, an integer, typically, at least on CS50 IDE and modern systems, 247 00:13:05,830 --> 00:13:07,370 tends to be four bytes. 248 00:13:07,370 --> 00:13:10,630 So I made sure to have it fill four complete boxes. 249 00:13:10,630 --> 00:13:13,940 And then the value 50 might be what's actually stored there. 
250 00:13:13,940 --> 00:13:17,890 Well, it turns out that within your computer's memory, again, 251 00:13:17,890 --> 00:13:20,660 there are these addresses that are implicitly there. 252 00:13:20,660 --> 00:13:23,530 So even though, yes, we can refer to this variable, n, 253 00:13:23,530 --> 00:13:26,620 based on the variable name I gave it in my code, 254 00:13:26,620 --> 00:13:30,940 surely this variable exists at a specific location in memory. 255 00:13:30,940 --> 00:13:32,530 I don't know offhand where it is. 256 00:13:32,530 --> 00:13:38,410 But let me just propose that maybe it's at location 0x12345678, just 257 00:13:38,410 --> 00:13:39,550 an arbitrary address. 258 00:13:39,550 --> 00:13:41,690 I have no idea, in actuality, where it is. 259 00:13:41,690 --> 00:13:44,860 But it certainly does have an address, because every one of these squares 260 00:13:44,860 --> 00:13:49,540 inside of your computer's memory has an address, a unique identifier like 0, 1, 261 00:13:49,540 --> 00:13:50,750 2, and so forth. 262 00:13:50,750 --> 00:13:56,710 Maybe the 50 ended up at memory address 0x12345678. 263 00:13:56,710 --> 00:14:01,750 Well, what's kind of cool about C, is that we can actually begin to see this, 264 00:14:01,750 --> 00:14:03,020 no pun intended. 265 00:14:03,020 --> 00:14:05,080 So let me go ahead and modify this program 266 00:14:05,080 --> 00:14:07,480 and introduce a little bit of new syntax that 267 00:14:07,480 --> 00:14:11,510 will allow us to start poking around the inside of your computer's memory 268 00:14:11,510 --> 00:14:14,830 so we can actually see what's going on underneath. 269 00:14:14,830 --> 00:14:17,710 So I'm going to go ahead and change this program to do this instead. 270 00:14:17,710 --> 00:14:19,585 I'm going to go ahead and say, you know what? 271 00:14:19,585 --> 00:14:23,590 Don't just print out the value, n, which, of course, is 50. 
272 00:14:23,590 --> 00:14:28,060 Let me see, just out of curiosity, what is the actual address of n. 273 00:14:28,060 --> 00:14:31,300 And to do that today, we're going to introduce some new syntax, 274 00:14:31,300 --> 00:14:33,070 which happens to be this here. 275 00:14:33,070 --> 00:14:37,360 There's two new operators, today, in C. The first is an ampersand, which 276 00:14:37,360 --> 00:14:39,580 does not represent a logical and. 277 00:14:39,580 --> 00:14:42,100 Recall a couple of weeks ago, we did see that if you 278 00:14:42,100 --> 00:14:46,840 want to combine Boolean expressions, this and that, you use two ampersands. 279 00:14:46,840 --> 00:14:51,040 It's an unfortunate coincidence that an ampersand, solo like this, 280 00:14:51,040 --> 00:14:52,630 will mean something different today. 281 00:14:52,630 --> 00:14:56,830 Specifically, this ampersand is going to be our address-of operator. 282 00:14:56,830 --> 00:15:02,590 By simply prefixing any variable name with an ampersand, we can tell C, 283 00:15:02,590 --> 00:15:06,520 please tell me what address this variable is stored at. 284 00:15:06,520 --> 00:15:10,180 And this star, not to be confused with multiplication, 285 00:15:10,180 --> 00:15:12,880 also has another meaning in today's context. 286 00:15:12,880 --> 00:15:15,310 When you use this asterisk, you can actually 287 00:15:15,310 --> 00:15:19,910 tell your program to look inside of a particular memory address. 288 00:15:19,910 --> 00:15:23,500 So the ampersand tells you what address a variable is at. 289 00:15:23,500 --> 00:15:27,310 The star operator, otherwise known as the dereference operator, 290 00:15:27,310 --> 00:15:30,190 means, go to the following address. 291 00:15:30,190 --> 00:15:32,050 So they sort of are reverse operations. 292 00:15:32,050 --> 00:15:33,400 One figures out the address. 293 00:15:33,400 --> 00:15:35,240 One goes to the address. 294 00:15:35,240 --> 00:15:37,850 And so let's see this for real here. 
295 00:15:37,850 --> 00:15:43,070 Let me go ahead and change my n in my program here to ampersand n. 296 00:15:43,070 --> 00:15:48,980 So I want to print out, not the number in n, but the address of n. 297 00:15:48,980 --> 00:15:50,870 And now, how do I print out an address? 298 00:15:50,870 --> 00:15:52,170 Well, it is just a number. 299 00:15:52,170 --> 00:15:56,690 But actually, printf supports a different format code for addresses. 300 00:15:56,690 --> 00:15:59,840 You can do percent p, for reasons we'll soon see, 301 00:15:59,840 --> 00:16:02,510 that says to print out the address of this variable 302 00:16:02,510 --> 00:16:05,375 and interpret it as hexadecimal, again, by convention. 303 00:16:05,375 --> 00:16:07,250 So I'm going to go ahead and make address now 304 00:16:07,250 --> 00:16:10,530 after only making two changes to this file. 305 00:16:10,530 --> 00:16:12,350 Everything seems to compile OK. 306 00:16:12,350 --> 00:16:14,150 Now, I'm going to go ahead and run address. 307 00:16:14,150 --> 00:16:17,210 And we will see that, in this particular program, 308 00:16:17,210 --> 00:16:21,620 address.c, for whatever reason, that variable, n, 309 00:16:21,620 --> 00:16:30,110 ended up at crazy location 0x7ffd80792f7c. 310 00:16:30,110 --> 00:16:31,160 Now, is that useful? 311 00:16:31,160 --> 00:16:32,870 Not in practice, necessarily. 312 00:16:32,870 --> 00:16:36,530 We're going to make this become useful by leveraging these addresses. 313 00:16:36,530 --> 00:16:38,900 But the specific address is not interesting. 314 00:16:38,900 --> 00:16:40,070 I'm glancing at this number. 315 00:16:40,070 --> 00:16:41,993 I have no idea what that number is in decimal. 316 00:16:41,993 --> 00:16:44,660 I would have to do the math, or frankly, just Google a converter 317 00:16:44,660 --> 00:16:45,660 and do it for me. 318 00:16:45,660 --> 00:16:47,420 So again, that's not the interesting part. 
319 00:16:47,420 --> 00:16:50,420 The fact that this is in hexadecimal is just an implementation detail. 320 00:16:50,420 --> 00:16:54,450 It happens to represent the location of this variable. 321 00:16:54,450 --> 00:16:58,230 And again, we won't want to do this, necessarily. 322 00:16:58,230 --> 00:17:00,830 But just to be clear that one of these operators, 323 00:17:00,830 --> 00:17:02,330 the ampersand gets the address. 324 00:17:02,330 --> 00:17:05,089 And the star operator goes to an address. 325 00:17:05,089 --> 00:17:07,160 We can actually undo the effects of these things. 326 00:17:07,160 --> 00:17:13,010 For instance, if I print out now, not ampersand n, but just out of curiosity, 327 00:17:13,010 --> 00:17:18,170 star ampersand n, I can kind of undo the effects of this operator. 328 00:17:18,170 --> 00:17:21,170 Ampersand n is going to say, what is the address of n? 329 00:17:21,170 --> 00:17:25,349 Star ampersand n is going to say, go to that address. 330 00:17:25,349 --> 00:17:29,360 So this is kind of a pointless exercise, because if I just want what's in n, 331 00:17:29,360 --> 00:17:32,120 I can just, obviously, print n like we began. 332 00:17:32,120 --> 00:17:34,560 But again, just as an intellectual exercise, 333 00:17:34,560 --> 00:17:38,750 if I prefix n with the address-of operator, and then use the asterisk 334 00:17:38,750 --> 00:17:42,830 and say, go to that address, it's the same exact thing 335 00:17:42,830 --> 00:17:44,280 as just printing n itself. 336 00:17:44,280 --> 00:17:46,640 So let me change the format code back to an integer. 337 00:17:46,640 --> 00:17:50,060 Instead of percent p, let me go ahead and make address now, 338 00:17:50,060 --> 00:17:52,100 seems to compile OK, and run address. 339 00:17:52,100 --> 00:17:53,885 And voila, we're back at the 50. 
340 00:17:53,885 --> 00:17:57,050 So as weird as the syntax today might start to feel, 341 00:17:57,050 --> 00:17:59,330 realize that these operators, at the end of the day, 342 00:17:59,330 --> 00:18:01,833 are relatively simple in what they do. 343 00:18:01,833 --> 00:18:05,000 And if you understand that one just kind of undoes the effects of the other, 344 00:18:05,000 --> 00:18:08,360 can we start to build up some pretty interesting programs with them. 345 00:18:08,360 --> 00:18:11,870 And we're going to do so by leveraging a special type of variable, 346 00:18:11,870 --> 00:18:13,910 a variable called a pointer. 347 00:18:13,910 --> 00:18:16,670 And there is that p in percent p. 348 00:18:16,670 --> 00:18:22,240 A pointer is a variable that contains the address of some other value. 349 00:18:22,240 --> 00:18:23,790 So we've seen integers before. 350 00:18:23,790 --> 00:18:27,770 We've seen floats and chars and strings and other types as well. 351 00:18:27,770 --> 00:18:31,430 Pointers, now, are just a different type of variable 352 00:18:31,430 --> 00:18:34,640 that store the address of some value. 353 00:18:34,640 --> 00:18:40,250 And you can have pointers to integers, pointers to chars, pointers to bools, 354 00:18:40,250 --> 00:18:41,870 or any other data type. 355 00:18:41,870 --> 00:18:45,980 A pointer references the specific type of the value 356 00:18:45,980 --> 00:18:48,223 that it actually is referring to. 357 00:18:48,223 --> 00:18:49,640 So let's see this more concretely. 358 00:18:49,640 --> 00:18:51,620 Let me go back, now, to my program here. 359 00:18:51,620 --> 00:18:53,840 And let me introduce another variable here. 360 00:18:53,840 --> 00:18:58,430 Instead of immediately printing out something like n, let me go ahead 361 00:18:58,430 --> 00:19:02,870 and introduce a second variable that is of type int star. 
362 00:19:02,870 --> 00:19:06,860 And this, I will admit, is probably the most confusing piece of C syntax 363 00:19:06,860 --> 00:19:09,860 that we'll, in general, see, just because, my god, star is now 364 00:19:09,860 --> 00:19:13,220 used for multiplication, for going to an address, and also, now, 365 00:19:13,220 --> 00:19:14,610 declaring a variable. 366 00:19:14,610 --> 00:19:17,120 This is, arguably, not the best design decision. 367 00:19:17,120 --> 00:19:18,350 But it was made decades ago. 368 00:19:18,350 --> 00:19:19,730 So this is what we have. 369 00:19:19,730 --> 00:19:26,240 But if I do int star p equals ampersand n, now, what I can do down here, 370 00:19:26,240 --> 00:19:31,770 is print out the address of n by temporarily storing it in a variable. 371 00:19:31,770 --> 00:19:33,830 So I'm not doing anything new just yet. 372 00:19:33,830 --> 00:19:36,020 I'm still declaring on line 5, an integer 373 00:19:36,020 --> 00:19:37,910 called n, assigning it the value 50. 374 00:19:37,910 --> 00:19:42,260 What's new now on line 6, is that I'm introducing a new type of variable. 375 00:19:42,260 --> 00:19:44,210 This type of variable is known as a pointer. 376 00:19:44,210 --> 00:19:48,410 A pointer, again, is just a variable that stores the address of some value. 377 00:19:48,410 --> 00:19:53,240 And the syntax, admittedly weird, for declaring a pointer to an integer, 378 00:19:53,240 --> 00:19:57,560 is to literally say int, because that's the type you're pointing to, 379 00:19:57,560 --> 00:20:00,350 star, and then the name of the variable you want to create. 380 00:20:00,350 --> 00:20:03,320 And I could call this anything, but I'll call it p to keep it succinct. 381 00:20:03,320 --> 00:20:05,120 And again, on the right hand side of the equals sign 382 00:20:05,120 --> 00:20:06,620 is the same operator as before. 383 00:20:06,620 --> 00:20:10,040 If you want to figure out what is the address of n, it's just ampersand n. 
384 00:20:10,040 --> 00:20:14,450 And so we can store that address, now, somewhere longer-term. 385 00:20:14,450 --> 00:20:18,110 Before, I just passed in ampersand n and printf did its thing. 386 00:20:18,110 --> 00:20:23,120 Now, I'm temporarily, on line 6, storing that address in a new variable 387 00:20:23,120 --> 00:20:24,470 called p. 388 00:20:24,470 --> 00:20:28,910 And its type is technically int star, is what a programmer might say. 389 00:20:28,910 --> 00:20:33,680 So it would be incorrect to say int p equals ampersand n. 390 00:20:33,680 --> 00:20:35,780 And indeed, our compiler, Clang, won't like that. 391 00:20:35,780 --> 00:20:38,370 It won't let you compile the code, most likely. 392 00:20:38,370 --> 00:20:43,160 And so, instead, I do int star p to make clear that I know what I'm doing. 393 00:20:43,160 --> 00:20:48,450 I am storing the address of an int, not an integer, per se. 394 00:20:48,450 --> 00:20:53,040 So if I go ahead, now, and save this, recompile with make address. 395 00:20:53,040 --> 00:20:55,530 And notice, I changed one line of code, too, earlier. 396 00:20:55,530 --> 00:20:59,400 I went back to percent p to print a pointer that is an address. 397 00:20:59,400 --> 00:21:02,490 And I'm printing out the value of p, no longer the value of n. 398 00:21:02,490 --> 00:21:07,050 If I now run dot slash address, voila, there's that cryptic address. 399 00:21:07,050 --> 00:21:09,300 And these addresses may very well change over time. 400 00:21:09,300 --> 00:21:11,640 Depending on what's going on inside of your program 401 00:21:11,640 --> 00:21:15,390 or other things on the system, these addresses might be different each time. 402 00:21:15,390 --> 00:21:18,060 And that's to be expected and not something to be relied on. 403 00:21:18,060 --> 00:21:20,250 But it's clearly some random cryptic address, 404 00:21:20,250 --> 00:21:24,400 similar to my arbitrary 0x12345678 before.
405 00:21:24,400 --> 00:21:26,310 But now, let's just undo this operation. 406 00:21:26,310 --> 00:21:30,120 Just so we can come full circle here, let me now propose 407 00:21:30,120 --> 00:21:33,495 how I can print out the value of n. 408 00:21:33,495 --> 00:21:35,370 And let me call on someone for this if I can. 409 00:21:35,370 --> 00:21:41,640 If my goal, now, on line 7, is no longer to print the address of n, but to print 410 00:21:41,640 --> 00:21:43,972 n itself using p. 411 00:21:43,972 --> 00:21:45,930 I'm going to go ahead and change, preemptively, 412 00:21:45,930 --> 00:21:47,820 the format code to percent i. 413 00:21:47,820 --> 00:21:51,660 And a shorthand notation would, obviously, be just print n. 414 00:21:51,660 --> 00:21:53,610 But suppose I don't want to print n for this 415 00:21:53,610 --> 00:22:02,880 exercise, how can I now print the value in n by referring to it by way of p? 416 00:22:02,880 --> 00:22:05,910 What should I literally type as printf's second argument 417 00:22:05,910 --> 00:22:12,530 to print out the value of n by using this new variable, p, in some way. 418 00:22:12,530 --> 00:22:16,290 Yeah, let's call on Joshua. 419 00:22:16,290 --> 00:22:19,860 AUDIENCE: I believe, if you use the ampersand before the p, 420 00:22:19,860 --> 00:22:21,642 it will probably do it. 421 00:22:21,642 --> 00:22:24,100 DAVID MALAN: OK, ampersand p, let me go ahead and try that. 422 00:22:24,100 --> 00:22:27,700 Let's try ampersand p to print out this value. 423 00:22:27,700 --> 00:22:30,370 So ampersand p, I'm going to save the file. 424 00:22:30,370 --> 00:22:32,610 I'm going to do make address and enter. 425 00:22:32,610 --> 00:22:34,415 And it doesn't seem to be the case. 426 00:22:34,415 --> 00:22:35,790 Notice that I'm getting an error. 427 00:22:35,790 --> 00:22:36,720 It's a little cryptic. 
428 00:22:36,720 --> 00:22:40,920 Format specifies type int, but the argument has type int star star, 429 00:22:40,920 --> 00:22:42,090 more on that another time. 430 00:22:42,090 --> 00:22:43,570 So it turns out this was incorrect. 431 00:22:43,570 --> 00:22:47,430 Let's take one other suggestion, because the ampersand, recall, 432 00:22:47,430 --> 00:22:49,170 gets the address of something. 433 00:22:49,170 --> 00:22:50,880 But p is already an address. 434 00:22:50,880 --> 00:22:52,590 So Joshua, what you technically proposed, 435 00:22:52,590 --> 00:22:54,300 was get me the address of the address. 436 00:22:54,300 --> 00:22:56,190 And that's not the direction we want to go. 437 00:22:56,190 --> 00:22:58,170 We want to go to what is at that address. 438 00:22:58,170 --> 00:23:00,740 Sophia, what do you think? 439 00:23:00,740 --> 00:23:02,640 AUDIENCE: We want to add a percent-- 440 00:23:02,640 --> 00:23:06,820 or a star p when we print it. 441 00:23:06,820 --> 00:23:07,570 DAVID MALAN: Yeah. 442 00:23:07,570 --> 00:23:09,380 So I had a little trouble hearing you. 443 00:23:09,380 --> 00:23:12,370 But I think if we instead use not the ampersand operator, 444 00:23:12,370 --> 00:23:14,710 but the star operator, that's going to be, 445 00:23:14,710 --> 00:23:17,170 indeed, the dereference operator, which essentially means, 446 00:23:17,170 --> 00:23:19,120 go to the value in p. 447 00:23:19,120 --> 00:23:23,530 And if the value in p is an address, I think, let's try this, make address. 448 00:23:23,530 --> 00:23:25,490 Yep, that compiled OK this time. 449 00:23:25,490 --> 00:23:27,550 Now, if I do dot slash address, hopefully, I 450 00:23:27,550 --> 00:23:30,400 will now see, indeed, the number 50. 451 00:23:30,400 --> 00:23:33,010 So again, we don't seem to have made any fundamental progress. 452 00:23:33,010 --> 00:23:36,070 At the end of the day, I'm still just printing out the value of n. 
453 00:23:36,070 --> 00:23:39,100 But we've introduced this new primitive, this new puzzle piece, 454 00:23:39,100 --> 00:23:41,440 if you will, that allows you, programmatically, 455 00:23:41,440 --> 00:23:44,390 to figure out the address of something in the computer's memory 456 00:23:44,390 --> 00:23:46,540 and to actually go to that address. 457 00:23:46,540 --> 00:23:52,070 And we'll soon exercise more sophisticated control over it as well. 458 00:23:52,070 --> 00:23:56,050 But let's come back to a pictorial representation of this 459 00:23:56,050 --> 00:23:59,290 and consider what it is we just did in the context, now, of this code. 460 00:23:59,290 --> 00:24:02,080 So inside of my main, the two interesting lines of code, 461 00:24:02,080 --> 00:24:05,320 really, were these two lines first before we made Sophia's addition 462 00:24:05,320 --> 00:24:07,990 and actually dereferenced p and printed it out with printf. 463 00:24:07,990 --> 00:24:10,810 But let's consider, for a moment, what these values now 464 00:24:10,810 --> 00:24:12,280 look like in a computer's memory. 465 00:24:12,280 --> 00:24:14,440 And again, the syntax is a little cryptic 466 00:24:14,440 --> 00:24:16,475 because we now have a star and an ampersand. 467 00:24:16,475 --> 00:24:18,850 But again, that just means, now, we get to start thinking 468 00:24:18,850 --> 00:24:20,405 in terms of the computer's memory. 469 00:24:20,405 --> 00:24:23,030 So for instance, here's a grid of memory inside of my computer. 470 00:24:23,030 --> 00:24:26,980 And maybe, for instance, the 50 and the n end up down there. 471 00:24:26,980 --> 00:24:29,980 They could end up anywhere, not even pictured on the screen here. 472 00:24:29,980 --> 00:24:34,090 They end up somewhere in the computer's memory, for our purposes thus far. 473 00:24:34,090 --> 00:24:36,100 But it technically lives in an address. 474 00:24:36,100 --> 00:24:38,950 And let me simplify the address just so it's quicker to say.
475 00:24:38,950 --> 00:24:42,310 This 50, now, stored in the variable n, maybe it actually 476 00:24:42,310 --> 00:24:44,590 lives at address 0x123. 477 00:24:44,590 --> 00:24:46,480 I have no idea where it is, but we've clearly 478 00:24:46,480 --> 00:24:50,200 seen that it can live in a seemingly random address like that. 479 00:24:50,200 --> 00:24:51,640 Now, what about p? 480 00:24:51,640 --> 00:24:54,520 p is technically a variable itself. 481 00:24:54,520 --> 00:24:57,190 It's a variable that stores the address of something else. 482 00:24:57,190 --> 00:25:00,190 But it's still a variable, which means, when you declare p 483 00:25:00,190 --> 00:25:04,660 with the code earlier, it actually does take up some bytes of memory 484 00:25:04,660 --> 00:25:05,660 on the screen. 485 00:25:05,660 --> 00:25:10,420 And so let me go ahead and propose that p happens to end up in memory here. 486 00:25:10,420 --> 00:25:13,450 Now, p is deliberately drawn to be longer here. 487 00:25:13,450 --> 00:25:15,700 I'm consuming eight total bytes this time, 488 00:25:15,700 --> 00:25:20,470 because it turns out, on modern computer systems, including CS50 IDE, 489 00:25:20,470 --> 00:25:23,500 pointers tend to take up eight bytes. 490 00:25:23,500 --> 00:25:27,190 So not one, not four, but eight bytes, so I've simply drawn it to be bigger. 491 00:25:27,190 --> 00:25:31,240 So what is actually stored in the variable p? 492 00:25:31,240 --> 00:25:35,600 Well, it turns out that, again, it's just storing the address of some value. 493 00:25:35,600 --> 00:25:42,460 So if the integer n, which itself is storing 50, is at location 0x123, 494 00:25:42,460 --> 00:25:47,080 and pointer p is being assigned that address, it's just like saying, 495 00:25:47,080 --> 00:25:50,620 well, stored in this variable p, is literally just a number 496 00:25:50,620 --> 00:25:54,190 represented here in hexadecimal notation, 0x123. 
497 00:25:54,190 --> 00:25:56,650 So that's all that's going on inside the computer's memory 498 00:25:56,650 --> 00:25:57,858 with those two lines of code. 499 00:25:57,858 --> 00:26:00,040 There's nothing fundamentally new, except the fact 500 00:26:00,040 --> 00:26:04,430 that we have new syntax with which to refer to these addresses explicitly. 501 00:26:04,430 --> 00:26:06,100 This is n down here. 502 00:26:06,100 --> 00:26:07,720 This is p up here. 503 00:26:07,720 --> 00:26:12,160 And the value of p just happens to be an address. 504 00:26:12,160 --> 00:26:15,205 Now, I keep saying that these addresses are a little cryptic. 505 00:26:15,205 --> 00:26:16,330 They're a little arbitrary. 506 00:26:16,330 --> 00:26:16,872 And they are. 507 00:26:16,872 --> 00:26:20,530 And honestly, it is rarely, if ever, going to be enlightening to know, 508 00:26:20,530 --> 00:26:25,030 as a human, what address this integer n is actually at. 509 00:26:25,030 --> 00:26:28,550 Who cares if it's at 0x123 or 0x456? 510 00:26:28,550 --> 00:26:29,800 Generally, we don't. 511 00:26:29,800 --> 00:26:33,070 And so computer scientists, when talking about computers' memory, 512 00:26:33,070 --> 00:26:38,010 tend not to talk at these low level details, in terms of actual numbers. 513 00:26:38,010 --> 00:26:40,600 Instead, they tend to simplify the picture, 514 00:26:40,600 --> 00:26:44,230 sort of abstract away all of the other memory, which frankly, is not 515 00:26:44,230 --> 00:26:46,690 relevant to the discussion thus far, and just 516 00:26:46,690 --> 00:26:50,290 say, you know what, I know that p is storing an address. 517 00:26:50,290 --> 00:26:53,740 And that address happens to be that of 50 down here. 518 00:26:53,740 --> 00:26:56,830 But I really don't care, in my everyday programming life, 519 00:26:56,830 --> 00:26:58,360 what these specific addresses are. 520 00:26:58,360 --> 00:26:59,230 So you know what?
521 00:26:59,230 --> 00:27:01,730 Let's just abstract it away as an arrow. 522 00:27:01,730 --> 00:27:06,250 And again, abstraction is all about simplifying lower level details 523 00:27:06,250 --> 00:27:09,250 that you may very well need to understand but you don't necessarily 524 00:27:09,250 --> 00:27:10,520 need to keep thinking about. 525 00:27:10,520 --> 00:27:11,950 You don't need to keep thinking at this level. 526 00:27:11,950 --> 00:27:13,730 It suffices to think at this level. 527 00:27:13,730 --> 00:27:16,600 So we might as well draw a pointer, pictorially, 528 00:27:16,600 --> 00:27:20,710 as pointing at some value and irrespective of what 529 00:27:20,710 --> 00:27:22,330 the actual address is. 530 00:27:22,330 --> 00:27:25,150 And so this is very much the case in our human world. 531 00:27:25,150 --> 00:27:29,200 We have very similar conventions whether or not 532 00:27:29,200 --> 00:27:31,750 it might be obvious at first glance, such 533 00:27:31,750 --> 00:27:37,310 that we may very well be using these same mechanisms in our everyday lives. 534 00:27:37,310 --> 00:27:40,690 So for instance, if you happen to have a mailbox out in the street on your home 535 00:27:40,690 --> 00:27:43,768 or down in the basement of Harvard Science Center when on campus, it 536 00:27:43,768 --> 00:27:46,810 may very well look like something like this, at least more residentially. 537 00:27:46,810 --> 00:27:51,100 And suppose that this mailbox here is representing, in this case, p, 538 00:27:51,100 --> 00:27:51,790 in the story. 539 00:27:51,790 --> 00:27:55,490 It's storing a pointer, that is, the address of something else. 540 00:27:55,490 --> 00:27:58,360 Well, if there's a whole bunch of other mailboxes on the street, 541 00:27:58,360 --> 00:28:01,510 well, we can put anything we want in these mailboxes. 542 00:28:01,510 --> 00:28:04,840 We can put postcards, letters, packages even. 
543 00:28:04,840 --> 00:28:08,250 And just as in the real world, can we do the same in the virtual. 544 00:28:08,250 --> 00:28:12,890 I can store chars or integers or other things, including addresses. 545 00:28:12,890 --> 00:28:17,100 So for instance, Brian, I think you have your own mailbox somewhere else. 546 00:28:17,100 --> 00:28:20,660 And Brian, of course, has a mailbox that itself has a unique address. 547 00:28:20,660 --> 00:28:23,600 So Brian, for instance, what happens to be the unique address 548 00:28:23,600 --> 00:28:26,030 of the mailbox on your street there? 549 00:28:26,030 --> 00:28:27,600 BRIAN: Yeah, so here is my mailbox. 550 00:28:27,600 --> 00:28:28,370 It's labeled n. 551 00:28:28,370 --> 00:28:29,750 And its address is over here. 552 00:28:29,750 --> 00:28:33,200 The address of my mailbox appears to be 0x123. 553 00:28:33,200 --> 00:28:35,450 DAVID MALAN: Yeah, so my mailbox, too, has an address. 554 00:28:35,450 --> 00:28:37,200 Frankly, again, I don't really care about it. 555 00:28:37,200 --> 00:28:39,033 So I've not even put it on the mailbox here. 556 00:28:39,033 --> 00:28:43,070 But if my mailbox represents p, a pointer, and Brian's mailbox 557 00:28:43,070 --> 00:28:45,920 represents n, an integer, well, it should 558 00:28:45,920 --> 00:28:49,260 mean that if I look inside the contents of my pointer 559 00:28:49,260 --> 00:28:53,690 and I see the value 0x123, that is now my clue, 560 00:28:53,690 --> 00:28:57,560 a breadcrumb of sorts, that can now let me go look inside of Brian's mailbox. 561 00:28:57,560 --> 00:29:00,320 And Brian, if you wouldn't mind doing that for us, 562 00:29:00,320 --> 00:29:02,430 what do you have at that address? 563 00:29:02,430 --> 00:29:05,540 BRIAN: And if I look in my mailbox at address 0x123, 564 00:29:05,540 --> 00:29:07,727 I have the number 50 inside of this mailbox. 565 00:29:07,727 --> 00:29:08,810 DAVID MALAN: Yeah, indeed. 
566 00:29:08,810 --> 00:29:10,400 So in this case, he happens to be storing an int. 567 00:29:10,400 --> 00:29:11,650 But it could be anything else. 568 00:29:11,650 --> 00:29:14,480 And again, we don't typically care about these specific addresses. 569 00:29:14,480 --> 00:29:17,450 Once you understand the metaphor, really, we can do something silly 570 00:29:17,450 --> 00:29:20,630 and really just think of this mailbox as storing a value that's 571 00:29:20,630 --> 00:29:23,180 pointing at Brian's mailbox. 572 00:29:23,180 --> 00:29:26,510 It's some kind of direction drawn there, pictorially as an arrow, 573 00:29:26,510 --> 00:29:29,000 here as a silly foam finger. 574 00:29:29,000 --> 00:29:34,750 Or if you prefer, a foam Yale finger pointing, instead, at Brian's mailbox, 575 00:29:34,750 --> 00:29:38,720 just as a sort of breadcrumb leading us to some other value on the screen. 576 00:29:38,720 --> 00:29:41,408 So when we talk today and beyond about addresses, 577 00:29:41,408 --> 00:29:42,700 that's all we're talking about. 578 00:29:42,700 --> 00:29:45,790 We humans in the real world have been using addresses for eons, now, 579 00:29:45,790 --> 00:29:49,030 to uniquely identify our homes or businesses or the like. 580 00:29:49,030 --> 00:29:51,520 Computers do the exact same thing at a lower level 581 00:29:51,520 --> 00:29:53,440 using their computer's memory. 582 00:29:53,440 --> 00:29:58,330 So let me pause here to see if there are any questions on pointers, variables 583 00:29:58,330 --> 00:30:00,760 that store addresses, or on these new operators, 584 00:30:00,760 --> 00:30:02,890 like the ampersand or the asterisk, which 585 00:30:02,890 --> 00:30:06,310 now has a new meaning today onward. 586 00:30:06,310 --> 00:30:06,968 Nothing yet. 587 00:30:06,968 --> 00:30:09,010 All right, seeing none, well, let's consider now, 588 00:30:09,010 --> 00:30:12,250 the same story in the context of a completely different data type. 
589 00:30:12,250 --> 00:30:15,310 Thus far, we've played only with ints. 590 00:30:15,310 --> 00:30:16,630 But consider strings. 591 00:30:16,630 --> 00:30:20,950 We've spent a lot of time on strings, using encryption with them 592 00:30:20,950 --> 00:30:25,880 and implementing electoral algorithms using users' input. 593 00:30:25,880 --> 00:30:27,940 So let's consider a fundamentally different data 594 00:30:27,940 --> 00:30:31,940 type that stores, not individual integers, but strings of text instead. 595 00:30:31,940 --> 00:30:34,150 So for instance, in any program involving a string, 596 00:30:34,150 --> 00:30:38,245 you might have a line of code that looks like this. string s equals, quote 597 00:30:38,245 --> 00:30:40,090 unquote, "HI!" 598 00:30:40,090 --> 00:30:41,852 in all caps with an exclamation point. 599 00:30:41,852 --> 00:30:44,560 So that may very well be a line of code that we've seen thus far. 600 00:30:44,560 --> 00:30:46,935 What's actually going on inside of the computer's memory? 601 00:30:46,935 --> 00:30:51,340 Well, let me propose that when you type in quote unquote, "HI!" in a computer, 602 00:30:51,340 --> 00:30:53,780 it ends up somewhere in your computer's memory. 603 00:30:53,780 --> 00:30:58,840 So HI exclamation point, plus, per last week, a backslash 0-- or two weeks ago, 604 00:30:58,840 --> 00:31:04,040 a backslash 0, which is how a computer represents the end of that string. 605 00:31:04,040 --> 00:31:06,100 But let's look a little more carefully at 606 00:31:06,100 --> 00:31:08,350 what is going on underneath this hood here. 607 00:31:08,350 --> 00:31:12,190 Technically speaking, I could address those individual characters 608 00:31:12,190 --> 00:31:16,280 we have seen as of week two, by using bracket notation like s bracket 0, 609 00:31:16,280 --> 00:31:18,910 s bracket 1, s bracket 2, and s bracket 3.
610 00:31:18,910 --> 00:31:22,427 We use the square bracket notation to treat a string 611 00:31:22,427 --> 00:31:24,010 as though it's an array of characters. 612 00:31:24,010 --> 00:31:26,900 And it is, it was, and it still is. 613 00:31:26,900 --> 00:31:32,230 But it turns out, strings can also be manipulated by way of their addresses 614 00:31:32,230 --> 00:31:32,960 as well. 615 00:31:32,960 --> 00:31:36,640 And so for instance, maybe this same exact string, HI, 616 00:31:36,640 --> 00:31:43,480 is stored at memory address 0x123 and then 0x124, 0x125, and 0x126. 617 00:31:43,480 --> 00:31:46,150 Notice that they're deliberately contiguous 618 00:31:46,150 --> 00:31:47,560 addresses, back to back to back. 619 00:31:47,560 --> 00:31:50,870 And they're only one byte apart, because each of these chars, of course, 620 00:31:50,870 --> 00:31:53,140 is just one byte in C. 621 00:31:53,140 --> 00:31:56,920 So those numbers are not important, specifically. 622 00:31:56,920 --> 00:31:59,530 But the fact that they're one byte apart from each other 623 00:31:59,530 --> 00:32:02,350 is important, because that's the definition of a string, 624 00:32:02,350 --> 00:32:05,470 and indeed, an array, to have memory back to back to back. 625 00:32:05,470 --> 00:32:08,140 Now, what exactly, though, is S? 626 00:32:08,140 --> 00:32:11,530 S was the name of the variable I gave a moment ago to go to that line of code, 627 00:32:11,530 --> 00:32:13,840 string S equals quote unquote, "HI." 628 00:32:13,840 --> 00:32:14,710 well, what is S? 629 00:32:14,710 --> 00:32:18,950 S is a variable that has to go somewhere in the computer's memory. 630 00:32:18,950 --> 00:32:24,880 And suppose that S is, indeed, HI with an exclamation point. 631 00:32:24,880 --> 00:32:28,600 And the HI happens to live at this location here. 
632 00:32:28,600 --> 00:32:31,390 You know what you can think of S as being now, 633 00:32:31,390 --> 00:32:34,840 isn't, at a high level, a string, but at a lower level, 634 00:32:34,840 --> 00:32:37,300 it's just the address of a string. 635 00:32:37,300 --> 00:32:40,780 More specifically, let's start thinking about a string 636 00:32:40,780 --> 00:32:46,297 as technically being just the address of the first character in the string. 637 00:32:46,297 --> 00:32:48,130 Now, that might give you pause for a moment, 638 00:32:48,130 --> 00:32:49,810 because why the first character? 639 00:32:49,810 --> 00:32:53,710 How are you going to remember that, wait a minute, this string isn't at and only 640 00:32:53,710 --> 00:32:54,940 at 0x123. 641 00:32:54,940 --> 00:33:00,110 It also continues at 0x124, 0x125, and so forth. 642 00:33:00,110 --> 00:33:02,950 But let me pause and ask the group here, why 643 00:33:02,950 --> 00:33:06,110 might it very well be sufficient for a computer 644 00:33:06,110 --> 00:33:12,550 and us programmers to just think of strings in terms of being 645 00:33:12,550 --> 00:33:15,460 the address of the very first byte. 646 00:33:15,460 --> 00:33:18,220 Like, why is it sufficient, no matter how long 647 00:33:18,220 --> 00:33:20,830 the string is, even if it's a whole paragraph of text, 648 00:33:20,830 --> 00:33:25,360 why is it very cleverly sufficient to think of a string like S 649 00:33:25,360 --> 00:33:31,420 as just being identical to the address of the first byte? 650 00:33:31,420 --> 00:33:33,718 Ginni, is it? 651 00:33:33,718 --> 00:33:37,480 AUDIENCE: Possibly because it happens that strings, whenever we are defining 652 00:33:37,480 --> 00:33:39,490 a new string, that is altogether. 653 00:33:39,490 --> 00:33:44,410 Suppose, if I'm writing my name, Ginni, so it will be G-I-N-N-I altogether. 
654 00:33:44,410 --> 00:33:46,810 So it will be sufficient if something is pointed 655 00:33:46,810 --> 00:33:50,560 towards just first character of my name, so that I can just 656 00:33:50,560 --> 00:33:55,895 follow up for the first character and then get all the characters afterwards. 657 00:33:55,895 --> 00:33:56,770 DAVID MALAN: Perfect. 658 00:33:56,770 --> 00:33:59,800 So all of these basic definitions we had over the past couple of weeks 659 00:33:59,800 --> 00:34:00,790 now come together. 660 00:34:00,790 --> 00:34:02,812 If a string is just an array of characters-- 661 00:34:02,812 --> 00:34:05,020 and by definition of array, those characters are back 662 00:34:05,020 --> 00:34:09,280 to back to back, and per two weeks ago, every string 663 00:34:09,280 --> 00:34:13,300 ends with this conventional backslash zero or nul character. 664 00:34:13,300 --> 00:34:15,550 All you need to do when thinking about a string 665 00:34:15,550 --> 00:34:17,530 is just to know where does the string begin, 666 00:34:17,530 --> 00:34:19,719 because you can use a for loop or a while loop 667 00:34:19,719 --> 00:34:22,540 or some other heuristic with a condition and a Boolean expression 668 00:34:22,540 --> 00:34:25,929 to figure out where the string ends without even knowing, 669 00:34:25,929 --> 00:34:27,710 in advance, its length. 670 00:34:27,710 --> 00:34:30,159 So that is to say, let's start, for the moment, 671 00:34:30,159 --> 00:34:32,679 thinking about strings as being quite simply 672 00:34:32,679 --> 00:34:37,969 that, just the address of the first character in the string. 673 00:34:37,969 --> 00:34:40,989 And if we then take that as fact, let's go ahead, now, 674 00:34:40,989 --> 00:34:43,989 and start playing with a program that doesn't use integers, but instead, 675 00:34:43,989 --> 00:34:46,570 uses strings, using this basic primitive. 676 00:34:46,570 --> 00:34:49,929 So let me go ahead and delete the code I'd written before, in address.c.
677 00:34:49,929 --> 00:34:54,580 Let me just change it up to be string s equals quote unquote, "HI!" semicolon. 678 00:34:54,580 --> 00:34:57,700 And notice, I'm not manually typing any backslash 0's. 679 00:34:57,700 --> 00:34:59,560 C does that for us automatically. 680 00:34:59,560 --> 00:35:02,260 When you close the quote, the compiler takes care 681 00:35:02,260 --> 00:35:04,158 of adding that backslash 0 for you. 682 00:35:04,158 --> 00:35:05,950 Now, I'm going to go ahead on the next line 683 00:35:05,950 --> 00:35:10,042 and go ahead and print out percent s backslash n comma s, 684 00:35:10,042 --> 00:35:11,500 if I want to print out that string. 685 00:35:11,500 --> 00:35:13,968 Now, this program is not at all interesting anymore. 686 00:35:13,968 --> 00:35:15,760 Back in week one, we wrote something like-- 687 00:35:15,760 --> 00:35:18,730 OK, yes it is interesting because I screwed up. 688 00:35:18,730 --> 00:35:19,780 So five errors. 689 00:35:19,780 --> 00:35:22,450 I've written seven lines of code and five errors. 690 00:35:22,450 --> 00:35:24,070 And let's see what's going on. 691 00:35:24,070 --> 00:35:27,430 As always, always go to the top, because odds are, 692 00:35:27,430 --> 00:35:29,650 there's just some confusing cascading effect. 693 00:35:29,650 --> 00:35:34,090 The very first error I see is use of undeclared identifier string. 694 00:35:34,090 --> 00:35:35,230 Did I mean standard in? 695 00:35:35,230 --> 00:35:37,900 I didn't mean standard in, string, string, string. 696 00:35:37,900 --> 00:35:40,780 So I could run help50 as my frontier, but honestly, I 697 00:35:40,780 --> 00:35:43,150 make this mistake often enough that I kind of know now 698 00:35:43,150 --> 00:35:46,690 that I forgot to include cs50.h. 699 00:35:46,690 --> 00:35:49,960 And indeed, if I now do this and recompile make address-- 700 00:35:49,960 --> 00:35:53,080 OK, all five errors are gone just by that one simple change.
701 00:35:53,080 --> 00:35:56,200 And if I run address now, it's just going to, quite simply, say HI. 702 00:35:56,200 --> 00:35:59,020 But let's now start to consider what's going 703 00:35:59,020 --> 00:36:00,650 on underneath the hood of this program. 704 00:36:00,650 --> 00:36:06,040 Suppose I am curious and want to print out what is actually 705 00:36:06,040 --> 00:36:08,170 the address at which this string lives. 706 00:36:08,170 --> 00:36:09,520 Well, it turns out-- 707 00:36:09,520 --> 00:36:10,690 let me be clever here. 708 00:36:10,690 --> 00:36:14,830 Let me print out, not a format code of percent s, but percent p. 709 00:36:14,830 --> 00:36:18,290 Show me this same string as an address. 710 00:36:18,290 --> 00:36:22,060 Let me go ahead and recompile, make address, seems to compile OK. 711 00:36:22,060 --> 00:36:23,560 Let me run dot slash address. 712 00:36:23,560 --> 00:36:26,350 And again, I'm still printing s, but I'm asking printf 713 00:36:26,350 --> 00:36:30,260 to present it as though it's a pointer. 714 00:36:30,260 --> 00:36:32,430 And interesting, it's not the same as before. 715 00:36:32,430 --> 00:36:35,060 But again, that's reasonable because the memory addresses 716 00:36:35,060 --> 00:36:36,540 aren't going to always be the same. 717 00:36:36,540 --> 00:36:37,940 But it doesn't matter what it is. 718 00:36:37,940 --> 00:36:39,232 But that's kind of interesting. 719 00:36:39,232 --> 00:36:41,750 All this time, any time you've been using strings, 720 00:36:41,750 --> 00:36:44,300 had you just changed your percent s to a percent p, 721 00:36:44,300 --> 00:36:48,290 you could have seen where, in memory, that string actually starts. 722 00:36:48,290 --> 00:36:50,780 It's not functionally useful to us just yet. 723 00:36:50,780 --> 00:36:52,700 But it's been there this whole time. 724 00:36:52,700 --> 00:36:54,800 And let me go ahead and do the following now. 
725 00:36:54,800 --> 00:36:58,950 Suppose I get a little curious further, and I do printf. 726 00:36:58,950 --> 00:37:02,390 Let me go ahead and print out another address followed by a new line. 727 00:37:02,390 --> 00:37:07,035 And let me go ahead and print out the address of the first character. 728 00:37:07,035 --> 00:37:08,660 So again, this is a little weird to do. 729 00:37:08,660 --> 00:37:10,220 And we wouldn't typically do this that often. 730 00:37:10,220 --> 00:37:13,430 But again, just to make the point that these operators give us very simple 731 00:37:13,430 --> 00:37:16,850 answers to questions like, what is the address of this thing? 732 00:37:16,850 --> 00:37:23,960 If s bracket 1, as of week two in CS50, represented the second character in s, 733 00:37:23,960 --> 00:37:28,190 because 0 index means s bracket 0 is the first, s bracket 1 is the second. 734 00:37:28,190 --> 00:37:30,410 If I play around with today's new operator, 735 00:37:30,410 --> 00:37:36,020 this ampersand, I bet I can see the address of that second character. 736 00:37:36,020 --> 00:37:38,390 And in fact, let me go ahead and be more explicit. 737 00:37:38,390 --> 00:37:43,160 Let me change this first s to be s bracket 0 and put an ampersand here. 738 00:37:43,160 --> 00:37:46,430 And let me go ahead, now, and make this program, make address. 739 00:37:46,430 --> 00:37:48,170 OK, a little funky-- 740 00:37:48,170 --> 00:37:49,680 I just missed a semicolon. 741 00:37:49,680 --> 00:37:51,060 So easy fix there. 742 00:37:51,060 --> 00:37:53,600 Let me go ahead and recompile with make address. 743 00:37:53,600 --> 00:37:55,880 Let me go ahead and run dot slash address. 744 00:37:55,880 --> 00:37:58,970 And interesting, well, maybe-- 745 00:37:58,970 --> 00:38:00,320 interesting to me. 746 00:38:00,320 --> 00:38:02,780 So you see, now, two addresses, the first of which 747 00:38:02,780 --> 00:38:08,900 is 0x4006a4, which apparently, is the address of the first character in s.
748 00:38:08,900 --> 00:38:10,880 But notice what's curious about the next one. 749 00:38:10,880 --> 00:38:15,720 It's almost the same except the byte is one further away. 750 00:38:15,720 --> 00:38:18,380 And I bet if I do this, not just for the h and the i, 751 00:38:18,380 --> 00:38:20,330 but also the exclamation point-- let me do 752 00:38:20,330 --> 00:38:23,210 one more line of almost identical code, just 753 00:38:23,210 --> 00:38:26,240 to make the point that all this time it's, indeed, 754 00:38:26,240 --> 00:38:30,560 been the case that all characters in a string are back to back to back. 755 00:38:30,560 --> 00:38:32,540 And you can now see it in code. 756 00:38:32,540 --> 00:38:37,610 b4, b5, b6, are just one byte apart. 757 00:38:37,610 --> 00:38:40,940 So we see some visual confirmation, now, that strings are indeed 758 00:38:40,940 --> 00:38:42,990 laid out in memory just like this. 759 00:38:42,990 --> 00:38:46,130 Now, again, this is not a very useful programmatic exercise 760 00:38:46,130 --> 00:38:48,500 to look at the address of individual characters. 761 00:38:48,500 --> 00:38:51,350 But again, this is just to emphasize that underneath the hood, 762 00:38:51,350 --> 00:38:53,960 some relatively simple operations are being 763 00:38:53,960 --> 00:38:58,562 enabled by way of this new ampersand, and in turn, star operator. 764 00:38:58,562 --> 00:39:00,770 So let's consider for a moment what this really looks 765 00:39:00,770 --> 00:39:02,390 like inside the computer's memory. 766 00:39:02,390 --> 00:39:05,660 At a low level, yes, s is technically an address. 767 00:39:05,660 --> 00:39:08,540 And yes, it's technically the address of the first byte, 768 00:39:08,540 --> 00:39:10,880 which in the actual computer, looked different. 769 00:39:10,880 --> 00:39:13,100 But in my slide here, I just arbitrarily proposed 770 00:39:13,100 --> 00:39:17,210 that it's at 0x123, 0x124, 0x125. 
771 00:39:17,210 --> 00:39:20,300 But again, let's not care about that level of detail. 772 00:39:20,300 --> 00:39:23,210 Let's just kind of wave our hands and abstract away these addresses 773 00:39:23,210 --> 00:39:30,950 and just now start thinking of s, that is a string, as technically just being 774 00:39:30,950 --> 00:39:32,450 a pointer. 775 00:39:32,450 --> 00:39:33,260 A pointer. 776 00:39:33,260 --> 00:39:36,463 So it turns out that even though it's very useful and very common 777 00:39:36,463 --> 00:39:39,380 to think of strings as, obviously, just being sequences of characters. 778 00:39:39,380 --> 00:39:41,240 And that's been true since week one. 779 00:39:41,240 --> 00:39:43,130 And you can also think of them as arrays, 780 00:39:43,130 --> 00:39:44,990 back to back sequences of characters. 781 00:39:44,990 --> 00:39:47,330 You can also, it turns out, starting today, 782 00:39:47,330 --> 00:39:51,290 think of them as just being pointers, that is, 783 00:39:51,290 --> 00:39:54,900 the address of a character somewhere in the computer's memory. 784 00:39:54,900 --> 00:39:58,550 And as Ginni notes, because all of the characters in a string 785 00:39:58,550 --> 00:40:00,770 are, by definition, back to back to back, 786 00:40:00,770 --> 00:40:05,720 and because, by definition, all strings end with a backslash 0, that 787 00:40:05,720 --> 00:40:08,750 is literally the smallest and only amount of information 788 00:40:08,750 --> 00:40:12,920 you need to keep around in a computer to know where all of your strings are. 789 00:40:12,920 --> 00:40:16,340 Just remember the address of the very first character 790 00:40:16,340 --> 00:40:19,430 therein, because you can find your way to the end 791 00:40:19,430 --> 00:40:24,320 by remembering that this backslash 0 is, really, just eight 0 792 00:40:24,320 --> 00:40:27,080 bits, otherwise represented as backslash 0. 
793 00:40:27,080 --> 00:40:29,617 And so we could certainly have an if condition, 794 00:40:29,617 --> 00:40:31,700 much like we did two weeks ago when playing around 795 00:40:31,700 --> 00:40:36,230 with the lengths of strings, that allows us to check for precisely that. 796 00:40:36,230 --> 00:40:41,030 And so when I say we're taking off some training wheels, here they go. 797 00:40:41,030 --> 00:40:44,330 So up until now, we've been using, again, the CS50 library, 798 00:40:44,330 --> 00:40:47,470 which gives us, conveniently, functions like get string and get int 799 00:40:47,470 --> 00:40:49,650 and get float and so forth. 800 00:40:49,650 --> 00:40:54,650 But all this time, the CS50 library, specifically the file, cs50.h, 801 00:40:54,650 --> 00:40:58,070 had a little bit of a pedagogical simplification in it. 802 00:40:58,070 --> 00:41:02,510 Recall last week, that you can define your own custom data types. 803 00:41:02,510 --> 00:41:06,955 Well, it turns out that all this time, we've been claiming that strings exist 804 00:41:06,955 --> 00:41:09,080 and they're something you can use in your programs. 805 00:41:09,080 --> 00:41:14,420 And strings do exist in C. They do exist in Python, in JavaScript, in Java, 806 00:41:14,420 --> 00:41:16,980 and C++, in many, many, many other languages. 807 00:41:16,980 --> 00:41:18,860 This is not a CS50 term. 808 00:41:18,860 --> 00:41:25,190 But string, technically, does not exist as a data type in C. It instead, 809 00:41:25,190 --> 00:41:31,180 is more cryptically and more low-level known as char star. 810 00:41:31,180 --> 00:41:33,080 Char star, now what does that mean? 811 00:41:33,080 --> 00:41:37,180 Well, char star, much like our int star a few minutes ago, 812 00:41:37,180 --> 00:41:40,840 just represents the address of a character, much like int star 813 00:41:40,840 --> 00:41:43,210 represents the address of an int. 
814 00:41:43,210 --> 00:41:46,210 And if, again, you kind of agree with me now, 815 00:41:46,210 --> 00:41:49,450 that you can think of strings as sequences of characters, 816 00:41:49,450 --> 00:41:52,660 or more specifically, arrays of characters, or more specifically, 817 00:41:52,660 --> 00:41:56,920 as of today, the address of just the first character, 818 00:41:56,920 --> 00:41:59,680 then it's, indeed, the case that we now can 819 00:41:59,680 --> 00:42:02,800 apply this new terminology, today, of pointer, 820 00:42:02,800 --> 00:42:06,040 to our old familiar friends, strings. 821 00:42:06,040 --> 00:42:10,690 String is the same thing as a synonym, if you will, for char star. 822 00:42:10,690 --> 00:42:14,200 And it's in the CS50 library that we, essentially, have a line of code 823 00:42:14,200 --> 00:42:18,348 that simplifies or abstracts away char star, which honestly, no one wants 824 00:42:18,348 --> 00:42:20,890 to think about or struggle with in the first week of a class, 825 00:42:20,890 --> 00:42:23,260 let alone the first two or three weeks of a class. 826 00:42:23,260 --> 00:42:28,475 It's a simplification, a custom data type, that we name string, 827 00:42:28,475 --> 00:42:30,850 just so you don't have to think about, what is this star? 828 00:42:30,850 --> 00:42:32,017 What is it to the character? 829 00:42:32,017 --> 00:42:33,100 What is it an address of? 830 00:42:33,100 --> 00:42:37,450 But today, we can remove those training wheels and reveal that, all this time, 831 00:42:37,450 --> 00:42:40,720 you've just been manipulating characters at specific addresses. 832 00:42:40,720 --> 00:42:43,180 And we've used this kind of technique before, 833 00:42:43,180 --> 00:42:45,550 abstracting away these lower level details. 834 00:42:45,550 --> 00:42:48,310 For instance, recall last week, that we introduced 835 00:42:48,310 --> 00:42:52,630 this notion of a struct, a data type that you can customize to be your own. 
836 00:42:52,630 --> 00:42:56,200 We implemented a better phone book by wrapping together 837 00:42:56,200 --> 00:42:58,630 a name and a number inside of a custom data type, 838 00:42:58,630 --> 00:43:01,960 encapsulating them if you will, inside of something we called person. 839 00:43:01,960 --> 00:43:05,650 And every person we claimed had a structure 840 00:43:05,650 --> 00:43:07,580 that contains a name and a number. 841 00:43:07,580 --> 00:43:11,410 And by the way of this feature of C, typedef, we can define a new type. 842 00:43:11,410 --> 00:43:15,200 And the name of that type, last week, was just person. 843 00:43:15,200 --> 00:43:18,100 So we're using, already, and we have been sort of secretly 844 00:43:18,100 --> 00:43:22,750 using since the first week of C in the class, a line of code that 845 00:43:22,750 --> 00:43:24,020 actually looks like this. 846 00:43:24,020 --> 00:43:28,090 And this is, indeed, one of the lines of code inside of cs50.h. 847 00:43:28,090 --> 00:43:31,000 It says typedef, which means give me a custom type. 848 00:43:31,000 --> 00:43:35,770 And it creates a synonym for char star called string. 849 00:43:35,770 --> 00:43:39,700 And it's just a way where we can hide the funky char star. 850 00:43:39,700 --> 00:43:42,070 We can hide the asterisk, in particular, which would not 851 00:43:42,070 --> 00:43:43,990 be fun to play with in the first few days, 852 00:43:43,990 --> 00:43:47,200 but without changing the definition of what a string is. 853 00:43:47,200 --> 00:43:51,850 So strings exist in C. But there's no data type called string in C 854 00:43:51,850 --> 00:43:56,020 until you use a library like CS50's, which makes it exist 855 00:43:56,020 --> 00:43:58,930 by way of that kind of definition. 856 00:43:58,930 --> 00:44:01,450 All right, let me pause here to see if there's 857 00:44:01,450 --> 00:44:03,760 any questions, then, about what strings are 858 00:44:03,760 --> 00:44:09,360 or these new ways of thinking about them. 
859 00:44:09,360 --> 00:44:13,390 Any questions about strings or char stars? 860 00:44:13,390 --> 00:44:15,140 All right, well, if no questions here, why 861 00:44:15,140 --> 00:44:17,515 don't we go ahead and take our 5 minute break here first. 862 00:44:17,515 --> 00:44:19,790 And we'll be back in 5 and take another look 863 00:44:19,790 --> 00:44:22,040 at what we can now do with these new primitives. 864 00:44:22,040 --> 00:44:23,480 All right, we're back. 865 00:44:23,480 --> 00:44:27,680 And we have, now, this ability in code to get the address of some variable 866 00:44:27,680 --> 00:44:30,140 and also to go to an address using ampersand 867 00:44:30,140 --> 00:44:31,850 and the asterisk, respectively. 868 00:44:31,850 --> 00:44:36,530 We've thought about strings as being not only contiguous sequences 869 00:44:36,530 --> 00:44:38,150 of characters, but also arrays. 870 00:44:38,150 --> 00:44:42,477 And then of course, as of today now, actual addresses, 871 00:44:42,477 --> 00:44:44,810 the address of the first character and then, from there, 872 00:44:44,810 --> 00:44:46,940 can we find our way, programmatically, to the end, 873 00:44:46,940 --> 00:44:48,380 thanks to that nul character. 874 00:44:48,380 --> 00:44:52,220 But it turns out there's one other thing we can do with these addresses 875 00:44:52,220 --> 00:44:53,840 or with pointers more generally. 876 00:44:53,840 --> 00:44:55,550 And that's known as pointer arithmetic. 877 00:44:55,550 --> 00:44:58,577 So anything that's a number, of course, we can do math on. 878 00:44:58,577 --> 00:45:00,410 And the math is not going to be complicated, 879 00:45:00,410 --> 00:45:03,390 but it is going to be powerful for us here. 880 00:45:03,390 --> 00:45:07,040 So I'm going to go back to my most recent state of address.c. 
881 00:45:07,040 --> 00:45:11,480 And let me go ahead, now, and reiterate that we can print out 882 00:45:11,480 --> 00:45:15,800 the individual characters in a string, just like we did back in week two, 883 00:45:15,800 --> 00:45:18,270 as by using our square bracket notation. 884 00:45:18,270 --> 00:45:21,170 So I'm getting rid of all evidence of those addresses for now. 885 00:45:21,170 --> 00:45:23,420 I'm recompiling this program as make address. 886 00:45:23,420 --> 00:45:25,650 And then I'm going to run dot slash address now. 887 00:45:25,650 --> 00:45:29,690 And I see HI exclamation point, one character per line. 888 00:45:29,690 --> 00:45:34,290 But now, consider that there doesn't need to be a string data type. 889 00:45:34,290 --> 00:45:36,320 In fact, we can take this training wheel off. 890 00:45:36,320 --> 00:45:38,690 And while it might feel a little uncomfortable at first, 891 00:45:38,690 --> 00:45:42,620 if I delete this first line altogether, as I've accidentally omitted anyway 892 00:45:42,620 --> 00:45:45,660 sometimes, I don't need to keep calling things strings. 893 00:45:45,660 --> 00:45:47,570 I can describe them as strings verbally. 894 00:45:47,570 --> 00:45:49,790 I can think of them as strings, because string 895 00:45:49,790 --> 00:45:53,150 is a thing in many different programming languages. 896 00:45:53,150 --> 00:45:56,070 But by default, in C, it just doesn't exist as a type. 897 00:45:56,070 --> 00:45:59,750 Instead, the type is somewhat cryptically named, char star. 898 00:45:59,750 --> 00:46:02,840 But again, all that means is that the star means here's 899 00:46:02,840 --> 00:46:04,010 the address of something. 900 00:46:04,010 --> 00:46:06,140 Char means it's the address of a char. 901 00:46:06,140 --> 00:46:09,950 So char star gives you a pointer variable 902 00:46:09,950 --> 00:46:12,720 that's going to point to a character. 903 00:46:12,720 --> 00:46:16,080 So now, if s is that, I can actually treat it the same. 
904 00:46:16,080 --> 00:46:20,960 There's no reason I can't keep using s like a string was back in week two, 905 00:46:20,960 --> 00:46:22,400 using our square bracket notation. 906 00:46:22,400 --> 00:46:24,770 And I can keep printing out HI exclamation point 907 00:46:24,770 --> 00:46:27,320 using that same square bracket syntax. 908 00:46:27,320 --> 00:46:30,170 But there's one other way I can do this. 909 00:46:30,170 --> 00:46:35,150 If I now know that s is really just an address, 910 00:46:35,150 --> 00:46:37,760 I can get rid of this square bracket notation. 911 00:46:37,760 --> 00:46:42,860 And I can actually just do star s, because recall that star, in addition 912 00:46:42,860 --> 00:46:47,270 to being the new symbol that we use when declaring a pointer up here, 913 00:46:47,270 --> 00:46:50,990 it's also the same symbol, confusingly, admittedly, 914 00:46:50,990 --> 00:46:53,310 that we used to go to an address. 915 00:46:53,310 --> 00:46:57,650 So if s is storing an address, which it is by definition of being a pointer, 916 00:46:57,650 --> 00:46:59,900 star s means go to that address. 917 00:46:59,900 --> 00:47:02,000 And per my picture earlier, it would seem 918 00:47:02,000 --> 00:47:08,060 to be the case that s is most likely at an address beginning at 0x123. 919 00:47:08,060 --> 00:47:10,250 It's not going to be the same in my actual IDE here. 920 00:47:10,250 --> 00:47:12,167 It will be whatever the computer has ordained. 921 00:47:12,167 --> 00:47:14,610 But it's going to be the same exact idea. 922 00:47:14,610 --> 00:47:17,150 So let me go ahead and go to star s. 923 00:47:17,150 --> 00:47:20,130 And just for kicks, let me leave it as just that one line. 924 00:47:20,130 --> 00:47:23,870 So let me go ahead and rerun this as make address. 925 00:47:23,870 --> 00:47:25,470 All right, and now dot slash address. 926 00:47:25,470 --> 00:47:30,710 I should see, hopefully, a capital H and only an H. But watch this. 
927 00:47:30,710 --> 00:47:34,400 If I know that s, a string, is technically just an address, 928 00:47:34,400 --> 00:47:35,960 I can actually now do math on it. 929 00:47:35,960 --> 00:47:39,470 And I can go ahead and print out another character, followed by a new line. 930 00:47:39,470 --> 00:47:44,090 And I can go to, not s, but how about s plus 1. 931 00:47:44,090 --> 00:47:47,600 So I can do some very simple arithmetic, if you will, on that pointer. 932 00:47:47,600 --> 00:47:49,920 And let me go ahead and now recompile this. 933 00:47:49,920 --> 00:47:54,800 So make address, compiles OK, dot slash address. 934 00:47:54,800 --> 00:47:56,570 And I should see HI. 935 00:47:56,570 --> 00:48:01,790 And if I do one more line of code like this, printf, percent c, backslash n, 936 00:48:01,790 --> 00:48:07,130 star s plus 2, I can now go to the character 937 00:48:07,130 --> 00:48:10,770 that is two bytes away from whatever s is, 938 00:48:10,770 --> 00:48:12,480 which again, is the start of the string. 939 00:48:12,480 --> 00:48:15,890 So now, I've reprinted HI with the exclamation point character 940 00:48:15,890 --> 00:48:19,280 by character, but not by using this fancy square bracket 941 00:48:19,280 --> 00:48:24,710 notation, fancy only in the sense that it was sort of an abstraction for us, 942 00:48:24,710 --> 00:48:25,670 if you will. 943 00:48:25,670 --> 00:48:28,885 I'm instead, manipulating s for what it really is, which is just an address. 944 00:48:28,885 --> 00:48:31,010 And so here, too, and I've used this phrase before, 945 00:48:31,010 --> 00:48:33,710 that square bracket notation that we introduced in week two, 946 00:48:33,710 --> 00:48:36,410 is technically just syntactic sugar. 947 00:48:36,410 --> 00:48:39,500 It's not doing anything fundamentally different 948 00:48:39,500 --> 00:48:42,770 from these asterisks and these addresses. 949 00:48:42,770 --> 00:48:45,440 It's just doing it, honestly, in a much more user-friendly way. 
950 00:48:45,440 --> 00:48:49,160 I still prefer, personally, the square bracket notation from week two. 951 00:48:49,160 --> 00:48:54,680 But it's the same thing as using the star and doing this math yourself. 952 00:48:54,680 --> 00:48:57,020 So C is just providing us with this handy feature 953 00:48:57,020 --> 00:49:00,200 of using square brackets that does all of this so-called pointer 954 00:49:00,200 --> 00:49:02,360 arithmetic for you. 955 00:49:02,360 --> 00:49:04,290 But again, we're going to this low level just 956 00:49:04,290 --> 00:49:10,310 to emphasize what it is that's going on ultimately underneath the hood here. 957 00:49:10,310 --> 00:49:13,070 All right, let me pause here for any questions. 958 00:49:13,070 --> 00:49:17,290 And Brian, please do feel free to verbalize any on your end. 959 00:49:17,290 --> 00:49:19,790 BRIAN: I see a question that came in about what would happen 960 00:49:19,790 --> 00:49:22,233 if you tried to print star s plus 3. 961 00:49:22,233 --> 00:49:25,400 DAVID MALAN: So I'm pretty sure that's going to print out the nul character. 962 00:49:25,400 --> 00:49:27,233 But let's go ahead and confirm as much here, 963 00:49:27,233 --> 00:49:31,760 percent c backslash n star s plus 3. 964 00:49:31,760 --> 00:49:35,120 All right, I'm getting a little adventurous here 965 00:49:35,120 --> 00:49:38,060 by looking at things I maybe shouldn't be looking at, because that's 966 00:49:38,060 --> 00:49:39,545 a low level implementation detail. 967 00:49:39,545 --> 00:49:40,670 But let's see what happens. 968 00:49:40,670 --> 00:49:43,130 It compiles OK, dot slash address. 969 00:49:43,130 --> 00:49:44,780 And it seems to be blank. 970 00:49:44,780 --> 00:49:46,730 Now, maybe that's the nul character. 971 00:49:46,730 --> 00:49:48,980 Honestly, it's not meant to be a printable character. 972 00:49:48,980 --> 00:49:52,770 It's this special sentinel value that indicates the end of the string. 
973 00:49:52,770 --> 00:49:54,020 But I could do this. 974 00:49:54,020 --> 00:49:57,170 I know from week two that chars are integers 975 00:49:57,170 --> 00:49:59,670 and integers are chars if I want to think of them that way. 976 00:49:59,670 --> 00:50:01,880 So let me change only the very last character 977 00:50:01,880 --> 00:50:03,950 to use the format code percent i. 978 00:50:03,950 --> 00:50:05,690 Let me recompile my code. 979 00:50:05,690 --> 00:50:07,940 Let me go ahead and run address. 980 00:50:07,940 --> 00:50:11,540 And voila, HI exclamation 0. 981 00:50:11,540 --> 00:50:16,400 And there is the all 0 bits represented here as one single decimal digit thanks 982 00:50:16,400 --> 00:50:17,570 to percent i. 983 00:50:17,570 --> 00:50:19,970 Now, I can get really crazy here. 984 00:50:19,970 --> 00:50:23,420 And why don't we go ahead and print out not just what characters 985 00:50:23,420 --> 00:50:28,580 are right after this sequence, HI exclamation point nul character, 986 00:50:28,580 --> 00:50:33,770 why don't we go to-- oh heck, how about address 1,000 bytes away, 987 00:50:33,770 --> 00:50:35,990 and really get nosy inside of my computer? 988 00:50:35,990 --> 00:50:38,450 Let me recompile that dot slash address. 989 00:50:38,450 --> 00:50:40,460 OK, nothing really going on over there. 990 00:50:40,460 --> 00:50:42,620 How about 10,000 bytes away? 991 00:50:42,620 --> 00:50:44,270 Let me go ahead and make address. 992 00:50:44,270 --> 00:50:47,990 Let me go ahead and run this. Segmentation fault. All right, 993 00:50:47,990 --> 00:50:49,010 that's bad. 994 00:50:49,010 --> 00:50:53,030 And you might be among the fortunate few who have seen this error before 995 00:50:53,030 --> 00:50:54,440 by touching memory you shouldn't. 996 00:50:54,440 --> 00:50:56,607 And we're going to deliberately consider this today.
997 00:50:56,607 --> 00:50:59,540 But a segmentation fault, indeed, means that you have done something 998 00:50:59,540 --> 00:51:01,430 wrong somewhere in your code. 999 00:51:01,430 --> 00:51:04,000 And it tends to mean that you touched a segment of memory 1000 00:51:04,000 --> 00:51:05,000 that you shouldn't have. 1001 00:51:05,000 --> 00:51:08,750 And I have no business, honestly, looking 10,000 bytes away 1002 00:51:08,750 --> 00:51:11,420 from the memory that I know belongs to the string. 1003 00:51:11,420 --> 00:51:14,670 That's like arbitrarily looking anywhere in your computer's memory, 1004 00:51:14,670 --> 00:51:16,890 which probably, it seems, is not a good idea. 1005 00:51:16,890 --> 00:51:19,000 But more on that in just a bit. 1006 00:51:19,000 --> 00:51:21,470 So let's consider, now, some of the implications 1007 00:51:21,470 --> 00:51:25,130 of these underlying implementation details 1008 00:51:25,130 --> 00:51:28,580 and consider, now, from last week, why we did a few things the way 1009 00:51:28,580 --> 00:51:30,590 we did in the past few weeks, in fact. 1010 00:51:30,590 --> 00:51:32,360 So string is just a char star. 1011 00:51:32,360 --> 00:51:33,860 And let's, now, consider an example. 1012 00:51:33,860 --> 00:51:37,260 Let me zoom out on my memory, just so I can cram more in at once. 1013 00:51:37,260 --> 00:51:39,620 Let's consider an example where I might want to write 1014 00:51:39,620 --> 00:51:42,570 a program that compares two strings. 1015 00:51:42,570 --> 00:51:45,830 Let me go ahead and write some new code here in a new file this time, 1016 00:51:45,830 --> 00:51:48,350 called, for instance, compare.c. 1017 00:51:48,350 --> 00:51:50,480 My goal with this program, quite simply, is 1018 00:51:50,480 --> 00:51:55,580 going to be to print out the contents of-- or rather to compare 1019 00:51:55,580 --> 00:51:57,590 two strings that the user might input. 
1020 00:51:57,590 --> 00:52:00,040 I'm going to go ahead and include cs50.h, 1021 00:52:00,040 --> 00:52:02,810 not because I want string, per se, anymore, 1022 00:52:02,810 --> 00:52:05,750 but because I want to use get string just for convenience. 1023 00:52:05,750 --> 00:52:08,180 But we'll take that training wheel off in a bit, too. 1024 00:52:08,180 --> 00:52:10,520 And in this program, I'm going to go ahead and first 1025 00:52:10,520 --> 00:52:11,690 use, not get string yet. 1026 00:52:11,690 --> 00:52:14,450 Let me go ahead and keep it simple and start with get int. 1027 00:52:14,450 --> 00:52:16,910 And I'll ask the user for a variable i. 1028 00:52:16,910 --> 00:52:19,340 And let me do another one of these in get int and ask 1029 00:52:19,340 --> 00:52:21,270 the user for a value for j. 1030 00:52:21,270 --> 00:52:24,665 And then let me go ahead and quite simply say, if i equals equals j, 1031 00:52:24,665 --> 00:52:28,790 then go ahead and print out same else. 1032 00:52:28,790 --> 00:52:31,770 Let me go ahead and print out different. 1033 00:52:31,770 --> 00:52:35,930 So this is week one stuff, where I'm using a couple of variables. 1034 00:52:35,930 --> 00:52:38,300 I'm using a condition with two branches, and I'm 1035 00:52:38,300 --> 00:52:42,990 using printf to print out whether those two variables, i and j, are the same. 1036 00:52:42,990 --> 00:52:44,930 So let's go ahead and compile this. 1037 00:52:44,930 --> 00:52:45,950 All is well. 1038 00:52:45,950 --> 00:52:49,310 Run compare, and let me give it digits 1 and 2. 1039 00:52:49,310 --> 00:52:50,630 And indeed, they're different. 1040 00:52:50,630 --> 00:52:53,400 And let me go ahead and give it 1 and 1, and they're the same. 1041 00:52:53,400 --> 00:52:56,270 So I think, logically, proof by example, if you will, 1042 00:52:56,270 --> 00:52:57,860 this program looks correct. 1043 00:52:57,860 --> 00:53:02,630 But let me quickly make it seemingly incorrect, by not using integers.
1044 00:53:02,630 --> 00:53:05,840 But how about, by using strings instead. 1045 00:53:05,840 --> 00:53:07,988 Let me go ahead and give myself a string. 1046 00:53:07,988 --> 00:53:10,280 Although, no, I don't need that training wheel anymore. 1047 00:53:10,280 --> 00:53:15,300 Let's just do char star s equals get string of s. 1048 00:53:15,300 --> 00:53:17,300 But again, even though I'm calling it char star, 1049 00:53:17,300 --> 00:53:19,580 it's still a string like it was weeks ago. 1050 00:53:19,580 --> 00:53:23,510 Let me give myself another string called t, just to keep the name short. 1051 00:53:23,510 --> 00:53:25,100 And s will get-- 1052 00:53:25,100 --> 00:53:26,730 t will get that value there. 1053 00:53:26,730 --> 00:53:30,140 And let me just, very naively but kind of reasonably, 1054 00:53:30,140 --> 00:53:34,310 say if s equals equals t, let's go ahead and print out same. 1055 00:53:34,310 --> 00:53:38,000 And otherwise, let's go ahead and print out different. 1056 00:53:38,000 --> 00:53:41,240 So same exact code, just different data types, and using 1057 00:53:41,240 --> 00:53:42,830 get string instead of get int. 1058 00:53:42,830 --> 00:53:47,360 Let me go ahead and make compare, seems to compile OK, dot slash compare. 1059 00:53:47,360 --> 00:53:51,770 Let me go ahead and type in HI!-- 1060 00:53:51,770 --> 00:53:53,570 woops, HI!. 1061 00:53:53,570 --> 00:53:55,220 Let me go ahead and type in HI! again. 1062 00:53:55,220 --> 00:53:57,500 And voila, different. 1063 00:53:57,500 --> 00:54:01,010 And I forgot my backslash n's, but that seems to be the least of my problems. 1064 00:54:01,010 --> 00:54:05,240 Let me recompile this, make compare, and now, let me run it again. 1065 00:54:05,240 --> 00:54:07,130 How about, let's do a quick test. 1066 00:54:07,130 --> 00:54:09,010 David, Brian, these are definitely different. 1067 00:54:09,010 --> 00:54:09,580 OK, good. 1068 00:54:09,580 --> 00:54:11,240 So the program seems to work. 
1069 00:54:11,240 --> 00:54:13,150 How about David, David? 1070 00:54:13,150 --> 00:54:14,140 Also different. 1071 00:54:14,140 --> 00:54:15,370 Huh, let me try again. 1072 00:54:15,370 --> 00:54:18,600 Brian, Brian, also different. 1073 00:54:18,600 --> 00:54:21,570 But I'm pretty sure those strings are the same. 1074 00:54:21,570 --> 00:54:24,180 Why might this program be flawed? 1075 00:54:24,180 --> 00:54:28,582 What is wrong with this program right now? 1076 00:54:28,582 --> 00:54:30,290 BRIAN: A couple of people in the chat are 1077 00:54:30,290 --> 00:54:32,750 saying that we're not actually comparing the characters, 1078 00:54:32,750 --> 00:54:34,370 we're comparing the addresses. 1079 00:54:34,370 --> 00:54:37,377 DAVID MALAN: Yeah, so that's sort of the logical conclusion from today's 1080 00:54:37,377 --> 00:54:38,960 definition of what a string really is. 1081 00:54:38,960 --> 00:54:41,750 If a string is just the address of its first character, 1082 00:54:41,750 --> 00:54:44,450 then if you're literally doing s equals equals t, 1083 00:54:44,450 --> 00:54:46,697 you're comparing those two addresses. 1084 00:54:46,697 --> 00:54:48,530 And they are probably going to be different, 1085 00:54:48,530 --> 00:54:50,990 even if I type in the same thing, because every time we've 1086 00:54:50,990 --> 00:54:55,010 called get int or get string, it's kind of plopped the user's input 1087 00:54:55,010 --> 00:54:56,750 somewhere in my computer's memory. 1088 00:54:56,750 --> 00:55:00,560 But we now have the tools, honestly, to answer this or vet this answer 1089 00:55:00,560 --> 00:55:01,130 ourselves. 1090 00:55:01,130 --> 00:55:03,230 Let me go ahead and simplify this program. 1091 00:55:03,230 --> 00:55:06,050 And let's, just as a quick sanity check, print out s. 1092 00:55:06,050 --> 00:55:10,610 And let's go ahead and print out t using a new line after each, 1093 00:55:10,610 --> 00:55:12,350 just so we can see what the strings are. 
1094 00:55:12,350 --> 00:55:16,830 So let me go ahead and do this again, make compare, compiles OK, dot slash 1095 00:55:16,830 --> 00:55:17,330 compare. 1096 00:55:17,330 --> 00:55:19,310 Let me type in HI, HI. 1097 00:55:19,310 --> 00:55:21,710 And they seem to be visually the same. 1098 00:55:21,710 --> 00:55:24,770 But recall that, now, I have this other format code, 1099 00:55:24,770 --> 00:55:27,080 such that I can now start treating strings 1100 00:55:27,080 --> 00:55:29,330 as the addresses they technically are. 1101 00:55:29,330 --> 00:55:33,140 So let me change percent s to percent p in both places. 1102 00:55:33,140 --> 00:55:37,610 Let me then recompile the program, and now, rerun compare with both HI and HI 1103 00:55:37,610 --> 00:55:38,690 identically typed. 1104 00:55:38,690 --> 00:55:43,100 But notice, they've ended up at slightly different memory locations. 1105 00:55:43,100 --> 00:55:46,820 Even though I have coincidentally typed the same thing, C and my computer 1106 00:55:46,820 --> 00:55:52,097 are not going to be so presumptuous as to use the same bytes for both strings. 1107 00:55:52,097 --> 00:55:53,930 That's not going to give me much flexibility 1108 00:55:53,930 --> 00:55:55,490 if I want to change one or the other. 1109 00:55:55,490 --> 00:55:58,490 It's going to very simplistically put one in this chunk of memory 1110 00:55:58,490 --> 00:56:00,240 and the other in this chunk of memory. 1111 00:56:00,240 --> 00:56:03,680 And indeed, those addresses are respectively, but arbitrarily, 1112 00:56:03,680 --> 00:56:07,220 0x22fe670 and 0x22fe6b0. 1113 00:56:07,220 --> 00:56:09,770 1114 00:56:09,770 --> 00:56:12,500 So they are spread apart some distance. 1115 00:56:12,500 --> 00:56:15,810 But again, it's up to the computer to decide where to actually put those. 1116 00:56:15,810 --> 00:56:18,310 So what's actually going on inside of the computer's memory? 
1117 00:56:18,310 --> 00:56:22,010 Well, let's consider if, for instance, this is s, my pointer, or really, 1118 00:56:22,010 --> 00:56:22,640 my string. 1119 00:56:22,640 --> 00:56:23,810 But it's just a pointer now. 1120 00:56:23,810 --> 00:56:25,060 It's the address of something. 1121 00:56:25,060 --> 00:56:28,250 Notice that I've drawn it as taking up eight squares, 1122 00:56:28,250 --> 00:56:31,680 because again, a pointer on modern systems is eight bytes. 1123 00:56:31,680 --> 00:56:33,320 So that's why this thing is so big. 1124 00:56:33,320 --> 00:56:37,100 Meanwhile, when I type in something like HI with the exclamation point, 1125 00:56:37,100 --> 00:56:38,720 then it ends up somewhere in memory. 1126 00:56:38,720 --> 00:56:40,440 We don't really know or care where it is. 1127 00:56:40,440 --> 00:56:42,773 So let's just arbitrarily say it happens to end up there 1128 00:56:42,773 --> 00:56:43,850 in my computer's memory. 1129 00:56:43,850 --> 00:56:46,730 Now, each of those bytes, of course, has an address. 1130 00:56:46,730 --> 00:56:48,950 I don't necessarily know or care what they are. 1131 00:56:48,950 --> 00:56:52,040 But for explanation's sake, let's just number them again like before, 1132 00:56:52,040 --> 00:56:56,810 0x123, 0x124, 0x125, 0x126. 1133 00:56:56,810 --> 00:57:02,960 When I then assign s on the left the value from get string on the right, 1134 00:57:02,960 --> 00:57:04,670 get string, what is it going to do? 1135 00:57:04,670 --> 00:57:07,640 Well, all of this time since week one, since you've been using it, 1136 00:57:07,640 --> 00:57:11,970 it is, yes, getting a string and handing it back to you as a return value. 1137 00:57:11,970 --> 00:57:13,680 But what does that really mean? 
1138 00:57:13,680 --> 00:57:18,200 Well, if a string is just an address, the return value of a function 1139 00:57:18,200 --> 00:57:23,030 like get string is to return to you, not the string per se, because that's 1140 00:57:23,030 --> 00:57:24,740 kind of a high level concept. 1141 00:57:24,740 --> 00:57:27,050 What get string has always been doing for us 1142 00:57:27,050 --> 00:57:29,810 is returning the address of the string, or more 1143 00:57:29,810 --> 00:57:33,410 specifically, the address of the first character in the string. 1144 00:57:33,410 --> 00:57:39,740 And so what is technically stored in s, to be clear, is that address, 0x123. 1145 00:57:39,740 --> 00:57:43,400 It's not returning to you the whole string, the H, the I, the exclamation point. 1146 00:57:43,400 --> 00:57:46,040 Rather, it's returning just one value to you. 1147 00:57:46,040 --> 00:57:50,990 It's returning only to you the address of the first character of that string. 1148 00:57:50,990 --> 00:57:54,500 But again, this is all very good for just s. 1149 00:57:54,500 --> 00:57:55,880 What's going on with t? 1150 00:57:55,880 --> 00:57:58,910 t is kind of the same story, because I'm calling get string again. 1151 00:57:58,910 --> 00:58:02,390 t is going to get assigned the address of the first character 1152 00:58:02,390 --> 00:58:03,500 of this version of HI. 1153 00:58:03,500 --> 00:58:13,160 And let's just arbitrarily say it's at 0x456, 0x457, 0x458, and 0x459. 1154 00:58:13,160 --> 00:58:16,873 And at this point, t is going to take on the value of 0x456. 1155 00:58:16,873 --> 00:58:19,790 And now, at this point, honestly, we're really getting into the weeds. 1156 00:58:19,790 --> 00:58:21,665 Let's just start abstracting all of this away 1157 00:58:21,665 --> 00:58:23,870 and use arrows to point at the values. 1158 00:58:23,870 --> 00:58:26,720 And indeed, these arrows just represent pointers 1159 00:58:26,720 --> 00:58:29,190 when we stop caring about the particular addresses.
1160 00:58:29,190 --> 00:58:32,300 So s is really just a pointer, a variable pointing 1161 00:58:32,300 --> 00:58:34,070 at the first character of HI here. 1162 00:58:34,070 --> 00:58:38,490 t is just a variable pointing at the first character of HI there. 1163 00:58:38,490 --> 00:58:41,540 And so when you are comparing two strings 1164 00:58:41,540 --> 00:58:45,440 as I was before in the earlier version of my program, 1165 00:58:45,440 --> 00:58:53,540 where I was checking if s equals equals t, I was, indeed, comparing s and t. 1166 00:58:53,540 --> 00:58:55,130 What are s and t? 1167 00:58:55,130 --> 00:59:01,640 s and t, respectively, are 0x123 and 0x456, 1168 00:59:01,640 --> 00:59:03,770 or whatever the actual values happen to be, 1169 00:59:03,770 --> 00:59:06,320 which are not going to be the same because they happen 1170 00:59:06,320 --> 00:59:09,920 to point to different chunks of memory. 1171 00:59:09,920 --> 00:59:12,110 All right, well who cares? 1172 00:59:12,110 --> 00:59:14,630 This is all kind of a nice intellectual exercise. 1173 00:59:14,630 --> 00:59:15,512 But who cares? 1174 00:59:15,512 --> 00:59:16,970 Well, how do we solve this problem? 1175 00:59:16,970 --> 00:59:20,480 Let's consider what I actually did in a previous demo. 1176 00:59:20,480 --> 00:59:23,955 I sort of preemptively mentioned that there's this function, string compare, 1177 00:59:23,955 --> 00:59:25,580 that allows you to compare two strings. 1178 00:59:25,580 --> 00:59:28,040 And I promised that we would eventually explain 1179 00:59:28,040 --> 00:59:31,573 why we use str compare as opposed to just using the equal equal sign. 1180 00:59:31,573 --> 00:59:33,740 Well, to use this function, I'm going to need to add 1181 00:59:33,740 --> 00:59:37,910 in string.h up here per last time. 1182 00:59:37,910 --> 00:59:40,790 But if string compare s t, let me go ahead and recompile this, 1183 00:59:40,790 --> 00:59:43,160 compare, dot slash compare.
1184 00:59:43,160 --> 00:59:45,710 Now, let me type HI! and HI! identically. 1185 00:59:45,710 --> 00:59:47,870 Now, they still seem to be different. 1186 00:59:47,870 --> 00:59:51,680 And dammit, I made the same stupid mistake as I did last time. 1187 00:59:51,680 --> 00:59:57,170 Does anyone know what mistake I made when comparing two strings? 1188 00:59:57,170 --> 01:00:00,590 Somehow I seem to be very good at making this mistake. 1189 01:00:00,590 --> 01:00:03,440 BRIAN: Ibrahim is suggesting that you add an equal equal zero. 1190 01:00:03,440 --> 01:00:04,398 DAVID MALAN: Thank you. 1191 01:00:04,398 --> 01:00:05,390 Ibrahim is quite right. 1192 01:00:05,390 --> 01:00:08,000 The return value, recall, of str compare, 1193 01:00:08,000 --> 01:00:13,040 is to return 0 if they're the same, a negative number if one comes 1194 01:00:13,040 --> 01:00:16,430 before the other, and a positive number if one comes after the other, 1195 01:00:16,430 --> 01:00:18,600 as in ASCIIbetical order. 1196 01:00:18,600 --> 01:00:21,440 So what I should have done, both last time and this time, 1197 01:00:21,440 --> 01:00:23,600 is check for equality with 0. 1198 01:00:23,600 --> 01:00:26,220 Let me go ahead and recompile this program. 1199 01:00:26,220 --> 01:00:27,050 OK, good. 1200 01:00:27,050 --> 01:00:29,090 Now, let me rerun this program with HI! 1201 01:00:29,090 --> 01:00:30,230 twice. 1202 01:00:30,230 --> 01:00:31,940 Voila, they're the same. 1203 01:00:31,940 --> 01:00:34,580 And just to make sure, let me do one other check. 1204 01:00:34,580 --> 01:00:38,810 Let me do David and Brian, which should be, indeed, different. 1205 01:00:38,810 --> 01:00:42,050 So now, again, I haven't really done anything different from that last time. 1206 01:00:42,050 --> 01:00:47,420 But I'm now thinking about these strings as being fundamentally just 1207 01:00:47,420 --> 01:00:48,173 their addresses. 1208 01:00:48,173 --> 01:00:50,090 And so, now, let's make this actually germane. 
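The difference between comparing addresses and comparing characters can be sketched in a few lines of C. This is a minimal illustration, not the code typed in the lecture: the helper names same_address and same_contents are hypothetical, and it uses plain const char * rather than the CS50 string type.

```c
#include <stdbool.h>
#include <string.h>

// Compares the two addresses only; the characters stored there are never read.
bool same_address(const char *s, const char *t)
{
    return s == t;
}

// Compares the characters one by one; strcmp returns 0 when they all match.
bool same_contents(const char *s, const char *t)
{
    return strcmp(s, t) == 0;
}
```

Two separate arrays that both hold HI! would make same_contents return true while same_address returns false, which mirrors the s and t picture described above.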
1209 01:00:50,090 --> 01:00:52,160 Let me go ahead and create a new file altogether. 1210 01:00:52,160 --> 01:00:56,590 And let's, pretty reasonably, try to copy one string and make changes to it. 1211 01:00:56,590 --> 01:00:57,840 So I'm going to go ahead here. 1212 01:00:57,840 --> 01:01:00,230 And just for convenience, I'm going to still use the CS50 library, 1213 01:01:00,230 --> 01:01:02,300 not for the string data type, but just for the 1214 01:01:02,300 --> 01:01:06,200 get string function, which we'll see is more handy than other things-- 1215 01:01:06,200 --> 01:01:07,790 than other ways of doing things. 1216 01:01:07,790 --> 01:01:11,630 And I'm going to go ahead and include standard io dot h. 1217 01:01:11,630 --> 01:01:17,450 And I'm going to go ahead and include, how about, string.h. 1218 01:01:17,450 --> 01:01:20,000 Let me go ahead and do int main void. 1219 01:01:20,000 --> 01:01:22,790 And let me go ahead, in this program, and get myself a string. 1220 01:01:22,790 --> 01:01:24,540 But note, we won't call it string anymore. 1221 01:01:24,540 --> 01:01:26,030 We'll just call it char star. 1222 01:01:26,030 --> 01:01:28,380 So again, start taking off that training wheel. 1223 01:01:28,380 --> 01:01:31,312 And I'm going to go ahead and get a string called s. 1224 01:01:31,312 --> 01:01:33,020 And then I'm going to get another string. 1225 01:01:33,020 --> 01:01:34,062 But I won't call it that. 1226 01:01:34,062 --> 01:01:36,230 I'll call it char star t. 1227 01:01:36,230 --> 01:01:37,400 And I want to copy s. 1228 01:01:37,400 --> 01:01:40,790 And so you might think, based on week one, week two, and since, that OK, 1229 01:01:40,790 --> 01:01:42,890 if you want to copy a variable, just do it. 1230 01:01:42,890 --> 01:01:44,690 I mean, we've used the assignment operator 1231 01:01:44,690 --> 01:01:48,530 to copy a variable from right to left for integers, for chars, 1232 01:01:48,530 --> 01:01:50,600 and for other data types, perhaps, too. 
1233 01:01:50,600 --> 01:01:54,690 I'm going to go ahead, now, and make a change to the original string. 1234 01:01:54,690 --> 01:01:56,270 So let me go ahead and do this. 1235 01:01:56,270 --> 01:02:01,280 Let me go ahead and say, let's change the first character of t 1236 01:02:01,280 --> 01:02:02,780 to be uppercase. 1237 01:02:02,780 --> 01:02:04,940 Recall that there's this function, to upper, 1238 01:02:04,940 --> 01:02:09,170 which takes, as input, a character, like the first character in t, 1239 01:02:09,170 --> 01:02:11,120 and returns the uppercase version. 1240 01:02:11,120 --> 01:02:14,240 Now, to use to upper, I need another header file, 1241 01:02:14,240 --> 01:02:17,990 which I recall from a couple of weeks ago now, I need ctype.h. 1242 01:02:17,990 --> 01:02:20,750 So let me preemptively go back and put that there. 1243 01:02:20,750 --> 01:02:23,280 And now, let me go ahead and print these two strings. 1244 01:02:23,280 --> 01:02:27,500 Let me go ahead and print out s as being this percent s. 1245 01:02:27,500 --> 01:02:33,990 And let me go ahead and print out the value of t with percent s as follows. 1246 01:02:33,990 --> 01:02:36,680 So again, what I'm doing is I'm getting a string from the user. 1247 01:02:36,680 --> 01:02:40,490 And the only new thing here is char star today, which is synonymous with string. 1248 01:02:40,490 --> 01:02:44,270 On line 10 here, I'm copying the string from right to left. 1249 01:02:44,270 --> 01:02:47,330 And then I'm capitalizing only the first letter 1250 01:02:47,330 --> 01:02:49,640 in the copy, otherwise known as t. 1251 01:02:49,640 --> 01:02:51,140 And then I'm just printing both out. 1252 01:02:51,140 --> 01:02:54,290 So let me go ahead and make copy, compiles OK. 1253 01:02:54,290 --> 01:02:56,510 Make cop-- dot slash copy. 1254 01:02:56,510 --> 01:03:00,020 Let me go ahead and type in hi! in lowercase, all lowercase, 1255 01:03:00,020 --> 01:03:00,920 and then enter. 
1256 01:03:00,920 --> 01:03:03,830 And voila, huh. 1257 01:03:03,830 --> 01:03:10,760 It would seem that I somehow capitalized both s and t, even though I only 1258 01:03:10,760 --> 01:03:17,080 called to upper on t. Brian, any thoughts 1259 01:03:17,080 --> 01:03:24,820 from the group on why I've accidentally and erroneously capitalized 1260 01:03:24,820 --> 01:03:26,260 both somehow? 1261 01:03:26,260 --> 01:03:29,735 BRIAN: A couple of people are saying that t is just an alias of s. 1262 01:03:29,735 --> 01:03:32,860 DAVID MALAN: Just an alias of s, that's a reasonable way of thinking of it, 1263 01:03:32,860 --> 01:03:33,360 sure. 1264 01:03:33,360 --> 01:03:38,320 And more precisely, any other thoughts on why this is incorrect somehow? 1265 01:03:38,320 --> 01:03:41,540 BRIAN: Peter is now suggesting that they have the same address. 1266 01:03:41,540 --> 01:03:45,880 DAVID MALAN: So yeah, more specifically, all I've done is copy s into t. 1267 01:03:45,880 --> 01:03:48,040 But again, what is s as of today? 1268 01:03:48,040 --> 01:03:49,390 It's just an address. 1269 01:03:49,390 --> 01:03:51,040 So yes, I have copied s. 1270 01:03:51,040 --> 01:03:54,820 But I've copied it literally, which means copying its address, 0x123, 1271 01:03:54,820 --> 01:03:55,820 or whatever it is. 1272 01:03:55,820 --> 01:04:01,180 And then on line 12, notice that I'm changing t by uppercasing it. 1273 01:04:01,180 --> 01:04:04,130 But t is at the same address as s. 1274 01:04:04,130 --> 01:04:08,130 So really, I'm changing one and the same string. 1275 01:04:08,130 --> 01:04:10,630 So if we think about this in terms of the computer's memory, 1276 01:04:10,630 --> 01:04:12,088 let's consider what I've just done. 1277 01:04:12,088 --> 01:04:13,570 Let me clear the computer's memory. 1278 01:04:13,570 --> 01:04:15,290 Let me put s down as before. 1279 01:04:15,290 --> 01:04:18,250 Let me put hi! down as before, but all lowercase this time.
1280 01:04:18,250 --> 01:04:23,320 And recall that it might be at addresses 0x123, 124, 125, and 126. 1281 01:04:23,320 --> 01:04:26,350 And now, if we consider that s technically 1282 01:04:26,350 --> 01:04:29,740 contains the address of that first character, 0x123, 1283 01:04:29,740 --> 01:04:34,960 and I proceed to create a new variable, t, and assign t the value of s, 1284 01:04:34,960 --> 01:04:36,970 I got to take that statement literally. 1285 01:04:36,970 --> 01:04:39,670 I'm literally just putting 0x123 here. 1286 01:04:39,670 --> 01:04:41,770 And if we now abstract away these details just 1287 01:04:41,770 --> 01:04:44,020 to make it more clear visually what's going on, 1288 01:04:44,020 --> 01:04:48,070 that's pretty much like saying that both s and t point 1289 01:04:48,070 --> 01:04:49,750 to the same location in memory. 1290 01:04:49,750 --> 01:04:52,297 So yes, in that sense, t is just an alias for s, 1291 01:04:52,297 --> 01:04:54,130 which is a reasonable way of thinking of it. 1292 01:04:54,130 --> 01:04:56,920 But really, t is just identical to s. 1293 01:04:56,920 --> 01:04:59,110 So when you use the square bracket notation 1294 01:04:59,110 --> 01:05:02,290 to go to the first character of t, you are equivalently 1295 01:05:02,290 --> 01:05:04,750 going to the first character in s. 1296 01:05:04,750 --> 01:05:06,200 They are one and the same. 1297 01:05:06,200 --> 01:05:10,390 So when I call to upper, I'm calling it on this character, which of course, is 1298 01:05:10,390 --> 01:05:12,970 the one and only h in the story. 1299 01:05:12,970 --> 01:05:16,240 And when I print s and I print t, printf is 1300 01:05:16,240 --> 01:05:18,610 following those same breadcrumbs, if you will, 1301 01:05:18,610 --> 01:05:24,070 and ultimately displaying the same value as having changed. 1302 01:05:24,070 --> 01:05:27,220 So we would seem to need to fundamentally rethink 1303 01:05:27,220 --> 01:05:28,990 how we are copying strings.
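The aliasing behavior just described can be reproduced with a short sketch. It is an illustration under assumptions rather than the lecture's copy.c: the helper name alias_and_capitalize is hypothetical, and the caller is assumed to pass a writable array instead of a string obtained from get string.

```c
#include <ctype.h>

// Hypothetical helper: "copies" s with the assignment operator, then
// capitalizes the first character of the copy.
// Only the address is copied, so the change shows up through s as well.
char *alias_and_capitalize(char *s)
{
    char *t = s;            // t now holds the same address as s
    if (t[0] != '\0')       // skip the empty string
    {
        t[0] = (char) toupper((unsigned char) t[0]);
    }
    return t;               // the very same address that was passed in
}
```

Called on an array holding hi!, the function returns a pointer equal to the one it was given, and both names now see Hi!, because there was only ever one chunk of memory.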
1304 01:05:28,990 --> 01:05:34,300 And let me ask, if this is the wrong way to copy one string into the other, what 1305 01:05:34,300 --> 01:05:35,350 is the right way? 1306 01:05:35,350 --> 01:05:39,340 Even if you don't have the functions in mind or the right vocabulary, 1307 01:05:39,340 --> 01:05:43,750 just intuitively, if we want to copy a string in the way that a human would 1308 01:05:43,750 --> 01:05:50,020 think of copying one into the other, like a photograph or a photocopy, 1309 01:05:50,020 --> 01:05:52,610 how do we want to do this? 1310 01:05:52,610 --> 01:05:54,460 Any thoughts, Brian? 1311 01:05:54,460 --> 01:05:57,430 BRIAN: Yeah, Sophia suggested we would want to somehow loop over 1312 01:05:57,430 --> 01:05:59,948 the elements in s and put them into t. 1313 01:05:59,948 --> 01:06:01,240 DAVID MALAN: Yeah, I like that. 1314 01:06:01,240 --> 01:06:04,120 So loop over the elements of s and put them into t. 1315 01:06:04,120 --> 01:06:05,800 So it sounds like more work. 1316 01:06:05,800 --> 01:06:07,660 But that's, again, what we're going to have 1317 01:06:07,660 --> 01:06:09,582 to do if we want to think of these-- 1318 01:06:09,582 --> 01:06:12,790 if we want to accept the fact that these things, s and t, are just addresses, 1319 01:06:12,790 --> 01:06:15,550 we're going to now have to go and follow those breadcrumbs. 1320 01:06:15,550 --> 01:06:18,790 So let's go ahead and consider a variant of this program. 1321 01:06:18,790 --> 01:06:24,520 Let me go ahead, here, and change this such that I'm still getting a string s. 1322 01:06:24,520 --> 01:06:28,390 But now, let me go ahead and propose exactly that, 1323 01:06:28,390 --> 01:06:30,340 that we copy the individual characters. 1324 01:06:30,340 --> 01:06:32,320 But I need to copy them somewhere. 1325 01:06:32,320 --> 01:06:35,200 So I feel like another step in this process of copying a string 1326 01:06:35,200 --> 01:06:37,750 has to be to give myself some additional memory.
1327 01:06:37,750 --> 01:06:40,840 If I have H i exclamation point and the nul character, 1328 01:06:40,840 --> 01:06:43,150 I need to, now, somehow take control of this situation 1329 01:06:43,150 --> 01:06:48,320 and tell the computer somehow, in code, give me four more bytes of memory 1330 01:06:48,320 --> 01:06:53,390 so that I have a location for t in which to copy those characters. 1331 01:06:53,390 --> 01:06:55,360 So here's a new function today. 1332 01:06:55,360 --> 01:06:59,470 If I want to create a string t, otherwise known today as a char star, 1333 01:06:59,470 --> 01:07:02,680 there is a new function we can use called malloc, which 1334 01:07:02,680 --> 01:07:04,720 represents memory allocation. 1335 01:07:04,720 --> 01:07:08,200 This is a pretty fancy function that, fortunately, is pretty simple to use. 1336 01:07:08,200 --> 01:07:10,390 It takes, as input, just a number. 1337 01:07:10,390 --> 01:07:14,480 How many bytes of memory do you want to ask the computer for? 1338 01:07:14,480 --> 01:07:16,000 So how do I do this? 1339 01:07:16,000 --> 01:07:20,110 Well, H i exclamation point backslash 0, I could literally just say four. 1340 01:07:20,110 --> 01:07:21,850 But this doesn't feel very dynamic. 1341 01:07:21,850 --> 01:07:26,410 I think I can programmatically implement this a little more elegantly. 1342 01:07:26,410 --> 01:07:30,370 Let me go ahead and say, give me as many bytes 1343 01:07:30,370 --> 01:07:35,200 as there are characters in s plus 1. 1344 01:07:35,200 --> 01:07:37,090 Plus 1, why am I doing this? 1345 01:07:37,090 --> 01:07:40,773 Well, H i exclamation point nul character, that's technically 1346 01:07:40,773 --> 01:07:42,190 what's stored underneath the hood. 1347 01:07:42,190 --> 01:07:45,250 But what do you and I think of the length of Hi! as being? 1348 01:07:45,250 --> 01:07:48,070 Well, odds are, in the human world, it's H i exclamation point.
1349 01:07:48,070 --> 01:07:50,710 And who cares about this low level detail, this nul terminator. 1350 01:07:50,710 --> 01:07:53,800 You don't include that in the length of an English word or any word. 1351 01:07:53,800 --> 01:07:56,290 You only think of the actual characters you can see. 1352 01:07:56,290 --> 01:08:00,580 So the length of H, i, exclamation point is 3. 1353 01:08:00,580 --> 01:08:08,110 But I do need to cleverly add one more byte, a fourth, for the nul character, 1354 01:08:08,110 --> 01:08:10,580 because I'm going to have to copy that over as well. 1355 01:08:10,580 --> 01:08:13,270 Otherwise, if I don't have an identical nul character, 1356 01:08:13,270 --> 01:08:15,830 t is not going to have an obvious ending. 1357 01:08:15,830 --> 01:08:17,872 So how do I copy, now, one string into the other? 1358 01:08:17,872 --> 01:08:20,538 Well, let me go ahead and take out our old friend, the for loop, 1359 01:08:20,538 --> 01:08:21,380 from week one. 1360 01:08:21,380 --> 01:08:24,050 And say, for i equals 0-- 1361 01:08:24,050 --> 01:08:26,810 how about, actually, n equals string length of s. 1362 01:08:26,810 --> 01:08:28,279 We've done this trick before. 1363 01:08:28,279 --> 01:08:33,080 i is less than n, i++. 1364 01:08:33,080 --> 01:08:38,689 Let me go ahead and, quite simply, say t bracket i gets s bracket i. 1365 01:08:38,689 --> 01:08:43,939 So this will literally copy, from s, each of the characters one at a time 1366 01:08:43,939 --> 01:08:45,020 into t. 1367 01:08:45,020 --> 01:08:46,640 But I need to be a little smarter now. 1368 01:08:46,640 --> 01:08:49,130 Even though we almost always do i less than n, 1369 01:08:49,130 --> 01:08:55,660 I'm actually going to very aggressively say i less than or equal to n. 1370 01:08:55,660 --> 01:08:56,830 Why?
1371 01:08:56,830 --> 01:09:00,250 Why am I going one step further than I feel we normally 1372 01:09:00,250 --> 01:09:03,310 do when iterating over strings, and one step further than you 1373 01:09:03,310 --> 01:09:07,149 probably did when iterating over a Caesar cipher or a string 1374 01:09:07,149 --> 01:09:09,130 in that context? 1375 01:09:09,130 --> 01:09:10,939 Brian, any thoughts here? 1376 01:09:10,939 --> 01:09:16,569 Why am I going from i less than or equal to n kind of for the first time here? 1377 01:09:16,569 --> 01:09:19,779 BRIAN: Celina is suggesting that we need to include the nul character. 1378 01:09:19,779 --> 01:09:22,843 DAVID MALAN: Yeah, so if I-- and now I understand how strings work. 1379 01:09:22,843 --> 01:09:25,510 So it's not sufficient to just copy the H, I, exclamation point. 1380 01:09:25,510 --> 01:09:29,020 I need to go one step further, one more than the length of the string. 1381 01:09:29,020 --> 01:09:32,290 And the easiest way to do that would be less than or equal to n. 1382 01:09:32,290 --> 01:09:34,450 Or I could just do a plus 1 there. 1383 01:09:34,450 --> 01:09:35,950 Or I can do this any number of ways. 1384 01:09:35,950 --> 01:09:37,399 Doesn't matter how you do it. 1385 01:09:37,399 --> 01:09:40,899 But I think a less than or equal to is one reasonable way to do it. 1386 01:09:40,899 --> 01:09:43,540 And now, let's go down to the bottom here and now actually 1387 01:09:43,540 --> 01:09:44,590 do this capitalization. 1388 01:09:44,590 --> 01:09:47,710 Let's now change the first character in t 1389 01:09:47,710 --> 01:09:52,750 to be the result of calling to upper on the first character of t. 1390 01:09:52,750 --> 01:09:56,770 And then, as before, let's go ahead and print out whatever s is. 1391 01:09:56,770 --> 01:09:59,080 And like before, let's go ahead and print out 1392 01:09:59,080 --> 01:10:05,110 whatever t is and hope now that only t has been capitalized.
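The copying steps walked through above (ask malloc for the length plus one, then loop with less than or equal to n so the terminator comes along) might look like the following sketch. The helper name copy_string is hypothetical, and wrapping the loop in a function is an assumption made for illustration; the lecture writes the same loop directly in main.

```c
#include <stdlib.h>
#include <string.h>

// Hypothetical helper: returns a separate copy of s in newly allocated
// memory, or NULL if the allocation fails.
char *copy_string(const char *s)
{
    size_t n = strlen(s);           // counts visible characters only
    char *t = malloc(n + 1);        // +1 leaves room for the '\0'
    if (t == NULL)
    {
        return NULL;
    }
    for (size_t i = 0; i <= n; i++) // <= n so the '\0' is copied too
    {
        t[i] = s[i];
    }
    return t;
}
```

Because t now refers to its own chunk of memory, changing t[0] afterward leaves s alone. The memory that malloc handed out still belongs to the caller, who is responsible for it from then on.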
1393 01:10:05,110 --> 01:10:07,330 But I do need to make one change now. 1394 01:10:07,330 --> 01:10:10,690 It turns out that this function, malloc, comes 1395 01:10:10,690 --> 01:10:12,897 in a file called standard lib dot h. 1396 01:10:12,897 --> 01:10:15,730 And again, this is the kind of thing that you can jot down in notes. 1397 01:10:15,730 --> 01:10:17,563 You can always Google these kinds of things. 1398 01:10:17,563 --> 01:10:20,740 Even I forget what header files these functions are sometimes declared in. 1399 01:10:20,740 --> 01:10:24,310 But it happens to be a new one called standard lib for library 1400 01:10:24,310 --> 01:10:26,110 that gives you access to malloc. 1401 01:10:26,110 --> 01:10:29,800 So let me go ahead, now, and make compare. 1402 01:10:29,800 --> 01:10:31,210 All right, so far so good. 1403 01:10:31,210 --> 01:10:34,360 Dot slash compare-- sorry, this is not compare. 1404 01:10:34,360 --> 01:10:35,680 The old program works fine. 1405 01:10:35,680 --> 01:10:38,630 Make copy-- oh my god, seven mistakes. 1406 01:10:38,630 --> 01:10:40,460 What'd I do wrong here? 1407 01:10:40,460 --> 01:10:44,560 Oh, it looks like I forgot the type of i and n. 1408 01:10:44,560 --> 01:10:47,440 So let me go into my for loop and add the int. 1409 01:10:47,440 --> 01:10:49,870 That was my fault. Let me make copy again. 1410 01:10:49,870 --> 01:10:51,910 OK, all seven errors, thankfully, went away. 1411 01:10:51,910 --> 01:10:56,710 Make copy, let's go ahead and type in hi! in lower case and hit Enter. 1412 01:10:56,710 --> 01:11:02,860 And voila, now I have capitalized only the copy of s, a.k.a. 1413 01:11:02,860 --> 01:11:03,580 t. 1414 01:11:03,580 --> 01:11:06,010 And just to be clear, I've kind of regressed back 1415 01:11:06,010 --> 01:11:09,140 to my square bracket notation, honestly, because it's perfectly acceptable. 1416 01:11:09,140 --> 01:11:10,360 It's very readable. 
1417 01:11:10,360 --> 01:11:12,640 But notice, if I really want to show off, 1418 01:11:12,640 --> 01:11:19,190 I could say something like, well, go to t's plus i location. 1419 01:11:19,190 --> 01:11:23,078 And then do this, which again, I don't necessarily recommend for readability. 1420 01:11:23,078 --> 01:11:24,620 But again, there is this equivalence. 1421 01:11:24,620 --> 01:11:28,640 The square bracket notation is the same thing as pointer arithmetic. 1422 01:11:28,640 --> 01:11:34,160 So if you want to go to the address at t plus whatever i is to offset yourself 1423 01:11:34,160 --> 01:11:36,570 one or more bytes, you can totally do that. 1424 01:11:36,570 --> 01:11:39,920 And if I want to be fancy, I can go down here and say, 1425 01:11:39,920 --> 01:11:45,350 go to the first character in t and capitalize it. 1426 01:11:45,350 --> 01:11:48,170 But again, I would argue that even though, yes, you're very clever 1427 01:11:48,170 --> 01:11:50,420 and that you understand pointers and addresses at this point 1428 01:11:50,420 --> 01:11:51,795 if you're writing code like this. 1429 01:11:51,795 --> 01:11:53,990 Honestly, it's not necessarily as readable. 1430 01:11:53,990 --> 01:11:57,800 So sticking with week two syntax of the square bracket notation, totally 1431 01:11:57,800 --> 01:12:03,110 reasonable, totally correct, totally well-designed, and perhaps preferable, 1432 01:12:03,110 --> 01:12:04,890 though I should be careful here. 1433 01:12:04,890 --> 01:12:07,550 This line of code is a little bit risky for me 1434 01:12:07,550 --> 01:12:10,310 because what if the user just hits Enter and they don't type hi 1435 01:12:10,310 --> 01:12:11,540 or David or Brian. 1436 01:12:11,540 --> 01:12:13,580 What if they type nothing except Enter? 1437 01:12:13,580 --> 01:12:16,130 In that case, the length of the string might be 0. 
1438 01:12:16,130 --> 01:12:19,220 And then I probably shouldn't be capitalizing the first character 1439 01:12:19,220 --> 01:12:22,230 in a string that doesn't really even exist. 1440 01:12:22,230 --> 01:12:25,250 So I should probably have some error checking, 1441 01:12:25,250 --> 01:12:32,450 like if, for instance, the string length of t is at least greater than 0, 1442 01:12:32,450 --> 01:12:34,960 then go ahead and safely do that. 1443 01:12:34,960 --> 01:12:37,550 But again, this is just one example of some additional error 1444 01:12:37,550 --> 01:12:39,200 checking I can add to the program. 1445 01:12:39,200 --> 01:12:41,300 There's actually one more piece of error checking 1446 01:12:41,300 --> 01:12:43,520 I should really do in a fully correct program, 1447 01:12:43,520 --> 01:12:45,170 as you should do in problem sets. 1448 01:12:45,170 --> 01:12:47,010 Sometimes things can go wrong. 1449 01:12:47,010 --> 01:12:50,270 And if your program is so big, so fancy, and so memory-hungry 1450 01:12:50,270 --> 01:12:52,187 that you're mallocing lots and lots of memory, 1451 01:12:52,187 --> 01:12:54,062 which you won't do in a program this small, 1452 01:12:54,062 --> 01:12:56,270 but over time you might need more and more memory, 1453 01:12:56,270 --> 01:13:01,490 we should also make sure that t actually has a valid address. 1454 01:13:01,490 --> 01:13:04,670 It turns out that malloc, most of the time, 1455 01:13:04,670 --> 01:13:08,090 is going to return to you the address of a chunk of memory 1456 01:13:08,090 --> 01:13:09,470 it has allocated for you. 1457 01:13:09,470 --> 01:13:11,300 Just like get string, it will return to you 1458 01:13:11,300 --> 01:13:14,900 the address of the first byte of the chunk of memory 1459 01:13:14,900 --> 01:13:16,820 that it has found space for. 1460 01:13:16,820 --> 01:13:18,740 However, sometimes things can go wrong. 1461 01:13:18,740 --> 01:13:20,630 Sometimes your computer can be out of memory.
1462 01:13:20,630 --> 01:13:24,320 You've probably seen your Mac or PC freeze or hang or reboot itself. 1463 01:13:24,320 --> 01:13:26,910 That is very often the result of memory errors. 1464 01:13:26,910 --> 01:13:29,000 So we should actually check something like this. 1465 01:13:29,000 --> 01:13:32,570 If t equals equals this special value NULL, 1466 01:13:32,570 --> 01:13:35,360 then I'm going to go ahead and just bail out and return one, 1467 01:13:35,360 --> 01:13:37,280 quit, let's get out of the program. 1468 01:13:37,280 --> 01:13:38,760 It's not going to work. 1469 01:13:38,760 --> 01:13:41,610 This might only happen one out of a million times. 1470 01:13:41,610 --> 01:13:44,220 But it's more correct to check for NULL. 1471 01:13:44,220 --> 01:13:48,350 Now, unfortunately, the designers of C kind of used-- or programmers 1472 01:13:48,350 --> 01:13:53,210 more generally, use this word, which is almost the same as N-U-L, 1473 01:13:53,210 --> 01:13:54,980 otherwise known as backslash 0. 1474 01:13:54,980 --> 01:13:57,290 Unfortunately, this is a different value. 1475 01:13:57,290 --> 01:14:01,370 N-U-L-L represents a null pointer. 1476 01:14:01,370 --> 01:14:02,870 It is a bogus address. 1477 01:14:02,870 --> 01:14:04,580 It is the absence of an address. 1478 01:14:04,580 --> 01:14:06,950 Technically, it's address 0. 1479 01:14:06,950 --> 01:14:09,230 It is different from backslash 0. 1480 01:14:09,230 --> 01:14:14,000 You use N-U-L-L in the context of pointers, as we are doing today. 1481 01:14:14,000 --> 01:14:17,390 You use backslash 0, otherwise known, verbally, 1482 01:14:17,390 --> 01:14:21,210 as N-U-L, or nul, in the context of characters. 1483 01:14:21,210 --> 01:14:23,810 So backslash 0 is for characters. 1484 01:14:23,810 --> 01:14:26,750 N-U-L-L in all caps is for pointers. 1485 01:14:26,750 --> 01:14:29,750 And it's just a new symbol we're introducing today 1486 01:14:29,750 --> 01:14:34,520 that comes with this standard lib dot h file.
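To keep the two similarly named values apart, here is a small sketch that checks both in one place and uses the dereference form of the first-character access. The helper name capitalize_first and its return convention (1 for a NULL pointer, 0 otherwise) are assumptions chosen for illustration, loosely echoing the return one described above.

```c
#include <ctype.h>
#include <stddef.h>

// Hypothetical helper: capitalizes the first character of t if there is one.
// Returns 1 when t is NULL (there is no address to follow), 0 otherwise.
int capitalize_first(char *t)
{
    if (t == NULL)      // NULL: a pointer value meaning "no address"
    {
        return 1;
    }
    if (*t != '\0')     // '\0' (NUL): the character that ends a string
    {
        // *t is the same as *(t + 0), which is the same as t[0]
        *t = (char) toupper((unsigned char) *t);
    }
    return 0;
}
```

The first check asks whether there is any memory to look at; the second asks whether that memory holds an empty string. Only when both pass is it safe to touch the first character.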
1487 01:14:34,520 --> 01:14:38,190 All right, so it turns out, honestly, I don't need to do some of this work. 1488 01:14:38,190 --> 01:14:41,610 It turns out that if I want to copy one string to another, 1489 01:14:41,610 --> 01:14:43,170 there is a function for that. 1490 01:14:43,170 --> 01:14:45,920 And increasingly, you will not have to write as many lines of code 1491 01:14:45,920 --> 01:14:49,520 as you previously did, because if you look up in the manual pages 1492 01:14:49,520 --> 01:14:52,730 or you've heard about or find online that there's another function, like one 1493 01:14:52,730 --> 01:14:56,790 called strcpy, you can actually, more simply, do something like this. 1494 01:14:56,790 --> 01:15:00,410 So even though I really liked the idea, and it was correct to use a for loop 1495 01:15:00,410 --> 01:15:04,950 to copy all of the characters from s into t, there's a function for that. 1496 01:15:04,950 --> 01:15:06,200 It's called strcpy. 1497 01:15:06,200 --> 01:15:09,830 It takes two arguments, the destination followed by the source. 1498 01:15:09,830 --> 01:15:12,200 And it will just handle all of the looping 1499 01:15:12,200 --> 01:15:15,890 for us, all of the copying for us, including the backslash 0, 1500 01:15:15,890 --> 01:15:18,830 so that I can focus on what I want to do, which in this case, 1501 01:15:18,830 --> 01:15:21,300 is actually capitalize things. 1502 01:15:21,300 --> 01:15:26,497 So if we consider, now, this example, in the context of my computer's memory, 1503 01:15:26,497 --> 01:15:28,580 we'll see that it's laid out a little differently. 1504 01:15:28,580 --> 01:15:31,050 But there's one more bug I do want to fix first. 1505 01:15:31,050 --> 01:15:33,230 And this is something we've not had to do yet. 
1506 01:15:33,230 --> 01:15:37,850 It turns out that any time you allocate memory with malloc, 1507 01:15:37,850 --> 01:15:41,330 you ask the computer for memory, the onus is on you, the programmer, 1508 01:15:41,330 --> 01:15:43,160 to eventually give it back. 1509 01:15:43,160 --> 01:15:46,070 And by that, I mean if you allocate four bytes, 1510 01:15:46,070 --> 01:15:49,430 or who knows, four million bytes of memory for an even bigger program, 1511 01:15:49,430 --> 01:15:52,160 you'd better give it back to the computer, more specifically, 1512 01:15:52,160 --> 01:15:55,252 the operating system, be it Linux or Mac OS or Windows, 1513 01:15:55,252 --> 01:15:57,710 so that your computer eventually doesn't run out of memory. 1514 01:15:57,710 --> 01:16:00,418 If all you ever do is ask for more memory, ask for more memory, 1515 01:16:00,418 --> 01:16:03,710 it stands to reason that eventually your computer will run out, because it only 1516 01:16:03,710 --> 01:16:05,370 has a finite amount of memory. 1517 01:16:05,370 --> 01:16:07,910 It's got a finite amount of hardware, recall. 1518 01:16:07,910 --> 01:16:11,780 So when you're done with memory, it should be your best practice 1519 01:16:11,780 --> 01:16:14,970 to free it afterward as well. 1520 01:16:14,970 --> 01:16:18,950 And the opposite of malloc is just a function called free, which takes, 1521 01:16:18,950 --> 01:16:22,040 as its input, whatever the output of malloc was. 1522 01:16:22,040 --> 01:16:25,070 And recall that the output of malloc, the return value of malloc, 1523 01:16:25,070 --> 01:16:30,210 is just the address of the first byte of memory that it has allocated for you. 1524 01:16:30,210 --> 01:16:34,010 So if you ask it for four bytes, like I did a few lines ago with malloc, 1525 01:16:34,010 --> 01:16:37,100 you're going to get back the address of the first of those bytes. 1526 01:16:37,100 --> 01:16:41,150 And it's up to you to remember how many bytes you asked for.
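Putting malloc, the NULL check, strcpy, and the length guard together gives roughly the following. Treat it as a sketch under assumptions: the helper name capitalized_copy is hypothetical, and returning the pointer to the caller (who then calls free) is a design choice made here so the pieces fit in one function, not the structure of the lecture's program.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

// Hypothetical helper: returns a capitalized copy of s and leaves s untouched.
// The caller owns the returned memory and should pass it to free() when done.
char *capitalized_copy(const char *s)
{
    char *t = malloc(strlen(s) + 1);    // characters plus the '\0'
    if (t == NULL)
    {
        return NULL;                    // allocation failed
    }
    strcpy(t, s);                       // destination first, then source
    if (strlen(t) > 0)                  // nothing to capitalize in ""
    {
        t[0] = (char) toupper((unsigned char) t[0]);
    }
    return t;
}
```

A caller would store the result in a char star, print it, and then hand that same address to free. Note that strcpy only fills memory that already exists; it neither allocates nor frees anything.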
1527 01:16:41,150 --> 01:16:43,760 In the case of free, all you have to do is 1528 01:16:43,760 --> 01:16:49,820 tell free via its input what the address was that malloc gave you. 1529 01:16:49,820 --> 01:16:53,210 So if you stored that address as I did, in this variable called t, 1530 01:16:53,210 --> 01:16:58,190 it suffices when you're done with that memory just to call free t. 1531 01:16:58,190 --> 01:17:02,360 And the computer will go about freeing up that memory for you. 1532 01:17:02,360 --> 01:17:04,880 And you might very well get it back later on. 1533 01:17:04,880 --> 01:17:07,400 But at least your computer won't run out of memory 1534 01:17:07,400 --> 01:17:13,490 as quickly, because it can now reuse that space for something else. 1535 01:17:13,490 --> 01:17:15,410 All right, let me go ahead, then, and propose 1536 01:17:15,410 --> 01:17:17,870 that we draw a picture of this-- 1537 01:17:17,870 --> 01:17:20,942 now new program's memory, where we copy things. 1538 01:17:20,942 --> 01:17:23,900 So recall, this is where we left off before when comparing two strings. 1539 01:17:23,900 --> 01:17:29,010 If this was s and s was pointing to h, i, exclamation point in lowercase, 1540 01:17:29,010 --> 01:17:32,510 this new version of my code in copy.c, notice, 1541 01:17:32,510 --> 01:17:34,550 still gives me another pointer called t. 1542 01:17:34,550 --> 01:17:36,530 So that part of the story hasn't changed. 1543 01:17:36,530 --> 01:17:37,970 But I call malloc now. 1544 01:17:37,970 --> 01:17:40,790 And malloc is going to return to me some new chunk of memory. 1545 01:17:40,790 --> 01:17:42,440 I don't know in advance where it is. 1546 01:17:42,440 --> 01:17:45,740 But malloc's return value is going to be the address 1547 01:17:45,740 --> 01:17:47,920 of the first byte of that memory. 1548 01:17:47,920 --> 01:17:51,050 So for instance, 0x456 or whatever it is.
1549 01:17:51,050 --> 01:17:54,230 And the subsequent bytes are going to be increasing by one 1550 01:17:54,230 --> 01:17:59,630 byte at a time, 0x457, 0x458, 0x459. 1551 01:17:59,630 --> 01:18:03,800 So what is, ultimately, stored in t when I assign it the return value of malloc? 1552 01:18:03,800 --> 01:18:05,570 It's whatever that address is. 1553 01:18:05,570 --> 01:18:07,980 Again, I could technically write 0x456 up here. 1554 01:18:07,980 --> 01:18:09,800 But again, we're kind of past that. 1555 01:18:09,800 --> 01:18:10,970 That's very 30 minutes ago. 1556 01:18:10,970 --> 01:18:14,300 Let's now focus on just the abstraction that is a pointer. 1557 01:18:14,300 --> 01:18:17,690 A pointer is just an arrow pointing from the variable 1558 01:18:17,690 --> 01:18:19,980 to the actual location in memory. 1559 01:18:19,980 --> 01:18:26,720 So now, if I go about copying s into t using strcpy, or more manually, 1560 01:18:26,720 --> 01:18:28,670 using my for loop, what happens? 1561 01:18:28,670 --> 01:18:31,610 Well, I'm copying the h over from s into t. 1562 01:18:31,610 --> 01:18:36,110 I'm copying the i over from s into t, the exclamation point from s into t. 1563 01:18:36,110 --> 01:18:40,530 And then lastly, the terminating nul character from s into t. 1564 01:18:40,530 --> 01:18:42,740 So the picture is now fundamentally different. 1565 01:18:42,740 --> 01:18:45,020 t is not pointing at the same thing. 1566 01:18:45,020 --> 01:18:50,570 It's pointing at its own chunk of memory that has now, one step at a time, 1567 01:18:50,570 --> 01:18:56,210 been duplicating whatever was at the address s. 1568 01:18:56,210 --> 01:18:59,600 And so this is what you and I as humans would consider, presumably, 1569 01:18:59,600 --> 01:19:04,080 to be a proper copy of the string. 1570 01:19:04,080 --> 01:19:09,660 Any questions, then, on what we've just done by introducing malloc and free?
1571 01:19:09,660 --> 01:19:11,910 The first of which allocates memory and gives you 1572 01:19:11,910 --> 01:19:15,750 the address of the first byte of memory that you can now use, 1573 01:19:15,750 --> 01:19:19,260 the latter of which hands it back to your operating system and says, 1574 01:19:19,260 --> 01:19:20,700 I'm done with this. 1575 01:19:20,700 --> 01:19:24,360 It can now be reused for something else, some other variable, 1576 01:19:24,360 --> 01:19:27,090 maybe, down the road, if our program were longer. 1577 01:19:27,090 --> 01:19:31,530 Brian, any questions or confusion I can help with? 1578 01:19:31,530 --> 01:19:33,870 BRIAN: Someone asked, even if you're using strcpy 1579 01:19:33,870 --> 01:19:37,470 to copy the string instead of copying the characters one at a time yourself, 1580 01:19:37,470 --> 01:19:39,420 do you still need to free the memory? 1581 01:19:39,420 --> 01:19:40,545 DAVID MALAN: Good question. 1582 01:19:40,545 --> 01:19:43,320 Even if you're using strcpy, you do need to still use free. 1583 01:19:43,320 --> 01:19:48,120 Yes, anytime you use malloc henceforth, you must use free. 1584 01:19:48,120 --> 01:19:52,470 Anytime you use malloc, you must use free in order to free up that memory. 1585 01:19:52,470 --> 01:19:56,370 strcpy is copying the contents of one chunk of memory to the other. 1586 01:19:56,370 --> 01:19:59,220 It is not allocating or managing that memory for you. 1587 01:19:59,220 --> 01:20:02,520 It is just implementing, essentially, that for loop. 1588 01:20:02,520 --> 01:20:05,520 And it's, perhaps, time too, where I can take off another training wheel 1589 01:20:05,520 --> 01:20:06,020 verbally. 1590 01:20:06,020 --> 01:20:10,410 It turns out that get string, all this time, is kind of magical. 1591 01:20:10,410 --> 01:20:13,470 One of the things that get string does from the CS50 library 1592 01:20:13,470 --> 01:20:16,080 is it itself uses malloc. 
1593 01:20:16,080 --> 01:20:19,800 Consider, after all, when we, the staff, wrote get string years ago, 1594 01:20:19,800 --> 01:20:22,830 we have no idea how long your names are going to be this year. 1595 01:20:22,830 --> 01:20:24,690 We have no idea what sentences you're going 1596 01:20:24,690 --> 01:20:28,350 to type, what paragraphs you're going to type, what text you're going to analyze 1597 01:20:28,350 --> 01:20:30,240 for a program like readability. 1598 01:20:30,240 --> 01:20:32,610 So we had to implement get string in such a way 1599 01:20:32,610 --> 01:20:35,730 that you can type as few or as many characters at your keyboard 1600 01:20:35,730 --> 01:20:36,420 as you want. 1601 01:20:36,420 --> 01:20:40,150 And we will make sure there's enough memory for that string. 1602 01:20:40,150 --> 01:20:43,530 So get string, underneath the hood, if you look at the code we, the staff, 1603 01:20:43,530 --> 01:20:46,530 wrote someday, you'll see that we use malloc. 1604 01:20:46,530 --> 01:20:51,390 And we call malloc in order to get enough memory to fit that string. 1605 01:20:51,390 --> 01:20:54,600 And then, what the CS50 library is also secretly doing, 1606 01:20:54,600 --> 01:20:57,060 is it is also calling free for you. 1607 01:20:57,060 --> 01:20:59,130 There's, essentially, a fancy way where you 1608 01:20:59,130 --> 01:21:03,690 can write a program that, as soon as main is about to quit or return 1609 01:21:03,690 --> 01:21:06,480 to your blinking prompt, some special code 1610 01:21:06,480 --> 01:21:10,860 we wrote swoops in at that final moment, frees any of the memory 1611 01:21:10,860 --> 01:21:14,130 that we, the library, allocated so that you 1612 01:21:14,130 --> 01:21:17,190 don't run out of memory because of us. 1613 01:21:17,190 --> 01:21:19,590 But you all, when using malloc, will have 1614 01:21:19,590 --> 01:21:23,700 to call free, because the library is not going to do that for you. 
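The copy described above, done without the CS50 library, might be sketched as follows. This is only an illustration, not the lecture's copy.c: the helper name `copy_string` is an assumption made for this example. It allocates one byte more than the string's length for the terminating nul character, lets strcpy do the copying, and leaves the call to free to whoever called it.

```c
#include <stdlib.h>
#include <string.h>

// Hypothetical helper (not from the lecture's copy.c): returns a newly
// allocated copy of s, or NULL if malloc fails. The caller must free it.
char *copy_string(const char *s)
{
    // strlen does not count the terminating nul character, so add 1 for it.
    char *t = malloc(strlen(s) + 1);
    if (t == NULL)
    {
        return NULL;
    }

    // strcpy copies every character, including the nul, from s into t.
    // It does not allocate or free any memory itself.
    strcpy(t, s);
    return t;
}
```

A caller would write something like `char *t = copy_string(s);`, use t, and then call `free(t);` when done, since strcpy and this helper never free anything on the caller's behalf.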
1615 01:21:23,700 --> 01:21:26,400 And indeed, the goal of today and next week and beyond 1616 01:21:26,400 --> 01:21:30,833 is to stop using the CS50 library, ultimately, altogether. 1617 01:21:30,833 --> 01:21:33,000 All right, well let's-- it would be unfair, I think, 1618 01:21:33,000 --> 01:21:36,000 if we introduced all of these fancy new techniques but don't necessarily 1619 01:21:36,000 --> 01:21:40,620 provide you with any sort of tools with which to determine to chase down bugs 1620 01:21:40,620 --> 01:21:43,245 in your new fancy code or solve problems, now, 1621 01:21:43,245 --> 01:21:44,370 that are related to memory. 1622 01:21:44,370 --> 01:21:46,860 And thankfully, there are programs via which 1623 01:21:46,860 --> 01:21:49,560 you can chase down memory-related bugs. 1624 01:21:49,560 --> 01:21:52,080 This is in addition to printf, that function, 1625 01:21:52,080 --> 01:21:56,550 and help50 and check50 and debug50 and debuggers more generally. 1626 01:21:56,550 --> 01:21:59,940 This program-- and it's really the last of the new tools we'll introduce you 1627 01:21:59,940 --> 01:22:01,920 to in C-- is called valgrind. 1628 01:22:01,920 --> 01:22:04,830 And this is a program that exists in CS50 IDE. 1629 01:22:04,830 --> 01:22:07,620 But it exists on Macs and PC's and Linux computers 1630 01:22:07,620 --> 01:22:10,050 anywhere, where you can run it on your own code 1631 01:22:10,050 --> 01:22:12,870 to detect if you're doing anything wrong with memory. 1632 01:22:12,870 --> 01:22:14,370 What might you do wrong with memory? 1633 01:22:14,370 --> 01:22:17,037 Well, previously, remember, I triggered that segmentation fault. 1634 01:22:17,037 --> 01:22:19,320 I touched memory that I should not. 
1635 01:22:19,320 --> 01:22:22,080 Valgrind is a tool that can help you figure out, 1636 01:22:22,080 --> 01:22:25,000 where did you touch memory that you shouldn't have, 1637 01:22:25,000 --> 01:22:27,960 so as to focus your own human attention on whatever lines of code 1638 01:22:27,960 --> 01:22:28,830 might be buggy. 1639 01:22:28,830 --> 01:22:32,610 Valgrind can also detect if you forget to call free. 1640 01:22:32,610 --> 01:22:36,240 If you call malloc one or more times, but don't call free 1641 01:22:36,240 --> 01:22:38,280 a corresponding number of times, valgrind 1642 01:22:38,280 --> 01:22:40,890 is a program that can notice that and tell you that you have 1643 01:22:40,890 --> 01:22:42,580 what's called a memory leak. 1644 01:22:42,580 --> 01:22:44,760 And indeed, this is germane to our own Macs and PCs. 1645 01:22:44,760 --> 01:22:47,100 Again, if you've been using your Mac or PC or sometimes 1646 01:22:47,100 --> 01:22:50,070 even your phone for a long, long time, and maybe 1647 01:22:50,070 --> 01:22:53,340 running lots of different programs at once, lots of browser tabs 1648 01:22:53,340 --> 01:22:55,680 open, lots of different programs open at once, 1649 01:22:55,680 --> 01:22:59,370 your Mac or PC might very well have begun to slow to a crawl. 1650 01:22:59,370 --> 01:23:01,920 It might be annoying, if not impossible to use, 1651 01:23:01,920 --> 01:23:03,960 because everything is so darn slow. 1652 01:23:03,960 --> 01:23:07,920 That may very well be because one or more of the programs you're using 1653 01:23:07,920 --> 01:23:12,480 has some bug in it whereby a programmer kept allocating memory 1654 01:23:12,480 --> 01:23:15,210 and never got around to calling free. 1655 01:23:15,210 --> 01:23:17,273 Maybe it's a bug, maybe it was deliberate, 1656 01:23:17,273 --> 01:23:19,440 they didn't expect you to have so many windows open. 1657 01:23:19,440 --> 01:23:21,360 But valgrind can detect errors like that.
1658 01:23:21,360 --> 01:23:23,730 And honestly, some of you, if you're like me, 1659 01:23:23,730 --> 01:23:29,370 you might very well have 10, 20, 50 different browser tabs open at once, 1660 01:23:29,370 --> 01:23:32,910 thinking oh, I'm going to come back to that someday, even though we never do. 1661 01:23:32,910 --> 01:23:34,950 Each of those tabs takes up memory. 1662 01:23:34,950 --> 01:23:37,320 Literally, any time you open a browser tab, think of it, 1663 01:23:37,320 --> 01:23:41,580 really, as Chrome or Edge or Firefox or whatever 1664 01:23:41,580 --> 01:23:43,920 you're using, underneath the hood, they're 1665 01:23:43,920 --> 01:23:46,320 probably calling a function on Mac OS or Windows 1666 01:23:46,320 --> 01:23:50,670 like malloc to give you more memory to contain the contents of that web page 1667 01:23:50,670 --> 01:23:51,480 temporarily. 1668 01:23:51,480 --> 01:23:54,310 And if you keep opening more and more browser tabs, 1669 01:23:54,310 --> 01:23:56,190 it's like calling malloc, malloc, malloc. 1670 01:23:56,190 --> 01:23:57,840 Eventually, you're going to run out. 1671 01:23:57,840 --> 01:23:59,700 And computers can be smart these days. 1672 01:23:59,700 --> 01:24:03,060 They can kind of temporarily remove things from memory to free up space. 1673 01:24:03,060 --> 01:24:04,477 This is called virtual memory. 1674 01:24:04,477 --> 01:24:06,310 But eventually, something is going to break. 1675 01:24:06,310 --> 01:24:08,520 And it might very well be your user experience 1676 01:24:08,520 --> 01:24:11,700 when things get so slow that you literally have to quit the program 1677 01:24:11,700 --> 01:24:14,140 or maybe even reboot your computer. 1678 01:24:14,140 --> 01:24:15,240 So how do we use valgrind? 1679 01:24:15,240 --> 01:24:17,430 Well, let me go ahead and write a short program 1680 01:24:17,430 --> 01:24:20,040 that doesn't do anything useful, but demonstrates 1681 01:24:20,040 --> 01:24:22,080 multiple memory-related mistakes. 
1682 01:24:22,080 --> 01:24:24,060 I'll call this file memory.c. 1683 01:24:24,060 --> 01:24:27,550 I'm going to go ahead and open up the file memory.c 1684 01:24:27,550 --> 01:24:30,842 and include at the top standard io dot h. 1685 01:24:30,842 --> 01:24:32,550 And then I'm going to also, preemptively, 1686 01:24:32,550 --> 01:24:37,290 include standard lib dot h, which, recall, is where malloc is, int main void. 1687 01:24:37,290 --> 01:24:39,070 And I'm going to keep this one simple. 1688 01:24:39,070 --> 01:24:42,370 I'm going to go ahead and just give myself a whole bunch of integers. 1689 01:24:42,370 --> 01:24:43,810 So this is actually kind of cool. 1690 01:24:43,810 --> 01:24:46,480 It turns out that-- 1691 01:24:46,480 --> 01:24:47,880 well, let's go ahead. 1692 01:24:47,880 --> 01:24:48,910 Yeah, I can do this. 1693 01:24:48,910 --> 01:24:50,035 Let's go ahead and do this. 1694 01:24:50,035 --> 01:24:52,650 Char star s gets malloc. 1695 01:24:52,650 --> 01:24:57,630 And let me go ahead and give myself, how about three of these. 1696 01:24:57,630 --> 01:25:01,050 Let me go ahead and allocate space for three chars. 1697 01:25:01,050 --> 01:25:03,640 Or actually, let's give me four, just like before. 1698 01:25:03,640 --> 01:25:08,340 Now, I'm going to go ahead and say s bracket 0 equals 72. 1699 01:25:08,340 --> 01:25:12,220 s bracket 1-- actually, I'll just do this manually. 1700 01:25:12,220 --> 01:25:14,160 Let's do h. 1701 01:25:14,160 --> 01:25:16,350 Let's do i. 1702 01:25:16,350 --> 01:25:19,960 Let's do our usual exclamation point. 1703 01:25:19,960 --> 01:25:22,170 And then just for good measure, s bracket 3 gets 1704 01:25:22,170 --> 01:25:24,120 quote unquote, backslash 0. 1705 01:25:24,120 --> 01:25:29,340 This is the very manual way of actually-- 1706 01:25:29,340 --> 01:25:32,430 this is the very manual way of actually building up a string. 1707 01:25:32,430 --> 01:25:34,060 But let me introduce a mistake.
1708 01:25:34,060 --> 01:25:37,320 Let me accidentally allocate only three bytes, 1709 01:25:37,320 --> 01:25:40,440 even though I clearly need a fourth for that terminating nul character. 1710 01:25:40,440 --> 01:25:42,510 And notice too, the absence of free. 1711 01:25:42,510 --> 01:25:45,720 I'm going to, very sloppily, not bother calling free. 1712 01:25:45,720 --> 01:25:49,590 Now, I'm going to go ahead and compile this program, make memory. 1713 01:25:49,590 --> 01:25:53,430 OK, it compiles OK, so that's good, dot slash memory. 1714 01:25:53,430 --> 01:25:55,413 OK, nothing happens, but that kind of makes 1715 01:25:55,413 --> 01:25:57,330 sense because I didn't tell it to do anything. 1716 01:25:57,330 --> 01:26:01,360 Just for kicks, let's print out that string just like we always do. 1717 01:26:01,360 --> 01:26:04,500 Let me now recompile memory, still compiles. 1718 01:26:04,500 --> 01:26:06,360 Let me run dot slash memory. 1719 01:26:06,360 --> 01:26:07,570 OK, it seems to work. 1720 01:26:07,570 --> 01:26:10,000 So at first glance, you might be really proud of yourself. 1721 01:26:10,000 --> 01:26:12,910 You've written another correct program, seems to pass check50. 1722 01:26:12,910 --> 01:26:13,410 You submit. 1723 01:26:13,410 --> 01:26:14,327 You go about your day. 1724 01:26:14,327 --> 01:26:16,380 And you're very disappointed some days later 1725 01:26:16,380 --> 01:26:19,920 when you realize, dammit, I did not get full credit on this because there's 1726 01:26:19,920 --> 01:26:21,780 actually a latent bug. 1727 01:26:21,780 --> 01:26:24,780 So sometimes, indeed, there are bugs in your code 1728 01:26:24,780 --> 01:26:27,420 that you don't necessarily see visually, you don't necessarily 1729 01:26:27,420 --> 01:26:30,990 experience when running it yourself, but eventually, there 1730 01:26:30,990 --> 01:26:33,443 might be an error when running it enough times. 
1731 01:26:33,443 --> 01:26:36,360 Eventually, a computer might notice that you're doing something wrong. 1732 01:26:36,360 --> 01:26:38,460 And thankfully, tools exist like valgrind, 1733 01:26:38,460 --> 01:26:40,098 that can allow you to detect that. 1734 01:26:40,098 --> 01:26:43,140 So let me go ahead and just increase the size of my terminal window here. 1735 01:26:43,140 --> 01:26:48,090 And let me go ahead and run valgrind on dot slash memory. 1736 01:26:48,090 --> 01:26:49,290 So it's just like debug50. 1737 01:26:49,290 --> 01:26:53,040 Instead of running debug50 and then dot slash whatever the program is, 1738 01:26:53,040 --> 01:26:55,813 you run valgrind dot slash memory. 1739 01:26:55,813 --> 01:26:58,230 This one, unfortunately, is only a command line interface. 1740 01:26:58,230 --> 01:27:00,480 There's no graphical user interface like debug50. 1741 01:27:00,480 --> 01:27:04,530 And honestly, it's a hideous sequence of output. 1742 01:27:04,530 --> 01:27:06,630 This should overwhelm you at first glance. 1743 01:27:06,630 --> 01:27:08,190 There's crazy cryptic-ness here. 1744 01:27:08,190 --> 01:27:09,690 It's not the best-designed program. 1745 01:27:09,690 --> 01:27:12,520 It really was meant for the most comfortable people. 1746 01:27:12,520 --> 01:27:15,180 But there are some useful tidbits we can take away from it. 1747 01:27:15,180 --> 01:27:17,490 As always, let me scroll all the way to the top 1748 01:27:17,490 --> 01:27:19,260 to the very first line of output. 1749 01:27:19,260 --> 01:27:21,600 And I'll draw your attention to a couple of things 1750 01:27:21,600 --> 01:27:23,070 that will start to jump out to you. 1751 01:27:23,070 --> 01:27:24,960 And help50 can help you with this. 1752 01:27:24,960 --> 01:27:28,020 If you're confused by valgrind's output, rerun it. 1753 01:27:28,020 --> 01:27:29,520 But put help50 at the beginning.
1754 01:27:29,520 --> 01:27:32,120 And just like I will do now verbally, so can help50 1755 01:27:32,120 --> 01:27:36,510 help you notice the important things in this crazy mess of output. 1756 01:27:36,510 --> 01:27:37,770 This is worrisome. 1757 01:27:37,770 --> 01:27:41,880 Valgrind is noting on this line here, invalid write of size 1. 1758 01:27:41,880 --> 01:27:44,370 And that's on line 10 of memory.c. 1759 01:27:44,370 --> 01:27:46,510 So we'll look at that in a moment. 1760 01:27:46,510 --> 01:27:50,530 If I scroll down further, invalid read of size 1. 1761 01:27:50,530 --> 01:27:55,810 And that also seems to be on here, it looks like, on line 11 of memory.c. 1762 01:27:55,810 --> 01:27:59,070 And then if I keep scrolling, keep scrolling, keep scrolling, 1763 01:27:59,070 --> 01:28:00,990 I'm not liking this. 1764 01:28:00,990 --> 01:28:05,910 3 bytes in 1 blocks are definitely lost in loss record, whatever that is. 1765 01:28:05,910 --> 01:28:10,170 But 3 bytes in 1 blocks are definitely lost. 1766 01:28:10,170 --> 01:28:15,240 And then down here, leak summary, definitely lost, 3 bytes in 1 blocks. 1767 01:28:15,240 --> 01:28:17,703 Incidentally, 1 blocks, obviously not correct grammar. 1768 01:28:17,703 --> 01:28:19,620 This is what happens when your program doesn't 1769 01:28:19,620 --> 01:28:24,210 have an if condition that checks if the number is 1 or positive or 0. 1770 01:28:24,210 --> 01:28:27,300 You could fix this, grammatically, honestly, with a simple if condition. 1771 01:28:27,300 --> 01:28:29,770 They did not when writing this program years ago. 1772 01:28:29,770 --> 01:28:32,110 So there's two or three mistakes here. 1773 01:28:32,110 --> 01:28:34,620 One is some kind of invalid read or write. 1774 01:28:34,620 --> 01:28:35,953 And another is this leak. 1775 01:28:35,953 --> 01:28:36,870 Well, what is a write? 1776 01:28:36,870 --> 01:28:38,940 A write just refers to changing a value.
1777 01:28:38,940 --> 01:28:43,150 A read just refers to reading or using or printing a value. 1778 01:28:43,150 --> 01:28:44,730 So let's focus on line 10. 1779 01:28:44,730 --> 01:28:48,060 If I scroll back down to my code and look on line 10, 1780 01:28:48,060 --> 01:28:51,760 this was an invalid write, invalid write. 1781 01:28:51,760 --> 01:28:52,950 Well, why is it invalid? 1782 01:28:52,950 --> 01:28:57,180 Well, per today's definition, if you are allocating 3 bytes, 1783 01:28:57,180 --> 01:29:01,710 you are welcome to touch the first byte, the second byte, and the third byte. 1784 01:29:01,710 --> 01:29:04,500 But you have no business touching the fourth byte 1785 01:29:04,500 --> 01:29:06,420 if you've only asked for three. 1786 01:29:06,420 --> 01:29:11,070 This is like a small scale version of the very adventurous and inappropriate 1787 01:29:11,070 --> 01:29:14,100 poking around I did when I looked at 10,000 bytes away. 1788 01:29:14,100 --> 01:29:16,680 Even looking one byte away is a potential bug 1789 01:29:16,680 --> 01:29:18,780 and can cause a program to crash. 1790 01:29:18,780 --> 01:29:21,720 Meanwhile, line 11 is also problematic, which 1791 01:29:21,720 --> 01:29:25,470 is an invalid read, because now, you're saying go print out this string. 1792 01:29:25,470 --> 01:29:28,043 But that string contains a memory address 1793 01:29:28,043 --> 01:29:30,210 that you should not have touched in the first place. 1794 01:29:30,210 --> 01:29:34,080 And the memory leak, the third problem, stems from the fact 1795 01:29:34,080 --> 01:29:36,520 that I didn't free that memory. 1796 01:29:36,520 --> 01:29:40,380 So again, it'll take some practice and experience, some mistakes of your own, 1797 01:29:40,380 --> 01:29:42,480 to notice and understand these bugs. 1798 01:29:42,480 --> 01:29:44,670 But let me fix the first two like this. 1799 01:29:44,670 --> 01:29:46,530 Let me just give myself four bytes. 
1800 01:29:46,530 --> 01:29:48,990 And let me fix the second one or the third one, 1801 01:29:48,990 --> 01:29:53,820 really, by freeing s at the very end, because again, any time you use malloc 1802 01:29:53,820 --> 01:29:55,590 you must use free. 1803 01:29:55,590 --> 01:29:59,310 Let me go ahead and recompile memory, seems to compile. 1804 01:29:59,310 --> 01:30:02,130 Let me rerun it, still works the same, visually. 1805 01:30:02,130 --> 01:30:05,670 But now, let's rerun valgrind on it and see if there are any errors now, 1806 01:30:05,670 --> 01:30:08,710 so valgrind dot slash memory, Enter. 1807 01:30:08,710 --> 01:30:10,710 The output's still going to look pretty cryptic. 1808 01:30:10,710 --> 01:30:15,300 But notice all heap blocks were freed, whatever that means. 1809 01:30:15,300 --> 01:30:16,217 No leaks are possible. 1810 01:30:16,217 --> 01:30:18,133 It doesn't really get more explicit than that. 1811 01:30:18,133 --> 01:30:19,090 That's a good thing. 1812 01:30:19,090 --> 01:30:23,100 And if I scroll up, I see no mention of those invalid reads or writes. 1813 01:30:23,100 --> 01:30:26,168 So starting with this week's problems and next week's in C, 1814 01:30:26,168 --> 01:30:27,960 not only are you going to want to use tools 1815 01:30:27,960 --> 01:30:31,590 like help50 and printf and debug50 and check50, 1816 01:30:31,590 --> 01:30:35,710 but even if you think your code's right, the output looks right, 1817 01:30:35,710 --> 01:30:37,050 you might have a latent bug. 1818 01:30:37,050 --> 01:30:40,200 And even when your programs are small, they might not crash the computer. 1819 01:30:40,200 --> 01:30:43,500 They might not cause that segmentation fault. Eventually, they will. 1820 01:30:43,500 --> 01:30:47,850 And you do want to use tools like this to chase down any such mistakes. 1821 01:30:47,850 --> 01:30:50,460 Otherwise, bad things can happen. 1822 01:30:50,460 --> 01:30:51,600 And what might happen? 
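The corrected memory.c discussed above might be sketched roughly like this. It is a reconstruction under assumptions, not the exact file from the lecture: the helper name `build_hi` is invented for this example, and the uppercase letters are a guess based on the 72 mentioned earlier. The comments mark where each of the earlier mistakes lived.

```c
#include <stdlib.h>

// Hypothetical helper standing in for the body of memory.c.
// Returns a 4-byte heap block holding "HI!" plus the terminating nul,
// or NULL if malloc fails. The caller must free it.
char *build_hi(void)
{
    // Ask for 4 bytes: three characters plus the nul. Asking for only 3,
    // as in the buggy version, makes the write to s[3] below invalid.
    char *s = malloc(4);
    if (s == NULL)
    {
        return NULL;
    }

    s[0] = 'H'; // same as 72
    s[1] = 'I';
    s[2] = '!';
    s[3] = '\0'; // valid only because 4 bytes were requested
    return s;
}
```

In a main function, one would print the result with `printf("%s\n", s);` and then call `free(s);` before returning. Leaving out that call to free is what produced the "definitely lost" report. Running `valgrind ./memory` on the compiled program is how the lecture checks it.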
1823 01:30:51,600 --> 01:30:54,900 Well, let me go ahead and reveal an example here 1824 01:30:54,900 --> 01:30:57,840 that presents some code that's a little dangerous. 1825 01:30:57,840 --> 01:31:00,600 So here, for instance, is an example where 1826 01:31:00,600 --> 01:31:05,202 I'm declaring at the top of the function, int star x and int star y. 1827 01:31:05,202 --> 01:31:06,160 So what does that mean? 1828 01:31:06,160 --> 01:31:08,700 Well, per today's parlance, this just means give me 1829 01:31:08,700 --> 01:31:11,550 a pointer to an integer called x. 1830 01:31:11,550 --> 01:31:13,800 Give me a pointer to an integer called y. 1831 01:31:13,800 --> 01:31:16,650 Put another way, give me a variable called x that I 1832 01:31:16,650 --> 01:31:18,900 can store the address of an int in. 1833 01:31:18,900 --> 01:31:23,640 Give me a variable called y that I can store the address of another int in. 1834 01:31:23,640 --> 01:31:27,880 But notice what I am not doing on these first two lines. 1835 01:31:27,880 --> 01:31:31,950 I'm not actually assigning them a value until line 3. 1836 01:31:31,950 --> 01:31:36,000 On line 3, even though this is weird-- this is not how we've allocated space 1837 01:31:36,000 --> 01:31:37,530 for integers before-- 1838 01:31:37,530 --> 01:31:41,130 there's no reason that you can't use malloc 1839 01:31:41,130 --> 01:31:45,550 and say, give me enough space for the size of an integer. 1840 01:31:45,550 --> 01:31:46,370 sizeof is new. 1841 01:31:46,370 --> 01:31:50,150 It's just an operator in C that tells you the size of a data type, 1842 01:31:50,150 --> 01:31:51,500 like a size of an int. 1843 01:31:51,500 --> 01:31:53,480 So maybe you forgot that an int is 4. 1844 01:31:53,480 --> 01:31:56,450 And indeed, an int is usually 4, but not always 4 in all systems. 
1845 01:31:56,450 --> 01:32:00,020 So size of int just makes sure that it will always give you the right answer, 1846 01:32:00,020 --> 01:32:02,630 whether you're using a modern computer or an old one. 1847 01:32:02,630 --> 01:32:07,190 So this just means, really, allocate 4 bytes to me on a modern system. 1848 01:32:07,190 --> 01:32:11,370 And it stores the address of the first byte in x. 1849 01:32:11,370 --> 01:32:15,360 Would someone mind translating to layman's terms, what 1850 01:32:15,360 --> 01:32:18,480 is star x equals 42 doing? 1851 01:32:18,480 --> 01:32:20,880 Star, again, is the dereference operator. 1852 01:32:20,880 --> 01:32:23,430 It means go to the address. 1853 01:32:23,430 --> 01:32:24,375 And do what? 1854 01:32:24,375 --> 01:32:27,510 How would you describe, with a verbal comment, 1855 01:32:27,510 --> 01:32:30,450 what star x equals 42 is doing? 1856 01:32:30,450 --> 01:32:33,630 Brian, would you mind verbalizing any thoughts? 1857 01:32:33,630 --> 01:32:37,555 BRIAN: Yeah, so Sophia suggested that at that address, we are going to place 42. 1858 01:32:37,555 --> 01:32:38,430 DAVID MALAN: Perfect. 1859 01:32:38,430 --> 01:32:40,080 At that address put 42. 1860 01:32:40,080 --> 01:32:44,640 Equivalently, go to that address in x and put the number 42 there. 1861 01:32:44,640 --> 01:32:48,870 It's like going to Brian's mailbox and putting the 42 in his mailbox, 1862 01:32:48,870 --> 01:32:52,035 instead of what we previously had there, which was the number 50. 1863 01:32:52,035 --> 01:32:57,180 How about this next fifth line, star y equals 13? 1864 01:32:57,180 --> 01:32:59,670 Brian, could you verbalize someone else's? 1865 01:32:59,670 --> 01:33:03,500 What does star y equals 13 do for us? 1866 01:33:03,500 --> 01:33:07,850 And it's not an accident that 13 tends to be unlucky. 1867 01:33:07,850 --> 01:33:10,530 BRIAN: Peter says, put 13 at the address y. 1868 01:33:10,530 --> 01:33:12,710 DAVID MALAN: Good, put 13 at the address in y.
1869 01:33:12,710 --> 01:33:16,860 Or put another way, go to the address in y and put 13 there. 1870 01:33:16,860 --> 01:33:19,070 But there's a logical problem here. 1871 01:33:19,070 --> 01:33:20,870 What is in y? 1872 01:33:20,870 --> 01:33:24,860 If I rewind, I never actually assign y a value. 1873 01:33:24,860 --> 01:33:27,050 I don't initially, and I don't eventually. 1874 01:33:27,050 --> 01:33:30,500 At least with x, even though I didn't give it a value in declaring it up here 1875 01:33:30,500 --> 01:33:34,850 as a variable, I eventually got around to storing in it the actual address. 1876 01:33:34,850 --> 01:33:38,060 Now, just to be really nitpicky, I should probably even, in this program, 1877 01:33:38,060 --> 01:33:40,495 check for NULL just in case anything went wrong. 1878 01:33:40,495 --> 01:33:41,870 But that's a whole other problem. 1879 01:33:41,870 --> 01:33:46,470 It is a more damning problem that I haven't even given y a value. 1880 01:33:46,470 --> 01:33:49,610 And here's where we can reveal one other detail about a computer. 1881 01:33:49,610 --> 01:33:53,750 Thus far, we've been taking for granted that you and I almost always initialize 1882 01:33:53,750 --> 01:33:54,360 our memory. 1883 01:33:54,360 --> 01:33:56,900 If we want to give ourselves a char, an int, a string, 1884 01:33:56,900 --> 01:33:59,900 we literally type it out into the program 1885 01:33:59,900 --> 01:34:02,150 itself so that it's there when we want it. 1886 01:34:02,150 --> 01:34:04,070 But if we consider this picture here, which 1887 01:34:04,070 --> 01:34:07,370 is now just a physical incarnation of some of the contents of your computer's 1888 01:34:07,370 --> 01:34:11,750 memory, playfully labeled with a lot of Oscar the Grouches, 1889 01:34:11,750 --> 01:34:16,250 this is because you should never trust the contents of your computer's memory 1890 01:34:16,250 --> 01:34:18,500 if you yourself have not put something there.
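The allocation with sizeof, together with the check for NULL mentioned in passing above, could be sketched like this. The helper name `new_int` is an assumption made for illustration and does not appear in the lecture's code.

```c
#include <stdlib.h>

// Hypothetical helper: allocates room for exactly one int, stores value
// in it, and returns its address, or NULL if malloc could not find memory.
// The caller is responsible for calling free on the result.
int *new_int(int value)
{
    // sizeof(int) is usually 4 but not on every system, so ask the compiler.
    int *x = malloc(sizeof(int));
    if (x == NULL)
    {
        // Nothing was allocated, so there is nothing valid to dereference.
        return NULL;
    }

    // Go to the address in x and put value there.
    *x = value;
    return x;
}
```

With this in place, `int *x = new_int(42);` plays the role of the third and fourth lines of the example: x holds a real address before anything is ever stored through it.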
1891 01:34:18,500 --> 01:34:21,560 There's a term of art in programming called garbage values. 1892 01:34:21,560 --> 01:34:26,180 If you yourself have not put a value somewhere in memory, 1893 01:34:26,180 --> 01:34:30,210 you should assume, to be safe, that it is a quote unquote, "garbage value." 1894 01:34:30,210 --> 01:34:31,440 It's not a weird value. 1895 01:34:31,440 --> 01:34:34,580 It's just a 1, a 2, an A, a B, a C, you just 1896 01:34:34,580 --> 01:34:38,510 don't know what it is, because if your program is running over time 1897 01:34:38,510 --> 01:34:40,890 and you're calling functions and functions are returning. 1898 01:34:40,890 --> 01:34:43,348 You're calling other functions and functions are returning. 1899 01:34:43,348 --> 01:34:46,970 These values in your computer's memory are constantly changing, 1900 01:34:46,970 --> 01:34:48,740 and your memory gets reused. 1901 01:34:48,740 --> 01:34:53,180 When you free memory, that doesn't erase it or set it all back to 0's or set it 1902 01:34:53,180 --> 01:34:53,990 all back to 1's. 1903 01:34:53,990 --> 01:34:56,600 It just leaves it alone so that you can reuse 1904 01:34:56,600 --> 01:34:59,810 it, which means over time, your computer contains remnants 1905 01:34:59,810 --> 01:35:03,960 of all of the variables you've ever used in your program over here, over here, 1906 01:35:03,960 --> 01:35:04,730 over there. 1907 01:35:04,730 --> 01:35:10,850 And so in a program like this, where you have not explicitly initialized y 1908 01:35:10,850 --> 01:35:14,000 to anything, you should assume that Oscar the Grouch, so to speak, 1909 01:35:14,000 --> 01:35:15,020 is at that location. 1910 01:35:15,020 --> 01:35:20,570 It is a garbage value that looks like an address but is not a valid address. 1911 01:35:20,570 --> 01:35:25,040 And so when you say star y equals 13, that means go to that address. 1912 01:35:25,040 --> 01:35:28,910 But really, go to that bogus address and put something there. 
1913 01:35:28,910 --> 01:35:31,850 And odds are, your program is going to crash. 1914 01:35:31,850 --> 01:35:33,650 You are going to get a segmentation fault, 1915 01:35:33,650 --> 01:35:37,562 because by going to some arbitrary garbage value address, 1916 01:35:37,562 --> 01:35:40,520 it would be like picking up a random piece of paper with a number on it 1917 01:35:40,520 --> 01:35:42,030 and then going to that mailbox. 1918 01:35:42,030 --> 01:35:42,530 Why? 1919 01:35:42,530 --> 01:35:44,300 It doesn't belong to you. 1920 01:35:44,300 --> 01:35:47,930 If you try to dereference an uninitialized variable, 1921 01:35:47,930 --> 01:35:49,850 your program may very well crash. 1922 01:35:49,850 --> 01:35:51,890 And this is, perhaps, no better presented 1923 01:35:51,890 --> 01:35:55,970 than by one of our friends, Nick Parlante, a professor at Stanford 1924 01:35:55,970 --> 01:36:02,510 University who has breathed life into a character in claymation known as Binky. 1925 01:36:02,510 --> 01:36:06,140 We have just a 2 minute clip from this that paints the picture of bad things 1926 01:36:06,140 --> 01:36:09,020 indeed happening when you touch memory that you shouldn't. 1927 01:36:09,020 --> 01:36:13,340 So hopefully, a helpful reminder as to what to do and not to do with pointers. 1928 01:36:13,340 --> 01:36:14,790 Here we go. 1929 01:36:14,790 --> 01:36:16,610 [VIDEO PLAYBACK] 1930 01:36:16,610 --> 01:36:17,540 - Hey, Binky. 1931 01:36:17,540 --> 01:36:20,890 Wake up, it's time for pointer fun. 1932 01:36:20,890 --> 01:36:22,060 - What's that? 1933 01:36:22,060 --> 01:36:23,620 Learn about pointers? 1934 01:36:23,620 --> 01:36:25,390 Oh, goody! 1935 01:36:25,390 --> 01:36:28,430 - Well, to get started, I guess we're going to need a couple pointers. 1936 01:36:28,430 --> 01:36:32,940 - OK, this code allocates two pointers which can point to integers. 1937 01:36:32,940 --> 01:36:35,042 - OK, well I see the two pointers.
1938 01:36:35,042 --> 01:36:37,000 But they don't seem to be pointing to anything. 1939 01:36:37,000 --> 01:36:37,780 - That's right. 1940 01:36:37,780 --> 01:36:39,970 Initially, pointers don't point to anything. 1941 01:36:39,970 --> 01:36:42,190 The things they point to are called pointees. 1942 01:36:42,190 --> 01:36:44,110 And setting them up's a separate step. 1943 01:36:44,110 --> 01:36:45,100 - Oh, right, right. 1944 01:36:45,100 --> 01:36:45,790 I knew that. 1945 01:36:45,790 --> 01:36:47,750 The pointees are separate. 1946 01:36:47,750 --> 01:36:50,050 So how do you allocate a pointee? 1947 01:36:50,050 --> 01:36:53,800 - OK, well, this code allocates a new integer pointee. 1948 01:36:53,800 --> 01:36:56,880 And this part sets x to point to it. 1949 01:36:56,880 --> 01:36:58,180 - Hey, that looks better. 1950 01:36:58,180 --> 01:36:59,700 So make it do something. 1951 01:36:59,700 --> 01:37:05,460 - OK, I'll dereference the pointer x to store the number 42 into its pointee. 1952 01:37:05,460 --> 01:37:08,970 For this trick, I'll need my magic wand of dereferencing. 1953 01:37:08,970 --> 01:37:12,660 - Your magic wand of dereferencing? 1954 01:37:12,660 --> 01:37:14,170 That's great. 1955 01:37:14,170 --> 01:37:15,910 - This is what the code looks like. 1956 01:37:15,910 --> 01:37:17,800 I'll just set up the number and-- 1957 01:37:17,800 --> 01:37:18,900 [POP] 1958 01:37:18,900 --> 01:37:21,000 - Hey, look, there it goes. 1959 01:37:21,000 --> 01:37:25,830 So doing a dereference on x follows the arrow to access its pointee. 1960 01:37:25,830 --> 01:37:28,020 In this case, to store 42 in there. 1961 01:37:28,020 --> 01:37:32,450 Hey, try using it to store the number 13 through the other pointer, y. 1962 01:37:32,450 --> 01:37:33,570 - OK.
1963 01:37:33,570 --> 01:37:38,100 I'll just go over here to y and get the number 13 set up 1964 01:37:38,100 --> 01:37:41,970 and then take the wand of dereferencing and just-- 1965 01:37:41,970 --> 01:37:43,580 [HORN] whoa! 1966 01:37:43,580 --> 01:37:45,930 - Oh, hey, that didn't work. 1967 01:37:45,930 --> 01:37:51,370 Say, Binky, I don't think dereferencing y is a good idea, because setting up 1968 01:37:51,370 --> 01:37:52,840 the pointee is a separate step. 1969 01:37:52,840 --> 01:37:54,815 And I don't think we ever did it. 1970 01:37:54,815 --> 01:37:56,430 - Hmm, good point. 1971 01:37:56,430 --> 01:37:58,800 - Yeah, we allocated the pointer y. 1972 01:37:58,800 --> 01:38:01,570 But we never set it to point to a pointee. 1973 01:38:01,570 --> 01:38:03,480 - Hmm, very observant. 1974 01:38:03,480 --> 01:38:05,310 - Hey, you're looking good there, Binky. 1975 01:38:05,310 --> 01:38:08,250 Can you fix it so that y points to the same pointee as x? 1976 01:38:08,250 --> 01:38:11,620 - Sure, I'll use my magic wand of pointer assignment. 1977 01:38:11,620 --> 01:38:13,800 - Is that going to be a problem like before? 1978 01:38:13,800 --> 01:38:15,630 - No, this doesn't touch the pointees. 1979 01:38:15,630 --> 01:38:19,170 It just changes one pointer to point to the same thing as another. 1980 01:38:19,170 --> 01:38:20,310 - Oh, I see. 1981 01:38:20,310 --> 01:38:23,040 Now, y points to the same place as x. 1982 01:38:23,040 --> 01:38:24,840 So wait, now y is fixed. 1983 01:38:24,840 --> 01:38:25,950 It has a pointee. 1984 01:38:25,950 --> 01:38:29,760 So you can try the wand of dereferencing again to send the 13 over. 1985 01:38:29,760 --> 01:38:31,093 - Oh, OK. 1986 01:38:31,093 --> 01:38:31,635 Here it goes. 1987 01:38:31,635 --> 01:38:32,900 [POP] 1988 01:38:32,900 --> 01:38:34,160 - Hey, look at that. 1989 01:38:34,160 --> 01:38:35,870 Now, dereferencing works on y. 
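The steps in the clip up to this point can be sketched in C roughly as follows. This is a minimal sketch, not the code shown in the video: the function name pointer_fun is hypothetical, and both pointers are set to NULL at the start (an assumption added here) so that neither one ever holds a garbage address.

```c
#include <stdlib.h>

// Hypothetical helper that replays the steps from the clip.
// Returns the value seen through x at the end, or -1 if malloc fails.
int pointer_fun(void)
{
    int *x = NULL; // two pointers that can point to integers;
    int *y = NULL; // NULL here instead of leaving them uninitialized

    x = malloc(sizeof(int)); // allocate an integer pointee and point x at it
    if (x == NULL)
    {
        return -1;
    }

    *x = 42; // dereference x to store 42 in its pointee

    // *y = 13; // the mistake from the clip: y has no pointee yet

    y = x;   // pointer assignment: y now points to the same pointee as x
    *y = 13; // safe now, because y has a pointee

    int result = *x; // x and y share one pointee, so x sees the 13
    free(x);         // release the shared pointee exactly once
    return result;
}
```

Because x and y share a single pointee, the value read through x at the end is 13, and the pointee is freed only once.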
1990 01:38:35,870 --> 01:38:39,980 And because the pointers are sharing that one pointee, they both see the 13. 1991 01:38:39,980 --> 01:38:41,720 - Yeah, sharing, whatever. 1992 01:38:41,720 --> 01:38:43,610 So are we going to switch places now? 1993 01:38:43,610 --> 01:38:45,270 - Oh look, we're out of time. 1994 01:38:45,270 --> 01:38:45,770 - But-- 1995 01:38:45,770 --> 01:38:46,040 [END PLAYBACK] 1996 01:38:46,040 --> 01:38:47,570 DAVID MALAN: All right, so we are not quite out of time. 1997 01:38:47,570 --> 01:38:50,028 But let's go ahead and take our second 5 minute break here. 1998 01:38:50,028 --> 01:38:52,910 And when we return, we'll take a closer look at Oscar and more. 1999 01:38:52,910 --> 01:38:54,260 Back in 5. 2000 01:38:54,260 --> 01:38:57,380 All right, so I claim that there's all these garbage 2001 01:38:57,380 --> 01:38:58,950 values in your computer's memory. 2002 01:38:58,950 --> 01:39:00,860 But how can you see them? 2003 01:39:00,860 --> 01:39:04,400 What Binky did was, of course, try to dereference a garbage value 2004 01:39:04,400 --> 01:39:05,817 when bad things happen. 2005 01:39:05,817 --> 01:39:07,900 But we can actually see this with code of our own. 2006 01:39:07,900 --> 01:39:10,970 So let me go ahead, quickly, and whip up a little program here, 2007 01:39:10,970 --> 01:39:15,290 just like something we did in week one or week two, 2008 01:39:15,290 --> 01:39:17,090 but without doing it very well. 2009 01:39:17,090 --> 01:39:21,410 Let me go ahead and include standard io dot h as usual, int main void. 2010 01:39:21,410 --> 01:39:24,290 And then let me go ahead and give myself an array of scores. 2011 01:39:24,290 --> 01:39:26,000 How about an array of three scores? 2012 01:39:26,000 --> 01:39:28,715 And we've done this before where we collected scores from a user. 
2013 01:39:28,715 --> 01:39:30,590 But this time, I'm going to deliberately make 2014 01:39:30,590 --> 01:39:33,170 the mistake of not actually initializing those scores 2015 01:39:33,170 --> 01:39:35,450 or even asking the human for those scores. 2016 01:39:35,450 --> 01:39:41,060 I'm just going to blindly go about iterating from i equals 0 on up to 3. 2017 01:39:41,060 --> 01:39:46,070 And on each iteration, I'm just going to presumptuously print whatever is 2018 01:39:46,070 --> 01:39:49,220 at that location in scores bracket i. 2019 01:39:49,220 --> 01:39:52,430 So logically, my code is correct in what it's trying to do, 2020 01:39:52,430 --> 01:39:54,230 print out the values in scores. 2021 01:39:54,230 --> 01:39:57,170 But notice that I have deliberately not initialized any 2022 01:39:57,170 --> 01:40:00,147 of the 1, 2, 3 scores in that array. 2023 01:40:00,147 --> 01:40:01,730 So who knows what's going to be there? 2024 01:40:01,730 --> 01:40:04,650 Indeed, it should be garbage values of some sort 2025 01:40:04,650 --> 01:40:06,650 that we couldn't necessarily predict in advance. 2026 01:40:06,650 --> 01:40:10,050 So let me go ahead and make garbage, since this program 2027 01:40:10,050 --> 01:40:11,300 is in a file called garbage.c. 2028 01:40:11,300 --> 01:40:15,140 Compiles OK, but when I now run garbage, we 2029 01:40:15,140 --> 01:40:21,230 should see three scores, which are cryptically negative, 833060864. 2030 01:40:21,230 --> 01:40:23,780 Another one is 32765. 2031 01:40:23,780 --> 01:40:25,760 And the third just happens to be 0. 2032 01:40:25,760 --> 01:40:28,490 So there are those garbage values, because again, the computer 2033 01:40:28,490 --> 01:40:31,800 is not going to initialize any of those values for you. 2034 01:40:31,800 --> 01:40:33,570 Now, there are exceptions. 
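Before getting to those exceptions, here is a sketch of the program just described, with the uninitialized declaration left in a comment and a safer alternative in its place. The helper names print_scores and demo are hypothetical, chosen here only so the idea can be shown as functions.

```c
#include <stdio.h>

// Prints n scores, one per line.
void print_scores(const int scores[], int n)
{
    for (int i = 0; i < n; i++)
    {
        printf("%i\n", scores[i]);
    }
}

// Hypothetical demo: give every element a value before reading it.
// Returns the sum of the three scores.
int demo(void)
{
    // int scores[3];     // as in garbage.c: contents are garbage values
    int scores[3] = {0};  // initializer sets all three elements to 0

    print_scores(scores, 3);
    return scores[0] + scores[1] + scores[2];
}
```

With the commented-out declaration instead, the three printed numbers would be whatever happened to be left in that memory, which is why they cannot be predicted in advance.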
2035 01:40:33,570 --> 01:40:36,320 We have, on occasion, used a global variable, 2036 01:40:36,320 --> 01:40:40,490 a constant that is outside the context of main and all of my other functions. 2037 01:40:40,490 --> 01:40:42,860 Global variables, if you do not set them, 2038 01:40:42,860 --> 01:40:47,210 are conventionally initialized to 0 or NULL for you. 2039 01:40:47,210 --> 01:40:50,000 But you should generally not rely on that kind of behavior. 2040 01:40:50,000 --> 01:40:53,120 Your instinct should be to always initialize values 2041 01:40:53,120 --> 01:40:56,630 before thinking of touching or reading them 2042 01:40:56,630 --> 01:40:59,030 as via printf or some other mechanism. 2043 01:40:59,030 --> 01:41:02,720 All right, well, let's see how this understanding, now, of memory, 2044 01:41:02,720 --> 01:41:06,350 can lead us to solve problems, but also encounter new types of problems, 2045 01:41:06,350 --> 01:41:08,960 but problems that we can now hopefully understand. 2046 01:41:08,960 --> 01:41:11,250 I'm going to go ahead and create a new program here. 2047 01:41:11,250 --> 01:41:14,390 And recall from last week that it was very common 2048 01:41:14,390 --> 01:41:15,890 for us to want to swap values. 2049 01:41:15,890 --> 01:41:19,010 When Brian was doing our sorts for us, whether it was selection or bubble 2050 01:41:19,010 --> 01:41:21,710 sort, there was a lot of swapping going on. 2051 01:41:21,710 --> 01:41:24,440 And yet, we didn't really write any code for those algorithms. 2052 01:41:24,440 --> 01:41:25,232 And that's fine. 2053 01:41:25,232 --> 01:41:27,440 But let's consider that very simple primitive of just 2054 01:41:27,440 --> 01:41:30,440 swapping two values, for instance, swapping two integers. 2055 01:41:30,440 --> 01:41:34,160 Let me go ahead and give myself the start of a program in swap.c here. 2056 01:41:34,160 --> 01:41:38,630 I'm going to include standard io dot h, int main void.
2057 01:41:38,630 --> 01:41:41,370 And inside of main, I'm going to give myself two integers. 2058 01:41:41,370 --> 01:41:44,960 Let's just give myself an int called x and assign it 1, an int called y 2059 01:41:44,960 --> 01:41:46,140 and assign it 2. 2060 01:41:46,140 --> 01:41:48,890 And then let me go ahead and just print out what those values are. 2061 01:41:48,890 --> 01:41:55,520 I'll just say, literally, x is percent i comma y is percent i backslash n. 2062 01:41:55,520 --> 01:41:59,490 And then I'm going to go ahead and print out x comma y, respectively. 2063 01:41:59,490 --> 01:42:02,930 And then I'm eventually going to write a function called 2064 01:42:02,930 --> 01:42:04,613 swap that swaps x and y. 2065 01:42:04,613 --> 01:42:06,530 But let's assume, for the moment, that exists. 2066 01:42:06,530 --> 01:42:08,870 It doesn't, because what I then want to do right 2067 01:42:08,870 --> 01:42:13,340 after that is just reprint the same thing, x is now percent i, 2068 01:42:13,340 --> 01:42:17,690 y is percent i, my presumption being that the values of x and y 2069 01:42:17,690 --> 01:42:18,870 will be swapped. 2070 01:42:18,870 --> 01:42:20,480 So how might I swap these two values? 2071 01:42:20,480 --> 01:42:23,120 Well, let me go ahead and implement my own function. 2072 01:42:23,120 --> 01:42:25,110 I don't think it needs to return anything, 2073 01:42:25,110 --> 01:42:27,110 so I'm going to say void is the return type. 2074 01:42:27,110 --> 01:42:28,340 I'll call it swap. 2075 01:42:28,340 --> 01:42:30,830 It's going to take two arguments as input. 2076 01:42:30,830 --> 01:42:33,320 We'll call it a and b, both integers. 2077 01:42:33,320 --> 01:42:34,820 But I could call it anything I want. 2078 01:42:34,820 --> 01:42:36,800 But a and b seems reasonable. 2079 01:42:36,800 --> 01:42:39,350 And now, I want to go ahead and swap two values. 2080 01:42:39,350 --> 01:42:42,140 Now, Brian was kind of doing this with his two hands last week. 
2081 01:42:42,140 --> 01:42:45,830 And that's fine, but we should probably consider this a little more closely. 2082 01:42:45,830 --> 01:42:48,050 In fact, Brian, instead of numbers, let's 2083 01:42:48,050 --> 01:42:49,920 do something a little more real world. 2084 01:42:49,920 --> 01:42:53,080 I think you have a couple of beverages in front of you. 2085 01:42:53,080 --> 01:42:53,580 BRIAN: Yeah. 2086 01:42:53,580 --> 01:42:56,220 So right here, I have a red glass and a blue glass, 2087 01:42:56,220 --> 01:42:58,970 which I guess we can use to represent two variables, for instance. 2088 01:42:58,970 --> 01:42:59,180 DAVID MALAN: Yeah. 2089 01:42:59,180 --> 01:43:00,198 Now, let me suppose-- 2090 01:43:00,198 --> 01:43:01,490 I wish I'd told you in advance. 2091 01:43:01,490 --> 01:43:03,920 I'd actually prefer that the red liquid be 2092 01:43:03,920 --> 01:43:07,050 in the blue glass and the blue liquid be in the red glass. 2093 01:43:07,050 --> 01:43:08,780 So do you mind swapping those two values, 2094 01:43:08,780 --> 01:43:11,310 just like you swapped numbers last week? 2095 01:43:11,310 --> 01:43:12,060 BRIAN: Yeah, sure. 2096 01:43:12,060 --> 01:43:14,810 So I can just take the two glasses, and I can switch their places. 2097 01:43:14,810 --> 01:43:17,717 DAVID MALAN: OK, wait, OK, that's not exactly-- 2098 01:43:17,717 --> 01:43:18,800 you took me too literally. 2099 01:43:18,800 --> 01:43:22,760 I think here, if we think of the glasses, now, as specific locations 2100 01:43:22,760 --> 01:43:24,980 in memory, you can't just physically move 2101 01:43:24,980 --> 01:43:27,540 the chips of memory inside of your computer to swap things. 2102 01:43:27,540 --> 01:43:30,410 So I think I literally need you to move the blue liquid 2103 01:43:30,410 --> 01:43:33,350 into the red glass and the red liquid into the blue glass 2104 01:43:33,350 --> 01:43:36,100 so that it's more like a computer's memory. 
2105 01:43:36,100 --> 01:43:37,657 BRIAN: OK, I can try to do that. 2106 01:43:37,657 --> 01:43:40,240 I'm a little nervous, though, because I feel like I can't just 2107 01:43:40,240 --> 01:43:43,270 pour the blue liquid into the red glass, because the red liquid's already 2108 01:43:43,270 --> 01:43:43,640 in there. 2109 01:43:43,640 --> 01:43:45,730 DAVID MALAN: Yeah, so this probably doesn't end well, 2110 01:43:45,730 --> 01:43:48,220 if he's got to do some kind of switcheroo between the two glasses. 2111 01:43:48,220 --> 01:43:49,240 So any thoughts here? 2112 01:43:49,240 --> 01:43:54,100 Like what is the real world solution to this weird but real problem, where 2113 01:43:54,100 --> 01:43:57,490 we want to swap the contents of these two locations, 2114 01:43:57,490 --> 01:44:01,180 just like Brian was swapping the contents of two memory locations 2115 01:44:01,180 --> 01:44:02,290 last week? 2116 01:44:02,290 --> 01:44:04,900 Brian, if you have your eye on the chat in parallel, 2117 01:44:04,900 --> 01:44:08,480 might anyone have ideas on how we could swap these two liquids? 2118 01:44:08,480 --> 01:44:11,620 BRIAN: Yeah, a couple of people are saying that I need a third glass. 2119 01:44:11,620 --> 01:44:13,370 DAVID MALAN: All right, well Brian, do you 2120 01:44:13,370 --> 01:44:16,370 happen to have a third glass with you back there behind back stage? 2121 01:44:16,370 --> 01:44:18,040 BRIAN: In fact, I think I do. 2122 01:44:18,040 --> 01:44:21,190 So I have a third glass here that just so happens to be empty. 2123 01:44:21,190 --> 01:44:22,100 DAVID MALAN: OK. 2124 01:44:22,100 --> 01:44:25,610 And how would you, now, go about swapping these two things? 2125 01:44:25,610 --> 01:44:28,870 BRIAN: All right, so I want to put the blue liquid inside the red glass. 2126 01:44:28,870 --> 01:44:30,578 So the first thing I need to do, I think, 2127 01:44:30,578 --> 01:44:34,040 is just to empty out the red glass to make space for the blue liquid. 
2128 01:44:34,040 --> 01:44:36,310 So I'm going to take the red liquid, and I'm just 2129 01:44:36,310 --> 01:44:38,470 going to pour it into this extra glass. 2130 01:44:38,470 --> 01:44:39,520 DAVID MALAN: Temporarily though, right? 2131 01:44:39,520 --> 01:44:39,870 BRIAN: Temporarily, yeah. 2132 01:44:39,870 --> 01:44:40,570 DAVID MALAN: OK. 2133 01:44:40,570 --> 01:44:42,620 BRIAN: Just to keep it to store it there. 2134 01:44:42,620 --> 01:44:45,100 And now, I think I can just pour the blue liquid 2135 01:44:45,100 --> 01:44:48,942 into the original red glass, because now I'm free to do so. 2136 01:44:48,942 --> 01:44:50,400 So I'll pour the blue liquid there. 2137 01:44:50,400 --> 01:44:53,230 2138 01:44:53,230 --> 01:44:56,220 And I think the last thing I need to do now is, now this blue-- 2139 01:44:56,220 --> 01:44:59,680 this glass that originally held the blue liquid is now empty. 2140 01:44:59,680 --> 01:45:03,130 So the red liquid, which was inside of this temporary glass over here, 2141 01:45:03,130 --> 01:45:07,350 I can take the red liquid and just pour it into this glass here. 2142 01:45:07,350 --> 01:45:10,290 And now, I didn't swap the positions of the glasses. 2143 01:45:10,290 --> 01:45:12,390 But the liquids have actually switched places. 2144 01:45:12,390 --> 01:45:15,355 Now, the blue liquid is on the left and the red liquid is on the right. 2145 01:45:15,355 --> 01:45:16,230 DAVID MALAN: Awesome. 2146 01:45:16,230 --> 01:45:18,660 Yeah, I think that is a more literal implementation 2147 01:45:18,660 --> 01:45:21,150 of what you were doing and taking for granted last week, 2148 01:45:21,150 --> 01:45:24,182 swapping the two values in two separate locations. 2149 01:45:24,182 --> 01:45:25,640 So it seems pretty straightforward. 2150 01:45:25,640 --> 01:45:27,210 I just need a little more space. 2151 01:45:27,210 --> 01:45:29,670 I need a temporary variable in code, if you will. 
2152 01:45:29,670 --> 01:45:31,545 And it seems I need three steps. 2153 01:45:31,545 --> 01:45:34,670 I need to pour one out, pour the other one out, pour the other one back in. 2154 01:45:34,670 --> 01:45:37,122 So I think I can translate that into code here. 2155 01:45:37,122 --> 01:45:39,330 Let me go ahead and give myself a temporary variable, 2156 01:45:39,330 --> 01:45:40,840 like a glass, like Brian did. 2157 01:45:40,840 --> 01:45:43,650 And I'll call it tmp, T-M-P, which is pretty conventional when 2158 01:45:43,650 --> 01:45:45,180 you want to swap two things in code. 2159 01:45:45,180 --> 01:45:47,850 And I'm going to assign it, temporarily, the value of a. 2160 01:45:47,850 --> 01:45:51,550 I'm going to then change the contents of a to equal whatever the contents of b 2161 01:45:51,550 --> 01:45:52,050 are. 2162 01:45:52,050 --> 01:45:56,010 And then I'm going to change b to be whatever the contents of tmp were. 2163 01:45:56,010 --> 01:45:58,650 So this feels pretty reasonable and pretty correct, 2164 01:45:58,650 --> 01:46:01,230 because it's just a literal translation into code, 2165 01:46:01,230 --> 01:46:03,700 now, of what Brian did in the real world. 2166 01:46:03,700 --> 01:46:05,610 And I think this will compile. 2167 01:46:05,610 --> 01:46:08,040 So let's start there, make swap. 2168 01:46:08,040 --> 01:46:09,690 It does-- oh, doesn't compile. 2169 01:46:09,690 --> 01:46:13,410 OK, previous implicit declaration, oh, so many errors, my god. 2170 01:46:13,410 --> 01:46:15,687 Implicit declaration of function swap-- 2171 01:46:15,687 --> 01:46:16,270 wait a minute. 2172 01:46:16,270 --> 01:46:17,230 I've seen that before. 2173 01:46:17,230 --> 01:46:18,480 I've made this mistake before. 2174 01:46:18,480 --> 01:46:20,050 You might have as well. 2175 01:46:20,050 --> 01:46:23,293 Anytime you see this, recall it's just that you're missing your prototype. 2176 01:46:23,293 --> 01:46:25,710 Remember that the compiler is going to take you literally.
2177 01:46:25,710 --> 01:46:28,500 And if it doesn't know the word swap exists when it sees it, 2178 01:46:28,500 --> 01:46:30,310 it's not going to compile successfully. 2179 01:46:30,310 --> 01:46:33,030 So we need to include my prototype at the top of my file. 2180 01:46:33,030 --> 01:46:35,460 Now, let me try this again, make swap. 2181 01:46:35,460 --> 01:46:36,780 OK, that compiles. 2182 01:46:36,780 --> 01:46:40,950 Let me go ahead now and run swap and recall that, in main, what I did 2183 01:46:40,950 --> 01:46:43,380 was initialize x to 1, y to 2. 2184 01:46:43,380 --> 01:46:45,900 I then print out what x is and what y is. 2185 01:46:45,900 --> 01:46:50,040 I call swap, and then I print out what x is and y is again. 2186 01:46:50,040 --> 01:46:52,770 So I should see 1, 2, and then 2, 1. 2187 01:46:52,770 --> 01:46:55,430 So let's hit Enter. 2188 01:46:55,430 --> 01:46:58,800 Huh, it does not seem to be working. 2189 01:46:58,800 --> 01:47:01,740 Well, let's try it again, just in case-- 2190 01:47:01,740 --> 01:47:04,020 no, not working. 2191 01:47:04,020 --> 01:47:05,530 Well, let me try this. 2192 01:47:05,530 --> 01:47:07,590 Let me add some-- printf is my friend. 2193 01:47:07,590 --> 01:47:10,971 Let me go ahead and say a is percent i. 2194 01:47:10,971 --> 01:47:14,460 b is percent i backslash n, a, b. 2195 01:47:14,460 --> 01:47:15,510 So let's print that out. 2196 01:47:15,510 --> 01:47:16,650 And let's print that out twice. 2197 01:47:16,650 --> 01:47:18,480 So this would be a reasonable debugging technique. 2198 01:47:18,480 --> 01:47:21,605 If you want to know what's going on underneath the hood, add some printf's. 2199 01:47:21,605 --> 01:47:23,760 Let me go ahead and make swap. 2200 01:47:23,760 --> 01:47:26,520 That compiles, dot slash swap. 2201 01:47:26,520 --> 01:47:32,880 And let's see, a is 1, b is 2, a is 2, b is 1. 2202 01:47:32,880 --> 01:47:35,470 But then x and y are unchanged.
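The behavior just observed can be reproduced with a sketch like the one below. The swap function follows the lecture's description; caller_unchanged is a hypothetical stand-in for main, used here so the result can be checked. Because swap is defined above its caller in this sketch, no separate prototype is needed.

```c
#include <stdio.h>

// Swaps its own copies a and b; the caller's variables are untouched.
void swap(int a, int b)
{
    printf("a is %i, b is %i\n", a, b);
    int tmp = a;
    a = b;
    b = tmp;
    printf("a is %i, b is %i\n", a, b);
}

// Hypothetical stand-in for main.
// Returns 1 if x and y still hold their original values after the call.
int caller_unchanged(void)
{
    int x = 1;
    int y = 2;
    printf("x is %i, y is %i\n", x, y);
    swap(x, y);
    printf("x is %i, y is %i\n", x, y);
    return x == 1 && y == 2;
}
```

The two lines printed inside swap show a and b trading values, while the lines printed around the call show x and y staying at 1 and 2.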
2203 01:47:35,470 --> 01:47:37,170 So I feel like my logic is right. 2204 01:47:37,170 --> 01:47:38,550 It's switching a and b. 2205 01:47:38,550 --> 01:47:41,490 But it's not actually switching x and y. 2206 01:47:41,490 --> 01:47:43,340 And I could confirm as much, right? 2207 01:47:43,340 --> 01:47:45,510 The more powerful way to debug this would 2208 01:47:45,510 --> 01:47:49,890 be to run debug50, set a break point, for instance, at line 17, 2209 01:47:49,890 --> 01:47:54,270 step through my code, step by step, stepping into the swap function. 2210 01:47:54,270 --> 01:47:57,030 But for now, it seems clear that swap works. 2211 01:47:57,030 --> 01:48:00,250 But main isn't really seeing those results. 2212 01:48:00,250 --> 01:48:01,450 So what's actually going on? 2213 01:48:01,450 --> 01:48:04,170 Well, let's consider this real world incarnation of what my memory is 2214 01:48:04,170 --> 01:48:05,712 so I can actually move things around. 2215 01:48:05,712 --> 01:48:08,820 And this is all thanks to our friends in the theater's prop shop in back. 2216 01:48:08,820 --> 01:48:10,830 If we think of this as my computer's memory, 2217 01:48:10,830 --> 01:48:12,540 initially, it's all garbage values. 2218 01:48:12,540 --> 01:48:16,080 But I can use this as a canvas to start laying things out in memory. 2219 01:48:16,080 --> 01:48:19,020 But calling functions is something we've taken for granted thus far. 2220 01:48:19,020 --> 01:48:22,200 And it turns out, when you call functions, the computer, by default, 2221 01:48:22,200 --> 01:48:25,500 uses this memory in kind of a standard way. 2222 01:48:25,500 --> 01:48:29,850 In fact, let me go ahead and draw a more pictorial picture. 2223 01:48:29,850 --> 01:48:33,440 Let me draw a more literal picture here, if you will, of the computer's memory 2224 01:48:33,440 --> 01:48:33,940 again. 
2225 01:48:33,940 --> 01:48:36,660 So if this is the computer's memory and we zoom in on one of the chips, 2226 01:48:36,660 --> 01:48:39,120 and we think of the chip as having a whole bunch of bytes like this. 2227 01:48:39,120 --> 01:48:42,390 Let's abstract away the actual hardware and think of it as we have been. 2228 01:48:42,390 --> 01:48:45,720 It's just this big rectangular region of memory, not unlike all of those Oscar 2229 01:48:45,720 --> 01:48:47,520 the Grouches a moment ago. 2230 01:48:47,520 --> 01:48:51,150 But by convention, your computer does not just plop things 2231 01:48:51,150 --> 01:48:52,710 in random locations in memory. 2232 01:48:52,710 --> 01:48:55,710 It has certain rules of thumb that it adheres to. 2233 01:48:55,710 --> 01:48:59,460 In particular, it treats different portions of your computer's memory 2234 01:48:59,460 --> 01:49:00,330 in different ways. 2235 01:49:00,330 --> 01:49:03,570 It uses it in a standard way so that it's not completely random. 2236 01:49:03,570 --> 01:49:08,910 For instance, when you run a program by doing dot slash something on CS50 IDE 2237 01:49:08,910 --> 01:49:12,270 or on Linux more generally, or you double click an icon on Mac OS 2238 01:49:12,270 --> 01:49:16,590 or Windows, that triggers the computer's-- 2239 01:49:16,590 --> 01:49:21,030 the program's 0's and 1's stored on your hard drive to be loaded up here, 2240 01:49:21,030 --> 01:49:23,742 to what we'll call machine code, which again, is the 0's and 1's. 2241 01:49:23,742 --> 01:49:25,950 So if you think again, metaphorically, as your memory 2242 01:49:25,950 --> 01:49:29,730 is this rectangular region, then the machine code, 2243 01:49:29,730 --> 01:49:35,732 the 0's and 1's composing your program are loaded into the top part of memory. 2244 01:49:35,732 --> 01:49:38,940 And again, top, bottom, left, right, it has no fundamental technical meaning. 2245 01:49:38,940 --> 01:49:40,470 It's just an artist's rendition. 
2246 01:49:40,470 --> 01:49:42,960 But it does go into a standard location. 2247 01:49:42,960 --> 01:49:45,700 Below that are all of your global variables. 2248 01:49:45,700 --> 01:49:48,250 So are your constants that you put outside of your functions. 2249 01:49:48,250 --> 01:49:50,500 Those are going to end up just below the machine code, 2250 01:49:50,500 --> 01:49:53,340 so again, at the top of your computer's memory. 2251 01:49:53,340 --> 01:49:55,200 Below that is what's called the heap. 2252 01:49:55,200 --> 01:49:56,940 And this is a technical term. 2253 01:49:56,940 --> 01:50:00,780 And it refers to a big chunk of memory that malloc 2254 01:50:00,780 --> 01:50:03,640 uses to get you some spare memory. 2255 01:50:03,640 --> 01:50:09,270 Any time you call malloc, you are given the address of some chunk of memory 2256 01:50:09,270 --> 01:50:13,200 up in this region, below the machine code, below your global variables. 2257 01:50:13,200 --> 01:50:15,270 And it's kind of a big zone. 2258 01:50:15,270 --> 01:50:19,120 But the catch is that other parts of your memory are used differently. 2259 01:50:19,120 --> 01:50:24,570 In fact, whereas the heap is considered to be here on down, somewhat 2260 01:50:24,570 --> 01:50:28,830 worrisomely, the stack is considered to be here on up. 2261 01:50:28,830 --> 01:50:32,070 This is to say, when you call malloc and ask for memory, 2262 01:50:32,070 --> 01:50:35,670 that gets allocated up here. 2263 01:50:35,670 --> 01:50:39,540 When you call a function, though, those functions 2264 01:50:39,540 --> 01:50:42,900 use what's called stack space instead of heap space. 
2265 01:50:42,900 --> 01:50:48,450 So any time you call a function, main or swap or strlen or strcmp 2266 01:50:48,450 --> 01:50:51,330 or any of the functions you've used thus far, 2267 01:50:51,330 --> 01:50:54,150 your computer will automatically store any 2268 01:50:54,150 --> 01:50:58,860 of the local variables or parameters from those functions down here. 2269 01:50:58,860 --> 01:51:00,840 Now, this is not necessarily the best design, 2270 01:51:00,840 --> 01:51:02,550 because you can see the two arrows pointing at one 2271 01:51:02,550 --> 01:51:05,383 another is like two trains barreling down the tracks at one another. 2272 01:51:05,383 --> 01:51:07,268 Bad things can eventually happen. 2273 01:51:07,268 --> 01:51:09,060 Thankfully, we typically have enough memory 2274 01:51:09,060 --> 01:51:12,370 that these two things don't collide, but more on that in just a bit. 2275 01:51:12,370 --> 01:51:15,570 So again, when you call functions, memory down here is used. 2276 01:51:15,570 --> 01:51:17,710 When you use malloc, memory up here is used. 2277 01:51:17,710 --> 01:51:19,710 Now, for my swap function, I'm not using malloc. 2278 01:51:19,710 --> 01:51:21,690 So I don't think I have to worry about heap. 2279 01:51:21,690 --> 01:51:23,283 And I don't have any global variables. 2280 01:51:23,283 --> 01:51:25,200 And I don't really care about my machine code. 2281 01:51:25,200 --> 01:51:27,240 I just need to know that it's stored somewhere. 2282 01:51:27,240 --> 01:51:30,210 But let's consider, then, what the stack is all about. 2283 01:51:30,210 --> 01:51:32,670 The stack, indeed, is this sort of dynamic place 2284 01:51:32,670 --> 01:51:34,860 where memory keeps getting used and reused. 2285 01:51:34,860 --> 01:51:40,440 So for instance, when you call main, as you might when this swap program is 2286 01:51:40,440 --> 01:51:45,010 run, main uses a sliver of memory at the bottom of this picture, if you will.
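To make those regions a little more concrete, here is a rough sketch that prints the address of a global variable, of a chunk returned by malloc, and of a local variable. The names counter and show_regions are hypothetical, and the exact addresses, and even their relative order, depend on the system, so this is only an illustration of the picture, not a guarantee about layout.

```c
#include <stdio.h>
#include <stdlib.h>

int counter = 7; // global variable: stored with the program's globals

// Hypothetical demo: prints where a global, a heap chunk, and a local live.
// Returns 0 on success, 1 if malloc fails.
int show_regions(void)
{
    int local = 1;                   // local variable: stored on the stack
    int *heap = malloc(sizeof(int)); // malloc returns an address in the heap
    if (heap == NULL)
    {
        return 1;
    }
    *heap = 2;

    printf("global at %p\n", (void *) &counter);
    printf("heap   at %p\n", (void *) heap);
    printf("stack  at %p\n", (void *) &local);

    free(heap);
    return 0;
}
```

On many systems the stack address printed last is far from the other two, which matches the idea of the heap and the stack sitting at opposite ends of the diagram.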
2287 01:51:45,010 --> 01:51:47,910 So the local variables in main, like x and y, 2288 01:51:47,910 --> 01:51:49,920 end up at this bottom portion of memory. 2289 01:51:49,920 --> 01:51:53,790 When you call swap, swap uses a chunk of memory just above main, 2290 01:51:53,790 --> 01:51:58,350 pictorially, in this diagram, such as variables a and b and tmp, 2291 01:51:58,350 --> 01:51:59,410 for that matter. 2292 01:51:59,410 --> 01:52:04,680 And then, once swap returns and is done executing, that sliver of memory 2293 01:52:04,680 --> 01:52:06,010 essentially goes away. 2294 01:52:06,010 --> 01:52:07,230 Now, it doesn't disappear. 2295 01:52:07,230 --> 01:52:09,610 Obviously, there's still physical memory there. 2296 01:52:09,610 --> 01:52:12,810 But that's when we get into the discussion of garbage values again. 2297 01:52:12,810 --> 01:52:15,540 They're still like Oscar the Grouches all over the place. 2298 01:52:15,540 --> 01:52:18,600 You just don't know, or at this point care, what the values are. 2299 01:52:18,600 --> 01:52:20,010 But there are values there. 2300 01:52:20,010 --> 01:52:23,640 And that's why, a moment ago, when I printed out that uninitialized scores 2301 01:52:23,640 --> 01:52:26,970 array, I did see some bogus values, because there's still 2302 01:52:26,970 --> 01:52:30,510 going to be 0's and 1's there that are left over from before. 2303 01:52:30,510 --> 01:52:31,750 The problem, though, is this. 2304 01:52:31,750 --> 01:52:35,070 Let me go over to this physical incarnation of our memory 2305 01:52:35,070 --> 01:52:38,010 and consider this as being our stack, so it's growing on up. 2306 01:52:38,010 --> 01:52:42,060 And in fact, if I want to have two local variables like I do, x and y, 2307 01:52:42,060 --> 01:52:47,400 let's go ahead and think of this row of memory here as being main, 2308 01:52:47,400 --> 01:52:48,870 for instance, here.
2309 01:52:48,870 --> 01:52:51,630 And I'm going to go ahead and replace all these garbage values 2310 01:52:51,630 --> 01:52:53,790 with an actual value that I care about. 2311 01:52:53,790 --> 01:52:57,660 And the actual values that I care about, we're going to call x and y, just as 2312 01:52:57,660 --> 01:52:58,480 before. 2313 01:52:58,480 --> 01:53:01,020 So each of these Oscars happens to be one byte. 2314 01:53:01,020 --> 01:53:02,068 But an int is 4 bytes. 2315 01:53:02,068 --> 01:53:04,110 So thankfully, from our friends in the prop shop, 2316 01:53:04,110 --> 01:53:06,178 we have these bigger integer-sized blocks. 2317 01:53:06,178 --> 01:53:08,220 And I'm going to go ahead and slide this in here. 2318 01:53:08,220 --> 01:53:10,740 And we're going to think of this, in a moment, as x. 2319 01:53:10,740 --> 01:53:14,340 And indeed, I'm going to go ahead and call this x with a marker. 2320 01:53:14,340 --> 01:53:17,760 And then I'm going to go ahead and give myself another integer, a size 4, 2321 01:53:17,760 --> 01:53:19,300 and put it down here. 2322 01:53:19,300 --> 01:53:21,300 And we're going to think of this as y. 2323 01:53:21,300 --> 01:53:23,940 And recall, what do I initialize these values to? 2324 01:53:23,940 --> 01:53:27,690 Well, the value 1, initially, and the value 2. 2325 01:53:27,690 --> 01:53:29,370 But then I called the swap function. 2326 01:53:29,370 --> 01:53:32,160 And the swap function has two arguments, a and b. 2327 01:53:32,160 --> 01:53:38,400 And those, by design, become copies of x and y, because I passed in x comma y. 2328 01:53:38,400 --> 01:53:41,280 And I defined swap as taking a comma b. 2329 01:53:41,280 --> 01:53:44,970 So I think what I need to do, physically here, is now 2330 01:53:44,970 --> 01:53:50,170 think of this second row of memory as now belonging to the swap function, 2331 01:53:50,170 --> 01:53:51,210 not to main. 
2332 01:53:51,210 --> 01:53:54,090 And inside of this second row of memory, I'll 2333 01:53:54,090 --> 01:53:57,540 think of this as belonging to swap. 2334 01:53:57,540 --> 01:54:02,100 And within the swap row, I'm going to have another integer of size 4. 2335 01:54:02,100 --> 01:54:07,500 And we're going to call this one a, as down there, a. 2336 01:54:07,500 --> 01:54:10,350 And then I'm going to have another chunk of size 4. 2337 01:54:10,350 --> 01:54:12,600 And we're going to call this b. 2338 01:54:12,600 --> 01:54:16,050 And again, because those are just the arguments, x comma y, otherwise 2339 01:54:16,050 --> 01:54:20,760 now known as a comma b, I copy 1 and 2 into those values. 2340 01:54:20,760 --> 01:54:22,770 But swap has a third variable. 2341 01:54:22,770 --> 01:54:24,730 Brian proposed a temporary variable. 2342 01:54:24,730 --> 01:54:27,480 So I'm going to go ahead and give myself four more bytes, 2343 01:54:27,480 --> 01:54:30,210 thereby getting rid of whatever garbage values are there 2344 01:54:30,210 --> 01:54:34,260 and actually setting it to an integer called tmp. 2345 01:54:34,260 --> 01:54:39,030 So I'm going to go ahead and call this thing tmp, T-M-P. 2346 01:54:39,030 --> 01:54:40,440 And what did I do first? 2347 01:54:40,440 --> 01:54:43,845 I set tmp equal to a. 2348 01:54:43,845 --> 01:54:45,120 So tmp equals a. 2349 01:54:45,120 --> 01:54:47,520 So if a is 1, tmp is 1. 2350 01:54:47,520 --> 01:54:48,750 Then what did I do? 2351 01:54:48,750 --> 01:54:51,780 I then did a equals b. 2352 01:54:51,780 --> 01:54:55,150 So b is 2. 2353 01:54:55,150 --> 01:54:57,800 a is 2 as well. 2354 01:54:57,800 --> 01:55:00,030 And then lastly, what did I do? 2355 01:55:00,030 --> 01:55:02,145 I did b gets tmp. 2356 01:55:02,145 --> 01:55:05,020 So I have to go ahead and change this to be whatever the value of tmp 2357 01:55:05,020 --> 01:55:07,630 is, which is now the number 1.
2358 01:55:07,630 --> 01:55:10,150 So you can see that swap is correct insofar 2359 01:55:10,150 --> 01:55:12,655 as it is swapping the values of a and b. 2360 01:55:12,655 --> 01:55:16,690 But the moment swap returns, these return 2361 01:55:16,690 --> 01:55:19,000 to being thought of as garbage values. 2362 01:55:19,000 --> 01:55:20,860 Main is still in the middle of running. 2363 01:55:20,860 --> 01:55:22,300 Swap is no longer running. 2364 01:55:22,300 --> 01:55:23,743 But these values stay there. 2365 01:55:23,743 --> 01:55:24,910 So those are garbage values. 2366 01:55:24,910 --> 01:55:27,850 We happen to know what they are, but they're no longer valid, 2367 01:55:27,850 --> 01:55:32,560 because when I go to print out x and y for the second time, what are x and y? 2368 01:55:32,560 --> 01:55:33,820 They're still the same. 2369 01:55:33,820 --> 01:55:37,870 And so this is to say, when you actually write code that takes arguments 2370 01:55:37,870 --> 01:55:40,750 and you pass arguments from one function to another, 2371 01:55:40,750 --> 01:55:43,930 those arguments are copied from one function to another. 2372 01:55:43,930 --> 01:55:47,140 And indeed, x and y are copied into a and b. 2373 01:55:47,140 --> 01:55:51,670 So your code may very well look correct in that it's swapping correctly. 2374 01:55:51,670 --> 01:55:55,750 But it's only swapping correctly in the context of swap, 2375 01:55:55,750 --> 01:55:58,370 not touching the original values. 2376 01:55:58,370 --> 01:56:00,730 So what I think we need to do, fundamentally, 2377 01:56:00,730 --> 01:56:06,130 is reimplement swap in such a way that we actually 2378 01:56:06,130 --> 01:56:10,450 change the values of x and y. 2379 01:56:10,450 --> 01:56:11,500 But how can we do this? 2380 01:56:11,500 --> 01:56:13,810 Brian, if we could call in someone here.
2381 01:56:13,810 --> 01:56:18,340 How could I conceptually change my implementation of swap 2382 01:56:18,340 --> 01:56:26,110 so that it somehow empowers me to change x and y, not change copies of x and y? 2383 01:56:26,110 --> 01:56:28,570 What could I pass into swap, Brian? 2384 01:56:28,570 --> 01:56:31,150 BRIAN: Igor is suggesting that we use pointers instead. 2385 01:56:31,150 --> 01:56:33,733 DAVID MALAN: Yeah, so perhaps the leading question here today. 2386 01:56:33,733 --> 01:56:36,010 But pointers would seem to give us a solution. 2387 01:56:36,010 --> 01:56:38,170 If pointers are essentially like a treasure 2388 01:56:38,170 --> 01:56:41,500 map to a specific address in your computer's memory, what I should really 2389 01:56:41,500 --> 01:56:45,940 do from main to swap is pass in not x and y literally, 2390 01:56:45,940 --> 01:56:49,630 but why don't I pass in the address of x and the address of y, 2391 01:56:49,630 --> 01:56:53,230 so that swap can now go to those addresses 2392 01:56:53,230 --> 01:56:57,460 and actually do the sort of swap that Brian enacted in person. 2393 01:56:57,460 --> 01:57:02,050 So give the function a sort of map to those values, pointers to those values, 2394 01:57:02,050 --> 01:57:03,560 and then go to those values. 2395 01:57:03,560 --> 01:57:04,580 So how might I do this? 2396 01:57:04,580 --> 01:57:06,580 Well, the code has to be a little different now. 2397 01:57:06,580 --> 01:57:09,640 When I call swap this time, what I really need to do 2398 01:57:09,640 --> 01:57:12,710 is pass in the addresses of these two variables. 2399 01:57:12,710 --> 01:57:14,950 So I don't necessarily know what those addresses are. 2400 01:57:14,950 --> 01:57:16,900 But for the sake of the story, we can just 2401 01:57:16,900 --> 01:57:21,340 assume that this address, for instance, is like, 0x123. 2402 01:57:21,340 --> 01:57:25,142 And then four bytes away from that might be 0x127, for instance. 
2403 01:57:25,142 --> 01:57:27,100 But again, it doesn't really matter what it is. 2404 01:57:27,100 --> 01:57:29,440 But they do have addresses, x and y. 2405 01:57:29,440 --> 01:57:31,562 So a pointer, recall, tends to be pretty big. 2406 01:57:31,562 --> 01:57:33,520 So we needed to get out a bigger piece of wood, 2407 01:57:33,520 --> 01:57:35,590 eight bytes that represent a pointer. 2408 01:57:35,590 --> 01:57:38,830 And I actually need to use a bit more memory in swap now. 2409 01:57:38,830 --> 01:57:42,490 If I now declare a to be, not an integer, 2410 01:57:42,490 --> 01:57:47,020 but a pointer to an int, that is an int star variable, 2411 01:57:47,020 --> 01:57:49,330 I could call this thing a now. 2412 01:57:49,330 --> 01:57:54,340 And I could store, in it, the address of x, like 0x123. 2413 01:57:54,340 --> 01:57:57,640 If I then change the definition of b to be 2414 01:57:57,640 --> 01:58:01,390 not an integer, but a pointer to an integer, 2415 01:58:01,390 --> 01:58:04,810 that is another int star, which happens to be eight bytes. 2416 01:58:04,810 --> 01:58:07,780 I'm going to use a little more memory for this thing, but that's OK. 2417 01:58:07,780 --> 01:58:10,030 And its name is going to be b now. 2418 01:58:10,030 --> 01:58:13,600 And it's going to contain 0x127. 2419 01:58:13,600 --> 01:58:15,820 I still need a temporary variable. 2420 01:58:15,820 --> 01:58:18,650 I still need a temporary variable, but that's fine. 2421 01:58:18,650 --> 01:58:20,980 I just need four bytes for that, because the variable 2422 01:58:20,980 --> 01:58:25,990 itself just needs to store an int, like Brian temporarily stored it in a glass. 2423 01:58:25,990 --> 01:58:29,260 So I just need an additional four bytes, like before, for that. 2424 01:58:29,260 --> 01:58:31,720 And now, let's just consider the logic. 2425 01:58:31,720 --> 01:58:32,710 Here's main.
2426 01:58:32,710 --> 01:58:34,990 And swap is now using these 3-- 2427 01:58:34,990 --> 01:58:36,550 2 and 1/2 rows of memory. 2428 01:58:36,550 --> 01:58:37,240 And that's fine. 2429 01:58:37,240 --> 01:58:39,640 It's growing upward as I proposed. 2430 01:58:39,640 --> 01:58:41,860 x is at address 0x123. 2431 01:58:41,860 --> 01:58:44,560 y is at address 0x127. 2432 01:58:44,560 --> 01:58:48,370 Therefore, a and b, I propose conceptually, like Igor proposed, 2433 01:58:48,370 --> 01:58:52,280 store the addresses of x and y, respectively. 2434 01:58:52,280 --> 01:58:55,060 And now my code, I think, needs to say this. 2435 01:58:55,060 --> 01:59:00,025 Go and store, in the variable tmp, whatever is at the address a. 2436 01:59:00,025 --> 01:59:02,650 So you can kind of think of this as being an arrow down here. 2437 01:59:02,650 --> 01:59:03,910 Follow the arrow, OK. 2438 01:59:03,910 --> 01:59:06,010 What is at address 0x123? 2439 01:59:06,010 --> 01:59:06,910 The number 1. 2440 01:59:06,910 --> 01:59:09,250 So we put 1 in tmp, just like before. 2441 01:59:09,250 --> 01:59:10,310 Then what do we do? 2442 01:59:10,310 --> 01:59:13,540 Well, now, I'm going to go ahead and change, not the value of a, 2443 01:59:13,540 --> 01:59:18,010 but I'm going to change what is at the location in a to be 2444 01:59:18,010 --> 01:59:24,790 whatever is at the location in b, which is an arrow pointing down here, 0x127. 2445 01:59:24,790 --> 01:59:27,850 So I'm going to change this 1, now, to be a 2. 2446 01:59:27,850 --> 01:59:30,910 And the third and final step, recall, is for me, now, 2447 01:59:30,910 --> 01:59:37,150 to go, not to b, but to go where b points to, which happens to be y, 2448 01:59:37,150 --> 01:59:42,440 and change that to be the value of tmp, which of course, is up here. 2449 01:59:42,440 --> 01:59:45,430 And at this point in the story, it's still just three lines of code. 2450 01:59:45,430 --> 01:59:47,380 They're different types of lines of code.
2451 01:59:47,380 --> 01:59:48,950 It's three lines of code. 2452 01:59:48,950 --> 01:59:52,180 But when swap is done executing, notice what we've done. 2453 01:59:52,180 --> 01:59:55,190 We have successfully swapped x and y by letting 2454 01:59:55,190 --> 01:59:59,270 swap go to those addresses as opposed to just naively getting 2455 01:59:59,270 --> 02:00:02,180 copies of the values therein. 2456 02:00:02,180 --> 02:00:05,150 Now, even though this code is going to look a little cryptic, 2457 02:00:05,150 --> 02:00:10,820 it's, frankly, just an application of the logic we've seen thus far. 2458 02:00:10,820 --> 02:00:13,860 I'm going to go ahead and go back to my old buggy version. 2459 02:00:13,860 --> 02:00:15,860 And I'm going to change the definition of swap 2460 02:00:15,860 --> 02:00:19,190 to say that it doesn't take two integers, a and b, but two 2461 02:00:19,190 --> 02:00:20,810 pointers to integers a and b. 2462 02:00:20,810 --> 02:00:24,080 And the way you declare a pointer recall is the type of variable 2463 02:00:24,080 --> 02:00:26,767 you point at followed by a star and then the name of it. 2464 02:00:26,767 --> 02:00:28,850 And we haven't seen it, admittedly, in the context 2465 02:00:28,850 --> 02:00:31,550 of a function taking parameters yet. 2466 02:00:31,550 --> 02:00:33,170 But it's quite simply that. 2467 02:00:33,170 --> 02:00:34,610 I added the stars. 2468 02:00:34,610 --> 02:00:40,040 Down here, I need to say, store in tmp, whatever is at a. 2469 02:00:40,040 --> 02:00:41,870 How do I express go to a? 2470 02:00:41,870 --> 02:00:43,520 Just add a star here. 2471 02:00:43,520 --> 02:00:46,880 How do I express go to a and put whatever is at b? 2472 02:00:46,880 --> 02:00:48,500 I add stars there. 2473 02:00:48,500 --> 02:00:51,560 How do I say, go to b and store whatever is at tmp? 2474 02:00:51,560 --> 02:00:53,190 I add one star there. 2475 02:00:53,190 --> 02:00:55,520 So tmp is just a simple integer. 
2476 02:00:55,520 --> 02:00:57,380 It's just an empty glass like Brian had. 2477 02:00:57,380 --> 02:00:58,620 There's nothing fancy there. 2478 02:00:58,620 --> 02:01:00,650 So we don't need stars around tmp. 2479 02:01:00,650 --> 02:01:04,970 But I do, now, need to change how I'm using a and b, 2480 02:01:04,970 --> 02:01:08,330 because now they are addresses that I actually want to go to. 2481 02:01:08,330 --> 02:01:12,140 There's no need for the address-of operator in this context. 2482 02:01:12,140 --> 02:01:14,330 But up here, I'm going to need to make a change. 2483 02:01:14,330 --> 02:01:16,380 I do need to change the prototype to match. 2484 02:01:16,380 --> 02:01:18,200 So that's just a copy paste. 2485 02:01:18,200 --> 02:01:23,120 But I bet you can imagine what, lastly, needs to change. 2486 02:01:23,120 --> 02:01:26,750 When calling swap, I don't want to pass in naively x and y, because again, 2487 02:01:26,750 --> 02:01:27,980 they're going to get copied. 2488 02:01:27,980 --> 02:01:32,000 I want to pass in the address of x and the address of y, 2489 02:01:32,000 --> 02:01:35,690 so that swap now has sort of special access 2490 02:01:35,690 --> 02:01:38,750 to the contents of those locations in memory 2491 02:01:38,750 --> 02:01:42,740 so that it actually can make some changes therein. 2492 02:01:42,740 --> 02:01:47,780 And indeed, if I now recompile this program, make swap, and I do 2493 02:01:47,780 --> 02:01:50,390 dot slash swap and cross my fingers, voila. 2494 02:01:50,390 --> 02:01:53,855 Now, I have successfully swapped x and y. 2495 02:01:53,855 --> 02:01:55,730 So last week, if you were wondering, perhaps, 2496 02:01:55,730 --> 02:01:58,250 why we didn't show you how to do swap, we could have. 2497 02:01:58,250 --> 02:01:59,900 And we didn't need a special function. 2498 02:01:59,900 --> 02:02:03,200 You don't necessarily need pointers if we did all of this in main.
2499 02:02:03,200 --> 02:02:06,470 But I'm trying to introduce an abstraction, this function that 2500 02:02:06,470 --> 02:02:09,740 does swap just like Brian swapped those glasses for us. 2501 02:02:09,740 --> 02:02:12,650 And to pass values from one function to another, 2502 02:02:12,650 --> 02:02:15,990 you do need to understand what's going on in your computer's memory 2503 02:02:15,990 --> 02:02:18,830 so that you can actually pass in little breadcrumbs again, 2504 02:02:18,830 --> 02:02:23,330 treasure maps to those locations in memory, again, thanks to these things 2505 02:02:23,330 --> 02:02:25,100 called pointers. 2506 02:02:25,100 --> 02:02:27,770 All right, well let me propose and emphasize, 2507 02:02:27,770 --> 02:02:30,770 then, that this design of the heap being up at the top, 2508 02:02:30,770 --> 02:02:33,200 where malloc uses memory and the stack being 2509 02:02:33,200 --> 02:02:35,540 at the bottom where your own functions use memory, 2510 02:02:35,540 --> 02:02:37,730 this is a problem clearly waiting to happen. 2511 02:02:37,730 --> 02:02:39,460 And those problems actually have names. 2512 02:02:39,460 --> 02:02:41,210 And some of you who have programmed before 2513 02:02:41,210 --> 02:02:45,230 might know some of these terms, either heap overflow or stack overflow. 2514 02:02:45,230 --> 02:02:48,650 And in fact, many of you might know stackoverflow.com as just a website. 2515 02:02:48,650 --> 02:02:50,840 Well, there is an origin story to its name. 2516 02:02:50,840 --> 02:02:56,240 A stack overflow refers to the process of calling a function so many times 2517 02:02:56,240 --> 02:02:58,550 that the stack overflows into the heap. 2518 02:02:58,550 --> 02:03:00,320 That is, every time you call the function, 2519 02:03:00,320 --> 02:03:04,950 like I did here, you use more and more rows, so to speak, of memory.
2520 02:03:04,950 --> 02:03:07,730 And if you call so many functions again and again, 2521 02:03:07,730 --> 02:03:11,690 eventually, you may very well run over the area of memory called the heap. 2522 02:03:11,690 --> 02:03:14,090 And at that point, your program will crash. 2523 02:03:14,090 --> 02:03:18,950 There is no fundamental solution to that problem other than don't do that. 2524 02:03:18,950 --> 02:03:20,420 Don't use too much memory. 2525 02:03:20,420 --> 02:03:21,680 But that can be hard to do. 2526 02:03:21,680 --> 02:03:24,138 And indeed, that's one of the dangers of programming today. 2527 02:03:24,138 --> 02:03:27,800 And we can actually induce this a little bit deliberately ourselves. 2528 02:03:27,800 --> 02:03:30,620 And in fact, I thought we could revisit, for instance, 2529 02:03:30,620 --> 02:03:34,220 where we left off with Mario last time, which was this picture here. 2530 02:03:34,220 --> 02:03:37,580 Recall that this was a pyramid, of course, 2531 02:03:37,580 --> 02:03:40,400 simpler than the one you might have played with for problem set 1. 2532 02:03:40,400 --> 02:03:44,360 But it's a recursive pyramid in that you can define a pyramid of height 4, 2533 02:03:44,360 --> 02:03:47,690 in terms of a pyramid of height 3, in terms of a pyramid of height 2 2534 02:03:47,690 --> 02:03:48,380 and of height 1. 2535 02:03:48,380 --> 02:03:52,580 And indeed, I built that last week using these very blocks. 2536 02:03:52,580 --> 02:03:56,180 Well, you can implement Mario's pyramid like this 2537 02:03:56,180 --> 02:03:57,660 in a couple of different ways. 2538 02:03:57,660 --> 02:04:01,160 One is just using week one style iteration, using a loop. 2539 02:04:01,160 --> 02:04:03,890 And in fact, let me go ahead and whip up a quick solution that 2540 02:04:03,890 --> 02:04:05,340 does exactly that. 2541 02:04:05,340 --> 02:04:07,730 Let me go ahead and call this mario.c. 2542 02:04:07,730 --> 02:04:10,610 And I'm going to go ahead and include cs50.h.
2543 02:04:10,610 --> 02:04:12,290 So we can use one of our get functions. 2544 02:04:12,290 --> 02:04:14,300 I'm going to use standard io dot h. 2545 02:04:14,300 --> 02:04:16,160 And I'm going to do int main void. 2546 02:04:16,160 --> 02:04:18,590 And all I want to do is print out this pyramid. 2547 02:04:18,590 --> 02:04:20,340 But I want to ask the user for the height. 2548 02:04:20,340 --> 02:04:23,090 So I'm going to say int height equals get int. 2549 02:04:23,090 --> 02:04:26,870 And we'll ask the user for the height, just like you did for problem set 1. 2550 02:04:26,870 --> 02:04:30,000 And then I'm going to go ahead and draw a pyramid of that height. 2551 02:04:30,000 --> 02:04:31,340 Now, draw doesn't exist. 2552 02:04:31,340 --> 02:04:32,030 But that's fine. 2553 02:04:32,030 --> 02:04:34,735 I'm going to go ahead and draw this now, implement draw myself. 2554 02:04:34,735 --> 02:04:36,860 It doesn't need to return a value, because I'm just 2555 02:04:36,860 --> 02:04:38,273 printing stuff on the screen. 2556 02:04:38,273 --> 02:04:40,190 Function's called draw, and it's going to take 2557 02:04:40,190 --> 02:04:42,710 an input called h, for instance. h for height, 2558 02:04:42,710 --> 02:04:45,080 but I could call its argument anything I want. 2559 02:04:45,080 --> 02:04:48,650 And then I'm just going to do this, for int i gets 1, 2560 02:04:48,650 --> 02:04:52,850 i less than or equal to h, i++. 2561 02:04:52,850 --> 02:04:56,170 And then inside of this, this is where you might recall, from problem set one, 2562 02:04:56,170 --> 02:04:58,700 have found a nested loop to be useful. 2563 02:04:58,700 --> 02:05:04,150 Let me do int j gets 1, j less than or equal to i, j++. 2564 02:05:04,150 --> 02:05:08,178 This will be similar but not identical to either the less comfortable or more 2565 02:05:08,178 --> 02:05:09,970 comfortable version of Mario from the past, 2566 02:05:09,970 --> 02:05:13,240 because this pyramid is shaped in a different direction. 
2567 02:05:13,240 --> 02:05:15,610 Now, you print a hash there. 2568 02:05:15,610 --> 02:05:17,830 And then let me go ahead and print a new line here. 2569 02:05:17,830 --> 02:05:19,570 So I did this super quickly. 2570 02:05:19,570 --> 02:05:21,880 But logically, what I'm doing is iterating 2571 02:05:21,880 --> 02:05:29,710 over every row, so from 1 through h, so row 1, 2, 3, 4, for instance. 2572 02:05:29,710 --> 02:05:34,210 And then on each row, I'm deliberately iterating from 1 through i. 2573 02:05:34,210 --> 02:05:37,870 So I print 1, then 2, then 3, then 4. 2574 02:05:37,870 --> 02:05:39,640 And again, I could zero index if I want. 2575 02:05:39,640 --> 02:05:44,170 I find that in this context, more user friendly, more intelligible to me 2576 02:05:44,170 --> 02:05:46,660 to index from 1, totally reasonable if you think 2577 02:05:46,660 --> 02:05:48,310 there's a compelling design argument. 2578 02:05:48,310 --> 02:05:50,030 So let me go ahead and make Mario. 2579 02:05:50,030 --> 02:05:51,520 Ah, darn it. 2580 02:05:51,520 --> 02:05:53,980 Oh, I missed my prototype. 2581 02:05:53,980 --> 02:05:55,870 So notice, it's not understanding draw. 2582 02:05:55,870 --> 02:05:58,900 So the fix for that is to either move the whole function 2583 02:05:58,900 --> 02:06:02,980 or, as we've preached instead, to just put your prototype up top. 2584 02:06:02,980 --> 02:06:05,050 Let me recompile Mario. 2585 02:06:05,050 --> 02:06:06,430 OK, now successful. 2586 02:06:06,430 --> 02:06:08,710 Mario, let's do a height of 4, and voila. 2587 02:06:08,710 --> 02:06:11,350 Now, I have a relatively simple-- though I certainly 2588 02:06:11,350 --> 02:06:13,600 did it faster than you might without some practice-- 2589 02:06:13,600 --> 02:06:15,760 implementation of Mario's pyramid. 2590 02:06:15,760 --> 02:06:17,980 But here's where things get kind of cool. 
2591 02:06:17,980 --> 02:06:20,800 Let me stipulate that that is a correct iterative solution, even 2592 02:06:20,800 --> 02:06:24,970 if it might take you some number of steps or trial and error 2593 02:06:24,970 --> 02:06:28,180 to get that iterative loop-based code correct. 2594 02:06:28,180 --> 02:06:30,580 Let me change this, now, to be recursive. 2595 02:06:30,580 --> 02:06:34,510 And recall, a recursive function is one that calls itself. 2596 02:06:34,510 --> 02:06:37,660 How do you print a pyramid of height h? 2597 02:06:37,660 --> 02:06:41,980 Well, recall that you print a pyramid of height h minus 1, 2598 02:06:41,980 --> 02:06:45,340 and then you proceed to print one more row of blocks. 2599 02:06:45,340 --> 02:06:48,970 So let me take that literally. For int i gets zero, 2600 02:06:48,970 --> 02:06:51,550 i is less than h, i++. 2601 02:06:51,550 --> 02:06:54,550 Let me go ahead and just print that extra row of bricks 2602 02:06:54,550 --> 02:06:58,480 like this, followed by a new line. 2603 02:06:58,480 --> 02:07:00,260 So now, I did this kind of fast. 2604 02:07:00,260 --> 02:07:01,340 But what am I doing here? 2605 02:07:01,340 --> 02:07:06,520 Well, if the height equals 1, I want this loop to iterate one time. 2606 02:07:06,520 --> 02:07:10,760 If the height equals 2, I want it to iterate two times, 3, and so forth. 2607 02:07:10,760 --> 02:07:14,260 So I think, using my zero-indexing technique here, this will work too. 2608 02:07:14,260 --> 02:07:17,080 But if you prefer, I could certainly just change this to a 1 2609 02:07:17,080 --> 02:07:18,638 and change this, too. 2610 02:07:18,638 --> 02:07:19,930 But I'm going to go ahead and-- 2611 02:07:19,930 --> 02:07:20,500 actually, no. 2612 02:07:20,500 --> 02:07:23,350 In this case, I want to leave it as such, zero index, 2613 02:07:23,350 --> 02:07:25,450 just like we typically do. 2614 02:07:25,450 --> 02:07:29,200 All right, let me go ahead and compile this, make Mario.
2615 02:07:29,200 --> 02:07:31,870 OK, oops, interesting. 2616 02:07:31,870 --> 02:07:34,940 All paths through this function will call itself. 2617 02:07:34,940 --> 02:07:37,780 So clang is being kind of smart here, whereby, 2618 02:07:37,780 --> 02:07:42,260 it's noticing that in my draw function, I'm calling my draw function. 2619 02:07:42,260 --> 02:07:44,358 And that's a process that never changes. 2620 02:07:44,358 --> 02:07:46,150 In fact, let me see if I can override that. 2621 02:07:46,150 --> 02:07:51,310 Let me use clang manually and compile a program called mario using mario.c. 2622 02:07:51,310 --> 02:07:53,140 And let me go ahead and link in cs50. 2623 02:07:53,140 --> 02:07:55,960 So I'm using our old school syntax from week two. 2624 02:07:55,960 --> 02:07:56,980 OK, that compiled. 2625 02:07:56,980 --> 02:07:58,270 And why did that compile? 2626 02:07:58,270 --> 02:08:01,872 Well, make is, again, a program that uses your compiler clang. 2627 02:08:01,872 --> 02:08:05,080 And we've configured make to be a little more user-friendly and a little more 2628 02:08:05,080 --> 02:08:07,450 protective of you by turning on special features 2629 02:08:07,450 --> 02:08:09,250 where we detect problems like that. 2630 02:08:09,250 --> 02:08:12,730 By using clang directly now, I'm disabling those special checks. 2631 02:08:12,730 --> 02:08:16,840 And watch what happens when I run Mario now for height of 4, for instance. 2632 02:08:16,840 --> 02:08:18,730 Boom, it crashed. 2633 02:08:18,730 --> 02:08:20,500 It didn't even print anything. 2634 02:08:20,500 --> 02:08:21,953 It crashed pretty quickly. 2635 02:08:21,953 --> 02:08:25,120 And again, a segmentation fault means you touched memory that you shouldn't. 2636 02:08:25,120 --> 02:08:26,200 So what's going on? 2637 02:08:26,200 --> 02:08:30,302 Well, if you think of this memory as representing main still, but then draw, 2638 02:08:30,302 --> 02:08:33,610 draw, draw, draw, draw, draw. 
2639 02:08:33,610 --> 02:08:37,540 If every one of your calls to draw just calls draw again, 2640 02:08:37,540 --> 02:08:39,070 why would it ever stop? 2641 02:08:39,070 --> 02:08:41,590 It wouldn't seem to stop here, necessarily. 2642 02:08:41,590 --> 02:08:45,070 So it seems that I'm missing a key detail in my recursive version. 2643 02:08:45,070 --> 02:08:45,670 You know what? 2644 02:08:45,670 --> 02:08:51,130 If there's nothing to draw, if height equals equals 0, let me go ahead, then, 2645 02:08:51,130 --> 02:08:54,260 and just return immediately. 2646 02:08:54,260 --> 02:08:57,250 Otherwise, I'll go ahead and draw part of the pyramid 2647 02:08:57,250 --> 02:08:59,260 and then add the new row. 2648 02:08:59,260 --> 02:09:02,110 So you need this so-called base case, which you literally 2649 02:09:02,110 --> 02:09:05,410 choose to equal some simple value, like height of 0, height of 1, 2650 02:09:05,410 --> 02:09:10,880 any hardcoded value, so that eventually, draw does not call itself. 2651 02:09:10,880 --> 02:09:15,040 So let me go ahead and recompile this with clang or make. 2652 02:09:15,040 --> 02:09:18,430 Let me rerun it, height of 4, and voila. 2653 02:09:18,430 --> 02:09:20,680 It's still working just like the iterative version, 2654 02:09:20,680 --> 02:09:22,340 but it's now using recursion. 2655 02:09:22,340 --> 02:09:24,250 So here's a sort of design question. 2656 02:09:24,250 --> 02:09:26,020 Is iteration better than recursion? 2657 02:09:26,020 --> 02:09:26,680 It depends. 2658 02:09:26,680 --> 02:09:28,270 Iteration will always work. 2659 02:09:28,270 --> 02:09:32,290 When using the iterative version, I will never overflow the stack 2660 02:09:32,290 --> 02:09:33,140 and hit the heap. 2661 02:09:33,140 --> 02:09:33,640 Why? 2662 02:09:33,640 --> 02:09:35,723 Because I'm not calling functions again and again. 2663 02:09:35,723 --> 02:09:38,410 There's only main and one invocation of draw.
2664 02:09:38,410 --> 02:09:42,550 But with the recursive version, it's kind of a cool, powerful way 2665 02:09:42,550 --> 02:09:43,270 to do things. 2666 02:09:43,270 --> 02:09:45,610 Like, oh, I can draw you a pyramid of height h. 2667 02:09:45,610 --> 02:09:48,370 Let me just have you draw me a pyramid of height h minus 1, 2668 02:09:48,370 --> 02:09:49,750 and then I'll add a row. 2669 02:09:49,750 --> 02:09:54,950 It's kind of this clever, cyclical argument that does work very elegantly. 2670 02:09:54,950 --> 02:09:56,150 But there's a danger. 2671 02:09:56,150 --> 02:10:00,830 And in fact, even though this base case ensures that it doesn't go forever, 2672 02:10:00,830 --> 02:10:05,180 it could go on so long-- maybe let's try 10,000 invocations. 2673 02:10:05,180 --> 02:10:06,290 So that worked OK. 2674 02:10:06,290 --> 02:10:07,820 It's a little slow. 2675 02:10:07,820 --> 02:10:09,320 I'm losing control over my keyboard. 2676 02:10:09,320 --> 02:10:10,730 So Control C is your friend. 2677 02:10:10,730 --> 02:10:12,050 Let me try this once more. 2678 02:10:12,050 --> 02:10:16,700 Let me go ahead and do something like 2 billion and see if that works. 2679 02:10:16,700 --> 02:10:17,540 Boom. 2680 02:10:17,540 --> 02:10:19,110 So even that doesn't work. 2681 02:10:19,110 --> 02:10:21,710 So there's this inherent danger with recursion, whereby, 2682 02:10:21,710 --> 02:10:25,010 even though it empowered us last week to solve a problem even more efficiently 2683 02:10:25,010 --> 02:10:29,810 with merge sort, we kind of got lucky, in that we weren't trying to sort crazy big 2684 02:10:29,810 --> 02:10:33,080 things on Brian's shelf, because it would seem if you use recursion 2685 02:10:33,080 --> 02:10:35,330 and call yourself again and again and again and again, 2686 02:10:35,330 --> 02:10:40,340 even finitely many times, you might eventually touch memory you shouldn't. 2687 02:10:40,340 --> 02:10:42,290 And what's the solution here?
2688 02:10:42,290 --> 02:10:44,510 Unfortunately, it's don't do that. 2689 02:10:44,510 --> 02:10:48,020 Design your algorithms, choose your inputs in such a way 2690 02:10:48,020 --> 02:10:49,560 that there just isn't that risk. 2691 02:10:49,560 --> 02:10:51,800 And we'll use recursion again in a few weeks 2692 02:10:51,800 --> 02:10:54,800 time when we look at more sophisticated data structures. 2693 02:10:54,800 --> 02:10:56,600 But again, there's always this trade-off. 2694 02:10:56,600 --> 02:10:58,725 Just because you can design something a little more 2695 02:10:58,725 --> 02:11:03,120 elegantly doesn't necessarily mean that it's always going to work for you. 2696 02:11:03,120 --> 02:11:06,560 But more commonly, are you likely to run into other problems as well? 2697 02:11:06,560 --> 02:11:08,540 There's something called a buffer overflow. 2698 02:11:08,540 --> 02:11:10,880 And this you will surely trip over in the coming weeks. 2699 02:11:10,880 --> 02:11:13,610 A buffer overflow is when you allocate an array 2700 02:11:13,610 --> 02:11:15,590 and go too far past the end of it. 2701 02:11:15,590 --> 02:11:18,650 Or you use malloc and you, nonetheless, go farther 2702 02:11:18,650 --> 02:11:21,020 than the end of the chunk of memory that you allocated. 2703 02:11:21,020 --> 02:11:25,010 A buffer is just a chunk of memory, so to speak, that you can use as you see 2704 02:11:25,010 --> 02:11:25,550 fit. 2705 02:11:25,550 --> 02:11:30,230 Buffer overflow means going beyond the boundaries of that array. 2706 02:11:30,230 --> 02:11:32,930 You might use-- you're using, right now, video. 2707 02:11:32,930 --> 02:11:35,125 You might know the phrase buffering from videos, 2708 02:11:35,125 --> 02:11:37,250 like sort of buffering and annoying you on Netflix, 2709 02:11:37,250 --> 02:11:39,050 because there's a spinning icon or whatnot. 2710 02:11:39,050 --> 02:11:40,700 Well, that means exactly this.
2711 02:11:40,700 --> 02:11:44,090 A buffer, in the context of YouTube or Zoom or Netflix, 2712 02:11:44,090 --> 02:11:46,910 means some chunk of memory that was retrieved 2713 02:11:46,910 --> 02:11:49,880 via malloc or some similar tool that gets filled 2714 02:11:49,880 --> 02:11:52,580 with bytes comprising your video. 2715 02:11:52,580 --> 02:11:56,210 And it's finite, which is why you can only buffer so many seconds 2716 02:11:56,210 --> 02:11:59,520 or minutes of video before, eventually, if you're offline, 2717 02:11:59,520 --> 02:12:01,220 you run out of video content to watch. 2718 02:12:01,220 --> 02:12:02,930 And the stupid icon comes up, and you can 2719 02:12:02,930 --> 02:12:07,680 watch no more, because a buffer is just a chunk of memory, an array of memory. 2720 02:12:07,680 --> 02:12:12,830 And if Netflix or Google or others were to implement their code unsafely, 2721 02:12:12,830 --> 02:12:16,740 they might very well go too far past that boundary as well. 2722 02:12:16,740 --> 02:12:22,070 So with all this said, let's consider, in some of our final minutes 2723 02:12:22,070 --> 02:12:26,000 here today, just what else we've been getting from these training wheels, 2724 02:12:26,000 --> 02:12:28,830 because we do want to take them mostly off for you. 2725 02:12:28,830 --> 02:12:30,890 So the CS50 library not only provides you 2726 02:12:30,890 --> 02:12:33,855 with this abstraction of a string type, which again, 2727 02:12:33,855 --> 02:12:35,480 doesn't give you any new functionality. 2728 02:12:35,480 --> 02:12:38,600 Strings in C exist, just not by that name. 2729 02:12:38,600 --> 02:12:40,850 They're known more properly as char stars. 2730 02:12:40,850 --> 02:12:43,730 But all of these functions in the CS50 library 2731 02:12:43,730 --> 02:12:49,490 can be implemented with other actual C functions that weren't from CS50, 2732 02:12:49,490 --> 02:12:51,740 namely using one called scanf. 
2733 02:12:51,740 --> 02:12:54,260 But you're going to see, immediately, some of the dangers 2734 02:12:54,260 --> 02:12:57,980 of using something like scanf, which is an old school function. 2735 02:12:57,980 --> 02:13:01,280 It was not designed to be self-defensive like CS50's library. 2736 02:13:01,280 --> 02:13:03,510 And so it's very easy to make mistakes. 2737 02:13:03,510 --> 02:13:06,650 Let me go ahead, for instance, and create a file 2738 02:13:06,650 --> 02:13:09,860 called scanf.c, just to demonstrate this function. 2739 02:13:09,860 --> 02:13:13,200 I'm not going to use the CS50 library, just standard io dot h. 2740 02:13:13,200 --> 02:13:15,470 And I'm going to give myself int main void. 2741 02:13:15,470 --> 02:13:18,110 And I'm going to go ahead and give myself a variable x. 2742 02:13:18,110 --> 02:13:21,260 And I'm going to go ahead and print out quote unquote, "x:" 2743 02:13:21,260 --> 02:13:24,060 just like CS50's get int function does. 2744 02:13:24,060 --> 02:13:25,940 And then I'm going to call scanf. 2745 02:13:25,940 --> 02:13:30,170 And I'm going to go ahead and say, scan from the user's keyboard, an integer, 2746 02:13:30,170 --> 02:13:33,708 and store it in the location of x. 2747 02:13:33,708 --> 02:13:35,750 Then, I'm going to go ahead and print out, again, 2748 02:13:35,750 --> 02:13:40,340 x, and a colon and a backslash percent i backslash n. 2749 02:13:40,340 --> 02:13:41,420 And I'm going to print x. 2750 02:13:41,420 --> 02:13:42,830 So what's going on here? 2751 02:13:42,830 --> 02:13:46,580 In line 5, I'm declaring a variable called x, just like in week one. 2752 02:13:46,580 --> 02:13:49,220 Line 6, just using printf, like in week one. 2753 02:13:49,220 --> 02:13:52,460 The interesting stuff seems to be in line 7. 2754 02:13:52,460 --> 02:13:56,870 Scanf is a function that takes input from the user, just like get int, get 2755 02:13:56,870 --> 02:13:58,500 string, get float, and so forth. 
2756 02:13:58,500 --> 02:14:02,630 But it does it only by you having to understand pointers, 2757 02:14:02,630 --> 02:14:07,790 because recall from our swap example, if you want to have a function, 2758 02:14:07,790 --> 02:14:12,110 change the contents of a variable, as we did with a and b 2759 02:14:12,110 --> 02:14:15,920 and x and y, you have to pass in the address of the variable, whose 2760 02:14:15,920 --> 02:14:17,060 value you want to change. 2761 02:14:17,060 --> 02:14:19,200 You can't just pass in x itself. 2762 02:14:19,200 --> 02:14:22,263 So if we didn't use the CS50 library in week one, 2763 02:14:22,263 --> 02:14:25,430 you would have been writing code like this just to get an int from the user. 2764 02:14:25,430 --> 02:14:27,347 And you would have had to understand pointers. 2765 02:14:27,347 --> 02:14:30,170 And you would have to understand ampersand and stars and so forth. 2766 02:14:30,170 --> 02:14:32,712 It's just too much, when all we care about in the first weeks 2767 02:14:32,712 --> 02:14:35,990 are loops and variables and conditions and sort of the fundamentals. 2768 02:14:35,990 --> 02:14:39,230 But here, we now have the ability to call scanf, tell it 2769 02:14:39,230 --> 02:14:41,150 to scan from the user's keyboard, so to speak, 2770 02:14:41,150 --> 02:14:45,380 an integer, or percent f would give us a float or other such codes, 2771 02:14:45,380 --> 02:14:49,040 and pass in the address of x so that scanf can go to that address 2772 02:14:49,040 --> 02:14:51,440 and put the integer from the user's keyboard there. 2773 02:14:51,440 --> 02:14:53,030 Line 8 is like week one stuff. 2774 02:14:53,030 --> 02:14:54,680 I'm just printing out the value. 2775 02:14:54,680 --> 02:14:55,950 And this is pretty safe. 2776 02:14:55,950 --> 02:14:57,800 I'm going to go ahead and make scanf. 2777 02:14:57,800 --> 02:14:58,495 It compiles OK. 2778 02:14:58,495 --> 02:14:59,870 I'm going to go ahead and run it. 
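To make the address-passing idea concrete, here is a minimal sketch rather than the lecture's exact scanf.c. It assumes sscanf, which follows the same format rules as scanf but reads from a string instead of the keyboard, so it can be tried without typing anything. The helper name parse_int is hypothetical. It also checks the return value, which counts how many items were successfully converted.

```c
#include <stdio.h>

// Hypothetical helper: parse one integer from text into *out.
// Returns 1 on success, 0 if text does not start with an integer.
int parse_int(const char *text, int *out)
{
    // out is already an address, just like &x in scanf("%i", &x),
    // so sscanf can store the converted value at that location.
    // sscanf returns the number of items it converted (or EOF).
    return sscanf(text, "%i", out) == 1;
}
```

With keyboard input, the equivalent check would be whether scanf("%i", &x) returned 1; if it did not, nothing was stored in x and its value should not be trusted.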
2779 02:14:59,870 --> 02:15:00,980 I'm going to type in 50. 2780 02:15:00,980 --> 02:15:03,180 And voila, it prints out a 50. 2781 02:15:03,180 --> 02:15:06,920 But there's some weirdness, because if you run this program too 2782 02:15:06,920 --> 02:15:09,410 and type in cat, well then x is 0. 2783 02:15:09,410 --> 02:15:10,940 And there's no error checking. 2784 02:15:10,940 --> 02:15:12,767 So immediately, you should glimpse that one 2785 02:15:12,767 --> 02:15:14,600 of the features of the CS50 library, recall, 2786 02:15:14,600 --> 02:15:17,630 is that we keep prompting the user again and again if they're not 2787 02:15:17,630 --> 02:15:19,310 cooperating and giving you an int. 2788 02:15:19,310 --> 02:15:21,740 So that's one feature you get from the library. 2789 02:15:21,740 --> 02:15:26,120 But it turns out that get string is even more powerful, 2790 02:15:26,120 --> 02:15:29,000 because if I go and change this program now, not to get an int, 2791 02:15:29,000 --> 02:15:30,710 but something fancier like a string-- 2792 02:15:30,710 --> 02:15:33,223 or wait, we're calling it char star now. 2793 02:15:33,223 --> 02:15:35,390 I'm going to go ahead and do something very similar. 2794 02:15:35,390 --> 02:15:37,640 I'm going to prompt the user for string s. 2795 02:15:37,640 --> 02:15:39,020 And I'm going to use scanf. 2796 02:15:39,020 --> 02:15:42,320 And I'm going to use percent s, just like printf uses percent s. 2797 02:15:42,320 --> 02:15:44,510 And I'm going to pass in s. 2798 02:15:44,510 --> 02:15:48,890 Now, to be clear, I don't need to do ampersand s here, 2799 02:15:48,890 --> 02:15:53,010 because now, we all know that s is fundamentally an address. 2800 02:15:53,010 --> 02:15:56,270 So it suffices just to pass in the address that you already have. 2801 02:15:56,270 --> 02:16:01,280 Now, I'm going to go ahead and print out s colon, percent s backslash n, 2802 02:16:01,280 --> 02:16:02,930 and print out s. 
2803 02:16:02,930 --> 02:16:07,730 But when I compile this, make scanf, it doesn't like it 2804 02:16:07,730 --> 02:16:10,970 when I compile: variable s is uninitialized when used here. 2805 02:16:10,970 --> 02:16:14,390 All right, well if I really want to be sort of adventurous, 2806 02:16:14,390 --> 02:16:16,350 I can override make's protections. 2807 02:16:16,350 --> 02:16:19,880 And I can just compile this manually myself using scanf-- 2808 02:16:19,880 --> 02:16:21,260 using clang directly. 2809 02:16:21,260 --> 02:16:23,600 That worked, dot slash scanf. 2810 02:16:23,600 --> 02:16:26,870 Let me go ahead and type in, for instance, "HI!" 2811 02:16:26,870 --> 02:16:29,000 and you see weirdness, null. 2812 02:16:29,000 --> 02:16:31,190 Well, fortunately, make, and in turn clang, 2813 02:16:31,190 --> 02:16:33,830 were kind of helping us help ourselves there. 2814 02:16:33,830 --> 02:16:35,840 It was pointing out that you declared s. 2815 02:16:35,840 --> 02:16:38,660 So you've declared 8 bytes for a pointer. 2816 02:16:38,660 --> 02:16:39,860 But there's nothing there. 2817 02:16:39,860 --> 02:16:41,459 It's a garbage value. 2818 02:16:41,459 --> 02:16:43,170 And so there's nowhere to put this. 2819 02:16:43,170 --> 02:16:45,889 And thankfully, printf and scanf are being smart enough 2820 02:16:45,889 --> 02:16:48,870 by not just blindly going there and plopping H, I, 2821 02:16:48,870 --> 02:16:50,760 exclamation point, and a nul character. 2822 02:16:50,760 --> 02:16:52,010 They're just leaving it alone. 2823 02:16:52,010 --> 02:16:55,910 And this parenthetical null is just a printf feature saying, you screwed up. 2824 02:16:55,910 --> 02:16:58,100 If you see null, you've done something wrong. 2825 02:16:58,100 --> 02:17:00,830 It's just being generous and not crashing on you. 2826 02:17:00,830 --> 02:17:04,879 If I actually want to get the user's input, I need to be smarter than this.
2827 02:17:04,879 --> 02:17:10,040 And I need to either allocate myself 4 bytes, as we've done earlier today. 2828 02:17:10,040 --> 02:17:14,209 Or I could go back to week two stuff and say something like, give me 4 bytes. 2829 02:17:14,209 --> 02:17:18,830 This, though, gives me 4 bytes on the stack somewhere 2830 02:17:18,830 --> 02:17:21,410 down here in main's frame, so to speak. 2831 02:17:21,410 --> 02:17:23,270 These rows are called frames. 2832 02:17:23,270 --> 02:17:27,260 If I use malloc instead, it comes from the so-called heap, 2833 02:17:27,260 --> 02:17:29,780 which not pictured, is sort of up here. 2834 02:17:29,780 --> 02:17:34,309 And the only difference is that if I'm using malloc, I have to use free. 2835 02:17:34,309 --> 02:17:38,930 If I'm using the stack, as I did in week two, I don't have to use free. 2836 02:17:38,930 --> 02:17:40,730 It's automatically managed for me. 2837 02:17:40,730 --> 02:17:42,590 So frankly, there's so much new stuff today. 2838 02:17:42,590 --> 02:17:46,280 I like the idea of sticking with the old school arrays. 2839 02:17:46,280 --> 02:17:51,379 So now, though, if I go ahead and make scanf, now it compiles with make. 2840 02:17:51,379 --> 02:17:55,610 If I then run scanf and type in, HI!, voila, it seems to work. 2841 02:17:55,610 --> 02:17:58,549 But that's because I was smart and anticipated that H-I, 2842 02:17:58,549 --> 02:17:59,660 OK four characters. 2843 02:17:59,660 --> 02:18:00,980 I gave myself 4 bytes. 2844 02:18:00,980 --> 02:18:06,110 But what if the user types in, HI THERE, DAVID, HOW ARE YOU? 2845 02:18:06,110 --> 02:18:08,059 Clearly, more than four bytes. 2846 02:18:08,059 --> 02:18:11,959 And I hit Enter now, something weird there happened. 2847 02:18:11,959 --> 02:18:13,790 The rest is just lost. 2848 02:18:13,790 --> 02:18:16,670 And this would really be annoying and very frustrating 2849 02:18:16,670 --> 02:18:19,520 if you-- trying to get user input in the first week of the class. 
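One common guard against this kind of overflow, which is an assumption here and not something shown in the lecture, is to put a maximum field width in the format string. The sketch below again uses sscanf in place of scanf so it can run without a keyboard, and the helper name read_word4 is hypothetical. Note that %s also stops at the first whitespace, so this reads a single word, not a whole line.

```c
#include <stdio.h>

// Hypothetical helper: copy at most 3 non-whitespace characters from text
// into a 4-byte buffer, leaving room for the terminating '\0'.
// Returns 1 if a word was read, 0 otherwise.
int read_word4(const char *text, char buffer[4])
{
    // The width 3 in %3s caps how many characters sscanf stores,
    // so it writes at most 3 characters plus '\0' into buffer.
    return sscanf(text, "%3s", buffer) == 1;
}
```

With this limit, input longer than the buffer is cut short instead of being written past the end of the array; the design choice is to lose extra characters rather than corrupt memory.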
2850 02:18:19,520 --> 02:18:21,500 Get string avoids this for you. 2851 02:18:21,500 --> 02:18:23,719 Get string calls malloc for you. 2852 02:18:23,719 --> 02:18:27,200 And it calls it for as big a chunk of memory as the string 2853 02:18:27,200 --> 02:18:28,070 the human types in. 2854 02:18:28,070 --> 02:18:30,980 Long story short, we sort of watch what they're typing character 2855 02:18:30,980 --> 02:18:32,209 by character by character. 2856 02:18:32,209 --> 02:18:34,340 And we make sure to allocate or reallocate 2857 02:18:34,340 --> 02:18:38,879 just enough memory to fit whatever it is the human has typed in. 2858 02:18:38,879 --> 02:18:42,107 So scanf is, essentially, how a function like the CS50 library 2859 02:18:42,107 --> 02:18:43,190 works underneath the hood. 2860 02:18:43,190 --> 02:18:46,650 But it is doing all of this for you. 2861 02:18:46,650 --> 02:18:49,549 And as soon as you take away training wheels like that, or frankly, 2862 02:18:49,549 --> 02:18:52,469 libraries like that, which it really is at the end of the day. 2863 02:18:52,469 --> 02:18:53,719 It's not just a teaching tool. 2864 02:18:53,719 --> 02:18:55,070 It's a useful library. 2865 02:18:55,070 --> 02:18:58,469 You have to start implementing more of this low-level stuff yourself. 2866 02:18:58,469 --> 02:18:59,810 So again, there is a trade off. 2867 02:18:59,810 --> 02:19:02,727 If you don't want to use something like the CS50 library, that's fine. 2868 02:19:02,727 --> 02:19:08,400 Now, the onus is on you to avoid all of these possible error conditions. 2869 02:19:08,400 --> 02:19:11,209 All right, with that said, we have one final feature 2870 02:19:11,209 --> 02:19:14,270 to give you in order to motivate this week's problems, wherein 2871 02:19:14,270 --> 02:19:18,230 you'll actually explore and manipulate and write code to change files. 2872 02:19:18,230 --> 02:19:22,790 And for that, we need one final topic of file I/O. 
File I/O 2873 02:19:22,790 --> 02:19:27,350 is the term of art that describes taking input and output from files. 2874 02:19:27,350 --> 02:19:30,980 Pretty much every program we've written thus far just uses memory, like this 2875 02:19:30,980 --> 02:19:32,924 here, whereby, you can put stuff in memory. 2876 02:19:32,924 --> 02:19:34,549 But as soon as your program ends, boom. 2877 02:19:34,549 --> 02:19:35,330 It's gone. 2878 02:19:35,330 --> 02:19:37,070 The contents of memory are gone. 2879 02:19:37,070 --> 02:19:39,770 Files, of course, are where you and I in the computing world 2880 02:19:39,770 --> 02:19:42,020 save our essays and documents and resumes 2881 02:19:42,020 --> 02:19:44,629 and all of that permanently on your computer. 2882 02:19:44,629 --> 02:19:48,590 In C, you have the ability, certainly, to write code yourself that 2883 02:19:48,590 --> 02:19:50,730 saves files long term. 2884 02:19:50,730 --> 02:19:53,450 So for instance, let me go ahead and write my own program here, 2885 02:19:53,450 --> 02:19:59,260 a phonebook program that stores names and numbers in a file. 2886 02:19:59,260 --> 02:20:02,380 I'm going to go ahead and include, just for convenience, the CS50 library 2887 02:20:02,380 --> 02:20:04,480 again, because I don't want to deal with scanf. 2888 02:20:04,480 --> 02:20:08,200 I'm going to go ahead and save this, incidentally, as phonebook.c. 2889 02:20:08,200 --> 02:20:12,370 I'm going to go ahead and include, not just the CS50 library, but standard io. 2890 02:20:12,370 --> 02:20:18,373 And preemptively, I'm going to go ahead and include string.h as well. 2891 02:20:18,373 --> 02:20:20,290 And I'm going to go ahead in my main function. 2892 02:20:20,290 --> 02:20:23,990 And I'm going to use a few new functions that we'll see only briefly here. 2893 02:20:23,990 --> 02:20:27,260 But in the next problem set, will you explore these in more detail. 2894 02:20:27,260 --> 02:20:29,980 I'm going to give myself a pointer to a file. 
2895 02:20:29,980 --> 02:20:33,820 It turns out, weirdly, that in all caps, FILE, 2896 02:20:33,820 --> 02:20:38,540 this is a new data type that does come with C that represents a file. 2897 02:20:38,540 --> 02:20:42,383 So I'm going to go ahead and give myself a pointer to a file, 2898 02:20:42,383 --> 02:20:43,300 the address of a file. 2899 02:20:43,300 --> 02:20:44,800 And I'm going to call the variable file. 2900 02:20:44,800 --> 02:20:46,300 I could call it f. I could call it x. 2901 02:20:46,300 --> 02:20:49,130 I'm going to call it lowercase file, just to be clear. 2902 02:20:49,130 --> 02:20:52,180 And I'm going to use a new function called fopen, which means file open. 2903 02:20:52,180 --> 02:20:54,077 And file open takes two arguments. 2904 02:20:54,077 --> 02:20:57,160 It takes the first argument, which is the name of a file you want to open. 2905 02:20:57,160 --> 02:20:59,638 I'm going to open a file called phonebook.csv. 2906 02:20:59,638 --> 02:21:02,680 And then I'm going to go ahead and open it, specifically, in append mode. 2907 02:21:02,680 --> 02:21:05,050 Long story short, you can open files in different ways, 2908 02:21:05,050 --> 02:21:08,450 to read them, that is just look at their contents, to write them, 2909 02:21:08,450 --> 02:21:10,780 which is to change their contents entirely, 2910 02:21:10,780 --> 02:21:15,730 or to append to them, a, which means to add row by row to them, 2911 02:21:15,730 --> 02:21:18,370 so to keep tacking on more information to them. 2912 02:21:18,370 --> 02:21:20,210 I'm going to go ahead and, just to be safe, 2913 02:21:20,210 --> 02:21:23,650 I'm going to say if file equals equals NULL, 2914 02:21:23,650 --> 02:21:26,180 because recall that NULL signifies something went wrong, 2915 02:21:26,180 --> 02:21:27,280 let's just return now. 2916 02:21:27,280 --> 02:21:28,960 Maybe I mistyped the name of the file. 2917 02:21:28,960 --> 02:21:29,950 Maybe it doesn't exist.
2918 02:21:29,950 --> 02:21:31,420 Something went wrong, potentially. 2919 02:21:31,420 --> 02:21:34,660 I'm going to check for that by saying, if file equals equals NULL, just 2920 02:21:34,660 --> 02:21:36,178 quit out of the program now. 2921 02:21:36,178 --> 02:21:38,470 But after that, I'm going to go ahead and get a string. 2922 02:21:38,470 --> 02:21:41,920 But we can call that char star now, called name. 2923 02:21:41,920 --> 02:21:44,440 And I'm going to ask the user for a name. 2924 02:21:44,440 --> 02:21:45,820 And we've done this before. 2925 02:21:45,820 --> 02:21:48,610 I'm going to go ahead and ask them for a number, phone number. 2926 02:21:48,610 --> 02:21:49,970 And we've done this before. 2927 02:21:49,970 --> 02:21:52,690 The only difference, now, is I'm calling string char star. 2928 02:21:52,690 --> 02:21:54,400 And now, here's the cool part. 2929 02:21:54,400 --> 02:21:56,830 It turns out, if I want to save this name and number 2930 02:21:56,830 --> 02:21:58,990 to that file permanently in a CSV-- 2931 02:21:58,990 --> 02:22:02,170 if unfamiliar, popular in the consulting world, the analytics world. 2932 02:22:02,170 --> 02:22:04,900 It's just a spreadsheet, a comma-separated values 2933 02:22:04,900 --> 02:22:08,470 file that you can open in Excel or Numbers or Google Sheets. 2934 02:22:08,470 --> 02:22:13,660 I'm going to go ahead and, not printf, but fprintf to that file, 2935 02:22:13,660 --> 02:22:18,580 a string followed by a comma, followed by a string, followed by a new line, 2936 02:22:18,580 --> 02:22:21,070 plugging in the name and the number. 2937 02:22:21,070 --> 02:22:25,280 And then down here, I'm going to close the file. 2938 02:22:25,280 --> 02:22:28,570 So this is new. fprintf is not printf, which prints to your screen. 2939 02:22:28,570 --> 02:22:30,307 fprintf prints to a file.
2940 02:22:30,307 --> 02:22:32,890 So you have to pass in one more argument, the first one, which 2941 02:22:32,890 --> 02:22:37,150 is the pointer to the file that you want to send these new strings to. 2942 02:22:37,150 --> 02:22:40,180 Then you still provide a format string, which says, hey fprintf, 2943 02:22:40,180 --> 02:22:43,060 this is the kind of data I want to print to the file. 2944 02:22:43,060 --> 02:22:46,930 And then you plug in the variables, just like we've always done with printf. 2945 02:22:46,930 --> 02:22:49,610 And then lastly, we close the file. 2946 02:22:49,610 --> 02:22:53,200 So in short, this program would seem to prompt a human for a name and number. 2947 02:22:53,200 --> 02:22:55,420 And then it's going to go ahead and write those names 2948 02:22:55,420 --> 02:22:56,990 and numbers to the file. 2949 02:22:56,990 --> 02:22:59,035 So let me go ahead and make phonebook. 2950 02:22:59,035 --> 02:23:07,810 OK, no mistake so far, dot slash phonebook, David, 949-468-2750. 2951 02:23:07,810 --> 02:23:11,140 OK, let me run it once more, even though nothing seems to have happened. 2952 02:23:11,140 --> 02:23:15,730 Brian, how about 617-495-1000, Enter. 2953 02:23:15,730 --> 02:23:17,950 Let me check my file browser here. 2954 02:23:17,950 --> 02:23:22,240 Notice, all of the files we've created today, including, if I zoom in, 2955 02:23:22,240 --> 02:23:25,390 not just phonebook.c, but phonebook.csv. 2956 02:23:25,390 --> 02:23:29,290 And if I double click that, notice what's inside of this. 2957 02:23:29,290 --> 02:23:33,700 Voila, David's name, Brian's name, and each of our numbers. 2958 02:23:33,700 --> 02:23:36,280 And even cooler than that, let me go ahead and close this. 2959 02:23:36,280 --> 02:23:40,213 Let me go ahead and download this file using the IDE. 2960 02:23:40,213 --> 02:23:42,380 And that's going to put it into my Downloads folder. 2961 02:23:42,380 --> 02:23:43,420 Let me go ahead and click on it. 
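The phonebook steps described above (fopen in append mode, a NULL check, fprintf, fclose) can be sketched as follows. This is a minimal version under stated assumptions, not the lecture's phonebook.c: the helper name append_entry is hypothetical, and it takes the name and number as arguments instead of prompting with the CS50 library's get_string, so that it stands on its own.

```c
#include <stdio.h>

// Hypothetical helper: append one "name,number" row to the CSV file at path.
// Returns 0 on success, 1 if the file could not be opened or written.
int append_entry(const char *path, const char *name, const char *number)
{
    // "a" opens for appending, creating the file if it does not exist yet
    FILE *file = fopen(path, "a");
    if (file == NULL)
    {
        return 1;
    }

    // Like printf, but the first argument says which file to print to
    int written = fprintf(file, "%s,%s\n", name, number);

    // fclose flushes buffered output; it returns 0 on success
    int closed = fclose(file);

    return (written < 0 || closed != 0) ? 1 : 0;
}
```

Calling append_entry("phonebook.csv", "David", "949-468-2750") and then again for Brian would leave two rows in the file, one per call, because append mode never overwrites what is already there.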
2962 02:23:43,420 --> 02:23:45,545 And it's going to open Excel or Numbers or whatever 2963 02:23:45,545 --> 02:23:47,470 you happen to have on your Mac or PC. 2964 02:23:47,470 --> 02:23:50,740 I'm going to go ahead and just proceed. 2965 02:23:50,740 --> 02:23:54,400 And voila, looks a little stupid in this formatting here. 2966 02:23:54,400 --> 02:23:57,160 But I've opened up a spreadsheet that I, myself, generated 2967 02:23:57,160 --> 02:24:01,390 using fopen, fprintf, and fclose. 2968 02:24:01,390 --> 02:24:04,180 So already, now that we have pointers at our disposal, 2969 02:24:04,180 --> 02:24:08,292 can we actually manipulate things like files, which is quite cool. 2970 02:24:08,292 --> 02:24:10,000 But we're going to do that this week, not 2971 02:24:10,000 --> 02:24:12,940 with text, but with actual specific types of files. 2972 02:24:12,940 --> 02:24:16,840 And indeed, recall this kind of thinking here. 2973 02:24:16,840 --> 02:24:19,150 If you glance at this, it's probably pretty cryptic. 2974 02:24:19,150 --> 02:24:21,400 It looks like machine code, but it's not. 2975 02:24:21,400 --> 02:24:24,070 This is, perhaps, the simplest representation 2976 02:24:24,070 --> 02:24:26,410 of a smiley face inside of a file. 2977 02:24:26,410 --> 02:24:31,000 If you have a bitmap file, a map of bits, a grid of bits, those bits, 2978 02:24:31,000 --> 02:24:33,130 quite simply, could literally be 0's and 1's. 2979 02:24:33,130 --> 02:24:37,240 And if you assign the color black to 0 and the color white to 1, 2980 02:24:37,240 --> 02:24:40,660 you could actually think of this same grid of 0's and 1's as representing, 2981 02:24:40,660 --> 02:24:41,930 indeed, a smiley face. 2982 02:24:41,930 --> 02:24:43,690 In other words, here are some pixels. 2983 02:24:43,690 --> 02:24:45,520 We talked about pixels in week zero. 2984 02:24:45,520 --> 02:24:49,567 Pixels are just the dots that compose a graphic file on your computer. 
2985 02:24:49,567 --> 02:24:50,650 And pixels are everywhere. 2986 02:24:50,650 --> 02:24:53,320 All of us, now, tuning in live via Zoom or YouTube or the like, 2987 02:24:53,320 --> 02:24:56,800 we're watching streams of pixels, which compose multiple images and multiple 2988 02:24:56,800 --> 02:25:02,290 images compose video that appears to be moving at, like, 20 something or 30 2989 02:25:02,290 --> 02:25:04,670 frames per second, images per second. 2990 02:25:04,670 --> 02:25:08,530 Now, of course, there's only so much fidelity in these kinds of images. 2991 02:25:08,530 --> 02:25:11,097 And it's quite common in the case on TV and in movies, 2992 02:25:11,097 --> 02:25:13,930 if there's some bad guy that's been picked up with some surveillance 2993 02:25:13,930 --> 02:25:17,050 footage or the like, invariably, the folks on Law & Order and the like 2994 02:25:17,050 --> 02:25:19,930 can just kind of enhance the video and zoom in and see 2995 02:25:19,930 --> 02:25:24,710 exactly the glint in the person's eye that reveals who committed some crime. 2996 02:25:24,710 --> 02:25:26,140 Well, that's all kind of nonsense. 2997 02:25:26,140 --> 02:25:29,367 And it derives from some of the primitives we introduced in week zero. 2998 02:25:29,367 --> 02:25:31,450 In fact, just to poke fun at this, let me go ahead 2999 02:25:31,450 --> 02:25:34,990 and play on a few seconds of this TV show here in the US 3000 02:25:34,990 --> 02:25:39,670 called CSI, just to give you a sense of just how commonplace this kind of logic 3001 02:25:39,670 --> 02:25:40,180 is. 3002 02:25:40,180 --> 02:25:41,140 [VIDEO PLAYBACK] 3003 02:25:41,140 --> 02:25:43,330 - We know. 3004 02:25:43,330 --> 02:25:46,930 - That at 9:15, Ray Santoya was at the ATM. 3005 02:25:46,930 --> 02:25:50,380 - So the question is, what was he doing at 9:16? 3006 02:25:50,380 --> 02:25:53,180 - Shooting the 9 millimeter at something. 3007 02:25:53,180 --> 02:25:54,820 Maybe he saw the sniper. 
3008 02:25:54,820 --> 02:25:56,920 - Or was working with him. 3009 02:25:56,920 --> 02:25:59,490 - Wait, go back one. 3010 02:25:59,490 --> 02:26:00,481 - What do you see? 3011 02:26:00,481 --> 02:26:05,291 [CLICKING] 3012 02:26:05,291 --> 02:26:07,700 3013 02:26:07,700 --> 02:26:11,420 - Bring his face up, full screen. 3014 02:26:11,420 --> 02:26:12,530 - His glasses. 3015 02:26:12,530 --> 02:26:13,982 - There's a reflection. 3016 02:26:13,982 --> 02:26:17,426 [TYPING] 3017 02:26:17,426 --> 02:26:23,840 3018 02:26:23,840 --> 02:26:25,620 - That's Neuvitas baseball team. 3019 02:26:25,620 --> 02:26:26,630 That's their logo. 3020 02:26:26,630 --> 02:26:29,075 - And he's talking to whoever's wearing that jacket. 3021 02:26:29,075 --> 02:26:31,160 - We may have a witness. 3022 02:26:31,160 --> 02:26:32,700 - To both shootings. 3023 02:26:32,700 --> 02:26:33,283 [END PLAYBACK] 3024 02:26:33,283 --> 02:26:36,408 DAVID MALAN: So unfortunately, today will rather ruin a lot of TV and movie 3025 02:26:36,408 --> 02:26:38,650 for you, because you can't just zoom in infinitely 3026 02:26:38,650 --> 02:26:41,250 and see more information if that information is not there. 3027 02:26:41,250 --> 02:26:43,750 At the end of the day, there's only a finite number of bits. 3028 02:26:43,750 --> 02:26:46,120 And case in point, here's a photograph of Brian. 3029 02:26:46,120 --> 02:26:48,580 And you might see that, oh, there's a glint in his eye. 3030 02:26:48,580 --> 02:26:50,930 Let's see what was being reflected in his eye there. 3031 02:26:50,930 --> 02:26:53,410 And so if we Zoom in on this image here of Brian, 3032 02:26:53,410 --> 02:26:57,730 and maybe we zoom in a little further, that's all that's actually there. 3033 02:26:57,730 --> 02:27:00,160 You can't just click the enhance button and see more, 3034 02:27:00,160 --> 02:27:02,368 because at the end of the day, these are just pixels. 
3035 02:27:02,368 --> 02:27:06,310 And pixels, per week zero, are just 0's and 1's, and finitely, many so. 3036 02:27:06,310 --> 02:27:08,470 So what you see is what you get. 3037 02:27:08,470 --> 02:27:12,190 Now, with that said-- and actually, we can poke fun of this, too, here. 3038 02:27:12,190 --> 02:27:14,830 Let me just play one other short clip from Futurama, 3039 02:27:14,830 --> 02:27:18,423 which kind of hammers home this point as well, but more playfully so. 3040 02:27:18,423 --> 02:27:19,090 [VIDEO PLAYBACK] 3041 02:27:19,090 --> 02:27:23,250 - Magnify that death speed. 3042 02:27:23,250 --> 02:27:24,770 Why is it still blurry? 3043 02:27:24,770 --> 02:27:26,710 - That's all the resolution we have. 3044 02:27:26,710 --> 02:27:29,050 Making it bigger doesn't make it clearer. 3045 02:27:29,050 --> 02:27:31,220 - It does on CSI: Miami. 3046 02:27:31,220 --> 02:27:32,020 - [SIGH] 3047 02:27:32,020 --> 02:27:32,170 [END PLAYBACK] 3048 02:27:32,170 --> 02:27:35,212 DAVID MALAN: So there, we have two clips talking, rather, to one another. 3049 02:27:35,212 --> 02:27:37,330 But I have to update things for 2020. 3050 02:27:37,330 --> 02:27:41,972 You can't really pick up the internet these days or magazine these days, 3051 02:27:41,972 --> 02:27:43,930 if you even would, that doesn't somehow mention 3052 02:27:43,930 --> 02:27:45,850 machine learning and artificial intelligence 3053 02:27:45,850 --> 02:27:48,005 and fancy algorithms via which you can do things 3054 02:27:48,005 --> 02:27:49,630 that previously weren't quite possible. 3055 02:27:49,630 --> 02:27:51,460 And that's actually kinda sorta the case. 3056 02:27:51,460 --> 02:27:56,290 You might recall from week zero, that we found this beautiful watercolor 3057 02:27:56,290 --> 02:28:00,250 painting in the Harvard archives that's only about 11 inches tall total. 3058 02:28:00,250 --> 02:28:03,700 And yet somehow, it's 13 feet tall here behind me. 
3059 02:28:03,700 --> 02:28:06,533 Now, normally, if you were to just enhance this watercolor painting, 3060 02:28:06,533 --> 02:28:08,658 it would start to look pretty stupid pretty quickly 3061 02:28:08,658 --> 02:28:10,570 with lots and lots of pixelation, even if you 3062 02:28:10,570 --> 02:28:12,940 used a very fancy camera, as the archives do, 3063 02:28:12,940 --> 02:28:14,440 to capture the original image. 3064 02:28:14,440 --> 02:28:16,810 But we wanted to blow it up to 13 feet tall 3065 02:28:16,810 --> 02:28:21,110 so that it would stand at high quality behind us this whole time. 3066 02:28:21,110 --> 02:28:24,790 And there, we actually did use enhance, in some sense. 3067 02:28:24,790 --> 02:28:28,640 So using, long story short, fancier algorithms than those last week, 3068 02:28:28,640 --> 02:28:31,690 you can use artificial intelligence, machine learning, 3069 02:28:31,690 --> 02:28:36,130 to actually analyze data and find patterns where there weren't-- 3070 02:28:36,130 --> 02:28:38,280 that aren't necessarily visible to the human eye. 3071 02:28:38,280 --> 02:28:41,590 So for instance, if we take the original here and start to zoom in, 3072 02:28:41,590 --> 02:28:43,600 it looks pretty good at this resolution. 3073 02:28:43,600 --> 02:28:44,720 But it's pretty smooth. 3074 02:28:44,720 --> 02:28:48,730 You don't really see the fact that this was paint on an actual canvas. 3075 02:28:48,730 --> 02:28:50,707 So this was just zooming in on Photoshop. 
3076 02:28:50,707 --> 02:28:52,540 But when you actually run an image like this 3077 02:28:52,540 --> 02:28:55,990 through fancy machine learning-based software, artificial intelligence, 3078 02:28:55,990 --> 02:28:58,570 you can begin to improve it and actually see, 3079 02:28:58,570 --> 02:29:01,390 not just this window from the top of one of the buildings, which 3080 02:29:01,390 --> 02:29:03,520 is pretty glossed over here in Photoshop, 3081 02:29:03,520 --> 02:29:05,480 you can start to see more detail. 3082 02:29:05,480 --> 02:29:08,750 So this is literally the before, just zooming in Photoshop. 3083 02:29:08,750 --> 02:29:12,572 This is after, actually applying fancy artificial intelligence algorithms 3084 02:29:12,572 --> 02:29:15,280 that notice, wait a minute, there's a little discoloration there. 3085 02:29:15,280 --> 02:29:17,072 Wait, there's a little discoloration there. 3086 02:29:17,072 --> 02:29:20,830 And nowadays, enhance is increasingly becoming a thing. 3087 02:29:20,830 --> 02:29:22,450 It's still inferring. 3088 02:29:22,450 --> 02:29:25,270 It's not resurrecting information that was necessarily there. 3089 02:29:25,270 --> 02:29:28,240 It's doing its best guess, really, algorithmically, 3090 02:29:28,240 --> 02:29:30,487 to reconstruct what the image actually was. 3091 02:29:30,487 --> 02:29:32,320 And if we zoom in further, you can, perhaps, 3092 02:29:32,320 --> 02:29:35,440 see that this is really starting to get blurry if you just use Photoshop 3093 02:29:35,440 --> 02:29:36,578 and keep zooming in. 3094 02:29:36,578 --> 02:29:38,620 But if you run it through fancy enough algorithms 3095 02:29:38,620 --> 02:29:40,780 and start to notice slight discolorations that 3096 02:29:40,780 --> 02:29:44,920 aren't super visible to the human eye, we can enhance that even further. 3097 02:29:44,920 --> 02:29:46,540 And you can't do it infinitely so. 
3098 02:29:46,540 --> 02:29:48,550 And in some sense, we're creating information 3099 02:29:48,550 --> 02:29:51,282 where there isn't necessarily that information there. 3100 02:29:51,282 --> 02:29:54,490 So whether or not these kinds of things hold up in court is another question. 3101 02:29:54,490 --> 02:29:56,920 But it can improve the fidelity of images like this. 3102 02:29:56,920 --> 02:30:02,570 And indeed, it allowed us to zoom in from 11 inches to 13 feet instead. 3103 02:30:02,570 --> 02:30:05,920 So when it comes to manipulating images, ultimately, we 3104 02:30:05,920 --> 02:30:10,030 do have some programmatic capabilities, including this file pointer, 3105 02:30:10,030 --> 02:30:13,280 like we just saw, and also, a few other functions as well. 3106 02:30:13,280 --> 02:30:15,550 And our final examples, here, will lay the foundation 3107 02:30:15,550 --> 02:30:17,380 for what you'll do this coming week, which 3108 02:30:17,380 --> 02:30:21,250 is manipulate your very own graphical files with a newfound understanding 3109 02:30:21,250 --> 02:30:25,270 of pointers and addresses and now files and input and output. 3110 02:30:25,270 --> 02:30:30,010 For instance, I'm going to go ahead and open up a program here called-- 3111 02:30:30,010 --> 02:30:32,110 give me just one second. 3112 02:30:32,110 --> 02:30:37,660 I'm going to open up a program here called jpeg.c. 3113 02:30:37,660 --> 02:30:40,610 And this program, jpeg.c, which I wrote in advance, 3114 02:30:40,610 --> 02:30:43,400 which is on the course's website, does the following. 3115 02:30:43,400 --> 02:30:46,510 It first declares a type called byte. 3116 02:30:46,510 --> 02:30:49,990 It turns out, in C, there's no common definition of what a byte is. 3117 02:30:49,990 --> 02:30:51,610 A byte, as we know it, is 8 bits.
3118 02:30:51,610 --> 02:30:53,680 And it turns out, the simplest way to create 3119 02:30:53,680 --> 02:30:57,250 a byte is to define our own, just like we've defined a string, 3120 02:30:57,250 --> 02:31:01,840 just like we've defined other types too, like a student, in order-- 3121 02:31:01,840 --> 02:31:04,640 a person, rather, in order to give us a byte. 3122 02:31:04,640 --> 02:31:07,210 So this first line of code just declares a data type 3123 02:31:07,210 --> 02:31:11,830 called byte, using another, more arcane data type called uint8_t. 3124 02:31:11,830 --> 02:31:13,330 But more on that in the problem set. 3125 02:31:13,330 --> 02:31:15,820 This just did invent something called byte. 3126 02:31:15,820 --> 02:31:17,928 Notice, in this program, I'm resurrecting the idea 3127 02:31:17,928 --> 02:31:21,220 from week two of command line arguments, where we can take input from the user. 3128 02:31:21,220 --> 02:31:23,860 Notice that I'm checking if the user typed in two arguments. 3129 02:31:23,860 --> 02:31:27,520 And if not, I'm returning one immediately to signify error. 3130 02:31:27,520 --> 02:31:30,490 In line 17, I'm using my new technique. 3131 02:31:30,490 --> 02:31:34,210 I'm opening a file using the name of the file 3132 02:31:34,210 --> 02:31:36,050 that the human typed at the command line. 3133 02:31:36,050 --> 02:31:40,270 And this time, I'm opening it to read it with quote unquote, r instead of a. 3134 02:31:40,270 --> 02:31:41,660 But if there's not a file-- 3135 02:31:41,660 --> 02:31:44,920 so if bang file, that is, if exclamation point file, 3136 02:31:44,920 --> 02:31:47,990 or if file equals equals NULL, those mean the same thing. 3137 02:31:47,990 --> 02:31:51,040 I can go ahead and return one, signifying an error. 3138 02:31:51,040 --> 02:31:53,710 Down here, I'm doing something a little clever.
3139 02:31:53,710 --> 02:31:56,890 It turns out that with very high probability, 3140 02:31:56,890 --> 02:32:01,640 you can determine if any file is a jpeg by looking only at its first three 3141 02:32:01,640 --> 02:32:02,140 bytes. 3142 02:32:02,140 --> 02:32:04,720 A lot of file formats have what are called magic numbers 3143 02:32:04,720 --> 02:32:06,350 at the beginning of their files. 3144 02:32:06,350 --> 02:32:10,990 And these are industry standard numbers, 1 or 2 or 3 or more of them, 3145 02:32:10,990 --> 02:32:13,910 that are just commonly expected to be at the beginning of a file, 3146 02:32:13,910 --> 02:32:16,240 so that a program can quickly check, is this a jpeg? 3147 02:32:16,240 --> 02:32:16,960 Is this a gif? 3148 02:32:16,960 --> 02:32:18,070 Is this a Word document? 3149 02:32:18,070 --> 02:32:19,300 Is this an Excel file? 3150 02:32:19,300 --> 02:32:21,910 They tend to have these numbers at the beginning of them. 3151 02:32:21,910 --> 02:32:26,020 And jpegs have a sequence of bytes that we're about to see. 3152 02:32:26,020 --> 02:32:29,770 This line of code, line 24 here, as you'll see in the next problem set, 3153 02:32:29,770 --> 02:32:33,070 is how you might give yourself a buffer of bytes, specifically 3154 02:32:33,070 --> 02:32:35,320 an array of three bytes. 3155 02:32:35,320 --> 02:32:38,380 This next line of code, as you'll see this coming week, calls fread. 3156 02:32:38,380 --> 02:32:40,720 fread, as the name suggests, reads from a file. 3157 02:32:40,720 --> 02:32:42,940 That is, it grabs bytes from a file. 3158 02:32:42,940 --> 02:32:45,790 And it's a little fancy to use, but you'll get more comfortable 3159 02:32:45,790 --> 02:32:47,140 with this over time. 3160 02:32:47,140 --> 02:32:52,060 It reads into this buffer, its first argument, the size of this data type, 3161 02:32:52,060 --> 02:32:53,050 the size of a byte. 3162 02:32:53,050 --> 02:32:58,250 And it reads in this many of those data types from this file.
3163 02:32:58,250 --> 02:33:01,480 So again, it's four arguments, which is kind of a lot from what we've seen. 3164 02:33:01,480 --> 02:33:08,230 But it reads from this file, three bytes into this array, 3165 02:33:08,230 --> 02:33:09,770 a.k.a. buffer, called bytes. 3166 02:33:09,770 --> 02:33:13,460 So this is just how you write code that doesn't put data in a file, 3167 02:33:13,460 --> 02:33:14,650 but reads it from it. 3168 02:33:14,650 --> 02:33:16,700 And then here, notice our hexadecimal. 3169 02:33:16,700 --> 02:33:18,190 So we've come full circle. 3170 02:33:18,190 --> 02:33:23,110 If bytes bracket 0 equals equals 0xff and bytes 3171 02:33:23,110 --> 02:33:27,080 bracket 1 equals equals 0xd8 and bytes bracket 2 equals equals 0xff, 3172 02:33:27,080 --> 02:33:28,960 this definitely looks cryptic to you. 3173 02:33:28,960 --> 02:33:31,570 But that's just because I looked up in the manual for jpegs, 3174 02:33:31,570 --> 02:33:34,900 and it turns out that almost any jpeg, rather, 3175 02:33:34,900 --> 02:33:39,430 must start with 0xff, 0xd8, 0xff. 3176 02:33:39,430 --> 02:33:43,450 Those are the first three bytes of any jpeg on your Mac, your PC, 3177 02:33:43,450 --> 02:33:44,350 on the internet. 3178 02:33:44,350 --> 02:33:46,300 There are always those three bytes. 3179 02:33:46,300 --> 02:33:50,500 It turns out, the fourth byte further decides whether or not 3180 02:33:50,500 --> 02:33:51,730 a file is actually a jpeg. 3181 02:33:51,730 --> 02:33:54,640 But the algorithm for that's a little fancier, so I kept it simple. 3182 02:33:54,640 --> 02:33:59,020 If the first three bytes of a file are those, maybe you have a jpeg. 3183 02:33:59,020 --> 02:34:01,150 But if you don't have exactly those three bytes, 3184 02:34:01,150 --> 02:34:02,920 you definitely don't have a jpeg. 3185 02:34:02,920 --> 02:34:05,270 And so what I can do, here, is as follows.
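The check described above can be sketched roughly as follows. This is not the jpeg.c from the course's website, just a minimal sketch under a few assumptions: the function name looks_like_jpeg and its return codes are hypothetical, the type is spelled BYTE, and the file is opened with "rb" rather than the "r" mentioned in the lecture so that the bytes are read unmodified on every platform.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

// A byte is eight bits; uint8_t is an unsigned 8-bit integer type.
typedef uint8_t BYTE;

// Hypothetical helper, not a function from the lecture.
// Returns 1 if the first three bytes match the JPEG signature ("maybe"),
// 0 if they do not ("no"), and -1 if the file cannot be opened.
int looks_like_jpeg(const char *path)
{
    assert(path != NULL);

    // Open the file for reading in binary mode
    FILE *file = fopen(path, "rb");
    if (file == NULL)
    {
        return -1;
    }

    // Four arguments: destination buffer, size of each element,
    // number of elements, and the file to read from
    BYTE bytes[3];
    size_t count = fread(bytes, sizeof(BYTE), 3, file);
    fclose(file);

    // A file shorter than three bytes cannot carry the signature
    if (count != 3)
    {
        return 0;
    }

    // Compare against the magic numbers 0xff 0xd8 0xff
    return bytes[0] == 0xff && bytes[1] == 0xd8 && bytes[2] == 0xff;
}
```

A caller could print "maybe" when this returns 1 and "no" when it returns 0, which mirrors the two answers the program in the lecture gives.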
3186 02:34:05,270 --> 02:34:09,700 In today's code-- let me go ahead and grab two other files 3187 02:34:09,700 --> 02:34:11,620 that I brought with me. 3188 02:34:11,620 --> 02:34:16,210 And one happens to be a photograph again. 3189 02:34:16,210 --> 02:34:18,160 Give me one second. 3190 02:34:18,160 --> 02:34:24,010 I brought with me a few files, one of which is called brian.jpeg, 3191 02:34:24,010 --> 02:34:25,870 which is the same photo of Brian. 3192 02:34:25,870 --> 02:34:28,030 And then I have a gif, which of course, is not 3193 02:34:28,030 --> 02:34:31,210 a jpeg, that is this cat typing here. 3194 02:34:31,210 --> 02:34:33,250 And what I, effectively, have in front of me now 3195 02:34:33,250 --> 02:34:37,870 is a program that if I do make jpeg, because this file is jpeg.c, 3196 02:34:37,870 --> 02:34:43,360 and I run dot slash jpeg, I can type in something like cat.gif 3197 02:34:43,360 --> 02:34:46,990 at the command line as an argument, hit Enter, and I should see no. 3198 02:34:46,990 --> 02:34:51,550 By contrast, if I pass in Brian's jpeg at the command line as an argument, 3199 02:34:51,550 --> 02:34:52,630 I see maybe. 3200 02:34:52,630 --> 02:34:54,430 And again, maybe only because the algorithm 3201 02:34:54,430 --> 02:34:56,638 for actually adjudicating whether something is a jpeg 3202 02:34:56,638 --> 02:34:58,550 is a little more complicated than that. 3203 02:34:58,550 --> 02:35:02,590 But indeed, I can now access the individual bytes, and therefore pixels, 3204 02:35:02,590 --> 02:35:06,310 it would seem, of an image file. 3205 02:35:06,310 --> 02:35:08,575 And in fact, we can even do this. 3206 02:35:08,575 --> 02:35:10,450 Let me go ahead and show you one last program 3207 02:35:10,450 --> 02:35:13,960 that we wrote deliberately in advance, just to give you a taste of what's 3208 02:35:13,960 --> 02:35:15,790 coming with the next problem set. 
3209 02:35:15,790 --> 02:35:19,480 This program is a reimplementation of the program you've probably 3210 02:35:19,480 --> 02:35:21,820 used one or more times called cp. 3211 02:35:21,820 --> 02:35:25,570 Recall that cp is a program in the IDE and in Linux, 3212 02:35:25,570 --> 02:35:27,730 more generally, that allows you to copy a file. 3213 02:35:27,730 --> 02:35:31,660 You do cp, space, the filename, space, the new filename. 3214 02:35:31,660 --> 02:35:32,650 How does this work? 3215 02:35:32,650 --> 02:35:37,090 I now have all of the building blocks with which to copy files myself. 3216 02:35:37,090 --> 02:35:39,100 So again, I'm defining a byte up here. 3217 02:35:39,100 --> 02:35:41,930 I'm defining main as taking command line arguments here. 3218 02:35:41,930 --> 02:35:43,000 And notice one change. 3219 02:35:43,000 --> 02:35:44,800 I'm not using the CS50 library. 3220 02:35:44,800 --> 02:35:52,090 So even what was previously string in week two is now char star. 3221 02:35:52,090 --> 02:35:55,450 Even here for argv, I'm making sure that the human types 3222 02:35:55,450 --> 02:36:00,580 in three words, the program's name and the source file and the destination 3223 02:36:00,580 --> 02:36:01,180 file. 3224 02:36:01,180 --> 02:36:02,410 I'm using fopen again. 3225 02:36:02,410 --> 02:36:06,100 I'm opening the source file here from argv[1]. 3226 02:36:06,100 --> 02:36:07,358 I'm making sure it's not NULL. 3227 02:36:07,358 --> 02:36:08,650 And then I'm quitting if it is. 3228 02:36:08,650 --> 02:36:13,030 I'm then-- here's something new, opening the destination file here, also 3229 02:36:13,030 --> 02:36:13,870 with fopen. 3230 02:36:13,870 --> 02:36:15,700 But I'm using quote unquote, "w." 3231 02:36:15,700 --> 02:36:19,630 I'm opening one file with r, one file with w, because I want to read from one 3232 02:36:19,630 --> 02:36:21,160 and write to the other.
3233 02:36:21,160 --> 02:36:25,360 And then down here, this loop is a clever way 3234 02:36:25,360 --> 02:36:27,370 of copying one file to another. 3235 02:36:27,370 --> 02:36:30,790 I'm giving myself a buffer of one byte, so just a temporary variable, just 3236 02:36:30,790 --> 02:36:33,090 like Brian's temp or empty glass. 3237 02:36:33,090 --> 02:36:35,160 And I'm using this function, fread. 3238 02:36:35,160 --> 02:36:39,750 I'm reading into that buffer via its address, the size of a byte, 3239 02:36:39,750 --> 02:36:42,870 specifically one byte from the source file. 3240 02:36:42,870 --> 02:36:47,940 And then, in that same loop, I'm writing from that buffer, the size of a byte, 3241 02:36:47,940 --> 02:36:50,950 specifically one byte, to the destination. 3242 02:36:50,950 --> 02:36:53,760 So literally, the cp program you might have seen me use 3243 02:36:53,760 --> 02:36:57,090 or you yourself have used to copy files, is literally doing this. 3244 02:36:57,090 --> 02:36:59,790 It's opening one file, iterating over all of its bytes, 3245 02:36:59,790 --> 02:37:02,010 and copying them from source to destination. 3246 02:37:02,010 --> 02:37:04,260 And then lastly, it's closing the file. 3247 02:37:04,260 --> 02:37:06,360 And these last two examples were deliberately fast, 3248 02:37:06,360 --> 02:37:11,130 because this whole week will be spent diving into file I/O and images 3249 02:37:11,130 --> 02:37:11,890 thereof. 3250 02:37:11,890 --> 02:37:16,560 But all that we've done is use these fopen, fread, fwrite, and fclose 3251 02:37:16,560 --> 02:37:18,610 to manipulate those very files. 3252 02:37:18,610 --> 02:37:21,975 So for instance, if I now do this, let me do make cp. 3253 02:37:21,975 --> 02:37:25,800 OK, seems to compile, dot slash cp, brian.jpeg. 3254 02:37:25,800 --> 02:37:27,750 How about brian2.jpeg? 3255 02:37:27,750 --> 02:37:28,680 And hit Enter. 3256 02:37:28,680 --> 02:37:29,880 Nothing seems to happen.
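The copy loop described above might look roughly like this. It is a sketch rather than the cp.c shown in the lecture: the function name copy_file and its return codes are hypothetical, the type is spelled BYTE, and the files are opened with "rb" and "wb" (binary variants of the "r" and "w" modes mentioned) so that image data is copied unmodified on every platform.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

typedef uint8_t BYTE;

// Hypothetical helper, not a function from the lecture.
// Copies src to dst one byte at a time.
// Returns 0 on success and 1 if a file cannot be opened or written.
int copy_file(const char *src, const char *dst)
{
    assert(src != NULL && dst != NULL);

    // Open the source for reading
    FILE *source = fopen(src, "rb");
    if (source == NULL)
    {
        return 1;
    }

    // Open the destination for writing
    FILE *destination = fopen(dst, "wb");
    if (destination == NULL)
    {
        fclose(source);
        return 1;
    }

    // A one-byte buffer acts as the temporary variable
    BYTE buffer;
    int status = 0;

    // fread returns the number of elements read, so the loop
    // ends once no further byte is available
    while (fread(&buffer, sizeof(BYTE), 1, source) == 1)
    {
        if (fwrite(&buffer, sizeof(BYTE), 1, destination) != 1)
        {
            status = 1;
            break;
        }
    }

    // Close both files; a failed close on the destination means
    // buffered data may not have been written
    fclose(source);
    if (fclose(destination) != 0)
    {
        status = 1;
    }
    return status;
}
```

Reading one byte per call keeps the loop easy to follow. A larger buffer would mean fewer calls to fread and fwrite, but the loop would then have to write only as many bytes as were actually read.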
3257 02:37:29,880 --> 02:37:33,240 But if I go in here and double click on brian2, 3258 02:37:33,240 --> 02:37:37,420 we see that we have a second copy of Brian's actual file. 3259 02:37:37,420 --> 02:37:41,560 So this coming week, you'll experiment with multiple file formats for images. 3260 02:37:41,560 --> 02:37:42,580 The first is jpegs. 3261 02:37:42,580 --> 02:37:45,000 And we will give you a so-called forensic image 3262 02:37:45,000 --> 02:37:47,938 of a whole bunch of photographs from a digital memory card. 3263 02:37:47,938 --> 02:37:50,730 In fact, it's very common these days, certainly in law enforcement, 3264 02:37:50,730 --> 02:37:53,580 to take forensic copies of hard drives, of media sticks, 3265 02:37:53,580 --> 02:37:55,920 of phones and other devices, and then analyze them 3266 02:37:55,920 --> 02:37:58,650 for data that's been lost or corrupted or deleted. 3267 02:37:58,650 --> 02:38:01,980 We'll do exactly that, whereby, you'll write a program that recovers 3268 02:38:01,980 --> 02:38:05,850 jpegs that have been accidentally deleted from a digital memory card. 3269 02:38:05,850 --> 02:38:08,100 And we'll give you all copies of that memory card 3270 02:38:08,100 --> 02:38:11,220 by making a forensic image of it, that is copying all of the 0's and 1's 3271 02:38:11,220 --> 02:38:13,710 from a camera and giving them to you in a file 3272 02:38:13,710 --> 02:38:16,710 that you can fread and then fwrite from. 3273 02:38:16,710 --> 02:38:18,930 We'll also introduce you to bitmap files, 3274 02:38:18,930 --> 02:38:22,290 BMPs, popularized by the Windows operating 3275 02:38:22,290 --> 02:38:24,160 system for wallpapers and the like. 3276 02:38:24,160 --> 02:38:28,470 But we'll use them to implement using pointers and using file I/O, 3277 02:38:28,470 --> 02:38:30,550 your very own Instagram-like filter.
3278 02:38:30,550 --> 02:38:33,540 So we'll take this picture, here, of the Weeks footbridge 3279 02:38:33,540 --> 02:38:35,578 here in Cambridge, Massachusetts by Harvard. 3280 02:38:35,578 --> 02:38:37,620 And we'll have you implement a number of filters, 3281 02:38:37,620 --> 02:38:39,328 taking this original image, for instance, 3282 02:38:39,328 --> 02:38:41,910 and desaturating it, making it black and white, 3283 02:38:41,910 --> 02:38:45,210 by iterating over all of the pixels top to bottom, left to right, 3284 02:38:45,210 --> 02:38:49,350 and recognizing any colors, like red or green or blue or anything in between, 3285 02:38:49,350 --> 02:38:53,467 and changing them to some shade of gray, doing a sepia filter, 3286 02:38:53,467 --> 02:38:55,800 making things look old school, like this photo was taken 3287 02:38:55,800 --> 02:39:00,810 many years ago, by similarly applying a heuristic that alters the colors of all 3288 02:39:00,810 --> 02:39:02,345 of the pixels in this picture. 3289 02:39:02,345 --> 02:39:05,220 We'll have you flip it around so you have to put this pixel over here 3290 02:39:05,220 --> 02:39:06,630 and this pixel over there. 3291 02:39:06,630 --> 02:39:09,690 And you'll appreciate exactly how files are implemented 3292 02:39:09,690 --> 02:39:12,180 within your own hard drive and phone. 3293 02:39:12,180 --> 02:39:17,580 And you'll even implement, for instance, a blur filter, which no accident, 3294 02:39:17,580 --> 02:39:20,010 makes it harder to see what's going on here, 3295 02:39:20,010 --> 02:39:23,700 because you're starting to, now, average together pixels that are nearby 3296 02:39:23,700 --> 02:39:27,090 each other to kind of gloss things over and deliberately 3297 02:39:27,090 --> 02:39:28,990 make it harder to see here. 
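As a rough illustration of the first of those filters, here is one way the black-and-white pass could be sketched. Everything named here is an assumption made for illustration: the pixel struct, the grayscale function, and the row-major array layout are hypothetical and are not the problem set's actual types, and averaging the three channels is just one simple way to pick a shade of gray.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Hypothetical pixel layout: one byte each for red, green, and blue.
typedef struct
{
    uint8_t red;
    uint8_t green;
    uint8_t blue;
} pixel;

// Hypothetical helper: converts an image, stored row by row in one
// array of height * width pixels, to shades of gray in place.
void grayscale(pixel *image, size_t height, size_t width)
{
    assert(image != NULL);

    // Iterate top to bottom, left to right
    for (size_t row = 0; row < height; row++)
    {
        for (size_t col = 0; col < width; col++)
        {
            pixel *p = &image[row * width + col];

            // Average the three channels, rounding to the nearest integer
            unsigned int sum = p->red + p->green + p->blue;
            uint8_t gray = (uint8_t) ((sum + 1) / 3);

            // Equal red, green, and blue values yield a shade of gray
            p->red = gray;
            p->green = gray;
            p->blue = gray;
        }
    }
}
```

The other filters mentioned follow the same pattern of visiting every pixel: a reflection swaps pixels across each row, and a blur replaces each pixel with the average of its neighbors, which requires reading from an unmodified copy of the image.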
3298 02:39:28,990 --> 02:39:30,733 And so we'll even, if you so choose, have 3299 02:39:30,733 --> 02:39:33,150 you implement edge detection, if feeling more comfortable, 3300 02:39:33,150 --> 02:39:37,020 where you find the edges of all of the physical objects in these pictures, 3301 02:39:37,020 --> 02:39:43,350 in order to actually detect them in code and create visual art like this. 3302 02:39:43,350 --> 02:39:44,220 Now, this was a lot. 3303 02:39:44,220 --> 02:39:45,960 And I know pointers are generally considered 3304 02:39:45,960 --> 02:39:47,820 to be among the more challenging features of C, 3305 02:39:47,820 --> 02:39:49,403 and certainly, programming in general. 3306 02:39:49,403 --> 02:39:52,140 So if you're feeling like it's been quite a bit, it was. 3307 02:39:52,140 --> 02:39:55,290 But you do now have the ability, either today 3308 02:39:55,290 --> 02:39:59,040 or in the very near term, to understand even XKCD comics like this that most 3309 02:39:59,040 --> 02:40:00,990 any computer scientist out there has seen. 3310 02:40:00,990 --> 02:40:05,130 So our final look for you, today, is on this joke here. 3311 02:40:05,130 --> 02:40:10,050 And even though I can't necessarily hear you from afar, 3312 02:40:10,050 --> 02:40:12,690 I'll just assume, in our final moments today, 3313 02:40:12,690 --> 02:40:16,650 that everyone is breaking out into a very geeky laughter. 3314 02:40:16,650 --> 02:40:19,530 And I see some smiles, at least, which is reassuring. 3315 02:40:19,530 --> 02:40:21,480 This was, then, CS50. 3316 02:40:21,480 --> 02:40:23,010 We'll see you next time. 3317 02:40:23,010 --> 02:40:26,360 [MUSIC PLAYING] 3318 02:40:26,360 --> 02:41:23,000