1 00:00:00,000 --> 00:00:03,486 [MUSIC PLAYING] 2 00:00:03,486 --> 00:01:01,752 3 00:01:01,752 --> 00:01:03,600 SPEAKER 1: All right. 4 00:01:03,600 --> 00:01:06,200 So this is CS50, and this is week four. 5 00:01:06,200 --> 00:01:07,950 And this is actually one of the weeks that 6 00:01:07,950 --> 00:01:11,040 really makes CS50, CS50, insofar as we'll 7 00:01:11,040 --> 00:01:13,590 take an even lower level look at how computers work, 8 00:01:13,590 --> 00:01:16,050 and in turn, what it is you're doing when you write code 9 00:01:16,050 --> 00:01:18,900 toward an end of really giving you a complete mental model of what's 10 00:01:18,900 --> 00:01:21,983 going on inside, so that when you run to solve some problem, when you want 11 00:01:21,983 --> 00:01:25,050 to fix, solve some problem, when you want to write some code, 12 00:01:25,050 --> 00:01:28,260 you actually know what those building blocks inside of the computer 13 00:01:28,260 --> 00:01:29,570 itself actually are. 14 00:01:29,570 --> 00:01:32,070 We'll, ultimately, too, take off some of the training wheels 15 00:01:32,070 --> 00:01:34,570 that we've had on for the past few weeks, particularly in C, 16 00:01:34,570 --> 00:01:37,150 and we'll also introduce more familiar media types. 17 00:01:37,150 --> 00:01:39,750 So files, like images are sort of everywhere. 18 00:01:39,750 --> 00:01:41,730 And we'll introduce you to exactly what's 19 00:01:41,730 --> 00:01:43,980 going on when you just look at a photograph, or a GIF, 20 00:01:43,980 --> 00:01:47,190 or a PNG, or any kind of image on your screen like this one here. 
21 00:01:47,190 --> 00:01:51,240 And it will become clear that, unlike Hollywood TV shows and movies, 22 00:01:51,240 --> 00:01:53,790 if you try to enhance a picture like this 23 00:01:53,790 --> 00:01:57,450 to look closer, and closer, and closer, in the movies typically 24 00:01:57,450 --> 00:02:00,540 trying to figure out who the bad guy is, for instance, eventually, 25 00:02:00,540 --> 00:02:04,600 you run out of information because there's only a finite number of bits 26 00:02:04,600 --> 00:02:06,392 or bytes that compose these files. 27 00:02:06,392 --> 00:02:08,350 So any time you've seen computers that you just 28 00:02:08,350 --> 00:02:11,410 hit a button, and boom, it's enhanced, and all of a sudden the suspect 29 00:02:11,410 --> 00:02:14,770 is clear, that's a lot more Hollywood than it is computer science. 30 00:02:14,770 --> 00:02:16,720 But with that said, later in the term, we 31 00:02:16,720 --> 00:02:18,850 will talk about artificial intelligence. 32 00:02:18,850 --> 00:02:22,270 And even though there might not be that information there, 33 00:02:22,270 --> 00:02:25,720 through statistical reasoning, and modeling, and predictions, 34 00:02:25,720 --> 00:02:28,090 can computers increasingly actually create 35 00:02:28,090 --> 00:02:30,520 information, where perhaps there was none, just based 36 00:02:30,520 --> 00:02:32,080 on what's most likely to be there? 37 00:02:32,080 --> 00:02:34,030 So more on that before long, too. 38 00:02:34,030 --> 00:02:37,220 But you'll see that all of these dots on the screen, all of these pixels, 39 00:02:37,220 --> 00:02:39,610 so to speak, are just a grid up, down, left, right that 40 00:02:39,610 --> 00:02:40,720 compose these pictures. 41 00:02:40,720 --> 00:02:43,300 And we're fortunate to have three volunteers on stage who 42 00:02:43,300 --> 00:02:47,620 kindly, just before the lecture began, created their own pixel artwork, 43 00:02:47,620 --> 00:02:49,870 so to speak, on this here easel. 
44 00:02:49,870 --> 00:02:52,310 If you guys would like to spin this around, 45 00:02:52,310 --> 00:02:57,260 let's see what it is you've been working on. 46 00:02:57,260 --> 00:03:00,010 And if you'd like to introduce yourselves as our three artists 47 00:03:00,010 --> 00:03:00,700 today, first. 48 00:03:00,700 --> 00:03:03,090 SPEAKER 2: Yes, I'm Talia. 49 00:03:03,090 --> 00:03:07,000 I'm a junior at the college studying economics with a possible computer 50 00:03:07,000 --> 00:03:08,470 science secondary. 51 00:03:08,470 --> 00:03:09,450 SPEAKER 3: Hi. 52 00:03:09,450 --> 00:03:10,440 My name is Bulut. 53 00:03:10,440 --> 00:03:12,310 I'm from BU. 54 00:03:12,310 --> 00:03:13,470 SPEAKER 1: Welcome. 55 00:03:13,470 --> 00:03:16,530 SPEAKER 4: I'm a Assalo Caesar, self-taught computer science student. 56 00:03:16,530 --> 00:03:19,230 I've been working as a software engineer since age 16. 57 00:03:19,230 --> 00:03:20,010 SPEAKER 1: Nice. 58 00:03:20,010 --> 00:03:21,160 Well, welcome to you all. 59 00:03:21,160 --> 00:03:22,830 And if you would like to give us a description of what it 60 00:03:22,830 --> 00:03:24,510 is that you built out of pixels here. 61 00:03:24,510 --> 00:03:27,330 SPEAKER 2: So we built a firework. 62 00:03:27,330 --> 00:03:28,920 SPEAKER 1: OK, nice. 63 00:03:28,920 --> 00:03:31,710 And it's very blocky because what we've given 64 00:03:31,710 --> 00:03:34,890 them is post-it notes, each of which represents one of these pixels or dots. 65 00:03:34,890 --> 00:03:36,930 Now, typically, it might be black or white, 66 00:03:36,930 --> 00:03:39,580 but the post-it notes we have here are pink or blue. 67 00:03:39,580 --> 00:03:41,580 So each of these represents a dot on the screen. 68 00:03:41,580 --> 00:03:43,740 And I gather you did one other that actually 69 00:03:43,740 --> 00:03:47,820 conveys maybe a bit more information, if you want to reveal version two. 70 00:03:47,820 --> 00:03:50,550 And thus we have yet more pixel art. 
71 00:03:50,550 --> 00:03:54,510 So maybe a round of applause for what our volunteers were able to do using 72 00:03:54,510 --> 00:03:55,380 pixels alone. 73 00:03:55,380 --> 00:03:56,010 Thank you. 74 00:03:56,010 --> 00:04:00,420 We have, as always, a limited supply of delicious Super Mario Brothers 75 00:04:00,420 --> 00:04:01,500 Oreos for each of you. 76 00:04:01,500 --> 00:04:03,160 Thank you so much for coming up. 77 00:04:03,160 --> 00:04:03,858 But thank you. 78 00:04:03,858 --> 00:04:05,650 But the point here, really, is that there's 79 00:04:05,650 --> 00:04:08,320 only so much you can do when you just have dots on the screen. 80 00:04:08,320 --> 00:04:10,960 Now, of course, the image that we saw a moment ago of these 81 00:04:10,960 --> 00:04:13,510 here stress balls is much higher quality. 82 00:04:13,510 --> 00:04:17,360 It's much higher fidelity, or more specifically, much higher resolution. 83 00:04:17,360 --> 00:04:20,490 And resolution just refers to how many dots or pixels are on the screen. 84 00:04:20,490 --> 00:04:22,240 And the smaller they are, and the more you 85 00:04:22,240 --> 00:04:25,850 cram in on the screen, the clearer and clearer the images are. 86 00:04:25,850 --> 00:04:28,960 But at the end of the day, even this here pixel art 87 00:04:28,960 --> 00:04:32,950 represents what's going on on your phone, your laptop, your desktop, your TV 88 00:04:32,950 --> 00:04:36,160 nowadays, because all it is is this grid of pixels. 89 00:04:36,160 --> 00:04:39,820 Now, before we can actually write code that actually manipulates 90 00:04:39,820 --> 00:04:42,130 these kinds of images, we need to understand, 91 00:04:42,130 --> 00:04:44,980 and we need to have some new syntax for navigating files. 92 00:04:44,980 --> 00:04:48,400 So not just text, but files stored somewhere on the computer, 93 00:04:48,400 --> 00:04:49,570 somewhere on the server. 94 00:04:49,570 --> 00:04:52,580 But let's consider how we might store even information like this.
95 00:04:52,580 --> 00:04:53,740 But we'll make it simpler. 96 00:04:53,740 --> 00:04:57,190 Here is a grid of zeros and ones, clearly. 97 00:04:57,190 --> 00:04:59,740 But I would argue that each of these might as well represent 98 00:04:59,740 --> 00:05:01,450 a pixel, an individual dot. 99 00:05:01,450 --> 00:05:04,720 And if that dot is a zero, it's representing the color black. 100 00:05:04,720 --> 00:05:09,040 If that dot is a one, it's representing the color white. 101 00:05:09,040 --> 00:05:15,940 Given that, can anyone see what this grid is a picture of, 102 00:05:15,940 --> 00:05:20,920 even though it's using zeros and ones and not post-it notes, like this here? 103 00:05:20,920 --> 00:05:23,360 Yeah, in the back? 104 00:05:23,360 --> 00:05:25,010 It's a smiley face. 105 00:05:25,010 --> 00:05:26,150 How do you see that? 106 00:05:26,150 --> 00:05:28,260 Well, in a moment it's going to be super obvious. 107 00:05:28,260 --> 00:05:32,600 But if I actually get rid of the ones, leaving just the zeros, 108 00:05:32,600 --> 00:05:35,467 there you have the zeros that were there just a moment ago. 109 00:05:35,467 --> 00:05:37,550 So what this translates to, typically on a screen, 110 00:05:37,550 --> 00:05:40,670 is not a pattern of zeros and ones literally on the screen, 111 00:05:40,670 --> 00:05:41,970 but a pattern of dots. 112 00:05:41,970 --> 00:05:46,070 So again, white might be one, and black might be-- 113 00:05:46,070 --> 00:05:48,110 one might be white. 114 00:05:48,110 --> 00:05:49,220 Zero might be black. 115 00:05:49,220 --> 00:05:52,130 But we picture it, of course, on our screens as this actual grid. 116 00:05:52,130 --> 00:05:53,570 But that's really all we need. 117 00:05:53,570 --> 00:05:55,880 Inside of a file to store something like an image, 118 00:05:55,880 --> 00:05:58,070 we just need a pattern of zeros and ones. 119 00:05:58,070 --> 00:06:00,770 But of course, having more colors would be more interesting. 
120 00:06:00,770 --> 00:06:04,400 And if you actually have a larger grid, you can do even more with pixel art. 121 00:06:04,400 --> 00:06:06,740 And in fact for fun, at the beginning of the semester, 122 00:06:06,740 --> 00:06:10,220 we have a staff training with all of the teaching fellows, course assistants, 123 00:06:10,220 --> 00:06:13,580 teaching assistants, and we gave them all this Google spreadsheet. 124 00:06:13,580 --> 00:06:16,280 And we sort of resized all of the rows and columns 125 00:06:16,280 --> 00:06:19,440 to just be squares instead of the default rectangles. 126 00:06:19,440 --> 00:06:21,980 And then we encouraged them to create something out of this. 127 00:06:21,980 --> 00:06:25,230 And in fact, just a few weeks ago, here are some of this year's creations, 128 00:06:25,230 --> 00:06:28,410 creating, essentially, images using Google Spreadsheets 129 00:06:28,410 --> 00:06:32,080 by treating each of the cells as just a dot on the screen. 130 00:06:32,080 --> 00:06:33,960 So here, we have a team who in a few minutes 131 00:06:33,960 --> 00:06:36,550 made a Super Mario World, a bigger canvas, of course, 132 00:06:36,550 --> 00:06:37,680 than this here easel. 133 00:06:37,680 --> 00:06:42,030 Here we have a pixel based version of Scratch. 134 00:06:42,030 --> 00:06:46,080 Here, we had an homage to the Harvard-Yale football competition. 135 00:06:46,080 --> 00:06:48,960 And then here, we had a character of some sort. 136 00:06:48,960 --> 00:06:50,670 So this is what the team here did. 137 00:06:50,670 --> 00:06:52,290 And actually, if you'd like to play along at home 138 00:06:52,290 --> 00:06:54,582 at the risk of distracting you the entirety of lecture, 139 00:06:54,582 --> 00:06:56,880 if you go to this URL here, it'll actually give you 140 00:06:56,880 --> 00:06:59,430 a copy of that same blank spreadsheet. 
141 00:06:59,430 --> 00:07:02,010 But let's talk about representing, not just zeros and ones, 142 00:07:02,010 --> 00:07:03,930 and black and white, but actual colors. 143 00:07:03,930 --> 00:07:06,630 And so recall from week zero when we talked 144 00:07:06,630 --> 00:07:09,120 about how to represent information, colors among 145 00:07:09,120 --> 00:07:12,850 them, we introduced RGB, which stands for red, green, blue. 146 00:07:12,850 --> 00:07:15,600 And it's just this kind of convention of using some amount of red, 147 00:07:15,600 --> 00:07:18,660 some amount of green, and some amount of blue mixed together to give you 148 00:07:18,660 --> 00:07:20,895 the actual color that you want. 149 00:07:20,895 --> 00:07:22,770 Well, it turns out in the world of computers, 150 00:07:22,770 --> 00:07:27,460 there's a standard way for describing those amounts of red, green, and blue. 151 00:07:27,460 --> 00:07:29,460 At the end of the day, it's of course just bits. 152 00:07:29,460 --> 00:07:33,040 And equivalently, it's just numbers, like 72, 73, 153 00:07:33,040 --> 00:07:37,180 33, which was the arbitrary example we used in week zero for the color yellow. 154 00:07:37,180 --> 00:07:40,500 But there actually tends to be a different notation by convention 155 00:07:40,500 --> 00:07:43,320 for representing colors that we'll actually see today, too, 156 00:07:43,320 --> 00:07:45,160 as we explore the world of memory. 157 00:07:45,160 --> 00:07:46,958 So here's a screenshot of Photoshop. 158 00:07:46,958 --> 00:07:49,500 If you've never used it before, this is like the color picker 159 00:07:49,500 --> 00:07:52,620 that you can pull up, just to pick any number of millions of colors 160 00:07:52,620 --> 00:07:54,750 by clicking and dragging, or typing in numbers. 161 00:07:54,750 --> 00:07:56,190 But notice down here.
162 00:07:56,190 --> 00:07:58,680 We've picked at the moment the color black by moving 163 00:07:58,680 --> 00:08:01,480 the slider all the way down here to the bottom left hand corner. 164 00:08:01,480 --> 00:08:03,480 And what this user interface is telling us 165 00:08:03,480 --> 00:08:06,060 is that there's zero red, zero green, zero blue. 166 00:08:06,060 --> 00:08:09,630 And a conventional way of writing this on a screen 167 00:08:09,630 --> 00:08:14,040 would be, literally, a hash symbol, and then three pairs of digits. 168 00:08:14,040 --> 00:08:18,150 Zero, zero for red, zero, zero for green, zero, zero for blue. 169 00:08:18,150 --> 00:08:22,530 If by contrast, you were to pick the color, say, white in Photoshop, 170 00:08:22,530 --> 00:08:23,580 it gets a little weird. 171 00:08:23,580 --> 00:08:27,210 Now it's a lot of red, a lot of green, a lot of blue, as you might expect, 172 00:08:27,210 --> 00:08:28,800 cranking all of those values up. 173 00:08:28,800 --> 00:08:31,950 But the way you write it conventionally is not using decimal, 174 00:08:31,950 --> 00:08:34,710 but using letters of the alphabet, it would seem here. 175 00:08:34,710 --> 00:08:38,130 So FF for red, FF for green, FF for blue. 176 00:08:38,130 --> 00:08:39,270 More on that in a moment. 177 00:08:39,270 --> 00:08:43,350 When it comes to representing red, here's a lot of red, 255. 178 00:08:43,350 --> 00:08:44,880 Zero green, zero blue. 179 00:08:44,880 --> 00:08:48,930 And so the pattern is now FF0000. 180 00:08:48,930 --> 00:08:52,740 Before I reveal what green is, what probably should it be? 181 00:08:52,740 --> 00:08:53,520 What pattern? 182 00:08:53,520 --> 00:08:56,250 Yeah. 183 00:08:56,250 --> 00:08:56,760 Close. 184 00:08:56,760 --> 00:09:04,050 Not 0000FF, but 00FF00 because it seems to be following this pattern, indeed, 185 00:09:04,050 --> 00:09:05,940 from left to right of red, green, blue.
186 00:09:05,940 --> 00:09:11,190 So zero red, 255 green, zero blue, and thus 00FF00. 187 00:09:11,190 --> 00:09:15,150 And then lastly, if we do solid blue, it's zero red, zero green, 188 00:09:15,150 --> 00:09:18,660 a lot of blue, and thus 0000FF. 189 00:09:18,660 --> 00:09:24,712 So somehow or other, FF is apparently representing the number 255. 190 00:09:24,712 --> 00:09:26,170 And we'll see why in just a moment. 191 00:09:26,170 --> 00:09:27,920 But recall that in the world of computers, 192 00:09:27,920 --> 00:09:29,430 they just speak zeros and ones. 193 00:09:29,430 --> 00:09:31,872 And we've seen that already in black and white form. 194 00:09:31,872 --> 00:09:34,830 We of course, in the real world, tend to use decimal instead of binary. 195 00:09:34,830 --> 00:09:38,400 So we have 10 digits at our disposal, zero through nine. 196 00:09:38,400 --> 00:09:42,130 But it turns out that in the world of graphics and colors, 197 00:09:42,130 --> 00:09:44,430 turns out in the world of computer memory, 198 00:09:44,430 --> 00:09:48,480 it tends to be convenient not to use binary, per se, not to use decimal, 199 00:09:48,480 --> 00:09:50,880 per se, but to use something called hexadecimal, 200 00:09:50,880 --> 00:09:54,480 where as soon as you need more than 10 digits total, 201 00:09:54,480 --> 00:09:56,460 you start stealing from the English alphabet. 202 00:09:56,460 --> 00:09:58,950 So the next few numbers, or digits rather, 203 00:09:58,950 --> 00:10:04,080 are A, B, C, D, E, F. And there's other systems that 204 00:10:04,080 --> 00:10:06,720 use even more letters of the alphabet, but this is probably 205 00:10:06,720 --> 00:10:08,680 the last we'll discuss in any detail. 206 00:10:08,680 --> 00:10:12,750 So in this case, we have a total of 10 plus one, two, three, four, five, 207 00:10:12,750 --> 00:10:15,180 six, so 16 total, a.k.a. 208 00:10:15,180 --> 00:10:18,262 hexadecimal, or what we might call base 16. 
209 00:10:18,262 --> 00:10:20,220 And the capitalization actually doesn't matter. 210 00:10:20,220 --> 00:10:22,825 It's conventional to use uppercase or lowercase, 211 00:10:22,825 --> 00:10:24,450 so long as you're generally consistent. 212 00:10:24,450 --> 00:10:26,710 So hexa, implying 16, decimal. 213 00:10:26,710 --> 00:10:30,360 So hexadecimal notation here, or otherwise known as base 16, 214 00:10:30,360 --> 00:10:34,440 for mathematical reasons that go back to our discussion in week zero. 215 00:10:34,440 --> 00:10:37,500 So here's some of that same reasoning from week zero. 216 00:10:37,500 --> 00:10:40,350 How might we go about representing, using two 217 00:10:40,350 --> 00:10:44,980 digits in hexadecimal, different numbers that you and I know as decimal? 218 00:10:44,980 --> 00:10:49,925 Well, if we consider this as being the 16 to the zero place, 16 219 00:10:49,925 --> 00:10:52,300 to the one place, and if we do out that math, of course, 220 00:10:52,300 --> 00:10:54,730 that gives us the ones place and the sixteens place. 221 00:10:54,730 --> 00:10:57,750 So we've only changed the base, not the story from week zero. 222 00:10:57,750 --> 00:11:01,680 So if we were to start representing actual values in hexadecimal, 223 00:11:01,680 --> 00:11:03,930 here are two zeros. 224 00:11:03,930 --> 00:11:07,530 So that's 1 times 0 plus 16 times 0, which, of course, gives us 225 00:11:07,530 --> 00:11:08,970 the number you and I know as zero. 226 00:11:08,970 --> 00:11:12,030 So in hexadecimal, and in binary, and in decimal, it's 227 00:11:12,030 --> 00:11:15,030 the same way to represent the number you and I know as zero. 228 00:11:15,030 --> 00:11:17,640 But here now is the number one in hexadecimal. 229 00:11:17,640 --> 00:11:18,930 Here's the number two. 230 00:11:18,930 --> 00:11:24,340 Here's the number three, four, five, six, seven, eight, nine. 231 00:11:24,340 --> 00:11:28,120 So it's identical up until this point to our world of decimal.
232 00:11:28,120 --> 00:11:32,590 But how do I count up to what you and I would call 10 in decimal, 233 00:11:32,590 --> 00:11:36,130 according to what we're seeing here thus far? 234 00:11:36,130 --> 00:11:36,640 Yeah. 235 00:11:36,640 --> 00:11:39,970 So now it goes up to A, because A would, apparently, 236 00:11:39,970 --> 00:11:41,940 represent what you and I know as 10. 237 00:11:41,940 --> 00:11:43,420 B represents 11. 238 00:11:43,420 --> 00:11:47,740 C represents 12, 13, 14, 15. 239 00:11:47,740 --> 00:11:50,890 How, though, do I count up to 16? 240 00:11:50,890 --> 00:11:52,330 Yeah. 241 00:11:52,330 --> 00:11:53,080 Exactly. 242 00:11:53,080 --> 00:11:56,320 So not 10, quote unquote, but one, zero because the one 243 00:11:56,320 --> 00:11:59,920 in the second column here to the left actually represents the sixteens place. 244 00:11:59,920 --> 00:12:05,180 So it's 16 times 1 gives you 16, plus 1 times 0 gives you 0, so 16 in total. 245 00:12:05,180 --> 00:12:09,350 So this now is the way the number you and I would think of as 17, 246 00:12:09,350 --> 00:12:13,990 18, 19, 20, 21, dot, dot, dot. 247 00:12:13,990 --> 00:12:16,690 And if we go all the way up, as high up as we 248 00:12:16,690 --> 00:12:20,450 can count, well, what's the largest digit, apparently, in hexadecimal? 249 00:12:20,450 --> 00:12:23,980 The smallest is clearly zero, and the biggest I said was F. 250 00:12:23,980 --> 00:12:26,680 So once you get to FF, the math gets a little annoying. 251 00:12:26,680 --> 00:12:33,250 But this is now 16 times 15 plus 1 times 15. 252 00:12:33,250 --> 00:12:37,820 And what that gives us, actually, is the number you and I know as 255. 253 00:12:37,820 --> 00:12:39,045 So we saw it in Photoshop. 254 00:12:39,045 --> 00:12:40,420 We've seen it now in hexadecimal. 
255 00:12:40,420 --> 00:12:42,545 This is not math that you would ever do frequently, 256 00:12:42,545 --> 00:12:45,610 but indeed, it's the exact same system as week zero, 257 00:12:45,610 --> 00:12:46,930 just with a different base. 258 00:12:46,930 --> 00:12:48,890 But why all of this additional complexity? 259 00:12:48,890 --> 00:12:51,640 Why are we jumping through these hoops introducing yet another one 260 00:12:51,640 --> 00:12:54,370 to give us just some pattern like this of FF? 261 00:12:54,370 --> 00:12:57,070 Well, it turns out that hexadecimal is just convenient. 262 00:12:57,070 --> 00:12:57,640 Why? 263 00:12:57,640 --> 00:13:00,400 Well, if you have 16 digits in your alphabet, 264 00:13:00,400 --> 00:13:04,450 zero through F, how many bits, how many zeros and ones 265 00:13:04,450 --> 00:13:09,330 do you need to represent 16 different values? 266 00:13:09,330 --> 00:13:10,520 It's four, right? 267 00:13:10,520 --> 00:13:14,210 Because if you've got four bits, that's two possibilities for the first times 268 00:13:14,210 --> 00:13:16,070 2, times 2, times 2. 269 00:13:16,070 --> 00:13:17,660 So that's 16 possibilities. 270 00:13:17,660 --> 00:13:18,900 2 to the fourth power. 271 00:13:18,900 --> 00:13:22,190 And if you've got four bits represented by a single digit, 272 00:13:22,190 --> 00:13:25,970 it's just convenient in practice for computer scientists and programmers. 273 00:13:25,970 --> 00:13:29,000 So F might indeed represent 1111. 274 00:13:29,000 --> 00:13:31,280 But that's not a full byte, which is eight bits. 275 00:13:31,280 --> 00:13:33,950 And no one counts in units of four in computing. 276 00:13:33,950 --> 00:13:38,790 It's always in units of, like, eight, or 16, or 32, or 64, or the like. 
277 00:13:38,790 --> 00:13:42,110 So it turns out, though, because hexadecimal lends itself 278 00:13:42,110 --> 00:13:46,070 to representing four bits at a time, well, if you just use two of them, 279 00:13:46,070 --> 00:13:47,960 you can represent eight bits at a time. 280 00:13:47,960 --> 00:13:51,560 And eight bits is a byte, which is a common unit of measure. 281 00:13:51,560 --> 00:13:56,270 And this is why even Photoshop uses this convention, as do color programs, 282 00:13:56,270 --> 00:14:01,040 as does web development, more generally, of using two hexadecimal digits just 283 00:14:01,040 --> 00:14:02,895 to represent single bytes. 284 00:14:02,895 --> 00:14:06,020 Because the one on the left represents the first bits, the first four bits. 285 00:14:06,020 --> 00:14:08,250 The one on the right represents the second four bits. 286 00:14:08,250 --> 00:14:10,020 So it's not a big deal, per se. 287 00:14:10,020 --> 00:14:14,040 It's just convenient, even though this might feel like a lot all at once. 288 00:14:14,040 --> 00:14:18,000 Any questions then on hexadecimal? 289 00:14:18,000 --> 00:14:20,370 Yeah, in the middle. 290 00:14:20,370 --> 00:14:21,690 No. 291 00:14:21,690 --> 00:14:22,680 OK, no. 292 00:14:22,680 --> 00:14:24,480 Questions on hexadecimal. 293 00:14:24,480 --> 00:14:25,380 All right. 294 00:14:25,380 --> 00:14:32,220 So with this system in mind, let's go about considering where else we might 295 00:14:32,220 --> 00:14:34,110 see this in the computing world. 296 00:14:34,110 --> 00:14:37,320 And I would propose that we consider, as we've done in the past, 297 00:14:37,320 --> 00:14:40,420 that our computer is really just this grid of memory, for instance, 298 00:14:40,420 --> 00:14:42,990 where each of these squares represents a single byte. 
299 00:14:42,990 --> 00:14:45,660 And I proposed a couple of times already that, when 300 00:14:45,660 --> 00:14:47,580 we talk about a computer's memory, we can 301 00:14:47,580 --> 00:14:50,970 think of each of these squares as having an individual location. 302 00:14:50,970 --> 00:14:53,850 Like, I spitballed back in the day that maybe this is the first byte, 303 00:14:53,850 --> 00:14:55,350 the second byte, the third byte. 304 00:14:55,350 --> 00:14:57,840 Maybe this is the billionth byte, so we can number 305 00:14:57,840 --> 00:14:59,670 all of the bytes inside of a computer. 306 00:14:59,670 --> 00:15:02,040 Well, it turns out, as we'll see today in code, 307 00:15:02,040 --> 00:15:06,330 computers typically use numbers, indeed, to represent 308 00:15:06,330 --> 00:15:08,880 all of the bytes in their memory, and they typically 309 00:15:08,880 --> 00:15:11,740 use hexadecimal notation for such by convention. 310 00:15:11,740 --> 00:15:12,900 So what do I mean by that? 311 00:15:12,900 --> 00:15:15,000 Technically, if we were to start numbering these 312 00:15:15,000 --> 00:15:17,190 and count at zero, as most programmers would, 313 00:15:17,190 --> 00:15:19,960 this is byte zero, one, two, three, dot, dot, dot. 314 00:15:19,960 --> 00:15:21,580 This is byte 15. 315 00:15:21,580 --> 00:15:25,060 But if I wanted to keep going, it would be then 16, 17, 18, 316 00:15:25,060 --> 00:15:27,560 but that's not true in hexadecimal. 317 00:15:27,560 --> 00:15:30,040 So instead in hexadecimal, once you hit the nine, 318 00:15:30,040 --> 00:15:33,160 you'd actually use A through F, just as I've proposed. 319 00:15:33,160 --> 00:15:37,270 Meanwhile, if you kept going thereafter, you would have one zero. 320 00:15:37,270 --> 00:15:39,010 But as you noted, this is not 10. 321 00:15:39,010 --> 00:15:44,270 This is 16 here, 17, 18, 19. 322 00:15:44,270 --> 00:15:46,250 And so here's where things get a little weird. 323 00:15:46,250 --> 00:15:47,680 I'm saying 16.
324 00:15:47,680 --> 00:15:51,460 I'm saying 17, and you're obviously seeing what any reasonable person would 325 00:15:51,460 --> 00:15:53,200 read as 10 and 11. 326 00:15:53,200 --> 00:15:56,170 So there's this dichotomy, and so we need some convention 327 00:15:56,170 --> 00:15:59,690 for making clear to the reader that these are hexadecimal numbers, not 328 00:15:59,690 --> 00:16:00,190 decimal. 329 00:16:00,190 --> 00:16:01,960 Otherwise, it's completely ambiguous. 330 00:16:01,960 --> 00:16:04,877 And the convention there, which you might have seen in the real world, 331 00:16:04,877 --> 00:16:09,215 even though it's a bit weird, is just to prefix hexadecimal numbers with 0x. 332 00:16:09,215 --> 00:16:10,840 It's not doing anything mathematically. 333 00:16:10,840 --> 00:16:12,790 It's not multiplication or anything like that. 334 00:16:12,790 --> 00:16:17,380 Just 0x means, here comes a hexadecimal number hereafter, just 335 00:16:17,380 --> 00:16:20,170 to distinguish it from decimal. 336 00:16:20,170 --> 00:16:24,190 And you can see that, even though we don't have enough room for 255 bytes, 337 00:16:24,190 --> 00:16:26,482 you start to see patterns that we haven't even 338 00:16:26,482 --> 00:16:29,440 talked about yet because we're just using those two columns as the ones 339 00:16:29,440 --> 00:16:32,620 place, the sixteens place, and so forth. 340 00:16:32,620 --> 00:16:35,540 Capital or uppercase is fine. 341 00:16:35,540 --> 00:16:36,040 All right. 342 00:16:36,040 --> 00:16:39,820 So with that said, let's actually do things more technically interesting, 343 00:16:39,820 --> 00:16:42,220 like looking back at some code that we've already seen 344 00:16:42,220 --> 00:16:47,560 and seeing what we can actually glean from this newfound representation 345 00:16:47,560 --> 00:16:48,450 of memory locations.
346 00:16:48,450 --> 00:16:51,700 So I'm going to go over to VS Code here, where I've opened my terminal window, 347 00:16:51,700 --> 00:16:53,020 but no code file yet. 348 00:16:53,020 --> 00:16:56,470 And I'm going to go ahead and create a file called addresses.c 349 00:16:56,470 --> 00:17:01,270 because I want to start playing around now with the addresses of information 350 00:17:01,270 --> 00:17:02,390 in my computer. 351 00:17:02,390 --> 00:17:05,170 And to do this, let me do something super simple first. 352 00:17:05,170 --> 00:17:07,480 Let me include stdio.h. 353 00:17:07,480 --> 00:17:10,490 Let me do an int main void, no command line arguments. 354 00:17:10,490 --> 00:17:13,359 And then in here, let me do exactly the line of code we just saw. 355 00:17:13,359 --> 00:17:18,069 Declare an int called n, set it equal to a default value of 50. 356 00:17:18,069 --> 00:17:20,920 And just so that the program does something noteworthy, 357 00:17:20,920 --> 00:17:24,339 let's have it actually print out %i backslash n, 358 00:17:24,339 --> 00:17:25,960 and plug in that value of n. 359 00:17:25,960 --> 00:17:28,720 So this is, like, week one stuff, just creating a variable, 360 00:17:28,720 --> 00:17:32,030 and printing out its value, just to make sure that we're on the same page. 361 00:17:32,030 --> 00:17:35,800 So let me do make addresses in my terminal window, enter. 362 00:17:35,800 --> 00:17:38,140 And when I do ./addresses, no surprise. 363 00:17:38,140 --> 00:17:40,780 I should indeed see the number 50. 364 00:17:40,780 --> 00:17:45,200 But let's consider what that actually does inside of the computer 365 00:17:45,200 --> 00:17:48,640 now by flipping over, for instance, to this same line of code, 366 00:17:48,640 --> 00:17:50,505 and translating it into this same grid. 367 00:17:50,505 --> 00:17:52,630 So here's a grid of memory, and I don't necessarily 368 00:17:52,630 --> 00:17:54,370 know where in the computer's memory it's going to end up.
369 00:17:54,370 --> 00:17:56,140 So I'm picking spots arbitrarily. 370 00:17:56,140 --> 00:18:01,120 But I know that an int, typically, is four bytes on most systems. 371 00:18:01,120 --> 00:18:03,700 And so I've used one, two, three, four squares. 372 00:18:03,700 --> 00:18:06,610 And the first four that I assume are available are down here, 373 00:18:06,610 --> 00:18:09,220 and I'm calling this n, and I'm putting the value 50 in it. 374 00:18:09,220 --> 00:18:13,900 So literally, when you write that line of code, int n equals 50 semicolon, 375 00:18:13,900 --> 00:18:16,960 the computer's doing something like this underneath the hood. 376 00:18:16,960 --> 00:18:20,630 Might be over here, might be over there, but I've drawn it simply down there. 377 00:18:20,630 --> 00:18:24,760 But that means that that 50 and that variable n, 378 00:18:24,760 --> 00:18:28,790 in particular, live somewhere in the computer's memory. 379 00:18:28,790 --> 00:18:30,187 And where might it live? 380 00:18:30,187 --> 00:18:31,270 Well, I don't really know. 381 00:18:31,270 --> 00:18:34,850 And frankly, I'm not going to care, ultimately, after today. 382 00:18:34,850 --> 00:18:39,220 But let me propose that, if all of these bytes are numbered from zero on down, 383 00:18:39,220 --> 00:18:42,823 maybe this is address 0x123, for the sake of discussion. 384 00:18:42,823 --> 00:18:44,740 So it's a hexadecimal number, one, two, three. 385 00:18:44,740 --> 00:18:46,240 It's not 123. 386 00:18:46,240 --> 00:18:48,340 It's one, two, three, but in hexadecimal, 387 00:18:48,340 --> 00:18:50,140 just because it's a little easy to say. 388 00:18:50,140 --> 00:18:55,510 But that variable n clearly must live at some address. 389 00:18:55,510 --> 00:18:57,860 So can we maybe see this?
390 00:18:57,860 --> 00:19:02,710 Well, it turns out that in C, there is a bit more syntax we can introduce today 391 00:19:02,710 --> 00:19:08,770 that actually gives you access to the locations of variables 392 00:19:08,770 --> 00:19:10,810 inside of the computer's memory. 393 00:19:10,810 --> 00:19:13,000 The first of these is literally an ampersand, 394 00:19:13,000 --> 00:19:15,670 and you might pronounce that the address of operator. 395 00:19:15,670 --> 00:19:18,250 Using a single ampersand, you can actually ask the computer 396 00:19:18,250 --> 00:19:19,990 at what address is this variable. 397 00:19:19,990 --> 00:19:22,330 And then the asterisk here might be known 398 00:19:22,330 --> 00:19:25,720 as the dereference operator, which allows you to take an address 399 00:19:25,720 --> 00:19:28,720 and go to it, kind of like following a map. 400 00:19:28,720 --> 00:19:29,620 X marks the spot. 401 00:19:29,620 --> 00:19:32,630 The star will take you to that location in memory, 402 00:19:32,630 --> 00:19:34,520 so you can see what's actually there. 403 00:19:34,520 --> 00:19:35,720 So what do I mean by that? 404 00:19:35,720 --> 00:19:38,770 Well, let me go back over to VS Code here, and let me go ahead 405 00:19:38,770 --> 00:19:43,070 and change my program to be ever so slightly different, as follows. 406 00:19:43,070 --> 00:19:46,870 I'm going to still declare n, just as before, to have the value of 50. 407 00:19:46,870 --> 00:19:49,630 But instead of printing out an integer, per se, 408 00:19:49,630 --> 00:19:51,790 I'm going to print out an address. 409 00:19:51,790 --> 00:19:56,380 And it turns out the format code for that, using printf, is %p. 410 00:19:56,380 --> 00:20:00,130 And if I want to print out now the address of n, 411 00:20:00,130 --> 00:20:02,950 recall that I have these two new capabilities, the first of which 412 00:20:02,950 --> 00:20:03,670 is germane. 413 00:20:03,670 --> 00:20:07,120 The ampersand will get me the address of n. 
414 00:20:07,120 --> 00:20:09,970 So let me go back now to VS Code, and let me make a change, 415 00:20:09,970 --> 00:20:13,150 whereby I'm going to change the %i to %p, 416 00:20:13,150 --> 00:20:16,720 which is going to show me an address, as opposed to an integer, per se. 417 00:20:16,720 --> 00:20:20,960 But I need to tell printf what address to show, so I don't want to print out n 418 00:20:20,960 --> 00:20:22,700 because that's literally the number 50. 419 00:20:22,700 --> 00:20:26,640 I want to print out the address of n, like, where is it in memory. 420 00:20:26,640 --> 00:20:28,940 So here I prefix it with an ampersand. 421 00:20:28,940 --> 00:20:32,990 And now if I go back into my terminal window, make addresses again, 422 00:20:32,990 --> 00:20:34,520 dot slash addresses. 423 00:20:34,520 --> 00:20:37,953 I'm not going to get as lucky as seeing 0x123, probably, 424 00:20:37,953 --> 00:20:40,370 because I got even more memory than that in this computer. 425 00:20:40,370 --> 00:20:44,300 But when I hit enter, I do indeed see 0x something. 426 00:20:44,300 --> 00:20:46,940 And if I zoom in here, enhance, if you will, 427 00:20:46,940 --> 00:20:51,090 it happens to be at this moment in time, on this server, 0x7FFC3A7CFFBC. 428 00:20:51,090 --> 00:20:54,980 429 00:20:54,980 --> 00:20:56,180 So it's a big address. 430 00:20:56,180 --> 00:20:59,030 That's a really big number if we actually did all of the math. 431 00:20:59,030 --> 00:21:00,290 But who really cares? 432 00:21:00,290 --> 00:21:04,430 Just the fact that it exists somewhere is the only point for now. 433 00:21:04,430 --> 00:21:10,670 So this %p symbol that we're passing into printf as a format code is 434 00:21:10,670 --> 00:21:15,350 leveraging the fact that C supports what are known as pointers.
435 00:21:15,350 --> 00:21:20,000 So a pointer is really just an address, the address of some variable 436 00:21:20,000 --> 00:21:24,840 that you can even store in another variable called itself a pointer. 437 00:21:24,840 --> 00:21:26,130 So what do I mean by this? 438 00:21:26,130 --> 00:21:29,750 Well, if a pointer is an address, we can start to tinker with this same idea 439 00:21:29,750 --> 00:21:30,390 as follows. 440 00:21:30,390 --> 00:21:33,590 Let me actually go back to VS Code once more 441 00:21:33,590 --> 00:21:35,940 and play around with syntax like this. 442 00:21:35,940 --> 00:21:39,890 So let me still declare a variable called n and set it equal to 50. 443 00:21:39,890 --> 00:21:43,010 But let's actually create an actual pointer, 444 00:21:43,010 --> 00:21:47,000 a variable whose purpose in life is not to store a boring number like 50, 445 00:21:47,000 --> 00:21:49,380 but the address of some value. 446 00:21:49,380 --> 00:21:51,920 And so the syntax for that is admittedly weird. 447 00:21:51,920 --> 00:21:55,850 If you want p to be a pointer, a variable that stores an address, 448 00:21:55,850 --> 00:21:59,838 you literally say int star for reasons we'll sort of see. 449 00:21:59,838 --> 00:22:02,630 And this is different from the star I mentioned earlier for reasons 450 00:22:02,630 --> 00:22:03,530 we'll also see soon. 451 00:22:03,530 --> 00:22:06,710 But int star p means, hey compiler, give me 452 00:22:06,710 --> 00:22:12,262 a variable called p, inside of which I can store the address of an integer. 453 00:22:12,262 --> 00:22:13,970 What address do you want to put in there? 454 00:22:13,970 --> 00:22:16,640 Well, now I can borrow that same syntax from a moment ago. 455 00:22:16,640 --> 00:22:19,970 I can use ampersand n, which is going to say, hey compiler, give me-- or hey 456 00:22:19,970 --> 00:22:23,690 computer, give me the address of n itself. 457 00:22:23,690 --> 00:22:25,580 Previously, I didn't bother with a variable. 
458 00:22:25,580 --> 00:22:28,760 I just sent the address of n into printf directly. 459 00:22:28,760 --> 00:22:30,660 But I can now play with it as follows. 460 00:22:30,660 --> 00:22:32,240 Let me go back to VS Code here. 461 00:22:32,240 --> 00:22:33,770 I'll clear my terminal window. 462 00:22:33,770 --> 00:22:36,510 And let's just play around with two variables. 463 00:22:36,510 --> 00:22:41,060 So int star p-- so it's an asterisk, but most people would say star-- 464 00:22:41,060 --> 00:22:43,760 equals the address of n. 465 00:22:43,760 --> 00:22:47,270 And now, I can just tweak line seven ever so slightly. 466 00:22:47,270 --> 00:22:50,780 Instead of printing out in duplicate ampersand n, 467 00:22:50,780 --> 00:22:53,630 I can literally just pass in p for pointer. 468 00:22:53,630 --> 00:22:56,930 So I've not done anything really that interesting, other than add a variable, 469 00:22:56,930 --> 00:22:58,940 but just to show you the syntax via which 470 00:22:58,940 --> 00:23:01,400 you can create a variable whose purpose in life 471 00:23:01,400 --> 00:23:03,170 is to store one of these addresses. 472 00:23:03,170 --> 00:23:08,180 So let me go ahead and now and do make addresses once more. 473 00:23:08,180 --> 00:23:09,320 Dot slash addresses. 474 00:23:09,320 --> 00:23:13,760 And we should see, indeed, pretty much the same idea, the address 475 00:23:13,760 --> 00:23:19,850 at which n happens to be, now that I've recompiled and actually run my code. 476 00:23:19,850 --> 00:23:22,550 But it gets a little more interesting than that. 477 00:23:22,550 --> 00:23:26,000 I can do one more thing when it comes to my computer's memory. 478 00:23:26,000 --> 00:23:29,840 In VS Code here, let me clear my terminal again, and let me see 479 00:23:29,840 --> 00:23:32,840 if I can perhaps reverse this process. 
480 00:23:32,840 --> 00:23:38,120 If n is 50, and p is storing the address of n, 481 00:23:38,120 --> 00:23:43,880 wouldn't it be interesting if I could somehow express, go to the address of n 482 00:23:43,880 --> 00:23:45,500 and tell me what is there. 483 00:23:45,500 --> 00:23:48,770 So to do that, I'm just kind of undoing all of the intellectual work 484 00:23:48,770 --> 00:23:49,470 I'm doing here. 485 00:23:49,470 --> 00:23:53,480 But if I want to print out an integer at some location, I can go back to %i, 486 00:23:53,480 --> 00:23:55,400 just print an integer as always. 487 00:23:55,400 --> 00:23:58,670 But p now is storing the address of someplace. 488 00:23:58,670 --> 00:24:00,600 It is the treasure map, so to speak. 489 00:24:00,600 --> 00:24:02,840 So if I want to go where X marks the spot, 490 00:24:02,840 --> 00:24:07,490 the syntax for that I claimed a moment ago is star p. 491 00:24:07,490 --> 00:24:09,980 So star p means go to that address. 492 00:24:09,980 --> 00:24:12,230 Don't print the address, go to that address, 493 00:24:12,230 --> 00:24:15,450 and show me what's inside of the computer's memory there. 494 00:24:15,450 --> 00:24:18,410 So now, if I go into my terminal and do make addresses, 495 00:24:18,410 --> 00:24:22,580 and do dot slash addresses, what should I see on the screen when I hit enter? 496 00:24:22,580 --> 00:24:23,630 50. 497 00:24:23,630 --> 00:24:26,060 So I indeed see now 50. 498 00:24:26,060 --> 00:24:29,600 Now, here's where it's an unfortunate choice of syntax 499 00:24:29,600 --> 00:24:32,120 from the authors of C decades ago. 500 00:24:32,120 --> 00:24:34,713 Clearly, I'm using star in two different locations. 501 00:24:34,713 --> 00:24:37,130 And suffice it to say, it doesn't represent multiplication 502 00:24:37,130 --> 00:24:37,880 in either of them. 503 00:24:37,880 --> 00:24:40,190 It's being used to represent addresses somehow. 
504 00:24:40,190 --> 00:24:44,750 When, on line six, I specify a data type like int, 505 00:24:44,750 --> 00:24:47,330 and then I have a star, and then the name of a variable, 506 00:24:47,330 --> 00:24:51,470 that is the syntax for declaring a pointer, for declaring 507 00:24:51,470 --> 00:24:53,690 a variable that will store an address. 508 00:24:53,690 --> 00:24:54,530 What address? 509 00:24:54,530 --> 00:24:57,440 Well, ampersand n, whatever that is, 0x something. 510 00:24:57,440 --> 00:25:03,590 When you do a star and then the name of a pointer without specifying a type, 511 00:25:03,590 --> 00:25:05,670 this just means, go there. 512 00:25:05,670 --> 00:25:09,530 So the star clearly is related to addresses. 513 00:25:09,530 --> 00:25:11,322 It's unfortunate that it's the same symbol. 514 00:25:11,322 --> 00:25:13,947 It would have been nice if they picked maybe a different symbol 515 00:25:13,947 --> 00:25:14,660 of punctuation. 516 00:25:14,660 --> 00:25:17,090 But they mean slightly different things in that context. 517 00:25:17,090 --> 00:25:19,460 On line six, we're declaring the pointer, 518 00:25:19,460 --> 00:25:23,910 declaring a variable called p that's going to point to an integer's location. 519 00:25:23,910 --> 00:25:28,230 But when I say star p, that means go to that actual location. 520 00:25:28,230 --> 00:25:32,430 So just try to keep that in mind, even though it's ever so slightly subtly 521 00:25:32,430 --> 00:25:33,070 different. 522 00:25:33,070 --> 00:25:36,060 So what's going on then inside of the computer's actual memory? 523 00:25:36,060 --> 00:25:38,320 Well, let's consider that in pictorial form again. 524 00:25:38,320 --> 00:25:41,370 So even though I've written the pointer in this way, 525 00:25:41,370 --> 00:25:45,740 int then a space, then star p equals ampersand n semicolon, that 526 00:25:45,740 --> 00:25:46,740 is the conventional way. 527 00:25:46,740 --> 00:25:49,157 That's how you'll see it on most websites, most textbooks.
528 00:25:49,157 --> 00:25:51,990 Technically speaking, I will admit that it might actually 529 00:25:51,990 --> 00:25:54,630 be easier to understand if you actually move 530 00:25:54,630 --> 00:25:57,990 the asterisk a little to the left, because this makes, visually, 531 00:25:57,990 --> 00:26:03,570 I think, it even more clear that int star is the type of the variable p, as 532 00:26:03,570 --> 00:26:06,970 opposed to the star being somehow attached to the variable name itself. 533 00:26:06,970 --> 00:26:10,050 However, you might also see it written with a space on either side, which 534 00:26:10,050 --> 00:26:11,467 I don't think really helps anyone. 535 00:26:11,467 --> 00:26:14,640 But the point is that white space does not matter in this context. 536 00:26:14,640 --> 00:26:17,880 And the conventional way is to do it by prefixing 537 00:26:17,880 --> 00:26:19,320 the variable's name with the star. 538 00:26:19,320 --> 00:26:21,390 And this avoids getting into trouble when you 539 00:26:21,390 --> 00:26:23,470 declare multiple variables at a time. 540 00:26:23,470 --> 00:26:26,970 But if it helps you to think about it, you can think of int star 541 00:26:26,970 --> 00:26:28,410 as being the type. 542 00:26:28,410 --> 00:26:30,970 It's not just an int, per se. 543 00:26:30,970 --> 00:26:34,890 So with that said, let's consider now the canvas of the computer's memory inside 544 00:26:34,890 --> 00:26:36,930 of which we're storing n, and now, p. 545 00:26:36,930 --> 00:26:39,840 So previously, I proposed that n is maybe, yeah, 546 00:26:39,840 --> 00:26:42,670 maybe it's down in the bottom right hand corner of the screen. 547 00:26:42,670 --> 00:26:45,660 So n is storing the number 50 here. 548 00:26:45,660 --> 00:26:47,460 But technically, n lives somewhere. 549 00:26:47,460 --> 00:26:51,720 And for simplicity, I'm going to claim it's at 0x123, rather than the bigger 550 00:26:51,720 --> 00:26:53,290 actual address we just saw.
551 00:26:53,290 --> 00:26:54,660 But what about p? 552 00:26:54,660 --> 00:26:58,530 Well, p itself is another variable that I declared separately, 553 00:26:58,530 --> 00:27:01,090 so it's got to live somewhere in the computer's memory. 554 00:27:01,090 --> 00:27:05,460 And it turns out, by convention, pointers take up more space. 555 00:27:05,460 --> 00:27:09,150 They typically use eight bytes nowadays, rather than just four. 556 00:27:09,150 --> 00:27:09,720 Why is that? 557 00:27:09,720 --> 00:27:11,910 Well, if you've got eight bytes, you can count even higher. 558 00:27:11,910 --> 00:27:13,050 You can have even more addresses. 559 00:27:13,050 --> 00:27:15,660 You can have more memory in your Mac, your PC, and phone. 560 00:27:15,660 --> 00:27:16,630 That's a good thing. 561 00:27:16,630 --> 00:27:18,870 So pointers tend to be eight bytes, which 562 00:27:18,870 --> 00:27:21,900 is why I've used eight squares on the screen here. 563 00:27:21,900 --> 00:27:25,050 But what is p actually storing? 564 00:27:25,050 --> 00:27:27,300 Well, it's just storing a number. 565 00:27:27,300 --> 00:27:29,610 Yes, it's technically an integer, but that integer 566 00:27:29,610 --> 00:27:34,900 itself should be thought of as the address of some other value. 567 00:27:34,900 --> 00:27:39,380 So n is down here at 0x123. p is up here at who knows what address. 568 00:27:39,380 --> 00:27:41,130 Doesn't matter for the sake of discussion, 569 00:27:41,130 --> 00:27:46,050 but its value, what it's storing with its pattern of 64 bits 570 00:27:46,050 --> 00:27:49,200 is apparently 0x123. 571 00:27:49,200 --> 00:27:51,070 So how does this help us? 572 00:27:51,070 --> 00:27:54,032 Well, if you think about this a little more abstractly, who 573 00:27:54,032 --> 00:27:56,490 cares about what else is going on in the computer's memory?
574 00:27:56,490 --> 00:27:59,317 It actually tends to be helpful to think about this pictorially 575 00:27:59,317 --> 00:28:00,900 as being a little something like this. 576 00:28:00,900 --> 00:28:02,700 At the end of the day, you and I, even when 577 00:28:02,700 --> 00:28:05,760 we start writing code in C that uses pointers, 578 00:28:05,760 --> 00:28:09,270 generally, you and I are never going to care about the actual addresses. 579 00:28:09,270 --> 00:28:12,000 Even though I showed you 0x7 something, that's 580 00:28:12,000 --> 00:28:13,930 not generally useful information. 581 00:28:13,930 --> 00:28:16,500 It suffices to know that it exists somewhere, 582 00:28:16,500 --> 00:28:19,030 and let the computer figure out how to get there. 583 00:28:19,030 --> 00:28:22,980 And so very often when talking about pointers and addresses more generally, 584 00:28:22,980 --> 00:28:25,275 people actually abstract them away, so to speak. 585 00:28:25,275 --> 00:28:27,150 So instead of literally writing on the screen 586 00:28:27,150 --> 00:28:30,990 or the whiteboard when discussing this, 0x123, what the actual address is, 587 00:28:30,990 --> 00:28:32,130 who cares what it is? 588 00:28:32,130 --> 00:28:36,630 It suffices that it's a value that leads me to the other value 589 00:28:36,630 --> 00:28:41,320 that I care about, sort of the treasure map, as I described it earlier. 590 00:28:41,320 --> 00:28:45,360 So let's now connect this maybe a little more metaphorically. 591 00:28:45,360 --> 00:28:47,220 So Carter, maybe here you might have noticed 592 00:28:47,220 --> 00:28:50,670 that we've had for a while now these two mailboxes on the stage. 593 00:28:50,670 --> 00:28:55,590 So this white one here is labeled p to represent our pointer variable. 594 00:28:55,590 --> 00:28:59,490 Carter's is labeled n, representing our actual integer.
595 00:28:59,490 --> 00:29:01,350 And what's really kind of going on here is 596 00:29:01,350 --> 00:29:06,450 that, if I were to access the value inside of p, 597 00:29:06,450 --> 00:29:10,950 much like we saw it up here, that's like opening this up and figuring out 598 00:29:10,950 --> 00:29:12,300 what the actual value is. 599 00:29:12,300 --> 00:29:13,870 Now, this itself is a little arcane. 600 00:29:13,870 --> 00:29:14,790 0x123. 601 00:29:14,790 --> 00:29:17,370 And so if we actually do this a little more metaphorically, 602 00:29:17,370 --> 00:29:24,690 we can maybe do this and point our way, if you don't mind. 603 00:29:24,690 --> 00:29:30,082 So here we have a big pointer. 604 00:29:30,082 --> 00:29:31,137 Oh, forgive me. 605 00:29:31,137 --> 00:29:32,470 I guess we'll use this one here. 606 00:29:32,470 --> 00:29:33,100 OK. 607 00:29:33,100 --> 00:29:36,490 So we have this big pointer that's essentially 608 00:29:36,490 --> 00:29:39,460 pointing at the location in memory that we care about, be it 0x123, 609 00:29:39,460 --> 00:29:40,210 or something else. 610 00:29:40,210 --> 00:29:43,120 And then if we dereference this, that is, use the star notation, 611 00:29:43,120 --> 00:29:46,870 star p, that's like asking Carter to go to that location, 612 00:29:46,870 --> 00:29:48,850 open up the mailbox, and voila. 613 00:29:48,850 --> 00:29:50,620 What value do you have there? 614 00:29:50,620 --> 00:29:51,250 Voila. 615 00:29:51,250 --> 00:29:53,950 Maybe a big round of applause for Carter for having 616 00:29:53,950 --> 00:29:57,670 practiced this beforehand with me. 617 00:29:57,670 --> 00:29:58,318 All right. 618 00:29:58,318 --> 00:30:00,860 That was mostly just an excuse to use the foam fingers today.
619 00:30:00,860 --> 00:30:03,580 But with that said, that's hopefully a helpful metaphor, 620 00:30:03,580 --> 00:30:06,820 honestly, because these pointers, these addresses actually 621 00:30:06,820 --> 00:30:08,828 tend to be among the more arcane topics in C 622 00:30:08,828 --> 00:30:11,620 that, even if things are kind of clicking right now, as soon as you 623 00:30:11,620 --> 00:30:13,780 start writing code involving addresses, it's 624 00:30:13,780 --> 00:30:16,090 easy to get lost in some of the details. 625 00:30:16,090 --> 00:30:18,220 But metaphorically, these mailboxes are meant 626 00:30:18,220 --> 00:30:20,110 to represent, really, what's going on. 627 00:30:20,110 --> 00:30:22,990 Mailboxes in the physical human world have addresses. 628 00:30:22,990 --> 00:30:26,710 I can go to that address, open it up, and then I can go to another address 629 00:30:26,710 --> 00:30:30,230 by following that treasure map, if you will, or pictorially here, 630 00:30:30,230 --> 00:30:33,840 the arrow that's pointing from one location to another. 631 00:30:33,840 --> 00:30:36,890 So even though it's very weird syntax with ampersands, and asterisks, 632 00:30:36,890 --> 00:30:39,920 and the like, it's just addresses in memory, 633 00:30:39,920 --> 00:30:42,780 much like mailboxes in the real world. 634 00:30:42,780 --> 00:30:46,940 So with that said, let's maybe begin to take off certain training 635 00:30:46,940 --> 00:30:51,030 wheels by revisiting what strings are, as we've been using them thus far. 636 00:30:51,030 --> 00:30:55,100 So here's a line of code in C that we've been using since week one, 637 00:30:55,100 --> 00:30:59,210 really, where I declare a string variable called s, and set it equal to, 638 00:30:59,210 --> 00:31:00,500 quote unquote, hi. 639 00:31:00,500 --> 00:31:05,240 Now, technically "hi" is three letters, or two letters and a punctuation symbol. 640 00:31:05,240 --> 00:31:10,130 But how many bytes is that string taking up?
641 00:31:10,130 --> 00:31:13,268 Is it one, two, three, or was it-- 642 00:31:13,268 --> 00:31:14,060 I'm seeing it here. 643 00:31:14,060 --> 00:31:14,510 It's four. 644 00:31:14,510 --> 00:31:15,010 Why? 645 00:31:15,010 --> 00:31:17,440 646 00:31:17,440 --> 00:31:20,660 Yeah, there's always a null character that, even though you don't see it 647 00:31:20,660 --> 00:31:25,440 on the screen, that is what terminates every string, we claimed, a while back. 648 00:31:25,440 --> 00:31:27,368 So if I were to draw this maybe "hi" ends up 649 00:31:27,368 --> 00:31:29,910 in the computer's memory down here, bottom right hand corner. 650 00:31:29,910 --> 00:31:32,270 But it is indeed four bytes, not just three 651 00:31:32,270 --> 00:31:35,060 because, secretly, there's always been that null character, 652 00:31:35,060 --> 00:31:36,950 even though we as programmers don't often 653 00:31:36,950 --> 00:31:38,630 have to type it explicitly ourselves. 654 00:31:38,630 --> 00:31:40,500 That's what the double quotes do for us. 655 00:31:40,500 --> 00:31:42,770 It terminates the string with that null character. 656 00:31:42,770 --> 00:31:46,190 Now, recall from week two when we talked about arrays, 657 00:31:46,190 --> 00:31:48,710 we started playing around with strings as really 658 00:31:48,710 --> 00:31:50,340 just being arrays of characters. 659 00:31:50,340 --> 00:31:53,840 So we call them a string, but we could treat them as arrays of char, 660 00:31:53,840 --> 00:31:54,720 so to speak. 661 00:31:54,720 --> 00:31:58,670 So if the string was called s, s bracket zero would give us the first char, 662 00:31:58,670 --> 00:32:00,900 s bracket one the second, s bracket two the third. 
663 00:32:00,900 --> 00:32:03,410 And if you're really curious, s bracket three 664 00:32:03,410 --> 00:32:05,730 would give you the last hidden null character, 665 00:32:05,730 --> 00:32:08,150 which we saw on the screen as just a zero 666 00:32:08,150 --> 00:32:11,450 when we printed it out, while tinkering with some actual code. 667 00:32:11,450 --> 00:32:14,390 But technically, today, logically, it would 668 00:32:14,390 --> 00:32:17,300 seem that it's also true that H, I, exclamation 669 00:32:17,300 --> 00:32:20,570 point, and the null character must clearly live at some address. 670 00:32:20,570 --> 00:32:23,940 They must clearly live in their own mailbox, so to speak. 671 00:32:23,940 --> 00:32:28,610 So maybe, for the sake of discussion, this H today is at 0x123. 672 00:32:28,610 --> 00:32:33,840 But recall that arrays are characterized by contiguousness from left to right. 673 00:32:33,840 --> 00:32:41,600 So if H is at 0x123, it must be the case that I is at 0x124, the exclamation point is at 0x125, 674 00:32:41,600 --> 00:32:46,917 and the null character is at 0x126 because those are one byte apart. 675 00:32:46,917 --> 00:32:48,750 And I deliberately chose numbers here where, 676 00:32:48,750 --> 00:32:51,090 whether it's decimal or hexadecimal, it doesn't matter. 677 00:32:51,090 --> 00:32:53,970 These differ by just one byte themselves. 678 00:32:53,970 --> 00:32:55,760 So that's what implies that they're indeed 679 00:32:55,760 --> 00:32:57,740 adjacent, or contiguous in memory. 680 00:32:57,740 --> 00:32:59,000 But what is s then? 681 00:32:59,000 --> 00:33:02,540 When I declared s to be a string, what is 682 00:33:02,540 --> 00:33:04,640 it that's been going in s all of this time 683 00:33:04,640 --> 00:33:08,090 if, clearly, s is actually this thing here? 684 00:33:08,090 --> 00:33:11,060 Well, strings have kind of been a white lie for a few weeks 685 00:33:11,060 --> 00:33:16,130 because s itself, technically, is a pointer.
686 00:33:16,130 --> 00:33:18,720 s is the address of this string. 687 00:33:18,720 --> 00:33:20,810 So the string is somewhere in memory, but s 688 00:33:20,810 --> 00:33:23,870 itself is a separate variable that gives you 689 00:33:23,870 --> 00:33:26,820 a clue as to how to find all of those characters in memory. 690 00:33:26,820 --> 00:33:28,820 So if you had to guess just intuitively now, 691 00:33:28,820 --> 00:33:32,270 if this is the string actually in memory, that is, this 692 00:33:32,270 --> 00:33:35,180 is the array of chars in memory, what would logically 693 00:33:35,180 --> 00:33:38,702 make sense to put as the value of s? 694 00:33:38,702 --> 00:33:39,590 A pointer. 695 00:33:39,590 --> 00:33:40,430 Specifically? 696 00:33:40,430 --> 00:33:43,190 697 00:33:43,190 --> 00:33:44,270 A pointer to H. 698 00:33:44,270 --> 00:33:45,830 And how would I express that? 699 00:33:45,830 --> 00:33:46,940 What's the actual value? 700 00:33:46,940 --> 00:33:54,800 0x123 might very well suffice as the value here of s. 701 00:33:54,800 --> 00:33:57,150 Now, why might that be? 702 00:33:57,150 --> 00:33:59,780 Well, that essentially gives you enough information 703 00:33:59,780 --> 00:34:03,950 to find the beginning of the string, "hi" in this case. 704 00:34:03,950 --> 00:34:05,700 Now, you might think, well, wait a minute. 705 00:34:05,700 --> 00:34:07,910 How does it know about the second character and the third character? 706 00:34:07,910 --> 00:34:10,327 But now, if you kind of rewind in time, oh, wait a minute, 707 00:34:10,327 --> 00:34:14,420 maybe now the null character actually makes even more sense from week two. 708 00:34:14,420 --> 00:34:15,050 Why? 709 00:34:15,050 --> 00:34:18,139 Because if s is technically storing the location 710 00:34:18,139 --> 00:34:20,420 of the beginning of the string, someone's 711 00:34:20,420 --> 00:34:23,330 got to keep track of where the string ends, presumably.
712 00:34:23,330 --> 00:34:26,780 And that's effectively the string itself because humans decided decades ago, 713 00:34:26,780 --> 00:34:29,870 let's just null terminate every string with a special 714 00:34:29,870 --> 00:34:35,300 character, backslash zero, all zero bits, eight zero bits, specifically. 715 00:34:35,300 --> 00:34:36,929 And that's enough information. 716 00:34:36,929 --> 00:34:39,679 The sort of treasure map leads you to the beginning of the string, 717 00:34:39,679 --> 00:34:41,719 and then you can use a for loop, a while loop, 718 00:34:41,719 --> 00:34:44,880 or whatever to walk through the string, and that's what printf does. 719 00:34:44,880 --> 00:34:48,690 And you just stop as soon as you see that null character. 720 00:34:48,690 --> 00:34:52,444 So this then is what a string actually is. 721 00:34:52,444 --> 00:34:57,440 s is and has always been, since week one, a pointer, so to speak, 722 00:34:57,440 --> 00:35:01,710 that actually refers to the start of that array of characters. 723 00:35:01,710 --> 00:35:04,550 And frankly, again, who cares about the 0x123 specifics? 724 00:35:04,550 --> 00:35:07,550 We can abstract that away and actually just treat 725 00:35:07,550 --> 00:35:11,690 s as, literally, an arrow that points to the beginning of that string 726 00:35:11,690 --> 00:35:15,350 because it will be rare that we actually care about where this thing 727 00:35:15,350 --> 00:35:18,800 physically is in the computer's memory. 728 00:35:18,800 --> 00:35:23,390 Now, before we see this in code, any questions on this revelation? 729 00:35:23,390 --> 00:35:23,900 Yeah. 730 00:35:23,900 --> 00:35:27,530 AUDIENCE: Have pointers gotten larger as computer memories 731 00:35:27,530 --> 00:35:28,863 have increased over the decades? 732 00:35:28,863 --> 00:35:29,488 SPEAKER 1: Yes. 733 00:35:29,488 --> 00:35:32,840 Have pointers gotten larger as computers' memory has increased over the decades? 734 00:35:32,840 --> 00:35:33,650 Short answer, yes.
735 00:35:33,650 --> 00:35:36,530 Like, back in my day, we were limited to, like, two gigabytes 736 00:35:36,530 --> 00:35:37,760 of memory total. 737 00:35:37,760 --> 00:35:38,600 Well, why two? 738 00:35:38,600 --> 00:35:42,260 Well, if you had 32-bit memory, or if you used 32 bits 739 00:35:42,260 --> 00:35:45,650 to represent addresses, a.k.a. four bytes, as was conventional, 740 00:35:45,650 --> 00:35:48,650 you can count, recall, as high as 4 billion values. 741 00:35:48,650 --> 00:35:52,050 But generally, numbers are both negative and positive, so that halves it. 742 00:35:52,050 --> 00:35:54,890 So the reason decades ago, computers, PCs, Macs 743 00:35:54,890 --> 00:35:58,670 could have no more than two gigabytes of memory was because, literally, 744 00:35:58,670 --> 00:36:01,250 the integers being used, the pointers being used 745 00:36:01,250 --> 00:36:03,830 were only four bits, that is 32 bits. 746 00:36:03,830 --> 00:36:06,620 Sorry, four bytes, that is, 32 bits long. 747 00:36:06,620 --> 00:36:08,750 And so you literally could buy more memory. 748 00:36:08,750 --> 00:36:10,833 You could buy a third gigabyte, a fourth gigabyte, 749 00:36:10,833 --> 00:36:13,280 but you literally had no way mathematically 750 00:36:13,280 --> 00:36:16,530 to express all of those bigger locations. 751 00:36:16,530 --> 00:36:18,470 So it was effectively useless, in that case. 752 00:36:18,470 --> 00:36:21,680 In more modern times, computers tend now to use 64 bits, which 753 00:36:21,680 --> 00:36:23,220 allows you to count crazy high. 754 00:36:23,220 --> 00:36:27,140 And that's more than enough to address bigger chunks of memory. 755 00:36:27,140 --> 00:36:28,100 Really good question. 756 00:36:28,100 --> 00:36:32,257 Others on memory thus far? 757 00:36:32,257 --> 00:36:32,840 No, all right. 758 00:36:32,840 --> 00:36:36,810 Well, let's translate this a bit to code by going back over to VS Code here.
759 00:36:36,810 --> 00:36:41,390 And let me propose now that we revisit maybe a simpler string 760 00:36:41,390 --> 00:36:43,110 example, as opposed to these integers. 761 00:36:43,110 --> 00:36:46,430 So let me go ahead and throw away all of this integer related code. 762 00:36:46,430 --> 00:36:49,910 Let me go ahead and, for the moment, include cs50.h 763 00:36:49,910 --> 00:36:53,030 so that we have access to string and other things as in week one. 764 00:36:53,030 --> 00:36:56,240 And let me do a string s equals quote unquote "HI" in all caps. 765 00:36:56,240 --> 00:37:00,845 And let me do a simple safety check, %s backslash n, s, 766 00:37:00,845 --> 00:37:04,100 just to make sure everything works as it did in week one. 767 00:37:04,100 --> 00:37:07,310 So make addresses, dot slash addresses, and I should indeed 768 00:37:07,310 --> 00:37:09,320 see "HI" on the screen. 769 00:37:09,320 --> 00:37:12,680 Well, let's now kind of tinker with what's going on underneath the hood. 770 00:37:12,680 --> 00:37:17,100 And now, things can get a little more memory specific. 771 00:37:17,100 --> 00:37:19,610 So I'm still going to declare s as a string up here. 772 00:37:19,610 --> 00:37:20,360 But you know what? 773 00:37:20,360 --> 00:37:22,610 Instead of printing out the string itself, 774 00:37:22,610 --> 00:37:26,870 let me actually treat s as the pointer I claim it is. 775 00:37:26,870 --> 00:37:30,210 I claim a string is just an address, so I have this new syntax today, 776 00:37:30,210 --> 00:37:33,120 %p to print out pointers, to print out addresses. 777 00:37:33,120 --> 00:37:35,340 Let's see what s actually is. 778 00:37:35,340 --> 00:37:39,870 Let me do make addresses again, dot slash addresses, and there it is. 779 00:37:39,870 --> 00:37:50,110 It's not as simple as 0x123, but it is at location 0x55C670878004. 780 00:37:50,110 --> 00:37:50,610 All right. 781 00:37:50,610 --> 00:37:52,180 Who really cares, specifically?
782 00:37:52,180 --> 00:37:55,450 But if we poke around a bit more, things might make a bit more sense. 783 00:37:55,450 --> 00:37:56,100 Let's do this. 784 00:37:56,100 --> 00:38:01,740 Let's also print out the address using %p of, how about the very first 785 00:38:01,740 --> 00:38:02,805 character of s. 786 00:38:02,805 --> 00:38:06,720 So the very first character of s is known as s bracket zero. 787 00:38:06,720 --> 00:38:10,000 We did that in week two, treating a string as an array. 788 00:38:10,000 --> 00:38:13,110 But how do I get the address of a character? 789 00:38:13,110 --> 00:38:15,750 Well, I have our new symbol today, ampersand. 790 00:38:15,750 --> 00:38:18,750 So even though this looks like a mouthful, ampersand, s, square bracket, 791 00:38:18,750 --> 00:38:21,015 zero, square bracket, it's just two ideas combined. 792 00:38:21,015 --> 00:38:24,900 s bracket zero gives you the first character in the string, s. 793 00:38:24,900 --> 00:38:28,620 And adding an ampersand at the beginning says, tell me what that address is. 794 00:38:28,620 --> 00:38:34,260 So if I recompile this code, make addresses, dot slash addresses, 795 00:38:34,260 --> 00:38:37,680 even if you don't remember the value 0x whatever, 796 00:38:37,680 --> 00:38:43,510 what are we going to see on the screen at a higher level? 797 00:38:43,510 --> 00:38:46,780 Perhaps the same exact thing. 798 00:38:46,780 --> 00:38:47,320 Why? 799 00:38:47,320 --> 00:38:49,070 Well, s is just an address. 800 00:38:49,070 --> 00:38:50,120 But what does that mean? 801 00:38:50,120 --> 00:38:52,900 Well, it's just the address of its first character. 802 00:38:52,900 --> 00:38:56,030 And we saw that per our picture a moment ago. 803 00:38:56,030 --> 00:38:58,000 So can I see the contiguousness of this?
804 00:38:58,000 --> 00:39:00,400 Well, I'm going to resort to some copy paste just for time's sake, 805 00:39:00,400 --> 00:39:01,960 even though this is going to look a little silly, 806 00:39:01,960 --> 00:39:03,730 and I could certainly use a loop instead. 807 00:39:03,730 --> 00:39:07,420 But let me print out the second location, the third location, and heck, 808 00:39:07,420 --> 00:39:12,050 even the fourth location, whoops, the fourth location of that null character. 809 00:39:12,050 --> 00:39:15,850 If I now do make addresses again and dot slash addresses, and zoom in, 810 00:39:15,850 --> 00:39:18,580 I don't really care about what these are, specifically. 811 00:39:18,580 --> 00:39:22,560 But notice the first two are indeed the same because the first represents s. 812 00:39:22,560 --> 00:39:24,640 The second represents the first character of s, 813 00:39:24,640 --> 00:39:27,320 which now I reveal are exactly the same idea. 814 00:39:27,320 --> 00:39:30,580 And the next ones are literally just one byte 815 00:39:30,580 --> 00:39:34,490 away, ending in five, six, and seven, respectively. 816 00:39:34,490 --> 00:39:36,340 So again, the numbers in and of themselves 817 00:39:36,340 --> 00:39:38,950 are not useful, actionable information, but it 818 00:39:38,950 --> 00:39:44,240 does let us actually see what's going on underneath the hood. 819 00:39:44,240 --> 00:39:47,420 So just to rewind for a moment, let me actually go back to the original 820 00:39:47,420 --> 00:39:51,480 version, where I'm printing out the string itself, using %s. 821 00:39:51,480 --> 00:39:57,410 Let me remake addresses to make sure that, OK, it still prints out "HI". 822 00:39:57,410 --> 00:40:00,290 But what has been going on now all this time?
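The address-printing experiment just described can be sketched roughly as follows, using the loop mentioned in passing instead of copy and paste. This is a sketch under assumptions, not the code typed in lecture: the function names `print_addresses` and `offset_of_char` are hypothetical, and the string is assumed to be "HI!" (three characters plus the terminating null character, consistent with the fourth location mentioned above).

```c
#include <stddef.h>
#include <stdio.h>

// Distance in bytes from the start of the string to its i-th character.
ptrdiff_t offset_of_char(const char *s, size_t i)
{
    return &s[i] - s;
}

// Print the address stored in s, then the address of every character,
// including the terminating null character.
void print_addresses(const char *s)
{
    printf("%p\n", (void *) s);
    size_t i = 0;
    do
    {
        printf("%p\n", (void *) &s[i]);
    }
    while (s[i++] != '\0');
}
```

Calling `print_addresses("HI!")` from `main` prints five addresses: the first two are identical, and each later one is one byte past the one before it. The exact values differ from run to run and from machine to machine.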
823 00:40:00,290 --> 00:40:02,368 Well, let me go back to our simple line of code 824 00:40:02,368 --> 00:40:04,160 that we've been using since week one, which 825 00:40:04,160 --> 00:40:07,730 gave us a string called s, setting it equal to the value of "HI". 826 00:40:07,730 --> 00:40:12,380 Let me propose now that strings were indeed this white lie. 827 00:40:12,380 --> 00:40:14,750 And if I can unnecessarily dramatically say, 828 00:40:14,750 --> 00:40:18,470 here we take the training wheels off and reveal 829 00:40:18,470 --> 00:40:30,080 that, all this time, string, string is probably, actually, what, technically? 830 00:40:30,080 --> 00:40:32,180 Yeah. 831 00:40:32,180 --> 00:40:35,210 A char star. 832 00:40:35,210 --> 00:40:36,270 That was amazing. 833 00:40:36,270 --> 00:40:37,080 Thank you for that. 834 00:40:37,080 --> 00:40:39,100 So yeah. 835 00:40:39,100 --> 00:40:43,720 So it's a char star, which admittedly at first glance, 836 00:40:43,720 --> 00:40:46,870 just makes a simple idea look unnecessarily complicated. 837 00:40:46,870 --> 00:40:50,410 And that's why in week one, we indeed introduced these training wheels, 838 00:40:50,410 --> 00:40:55,720 whereby we, CS50, invented the datatype called string, just to kind of hide 839 00:40:55,720 --> 00:40:57,085 this lower level detail. 840 00:40:57,085 --> 00:40:58,960 If you will, string for us is an abstraction. 841 00:40:58,960 --> 00:41:02,170 Now, that is to say string is not a CS50 specific word. 842 00:41:02,170 --> 00:41:05,500 Every programmer in the world knows what a string is. 843 00:41:05,500 --> 00:41:07,690 It is a sequence of characters. 844 00:41:07,690 --> 00:41:09,190 It is an array of characters. 
845 00:41:09,190 --> 00:41:12,220 But in C, technically, decades ago when it was invented, 846 00:41:12,220 --> 00:41:16,150 they didn't think, they didn't decide to create an actual data type called 847 00:41:16,150 --> 00:41:19,570 string because, especially if they were among those more comfortable, 848 00:41:19,570 --> 00:41:23,470 char star is equivalent, and it achieves the exact same thing, 849 00:41:23,470 --> 00:41:26,800 even though at a glance, we didn't want to start week one with that lower level 850 00:41:26,800 --> 00:41:27,670 detail. 851 00:41:27,670 --> 00:41:28,990 Question here in front. 852 00:41:28,990 --> 00:41:33,020 853 00:41:33,020 --> 00:41:33,980 Sure. 854 00:41:33,980 --> 00:41:36,450 Can I clarify how the star makes it a string? 855 00:41:36,450 --> 00:41:39,240 So we've, up until now, been just calling it a string 856 00:41:39,240 --> 00:41:40,560 so that s is a string. 857 00:41:40,560 --> 00:41:42,930 And that's a sufficient mental model. 858 00:41:42,930 --> 00:41:44,900 But technically, what is a string? 859 00:41:44,900 --> 00:41:48,110 I claimed pictorially with my grid of memory 860 00:41:48,110 --> 00:41:50,670 that a string is really just an address. 861 00:41:50,670 --> 00:41:53,240 It's really just the address of its first character. 862 00:41:53,240 --> 00:41:57,410 I then tried to demonstrate as much in code by using percent p 863 00:41:57,410 --> 00:42:01,370 and showing you, literally, s is a value, like, 0x something. 864 00:42:01,370 --> 00:42:05,220 And literally, its first character is at that same address, 0x something. 865 00:42:05,220 --> 00:42:09,050 So here, when I claim that string has never really existed, 866 00:42:09,050 --> 00:42:13,250 except within the confines of CS50, technically, the data type of a string 867 00:42:13,250 --> 00:42:15,450 is best expressed as char star. 868 00:42:15,450 --> 00:42:15,950 Why?
869 00:42:15,950 --> 00:42:19,760 Well, a string clearly can't just be a char because a char, by definition, 870 00:42:19,760 --> 00:42:20,720 is a single character. 871 00:42:20,720 --> 00:42:23,460 A string, we already know, is a sequence of characters. 872 00:42:23,460 --> 00:42:25,670 But how can you represent a sequence of characters? 873 00:42:25,670 --> 00:42:29,480 You can call it a char star, which is a different data type that we're 874 00:42:29,480 --> 00:42:31,100 introducing today for the first time. 875 00:42:31,100 --> 00:42:34,320 And the star just means that s itself is not a char. 876 00:42:34,320 --> 00:42:38,400 The star means that s is the address of a char. 877 00:42:38,400 --> 00:42:42,700 And by convention, it's the address of the first char in a string. 878 00:42:42,700 --> 00:42:48,090 So with that said, if I go back to my actual VS Code over here, 879 00:42:48,090 --> 00:42:55,820 I can change, literally, string s to char star s. 880 00:42:55,820 --> 00:42:59,220 I can get rid of the CS50 library, our so-called training wheels, 881 00:42:59,220 --> 00:43:02,220 which has been the goal for the past few weeks, to put them on initially 882 00:43:02,220 --> 00:43:04,300 and then take them off quite quickly. 883 00:43:04,300 --> 00:43:08,613 So now this is the same program, and %s is still the same. s is still the same. 884 00:43:08,613 --> 00:43:10,030 Everything else is still the same. 885 00:43:10,030 --> 00:43:12,600 All I've done is change, quote unquote, string to, 886 00:43:12,600 --> 00:43:16,200 quote unquote, char star, which obviates the need for the CS50 library. 887 00:43:16,200 --> 00:43:18,870 And if I now do make addresses and dot slash addresses, 888 00:43:18,870 --> 00:43:21,570 "HI" behaves exactly as it would.
889 00:43:21,570 --> 00:43:25,470 So this is now raw native C code without any training wheels, 890 00:43:25,470 --> 00:43:30,840 without any CS50 scaffolding, that just uses these basic building 891 00:43:30,840 --> 00:43:32,310 blocks and primitives. 892 00:43:32,310 --> 00:43:33,455 Other questions on this? 893 00:43:33,455 --> 00:43:36,890 AUDIENCE: Could you please clarify why we don't use the and symbol for that 894 00:43:36,890 --> 00:43:38,310 s, as opposed to the other ones? 895 00:43:38,310 --> 00:43:39,180 SPEAKER 1: Correct. 896 00:43:39,180 --> 00:43:44,320 Why don't we use the ampersand symbol for this, though we did earlier? 897 00:43:44,320 --> 00:43:47,670 So in this case, there's no reason for an ampersand 898 00:43:47,670 --> 00:43:52,620 because the ampersand tells you what the address of a variable is. 899 00:43:52,620 --> 00:43:55,170 I'll concede that it probably would be a little more 900 00:43:55,170 --> 00:43:59,430 consistent for us to do this, which is maybe where your mind is going. 901 00:43:59,430 --> 00:44:02,880 Now, never mind the fact that looks even worse, I think, syntactically. 902 00:44:02,880 --> 00:44:06,930 It's a reasonable instinct, but it turns out 903 00:44:06,930 --> 00:44:09,150 that, too, is what the double quotes are doing for you. 904 00:44:09,150 --> 00:44:11,790 The C compiler, called Clang, is smart enough 905 00:44:11,790 --> 00:44:15,060 to realize that when it sees double quotes around a sequence of characters, 906 00:44:15,060 --> 00:44:20,550 it wants to put the address of that first char in the variable for you. 907 00:44:20,550 --> 00:44:23,850 But when we had a variable like n, which we created, 908 00:44:23,850 --> 00:44:26,020 you have to distinguish n from its address. 909 00:44:26,020 --> 00:44:28,620 So that's why we prefixed n with an ampersand. 910 00:44:28,620 --> 00:44:30,540 But the double quotes take care of it for you. 911 00:44:30,540 --> 00:44:36,240 Other questions on these here addresses?
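To make the answer above concrete, here is a small sketch contrasting the two cases. The function names are hypothetical and chosen only for illustration, and the literal "HI!" is assumed. An int variable needs an ampersand to produce its address, whereas a double-quoted literal already evaluates to the address of its first char.

```c
// n is a plain int, so & is needed to obtain its address.
int read_through_pointer(int n)
{
    int *p = &n; // p stores the address of n
    return *p;   // go to that address and read the int stored there
}

// A quoted literal already yields the address of its first char,
// so no & is needed when assigning it to a char *.
char first_char_of_literal(void)
{
    char *s = "HI!";
    return *s; // the char stored at that address, 'H'
}
```

Here `read_through_pointer(50)` returns 50 and `first_char_of_literal()` returns 'H'.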
912 00:44:36,240 --> 00:44:36,880 No? 913 00:44:36,880 --> 00:44:37,380 All right. 914 00:44:37,380 --> 00:44:43,530 Well, beyond that, let me propose that we tinker with one other idea 915 00:44:43,530 --> 00:44:46,170 to see how we actually invented this thing called a string. 916 00:44:46,170 --> 00:44:48,180 Well, I claim that string is just char star. 917 00:44:48,180 --> 00:44:50,430 You've actually seen this technique before. 918 00:44:50,430 --> 00:44:54,630 It was just a week ago that we tinkered with structures, custom data types 919 00:44:54,630 --> 00:44:55,890 to represent a person. 920 00:44:55,890 --> 00:44:59,530 And recall that we had a structure of a name and a number representing 921 00:44:59,530 --> 00:45:00,030 a person. 922 00:45:00,030 --> 00:45:03,030 But more importantly, we had this keyword typedef, 923 00:45:03,030 --> 00:45:05,940 which defines your own type to be whatever you want. 924 00:45:05,940 --> 00:45:08,160 Now, we used it a little more powerfully last time 925 00:45:08,160 --> 00:45:11,880 to actually represent a whole structure of a person, having a name, 926 00:45:11,880 --> 00:45:12,790 and having a number. 927 00:45:12,790 --> 00:45:15,623 But at the end of the day, we really just invented our own data type 928 00:45:15,623 --> 00:45:17,910 that we called, obviously, person. 929 00:45:17,910 --> 00:45:20,400 And that represented, indeed, this structure. 930 00:45:20,400 --> 00:45:23,380 But typedef was really the enabling element there. 931 00:45:23,380 --> 00:45:25,470 And so it turns out with typedef, you can create 932 00:45:25,470 --> 00:45:27,060 any number of data types of your own.
933 00:45:27,060 --> 00:45:28,860 For instance, if you just really can't get 934 00:45:28,860 --> 00:45:31,140 the hang of calling an integer an int, you 935 00:45:31,140 --> 00:45:34,230 can create your own data type called integer 936 00:45:34,230 --> 00:45:37,403 that itself is a synonym for int, because the way typedef works, 937 00:45:37,403 --> 00:45:39,570 even though this one's even simpler than the struct, 938 00:45:39,570 --> 00:45:42,310 is you can read it from right to left. 939 00:45:42,310 --> 00:45:47,662 This means give me a data type called integer that is actually an int. 940 00:45:47,662 --> 00:45:49,870 And that's the same thing that happened a moment ago. 941 00:45:49,870 --> 00:45:53,130 Give me a data type called person that is actually this whole structure. 942 00:45:53,130 --> 00:45:54,760 But an integer is even simpler. 943 00:45:54,760 --> 00:45:56,340 Now, most people wouldn't do this. 944 00:45:56,340 --> 00:45:59,820 This really doesn't create any intellectual enhancement of the data 945 00:45:59,820 --> 00:46:02,130 types, but you could do it if you really wanted. 946 00:46:02,130 --> 00:46:05,070 More commonly, and as you'll see this in code in the future, 947 00:46:05,070 --> 00:46:07,470 would be not just to typedef something like an integer. 948 00:46:07,470 --> 00:46:11,460 But it turns out, curiously, C has no data type for a byte. 949 00:46:11,460 --> 00:46:15,750 Like, there's no built in obvious way to represent eight bits 950 00:46:15,750 --> 00:46:17,830 and store whatever you want in them. 951 00:46:17,830 --> 00:46:21,720 However, you can use what's called a uint8_t, 952 00:46:21,720 --> 00:46:25,530 which is a data type that comes with C. And frankly, those more comfortable 953 00:46:25,530 --> 00:46:27,600 might simply use this data type once you sort of 954 00:46:27,600 --> 00:46:29,070 commit to memory that it exists.
955 00:46:29,070 --> 00:46:32,680 But honestly, for most of us, it's a lot more convenient to think of a byte 956 00:46:32,680 --> 00:46:34,210 as being its own data type. 957 00:46:34,210 --> 00:46:37,540 When you want to write code that manipulates one or two or more bytes, 958 00:46:37,540 --> 00:46:39,850 wouldn't it be nice to have a data type called byte? 959 00:46:39,850 --> 00:46:43,630 So it turns out that you can represent a byte, which is eight bits using 960 00:46:43,630 --> 00:46:45,830 an unsigned integer with 8 bits. 961 00:46:45,830 --> 00:46:49,900 And this is just a data type that's declared in some other C header file. 962 00:46:49,900 --> 00:46:52,870 But long story short, you'll see and use this before long. 963 00:46:52,870 --> 00:46:56,290 But it's just a synonym to make things a little more user friendly, 964 00:46:56,290 --> 00:46:58,885 like person, like string, like byte. 965 00:46:58,885 --> 00:47:02,860 So what is in the CS50 header file, among other things? 966 00:47:02,860 --> 00:47:04,750 Literally, this line of code. 967 00:47:04,750 --> 00:47:07,180 This is the single line of code that we deploy 968 00:47:07,180 --> 00:47:11,260 in week one onward that teaches Clang to think of the word string 969 00:47:11,260 --> 00:47:15,550 as being synonymous with char star, so that you all never have to type, 970 00:47:15,550 --> 00:47:20,470 or know, or think about char star until, wonderfully, today in week four, 971 00:47:20,470 --> 00:47:22,670 a couple of weeks later instead. 972 00:47:22,670 --> 00:47:24,340 So that's all we've been doing. 973 00:47:24,340 --> 00:47:27,430 That is the technical implementation of the training wheels. 974 00:47:27,430 --> 00:47:32,870 It's just using a custom data type in this way. 975 00:47:32,870 --> 00:47:37,970 So how about one other maybe pair of examples here with our addresses, 976 00:47:37,970 --> 00:47:40,340 such that we can tinker a little bit further? 
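The three synonyms discussed above can be written out as follows. This is a sketch: the `typedef char *string;` line is what the lecture says the CS50 header contains, `uint8_t` comes from the standard header `stdint.h`, and the helper `first_letter` is a hypothetical name added only so there is something to call.

```c
#include <stdint.h>

typedef int integer;  // "give me a type called integer that is actually an int"
typedef uint8_t byte; // eight bits, unsigned, under a friendlier name
typedef char *string; // the training wheels: string is a synonym for char *

// Works the same whether the parameter is written as string or as char *.
char first_letter(string s)
{
    return s[0];
}
```

With these in place, `string s = "HI!";` and `char *s = "HI!";` declare exactly the same kind of variable.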
977 00:47:40,340 --> 00:47:44,600 It turns out that, once everything in the world 978 00:47:44,600 --> 00:47:48,470 is addressable using these pointers, like using numeric addresses 979 00:47:48,470 --> 00:47:50,870 to represent where things are in memory, you can actually 980 00:47:50,870 --> 00:47:52,700 do something called pointer arithmetic. 981 00:47:52,700 --> 00:47:54,680 And here, too, we the programmers generally 982 00:47:54,680 --> 00:47:59,640 don't care what the specific values are, but we care that they do exist. 983 00:47:59,640 --> 00:48:02,600 And if they do exist, we can maybe do some arithmetic on them 984 00:48:02,600 --> 00:48:05,750 and add one to go to the next byte, add two to go to the next, next byte, 985 00:48:05,750 --> 00:48:08,400 add three to go to the next, next, next byte, and so forth. 986 00:48:08,400 --> 00:48:12,710 So pointer arithmetic literally refers to doing math on addresses. 987 00:48:12,710 --> 00:48:15,260 So how do we translate this into something actionable? 988 00:48:15,260 --> 00:48:18,410 Let me actually go back to VS Code here, and let 989 00:48:18,410 --> 00:48:21,210 me propose that we do something like the following. 990 00:48:21,210 --> 00:48:25,040 I'm going to throw away my first printf here. 991 00:48:25,040 --> 00:48:28,400 And I'm instead going to print out this string character by character, 992 00:48:28,400 --> 00:48:30,920 just like we did in week two. 993 00:48:30,920 --> 00:48:36,950 Let me go ahead and call printf, pass in %c for a single char, backslash n, 994 00:48:36,950 --> 00:48:40,610 comma, and now I want to print out the first character in s. 995 00:48:40,610 --> 00:48:45,250 Using array notation, what do I type to print the first character in s? 996 00:48:45,250 --> 00:48:48,250 997 00:48:48,250 --> 00:48:49,530 Yep, over here. 998 00:48:49,530 --> 00:48:50,430 s bracket zero. 999 00:48:50,430 --> 00:48:52,950 So s bracket zero gives me the first character in s.
1000 00:48:52,950 --> 00:48:55,320 And let me copy paste just for demonstration sake 1001 00:48:55,320 --> 00:48:58,200 here inside of my same curly braces, and print out 1002 00:48:58,200 --> 00:48:59,520 the second char, and the third. 1003 00:48:59,520 --> 00:49:00,840 And I don't care about the null character. 1004 00:49:00,840 --> 00:49:03,160 I just want to print the string itself for now. 1005 00:49:03,160 --> 00:49:07,095 So even though this is jumping through way more hoops than just using %s 1006 00:49:07,095 --> 00:49:10,530 and print the whole thing at once, it's again, just demonstrating how we can, 1007 00:49:10,530 --> 00:49:12,840 at a lower level, manipulate these strings. 1008 00:49:12,840 --> 00:49:15,270 So let me do make addresses, dot slash addresses. 1009 00:49:15,270 --> 00:49:18,090 And yet again, we see, somewhat stupidly, one 1010 00:49:18,090 --> 00:49:21,630 per line, H-I exclamation point. 1011 00:49:21,630 --> 00:49:23,940 I can, of course fix that by getting rid of this. 1012 00:49:23,940 --> 00:49:27,052 I can get rid of this, and I can leave the last backslash n. 1013 00:49:27,052 --> 00:49:29,010 So let's just make it look a little prettier. 1014 00:49:29,010 --> 00:49:32,070 Make addresses, dot slash addresses, enter, 1015 00:49:32,070 --> 00:49:34,320 and I can print it out all on one line. 1016 00:49:34,320 --> 00:49:37,140 But now using pointer notation, it turns out 1017 00:49:37,140 --> 00:49:40,230 we can do one other thing, which admittedly, for now, 1018 00:49:40,230 --> 00:49:42,450 is going to feel like unnecessary complexity. 1019 00:49:42,450 --> 00:49:46,420 But it's actually a really helpful tool to add to our toolkit, so to speak, 1020 00:49:46,420 --> 00:49:48,820 whereby I could instead do this. 1021 00:49:48,820 --> 00:49:54,370 To print out the first character in s, yes, I can treat it as an array 1022 00:49:54,370 --> 00:49:56,560 and get the zeroth index.
1023 00:49:56,560 --> 00:49:59,800 However, what is s? s is just the address of a string. 1024 00:49:59,800 --> 00:50:03,710 What does that mean? s is the address of the first char in the string. 1025 00:50:03,710 --> 00:50:09,560 So if I do star s, what's that going to print? 1026 00:50:09,560 --> 00:50:11,090 Presumably h, right? 1027 00:50:11,090 --> 00:50:18,410 Because if the first character in s is h, then star s will go to that address 1028 00:50:18,410 --> 00:50:20,090 and show me what's actually there. 1029 00:50:20,090 --> 00:50:21,930 And let me go ahead and do this again. 1030 00:50:21,930 --> 00:50:25,530 Let me copy paste twice and then tweak this a little bit. 1031 00:50:25,530 --> 00:50:27,920 I want to go to the next byte over. 1032 00:50:27,920 --> 00:50:29,780 Well, I could do s bracket one. 1033 00:50:29,780 --> 00:50:34,760 All right, but I could instead go to s plus one. 1034 00:50:34,760 --> 00:50:38,600 And I could instead go to s plus two, thereby 1035 00:50:38,600 --> 00:50:41,960 doing what we're calling pointer arithmetic, math on addresses. 1036 00:50:41,960 --> 00:50:47,000 And now, if I go ahead and rerun make addresses, dot slash addresses, voila. 1037 00:50:47,000 --> 00:50:48,710 Whoops, I forgot my backslash n. 1038 00:50:48,710 --> 00:50:51,080 Let's fix that just to be tidy. 1039 00:50:51,080 --> 00:50:53,690 Dot slash addresses, voila. 1040 00:50:53,690 --> 00:50:55,670 There is our "HI". 1041 00:50:55,670 --> 00:50:58,400 Now, this is not how a normal person would print out a string, 1042 00:50:58,400 --> 00:51:02,210 but it does go to show you that there's not really been any magic. 1043 00:51:02,210 --> 00:51:05,390 Like, these characters are just where we predicted they would be. 1044 00:51:05,390 --> 00:51:08,700 And now that you have this star notation, the dereference operator, 1045 00:51:08,700 --> 00:51:12,120 which means go there, you have the ability to access individual values. 
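The pointer-arithmetic version just described might look roughly like this. It is a sketch, not the exact file from lecture: the function names `nth_char` and `print_three` are hypothetical, and `print_three` assumes a string of at least three characters such as "HI!".

```c
#include <stddef.h>
#include <stdio.h>

// Same result as s[i], written with pointer arithmetic:
// start at address s, move i chars over, then go there with *.
char nth_char(const char *s, size_t i)
{
    return *(s + i);
}

// Print the first three characters on one line, then a newline.
// Assumes s has at least three characters before its null terminator.
void print_three(const char *s)
{
    printf("%c", *s);
    printf("%c", *(s + 1));
    printf("%c\n", *(s + 2));
}
```

Calling `print_three("HI!")` prints the three characters on a single line. The parentheses in `*(s + 1)` matter: `*s + 1` would instead add one to the char value stored at the first address.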
1046 00:51:12,120 --> 00:51:15,750 You even have the ability to ask where those things are by using ampersands, 1047 00:51:15,750 --> 00:51:16,480 as well. 1048 00:51:16,480 --> 00:51:20,820 But it turns out that the reason that we introduced the array syntax first 1049 00:51:20,820 --> 00:51:22,860 is that the array syntax is what the world would 1050 00:51:22,860 --> 00:51:26,040 call syntactic sugar for exactly this. 1051 00:51:26,040 --> 00:51:31,950 When you say s bracket zero, the compiler is essentially doing star s 1052 00:51:31,950 --> 00:51:33,300 and saving you the trouble. 1053 00:51:33,300 --> 00:51:37,170 When you do s bracket one, the compiler is essentially 1054 00:51:37,170 --> 00:51:41,100 saving you the trouble of doing star, in parentheses, s plus one, 1055 00:51:41,100 --> 00:51:43,750 and same for the third char, as well. 1056 00:51:43,750 --> 00:51:47,850 So all this time, pointers have been there underneath the hood. 1057 00:51:47,850 --> 00:51:50,760 They are what allow us to go to very specific memory locations. 1058 00:51:50,760 --> 00:51:54,400 They are going to be what allow us soon to start manipulating files, 1059 00:51:54,400 --> 00:51:57,908 whether it's photographs of stress balls, or CSI style content. 1060 00:51:57,908 --> 00:52:01,200 But for now, I think we should take our 10 minute break where whoopie pies will 1061 00:52:01,200 --> 00:52:02,640 now be served in the transept. 1062 00:52:02,640 --> 00:52:04,530 See you in 10. 1063 00:52:04,530 --> 00:52:05,270 All right. 1064 00:52:05,270 --> 00:52:08,520 So we are back, and we've clearly drawn too much attention to the stress balls 1065 00:52:08,520 --> 00:52:10,937 today because now we're all out of these and whoopie pies. 1066 00:52:10,937 --> 00:52:12,297 But more next week. 
1067 00:52:12,297 --> 00:52:14,130 In the meantime, though, we thought we'd now 1068 00:52:14,130 --> 00:52:17,940 use some of these new building blocks, this idea of being able to manipulate 1069 00:52:17,940 --> 00:52:20,610 underlying addresses, to revisit a couple of problems 1070 00:52:20,610 --> 00:52:23,580 that we kind of swept under the rug previously 1071 00:52:23,580 --> 00:52:26,120 by avoiding these problems altogether. 1072 00:52:26,120 --> 00:52:27,120 So by that, I mean this. 1073 00:52:27,120 --> 00:52:29,160 Let me go over to VS Code. 1074 00:52:29,160 --> 00:52:31,690 And let me create another example called compare.c, 1075 00:52:31,690 --> 00:52:34,440 whose purpose in life in a moment is going to be to compare values 1076 00:52:34,440 --> 00:52:36,600 in kind of a very week one way, too. 1077 00:52:36,600 --> 00:52:39,660 So let me go ahead and include CS50.h. 1078 00:52:39,660 --> 00:52:42,270 Let me go ahead and include standard io.h. 1079 00:52:42,270 --> 00:52:44,820 Let me do int main void, no command line arguments. 1080 00:52:44,820 --> 00:52:49,000 And in here, let me just get two integers using getint as follows. 1081 00:52:49,000 --> 00:52:51,300 So getint, and we'll ask for i. 1082 00:52:51,300 --> 00:52:54,420 Let's go ahead and get int and ask for j, 1083 00:52:54,420 --> 00:52:57,190 just so that we have two things to compare. 1084 00:52:57,190 --> 00:52:59,200 And then I'm going to do something super simple. 1085 00:52:59,200 --> 00:53:03,630 So if i equals, equals j, then let's print out, 1086 00:53:03,630 --> 00:53:07,020 as we actually did in the past, same backslash n. 1087 00:53:07,020 --> 00:53:10,430 Else, if they're not the same, let's of course, print out, for instance, 1088 00:53:10,430 --> 00:53:10,930 different. 1089 00:53:10,930 --> 00:53:13,890 So super simple program that we used the first time around, really, 1090 00:53:13,890 --> 00:53:15,430 just to demonstrate conditionals.
1091 00:53:15,430 --> 00:53:18,550 But now, we'll use it to tease apart some subtleties. 1092 00:53:18,550 --> 00:53:21,990 So let me go ahead and compile this with make compare. 1093 00:53:21,990 --> 00:53:23,730 Dot slash compare. 1094 00:53:23,730 --> 00:53:27,330 And we'll compare one and one for i and j respectively. 1095 00:53:27,330 --> 00:53:28,650 Those are, of course, the same. 1096 00:53:28,650 --> 00:53:30,357 Let's compare one and two. 1097 00:53:30,357 --> 00:53:31,690 Those are, of course, different. 1098 00:53:31,690 --> 00:53:34,560 So long story short, this program seems to work, 1099 00:53:34,560 --> 00:53:36,630 and we won't dwell much further on it. 1100 00:53:36,630 --> 00:53:39,630 But let's consider for a moment what's going on inside of the computer's 1101 00:53:39,630 --> 00:53:42,250 memory when that code is executed. 1102 00:53:42,250 --> 00:53:43,920 So here's my canvas of memory. 1103 00:53:43,920 --> 00:53:45,660 Maybe i ends up over here. 1104 00:53:45,660 --> 00:53:47,350 Maybe j ends up over here. 1105 00:53:47,350 --> 00:53:51,480 Each of them I've drawn as four squares because integers are typically 1106 00:53:51,480 --> 00:53:53,260 four bytes, or 32 bits. 1107 00:53:53,260 --> 00:53:56,520 So i has the value 50 here. j has the value 50. 1108 00:53:56,520 --> 00:53:59,220 So I accidentally typed one and one, but assume 1109 00:53:59,220 --> 00:54:00,940 that I had typed 50 in both cases. 1110 00:54:00,940 --> 00:54:03,340 They both live at these two separate locations. 1111 00:54:03,340 --> 00:54:03,840 All right. 1112 00:54:03,840 --> 00:54:05,800 So that's all fine and good. 1113 00:54:05,800 --> 00:54:09,960 And when we compare them, of course, 50 and 50, or one and one 1114 00:54:09,960 --> 00:54:12,120 are, in fact, the exact same. 1115 00:54:12,120 --> 00:54:16,290 But what if we actually compare different types of values? 1116 00:54:16,290 --> 00:54:18,210 Let me go back into VS Code here.
1117 00:54:18,210 --> 00:54:22,050 And instead of integers, let's still, using the CS50 library, 1118 00:54:22,050 --> 00:54:24,490 maybe use some strings instead. 1119 00:54:24,490 --> 00:54:29,380 So let me go ahead and change my i and j to maybe s and t, respectively. 1120 00:54:29,380 --> 00:54:31,710 So string s equals getstring. 1121 00:54:31,710 --> 00:54:34,440 And I'll ask for s, quote unquote. 1122 00:54:34,440 --> 00:54:37,260 And then string t equals getstring. 1123 00:54:37,260 --> 00:54:39,900 And then I'll ask for t, quote unquote. 1124 00:54:39,900 --> 00:54:44,700 And then down here, I'll compare s equals, equals t. 1125 00:54:44,700 --> 00:54:46,980 So here's the code, almost the same logically. 1126 00:54:46,980 --> 00:54:50,760 I'm just getting different data types instead, still using the CS50 library. 1127 00:54:50,760 --> 00:54:54,930 So let's do make compare again, dot slash compare. 1128 00:54:54,930 --> 00:54:58,140 And let's type in something like "HI" exclamation point, 1129 00:54:58,140 --> 00:54:59,760 "HI" exclamation point. 1130 00:54:59,760 --> 00:55:02,580 And that's interesting. 1131 00:55:02,580 --> 00:55:05,320 All right, let's maybe try it again. 1132 00:55:05,320 --> 00:55:08,040 So maybe lowercase "hi", "hi". 1133 00:55:08,040 --> 00:55:09,280 No, those are different. 1134 00:55:09,280 --> 00:55:12,190 Let's do it one more time, like "hi", "bye". 1135 00:55:12,190 --> 00:55:13,810 OK, so it half works. 1136 00:55:13,810 --> 00:55:17,210 But it seems to be saying different no matter what. 1137 00:55:17,210 --> 00:55:18,420 Well, why might that be? 1138 00:55:18,420 --> 00:55:20,170 Well, let me first just peel back a layer. 1139 00:55:20,170 --> 00:55:22,850 We already know that strings don't technically exist. 1140 00:55:22,850 --> 00:55:24,850 They're really char star. 1141 00:55:24,850 --> 00:55:28,120 And string here is char star. 
1142 00:55:28,120 --> 00:55:32,590 So does this reveal, perhaps implicitly, why 1143 00:55:32,590 --> 00:55:35,860 s and t are being thought to be different, even though I literally 1144 00:55:35,860 --> 00:55:37,105 typed "hi" twice? 1145 00:55:37,105 --> 00:55:40,290 1146 00:55:40,290 --> 00:55:43,830 Yeah, on line nine here, I'm really just comparing 1147 00:55:43,830 --> 00:55:48,300 the addresses that are in s and t, and that's why I changed it to char star, 1148 00:55:48,300 --> 00:55:51,330 just not to change anything, but to make it even clearer 1149 00:55:51,330 --> 00:55:53,520 that s and t are, in fact, addresses. 1150 00:55:53,520 --> 00:55:55,300 They're not strings, per se. 1151 00:55:55,300 --> 00:55:57,870 They're the address of the first character in those strings. 1152 00:55:57,870 --> 00:56:00,810 And even though they happen to be the same words that I typed in, 1153 00:56:00,810 --> 00:56:03,790 it would seem to imply that they're ending up in different places. 1154 00:56:03,790 --> 00:56:06,360 So here's another canvas of memory for this program. 1155 00:56:06,360 --> 00:56:08,850 And here, for instance, might be s with enough room 1156 00:56:08,850 --> 00:56:11,550 for eight bytes up here as a pointer. 1157 00:56:11,550 --> 00:56:14,790 Here maybe is where "hi" ended up for this particular story. 1158 00:56:14,790 --> 00:56:16,455 Well, what's actually going in s? 1159 00:56:16,455 --> 00:56:21,930 Well, if h is at 0x123, i is at 0x124, and so forth, what's going in 1160 00:56:21,930 --> 00:56:24,180 s is 0x123. 1161 00:56:24,180 --> 00:56:29,520 But when I use get string a second time and type in "hi" 1162 00:56:29,520 --> 00:56:33,090 exclamation point, even the exact same way, uppercase or lowercase, 1163 00:56:33,090 --> 00:56:35,710 t is ending up, presumably, somewhere else in memory. 1164 00:56:35,710 --> 00:56:37,830 So it's maybe using these eight bytes over here.
1165 00:56:37,830 --> 00:56:41,430 The same letters, coincidentally, by nature of how getstring works, 1166 00:56:41,430 --> 00:56:44,580 are ending up in the computer's memory, maybe down there, bottom right 1167 00:56:44,580 --> 00:56:45,270 hand corner. 1168 00:56:45,270 --> 00:56:50,370 Those are presumably different addresses, 0x456, 457, 458, 459. 1169 00:56:50,370 --> 00:56:54,720 So what's going to go in t as its value? 1170 00:56:54,720 --> 00:56:57,370 0x456, according to this example. 1171 00:56:57,370 --> 00:57:00,750 And so when you literally compare s equals, 1172 00:57:00,750 --> 00:57:04,660 equals t, no, they're not the same. 1173 00:57:04,660 --> 00:57:07,770 They are, in fact, different, even if what they're pointing at 1174 00:57:07,770 --> 00:57:08,887 happens to be the same. 1175 00:57:08,887 --> 00:57:10,470 So the computer's taking us literally. 1176 00:57:10,470 --> 00:57:12,790 If you compare s and t respectively, it's 1177 00:57:12,790 --> 00:57:14,790 going to compare what their values actually are. 1178 00:57:14,790 --> 00:57:17,490 And their values are the addresses of the first letter 1179 00:57:17,490 --> 00:57:20,860 of this string, and the first letter of this string, respectively. 1180 00:57:20,860 --> 00:57:23,640 And if those addresses differ, which they clearly do, 1181 00:57:23,640 --> 00:57:25,600 they're going to be deemed different. 1182 00:57:25,600 --> 00:57:28,350 Now, you might wonder, well, this just seems stupidly inefficient. 1183 00:57:28,350 --> 00:57:31,930 Why put the same string in two different places? 1184 00:57:31,930 --> 00:57:34,415 Well, maybe the string needs to be changed later on, 1185 00:57:34,415 --> 00:57:36,790 and we might want to have two different versions thereof. 1186 00:57:36,790 --> 00:57:40,683 And frankly, the first time you call getstring, it does its thing. 1187 00:57:40,683 --> 00:57:43,100 The second time you call getstring, it does its own thing.
1188 00:57:43,100 --> 00:57:46,433 It doesn't necessarily know how many times it's been called in the past. 1189 00:57:46,433 --> 00:57:48,850 And so maybe there's no communication between those calls. 1190 00:57:48,850 --> 00:57:50,860 And so surely, it's going to do the simple thing 1191 00:57:50,860 --> 00:57:54,160 and just create more memory, create more memory for each of those strings, 1192 00:57:54,160 --> 00:57:56,350 duplicates though they may seem to be. 1193 00:57:56,350 --> 00:57:58,150 So what does this imply? 1194 00:57:58,150 --> 00:58:01,540 Well, you might recall that we avoided this problem altogether 1195 00:58:01,540 --> 00:58:06,850 just a week ago by using what solution on line nine? 1196 00:58:06,850 --> 00:58:10,240 I did not compare two strings using equals, equals last time. 1197 00:58:10,240 --> 00:58:12,940 1198 00:58:12,940 --> 00:58:13,780 Exactly. 1199 00:58:13,780 --> 00:58:18,250 We used the strcmp function, which is in string.h, very deliberately 1200 00:58:18,250 --> 00:58:21,280 at the time, because I didn't want to trip over this mistake at the time 1201 00:58:21,280 --> 00:58:24,010 until we were sort of ready and had the vocabulary to discuss it. 1202 00:58:24,010 --> 00:58:28,090 But I did not do s equals, equals t, even though, logically, that's 1203 00:58:28,090 --> 00:58:30,330 what you're trying to do, compare for equality. 1204 00:58:30,330 --> 00:58:32,080 But if you know now what a string is, it's 1205 00:58:32,080 --> 00:58:34,480 an array of characters starting at some address. 1206 00:58:34,480 --> 00:58:37,630 You really need someone or something to do the heavy lifting 1207 00:58:37,630 --> 00:58:40,900 of comparing every one of those chars from left to right. 1208 00:58:40,900 --> 00:58:45,700 We did it ourselves last time by just implementing it in code two weeks ago. 1209 00:58:45,700 --> 00:58:48,350 But strcmp does it for us.
1210 00:58:48,350 --> 00:58:55,030 So strcmp, s comma t actually weirdly returns three possible values, 1211 00:58:55,030 --> 00:58:59,140 zero if they're the same, a positive number if one comes before the other, 1212 00:58:59,140 --> 00:59:01,370 or a negative number if the opposite is true. 1213 00:59:01,370 --> 00:59:04,420 So strcmp, remember, can be used for alphabetizing 1214 00:59:04,420 --> 00:59:08,780 things, or ASCII-betizing things, based on those ASCII values. 1215 00:59:08,780 --> 00:59:12,910 So this version, if I open my terminal window now and do make compare, 1216 00:59:12,910 --> 00:59:17,337 dot slash compare, and type in "hi" and "hi", now, in fact, 1217 00:59:17,337 --> 00:59:19,420 they're the same because strcmp is doing the work 1218 00:59:19,420 --> 00:59:20,920 of comparing them char by char. 1219 00:59:20,920 --> 00:59:25,280 And if I do "hi" and "bye", those are now, in fact, different. 1220 00:59:25,280 --> 00:59:28,990 So we avoided the problem last time for this very reason that simply using 1221 00:59:28,990 --> 00:59:30,790 equals, equals would not have worked. 1222 00:59:30,790 --> 00:59:31,290 Yes. 1223 00:59:31,290 --> 00:59:38,432 1224 00:59:38,432 --> 00:59:40,384 AUDIENCE: Using those values? 1225 00:59:40,384 --> 00:59:43,312 Is it like one minus one, or one, two, three, 1226 00:59:43,312 --> 00:59:44,992 depending how different they are? 1227 00:59:44,992 --> 00:59:46,200 SPEAKER 1: Oh, good question. 1228 00:59:46,200 --> 00:59:49,590 So when using strcmp, the documentation 1229 00:59:49,590 --> 00:59:53,807 says that it will return zero, or a positive number, or a negative number. 1230 00:59:53,807 --> 00:59:55,390 It doesn't tell you a specific number. 1231 00:59:55,390 --> 00:59:59,110 So the magnitude of the integer that comes back actually has no meaning. 1232 01:00:03,010 It might very well be one, zero, and negative one, but there's no guarantee.
1233 01:00:03,010 --> 01:00:06,510 And so you can check for equality equals, equals, 1234 01:00:06,510 --> 01:00:10,050 or you should check for greater than or less 1235 01:00:10,050 --> 01:00:12,340 than, but not specific to a certain number. 1236 01:00:12,340 --> 01:00:14,010 So it just gives you relative ordering. 1237 01:00:14,010 --> 01:00:17,190 It doesn't give you any more detail than that. 1238 01:00:17,190 --> 01:00:17,710 All right. 1239 01:00:17,710 --> 01:00:21,990 So if we were to now take this lesson a step further, 1240 01:00:21,990 --> 01:00:26,927 just to hammer home this point, whereby these strings s and t must clearly 1241 01:00:26,927 --> 01:00:29,760 live at different addresses, let's actually try to see this in code. 1242 01:00:29,760 --> 01:00:31,990 So let me go back to VS Code here. 1243 01:00:31,990 --> 01:00:34,590 Let me go ahead and just remove all of the conditional code, 1244 01:00:34,590 --> 01:00:38,910 and instead do something old school, like print out %s backslash n and print 1245 01:00:38,910 --> 01:00:39,525 out s. 1246 01:00:39,525 --> 01:00:43,390 Then let's go ahead and print out %s again, but print out t, 1247 01:00:43,390 --> 01:00:46,040 just to see the two strings as being duplicative. 1248 01:00:46,040 --> 01:00:46,720 So here I go. 1249 01:00:46,720 --> 01:00:50,652 Make compare dot slash compare, "hi" exclamation point, 1250 01:00:50,652 --> 01:00:51,610 "hi" exclamation point. 1251 01:00:51,610 --> 01:00:53,510 And of course, they're actually the same. 1252 01:00:53,510 --> 01:00:56,830 But if I actually want to see where s and t are, 1253 01:00:56,830 --> 01:00:59,650 I can change the %s to what? 1254 01:00:59,650 --> 01:01:02,200 %p, %p here. 1255 01:01:02,200 --> 01:01:05,020 And I don't need to use an ampersand before the s or the t 1256 01:01:05,020 --> 01:01:06,700 because they are already addresses. 1257 01:01:06,700 --> 01:01:08,480 That was today's big reveal.
1258 01:01:08,480 --> 01:01:12,050 And it turns out that printf is smart enough when you use %s, 1259 01:01:12,050 --> 01:01:18,280 and you give it the address in s, or the address in t, to just go there for you. 1260 01:01:18,280 --> 01:01:21,025 So printf has been doing all of that for us with %s. 1261 01:01:21,025 --> 01:01:24,070 But %p is actually going to print out those raw addresses. 1262 01:01:24,070 --> 01:01:28,330 So let me do make compare, dot slash compare, "hi" once, "hi" twice. 1263 01:01:28,330 --> 01:01:34,120 And here now, we should see the addresses at which "hi" lives. 1264 01:01:34,120 --> 01:01:37,870 And it's not going to be as simplistic as 0x123 and 0x456. 1265 01:01:37,870 --> 01:01:40,540 But if I go back to my terminal and hit enter, 1266 01:01:40,540 --> 01:01:43,480 indeed, I get two different hexadecimal values 1267 01:01:43,480 --> 01:01:46,870 that make clear that, if I were to naively compare them with equals, 1268 01:01:46,870 --> 01:01:49,450 equals, they're always going to be different, 1269 01:01:49,450 --> 01:01:51,740 even if I typed in the same words. 1270 01:01:51,740 --> 01:01:54,850 So there's implications now of this, especially if we want 1271 01:01:54,850 --> 01:01:56,960 to start changing things in memory. 1272 01:01:56,960 --> 01:02:00,490 So for instance, let me create a new program called copy.c. 1273 01:02:00,490 --> 01:02:04,240 And in here, we'll start somewhat similarly with cs50.h. 1274 01:02:04,240 --> 01:02:09,070 We'll start with stdio.h. 1275 01:02:09,070 --> 01:02:13,730 And preemptively, I'm going to go ahead and include string.h, as well. 1276 01:02:13,730 --> 01:02:16,750 I'm going to declare main as not taking any command line arguments. 1277 01:02:16,750 --> 01:02:20,050 And this time, I'm just going to get one string s with get_string, 1278 01:02:20,050 --> 01:02:22,540 and I'll prompt the user for s. 1279 01:02:22,540 --> 01:02:24,730 And now, let me go ahead and naively say this.
1280 01:02:24,730 --> 01:02:28,660 Let me give myself a new string called t and just set it equal to s, 1281 01:02:28,660 --> 01:02:31,672 my instinct being this is how I've copied integers before. 1282 01:02:31,672 --> 01:02:33,880 This is how I've copied floating point values before. 1283 01:02:33,880 --> 01:02:38,800 This surely is how I copy strings, using the assignment operator as usual. 1284 01:02:38,800 --> 01:02:40,810 Let me now for the sake of discussion propose 1285 01:02:40,810 --> 01:02:43,480 that I want to capitalize the first letter in t, 1286 01:02:43,480 --> 01:02:45,330 but not the first letter in s. 1287 01:02:45,330 --> 01:02:50,590 So logically, based on week two syntax, I'm going to go into the t string, 1288 01:02:50,590 --> 01:02:58,180 go to location zero, and set it equal to toupper of t bracket zero. 1289 01:02:58,180 --> 01:02:59,980 So recall, we introduced toupper. 1290 01:02:59,980 --> 01:03:02,140 It's just a handy function for capitalizing things. 1291 01:03:02,140 --> 01:03:04,510 There's tolower, and there's a bunch of others, as well. 1292 01:03:04,510 --> 01:03:07,385 I didn't include the header file yet, though, so I'm going to go back 1293 01:03:07,385 --> 01:03:08,050 and include-- 1294 01:03:08,050 --> 01:03:10,120 anyone remember where these are? 1295 01:03:10,120 --> 01:03:11,860 Yeah, ctype.h. 1296 01:03:11,860 --> 01:03:14,600 And it's fine to look that up in the manual if you ever need it. 1297 01:03:14,600 --> 01:03:19,150 So here, I am, a little naively, capitalizing the first letter in t. 1298 01:03:19,150 --> 01:03:21,850 Technically speaking, I should check what the length of t 1299 01:03:21,850 --> 01:03:25,772 is first, because if there's no characters there, if it has zero chars, 1300 01:03:25,772 --> 01:03:26,980 there's nothing to uppercase. 1301 01:03:26,980 --> 01:03:30,190 But for now, I'm going to keep it simple and just blindly do that there.
1302 01:03:30,190 --> 01:03:34,510 Now, let me go ahead and print out with %s the value of s. 1303 01:03:34,510 --> 01:03:38,080 Now, let me go ahead and print out with %s the value of t. 1304 01:03:38,080 --> 01:03:43,930 And I should see one lowercase s and one capitalized T. All right, here we go. 1305 01:03:43,930 --> 01:03:46,875 Make copy, dot slash copy. 1306 01:03:46,875 --> 01:03:49,000 And I'm going to deliberately type it in lowercase. 1307 01:03:49,000 --> 01:03:55,810 "hi" exclamation point, and we should see now they're both capitalized, 1308 01:03:55,810 --> 01:03:57,340 it would seem. 1309 01:03:57,340 --> 01:04:00,295 Intuitively, why might that be? 1310 01:04:00,295 --> 01:04:02,950 1311 01:04:02,950 --> 01:04:05,090 Exactly, the addresses are the same. 1312 01:04:05,090 --> 01:04:10,030 So if I do use the assignment operator and just do t equals s semicolon, 1313 01:04:10,030 --> 01:04:14,950 it's going to take me literally and copy the address in s over to t, 1314 01:04:14,950 --> 01:04:17,330 so that effectively, they're pointing at the same thing. 1315 01:04:17,330 --> 01:04:21,000 So if we draw another picture here, for instance, here maybe is s, 1316 01:04:21,000 --> 01:04:23,800 and here maybe is the lowercase "hi" that I first typed 1317 01:04:23,800 --> 01:04:25,270 in down here in memory. 1318 01:04:25,270 --> 01:04:28,660 Maybe that's at 0x123 again, and therefore that's what's in s. 1319 01:04:28,660 --> 01:04:33,340 When I then create the variable t by declaring it to be a string, 1320 01:04:33,340 --> 01:04:36,160 as well, that gives me another variable here called t. 1321 01:04:36,160 --> 01:04:38,665 But I'm just setting it equal to s. 1322 01:04:38,665 --> 01:04:41,710 I'm not calling get_string again in this version of copy. 1323 01:04:41,710 --> 01:04:43,030 That was in compare. 1324 01:04:43,030 --> 01:04:46,290 In copy, I'm just literally copying s into t.
1325 01:04:46,290 --> 01:04:50,370 So that literally just changes the value to 0x123, also. 1326 01:04:50,370 --> 01:04:52,440 And if we abstract away all of these addresses, 1327 01:04:52,440 --> 01:04:56,920 that's essentially like s and t both pointing to the same place. 1328 01:04:56,920 --> 01:05:02,560 So if I use s bracket zero, or t bracket zero, they are one and the same. 1329 01:05:02,560 --> 01:05:05,220 So when I use t bracket zero to use uppercase, 1330 01:05:05,220 --> 01:05:08,070 it's changing that lowercase h to capital H. 1331 01:05:08,070 --> 01:05:13,980 But again, both strings, both pointers are pointing at the same value. 1332 01:05:13,980 --> 01:05:17,130 And again, this should be even clearer as of today. 1333 01:05:17,130 --> 01:05:21,060 If I go back into VS Code and, indeed, take these training wheels off, 1334 01:05:21,060 --> 01:05:25,950 and treat string as what it is, char star, which indicates that both s and t 1335 01:05:25,950 --> 01:05:29,700 are just addresses, which makes even clearer, syntactically, 1336 01:05:29,700 --> 01:05:33,640 that this is probably the picture that's going on underneath the hood. 1337 01:05:33,640 --> 01:05:35,640 Now, just to make the code a little more robust, 1338 01:05:35,640 --> 01:05:38,190 let me at least be a little careful here. 1339 01:05:38,190 --> 01:05:44,530 If the string length of t is greater than zero, then and only then, 1340 01:05:44,530 --> 01:05:48,880 should I really blindly index into the string and go to location zero. 1341 01:05:48,880 --> 01:05:51,310 That doesn't really solve the fundamental problem, 1342 01:05:51,310 --> 01:05:54,640 but it at least avoids a situation where maybe the user just hits enter, 1343 01:05:54,640 --> 01:05:56,800 gives me no characters, and I try to blindly 1344 01:05:56,800 --> 01:05:59,180 uppercase something that's not there. 1345 01:05:59,180 --> 01:06:00,377 But there's still a bug. 1346 01:06:00,377 --> 01:06:01,210 There's still a bug.
1347 01:06:01,210 --> 01:06:03,290 So how do I actually solve this? 1348 01:06:03,290 --> 01:06:05,260 Well, it turns out we need two other functions 1349 01:06:05,260 --> 01:06:06,873 that we haven't had occasion to use. 1350 01:06:06,873 --> 01:06:09,040 But these are perhaps the most powerful, and they're 1351 01:06:09,040 --> 01:06:11,860 going to allow us to solve even grander problems next week when 1352 01:06:11,860 --> 01:06:14,540 we discuss all the more, things called data structures. 1353 01:06:14,540 --> 01:06:19,160 But for now, let's very simply solve this idea of copying a string. 1354 01:06:19,160 --> 01:06:24,100 Let me go back into VS Code here, and let me give myself one more header file 1355 01:06:24,100 --> 01:06:26,810 that's called stdlib, for standard library. 1356 01:06:26,810 --> 01:06:31,420 So include stdlib.h, in which both of these functions, 1357 01:06:31,420 --> 01:06:33,670 malloc and free, are declared for me. 1358 01:06:33,670 --> 01:06:37,300 And now, in my code, I'm going to behave a little bit differently here. 1359 01:06:37,300 --> 01:06:40,900 Clearly, I got into trouble by just blindly copying the addresses. 1360 01:06:40,900 --> 01:06:43,960 What I really want to do when I copy strings, presumably, and then 1361 01:06:43,960 --> 01:06:47,530 uppercase one of them, is I want to create 1362 01:06:47,530 --> 01:06:51,160 a duplicate string, a second array that is identical, 1363 01:06:51,160 --> 01:06:52,970 but is elsewhere in memory. 1364 01:06:52,970 --> 01:06:55,520 So the way to do this might be as follows.
1365 01:06:55,520 --> 01:06:59,470 Instead of just setting t equal to s, I should really 1366 01:06:59,470 --> 01:07:02,050 call this brand new function called malloc, 1367 01:07:02,050 --> 01:07:06,020 which stands for memory allocate, and it takes a single argument, 1368 01:07:06,020 --> 01:07:09,010 which is just the number of bytes you would like the operating system 1369 01:07:09,010 --> 01:07:10,400 to allocate for you. 1370 01:07:10,400 --> 01:07:13,700 So whether you're using this on Windows, Mac OS, or Linux in our case, 1371 01:07:13,700 --> 01:07:16,060 this is a way I can literally ask the operating system, 1372 01:07:16,060 --> 01:07:19,330 please find for me some number of bytes in the computer's memory 1373 01:07:19,330 --> 01:07:21,740 that I can now use for my own purposes. 1374 01:07:21,740 --> 01:07:24,978 So malloc here, I technically need at least three bytes, 1375 01:07:24,978 --> 01:07:26,770 but that's not going to be enough because I 1376 01:07:26,770 --> 01:07:28,490 need a fourth for the null character. 1377 01:07:28,490 --> 01:07:29,560 So I could put four here. 1378 01:07:29,560 --> 01:07:30,340 But that's stupid. 1379 01:07:30,340 --> 01:07:33,070 I shouldn't just hardcode a number like this we've seen. 1380 01:07:33,070 --> 01:07:38,392 So I could probably do strlen of s to dynamically figure out 1381 01:07:38,392 --> 01:07:39,850 how many bytes I want for the copy. 1382 01:07:39,850 --> 01:07:42,160 But that, too, is not enough because string length 1383 01:07:42,160 --> 01:07:46,630 returns the human readable length, so H-I exclamation point. 1384 01:07:46,630 --> 01:07:48,480 So I think I want a plus one in there, too. 1385 01:07:48,480 --> 01:07:51,188 So that just means get the length of whatever the human typed in, 1386 01:07:51,188 --> 01:07:54,810 add one for the null character to make sure that we're not undercounting. 1387 01:07:54,810 --> 01:07:56,520 Now, what can I then do? 
1388 01:07:56,520 --> 01:07:59,050 Unfortunately, I need to do a bit of work here. 1389 01:07:59,050 --> 01:08:02,310 So let me actually go ahead now and do something like this. 1390 01:08:02,310 --> 01:08:10,740 For int i equals zero, i is less than the string length of s, i plus, plus. 1391 01:08:10,740 --> 01:08:15,540 And then inside of this loop, I could copy into the ith location of t, 1392 01:08:15,540 --> 01:08:17,970 whatever is in the ith location of s. 1393 01:08:17,970 --> 01:08:20,609 Now, this is a little buggy. 1394 01:08:20,609 --> 01:08:23,763 One, this is inefficient to keep asking this question. 1395 01:08:23,763 --> 01:08:25,680 We talked about this in the context of design. 1396 01:08:25,680 --> 01:08:29,580 I should probably improve this by giving myself a variable like n, 1397 01:08:29,580 --> 01:08:33,450 set that equal to the string length, and then do i is less than n 1398 01:08:33,450 --> 01:08:35,520 again, and again, just so I'm not stupidly 1399 01:08:35,520 --> 01:08:38,580 calling string length four different times, or three different times. 1400 01:08:38,580 --> 01:08:42,359 But this, too, is slightly buggy, and this one's very subtle. 1401 01:08:42,359 --> 01:08:47,740 This does not fully copy s into t. 1402 01:08:47,740 --> 01:08:54,069 Does anyone see the very subtle bug that I've introduced? 1403 01:08:54,069 --> 01:08:55,660 Sorry? 1404 01:08:55,660 --> 01:08:58,120 Yeah, I'm forgetting the backslash zero. 1405 01:08:58,120 --> 01:09:00,340 So even though I'm copying H-I exclamation point, 1406 01:09:00,340 --> 01:09:04,240 or whatever the human typed in, I need to go one step further deliberately 1407 01:09:04,240 --> 01:09:07,180 to make sure I also copy the backslash zero, 1408 01:09:07,180 --> 01:09:09,250 or at least manually put it in myself.
1409 01:09:09,250 --> 01:09:14,215 So I could solve this by, either doing this up to and through n, 1410 01:09:14,215 --> 01:09:17,620 i is less than or equal to n, or I could plus one here. 1411 01:09:17,620 --> 01:09:18,859 That, too, would be fine. 1412 01:09:18,859 --> 01:09:25,510 Or if I really want, I could do this, like t bracket 1413 01:09:25,510 --> 01:09:27,673 three equals, quote unquote, backslash zero. 1414 01:09:27,673 --> 01:09:30,340 But again, I shouldn't get into the habit of hard coding things. 1415 01:09:30,340 --> 01:09:37,029 I could do string length of s, and that would give me the last location in s, 1416 01:09:37,029 --> 01:09:38,200 which would also work. 1417 01:09:38,200 --> 01:09:39,620 But that, too, is stupid. 1418 01:09:39,620 --> 01:09:40,540 I might as well-- 1419 01:09:40,540 --> 01:09:42,250 or just unnecessarily complex. 1420 01:09:42,250 --> 01:09:44,460 Let's just do this, change one symbol, and boom. 1421 01:09:44,460 --> 01:09:49,800 Now we're copying all three, and the fourth character, as well. 1422 01:09:49,800 --> 01:09:52,380 All right, so with this said, let's go ahead now 1423 01:09:52,380 --> 01:09:56,310 and make sure that t is indeed of length at least greater than zero. 1424 01:09:56,310 --> 01:09:59,830 Then let's go ahead and capitalize t as before and print out the results. 1425 01:09:59,830 --> 01:10:04,212 So let me go ahead and open my terminal window, make copy, dot slash copy, 1426 01:10:04,212 --> 01:10:06,420 and I'm going to deliberately type "hi" in lowercase. 1427 01:10:06,420 --> 01:10:10,380 And now we should see disparate s and t. 1428 01:10:10,380 --> 01:10:14,760 s is now still lowercase, and T is now capitalized. 1429 01:10:14,760 --> 01:10:16,980 But why is that exactly? 
1430 01:10:16,980 --> 01:10:21,060 Well, let me actually go into, say, my computer's memory 1431 01:10:21,060 --> 01:10:24,210 again and propose that, if what I had before 1432 01:10:24,210 --> 01:10:27,090 was this situation, where s is pointing at this chunk of memory, 1433 01:10:27,090 --> 01:10:30,660 and t was accidentally pointing at that same chunk of memory, what we really 1434 01:10:30,660 --> 01:10:33,750 want to do is have t point at a new chunk of memory. 1435 01:10:33,750 --> 01:10:36,660 And malloc is what gives us this chunk of memory. 1436 01:10:36,660 --> 01:10:41,490 And then using that for loop, can I copy the H, the I, the exclamation point, 1437 01:10:41,490 --> 01:10:44,020 and even the backslash zero. 1438 01:10:44,020 --> 01:10:48,280 So now, this is a little subtle, but malloc is what gives me 1439 01:10:48,280 --> 01:10:50,620 access to this new chunk of memory. 1440 01:10:50,620 --> 01:10:55,060 Malloc takes one argument, the number of bytes that you want it to find for you. 1441 01:10:55,060 --> 01:10:56,020 Take a guess. 1442 01:10:56,020 --> 01:11:00,970 What value is malloc returning? 1443 01:11:00,970 --> 01:11:06,220 Conceptually, it's returning a chunk of memory, but that's kind of handwavy. 1444 01:11:06,220 --> 01:11:08,200 What might malloc actually be returning? 1445 01:11:08,200 --> 01:11:10,675 AUDIENCE: Maybe the pointer to the first character? 1446 01:11:10,675 --> 01:11:11,710 SPEAKER 1: Perfect. 1447 01:11:11,710 --> 01:11:17,050 malloc is returning the address of that chunk of memory, not the last address. 1448 01:11:17,050 --> 01:11:18,220 The first address. 1449 01:11:18,220 --> 01:11:20,050 And here's a difference with strings. 1450 01:11:20,050 --> 01:11:23,710 This chunk of memory is not magically terminated with null for you. 1451 01:11:23,710 --> 01:11:27,580 I had to do that with the for loop.
malloc, and in turn, your operating system, 1452 01:11:27,580 --> 01:11:30,560 does keep track of how big these chunks of memory are. 1453 01:11:30,560 --> 01:11:32,770 So even though it's only returning the address 1454 01:11:32,770 --> 01:11:35,860 of the first byte of that memory, the operating system 1455 01:11:35,860 --> 01:11:39,582 is going to know that it used up four bytes here, four bytes here. 1456 01:11:39,582 --> 01:11:41,290 And it will keep track of that so that it 1457 01:11:41,290 --> 01:11:43,570 doesn't give you an overlapping address in the future 1458 01:11:43,570 --> 01:11:44,380 because that would be bad. 1459 01:11:44,380 --> 01:11:45,670 Your data would get corrupted. 1460 01:11:45,670 --> 01:11:50,500 But you, similarly, have to remember or figure out how many bytes are available 1461 01:11:50,500 --> 01:11:51,070 thereafter. 1462 01:11:51,070 --> 01:11:54,970 It's up to you to manage it, as by putting a null character there 1463 01:11:54,970 --> 01:11:55,970 yourself. 1464 01:11:55,970 --> 01:11:58,480 So if I go back to my code now, let me actually 1465 01:11:58,480 --> 01:12:02,090 harden this code just a little bit more as follows, 1466 01:12:02,090 --> 01:12:05,450 whereby I can do this a little better. 1467 01:12:05,450 --> 01:12:10,010 If I go back to VS Code here, it turns out, if something goes wrong 1468 01:12:10,010 --> 01:12:12,470 and I'm out of memory, maybe I've got an old computer, 1469 01:12:12,470 --> 01:12:15,732 or maybe I'm typing something way bigger than three characters in, 1470 01:12:15,732 --> 01:12:17,690 like three billion characters, and the computer 1471 01:12:17,690 --> 01:12:19,310 might genuinely run out of memory. 1472 01:12:19,310 --> 01:12:21,410 I actually should be in the habit of doing this. 
1473 01:12:21,410 --> 01:12:27,320 If t equals, equals a special symbol called null with two Ls, 1474 01:12:27,320 --> 01:12:29,330 and I promised this would eventually exist, 1475 01:12:29,330 --> 01:12:32,750 I should just return one now, or return two, return negative one, 1476 01:12:32,750 --> 01:12:37,340 return any value other than zero, and just abort the program early. 1477 01:12:37,340 --> 01:12:42,230 That means, if malloc returns null, there's not enough memory available. 1478 01:12:42,230 --> 01:12:45,260 And it turns out, all this time, I'm going to do one other crazy thing, 1479 01:12:45,260 --> 01:12:47,660 even though we've not expected you to do this thus far. 1480 01:12:47,660 --> 01:12:50,180 Technically, when using get_string, get_string, 1481 01:12:50,180 --> 01:12:53,960 if you read the documentation, the manual, it too can return null. 1482 01:12:53,960 --> 01:12:56,300 Because if you type in a crazy long string, 1483 01:12:56,300 --> 01:12:58,340 and the computer can't fit it in its memory, 1484 01:12:58,340 --> 01:13:00,440 get_string needs to signal that to you somehow. 1485 01:13:00,440 --> 01:13:05,940 And the documentation actually says that, if get_string returns null, 1486 01:13:05,940 --> 01:13:08,750 then you too should not trust what's in it. 1487 01:13:08,750 --> 01:13:12,710 You should just exit the program immediately, in this case. 1488 01:13:12,710 --> 01:13:15,648 But there's one other improvement we can make here. 1489 01:13:15,648 --> 01:13:18,440 And even though this is making the code seem way longer than it is, 1490 01:13:18,440 --> 01:13:20,750 most of this I've just added is just error checking, 1491 01:13:20,750 --> 01:13:24,680 just mindless error checking to make sure that I don't treat s as being 1492 01:13:24,680 --> 01:13:27,020 valid, or t as being valid when it isn't. 1493 01:13:27,020 --> 01:13:28,290 It turns out this is stupid. 1494 01:13:28,290 --> 01:13:29,790 I don't need to reinvent this wheel.
1495 01:13:29,790 --> 01:13:32,900 Certainly, for decades, people have been copying strings, even in C. 1496 01:13:32,900 --> 01:13:36,350 So it turns out there's another fun function called strcpy, wonderfully 1497 01:13:36,350 --> 01:13:39,440 enough, that takes the destination as its first argument, 1498 01:13:39,440 --> 01:13:41,610 the source as its second argument. 1499 01:13:41,610 --> 01:13:46,500 And that will for me copy s into t, respectively. 1500 01:13:46,500 --> 01:13:51,380 So that does the equivalent of that for loop, including the backslash zero. 1501 01:13:51,380 --> 01:13:55,310 However, there's one other function, recall, that was on our cheat sheet 1502 01:13:55,310 --> 01:13:58,100 a moment ago, whereby malloc is accompanied 1503 01:13:58,100 --> 01:14:00,110 by one other function called free. 1504 01:14:00,110 --> 01:14:02,090 So free is the opposite of malloc. 1505 01:14:02,090 --> 01:14:04,040 When you're done with your computer's memory, 1506 01:14:04,040 --> 01:14:07,070 you're supposed to give it back to Windows, to Mac OS, to Linux 1507 01:14:07,070 --> 01:14:09,320 so it can reuse it for something else. 1508 01:14:09,320 --> 01:14:12,350 And frankly, if you've ever been using your computer for hours 1509 01:14:12,350 --> 01:14:15,720 on end, days on end, and maybe it's getting slower, and slower, 1510 01:14:15,720 --> 01:14:18,470 maybe it's Photoshop, maybe it's a really big document, generally, 1511 01:14:18,470 --> 01:14:20,840 really big files consume lots and lots of memory. 1512 01:14:20,840 --> 01:14:24,320 If the humans who wrote that software, be it Photoshop or something else, 1513 01:14:24,320 --> 01:14:29,150 wrote buggy code and kept using malloc, malloc, malloc, malloc, asking 1514 01:14:29,150 --> 01:14:32,780 for more and more memory, but they never call the opposite function, free, 1515 01:14:32,780 --> 01:14:34,820 your computer might actually run out of memory.
1516 01:14:34,820 --> 01:14:37,130 And typically, the symptom is that it gets so darn 1517 01:14:37,130 --> 01:14:38,720 slow it becomes annoying to use. 1518 01:14:38,720 --> 01:14:42,230 And frankly, the mouse starts moving very slowly, maybe the thing freezes 1519 01:14:42,230 --> 01:14:43,880 altogether, the computer crashes. 1520 01:14:43,880 --> 01:14:46,080 Bad things happen when you run out of memory. 1521 01:14:46,080 --> 01:14:48,800 So in my case here, if I go back to VS Code, 1522 01:14:48,800 --> 01:14:53,300 it's actually on me in this language called C to actually manage 1523 01:14:53,300 --> 01:14:57,530 the memory myself so that, when I have called malloc, 1524 01:14:57,530 --> 01:15:00,753 thereafter, I had better free that same memory. 1525 01:15:00,753 --> 01:15:02,420 Now, I don't want to free it right away. 1526 01:15:02,420 --> 01:15:03,890 I want to free it when I'm done with it. 1527 01:15:03,890 --> 01:15:06,590 So frankly, the very last thing I'm going to do in my program 1528 01:15:06,590 --> 01:15:12,720 here is call free on t because t is what I malloced up here. 1529 01:15:12,720 --> 01:15:15,950 So at the very bottom of my program, I should free t. 1530 01:15:15,950 --> 01:15:18,920 And then just to be super nitpicky, let me return zero just 1531 01:15:18,920 --> 01:15:21,710 to signify success at this point. 1532 01:15:21,710 --> 01:15:25,460 Now, there's a slight asymmetry, which is a little inconsistent here. 1533 01:15:25,460 --> 01:15:28,130 Even though get_string, I'm going to imply, 1534 01:15:28,130 --> 01:15:31,760 is still allocating memory for me, it actually does use malloc. 1535 01:15:31,760 --> 01:15:34,710 get_string and CS50's other functions are special. 1536 01:15:34,710 --> 01:15:38,120 They manage memory for you, so you do not and should not free 1537 01:15:38,120 --> 01:15:41,000 memory that get_string returns to you. 1538 01:15:41,000 --> 01:15:42,350 We handle all of that for you.
1539 01:15:42,350 --> 01:15:45,350 But that's a training wheel that's going to be taken off as of this week 1540 01:15:45,350 --> 01:15:46,558 anyway, so it's kind of moot. 1541 01:15:46,558 --> 01:15:47,540 So not to worry. 1542 01:15:47,540 --> 01:15:52,100 But I'm only freeing memory that I malloced. 1543 01:15:52,100 --> 01:15:52,820 All right. 1544 01:15:52,820 --> 01:15:56,516 Null, then, means the-- 1545 01:15:56,516 --> 01:15:57,690 what is null? 1546 01:15:57,690 --> 01:16:01,000 It is just an address, and it's literally the address zero. 1547 01:16:01,000 --> 01:16:02,000 So there's this theme. 1548 01:16:02,000 --> 01:16:03,827 N-U-L, recall, was the terminating symbol, 1549 01:16:03,827 --> 01:16:05,410 which just means the string ends here. 1550 01:16:05,410 --> 01:16:09,370 N-U-L-L, which is not greatly named, but it's what humans went with years ago, 1551 01:16:09,370 --> 01:16:11,603 just means that this is the address zero. 1552 01:16:11,603 --> 01:16:14,770 And what your computer does is, even though I've been playfully saying that, 1553 01:16:14,770 --> 01:16:18,580 oh, in the top left is address zero, and then one, and then two, and then three, 1554 01:16:18,580 --> 01:16:20,333 the address zero is hands off. 1555 01:16:20,333 --> 01:16:22,750 It's kind of a wasted byte that your computer should never 1556 01:16:22,750 --> 01:16:27,400 use because the computer uses zero as a special sentinel value, null, 1557 01:16:27,400 --> 01:16:28,880 to signify error. 1558 01:16:28,880 --> 01:16:32,170 So we're spending one byte out of billions nowadays just to make sure 1559 01:16:32,170 --> 01:16:34,900 that there's a special symbol that's coming back that can 1560 01:16:34,900 --> 01:16:37,600 indicate when something has gone wrong. 1561 01:16:37,600 --> 01:16:38,320 All right. 1562 01:16:38,320 --> 01:16:39,700 That was a mouthful.
1563 01:16:39,700 --> 01:16:46,120 Any questions on this copying of strings, this malloc-ing, 1564 01:16:46,120 --> 01:16:48,640 or this freeing? 1565 01:16:48,640 --> 01:16:49,570 Oh, all right. 1566 01:16:49,570 --> 01:16:55,460 So let me give you a tool with which to make some of this stuff easier, 1567 01:16:55,460 --> 01:16:58,690 so that when you make mistakes or have bugs, as you invariably will, 1568 01:16:58,690 --> 01:17:01,148 you can chase them down without having to raise your hand, 1569 01:17:01,148 --> 01:17:02,440 without having to ask the duck. 1570 01:17:02,440 --> 01:17:04,482 You actually have more technical tools with which 1571 01:17:04,482 --> 01:17:06,190 to diagnose the problem yourself. 1572 01:17:06,190 --> 01:17:09,280 And there's this new tool that we'll introduce today called valgrind. 1573 01:17:09,280 --> 01:17:13,150 And valgrind's purpose in life is to check your usage of memory for you. 1574 01:17:13,150 --> 01:17:14,890 Admittedly, it's an older program. 1575 01:17:14,890 --> 01:17:16,800 It's pretty arcane in terms of its interface, 1576 01:17:16,800 --> 01:17:19,300 and there's just going to be a mess of output on the screen. 1577 01:17:19,300 --> 01:17:21,970 But there's going to be certain patterns of mistakes that you'll notice, 1578 01:17:21,970 --> 01:17:24,137 and I'll demonstrate a couple of them now so you can 1579 01:17:24,137 --> 01:17:26,270 see where and how you might go wrong. 1580 01:17:26,270 --> 01:17:28,030 So I'm going to go over to VS Code here. 1581 01:17:28,030 --> 01:17:32,020 I'm going to create a program called memory.c that is deliberately buggy, 1582 01:17:32,020 --> 01:17:34,780 but it's not going to be obviously buggy at first. 1583 01:17:34,780 --> 01:17:36,010 So by that I mean this. 1584 01:17:36,010 --> 01:17:39,100 Let me do include standard io.h. 1585 01:17:39,100 --> 01:17:44,320 Let me also include proactively standard lib.h so I can use malloc. 
1586 01:17:44,320 --> 01:17:47,110 Let me declare main with no command line arguments, 1587 01:17:47,110 --> 01:17:49,280 and let me do something very simple. 1588 01:17:49,280 --> 01:17:53,740 Instead of just declaring an int called X, let me be a little crazy 1589 01:17:53,740 --> 01:17:56,290 and manually allocate this memory myself. 1590 01:17:56,290 --> 01:17:59,110 So int X just gives me an integer, and it has since week one. 1591 01:17:59,110 --> 01:18:02,990 But now that I have malloc, I can kind of take control over this process. 1592 01:18:02,990 --> 01:18:07,420 So let me declare, not an int, but an int star called X. So 1593 01:18:07,420 --> 01:18:10,120 give me the address of an integer, and let 1594 01:18:10,120 --> 01:18:15,970 me store there the return value of malloc by asking malloc for, let's say, 1595 01:18:15,970 --> 01:18:17,300 four bytes. 1596 01:18:17,300 --> 01:18:19,150 So I know that ints are four bytes. 1597 01:18:19,150 --> 01:18:21,740 If I want four bytes, I just tell malloc, give me four bytes. 1598 01:18:21,740 --> 01:18:23,380 Now, frankly, this is a little stupid. 1599 01:18:23,380 --> 01:18:26,080 I shouldn't just assume that the int is always 1600 01:18:26,080 --> 01:18:28,122 going to be four bytes on everyone's computer. 1601 01:18:28,122 --> 01:18:31,330 So there's this function you can start using called sizeof, or this operator, 1602 01:18:31,330 --> 01:18:33,220 technically, where you can say sizeof int. 1603 01:18:33,220 --> 01:18:36,820 And even if you're on an older computer, for instance, really old at this point, 1604 01:18:36,820 --> 01:18:40,047 sizeof int will return the correct value, no matter what. 1605 01:18:40,047 --> 01:18:42,130 You don't have to assume that it's, in fact, four. 1606 01:18:42,130 --> 01:18:42,880 But you know what? 1607 01:18:42,880 --> 01:18:44,920 I'd actually like more than this number of ints. 1608 01:18:44,920 --> 01:18:48,200 Let me actually treat X as an array of integers. 
1609 01:18:48,200 --> 01:18:51,610 So actually, if I want an array of integers, I could do this. 1610 01:18:51,610 --> 01:18:53,320 Give me three integers. 1611 01:18:53,320 --> 01:18:54,200 But no, no. 1612 01:18:54,200 --> 01:18:55,750 Let me not do week two syntax. 1613 01:18:55,750 --> 01:18:57,710 Let me do this myself as follows. 1614 01:18:57,710 --> 01:19:02,170 Let me treat this as three times the size of an int. 1615 01:19:02,170 --> 01:19:04,270 So that's technically going to give me 12 bytes. 1616 01:19:04,270 --> 01:19:07,265 But this makes X effectively an array. 1617 01:19:07,265 --> 01:19:09,640 And this is kind of deliberate now because if an array is 1618 01:19:09,640 --> 01:19:12,550 just contiguous memory, and malloc returns to you 1619 01:19:12,550 --> 01:19:16,300 a chunk of contiguous memory, you can treat what comes back from malloc 1620 01:19:16,300 --> 01:19:16,802 as an array. 1621 01:19:16,802 --> 01:19:18,760 And indeed, that's what we're doing as strings. 1622 01:19:18,760 --> 01:19:22,277 We're treating chunks of memory as arrays of chars. 1623 01:19:22,277 --> 01:19:23,860 So let me do something arbitrary here. 1624 01:19:23,860 --> 01:19:28,195 Let me go to X bracket one and set it equal to 72. 1625 01:19:28,195 --> 01:19:30,880 X bracket two, set it equal to 73. 1626 01:19:30,880 --> 01:19:34,030 X bracket three, set it equal to 33. 1627 01:19:34,030 --> 01:19:35,680 And we did this a couple of weeks ago. 1628 01:19:35,680 --> 01:19:37,450 That's "hi" but in Ascii code. 1629 01:19:37,450 --> 01:19:40,420 Let me go ahead and make memory, and it seems to work fine. 1630 01:19:40,420 --> 01:19:43,600 Let me do dot slash memory, and no problem. 1631 01:19:43,600 --> 01:19:45,460 There's no error messages from the compiler. 1632 01:19:45,460 --> 01:19:48,700 There's no runtime errors when I actually run the code. 1633 01:19:48,700 --> 01:19:53,350 But does anyone see any of the bugs thus far? 
1634 01:19:53,350 --> 01:19:54,555 What did I do wrong? 1635 01:19:54,555 --> 01:19:55,930 Let me look a little in the back. 1636 01:19:55,930 --> 01:19:57,616 Yeah. 1637 01:19:57,616 --> 01:20:00,118 AUDIENCE: Does it not know when the array ends? 1638 01:20:00,118 --> 01:20:02,410 SPEAKER 1: It doesn't seem to know when the array ends. 1639 01:20:02,410 --> 01:20:04,285 Or more specifically, I'm not respecting when 1640 01:20:04,285 --> 01:20:06,520 the array ends because I'm sort of stupidly 1641 01:20:06,520 --> 01:20:08,590 starting at one, then two then three. 1642 01:20:08,590 --> 01:20:11,380 But technically, if I asked for three of these things, 1643 01:20:11,380 --> 01:20:15,040 I should have done bracket zero, bracket one, bracket two. 1644 01:20:15,040 --> 01:20:18,905 And there's a second more subtle bug that you would only know from today. 1645 01:20:18,905 --> 01:20:19,405 Yeah. 1646 01:20:19,405 --> 01:20:23,290 1647 01:20:23,290 --> 01:20:26,980 OK, I don't necessarily know when one integer ends and the next one begins. 1648 01:20:26,980 --> 01:20:29,770 That's actually not a problem, because on a given system, 1649 01:20:29,770 --> 01:20:31,810 integers are always the same size. 1650 01:20:31,810 --> 01:20:35,892 So the computer can be smart enough to go from here, four bytes this way, 1651 01:20:35,892 --> 01:20:37,600 four bytes this way, four bytes this way. 1652 01:20:37,600 --> 01:20:38,410 That's OK. 1653 01:20:38,410 --> 01:20:41,560 Strings are problematic because who knows how big the sentence was 1654 01:20:41,560 --> 01:20:43,070 that the human typed in. 1655 01:20:43,070 --> 01:20:44,320 But there's a more subtle bug. 1656 01:20:44,320 --> 01:20:45,850 What have I not done? 1657 01:20:45,850 --> 01:20:46,990 I didn't call free. 1658 01:20:46,990 --> 01:20:49,120 So I didn't practice what I just preached. 1659 01:20:49,120 --> 01:20:50,770 Anytime I malloc, I call free. 
1660 01:20:50,770 --> 01:20:54,670 But again, per my terminal window, neither of these bugs seems obvious. 1661 01:20:54,670 --> 01:20:57,460 You might submit this code, or deploy it to your software, 1662 01:20:57,460 --> 01:20:58,760 and be none the wiser. 1663 01:20:58,760 --> 01:21:01,790 But a tool like valgrind can actually help you find these things. 1664 01:21:01,790 --> 01:21:03,920 So let me increase the size of my terminal window. 1665 01:21:03,920 --> 01:21:07,030 Let me run this command valgrind on my program. 1666 01:21:07,030 --> 01:21:09,640 So dot slash memory is how I ran it a moment ago. 1667 01:21:09,640 --> 01:21:12,825 Just like debug50, which you type before the name of your program, 1668 01:21:12,825 --> 01:21:14,950 valgrind, too, you type before the name of your program. 1669 01:21:14,950 --> 01:21:19,360 And the output is going to look crazy, but this is useful. 1670 01:21:19,360 --> 01:21:20,110 Why? 1671 01:21:20,110 --> 01:21:24,140 So notice at the very top of this, we're just seeing what version of valgrind 1672 01:21:24,140 --> 01:21:26,180 we're using and what command we ran. 1673 01:21:26,180 --> 01:21:29,060 But this starts to get juicy, and I'll highlight this here. 1674 01:21:29,060 --> 01:21:33,180 Invalid write of size four. Invalid write. 1675 01:21:33,180 --> 01:21:37,040 So writing means changing information, like setting a value or assigning it a 1676 01:21:37,040 --> 01:21:39,030 value. And this is useful here. 1677 01:21:39,030 --> 01:21:42,080 The problem is in memory.c at line nine. 1678 01:21:42,080 --> 01:21:44,990 So colon nine means line nine. 1679 01:21:44,990 --> 01:21:50,060 All right, so let me go back to my code, look at line nine, and oh, interesting. 1680 01:21:50,060 --> 01:21:53,060 So invalid write of size four. 1681 01:21:53,060 --> 01:21:56,400 So it's cryptic, but size four I know is the size of an integer.
1682 01:21:56,400 --> 01:21:59,420 So I'm probably doing something stupid on line nine involving 1683 01:21:59,420 --> 01:22:00,770 changing an integer. 1684 01:22:00,770 --> 01:22:04,190 And sure enough, even though it's not super obvious, X bracket three, 1685 01:22:04,190 --> 01:22:06,150 oh, obviously, this doesn't exist. 1686 01:22:06,150 --> 01:22:07,670 So I have to change the problem. 1687 01:22:07,670 --> 01:22:10,940 One and two were OK, even though it's logically the wrong thing. 1688 01:22:10,940 --> 01:22:13,170 Now I think this will get rid of this error. 1689 01:22:13,170 --> 01:22:16,710 So let me actually clear my terminal window and make it bigger again. 1690 01:22:16,710 --> 01:22:20,180 Let me recompile my code because I made a change. 1691 01:22:20,180 --> 01:22:24,080 Let me rerun valgrind of dot slash memory. 1692 01:22:24,080 --> 01:22:27,590 And now, that error went away. 1693 01:22:27,590 --> 01:22:30,170 There's a mess of output here, but that error went away. 1694 01:22:30,170 --> 01:22:32,540 But this is interesting here now. 1695 01:22:32,540 --> 01:22:37,610 12 bytes in one blocks are definitely lost in loss record one of one. 1696 01:22:37,610 --> 01:22:40,310 So unnecessarily verbose, but the hint here 1697 01:22:40,310 --> 01:22:45,000 is that I somehow lost some bytes, otherwise known as a memory leak. 1698 01:22:45,000 --> 01:22:48,202 So earlier, when I described an imaginary bad programmer 1699 01:22:48,202 --> 01:22:50,660 who kept calling malloc, malloc, malloc, and never freeing, 1700 01:22:50,660 --> 01:22:52,370 that's what's called a memory leak, where 1701 01:22:52,370 --> 01:22:55,370 you're sort of losing track of your memory and never freeing it again. 1702 01:22:55,370 --> 01:22:59,120 So I've definitely lost 12 bytes in one block, whatever a block is, 1703 01:22:59,120 --> 01:22:59,960 in this case. 1704 01:22:59,960 --> 01:23:01,640 This is a little less obvious. 
1705 01:23:01,640 --> 01:23:05,960 It's up to us to notice that, OK, wait a minute, memory.c line six is somehow 1706 01:23:05,960 --> 01:23:06,710 germane. 1707 01:23:06,710 --> 01:23:08,000 Let me go back to-- 1708 01:23:08,000 --> 01:23:10,250 oh, this is where I called malloc. 1709 01:23:10,250 --> 01:23:13,640 And valgrind doesn't necessarily know when I should free the memory. 1710 01:23:13,640 --> 01:23:17,660 That's up to me, but I should probably free it at the end of my function 1711 01:23:17,660 --> 01:23:20,660 when I'm definitely done with it, because once you free your memory, 1712 01:23:20,660 --> 01:23:24,370 you should not touch that variable again, unless you actually 1713 01:23:24,370 --> 01:23:26,110 change what its value is. 1714 01:23:26,110 --> 01:23:28,480 So now, as I've done this, and this program to be clear 1715 01:23:28,480 --> 01:23:29,560 does nothing useful. 1716 01:23:29,560 --> 01:23:32,920 This is just an intellectual exercise, not anything productive. 1717 01:23:32,920 --> 01:23:36,160 Let me do make memory one last time. 1718 01:23:36,160 --> 01:23:39,050 Let's do valgrind, dot slash memory. 1719 01:23:39,050 --> 01:23:42,640 And let me grow my terminal window again and hit enter. 1720 01:23:42,640 --> 01:23:44,710 And even though it's still kind of output, 1721 01:23:44,710 --> 01:23:48,702 it's still kind of cryptic, at least it says no leaks are possible. 1722 01:23:48,702 --> 01:23:51,160 So now this is my own sort of teaching assistant telling me 1723 01:23:51,160 --> 01:23:53,440 before I submit the code, or before I deploy it 1724 01:23:53,440 --> 01:23:55,930 to production in real software, that at least there 1725 01:23:55,930 --> 01:23:57,980 seem to be no memory related errors. 1726 01:23:57,980 --> 01:23:59,710 So valgrind is not for logical bugs. 1727 01:23:59,710 --> 01:24:01,120 It's not for syntax errors. 1728 01:24:01,120 --> 01:24:04,810 It's for memory related bugs, as of today. 
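For reference, the corrected logic of memory.c as described above might look something like the sketch below. It is written as a function rather than as main, and the name fill_three is hypothetical; the NULL check is an addition in the spirit of the earlier discussion, not something the walkthrough shows.

```c
#include <stdlib.h>

// Hypothetical stand-in for the body of main in memory.c.
// Returns 0 on success, 1 if malloc could not provide the memory.
int fill_three(void)
{
    // Room for three ints: 12 bytes on a system where an int is 4 bytes
    int *x = malloc(3 * sizeof(int));
    if (x == NULL)
    {
        return 1;
    }

    // Valid indices are 0, 1, and 2. Writing to x[3] is the kind of mistake
    // valgrind reports as an invalid write of size 4.
    x[0] = 72;
    x[1] = 73;
    x[2] = 33;

    // Without this call, valgrind would report the 12 bytes as definitely lost
    free(x);
    return 0;
}
```

If main simply calls this function and the program is compiled as memory, then running valgrind ./memory should, assuming the same setup as in the demonstration, report that no leaks are possible.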
1729 01:24:04,810 --> 01:24:10,310 Questions on any of that? 1730 01:24:10,310 --> 01:24:10,810 No? 1731 01:24:10,810 --> 01:24:12,820 OK, so what else can go wrong? 1732 01:24:12,820 --> 01:24:14,320 We mentioned these in the past. 1733 01:24:14,320 --> 01:24:16,780 It turns out that garbage values are a thing. 1734 01:24:16,780 --> 01:24:18,910 And recall that, if you declare a variable 1735 01:24:18,910 --> 01:24:20,890 but don't give it a value with an equal sign, 1736 01:24:20,890 --> 01:24:23,960 and you just blindly start using it, like printing it out, or doing math 1737 01:24:23,960 --> 01:24:26,660 on it, you might be manipulating a garbage value, which 1738 01:24:26,660 --> 01:24:29,750 is some number that's essentially remnants of your computer 1739 01:24:29,750 --> 01:24:30,945 having been on for a while. 1740 01:24:30,945 --> 01:24:33,320 Because if you're using this canvas and reusing it again, 1741 01:24:33,320 --> 01:24:37,040 and again, surely there's going to be patterns of zeros and ones 1742 01:24:37,040 --> 01:24:39,890 there that you didn't put there yourself, at least in the moment. 1743 01:24:39,890 --> 01:24:41,420 They might be remnants of the past. 1744 01:24:41,420 --> 01:24:46,370 So garbage values are values of variables that you did not proactively 1745 01:24:46,370 --> 01:24:48,810 set yourself as intended. 1746 01:24:48,810 --> 01:24:50,030 So we can actually see this. 1747 01:24:50,030 --> 01:24:53,780 Let me actually go ahead and whip up a really quick program here 1748 01:24:53,780 --> 01:24:55,710 after shrinking my terminal window. 1749 01:24:55,710 --> 01:24:58,310 Let me close memory.c. 1750 01:24:58,310 --> 01:25:01,820 Let me go ahead and open garbage.c. 1751 01:25:01,820 --> 01:25:04,340 And in here, I'll do include. 1752 01:25:04,340 --> 01:25:06,590 How about standard io.h? 1753 01:25:06,590 --> 01:25:09,638 Let's include standard lib.h 1754 01:25:09,638 --> 01:25:12,341 Actually, we don't even need standard lib.h. 
1755 01:25:12,341 --> 01:25:16,790 Let's go ahead and include standard io.h and then int main void. 1756 01:25:16,790 --> 01:25:18,980 And then inside of the curly braces, let's 1757 01:25:18,980 --> 01:25:23,580 give me a really big array of scores, like 1,024 scores, 1758 01:25:23,580 --> 01:25:25,920 like if it's a really busy semester. 1759 01:25:25,920 --> 01:25:29,400 And then let me go ahead and just blindly iterate from i 1760 01:25:29,400 --> 01:25:32,138 equals zero on up to i is less than 1,024. 1761 01:25:32,138 --> 01:25:33,930 And I'm not going to bother with constants. 1762 01:25:33,930 --> 01:25:36,513 I'm just going to play around with these numbers for a moment. 1763 01:25:36,513 --> 01:25:41,120 1764 01:25:41,120 --> 01:25:43,460 And, oh, thank you. 1765 01:25:43,460 --> 01:25:44,540 Oh, cookies for you. 1766 01:25:44,540 --> 01:25:45,680 OK. 1767 01:25:45,680 --> 01:25:46,400 OK, here we go. 1768 01:25:46,400 --> 01:25:47,130 OK, come on up. 1769 01:25:47,130 --> 01:25:48,440 Thank you very much. 1770 01:25:48,440 --> 01:25:49,580 Fair is fair. 1771 01:25:49,580 --> 01:25:50,780 OK. 1772 01:25:50,780 --> 01:25:52,940 Thank you. 1773 01:25:52,940 --> 01:25:54,440 OK. 1774 01:25:54,440 --> 01:25:56,280 OK, now everyone's really paying attention. 1775 01:25:56,280 --> 01:25:56,780 All right. 1776 01:25:56,780 --> 01:25:59,900 So in my loop here, I'm just going to do something stupid, 1777 01:25:59,900 --> 01:26:04,970 like print out all of the values in the scores array using percent i, 1778 01:26:04,970 --> 01:26:08,820 even though I did not put anything in this array. 1779 01:26:08,820 --> 01:26:11,570 So on line five, I'm obviously declaring an array 1780 01:26:11,570 --> 01:26:15,230 of size 1,024 for that many ints, but I'm never 1781 01:26:15,230 --> 01:26:19,890 actually putting values in there myself, or with getint, or any other function. 1782 01:26:19,890 --> 01:26:21,560 So there's garbage values there. 
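As a contrast to garbage.c, here is a hedged sketch of the safer habit: giving the array values before reading from it. The function name count_nonzero_scores is hypothetical, and the initializer is the only substantive difference from the program being typed above, which declares the array without one.

```c
// Hypothetical variation on garbage.c: the array is initialized before use.
// Returns how many of the 1,024 scores are nonzero.
int count_nonzero_scores(void)
{
    // "= {0}" sets every element to zero. Without it, the array would hold
    // whatever bits were left over in that memory, i.e. garbage values.
    int scores[1024] = {0};

    int nonzero = 0;
    for (int i = 0; i < 1024; i++)
    {
        if (scores[i] != 0)
        {
            nonzero++;
        }
    }
    return nonzero;
}
```

With the initializer in place the count is reliably zero; removing it makes the result unpredictable, which is the point of the demonstration.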
1783 01:26:21,560 --> 01:26:24,440 There's presumably 1,024 garbage values there, 1784 01:26:24,440 --> 01:26:26,060 and we can now actually see them. 1785 01:26:26,060 --> 01:26:27,650 Let me make my terminal window bigger. 1786 01:26:27,650 --> 01:26:31,178 Let me make garbage, no pun intended, dot slash garbage. 1787 01:26:31,178 --> 01:26:33,720 And there's going to be way more than even fit on the screen. 1788 01:26:33,720 --> 01:26:34,303 But who cares? 1789 01:26:34,303 --> 01:26:35,390 We just need to see a few. 1790 01:26:35,390 --> 01:26:38,730 There are some of the garbage values in the array. 1791 01:26:38,730 --> 01:26:42,200 So to be super clear, when you create variables of your own 1792 01:26:42,200 --> 01:26:44,780 and you do not give them values of your own, 1793 01:26:44,780 --> 01:26:46,580 who knows what may be there? 1794 01:26:46,580 --> 01:26:50,930 In some cases, it gets automatically initialized for you to all zeros, 1795 01:26:50,930 --> 01:26:52,540 but that is not always the case. 1796 01:26:52,540 --> 01:26:55,980 And in general, distrust the variable unless you yourself 1797 01:26:55,980 --> 01:26:57,580 have put a value there. 1798 01:26:57,580 --> 01:27:03,090 So how now might we leverage this to-- 1799 01:27:03,090 --> 01:27:05,670 how now might we think about potential problems? 1800 01:27:05,670 --> 01:27:08,640 Well, consider this code here, which this program, too, 1801 01:27:08,640 --> 01:27:13,320 is more for discussion than actual utility, where at the top of it, 1802 01:27:13,320 --> 01:27:15,870 I declare a variable called x and a variable called y, 1803 01:27:15,870 --> 01:27:16,890 both of type pointer. 1804 01:27:16,890 --> 01:27:19,980 So x and y are supposed to be the addresses of two integers. 1805 01:27:19,980 --> 01:27:22,950 I then malloc the size of an int and store that address in x.
1806 01:27:22,950 --> 01:27:26,220 So I'm giving myself space for x, even though, obviously, I 1807 01:27:26,220 --> 01:27:29,010 could have done this weeks ago by just not using the star, 1808 01:27:29,010 --> 01:27:30,360 and just say give me an int x. 1809 01:27:30,360 --> 01:27:35,640 Now I'm doing it the low level way, malloc-ing the x for myself. 1810 01:27:35,640 --> 01:27:39,510 I'm then saying go to x, go to that address in memory, 1811 01:27:39,510 --> 01:27:41,340 and put the number 42 there. 1812 01:27:41,340 --> 01:27:46,020 I'm then saying go to y and put the unlucky number 13 there. 1813 01:27:46,020 --> 01:27:49,350 But what's worrisome about this line here? 1814 01:27:49,350 --> 01:27:55,700 After this line, this line, this line, something's bad, I think. 1815 01:27:55,700 --> 01:27:57,830 Yeah, I never allocated memory for y. 1816 01:27:57,830 --> 01:28:00,540 So specifically, I never assigned y a value, 1817 01:28:00,540 --> 01:28:02,990 which means it's a garbage value, which is still a number. 1818 01:28:02,990 --> 01:28:03,680 Maybe it's zero. 1819 01:28:03,680 --> 01:28:04,680 Maybe it's a big number. 1820 01:28:04,680 --> 01:28:06,290 Maybe it's a negative number. 1821 01:28:06,290 --> 01:28:10,100 And if it's a positive number, it could be an actual address 1822 01:28:10,100 --> 01:28:11,670 somewhere in the computer's memory. 1823 01:28:11,670 --> 01:28:13,190 But star y means go there. 1824 01:28:13,190 --> 01:28:15,050 Who knows what memory I'm touching? 1825 01:28:15,050 --> 01:28:18,320 That's how computers crash if you touch memory that you're not supposed to. 1826 01:28:18,320 --> 01:28:21,740 So let me pretend that I didn't at least do this and let me just forge ahead 1827 01:28:21,740 --> 01:28:25,320 and set y equal to x so they're the same. 
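The pointer exercise walked through above can be reconstructed roughly as follows. This is a sketch based on the spoken description, not the code on the slide; the function name pointer_fun, the NULL check, and the final free are assumptions added for completeness.

```c
#include <stdlib.h>

// Hypothetical wrapper around the x and y exercise.
// Returns the value stored at x's address at the end, or -1 if malloc fails.
int pointer_fun(void)
{
    int *x; // meant to hold the address of an int
    int *y; // same, but not yet pointing at anything valid

    // Allocate space for one int and remember its address in x
    x = malloc(sizeof(int));
    if (x == NULL)
    {
        return -1;
    }

    // Go to the address in x and put 42 there
    *x = 42;

    // *y = 13; // bug: y holds a garbage value here, so this writes to unknown memory

    // Make y hold the same address as x
    y = x;

    // Now going to y's address is the same as going to x's address
    *y = 13;

    int result = *x;
    free(x);
    return result;
}
```

Because x and y end up holding the same address, the value read back through x is the one written through y.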
1828 01:28:25,320 --> 01:28:28,700 And I think what that would mean is now, if I do star y 1829 01:28:28,700 --> 01:28:33,140 and go to the address, that's the same thing as going to the address in x. 1830 01:28:33,140 --> 01:28:37,640 And I think this will have the effect of changing the 42 to 13. 1831 01:28:37,640 --> 01:28:42,470 So this code is correct, so long as I don't blindly dereference y 1832 01:28:42,470 --> 01:28:44,940 by using star y notation. 1833 01:28:44,940 --> 01:28:48,320 So this gets a little abstract, even though this is just an exercise here. 1834 01:28:48,320 --> 01:28:51,200 And our friend Nick Parlante, a professor at Stanford, 1835 01:28:51,200 --> 01:28:54,210 wonderfully put together a little claymation 1836 01:28:54,210 --> 01:28:58,990 that's fun to take a look at, whereby if I go ahead and open up this file, 1837 01:28:58,990 --> 01:29:02,070 we'll be introduced to someone who's a little famous 1838 01:29:02,070 --> 01:29:05,250 in the world of computing named Binky, if we could dim the lights 1839 01:29:05,250 --> 01:29:08,010 and take a look at what bad things can happen if you 1840 01:29:08,010 --> 01:29:11,820 don't manage your memory properly. 1841 01:29:11,820 --> 01:29:13,050 SPEAKER 5: Hey, Binky. 1842 01:29:13,050 --> 01:29:13,800 Wake up. 1843 01:29:13,800 --> 01:29:16,025 It's time for pointer fun. 1844 01:29:16,025 --> 01:29:17,320 SPEAKER 6: What's that? 1845 01:29:17,320 --> 01:29:19,150 Learn about pointers? 1846 01:29:19,150 --> 01:29:20,118 Oh, goody. 1847 01:29:20,118 --> 01:29:23,410 SPEAKER 5: Well, to get started, I guess we're going to need a couple pointers. 1848 01:29:23,410 --> 01:29:24,610 SPEAKER 6: OK. 1849 01:29:24,610 --> 01:29:28,060 This code allocates two pointers, which can point to integers. 1850 01:29:28,060 --> 01:29:28,810 SPEAKER 5: OK. 1851 01:29:28,810 --> 01:29:32,220 Well, I see the two pointers, but they don't seem to be pointing to anything. 
1852 01:29:32,220 --> 01:29:33,220 SPEAKER 6: That's right. 1853 01:29:33,220 --> 01:29:35,350 Initially, pointers don't point to anything. 1854 01:29:35,350 --> 01:29:38,410 The things they point to are called pointees, and setting them up 1855 01:29:38,410 --> 01:29:39,250 is a separate step. 1856 01:29:39,250 --> 01:29:40,570 SPEAKER 5: Oh, right, right. 1857 01:29:40,570 --> 01:29:41,230 I knew that. 1858 01:29:41,230 --> 01:29:43,120 The pointees are separate. 1859 01:29:43,120 --> 01:29:45,190 So how do you allocate a pointee? 1860 01:29:45,190 --> 01:29:46,150 SPEAKER 6: OK. 1861 01:29:46,150 --> 01:29:49,150 Well, this code allocates a new integer pointee, 1862 01:29:49,150 --> 01:29:51,870 and this part sets x to point to it. 1863 01:29:51,870 --> 01:29:53,480 SPEAKER 5: Hey, that looks better. 1864 01:29:53,480 --> 01:29:54,740 So make it do something. 1865 01:29:54,740 --> 01:29:58,640 SPEAKER 6: OK, I'll dereference the pointer x to store the number 1866 01:29:58,640 --> 01:30:00,680 42 into its pointee. 1867 01:30:00,680 --> 01:30:03,860 For this trick, I'll need my magic wand of dereferencing. 1868 01:30:03,860 --> 01:30:07,770 SPEAKER 5: Your magic wand of dereferencing? 1869 01:30:07,770 --> 01:30:09,195 That's great. 1870 01:30:09,195 --> 01:30:11,350 SPEAKER 6: This is what the code looks like. 1871 01:30:11,350 --> 01:30:13,965 I'll just set up the number and-- 1872 01:30:13,965 --> 01:30:15,020 SPEAKER 5: Hey, look. 1873 01:30:15,020 --> 01:30:16,220 There it goes. 1874 01:30:16,220 --> 01:30:19,910 So doing a dereference on x follows the arrow 1875 01:30:19,910 --> 01:30:23,300 to access its pointee, in this case, to store 42 in there. 1876 01:30:23,300 --> 01:30:27,560 Hey, try using it to store the number 13 through the other pointer, y. 1877 01:30:27,560 --> 01:30:28,860 SPEAKER 6: OK. 
1878 01:30:28,860 --> 01:30:33,380 I'll just go over here to y and get the number 13 set up, 1879 01:30:33,380 --> 01:30:38,030 and then take the wand of dereferencing, and just-- 1880 01:30:38,030 --> 01:30:38,810 whoa. 1881 01:30:38,810 --> 01:30:41,150 SPEAKER 5: Oh, hey, that didn't work. 1882 01:30:41,150 --> 01:30:44,150 Say, Binky, I don't think dereferencing y 1883 01:30:44,150 --> 01:30:48,330 is a good idea because setting up the pointee is a separate step, 1884 01:30:48,330 --> 01:30:50,405 and I don't think we ever did it. 1885 01:30:50,405 --> 01:30:51,365 SPEAKER 6: Good point. 1886 01:30:51,365 --> 01:30:54,230 SPEAKER 5: Yeah, we allocated the pointer y, 1887 01:30:54,230 --> 01:30:56,986 but we never set it to point to a pointee. 1888 01:30:56,986 --> 01:30:58,518 SPEAKER 6: Very observant. 1889 01:30:58,518 --> 01:31:00,560 SPEAKER 5: Hey, you're looking good there, Binky. 1890 01:31:00,560 --> 01:31:03,590 Can you fix it so that y points to the same pointee as x? 1891 01:31:03,590 --> 01:31:04,280 SPEAKER 6: Sure. 1892 01:31:04,280 --> 01:31:06,500 I'll use my magic wand of pointer assignment. 1893 01:31:06,500 --> 01:31:08,765 SPEAKER 5: Is that going to be a problem like before? 1894 01:31:08,765 --> 01:31:11,000 SPEAKER 6: No, this doesn't touch the pointees. 1895 01:31:11,000 --> 01:31:14,470 It just changes one pointer to point to the same thing as another. 1896 01:31:14,470 --> 01:31:15,840 SPEAKER 5: Oh, I see. 1897 01:31:15,840 --> 01:31:18,390 Now y points to the same place as x. 1898 01:31:18,390 --> 01:31:19,350 So wait. 1899 01:31:19,350 --> 01:31:20,370 Now y is fixed. 1900 01:31:20,370 --> 01:31:23,430 It has a pointee, so you can try the wand of dereferencing again 1901 01:31:23,430 --> 01:31:25,630 to send the 13 over. 1902 01:31:25,630 --> 01:31:26,640 SPEAKER 6: OK. 1903 01:31:26,640 --> 01:31:28,085 Here it goes. 1904 01:31:28,085 --> 01:31:29,500 SPEAKER 5: Hey, look at that. 
1905 01:31:29,500 --> 01:31:31,300 Now dereferencing works on y. 1906 01:31:31,300 --> 01:31:34,795 And because the pointers are sharing that one pointee, they both see the 13. 1907 01:31:34,795 --> 01:31:36,550 SPEAKER 6: Yeah, sharing. 1908 01:31:36,550 --> 01:31:37,060 Whatever. 1909 01:31:37,060 --> 01:31:38,860 So are we going to switch places now? 1910 01:31:38,860 --> 01:31:39,790 SPEAKER 5: Oh, look. 1911 01:31:39,790 --> 01:31:40,675 We're out of time. 1912 01:31:40,675 --> 01:31:41,950 SPEAKER 6: But-- 1913 01:31:41,950 --> 01:31:43,090 SPEAKER 1: All right. 1914 01:31:43,090 --> 01:31:44,320 So our thanks to Nick. 1915 01:31:44,320 --> 01:31:47,290 I can only imagine how many hours he spent making that happen. 1916 01:31:47,290 --> 01:31:50,290 But hopefully, it gives you more of a visual as to what's happening when 1917 01:31:50,290 --> 01:31:54,160 we're dereferencing these addresses, and going to them, and assigning values, 1918 01:31:54,160 --> 01:31:58,720 and as per Binky's explosion there, what happens when you dereference values you 1919 01:31:58,720 --> 01:31:59,380 shouldn't. 1920 01:31:59,380 --> 01:32:01,730 So related thereto, let me do this. 1921 01:32:01,730 --> 01:32:05,050 Let me go over to VS Code and open up now a program 1922 01:32:05,050 --> 01:32:07,390 I wrote in advance called swap.c. 1923 01:32:07,390 --> 01:32:11,990 And the purpose of this program is just to swap the value of two variables. 1924 01:32:11,990 --> 01:32:16,660 So let me walk over to the code here and point out that, in main, I've got 1925 01:32:16,660 --> 01:32:17,890 two variables, x and y. 1926 01:32:17,890 --> 01:32:19,300 No pointers, no magic there. 1927 01:32:19,300 --> 01:32:21,670 Just x and y are one and two respectively. 1928 01:32:21,670 --> 01:32:27,520 I've got a couple of printfs here saying x is %i, y is %i, passing in x and y, 1929 01:32:27,520 --> 01:32:30,290 just so we can see that x and y are indeed one and two. 
1930 01:32:30,290 --> 01:32:32,810 I'm then calling a function called swap, which 1931 01:32:32,810 --> 01:32:34,950 presumably, should swap the two values. 1932 01:32:34,950 --> 01:32:37,340 And then I'm just printing the exact same thing again, 1933 01:32:37,340 --> 01:32:39,680 my hoping that it's first going to say one, two, 1934 01:32:39,680 --> 01:32:43,460 then it's going to say two, one, thus achieving the idea of swapping here. 1935 01:32:43,460 --> 01:32:44,480 And here's swap. 1936 01:32:44,480 --> 01:32:46,755 Swap takes in two integers, a and b, though I 1937 01:32:46,755 --> 01:32:48,380 could have called them whatever I want. 1938 01:32:48,380 --> 01:32:52,040 It temporarily puts a in temp. 1939 01:32:52,040 --> 01:32:54,290 It then changes a to b. 1940 01:32:54,290 --> 01:32:57,050 It then changes b to temp, and then that's it. 1941 01:32:57,050 --> 01:32:59,220 It's a void function, so it doesn't return anything, 1942 01:32:59,220 --> 01:33:02,190 but it does all of the mathematical work in here. 1943 01:33:02,190 --> 01:33:07,610 So this is curious, though, because when it runs, 1944 01:33:07,610 --> 01:33:09,260 let me open up my terminal window here. 1945 01:33:09,260 --> 01:33:14,150 Make swap, dot slash swap, I should see one, two, and then two, one. 1946 01:33:14,150 --> 01:33:17,750 But no, even though I do think this is logically correct. 1947 01:33:17,750 --> 01:33:21,020 And actually, we're almost out of stock, but we do have another box of cookies 1948 01:33:21,020 --> 01:33:21,520 here. 1949 01:33:21,520 --> 01:33:24,260 Can we get one volunteer to come on up here maybe? 1950 01:33:24,260 --> 01:33:25,760 OK, how about you? 1951 01:33:25,760 --> 01:33:28,010 Yes, in the pink, come on up. 1952 01:33:28,010 --> 01:33:31,470 A round of applause, though, really, it's about the cookies now, I know. 1953 01:33:31,470 --> 01:33:35,420 1954 01:33:35,420 --> 01:33:38,180 OK. 1955 01:33:38,180 --> 01:33:40,706 And what is your name? 
1956 01:33:40,706 --> 01:33:43,437 SPEAKER 7: My name is Caleb, and I'm a first year concentrating 1957 01:33:43,437 --> 01:33:44,270 in computer science. 1958 01:33:44,270 --> 01:33:45,520 SPEAKER 1: All right, welcome. 1959 01:33:45,520 --> 01:33:48,290 Please stand behind the desk here. 1960 01:33:48,290 --> 01:33:49,040 No, you can stand. 1961 01:33:49,040 --> 01:33:49,550 It's fine. 1962 01:33:49,550 --> 01:33:50,133 SPEAKER 7: OK. 1963 01:33:50,133 --> 01:33:52,438 SPEAKER 1: We have two glasses of water, colored, blue, 1964 01:33:52,438 --> 01:33:53,480 and orange, respectively. 1965 01:33:53,480 --> 01:33:57,870 And I would like you to swap the values of these two variables 1966 01:33:57,870 --> 01:34:00,035 so that the orange liquid goes in the blue glass, 1967 01:34:00,035 --> 01:34:01,910 and the blue liquid goes in the orange glass. 1968 01:34:01,910 --> 01:34:05,864 1969 01:34:05,864 --> 01:34:08,000 SPEAKER 7: Seems like a bad idea. 1970 01:34:08,000 --> 01:34:09,695 SPEAKER 1: Why is that? 1971 01:34:09,695 --> 01:34:14,930 SPEAKER 7: Because I can't get one out to put the other one in because there's 1972 01:34:14,930 --> 01:34:15,800 no third glass. 1973 01:34:15,800 --> 01:34:20,210 SPEAKER 1: OK correct because we do have what we generally 1974 01:34:20,210 --> 01:34:21,660 call a temporary variable here. 1975 01:34:21,660 --> 01:34:23,660 So here, let me give you a variable called temp. 1976 01:34:23,660 --> 01:34:26,090 And if I give you this, how does that change things? 1977 01:34:26,090 --> 01:34:27,695 SPEAKER 7: Well, now, I can take one. 1978 01:34:27,695 --> 01:34:28,520 SPEAKER 1: OK. 1979 01:34:28,520 --> 01:34:29,720 SPEAKER 7: Very carefully. 1980 01:34:29,720 --> 01:34:32,278 1981 01:34:32,278 --> 01:34:32,945 SPEAKER 1: Nice. 1982 01:34:32,945 --> 01:34:34,100 SPEAKER 7: I'm trying. 1983 01:34:34,100 --> 01:34:34,850 SPEAKER 1: OK. 1984 01:34:34,850 --> 01:34:35,810 SPEAKER 7: There we go. 
1985 01:34:35,810 --> 01:34:42,620 SPEAKER 1: And now you can put b into a, if you will. 1986 01:34:42,620 --> 01:34:44,350 Nice. 1987 01:34:44,350 --> 01:34:49,900 And now temp goes back into that one. 1988 01:34:49,900 --> 01:34:50,470 All right. 1989 01:34:50,470 --> 01:34:51,490 That was very well done. 1990 01:34:51,490 --> 01:34:53,270 Maybe round of applause. 1991 01:34:53,270 --> 01:34:56,450 Thank you. 1992 01:34:56,450 --> 01:35:00,950 So this was just a cookie based way of making clear 1993 01:35:00,950 --> 01:35:04,040 that the code on the screen seems to work. 1994 01:35:04,040 --> 01:35:07,130 If I scroll back down to the swap function, 1995 01:35:07,130 --> 01:35:12,230 it seems to do exactly what you just did there, whereby the temporary glass is 1996 01:35:12,230 --> 01:35:15,650 where we put a, then we changed a to contain b, 1997 01:35:15,650 --> 01:35:18,770 then we changed b to contain what was originally in a, 1998 01:35:18,770 --> 01:35:20,708 but is now in the temporary glass. 1999 01:35:20,708 --> 01:35:21,500 And now we're done. 2000 01:35:21,500 --> 01:35:25,130 So it did achieve the stated goal, and yet when I ran this code a moment ago, 2001 01:35:25,130 --> 01:35:27,950 it was one, two, and then one, two again. 2002 01:35:27,950 --> 01:35:30,140 So why might that actually be? 2003 01:35:30,140 --> 01:35:33,860 Well, here we can go back to some of today's fundamentals 2004 01:35:33,860 --> 01:35:36,030 to consider what it is that's going wrong. 2005 01:35:36,030 --> 01:35:39,830 And in this case, it's actually related to a concept 2006 01:35:39,830 --> 01:35:44,780 we introduced some time ago, whereby there seems to be an issue of scope, 2007 01:35:44,780 --> 01:35:48,350 whereby sometimes when you're manipulating variables inside 2008 01:35:48,350 --> 01:35:53,240 of curly braces, thus defining their scope, it has no effect on values 2009 01:35:53,240 --> 01:35:53,870 elsewhere.
2010 01:35:53,870 --> 01:35:57,640 The variables might not even exist elsewhere, as we saw in the past. 2011 01:35:57,640 --> 01:35:58,900 So what do I mean by this? 2012 01:35:58,900 --> 01:36:03,798 Well, with matters of scope, it turns out that in this case, the way 2013 01:36:03,798 --> 01:36:06,090 I've implemented the swap function, I'm doing something 2014 01:36:06,090 --> 01:36:08,640 a programmer would call passing by value. 2015 01:36:08,640 --> 01:36:12,990 I'm literally passing in x and y by their values, one and two. 2016 01:36:12,990 --> 01:36:15,750 Another way of putting this is passing by copy. 2017 01:36:15,750 --> 01:36:18,900 So when I pass x and y into the swap function, 2018 01:36:18,900 --> 01:36:23,340 it turns out swap is actually getting copies thereof. 2019 01:36:23,340 --> 01:36:24,580 Now, what do I mean by this? 2020 01:36:24,580 --> 01:36:26,288 Well, let's go back again to this picture 2021 01:36:26,288 --> 01:36:29,290 of memory representative of what's in your Mac, your PC, or your phone. 2022 01:36:29,290 --> 01:36:31,650 And if we zoom in on this chip and we treat 2023 01:36:31,650 --> 01:36:34,980 it more abstractly as this canvas, get rid of the actual hardware, 2024 01:36:34,980 --> 01:36:37,780 and consider what's going on inside of the computer, 2025 01:36:37,780 --> 01:36:41,940 it turns out that there are conventions of how computers use this memory. 2026 01:36:41,940 --> 01:36:44,740 And it's worth having a general sense of what goes where. 2027 01:36:44,740 --> 01:36:48,300 So generally speaking, if this is a big rectangular region of memory, 2028 01:36:48,300 --> 01:36:50,700 even though this is just an artist's depiction thereof, 2029 01:36:50,700 --> 01:36:53,250 it turns out that the top of your memory, so to speak, 2030 01:36:53,250 --> 01:36:54,750 is where machine code goes. 2031 01:36:54,750 --> 01:36:59,470 The zeros and ones that you compile get loaded into here. 
2032 01:36:59,470 --> 01:37:03,270 So when you do dot slash something, or on a Mac or PC, when you double click, 2033 01:37:03,270 --> 01:37:07,950 or on a phone, when you single tap, that loads your program's machine code, 2034 01:37:07,950 --> 01:37:10,528 the app's machine code to the top of your computer's memory. 2035 01:37:10,528 --> 01:37:12,570 Strictly speaking, it doesn't have to be the top. 2036 01:37:12,570 --> 01:37:14,730 But for our sake, it's in this region here. 2037 01:37:14,730 --> 01:37:17,970 That's how the computer can access all of those zeros and ones quickly. 2038 01:37:17,970 --> 01:37:21,000 Below that, so to speak, are where global variables go. 2039 01:37:21,000 --> 01:37:22,870 We haven't had many occasions to use these. 2040 01:37:22,870 --> 01:37:25,410 But if you define a variable outside of main, 2041 01:37:25,410 --> 01:37:27,420 and outside of every other function in C, 2042 01:37:27,420 --> 01:37:29,140 it's what's called a global variable. 2043 01:37:29,140 --> 01:37:31,380 So those get tucked especially up at the top 2044 01:37:31,380 --> 01:37:34,195 so that they're accessible everywhere else in your program. 2045 01:37:34,195 --> 01:37:35,820 Then there's something called the heap. 2046 01:37:35,820 --> 01:37:37,180 More on that in a moment. 2047 01:37:37,180 --> 01:37:40,860 And it grows downward, so you have a lot of memory available to you 2048 01:37:40,860 --> 01:37:43,380 here in the heap, and you can keep getting more, and more, 2049 01:37:43,380 --> 01:37:44,700 and more available to you. 2050 01:37:44,700 --> 01:37:47,820 But at the bottom of this memory is what's called the stack. 2051 01:37:47,820 --> 01:37:51,060 And the stack actually grows, curiously, in the other direction, up, and up, 2052 01:37:51,060 --> 01:37:51,570 and up. 2053 01:37:51,570 --> 01:37:55,320 And it turns out, when you use malloc and ask the computer for memory, 2054 01:37:55,320 --> 01:37:58,020 it comes from this heap region, specifically. 
2055 01:37:58,020 --> 01:38:02,250 When you use functions with variables and arguments, 2056 01:38:02,250 --> 01:38:04,170 you're using stack memory. 2057 01:38:04,170 --> 01:38:07,405 Now, the astute viewer will notice that this does not seem like a good thing 2058 01:38:07,405 --> 01:38:09,030 if they're about to collide eventually. 2059 01:38:09,030 --> 01:38:12,657 And bad things can and will happen when one overflows the other, but more 2060 01:38:12,657 --> 01:38:13,740 on that, too, in a moment. 2061 01:38:13,740 --> 01:38:15,630 But let's focus for the moment on a stack 2062 01:38:15,630 --> 01:38:17,770 when we do something like this swap function. 2063 01:38:17,770 --> 01:38:21,150 So for instance, when we had code like this, which was bad, 2064 01:38:21,150 --> 01:38:26,130 it did not allow us to permanently change the values of x and y. 2065 01:38:26,130 --> 01:38:27,060 Why? 2066 01:38:27,060 --> 01:38:28,170 No pun intended. 2067 01:38:28,170 --> 01:38:31,710 Here on the stack is where the very first function 2068 01:38:31,710 --> 01:38:33,640 goes in your computer's memory. 2069 01:38:33,640 --> 01:38:36,030 So main, if you have any variables, they go 2070 01:38:36,030 --> 01:38:39,210 at the bottom of the computer's memory once you've loaded that program. 2071 01:38:39,210 --> 01:38:40,660 So what do I mean by that? 2072 01:38:40,660 --> 01:38:44,400 Well, if you think back to the code a moment ago, it was things like x and y, 2073 01:38:44,400 --> 01:38:45,450 and so forth. 2074 01:38:45,450 --> 01:38:50,790 When main calls swap, swap goes above it on the stack, so to speak. 2075 01:38:50,790 --> 01:38:53,230 And each of these rectangles, the technical term is frame. 2076 01:38:53,230 --> 01:38:54,450 So this is a stack frame. 2077 01:38:54,450 --> 01:38:55,620 This is a stack frame. 2078 01:38:55,620 --> 01:38:57,960 And if swap called another function, another frame 2079 01:38:57,960 --> 01:38:59,490 would go on the stack this way. 
2080 01:38:59,490 --> 01:39:01,710 And then as soon as swap returns, though, 2081 01:39:01,710 --> 01:39:03,900 that memory essentially goes away, or the computer 2082 01:39:03,900 --> 01:39:06,608 forgets about it, even though the bits are obviously still there. 2083 01:39:06,608 --> 01:39:08,610 You still have the hardware, but it's forgotten. 2084 01:39:08,610 --> 01:39:12,450 And main remains until main finishes and exits your program. 2085 01:39:12,450 --> 01:39:16,000 But let's consider what's going inside of these stack frames. 2086 01:39:16,000 --> 01:39:20,010 So here's main at the bottom, and it had two variables, x and y. 2087 01:39:20,010 --> 01:39:23,310 Those variables were one and two, respectively. 2088 01:39:23,310 --> 01:39:28,080 Main called swap, which had two arguments, a and b, 2089 01:39:28,080 --> 01:39:30,965 also integers, which are effectively local variables, also, 2090 01:39:30,965 --> 01:39:33,090 even though you're declaring them in the signature, 2091 01:39:33,090 --> 01:39:34,570 the prototype of the function. 2092 01:39:34,570 --> 01:39:39,660 So when swap is called, swap is using its frame of memory as follows. 2093 01:39:39,660 --> 01:39:42,090 Room for a, room for b, room for temp. 2094 01:39:42,090 --> 01:39:43,260 Not necessarily to scale. 2095 01:39:43,260 --> 01:39:45,630 I just wanted everything to be a pretty rectangle. 2096 01:39:45,630 --> 01:39:47,070 What's going where? 2097 01:39:47,070 --> 01:39:56,820 Well, because functions in C pass by value, that is, copy, a is a copy of x, 2098 01:39:56,820 --> 01:39:58,890 and b is a copy of y. 2099 01:39:58,890 --> 01:40:00,570 But they're separate bytes. 2100 01:40:00,570 --> 01:40:02,823 This is a different memory location than this. 2101 01:40:02,823 --> 01:40:04,740 This is a different memory location than this. 2102 01:40:04,740 --> 01:40:07,960 So we're just copying the patterns of bits from one to the other. 
2103 01:40:07,960 --> 01:40:09,570 This is passing by value, a.k.a. 2104 01:40:09,570 --> 01:40:10,680 Passing by copy. 2105 01:40:10,680 --> 01:40:11,730 So what then happens? 2106 01:40:11,730 --> 01:40:14,970 Just like our demonstration, we used temp cleverly, 2107 01:40:14,970 --> 01:40:19,290 whereby with this code here, we copied the value of a into temp. 2108 01:40:19,290 --> 01:40:21,540 So that puts the number one here, too. 2109 01:40:21,540 --> 01:40:23,820 We then changed a to equal b. 2110 01:40:23,820 --> 01:40:25,770 So that's what happened here. 2111 01:40:25,770 --> 01:40:29,760 We then changed b to equal temp, so that changed the value there. 2112 01:40:29,760 --> 01:40:31,200 But then swap returned. 2113 01:40:31,200 --> 01:40:35,640 You went back to your seat, leaving a and b swapped, yes. 2114 01:40:35,640 --> 01:40:40,050 But what was not swapped was x and y. 2115 01:40:40,050 --> 01:40:43,980 You did all of this work correctly, but in the wrong scope. 2116 01:40:43,980 --> 01:40:46,150 You operated on copies thereof. 2117 01:40:46,150 --> 01:40:48,660 So this swap function, while logically correct, 2118 01:40:48,660 --> 01:40:51,540 will never solve this problem correctly as written 2119 01:40:51,540 --> 01:40:54,160 because we've been passing by value. 2120 01:40:54,160 --> 01:40:56,250 So today, we introduce a technique, whereby 2121 01:40:56,250 --> 01:41:00,750 you can pass by reference instead, pass by pointer instead, 2122 01:41:00,750 --> 01:41:03,270 because instead of just passing in copies, 2123 01:41:03,270 --> 01:41:08,410 what if we actually tell swap where x is and where y is, not what it is 2124 01:41:08,410 --> 01:41:10,260 and what it is, but where each is? 2125 01:41:10,260 --> 01:41:14,820 Then swap can follow the proverbial treasure map, go to those locations, 2126 01:41:14,820 --> 01:41:16,630 and change them permanently. 
2127 01:41:16,630 --> 01:41:19,290 So this was the bad code in red, and this 2128 01:41:19,290 --> 01:41:20,940 is going to escalate quickly visually. 2129 01:41:20,940 --> 01:41:23,880 But it's just an application of today's ideas. 2130 01:41:23,880 --> 01:41:26,610 This is the correct solution now. 2131 01:41:26,610 --> 01:41:27,810 So let me do before. 2132 01:41:27,810 --> 01:41:29,140 In red is bad. 2133 01:41:29,140 --> 01:41:31,380 Green, after, is correct. 2134 01:41:31,380 --> 01:41:32,010 Why? 2135 01:41:32,010 --> 01:41:34,860 The way you specify pass by reference or pointer 2136 01:41:34,860 --> 01:41:37,710 instead is you change swap to take, not two integers, per se, 2137 01:41:37,710 --> 01:41:39,660 but two addresses of integers. 2138 01:41:39,660 --> 01:41:41,970 And the syntax for that today is just to add the star. 2139 01:41:41,970 --> 01:41:44,280 So int star, and int star. 2140 01:41:44,280 --> 01:41:46,650 Meanwhile, the code down here has to change. 2141 01:41:46,650 --> 01:41:47,910 Temp does not have to change. 2142 01:41:47,910 --> 01:41:53,670 Temp is still just a variable that's ready for some value. 2143 01:41:53,670 --> 01:41:58,660 But the a and the b, and the b need to be rewritten as follows. 2144 01:41:58,660 --> 01:42:03,720 So star a means go to the address a, and get its value, 2145 01:42:03,720 --> 01:42:06,570 and put it in temp, just like you reached for one of the glasses 2146 01:42:06,570 --> 01:42:07,500 and poured it in. 2147 01:42:07,500 --> 01:42:11,400 Star b means, go to the value in b and grab it. 2148 01:42:11,400 --> 01:42:16,710 And then go to the value at a and change it to be that at b. 2149 01:42:16,710 --> 01:42:19,500 And then lastly, this is not now sweat. 2150 01:42:19,500 --> 01:42:21,240 This is now colored liquid. 2151 01:42:21,240 --> 01:42:28,750 This last line is go to the address b, and put temp there instead. 2152 01:42:28,750 --> 01:42:31,080 So the picture now is fundamentally different. 
2153 01:42:31,080 --> 01:42:32,430 Main looks the same still. 2154 01:42:32,430 --> 01:42:36,990 But when swap is called, effectively, and we won't bother with 0x123 or 0x456. 2155 01:42:36,990 --> 01:42:39,540 Let's just do it with arrows pointing at things. 2156 01:42:39,540 --> 01:42:43,710 a is a pointer to x, b is a pointer to y. 2157 01:42:43,710 --> 01:42:46,200 So what do those lines of code tell us to do? 2158 01:42:46,200 --> 01:42:46,972 Go to a. 2159 01:42:46,972 --> 01:42:49,680 So that means this, kind of like the old Chutes and Ladders game, 2160 01:42:49,680 --> 01:42:50,580 if you ever played. 2161 01:42:50,580 --> 01:42:54,900 Follow the arrow, and that leads you to the number one, and store it in temp. 2162 01:42:54,900 --> 01:42:56,970 So that one was straightforward. 2163 01:42:56,970 --> 01:42:58,500 Go to the value in b. 2164 01:42:58,500 --> 01:43:00,000 So follow the arrow. 2165 01:43:00,000 --> 01:43:02,970 That gives us two, and put it at the location 2166 01:43:02,970 --> 01:43:05,880 in a, which means put the two there. 2167 01:43:05,880 --> 01:43:09,270 The very last line of code now means get temp, which is obviously there, 2168 01:43:09,270 --> 01:43:14,100 and go to the address b, and change it to one. 2169 01:43:14,100 --> 01:43:17,640 So now, even though we've not changed a and b at all, per se, 2170 01:43:17,640 --> 01:43:20,100 we've used them as little breadcrumbs to lead us 2171 01:43:20,100 --> 01:43:22,180 to the right location in the computer's memory. 2172 01:43:22,180 --> 01:43:25,800 So when swap returns this time, even though it's a void function, 2173 01:43:25,800 --> 01:43:27,630 it has made a difference. 2174 01:43:27,630 --> 01:43:32,910 And it's had this effect of swapping the actual original values of x and y. 2175 01:43:32,910 --> 01:43:35,160 The code, admittedly, is cryptic looking.
2176 01:43:35,160 --> 01:43:38,220 It's not the most user friendly syntax, but this ability 2177 01:43:38,220 --> 01:43:41,490 now to go to locations in memory and change 2178 01:43:41,490 --> 01:43:44,220 what is actually there is what we've been given today 2179 01:43:44,220 --> 01:43:48,100 with this new syntax of the star operator, and occasionally as needed, 2180 01:43:48,100 --> 01:43:51,520 the ampersand one, as well. 2181 01:43:51,520 --> 01:43:55,590 Questions on this technique, which is, admittedly, the most sophisticated 2182 01:43:55,590 --> 01:43:58,990 of the examples thus far, and we'll probably take time to get used to. 2183 01:43:58,990 --> 01:43:59,490 Yeah. 2184 01:43:59,490 --> 01:44:02,805 2185 01:44:02,805 --> 01:44:03,430 Say that again. 2186 01:44:03,430 --> 01:44:04,252 Will this work if-- 2187 01:44:04,252 --> 01:44:06,460 AUDIENCE: Will this work if you're swapping the value 2188 01:44:06,460 --> 01:44:07,755 of two strings instead of ints? 2189 01:44:07,755 --> 01:44:08,590 SPEAKER 1: Ah, good question. 2190 01:44:08,590 --> 01:44:10,423 Will this work if you're swapping the values 2191 01:44:10,423 --> 01:44:12,400 of two strings instead of two ints? 2192 01:44:12,400 --> 01:44:18,610 Yes, if you go to the address that the string represents and change maybe 2193 01:44:18,610 --> 01:44:21,350 with a loop all of the characters one at a time. 2194 01:44:21,350 --> 01:44:24,335 So it's going to be more complicated than this in green because you're 2195 01:44:24,335 --> 01:44:26,710 going to have to change all of the individual characters, 2196 01:44:26,710 --> 01:44:31,240 probably reusing a temporary char, instead of a temporary integer. 2197 01:44:31,240 --> 01:44:31,850 But you could. 2198 01:44:31,850 --> 01:44:32,350 Yeah. 
2199 01:44:32,350 --> 01:44:38,218 AUDIENCE: [INAUDIBLE] 2200 01:44:38,218 --> 01:44:40,510 SPEAKER 1: Since integers have a fixed number of bits, 2201 01:44:40,510 --> 01:44:43,093 can you ever run into a situation where you run out of memory? 2202 01:44:43,093 --> 01:44:43,900 Absolutely. 2203 01:44:43,900 --> 01:44:46,480 Your phone, your laptop, your desktop can only do so much, 2204 01:44:46,480 --> 01:44:49,300 can only count so high because of these physical limitations. 2205 01:44:49,300 --> 01:44:51,390 And hopefully, you just never reach that limit. 2206 01:44:51,390 --> 01:44:54,640 But we'll talk in a couple of weeks time when we transition to web programming 2207 01:44:54,640 --> 01:44:57,220 and databases, and the Metas, the Microsofts, 2208 01:44:57,220 --> 01:45:00,250 the Googles of the world that have crazy large amounts of data. 2209 01:45:00,250 --> 01:45:02,163 The number of bits we use in those contexts 2210 01:45:02,163 --> 01:45:04,330 is actually going to matter for exactly that reason. 2211 01:45:04,330 --> 01:45:05,590 If business is booming, and if you've got 2212 01:45:05,590 --> 01:45:07,690 lots and lots of data, lots and lots of users, 2213 01:45:07,690 --> 01:45:09,760 you need to be able to count higher. 2214 01:45:09,760 --> 01:45:12,760 Just so that you've seen the code actually in operation, 2215 01:45:12,760 --> 01:45:15,380 here is, of course, my swap function down below. 2216 01:45:15,380 --> 01:45:17,350 And if I go ahead and change its prototype 2217 01:45:17,350 --> 01:45:21,950 to take in pointers to a and b, and similarly change the prototype up here. 2218 01:45:21,950 --> 01:45:26,320 And if I go in and change a here to be star a to dereference it, star 2219 01:45:26,320 --> 01:45:28,600 b to dereference it, star a to dereference it, 2220 01:45:28,600 --> 01:45:32,200 and star b to dereference it, I claim now that this version of the code 2221 01:45:32,200 --> 01:45:33,110 should now work.
2222 01:45:33,110 --> 01:45:36,130 In fact, let me go ahead and do make swap. 2223 01:45:36,130 --> 01:45:37,620 It didn't compile. 2224 01:45:37,620 --> 01:45:38,520 So why might that be? 2225 01:45:38,520 --> 01:45:40,603 Well, let me scroll up to see what the message is. 2226 01:45:40,603 --> 01:45:42,590 Incompatible integer to pointer conversion 2227 01:45:42,590 --> 01:45:45,230 passing in parameter of type int star. 2228 01:45:45,230 --> 01:45:46,580 That's a lot to absorb. 2229 01:45:46,580 --> 01:45:49,670 But clearly, the issue is with how I'm calling swap. 2230 01:45:49,670 --> 01:45:50,940 So why is this? 2231 01:45:50,940 --> 01:45:55,640 Well, notice that now that my swap function expects as arguments pointers 2232 01:45:55,640 --> 01:46:00,140 to integers, I can't just blindly pass in x and y, which are integers. 2233 01:46:00,140 --> 01:46:04,100 Instead, I do need to use our friend, the new ampersand operator to pass 2234 01:46:04,100 --> 01:46:06,410 in the address of x and the address of y. 2235 01:46:06,410 --> 01:46:12,140 So now, if I reopen my terminal window, run make swap, and do dot slash swap, 2236 01:46:12,140 --> 01:46:15,290 now I see one, two, and then two, one. 2237 01:46:15,290 --> 01:46:16,640 So the code changes in this way. 2238 01:46:16,640 --> 01:46:18,530 And maybe this example more so than others 2239 01:46:18,530 --> 01:46:21,950 makes clear why the star operator lets us go somewhere, 2240 01:46:21,950 --> 01:46:24,980 but the ampersand effectively does the opposite and figures out 2241 01:46:24,980 --> 01:46:27,390 what the address of something now is. 2242 01:46:27,390 --> 01:46:27,890 All right. 2243 01:46:27,890 --> 01:46:30,720 So what about these other locations in memory? 2244 01:46:30,720 --> 01:46:34,700 Well, it turns out that, indeed, the stack, as we've described it, 2245 01:46:34,700 --> 01:46:35,750 grows up, and up, and up. 
2246 01:46:35,750 --> 01:46:37,700 And recall that stack here in this sense is 2247 01:46:37,700 --> 01:46:41,300 kind of like the stack of trays in the cafeteria or any of the dining halls. 2248 01:46:41,300 --> 01:46:43,820 There's one tray, another tray, another tray, another tray. 2249 01:46:43,820 --> 01:46:45,955 But then you start removing them from top down. 2250 01:46:45,955 --> 01:46:48,830 So there's an ordering to them that we'll actually revisit next week. 2251 01:46:48,830 --> 01:46:51,230 But this is not a good design, in general. 2252 01:46:51,230 --> 01:46:55,040 You shouldn't be doing things like two trains on the tracks barreling together 2253 01:46:55,040 --> 01:46:56,435 toward each other in this way. 2254 01:46:56,435 --> 01:46:59,060 But honestly, it's kind of the only way, because if you've only 2255 01:46:59,060 --> 01:47:01,310 got a finite amount of memory, OK, sure, you 2256 01:47:01,310 --> 01:47:02,750 can have them both grow in the same direction. 2257 01:47:02,750 --> 01:47:04,940 But they're still going to hit some impasse eventually. 2258 01:47:04,940 --> 01:47:06,565 You're still going to run out of space. 2259 01:47:06,565 --> 01:47:10,590 So the way computers were designed years ago is they use memory in this way, 2260 01:47:10,590 --> 01:47:14,270 even though bad things can happen, if you use too much stack space, 2261 01:47:14,270 --> 01:47:15,510 or too much heap space. 2262 01:47:15,510 --> 01:47:16,740 So what do I mean by that? 2263 01:47:16,740 --> 01:47:20,210 Our example a moment ago just had us call main and then swap 2264 01:47:20,210 --> 01:47:21,000 and that was it. 2265 01:47:21,000 --> 01:47:22,950 So it's like two frames no big deal. 2266 01:47:22,950 --> 01:47:24,980 But if you call many functions again, and again, 2267 01:47:24,980 --> 01:47:27,980 and again, if you do something recursively, where you call yourself, 2268 01:47:27,980 --> 01:47:31,070 you're going to pile, pile, pile stack frames potentially.
2269 01:47:31,070 --> 01:47:33,750 So you could start to hit the so-called heap area. 2270 01:47:33,750 --> 01:47:36,427 Meanwhile, if you call malloc too many times, 2271 01:47:36,427 --> 01:47:38,510 you might be growing down, down, down, down, down, 2272 01:47:38,510 --> 01:47:41,400 and then overrun some of the stack memory, as well. 2273 01:47:41,400 --> 01:47:44,270 So bad things can happen when you overrun either of these. 2274 01:47:44,270 --> 01:47:46,670 And those of you maybe with prior programming experience 2275 01:47:46,670 --> 01:47:50,180 might have heard at least one of these terms, heap overflow, or more 2276 01:47:50,180 --> 01:47:51,890 popularly, stack overflow. 2277 01:47:51,890 --> 01:47:55,100 Super popular website for questions and answers about programming. 2278 01:47:55,100 --> 01:47:59,990 The etymology thereof is exactly this idea of overflowing the stack 2279 01:47:59,990 --> 01:48:03,290 and touching memory that you should not, whether it's 2280 01:48:03,290 --> 01:48:06,200 memory down here, or even worse, memory over here, 2281 01:48:06,200 --> 01:48:08,390 as by something called heap overflow. 2282 01:48:08,390 --> 01:48:12,530 And these are specific examples of what we'll start calling buffer overflows. 2283 01:48:12,530 --> 01:48:14,540 Buffer is just a chunk of memory. 2284 01:48:14,540 --> 01:48:18,860 And buffer overflows means overflowing, using too much of that memory. 2285 01:48:18,860 --> 01:48:20,190 And buffers are everywhere. 2286 01:48:20,190 --> 01:48:22,130 In fact, if you've used YouTube recently, 2287 01:48:22,130 --> 01:48:25,460 and maybe it's just kind of paused and spinning, and spinning, and spinning, 2288 01:48:25,460 --> 01:48:27,320 maybe you're on a really bad connection. 2289 01:48:27,320 --> 01:48:29,690 There's no more bytes in your buffer. 2290 01:48:29,690 --> 01:48:31,850 There's no more video footage in the buffer 2291 01:48:31,850 --> 01:48:33,890 because maybe you have such a bad connection. 
2292 01:48:33,890 --> 01:48:36,650 But if Google were to make mistakes and try 2293 01:48:36,650 --> 01:48:40,230 to download too many bytes at a time, they too could overflow a buffer. 2294 01:48:40,230 --> 01:48:42,470 And if YouTube or similar apps have ever crashed, 2295 01:48:42,470 --> 01:48:47,090 it could be because they're trying to use more memory than they actually 2296 01:48:47,090 --> 01:48:47,610 should be. 2297 01:48:47,610 --> 01:48:49,490 So these things are sort of everywhere. 2298 01:48:49,490 --> 01:48:53,270 Now, as for these training wheels, we sort of took away 2299 01:48:53,270 --> 01:48:54,980 the mystery of what a string is. 2300 01:48:54,980 --> 01:48:57,230 But what about all of these other functions we've been 2301 01:48:57,230 --> 01:48:59,300 taking for granted now for a few weeks? 2302 01:48:59,300 --> 01:49:03,680 You can and should still use them to solve some problems because, frankly, C 2303 01:49:03,680 --> 01:49:09,350 does not make it easy to get user input safely, like, period, full stop. 2304 01:49:09,350 --> 01:49:13,640 It is very non-trivial to get user input without running 2305 01:49:13,640 --> 01:49:16,100 the risk of overflowing a buffer. 2306 01:49:16,100 --> 01:49:16,610 Why? 2307 01:49:16,610 --> 01:49:17,870 Well, you're the programmer. 2308 01:49:17,870 --> 01:49:21,020 How do you possibly know in advance how big of a string 2309 01:49:21,020 --> 01:49:24,170 a human might type in tomorrow, or the next week, or the next day? 2310 01:49:24,170 --> 01:49:27,450 You could try to be safe and allocate a million bytes all at once. 2311 01:49:27,450 --> 01:49:29,630 But what if they type in 1,000,001 characters, 2312 01:49:29,630 --> 01:49:32,870 or use copy paste so much that they similarly overflow? 2313 01:49:32,870 --> 01:49:36,030 So getting user input is a hard problem. 
2314 01:49:36,030 --> 01:49:38,370 So let's introduce you to what the alternative would be 2315 01:49:38,370 --> 01:49:43,050 and give you an appreciation for what libraries like CS50's and others 2316 01:49:43,050 --> 01:49:45,880 like it are actually doing for you. 2317 01:49:45,880 --> 01:49:49,680 Let me go ahead and create our own version of getint and getstring 2318 01:49:49,680 --> 01:49:52,770 without using the CS50 library, but using a standard C 2319 01:49:52,770 --> 01:49:54,360 function called scanf. 2320 01:49:54,360 --> 01:49:56,430 And to do that, let me go over to VS Code. 2321 01:49:56,430 --> 01:49:59,610 Let me create a new file called, for instance, get.c. 2322 01:49:59,610 --> 01:50:02,220 And then in get.c, let's make a very simple program 2323 01:50:02,220 --> 01:50:06,600 first that just gets an integer, but again, without using the CS50 library. 2324 01:50:06,600 --> 01:50:09,780 So let me go ahead and include the standard io library. 2325 01:50:09,780 --> 01:50:13,140 Let me go ahead and declare main as int main void. 2326 01:50:13,140 --> 01:50:17,017 And then inside of main, let me go ahead and declare an integer n 2327 01:50:17,017 --> 01:50:19,350 so that we have some place to put the integer that we're 2328 01:50:19,350 --> 01:50:20,490 getting from the user. 2329 01:50:20,490 --> 01:50:23,040 Then let me go ahead and just prompt the user 2330 01:50:23,040 --> 01:50:26,940 for a value for n, so n colon space, for instance. 2331 01:50:26,940 --> 01:50:29,220 Because again, I'm not using getint, so I can't just 2332 01:50:29,220 --> 01:50:31,300 call it to present the user with a prompt. 2333 01:50:31,300 --> 01:50:33,550 So I'm going to use printf to create my own prompt. 2334 01:50:33,550 --> 01:50:36,370 And now, let me use this function scanf as follows.
2335 01:50:36,370 --> 01:50:39,880 I'm going to call scanf, and then I'm going to pass to scanf, 2336 01:50:39,880 --> 01:50:45,460 similar in spirit to printf, a format code, like %i, 2337 01:50:45,460 --> 01:50:49,340 effectively telling scanf that what I want it to scan, so to speak, 2338 01:50:49,340 --> 01:50:52,715 from the user's keyboard is in fact a single integer. 2339 01:50:52,715 --> 01:50:55,840 Now, I'm going to close quotes, and I don't need a new line because I'm not 2340 01:50:55,840 --> 01:50:56,882 trying to print anything. 2341 01:50:56,882 --> 01:50:59,530 I'm trying to get something from the user using scanf. 2342 01:50:59,530 --> 01:51:03,680 But I do need to tell scanf where to put this integer. 2343 01:51:03,680 --> 01:51:06,040 Now, if I want to put this integer in the variable n, 2344 01:51:06,040 --> 01:51:08,470 it's not quite as simple as just passing n 2345 01:51:08,470 --> 01:51:11,080 in because recall how variables are passed. 2346 01:51:11,080 --> 01:51:13,690 This variable n is going to be passed by value. 2347 01:51:13,690 --> 01:51:16,510 Effectively, a copy is going to go into scanf, 2348 01:51:16,510 --> 01:51:19,910 and so scanf is not going to have the ability to change that value. 2349 01:51:19,910 --> 01:51:22,660 But if you think back to how we swapped two values 2350 01:51:22,660 --> 01:51:27,550 and passed two values into that swap function in C, 2351 01:51:27,550 --> 01:51:31,360 well, if we pass those two values in by their addresses, 2352 01:51:31,360 --> 01:51:34,690 so passing by reference, so to speak, then the function, 2353 01:51:34,690 --> 01:51:39,130 swap in that case, scanf in this case, can actually go to that address 2354 01:51:39,130 --> 01:51:40,400 and change the value. 
2355 01:51:40,400 --> 01:51:43,840 So to summarize, I'm going to pass to scanf one argument, which is a format 2356 01:51:43,840 --> 01:51:47,830 code, and a second argument, which is the address of an integer into which 2357 01:51:47,830 --> 01:51:49,165 to put the user's value. 2358 01:51:49,165 --> 01:51:52,040 After that, I'm just going to go ahead and print out what's happened. 2359 01:51:52,040 --> 01:51:55,490 So I'm going to go ahead and print out the value of n followed by a colon, 2360 01:51:55,490 --> 01:51:58,450 followed by an actual placeholder, %i backslash n. 2361 01:51:58,450 --> 01:52:01,390 And I'm going to pass in now to printf the value n. 2362 01:52:01,390 --> 01:52:04,300 So to be clear, I'm still passing n into printf, just like we've 2363 01:52:04,300 --> 01:52:08,650 been doing since week one, but I'm passing to scanf the address of n, 2364 01:52:08,650 --> 01:52:12,510 so that scanf can actually go to that address and change the value of n. 2365 01:52:12,510 --> 01:52:14,260 So I think this is actually going to work, 2366 01:52:14,260 --> 01:52:17,710 even though I've not used getint or any of the CS50 library. 2367 01:52:17,710 --> 01:52:20,860 Let me go into my terminal, run make get. 2368 01:52:20,860 --> 01:52:22,090 Seems to compile OK. 2369 01:52:22,090 --> 01:52:26,470 Let me do dot slash get, and let me type in a value like 50 for n. 2370 01:52:26,470 --> 01:52:30,920 And indeed, I should see spit back at me that the value I got was 50. 2371 01:52:30,920 --> 01:52:33,730 So it turns out that getting an integer from users 2372 01:52:33,730 --> 01:52:37,210 is relatively straightforward just using scanf. 2373 01:52:37,210 --> 01:52:40,690 But of course, to use scanf, you need to know a little something about pointers 2374 01:52:40,690 --> 01:52:42,220 or addresses, more generally. 2375 01:52:42,220 --> 01:52:43,970 That was not knowledge we had in week one.
2376 01:52:43,970 --> 01:52:45,928 And so we do, indeed, use those training wheels 2377 01:52:45,928 --> 01:52:48,440 of the CS50 library for the past few weeks 2378 01:52:48,440 --> 01:52:50,350 so that we can get integers more easily. 2379 01:52:50,350 --> 01:52:55,210 And it turns out, if the user types more than a simple integer, 2380 01:52:55,210 --> 01:52:59,080 or doesn't even type in an integer, scanf isn't necessarily 2381 01:52:59,080 --> 01:53:02,560 going to behave as user friendly as getint might. 2382 01:53:02,560 --> 01:53:06,642 So in the CS50 library, we do a bit more error handling for you, as well. 2383 01:53:06,642 --> 01:53:08,350 But let's consider now an implementation, 2384 01:53:08,350 --> 01:53:11,170 not of getting an integer, but getting a string instead. 2385 01:53:11,170 --> 01:53:15,220 Let me clear my terminal window, and let me go ahead and erase all of this code, 2386 01:53:15,220 --> 01:53:17,740 and instead focus this time on getting a string. 2387 01:53:17,740 --> 01:53:21,250 Well, we know we can't use string anymore, at least if we're not 2388 01:53:21,250 --> 01:53:22,420 using the CS50 library. 2389 01:53:22,420 --> 01:53:25,880 But not a problem because we know that strings are now char stars. 2390 01:53:25,880 --> 01:53:28,600 So if I want to get a string from the user, that's like getting, 2391 01:53:28,600 --> 01:53:29,860 I think, a char star. 2392 01:53:29,860 --> 01:53:32,680 So let me just call this string s by default. 2393 01:53:32,680 --> 01:53:36,010 Let me go ahead therefore and declare a variable s that's 2394 01:53:36,010 --> 01:53:37,190 going to store my string. 2395 01:53:37,190 --> 01:53:40,000 Let me go ahead next, as before, and prompt the user 2396 01:53:40,000 --> 01:53:43,330 for the value of that variable, just by prompting them with printf. 2397 01:53:43,330 --> 01:53:44,620 So nothing fancy there. 
2398 01:53:44,620 --> 01:53:47,650 And let me try again using scanf to scan this time 2399 01:53:47,650 --> 01:53:49,630 a string from the user's keyboard. 2400 01:53:49,630 --> 01:53:51,160 I'm going to type scanf. 2401 01:53:51,160 --> 01:53:54,490 I'm going to do %s instead of %i because I, indeed, 2402 01:53:54,490 --> 01:53:56,900 want to scan a string in this case. 2403 01:53:56,900 --> 01:53:59,230 And then I'm going to go ahead and pass in just s. 2404 01:53:59,230 --> 01:54:02,680 And here, at first glance, there seems to be an inconsistency because, 2405 01:54:02,680 --> 01:54:05,230 previously, I did ampersand n. 2406 01:54:05,230 --> 01:54:09,820 But that's because n was an integer, not the address thereof. 2407 01:54:09,820 --> 01:54:12,260 But in the world of strings as we now know, 2408 01:54:12,260 --> 01:54:14,590 a string is just the address of its first byte. 2409 01:54:14,590 --> 01:54:18,250 And so if we declare s to be a char star, a.k.a. 2410 01:54:18,250 --> 01:54:21,190 string, well, s is already an address. 2411 01:54:21,190 --> 01:54:24,070 So I can just pass in s in this case to scanf 2412 01:54:24,070 --> 01:54:26,380 without actually using an ampersand. 2413 01:54:26,380 --> 01:54:29,800 After that, let's go ahead and print out the result. So let's just use printf. 2414 01:54:29,800 --> 01:54:33,970 Let's print out a prefix, like s colon again, %s as my placeholder, 2415 01:54:33,970 --> 01:54:37,130 and now backslash n because I'm formatting it on the screen. 2416 01:54:37,130 --> 01:54:41,480 And then let's go ahead and pass in s as always to printf. 2417 01:54:41,480 --> 01:54:44,660 So s is just a string, so I just pass it into printf like that. 2418 01:54:44,660 --> 01:54:49,220 Well, let me go ahead now, and I'm going to go ahead and compile 2419 01:54:49,220 --> 01:54:52,158 this the old-fashioned way because we actually protect you 2420 01:54:52,158 --> 01:54:53,450 from doing something like this.
2421 01:54:53,450 --> 01:54:56,390 But I'm going to go ahead and ignore the warnings you would otherwise 2422 01:54:56,390 --> 01:54:56,900 see with make. 2423 01:54:56,900 --> 01:55:00,020 And I'm going to go ahead and compile this with clang directly. 2424 01:55:00,020 --> 01:55:03,980 So clang -o get, because that's the name of the program I want to output. 2425 01:55:03,980 --> 01:55:09,740 But I'm also going to specify -Wno-uninitialized, which 2426 01:55:09,740 --> 01:55:13,040 is simply another command line argument that's 2427 01:55:13,040 --> 01:55:16,603 going to tell clang not to warn us about variables that are not initialized. 2428 01:55:16,603 --> 01:55:19,520 Because case in point on line five, as some of you might have noticed, 2429 01:55:19,520 --> 01:55:22,610 I didn't actually initialize s to anything, even null. 2430 01:55:22,610 --> 01:55:26,090 But that's OK because I want to forge ahead blindly just to make a point as 2431 01:55:26,090 --> 01:55:27,530 to what's going on here. 2432 01:55:27,530 --> 01:55:31,490 And in fact, let's go ahead and compile this code as follows. 2433 01:55:31,490 --> 01:55:33,530 It does seem to compile, even though make would 2434 01:55:33,530 --> 01:55:35,210 have warned us that something's awry. 2435 01:55:35,210 --> 01:55:39,350 Let me go ahead now and run dot slash get, and this time, not type in 50. 2436 01:55:39,350 --> 01:55:41,600 But let me type in something like our familiar "hi" 2437 01:55:41,600 --> 01:55:44,930 exclamation point, and hit enter. 2438 01:55:44,930 --> 01:55:48,110 I immediately get a segmentation fault, which 2439 01:55:48,110 --> 01:55:50,930 means something has gone wrong related to memory. 2440 01:55:50,930 --> 01:55:54,140 A segment of memory has been touched that I shouldn't have. 2441 01:55:54,140 --> 01:55:55,608 Well, why, in fact, is this? 2442 01:55:55,608 --> 01:55:57,650 Well, let's consider what it is we've been doing.
2443 01:55:57,650 --> 01:56:00,050 If this here is my computer's memory, and in the first case 2444 01:56:00,050 --> 01:56:01,700 I was just trying to get an integer, that 2445 01:56:01,700 --> 01:56:04,400 was actually pretty straightforward, because even if this memory is filled 2446 01:56:04,400 --> 01:56:07,130 with a whole bunch of garbage values, as personified here 2447 01:56:07,130 --> 01:56:11,270 by Oscar the Grouch, when I declared n to be an integer before, 2448 01:56:11,270 --> 01:56:14,210 I just needed, on this machine, four bytes, which 2449 01:56:14,210 --> 01:56:15,570 is the typical size for an int. 2450 01:56:15,570 --> 01:56:17,100 And I put the number 50 there. 2451 01:56:17,100 --> 01:56:19,517 So it doesn't matter that there were these garbage values. 2452 01:56:19,517 --> 01:56:22,190 I just went to those four bytes after declaring 2453 01:56:22,190 --> 01:56:24,680 a variable called n and overwrote those bits, 2454 01:56:24,680 --> 01:56:27,410 with some pattern of bits representing the number 50. 2455 01:56:27,410 --> 01:56:30,260 But strings we now know are sort of fundamentally different. 2456 01:56:30,260 --> 01:56:34,850 If I go back to that same memory space, and I declare s to be a pointer, 2457 01:56:34,850 --> 01:56:37,970 that is, a char star, well, recall that pointers are generally 2458 01:56:37,970 --> 01:56:40,100 eight bytes on modern systems. 2459 01:56:40,100 --> 01:56:44,960 And so that's like taking eight of these bytes from memory and calling it s. 2460 01:56:44,960 --> 01:56:48,140 But the catch is, if I haven't initialized s to actually 2461 01:56:48,140 --> 01:56:52,100 be a valid location, as via calling malloc, 2462 01:56:52,100 --> 01:56:54,088 there are still garbage values there. 
2463 01:56:54,088 --> 01:56:55,880 That is to say, patterns of bits that maybe 2464 01:56:55,880 --> 01:56:58,730 have been there always, but from some previous function that got 2465 01:56:58,730 --> 01:57:01,880 called, or some other lines of code if this program were actually bigger. 2466 01:57:01,880 --> 01:57:06,350 So it's just that some garbage value is filling that variable s. 2467 01:57:06,350 --> 01:57:10,610 The problem, though, is that in my code now, when I call scanf 2468 01:57:10,610 --> 01:57:14,855 and I tell scanf to scan a string from the user and to put it at that location 2469 01:57:14,855 --> 01:57:16,670 s, well, what is that location s? 2470 01:57:16,670 --> 01:57:18,650 It's literally a garbage value. 2471 01:57:18,650 --> 01:57:21,630 It's the equivalent of a foam finger pointing there, there, there. 2472 01:57:21,630 --> 01:57:25,050 We just don't know because it's not a valid address. 2473 01:57:25,050 --> 01:57:28,190 And so I get that segmentation fault here in my terminal window 2474 01:57:28,190 --> 01:57:31,490 because I've not initialized s to be some known value. 2475 01:57:31,490 --> 01:57:34,070 I get a segmentation fault because, effectively, I've 2476 01:57:34,070 --> 01:57:38,690 accidentally touched memory that I should not, in fact, have touched. 2477 01:57:38,690 --> 01:57:40,740 So how do we fix this? 2478 01:57:40,740 --> 01:57:43,645 Well, clearly, I need s to point at some valid chunk of memory, 2479 01:57:43,645 --> 01:57:45,020 and I could do that using malloc. 2480 01:57:45,020 --> 01:57:47,395 But frankly, in this case, I could do it even more simply 2481 01:57:47,395 --> 01:57:50,660 by just declaring s to be an array of characters, 2482 01:57:50,660 --> 01:57:52,130 as we might have in week two. 2483 01:57:52,130 --> 01:57:54,510 So let me go ahead and clear my terminal window here. 2484 01:57:54,510 --> 01:57:59,360 Let me go into get.c, and let's simply change what s is.
2485 01:57:59,360 --> 01:58:02,720 Instead of a char star, which we know is what a string technically is, 2486 01:58:02,720 --> 01:58:05,970 we can still implement strings as arrays of characters. 2487 01:58:05,970 --> 01:58:08,880 That's certainly still true. 2488 01:58:08,880 --> 01:58:11,300 So let me go ahead and do that, declare s to be 2489 01:58:11,300 --> 01:58:12,772 an array of, say, four characters. 2490 01:58:12,772 --> 01:58:15,980 And in this case, I should have enough room for the H, the I, the exclamation 2491 01:58:15,980 --> 01:58:20,390 point, and even that null character, the trailing backslash zero. 2492 01:58:20,390 --> 01:58:22,370 So now, let me go ahead and build this. 2493 01:58:22,370 --> 01:58:23,360 Make get. 2494 01:58:23,360 --> 01:58:26,430 And because I'm not, not initializing something this time, 2495 01:58:26,430 --> 01:58:29,960 I can use make as usual without getting yelled at because I'm not yet 2496 01:58:29,960 --> 01:58:31,160 doing anything wrong. 2497 01:58:31,160 --> 01:58:34,250 Now let me go ahead and do dot slash get enter. 2498 01:58:34,250 --> 01:58:38,240 And in this case, it's ready to receive my H-I exclamation point enter, 2499 01:58:38,240 --> 01:58:40,460 and all actually seems well. 2500 01:58:40,460 --> 01:58:40,970 Why? 2501 01:58:40,970 --> 01:58:44,240 Because in this case, I actually had enough space for s, 2502 01:58:44,240 --> 01:58:47,750 because if I go back to my memory here, because I've now redeclared s 2503 01:58:47,750 --> 01:58:50,090 as an actual array of four characters, that's 2504 01:58:50,090 --> 01:58:53,600 like asking the operating system, for instance, for these four chars here. 2505 01:58:53,600 --> 01:58:58,340 And certainly, I can fit H-I exclamation point and the null character 2506 01:58:58,340 --> 01:58:59,970 into those four bytes. 2507 01:58:59,970 --> 01:59:01,430 So there's not a problem. 
2508 01:59:01,430 --> 01:59:05,810 But there might be a problem if I, or the user, more generally, 2509 01:59:05,810 --> 01:59:07,520 types in too many characters. 2510 01:59:07,520 --> 01:59:09,860 So let me go ahead and run dot slash get again. 2511 01:59:09,860 --> 01:59:12,330 Let me type H-I exclamation point. 2512 01:59:12,330 --> 01:59:16,340 But just to get a little aggressive, let me highlight that and paste it again, 2513 01:59:16,340 --> 01:59:19,880 again, again, and again, and really type, very excitedly, 2514 01:59:19,880 --> 01:59:23,460 a pretty long string that is surely longer than four bytes. 2515 01:59:23,460 --> 01:59:25,820 Well, unfortunately, I've only asked the operating system 2516 01:59:25,820 --> 01:59:27,540 for an array of four bytes. 2517 01:59:27,540 --> 01:59:31,340 So what's going to happen with all of those extra hi's, hi's, hi's? 2518 01:59:31,340 --> 01:59:34,795 They're just going to, by default, remain contiguous from left to right, 2519 01:59:34,795 --> 01:59:36,420 top to bottom in the computer's memory. 2520 01:59:36,420 --> 01:59:38,930 But they're going to end up, some of those characters, 2521 01:59:38,930 --> 01:59:43,020 at locations I didn't ask the operating system for in this array. 2522 01:59:43,020 --> 01:59:47,450 So if I go back to VS Code here, I've typed in a very long string, certainly 2523 01:59:47,450 --> 01:59:48,860 longer than four bytes in total. 2524 01:59:48,860 --> 01:59:49,880 Let me hit enter. 2525 01:59:49,880 --> 01:59:50,690 And darn it. 2526 01:59:50,690 --> 01:59:53,800 There is another segmentation fault. So in short, 2527 01:59:53,800 --> 01:59:56,540 you are going to see these segmentation faults any time 2528 01:59:56,540 --> 01:59:58,910 you touch segments of memory, so to speak, 2529 01:59:58,910 --> 02:00:02,870 that do not belong to you, that you didn't allocate space for, as 2530 02:00:02,870 --> 02:00:05,220 via an array, or even via malloc.
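One common mitigation, which the lecture itself does not show, is to put a maximum field width in the format code so that the scan stops before the array overflows. The sketch below assumes a four-byte array like the one above; the helper name read_short_word is hypothetical, and it uses sscanf on a string rather than scanf on the keyboard so it can be tried directly.

```c
#include <stdio.h>

// Hypothetical helper (not from the lecture): copies at most 3
// non-whitespace characters from `input` into `buffer`, which must have
// room for at least 4 bytes (3 characters plus the trailing '\0').
// Returns 1 if a word was stored, 0 otherwise.
int read_short_word(const char *input, char *buffer)
{
    // The 3 in %3s caps how many characters are stored; the scanning
    // function then appends the null character itself.
    return sscanf(input, "%3s", buffer) == 1;
}
```

With char s[4], passing in a long run of hi!hi!hi! leaves just hi! in s instead of writing past the end of the array. The extra characters are simply not stored, which avoids the crash but does not solve the underlying problem of not knowing the input's length in advance.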
2531 02:00:05,220 --> 02:00:07,970 And this is going to be a fundamental problem with getting strings 2532 02:00:07,970 --> 02:00:11,210 because I don't know in advance how long the string is going to be 2533 02:00:11,210 --> 02:00:12,680 that the human's going to type in. 2534 02:00:12,680 --> 02:00:13,670 Maybe it's four. 2535 02:00:13,670 --> 02:00:14,840 Maybe it's fewer characters. 2536 02:00:14,840 --> 02:00:15,980 Maybe it's even more. 2537 02:00:15,980 --> 02:00:17,400 So what's the alternative? 2538 02:00:17,400 --> 02:00:20,480 Well, I could go in here maybe and allocate, I don't know, 2539 02:00:20,480 --> 02:00:22,970 4,000 characters for s. 2540 02:00:22,970 --> 02:00:27,770 But what if you type in an even longer string that's 4,001 characters or more? 2541 02:00:27,770 --> 02:00:31,580 I might still have these memory related errors, these segmentation faults. 2542 02:00:31,580 --> 02:00:35,540 So one of the reasons then, too, that we provide you with the CS50 library, 2543 02:00:35,540 --> 02:00:37,880 and in turn, functions like getstring, is 2544 02:00:37,880 --> 02:00:43,250 that getstring very, very conservatively walks through the user's input 2545 02:00:43,250 --> 02:00:46,410 byte, by byte, by byte, one character at a time. 2546 02:00:46,410 --> 02:00:48,980 And what the CS50 library is doing underneath the hood 2547 02:00:48,980 --> 02:00:52,400 is, as soon as it realizes, oh, the user gave us another byte, another byte, 2548 02:00:52,400 --> 02:00:56,270 we in the CS50 library are constantly allocating and 2549 02:00:56,270 --> 02:00:59,960 reallocating more and more memory using malloc for you, 2550 02:00:59,960 --> 02:01:03,660 and effectively managing the memory required for that string. 2551 02:01:03,660 --> 02:01:07,580 So even though scanf exists, it's dangerous to use with strings.
2552 02:01:07,580 --> 02:01:09,890 And even with integers, it turns out it lacks 2553 02:01:09,890 --> 02:01:13,940 some of the error handling that the CS50 library has thus far provided. 2554 02:01:13,940 --> 02:01:16,010 How do we actually go about solving this? 2555 02:01:16,010 --> 02:01:21,170 The way getstring actually works in the CS50 library is it kind of tiptoes. 2556 02:01:21,170 --> 02:01:21,860 It waits. 2557 02:01:21,860 --> 02:01:25,250 It gets one character from you and then checks if there's another one coming. 2558 02:01:25,250 --> 02:01:27,027 Then it allocates more space for a second. 2559 02:01:27,027 --> 02:01:30,110 If there's still a third, it allocates more space, more space, more space. 2560 02:01:30,110 --> 02:01:33,260 So essentially, what getstring does is it uses malloc again, 2561 02:01:33,260 --> 02:01:36,290 and again, and again, and it kind of lays the tracks down 2562 02:01:36,290 --> 02:01:38,690 as you're typing in the keystrokes and hitting enter, 2563 02:01:38,690 --> 02:01:42,230 so that we never assume how many characters you're going to type in. 2564 02:01:42,230 --> 02:01:46,820 We dynamically allocate just enough bytes for you, plus one extra, 2565 02:01:46,820 --> 02:01:48,540 for the null character. 2566 02:01:48,540 --> 02:01:50,358 And this is sort of a hoop that's just not 2567 02:01:50,358 --> 02:01:53,150 fun to jump through when, at the end of the day, all you want to do 2568 02:01:53,150 --> 02:01:55,230 is get input from the user. 2569 02:01:55,230 --> 02:01:58,100 So even with the training wheels officially off, 2570 02:01:58,100 --> 02:02:01,190 it's going to be annoying to get strings from users in C. 2571 02:02:01,190 --> 02:02:04,247 But it is easy with ints, with floats, with other data types. 
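The tiptoeing approach described here can be sketched roughly as follows. This is an illustration under stated assumptions, not the CS50 library's actual source: the function name read_line is hypothetical, it reads from any FILE pointer (stdin would be the keyboard), and it grows the buffer by one byte per character with realloc to mirror the description, which is simple rather than fast.

```c
#include <stdio.h>
#include <stdlib.h>

// Hypothetical helper (not the CS50 library's code): reads one line from
// `in`, growing the buffer one byte at a time as characters arrive.
// Returns a heap-allocated string that the caller must free, or NULL if
// memory runs out or there is no input left at all.
char *read_line(FILE *in)
{
    char *buffer = NULL;
    size_t length = 0;
    int c;

    while ((c = fgetc(in)) != EOF && c != '\n')
    {
        // Room for the characters so far, this one, and the trailing '\0'.
        char *grown = realloc(buffer, length + 2);
        if (grown == NULL)
        {
            free(buffer);
            return NULL;
        }
        buffer = grown;
        buffer[length++] = (char) c;
        buffer[length] = '\0';
    }

    if (buffer == NULL && c == '\n')
    {
        // An empty line: return an empty string rather than NULL.
        buffer = malloc(1);
        if (buffer != NULL)
        {
            buffer[0] = '\0';
        }
    }
    return buffer;
}
```

Calling read_line(stdin) would return however many characters the user typed before hitting enter, plus one extra byte for the null character, and the caller is responsible for calling free on the result.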
2572 02:02:04,247 --> 02:02:06,080 And frankly, we'll soon, in two weeks, pivot 2573 02:02:06,080 --> 02:02:09,380 to Python, which takes care of all of these problems for us 2574 02:02:09,380 --> 02:02:10,700 and manages our memory. 2575 02:02:10,700 --> 02:02:13,820 But for now, we have one final to do beyond scanf, 2576 02:02:13,820 --> 02:02:17,360 which is file IO, which is a fancy way of saying input and output. 2577 02:02:17,360 --> 02:02:19,760 Because now that we know a little bit of hexadecimal, 2578 02:02:19,760 --> 02:02:21,980 now that we know a little bit about pointers, 2579 02:02:21,980 --> 02:02:24,170 we actually have some more functions available to us 2580 02:02:24,170 --> 02:02:29,000 that will let us actually manipulate files on a computer's hard drive, 2581 02:02:29,000 --> 02:02:33,020 like image files, or text files, or anything else we might want. 2582 02:02:33,020 --> 02:02:38,510 Among the most common functions that are related to files are these here. 2583 02:02:38,510 --> 02:02:42,770 fopen is going to be a function that lets you open a file, doing in code 2584 02:02:42,770 --> 02:02:46,670 what you might otherwise do by going to File, Open in a graphical program. 2585 02:02:46,670 --> 02:02:48,000 fclose does the opposite. 2586 02:02:48,000 --> 02:02:51,347 It's the way you, in code, click on an X and close a file. 2587 02:02:51,347 --> 02:02:53,180 Nothing's going to happen visually, but it's 2588 02:02:53,180 --> 02:02:57,410 how you give access to a program to the contents of a file. 2589 02:02:57,410 --> 02:03:01,550 fprintf, it allows you to print, not to the screen, but to a file. 2590 02:03:01,550 --> 02:03:05,672 fscanf lets you read data, not from the keyboard, but from a file.
fread 2591 02:03:05,672 --> 02:03:08,630 and fwrite are similarly used to read and write data in a file, 2592 02:03:08,630 --> 02:03:11,630 but generally binary data, like images, or something 2593 02:03:11,630 --> 02:03:13,820 that's not ASCII or Unicode text. 2594 02:03:13,820 --> 02:03:15,932 fseek is a function that lets you move around 2595 02:03:15,932 --> 02:03:18,140 in a file left to right, kind of like fast forwarding 2596 02:03:18,140 --> 02:03:20,240 or rewinding through Netflix, or similar when 2597 02:03:20,240 --> 02:03:23,840 you want to jump to a different location in a video, or in this case, a file. 2598 02:03:23,840 --> 02:03:25,620 And there's bunches of others, as well. 2599 02:03:25,620 --> 02:03:28,370 So to give you a sense of what you can do with it when 2600 02:03:28,370 --> 02:03:32,270 it comes to manipulating files, let's write just a couple of final programs, 2601 02:03:32,270 --> 02:03:36,090 for instance, that let us manipulate some of this code for us. 2602 02:03:36,090 --> 02:03:41,900 In fact, let me go ahead and open up here in VS Code a new file called, 2603 02:03:41,900 --> 02:03:43,820 say, phonebook.c. 2604 02:03:43,820 --> 02:03:46,280 And in phonebook.c, we're going to implement now 2605 02:03:46,280 --> 02:03:48,600 a version of the phonebook like we did in the past. 2606 02:03:48,600 --> 02:03:55,010 But in this case, we don't actually have a forgetful program that 2607 02:03:55,010 --> 02:03:57,827 prompts the user with getstring for a couple of names and numbers, 2608 02:03:57,827 --> 02:03:59,660 and then just forgets about them altogether. 2609 02:03:59,660 --> 02:04:01,610 This version of the phonebook is actually 2610 02:04:01,610 --> 02:04:05,420 going to go ahead and save them persistently to a file for us.
2611 02:04:05,420 --> 02:04:10,250 And for this, let me go ahead and open up just on my other screen here, 2612 02:04:10,250 --> 02:04:14,420 without flipping over just yet, let me go ahead and open up-- 2613 02:04:14,420 --> 02:04:17,220 give me just one moment. 2614 02:04:17,220 --> 02:04:19,680 So we have this ready to go. 2615 02:04:19,680 --> 02:04:23,230 Let me go ahead and create the file, this program as follows. 2616 02:04:23,230 --> 02:04:26,400 I'm going to cheat and save time by using the CS50 library because I do not 2617 02:04:26,400 --> 02:04:29,920 want to get into the nuances of getting strings character by character, 2618 02:04:29,920 --> 02:04:31,590 which itself will escalate too quickly. 2619 02:04:31,590 --> 02:04:34,890 But let me go ahead and include the CS50 library, the standard IO library, 2620 02:04:34,890 --> 02:04:38,340 and lastly, the string library for this particular case. 2621 02:04:38,340 --> 02:04:41,850 In my main function now, I'm going to go ahead and open up 2622 02:04:41,850 --> 02:04:44,880 a file called maybe phonebook.csv. 2623 02:04:44,880 --> 02:04:48,210 If you've ever used a CSV file, it's like a lightweight spreadsheet 2624 02:04:48,210 --> 02:04:51,630 that you can open in Apple Numbers, Google Spreadsheets, Microsoft Excel. 2625 02:04:51,630 --> 02:04:55,240 But CSV means that we're going to separate all of the values by commas. 2626 02:04:55,240 --> 02:04:57,960 So anywhere we want a new column, we actually use a comma, 2627 02:04:57,960 --> 02:04:59,140 as we'll soon see. 2628 02:04:59,140 --> 02:05:00,510 So how do I actually do this? 2629 02:05:00,510 --> 02:05:07,980 I can open a file called phonebook.csv by literally using fopen phonebook.csv. 2630 02:05:07,980 --> 02:05:11,400 And I have to tell fopen how I want to open it. 2631 02:05:11,400 --> 02:05:14,040 Do I want to open it for reading with R? 2632 02:05:14,040 --> 02:05:16,710 Do I want to open it for writing with W? 
2633 02:05:16,710 --> 02:05:19,210 Or do I want to open it for appending, A? 2634 02:05:19,210 --> 02:05:22,990 And for something like a phone book, if I run this program again and again, 2635 02:05:22,990 --> 02:05:26,830 I'm going to actually do append so that new contacts get added to the file, 2636 02:05:26,830 --> 02:05:29,440 and we don't overwrite it with W. 2637 02:05:29,440 --> 02:05:31,120 Now, what does fopen return? 2638 02:05:31,120 --> 02:05:35,300 It technically returns a pointer to a file. 2639 02:05:35,300 --> 02:05:36,550 But this one's a little weird. 2640 02:05:36,550 --> 02:05:41,830 It's all capitalized, but it is a thing in C. File in all caps star file 2641 02:05:41,830 --> 02:05:44,660 is going to be a pointer to that file in memory. 2642 02:05:44,660 --> 02:05:48,700 So think of fopen as opening the file and returning the address thereof 2643 02:05:48,700 --> 02:05:50,060 in the computer's memory. 2644 02:05:50,060 --> 02:05:50,560 All right. 2645 02:05:50,560 --> 02:05:51,670 What do I want to do next? 2646 02:05:51,670 --> 02:05:54,970 I want to go ahead and get two strings from the user, like maybe someone's 2647 02:05:54,970 --> 02:05:59,380 name, using getstring, again, to keep things simple for now. 2648 02:05:59,380 --> 02:06:01,490 Let me then go ahead and get another one. 2649 02:06:01,490 --> 02:06:02,620 How about their number? 2650 02:06:02,620 --> 02:06:05,320 Using getstring, again, prompting for a number. 2651 02:06:05,320 --> 02:06:07,480 And I don't strictly need these training wheels. 2652 02:06:07,480 --> 02:06:09,160 So even though it doesn't really make a difference, 2653 02:06:09,160 --> 02:06:11,827 I'm going to at least change that to char star, even though I do 2654 02:06:11,827 --> 02:06:14,560 want to keep using getstring conveniently. 2655 02:06:14,560 --> 02:06:18,590 And now I want to save this person's name and number to that CSV file.
2656 02:06:18,590 --> 02:06:21,550 So I'm going to use, not printf, but fprintf, 2657 02:06:21,550 --> 02:06:26,380 printing to that file, variable, which is open in the computer's memory. 2658 02:06:26,380 --> 02:06:31,360 Now I'm going to go ahead and print out two strings, %s comma, %s. 2659 02:06:31,360 --> 02:06:34,870 Then I want to go ahead and print out the name for the first placeholder, 2660 02:06:34,870 --> 02:06:37,150 and the number for the second placeholder. 2661 02:06:37,150 --> 02:06:40,600 And for good measure, I want to move the cursor to the next line in the file, 2662 02:06:40,600 --> 02:06:43,600 so I am going to include a backslash n. 2663 02:06:43,600 --> 02:06:48,820 Then I'm going to go ahead and fclose that same file with fclose. 2664 02:06:48,820 --> 02:06:49,540 And that's it. 2665 02:06:49,540 --> 02:06:50,920 No more printing to the user. 2666 02:06:50,920 --> 02:06:53,620 But I claim that I'm going to be changing the file again, 2667 02:06:53,620 --> 02:06:54,628 and again, and again. 2668 02:06:54,628 --> 02:06:55,420 So let me try this. 2669 02:06:55,420 --> 02:06:59,950 Make phone book, OK, dot slash phonebook, enter. 2670 02:06:59,950 --> 02:07:01,480 And let's type in David. 2671 02:07:01,480 --> 02:07:06,880 And how about +1 617-495-1000? 2672 02:07:06,880 --> 02:07:07,930 Enter. 2673 02:07:07,930 --> 02:07:09,310 OK, hopefully, it worked. 2674 02:07:09,310 --> 02:07:10,060 Let's do it again. 2675 02:07:10,060 --> 02:07:10,990 Dot slash phonebook. 2676 02:07:10,990 --> 02:07:14,170 Carter, we'll give him the same number as last time. 2677 02:07:14,170 --> 02:07:16,690 495-1000. 2678 02:07:16,690 --> 02:07:19,770 And let's do, how about just those two? 2679 02:07:19,770 --> 02:07:23,880 So let me go ahead now and reveal that we do have 2680 02:07:23,880 --> 02:07:27,510 a file in here called phonebook.csv. 2681 02:07:27,510 --> 02:07:29,290 So that does exist. 2682 02:07:29,290 --> 02:07:30,490 Let me go ahead and do this. 
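The append-and-print steps just described can be sketched as a single function. This is an assumption-laden sketch rather than the lecture's exact phonebook.c: the function name append_contact is hypothetical, the name and number arrive as parameters instead of via getstring, and the check for a null pointer from fopen is included up front.

```c
#include <stdio.h>

// Hypothetical helper (not the lecture's exact code): appends one
// "name,number" row to the CSV file at `path`.
// Returns 0 on success, 1 if the file could not be opened.
int append_contact(const char *path, const char *name, const char *number)
{
    // "a" opens for appending, so rows from earlier runs are kept.
    FILE *file = fopen(path, "a");
    if (file == NULL)
    {
        return 1;
    }

    // Like printf, but the first argument says which file to print to.
    fprintf(file, "%s,%s\n", name, number);

    fclose(file);
    return 0;
}
```

Calling append_contact("phonebook.csv", "David", "+1-617-495-1000") twice with different names leaves two comma-separated rows in the file, one per line.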
2683 02:07:30,490 --> 02:07:33,838 Let me open up my file browser over here. 2684 02:07:33,838 --> 02:07:35,380 I've got a lot of files I've created. 2685 02:07:35,380 --> 02:07:36,450 Here's phonebook.csv. 2686 02:07:36,450 --> 02:07:39,210 And if I click on it, there is the file that I just 2687 02:07:39,210 --> 02:07:40,780 created, separated by commas. 2688 02:07:40,780 --> 02:07:44,430 But even more interestingly, let me actually right click or control click 2689 02:07:44,430 --> 02:07:47,620 on this, download it to my Mac's downloads folder. 2690 02:07:47,620 --> 02:07:50,370 Let me go into my downloads folder just for fun, 2691 02:07:50,370 --> 02:07:52,740 and I've installed in advance Microsoft Excel. 2692 02:07:52,740 --> 02:07:58,060 If I go into my downloads folder and open up phonebook.csv, 2693 02:07:58,060 --> 02:08:02,755 we're going to see, oh, Apple Numbers, not Excel, opening up. 2694 02:08:02,755 --> 02:08:03,630 View my spreadsheets. 2695 02:08:03,630 --> 02:08:05,172 All right, numbers is kind of stupid. 2696 02:08:05,172 --> 02:08:06,690 So there we go. 2697 02:08:06,690 --> 02:08:10,210 No, this isn't a Mac versus PC thing. 2698 02:08:10,210 --> 02:08:14,880 So now we have phonebook.csv rendered in this format here. 2699 02:08:14,880 --> 02:08:18,190 Numbers presumes that the top row should be gray and not white, as well. 2700 02:08:18,190 --> 02:08:20,470 So the formatting looks a bit off. 2701 02:08:20,470 --> 02:08:22,960 Anyhow, clearly, you could open this same file 2702 02:08:22,960 --> 02:08:25,720 in a spreadsheet program like Microsoft Office, or Apple Numbers, 2703 02:08:25,720 --> 02:08:28,330 or of course, something like Google Spreadsheets. 2704 02:08:28,330 --> 02:08:32,170 But let me do one other thing when it comes to copying files now, 2705 02:08:32,170 --> 02:08:37,180 whereby besides making a phone book, whereby I clearly have the ability now 2706 02:08:37,180 --> 02:08:40,120 to save strings in files. 
2707 02:08:40,120 --> 02:08:43,270 And actually, just for good measure, let me hammer home the point 2708 02:08:43,270 --> 02:08:46,460 that anytime we're dealing with pointers now, something could go wrong. 2709 02:08:46,460 --> 02:08:48,370 And if you read the documentation for fopen, 2710 02:08:48,370 --> 02:08:52,540 we should also check that file could be null. 2711 02:08:52,540 --> 02:08:55,940 Maybe the file is not found, or something's not working on the server. 2712 02:08:55,940 --> 02:08:58,610 And so just to be safe, we should return one there. 2713 02:08:58,610 --> 02:09:01,210 So even not just malloc, not just getstring. 2714 02:09:01,210 --> 02:09:03,700 Any time a function returns a pointer, you 2715 02:09:03,700 --> 02:09:06,940 should check if it's null, because if it is, per the documentation, 2716 02:09:06,940 --> 02:09:08,990 almost always means something has gone wrong. 2717 02:09:08,990 --> 02:09:12,520 So you should get out, lest you trust the return value therein. 2718 02:09:12,520 --> 02:09:16,300 So let me go ahead and do one other program here. 2719 02:09:16,300 --> 02:09:18,110 Let me create my own copy program. 2720 02:09:18,110 --> 02:09:22,610 So up until now, we've used commands like RM, and LS, and CP for copy. 2721 02:09:22,610 --> 02:09:26,360 I can actually create my own version of Linux's copy program, 2722 02:09:26,360 --> 02:09:27,630 perhaps as follows. 2723 02:09:27,630 --> 02:09:31,730 Let me actually go into cp.c, in this case. 2724 02:09:31,730 --> 02:09:33,440 Let me include some familiar file. 2725 02:09:33,440 --> 02:09:34,880 Standard io.h. 2726 02:09:34,880 --> 02:09:37,550 Let me include, how about one other? 
2727 02:09:37,550 --> 02:09:40,940 Standard int.h for reasons we'll see now, 2728 02:09:40,940 --> 02:09:46,010 because in standard int.h is that uint8_t type that I mentioned 2729 02:09:46,010 --> 02:09:51,530 earlier, which just means, give me an eight bit value that's unsigned, 2730 02:09:51,530 --> 02:09:53,000 which means no negative numbers. 2731 02:09:53,000 --> 02:09:54,320 It's just raw data. 2732 02:09:54,320 --> 02:09:57,210 It's not an integer in the positive or negative sense. 2733 02:09:57,210 --> 02:09:59,180 And let me just nickname that to byte, just 2734 02:09:59,180 --> 02:10:02,780 to make clear that I want to manipulate files a byte at a time. 2735 02:10:02,780 --> 02:10:05,390 Let me now declare, for the first time today, 2736 02:10:05,390 --> 02:10:10,610 a version of main that takes in argc, 2737 02:10:10,610 --> 02:10:15,938 and takes in argv, which is for command line arguments. 2738 02:10:15,938 --> 02:10:18,730 Technically though, I'm not using the CS50 library in this version, 2739 02:10:18,730 --> 02:10:20,590 so even that can now be changed. 2740 02:10:20,590 --> 02:10:23,730 And this is the canonical way in C to declare 2741 02:10:23,730 --> 02:10:25,980 main when you want to get command line arguments using 2742 02:10:25,980 --> 02:10:28,240 char star instead of string. 2743 02:10:28,240 --> 02:10:30,210 So now, I'm going to do two things. 2744 02:10:30,210 --> 02:10:31,470 Remember how copy works. 2745 02:10:31,470 --> 02:10:34,710 You specify two files, the file you want to copy, 2746 02:10:34,710 --> 02:10:36,840 and the new name that you want to give to the copy. 2747 02:10:36,840 --> 02:10:41,860 So it would be like CP, space, old name, space, new name at the command line. 2748 02:10:41,860 --> 02:10:43,500 So accordingly, I'm going to do this. 2749 02:10:43,500 --> 02:10:47,250 I'm going to create one file in memory called source, or SRC for short.
2750 02:10:47,250 --> 02:10:52,050 And I'm going to set that equal to whatever is in argv one in read mode. 2751 02:10:52,050 --> 02:10:55,200 But just to be super specific, I'm going to use read binary mode. 2752 02:10:55,200 --> 02:10:56,910 I don't want to be copying text files. 2753 02:10:56,910 --> 02:10:59,770 I want binary data, zeros and ones, like images. 2754 02:10:59,770 --> 02:11:02,850 So I'm going to tell fopen to expect binary data. 2755 02:11:02,850 --> 02:11:05,610 I'm then going to go ahead and create a second variable 2756 02:11:05,610 --> 02:11:08,170 called destination, DST for short. 2757 02:11:08,170 --> 02:11:10,620 And I'm going to open up whatever is in argv two, 2758 02:11:10,620 --> 02:11:12,670 the second file name at the command line. 2759 02:11:12,670 --> 02:11:14,550 But I don't want to read this file. 2760 02:11:14,550 --> 02:11:18,330 I want to write to it in binary using zeros and ones. 2761 02:11:18,330 --> 02:11:21,657 Now, let me do the copying one byte at a time. 2762 02:11:21,657 --> 02:11:22,740 It's a little inefficient. 2763 02:11:22,740 --> 02:11:25,200 I should really do bunches of bytes at a time for speed. 2764 02:11:25,200 --> 02:11:28,650 But let me just give myself one byte in a variable called b. 2765 02:11:28,650 --> 02:11:31,260 So byte is not a thing in C. It's literally 2766 02:11:31,260 --> 02:11:34,080 a synonym I created just for the sake of discussion 2767 02:11:34,080 --> 02:11:36,545 because we'll do this in the future, as well. 2768 02:11:36,545 --> 02:11:37,920 Now, let me go ahead and do this. 2769 02:11:37,920 --> 02:11:40,617 How do you copy a file from old to new? 2770 02:11:40,617 --> 02:11:42,450 Well, I think it would suffice to use a loop 2771 02:11:42,450 --> 02:11:44,190 and just start at the beginning of the file, 2772 02:11:44,190 --> 02:11:46,732 loop all the way to the end of the file, and within the loop, 2773 02:11:46,732 --> 02:11:48,720 copy one byte from old to new.
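The setup described so far (two command line arguments, a source opened in read binary mode, a destination opened in write binary mode, and a null check after each fopen) might be sketched roughly as follows. The helper name open_files and its return codes are assumptions for illustration only, and the argc check is an addition that the lecture's version does not mention.

```c
#include <stdio.h>

// Hypothetical helper (not part of the lecture's cp.c): opens argv[1] for
// binary reading and argv[2] for binary writing.
// Returns 0 on success and 1 on any failure, leaving *src and *dst NULL.
int open_files(int argc, char *argv[], FILE **src, FILE **dst)
{
    *src = NULL;
    *dst = NULL;

    // Expect exactly three arguments: program name, old name, new name.
    if (argc != 3)
    {
        return 1;
    }

    // "rb" means read binary; fopen returns NULL if the file can't be opened.
    *src = fopen(argv[1], "rb");
    if (*src == NULL)
    {
        return 1;
    }

    // "wb" means write binary; it creates the file or truncates an existing one.
    *dst = fopen(argv[2], "wb");
    if (*dst == NULL)
    {
        fclose(*src);
        *src = NULL;
        return 1;
    }

    return 0;
}
```

A main function declared as int main(int argc, char *argv[]) could pass its own argc and argv straight through and return 1 whenever this helper does.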
2774 02:11:48,720 --> 02:11:49,960 So how do I do that? 2775 02:11:49,960 --> 02:11:52,860 I used fprintf last time to write text. 2776 02:11:52,860 --> 02:11:55,740 This time, I'm going to use a different function as follows. 2777 02:11:55,740 --> 02:11:58,912 While there are bytes to read from the file, 2778 02:11:58,912 --> 02:12:01,620 and this one's going to be a mouthful, so let me just type it out 2779 02:12:01,620 --> 02:12:02,880 and then I'll explain it. 2780 02:12:02,880 --> 02:12:05,770 2781 02:12:05,770 --> 02:12:11,770 While that line is true, go ahead and write 2782 02:12:11,770 --> 02:12:15,370 this line, which is similarly a mouthful, so I'll type it first 2783 02:12:15,370 --> 02:12:17,410 and then explain what it does. 2784 02:12:17,410 --> 02:12:20,270 Then I'm going to close destination. 2785 02:12:20,270 --> 02:12:20,770 Whoops. 2786 02:12:20,770 --> 02:12:25,150 Then I'm going to close source, and I claim, if I haven't messed anything up, 2787 02:12:25,150 --> 02:12:28,130 this will now copy files for me. 2788 02:12:28,130 --> 02:12:28,750 How? 2789 02:12:28,750 --> 02:12:31,510 So this is indeed a mouthful, but there's a function called fread, 2790 02:12:31,510 --> 02:12:34,660 whose purpose in life is to read one or more bytes for you. 2791 02:12:34,660 --> 02:12:35,930 How does it work? 2792 02:12:35,930 --> 02:12:40,210 Well, just like swap, just like scanf, you 2793 02:12:40,210 --> 02:12:43,040 have to tell it where to load those bytes in memory. 2794 02:12:43,040 --> 02:12:45,250 So if I want to put them in the byte called b, 2795 02:12:45,250 --> 02:12:47,470 I can't just say b because that's passed by value. 2796 02:12:47,470 --> 02:12:48,800 I need to pass by reference. 2797 02:12:48,800 --> 02:12:51,520 So I say the address of b is where I want you to put 2798 02:12:51,520 --> 02:12:53,620 one byte from the file at a time. 2799 02:12:53,620 --> 02:12:54,910 How big is a byte? 
2800 02:12:54,910 --> 02:12:58,180 Technically, I could just say one because we all know how big a byte is. 2801 02:12:58,180 --> 02:13:01,240 But I'm just going to be super proper and generalize this as size of b 2802 02:13:01,240 --> 02:13:03,740 so it just figures it out for me, just in case we ever 2803 02:13:03,740 --> 02:13:05,390 do more than one byte at a time. 2804 02:13:05,390 --> 02:13:07,700 How many bytes do I want to copy at a time? 2805 02:13:07,700 --> 02:13:09,530 One, just to keep it simple. 2806 02:13:09,530 --> 02:13:12,650 And where do I want to read those bytes from? 2807 02:13:12,650 --> 02:13:14,450 The source file. 2808 02:13:14,450 --> 02:13:17,450 fread, if you read the documentation, just tells you 2809 02:13:17,450 --> 02:13:20,090 how many bytes were successfully read. 2810 02:13:20,090 --> 02:13:22,460 Logically, it should either be one was read, 2811 02:13:22,460 --> 02:13:25,380 or zero were read, based on what I'm asking it to do. 2812 02:13:25,380 --> 02:13:28,800 I'm asking it to read one at a time, so it's either going to succeed or fail. 2813 02:13:28,800 --> 02:13:31,218 So I want to do this for as long as it succeeds 2814 02:13:31,218 --> 02:13:34,010 because it's going to succeed until it gets to the end of the file, 2815 02:13:34,010 --> 02:13:38,210 and then there's no more bytes to read, at which point it will return zero. 2816 02:13:38,210 --> 02:13:42,470 So now, I do the opposite with fwrite, and it's almost the same line. 2817 02:13:42,470 --> 02:13:44,930 Where do I want to write that byte? 2818 02:13:44,930 --> 02:13:47,600 Well, first, I tell fwrite where to find the byte, 2819 02:13:47,600 --> 02:13:49,700 go there, and get the byte that was copied. 2820 02:13:49,700 --> 02:13:53,060 It's this size, which is going to be one, but I did it generally. 2821 02:13:53,060 --> 02:13:54,770 One byte at a time, please. 2822 02:13:54,770 --> 02:13:57,540 And write it to the destination file.
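The read-then-write loop just described might look roughly like the sketch below. The fread and fwrite calls use the argument order walked through above (address of b, size of b, one item, the file); the function name copy_bytes, the byte count it returns, and the write-error check are assumptions added for illustration rather than what was typed on screen.

```c
#include <stdint.h>
#include <stdio.h>

// Nickname for an unsigned 8-bit value, as in the lecture.
typedef uint8_t byte;

// Hypothetical helper: copies every byte from src to dst, one at a time.
// Both files must already be open (src for reading, dst for writing).
// Returns the number of bytes copied, or -1 if a write fails.
long copy_bytes(FILE *src, FILE *dst)
{
    long copied = 0;
    byte b;

    // fread returns how many items it read: 1 on success, 0 at end of file.
    while (fread(&b, sizeof(b), 1, src) == 1)
    {
        // fwrite takes the same arguments, but sends the byte to dst.
        if (fwrite(&b, sizeof(b), 1, dst) != 1)
        {
            return -1;
        }
        copied++;
    }

    return copied;
}
```

Passing &b rather than b matters here for the same reason as with swap: fread needs the address of the variable in order to fill it in.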
2823 02:13:57,540 --> 02:14:00,860 So if I now open up my terminal window, let me first 2824 02:14:00,860 --> 02:14:04,100 make CP to create my own copy program. 2825 02:14:04,100 --> 02:14:06,470 Let me actually open an image I came with today. 2826 02:14:06,470 --> 02:14:08,840 Here's a happy cat from the internet. 2827 02:14:08,840 --> 02:14:10,760 And that's going to be my original image. 2828 02:14:10,760 --> 02:14:12,890 Let me now go ahead and run this. 2829 02:14:12,890 --> 02:14:14,180 Dot slash CP. 2830 02:14:14,180 --> 02:14:16,528 I have to run dot slash because I want my version of CP, 2831 02:14:16,528 --> 02:14:17,945 not the one that comes with Linux. 2832 02:14:17,945 --> 02:14:22,940 So dot slash CP, cat.jpeg, and let's call it maybe 2833 02:14:22,940 --> 02:14:26,210 my backup cat, just in case I ever mess up the original. 2834 02:14:26,210 --> 02:14:27,290 Enter. 2835 02:14:27,290 --> 02:14:29,180 Seems to work OK. 2836 02:14:29,180 --> 02:14:33,770 When I run now code of backup dot jpeg to open the copy, 2837 02:14:33,770 --> 02:14:36,230 there is that same happy cat. 2838 02:14:36,230 --> 02:14:39,350 So it's very low level manipulation, but it all 2839 02:14:39,350 --> 02:14:42,230 results from my now having the power to express myself 2840 02:14:42,230 --> 02:14:46,070 in terms of locations in memory using pointers, understanding 2841 02:14:46,070 --> 02:14:50,090 that strings and now files are really just abstractions on top of these lower 2842 02:14:50,090 --> 02:14:50,960 level details. 2843 02:14:50,960 --> 02:14:54,950 And from all of that is going to come some pretty powerful functionality. 2844 02:14:54,950 --> 02:14:58,620 In fact, among the things that you can now do, as you'll soon see, 2845 02:14:58,620 --> 02:15:02,270 is manipulate at least simple files, known as bitmap files.
2846 02:15:02,270 --> 02:15:06,980 So BMP is bitmap file, and it essentially implements images exactly 2847 02:15:06,980 --> 02:15:13,220 as we began today, as just a map of bits, a grid, xy coordinates of grids, 2848 02:15:13,220 --> 02:15:15,950 each of which represents a pixel coordinate. 2849 02:15:15,950 --> 02:15:20,600 A bitmap is a type of file with a dot BMP file extension on a computer 2850 02:15:20,600 --> 02:15:22,440 that stores images just like that. 2851 02:15:22,440 --> 02:15:25,730 And now that you have the ability to not only think about images in this way, 2852 02:15:25,730 --> 02:15:27,860 but write code that manipulates images, you 2853 02:15:27,860 --> 02:15:31,130 can do powerful things all on Instagram, and TikTok, and Snapchat, 2854 02:15:31,130 --> 02:15:32,330 like filters nowadays. 2855 02:15:32,330 --> 02:15:35,540 So for instance, here is an image of the bridge, 2856 02:15:35,540 --> 02:15:37,340 the Weeks Bridge across the river. 2857 02:15:37,340 --> 02:15:41,840 Here is a black and white filter that we've applied by writing some C code, 2858 02:15:41,840 --> 02:15:45,380 as you soon will, to change it from colorful to black and white. 2859 02:15:45,380 --> 02:15:48,380 Here's the original that you might see every day. 2860 02:15:48,380 --> 02:15:50,330 Here, meanwhile, is a reflection thereof. 2861 02:15:50,330 --> 02:15:52,550 If you've ever flipped an image around on the x-axis, 2862 02:15:52,550 --> 02:15:55,467 this can actually rotate the image, even though this is the other side 2863 02:15:55,467 --> 02:15:56,660 of the bridge over there. 2864 02:15:56,660 --> 02:15:58,970 Meanwhile, here is a blurred version. 
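The black and white and reflection filters just shown can be sketched as loops over that grid of pixels. Everything below is an assumption made for illustration: the pixel struct with one byte each of red, green, and blue, the function names, averaging the three channels to get a gray value, and reading "reflection" as mirroring each row left to right. It is not the code used to produce the images on screen.

```c
#include <stdint.h>

// Hypothetical pixel layout: one byte each of red, green, and blue.
typedef struct
{
    uint8_t red;
    uint8_t green;
    uint8_t blue;
} pixel;

// Replace each pixel's channels with their average (integer division truncates).
void grayscale(int height, int width, pixel image[height][width])
{
    for (int i = 0; i < height; i++)
    {
        for (int j = 0; j < width; j++)
        {
            int sum = image[i][j].red + image[i][j].green + image[i][j].blue;
            uint8_t gray = (uint8_t) (sum / 3);
            image[i][j].red = gray;
            image[i][j].green = gray;
            image[i][j].blue = gray;
        }
    }
}

// Mirror each row left to right by swapping pixels from the two ends inward.
void reflect(int height, int width, pixel image[height][width])
{
    for (int i = 0; i < height; i++)
    {
        for (int j = 0; j < width / 2; j++)
        {
            pixel tmp = image[i][j];
            image[i][j] = image[i][width - 1 - j];
            image[i][width - 1 - j] = tmp;
        }
    }
}
```

Both functions modify the grid in place, which works because an array passed to a function is passed by its address rather than copied.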
2865 02:15:58,970 --> 02:16:01,490 If it looks a little blurry, that's deliberate 2866 02:16:01,490 --> 02:16:05,390 because we've essentially smudged all of the values by looking at every pixel, 2867 02:16:05,390 --> 02:16:07,820 looking up, down, left, and right, and kind of blurring 2868 02:16:07,820 --> 02:16:09,590 the effect to give it this effect here. 2869 02:16:09,590 --> 02:16:11,518 Here is what's called edge detection, whereby 2870 02:16:11,518 --> 02:16:13,310 if you're feeling more comfortable, you can 2871 02:16:13,310 --> 02:16:16,070 write code that looks at these individual pixels, 2872 02:16:16,070 --> 02:16:19,910 tries to figure out where the edges are, just like a fancy computer might, 2873 02:16:19,910 --> 02:16:21,797 and then colorize it in this way, as well. 2874 02:16:21,797 --> 02:16:24,380 And you'll be able to do all of that because images like these 2875 02:16:24,380 --> 02:16:27,555 are just grids with coordinates with lots and lots of pixels. 2876 02:16:27,555 --> 02:16:29,930 So what started quite simply now is going to be something 2877 02:16:29,930 --> 02:16:32,180 you now have complete control over, now that we've 2878 02:16:32,180 --> 02:16:33,770 taken off these training wheels. 2879 02:16:33,770 --> 02:16:35,540 And it's cultural within computer science 2880 02:16:35,540 --> 02:16:37,590 to understand geek humor like this. 2881 02:16:37,590 --> 02:16:41,270 And so the last thing we'll do today is give you 2882 02:16:41,270 --> 02:16:45,680 this joke to end on, which for better or for worse, should now make sense. 2883 02:16:45,680 --> 02:16:48,799 And those chuckles will suffice. 2884 02:16:48,799 --> 02:16:51,250 This was CS50. 2885 02:16:51,250 --> 02:17:22,000