1 00:00:00,000 --> 00:00:11,242 >> [MUSIC PLAYING] 2 00:00:11,242 --> 00:00:16,630 >> DAVID J. MALAN: All right this is CS50 and this is the start of week five. 3 00:00:16,630 --> 00:00:21,480 So today, underneath your seat cushions, you will not find anything. 4 00:00:21,480 --> 00:00:24,790 But above, you should find these, a little token of our appreciation for 5 00:00:24,790 --> 00:00:26,970 all of the work that you put into the Game of Fifteen. 6 00:00:26,970 --> 00:00:30,290 Simply remove the little circle on the bottom to begin playing for the 7 00:00:30,290 --> 00:00:31,680 remainder of class. 8 00:00:31,680 --> 00:00:38,930 >> So recall that, or know that problem set four, which went out this weekend, 9 00:00:38,930 --> 00:00:40,340 involves writing another game. 10 00:00:40,340 --> 00:00:43,740 But this time it involves using an actual graphical user interface, not a 11 00:00:43,740 --> 00:00:46,310 textual interface like Game of Fifteen was. 12 00:00:46,310 --> 00:00:50,210 And the game that lies ahead of you, if you've not yet seen this next, 13 00:00:50,210 --> 00:00:52,310 looks a little something like this. 14 00:00:52,310 --> 00:00:55,170 I'm going to go into my terminal window here in GDB. 15 00:00:55,170 --> 00:00:58,600 And I'm going to go ahead and run the staff solution, which you can access 16 00:00:58,600 --> 00:01:01,010 after running update 50 as usual. 17 00:01:01,010 --> 00:01:04,090 >> But I'm going to put it into a little secret mode, a little Easter egg, 18 00:01:04,090 --> 00:01:08,480 so-called God mode, by putting God in argv1. 19 00:01:08,480 --> 00:01:12,920 And I have to follow my own directions, running it in my own 20 00:01:12,920 --> 00:01:14,220 problem set directory. 21 00:01:14,220 --> 00:01:19,190 So now you see a complete version of the game of Breakout. 22 00:01:19,190 --> 00:01:21,090 In fact, this is no-hands mode. 23 00:01:21,090 --> 00:01:24,850 So it's actually-- 24 00:01:24,850 --> 00:01:26,470 wowed though you might be-- 25 00:01:26,470 --> 00:01:30,850 pretty trivial to implement God mode in Breakout, unlike Game of Fifteen, 26 00:01:30,850 --> 00:01:33,590 which some of you might have tackled for the hacker edition. 27 00:01:33,590 --> 00:01:37,890 >> In Breakout it suffices in God mode to simply do what, 28 00:01:37,890 --> 00:01:41,220 intuitively with the paddle? 29 00:01:41,220 --> 00:01:45,630 Just make it equal to whatever the horizontal position is of the ball. 30 00:01:45,630 --> 00:01:49,220 And so long as you do this in lockstep with the ball moving this game will 31 00:01:49,220 --> 00:01:53,100 never, ever, ever miss the ball and you'll win every time. 32 00:01:53,100 --> 00:01:55,430 >> But in this week's hacker edition there's more than just God mode. 33 00:01:55,430 --> 00:01:56,720 There's a number of other features. 34 00:01:56,720 --> 00:01:58,140 Among them, lasers. 35 00:01:58,140 --> 00:02:01,070 So that if you really get impatient you can start shooting down the bricks 36 00:02:01,070 --> 00:02:02,120 and a few others. 37 00:02:02,120 --> 00:02:04,560 And for those of you who'd like to calibrate standard versus hacker 38 00:02:04,560 --> 00:02:08,750 edition, I can see that this week's hacker edition deliberately is a 39 00:02:08,750 --> 00:02:12,830 little more doable, say, than God mode was with Game of Fifteen. 40 00:02:12,830 --> 00:02:15,300 >> So if you're looking for a stretch and you're looking for some additional fun 41 00:02:15,300 --> 00:02:18,400 features do dive in if of interest. 42 00:02:18,400 --> 00:02:21,280 Now more practically, let me point out one thing as well. 43 00:02:21,280 --> 00:02:24,780 GDB, which some of you may not have yet touched personally, which is fine. 44 00:02:24,780 --> 00:02:28,530 But now is really the time to get used to this and comfortable with this tool 45 00:02:28,530 --> 00:02:31,510 because it will make your lives much easier, truly. 46 00:02:31,510 --> 00:02:34,900 >> Per Rob's lecture on GDB a couple of weeks ago, recall 47 00:02:34,900 --> 00:02:36,810 that GDB is a debugger. 48 00:02:36,810 --> 00:02:41,230 It's a tool that lets you run your program but run it step by step, line 49 00:02:41,230 --> 00:02:45,680 by line, so that you can poke around, so that you see things happening, so 50 00:02:45,680 --> 00:02:47,310 that you can print out values of variables. 51 00:02:47,310 --> 00:02:50,580 In short, it gives you so much more power than printDef does. 52 00:02:50,580 --> 00:02:52,900 >> Now admittedly, the interface is pretty arcane. 53 00:02:52,900 --> 00:02:55,180 Black and white textual interface for the most part. 54 00:02:55,180 --> 00:02:57,400 The commands are somewhat tough to remember at first. 55 00:02:57,400 --> 00:03:01,230 But even though it might take you half an hour, an hour, to put that upfront 56 00:03:01,230 --> 00:03:02,940 investment of time into it, trust me. 57 00:03:02,940 --> 00:03:06,440 Certainly by semester's end it will save you an order of magnitude more 58 00:03:06,440 --> 00:03:07,600 time than that. 59 00:03:07,600 --> 00:03:09,200 >> So early in the week dive in. 60 00:03:09,200 --> 00:03:13,200 And in terms of Breakout, know that you can do this so long as you have 61 00:03:13,200 --> 00:03:18,230 the distribution code or your own code in progress in your Pst4 directory. 62 00:03:18,230 --> 00:03:21,680 Know that you can run gdb./breakout. 63 00:03:21,680 --> 00:03:23,490 >> This is going to open up a window like this. 64 00:03:23,490 --> 00:03:25,530 Let me give myself more of a terminal window. 65 00:03:25,530 --> 00:03:27,770 And then what I'm going to go ahead and do, it's not just run it. 66 00:03:27,770 --> 00:03:30,690 I'm going to first set a break point recall, which allows you to pause 67 00:03:30,690 --> 00:03:32,500 execution at a particular place. 68 00:03:32,500 --> 00:03:35,750 >> Just to keep things simple I'm going to break at line one just by typing 69 00:03:35,750 --> 00:03:37,000 the number one. 70 00:03:37,000 --> 00:03:40,080 71 00:03:40,080 --> 00:03:43,250 Let me actually re-open this window because it's getting a 72 00:03:43,250 --> 00:03:45,700 little small there. 73 00:03:45,700 --> 00:03:53,270 So what I'm now going to do here is if I open up my terminal window. 74 00:03:53,270 --> 00:03:53,910 Come on, there we go. 75 00:03:53,910 --> 00:03:59,850 >> So now if I go back to dropbox, Pst4 and run gdb./breakout enter, notice 76 00:03:59,850 --> 00:04:02,600 I'm going to break one to set a break point at line one. 77 00:04:02,600 --> 00:04:04,840 And now I'm going to go ahead and type run. 78 00:04:04,840 --> 00:04:07,370 And when I do, notice nothing seems to happen. 79 00:04:07,370 --> 00:04:08,120 >> There's no pop up. 80 00:04:08,120 --> 00:04:09,790 There's no graphical user interface yet. 81 00:04:09,790 --> 00:04:13,340 But that's understandable because I'm literally at line one in my program. 82 00:04:13,340 --> 00:04:17,110 And notice that I've fast forwarded, specifically now to 62, because all 83 00:04:17,110 --> 00:04:20,600 the stuff at the top of this file is things like comments and constants and 84 00:04:20,600 --> 00:04:22,460 uninteresting stuff for now. 85 00:04:22,460 --> 00:04:25,840 >> So now I'm inside of main, it seems, at line 62. 86 00:04:25,840 --> 00:04:27,960 And this is just the distribution code, recall. 87 00:04:27,960 --> 00:04:33,810 If I open this up by going, similarly, into my drop box directory into Pst4, 88 00:04:33,810 --> 00:04:35,450 into breakout.c. 89 00:04:35,450 --> 00:04:40,670 And if I scroll down and down and down, and let me go ahead and turn on 90 00:04:40,670 --> 00:04:44,990 my line numbers. 91 00:04:44,990 --> 00:04:50,300 >> What I'll see, if I scroll down to line 62, is exactly the line that 92 00:04:50,300 --> 00:04:50,910 we've paused on. 93 00:04:50,910 --> 00:04:53,720 So this line here, 62, is where we're about to be. 94 00:04:53,720 --> 00:04:57,470 So now in GDB, if I go ahead and type now next, enter it's going to 95 00:04:57,470 --> 00:04:58,450 execute that line. 96 00:04:58,450 --> 00:05:00,610 And voila, we have the so-called g window. 97 00:05:00,610 --> 00:05:02,800 If unfamiliar with what a GWindow is, not to worry. 98 00:05:02,800 --> 00:05:05,740 The spec will introduce you to it, as well as a number of walkthrough videos 99 00:05:05,740 --> 00:05:06,830 embedded in the spec. 100 00:05:06,830 --> 00:05:08,610 >> But now let's make this a little more interesting. 101 00:05:08,610 --> 00:05:10,960 Let me move this window over to the side a little bit. 102 00:05:10,960 --> 00:05:13,480 Let me make the window a little bigger so I can see more. 103 00:05:13,480 --> 00:05:16,140 >> And now let me go ahead and do next again. 104 00:05:16,140 --> 00:05:17,550 And there are my bricks. 105 00:05:17,550 --> 00:05:20,490 If I type next again now I see the ball. 106 00:05:20,490 --> 00:05:23,520 And if I type next again now I see the paddle. 107 00:05:23,520 --> 00:05:26,690 >> And fortunately this gedit is not really cooperating by showing me 108 00:05:26,690 --> 00:05:27,660 everything I want. 109 00:05:27,660 --> 00:05:30,820 But now if I do next again, next again, I'm just 110 00:05:30,820 --> 00:05:32,260 declaring some variables. 111 00:05:32,260 --> 00:05:34,750 And I can print any one of these guys out. 112 00:05:34,750 --> 00:05:37,170 Print bricks, prints lives. 113 00:05:37,170 --> 00:05:39,910 >> And now if I continue to do next, notice that I'll be 114 00:05:39,910 --> 00:05:40,870 inside of that loop. 115 00:05:40,870 --> 00:05:43,380 But the code is going to execute exactly as I expect. 116 00:05:43,380 --> 00:05:45,810 So when I hit this function, Wait for Click, it's going to do 117 00:05:45,810 --> 00:05:46,830 it literally that. 118 00:05:46,830 --> 00:05:48,870 So I seemed to have lost control over the program. 119 00:05:48,870 --> 00:05:50,480 >> GDB is not giving me another prompt. 120 00:05:50,480 --> 00:05:51,500 But not to worry. 121 00:05:51,500 --> 00:05:53,720 Go to my game, click somewhere. 122 00:05:53,720 --> 00:05:56,270 >> And voila, now it proceeds to line 86. 123 00:05:56,270 --> 00:05:59,460 So again, it's invaluable, ultimately, for debugging problems. 124 00:05:59,460 --> 00:06:03,050 Because you can literally step through your code, print things out and much, 125 00:06:03,050 --> 00:06:03,640 much, more. 126 00:06:03,640 --> 00:06:07,210 But for now, those tools alone should get you pretty far. 127 00:06:07,210 --> 00:06:10,050 >> So we're, of course, taking a look at Graphics now, all of a sudden. 128 00:06:10,050 --> 00:06:12,350 And now our world gets a little more interesting. 129 00:06:12,350 --> 00:06:15,680 And you know, perhaps, from some of the videos online that we have these 130 00:06:15,680 --> 00:06:18,280 shorts that you've been watching as part of problem sets. 131 00:06:18,280 --> 00:06:20,460 >> And they've been shot, deliberately, against a white backdrop. 132 00:06:20,460 --> 00:06:23,380 And some of them have the teaching Fellows drawing some text on the 133 00:06:23,380 --> 00:06:25,490 screen that's overlaid on the side of them. 134 00:06:25,490 --> 00:06:27,760 But of course, this isn't all that interesting in the real world. 135 00:06:27,760 --> 00:06:30,520 This is just a lecture hall with a big white screen and a backdrop. 136 00:06:30,520 --> 00:06:33,330 And our amazing production team sort of makes everything look beautiful 137 00:06:33,330 --> 00:06:36,620 after the fact by cropping out or overlaying anything 138 00:06:36,620 --> 00:06:37,840 we do or don't want. 139 00:06:37,840 --> 00:06:41,560 >> Now just to motivate this week and really, where you can go, ultimately, 140 00:06:41,560 --> 00:06:42,560 with computer science. 141 00:06:42,560 --> 00:06:44,260 Not just after problem set four. 142 00:06:44,260 --> 00:06:48,240 But after another course or an entire curriculum it's amazing what you can 143 00:06:48,240 --> 00:06:51,090 do these days in terms of graphics in particular. 144 00:06:51,090 --> 00:06:53,440 >> Some of you might have seen this flowing around online. 145 00:06:53,440 --> 00:06:56,240 But I thought I'd show you, for just a couple of minutes, a glimpse of what 146 00:06:56,240 --> 00:07:01,890 computer technology and what CGI, computer graphics can do these days 147 00:07:01,890 --> 00:07:04,510 with a familiar song and perhaps movie. 148 00:07:04,510 --> 00:07:05,760 >> [MUSIC - LANA DEL RAY, "YOUNG AND BEAUTIFUL] 149 00:07:05,760 --> 00:10:50,270 150 00:10:50,270 --> 00:10:52,470 >> SPEAKER 1: It's just a little bit amazing, perhaps, just how 151 00:10:52,470 --> 00:10:52,857 omnipresent-- 152 00:10:52,857 --> 00:10:57,040 >> [APPLAUSE] 153 00:10:57,040 --> 00:10:59,230 >> SPEAKER 1: I just downloaded it. 154 00:10:59,230 --> 00:11:02,920 But it's really amazing, I think, just how omnipresent software and code and 155 00:11:02,920 --> 00:11:04,230 tools like this really are. 156 00:11:04,230 --> 00:11:07,685 So that's a taste of the direction in which you can go. 157 00:11:07,685 --> 00:11:10,620 Oh, no more Appliance today. 158 00:11:10,620 --> 00:11:14,640 Well, that's actually tragic timing given the point I just tried to make. 159 00:11:14,640 --> 00:11:18,670 >> All right, so let's launch Fusion again. 160 00:11:18,670 --> 00:11:20,800 Remind me later. 161 00:11:20,800 --> 00:11:24,190 All right, and you should have got an email as an aside if you did get a 162 00:11:24,190 --> 00:11:25,460 notice like that. 163 00:11:25,460 --> 00:11:29,940 All right, so recall that last week we started to peel back this 164 00:11:29,940 --> 00:11:31,380 later known as string. 165 00:11:31,380 --> 00:11:34,700 >> string recalls a data type that's declared in the CS50 library. 166 00:11:34,700 --> 00:11:37,740 And it's part of the training wheels that will now begin to take off. 167 00:11:37,740 --> 00:11:41,280 It was a useful concept early on. 168 00:11:41,280 --> 00:11:43,750 But now it's going to get more interesting and more powerful to 169 00:11:43,750 --> 00:11:48,330 actually see that underneath the hood, a string is just what, did we said? 170 00:11:48,330 --> 00:11:50,500 >> Yeah, so it's a so-called char*. 171 00:11:50,500 --> 00:11:53,860 And the * there denotes that there's some kind of address involved. 172 00:11:53,860 --> 00:11:58,690 And so when you say char* you just mean a variable whose data type is a 173 00:11:58,690 --> 00:11:59,290 pointer now. 174 00:11:59,290 --> 00:12:01,770 The fact that there's the star there just means that you are declaring a 175 00:12:01,770 --> 00:12:03,020 so-called pointer. 176 00:12:03,020 --> 00:12:06,220 And that pointer is going to apparently store the address of, of 177 00:12:06,220 --> 00:12:07,810 course, a char. 178 00:12:07,810 --> 00:12:08,960 >> Now why does this make sense? 179 00:12:08,960 --> 00:12:11,200 Well, what is a string underneath the hood? 180 00:12:11,200 --> 00:12:15,130 Well, for some time we've been saying that a string underneath the hood is 181 00:12:15,130 --> 00:12:18,460 just h-e-l-l-o, for instance. 182 00:12:18,460 --> 00:12:21,585 >> But we've talked about this as being, essentially, an array. 183 00:12:21,585 --> 00:12:25,410 And an array would then look a little more like this, with each of these 184 00:12:25,410 --> 00:12:26,460 taking up a bite. 185 00:12:26,460 --> 00:12:28,710 And then we've said that there's something special back here, the 186 00:12:28,710 --> 00:12:31,270 backslash 0, or null terminator. 187 00:12:31,270 --> 00:12:35,230 >> So all this time, this here has been a string. 188 00:12:35,230 --> 00:12:38,320 But really, a string is actually an address. 189 00:12:38,320 --> 00:12:43,210 And addresses, as we'll see, are often prefixed with 0x by convention. 190 00:12:43,210 --> 00:12:44,540 What does 0x denote? 191 00:12:44,540 --> 00:12:45,970 Does anyone know? 192 00:12:45,970 --> 00:12:47,320 >> So it just means hexadecimal. 193 00:12:47,320 --> 00:12:52,360 So you might recall, actually, from Pst 1, I believe, one of the warm-up 194 00:12:52,360 --> 00:12:55,740 questions actually asked about hexadecimal notation in addition to 195 00:12:55,740 --> 00:12:57,100 binary and decimal. 196 00:12:57,100 --> 00:13:00,460 And the motivation here is that with hexadecimal you have 16 197 00:13:00,460 --> 00:13:01,770 digits at your disposal. 198 00:13:01,770 --> 00:13:07,900 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, followed by a, b, c, d, e, f. 199 00:13:07,900 --> 00:13:10,430 >> And if you count all those up, you get a total of 16. 200 00:13:10,430 --> 00:13:13,200 So this is in contrast with decimal, where we have 10 201 00:13:13,200 --> 00:13:14,690 digits, 0 through nine. 202 00:13:14,690 --> 00:13:17,750 It's in contrast with binary where we just have 0 and 1. 203 00:13:17,750 --> 00:13:21,450 >> But at the end of the day you can just represent the same numbers, but 204 00:13:21,450 --> 00:13:22,500 somewhat differently. 205 00:13:22,500 --> 00:13:25,840 And hexadecimal is common because as it turns out-- and we'll see this 206 00:13:25,840 --> 00:13:28,790 later in the course-- even when we get to web programming in the context of 207 00:13:28,790 --> 00:13:32,100 HTML and color codes, hexadecimal is nice. 208 00:13:32,100 --> 00:13:36,390 Because each digit, turns out, represents four bits perfectly. 209 00:13:36,390 --> 00:13:39,280 So it just kind of lines up nicely as we'll eventually see. 210 00:13:39,280 --> 00:13:44,720 So this might be Ox123 or something like that, denoting address 123 211 00:13:44,720 --> 00:13:47,050 somewhere inside of my computer's memory. 212 00:13:47,050 --> 00:13:50,600 >> But of course, some problems arise because of this underlying 213 00:13:50,600 --> 00:13:51,520 implementation. 214 00:13:51,520 --> 00:13:55,930 And recall that I took a stab at implementing a function like this-- 215 00:13:55,930 --> 00:14:00,260 compare dash 0 dot c last week, that even though it looked like it was 216 00:14:00,260 --> 00:14:04,270 right, it simply didn't compare two strings correctly. 217 00:14:04,270 --> 00:14:07,470 >> I've thrown away main, and I've thrown away the comments just to focus in on 218 00:14:07,470 --> 00:14:08,970 the code that's of interest here. 219 00:14:08,970 --> 00:14:10,660 And it's in red because it's buggy. 220 00:14:10,660 --> 00:14:11,670 For what reason? 221 00:14:11,670 --> 00:14:15,890 >> Well, at the top there when I declared a string, what was really going on 222 00:14:15,890 --> 00:14:17,260 underneath the hood? 223 00:14:17,260 --> 00:14:19,530 Well, let me go over to the screen here and draw that. 224 00:14:19,530 --> 00:14:23,230 So I declared, again, string s GetString. 225 00:14:23,230 --> 00:14:26,640 >> So I'm going to go ahead now and draw s for what it really is. 226 00:14:26,640 --> 00:14:28,590 It's going to be a square here. 227 00:14:28,590 --> 00:14:30,490 And I'm going to claim that that's 32 bits. 228 00:14:30,490 --> 00:14:32,890 At least it usually is, at least on the CS50 229 00:14:32,890 --> 00:14:34,520 appliance in a lot of computers. 230 00:14:34,520 --> 00:14:35,980 I'm going to call it s. 231 00:14:35,980 --> 00:14:39,070 >> But now recall that we called GetString. 232 00:14:39,070 --> 00:14:41,430 So GetString returns, of course, a string. 233 00:14:41,430 --> 00:14:45,790 If the user types in h-e-l-l-o enter the string hello gets returned. 234 00:14:45,790 --> 00:14:51,010 And that string, as we just said, ends up somewhere in your computer's memory 235 00:14:51,010 --> 00:14:53,240 with a backslash 0 at the end. 236 00:14:53,240 --> 00:14:56,650 I'll draw this like the array-- or contiguous block of characters-- 237 00:14:56,650 --> 00:14:58,330 that it actually is. 238 00:14:58,330 --> 00:15:01,790 >> And now, what is GetString actually returning? 239 00:15:01,790 --> 00:15:04,340 What has GetString been returning all of this time? 240 00:15:04,340 --> 00:15:07,520 Well, we say, in weeks prior, it returns a string. 241 00:15:07,520 --> 00:15:10,250 But more technically, what does GetString return apparently? 242 00:15:10,250 --> 00:15:11,610 >> AUDIENCE: An address. 243 00:15:11,610 --> 00:15:12,600 >> SPEAKER 1: An address. 244 00:15:12,600 --> 00:15:16,630 Specifically it returns the address of the very first bite, whatever it is. 245 00:15:16,630 --> 00:15:18,830 I just keep using one, two, three because it's convenient. 246 00:15:18,830 --> 00:15:21,380 >> It returns the address of the first character in the string. 247 00:15:21,380 --> 00:15:23,510 And we said last week that that is sufficient. 248 00:15:23,510 --> 00:15:26,710 Because we can always figure out where the end of the string just by 249 00:15:26,710 --> 00:15:30,150 iterating over it, maybe, with a for loop or a while loop or something like 250 00:15:30,150 --> 00:15:34,990 that, just looking for "backslash 0", the special sentinel character. 251 00:15:34,990 --> 00:15:37,220 >> And then we know that the string happens to be of length-- 252 00:15:37,220 --> 00:15:37,980 in this case-- 253 00:15:37,980 --> 00:15:38,670 five. 254 00:15:38,670 --> 00:15:43,800 So technically what GetString does is it returns Ox123 in this case. 255 00:15:43,800 --> 00:15:53,670 And technically what then happens is that we store, inside of s, Ox123. 256 00:15:53,670 --> 00:15:56,460 At the end of the day, even though this is new concept, pointers, they're 257 00:15:56,460 --> 00:15:57,350 just variables. 258 00:15:57,350 --> 00:16:00,440 But they happen to store bits that collectively represent an address. 259 00:16:00,440 --> 00:16:03,700 So technically all they gets stored in s is Ox123. 260 00:16:03,700 --> 00:16:04,680 >> But we as humans-- 261 00:16:04,680 --> 00:16:06,020 including today onward-- 262 00:16:06,020 --> 00:16:09,290 are really not going to care, typically, what the actual address is 263 00:16:09,290 --> 00:16:10,520 of some chunk of memory. 264 00:16:10,520 --> 00:16:14,040 It's just to low level of detail to be intellectually interesting. 265 00:16:14,040 --> 00:16:15,440 So I'm going to undo this. 266 00:16:15,440 --> 00:16:19,810 And instead, more high level, just say that when we're talking about pointers 267 00:16:19,810 --> 00:16:22,170 I'm going to just draw more user-friendly arrow that conveys the 268 00:16:22,170 --> 00:16:26,060 same idea and abstracts away the particulars of what the actual 269 00:16:26,060 --> 00:16:27,700 underlying address is. 270 00:16:27,700 --> 00:16:33,290 >> Now if we go back to the code, what happened last week if we have string t 271 00:16:33,290 --> 00:16:34,510 equals GetString? 272 00:16:34,510 --> 00:16:38,630 Well, if I again, type in hello this time I'm going to get 273 00:16:38,630 --> 00:16:40,460 another chunk of memory. 274 00:16:40,460 --> 00:16:44,820 h-e-l-l-o backslash 0. 275 00:16:44,820 --> 00:16:48,320 >> But because I called GetString a second time-- 276 00:16:48,320 --> 00:16:51,100 and I know this from looking at the source code for GetString-- even 277 00:16:51,100 --> 00:16:54,350 though it's coincidental that hello was typed in twice, GetString is not 278 00:16:54,350 --> 00:16:55,890 going to try to optimize and be clever. 279 00:16:55,890 --> 00:16:58,550 It's just going to get another chunk of memory from the computer, which is 280 00:16:58,550 --> 00:16:59,640 going to be at another address. 281 00:16:59,640 --> 00:17:02,330 Let's arbitrarily just say 456. 282 00:17:02,330 --> 00:17:04,079 >> And then what is it going to return? 283 00:17:04,079 --> 00:17:08,030 It's going to return 456 and store it in t. 284 00:17:08,030 --> 00:17:12,010 So what is really going on, on the left-hand side is I have another chunk 285 00:17:12,010 --> 00:17:14,260 of memory, 32 bits typically. 286 00:17:14,260 --> 00:17:16,720 And in there is going to go Ox456. 287 00:17:16,720 --> 00:17:20,140 But again, I'm not interested in these particular numbers anymore. 288 00:17:20,140 --> 00:17:23,069 I'm just going to abstractly draw it as an arrow. 289 00:17:23,069 --> 00:17:25,202 >> So this is now a new explanation. 290 00:17:25,202 --> 00:17:28,735 But it's the same exact idea that's been happening all this time. 291 00:17:28,735 --> 00:17:33,150 And so the reason then, that this first version of compare was buggy 292 00:17:33,150 --> 00:17:34,480 last week is why? 293 00:17:34,480 --> 00:17:38,000 When you do if s equals equals t what are you truly 294 00:17:38,000 --> 00:17:40,550 underneath the hood comparing? 295 00:17:40,550 --> 00:17:41,910 >> You're comparing the addresses. 296 00:17:41,910 --> 00:17:47,950 And just intuitively, clearly, Ox123 is not going to equal Ox456. 297 00:17:47,950 --> 00:17:49,380 Those numbers, those bits are just different. 298 00:17:49,380 --> 00:17:53,220 >> And so consistently, last week it said you type different things, even if the 299 00:17:53,220 --> 00:17:55,360 words were verbatim the same. 300 00:17:55,360 --> 00:17:58,770 So we fix this. 301 00:17:58,770 --> 00:18:00,120 In layman's terms, what was the fix? 302 00:18:00,120 --> 00:18:02,110 >> AUDIENCE: Use a function. 303 00:18:02,110 --> 00:18:02,870 >> SPEAKER 1: Use a function. 304 00:18:02,870 --> 00:18:05,190 Or stars are definitely involved, but use a function to do what? 305 00:18:05,190 --> 00:18:05,962 >> AUDIENCE: To compare the strings. 306 00:18:05,962 --> 00:18:07,390 >> SPEAKER 1: To compare the strings. 307 00:18:07,390 --> 00:18:11,030 So the fundamental problem here was that I was just considering the 308 00:18:11,030 --> 00:18:15,870 quality of strings to be defined by comparison of their addresses. 309 00:18:15,870 --> 00:18:18,540 And obviously that's just dumb now once you understand what's going on 310 00:18:18,540 --> 00:18:19,510 underneath the hood. 311 00:18:19,510 --> 00:18:23,270 To truly compare strings to see if they're equal in the way that a human 312 00:18:23,270 --> 00:18:26,680 would consider two strings to be equal we need to compare them character for 313 00:18:26,680 --> 00:18:28,070 character for character. 314 00:18:28,070 --> 00:18:30,020 >> Now I could have done this very tediously. 315 00:18:30,020 --> 00:18:32,240 But familiarly, we're using a for loop. 316 00:18:32,240 --> 00:18:36,050 And just compare s bracket i against t bracket i. 317 00:18:36,050 --> 00:18:39,590 s bracket i plus 1 against t bracket i plus 1, and so forth, inside 318 00:18:39,590 --> 00:18:40,580 some kind of loop. 319 00:18:40,580 --> 00:18:44,950 And if I spot any two characters that differ, or if I realize that ooh, s is 320 00:18:44,950 --> 00:18:48,410 shorter than t or longer than t I can immediately say false, 321 00:18:48,410 --> 00:18:49,390 they're not the same. 322 00:18:49,390 --> 00:18:55,370 >> But if I get through s and t and say same, same, same, same, same, end of 323 00:18:55,370 --> 00:18:58,520 both strings, I can say true, they are equal. 324 00:18:58,520 --> 00:19:01,040 Well, thankfully, years ago someone wrote that code for us. 325 00:19:01,040 --> 00:19:03,790 >> And they called it StrComp for string compare. 326 00:19:03,790 --> 00:19:11,900 And even though it's a little counter intuitive, StrComp returns 0 if those 327 00:19:11,900 --> 00:19:14,520 two strings, s and t are the same. 328 00:19:14,520 --> 00:19:18,090 But it returns negative value if s should come before t alphabetically or 329 00:19:18,090 --> 00:19:20,610 positive value if it should come after t alphabetically. 330 00:19:20,610 --> 00:19:24,030 >> So if you ever want to sort something, it turns out that StrComp is useful. 331 00:19:24,030 --> 00:19:26,660 Because it doesn't just say yes or no, equal or not. 332 00:19:26,660 --> 00:19:30,440 It gives you a sense of ordering like a dictionary might. 333 00:19:30,440 --> 00:19:33,770 So StrComp, s comma t equals equals 0 means that the 334 00:19:33,770 --> 00:19:35,200 strings are truly equal. 335 00:19:35,200 --> 00:19:38,680 Because whoever wrote this function years ago presumably used a for loop 336 00:19:38,680 --> 00:19:42,840 or a while loop or something like that to integrate over the characters again 337 00:19:42,840 --> 00:19:45,270 and again and again. 338 00:19:45,270 --> 00:19:47,300 >> But problem two arose here. 339 00:19:47,300 --> 00:19:48,750 This was copy0.c. 340 00:19:48,750 --> 00:19:51,680 And the two in red is because it's flawed. 341 00:19:51,680 --> 00:19:52,800 And what did we do here? 342 00:19:52,800 --> 00:19:54,310 Well, first I called GetString. 343 00:19:54,310 --> 00:19:56,255 And I stored the return value in s. 344 00:19:56,255 --> 00:20:00,260 So that's pretty much the same as this top part of the picture. 345 00:20:00,260 --> 00:20:01,490 >> But what comes after that? 346 00:20:01,490 --> 00:20:04,980 Well, let me go ahead and get rid of a whole bunch of this. 347 00:20:04,980 --> 00:20:09,650 We'll rewind in time to where we just have s, which is now consistent with 348 00:20:09,650 --> 00:20:10,940 line one up there. 349 00:20:10,940 --> 00:20:11,400 >> I check. 350 00:20:11,400 --> 00:20:13,450 If s equals equals 0. 351 00:20:13,450 --> 00:20:18,670 Now, a quick side note, when might GetString return 0? 352 00:20:18,670 --> 00:20:19,580 There's not enough memory. 353 00:20:19,580 --> 00:20:19,880 Right? 354 00:20:19,880 --> 00:20:22,310 >> It's rare that this is going to happen, certainly on a computer that's 355 00:20:22,310 --> 00:20:24,740 got hundreds of megs or even gigs of RAM. 356 00:20:24,740 --> 00:20:27,080 But it could, in theory, return 0, especially if the 357 00:20:27,080 --> 00:20:28,080 user doesn't cooperate. 358 00:20:28,080 --> 00:20:31,640 There's ways to pretend like you haven't inputted anything and trick 359 00:20:31,640 --> 00:20:34,100 GetString into returning 0 effectively. 360 00:20:34,100 --> 00:20:35,470 >> So it's going to check for that. 361 00:20:35,470 --> 00:20:39,430 Because if any of you have started to get, already, segmentation faults-- 362 00:20:39,430 --> 00:20:42,280 which has probably been a source of some frustration-- 363 00:20:42,280 --> 00:20:46,150 those are almost always the result of memory-related error. 364 00:20:46,150 --> 00:20:50,440 Somehow you messed up with regard to a pointer, even if you didn't realize 365 00:20:50,440 --> 00:20:51,530 there was a pointer. 366 00:20:51,530 --> 00:20:55,260 So you might have induced segmentation faults as early as week one using 367 00:20:55,260 --> 00:21:02,100 something like a for loop or a while loop and an array by going too far 368 00:21:02,100 --> 00:21:05,900 past the boundaries of some array that you declared, in week two in 369 00:21:05,900 --> 00:21:06,690 particular. 370 00:21:06,690 --> 00:21:09,220 >> You might have done it even in problem set four with Breakout. 371 00:21:09,220 --> 00:21:12,910 Even though you probably haven't seen any stars in the distribution code for 372 00:21:12,910 --> 00:21:17,410 Breakout, it turns out that those GRect and GOval and other such things, 373 00:21:17,410 --> 00:21:19,650 those are actually pointers underneath the hood. 374 00:21:19,650 --> 00:21:23,430 >> But Stanford, like us, sort of hides that detail at least for the libraries 375 00:21:23,430 --> 00:21:26,540 purposes, much like we do for string and char*. 376 00:21:26,540 --> 00:21:30,060 But GRect and GOval and all of those things you guys are or will be using 377 00:21:30,060 --> 00:21:32,630 this week are ultimately memory addresses. 378 00:21:32,630 --> 00:21:33,650 You just don't know it. 379 00:21:33,650 --> 00:21:37,240 >> So it's not surprising then, perhaps, that you might trip over some 380 00:21:37,240 --> 00:21:38,580 segmentation faults. 381 00:21:38,580 --> 00:21:41,290 But what's interesting here now, if after we check for 0 we do 382 00:21:41,290 --> 00:21:43,460 string t gets s. 383 00:21:43,460 --> 00:21:44,690 Well, let me declare t. 384 00:21:44,690 --> 00:21:47,730 I'm going to draw it as a square, 32 bits, call it t. 385 00:21:47,730 --> 00:21:49,740 And then I'm going to do gets s. 386 00:21:49,740 --> 00:21:51,130 >> Well, what does that mean? 387 00:21:51,130 --> 00:21:53,280 Well, it's a little hard to think about it picture wise. 388 00:21:53,280 --> 00:21:55,025 But let's think about what's inside of x? 389 00:21:55,025 --> 00:21:59,430 What's literally inside this variable? 390 00:21:59,430 --> 00:22:01,500 The value Ox123. 391 00:22:01,500 --> 00:22:05,815 >> So when I say string t gets s, that just literally means take the number 392 00:22:05,815 --> 00:22:10,070 in s, which is Ox123 and put it Ox123. 393 00:22:10,070 --> 00:22:13,740 Or pictorially, if I kind of abstract away from that detail it has the 394 00:22:13,740 --> 00:22:16,600 effect of literally doing this as well. 395 00:22:16,600 --> 00:22:22,110 >> So now, think back to last week when we proceeded to capitalist T. I 396 00:22:22,110 --> 00:22:23,800 did T bracket 0. 397 00:22:23,800 --> 00:22:27,150 Well, T bracket 0, even though it's a pointer, you can treat it as though 398 00:22:27,150 --> 00:22:29,220 it's an array, with a square bracket notation. 399 00:22:29,220 --> 00:22:31,550 >> So where is T bracket 0? 400 00:22:31,550 --> 00:22:32,990 Well, it's the h. 401 00:22:32,990 --> 00:22:36,800 And so when we use that line of code, two upper, which is in that c type.h 402 00:22:36,800 --> 00:22:38,460 header file, that's where it's declared. 403 00:22:38,460 --> 00:22:44,410 You're capitalizing this H. But of course, that's the same exact h that's 404 00:22:44,410 --> 00:22:46,540 inside of s, so to speak. 405 00:22:46,540 --> 00:22:51,930 And so now you have changed or capitalized both the original and the 406 00:22:51,930 --> 00:22:53,120 so-called copy. 407 00:22:53,120 --> 00:22:56,620 Because you didn't make a copy in the way that a human would want it to be. 408 00:22:56,620 --> 00:22:59,710 >> So what was the fix here, in copy1.c last week? 409 00:22:59,710 --> 00:23:03,070 410 00:23:03,070 --> 00:23:05,580 Functions, so we could actually copy the string. 411 00:23:05,580 --> 00:23:08,700 And fundamentally, what do we need to do in order to copy the string? 412 00:23:08,700 --> 00:23:12,070 >> Well, in this green version here I'm going to do it fairly low level. 413 00:23:12,070 --> 00:23:14,260 There are actually functions they could help with this. 414 00:23:14,260 --> 00:23:17,710 But the most basic one, and the most familiar one, at least, will soon be 415 00:23:17,710 --> 00:23:19,600 familiar to us, is the following-- 416 00:23:19,600 --> 00:23:21,910 so one on the first line of code in green now. 417 00:23:21,910 --> 00:23:23,970 >> I just rewrote s as char*. 418 00:23:23,970 --> 00:23:25,250 There's no functional difference there. 419 00:23:25,250 --> 00:23:28,790 I just threw away the CS50 library and I'm calling it what it is, a char*. 420 00:23:28,790 --> 00:23:31,640 >> Now dot, dot, dot, because there were some error checking that's not 421 00:23:31,640 --> 00:23:33,200 interesting to talk about again. 422 00:23:33,200 --> 00:23:34,710 So now t is declared. 423 00:23:34,710 --> 00:23:35,780 It too is a char*. 424 00:23:35,780 --> 00:23:38,280 So I drew a little square on the screen like before. 425 00:23:38,280 --> 00:23:41,870 >> But on the right-hand side, malloc, we said is memory allocate. 426 00:23:41,870 --> 00:23:44,130 So allocate some chunk of memory. 427 00:23:44,130 --> 00:23:48,830 And how many bytes do we actually want to allocate, does it seem? 428 00:23:48,830 --> 00:23:50,340 >> Well, the string length of s. 429 00:23:50,340 --> 00:23:52,310 So if it's hello that's going to be five. 430 00:23:52,310 --> 00:23:53,950 We'll say h-e-l-l-o. 431 00:23:53,950 --> 00:23:55,090 So five bytes. 432 00:23:55,090 --> 00:23:57,960 >> But then plus 1, why 1? 433 00:23:57,960 --> 00:23:58,830 The 0 character. 434 00:23:58,830 --> 00:24:03,640 If we don't leave room for this guy we might accidentally create a situation 435 00:24:03,640 --> 00:24:05,600 where the string is h-e-l-l-o. 436 00:24:05,600 --> 00:24:08,470 And then the next time GetString is called and I type in, for instance, 437 00:24:08,470 --> 00:24:14,020 David, D-a-v-i-d, the computer is going to think that s is actually 438 00:24:14,020 --> 00:24:18,900 h-e-l-l-o-d-a-v-i-d because there's no break in between those words. 439 00:24:18,900 --> 00:24:19,810 >> So we need that break. 440 00:24:19,810 --> 00:24:20,720 So we don't want five. 441 00:24:20,720 --> 00:24:22,100 We want six bytes. 442 00:24:22,100 --> 00:24:23,110 >> And bytes I say. 443 00:24:23,110 --> 00:24:25,220 But it's really time size of char. 444 00:24:25,220 --> 00:24:28,040 Technically char is almost always a single byte. 445 00:24:28,040 --> 00:24:31,030 >> But just to make our code portable, so to speak, so that it works on 446 00:24:31,030 --> 00:24:33,750 different computers even if they might be somewhat different underneath the 447 00:24:33,750 --> 00:24:36,590 hood, I'm going to generically say size of char so that 448 00:24:36,590 --> 00:24:37,660 my code always work. 449 00:24:37,660 --> 00:24:40,610 And I don't have to recompile it just because I upgrade my computer or use 450 00:24:40,610 --> 00:24:42,140 some different platform. 451 00:24:42,140 --> 00:24:45,300 >> So I've got 6 times the size of a char, which happens to be 1. 452 00:24:45,300 --> 00:24:47,440 So that means malloc could give me six bytes. 453 00:24:47,440 --> 00:24:49,140 What is that actually doing? 454 00:24:49,140 --> 00:24:52,810 Well, let me roll back in time here to where we are in the story. 455 00:24:52,810 --> 00:24:57,620 >> So if I go back here, I've declared a char* called t. 456 00:24:57,620 --> 00:25:00,280 I've now called malloc for six bytes. 457 00:25:00,280 --> 00:25:06,400 And now I'm going to draw those six bytes just like the array earlier. 458 00:25:06,400 --> 00:25:10,570 But I actually don't know what's inside this array. 459 00:25:10,570 --> 00:25:14,640 >> If you allocate memory it turns out that you can't trust that there's some 460 00:25:14,640 --> 00:25:15,810 known value there. 461 00:25:15,810 --> 00:25:18,400 It could have been used by something else, some other function, some other 462 00:25:18,400 --> 00:25:19,630 line of code that you wrote. 463 00:25:19,630 --> 00:25:22,870 So we'll generally call these garbage values and draw them, perhaps, as 464 00:25:22,870 --> 00:25:26,170 question marks, just indicating that we don't know what's actually there. 465 00:25:26,170 --> 00:25:30,390 And that's no big deal so long as we are smart enough to overwrite those 466 00:25:30,390 --> 00:25:34,550 garbage values with numbers or chars that we care about. 467 00:25:34,550 --> 00:25:36,340 >> So in this case what am I going to do? 468 00:25:36,340 --> 00:25:38,670 Well, my line of code next, I have four. 469 00:25:38,670 --> 00:25:41,350 int i get 0, n gets the string length of s. 470 00:25:41,350 --> 00:25:42,750 So a familiar for loop. 471 00:25:42,750 --> 00:25:45,875 I is less than or equal to n, which usually is above. 472 00:25:45,875 --> 00:25:47,500 >> But this time it's deliberate. 473 00:25:47,500 --> 00:25:51,890 I++, and then I simply do t bracket i gets s. 474 00:25:51,890 --> 00:25:56,320 Because my picture looks like this at this moment, stored in t is the 475 00:25:56,320 --> 00:25:59,530 address of that random chunk of memory whose values are unknown. 476 00:25:59,530 --> 00:26:03,030 But as soon as I do t bracket 0 that puts me here. 477 00:26:03,030 --> 00:26:07,430 >> And what ends up getting drawn there? 478 00:26:07,430 --> 00:26:08,740 We end up putting h. 479 00:26:08,740 --> 00:26:11,170 Because that's what's at s bracket 0. 480 00:26:11,170 --> 00:26:14,300 And then the same thing for e, and l, and l, and o. 481 00:26:14,300 --> 00:26:17,930 >> n, why did I go up through an equal to n? 482 00:26:17,930 --> 00:26:19,200 Because of the 0 character. 483 00:26:19,200 --> 00:26:23,580 So just to be clear, then, if I actually erase whatever these garbage 484 00:26:23,580 --> 00:26:28,870 values are and then actually draw in what I expect, this is s bracket 1, 2, 485 00:26:28,870 --> 00:26:32,440 3, 4, plus that's trailing new character. 486 00:26:32,440 --> 00:26:36,080 >> And so now if we continued past the dot, dot, dot in this correct version 487 00:26:36,080 --> 00:26:41,930 and capitalized t bracket 0 I would, of course, be capitalizing just this 488 00:26:41,930 --> 00:26:47,050 guy here, which conceptually, was ultimately the goal. 489 00:26:47,050 --> 00:26:48,040 So that's all the pointer is. 490 00:26:48,040 --> 00:26:51,430 >> And you've been using them for weeks now in the context of strings. 491 00:26:51,430 --> 00:26:53,530 But underneath the hood they're a little more complex. 492 00:26:53,530 --> 00:26:57,520 But if you think about them in this pictorial form I propose that they're 493 00:26:57,520 --> 00:27:01,720 probably not all that scary as they might first seem at first glance, 494 00:27:01,720 --> 00:27:04,730 particularly with such new syntax. 495 00:27:04,730 --> 00:27:07,290 Any questions on pointers, strings, or chars? 496 00:27:07,290 --> 00:27:07,580 Yeah? 497 00:27:07,580 --> 00:27:09,252 >> AUDIENCE: Can you go back to the [INAUDIBLE]? 498 00:27:09,252 --> 00:27:10,502 >> SPEAKER 1: Sure. 499 00:27:10,502 --> 00:27:14,058 500 00:27:14,058 --> 00:27:19,525 >> AUDIENCE: So how come in your very last line, you don't have a *t line 501 00:27:19,525 --> 00:27:21,513 and a *s in the line? 502 00:27:21,513 --> 00:27:23,004 Don't you have the reference to the-- 503 00:27:23,004 --> 00:27:24,640 >> SPEAKER 1: Ah, a really good question. 504 00:27:24,640 --> 00:27:26,800 Why don't I have a *t and a *s? 505 00:27:26,800 --> 00:27:30,340 Because briefly, last week, like in our swap function, I did say that when 506 00:27:30,340 --> 00:27:33,350 you've got a pointer the means by which you go there as we did 507 00:27:33,350 --> 00:27:36,590 physically on stage, was to actually use the star operator. 508 00:27:36,590 --> 00:27:40,570 >> It turns out that this square-bracket notation is what we'll call syntactic 509 00:27:40,570 --> 00:27:44,190 sugar, which is just a sexy way of saying it's shorthand notation for 510 00:27:44,190 --> 00:27:45,950 exactly what you're describing. 511 00:27:45,950 --> 00:27:49,385 But it's a little more intuitive. 512 00:27:49,385 --> 00:27:53,510 And at the risk of making this seem more complicated than it needs to be, 513 00:27:53,510 --> 00:27:56,990 what's really going on here is the following-- 514 00:27:56,990 --> 00:28:01,450 If I say *t that means go to the address stored in t. 515 00:28:01,450 --> 00:28:04,350 >> So literally, if t is storing the address of that h 516 00:28:04,350 --> 00:28:07,300 initially, *t means go here. 517 00:28:07,300 --> 00:28:10,730 Now, what does t bracket 0 mean? 518 00:28:10,730 --> 00:28:11,560 Same exact thing. 519 00:28:11,560 --> 00:28:13,510 It's just a little more user friendly to write. 520 00:28:13,510 --> 00:28:14,430 >> But I'm not done yet. 521 00:28:14,430 --> 00:28:17,800 I can't just say *t gets *s. 522 00:28:17,800 --> 00:28:19,440 Because what would I be doing then? 523 00:28:19,440 --> 00:28:22,950 I'd be putting h, h, h, h, h throughout the whole thing. 524 00:28:22,950 --> 00:28:22,995 Right? 525 00:28:22,995 --> 00:28:26,020 >> Because *t is go to the address in t. 526 00:28:26,020 --> 00:28:27,580 But we're inside of a loop. 527 00:28:27,580 --> 00:28:32,150 And what value am I incrementing, of course, on each iteration? 528 00:28:32,150 --> 00:28:32,690 i. 529 00:28:32,690 --> 00:28:34,590 >> But there's an opportunity here, right? 530 00:28:34,590 --> 00:28:37,870 Even though this feels like it's getting a little more sophisticated 531 00:28:37,870 --> 00:28:40,730 than the square-bracket notation we've used for some time-- 532 00:28:40,730 --> 00:28:43,840 let me undo my h change there-- 533 00:28:43,840 --> 00:28:48,870 even though this is now getting a little fancier, the basic idea, if *t 534 00:28:48,870 --> 00:28:53,630 means here and *t is just go to the address in t. 535 00:28:53,630 --> 00:28:54,990 >> But what was the address in t? 536 00:28:54,990 --> 00:28:56,850 The number we keep using? 537 00:28:56,850 --> 00:29:00,540 Like Ox456, let's bring that back just for the sake of discussion. 538 00:29:00,540 --> 00:29:05,380 Well, if I want to get at the e in t string, I just want to go to, 539 00:29:05,380 --> 00:29:06,460 essentially, 456. 540 00:29:06,460 --> 00:29:09,230 >> Or rather, 457. 541 00:29:09,230 --> 00:29:10,590 I just need to add one. 542 00:29:10,590 --> 00:29:11,790 But I can do that, right? 543 00:29:11,790 --> 00:29:14,680 Because t, even though I keep drawing it now as an arrow, it's just a 544 00:29:14,680 --> 00:29:16,570 number, Ox456. 545 00:29:16,570 --> 00:29:21,400 And if I add one to that, or more generally, if I add I to that I can 546 00:29:21,400 --> 00:29:24,350 actually get exactly where I want. 547 00:29:24,350 --> 00:29:26,260 So if I actually do this-- 548 00:29:26,260 --> 00:29:28,970 and this is what's now called pointer arithmetic-- 549 00:29:28,970 --> 00:29:30,375 I can remove this line. 550 00:29:30,375 --> 00:29:33,550 Which is, frankly, I think clearer and a little more user friendly to read. 551 00:29:33,550 --> 00:29:35,970 But this is no less correct. 552 00:29:35,970 --> 00:29:38,570 >> This line of code now is using pointer arithmetic. 553 00:29:38,570 --> 00:29:40,920 It's saying go to the following address-- 554 00:29:40,920 --> 00:29:44,670 whatever the start of t is, which is t plus i, which initially 555 00:29:44,670 --> 00:29:45,730 is 0, which is great. 556 00:29:45,730 --> 00:29:49,280 Because that means the beginning of t plus 1, plus 2, plus 3, and so forth. 557 00:29:49,280 --> 00:29:51,030 And the same deal with s. 558 00:29:51,030 --> 00:29:52,750 >> So syntactic sugar for this. 559 00:29:52,750 --> 00:29:55,900 But understanding what's really going on underneath the hood, I would argue, 560 00:29:55,900 --> 00:29:57,410 is actually useful in and of itself. 561 00:29:57,410 --> 00:30:00,620 Because it means now there's not much more magic going on 562 00:30:00,620 --> 00:30:01,620 underneath the hood. 563 00:30:01,620 --> 00:30:03,920 There aren't going to be many more layers that we can peel back for you. 564 00:30:03,920 --> 00:30:04,810 This is c. 565 00:30:04,810 --> 00:30:06,410 And this is programming. 566 00:30:06,410 --> 00:30:08,002 Really good question. 567 00:30:08,002 --> 00:30:11,570 >> All right, so this was that buggy program I was referring to earlier. 568 00:30:11,570 --> 00:30:12,650 swap was flawed. 569 00:30:12,650 --> 00:30:14,070 If did seem to work. 570 00:30:14,070 --> 00:30:17,390 Recall that just like with the milk and the orange juice-- which I started 571 00:30:17,390 --> 00:30:18,660 drinking today's demonstration. 572 00:30:18,660 --> 00:30:22,220 So just as with the orange juice and the milk, we did have to use a 573 00:30:22,220 --> 00:30:26,200 temporary variable, tmp, to hold a temporarily so that we could then 574 00:30:26,200 --> 00:30:28,820 change its value and then update b. 575 00:30:28,820 --> 00:30:32,870 >> But this function, we said, or this program in which this function was 576 00:30:32,870 --> 00:30:35,670 written was wrong and flawed, why? 577 00:30:35,670 --> 00:30:38,870 578 00:30:38,870 --> 00:30:39,090 Yes? 579 00:30:39,090 --> 00:30:42,471 >> AUDIENCE: [INAUDIBLE]. 580 00:30:42,471 --> 00:30:44,940 >> SPEAKER 1: Exactly, when you call swap-- 581 00:30:44,940 --> 00:30:47,820 or more generally, when you call most any function-- 582 00:30:47,820 --> 00:30:51,210 if the arguments to that function are primitive, so to speak, ints and chars 583 00:30:51,210 --> 00:30:56,740 and doubles and floats, things without stars, you are passing in a copy of 584 00:30:56,740 --> 00:30:57,540 the argument. 585 00:30:57,540 --> 00:31:01,580 So if x was 1 and y was 2, a is going to be 1 and b is going to be 2. 586 00:31:01,580 --> 00:31:05,250 But they're going to be different chunks of bits, different chunks of 587 00:31:05,250 --> 00:31:07,540 memory that happen to be storing identical values. 588 00:31:07,540 --> 00:31:12,160 >> So this code is super perfect at swapping a and b. 589 00:31:12,160 --> 00:31:13,850 It's no good at swapping-- 590 00:31:13,850 --> 00:31:15,290 in last week's example-- 591 00:31:15,290 --> 00:31:16,390 x and y. 592 00:31:16,390 --> 00:31:18,780 Because again, they're in the wrong scope. 593 00:31:18,780 --> 00:31:21,310 >> Now, how did we go about fixing this? 594 00:31:21,310 --> 00:31:23,140 We had to make the function look a little uglier. 595 00:31:23,140 --> 00:31:25,250 But again, consider what this just means. 596 00:31:25,250 --> 00:31:27,840 597 00:31:27,840 --> 00:31:31,500 >> And actually, let me, for consistency, change one thing so it's identical to 598 00:31:31,500 --> 00:31:33,200 what we just did. 599 00:31:33,200 --> 00:31:35,690 As I mentioned last week, it doesn't matter where it goes. 600 00:31:35,690 --> 00:31:38,120 In fact, typically you would put the star next to the variable name. 601 00:31:38,120 --> 00:31:40,750 But I think it would be a little easier to consider the * next to the 602 00:31:40,750 --> 00:31:44,910 data type as meaning it's a pointer to an int in this case. 603 00:31:44,910 --> 00:31:46,270 >> So what am I doing here? 604 00:31:46,270 --> 00:31:49,590 I'm saying don't give me an int followed by another int, 605 00:31:49,590 --> 00:31:50,810 calling them a and b. 606 00:31:50,810 --> 00:31:52,460 Give me the address of an int. 607 00:31:52,460 --> 00:31:53,960 Give me the address of another int. 608 00:31:53,960 --> 00:31:56,330 Call those addresses a and b. 609 00:31:56,330 --> 00:32:00,860 >> And then using the * notation down below, go to each of those addresses 610 00:32:00,860 --> 00:32:05,290 as needed to either get or set its value. 611 00:32:05,290 --> 00:32:07,400 But there's an exception here. 612 00:32:07,400 --> 00:32:11,130 Why do I not have a * next to tmp? 613 00:32:11,130 --> 00:32:15,070 Why do I not do this, for instance? 614 00:32:15,070 --> 00:32:19,370 It feels like I should just go all out and correct the whole thing. 615 00:32:19,370 --> 00:32:19,752 Yeah? 616 00:32:19,752 --> 00:32:21,002 >> AUDIENCE: [INAUDIBLE]. 617 00:32:21,002 --> 00:32:23,280 618 00:32:23,280 --> 00:32:25,480 >> SPEAKER 1: I haven't declared tmp as a string. 619 00:32:25,480 --> 00:32:28,830 620 00:32:28,830 --> 00:32:34,950 So this would declare, in this case, a tmp to be the address of an int. 621 00:32:34,950 --> 00:32:37,380 But that's not quite what I want, for a couple of reasons. 622 00:32:37,380 --> 00:32:38,616 >> AUDIENCE: You don't want to swap them. 623 00:32:38,616 --> 00:32:41,800 >> SPEAKER 1: Exactly, I don't want to swap anything with tmp. tmp is just 624 00:32:41,800 --> 00:32:42,790 week-one stuff. 625 00:32:42,790 --> 00:32:45,150 All I want is a variable to store some number. 626 00:32:45,150 --> 00:32:47,330 I don't even care about addresses at this moment. 627 00:32:47,330 --> 00:32:50,530 >> I just need 32 bits or so to store an int. 628 00:32:50,530 --> 00:32:56,690 And I want to put in those 32 bits whatever is not in a, so to speak, but 629 00:32:56,690 --> 00:33:01,260 what is at a, just to be more precise. 630 00:33:01,260 --> 00:33:06,420 Because if a is an address, *a means go there and get the value 1. 631 00:33:06,420 --> 00:33:10,560 For instance, in last week's example or in b's case, get the value of 2. 632 00:33:10,560 --> 00:33:11,750 >> So what's really going on? 633 00:33:11,750 --> 00:33:15,070 Let me draw a picture here that will only tease apart part of today. 634 00:33:15,070 --> 00:33:18,580 But this will continue to appear for quite some time. 635 00:33:18,580 --> 00:33:22,430 >> This, I claim, is what your computer's memory looks like when you run a 636 00:33:22,430 --> 00:33:24,060 program, any program. 637 00:33:24,060 --> 00:33:28,340 When you run a program at the very top of your computer's RAM-- so think of 638 00:33:28,340 --> 00:33:33,530 this rectangle, truly, as your computer's RAM or memory, all 101 639 00:33:33,530 --> 00:33:36,920 billion bytes of it, all two billion bytes, all two gigabytes of it, 640 00:33:36,920 --> 00:33:39,910 whatever the quantity you have is, let's draw it as a rectangle. 641 00:33:39,910 --> 00:33:43,260 And I claim that when you run a program like Microsoft Word or Chrome 642 00:33:43,260 --> 00:33:49,220 or anything like that, the bits that Microsoft or that Google wrote-- 643 00:33:49,220 --> 00:33:50,910 in the cases of those programs-- 644 00:33:50,910 --> 00:33:54,490 are loaded into your computer's memory where they can be executed more 645 00:33:54,490 --> 00:33:57,520 quickly and fed into the CPU, which is the brains of the computer. 646 00:33:57,520 --> 00:34:00,940 >> And in TAM they're stored at the very top of your program, so to speak. 647 00:34:00,940 --> 00:34:03,300 In other words, if this is a chunk of memory, when you double click on 648 00:34:03,300 --> 00:34:05,740 Microsoft Word, the bits come off the hard drive. 649 00:34:05,740 --> 00:34:06,680 They get loaded into RAM. 650 00:34:06,680 --> 00:34:10,330 And we'll shove them up at the very top of this rectangle conceptually. 651 00:34:10,330 --> 00:34:13,010 >> Well, the rest of your memory is used for different things. 652 00:34:13,010 --> 00:34:16,460 At the very top you see initialize data and uninitialize data. 653 00:34:16,460 --> 00:34:20,500 This has to do, for the most part, with constants or global variables 654 00:34:20,500 --> 00:34:21,340 that have values. 655 00:34:21,340 --> 00:34:22,980 But more on those another time. 656 00:34:22,980 --> 00:34:25,150 >> Then you have the heap, which we'll come back to. 657 00:34:25,150 --> 00:34:28,420 But at the bottom is the part that's particularly germane right now. 658 00:34:28,420 --> 00:34:30,210 It's the so-called stack. 659 00:34:30,210 --> 00:34:33,850 So just like in most any D hall here on campus, you have those trays that 660 00:34:33,850 --> 00:34:37,210 just stack on top of each other on which you can put food and whatnot. 661 00:34:37,210 --> 00:34:40,139 The stack in a computer system is very similar. 662 00:34:40,139 --> 00:34:42,679 Except whereas the tray, as we use in the dining hall, of course, is meant 663 00:34:42,679 --> 00:34:45,710 to carry things the trays or the frames-- 664 00:34:45,710 --> 00:34:49,469 as we'll call them-- in a computer's memory is used to hold 665 00:34:49,469 --> 00:34:51,610 variables and values. 666 00:34:51,610 --> 00:34:53,929 >> So what really goes on underneath the hood? 667 00:34:53,929 --> 00:34:55,820 Well, let me flip over to the screen here. 668 00:34:55,820 --> 00:34:58,370 And let's focus just on the bottom part for a moment. 669 00:34:58,370 --> 00:35:02,770 If this is the bottom portion of my computer's memory it turns out when I 670 00:35:02,770 --> 00:35:05,350 call the function main-- which happens, frankly, 671 00:35:05,350 --> 00:35:06,950 automatically for me-- 672 00:35:06,950 --> 00:35:10,510 I get a chunk of memory at the bottom of my RAM so to speak. 673 00:35:10,510 --> 00:35:13,390 And this is where main's local variables go. 674 00:35:13,390 --> 00:35:16,770 It's where argc and argv maybe go, and any variables I 675 00:35:16,770 --> 00:35:18,170 declare inside of main. 676 00:35:18,170 --> 00:35:20,260 They end up at the bottom of my computer's RAM. 677 00:35:20,260 --> 00:35:25,040 >> Now suppose that main calls a function like swap, like it did last week? 678 00:35:25,040 --> 00:35:30,620 Well, we essentially put a new tray, a new frame, onto my chunk of memory. 679 00:35:30,620 --> 00:35:34,160 And I'm going to describe this as belonging to the swap function. 680 00:35:34,160 --> 00:35:35,770 >> Now what's inside of swap? 681 00:35:35,770 --> 00:35:39,240 Well, based on last week's program and the one we just saw an excerpt from, 682 00:35:39,240 --> 00:35:46,590 inside of swap's frame, or on swap's tray, are what variables? 683 00:35:46,590 --> 00:35:47,970 Well, a and b. 684 00:35:47,970 --> 00:35:51,850 Because those were its local arguments, plus a third, tmp. 685 00:35:51,850 --> 00:35:54,470 So really, I could draw this a little more cleanly. 686 00:35:54,470 --> 00:35:56,680 Let me go ahead and undo the label. 687 00:35:56,680 --> 00:35:58,520 And let me claim that you know what? 688 00:35:58,520 --> 00:36:00,560 >> a is probably going to end up here. 689 00:36:00,560 --> 00:36:02,160 B is going to end up here. 690 00:36:02,160 --> 00:36:03,810 And tmp is going to end up here. 691 00:36:03,810 --> 00:36:05,160 Now, the ordering might be a little different. 692 00:36:05,160 --> 00:36:06,840 But conceptually this is the idea. 693 00:36:06,840 --> 00:36:11,490 >> And just collectively, this is what we'll call swap's frame, or 694 00:36:11,490 --> 00:36:12,136 dining-hall tray. 695 00:36:12,136 --> 00:36:13,150 And the same deal with main. 696 00:36:13,150 --> 00:36:14,040 But I won't redraw that. 697 00:36:14,040 --> 00:36:17,810 But that's where argc and argv and any of its local variables like x and y 698 00:36:17,810 --> 00:36:18,940 might be as well. 699 00:36:18,940 --> 00:36:22,170 >> So now consider what's really happening when you call swap. 700 00:36:22,170 --> 00:36:26,370 When you call swap, executing code like this, you're passing in, in the 701 00:36:26,370 --> 00:36:30,670 buggy version, a and b as copies of x and y. 702 00:36:30,670 --> 00:36:34,300 So if I do now draw this on the screen-- 703 00:36:34,300 --> 00:36:36,700 got to get better at this-- 704 00:36:36,700 --> 00:36:40,850 so the story I was telling to myself was in this buggy version, when we 705 00:36:40,850 --> 00:36:46,130 call swap passing in literally a and b as integers, what's really happening? 706 00:36:46,130 --> 00:36:48,250 >> Well, what's really happening is this. 707 00:36:48,250 --> 00:36:52,850 Let me go ahead and undo just to clear up some space here. 708 00:36:52,850 --> 00:36:54,720 So this is my computer's memory. 709 00:36:54,720 --> 00:36:57,510 >> So if I have, for instance-- 710 00:36:57,510 --> 00:36:58,910 actually let's do it this way-- 711 00:36:58,910 --> 00:37:02,690 if I claim that this is x, storing the value 1 just like last week. 712 00:37:02,690 --> 00:37:05,930 And this is y, storing the value 2 just like last week. 713 00:37:05,930 --> 00:37:11,370 And this is main, when I call swap, thereby giving myself access to a and 714 00:37:11,370 --> 00:37:15,150 b and tmp, I'm going to claim that this is a and this is 1. 715 00:37:15,150 --> 00:37:16,080 >> This is b. 716 00:37:16,080 --> 00:37:17,010 This is 2. 717 00:37:17,010 --> 00:37:18,370 This is called tmp. 718 00:37:18,370 --> 00:37:23,360 >> And initially, it has some garbage value until I actually store in it a, 719 00:37:23,360 --> 00:37:24,450 which is 1. 720 00:37:24,450 --> 00:37:28,320 Then I go ahead and change a to be what? 721 00:37:28,320 --> 00:37:29,720 B's value. 722 00:37:29,720 --> 00:37:31,980 >> And so now I have two here. 723 00:37:31,980 --> 00:37:34,050 And then we said b gets tmp. 724 00:37:34,050 --> 00:37:37,670 Again, just as a sanity check, the third line of code here is simply this 725 00:37:37,670 --> 00:37:39,440 one, b gets tmp. 726 00:37:39,440 --> 00:37:41,730 >> And so lastly, what do I do? 727 00:37:41,730 --> 00:37:46,800 I go ahead and change b to be whatever the value of tmp is, which is 1. 728 00:37:46,800 --> 00:37:48,390 I don't touch tmp again. 729 00:37:48,390 --> 00:37:54,100 >> But now, the problem is as soon as swap returns, because it's not handing 730 00:37:54,100 --> 00:37:57,540 back some value, there's no return statement explicitly in it. 731 00:37:57,540 --> 00:37:59,080 What's actually happening? 732 00:37:59,080 --> 00:38:03,480 Well, essentially all this memory-- 733 00:38:03,480 --> 00:38:07,410 OK, apparently the eraser likes only one finger at a time-- 734 00:38:07,410 --> 00:38:08,180 just disappears. 735 00:38:08,180 --> 00:38:10,070 >> Now in reality it's not going anywhere. 736 00:38:10,070 --> 00:38:11,810 But you can think of it now as question marks. 737 00:38:11,810 --> 00:38:14,040 Because it's no longer actually in use. 738 00:38:14,040 --> 00:38:17,470 And nothing is done with those values. 739 00:38:17,470 --> 00:38:21,920 >> So in the case of the green version of this code, what instead is being 740 00:38:21,920 --> 00:38:24,640 passed into swap? 741 00:38:24,640 --> 00:38:25,770 So addresses. 742 00:38:25,770 --> 00:38:28,520 So the address of x and the address of y. 743 00:38:28,520 --> 00:38:35,790 So if we re-tell this story one last time, and I actually draw swap again, 744 00:38:35,790 --> 00:38:44,620 but with pointers, this being a, this being b, and this being tmp, what is 745 00:38:44,620 --> 00:38:49,080 actually stored in a in this green version of my code where I'm passing 746 00:38:49,080 --> 00:38:52,110 in addresses? 747 00:38:52,110 --> 00:38:53,780 >> It's going to be a pointer to x. 748 00:38:53,780 --> 00:38:54,890 So I could draw an arrow. 749 00:38:54,890 --> 00:38:57,310 But let's use the same arbitrary example as before. 750 00:38:57,310 --> 00:39:01,220 Let's say that this is something like Ox123. 751 00:39:01,220 --> 00:39:04,970 And this is going to be Ox127 because it's four bytes away because it's an 752 00:39:04,970 --> 00:39:07,370 int, so Ox127. 753 00:39:07,370 --> 00:39:09,080 >> And again, I'm taking some liberties with the numbers. 754 00:39:09,080 --> 00:39:11,430 They're much smaller than they would actually be and in a different order. 755 00:39:11,430 --> 00:39:14,350 But that's how the picture is now different. 756 00:39:14,350 --> 00:39:19,060 >> But when I use this green code and I do int tmp get *a. 757 00:39:19,060 --> 00:39:25,010 *a means to do the following, take the address that's in a and go to it, 758 00:39:25,010 --> 00:39:26,190 which is 1. 759 00:39:26,190 --> 00:39:28,480 And that's what I then put in tmp. 760 00:39:28,480 --> 00:39:32,480 Meanwhile, in the next line of code here, *a gets b, what does that mean? 761 00:39:32,480 --> 00:39:36,910 >> Well, *a, so go here gets *b, which means go there. 762 00:39:36,910 --> 00:39:39,310 And that means put the value to there. 763 00:39:39,310 --> 00:39:43,670 Finally, the last line of code simply said *b gets tmp. 764 00:39:43,670 --> 00:39:48,900 >> So b says go there and overwrite it with tmp which, in this case, is going 765 00:39:48,900 --> 00:39:51,520 to be, again, 1. 766 00:39:51,520 --> 00:39:54,920 And this is why the green version of our code works, whereas the red 767 00:39:54,920 --> 00:39:56,010 version never did. 768 00:39:56,010 --> 00:39:59,020 It all just boils down to how the memory is managed and where it's 769 00:39:59,020 --> 00:40:02,580 actually placed in your computer's RAM. 770 00:40:02,580 --> 00:40:07,270 And for now, that's one of the things that the stack is being used for. 771 00:40:07,270 --> 00:40:09,225 >> Questions on the layout? 772 00:40:09,225 --> 00:40:10,380 On pointers? 773 00:40:10,380 --> 00:40:11,630 Or on swap? 774 00:40:11,630 --> 00:40:13,740 775 00:40:13,740 --> 00:40:17,043 >> All right, so malloc, recall, did something like this. 776 00:40:17,043 --> 00:40:18,260 This was a super simple example. 777 00:40:18,260 --> 00:40:20,550 And this was the one that Binky introduced us to, albeit quite 778 00:40:20,550 --> 00:40:21,870 quickly, at the end of class. 779 00:40:21,870 --> 00:40:24,480 Dammit, there we go again. 780 00:40:24,480 --> 00:40:28,780 >> So recall that this was the example that Binky introduced us to, albeit 781 00:40:28,780 --> 00:40:30,360 somewhat quickly at the end of class. 782 00:40:30,360 --> 00:40:33,640 And here we used malloc really for the second time. 783 00:40:33,640 --> 00:40:37,330 Because the first time we used it to create enough RAM, allocate enough RAM 784 00:40:37,330 --> 00:40:38,340 to store a string. 785 00:40:38,340 --> 00:40:40,250 >> This time Binky kept it simple. 786 00:40:40,250 --> 00:40:42,465 So it's to store just an int, apparently. 787 00:40:42,465 --> 00:40:43,510 And that's totally fine. 788 00:40:43,510 --> 00:40:46,560 It's a little weird, frankly, to use malloc to allocate one int. 789 00:40:46,560 --> 00:40:50,650 But the point of Nick's claymation was really just tell the story of what 790 00:40:50,650 --> 00:40:53,830 happens or doesn't happen when you mistreat memory. 791 00:40:53,830 --> 00:40:56,520 >> So in this case, this program did a few things. 792 00:40:56,520 --> 00:41:01,580 In the first case here, it declares a pointer called x to an int. 793 00:41:01,580 --> 00:41:04,480 It then declares a pointer called y to an int. 794 00:41:04,480 --> 00:41:06,150 It then stores in x, what? 795 00:41:06,150 --> 00:41:07,110 Someone else now. 796 00:41:07,110 --> 00:41:09,685 What gets stored in x according to the third line of this program? 797 00:41:09,685 --> 00:41:12,380 >> AUDIENCE: [INAUDIBLE]. 798 00:41:12,380 --> 00:41:14,130 >> SPEAKER 1: Well, not quite bytes, per say. 799 00:41:14,130 --> 00:41:16,760 Be more precise now. 800 00:41:16,760 --> 00:41:18,325 What gets stored in x? 801 00:41:18,325 --> 00:41:21,000 802 00:41:21,000 --> 00:41:22,060 An address, I think I heard it. 803 00:41:22,060 --> 00:41:23,570 >> So what does malloc return? 804 00:41:23,570 --> 00:41:26,030 malloc behaviorally allocates a chunk of memory. 805 00:41:26,030 --> 00:41:27,850 But how does it give you access to it? 806 00:41:27,850 --> 00:41:29,460 It returns what? 807 00:41:29,460 --> 00:41:32,000 The address of the first byte in the chunk of memory. 808 00:41:32,000 --> 00:41:33,020 >> Now, this is super simple. 809 00:41:33,020 --> 00:41:35,380 It's just one byte, which means the address we're getting back is the 810 00:41:35,380 --> 00:41:37,300 address of the whole thing. 811 00:41:37,300 --> 00:41:42,070 So stored in x then, is the address of that chunk of memory. 812 00:41:42,070 --> 00:41:43,400 Meanwhile, what happens next? 813 00:41:43,400 --> 00:41:45,890 So actually, let's go ahead and draw this out real fast. 814 00:41:45,890 --> 00:41:52,490 >> So if we go over to the screen here and we play this out int* x and int* y 815 00:41:52,490 --> 00:41:53,740 is going to do what for me? 816 00:41:53,740 --> 00:41:58,280 I claim that it's just going to do something like this and call it x, and 817 00:41:58,280 --> 00:42:00,010 this and call it y. 818 00:42:00,010 --> 00:42:03,110 Meanwhile, the third line of code is going to allocate the size of an int, 819 00:42:03,110 --> 00:42:06,160 which happens to be-- sorry if I said one before I meant one int-- 820 00:42:06,160 --> 00:42:08,280 four bytes on a typical computer. 821 00:42:08,280 --> 00:42:09,720 At least with the CS50 appliance. 822 00:42:09,720 --> 00:42:11,490 >> So this is going to allocate it, who knows? 823 00:42:11,490 --> 00:42:12,800 Somewhere out here. 824 00:42:12,800 --> 00:42:15,780 And this is stored at some address Ox, who knows? 825 00:42:15,780 --> 00:42:18,330 But what's going to get returned is that address. 826 00:42:18,330 --> 00:42:22,270 But we'll draw this pictorially as just an arrow like that. 827 00:42:22,270 --> 00:42:25,430 >> Now in the next line *x gets 42. 828 00:42:25,430 --> 00:42:29,400 What does *x mean in layman's terms? 829 00:42:29,400 --> 00:42:30,040 Just go there. 830 00:42:30,040 --> 00:42:30,960 Go to that address. 831 00:42:30,960 --> 00:42:35,900 Or in other words, follow the arrow and put 42 there. 832 00:42:35,900 --> 00:42:38,140 But then something bad happened to Binky, right? 833 00:42:38,140 --> 00:42:43,950 >> Recall that line five here, *y gets 13, indeed an unlucky number, 834 00:42:43,950 --> 00:42:44,760 did what for us? 835 00:42:44,760 --> 00:42:47,320 Well, *y means go there. 836 00:42:47,320 --> 00:42:50,460 Well, this has not been given a value yet, right? 837 00:42:50,460 --> 00:42:54,090 The code does not have y being initialized to anything. 838 00:42:54,090 --> 00:42:56,120 We had x being initialized to an address. 839 00:42:56,120 --> 00:42:57,640 But y was declared up top. 840 00:42:57,640 --> 00:43:00,250 But then a semicolon, no value was actually put in it. 841 00:43:00,250 --> 00:43:02,330 So it's fair to call this a garbage value. 842 00:43:02,330 --> 00:43:03,430 Who knows what's there? 843 00:43:03,430 --> 00:43:07,160 It's the remnants of bits that were used by some previous line of code in 844 00:43:07,160 --> 00:43:08,300 my program. 845 00:43:08,300 --> 00:43:13,250 >> So if I say go there, this is like, I have no idea where this arrow is 846 00:43:13,250 --> 00:43:14,490 going to end up. 847 00:43:14,490 --> 00:43:17,720 And that's when you typically get a segmentation fault. 848 00:43:17,720 --> 00:43:22,430 If you accidentally dereference, so to speak, or go to an address that's not 849 00:43:22,430 --> 00:43:25,400 actually a legitimate address, bad things happen. 850 00:43:25,400 --> 00:43:27,550 >> And that's exactly what happened to think Binky. 851 00:43:27,550 --> 00:43:31,060 So recall that the story that Nick was telling here was the same idea as what 852 00:43:31,060 --> 00:43:34,050 I've drawn with the illusion of chalk on the board there. 853 00:43:34,050 --> 00:43:35,960 X and y are declared. 854 00:43:35,960 --> 00:43:39,690 >> Then we allocated the size of an int and stored it in x. 855 00:43:39,690 --> 00:43:42,130 Then the next line we did *x. 856 00:43:42,130 --> 00:43:46,070 This was Nick's magic wand of dereferencing. 857 00:43:46,070 --> 00:43:49,780 That put 42 in the memory pointed out by x. 858 00:43:49,780 --> 00:43:51,600 >> But this is where things went horribly wrong. 859 00:43:51,600 --> 00:43:51,820 Right? 860 00:43:51,820 --> 00:43:53,550 We tried to dereference y. 861 00:43:53,550 --> 00:43:55,620 But y had some bogus value, right? 862 00:43:55,620 --> 00:43:57,720 >> That arrow in the bottom left-hand corner, is not 863 00:43:57,720 --> 00:43:58,950 actually pointing to anything. 864 00:43:58,950 --> 00:44:01,520 It's kind of doing what I did here on the board. 865 00:44:01,520 --> 00:44:05,900 So bad things happen, segmentation fault, or Binky fault, in this case. 866 00:44:05,900 --> 00:44:10,800 >> But if we then fix that by doing x gets y how does the story change? 867 00:44:10,800 --> 00:44:15,760 Well, if I do x gets y, that's effectively the same as saying 868 00:44:15,760 --> 00:44:19,235 whatever this is, Ox-something is going to be the same here, 869 00:44:19,235 --> 00:44:20,080 Ox-something. 870 00:44:20,080 --> 00:44:22,970 Or pictorially we'll draw an arrow. 871 00:44:22,970 --> 00:44:25,530 >> So here on the board with Binky, with the next line of 872 00:44:25,530 --> 00:44:28,350 code, *y means go there. 873 00:44:28,350 --> 00:44:29,400 Where is there? 874 00:44:29,400 --> 00:44:30,820 It means over here. 875 00:44:30,820 --> 00:44:36,050 >> And when we update that to be 13 it just involves going and 876 00:44:36,050 --> 00:44:39,470 writing 13 here now. 877 00:44:39,470 --> 00:44:44,130 So perhaps not completely straightforward at first glance. 878 00:44:44,130 --> 00:44:47,740 But to recap and to use the same jargon that Binky was using here, so 879 00:44:47,740 --> 00:44:50,485 the first two allocate the pointers, x and y, but not the pointees. 880 00:44:50,485 --> 00:44:54,750 And pointees is not a generally used term. 881 00:44:54,750 --> 00:44:56,120 But pointer absolutely is. 882 00:44:56,120 --> 00:44:59,200 But it's what's being pointed at in Binky's nomenclature. 883 00:44:59,200 --> 00:45:01,660 >> This next line, of course, allocates an int pointee. 884 00:45:01,660 --> 00:45:04,840 So a chunk of memory-- as I drew over on the right-hand side there-- and set 885 00:45:04,840 --> 00:45:06,470 x equal to point to it. 886 00:45:06,470 --> 00:45:11,350 This dereferences x to store 42 in the memory that it's pointing at. 887 00:45:11,350 --> 00:45:13,380 And then this, of course, was a bad thing. 888 00:45:13,380 --> 00:45:15,600 Because y was not pointing at anything yet. 889 00:45:15,600 --> 00:45:16,530 This fixes it. 890 00:45:16,530 --> 00:45:18,240 So this is still buggy program. 891 00:45:18,240 --> 00:45:21,580 Just because we're blowing through the code line by line and saying, oh well, 892 00:45:21,580 --> 00:45:22,690 let it crash there. 893 00:45:22,690 --> 00:45:23,420 That's a bad thing. 894 00:45:23,420 --> 00:45:26,790 Odds are the program's just going to abort altogether at that line. 895 00:45:26,790 --> 00:45:30,550 But if you were to remove the crashed line and replace it with the last two 896 00:45:30,550 --> 00:45:32,470 lines there you assign-- 897 00:45:32,470 --> 00:45:35,310 using pointer assignment-- y to point to x as point t. 898 00:45:35,310 --> 00:45:39,280 And then you dereference y in a very safe way. 899 00:45:39,280 --> 00:45:41,520 >> So where does this leave us? 900 00:45:41,520 --> 00:45:45,350 Well, turns out that underneath the hood in the CS50 library, pointers are 901 00:45:45,350 --> 00:45:46,320 used throughout. 902 00:45:46,320 --> 00:45:48,910 And we'll actually start to peel back that layer before long. 903 00:45:48,910 --> 00:45:51,740 But it turns too, an expression that some of you might be familiar with, 904 00:45:51,740 --> 00:45:54,580 particularly those more comfortable, is actually that of a very popular 905 00:45:54,580 --> 00:45:56,390 website, or stack overflow, these days. 906 00:45:56,390 --> 00:45:58,720 >> But this actually has very technical meaning. 907 00:45:58,720 --> 00:46:00,160 We now know what a stack is. 908 00:46:00,160 --> 00:46:02,550 It's like a stack of trays inside of a dining hall. 909 00:46:02,550 --> 00:46:05,140 >> Or inside of your computer's memory its those frames 910 00:46:05,140 --> 00:46:06,900 that are used by functions. 911 00:46:06,900 --> 00:46:10,760 Well, it turns out that because of that very simple implementation of 912 00:46:10,760 --> 00:46:14,970 memory and the frames on the so-called stack, you can actually take control 913 00:46:14,970 --> 00:46:17,050 of a computer system fairly easily. 914 00:46:17,050 --> 00:46:22,180 You can hack into a system if people like us have not written our code 915 00:46:22,180 --> 00:46:23,300 particularly well. 916 00:46:23,300 --> 00:46:26,670 >> If people like us use chunks of memory or use arrays-- 917 00:46:26,670 --> 00:46:27,810 even more commonly-- 918 00:46:27,810 --> 00:46:31,800 but sometimes forget to check the boundaries of our array as you might 919 00:46:31,800 --> 00:46:38,470 have yourself sometimes, and iterated way too far past the end an array. 920 00:46:38,470 --> 00:46:40,520 In the best case, your program might just crash. 921 00:46:40,520 --> 00:46:42,280 Segmentation fault, kind of embarrassing. 922 00:46:42,280 --> 00:46:45,480 Not great, but it's not necessarily a hugely bad thing. 923 00:46:45,480 --> 00:46:49,480 >> But if your program is actually on real users' computers, if it's running 924 00:46:49,480 --> 00:46:53,070 on a website that actual random people on the internet are hitting, letting 925 00:46:53,070 --> 00:46:56,690 people induce bad things on your code is generally not a good thing because 926 00:46:56,690 --> 00:46:59,930 it means an opportunity to take control of the computer. 927 00:46:59,930 --> 00:47:01,350 And this is going to look a little cryptic. 928 00:47:01,350 --> 00:47:04,570 But I thought I'd scare you with this last example here. 929 00:47:04,570 --> 00:47:05,650 >> Here's an example of code. 930 00:47:05,650 --> 00:47:07,370 And there's a good Wikipedia article that walks through 931 00:47:07,370 --> 00:47:08,530 this in more detail. 932 00:47:08,530 --> 00:47:13,890 I have main on the bottom calling foo, passing in argv of 1. 933 00:47:13,890 --> 00:47:15,750 And that's just so that you can run the program and pass 934 00:47:15,750 --> 00:47:17,080 an arbitrary input. 935 00:47:17,080 --> 00:47:20,180 >> And then foo is declared up top as accepting a string, or more 936 00:47:20,180 --> 00:47:21,700 precisely, a char*. 937 00:47:21,700 --> 00:47:23,860 It then declares an array of chars. 938 00:47:23,860 --> 00:47:27,130 Call it a buffer, more generally, of size 12. 939 00:47:27,130 --> 00:47:30,900 So 12 chars can fit inside of that array called c. 940 00:47:30,900 --> 00:47:33,510 >> And then it uses this new function, which is new but not hard to 941 00:47:33,510 --> 00:47:34,930 understand, memory copy. 942 00:47:34,930 --> 00:47:39,290 It copies the memory from bar, which was the variable past n, whatever the 943 00:47:39,290 --> 00:47:42,080 user typed into argv 1 into c. 944 00:47:42,080 --> 00:47:43,090 How many bytes? 945 00:47:43,090 --> 00:47:44,260 The string length of bar. 946 00:47:44,260 --> 00:47:48,380 >> So in other words, if the user types in h-e-l-l-o enter, the string length 947 00:47:48,380 --> 00:47:49,260 of hello is five. 948 00:47:49,260 --> 00:47:52,790 So five of those bytes is going to get copied into the array called c, which 949 00:47:52,790 --> 00:47:54,110 is of size 12. 950 00:47:54,110 --> 00:47:58,710 But what the user types in a much longer word that's 13 characters or 14 951 00:47:58,710 --> 00:48:01,250 characters or 100 characters or more? 952 00:48:01,250 --> 00:48:02,660 >> Where are they going to go? 953 00:48:02,660 --> 00:48:06,090 Well, that frame, that tray in the dining-hall stack, 954 00:48:06,090 --> 00:48:06,930 they're going to go there. 955 00:48:06,930 --> 00:48:10,080 And it's just going to start overwriting other stuff that's already 956 00:48:10,080 --> 00:48:12,880 on that stack, overflowing the stack, so to speak. 957 00:48:12,880 --> 00:48:14,780 >> So pictorially, think of it this way. 958 00:48:14,780 --> 00:48:17,970 This is just a colorful version of the picture we've been drawing. 959 00:48:17,970 --> 00:48:20,060 At the bottom, let's say, is main. 960 00:48:20,060 --> 00:48:24,690 And on the top, what you're seeing now is the frame, color coded now, for a 961 00:48:24,690 --> 00:48:26,090 function called foo. 962 00:48:26,090 --> 00:48:30,170 But what's interesting here about foo is that here is its frame. 963 00:48:30,170 --> 00:48:32,860 So it's drawn just like I did but in light blue. 964 00:48:32,860 --> 00:48:35,220 And now this is where c bracket 0 goes. 965 00:48:35,220 --> 00:48:37,410 And this is where c bracket 11 is going to end up. 966 00:48:37,410 --> 00:48:39,670 >> In other words, it happens to be represented as a square. 967 00:48:39,670 --> 00:48:42,320 But if you just keep plopping bytes down-- or chars-- they're going to end 968 00:48:42,320 --> 00:48:46,070 up at location 0 all the way up to 11 because it's 0 indexed. 969 00:48:46,070 --> 00:48:49,170 >> But where is the 13th character going to end up? 970 00:48:49,170 --> 00:48:50,310 Where's the 14th? 971 00:48:50,310 --> 00:48:52,430 Where's the 50th character going to end up? 972 00:48:52,430 --> 00:48:54,070 >> It's going to keep going down. 973 00:48:54,070 --> 00:48:57,350 Because even though we've drawn the picture with the stack growing up, the 974 00:48:57,350 --> 00:48:59,920 addresses, it turns out, go from small addresses, small 975 00:48:59,920 --> 00:49:01,830 pointers, to big addresses. 976 00:49:01,830 --> 00:49:03,540 So it just keeps going up and up. 977 00:49:03,540 --> 00:49:05,660 >> So if the user types in hello, that's great. 978 00:49:05,660 --> 00:49:08,650 No bug, no problem, everyone's safe. 979 00:49:08,650 --> 00:49:11,940 But if the user types in what we'll call adversarial code, represented 980 00:49:11,940 --> 00:49:16,040 generically as a, attack, attack, attack, attack, what can happen? 981 00:49:16,040 --> 00:49:19,760 >> Well, if all of the input that the user typed in isn't just some friendly 982 00:49:19,760 --> 00:49:21,540 or offensive string of characters. 983 00:49:21,540 --> 00:49:24,050 It's actually a sequence of characters that if you compiled it, 984 00:49:24,050 --> 00:49:26,050 it actually is code. 985 00:49:26,050 --> 00:49:29,570 Maybe it's code that deletes all the files on your hard drive or sends spam 986 00:49:29,570 --> 00:49:30,810 or something like that. 987 00:49:30,810 --> 00:49:35,110 Notice that what's key here is that if the bad guy got lucky enough to 988 00:49:35,110 --> 00:49:37,830 overwrite the red chunk of memory-- 989 00:49:37,830 --> 00:49:41,080 which I didn't draw on my picture but this Wikipedia picture here has-- 990 00:49:41,080 --> 00:49:42,890 its so-called return address. 991 00:49:42,890 --> 00:49:47,470 >> When food returns, when swap returns, how does the computer know to go from 992 00:49:47,470 --> 00:49:49,790 up here to down here? 993 00:49:49,790 --> 00:49:52,920 Or in the tech segment up above, how does it know to go from the swap 994 00:49:52,920 --> 00:49:54,870 code-- the 0's and 1's that compose swap-- 995 00:49:54,870 --> 00:49:56,020 back to main? 996 00:49:56,020 --> 00:50:00,450 There's a so-called return address stored in that same stack frame, on 997 00:50:00,450 --> 00:50:02,140 the same cafeteria tray. 998 00:50:02,140 --> 00:50:06,080 >> So if the bad guy is clever enough to put attack code, attack code, attack 999 00:50:06,080 --> 00:50:07,960 code, and get lucky enough-- 1000 00:50:07,960 --> 00:50:11,630 often through trial and error-- to overwrite that red return address, 1001 00:50:11,630 --> 00:50:14,360 with the address and notice the very top. 1002 00:50:14,360 --> 00:50:16,830 Notice 0835C080. 1003 00:50:16,830 --> 00:50:20,650 It's written backwards up top for reasons we'll perhaps revisit. 1004 00:50:20,650 --> 00:50:22,050 This is that number. 1005 00:50:22,050 --> 00:50:25,790 >> So if the bad guy gets lucky enough or is smart enough to overwrite the red 1006 00:50:25,790 --> 00:50:29,480 strip of memory with the address of code that he or she has somehow 1007 00:50:29,480 --> 00:50:34,980 injected into your computer, guess whose code is going to be returned to 1008 00:50:34,980 --> 00:50:38,260 as soon as foo is done executing? 1009 00:50:38,260 --> 00:50:39,440 >> The bad guy's code. 1010 00:50:39,440 --> 00:50:43,610 So this attack code, AAA, again, might send spam, might delete all the files 1011 00:50:43,610 --> 00:50:44,500 on your hard drive. 1012 00:50:44,500 --> 00:50:48,740 But that is what truly a stack overflow is, or a buffer overrun, or a 1013 00:50:48,740 --> 00:50:51,060 buffer overflow attack. 1014 00:50:51,060 --> 00:50:54,400 >> And it's incredibly, incredibly common to this day with programs written in 1015 00:50:54,400 --> 00:50:58,220 C, C++, and even some other languages. 1016 00:50:58,220 --> 00:51:02,275 On that scary note, we'll end with a joke. 1017 00:51:02,275 --> 00:51:03,230 >> [LAUGHTER] 1018 00:51:03,230 --> 00:51:04,550 >> See you on Wednesday. 1019 00:51:04,550 --> 00:51:07,920 1020 00:51:07,920 --> 00:51:10,310 At the next CS50-- 1021 00:51:10,310 --> 00:51:15,920 So I'm all out of disk lamps today but wait, fat-free milk, half the phone 1022 00:51:15,920 --> 00:51:17,850 book, the orange juice that I drank today. 1023 00:51:17,850 --> 00:51:20,370 1024 00:51:20,370 --> 00:51:22,780 USB cable, a wrench. 1025 00:51:22,780 --> 00:51:24,800 >> [MUSIC PLAYING]