[MUSIC PLAYING] DAVID J. MALAN: All right this is CS50 and this is the start of week five. So today, underneath your seat cushions, you will not find anything. But above, you should find these, a little token of our appreciation for all of the work that you put into the Game of Fifteen. Simply remove the little circle on the bottom to begin playing for the remainder of class. So recall that, or know that problem set four, which went out this weekend, involves writing another game. But this time it involves using an actual graphical user interface, not a textual interface like Game of Fifteen was. And the game that lies ahead of you, if you've not yet seen this next, looks a little something like this. I'm going to go into my terminal window here in GDB. And I'm going to go ahead and run the staff solution, which you can access after running update 50 as usual. But I'm going to put it into a little secret mode, a little Easter egg, so-called God mode, by putting God in argv1. And I have to follow my own directions, running it in my own problem set directory. So now you see a complete version of the game of Breakout. In fact, this is no-hands mode. So it's actually-- wowed though you might be-- pretty trivial to implement God mode in Breakout, unlike Game of Fifteen, which some of you might have tackled for the hacker edition. In Breakout it suffices in God mode to simply do what, intuitively with the paddle? Just make it equal to whatever the horizontal position is of the ball. And so long as you do this in lockstep with the ball moving this game will never, ever, ever miss the ball and you'll win every time. But in this week's hacker edition there's more than just God mode. There's a number of other features. Among them, lasers. So that if you really get impatient you can start shooting down the bricks and a few others. And for those of you who'd like to calibrate standard versus hacker edition, I can see that this week's hacker edition deliberately is a little more doable, say, than God mode was with Game of Fifteen. So if you're looking for a stretch and you're looking for some additional fun features do dive in if of interest. Now more practically, let me point out one thing as well. GDB, which some of you may not have yet touched personally, which is fine. But now is really the time to get used to this and comfortable with this tool because it will make your lives much easier, truly. Per Rob's lecture on GDB a couple of weeks ago, recall that GDB is a debugger. It's a tool that lets you run your program but run it step by step, line by line, so that you can poke around, so that you see things happening, so that you can print out values of variables. In short, it gives you so much more power than printDef does. Now admittedly, the interface is pretty arcane. Black and white textual interface for the most part. The commands are somewhat tough to remember at first. But even though it might take you half an hour, an hour, to put that upfront investment of time into it, trust me. Certainly by semester's end it will save you an order of magnitude more time than that. So early in the week dive in. And in terms of Breakout, know that you can do this so long as you have the distribution code or your own code in progress in your Pst4 directory. Know that you can run gdb./breakout. This is going to open up a window like this. Let me give myself more of a terminal window. And then what I'm going to go ahead and do, it's not just run it. I'm going to first set a break point recall, which allows you to pause execution at a particular place. Just to keep things simple I'm going to break at line one just by typing the number one. Let me actually re-open this window because it's getting a little small there. So what I'm now going to do here is if I open up my terminal window. Come on, there we go. So now if I go back to dropbox, Pst4 and run gdb./breakout enter, notice I'm going to break one to set a break point at line one. And now I'm going to go ahead and type run. And when I do, notice nothing seems to happen. There's no pop up. There's no graphical user interface yet. But that's understandable because I'm literally at line one in my program. And notice that I've fast forwarded, specifically now to 62, because all the stuff at the top of this file is things like comments and constants and uninteresting stuff for now. So now I'm inside of main, it seems, at line 62. And this is just the distribution code, recall. If I open this up by going, similarly, into my drop box directory into Pst4, into breakout.c. And if I scroll down and down and down, and let me go ahead and turn on my line numbers. What I'll see, if I scroll down to line 62, is exactly the line that we've paused on. So this line here, 62, is where we're about to be. So now in GDB, if I go ahead and type now next, enter it's going to execute that line. And voila, we have the so-called g window. If unfamiliar with what a GWindow is, not to worry. The spec will introduce you to it, as well as a number of walkthrough videos embedded in the spec. But now let's make this a little more interesting. Let me move this window over to the side a little bit. Let me make the window a little bigger so I can see more. And now let me go ahead and do next again. And there are my bricks. If I type next again now I see the ball. And if I type next again now I see the paddle. And fortunately this gedit is not really cooperating by showing me everything I want. But now if I do next again, next again, I'm just declaring some variables. And I can print any one of these guys out. Print bricks, prints lives. And now if I continue to do next, notice that I'll be inside of that loop. But the code is going to execute exactly as I expect. So when I hit this function, Wait for Click, it's going to do it literally that. So I seemed to have lost control over the program. GDB is not giving me another prompt. But not to worry. Go to my game, click somewhere. And voila, now it proceeds to line 86. So again, it's invaluable, ultimately, for debugging problems. Because you can literally step through your code, print things out and much, much, more. But for now, those tools alone should get you pretty far. So we're, of course, taking a look at Graphics now, all of a sudden. And now our world gets a little more interesting. And you know, perhaps, from some of the videos online that we have these shorts that you've been watching as part of problem sets. And they've been shot, deliberately, against a white backdrop. And some of them have the teaching Fellows drawing some text on the screen that's overlaid on the side of them. But of course, this isn't all that interesting in the real world. This is just a lecture hall with a big white screen and a backdrop. And our amazing production team sort of makes everything look beautiful after the fact by cropping out or overlaying anything we do or don't want. Now just to motivate this week and really, where you can go, ultimately, with computer science. Not just after problem set four. But after another course or an entire curriculum it's amazing what you can do these days in terms of graphics in particular. Some of you might have seen this flowing around online. But I thought I'd show you, for just a couple of minutes, a glimpse of what computer technology and what CGI, computer graphics can do these days with a familiar song and perhaps movie. [MUSIC - LANA DEL RAY, "YOUNG AND BEAUTIFUL] SPEAKER 1: It's just a little bit amazing, perhaps, just how omnipresent-- [APPLAUSE] SPEAKER 1: I just downloaded it. But it's really amazing, I think, just how omnipresent software and code and tools like this really are. So that's a taste of the direction in which you can go. Oh, no more Appliance today. Well, that's actually tragic timing given the point I just tried to make. All right, so let's launch Fusion again. Remind me later. All right, and you should have got an email as an aside if you did get a notice like that. All right, so recall that last week we started to peel back this later known as string. string recalls a data type that's declared in the CS50 library. And it's part of the training wheels that will now begin to take off. It was a useful concept early on. But now it's going to get more interesting and more powerful to actually see that underneath the hood, a string is just what, did we said? Yeah, so it's a so-called char*. And the * there denotes that there's some kind of address involved. And so when you say char* you just mean a variable whose data type is a pointer now. The fact that there's the star there just means that you are declaring a so-called pointer. And that pointer is going to apparently store the address of, of course, a char. Now why does this make sense? Well, what is a string underneath the hood? Well, for some time we've been saying that a string underneath the hood is just h-e-l-l-o, for instance. But we've talked about this as being, essentially, an array. And an array would then look a little more like this, with each of these taking up a bite. And then we've said that there's something special back here, the backslash 0, or null terminator. So all this time, this here has been a string. But really, a string is actually an address. And addresses, as we'll see, are often prefixed with 0x by convention. What does 0x denote? Does anyone know? So it just means hexadecimal. So you might recall, actually, from Pst 1, I believe, one of the warm-up questions actually asked about hexadecimal notation in addition to binary and decimal. And the motivation here is that with hexadecimal you have 16 digits at your disposal. 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, followed by a, b, c, d, e, f. And if you count all those up, you get a total of 16. So this is in contrast with decimal, where we have 10 digits, 0 through nine. It's in contrast with binary where we just have 0 and 1. But at the end of the day you can just represent the same numbers, but somewhat differently. And hexadecimal is common because as it turns out-- and we'll see this later in the course-- even when we get to web programming in the context of HTML and color codes, hexadecimal is nice. Because each digit, turns out, represents four bits perfectly. So it just kind of lines up nicely as we'll eventually see. So this might be Ox123 or something like that, denoting address 123 somewhere inside of my computer's memory. But of course, some problems arise because of this underlying implementation. And recall that I took a stab at implementing a function like this-- compare dash 0 dot c last week, that even though it looked like it was right, it simply didn't compare two strings correctly. I've thrown away main, and I've thrown away the comments just to focus in on the code that's of interest here. And it's in red because it's buggy. For what reason? Well, at the top there when I declared a string, what was really going on underneath the hood? Well, let me go over to the screen here and draw that. So I declared, again, string s GetString. So I'm going to go ahead now and draw s for what it really is. It's going to be a square here. And I'm going to claim that that's 32 bits. At least it usually is, at least on the CS50 appliance in a lot of computers. I'm going to call it s. But now recall that we called GetString. So GetString returns, of course, a string. If the user types in h-e-l-l-o enter the string hello gets returned. And that string, as we just said, ends up somewhere in your computer's memory with a backslash 0 at the end. I'll draw this like the array-- or contiguous block of characters-- that it actually is. And now, what is GetString actually returning? What has GetString been returning all of this time? Well, we say, in weeks prior, it returns a string. But more technically, what does GetString return apparently? AUDIENCE: An address. SPEAKER 1: An address. Specifically it returns the address of the very first bite, whatever it is. I just keep using one, two, three because it's convenient. It returns the address of the first character in the string. And we said last week that that is sufficient. Because we can always figure out where the end of the string just by iterating over it, maybe, with a for loop or a while loop or something like that, just looking for "backslash 0", the special sentinel character. And then we know that the string happens to be of length-- in this case-- five. So technically what GetString does is it returns Ox123 in this case. And technically what then happens is that we store, inside of s, Ox123. At the end of the day, even though this is new concept, pointers, they're just variables. But they happen to store bits that collectively represent an address. So technically all they gets stored in s is Ox123. But we as humans-- including today onward-- are really not going to care, typically, what the actual address is of some chunk of memory. It's just to low level of detail to be intellectually interesting. So I'm going to undo this. And instead, more high level, just say that when we're talking about pointers I'm going to just draw more user-friendly arrow that conveys the same idea and abstracts away the particulars of what the actual underlying address is. Now if we go back to the code, what happened last week if we have string t equals GetString? Well, if I again, type in hello this time I'm going to get another chunk of memory. h-e-l-l-o backslash 0. But because I called GetString a second time-- and I know this from looking at the source code for GetString-- even though it's coincidental that hello was typed in twice, GetString is not going to try to optimize and be clever. It's just going to get another chunk of memory from the computer, which is going to be at another address. Let's arbitrarily just say 456. And then what is it going to return? It's going to return 456 and store it in t. So what is really going on, on the left-hand side is I have another chunk of memory, 32 bits typically. And in there is going to go Ox456. But again, I'm not interested in these particular numbers anymore. I'm just going to abstractly draw it as an arrow. So this is now a new explanation. But it's the same exact idea that's been happening all this time. And so the reason then, that this first version of compare was buggy last week is why? When you do if s equals equals t what are you truly underneath the hood comparing? You're comparing the addresses. And just intuitively, clearly, Ox123 is not going to equal Ox456. Those numbers, those bits are just different. And so consistently, last week it said you type different things, even if the words were verbatim the same. So we fix this. In layman's terms, what was the fix? AUDIENCE: Use a function. SPEAKER 1: Use a function. Or stars are definitely involved, but use a function to do what? AUDIENCE: To compare the strings. SPEAKER 1: To compare the strings. So the fundamental problem here was that I was just considering the quality of strings to be defined by comparison of their addresses. And obviously that's just dumb now once you understand what's going on underneath the hood. To truly compare strings to see if they're equal in the way that a human would consider two strings to be equal we need to compare them character for character for character. Now I could have done this very tediously. But familiarly, we're using a for loop. And just compare s bracket i against t bracket i. s bracket i plus 1 against t bracket i plus 1, and so forth, inside some kind of loop. And if I spot any two characters that differ, or if I realize that ooh, s is shorter than t or longer than t I can immediately say false, they're not the same. But if I get through s and t and say same, same, same, same, same, end of both strings, I can say true, they are equal. Well, thankfully, years ago someone wrote that code for us. And they called it StrComp for string compare. And even though it's a little counter intuitive, StrComp returns 0 if those two strings, s and t are the same. But it returns negative value if s should come before t alphabetically or positive value if it should come after t alphabetically. So if you ever want to sort something, it turns out that StrComp is useful. Because it doesn't just say yes or no, equal or not. It gives you a sense of ordering like a dictionary might. So StrComp, s comma t equals equals 0 means that the strings are truly equal. Because whoever wrote this function years ago presumably used a for loop or a while loop or something like that to integrate over the characters again and again and again. But problem two arose here. This was copy0.c. And the two in red is because it's flawed. And what did we do here? Well, first I called GetString. And I stored the return value in s. So that's pretty much the same as this top part of the picture. But what comes after that? Well, let me go ahead and get rid of a whole bunch of this. We'll rewind in time to where we just have s, which is now consistent with line one up there. I check. If s equals equals 0. Now, a quick side note, when might GetString return 0? There's not enough memory. Right? It's rare that this is going to happen, certainly on a computer that's got hundreds of megs or even gigs of RAM. But it could, in theory, return 0, especially if the user doesn't cooperate. There's ways to pretend like you haven't inputted anything and trick GetString into returning 0 effectively. So it's going to check for that. Because if any of you have started to get, already, segmentation faults-- which has probably been a source of some frustration-- those are almost always the result of memory-related error. Somehow you messed up with regard to a pointer, even if you didn't realize there was a pointer. So you might have induced segmentation faults as early as week one using something like a for loop or a while loop and an array by going too far past the boundaries of some array that you declared, in week two in particular. You might have done it even in problem set four with Breakout. Even though you probably haven't seen any stars in the distribution code for Breakout, it turns out that those GRect and GOval and other such things, those are actually pointers underneath the hood. But Stanford, like us, sort of hides that detail at least for the libraries purposes, much like we do for string and char*. But GRect and GOval and all of those things you guys are or will be using this week are ultimately memory addresses. You just don't know it. So it's not surprising then, perhaps, that you might trip over some segmentation faults. But what's interesting here now, if after we check for 0 we do string t gets s. Well, let me declare t. I'm going to draw it as a square, 32 bits, call it t. And then I'm going to do gets s. Well, what does that mean? Well, it's a little hard to think about it picture wise. But let's think about what's inside of x? What's literally inside this variable? The value Ox123. So when I say string t gets s, that just literally means take the number in s, which is Ox123 and put it Ox123. Or pictorially, if I kind of abstract away from that detail it has the effect of literally doing this as well. So now, think back to last week when we proceeded to capitalist T. I did T bracket 0. Well, T bracket 0, even though it's a pointer, you can treat it as though it's an array, with a square bracket notation. So where is T bracket 0? Well, it's the h. And so when we use that line of code, two upper, which is in that c type.h header file, that's where it's declared. You're capitalizing this H. But of course, that's the same exact h that's inside of s, so to speak. And so now you have changed or capitalized both the original and the so-called copy. Because you didn't make a copy in the way that a human would want it to be. So what was the fix here, in copy1.c last week? Functions, so we could actually copy the string. And fundamentally, what do we need to do in order to copy the string? Well, in this green version here I'm going to do it fairly low level. There are actually functions they could help with this. But the most basic one, and the most familiar one, at least, will soon be familiar to us, is the following-- so one on the first line of code in green now. I just rewrote s as char*. There's no functional difference there. I just threw away the CS50 library and I'm calling it what it is, a char*. Now dot, dot, dot, because there were some error checking that's not interesting to talk about again. So now t is declared. It too is a char*. So I drew a little square on the screen like before. But on the right-hand side, malloc, we said is memory allocate. So allocate some chunk of memory. And how many bytes do we actually want to allocate, does it seem? Well, the string length of s. So if it's hello that's going to be five. We'll say h-e-l-l-o. So five bytes. But then plus 1, why 1? The 0 character. If we don't leave room for this guy we might accidentally create a situation where the string is h-e-l-l-o. And then the next time GetString is called and I type in, for instance, David, D-a-v-i-d, the computer is going to think that s is actually h-e-l-l-o-d-a-v-i-d because there's no break in between those words. So we need that break. So we don't want five. We want six bytes. And bytes I say. But it's really time size of char. Technically char is almost always a single byte. But just to make our code portable, so to speak, so that it works on different computers even if they might be somewhat different underneath the hood, I'm going to generically say size of char so that my code always work. And I don't have to recompile it just because I upgrade my computer or use some different platform. So I've got 6 times the size of a char, which happens to be 1. So that means malloc could give me six bytes. What is that actually doing? Well, let me roll back in time here to where we are in the story. So if I go back here, I've declared a char* called t. I've now called malloc for six bytes. And now I'm going to draw those six bytes just like the array earlier. But I actually don't know what's inside this array. If you allocate memory it turns out that you can't trust that there's some known value there. It could have been used by something else, some other function, some other line of code that you wrote. So we'll generally call these garbage values and draw them, perhaps, as question marks, just indicating that we don't know what's actually there. And that's no big deal so long as we are smart enough to overwrite those garbage values with numbers or chars that we care about. So in this case what am I going to do? Well, my line of code next, I have four. int i get 0, n gets the string length of s. So a familiar for loop. I is less than or equal to n, which usually is above. But this time it's deliberate. I++, and then I simply do t bracket i gets s. Because my picture looks like this at this moment, stored in t is the address of that random chunk of memory whose values are unknown. But as soon as I do t bracket 0 that puts me here. And what ends up getting drawn there? We end up putting h. Because that's what's at s bracket 0. And then the same thing for e, and l, and l, and o. n, why did I go up through an equal to n? Because of the 0 character. So just to be clear, then, if I actually erase whatever these garbage values are and then actually draw in what I expect, this is s bracket 1, 2, 3, 4, plus that's trailing new character. And so now if we continued past the dot, dot, dot in this correct version and capitalized t bracket 0 I would, of course, be capitalizing just this guy here, which conceptually, was ultimately the goal. So that's all the pointer is. And you've been using them for weeks now in the context of strings. But underneath the hood they're a little more complex. But if you think about them in this pictorial form I propose that they're probably not all that scary as they might first seem at first glance, particularly with such new syntax. Any questions on pointers, strings, or chars? Yeah? AUDIENCE: Can you go back to the [INAUDIBLE]? SPEAKER 1: Sure. AUDIENCE: So how come in your very last line, you don't have a *t line and a *s in the line? Don't you have the reference to the-- SPEAKER 1: Ah, a really good question. Why don't I have a *t and a *s? Because briefly, last week, like in our swap function, I did say that when you've got a pointer the means by which you go there as we did physically on stage, was to actually use the star operator. It turns out that this square-bracket notation is what we'll call syntactic sugar, which is just a sexy way of saying it's shorthand notation for exactly what you're describing. But it's a little more intuitive. And at the risk of making this seem more complicated than it needs to be, what's really going on here is the following-- If I say *t that means go to the address stored in t. So literally, if t is storing the address of that h initially, *t means go here. Now, what does t bracket 0 mean? Same exact thing. It's just a little more user friendly to write. But I'm not done yet. I can't just say *t gets *s. Because what would I be doing then? I'd be putting h, h, h, h, h throughout the whole thing. Right? Because *t is go to the address in t. But we're inside of a loop. And what value am I incrementing, of course, on each iteration? i. But there's an opportunity here, right? Even though this feels like it's getting a little more sophisticated than the square-bracket notation we've used for some time-- let me undo my h change there-- even though this is now getting a little fancier, the basic idea, if *t means here and *t is just go to the address in t. But what was the address in t? The number we keep using? Like Ox456, let's bring that back just for the sake of discussion. Well, if I want to get at the e in t string, I just want to go to, essentially, 456. Or rather, 457. I just need to add one. But I can do that, right? Because t, even though I keep drawing it now as an arrow, it's just a number, Ox456. And if I add one to that, or more generally, if I add I to that I can actually get exactly where I want. So if I actually do this-- and this is what's now called pointer arithmetic-- I can remove this line. Which is, frankly, I think clearer and a little more user friendly to read. But this is no less correct. This line of code now is using pointer arithmetic. It's saying go to the following address-- whatever the start of t is, which is t plus i, which initially is 0, which is great. Because that means the beginning of t plus 1, plus 2, plus 3, and so forth. And the same deal with s. So syntactic sugar for this. But understanding what's really going on underneath the hood, I would argue, is actually useful in and of itself. Because it means now there's not much more magic going on underneath the hood. There aren't going to be many more layers that we can peel back for you. This is c. And this is programming. Really good question. All right, so this was that buggy program I was referring to earlier. swap was flawed. If did seem to work. Recall that just like with the milk and the orange juice-- which I started drinking today's demonstration. So just as with the orange juice and the milk, we did have to use a temporary variable, tmp, to hold a temporarily so that we could then change its value and then update b. But this function, we said, or this program in which this function was written was wrong and flawed, why? Yes? AUDIENCE: [INAUDIBLE]. SPEAKER 1: Exactly, when you call swap-- or more generally, when you call most any function-- if the arguments to that function are primitive, so to speak, ints and chars and doubles and floats, things without stars, you are passing in a copy of the argument. So if x was 1 and y was 2, a is going to be 1 and b is going to be 2. But they're going to be different chunks of bits, different chunks of memory that happen to be storing identical values. So this code is super perfect at swapping a and b. It's no good at swapping-- in last week's example-- x and y. Because again, they're in the wrong scope. Now, how did we go about fixing this? We had to make the function look a little uglier. But again, consider what this just means. And actually, let me, for consistency, change one thing so it's identical to what we just did. As I mentioned last week, it doesn't matter where it goes. In fact, typically you would put the star next to the variable name. But I think it would be a little easier to consider the * next to the data type as meaning it's a pointer to an int in this case. So what am I doing here? I'm saying don't give me an int followed by another int, calling them a and b. Give me the address of an int. Give me the address of another int. Call those addresses a and b. And then using the * notation down below, go to each of those addresses as needed to either get or set its value. But there's an exception here. Why do I not have a * next to tmp? Why do I not do this, for instance? It feels like I should just go all out and correct the whole thing. Yeah? AUDIENCE: [INAUDIBLE]. SPEAKER 1: I haven't declared tmp as a string. So this would declare, in this case, a tmp to be the address of an int. But that's not quite what I want, for a couple of reasons. AUDIENCE: You don't want to swap them. SPEAKER 1: Exactly, I don't want to swap anything with tmp. tmp is just week-one stuff. All I want is a variable to store some number. I don't even care about addresses at this moment. I just need 32 bits or so to store an int. And I want to put in those 32 bits whatever is not in a, so to speak, but what is at a, just to be more precise. Because if a is an address, *a means go there and get the value 1. For instance, in last week's example or in b's case, get the value of 2. So what's really going on? Let me draw a picture here that will only tease apart part of today. But this will continue to appear for quite some time. This, I claim, is what your computer's memory looks like when you run a program, any program. When you run a program at the very top of your computer's RAM-- so think of this rectangle, truly, as your computer's RAM or memory, all 101 billion bytes of it, all two billion bytes, all two gigabytes of it, whatever the quantity you have is, let's draw it as a rectangle. And I claim that when you run a program like Microsoft Word or Chrome or anything like that, the bits that Microsoft or that Google wrote-- in the cases of those programs-- are loaded into your computer's memory where they can be executed more quickly and fed into the CPU, which is the brains of the computer. And in TAM they're stored at the very top of your program, so to speak. In other words, if this is a chunk of memory, when you double click on Microsoft Word, the bits come off the hard drive. They get loaded into RAM. And we'll shove them up at the very top of this rectangle conceptually. Well, the rest of your memory is used for different things. At the very top you see initialize data and uninitialize data. This has to do, for the most part, with constants or global variables that have values. But more on those another time. Then you have the heap, which we'll come back to. But at the bottom is the part that's particularly germane right now. It's the so-called stack. So just like in most any D hall here on campus, you have those trays that just stack on top of each other on which you can put food and whatnot. The stack in a computer system is very similar. Except whereas the tray, as we use in the dining hall, of course, is meant to carry things the trays or the frames-- as we'll call them-- in a computer's memory is used to hold variables and values. So what really goes on underneath the hood? Well, let me flip over to the screen here. And let's focus just on the bottom part for a moment. If this is the bottom portion of my computer's memory it turns out when I call the function main-- which happens, frankly, automatically for me-- I get a chunk of memory at the bottom of my RAM so to speak. And this is where main's local variables go. It's where argc and argv maybe go, and any variables I declare inside of main. They end up at the bottom of my computer's RAM. Now suppose that main calls a function like swap, like it did last week? Well, we essentially put a new tray, a new frame, onto my chunk of memory. And I'm going to describe this as belonging to the swap function. Now what's inside of swap? Well, based on last week's program and the one we just saw an excerpt from, inside of swap's frame, or on swap's tray, are what variables? Well, a and b. Because those were its local arguments, plus a third, tmp. So really, I could draw this a little more cleanly. Let me go ahead and undo the label. And let me claim that you know what? a is probably going to end up here. B is going to end up here. And tmp is going to end up here. Now, the ordering might be a little different. But conceptually this is the idea. And just collectively, this is what we'll call swap's frame, or dining-hall tray. And the same deal with main. But I won't redraw that. But that's where argc and argv and any of its local variables like x and y might be as well. So now consider what's really happening when you call swap. When you call swap, executing code like this, you're passing in, in the buggy version, a and b as copies of x and y. So if I do now draw this on the screen-- got to get better at this-- so the story I was telling to myself was in this buggy version, when we call swap passing in literally a and b as integers, what's really happening? Well, what's really happening is this. Let me go ahead and undo just to clear up some space here. So this is my computer's memory. So if I have, for instance-- actually let's do it this way-- if I claim that this is x, storing the value 1 just like last week. And this is y, storing the value 2 just like last week. And this is main, when I call swap, thereby giving myself access to a and b and tmp, I'm going to claim that this is a and this is 1. This is b. This is 2. This is called tmp. And initially, it has some garbage value until I actually store in it a, which is 1. Then I go ahead and change a to be what? B's value. And so now I have two here. And then we said b gets tmp. Again, just as a sanity check, the third line of code here is simply this one, b gets tmp. And so lastly, what do I do? I go ahead and change b to be whatever the value of tmp is, which is 1. I don't touch tmp again. But now, the problem is as soon as swap returns, because it's not handing back some value, there's no return statement explicitly in it. What's actually happening? Well, essentially all this memory-- OK, apparently the eraser likes only one finger at a time-- just disappears. Now in reality it's not going anywhere. But you can think of it now as question marks. Because it's no longer actually in use. And nothing is done with those values. So in the case of the green version of this code, what instead is being passed into swap? So addresses. So the address of x and the address of y. So if we re-tell this story one last time, and I actually draw swap again, but with pointers, this being a, this being b, and this being tmp, what is actually stored in a in this green version of my code where I'm passing in addresses? It's going to be a pointer to x. So I could draw an arrow. But let's use the same arbitrary example as before. Let's say that this is something like Ox123. And this is going to be Ox127 because it's four bytes away because it's an int, so Ox127. And again, I'm taking some liberties with the numbers. They're much smaller than they would actually be and in a different order. But that's how the picture is now different. But when I use this green code and I do int tmp get *a. *a means to do the following, take the address that's in a and go to it, which is 1. And that's what I then put in tmp. Meanwhile, in the next line of code here, *a gets b, what does that mean? Well, *a, so go here gets *b, which means go there. And that means put the value to there. Finally, the last line of code simply said *b gets tmp. So b says go there and overwrite it with tmp which, in this case, is going to be, again, 1. And this is why the green version of our code works, whereas the red version never did. It all just boils down to how the memory is managed and where it's actually placed in your computer's RAM. And for now, that's one of the things that the stack is being used for. Questions on the layout? On pointers? Or on swap? All right, so malloc, recall, did something like this. This was a super simple example. And this was the one that Binky introduced us to, albeit quite quickly, at the end of class. Dammit, there we go again. So recall that this was the example that Binky introduced us to, albeit somewhat quickly at the end of class. And here we used malloc really for the second time. Because the first time we used it to create enough RAM, allocate enough RAM to store a string. This time Binky kept it simple. So it's to store just an int, apparently. And that's totally fine. It's a little weird, frankly, to use malloc to allocate one int. But the point of Nick's claymation was really just tell the story of what happens or doesn't happen when you mistreat memory. So in this case, this program did a few things. In the first case here, it declares a pointer called x to an int. It then declares a pointer called y to an int. It then stores in x, what? Someone else now. What gets stored in x according to the third line of this program? AUDIENCE: [INAUDIBLE]. SPEAKER 1: Well, not quite bytes, per say. Be more precise now. What gets stored in x? An address, I think I heard it. So what does malloc return? malloc behaviorally allocates a chunk of memory. But how does it give you access to it? It returns what? The address of the first byte in the chunk of memory. Now, this is super simple. It's just one byte, which means the address we're getting back is the address of the whole thing. So stored in x then, is the address of that chunk of memory. Meanwhile, what happens next? So actually, let's go ahead and draw this out real fast. So if we go over to the screen here and we play this out int* x and int* y is going to do what for me? I claim that it's just going to do something like this and call it x, and this and call it y. Meanwhile, the third line of code is going to allocate the size of an int, which happens to be-- sorry if I said one before I meant one int-- four bytes on a typical computer. At least with the CS50 appliance. So this is going to allocate it, who knows? Somewhere out here. And this is stored at some address Ox, who knows? But what's going to get returned is that address. But we'll draw this pictorially as just an arrow like that. Now in the next line *x gets 42. What does *x mean in layman's terms? Just go there. Go to that address. Or in other words, follow the arrow and put 42 there. But then something bad happened to Binky, right? Recall that line five here, *y gets 13, indeed an unlucky number, did what for us? Well, *y means go there. Well, this has not been given a value yet, right? The code does not have y being initialized to anything. We had x being initialized to an address. But y was declared up top. But then a semicolon, no value was actually put in it. So it's fair to call this a garbage value. Who knows what's there? It's the remnants of bits that were used by some previous line of code in my program. So if I say go there, this is like, I have no idea where this arrow is going to end up. And that's when you typically get a segmentation fault. If you accidentally dereference, so to speak, or go to an address that's not actually a legitimate address, bad things happen. And that's exactly what happened to think Binky. So recall that the story that Nick was telling here was the same idea as what I've drawn with the illusion of chalk on the board there. X and y are declared. Then we allocated the size of an int and stored it in x. Then the next line we did *x. This was Nick's magic wand of dereferencing. That put 42 in the memory pointed out by x. But this is where things went horribly wrong. Right? We tried to dereference y. But y had some bogus value, right? That arrow in the bottom left-hand corner, is not actually pointing to anything. It's kind of doing what I did here on the board. So bad things happen, segmentation fault, or Binky fault, in this case. But if we then fix that by doing x gets y how does the story change? Well, if I do x gets y, that's effectively the same as saying whatever this is, Ox-something is going to be the same here, Ox-something. Or pictorially we'll draw an arrow. So here on the board with Binky, with the next line of code, *y means go there. Where is there? It means over here. And when we update that to be 13 it just involves going and writing 13 here now. So perhaps not completely straightforward at first glance. But to recap and to use the same jargon that Binky was using here, so the first two allocate the pointers, x and y, but not the pointees. And pointees is not a generally used term. But pointer absolutely is. But it's what's being pointed at in Binky's nomenclature. This next line, of course, allocates an int pointee. So a chunk of memory-- as I drew over on the right-hand side there-- and set x equal to point to it. This dereferences x to store 42 in the memory that it's pointing at. And then this, of course, was a bad thing. Because y was not pointing at anything yet. This fixes it. So this is still buggy program. Just because we're blowing through the code line by line and saying, oh well, let it crash there. That's a bad thing. Odds are the program's just going to abort altogether at that line. But if you were to remove the crashed line and replace it with the last two lines there you assign-- using pointer assignment-- y to point to x as point t. And then you dereference y in a very safe way. So where does this leave us? Well, turns out that underneath the hood in the CS50 library, pointers are used throughout. And we'll actually start to peel back that layer before long. But it turns too, an expression that some of you might be familiar with, particularly those more comfortable, is actually that of a very popular website, or stack overflow, these days. But this actually has very technical meaning. We now know what a stack is. It's like a stack of trays inside of a dining hall. Or inside of your computer's memory its those frames that are used by functions. Well, it turns out that because of that very simple implementation of memory and the frames on the so-called stack, you can actually take control of a computer system fairly easily. You can hack into a system if people like us have not written our code particularly well. If people like us use chunks of memory or use arrays-- even more commonly-- but sometimes forget to check the boundaries of our array as you might have yourself sometimes, and iterated way too far past the end an array. In the best case, your program might just crash. Segmentation fault, kind of embarrassing. Not great, but it's not necessarily a hugely bad thing. But if your program is actually on real users' computers, if it's running on a website that actual random people on the internet are hitting, letting people induce bad things on your code is generally not a good thing because it means an opportunity to take control of the computer. And this is going to look a little cryptic. But I thought I'd scare you with this last example here. Here's an example of code. And there's a good Wikipedia article that walks through this in more detail. I have main on the bottom calling foo, passing in argv of 1. And that's just so that you can run the program and pass an arbitrary input. And then foo is declared up top as accepting a string, or more precisely, a char*. It then declares an array of chars. Call it a buffer, more generally, of size 12. So 12 chars can fit inside of that array called c. And then it uses this new function, which is new but not hard to understand, memory copy. It copies the memory from bar, which was the variable past n, whatever the user typed into argv 1 into c. How many bytes? The string length of bar. So in other words, if the user types in h-e-l-l-o enter, the string length of hello is five. So five of those bytes is going to get copied into the array called c, which is of size 12. But what the user types in a much longer word that's 13 characters or 14 characters or 100 characters or more? Where are they going to go? Well, that frame, that tray in the dining-hall stack, they're going to go there. And it's just going to start overwriting other stuff that's already on that stack, overflowing the stack, so to speak. So pictorially, think of it this way. This is just a colorful version of the picture we've been drawing. At the bottom, let's say, is main. And on the top, what you're seeing now is the frame, color coded now, for a function called foo. But what's interesting here about foo is that here is its frame. So it's drawn just like I did but in light blue. And now this is where c bracket 0 goes. And this is where c bracket 11 is going to end up. In other words, it happens to be represented as a square. But if you just keep plopping bytes down-- or chars-- they're going to end up at location 0 all the way up to 11 because it's 0 indexed. But where is the 13th character going to end up? Where's the 14th? Where's the 50th character going to end up? It's going to keep going down. Because even though we've drawn the picture with the stack growing up, the addresses, it turns out, go from small addresses, small pointers, to big addresses. So it just keeps going up and up. So if the user types in hello, that's great. No bug, no problem, everyone's safe. But if the user types in what we'll call adversarial code, represented generically as a, attack, attack, attack, attack, what can happen? Well, if all of the input that the user typed in isn't just some friendly or offensive string of characters. It's actually a sequence of characters that if you compiled it, it actually is code. Maybe it's code that deletes all the files on your hard drive or sends spam or something like that. Notice that what's key here is that if the bad guy got lucky enough to overwrite the red chunk of memory-- which I didn't draw on my picture but this Wikipedia picture here has-- its so-called return address. When food returns, when swap returns, how does the computer know to go from up here to down here? Or in the tech segment up above, how does it know to go from the swap code-- the 0's and 1's that compose swap-- back to main? There's a so-called return address stored in that same stack frame, on the same cafeteria tray. So if the bad guy is clever enough to put attack code, attack code, attack code, and get lucky enough-- often through trial and error-- to overwrite that red return address, with the address and notice the very top. Notice 0835C080. It's written backwards up top for reasons we'll perhaps revisit. This is that number. So if the bad guy gets lucky enough or is smart enough to overwrite the red strip of memory with the address of code that he or she has somehow injected into your computer, guess whose code is going to be returned to as soon as foo is done executing? The bad guy's code. So this attack code, AAA, again, might send spam, might delete all the files on your hard drive. But that is what truly a stack overflow is, or a buffer overrun, or a buffer overflow attack. And it's incredibly, incredibly common to this day with programs written in C, C++, and even some other languages. On that scary note, we'll end with a joke. [LAUGHTER] See you on Wednesday. At the next CS50-- So I'm all out of disk lamps today but wait, fat-free milk, half the phone book, the orange juice that I drank today. USB cable, a wrench. [MUSIC PLAYING]