>> David: All right. Welcome back to CS50. This is the end of week four. So, literally earlier today when I was on my way to campus I emerged from the Harvard Subway Station and I was about to exit the station up the stairs when I noticed this older woman who was standing by one of these things. If unfamiliar, these are the mechanical computer devices that you have to use to buy a ticket on the T these days and there was one of these really awkward but touching situations where it was clear that this woman had no idea, you know, what to do, how to get from here to here, and I could see her just staring at the turnstiles that you're supposed to go through to get into the subway. So, you know, I actually paused and tried to lend a hand with this and what struck me at the end of this experience of walking this individual through the process of buying a ticket was actually a recurring theme in my mind for this course. One of the discomforts, one of the reasons that I think so many people in society and even in this class are uncomfortable when it comes to computers and intimidated by technology and kind of throw up their hands when things go wrong that they don't understand is frankly not necessarily your fault. The problem, recurringly in society, is that it's just horrible design. People make really stupid decisions and so as you know in this course with P sets there are three axes that we look at when evaluating your code. Correctness, just does it do what it's supposed to do, but then also design, and design is one of those things that's a little harder to put your finger on initially, but it really is something to bear in mind especially toward term's end when you tackle your own final projects. 
Odds are statistically most of you will tackle something web based although you'll be able to tackle something android based, iPhone based, C based, PHP, really the choice will ultimately be yours, but to think about exactly how to design good software is something you, is certainly a lesson that we want to promote. So, I actually paused for a moment to take some photographs after this particular experience because after walking this woman who happened to be, I mean she was at least 70 plus years old I would say, no offense if she's watching this on the Internet now and I got that wrong, but at least 70 years old. She was foreign and she admitted to me upfront this was her first time in the Boston Subway System and so this was completely new to her and yet you would think for sort of a metropolitan city you should have designed your technology, you should design your computer programs in a way that anyone who is visiting your city or who lives in your city can actually solve these problems quickly and obviously and yet here is what our experience was like together. So, we approached this machine here, the photograph is a little blurry, but it was very clear to us but if you zoom in on the screen and apologies for the reflections, again, this was just taken with my cell phone, you see what appear to be three huge buttons and at the top it said something like have a Charlie Card, which may very well apply to a lot of locals. And then at top right there's this longer sentence to purchase new Charlie ticket, value tickets and passes, press here. So, ask yourself frankly especially when it comes to designing your own tools and your own programs whether it's for real users to use, whether it's for your research group to use in some other field, you know, where do you even begin? Frankly, I would argue that probably 80%, 90% of the users of Boston Subway just want to buy one damn ticket to get on the subway and yet where is the button that says one-way pass? 
Where is the button that just says click here to do the most common case? This is optimized not for that common case but for all possible cases. So, I did some reading quickly and she looked at the screen for a moment and then I suggested we hit the top right button to purchase new Charlie Ticket, whatever that is, value tickets and passes. So, we pressed here. So now we get this. It's a little more straightforward now, but even then semantically frankly it's a little confusing. What ticket would you like to purchase? Well, you can buy a pass, which would seem conceptually not to be a ticket, but at top left was somewhat obvious this time bus and subway tickets. We're in the subway we want to buy a subway ticket so we click that. So, then we had to select our fare. So this, too, fairly straightforward and yet and this was a socially awkward moment I had to ask her are you an adult or a senior? I didn't want to make that judgment call for her when she said I am, what did she say? She said I am beyond senior or something like that. So, I said, okay, I didn't want to assume so we clicked senior and frankly it was then a dead end because apparently to get some senior discount you need to have some special pass or something like this that we didn't have and yet that certainly wasn't obvious here so now we together and this person in particular had to figure out how you go back and then restart this process. So we went back. We clicked adult at top left this time and now this. Like I just want to get on the damn subway. I don't want to have to consider how much it costs or what multiple of number of tickets I want to get. I just wanted to get her on the subway and she, too, she did mention I want a round trip, she wants to be able to come back from Park Street Station so where is the option that just says one way or round trip? 
No, instead we have to click other amount at bottom right there, when we have to input in this screen 4-0-0 after consulting visually the little cheat sheet on the device itself it tells you in a very long chart how much one pass costs so we multiply and type in 4-0-0, we hit enter. Now, we're reminded that an SV adult is what we're buying, whatever SV is, probably single value, but who the hell cares at this point? So now amount to pay $4, credit card/debit card. So, okay, we go ahead and click credit card at this point. I'm reminded of the confirmation screen here and that's not too bad, then we put in the card and we click confirm and then finally we see the swap function. [Laughter] Then finally literally nine steps later the ticket comes out of the machine and then, frankly, had to go through the process of where to put the ticket in the machines. If you've ever used those systems they actually run Windows because I've seen the blue screen of death on these screens, but you put the ticket in, it goes in and out, in and out and in and out and eventually comes back out in the right place, you pull the ticket out and in my experience in Boston for the past several years, I'd say 5% of the time the gates don't even open at that point, but that's a whole other issue, but the point here is that at every point in this design process the wrong decision was made, right? Again, I don't have data to back this up, but just common sense suggests to me that the common case is if I'm on the subway platform I want to get on that train and maybe I want to get back and where are those two buttons, right? So when it comes time to implementing your own software, ask yourselves these questions. 
It may be obvious to you, the developer or the computer scientist or the engineer, well, obviously, I just click, here, here, here, here, here, here and voila, out comes my ticket, but most people don't have an interest in that process and a lot of people, this person in particular, just don't necessarily understand that process. Her goal was so simple, get on the train. The solution is so simple, give us $2, we'll give you a ticket and yet it takes nine steps to do that. So, I don't mean to start off with a bit of a rant. I actually do get worked up over these things because it drives me nuts because these are not hard problems to solve and yet consistently throughout society and your own laptops there are dozens of examples I'm sure of poorly designed software. So, among the lessons you'll hopefully exit this course with is not just how to make something correct. The machine is for the most part correct. It eventually outputs a ticket with the right amount, but my God, how many steps does it take for us to get there? Let me disclaim, too, we're by no means perfect so the Harvard courses app that a lot of you used on the web to shop for courses, we recently sent out a form to get feedback from everyone because we don't doubt that it was imperfect the first time around, but we're going to iterate and make amends with that. So realize you don't have to get it right the first time, but you have to listen and watch for what those problems actually are and that is one of the roles your teaching fellows play. It's not just about stamping some stupid number on your transcript in the end; it's about actually providing some compelling feedback. So, with that said let me hop off the soapbox and remind us of this. So, a couple of weeks ago we proposed to swap two integers, and we couldn't at the time. 
Such a simple goal and you look at the code, looks correct, we used GDB, recall, the new debugger, and we stepped through this code and it was correct up until the last line but then as soon as that second curly brace happened and this function called swap returned, as we say, all of this hard work of swapping a and b seemed to have been lost and now last week we attributed this to this issue of scope. This idea that each function has its own chunk of memory that other functions can't necessarily see unless you give them access to it by passing in copies of variables or, as we'll see today, providing references thereto or pointers thereto, and this is new jargon that we'll tease apart today and it's at this point in the class where you really begin to appreciate both the power and frankly the danger of programming particularly in a fairly low-level language like C which gives you great flexibility. You can touch almost any part of memory in the computer system that you want with your program, but do you want to? Probably not. Almost everyone in this room has probably had a segfault at some point or core dump where you end up with this random file called core, which recall is just the contents of memory at the time your program crashed. Well, that suggests that you were not using memory correctly and so we'll tease apart today exactly what it means to navigate inside of a computer by way of memory and we'll also touch on over time what are some of the evils that might happen. In fact, I dug up this from a few years ago, it was actually pretty fortuitous, like the day before one of CS50's lectures the original iPhone was cracked. To crack the phone means, or it was jailbroken as people say, which means you were able to use an iPhone back then and now with other software on like AT&T's network or abroad, or sorry, on T-Mobile's network as opposed to just AT&T. You can install other software on it, you don't have to use the Apple Store, you can use third party stores. 
So, it frees you from some of Apple's tethers and this was the code that circulated on the Internet with which people could crack their iPhones. It was pretty primitive at the time. This was not sort of for the timid because you could very easily brick your iPhone. Yes, as you may have, okay, maybe I could have, I see the smiles that there's something interesting in the comment. So, I'll just fast forward through that, but look the point really I was trying to make, is there anything else inappropriate here? No, we're good. So, this is clearly C, right, and it's C because with C and a few other languages you get some low-level control and what I believe happened in this case of the iPhone's first jailbreaking was that there was something called a buffer overflow exploit. An opportunity in Apple's own source code where someone accidentally didn't check the size of an array, didn't make sure that when you're iterating from i equals zero, plus plus, plus plus, plus plus, that you do actually stop, for instance, at the end of the array and so what these folks were able to do was inject their own code that they had written into their iPhones in such a way that they tricked the phone into executing that code and thereafter they had full-fledged access to the iPhone. In fact, I believe they were able to Telnet or SSH essentially into their phones and get a blinking prompt because essentially under the hood of the phone is a UNIX-like system. So it's actually pretty cool. Frankly, it's gotten much easier these days. Apparently you can now jailbreak your iPhones and your iPads and your iPod Touches by like pulling up a web page and typing the right command. So, we don't necessarily recommend that, but what was pretty cool especially that day, okay, so this is okay. Why am I reversing this app, we were going to release the resources. All right. So there's some cool tricks people can do. 
We'll make this available because, frankly, you could Google it and now it's pretty moot because Apple has plugged this particular hole but there's all sorts of other bugs still, but it all boils down to one of the topics today, which is going to be that of this thing called a pointer. So, here was the problem. Let me draw a quick picture in chalk here. We generally draw our computer's memory lately as just some rectangle like this where the bottom of it is the thing called the stack and it's called that, it's nice to call it that conceptually because every time you call a function, recall that you put another frame on this stack, which means allocate a chunk of memory for main then allocate a chunk of memory for say swap and conceptually it's going on top, on top, on top of the previous function's memory and that's conceptually true and also we'll see eventually if you use GDB and you actually print the addresses of every byte in memory, you'll see that, in fact, the addresses are going higher and higher and higher or lower and lower depending on where you are in the story. So down below is generally where main's memory goes. So, if you've got local variables in main, if you're using argc and argv in main, they generally belong at the bottom of the stack conceptually and if main now calls a function whether it's GetString or printf or swap or foo or whatever, it then gets pushed onto the stack here so we'll call this the swap frame here on the stack and any variables that are local to swap like a and b or temp also end up in that chunk of memory. 
So what's happening in buggy3, which recall looked a little something like this, is when you have this code in main up top whereby I'm declaring an int called x and then I'm assigning it the value of 1 and then I'm declaring another variable called y assigning it the value of 2, well, what's really going on here in main is that if we carve out 32 bits, this is the thing that we're referring to as x, this is the thing we're referring to as y and so when I say x gets 1, I'm putting the value 1, or the pattern of bits that represents 1, there and the same deal for 2. Now, where they end up isn't so important for the picture's sake. They go somewhere in that chunk of memory back-to-back, but for now the point is that x and y are inside of main's frame. So, what happens next is when you call swap with a line of code like this one here, you're passing to swap two values, x and y, but in order to bridge this gap, you can think of this line as really being some kind of barrier, in order to pass some variables from this frame into this frame, you can't pass the originals, right? You can't physically move these bits here, but you could very easily make a copy and so somewhere in swap, let's just draw it here for simplicity, there's going to be another 32 bits called a and then here there's another 32 bits called b and what we put in there is just copies of whatever was in x and y. And so if at first glance it feels like this code should absolutely work, well, if you think about what's really going on underneath the hood, well, sure it works. It swaps this 1 and this 2, a and b, but it has no access to main's memory. 
So, today we finally equip you with a solution to this problem, one that allows us to do many more powerful things including starting to read and write from disks, read and write files, which will be useful for our forensics piece when we actually recover JPEGs and similar file-based mechanisms and it allows us, too, to really pass anything we want around in memory, and we just need a tiny piece of new syntax in order to make this happen. So this is version two of the swap function and, in fact, it is almost identical, this is before, this is after, before, after and it looks like the solution to this problem that we've been revisiting a couple of times now is just to do what? Well, it's to prefix most of the symbols in this function with just an asterisk, with a * from the keyboard so Shift-8. Well, what does this actually mean? Well, it means something a little different conceptually. Notice that we're still saying int and we're still saying int, we're still calling things a and b, again, to be clear this was before, this was after, but the * in this context before a means that a is no longer an int; it's instead a pointer to an int. It is the address of an integer. So, whereas before we were literally passing a copy of x and y and respectively calling them a and b, they are different chunks of 32 bits, this time what we're doing is passing to swap not x itself, not y itself, but in a moment we'll see that we're actually passing in the addresses of x and y, right? Because if you think about it conceptually just sort of from a real-world perspective, if the problem is that swap is broken because he cannot access main's memory, he cannot access x or y, well, the solution just intuitively is give swap access to x and y. How do you give one function access to another function's chunks of memory? 
Well, simply with the * notation at least on the way in when you declare the function called swap, you simply say this is not going to take an int and another int because that's useless. It would give me copies of things. Instead I'm going to be expecting the address of some int and the address of another int and thanks to this address I can literally find this address in RAM, do anything I want there, return and what I've just done is actually changed or mutated the values of those original variables. In other words, if you hand someone literally the street address of something, right, if you hand, let's say, if you hand someone the address of a house or a home that person can then literally go to that location and do whatever they want at that location. It's the same idea here. If main wants to give swap access to x and y, it just has to tell swap where x and y are. So, let's actually take a look at this in action. So this here is going to be among your printouts from this current week. It's a program called swap. It's in a file called swap.c. Everything should be alphabetical so it should be toward the end of this week's packet and notice that it's almost identical except for a couple of syntax changes. So, at the very top of the file I'm just copying and pasting what was essentially on the slide a moment ago. I have to change the prototype of this function to say that swap no longer takes an int per se or another int per se, but rather it takes two pointers to ints, and in fact, it's on the very last page if you're still flipping. So, main meanwhile is going to change, but let's fast forward to the end just to confirm that all I did was copy and paste what was on the slide a moment ago. Indeed, swap in your printout there is just defined as now taking *a and *b and then it also uses the * later, but we'll come back to what the different uses of the * mean, but for now I claim conceptually it just means swap has access to the locations of its parameters. 
So, main does have to change a tiny bit. So this is the last new piece of syntax to be honest for a while. C, to be honest, even though it might feel like a whole lot of new stuff at once, it's a pretty small language and so we've almost seen all of the syntactic features thus far so now we'll be able to start focusing more on concepts. Notice the one new piece of syntax is this. I'm not passing in x and I'm not passing in y, I'm passing in ampersand x and ampersand y and can you take a guess as to what the ampersand operator must mean? So the address. That's just the special symbol in C that says don't pass x, that is don't pass a copy of x. Rather figure out where x is in memory, where he is in that frame and provide swap the numeric address in RAM of that value so that swap can go do anything it wants at that address. It is the postal address of x and the postal address of y. So that's the only change we need to make there and watch what happens when we actually run this. If I go ahead and make swap and then run the program called swap, finally for the first time I'm claiming that I've swapped x and y from 1 and 2 to 2 and 1 and, indeed, that is the case because if you look at what main does, it is specifically designed to print x and y twice, both before and after, and this is the first time we've actually gotten the answers we expected. So, what must be going on in swap? Because we just have to reason through now what the * notation actually means. So the *a and *b here in swap's prototype says to expect the address of an int, call it a, and the address of another int, call it b. Well, I still need to make a copy of one of those integers because remember we can't just swap these values simultaneously in this way although as an aside there's actually a neat bit-wise operation with which you can do this -- we'll see this eventually -- but for now you can't just swap them simultaneously because as we've seen we'll lose one of their values. 
So let's instead do this. Let's instead declare another int just as we've been doing all this time. I'm going to call it temp and it doesn't matter where it ends up but conceptually it's in this frame. This thing is going to be called temp and what are we putting there? It looks like we're putting *a. Now this is perhaps, frankly this is really the reason that people tend to get confused with the new piece of syntax because the * means different things in different contexts. Here in the function's prototype it just means the address of. Expect the address of an int, but here in the context of the function itself inside the curly braces, it means go to that address. So, when you say *a, a is again an address, *a inside of the function means go to that address so that means go to that address a, grab its value and put it where? In temp. So that means at the end of that first line of code just as before the value of 1 ends up in a variable called temp because we said go to a and I look around, oh, there's a, what's there? That value. And so I put it there. Now, by that logic the next two steps are pretty much the same. *a gets *b. What does that mean? So *b means go to b, so where is b? So b is, and now let me update this, oh, actually let me take one step back here for a moment. Let me rewind slightly in the story so that I don't actually mislead because I forgot to update these two cells. So, in this program when we call this version of swap, 1 and 2 do not end up here, right? That's the whole point of the solution. We're not putting 1 and 2 here. What are we really putting in these boxes? So the address, right? So the location. Now I don't know what the locations are so let's just arbitrarily come up with some memory addresses. I'm going to call this address 123 and I'm going to call this one 456, right? 
I don't exactly know where they are in RAM, but I know if I've got 2 gigabytes of RAM, byte zero is here, byte 1 is here, byte 2 is here, dot, dot, dot, so maybe byte 123 is over here and 456 is here. Whatever. We just need to know that they are numeric addresses because what that means here is that what goes inside of a is 456 this time, and inside of b is 123, because, again, we're passing into swap the address of x, 456, and the address of y, 123. So literally those numbers are stored in those same chunks of memory. So now let me pick up where we left off. This line of code here, temp gets *a. So now we can tell the story properly. All right, what's a? a is 456. The *a means go to that address so it means, all right, where's 456? Here's 0, 1, 2, ah-ha, here's 456, what is there? One. And so I put 1 into this box so that's the correction I needed to make from before. So now the next line. *a gets *b. So *b means, look at b, all right, so b is this thing here in the swap frame, all right, it's 123. *b means go there, all right? So byte 0, 1, dot, dot, dot, ah-ha, here's 123. What is there? It looks like the number 2 so I'm going to go ahead and put the number 2 where? Well, on the left-hand side now we say *a so it's the same thing. So *a. What's a? a is 456. *a means go there and so where am I going to put the value of 2? Where I am. Right? So I've left my finger there because I've gone there, *a, go there, put what's in *b so that means put the value 2 there. So at this point in the story, the addresses are still in a and b; 456 and 123. The value 2 is still in what we called y. It's now also in what we called x, but notice that's inside of main and here's the power. Now finally swap has the ability, the power, to modify memory that isn't his own, that's not in his own scope because we've passed it in by address. Now, we're not quite there because now we have two copies of 2 so the last line of code says *b gets temp. Well, temp is easy. 
There's temp. It's the value 1, this is sort of week one stuff. Variable stores the value 1, I've got it. *b in the last line of code now means find b, all right, that's 123. *b means go there so that's 0, 1, 2, 3, okay, 123. So the value of 2 is here so now *b gets temp so what goes here? Well, what's in temp? And so now at the end of the story we have, indeed, swapped the values, but the story isn't quite complete because there's still something that's going to happen next. What happens now after we've executed this third line of code, what happens next in the story? So we hit the curly brace so the very bottom of the function and as soon as you hit that, the next line in the story is well, then we return to main and where are we executing in main? Well, if we scroll up where did we begin? We began here so at this point in the story the next thing that's going to happen is this line called printf that says swapped, exclamation point, right? But the moment swap returns, the moment we hit this bottom most curly brace, what conceptually happens in memory? We lose this frame. It gets popped off the stack as one says and, you know what, the values, as we've hinted at with our brief discussions of forensics, they're actually still there. So the fact that I'm erasing it does not mean that the computer erased that memory or overwrote those values with all zeros. They're still there, but the computer has forgotten what is, in fact, there. Now, is that a problem? No. Because a and b and even temp, they were by nature local or temporary variables. They might very well be storing the addresses of memory elsewhere, but we just needed them as sort of a cheat sheet, a little address card to know where the original values x and y were. So, any questions? I can't have told that story perfectly, I'm sure, so any questions? Now is your chance. Really? That well? Ah, damn, okay. >> Student: [inaudible] >> David: So good question. 
So, and this is where, again, things get sort of unnecessarily complicated just because of the syntax. So, when you want to figure out the address of a variable, you say ampersand. That is the address-of operator. The * is the "go there" operator. So, if you assume that a or b are actually addresses, *a means look at those addresses and then go there just as I did with my finger on the board, but in the context of calling swap we use the ampersand because swap needs to be informed of the locations of x and y. swap needs to be told that x is at 456 and y is at 123. Well, the mechanism that C gives us for asking those questions is just the ampersand and so ampersand x in this line of code here will literally return according to our story the number 456, and ampersand y will literally return in our story the number 123. So it's 456 comma 123 that are passed into swap and that's why those same numbers end up in a and b. Yeah? >> Student: [Inaudible]. >> David: Yes, if you really, if you want to reimplement the solution from last week whereby it was buggy, you can absolutely say take the address of x then go there, thereby undoing all of the work you just did. If you really want to be obnoxious, you can do this because really these operators just undo themselves, right? One says get the address, one says go there. Get the address, go there. So, in fact, you could do this. Now, you might in some context have to add the right kind of parentheses so that GCC or your compiler doesn't get confused, but really they just reverse each other's processes. Ampersand gets the address so you can use it. Star means here is an address, go there. That's all. And now the point just to tease apart, I can't erase that, let's fix, so now the one detail that's worth pointing out is that this here, the * in swap's prototype, and the * here in swap's prototype do not mean go there. 
It means something different and this is sort of just stupid re-use of syntax although frankly it wouldn't really be much fun to have yet another symbol so they went with the same symbol, which is pretty reasonable. In the context of a function prototype this just means expect a to be the address of an int and expect b to be the address of an int. They don't mean go there. They only mean go there when you're inside the curly braces. So that's the only distinction. Other questions? All right. So let's take a look then at somewhere where this is actually a useful thing to know so this is compare1.c. So, in compare1.c I've stripped out the comments in my version but in your version you do have comments for reference and it's actually pretty self-explanatory if you just read through the code. At the top, I've got some CS50 Library going on, standard I/O library; I don't bother mentioning argc or argv because I'm not going to use them in this program. I'm going to use the CS50 Library instead for user input. I'm saying, say something, then I get a string from the user and I call it s1 and then I say, say something, and then I get another string from the user and call it s2 and apparently this program's purpose in life is to tell me yes or no, the user said the same thing both times. Now, we're using the equal equals operator and this conceptually is correct. This is the equality operator. It would definitely be wrong if we were doing this because, little sanity check, if you've just got the single one it means what instead? Assignment. So that would, this is actually a very common mistake and it makes sense if you think about it. 
If you accidentally do this in your code, which is very commonly done, you're saying if s1 gets s2 so you're literally saying copy s2 into s1 and then you're asking the question if this expression is true, so it turns out if you've ever encountered this what's going to happen there if s2 is anything other than zero, if it's 1, if it's 2, if it's 3, if it's negative 1, negative 2, negative 3, well, both s1 and s2 are going to become the value 1, 2 or 3 or negative 1, negative 2, negative 3, but because it's non-zero what does the if condition evaluate to? So true. So even though we think of conditions as being Boolean values true or false, really underneath the hood true means anything other than zero. False means zero and so if you've run into this in your own code and for some reason it's this first branch that's always executing or almost always executing odds are it's because you've assigned one variable to another, they both happen to then be non-zero and so, of course, the first branch is going to execute because again a Boolean expression really is equivalent to anything but zero is true and zero is false. Now, this was not the bug in this program because I did, in fact, use the equality operator and so it feels like if I type the word foo both times I should, in fact, get back you typed the same thing. So, let's try this. So make compare1 and then I'm going to go ahead and run compare1, enter. So, I'm going to type foo enter, foo enter and I typed different things. All right. Let's try again with maybe a different word. Let's just type my own name. That's definitely the same though. Let's try foo and bar. I know this is wrong. So, it seems to always say I typed different things. Now, why is that? Well, if we look back at the code here, take a look at what we're really doing. We're assigning to s1 the string that the user typed in. 
We're assigning to s2 the other string that the user typed in and then we're comparing them, but today we now begin to take off the training wheels that are the CS50 Library or really more technically we're going to peel back the layers that we've been borrowing from the CS50 Library and really look under the hood at what's been going on. Well, we've actually said this already. The word string is just a synonym for what? char *. So all this time and frankly this is why we hide this detail in the first week or two because it's just not interesting to get into that early. It's just a distraction. Really this program is identical to this. So, if we now apply the logic from today, char * s1 means that s1 is not a char, it's instead what? A pointer or the address of what? The address of a char, right? Because if *a meant the address of an int and just call it a, well, then *s1 means this is just the address of a char and so it turns out what's really happening underneath the hood with the CS50 Library is we're not handing you a string in the same sense we're handing you an int because the string recall is just a sequence of characters back to back to back to back. Well, if I've got a five-letter word like D-A-V-I-D, well, that's like five bytes and yet we only have the ability thus far in this class to return one thing at a time I can't return five bytes to you, but wait a minute, those bytes by nature of a computer are just stored in RAM. D-A-V-I-D back to back to back so if I want you to have access to that string, you know, I don't have to return all five characters to you. What could I just return instead? Just the first one or more generally the address of the first one and then, man, I'll just figure it out from there where the rest of the letters are because by definition of a string, they're back to back to back. 
So I just have to see, oh, here's a D, let me keep looking, here's an A, keep looking here's a V, here's an I, here's a D and yet, and this is one tidbit we introduced a week or so ago, how do I now know if I'm just given the address of the start of the string where the end is? Right, so there's that special sentinel value that's back slash zero, which we'll start to see more explicitly now. This was the special value that said to the computer string stops here and because you have that barrier given to you at the end of every string, just the most efficient way you can pass strings around is just by passing the address of their very first bytes, the location in RAM. So, what are we really doing in this line of code here? What are we really comparing? The addresses. Now, if I've asked the user for a string and then a moment later I ask the user for another string, well, they're going to end up in different locations in memory just by nature of get string. If you called get string multiple times, surely you've been able to get different strings from the user. So they are, in fact, returning different chunks of memory as we'll soon see but when you're comparing s1 against s2 you're literally saying does the address of the first string equal the address of the second string and hopefully that's not the case because otherwise get string is pretty useless if it only returns the same chunk of memory again and again. You could never ask the user for more than one word or more than one sentence. So, in other words, if we apply some of the storyline from the previous tale, if s1 happens to be 4, 5, 6 and s2 happens to be 1, 2, 3, you're just asking the question is 4, 5, 6 equals equals to 1, 2, 3? Well, obviously not so you're always going to say you typed different things. Well, let's try; let's be a little resistant here to this. Let me go ahead and open copy1.c. Let's actually try to really manipulate these strings in a more deliberate fashion. 
This program is not too long. It's just a few lines here and I'll introduce one new function as well. Here in this version this is copy1.c, which you should have a printout of. I again say, say something. Now, I've taken off the training wheel of the CS50 Library and I'm just flat out saying char *. No more strings. They are char *s because we now know what that means. So, I'm getting a string and so what really am I doing in this line of code? Well, get string again is returning the address of the string the user typed in, the address of the very first character like the letter D and storing that address in this variable s1. Now, I mentioned this other sentinel value a while back. If a get string returns null, that pretty much means something went wrong and there are a few things that could go wrong. One of the perhaps easiest ones to put your finger on is what if the computer is out of memory? I mean what if you're running so many things, what if the user has copied and pasted their thesis and just pasted it at the blinking prompt such that you're now out of memory because your computer is somewhat limited in memory so get string cannot possibly return all those characters or fit all of those characters in memory and return to you the address of the first. It can't do any of that. And so get string is defined as we'll see as just returning this special value in all uppercase called null which is its way of saying there is no address because, in fact, what null really represents under the hood is just similar in spirit to this, but it's just this value 0x00. This just means hexadecimal notation, which we'll come back to before long so really it just means it's returning the address zero. Now the address zero I said I think on Monday is special. 
Only the operating system has control of byte zero in the computer's RAM, and so if a function ever returns null, aka zero, well, something must have gone wrong because that memory can't possibly belong to me, because by human convention zero is owned by the operating system, not by a program I wrote. So, I just return 1. Why 1? It's arbitrary, but it's anything other than zero, so I just exit because something went wrong and this program is just going to bail. Now, what's really going on here in this line of code here, char * s2 gets s1? Well, let's just reason through it. s2 is what data type now? It's not a char. It's instead a? So it's an address, it's a pointer, address, pointer, synonyms for now, so it's the address of a char, so this makes sense because s1 is also the address of a char, so if I wanted to make a copy of that address this is absolutely the right syntax. It's char * s2 gets s1. So that is actually making a copy of the address and putting it in s2. What is it not doing conceptually though? It's not at all copying the string. It's not copying the F-O-O or the D-A-V-I-D. It's only copying the address that was returned by get string, so even though conceptually, as the name of this program suggests, I really just want to make a copy so that maybe I can make one version of D-A-V-I-D or F-O-O all uppercase or all lowercase or I want to spell check, something. I want an actual copy in memory so that that string is in two different places. This is not going to be the way to do it. So, let's do a little sanity check now and let's try to demonstrate as much. I now claim I'm going to capitalize the copy of the string I just made. First, I'm going to do a sanity check, so we've used string length, strlen, before. 
Turns out you really want to start using string length any time you're dealing with strings or now char *s because string length was written a long time ago and it was not designed itself to check if the length of a string is zero, so if you don't check whether the length of a string is zero and you pass in essentially the address of a bogus chunk of memory, you pass in zero, string length is not going to return and say retry. It's instead going to do what do you think? It's going to segfault, right? If you pass a function in C a value that it's not expecting, bad things happen and bad things generally reduce to segfault. So, I'm going to check for it myself. It's very easy to do. String length takes a string as its argument, or now aka char *, so I can just say is the length of s2 greater than zero? If so, I've got some letters to capitalize. So, what do I do? Well, this is syntax you might have used in caesar or vigenere. You can treat a string as though it's an array because it really is. It's just a chunk of characters back to back to back, so this is storing at the very first location in the string what? It's storing what there? It's passing to this function called toupper, which if you've never used it actually does what it says, it makes it uppercase. So, I'm passing in the first character in s2, I'm making it uppercase and then I'm putting it back, so casually speaking this is just capitalizing the first letter of whatever word the user typed in to s2. That copy, that's all. It's not capitalizing the whole thing. Because I've hard coded the zero, it's just capitalizing the first character. So in the very last two lines of code I'm saying this, the original string is s1, the copy of the string is s2, but if I haven't lost you, what are we really going to see when we print this? Someone assert. >> Student: [inaudible]. >> David: Both capitalized. We're going to see them both capitalized because I've made a newbie mistake here. 
This is not actually copying the string and giving me two different versions of foo or David or whatever I typed in, it's actually just giving me two copies of the address and so when I do this here manipulating the zero character at address s2, well, you know what? If s2 equals s1, guess which character you're also changing? The original because it's the same thing. You've made a copy of the address. So let's actually see this in action. So make copy one. All right, I'm going to run copy one enter. Let's type in foo. All right, so looks good, right? Not a problem, but now let's actually type in something in all lowercase. F-O-O in lowercase and, in fact, it is buggy because I typed lowercase F-O-O and yet I get back capital F-O-O as both the original and the copy and that's simply because I haven't done this correctly. So, any questions on that? Yeah? >> Student: [Inaudible]. >> David: Correct, yep, char *s2 gets s1. >> Student: [Inaudible]. >> David: So the, why do I need the *? >> Student: [inaudible]. >> David: So the reason here, in a nutshell and we'll come back to other examples that demonstrate this, in a nutshell because I declared s1 to be a char *, s2 if I'm going to make a copy of s1, it has to be the same data type and so recall that just as when, and actually maybe this is my fault, just as when you declare a function prototype, which we did a moment ago, recall that in swap.c when you declare a function prototype and you, therefore, declare the variables, you say * to denote this is going to be a pointer, and I did not say this before. When you declare a pointer yourself manually, you do say char * the variable name because recall that's the same thing that we did earlier but we called it instead string. And so if we wanted to do this here, it's again, the same thing we're just now pulling back this layer and calling them char *s not actually strings. >> Student: [inaudible] >> David: And the * will have other powers that we'll soon see. 
Why don't we take a three-minute break? All right. So, we're back. So this is the picture we've been using a whole lot. And, again, the rectangle represents your computer's RAM, the bottom represents the part of RAM that we generally call the stack, main conceptually ends up on the bottom of the stack followed by its local variables, then the function, say foo, that it calls, and on and on and on and up, but there is, in fact, something above all of this and we've seen this picture briefly and that's this thing called the heap. So, in memory, you have different zones if you will. The bottom, again, is generally called the stack, but it turns out there's stuff even lower than that conceptually, things called environment variables. Above, in fact, memory at the very top conceptually is the text segment, and so it turns out that even though it's given a strange name, the text segment in RAM is actually the zeros and ones that you compiled, and then when you ran your program, a program just like on your Mac or PC gets loaded into memory. You double click an icon, the program gets loaded into memory, well, conceptually where does your program end up? It ends up at the topmost portion of the computer's RAM. All right, so it has to live in RAM as opposed to the hard drive because otherwise things would be terribly slow as you know, so it's much better if your programs live, while they're running, in RAM and they end up in what's called the text segment. Now, as an aside, there's another couple of layers near the very top, above the stack and above the heap, but below the text segment, and those are called initialized data and uninitialized data. So it turns out if you've ever declared a global variable, which we did once in lecture, and if you've tackled the game of 15 already you'll know that the board and the dimensions of the board are very intentionally by us declared to be global at the very top of fifteen.c. 
Well, the implication of that in RAM is if you declare global variables outside of all your functions at the very top of your file, they don't end up in the stack; they end up way, way up at the top of memory in either the initialized data segment, if you assigned it a value with the equal sign, the assignment operator, or they end up in this part, the uninitialized data, if you just say int x; and don't actually give it a value yet. So, conceptually if you've ever wondered why you get access in all of your functions to global variables, that's because they're not down here, they're at the very top of RAM and any function can access that RAM way up there, but for now the interesting player in the story is this thing called the heap. So the heap is a chunk of memory in a computer's RAM that's conceptually allocated to what's called dynamic memory allocation. There are absolutely going to be times where you're running a program where the programmer, say you, couldn't possibly know in advance how much RAM the program was going to need. For instance in the past, we had that silly little program for computing the average of some quizzes and it was actually a pretty bad implementation because I had essentially hard coded in the number of quizzes. We introduced a feature of C called a constant, but how many quizzes did I assume every student has? So, two. Now this is somewhat annoying because if we in some future term have three quizzes or four, or we want to use this code for other courses that have a weekly quiz, we'd have to recompile the program every time we want to change that value, and that's just annoying, especially if you're writing software that needs to actually end up on consumers' computers; you can't expect them to wait until you change your code, recompile or give them an update. Why not write the program in a way where you figure out dynamically, when the program is run, how much memory you need rather than hard coding in two with that constant? 
The heap offers us a solution to that problem. In fact, consider the CS50 Library. You've been calling get string; you've been calling get int, maybe get float and a couple of others. Well, we certainly didn't know on day one how many times each of you was going to want to call get string, how many words a user might type when you call get string. So, certainly the CS50 Library is designed to be dynamic and, in fact, any time you call get string, we are, in fact, allocating a chunk of RAM but it's not coming from the stack; it's actually coming from this portion of memory called the heap. So, let's actually take a look at how this might actually affect us. So this is the code that we had a moment ago for copy one. No, for copy1.c. Let me pull this up. So this is copy1.c, same program as before break, let's really see what's going on here. So, let me go ahead and let's swivel this around for just a moment so we have a blank slate, and now what's really going on? Well, what's nice about this program is that there's just one function, main, so we don't need to draw the stack and get things all complicated. We can just treat the whole blackboard as main's frame. So, I'll be a little looser now where I put things. So, we say, say something. Then I go ahead and declare a char * called s1. Well, it turns out on most computers an address of a location in memory, aka a pointer, is itself 32 bits. They can be 64 bits, but very often on systems today pointers are 32 bits. Now that's changing; the more years that pass, the more of you have 64-bit computers and the more servers have many, many gigabytes of RAM and so you need actually 64 bits, but for now let's assume a common system whereby a pointer just by definition of the hardware is 32 bits. Now, I mention this why? Well, all this time on the blackboard I always draw an int as a square. 
Well, int is almost always 32 bits, at least so far as we've seen, so, in fact, any time I draw a pointer henceforth I'm just going to draw a square as well because they are, in fact, the same size usually in memory. So, here we go. I have a variable called s1 and it's of type char *, so here we go. It's a little square, it's going to be called s1 and that's what exists as of line two of this program, but now I've called get string. Get string, recall, does not use the stack. It uses this thing called the heap. I have no idea where that is other than to know that it's up, so let's just go ahead and assume that everything over here is the heap, so what has just happened? Suppose I type in the word foo, F-O-O. What get string does, and we'll eventually see the actual source code for CS50's Library, what get string does is it allocates in this chunk of memory called the heap enough RAM for the F, for the O, for the O and for the backslash zero. So it does that for us so that you can get away with just knowing the address of the first byte, and it will make sure that you know when to stop by including the special value, so, in fact, with get string, if you type in a three-letter word, we allocate four bytes no matter what because we need an additional byte for this special sentinel value, backslash zero, at the very end. So now at this point in the story get string is about to return because the user just typed in F-O-O enter. It can't return a chunk of memory like it can a copy of an int, so it can't return the whole thing, so what again is returned? So it's the address. Now, again, what's the address? Frankly I don't know. I mean you might have two gigabytes of RAM. Down here we said this is like 4, 5, 6 and 1, 2, 3 so this is going to be memory location 7, 7, 7, 7, 1, whatever. It's pretty arbitrary, but it's big relative to the numbers we were talking about before. 
All right, so I probably should have chosen a shorter number because now I can't fit it in here, but let's call it 71. So, the address is 71, so what ends up in s1? Well, literally the number 71. That is the address in memory, in the heap, of the first byte that the user typed in. Now, frankly this is going to very quickly get distracting, because who cares, certainly for the class's purposes, where things are in memory, and for you, who cares where things end up in memory? You just care that they end up somewhere in memory, so what the world generally does is not bother coming up with these contrived examples like 71 for this address and 1, 2, 3 and 4, 5, 6. Instead, if we want this pointer to represent the address of something, inasmuch as it points at that address, let's just draw an arrow. So, for the most part any time we talk about or draw pointers an arrow suffices, even though what's really in there is a number like 71, which is the literal byte that the F is actually in in RAM, but frankly who cares? Let's just assume that s1 is a pointer and, as an arrow suggests, it's pointing at this byte here. So, after the second line of code here, char * s1 gets the return value of get string, this is what the state of our world looks like. This is main's frame here on the stack. This is the heap somewhere else in memory, but they are, in fact, in distinct locations. So, now I check. Is s1 equal, equal to null? Well, it's not. It's not zero. It's 71 or whatever, so that if condition doesn't apply, but now what do I do in this next line? This is the one we concluded by talking about earlier: char * s2 gets s1. Well, I know how big a pointer is. It looks like a square because it's 32 bits I'm told. This thing is called s2 so let's just label it s2 and now s2 gets s1. Well, the assignment operator makes a copy of the thing on the right and puts it in the thing on the left. Technically speaking what's here? Seventy-one. So, what really ends up here? Seventy-one. 
Again, stupid, arbitrary number to choose; let's just update our picture. At this point in the story both s1 and s2 are literally pointing at the same location in memory, so that's now what the story looks like. So, now what happens next? I next say I am capitalizing the copy dot, dot, dot. I then do a sanity check. Is the string length greater than two or greater than zero? Yeah, there's like three letters there so that condition doesn't cause problems. So now I do this. s2 bracket zero. So s2 is this guy. It turns out that the bracket notation just means that this is essentially an array and you know what? It is. It's drawn like an array, it effectively is an array, so bracket zero means go to the zeroth location in that array, which happens to be F, and do what with it? Well, call toupper, pass this lowercase F to this function called toupper, it's going to return capital F, and so what do I assign to s2 bracket zero? Well, the return value of toupper, so that literally changes this. Now notice the key takeaway of that example was s1 is pointing at that first letter; s2 is pointing at that same letter. It doesn't matter that I happen to get there by way of s2; if they both lead to the same destination, the end result is going to be that they both, in fact, look like capital F, lowercase o, lowercase o. Any questions on that part of the story? Yeah? >> Student: [Inaudible]. >> David: Could you speak up a little louder? >> Student: [Inaudible]. >> David: Okay. So good question. So, generally speaking when a function returns, the memory that it used gets overwritten or disappears. First, that's actually not applicable here because in this story there's only one function that I wrote at least, called main, so there's no notion of one stack frame then another then another and then those being popped off. >> Student: Yes, that's the question. >> David: So, correct. Okay, so good point. 
So there are function calls going on, not that I wrote but that the CS50 Library wrote. The get string function does, in fact, result in this process of a frame going on the stack, then it gets popped off when get string returns, so that part of the story is, in fact, consistent, but what's saving us here is that get string allocates these bytes for F-O-O and then the backslash zero not on the stack. It's not a local variable. We'll see in a moment there's a special function. Its name is malloc, for memory allocation, and what malloc does for us is we say, hmm, the user has typed in a three-letter word. I need four bytes, and so you essentially call malloc of four and this function is going to grab from the heap the address of four spare bytes that are not currently being used by anything else, and the fact that get string is using the heap and not the stack means we avoid that inherent problem that we keep tripping over with swap because we're using a different chunk of memory altogether, and that, again, is why the heap is now such a compelling part of the story. It gives us the power to have complete dynamism in our program. We don't have to worry about memory disappearing just because my function is done executing. So let's actually see how we solve this with this function, with actual code. So, copy2.c is also among your printouts and there's one more line of code here that solves this for us, but the rest of it is pretty much the same. So, let's take a look. This is copy2.c. At the very beginning I, again, demand say something and then I declare s1 to be a string, aka char *, and I store in s1 the string the user types in. Now, that's not quite proper. I store in s1 the address of the first byte that the user typed in, and by the way that first byte happens to live in this new place called the heap, and that's the only update to the story thus far. Null does not apply. 
Let's assume that the user typed in a pretty short word we didn't run out of memory or anything crazy so here's the new feature. It looks a little cryptic but it actually just does what it says. So, malloc is a new function. It's defined in a header file just FYI called standard lib. Not stdio but stdlib.h. So that's why I've included that header file. malloc as its name suggests allocates memory and it takes right now a single argument. It just needs to know how many bytes of memory do you want? Well, I don't know yet and although, yes, I do in this story. If the purpose of this program is to make a copy of the string. Well, the user, I don't know in advance what the user is going to type in, right? I certainly don't want to assume they're always going to type a three-letter or a four-letter or a two-letter word I want some dynamism but that's fine because get string can get a string of any length, I can then use the string length function to just ask while the program is running how big is the string that I was handed? How many characters are in it? In this case, I'm going to get back the answer 3 because the user typed in F-O-O, but wait a minute, what's with this? Why the plus 1? The back slash zero, right? Because you have so much control over the computer now, yes, the word is three letters, F-O-O, but if you're going to make a copy of this thing, you better adhere to the conventions of the computer, which allocates one additional byte for that back slash zero so I have to tell malloc I need three, the length of s1 plus an additional 1 and now this is actually just a little safety net. So, I'm now multiplying by the result of calling the size of operator, which we've seen before. We did it for silly purposes just to see how big each data type was, but on most systems the size of a char is what? How many bytes? One. So, on most systems if you didn't remember that, no bigger. 
FYI, it's almost always 1, so this line here calling sizeof char is pretty much equivalent on most computers to doing this by just typing in 1, at which point, if you're just going to multiply by 1, you're just wasting time, why not just eliminate that altogether? But if you're writing code that you want to be able to execute on today's computers and the computers that come out next year, where maybe the chars are two bytes, the right way to write code that does not crash when hardware changes is to actually include something like that times sizeof char so that if it does change at some point, your program is not going to break. So in the end, this is just saying allocate me as many bytes as were needed to store s1 itself. What am I storing in s2? What must malloc be returning? It's the address of that chunk of memory. So, what's happening here in the story is this. Now, again, malloc always gives you memory from the heap, not the stack, and so as an aside the CS50 Library uses, again, malloc to get memory to use. When I call malloc of that crazy arithmetic expression, I'm really just calling malloc of four because I need 1, 2, 3, 4 bytes. What malloc does for me is it finds in RAM somewhere where there's four bytes in a row. I have no idea what's here at the moment so I'm just going to draw a question mark because that memory might have been used previously for some other purpose, but we know it's currently available to us so we have four bytes of memory. What does malloc return? It returns the address of this first byte, so really the address of the first char here, and so what gets stored in s2 now? We're actually making a legitimate copy with this version of the program so I don't have this bug anymore. I'm going to delete that arrow and actually draw s2 as pointing to this chunk of memory because whereas before this sequence of chars might have lived at address 71 or whatever, well, this one might live at 91. 
Somewhere else in memory, but it's a different number, hence it's a different arrow. It's pointing at a different place in the heap. So now a sanity check. Is s2 equal equal to null? Could be. If the user typed in a really long word, we're out of memory, something else went wrong, could be. So, I have to check for that, but nine times out of ten there's not going to be a problem, so now let's do this. Well, what does it mean to copy a string? These question marks are not the string yet, right? All I did was ask for memory. If I want a copy of the string, I've got to whip out my week 1 skills of just iterating with a for loop from left to right and make copies of those characters. So, what am I doing there? So, I'm initializing a variable called n to the string length of s1. So conceptually that's going to be what, 3 or 4? What's the string length of s1? >> Student: Three. >> David: So it is 3. Even though there's four bytes, the length of the string is consistent with what a human being would interpret as the length of the string, which is 3. So, what do I next do? I iterate from i equals zero on up to less than n, so I'm at this point in the story here, i plus plus each time. Well, this is, again, kind of like caesar stuff or vigenere where you just copied characters back and forth. So this has the effect of starting at location zero and I do s2 bracket i gets s1 bracket i. Well, what's s1 bracket i? Here's s1, it's pointing to this array, bracket i is bracket zero initially so it's pointing at capital F. Same deal. s2 bracket i is s2 bracket zero, which means here, so the question mark now gets clobbered or overwritten with literally what was above it here. Now what happens here? 
Well, this question mark becomes an O, this question mark becomes an O, and then the loop terminates because it's iterating from zero to N so that's zero, 1, 2 and the length of the string is 3 so the loop terminates, but I remember that I needed to have this special sentinel value so I'm just going to put it there manually. I could have done this in another way but I am going to just be extra deliberate so I make sure that this copy ends with back slash zero and voila, now, I have a copy. So, if I compile and run copy2.c now let's see what happens. Make copy2, all right, run copy2 enter. I'm going to go ahead and type foo in all lowercase enter and finally now I seem to have a working program because this foo is at a different location than this foo and so I have, in fact, copied the actual copy or I have actually capitalized the actual copy. Any questions on this copy program? Anything at all? All right. So let's look at one other example that we can then visualize with a fun video of sorts. So, let's see this eraser actually never works very well so I'm just going to draw below it. So, suppose I'm in a function, I need the space. All right. All right. So I am going to on the fly write a very simple little main program that just illustrates some of the syntax that we can then visualize more interesting. So, int main, I don't care about any command line argument so I'm just going to say void. All right so now here's the start of my program. It's not going to be very interesting yet, but it will allow us to tease apart these ideas with, well, we'll see some Play-Doh. All right so I want first a pointer. I'm going to call it x and how do I make a pointer to an int? Well, I use the * notation. So we've seen this before. So I have int *x. All right. Now I'm going to have another pointer to an int. I'm going to call it int *y enter or, sorry, semicolon, and then what if I do this? Based on the lessons we've been telling if I say *, you know what I have a computer. 
Why am I using chalk? All right. [Laughter] There we go. Let's modernize this program in main, boy, this is a lot easier it turns out. Okay, and all of you can actually see it. So, int *x. Now int *y, so now I'll use the board for things I can't really draw very well with the keyboard, so what does the memory of my program look like at the moment? Well, at this point I have two boxes. A square, it's called x, another square, it's called y, and now this time I'm doing pointers to int, not pointers to char. So, should I be drawing these boxes four times bigger since ints are four times bigger than chars usually? So a little sanity check. No. Even though the keyword has changed from char to int, the * is the same, and the fact that I've got a * there is saying this is a pointer to an int or a pointer to a char. Pointers are always the same size on a computer. They are always, say, 32 bits or always 64 bits; they do not vary in size based on what they're pointing at. All right, so now I have this part of the story. Let me go ahead and do this. At this point in the story you know what's inside of this thing? Well, who knows, frankly. Question mark. Right? I've allocated a variable, it happens to be a pointer; it's still a variable, who knows what's in it, right? We've seen in class that you get garbage values if you don't initialize something to a value so let me actually initialize x to a value. I'm going to now use this new, fancy function called malloc and I'm going to say x gets the return value of malloc of the size of an int. So what does this line of code do? Well, somewhere in the world there's this thing called the heap, foo is going to be no more. This is a new program, but I'll keep drawing the heap over here. So, when I call malloc of size of int, and an int on this system is probably 4 bytes or 32 bits, so what am I getting back? Well, I'm going to get back, let's just draw a bigger square than usual that's roughly four times the size. 
This is now on the heap, it's four bytes, so what gets stored in x? What's that? Yeah, a pointer to this chunk of memory. Conceptually it's this: x has been assigned the return value of malloc, malloc returns the address of the chunk of memory that you've asked the operating system for, so pictorially we just have an arrow. Now, really underneath the hood? Well, maybe this ended up at location 789 so what's really in x is the number 789, which is the address in memory, but again, uninteresting. The picture is more compelling with an arrow at this point. So, now I have x pointing at a chunk of memory, what's in this memory though? Honestly who knows. It's just some random chunk of four bytes that happens to be available at this point in time, but I can put something there. Let me go ahead and put the number 42. So, is this correct? >> Student: No. >> David: Okay. So, leading question. It's not correct, but why? If I do x equals 42, I'm not going to put the 42 here, where am I going to put it? It's going to go here, which means I'm going to lose this arrow, which means I've just asked for memory, I've lost track of it, and so we mentioned in week zero I think the notion of a memory leak, which often creates the symptom of your computer slowing down, you need to reboot, it just gets slower and slower and slower. Well, if you forget where the memory is that you asked for, well, in fact, that is how you make a memory leak. You just forget where the memory is that you asked for on the heap and it's not going to get cleaned up on the stack because malloc puts it somewhere else. So this isn't quite right. We need to tell the computer go to the address in x and put 42 there, so is the symbology there & or *? It is *. Star means go there. So, now we have said go to the address stored in x, which is who knows, it's over there, put the number 42 there, so what I've just drawn is this part of this story here. Well, now let's do something stupid. 
Let's actually screw this up and let's actually say, hmm, how about *y? Let's put something there. *y gets 13, an unlucky number. In fact, what have I probably just done? Segfault. And why might I possibly have done that? Why did that possibly happen? *y gets 13. Same exact syntax. Well, yeah, we don't have memory being pointed at. In fact, we kind of do in a sense. This question mark, I mean, it's obviously not a question mark. There's some bogus value here like 999. Why not, whatever? It's just a remnant of a previous program or a previous function, the number 999, so what does that really mean? That means pictorially this y by default is pointing somewhere, but who knows where? And so when I say *y gets 13, okay, I interpret, misinterpret this value as a pointer and I say go to location 999. That is here in RAM. I can certainly try to write the number 13 there, but bam, my program very likely will crash if this chunk of memory was not given to me previously by the operating system and maybe it is owned by the operating system. Maybe it's really close to the number zero, to null, and, therefore, I'm absolutely going to crash. So, this was bad. So, we'll have to fix that but let's do something correct. Can I do this? y gets x? So, let me say this is bad, this line. y gets x, this is legitimate. Right? If I want to store in y the same thing that's in x, that's just like drawing what on the board here? An arrow to the same thing, right? We played that game last time, right? y gets the same value as x so pictorially it points to the same place and so now with the last line of code what if I do this, *y gets 13? How does the picture change? The 42 becomes a 13 and just as we did for the incorrect copy, now both what x is pointing at and what y is pointing at has, in fact, changed in memory. It becomes a 13 and that is consistent with the code and it also doesn't crash because now y has been assigned a correct value. 
So this we would want to delete in order for the code not to run the risk of crashing, but let's now see, this was made by an excellent teacher out at Stanford University. It's kind of one of these things that gets passed around quite a bit. Allow me, it's about 2, 3 minutes here, to introduce you to someone called Binky. Though this might look silly and it is, in fact, Play-Doh, it is a professor at Stanford University so it's legit. All right. [ Video playing ] >> Hey, Binky. Wake up. It's. >> David: And that is the professor's voice. [Laughter] >> Time for pointer fun. >> What's that? Learn about pointers? Oh, goody. >> Well, to get started I guess we're going to need a couple of pointers. >> Okay. This code allocates two pointers which can point to integers. >> Okay. Well, I see the two pointers, but they don't seem to be pointing to anything. >> That's right. Initially pointers don't point to anything. The things they point to are called pointees and setting them is a separate step. >> Oh, right, right. I knew that. The pointees are separate. So, how do you allocate a pointee? >> Okay. Well, this code allocates a new integer pointee and this part sets x to point to it. >> Hey, that looks better. So, make it do something. >> Okay. I'll dereference the pointer x to store the number 42 into its pointee. For this trick I'll need my magic wand of dereferencing. >> Your magic wand of dereferencing? That's great. >> This is what the code looks like. I'll just set up the number and. >> Hey, look, there it goes. So, doing a dereference on x follows the arrow to access its pointee. In this case, to store 42 in there. Hey, try using it to store the number 13 through the other pointer y. >> Okay. I'll just go over here to y and get the number 13 set up and then take the wand of dereferencing and just, whoa. >> Oh, hey. That didn't work. 
Say, Binky, I don't think dereferencing y is a good idea because, you know, setting up the pointee is a separate step and I don't think we ever did it. >> Good point. >> Yeah. We allocated the pointer y but we never set it to point to a pointee. >> Very observant. >> Hey, you're looking good there, Binky. Can you fix it so that y points to the same pointee as x? >> Sure. I'll use my magic wand of pointer assignment. >> Is that going to be a problem like before? >> No, this doesn't touch the pointees. It just changes one pointer to point to the same thing as another. >> Oh, I see. Now, y points to the same place as x. So, wait, now y is fixed. It has a pointee. So you can try the wand of dereferencing again to send the 13 over. >> Okay. Here it goes. >> Hey, look at that. Now, dereferencing works on y and because the pointers are sharing that one pointee, they both see the 13. >> Yeah, sure, whatever. So, are we going to switch places now? >> Oh, look, we're out of time. >> But. >> Just remember the three pointer rules. Number one, the basic structure is that you have a pointer and it points over to a pointee, but the pointer and pointee are separate and the common error is to set up a pointer but to forget to give it a pointee. Number two, pointer dereferencing starts at the pointer and follows its arrow over to access its pointee. As we all know, this only works if there is a pointee, which kind of gets back to rule number one. Number three, pointer assignment takes one pointer and changes it to point to the same pointee as another pointer, so after the assignment the two pointers will point to the same pointee. Sometimes that's called sharing and that's all there is to it, really. Bye-bye now. >> David: So this is just the beginning of this new power you now have. We will return on Monday and do yet more powerful things with this. See you then. ==== Transcribed by Automatic Sync Technologies ====