DAVID MALAN: Hello, and welcome back to CS50. So this is the end of week four. Just one announcement first. So the so-called fifth Monday is coming up this coming Monday. This is the opportunity to change from SAT/UNSAT to a letter grade, or from a letter grade to SAT/UNSAT. Annoyingly, that process does require a signature, because you have to fill out one of those pink add/drop forms. Because technically, the SAT/UNSAT version and the letter grade version have distinct catalog numbers. But no big deal. Just come up to me or to Rob or to Lauren at any point. Or email us if you don't have the kind of paperwork you need today, and we will be sure to help you take care of that before Monday. All right, so today-- actually, there's a bit of an echo. Can we tone me down a bit? OK. So today, we introduce a topic known as pointers. And I'll admit that this is one of the more complex topics that we tend to cover in this class, or really any introductory course that uses C. But take my word for it, particularly if your mind feels a bit more bent today and in the weeks to come. It's not representative of you getting any worse at this. It just means that it's a particularly sophisticated topic that I promise, a few weeks hence, will seem all too strikingly straightforward in retrospect. I still remember to this day. I was sitting in Eliot Dining Hall, sitting next to my TF Nishat Mehta, who was a resident of Eliot House. And for some reason, this topic just clicked. Which is to say that I too struggled with it for some amount of time, but I will do my best to help avoid any such struggle with a topic that ultimately is quite powerful. In fact, one of the topics we'll discuss in the weeks to come is that of security, and how you can actually exploit machines in ways that were not intended. And those exploitations are typically the result of bugs, mistakes that we people make by not understanding some of the underlying implementation details via which programs are made.
Now to make this seem all the more user friendly, I thought I'd play a 10 second preview of a little claymation figure named Binky who was brought to life by a friend of ours at Stanford, Professor Nick Parlante. So allow me to give you this teaser of Binky here. [VIDEO PLAYBACK] -Hey, Binky. Wake up. It's time for pointer fun. -What's that? Learn about pointers? Oh, goodie. [END VIDEO PLAYBACK] DAVID MALAN: That is Stanford computer science. So more on that to come. [APPLAUSE] DAVID MALAN: Sorry, Nick. So recall that last time we ended on this really exciting cliffhanger whereby this function just didn't work. At least intuitively, it felt like it should work. Simply swapping the values of two integers. But recall that when we printed out the original values in main, one and two, they were still one and two and not two and one. So let me actually switch over to the appliance. And I wrote up a bit of skeletal code in advance here, where I claim that x will be 1, y will be 2. I then print out both of their values with printf. I then claim down here that we're going to swap them. I left a blank spot here for us to fill in today in just a moment. Then, I'm going to claim that the two variables have been swapped. Then I'm going to print them out again. And so hopefully, I should see 1, 2. 2, 1. That's the super simple goal right now. So how do we go about swapping two variables? Well, let me propose here that these cups might represent memory in a computer. This is a few bytes, this is another few bytes. Could we have a volunteer come on up and mix us some drinks, if familiar? Come on up. What's your name? JESS: Jess. DAVID MALAN: Jess? Come on up, Jess. If you don't mind, we have to put the Google Glass on you so we can immortalize this. OK, glass. Record a video. And OK, we are good to go with Jess here. All right. Nice to meet you.
So what I'd like you to do here-- if you could, quite quickly-- is just pour us half a glass of orange juice and half a glass of milk, representing effectively the number 1 in one cup and 2 in the other cup. This is going to be good footage. JESS: Sorry. DAVID MALAN: No, no. It's OK. Nice. All right, so we have four bytes worth of orange juice. We'll call it the value 1. Now another four bytes worth of milk. We'll call it the value 2. So x and y, respectively. All right, so now if the task at hand-- for you, Jess, in front of all of your classmates-- is to swap the values of x and y such that we want the orange juice in the other cup and the milk in this cup, how might you-- before you actually do it-- go about doing this? OK, wise decision. So you need a bit more memory. So let's allocate a temporary cup, if you will. And now proceed to swap x and y. Excellent. So very well done. Thank you so much, Jess. Here you are. A little souvenir. OK, so obviously, super simple idea. Completely intuitive that we need a bit more storage space-- in this form, a cup-- if we actually want to swap these two variables. So let's do exactly that. Up here in between where I claim I'm going to be doing some swapping, I'll go ahead and declare temp. And I'll set it equal to, say, x. Then I'm going to change the value of x, just like Jess did here with the milk and orange juice, to be equal to y. And I'm going to change y to be equal to not x, because now we would be stuck in a circle, but rather temp. Where I temporarily-- or where Jess temporarily put the orange juice before clobbering that cup with the milk. So let me go ahead now and make this. It's called noswap.c. And now let me run noswap. And indeed I see, if I expand the window a little bit, that x is 1, y is 2. And then x is 2, y is 1. But recall that on Monday we did things a little differently whereby I instead implemented a helper function, if you will, that was actually void. I called it swap.
I gave it two parameters, and I called them a and I called them b. Frankly, I could call them x and y. There's nothing stopping me from doing that. But I would argue it's then a little ambiguous. Because recall from Monday that we claimed that these parameters were copies of the values passed in. So it just messes with your mind, I think, if you use exactly the same variables. So I'll instead call them a and b, just for clarity. But we could call them most anything we want. And I'm going to copy and paste effectively this code from up there down into here. Because I just saw that it works. So that's in pretty good shape. And I'll change my x to a, my x to a, my y to b and my y to b. So in other words, exact same logic. The exact same thing that Jess did. And then the one thing I have to do up here, of course, is now invoke this function, or call this function. So I will call this function with two inputs, x and y, and hit Save. All right, so fundamentally the same thing. In fact, I've probably made the program unnecessarily complex by writing a function that's just taking some six lines of code whereas I previously had implemented this in just three. So let me go ahead now and remake this, make noswap. All right, I screwed up here. This should be an error that you might see increasingly commonly as your programs get more complex. But there's an easy fix. Let me scroll back up here. And what's the first error I'm seeing? Implicit declaration. What does that typically indicate? Oh, I forgot the prototype. I forgot to teach the compiler that swap is going to exist even though he doesn't exist at the very beginning of the program. So I'm just going to say void swap(int a, int b), semicolon. So I'm not going to reimplement it. But now it matches what's down here. And notice the absence of a semicolon here, which is not necessary when implementing. So let me remake this, make noswap. Much better shape. Run noswap. And damn it.
Now we're back where we were on Monday, where the thing didn't swap. And what's the intuitive explanation for why this is the case? Yeah? STUDENT: [INAUDIBLE]. DAVID MALAN: Exactly. So a and b are copies of x and y. And in fact, any time you've been calling a function thus far that passes variables like ints-- just as swap is expecting here-- you guys have been passing in copies. Now that means it takes a little bit of time, a split second, for the computer to copy the bits from one variable into the bits of another. But that's not such a big deal. But they're nonetheless a copy. And so now, in the context of swap, I am in fact successfully changing a and b. In fact, let's do a quick sanity check. printf, a is %i, new line. And let's plug in a. Now let's do the same thing with b. And let's do the same thing here. And now, let me copy those same lines again at the bottom of the function after my three lines of interesting code have executed, and print a and b yet again. So now let's make this, make noswap. Let me make the terminal window a bit taller, so that we can see more of it at once. And run noswap. x is 1, y is 2. a is 1, b is 2. And then, a is 2, b is 1. So it is working, just like Jess did here, inside of swap. But of course, it's having no effect on the variables in main. So we saw a trick whereby we could fix this, right? When you're faced with this scoping issue, you could just punt and make x and y what kind of variables instead? You could make them global. Put them at the very top of the file as we did, even in the game of 15. We use a global variable. But in the context of the game of 15, it's reasonable to have a global variable representing the board, because the entirety of 15.c is all about implementing that game. That's what the file exists to do. But in this case here, I'm calling a function swap. I want to swap two variables.
And it should start to feel just sloppy if the solution to all of our problems when we run into scope issues is make it global. Because very quickly our program is going to become quite a mess. And we did that very sparingly as a result in 15.c. But it turns out there's a better way altogether. Let me actually go back and delete the printfs, just to simplify this code. And let me propose that this, indeed, is bad. But if I instead add in some asterisks, or stars, I can instead turn this function into one that's actually operational. So let me go back here and admit that saying asterisks is always difficult, so I'll say stars. I'll just fess up to that one. All right. And now, what am I going to do instead? So first of all, I'm going to specify that instead of passing an int into the swap function, I'm instead going to say int star. Now, what does the star indicate? This is that notion of a pointer that Binky, the claymation character, was referring to a moment ago. So if we say int star, the meaning of this now is that a is not going to be passed in by its value. It's not going to be copied in. Rather, the address of a is going to be passed in. So recall that inside of your computer is a whole bunch of memory, otherwise known as RAM. And that RAM is just a whole bunch of bytes. So if your Mac or your PC has two gigabytes, you have roughly 2 billion bytes of memory. Now let's just suppose that just to keep things nice and orderly, we assign an address-- a number-- to every byte of RAM in your computer. The very first byte of those 2 billion is byte number zero. The next one is byte number one, number two, all the way on up, dot dot dot, to roughly 2 billion. So you can number the bytes of memory in your computer. So let's assume that that's what we mean by an address. So when I see int star a, what's going to be passed into swap now is the address of a. Not its value, but whatever its postal address is, so to speak-- its location in RAM.
And similarly for b, I'm going to say the same thing. int star b. As an aside, technically the star could go in other locations. But we'll standardize on the star being right next to the data type. So swap's signature now means, give me the address of an int, and call that address a. And give me another address of an int and call that address b. But now my code here has to change. Because if I declare int temp-- which is still of type int-- but I store in it a, what kind of value, to be clear, am I putting in temp with the code as written right now? I'm putting the location in a. But I don't care about the location now, right? Temp exists, just as Jess's third cup existed, for what purpose? To store a value. Milk or orange juice. Not to actually store the address of either of those things, which feels a little nonsensical in this real world context anyway. So really, what I want to put in temp is not the address of a, but the contents of a. So if a is a number like 123, this is the 123rd byte of memory that a just happens to be occupying, that the value in a happens to be occupying. If I want to go to that address, I need to say star a. Similarly, if I were to change what's at the address a, I change this to star a. If I want to store at the location a what's at the location b, I say star b. And likewise star b on the last line. So in short, even if this isn't quite sinking in yet-- and I wouldn't expect that it would so fast-- realize that all I'm doing is prefixing these stars to my variables, saying don't grab the values. Don't change the values. But rather, go to those addresses and get the value. Go to that address and change the value there. So now let me scroll back up to the top, just to fix this line here, to change the prototype to match. But I now need to do one other thing. Intuitively, if I've changed the types of arguments that swap is expecting, what else do I need to change in my code? When I call swap. Because right now, what am I passing to swap still?
The value of x and the value of y, or the milk and the orange juice. But I don't want to do that. I instead want to pass in what? The location of x and the location of y. What are their postal addresses, so to speak. So to do that, there's an ampersand. Ampersand kind of sounds like address. So ampersand x, the address of x, and ampersand y, the address of y. So it's deliberate that we use ampersands when calling the function, and stars when declaring and when implementing the function. And just think of ampersand as the address-of operator, and star as the go-there operator-- or, more properly, the dereference operator. So that's a whole lot of words just to say that now, hopefully, swap is going to be correct. Let me go ahead and make-- let's actually rename the file, lest this program still be called noswap. I claim that we'll call it swap.c now. So make swap. Dot slash swap. And now indeed, x is 1, y is 2. And then, x is 2, y is 1. Well, let's see if we can't look at what's going on here a little bit differently. First, let me zoom in on our drawing screen here. And let me propose for a moment-- and whatever I draw here will be mirrored up there now-- let me propose that here's a whole bunch of memory, or RAM, inside of my computer. And this will be byte number, let's say, 1. This will be byte number 2. And I'll do a whole bunch more, and then a bunch of dot dot dots to indicate that there's 2 billion of these things. 4, 5, and so forth. So there are the first five bytes of my computer's memory. All right? Very few out of 2 billion. But now I'm going to propose the following. I'm going to propose that x is going to store the number 1, and y is going to store the number 2. And let me go ahead now and represent these values as follows. Let's do this as follows. Give me just one second. One second. OK. I want to make this a little-- let's do this again. Otherwise I'm going to end up using the same numbers, unintentionally, multiple times.
So just so we have different numbers to talk about, let's call this byte number 123, 124, 125, 126, and dot dot dot. And let me claim now that I'm going to put the value 1 here, and the value 2 here, otherwise known as x and y. So it just so happens that this is x, this is y. And just by some random chance, the computer, the operating system, happened to put x at location number 123. And y ended up at location 124-- damn it. I should have fixed this. Oh man, do I really want to do this? Yes, I want to fix this and be proper about this today. Sorry, new at this. 127, 131, and I didn't want to be this complex, but why did I change the numbers there? Because I want the ints to actually be four bytes. So let's be super anal about this. So that if 1 happens to be at address 123, the 2 is going to be at address 127 because it's just 4 bytes away. That's all. And we'll forget about all of the other addresses in the world. So x is at location 123, y is at location 127. And now, what do I actually want to do? When I call swap now, what's actually going on? Well, when I call swap, I'm passing in the address of x and the address of y. So for instance, if these two pieces of paper now represent the two arguments a and b to swap, what am I going to write on the first of these, which I'm going to refer to as a? Exactly, 123. So this I claim is a. This is the parameter a. I'm putting the address of x in there. What's that? What's that? No, no. That's OK. Still good, still good. So this is a. And now on the second piece of paper, this is going to be b, and what am I going to be writing on this piece of paper? 127. So the only thing that's changed since our previous telling of this story is, rather than literally 1 and 2, I'm going to pass in 123 and 127. And I'm now going to put these inside of this box, all right? So that black box now represents the swap function. Meanwhile, let's now have someone implement the swap function. Would someone up here like to volunteer?
Come on up. What's your name? Charlie. All right, Charlie. Come on up. So Charlie is going to play the role of our black box. And Charlie, what I'd like you to do now is implement swap in such a way that, given those two addresses, you are actually going to change the values. And I'll whisper in your ear how to run the TV here. So go ahead, and you're the black box. Reach in there. What values do you see for a, and what values do you see for b? CHARLIE: a is 123 and b is 127. DAVID MALAN: OK, exactly. Now pause there for just a moment. The first thing you're going to do now, according to the code-- which I'll now pull up on the screen-- is going to be to allocate a little bit of memory called temp. So I'm going to go ahead and give you that memory. So this is going to be a third variable that you have accessible to you called temp. And what are you going to write on the temp piece of paper? CHARLIE: Pointers, right? DAVID MALAN: OK, well not necessarily pointers. So the line of code that I've highlighted on the right hand side, let's start there. It says star a. So a is currently storing the number 123. And just intuitively, what does star 123 mean? But specifically, if a is 123, star a means what? The value at a. Or more casually, go there. So let me propose that, holding the a in your hand, go ahead and treat that as though it's a map. And walk yourself over to the computer's memory, and find us what is at location 123. Exactly. So we see at location 123 is what, obviously? OK, so what value now are you going to put into temp? Exactly. So go ahead and do that. And write the number 1 on the piece of paper that's currently titled temp. And now the next step that you're going to implement is going to be what? Well, on the right hand side of the next line of code is star b. b, of course, stores an address. That address is 127. Star b means what, casually speaking? Go to that location. So go ahead and find us what's at location 127. OK.
Of course, at location 127 is still the value 2. So what are you now going to store at the location in a? So star a means go to the location a. What is the location a? Exactly. So now, if you want to change what's at that location-- I'll go ahead and run the eraser here. And now put it back on the brush. What number are you going to write in that blank box now? Exactly. So this line of code, to be clear-- let me pause what Charlie's doing and point out here, what he's just done is write into that box at location 123 the value that was previously at b. And so we've now implemented indeed this second line of code. Now unfortunately, there's still one line remaining. Now what is in temp, literally? It's obviously the number 1. That's not an address. It's just a number, sort of a variable from week one. And now when you say star b, that means go to the address b, which is of course here. So once you get there-- I'll go ahead and erase what's actually there-- and what are you going to write now at location 127? CHARLIE: Temp, which is one. DAVID MALAN: Temp, which is one. And what happens to temp in the end? Well, we don't really know. We don't really care. Any time we've implemented a function thus far, any local variables you have are indeed local. And they just disappear. They're reclaimed by the operating system eventually. So the fact that temp still has the value 1 is sort of fundamentally uninteresting to us. All right, so a round of applause if we could for Charlie. Very well done. All right, so what more does this mean we can do? So it turns out that we've been telling a few white lies for quite some time. Indeed, it turns out that a string, all of this time, is not really a sequence of characters per se. It kind of is that intuitively. But technically speaking, string is a data type that we declared inside of the CS50 library to simplify the world for the first few weeks of class.
What a string really is is the address of a character somewhere in RAM. A string is really a number, like 123 or 127, that happens to demarcate where a string begins in your computer's memory. But it doesn't represent the string, per se, itself. And we can see this as follows. Let me go ahead and open up some code that's among today's source code examples. And I'm going to go ahead and open up, let's say, compare-0.c. This is a buggy program that is going to be implemented as follows. First, I'm going to say something. Then I'm going to go ahead and get a string from the user in that next line. Then I'm going to say it again. Then I'm going to get another string from the user. And notice, I'm storing one of the strings in a variable called s, and another of these strings in a variable called t. And now I'm going to claim, very reasonably, that if s equals equals t, the strings are the same. You typed the same thing. Else, the strings are not the same thing. After all, if we wanted to compare two ints, two chars, two floats, two doubles, any of the data types we've talked about thus far-- recall we made very clear a while ago that you don't do this, because a single equal sign is of course the assignment operator. So that would be a bug. We use the equal equal sign, which indeed compares things for true equality. But I claim this is buggy. If I go ahead and make compare-0, and then do dot slash compare-0. And I type in, let's say, hello. And then let's say hello again. Literally the same thing, the computer claims I typed different things. Now maybe I just mistyped something. I'll type my name this time. I mean, hello. Hello. It's different every single time. Well, why is that? What's really going on underneath the hood? Well, what's really going on underneath the hood is the string that I typed in that first time, for instance, is the word hello, of course. But if we represent this underneath the hood, recall that a string is an array.
And we've said as much in the past. So if I draw that array like this, I'm going to represent something quite similar to what we did a moment ago. And there's actually something special here, too. What did we determine was at the end of every string? Yeah, this backslash zero, which is just the way of representing, literally, 00000000. Eight 0 bits in a row. I don't know, frankly, what's after this. That's just a bunch more RAM inside of my computer. But this is an array. We talked about arrays before. And we typically talk about arrays as being location zero, then one, then two. But that's just for convenience. And that's entirely relative. When you're actually getting memory from the computer, it's of course any of 2 billion some odd bytes, potentially. So really underneath the hood, all this time, yes. This might very well be bracket zero. But if you dig even deeper underneath the hood, that's really address number 123. This is address 124. This is address 125. And I didn't screw up this time. These are now one byte apart for what reason? How big is a char? A char is just one byte. An int is typically four bytes. So that's why I made it 123, 127, 131 and so forth. Now I can keep the math simpler and just do plus 1. And this is now what's really going on underneath the hood. So when you declare something like this, string s, this is actually-- it turns out-- char star. Star, of course, means address, aka pointer. So it's the address of something. What is it the address of? Well-- I'm the only one who can see the very important point I'm making, or think I'm making. So string-- the sad thing is I have a monitor right there where I could have seen that. All right, so string s is what I declared previously. But it turns out, thanks to a little magic in the CS50 library, all this time string has literally been char star. The star again means pointer or address. The fact that it's flanking the word char means it's the address of a character.
So if get string is called, and I type in H-E-L-L-O, propose now what has get string literally been returning all of this time, even though we've rather oversimplified the world? What does get string actually return as its return value? 123 in this case, for instance. We've previously said that get string simply returns a string, a sequence of characters. But that's a bit of a white lie. The way get string really works underneath the hood is it gets a string from the user. It plops the characters that he or she types in memory. It puts a backslash zero at the end of that sequence of characters. But then what does get string literally return? It literally returns the address of the very first byte in the RAM that it used for that string. And it turns out that just by returning a single address of the first character in the string, that is sufficient for finding the entirety of the string. In other words, get string does not have to return 123 and 124 and 125. It doesn't have to give me a long list of all of the bytes that my string is using. Because one, they're all back to back. And two, based on the first address, I can figure out where the string ends. How? The special null character, the backslash zero at the end. So in other words, if you pass around-- inside of variables-- the address of a char, and you assume that at the end of any string, any sequence of characters as we humans think of strings, if you assume that at the end of any such string there's a backslash zero, you're golden. Because you can always find the end of a string. Now what's really then going on in this program? Why is this program, compare-0.c, buggy? What is actually being compared? Yeah? STUDENT: [INAUDIBLE]. DAVID MALAN: Exactly. It's comparing the locations of the strings. So if the user has typed in hello once, as I did, memory might end up looking like this.
If the user then types in hello again, by calling get string again, C is not particularly clever unless you teach it to be clever by writing code. C-- and computers more generally-- if you type in the word hello again, you know what you're going to get. You're just going to get a second array of memory that, yes, happens to be storing H-E-L-L-O and so forth. It's going to look the same to us humans, but this address might not be 123. It might just so happen that the operating system has some available space for instance at location-- let's say something arbitrary, like this is location 200. And this is location 201. And this is location 202. We have no idea where that's going to be in memory. But what this means is that what is going to be stored ultimately in s? The number 123. What's going to be stored in t, in this arbitrary example? The number 200. And all that means then is obviously, 123 does not equal 200. And so this if condition never evaluates to true. Because get string is using different chunks of memory each time. Now we can see this again in another example. Let me go ahead and open up copy-0.c. I claim that this example is going to try-- but fail-- to copy two strings as follows. I'm going to say something to the user. I'm then going to get a string and call it s. And now, I'm doing this check here. We mentioned this a while back. But when might get string return null, another special character, or special symbol let's say? If it's out of memory. For instance, if the user is really being difficult and types an atrocious number of characters at the keyboard and hits Enter. If that number of characters just can't fit in RAM for whatever crazy reason, well get string might very well return null. Or if your program itself is doing a lot of other things and there's just not enough memory for get string to succeed, it might end up returning null. But let's be more precise as to what this is. What is s's data type really? Char star.
So it turns out now we can peel back the layer of null. Turns out, null is-- yes, obviously a special symbol. But what is it really? Really, null is just a symbol that we humans use to represent zero as well. So the authors of C, and computers more generally, decided years ago that, you know what. Why don't we ensure that no user data is ever, ever, ever stored at byte zero? In fact, even in my arbitrary example before, I didn't start numbering the bytes at zero. I started at one. Because I knew that people in the world have decided to reserve the zero byte in anyone's RAM as something special. The reason being, anytime you want to signal that something has gone wrong with regard to addresses, you return null-- otherwise known as zero-- and because you know that there's no legit data at address zero, clearly that means an error. And that's why we, by convention, check for null and return something like one in those cases. So if we scroll down now, this is just then some error checking, just in case something went wrong, to [? bail ?] altogether and quit the program by returning early. This line now could be rewritten as this, which means what? On the left hand side, give me another pointer to a character, and call it t. What am I storing inside of t, based on this one line of code? I'm storing a location. Specifically the location that was in s. So if the user has typed in hello, and that first hello happens to end up here, then the number 123 is going to come back from get string and be stored-- as we said earlier-- in s. When I now declare another pointer to a char and call it t, what number is literally going to end up in t according to the story? So 123. So technically now both s and t are pointing to the exact same chunks of memory. So notice what I'm going to do now to prove that this program is buggy. First I'm going to claim, with a printf, capitalizing the copy of the string. Then I'm going to do a little error checking. I'm going to make sure.
Let's make sure that the string t is at least greater than zero in length, so there's some character there to actually capitalize. And then you might recall this from previous examples. toupper-- which is in the ctype.h file. t bracket zero gives me the zero character of the string t. And toupper of that same value, of course, converts it to uppercase. So intuitively, this highlighted line of code is capitalizing the first letter in t. But it's not capitalizing, intuitively, the first letter in s. But if you're thinking ahead, what am I about to see when I run this program and print out both the original, s, and the so-called copy, t? They're actually going to be the same. And why are they going to be the same? They're both pointing to exactly the same thing. So let's do this. Make copy-0. It compiles OK. Let me run copy-0. Let me type something like hello in all lowercase then hit Enter. And it claims that both the original s and the copy are indeed identical. So what really happened here? Let me redraw this picture just to tell the story in a slightly different way. What's really going on underneath the hood when I declare something like char star s, or string s, I am getting a pointer-- which happens to be four bytes in the CS50 appliance and in a lot of computers. And I'm going to call this s. And this currently has some unknown value. When you declare a variable, unless you yourself put a value there, who knows what's there. It could be some random sequence of bits from the previous execution. So when I, in my line of code, do get string, and then store the return value in s, get string somehow-- and we'll eventually peel back how get string works-- somehow allocates an array that probably looks a bit like this. H-E-L-L-O, backslash zero. Let's suppose that this is address 123 just for consistency. So get string returns, in the highlighted line there, it returns the number we said, 123. So what really goes inside of s here?
Well, what really goes inside of s is 123. But frankly, I'm getting a little confused by all of these addresses, all of these arbitrary numbers. 123, 124, 127. So let's actually simplify the world a little bit. When we talk about pointers, frankly, to us humans, who the heck cares where things are in memory? That's completely arbitrary. It's going to depend on how much RAM the user has. It's going to depend on when in the day you run the program, perhaps, and what input the user gives you. We're dwelling on unimportant details. So let's abstract away and say that, when you run a line of code like this, char star s gets the return value of get string. Why don't we instead just draw what we keep calling a pointer as though it's pointing at something? So I claim now that s up there is a pointer-- underneath the hood it's an address. But it's just pointing to the first byte in the string that's been returned. If I now return to the code here, what's going on at this line? Well, in this highlighted line now, I'm declaring apparently another variable called t. But it's also a pointer, so I'm going to draw it as, in theory, the exact same size box. And I'm going to call it t. And now if we go back to the code again, when I store s inside of t, what am I technically putting inside of t? Well technically, this was the number 123. So really I should be writing the number 123 there. But let's take it higher level. t, if it is just a pointer, intuitively, is just that. That's all that's being stored in there. So now in the last interesting lines of code, when I actually go about capitalizing the zero character in t, what is going on? Well, t bracket zero is now pointing to what character, presumably? It's pointing to h. Because t bracket zero-- recall, this is old syntax. t bracket zero just means if t is a string, t bracket zero means getting the zero character in that string. So what that really means is go to this array-- and yes, this might be 123, this might be 124. 
But it's all relative, remember. Whenever talking about an array, we have the advantage of talking about relative indices. And so now we can just assume that t bracket zero is h. So if I call toupper on it, what that's really doing is capitalizing the lowercase h to uppercase H. But of course, what is s? It's pointing to the same darn string. So this is all that's been happening in this code so far. So what's then the implication? How do we fix these two problems? How do we compare two actual strings? Well intuitively, how would you go about comparing two strings for true equality? What does it mean if two strings are equal? Clearly not that their addresses are equal in memory, because that's a low level implementation detail. All the characters are the same. So let me propose, and let me introduce in version one of compare.c here, so compare-1.c. Let me propose that we still get a pointer called s, and store in it the return value of get string. Let's do the same thing with t. So none of the code is different. I'm going to add a little more error checking now. So now that we're sort of peeling back these layers in CS50 of what a string actually is, we need to be more anal about making sure we don't abuse invalid values like null. So I'm just going to check. If s does not equal null and t does not equal null, that means we're OK. Get string did not screw up getting either of those strings. And you can perhaps guess now, what does strcmp presumably do? String compare. So if you've programmed in Java before, this is like the equals method in the string class. But for those of you who haven't programmed before, this is just a C function. It happens to come in a file called string.h. That's where it's declared. And string compare-- I actually forget its usage, but never mind that. Recall that we can do man strcmp. And this is going to bring up the Linux programmers manual. And it's, frankly, a little cryptic. But I can see here that, yep. 
I have to include string.h. And it says here under description, "the string compare function compares the two strings S1 and S2." And S1 and S2 are apparently the two arguments passed in. I don't really remember what const is, but now notice-- and you may have seen this already when you've used the man pages, if you have at all-- that char star is just synonymous with string. So it compares the two strings, S1 and S2, and it returns an integer less than, equal to, or greater than zero if S1 is found, respectively, to be less than, or match, or be greater than S2. That's just a very complex way of saying that string compare returns zero if two strings are intuitively identical, character for character for character. It returns a negative number if s, alphabetically, is supposed to come before t. Or returns a positive number if s is supposed to come after t alphabetically. So with this simple function, could you, for instance, sort a whole bunch of words? So in this new version, I'm going to go ahead and make compare1. Dot slash compare one. I'll type in hello in all lower case. I'm going to type in hello in all lowercase again. And thankfully now it realizes I typed the same thing. Meanwhile, if I type in hello in lower case and HELLO in upper case and compare them, I typed different things. Because not only are the addresses different, but we're comparing different characters again and again. Well let's go and fix one other problem now. Let me open up version one of copy, which now addresses this issue as follows. And this one's going to look a little more complex. But if you think about what problem we need to solve, hopefully this will be clear in just a moment now. So this first line, char star t, in layman's terms could someone propose what this line here means? Char star t, what is that doing? Good. Create a pointer to some spot in memory. And let me refine it a little bit. 
Declare a variable that will store the address of some char in memory, just to be a little more proper. OK, so now on the right hand side, I've never seen one of these functions before, malloc. But what might that mean? Allocation of memory. Memory allocation. So it turns out, up until now, we haven't really had a powerful way of asking the operating system, give me some memory. Rather, we now have a function called malloc that does exactly that. Even though this is a bit of a distraction right now, notice that in between the two parentheses is just going to be a number. Where I've typed in question marks can be a number. And that number means, give me 10 bytes. Give me 20 bytes. Give me 100 bytes. And malloc will do its best to ask the operating system-- Linux, in this case-- hey, are there 100 bytes of RAM available? If so, return those bytes to me by returning the address of which of those bytes, perhaps? The very first one. So here too-- and this is predominant in C-- any time you're dealing with addresses, you're almost always dealing with the first such address, no matter how big a chunk of memory you are being handed back, so to speak. So let's dive in here. I am trying to allocate how many bytes, exactly? Well. String length of s-- let's do a concrete example. If s is hello, H-E-L-L-O, what's the string length of s, obviously? So it's five. But I'm doing a plus 1 on that, why? Why do I want six bytes instead of five? The null character. I don't want to leave off this special null character. Because if I make a copy of Hello and just do H-E-L-L-O, but I don't put that special character, the computer might not have, by chance, a backslash zero there for me. And so if I'm trying to figure out the length of the copy, I might think that it's 20 characters long, or a million characters long if I just never happen to hit a backslash zero. So we need six bytes to store H-E-L-L-O, backslash zero. And then this is just to be super anal. 
Suppose that I forget what the size of a char is. We keep saying it's one byte. And it usually is. In theory, it could be something different, on a different Mac or a different PC. So it turns out there's this operator called sizeof that if you pass it the name of a data type-- like char, or int, or float-- it will tell you, dynamically, how many bytes a char takes up on this particular computer. So this is effectively just like saying times 1 or times nothing at all. But I'm doing it just to be super anal, that just in case a char differs on your computer versus mine, this way the math is always going to check out. Lastly, down here I check for null, which is always good practice-- again, any time we're dealing with pointers. If malloc wasn't able to give me six bytes-- which is unlikely, but just in case-- return one immediately. And now, go ahead and copy the string as follows. And this is familiar syntax, albeit in a different role. I'm going to go ahead and get the string length of s and store it in n. I'm then going to iterate from i equals zero up to and including n, less than or equal to. So that on each iteration, I put the ith character of s in the ith character of t. So what's really going on underneath the hood here? Well if this, for instance, is s-- and I have typed in the word H-E-L-L-O and there's a backslash zero. And again, this is s pointing here. And here now is t. And this is pointing now to a copy of memory, right? Malloc has given me a whole chunk of memory. I don't know initially what's in any of these locations. So I'm going to think of these as a whole bunch of question marks. But as soon as I start looping from zero on up through the length of s, t bracket zero and t bracket 1-- and I'll put this now on the overhead-- t bracket zero and s bracket zero mean that I'm going to be copying iteratively h in here, E-L-L-O. Plus, because I did the plus 1, backslash zero. 
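The allocate-then-copy steps just described can be sketched roughly as follows. This is a minimal sketch rather than the lecture's actual copy-1.c: the helper names copy_string and capitalize_first are hypothetical, and the sketch takes its input as an ordinary C string instead of calling get string.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

// Hypothetical helper (not from the lecture's source file): returns a
// newly allocated copy of s, or NULL if s is NULL or malloc fails.
char *copy_string(const char *s)
{
    if (s == NULL)
    {
        return NULL;
    }

    size_t n = strlen(s);

    // n + 1 bytes: one per character, plus one for the terminating '\0'.
    char *t = malloc((n + 1) * sizeof(char));
    if (t == NULL)
    {
        return NULL;
    }

    // i <= n (not i < n) so that the '\0' at s[n] is copied as well.
    for (size_t i = 0; i <= n; i++)
    {
        t[i] = s[i];
    }
    return t;
}

// Hypothetical helper: capitalizes the first character of t in place,
// provided t is non-NULL and has at least one character.
void capitalize_first(char *t)
{
    if (t != NULL && strlen(t) > 0)
    {
        t[0] = (char) toupper((unsigned char) t[0]);
    }
}
```

With this sketch, a copy of hello compares equal to the original under strcmp (it returns zero) even though the two addresses differ, and calling capitalize_first on the copy leaves the original untouched. The caller is responsible for eventually passing the returned pointer to free.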
So now in the case of copy-1.c, in the end, if I print out the capitalization of t, we should see that s is unchanged. Let me go ahead now and do this. So make copy1. Dot slash copy1. I'm going to type in hello, Enter. And now notice, only the copy has been capitalized. Because I truly have two chunks of memory. Unfortunately, you can do some pretty bad and pretty dangerous things here. Let me pull up an example here now, that gives us an example of a few different lines. So just intuitively here, the first line of code, int star x, is declaring a variable called x. And what's the data type of that variable? What's the data type of that variable? That was not the cliffhanger. The data type is int star. So what does that mean? x will store the address of an int. Simple as that. Y is going to store the address of an int. What is the third line of code doing there? It's allocating how many bytes, most likely? Four. Because the size of an int is generally four, malloc of four gives me back the address of a chunk of memory, the first of whose bytes is stored now in x. Now we're moving a little quickly. Star x means what? It means go to that address and put what number there? Put the number 42 there. Star y means go to what's at y and put the number 13 there. But wait a minute. What is in y at the moment? What address is y storing? We don't know, right? We have never once used the assignment operator involving y. So y as declared on the second line of code is just some garbage value, a big question mark so to speak. It could be pointing randomly to anything in memory, which is generally bad. So as soon as we hit that line there, star y equals 13, something bad, something very bad is about to happen to Binky. So let's see what's going to end up happening to Binky here in this minute or so look. [VIDEO PLAYBACK] -Hey, Binky. Wake up. It's time for pointer fun. -What's that? Learn about pointers? Oh, goodie. 
-Well, to get started, I guess we're going to need a couple pointers. -OK. This code allocates two pointers which can point to integers. -OK, well, I see the two pointers. But they don't seem to be pointing to anything. -That's right. Initially, pointers don't point to anything. The things they point to are called pointees, and setting them up is a separate step. -Oh, right, right. I knew that. The pointees are separate. So how do you allocate a pointee? -OK. Well, this code allocates a new integer pointee, and this part sets x to point to it. -Hey, that looks better. So make it do something. -OK. I'll dereference the pointer x to store the number 42 into its pointee. For this trick, I'll need my magic wand of dereferencing. -Your magic wand of dereferencing? Uh, that's great. -This is what the code looks like. I'll just set up the number, and-- -Hey, look. There it goes. So doing a dereference on x follows the arrow to access its pointee. In this case, to store 42 in there. Hey, try using it to store the number 13 through the other pointer, y. -OK. I'll just go over here to y and get the number 13 set up. And then take the wand of dereferencing and just-- whoa! -Oh, hey. That didn't work. Say, Binky, I don't think dereferencing y is a good idea, because setting up the pointee is a separate step. And I don't think we ever did it. -Hmm. Good point. -Yeah, we allocated the pointer y. But we never set it to point to a pointee. -Hmm. Very observant. -Hey, you're looking good there, Binky. Can you fix it so that y points to the same pointee as x? -Sure. I'll use my magic wand of pointer assignment. -Is that going to be a problem like before? -No. This doesn't touch the pointees. It just changes one pointer to point to the same thing as another. -Oh, I see. Now y points to the same place as x. So wait. Now y is fixed. It has a pointee. So you can try the wand of dereferencing again to send the 13 over. -OK. Here goes. -Hey, look at that. 
Now dereferencing works on y. And because the pointers are sharing that one pointee, they both see the 13. -Yeah. Sharing. Whatever. So are we going to switch places now? -Oh, look. We're out of time. -But-- -Just remember the three pointer rules. Number one, the basic structure is that you have a pointer. And it points over to a pointee. But the pointer and pointee are separate. And the common error is to set up a pointer, but to forget to give it a pointee. Number two, pointer dereferencing starts at the pointer and follows its arrow over to access its pointee. As we all know, this only works if there is a pointee, which gets back to rule number one. Number three, pointer assignment takes one pointer and changes it to point to the same pointee as another pointer. So after the assignment, the two pointers will point to the same pointee. Sometimes that's called sharing. And that's all there is to it, really. Bye bye now. [END VIDEO PLAYBACK] DAVID MALAN: So more on pointers, more on Binky next week. We'll see you on Monday.
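For reference, the sequence of steps that the lecture and the video walk through can be sketched in C roughly as below. This is an illustrative reconstruction, not the code shown on screen: the wrapper function name pointer_fun, its return values, and the call to free are assumptions added so that the sketch stands alone, and the buggy dereference of y is left as a comment so that the sketch stays safe to run.

```c
#include <stdlib.h>

// Hypothetical wrapper (the name is an assumption) around the steps
// discussed above. Returns the value seen through x at the end,
// or -1 if malloc fails.
int pointer_fun(void)
{
    int *x;    // a pointer that can point to an int
    int *y;    // another pointer; it has no pointee yet

    // Allocate an int pointee and set x to point to it.
    x = malloc(sizeof(int));
    if (x == NULL)
    {
        return -1;
    }

    // Dereference x to store 42 in its pointee.
    *x = 42;

    // Writing "*y = 13;" at this point would be the bug from the video:
    // y has never been assigned, so it holds a garbage address.

    // Pointer assignment: y now points to the same pointee as x.
    y = x;

    // Dereferencing y now works, and x sees the same 13 (sharing).
    *y = 13;

    int result = *x;
    free(x);   // release the single shared pointee exactly once
    return result;
}
```

Because x and y share one pointee after the assignment, reading through x returns the 13 that was stored through y.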