[MUSIC PLAYING] DAVID J. MALAN: Well, this is CS50, and already this is week four, and recall that last week, week three, we began to explore the inside of a computer's memory a bit more. We talked about arrays, which were just chunks of memory back to back to back that really lay things out left to right, top to bottom, and this is actually a pretty common paradigm, even if you're new to programming, and certainly new to C. You've seen this approach of just using memory in some way to lay things out, like images, for instance. So here is a photo taken of last week's front row, and this is an opportunity to explore exactly what happens if we start to zoom in and zoom in and zoom in, because it seems like most any TV show like CSI, or whatever, or any movie that explores forensic information might have the investigators zoom in on an image like this to see the glint in someone's eye, because that reveals the license plate number of someone who just drove past. That's a little over the top, but there's an opportunity here to speak to why it is so unrealistic. For instance, let's zoom in on this puppet's eye, and let's zoom in a little more to see what might be reflected. Let's zoom in a little more, and that's it. There's only a finite amount of information if you have an image represented in this way. We're using pixels, these dots on the screen arranged in rows and columns, and if you're only using a finite amount of memory, then at the end of the day you can only store a finite amount of information. At least I don't really see in this grid here any glint of a license plate or something like that that you might otherwise see in Hollywood. So today we'll explore these kinds of representations of how you might use memory in new and interesting ways to represent very familiar things, but also start to explore what some of the limitations are of this representation.
But consider that an image doesn't even need to be as high resolution, with as many pixels, as this other one. You can imagine just doing something silly with Post-It notes, like this. And if you think of an image as just having rows and columns, these rows otherwise known as scan lines-- something we'll explore in the coming week-- you could make this fun smiley face by just using two different values, maybe a zero and a one. Or yellow and purple, or vice versa, just to make something come to life. Now in practice, recall we talked about storing not just a zero or one, but maybe an R, a G, and a B value-- like 24 bits, or three bytes in total-- but we'll come back to that. That would just be a more involved image. But for fun, if today you want to tackle something passively in the background, we've put together an opportunity to do a bit of pixel art. If you go to this URL here, that'll redirect you to a Google Spreadsheet. If you have a laptop with you today, it'll look a little something like this, which we've organized in rows and columns. So if you'd like, go ahead and use Google Spreadsheet's colorization feature to color in those individual squares, see if you can't make something a little creative, and then email it to Carter, and we'll exhibit some of the best or favorites on the website thereafter. So let's transition then to something a little more familiar-- images. Presumably not all of you have used Photoshop, but you're probably generally familiar with Photoshop as a program for editing and creating images or photos or the like. And here is a screenshot of Photoshop's color picker, via which you can change what color you're going to draw with the paintbrush, or what color you're going to fill in with the paint bucket. It's representative of any kind of graphical tool. And there's a lot of information in here, but there are perhaps some familiar terms now-- R, G, and B.
In fact, right now this is Photoshop's way of saying you're about to fill in your background or foreground with the color black, and that appears to be represented with an R, a G, and a B value of zero, zero, zero. Or alternatively, using a hash symbol and then 000000. And if some of you have already made web pages before and you know a little bit of HTML and CSS, you're probably familiar with this kind of syntax-- a hash symbol and then six, or sometimes three, digits thereafter. And if we look at a few different colors here, for instance, here might be the representation of white. Now the R, the G, and the B values went way up from 0 to 255, 255, 255. Or alternatively, it looks like Photoshop, and in turn web browsers, could represent that same color white with FFFFFF. And let's just do a few others. Here is red, and it turns out that red is a whole lot of red, 255, but no green, no blue. Or, a.k.a., FF0000. So there's perhaps a pattern here emerging. Here is green, zero, 255, zero, a.k.a. 00FF00, or lastly, here's blue, which is no red, no green, but apparently a lot of blue, 255 again, a.k.a. 0000FF. Now some of you, again, might have seen this notation before, these zeros and these F's and all of the numbers and letters in between, but this is another form of notation. And in fact, we'll explore it today because it's really just a precondition for talking about some other concepts. But the ideas, ultimately, are really no different. What we're about to see is a different base system-- not just binary, not just decimal, but something we're about to call hexadecimal. But first, recall that with RGB we previously did the following. Any RGB value just combines some amount of red, green, and blue. So here we have 72, 73, 33, which in the context of an email or text, of course, said what-- a couple of weeks back?
Just hi with an exclamation point, but in the context of a Photoshop-like program, this might instead be representing, collectively, this shade of yellow, for instance, when you combine that much red, that much green, and that much blue. So here is the same idea. If you've got a lot of red, no green, no blue, together that's going to give us red. If you've got no red, a lot of green, no blue, that's going to give us, of course, green. If you've got no red, no green, a lot of blue, that, of course, is going to give us blue. So there's a pattern emerging here where apparently 00 is none, as always, and FF is apparently a lot. And it's maybe somehow equated with 255, at least per that Photoshop screenshot. Meanwhile, if we combine one last one, a lot of red, a lot of green, a lot of blue-- that's actually going to give us a single white pixel like this. All right, so think back. Here was binary-- in the world of binary you had just two digits, zero and one. Could have been anything else-- A or B, X or Y, but the world standardized on these numerals zero and one. In our world's decimal system, of course, you have zero through nine. As of today, though, we're going to start using hexadecimal sometimes in the context of images and also files, just because it's a convention and there are some conveniences to it. In hexadecimal, you're going to be able to count up to F. From zero through nine, then you keep going to A to B to C to D to E to F, the idea being each of these, even though it's weirdly a letter of the English alphabet, is still just a single symbol. It's not 1 0 for 10, or 1 1 for 11-- all 16 of these values, these digits, so to speak, are indeed still just single symbols, and that's a characteristic of just using this other notational system. So how do we get from 00 and FF to something like 0 and 255, respectively? Well, this hexadecimal system, a.k.a.
base 16, just does the math from week zero, and really grade school, a little bit differently. For instance, if you have a number that's got two digits, or hexadecimal digits as of today, the columns are just a little different. Instead of powers of two or powers of 10, which we saw for binary and decimal respectively, it's powers of 16. So if we just do the math out, that's the ones column, this is the 16s column, and so forth. Things actually get pretty big pretty quickly in this system. But now let's just consider how we would represent familiar numbers. If you've got two hexadecimal digits, for which these hashes are just placeholders, zero, zero is going to mathematically equal the decimal number you and I know, of course, as zero. Why? Same thing as week zero-- 16 times zero plus one times zero is the number you and I know as zero. And we can count up from here. This, in hexadecimal, would be how a computer represents the number we know as one. It would be zero one in this case. This would be two, three, four, five, six, seven, eight, nine-- in decimal, we're about to go to 10. But in hexadecimal, to be clear, what comes next? So, apparently, A. So 0A, 0B, 0C, 0D, 0E, 0F, which are the numbers we know as 10, 11, 12, 13, 14, 15. So using hexadecimal is just an interesting way of using single symbols now, zero through F, to count from zero through 15. And we'll see why it's 15 in a moment, but as soon as we get to F, anyone want to conjecture how in hexadecimal, a.k.a. hex, we now count up one position higher? What comes after 0F in hexadecimal? So, one zero-- it's the same kind of thing-- once you're at the highest digit possible, F-- or in our decimal world that would have been nine-- you add one more, nine wraps around to zero, or in this case, F wraps around to zero. You carry the one and voila-- now we're representing the number you and I know as 16. And we could keep going forever, literally.
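The place-value arithmetic above can be sketched in C, the language this course is using. This is a minimal illustration under stated assumptions, not code from the lecture: `hex_digit_value` and `hex_to_decimal` are hypothetical helper names, and the sketch assumes the input contains only valid hex digits with no `0x` prefix. In real code, the standard library's `strtol` with a base of 16 does the same job.

```c
#include <ctype.h>

// Map one hexadecimal symbol (0-9, A-F, or a-f) to its value, 0 through 15.
// Returns -1 for anything that is not a hex digit.
int hex_digit_value(char c)
{
    if (c >= '0' && c <= '9')
    {
        return c - '0';
    }
    c = (char) toupper((unsigned char) c);
    if (c >= 'A' && c <= 'F')
    {
        return c - 'A' + 10;
    }
    return -1;
}

// Convert a string of hex digits (e.g. "FF") to its decimal value.
// Each step shifts the running total one column left (times 16) and then
// adds the next digit, so "FF" becomes 15 * 16 + 15 = 255.
// Assumes every character is a valid hex digit (hypothetical helper).
long hex_to_decimal(const char *s)
{
    long total = 0;
    for (int i = 0; s[i] != '\0'; i++)
    {
        total = total * 16 + hex_digit_value(s[i]);
    }
    return total;
}
```

For instance, `hex_to_decimal("10")` first computes 1, then 1 × 16 + 0, giving 16: the same carry-the-one step described above, just in base 16.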
This could be 17, 18, 19, 20 in decimal-- but let's just wave our hands at it and count as high as we can-- dot, dot, dot-- the highest we could count in hexadecimal with two digits, just logically, would be what, in hexadecimal? Something, something. FF, I heard. So yes, that's the biggest digit possible, so FF is what we have. So how high can you count in hexadecimal if you've got just two of these digits? Well, it's the same math as always. 16 times F, a.k.a. 15, so that's 16 times 15 plus one times F, or one times 15-- that gives us 240 plus 15 in decimal, the result of which, of course, now is 255. So this hexadecimal system-- you may have seen it in the world of web pages, and if you haven't, we'll get to that in this class in a few weeks, or we just saw it in the context of Photoshop-- just has this shorthand notation of counting as high as 255 but just calling it FF. Now it's marginal, but that's a savings of one digit out of three in order to count as high as 255, because in decimal, of course, 255 is three digits. In hexadecimal you can count as high using just two, and that difference is going to get magnified the bigger our numbers get. Let me stipulate for now: you're going to get more and more savings in terms of just how many symbols you need on the screen to represent bigger and bigger numbers. All right, let me pause here just to see if there are any questions thus far on what we've called hexadecimal, which again, just gives us zero through nine as well as A through F. Any questions or confusion? And if it feels like we're lingering a bit much on arithmetic, we're not really going to see other notations besides this moving forward. These are the go-to three in a programmer's world, typically. But there are some others. Yeah. AUDIENCE: Does the hexadecimal symbol take more storage than the decimal system? DAVID J. MALAN: Good question. Does hexadecimal require more storage or less storage than the decimal system?
Theoretically, no, because this is just a way of representing information, and we'll see a concrete example in a moment. But inside of the computer, at the end of the day, you're still storing bits. Using hexadecimal doesn't use more or fewer bits. Think of it as how you might write a number down on a piece of paper, just how many digits you're going to write, or on a computer screen, how many digits you're going to see at once. It doesn't change how the computer is representing information, because all it's representing at the end of the day is zeros and ones. So in fact, let's go there. A moment ago I claimed FF was 255. Let's just rewind to week zero: if we wanted to count to 255 in binary, that's as high as you can count, recall, with eight bits. And there are only a few of these numbers that are useful to memorize, like 255 is as high as you can count with eight bits if you start at zero, because two to the eighth is 256, but if you start at zero it's zero through 255. So in binary, recall, if you have eight bits, all of which are ones-- and I won't do out the math pedantically here-- if I do this plus this plus this, dot, dot, dot, that's also going to give me 255. So this is what's interesting here about hexadecimal. It turns out that an upside of writing values in hexadecimal is that the first F represents the leftmost four of these bits, and the second F in this case represents the rightmost four of these bits. So it turns out hexadecimal is very useful when you want to treat data in units of four bits. It's not quite eight, but units of four, and that's not bad. Which is why-- if you use two digits like I have thus far, 00 or FF or anything in between-- that's actually a convenient way of representing eight bits in total. One hex digit for the first four bits, one hex digit for the second.
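To make the four-bits-per-digit point concrete, here is a small sketch; `high_nibble`, `low_nibble`, and `format_rgb` are hypothetical names chosen for illustration. It splits a byte into its two hex digits with a shift and a mask, and formats an RGB triple the way Photoshop and CSS write colors, assuming one byte per channel.

```c
#include <stdio.h>

// Left hex digit of a byte: shift the top four bits down into the low four.
unsigned int high_nibble(unsigned char byte)
{
    return byte >> 4;
}

// Right hex digit of a byte: keep only the low four bits (0x0F is 00001111).
unsigned int low_nibble(unsigned char byte)
{
    return byte & 0x0F;
}

// Write an RGB color as "#RRGGBB". %02X prints each byte as exactly two
// uppercase hex digits, padding with a leading zero if needed.
// buffer must hold at least 8 chars: '#', six digits, and '\0'.
void format_rgb(unsigned char r, unsigned char g, unsigned char b, char buffer[8])
{
    snprintf(buffer, 8, "#%02X%02X%02X",
             (unsigned int) r, (unsigned int) g, (unsigned int) b);
}
```

With this sketch, `format_rgb(72, 73, 33, buf)` yields `#484921`, since 72 is 0x48, 73 is 0x49, and 33 is 0x21: each byte maps to exactly two hex digits, one per group of four bits.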
And again, there's nothing new intellectually here per se; it's just a different way of representing the same story as before-- zeros and ones. So in what context do we see this? Well, we talked about memory last week, and we're going to talk more about it this week. If this is my computer's RAM-- random access memory-- you can again think of each byte as having a number associated with it-- its address or location. This might be zero, this might be 2 billion, and so in the past I've described these as just this, using decimal numbers. Here's byte zero, one, two, three, four, five, six, seven, and so on; 15, 16 would be here, and so forth. But it turns out in the world of memory, and thus today, programming, people tend to count memory bytes using hexadecimal. Partly just by convention, but also partly because it's a little more succinct, and again, each digit represents four bits. So what comes after F here? Well, if I think about the computer's memory, after F, which is 15, I'd normally write 16. But instead, it's one zero, one one, one two, one three-- this is not 10, 11, 12, 13, because I claim I'm in the context of hexadecimal now. As per the previous slide, we already started going into A's through F's, so you immediately see here a possible problem. Why is this now worrisome, if all of a sudden you're seeing seemingly familiar numbers like 10, 11, 12, 13? We didn't really stumble across this problem when it was all zeros and ones before. Yeah. AUDIENCE: Try to do math [INAUDIBLE]. DAVID J. MALAN: Yeah, so if you're writing some code in C that's doing some math, you might accidentally-- or the computer might accidentally-- confuse hexadecimal with decimal if they look in some context the same. Any number on the board that doesn't have a letter is ambiguously hexadecimal or decimal at this point, and so how might we resolve this? Well, it turns out that what computers typically do is this.
By convention, any time you see 0x and then a number, that's a human convention of saying-- signaling to the reader-- that this is in fact a hexadecimal number. So if it's 0x10, that is not the number 10, that is the hexadecimal number one zero, which recall we said earlier, is how you count up to 16. And again, these are not the kinds of things to memorize; it's really just the system for how you think about these things. So henceforth today, we're going to start seeing hexadecimal in a bunch of contexts. When you write code, you might even write code using some hexadecimal, but again, it's just a different way of representing numbers, and humans have different conventions for different contexts. All right, so with that said, any questions now on this building block? From here on out, we'll start using it in some actual code. Any questions? Nothing so far? All right. So, let's go ahead and consider maybe a familiar example. Something involving code, where I initialize a variable like n to a value like 50, in this case. And then let's start to tinker around with what's going on inside of the computer's memory. In a moment I'm going to load up VS Code on my computer and I'm going to go ahead and whip up a program that very simply assigns a value like the number 50 to a variable called n, but today, keep in mind that that variable n and that value 50 are going to be stored somewhere in my computer's memory, and it turns out today we'll introduce a bit more syntax so you can actually see where things are being stored. So let me click over to VS Code here. I'm going to create a program called address.c just to explore the computer's addresses today, and I'm going to include stdio.h, int main(void), as usual. No command line arguments for now. I'm going to declare that variable n equals 50, and then I'm just going to go ahead and print it out. So nothing very interesting, but I'll use %i backslash n and then comma n to print out that value.
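The 0x convention above is also how C source code spells hexadecimal literals. A minimal sketch, assuming nothing beyond the standard library; `format_hex` and the two constants are hypothetical names, while `snprintf` and its `%x` conversion with the `#` flag are standard C.

```c
#include <stdio.h>

// A leading 0x in C source marks a hexadecimal literal, so both of these
// constants hold the number we know in decimal as 16.
const int sixteen_in_decimal = 16;
const int sixteen_in_hex = 0x10;

// Write value in hex. The # flag asks for a 0x prefix (for nonzero
// values), so 16 becomes "0x10" and 255 becomes "0xff".
void format_hex(unsigned int value, char *buffer, size_t size)
{
    snprintf(buffer, size, "%#x", value);
}
```

The compiler never confuses `0x10` with `10`: the prefix settles which base is meant, which is exactly the ambiguity the convention exists to resolve.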
Nothing here should be very interesting to compile or run, but I'll do it just to make sure I didn't make any mistakes. Looks like as expected, it simply prints out the number 50, like this. But let's consider then what this code is doing underneath the hood when it's actually run on your machine. So here we have that grid of memory. That variable n is an int, and if you think back, how many bytes typically do we use for an int? Yeah. Four, so four bytes, or 32 bits. So if each of these squares represents one byte, then my computer, somewhere in my memory, or RAM, is using four of these squares. Maybe it ends up over here just because there's other stuff being used elsewhere, for instance. Though I don't really know, and frankly, I don't really care where it ends up, just that it ends up somewhere. So the value 50 is stored here in a variable called n. Even though I've written it as decimal, just like in my code, let me remind you again that this is 32 zeros and ones representing that 50-- it's just going to be very tedious if we start writing everything in binary, so I'll use the more comfortable human decimal system. So that's what's going on inside of the computer's memory. So what if I actually wanted to start tinkering with its location, or maybe just knowing its location? Well, this variable n indeed has a name, n-- that's a label of sorts for it-- but at the end of the day that 50 is technically at a specific address, and I'm going to make one up-- 0x123, and it's 123 because I really don't care what it is, I just want an address for the sake of discussion. So way over here off screen might be byte zero, and down here is byte 0x123. It's in hexadecimal notation just by convention. So how can I actually see where my variables are ending up in memory if I'm curious to do so? Well, let me go back to my code here and let me actually change this just a little bit.
Let me go ahead and introduce, for instance, another symbol here and another topic altogether, namely pointers. So a pointer is a variable that stores the address of some value-- the location of some value, or more specifically, the specific byte in which that value is stored. So again, if you think of your memory as being a whole bunch of bytes-- zero at top left, 2 billion or whatever at bottom right, depending on how much RAM you have-- each of those things has a location, or an address. A pointer is just a variable storing one such address. So it turns out that in the world of C, there are a couple of new symbols we can use if we want to see what it is we're talking about here, and those two operators, as of today, are these. You can use the ampersand operator in C in a couple of ways. We already saw it very briefly with ampersand ampersand, which is how you AND two Boolean expressions together in the context of a conditional. This is different. A single ampersand is the address-of operator. So literally, in your code, if you've got a variable like n or anything else and you write &n, C is going to figure out for you what is the address of that variable n in the computer's memory. And it's going to give you a number, otherwise known as the address of that variable. If you want to store that address in a variable, even though yes, it's a number like 0x123, you have to tell C in advance that you want to store not an int per se, but the address of an int. And the syntax for doing that-- somewhat nonobviously-- is to use an asterisk here, a star operator, and you say this when creating the variable. If you want p to be a pointer, that is, the address of some other variable, you do int star p. And the star just tells the computer, this is not an integer per se, this is the address of something that yes, is an int, but we're just being more precise. So on the right-hand side you have the address-of operator. As always with the equal sign, you copy from right to left.
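The declaration just described, `int *p = &n;`, can be sketched on its own. This is an illustrative fragment, not the lecture's address.c; `pointer_stores_address` is a hypothetical name, and the code is wrapped in a function only so the check can be run.

```c
#include <stdbool.h>

// Declare an int, then a pointer that stores its address.
// "int *p" reads as "p is a pointer to an int"; "&n" is "the address of n".
bool pointer_stores_address(void)
{
    int n = 50;
    int *p = &n;     // copy n's address, not its value 50, into p
    return p == &n;  // after the declaration, the variable is just p (no star)
}
```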
Because &n is, by definition, the address of something, you have to store it in a pointer, and the way to declare a pointer is to specify the type of value whose address you're storing, and then use the star to indicate that this is indeed a pointer and not just a regular old int. So let's see this in practice. Let me go back to my own source code here and let me make just a couple of tweaks. I'm going to leave n alone here, but I'm going to go ahead and initially just do this. Let me say int star p equals ampersand n, and then down here, I'm going to print out not n this time, but p-- the variable p. And even though, yes, an address is just a number, there's actually a special format code in printf for printing pointers or addresses, and that's %p. So now let's go ahead and recompile this, make address-- so far so good-- ./address, Enter, and a little weirdly, but perhaps understandably now, the address in my computer's memory at which the variable n happened to be stored was not quite as simple as 0x123. This computer has a lot more memory, so technically, it was stored at 0x7FFCB4578E5C. Now that has no special significance to me. It could have ended up somewhere else altogether, but this is just where, in my computer-- or technically the cloud server to which I'm connected using VS Code here-- n happened to end up. And strictly speaking, I don't even need to introduce this variable. I could get rid of p and just print the address of n directly, and achieve the same thing. You don't need to temporarily store it in a variable. Let me just do make address again, ./address, and now I see this address here. And notice if I keep running the program, it's actually moving around. There's other stuff presumably going on inside of the computer. Maybe it's actually randomizing it so it's not always at the same location.
That can actually be a security feature underneath the hood, but this happens to be, at that moment in time, where that value is in memory, quite like our picture a moment ago. All right, so let me pause here to see if there are now any questions on what we just did. Yeah? AUDIENCE: Is there any way to control where you are storing something in memory? Does it even matter if it works, or does it just matter that you could go in and locate where something is? DAVID J. MALAN: Really good question. Is there any way to control where something is in memory? Short answer is yes, and this is both the power and the danger of C, and we're going to do this today and make a few deliberate mistakes, because with this power of going to or getting the address of any variable, I could just arbitrarily, right now, write code that stores a value at byte 2 billion, or zero, or anything in between. But that also means potentially, I could start creepily looking around at all of the computer's memory, even at things that I didn't put there. Maybe other programs, maybe other parts of programs, and indeed, this is a potential security threat, if suddenly you're able to just look anywhere you want in the computer's memory. Now, I'm overselling it a little bit because nowadays, in this decade, there are some defenses in place in compilers and in our operating systems that do hedge against this a little bit. But this is still a very frequent source of problems, and later today we'll talk briefly about things called stack overflow, which is not just a website, it is a problem that you can encounter. Heap overflow, and more generally buffer overflows-- there are just so many things that can go wrong using this language called C. Have any of you encountered a segmentation fault yet? I think we saw a few hands for that already. You touched memory that you shouldn't have, and odds are you did it most recently by going too far in an array.
Going to the left, or negative, in an array, or somehow looking at memory you shouldn't have. And we'll explain today why it is you were able to do that. Other questions on these primitives so far? Yeah, from Carter? AUDIENCE: [INAUDIBLE] pointer star p, but then we used p later in the code. Is it called star p or p? DAVID J. MALAN: Good question. Earlier, we used star p. Let me rewind in time to the previous version of this code, where I actually had a variable called p. Just like with variable declarations in the past, once you've declared a variable to be an int, a char, a bool, or an int star, a.k.a. a pointer, you don't thereafter keep using the word int or, now, the star. Once you've declared it, that's it. You only refer to it by name. And so what I did here was very deliberate, saying that the type here is int star-- that is, a pointer to an int-- but here I just said the name of the variable, as always. I didn't repeat int, and I also didn't repeat star. But at the risk of bending your mind a little bit, there is unfortunately one other use for the star operator, and that's as follows. If you want to print out not the address of something, but what is at a specific address, you can actually do this. If I want to print out the integer, via %i, that is at that address, I can actually use the star here, which technically contradicts what I just said, but it has a different function here-- a different purpose. So let me go ahead and do this in two different ways. I'm going to leave this line of code as is, but I'm going to add another line of code now that prints out what apparently will be an integer, in a moment. So %i backslash n, and I could see-- and let me just do n for now. So there's really nothing special happening now; I'm just adding a sort of mindless printing of n. So make address, ./address-- there's the current address of n and there's the value of n.
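The two print statements just described can be sketched as a function; `print_address_and_value` is a hypothetical name. The address it prints will differ from the 0x7FFCB4578E5C shown in lecture, and likely from run to run.

```c
#include <stdio.h>

// Print where n lives and then what it holds, mirroring the two printf
// calls described for address.c. %p is specified to take a void *, so
// &n is cast explicitly. The address shown differs across machines and,
// with address-space randomization, often across runs.
// Returns the total number of characters printed by the two calls.
int print_address_and_value(void)
{
    int n = 50;
    int written = printf("%p\n", (void *) &n);
    written += printf("%i\n", n);
    return written;
}
```

The lecture passes `&n` to printf without a cast, which works in practice on common platforms; the `(void *)` cast is the strictly portable form.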
But what's kind of cool about C here, too, is if you know that a value is at a specific address like p, there's one other use for this star operator, the asterisk. You can use it as the so-called dereference operator, which means go to that address. And so here what we actually have is an example of a pointer p, which is an address like 0x123 or 0x7FF and so forth. But if you say star p now, you're not redeclaring the variable, because I didn't mention int-- you're going to that address in p. So let me recompile this now. Make address, ./address, and just to be clear-- what should I see? I'm first going to see the pointer itself, 0x something. What's the second line of output I should presumably see now? Shout a little louder. So I'm hearing 50, and that's true, because if you figure out the address of n and print it in line seven, but then go to the address of n, a.k.a. p, that's indeed going to just show you the value of n again. All right, any questions now on this syntax? And I will concede, I think this is confusing-- the fact that we use the star for multiplication, the fact that we use the star to declare a pointer, but then we use a star in a third way to dereference the pointer, that is, to go to the address it stores. It's just too confusing, honestly, but with practice comes comfort. Yeah. AUDIENCE: [INAUDIBLE] DAVID J. MALAN: Good question. Do you-- when you are using the ampersand operator to get the address of something, the onus is on you at the moment to know what you are getting the address of. Is it a string? Is it a char? Is it a bool? Is it an int? I wrote this code so I know in line six that I'm trying to get the address of what is an integer. AUDIENCE: What about line eight? DAVID J. MALAN: In line eight you don't have to worry about that-- good question. Notice in line eight, I didn't tell the computer, other than the %i, what kind of address I'm going to, but I did already in line six.
I told the compiler that p, now and forever, is going to be the address of an int. That's enough information in advance so that printf, or really the language C, still knows on line eight that p is a pointer to an int, and that way star p reads all four bytes of the int at that address, not just part of it, and not more than those four bytes. Good question. Yeah, next to you. AUDIENCE: Do pointers have pointers? DAVID J. MALAN: Do pointers have pointers? Yes. We won't do this today by having pointers to pointers, but yes, you can use star star, and then things get-- I'm sorry. We won't do that today and we won't do that often. In fact Python, another language, is just a couple of weeks away, so hang in there. Almost there. A question back here? Was there? That was-- more verbal feedback like that is helpful as we forge into the more complicated stuff. Other questions? Yeah. AUDIENCE: What's the point of [INAUDIBLE]? DAVID J. MALAN: What's the point of printing the address? AUDIENCE: Like, using the address to [INAUDIBLE]. DAVID J. MALAN: Sure. What's the point of doing this? If you don't mind, let's get there in a moment. This is not the common use case, just printing out the address-- who really cares? At the moment we care only for the sake of discussion. We're soon going to start using these addresses. So hang in there just a little bit for that one, too, but it will solve some problems for us before long. So let's actually just now depict what was going on inside of the computer's memory just a moment ago. So if I toggle back here, let me redraw my computer's memory, and now let me plop into the memory n, which is storing in this program the number 50. Where is p in my computer's memory? Specifically, I don't know, and apparently it moves around each time I run the program, so for the sake of discussion, let's just propose that if 50 ended up at address 0x123, p ends up over here, at whatever address this is here.
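Dereferencing, and the star-star case just mentioned, can be sketched with two tiny functions; `value_at` and `value_two_hops_away` are hypothetical names for illustration, and the "four bytes" in the comments assumes the common 32-bit int.

```c
// Dereference: *p means "go to the address stored in p". Because p was
// declared as a pointer to int, the compiler reads a whole int (commonly
// four bytes) from that address.
int value_at(const int *p)
{
    return *p;
}

// A pointer to a pointer (int **): the first star yields the int *,
// and the second star yields the int it points to.
int value_two_hops_away(int **pp)
{
    return **pp;
}
```

For example, with `int n = 50; int *p = &n; int **pp = &p;`, both `value_at(p)` and `value_two_hops_away(pp)` give back 50.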
But notice a couple of curiosities now. If p is a pointer, it's the address of something. So the value in p should be an address, and I've indeed written it as such-- 0x123. Technically there's not an x there, there's not a zero there, there's not even a 123 there per se-- there's a pattern of bits that represents the address 0x123. But again, that's week zero-- we don't care about binary day-to-day. So if this is p, and this I claimed was n, why is p so much bigger? Can someone conjecture here? Because it turns out whether n is an int or a char or a bool, which are different types-- heck, even a long-- on this machine p is always going to take up eight squares on the board, but why might that be? What might explain that? Yeah, thoughts? AUDIENCE: Perhaps it allocates eight bytes, but it doesn't know the type of the data [INAUDIBLE]. DAVID J. MALAN: OK, fair. Maybe it's allocating eight bytes because it doesn't know the type. Turns out that's OK because an address is an address. It's really up to the programmer to use it as a string or a char or a bool. Other thoughts? AUDIENCE: Maybe the first four for the actual number and the last four is some null that [INAUDIBLE] where the pointer ends. DAVID J. MALAN: OK, possibly. It could be that pointers have some complexity like a backslash zero or something curious like that, like we talked about for strings. Turns out that's not the case. It turns out that pointers nowadays typically, but not always, are eight bytes, a.k.a. 64 bits, because you and I-- our Macs, our PCs, heck, even our phones-- have a lot more memory than they did years ago. Back in the day, a pointer might have only been 32 bits, or even fewer way back in the day. Consider 32 bits, because that was the norm for some time. How high can you count, roughly, if you've got 32 bits? What's the number we keep rattling off?
With 32 bits you get 2 to the 32 possible values, which is roughly 4 billion, and I keep saying it's 2 billion if you allow for negatives. But in the world of memory there's a reason I keep saying 2 billion bytes, two gigabytes: for a very long time that was roughly the maximum amount of memory a computer could have. Why? Because the pointers that the computers were using were only, for instance, 32 bits. And with 32 bits, depending on whether you allow for negatives or not, you can count as high as 2 billion, roughly, or maybe 4 billion, but you know what-- your Mac, your PC, your phone could not have had five gigabytes of memory, or 5 billion bytes of memory. You certainly couldn't have had what computers nowadays come with, which might be 8 gigabytes of memory-- 16 gigabytes of memory. Why? Because with 4 bytes, or 32 bits, you literally, physically, can't count that high, which means if I drew a picture of all of the memory we would run out of numbers to describe them, which means most of my memory would just be unusable. So pointers nowadays are 64 bits, or eight bytes. That's really big. I can't even pronounce how big that number is, but it's plenty for the next many years, and so we've drawn it that way on the board here. Now let's just abstract this away. Let's get rid of all the other bytes that are storing something or nothing else, and let's now start to abstract away this complexity, because the reality is, to your question earlier-- what is this useful for? Do we actually care about these addresses? Generally, no. We're doing this so that you see there's no magic. We're just moving things around and poking around in memory. But what a person would typically do when talking about pointers would literally be to just point at something.
I really don't care what address n is at, so in general, when drawing pictures on a whiteboard or having a discussion with another programmer, it suffices to just draw an arrow from the pointer to the value in question, because neither you nor I probably care about the specifics of 0x whatever. There's your pointer-- it's literally an arrow, and we can see this. So it turns out that these pointers, these addresses, are not that dissimilar to what we've done for hundreds of years in the form of a postal system. For instance, here is a post office-- here, no-- here is a mailbox, and suppose that this is a mailbox labeled p. It's a pointer, and suppose there's another mailbox way over there, which is just another byte of my computer's memory. What are we really talking about? Well, you store values like the number 50, or the word "hi", inside of your computer's memory at some location. But today we can also use those same memory locations to store the addresses of things. For instance, if I open this up here, I see that the value inside of this mailbox is not a number like 50; it's actually an address-- 0x123-- that's like a pointer, a breadcrumb leading from one location in memory to another. And in fact, would someone who's seated roughly over there-- do you mind getting the mail over there? Any volunteers over in this section? Just need you to get to the mailbox before I do. Who's being volunteered? Oh yes, please. Whoever is gesturing most wildly, come on down. Sure. What's your name? AUDIENCE: Anfoo. DAVID J. MALAN: Say again? AUDIENCE: Anfoo. DAVID J. MALAN: Anfoo?
OK, come on up to the edge of the stage there and just to be clear-- if this is p, that is apparently n, but to make clear what we're talking about when we're storing 0x whatever values-- like 0x123, that's essentially equivalent to my maybe pulling out something like this and just abstractly pointing to your mailbox there, or if you prefer, pointing to the mailbox-- OK, all right. Thank you. All right. This is akin to me pointing at your mailbox, and if you want to go ahead and open your mailbox and reveal to the crowd what's inside your mailbox labeled n. All right. Thank you. We have a little CS50 stress ball for your trouble. Thank you for coming up. So that's just to put a visual on what it is we're talking about, because it can get very abstract, very cryptic quickly when we're talking about addresses and memory and drawing it like these little squares. But if you think about just walking into a post office or an apartment complex that's got a lot of mailboxes, those mailboxes essentially are a big chunk of memory and each of those mailboxes has an address-- this is apartment one, two, three-- apartment 2 billion. And inside of those mailboxes can go anything that can be represented as information. It could be a number like n, or 50, or if you prefer it could be a number that represents the address of another mailbox. And this is akin, really, if you've ever had an apartment or you and your parents have moved, to having a forwarding address. It's like having the Post Office in the US put some kind of piece of paper in your old mailbox saying, actually forward it to that other mailbox. That really is all a pointer is doing. At the end of the day, it's just a number but it's a number being used in a different way and it's the syntax that we've introduced, not just int but int star, that tells the computer how to treat that number in this slightly different way. Are there any questions then, on this? Yeah, in back. 
AUDIENCE: If you had a variable, like int c, [INAUDIBLE]. DAVID J. MALAN: If I did int c and-- say the code again? Once more? Equal to n, so let me actually type it out. If I give myself another line of code, tell me one last time what to type. int c equals n, like this? So this is OK, and I can't draw it quite quickly enough on the board here, but this would be like creating another four bytes somewhere in memory, maybe down here, that stores an identical copy of 50, because the assignment operator copies a value from right to left. So that would just add one more rectangle of size four to this particular picture, if I'm answering your question as intended. OK, so that is week one style use of the assignment operator, before pointers. I could, though, start copying pointers, but again, we'll come back to some of that complexity. Any other questions here? AUDIENCE: That was a great question. Does the pointer point-- does the same pointer point to the new replica as well? DAVID J. MALAN: Ah, good question. Short answer, no. And to repeat for the camera, if I create a second variable like this, int c equals n, and I claim without actually drawing it on the board that this gives me another rectangle, the value of which is also 50, p does not get touched. And this is what's important and really characteristic of C. Nothing happens automatically for you. p is not going to be updated unless you update p in some way, so creating a third variable called c-- even if you're copying n's value from right to left-- has no effect on anything else in the program. A good question. So what have we seen that's perhaps now a little more explainable? Well, recall that we talked quite a bit last week about strings, and just to recap in layperson's terms, what is a string as you now understand it? So say-- well, let me take a specific hand here. What's a string? How about over here. AUDIENCE: An array of characters. DAVID J. MALAN: OK, sure. Both of you are right.
A string is an array of characters. And I claimed-- or revealed last week-- that string is not technically a feature built into C. It's not an official data type, but every programmer in most any language refers to sequences of characters-- words, letters, paragraphs-- as strings. So the vernacular exists, but the data type doesn't exist per se in C. So what we're about to do, if you will, for dramatic effect, is take off some training wheels today. The CS50 library, implemented in the form of the header file cs50.h, we claimed has had a bunch of things in it: prototypes for get_string, prototypes for get_int, and all of those other functions. But it turns out it also is what defines the word "string" in such a way that you all could use it these past several weeks. So let's take a look at an example of a string in use. Here, for instance, is a tiny bit of code that uses the word "string," creating a variable called s and then storing quote unquote, hi, exclamation point. Let's consider what this looks like now in the computer's memory. I don't care about all the other bytes, let's just focus on these, and this per last week is how "hi" might be stored: h-i exclamation point and then one more, as someone already observed, that sentinel value-- that null character, which just means eight zero bits. It demarcates the end of that string so that, just in case there's something to the right of it, the computer can distinguish one string from another. So last week we introduced this new syntax. Well, if strings are just arrays of characters, you can then very cleverly use that square bracket notation and go to location zero or one or two, which are like addresses, but they're relative to the string. This could be at 0x123 or 0x456, but with this bracket notation zero is always the beginning of the string, one is the next, two is the next, and so forth. So that was our array syntax for indexing into an array.
But technically speaking, we can go a little deeper today. If hi is starting at the address 0x123, then it stands to reason that i is at 0x124, exclamation point's at 0x125, and the null is at 0x126. Now, I don't care about 123 per se, but even though this is hexadecimal, this is correct math. Even in hex, if you just add one when you start at 0x123, the next number is four, five, six at the end. I don't have to worry about A's, B's, and C's because I'm not counting that high in this example. So if that's the case, and my computer is actually laying out the word hi in memory like that, well, what exactly is s? What exactly is s if, at the end of the day, H-I exclamation point null is stored at these addresses? Where is s? Now that I've taken off those training wheels and showed you where H-I exclamation point null actually are, what happened to s? Well s, as always, is actually a variable. Even in the code I proposed a moment ago, string is apparently a data type that, yes, doesn't come with C, but CS50's library makes it exist. s is a variable of type string, so where is s in this picture? Well, it turns out that s might be up here. Again, I'm just drawing it anywhere for the sake of discussion, but s is a variable per that line of code. What s is storing, apparently, I claim, is 0x123. I actually don't really care about these addresses, so let's abstract that away. s is apparently, as of now, today, one week later, just a pointer to a character. Specifically, the first character in s. And this is the last piece of the puzzle. Last week we had this clever way of demarcating the end of a string. Well, it turns out that strings are represented in the computer's memory as a variable that is a pointer, inside of which is the address of the first character in the string.
So if s points at the first character and you can trust that backslash zero is at the end of the string, that's literally all you need to figure out where a string begins and ends. So what do I mean by this? Well, let's be a little more concrete. In terms of this picture, if I've started with this line of code here, it turns out that all this time since week 1, the word string has just semi-secretly been an alias for char star. I know, so char star. So why does this make sense? It's a little weird still, but in our previous example we were able to store the address of an integer by declaring a variable called p as int star p. Well, if as of now strings are just the address of the first character in a string, then probably a string is just a char star, because that means s is the address of a character, the very first character in the string. Now, the string might have three letters like it did, or four, or even a hundred if it's a long paragraph, but that's fine because you can trust that there's going to be that null character at the very end. So this is a general purpose way of representing strings using this new mechanism in C. So in fact, let me go ahead here and introduce maybe a couple of manipulations of this. Let me go back to my code here, let's get rid of this integer stuff, and let's instead now do, for instance, this. Let me add in the CS50 library, so we'll include cs50.h for now. I'm going to go ahead and, inside of main, give myself a string s equals hi exclamation point. I don't type the backslash zero. C does that for me automatically because I'm using double quotes like this. Now let me just go ahead and print it. So this again is week 1 style stuff where I'm just printing a string. No pointers yet. So let me do make address, Enter, ./address, and hopefully I see hi, so nothing new there. But let's start to peel back some of these layers here. First of all, let me get rid of the CS50 library for a moment and change string to char star.
And it's a little bit weird, but yes, the convention is to say char, a space, then the star, and then immediately thereafter the name of the variable, as in char *s. Strictly speaking, though, you might see textbooks or websites that write char* s or char * s, but the canonical way is typically char *s. So now no more CS50 library, no more training wheels, if you will. I'm just treating strings for what they really are. Let me go ahead and do make address, Enter-- so far so good-- ./address-- and that, too, still works. So %s is a thing that comes with printf, because the word string is programmer terminology, but strictly speaking C doesn't have a string data type. It's always been char star. What this means now is I can start to have some fun with these basic ideas, even though this serves no purpose other than the sake of discussion. But if s is this-- let me go back and give myself the CS50 library. Let's put those training wheels back on for just a moment so that I can do one manipulation at a time. Here's my string s, as before. Well, let me go ahead and declare a char called c, and let me store the first character in the string there, which is s bracket zero, and that should give me h. And then just for kicks, let me go ahead and do char star-- whoops-- let me go ahead and do char star p equals ampersand c, and see what this actually prints for me. Let me go ahead and print out what p is here. So we're just playing around. So make address-- so far so good-- ./address. All right, so what have I just done? I've just created a char c and stored in it the letter h, which is the same thing as s bracket zero. Then I'm asking, what's the address of c, and that's apparently 0x7FF whatever. So that's the address. But I technically didn't have to do that. Let me go ahead and do two things now. Instead of just printing p, let me go ahead and print out maybe s itself.
Let me go ahead and do make address, Enter-- so far so good-- ./address and-- damn it, what did I do wrong. Oh shoot, I didn't want to do that. Oh, I really made a mess of this. What did I want to do here? That was supposed to be impressive but it was the opposite. So let me turn it around. So if I intended to do this, why are lines nine and 10 printing different values? I didn't really intend to go here, but let me try to save this. Why are we seeing different addresses, namely this address 402004 for s, and then 0x7FF for p? Any thoughts? Yeah, over here. AUDIENCE: [INAUDIBLE] is the character c is its own sort of location of the [INAUDIBLE], and it's taking off just the values [INAUDIBLE]. DAVID J. MALAN: Correct. So if I really wanted to weasel my way out of this, this is a great answer to the previous question, which was about what if I introduce another variable, c, that's a copy of the value-- and not in this case an int, but an actual char. Here, I've made c be a copy of the character that's at the beginning of s, but that's indeed a copy. So if I were to draw it on the screen, that would give me a different rectangle in which this copy of h would actually be stored. So I didn't intend to do this, but what you're seeing is, yes, the address stored in s-- and apparently that's at a pretty low address by default here-- and then you're seeing the address of c. But even though each of them is h, I claim one is at a different address in memory. And this has always been happening. Any time you created one variable or another, it was ending up here, or here, or here, or somewhere else in memory. Now for the first time, all we're doing is actually just poking around the computer's memory to see what is actually there. So let me actually back this up a little bit and do what I intended to do here, which was something like this. So if string s equals quote unquote, hi, let's go ahead and give myself a pointer, called p, to the first character in s.
All right, so now let me go ahead and print out the value of this pointer, %p, printing out p. So we're just going to do one thing at a time. So make address, Enter, ./address. There, at the moment, is the address of the first character in s. What I meant to do now was this. If I want to print out two things this time, let me print out not only what p is, but also what s itself originally is. Everyone from last week should be comfortable with s bracket zero just representing the first character in s, by definition of strings being arrays of characters. So if I'm right, then s, as of today, is itself the address of a character, the first one in s. So if I now do make address, and do ./address, this time I see the same exact things. Thank you. This is really the lamest sort of thing to be applauding over, but what we're demonstrating here is that s is by definition the address of the first character in s. So if we borrow some of our mental model from last week-- well, if s bracket zero is the first character in s, doing the ampersand on that expression should be the same as s. Now this isn't to say that we would jump through these hoops all the time with this much syntax, but this is just to do proof by example that s is in fact, as I claimed a moment ago, just the address of a character. Not even multiple characters, it's the address of a single character, but the key thing is it's the address of the first character in the string, and per last week we trust that C is going to look for that null character at the very end just to make sure it knows where the string actually ends. All right, a question came up over here. AUDIENCE: [INAUDIBLE] DAVID J. MALAN: Correct. To summarize, on line eight, when I am using %p-- that just means print a pointer value, so 0x something-- I'm passing it s. Previously, when we used %s, printf knew to print not just the first character of s, but h, i, exclamation point, and then stop when it hits the backslash zero.
%p is different. %p tells the computer to go to that address-- sorry, tells the computer to print that address on the screen. So this is where %s all this time has been powerful. The reason printf worked in weeks 1 and 2 and 3 was because printf was designed by some human years ago to go to the address that's being passed in-- for instance, s-- and print out character after character after character until it sees the null character, backslash zero, and then stop printing. So you're getting a lot of functionality for free from %s. Today we're using something much simpler, %p, which just literally prints what s is. And the reason we didn't do this in week 1 is just because this is way too much to be interesting when all you want to print out is hi or hello, world, or the like. But now what we're really doing is revealing what's been going on this whole time. And let me make one other example here. Let me go ahead and get rid of this variable here and let me just print out a few things to make the same point. I'm going to print out not just s like I did here, but also the address of every character in s. So let's get the first letter in s and get its address, and I'm going to do copy paste for time's sake, but not something I would do frequently. So let me print out the address of the first character, the second character, the third, and actually even the fourth, which is the backslash zero, by doing this. So when I compile this program-- make address, ./address-- I should see two identical values and then additional values that are one byte away. In my diagram a moment ago, my addresses were arbitrarily 0x123, 124, 125, 126. Now it starts at, by chance, 0x402004, which is s. 0x402004 is the same thing as s because I'm just saying go to the first character and then get its address. Those are one and the same now. And then after that is 0x402005, 006, 007, because that is just like the diagram.
Those are the addresses of the i, the exclamation point, and the null character. So all I'm doing now, using my newfound understanding of what the ampersand does and what the star does, is just playing around. I'm poking around in the computer's memory, just to demonstrate there's no magic. It's all there very deliberately because I or printf or someone else put it there. Yeah. AUDIENCE: [INAUDIBLE] DAVID J. MALAN: Really good observation. So it's indeed the case that hi, unlike 50, is ending up at a very low address, not the 0x7FF wherever it was. That's actually because, long story short, strings like this are often stored in a different part of the computer's memory-- more on that later today-- for efficiency. There's actually only going to be one copy of the word "hi" and exclamation point, and the computer is going to tuck it near the beginning of my memory, but other values like ints and floats and the like-- they end up higher in memory by convention. But a good observation, because that is consistent here. All right, so a couple final details then, on what's been going on here. Let me go ahead and claim that we implemented string as a char star as follows. As of last week we were writing this code. As of this week, we can now start writing this code, because string specifically is something we invented in the CS50 library. But it turns out you've seen a way of inventing your own data types. Recall this thing here. We played around last time with data structures, or the struct keyword in C, and briefly the typedef keyword, which defines a type for you. And if I highlight what's interesting here, the way we invented a person data type last time was to define a person as having two variables inside of it-- a structure that encapsulates a name and encapsulates a number. Now even though the syntax is a little different today because of the star thing, notice that this could be a similar application of that idea.
If I want to create a type called string, highlighted in yellow here, then I use typedef to define it to be char star. So this is literally all that has ever been in cs50.h, in addition to those prototypes of functions we've talked about. typedef char star string is a single line of code that brings the word string as a data type into existence, and that's all that's ever been there. But the star, the char star, is just too much in week 1. We wait until this point to peel back that layer. Are there any questions, then, on what a string is? What the star or the ampersand are doing? Yeah. AUDIENCE: [INAUDIBLE] DAVID J. MALAN: Oh my God. Massive spoiler, but yes. Is that why, when you compare two strings as I briefly did, or almost did, problems arise? In fact yes, last week we used str compare-- STRCMP-- for a very deliberate reason, because, yes, the spoiler is I accidentally would have compared two addresses in memory, not the strings at those addresses. Other questions here? All right, well, before we give ourselves maybe a 10 minute break here, we have lots of pieces of paper. If anyone wants to come on up and play with this big stack of Post-Its, if you want to make your own eight by eight grid of something to share with the class if you're artistically inclined, come on up. Otherwise, let's take 10 minutes and we'll return after 10. All right, so let's come back to this question of how we can start to use these pointers and these addresses, ultimately in an interesting way. The goal ultimately next week is going to be to use these addresses to really stitch together more complicated data structures than just persons, like last week, or candidates in the context of an electoral algorithm, if you will, and actually really use our memory in the most versatile way to represent not just images but maybe videos and other two-dimensional structures as well.
But for now, let's come back to this address example, whittle it down to just a hi initially, and see what's going on again here underneath the hood. So let me re-add the CS50 library just so we can use our synonym for a moment, that is, the word string, and I'll redefine s as a string. And what I didn't mention before is that these double quotes that you've been using for some time are actually a little special. The double quotes are a clue to the compiler that what is between them is in fact a string as we now know it, which means the compiler will do all the work of figuring out where to put the h, the i, the exclamation point, and even adding a backslash zero for you automatically. And what the compiler will do for you, too, is figure out what address the first of those four chars ended up at and store it for you in the variable s. So that's why it just happens with strings without using ampersands or even stars explicitly, but the star at least has been there because, again, string is just synonymous now with char star. It's not really as readable, but it is now the same idea. So I'll leave string in place just to do something week 1 style here for a moment, and let's go ahead and print out a few characters. So I'm going to use %c this time, and I'm going to print out s bracket zero, and then I'm going to print out s bracket one and s bracket two, literally doing week three style from last week-- a printing of every character in s as though it were an array. So ./address should give me h-i exclamation point. And if I really want to get curious, technically speaking, I could print out one more location, and let me go ahead and recompile-- make address, ./address-- and there is, it would seem, the backslash zero. I'm not seeing a 0 because it's not literally the zero character in ASCII; it's eight zero bits, which are technically unprintable, if you will, in printf speak. And so what I'm seeing here is like a blank symbol.
That just means there is something else there-- it's apparently all eight zero bits, but they are there even though we're not seeing them literally right now. Well, let's go ahead and peel back one of these layers and let me go ahead and get rid of the CS50 library and get rid of, therefore, the word string because, again, henceforth it's just char star. Nothing else is different. I'm going to now do make address, ./address, and it's the same exact thing. And now, let's just focus on the hi rather than even worry about that. So I'm going to recompile one last time and now I have h-i exclamation point. Well, it turns out that the array notation we used last week was technically a bit of syntactic sugar: a neat way to use syntax in a useful way, but we can see more explicitly today what the square brackets for a string are actually doing. Let me go ahead and do this. Let me adventurously say I want to print out not s bracket zero, but whatever the first character of s is. So to be clear, what is s now? It's the address of a string. OK, but what is s, really? s is the address of the first char in a string, and again, that's sufficient for defining a string because eventually the computer will see that there's a backslash zero at the end of it. So s is specifically the address of the first character in a string. So that means, using my new syntax, if I want to print out that first character I can print out star s, because recall that star is the dereference operator when you use it without a type like char or int in front of it. That means go to that address. Similarly, if I, in my newfound knowledge of how strings work, know that the h comes first, then the i right after it, then the exclamation point, then the backslash zero, contiguously one byte apart, I could start to do some arithmetic.
I could go to s plus 1 byte and print out the second character, and I could print out whatever is at s plus 2-- in fact, doing what's generally known as pointer arithmetic. Literally treating pointers as the numbers they are-- hexadecimal or decimal, doesn't really matter-- it's still just numbers. And go ahead and add one byte or two bytes to them to start at the beginning of a string and just poke around from left to right. So this now is equivalent to what we did last week using square bracket notation, but now I'm reimplementing that same idea with this lower level plumbing, understanding ampersands and stars now a little bit more. So if I remake this program and do ./address, I should still see h-i exclamation point. But what I'm really doing is just kind of demonstrating, hopefully, my understanding of what really is going on in the computer's memory. Now, programmers who are maybe trying to show off might actually write this syntax. I think the more common syntax would be what we did last week-- s bracket zero, s bracket one. Why? It's just a little more readable and we don't need to brag about or care about this underlying representation. The square brackets last week were an abstraction, if you will, on top of what is lower level math. But that's all that's going on underneath the hood. We're poking around from byte to byte to byte. All right, let me pause here, see if there's any questions on that one. Any questions on this? Let's do one more then, just to demonstrate that this is not even specific to strings. Let me go ahead and get rid of all of this and let me give myself an array of numbers like I did last week. So if I'm going to declare all the numbers at once using this funky curly brace notation, I can do like 4, 6, 8, 2, 7, 5, 0. So seven different numbers inside of an array that's automatically initialized like this. I don't, strictly speaking, need to say seven.
The compiler is smart enough to figure out how many numbers I put with commas between them, and that just gives me an array containing 4, 6, 8, 2, 7, 5, 0. So it turns out I can print each of these numbers in the familiar way. I can do a printf of %i backslash n, and I can print numbers bracket zero, and let me just do some quick copy/paste just to print the first three of these. Theoretically, that should print out 4, 6, 8, and so forth. But I can do the same sort of manipulation, understanding what pointers now are, using pointer arithmetic. So let me actually unwind this and just go back to one printf, and instead of printing numbers bracket zero like I might have last week, let me just go and print out whatever is at that address-- so asterisk numbers. Let me then print out the second number, which is going to be whatever is at numbers plus 1, and then let me do this further and do whatever is at numbers plus 2, and if I really want to repeat this, let me do it four more times and do what's at location three, four, five, and six. And that's seven total numbers because I started counting at zero. So let me just quickly run this. make address, ./address. There are those seven digits being printed. But there's something subtle but also useful here. Each of these digits-- 4, 6, 8, 2, 7, 5, 0-- is an int. Why? Because I made an array of integers. But think back-- how big is a typical integer, have we claimed? Four bytes, or 32 bits, and yet notice that I didn't need to worry about that detail. Notice that I did not do plus 4, plus 8, plus 12, plus 16, plus 20. I, the programmer, strictly speaking, don't need to worry about how big the data type is. This is the power of pointer arithmetic. The compiler is smart enough to know that if you add 1 to this pointer, that is the same as saying go one more piece of data-- not just one byte. So if it's an int, plus 1 moves four bytes, plus 2 moves eight, and plus 3 moves 12.
Pointer arithmetic handles that annoying arithmetic for you, so you can just think of this as a number after a number after a number that are back to back to back, not one byte apart, but four bytes apart. Which is only to say plus 1, plus 2, plus 3 works no matter the data type. Why? Because the compiler knows what type of data you're talking about. Now, there's one other detail I should reveal here that I've taken for granted. In the past I was using double quotes to represent strings, and I claimed that the compiler's smart enough to realize that oh, if I have double quote hi, that means it's an array of h-i exclamation point, and then the backslash zero. Notice, too, this useful detail. It turns out that you can actually treat arrays as though the name of the array is itself a pointer, and this is actually going to be something useful in upcoming problems when we want to pass arrays around in the computer's memory. Notice that strictly speaking on line five, there are no pointers going on. There's no star, there's no ampersand-- there's nothing new there, and yet instantly on line seven I'm pretending that it is the address, and this is actually OK. It turns out that an array really can be treated as the address of the first element in that array. The difference from a string is that this array of ints has no secret backslash zero anywhere. The zero at the end here is just part of the phone number-- that's not a special backslash zero. So this is something we're going to take advantage of too, before long. There's this interrelationship between addresses and arrays that generally allows you to treat an array's name as though it is an address, and the math is taken care of for you. Are any questions then on this before we start to solve some bigger problems? Yeah. AUDIENCE: [INAUDIBLE] DAVID J. MALAN: Potentially. If you go beyond the end of an array, you might get a segmentation fault.
The problem is that that symptom is sometimes nondeterministic, which means that sometimes it will happen, sometimes it won't. It often depends on how far off the end of the array you actually go. You often won't trigger a segmentation fault if you just poke a little too far, but if you go way too far it quite likely will happen. But we'll give you a tool today actually for detecting and solving exactly that kind of situation. So let's go ahead now and do something a little different in code, but that actually comes back to that spoiler from earlier. Let me go ahead and create a program called compare.c, and in this program I'm going to go ahead and allow myself the CS50 library, not so much for string but so that I can actually use get_int still, which is way easier than the way we'll see that C normally lets you get input. Let me give myself stdio.h, do an int main(void), not worrying about command line arguments today, and let me go ahead and get an int i using get_int, and ask the human for the value of i, then let me give myself an int j, ask the user for another int, calling it j, and then let me go ahead and kind of naively, but to your point earlier, if i equals equals j, then let's go ahead and print out something like "same" backslash n, else let's go ahead and print out "different" if they are not, in fact, the same. So that would seem to be a program that compares the value of two integers. All right, so let's go ahead and run make compare-- so far so good-- ./compare. OK, i will be 50, j will be 50-- they're the same. Let's do it once more. i will be 50, j will be 42. They are different. So so far, so good in this first version of comparison. But as you might see where I'm going with this, let's move away from integers and let's actually change these things to strings. So I could do string s over here-- get_string over here. Then I could do string t over here, and get_string over here, asking the user for t this time.
And then I can compare the two. If s equals equals t-- and this is a common convention. If you've used s for string already you can use t for the next one, at least for simple demonstrations like this. I'm going to compare the two, just like I did for ints, which worked great. Make compare-- so far so good-- ./address-- oh, sorry. Wrong program-- ./compare. Let me go ahead and type in something like hi, exclamation point and bye, exclamation point, which of course should definitely be different. Let me run it again with hi, exclamation point and hi, exclamation point. Different-- maybe I messed up. Let's maybe do it lowercase, maybe that'll fix it. But no, those two are different. So to come back to what I described as a spoiler earlier, what's the fundamental issue here, to be clear? Why is it saying different even though I'm pretty sure I typed the same thing twice? Yeah. Yeah, this is where it's now useful to know that string has been an abstraction-- a training wheel, if you will-- and if we take that away-- still use get_string because that's convenient still-- but if I change string to be char star, it's a little more explicit as to what s and what t are. s is a pointer to a char, that is, the address of a char. t is a pointer to a char, that is, the address of a char. Specifically, the first character in s and the first character in t, respectively. So if I'm comparing these two it should stand to reason that they're going to be different. Why? Because s might end up here in memory and t might end up here in memory. Each time I call get_string, it is not smart enough or advanced enough to know that, wait a minute-- you typed the same thing. I'm just going to hand you back the same address. That doesn't happen because we did not design get_string that way. Each time I call get_string, it returns, apparently, a different copy of the string that was typed in. A hi over here and a hi over here.
They might look the same to the human but to the computer they are different chunks of memory, and therefore at different addresses. And here, too, we can reveal what get_string is returning. Well, up until today it was returning a string, so to speak. That's not really a thing. Technically, what get_string has always been doing is returning the address of the first char in a string and trusting that we put a backslash zero at the end of whatever the human typed in, and that's enough now for printf, for strlen, for you to know where a string begins and ends. So get_string has actually always returned a pointer. It has not returned a quote unquote string per se, but there are functions that can solve this comparison for us. Recall that I could do something like this. I could actually go in here and I could-- let's see, where was it? So if I use strcmp here and pass in two values, s and t, let's see now what happens when I make compare. Implicitly declaring library function strcmp with type int-- and well, there's a star. So you might have seen this error before and you might have ignored most of it, but there's some evidence of stars or pointers going on here. It looks like I didn't include the string.h header file, so that's an easy fix. Include string.h which, despite its name, does not create a data type called string, it just has string-related functions in it like strcmp. Let's make compare again. Now it compiles, ./compare. Now let's type in hi, exclamation point and even the same thing again. These are now-- oh, I used it wrong. OK, user error. That was supposed to be impressive, but it's the opposite. What did I do wrong? What did I do wrong here? Yeah. Yeah. AUDIENCE: [INAUDIBLE] DAVID J. MALAN: Yeah, it returns three kinds of values. Zero if they're the same, a negative value if the first string comes before the second, and a positive value if the opposite is true.
I just forgot that, so like I did last week correctly, if I want to compare them for equality per the manual page, I should be checking for zero as the return value. Now make compare, ./compare, Enter. Let's try it one last time-- hi and hi. OK now, they're in fact the same. And Justin, thank you. And indeed, it's not just returning same all the time. If I type in hi and then bye, it's indeed noticing that difference as well. Well, let me go ahead and do one other thing here. Let me go ahead now and just reveal more pictorially what's going on. Let's get rid of the string comparison and let's just print these things out. The simple way to print this out would be with %s and again, %s is special-- printf knows to take an address, start there, and print every character up until the backslash zero, so let's just hand it s and do that. And then let's do one more, %s with t. This is, again, sort of a mix of week 1 and this week because I got rid of the word string. I'm using char star, but I'm still using printf and %s in the same way. Let me go ahead and run compare now, and if I type hi and hi, I should see the same thing twice. So they look the same, but here now we have the syntax today to print out the actual addresses of these things. So let me just change the %s to %p, because %p means don't go to the address and print what's there, it means just print the address as a pointer. So make compare, ./compare, and now let's type in hi, and once more, and I should see, indeed, two slightly different addresses given in hexadecimal. One's got a B at the end, one's got an F at the end, and they are indeed a few bytes apart. So this is just confirming what our suspicions have actually been. So what does this mean, perhaps in the computer's memory? Well, let's take a look. I've zoomed out so I have a few more squares to look at at once. Here might be s in memory when I do string s equals, or char star s equals.
I get a variable that's of size 1, 2, 3, 4, 5, 6, 7, 8, because I claimed earlier that on modern systems, pointers are generally eight bytes nowadays so they can count even higher. And inside of the computer's memory, also, might be hi. And I don't know where it ends up so for the sake of discussion it ended up down here. That's what was free when I ran the program. h-i exclamation point, backslash zero. Maybe it ended up, for the sake of discussion, at 0x123, 0x124, 0x125, and 0x126. So to be clear, what is s storing once the assignment operator copies from right to left? What is s storing if I advance one more slide? Yeah. 0x123, the presumption being that if a string is defined by the address of its first char and that address of its first char is 0x123, then that's indeed what should be in the variable s. And so technically, that's what's been happening with that assignment operator from right to left. get_string indeed returns a string, so to speak, but more properly it returns the address of a char. What's been copied from right to left using that assignment operator all these weeks is indeed that address. Now technically, we don't really need to care about where these addresses are. It suffices to just think about them referentially, but let's first consider where t might be. t is just another variable that I created on my second line of code. Maybe it ends up there, maybe somewhere else. For the sake of discussion I'll draw them side by side. Where did the second word end up that I typed in? Well, suppose the second copy of hi ended up at 0x456, 0x457, 0x458, and 0x459. What ended up in t? I'll pluck this one off myself. 0x456, presumably. And so this is now a pictorial representation of why, and let's abstract away everything else. When I compared s against t using equals equals, based on the picture they're obviously not the same. One is over here, one is over here. And per a moment ago, one is 0x123, the other is 0x456.
Yes, technically they're pointing at something that's the same, but that just reveals how strcmp works. strcmp is apparently a function that takes in the address of a string as its argument and the address of another string as its argument, it goes to the first character in each of those strings, respectively, and probably has a for loop or a while loop and just goes from left to right, comparing, looking for the same chars left and right, and if it doesn't notice any differences, boom-- it returns zero. If it does notice a difference it returns a positive or a negative value. And that's very similar, recall, to how we implemented string length ourselves last week. I used a for loop, I was looking for a backslash zero. strcmp is probably a little similar in spirit, looping from left to right but comparing, this time not just counting. Are any questions then, on string comparison and why it is that we use strcmp and not equals equals? Yeah. AUDIENCE: Do pointers have addresses? DAVID J. MALAN: Do pointers have addresses? Yes. So we won't do that today, but I could actually use the ampersand operator on s or on t. That would give me a char star star, the address of a pointer, which itself could be stored elsewhere in memory. In principle you can keep going, but in practice star and star star are where it usually stops. So yes, that is a thing and it's very often useful in the context of two dimensional arrays, which we haven't really talked about, but that is a feature of the language, too. But not today. Good question. All right, so what might we now do to take things up a notch? Well let's go ahead and implement a different program here that maybe tries copying some values, just to demonstrate this. Let me open up a file called, how about copy.c, and I'm going to start off with a few includes. So let's include the CS50 library just so we have a way of getting user input.
Let's include-- how about stdio as always, let's preemptively include string.h and maybe one other in a moment. Let's do int main(void) as before. And then in here, let's get a string from the user and just call it s for simplicity. And heck, we can actually just call this char star if we want, or string, since we're using the CS50 library. But we'll come back to that. Let's now make a copy of s and do s equals t, using a single assignment operator and then let's check something like this. Let's go into the first character of t, which is t bracket zero, and then let's uppercase it using that function that we've used in the past, assigning toupper of t bracket zero back to t bracket zero, semicolon. And actually, I should go back up here. If I'm using toupper or if you use tolower or isupper or islower-- I might not remember this offhand, but it was in another header file called ctype.h. There were a bunch of helpful functions in that library as well. Now at the very last line of the program let's just print out what both s and t are by simply printing out %s for each of them, and t is %s also, not %t, of course, and let's see what happens here. So let me make copy-- oh my God, so many mistakes. What did I do wrong? Oh. OK, that was unintended. String t equals s, sorry, so I'm creating two variables, s and t respectively, and I'm copying s into t. Make copy, Enter. There we go. ./copy, and let's now type in, for instance, how about hi exclamation point in all lowercase this time, and now what gets printed? I don't think that's what I intended, so to speak, here. Because notice that I got s from the user, so that checks out. I then copied s into t, which looks correct. That's what we always use assignment for. Then I uppercased the first letter in t, but not s-- at least in my code-- then I printed s and t and then noticed, apparently, both s and t got capitalized. So if you're starting to get a little comfortable with what's going on underneath the hood, what's the fundamental problem here?
Why did both get capitalized? Yeah, over here. AUDIENCE: Could it be they're referencing the same address? DAVID J. MALAN: Yeah, they're referencing the same address. So C is really literal. If you create another variable called t and you assign it the value of s, you are literally assigning it the value in s, which is 0x123 or something like that. And so at that point in the story both s and t presumably have a value of 0x123, which means they technically point to the same h-i exclamation point in memory. Nowhere did I tell the computer to give me a copy of an h-i exclamation point per se, I literally said just copy s. So here's where an understanding of what s literally is explains the situation. I'm only copying the pointers. So what actually went on in memory? Let's take a look here at this grid. If I created s initially, maybe it ends up here. And I created hi in lowercase, and it ended up down here. Then the addresses were, again, like 0x123, 0x124, 0x125, and 0x126, so 0x123 is what's in s. If then I create a second variable called t, and I call it a string, a.k.a. char star, maybe it again ends up here. But when I copy s into t by doing t equals s semicolon, that literally just copies s into t, which puts the value 0x123 there. So if we now abstract away all these numbers and just think about a picture with arrows, what we've drawn in the computer's memory is this. Two different pointers but storing the same address, which means the breadcrumbs lead to the same place. And so if you follow the t breadcrumb and capitalize the first letter, it is functionally the same as changing the first letter in the version s as well. So what's the solution, then, to this kind of problem? Even if you have no idea how to do it in code, what's the gist of what I really intended, which is, I want a genuine copy of s, called t. I want a new h-i exclamation point backslash zero. What do I need to do to make that happen? Thoughts?
AUDIENCE: I think there's a function called strcpy. DAVID J. MALAN: So there is a function called strcpy, "string copy," which is a possible answer to this question. The catch with strcpy is that you have to tell it in advance not only what the source string is-- the one you want to copy-- you also need to pass in the address of a chunk of memory into which you can copy the string, and here's one thing we haven't seen yet, and we need one more building block today, if you will. We haven't yet seen a way to create new chunks of memory and then let some other function copy into them. And for this, we're going to introduce something called dynamic memory allocation. And this is the last and most powerful feature perhaps, today, whereby we're going to introduce two functions, malloc and free, where malloc means memory allocate, which literally does just that. It's a function that takes a number as input-- how many bytes of memory do you want the operating system to find for you somewhere in that big grid? It's going to find it and it's going to return to you the address of the first byte of contiguous memory back to back to back, and then you can do whatever you want with that chunk of memory. free is going to do the opposite. When you're done using a chunk of memory that malloc has given you, you can say free it, and that means you hand it back to the operating system and then the operating system can use it for something else later. And forgetting to do that is actually a common problem in programming. If your Mac or PC has ever been in the habit of starting to get really, really slow, or it's slowing to a crawl-- heck, maybe it even freezes-- one of the possible explanations could be that the program you're running by Apple or Microsoft or whoever, maybe they're using malloc or some equivalent, asking the operating system-- macOS or Windows-- for, give me more memory. I need more memory. The user is creating more images. The user is typing a longer essay.
Give me more memory, more memory. If the program has a bug and never actually frees any of that memory, your computer might end up using all of the available memory and honestly, humans are not very good at handling corner cases like that. Very often programs, computers just freeze at that point or get really, really slow because they start trying to be creative when there's not enough memory left. So one of the reasons for a computer really slowing down might be calling malloc a lot, or some equivalent, but never freeing that memory. Which is to say, you should always use these two functions in concert and free memory once you are done with it. So let me go ahead and do this in code and solve this problem properly. Let me go ahead and do this. Before I copy s into t using something like strcpy, I first need to get a bunch of memory from the computer. So to do that, let's make this super clear that we're dealing with pointers, so I'm going to change my strings to char stars for both s and t, and what I technically am going to store in t is the address of an available chunk of memory. To do that, I can ask the computer to allocate memory for me, and how many bytes? If I want to create a copy of h-i exclamation point, I need how many bytes? Good! Four! Because I need the h, the i, the exclamation point, and additional space for the backslash zero. It's up to me to understand that and ask for it. It's not going to happen magically. Nothing does in C. So I could just naively type four there, and that would be correct if I type in h-i exclamation point or any other three letter word or phrase, but to do this dynamically I should probably do something like strlen of s plus 1 for the additional null character. Recall that strlen counts in the English sense-- it returns the length of the string you see, and the plus 1 takes into account the fact that I'm going to need that backslash zero. Now let me do this old school style first.
Let me go ahead and manually copy the string s into t first. So for int i equals 0, i is less than the string length of s, i plus plus. Then inside my for loop, I'm going to do t bracket i equals s bracket i, but actually I want the null character too, so I want to go to the length of the string plus 1 more, and heck, I think I learned an optimization last time. Rather than calling strlen again and again on every iteration, I could do n equals strlen of s plus 1 once and then do i is less than n, just as a nice design optimization. I think this for loop will actually handle the process, then, of copying every character from s into every available byte of memory in t. Or I could get rid of all of that and take your suggestion, which is to use strcpy, which takes as its first argument the destination and its second argument the source. So it copies from right to left in this case, too, and that's going to do all of that automatically for me as well. Now I think I'm good. I can now safely capitalize the first character in t, which is now a different chunk of memory than s, and then I can print them both out to see that one has not changed but the other has. So make copy-- all right, what did I do wrong? Implicitly declaring library function malloc dot, dot, dot. So we've seen this kind of error before. Even if you don't know quite how to solve it, what's the essence of the solution? What do I need to do to fix this kind of problem involving implicitly declaring a library function? What did I forget? Yeah. I need to include the library. And I could look this up in the manual, or I know it off the top of my head, I just forgot it. There's another library we'll occasionally need now called standard lib-- the standard library, stdlib.h-- that contains the malloc and free prototypes and some other stuff, too. All right, let me just clear this away and do make copy one more time. Now I'm good. ./copy, Enter. All right. s, I'm going to type in hi, lowercase. t and s now come back as intended.
s is untouched, it would seem, but t is now capitalized. Are any questions, then, on what we just did in code? Yeah. AUDIENCE: You said that malloc and free go together. [INAUDIBLE] DAVID J. MALAN: Indeed. There are a few improvements I want to make, so let me actually do those right now. Technically, I should practice what I preached and I should indeed, when I'm done with t, free t. Fortunately, I don't have to worry about how big t was-- the computer remembers how many bytes it gave me and it will go free all of them, not just the first. I should do free t. I don't need to do free s, and I shouldn't, because that is handled automatically by the CS50 library. s, recall, came from get_string, and we actually have some fancy code in place that makes sure that at the end of your program's execution we free any memory that we allocated so we don't actually waste memory like I described earlier. But there are actually a couple of other things if I really want to be pedantic I should put in here. It turns out that sometimes malloc can fail, and sometimes malloc doesn't have enough memory available because maybe your computer's doing so much stuff there's just no more RAM available. So technically, I should do something like this-- if t equals equals NULL, with two L's today, then I should just return 1 or something to say that there was a problem. I should probably print an error message too, but for now I'm going to keep it simple. I should also probably check this. This is a little risky of me. If I'm doing t bracket zero, this is assuming that there is a letter there. But what if the human just hit Enter at the prompt and didn't even type h, let alone h-i exclamation point? Then t bracket zero would just be the backslash zero, not a letter. So technically, what I should probably do here is, if the length of t is greater than zero, then go ahead and safely capitalize the first letter of it.
And then at the very end if all goes well, I can return zero, thereby signifying that indeed, this thing was successful. So yes, these two functions, malloc and free, should be used in concert. And so if you call malloc you should call free eventually. But you did not call malloc for s, so you should not call free for s. Yeah, other question. AUDIENCE: Here's a question. Why do we do malloc plus 1? DAVID J. MALAN: Why did I do malloc plus 1? So malloc-- sorry, malloc of string length of s plus 1-- the string length is the literal length of the string as a human would perceive it in English. So for h-i exclamation point, strlen gives me 3, but I know now as of last week and this week what a string technically is, and a string always has an extra byte. The onus is on me to understand and apply that lesson learned so that I actually give strcpy enough room for that trailing null character. And here's just an annoying thing: what we called the backslash zero, N-U-L, last week, it turns out that N-U-L-L is the same idea. It's also zero, but it's zero in the context of pointers. So long story short, you never really write N-U-L, I've just said it and we saw it on the screen. You will start writing N-U-L-L when you want to check whether or not a pointer is valid. And what I mean by that is this. If malloc fails and there's just not enough memory left inside of the computer for you, it's got to return a special value, and that special value is N-U-L-L in all capital letters. That signifies something went wrong. Do not trust that I'm giving you a useful return value. Other questions on these copies thus far? Yeah, over there. AUDIENCE: [INAUDIBLE] DAVID J. MALAN: Good question. Will strcpy not work without malloc? You kind of need both in this case because strcpy, by definition-- if I pull up its manual page-- needs a destination to put the copied characters. It's not sufficient just to say char star t semicolon. That only gives you a pointer.
But I need another chunk of memory that's just as big as h-i exclamation point backslash zero, so malloc gives me a whole bunch of memory and then strcpy fills it with h-i exclamation point backslash zero. So again, that's why we're going down to this lower level, because once you understand what needs to be done you now have the functions to do it. So let's actually consider what we just solved. So in this next version of the program where I actually introduced malloc, t was initialized to the return value of malloc, and maybe the memory that I got back was here-- 0x456, 0x457, 0x458, and 0x459. I've left it blank initially because malloc doesn't put anything meaningful there; those bytes may just contain garbage values. I just get a chunk of memory that is now mine to use as I see fit. I then assign t to that return value, which points t at the first address. Notice there's no backslash zero. This is not yet a string; it's just a chunk of memory-- four bytes-- an array of four bytes. What strcpy eventually did for me was it copied the h over, the i over, the exclamation point over, and the backslash zero. And if I didn't want to use strcpy or I forgot that it existed, my for loop would have done exactly the same thing. Are any questions, then, on these examples here? Any questions? Yeah. AUDIENCE: [INAUDIBLE] DAVID J. MALAN: Good question. After malloc, if I had then still done just t equals s, it actually would have recreated the same original problem by just copying 0x123 from s into t. So then I would have been left with the picture from a few steps ago-- and I can't quite do it live-- this arrow, if I did what you just described, would now be pointing over here and so I wouldn't have fundamentally solved the problem, I would have additionally leaked those four bytes, since I'd no longer have their address in order to free them. Yeah. AUDIENCE: [INAUDIBLE] DAVID J. MALAN: You can-- do you always use malloc and strcpy together? Not necessarily. These are both solving two different problems.
malloc's giving me enough memory to make a copy, strcpy is doing the copy. However, you could actually use an array of characters, if you wanted, and you could use strcpy on that, and there are other use cases for strcpy. But thus far, it's a reasonable mental model to have that if you want to copy strings, you use malloc and then strcpy, or your own homegrown loop. Yeah. AUDIENCE: [INAUDIBLE] DAVID J. MALAN: Say that once more. AUDIENCE: [INAUDIBLE] DAVID J. MALAN: No. It will-- good question. strcpy, per its documentation, will copy the whole string plus the null character at the end. It just assumes there will be one there. It's therefore up to you to pass strcpy a long enough chunk of memory to have room for that. If I only asked malloc for three bytes, that could have potentially created a memory problem whereby strcpy would just still blindly copy one, two, three, four bytes, but technically it should have only touched three of those. You do not yet have access to the fourth one, or the rights to it, because you never asked malloc for it. Yeah. AUDIENCE: So the number inside malloc would be the number of bytes. DAVID J. MALAN: Correct. The number inside malloc-- it's one argument. It's the number of bytes you want back. AUDIENCE: Does that mean you have to remember [INAUDIBLE]? DAVID J. MALAN: Yes, the onus is on you, the programmer, to remember or frankly, use a function to figure out how many bytes you actually need. That's why I did not ultimately type in four manually, I used strlen plus 1. So the plus 1 is necessary if you understand how strings are represented, but using strlen means that I can actually play around with any kind of input and it will dynamically figure out the length. So suffice it to say, there are so many ways already where you can start to break programs. Let's give you at least one tool for finding mistakes that you might make.
And indeed, in upcoming problem sets you will use this to find bugs in your own code. Not just using printf, not just using the built-in debugger, but another tool here as well. So let me go ahead and deliberately write a program called memory.c that has some memory-related errors. Let me include stdio.h at the top and let me include stdlib.h at the top so I have access to malloc now. Let me do int main(void) and then inside of main, let me do this-- I want to allocate space for, how about, three integers. Why? Just for the sake of discussion. So I'm going to go ahead and do malloc of three, but I don't want three bytes. I want three integers and an integer is four bytes, so technically I could do this-- 3 times 4, or I could do 12, but again, that's making certain assumptions, and if I run this program on a slightly different computer, an int might be a different size. So the better way to do this would be 3 times sizeof int, whatever the size of an int is. And sizeof is just an operator you can use any time if you just want to find out on this computer, how big is an int? How big is a float, or something else? So that's going to give me enough memory for three ints. What do I want to assign this to? Well, malloc returns an address. Pointers are addresses, so I'm going to create a pointer to an int called x and assign it the value. So what am I doing here? This is a little less obvious, but again go back to basics. The right hand side here gives me a chunk of memory for three integers. malloc returns the address of the first byte of that chunk. How do I store the address of anything? I need a pointer. The syntax for today is type of data, star, where the type of data in question is int, three of them, so I do int star x. Again, it's kind of purposeless, only for sort of instructional purposes here, but this is equivalent now to having a chunk of memory of size 12 in total, presumably, so I can technically now do this.
I can go into what I think of as the first location, x bracket 1, and assign it the number 72 like the other day. Second location, x bracket 2, the number 73, and the third location, x bracket 3, maybe the number 33. Now I've deliberately made two mistakes here because I'm trying to trip over my newfound understanding, or my greenness with understanding pointers. One, I didn't remember that I should be treating chunks of memory as zero-indexed. malloc essentially returns an array, if you want to think of it as that. An array of three ints, or more technically, the address of a chunk of memory that could fit three ints. So I can use my square bracket notation, or I could be really cool and use pointer arithmetic, but this is a little more user-friendly. But I have made two mistakes. I did not start indexing at zero, so line seven should have been x bracket zero. Line eight should have been x bracket 1, and then line nine should have been x bracket 2. So that's the first mistake. The second mistake that I've made, as a side effect, is I'm also touching memory that I shouldn't. x bracket 3 would mean go to the fourth int in the chunk of memory that came back. I only asked for enough memory for three ints, not four, so this is what's called a buffer overflow. I am accidentally, but deliberately at the moment, going beyond the boundaries of this array, this chunk of memory. So bad things can happen, but not necessarily visibly when you just run your program. Let me go ahead and just try this. I'll run make memory, and you'll see here that it compiles OK. ./memory, and it actually does not cause a segmentation fault, which comes back to that point of nondeterminism. Sometimes it does, sometimes it doesn't-- it depends on how bad of a mistake you made. But there's a program that can spot these kinds of mistakes, and I'm going to go ahead and expand my terminal window for a moment and I'm going to run not just ./memory, but a program called Valgrind, as in valgrind ./memory.
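The aside about pointer arithmetic can be made concrete with a small sketch. The function name brackets_match_pointer_arithmetic is hypothetical; it uses only the valid indices 0, 1, and 2 and shows that x[i] and *(x + i) name the same int.

```c
#include <stdbool.h>
#include <stdlib.h>

// Hypothetical check: fills a chunk big enough for three ints, using only
// indices 0..2 (index 3 would be the buffer overflow described above),
// and confirms that bracket notation and pointer arithmetic agree.
bool brackets_match_pointer_arithmetic(void)
{
    int *x = malloc(3 * sizeof(int));
    if (x == NULL)
    {
        return false;
    }

    x[0] = 72;      // same as *(x + 0) = 72
    x[1] = 73;      // same as *(x + 1) = 73
    *(x + 2) = 33;  // same as x[2] = 33

    bool same = x[2] == *(x + 2) && *(x + 1) == 73 && x[0] == 72;
    free(x);
    return same;
}
```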
This is a command that comes with a lot of computer systems and is designed to find memory-related bugs in code. So it's a new tool in your toolkit today, and you'll use it with the coming problem sets. I'm going to run this now. Its output, honestly, is hideous. But there are a few things that will start to jump out and will help you in the problem sets to see these kinds of things. Here's the first mistake. Invalid write of size 4. That's on memory.c line nine, per my highlights. So let me go look at line nine. In what sense is this an invalid write of size four? Well, I'm touching memory that I shouldn't, and I'm touching it as though it's an int. And an int is four bytes-- size four. So again, this takes some practice to get used to, the nomenclature here, but this is now a clue for me, the programmer, that not only did I screw up, but I screwed up related to memory, and so this is just a hint, if you will. It's not necessarily going to tell you exactly how to fix it; you have to wrestle with the semantics, but invalid write of size four-- oh, OK. So I should not have indexed past the boundary here. All right, so I shouldn't have done that. So let me go ahead then and change these to zero, one, and two here. All right, so let me go ahead and recompile my code. So, make memory, ./memory, and it still doesn't seem to be broken, but it is technically buggy. Let me go ahead and run Valgrind again, so valgrind ./memory, Enter. And now there's less scary output, but there's still something in there. Notice this-- 12 bytes in 1 blocks-- no regard for grammar there-- are definitely lost in loss record 1 of 1. Super cryptic, but this is hinting at a so-called memory leak. The blocks of memory are lost in the sense that I malloc'd them-- I asked for them, but I never-- take a guess-- freed them. I have a memory leak. And this is the arcane way of saying, you've screwed up. You have a memory leak. So this is an easy fix, fortunately.
Once I'm done with this memory, I just need to free it at the end. So now let me go ahead and rerun make memory; it still runs fine, so all the while I might have thought, incorrectly, that my code was correct. But let me run Valgrind one more time. valgrind ./memory, Enter. Now, this is pretty good. All heap blocks were freed, whatever that means. No leaks are possible. And even though it's still a little cryptic, there's no other error here, and in fact it's pretty explicit-- error summary, zero errors from zero contexts, dot, dot, dot. So even though this is one of the most arcane tools we'll use, it's also one of the most powerful, because it can see things that you, the human, might not, and maybe even that the debugger might not. It does a much closer reading of your code while it's running to figure out exactly what is going on. Any questions, then, on this tool? And we'll guide you after today with actually using this, too. It just helps you find memory-related mistakes that you might now be capable of making. All right, let's do one other memory-related thing. Let me shrink my terminal window here. Let me create one other file here called garbage.c. It turns out there's a term of art called garbage values in programming that we can reveal as follows. Let me include stdio.h, and let me include-- how about stdlib.h, and then let me give myself int main(void), and then in this relatively short program let me give myself three ints using last week's notation, just int scores bracket 3 for three quiz scores, or whatever. Then let me go ahead and do for int i equals zero, i less than 3, i plus plus, then let me go ahead and print out %i backslash n, scores bracket i, semicolon. That's it. This code, I'm pretty sure, is going to compile and it's going to run, but what is my logical bug? I've forgotten a step, even though the code that's written is not so wrong. Yeah?
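Putting both fixes together, the corrected memory.c logic might look like the sketch below. It is wrapped in a hypothetical function, fill_and_free, that returns the sum only so there is something observable to check. In a real memory.c this body would sit in main, and running it under valgrind ./memory should then report no invalid writes and no leaks.

```c
#include <stdlib.h>

// Hypothetical wrapper around the corrected memory.c: valid indices 0, 1, 2
// and a matching free. Returns -1 if malloc fails, else the sum of the ints.
int fill_and_free(void)
{
    int *x = malloc(3 * sizeof(int));
    if (x == NULL)
    {
        return -1;
    }

    x[0] = 72;
    x[1] = 73;
    x[2] = 33;

    int sum = x[0] + x[1] + x[2];

    // Hand the memory back; leaving this out is the leak valgrind reported
    free(x);
    return sum;
}
```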
Yeah, I didn't provide the scores, so I didn't actually initialize the array called scores to have any scores whatsoever. What's curious about this, though, is that the computer technically doesn't mind. Let me go ahead and playfully make garbage, Enter, and it's an apt description, because what I'm about to see are so-called garbage values. When you, the programmer, do not initialize your code's variables to have values, who knows what's going to be there. The computer's been doing some other things; there's a bit of work that happens even before your code runs in the computer, so there might be remnants of past ints, chars, strings, floats-- anything else in there-- and what you're seeing is those garbage values, which is to say you should never forget, as I just did, to initialize the value of some variable. And this is actually pretty dangerous, and there have been many examples of software being compromised because of one of these issues, where a variable wasn't initialized and all of a sudden users, maybe people on the internet in the context of web applications, could suddenly see the contents of someone else's memory, or remnants of it. Maybe someone's password that had been previously typed in, or some other value like a credit card number that had been previously typed in. There are different defense mechanisms in place to generally make this not so likely, but it's certainly very possible, at least in this kind of context, to see values that you probably shouldn't, because they might be remnants from something else that used them. So this is to say, again, you have this great power now to manipulate memory, but also now you have this great hacking ability to poke around the contents of memory, and this is exactly what hackers sometimes do when trying to find ways to exploit systems. Any questions here? No? All right, let's go ahead and take a quick five-minute break, and when we come back, we'll build on these final topics. See you in five. We are back.
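Here is a sketch of garbage.c with the missing step added, so that every element of scores has a value before the loop reads it. The score values and the function name print_scores are hypothetical; the function returns the sum only so the result can be checked. Writing int scores[3] = {0}; would also work if you just want every element zeroed.

```c
#include <stdio.h>

// Hypothetical version of garbage.c's loop with the array initialized first.
// Without the initializer, each scores[i] would be whatever garbage was left there.
int print_scores(void)
{
    int scores[3] = {72, 73, 33};  // hypothetical quiz scores
    int sum = 0;
    for (int i = 0; i < 3; i++)
    {
        printf("%i\n", scores[i]);
        sum += scores[i];
    }
    return sum;
}
```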
First, just a little programmer humor from XKCD, which hopefully now will make a little bit of sense to you. And what we'll also do next is take a look at a short, two-minute video from our friends at Stanford that animates with claymation, if you will, exactly what happens, now that you understand what garbage values are and how they get there, if you misuse them. It's one thing just to print them out as I just did; it's another if you actually mistake a garbage value for a valid pointer, because garbage values are just zeros and ones somewhere-- numbers, that is. But if you use that new dereference operator, the star, and try to go to a garbage value, thinking incorrectly that it's a valid pointer, bad things can happen. Computers can crash or, more familiarly, segmentation faults can happen. So allow me to introduce, if we could dim the lights for two minutes, our friend Binky from Stanford. SPEAKER 1: Hey Binky, wake up. It's time for pointer fun. BINKY: What's that? Learn about pointers? Oh, goody! SPEAKER 1: Well, to get started, I guess we're going to need a couple of pointers. BINKY: OK, this code allocates two pointers which can point to integers. SPEAKER 1: OK. Well, I see the two pointers, but they don't seem to be pointing to anything. BINKY: That's right. Initially, pointers don't point to anything. The things they point to are called pointees, and setting them up is a separate step. SPEAKER 1: Oh, right, right. I knew that. The pointees are separate. So how do you allocate a pointee? BINKY: OK, well this code allocates a new integer pointee, and this part sets x to point to it. SPEAKER 1: Hey, that looks better. So make it do something. BINKY: OK, I'll dereference the pointer x to store the number 42 into its pointee. For this trick, I'll need my magic wand of dereferencing. SPEAKER 1: Your magic wand of dereferencing? That's great. BINKY: This is what the code looks like.
I'll just set up the number and-- SPEAKER 1: Hey, look. There it goes. So doing a dereference on x follows the arrow to access its pointee, in this case to store 42 in there. Hey, try using it to store the number 13 through the other pointer, y. BINKY: OK. I'll just go over here to y and get the number 13 set up, and then take the wand of dereferencing and just-- whoa! SPEAKER 1: Oh hey, that didn't work. Say, Binky, I don't think dereferencing y is a good idea, because setting up the pointee is a separate step and I don't think we ever did it. BINKY: Good point. SPEAKER 1: Yeah, we allocated the pointer y, but we never set it to point to a pointee. BINKY: Very observant. SPEAKER 1: Hey, you're looking good there, Binky. Can you fix it so that y points to the same pointee as x? BINKY: Sure, I'll use my magic wand of pointer assignment. SPEAKER 1: Is that going to be a problem, like before? BINKY: No, this doesn't touch the pointees, it just changes one pointer to point to the same thing as another. SPEAKER 1: Oh, I see. Now y points to the same place as x. So wait, now y is fixed. It has a pointee, so you can try the wand of dereferencing again to send the 13 over. BINKY: OK, here it goes. SPEAKER 1: Hey, look at that. Now dereferencing works on y. And because the pointers are sharing that one pointee, they both see the 13. BINKY: Yeah, sharing. Whatever. So are we going to switch places now? SPEAKER 1: Oh look, we're out of time. BINKY: But-- DAVID J. MALAN: That's from our friend Nick Parlante at Stanford. So let's consider what Nick did here as Binky. So here is all the code together. These first couple of lines were not bad, and notice that in Stanford's code they moved the stars to the left, next to the type. That's fine. Again, more conventional might be this syntax here, with the star next to the variable name. These two lines are fine. It's OK to create variables, even pointers, and not assign them a value initially, so long as you eventually do. So we eventually do here, with this line.
We assign to x the return value of malloc, which is presumably the address of something. To be fair, we should really be checking for NULL as well, but that's not the biggest problem here. The biggest problem is not even this next line, which means go to the memory location in x and store the number 42 there. That's fine, because again, malloc returns the address of some chunk of memory. This chunk of memory is big enough for an int. x is therefore going to store the address of that chunk that's big enough for an int. Star x, recall, is the dereference operator, and it means go to that address and put 42 in it. It's like going to the mailbox and putting the number 42 in it instead of taking the number 50 out, like we did before. But why is this line bad? This is where Binky lost his head, so to speak. Why is this bad? Yeah. AUDIENCE: We haven't yet allocated space for it. DAVID J. MALAN: Exactly. We haven't yet allocated space for y. There's no mention of malloc; there's no assignment of y, even to that same memory. So this would be, go to the address in y, but if there is no known address in y, it is a so-called garbage value, which means go to some random address that you have no control over, and boom-- that might cause what we've seen in the past, perhaps a segmentation fault. Now this, fortunately, is the kind of thing that, if you don't quite have the eye for it yet, Valgrind, that new tool, could help you find as well. But it's just another example of, again, the upside and downside of having control now over memory at this level. All right. Well, let's go ahead and do one other thing. Consider from last week that this notion of swapping was actually a really common operation. We had all of our volunteers come up, we had to swap a lot of things during bubble sort and even selection sort, and we just took for granted that the two humans would swap themselves just fine.
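A corrected version of Binky's code might look like the sketch below. It is wrapped in a hypothetical function, binky_fixed, that returns the final value of *x so the sharing can be checked. It adds the NULL check the lecture mentions and, crucially, performs the pointer assignment y = x before anything dereferences y.

```c
#include <stdlib.h>

// Hypothetical fix of the Binky code: y gets a pointee (x's) before *y is used.
int binky_fixed(void)
{
    int *x;
    int *y;

    x = malloc(sizeof(int));
    if (x == NULL)
    {
        return -1;
    }
    *x = 42;   // dereference x and store 42 in its pointee

    y = x;     // pointer assignment: y now points to the same pointee as x
    *y = 13;   // safe now; x and y both see 13

    int result = *x;
    free(x);   // one malloc, one free; y shares the chunk, so it must not be freed too
    return result;
}
```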
But there needs to be code to do that if you actually implement bubble sort, selection sort, or anything that involves swapping. So let's consider some code like this. We'll keep it simple, like last week, where we wanted to swap some values, like int A and int B, for instance, here. It's void because I'm not going to return a value, but I have a function called swap. So here, for instance, might be some code for this. But why is it so complicated? Here, let's actually take a step back. Why don't we do this here. I think we have time for one more volunteer. Could we get someone to come on up? You have to be comfy on camera and you're being asked to help with your-- oh, I'll go with the friend, pointing. So whoever has their friend doing this here-- no? Now they're pointing it over here. Now, literally an arm is being twisted. OK. Come on down. That backfired. Come on over. And what is your name? AUDIENCE: Marina. DAVID J. MALAN: Marina. Nice to meet you. Who were you trying to volunteer? AUDIENCE: My friend Jesse. DAVID J. MALAN: OK. So here we have for Marina two glasses of liquid, orange and purple, just so that they're super obvious. And suppose that the problem at hand, like last week, is just to swap two values, as though these two glasses represented two people and we want to swap them. But let's consider these glasses to be like variables, or locations in an array, and you know what? I'd really like you to swap the values. So orange has to go in there, and purple has to go in there. How would you do it? And we'll see if we can then translate that to code. AUDIENCE: [INAUDIBLE] DAVID J. MALAN: OK, what-- say it a little louder. All right, yeah. So presumably, you're struggling mentally with how you would do this without having an extra cup, so good foresight here. Let me go ahead-- we do have a temporary variable, if you will. So if I hand you this, how would you now solve this problem? AUDIENCE: I would go like that, but it's-- DAVID J.
MALAN: No, that's-- Oh. Well, OK. Go do it-- go with your instincts. OK. Sure, go ahead. Go with whatever your instincts are. Yeah, so a little-- so strictly speaking, we probably shouldn't have moved the glasses, just because that would be like moving the array locations, so let's actually do it one more time, but the glasses now have to go back where they originally were. So how would you swap these now, using this temporary variable? OK, good. Otherwise we'd be completely uprooting the array, for instance, by just physically moving it around. So you moved the orange into this temporary variable, then you copied the purple into where the orange was, and now, presumably-- excellent. The orange is going to end up where the purple once was, and this temporary variable stored some extra memory. It was necessary at the time, but not necessary, ultimately. But a round of applause, if we could, and thank you for doing that so well. So the fact that it instantly occurred to Marina that you need some temporary variable is a perfect translation to code, and in fact this code here, that we might glimpse now, is reminiscent of exactly that algorithm, where A and B, at the end of the day, are the same chunks of memory. Just as, the second time, the two glasses had to stay put, even though we were physically lifting them, because they went back to where they were, here you have two values, A and B, and you just have a temporary variable into which you copy A, then you copy B into A, then you copy into B whatever the original value of A was, because you temporarily stored it in this temporary variable, tmp. Unfortunately, this code doesn't necessarily work as intended. So let me go over to my VS Code here and open up a program called swap.c, and in swap.c, let me whip up something really quickly here with, how about, include stdio.h, int main(void). Inside of main let me do something like x gets 1 and y gets 2.
Let me just print out, as a visual confirmation, x is %i, y is %i backslash n, plugging in x and y, respectively. Then let me call a swap function that we'll invent in just a moment. Swap x and y. And then let me print out again x is %i, y is %i backslash n, just to print out again what they are, because presumably I should see 1, 2 first, then 2, 1 the second time. Now how is swap going to be implemented? Let me implement it exactly as on the screen a moment ago. So void swap int x-- or let's call it int A for consistency, int B. But I could always call those anything I want. int tmp gets A, A gets B, B gets tmp. So exactly as I proposed a moment ago, and exactly as Marina really implemented it using these glasses. I need to now include my prototype, as always, so nothing new there. And I'll just copy/paste that up here, and now let's go ahead and run this. So make swap-- so far, so good-- ./swap-- x is 1, y is 2, then x is 1, y is 2 again. So there seems to be a bit of a bug here, but why might this be? This code does not in fact work, even though it obviously works in reality. Yeah? AUDIENCE: Because A and B have different addresses than x and y [INAUDIBLE]. DAVID J. MALAN: Good, and let me summarize. A and B do indeed have different addresses than x and y, and in fact what happens when you call a function like this on line 11, calling swap, passing in x and y, is that you are passing by value, so to speak. And this is a term of art that just means you are passing in copies of x and y, respectively, and calling them A and B in the context of this function, but they're indeed copies. Now technically, these names are local only. I could have called this x, I could have called this y, I could have changed this to x, this to y, this to x, and this to y. The problem would still remain. Just because you use the same names in one function as you do elsewhere, that doesn't mean they're the same. They just look the same to you.
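To see passing by value in isolation, here is a sketch of the first swap. It is renamed swap_copies (a hypothetical name, to keep it distinct from the pointer version) and prints its own copies before and after the exchange, much as the lecture does next. The copies really do trade places, but the caller's variables are untouched.

```c
#include <stdio.h>

// Hypothetical name for the lecture's first swap: a and b are copies of the
// caller's arguments, so only these copies trade places.
void swap_copies(int a, int b)
{
    printf("a is %i, b is %i\n", a, b);
    int tmp = a;
    a = b;
    b = tmp;
    printf("a is %i, b is %i\n", a, b);  // the copies swapped; the caller's x and y did not
}
```

Calling swap_copies(x, y) with x = 1 and y = 2 prints the copies swapping, yet x is still 1 and y is still 2 afterward.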
But indeed, swap is going to get copies of this x and y, and in this context, this scope, so to speak, x and y will be copies of the original. So for clarity, let me revert this back to A and B just to make super clear that they're indeed different, albeit copies, but there's indeed a problem there. This function actually works fine. In fact, notice this. Let me go ahead and print out inside of this. printf A is %i, B is %i backslash n, and then I'll print A and B. And let me do that same thing at the beginning of this function before it does any work. Let me go ahead and rerun. So, make swap, ./swap, and this is promising. Initially, x is 1, y is 2, A is 1, B is 2, A is 2, B is 1, but then nope-- x is 1, y is 2. So if anything, I've confirmed that the logic is right-- Marina's logic is right, but there's something about C. There's something about using one function versus another that's actually creating a problem here. The fact that I'm passing in copies of these values is creating this problem. So what in fact is going on? Well again, inside of your computer's memory there are these little chips, and we've been talking about them abstractly; it's just this grid of memory locations. It turns out that your computer uses this memory in a pretty conventional way. It's not just random, where it just puts stuff wherever is available; it actually uses different parts of the memory for different purposes. And you have control over a lot of it, but the computer uses some of it for itself. And let's go ahead and zoom out from this and consider that, within your computer's memory, what a computer will typically do is initially store all of the zeros and ones that you compiled in the top of your computer's memory, so to speak. So when you compile a program and then you run it with ./whatever, or on a Mac or PC you double click on it, the computer first-- the operating system first-- loads all of your program's zeros and ones, a.k.a.
machine code, into just one big chunk of memory at the top, so to speak. Below that it stores global variables-- any variables you have created in your program that are outside of main and outside of any functions, generally at the top of your file. Globals tend to go at the top there. Then there's this chunk of memory that's generally known as the heap-- and we saw that word briefly in Valgrind's output-- and then there's this other chunk of memory called the stack. And it turns out that up until this week you were using the stack heavily. Any time you use local variables in a function, they end up on the stack. Any time you use malloc, that memory ends up on the heap. Now, as the arrow suggests, this actually looks like a problem waiting to happen, because if you use more and more and more heap, and more and more and more stack, it's like two things barreling down the tracks at one another-- this does not end well. And that's actually a problem. If you've ever heard the phrase stack overflow, or used the website, this is the origin of its name. When you start to use more and more and more memory by calling lots and lots of functions or using lots and lots of local variables, you use a lot of this stack memory. Or if you use malloc a lot and keep calling malloc, malloc, malloc, and never, or rarely, call free, you just use more and more memory, and eventually these two things might overflow each other, at which point you're just out of luck. The program will crash or something bad will happen. So the onus is on you just not to do that. But this is the design, generally, of what's going on inside of your computer's memory. Now within that memory, though, there are certain conventions; let's focus here on the stack. And in fact, let me go over here with a marker and say that this represents the bottom of my memory, ultimately.
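As a rough illustration of where different kinds of variables live, here is a sketch with one variable from each region described above. The names global_value and where_things_live are hypothetical, and the comments reflect the conventional layout from the lecture; the exact addresses and ordering are up to the compiler and operating system, so the sketch checks values, not addresses.

```c
#include <stdlib.h>

int global_value = 3;  // global variable: stored in the globals area, below the machine code

// Hypothetical function touching one variable from each region.
int where_things_live(void)
{
    int local_value = 1;                     // local variable: lives on the stack
    int *heap_value = malloc(sizeof(int));   // memory from malloc: lives on the heap
    if (heap_value == NULL)
    {
        return -1;
    }
    *heap_value = 2;

    int total = local_value + *heap_value + global_value;

    free(heap_value);  // heap memory stays reserved until you free it
    return total;      // local_value's stack space is reclaimed when this function returns
}
```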
And so here we have a whole bunch of wooden blocks, and each of these squares represents a byte of memory, and this, for instance, might represent four bytes altogether-- good enough for an int, or something like that. So in my original code that I wrote earlier, that is, in fact, buggy, what is actually going on inside the swap function? We can visualize it like this-- when you run ./swap or any program, for that matter, main is the first function to get called in a C program, and so I'm just going to label this bottom row of memory as main. And what were the two variables I had in main called in this code? Yeah. x and y. And each of those was an int, so that's four bytes, so it's deliberate that I reserved four-- a chunk of wood here that's four bytes. So let me just call this x, and I'm just going to write the number 1 in this box here. And then I had my other variable y, and I'm going to put the number 2 there. What happens when main calls swap like it does in this code here? Well, swap has two variables of its own, A and B, and A initially is 1 and B is initially 2, but it has a third variable, tmp, which is a local variable in addition to the arguments A and B that are passed in, so I'm going to call this tmp over here. And what is the value of tmp? Well, we have to look back at the code. tmp initially gets the value of A. All right, the value of A was 1, so tmp initially gets 1. That's step one in my three-line program. OK, A equals B. So that assigns, from right to left, the value of B to A. B is 2, A is this, so let me go ahead and erase this and just overwrite that. So at this moment in the story you have two copies of 2, but that's OK, because the third line of code says tmp gets copied into B. So what's tmp? 1, which gets copied into B, so let me overwrite this 2 with a 1, and now what happens? Now, unfortunately, the code ends. swap doesn't actually do anything with the result, and the problem in C is that, yes, I could have had a return value.
I could go in there and change void to int, but which one am I going to return? The A or the B? The whole goal is to swap two values, and it seems kind of lame if you can't write a function to do something as common, per last week's sorting algorithms, as swapping two values. But what really happens? Well, when this program starts running, main is using this chunk of memory at the bottom in the so-called stack, and the stack is just like a cafeteria stack of trays-- it grows up, like this. Here's main's memory on the stack. Here's the swap function's memory on the stack. It's using three ints instead of only two. What happens when the function returns, whether it's void or not? The sort of recollection that this is swap's memory goes away, and garbage values are left. So, adorably, we get rid of these values here, and there's still data there-- technically, the numbers 1, 1, and 2 are still there in the computer's memory, but they no longer belong to us, because the function has now returned. So they're still in there, and this is kind of an example, visually, of why there's other stuff in memory even though you didn't put it there, necessarily. Sometimes you did put it there, but now, once swap returns, you only should be touching memory inside of main. But we've never actually copied one value into main. We haven't returned anything and we haven't solved this fundamentally. So how could we do this? Well, what if we instead passed into swap not copies of x and y, calling them A and B? What if we passed in breadcrumbs to x and y, sort of a treasure map that will lead swap to the actual x and to the actual y? Today we have that capability using pointers. So suppose that we use this code instead. There are a lot of stars going on here, which is a bit annoying, but let's consider what it is we're trying to achieve.
What if we pass in not x and y, but the address of x and the address of y, respectively-- breadcrumbs, if you will-- that will lead swap to the original values? Then what we do is still give ourselves a tmp variable, like an empty glass. It's still a glass, so we still call it an int, but what do we want to put into that temporary variable? We don't want to put A into it, because that's an address now. We want to go to that address, per the star, and put whatever's at that address into tmp. What do we then want to do? Well, we want to copy into whatever's at location A-- that is, copy over to location A's contents-- whatever is at location B's contents, and then lastly, we want to copy tmp into whatever's at location B. So again, we're very deliberately introducing all of these stars because we don't want to change any of these addresses; we want to go to these addresses, per the dereference operator, and put values there, or get values from them. So what does this actually mean? Well, if I kind of rewind in this story and I go back here, I still have tmp, although I'm going to delete its value to begin with, I still have B and I still have A, but what's going to be different this time is how I use A and B. So let me finish erasing those. That's A on the left; this is B on the right. At this point in the story, we're rerunning swap with this new and improved version, and let's see what happens. Well, x is presumably at some address. Maybe it's like 0x123, as always. What then does A get when I'm using this code? The value of A is 0x123. What is the value of B? Maybe y is at 0x456. What goes in B? Well, I'm going to put 0x456, and then what am I going to do? Based on these three lines of code, I'm going to store in tmp whatever is at the address in A. What is the address in A? That's this thing here, so I'm going to put 1 in tmp.
Line two-- I'm going to go to B-- all right, B is 456, so I'm going to go there, get the 2, and store it at whatever is at location A, and location A is 123, so that's this, so what am I going to do? I'm going to change this 1 to a 2. Last line of code-- get the value of tmp, which is 1, and then put it at whatever location is in B, so B, 456, go there and change it to be the value of tmp, which puts 1 here. That's it for the code. There's still no return value. swap returns, which means these three temporary variables are garbage values now. They can be reused by subsequent function calls, but now I've actually swapped the values of x and y. Which is to say, what came as naturally as the real world here for Marina is not quite as simply done in C because, again, functions are isolated from each other. You can pass in values, but you get copies of those values. If you want one function to affect the value of a variable somewhere else, you have to, one, understand what's going on, and two, pass things in by pointer here. So if I go back to my code here, I need to make a few changes now. Let me get rid of these extra printf's. Let me go in and add all these stars. So I'm dereferencing these actual addresses here and here, and I've got to make one more change. How do I now call swap if swap is expecting an int star and an int star? That is, the address of an int and the address of another int. What do I change on line 11 here? Yeah. Sorry, a little louder. AUDIENCE: [INAUDIBLE] DAVID J. MALAN: Sorry, the address-of operator. So up here on line 11, we do ampersand x and ampersand y. So yes, we're technically passing in a copy of a value, but this time the copy we're passing in is technically an address, and as soon as we have an address, just like when I held up the foam finger, I can point at that address, I can go to that address and actually get a value from the mailbox, or put a value into the mailbox if I want.
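The pointer-based swap walked through with the wooden blocks can be sketched as follows. It takes the addresses of two ints (int *), dereferences them to read and write the caller's variables, and must be called with the address-of operator, as in swap(&x, &y).

```c
// swap that receives addresses instead of copies, so it can change the
// caller's variables. Call it as swap(&x, &y).
void swap(int *a, int *b)
{
    int tmp = *a;  // tmp gets the int stored at address a
    *a = *b;       // the int at address a gets the int stored at address b
    *b = tmp;      // the int at address b gets the original value from address a
}
```

Remember that the prototype near the top of swap.c needs the same two stars, or the compiler will complain about conflicting types.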
So let's cross our fingers now and do make swap, Enter. Oh my God, so many mistakes. Oh, I didn't remember to change my prototype, so let me go way up here and add two more stars, because I made that change already below. So, make swap, ./swap, and voila-- now I have actually swapped the two values. Thank you. Thank you. All right, so what more can we do here? Well, let me consider that all this time we've been deliberately using get_string and get_int and get_float and so forth, but for a reason. These aren't just training wheels for the sake of making things easier; they're actually in place to make your code safer. And to illustrate this, let me go ahead and open up one other file here. How about a file called scanf.c? It turns out that the old-school way-- the way in C, really-- of getting user input is via functions like scanf, so let me go ahead and include stdio.h, int main(void), without using the CS50 library at all for strings or for any of those get functions. Let me give myself an int called x. Let me just print out-- or rather, ask the user for the value by prompting them for x. And I'm going to use a function called scanf that's going to scan in an integer using %i, and I'm going to store whatever the human types in at this location, ampersand x. And then I'm going to go ahead and, just so we can see what happened, print out with %i whatever the human typed in, as follows. All right, so line eight is week 1 style code. Lines five and six are week 1 style code. So the curiosity today is this new line. scanf is another function in stdio.h, and notice what I'm doing. I'm using the same syntax that I use for printf, which is kind of a little clue-- a format code to tell scanf what it is I want to scan in, that is, read from the human's keyboard-- and I'm telling it where to put whatever the human typed in. I can't just say x, because we run into the same darn problem as with swap.
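The reason scanf needs &x is the same reason swap needed addresses. The sketch below uses sscanf, the standard sibling of scanf that reads from a string instead of the keyboard, so the behavior can be checked without typing anything; the helper name read_int_from is hypothetical. It also checks scanf-style return values: the count of successfully converted items, which is 1 here on success.

```c
#include <stdio.h>

// Hypothetical helper: parses one int from input into *out.
// Like scanf("%i", &x), sscanf needs an address so it can write into the
// caller's variable. Returns 1 on success, 0 if no integer could be read.
int read_int_from(const char *input, int *out)
{
    return sscanf(input, "%i", out) == 1;
}
```

For example, read_int_from("50", &x) stores 50 in x, while read_int_from("cat", &x) reports failure and leaves x alone.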
I have to give scanf a little breadcrumb to the variable where I want it to put the human's integer, namely ampersand x. And so this just tells the computer to get an int. This is what you would have had to type, essentially, in week 1 just to get an int from the user, and there's a whole bunch of things that can still go wrong, but that's the cryptic syntax we would have had to show you in week 1. Let me go ahead and make scanf here-- oops-- user error. Put the semicolon in the wrong place. Make scanf, Enter. Oh my God. Non-void doesn't return a value. Oh, thank you. Strike two. OK. Make scanf. There we go. OK, so scanf-- I'm going to type in a number like 50 and it just prints it back out. So that is the traditional way of implementing something like GetInt. The problem, though, is when you start to get into strings, things get dangerous quickly. Let me delete all of this and give myself a string s, although wait a minute-- we don't call it string anymore-- it's a char star that stores a string. Then let me go ahead and just prompt the user for a string, using just printf. Then let me go ahead and use scanf, ask them for a string this time with %s, and store it at that address. Then let me go ahead and print out whatever the human typed in, using the same notation. So here, line five is the same thing as string s, but we've peeled back that layer today, so it's char star s. The printf lines are just week one; line seven is new. scanf will also read a string from the human's keyboard and store it at s. And that's OK, because s is an address, so it's correct to omit the ampersand; it's not necessary. A string is, and has always been, a char star, a.k.a. string. The problem, though, arises as follows-- if I do make scanf-- oh my God, what did I do wrong-- I can't-- OK, we have certain defenses in place with make. Let me run clang directly on scanf.c, outputting a program called scanf. All right, so I'm overriding some of the pedagogical defenses that we have in place with make.
Let me now run this version of scanf, Enter, and let me type in something like, how about hi again. So it didn't even store anything, and it weirdly printed out null. This time it's in lowercase, but that is somewhat related. What did I fundamentally do wrong, though, here? Why is this getting more and more dangerous? And let me illustrate the point even more. What if I type in not just something like hello, which also doesn't work. What if I do like, hellooooo and make a really long string, Enter-- that still doesn't crash. Can I do this again? Let's try again. Right, a really long, unexpectedly long string. This is the nondeterminism kicking in. Enter. All right, damn it. I was trying to trigger a segmentation fault but it wouldn't happen, but the point still remains. It's still not working, but what's the essence of why this isn't working and isn't storing my actual input? Yeah. AUDIENCE: Do you have to make space? DAVID J. MALAN: We have to make space for it. So what we're missing here is malloc, or something like that. So I could do something like this. Well, let's let the human type in up to a three-letter word, so I could do malloc of 3 plus 1 for the null character. So let me give them four characters, and let me go ahead and do make scanf-- whoops. Nope, sorry. clang, I have to-- nope. Dammit. Oh, include stdlib.h-- there we go. That gives me malloc. Now I'm going to recompile this with clang, now I'm going to rerun it, and now I'm going to type in my first thing, hi. That now works. And let me get a little aggressive now and type in hello, which is too long. Still works, but I'm getting lucky. Let me try a hellooooooo. Damn it, that still works, too. Sort of. But it actually-- not quite. There's some weirdness going on there already. It turns out I can also do this. I could actually just say char s bracket four and give myself an array of four characters. Let me try this one more time. So let me rerun clang, then ./scanf.
Hellooooooo, clearly exceeding the four characters-- there we go. Thank you, all right. So the point here, though, is if we hadn't given you GetInt, you would have had to use this scanf thing-- not a huge deal, because it seemed to work. But if we hadn't given you GetString, you would have had to do stuff like this, knowing about malloc already or knowing about strings being arrays, and even now there's a danger. If the human types in five letters, six letters, 100 letters, this code, as with that hellooooooo input, will write past the end of the array and can crash, which is bad. So GetString has functionality built in to handle this: there's a fancy loop inside that allocates, using malloc, as many bytes as you physically type in, essentially allocating more with every keystroke. The moment you type in h-e-l-l-o, we're laying the tracks as we go, and we keep allocating more and more memory, so that theoretically GetString will never crash, whereas it's this easy to crash your code using scanf if you do it without the help of a library. So where are we all going with this? Well, let me show you a few final examples that'll pave the way for what will be problem set four. Let me go ahead and open up, from today's code (which is available on the course's website), a program like this, called phonebook.c, and I'm just going to give you a quick tour of it; you'll see more details in the context of p-set four itself. We're going to introduce a few new functions. You're going to see a function called fopen, which stands for file open, and it takes two arguments-- the name of a file to open, like a CSV (comma-separated values) file that you might manipulate in Excel or Google Spreadsheets or the like, and then a mode like "a" for append, "r" for read, or "w" for write, depending on whether you want to add to the end of the file, just read it, or write to it. We're going to introduce you to a file pointer.
You'll see a type called FILE, in all capitals, which is a little bit unconventional; a FILE star is a pointer to an actual file on the computer's hard drive, so that you can access something like a CSV file, or heck, even images. And we're going to see down below that you're also going to have the ability to write files as well, or print to files. You'll see functions like fprintf, for file printf, or fwrite, for file write. Now that you're beginning to understand pointers, you'll have the ability to not only read files-- text files, images, other things-- but also write them out. In fact, just as a teaser here, JPEGs will be one of the things we focus on this week, where we give you a forensic image and your goal is to recover as many photographs as you possibly can from this forensic image of a digital camera. And the way you're going to do that is by knowing in advance that every JPEG in the world starts with the same three bytes, written here in hexadecimal. And so in fact, just as a teaser, let me open up an example you'll see on the course's website for today. If I scroll through here, you'll see a program that does a little something like this. And again, more on this-- if we could hit the button-- there we go. So here we have the notion of a byte that we're going to create for ourselves. We'll see a data type called BYTE, which is a common convention. This gives me an array of three bytes. And you're going to learn about a function called fread, which reads some number of bytes from a file-- for instance, three bytes. We might then use code like this: if bytes bracket 0 equals equals 0xFF and bytes bracket 1 equals equals 0xD8 and bytes bracket 2 equals equals 0xFF, the three bytes I just claimed mark the start of a JPEG, you'll see an output like this. Let me go ahead and run this program as follows. Let me copy jpeg.c into my directory from today's distribution.
Let me do make jpeg, and let me run jpeg on a file, available online, called lecture.jpeg, and the program says yes, it's possibly a JPEG. Well, what is that file? Let me open it up for us, lecture.jpeg, and here, for instance, is that same photo with which we began class, stored as a JPEG. But what we're also going to do this week is start to implement our own sort of filters a la Instagram, whereby we take images and run them through a program that creates different versions thereof. For instance, using a different file format called BMP, which essentially lays out all of its pixels from left to right, top to bottom, in a grid. You're going to see a struct-- a data structure in C that's way more complicated than the candidate structure or the person structure from the past-- that looks like this, which just has a whole bunch more values in it, but we'll walk you through these in the p-set. And we might take a photograph like this and ask you to run a few different filters on it a la Instagram, like a black and white filter, or grayscale, a sepia filter to give it some old-school feel, or a reflection like this to mirror it, or even blur it in this way. And just to end on a note here, I have a version of this code ready to go that doesn't implement all of those filters; it just implements one filter initially. Let me go ahead and pull this up on my computer here. I'm going to go into my own version of filter, and you'll see a few files that you'll tour this coming week. In bitmap.h, for instance, is a version of this structure that I claimed existed a moment ago. And let me show you this file here, helpers.c, in which there is a function called filter that I've already implemented in advance today. In the version we give you for the p-set, it won't already be implemented. This function called filter takes the height of an image, the width of an image, and a two-dimensional array.
So rows and columns of pixels, and then I have a loop like this that iterates over all of the pixels in an image, from top to bottom, left to right. And then notice what I'm going to do here. I'm going to change the blue value to be zero, and the green value to be zero. But why? Well, the image I have in mind is this one, whereby we have, old-school style, a secret message hidden in it. And if you don't happen to have in your dorm one of these secret decoder glasses that essentially make everything red-- getting rid of the green in the world and the blue in the world-- you can't actually see what message is hidden behind all of this red noise. I'm probably the only one who can read this right now. So using my code here in helpers.c, I get rid of all the blue in the picture and all the green in the picture, essentially implementing this red filter where you only see red. Let's go ahead and compile this program. Make filter, run ./filter on this hidden message.bmp, saving the result in a new file called message.bmp, and with one final flourish we're going to open up message.bmp, which is the result of having put on these glasses, and hopefully now you too will see what I see. All right, that's it for CS50! We'll see you next time. [MUSIC PLAYING]