[MUSIC PLAYING] DAVID MALAN: All right, this is CS50, and this is week four. And for the past several weeks, we've had training wheels of sorts on, while using this language known as C. And those training wheels have been in the form of the CS50 library. And you use this library, of course, by including cs50.h atop your code. And then if you think about how clang works, you've been linking your code via -lcs50. But all of that has been automated for you up until now, using make. Today, we'll transition from last week's focus on algorithms to a little more focus on machines, the machines we now use to implement these algorithms all the more powerfully, as we begin to take off these training wheels and look at what's really going on underneath the hood of your computer. And as complicated as some aspects of C have been, as new as programming may very well be to you, realize that there's not all that much going on underneath the hood that we need to understand to now move onward and start solving far more interesting and more sophisticated and more fun problems. We just need a few additional building blocks. And so today, we'll do this, first, by relearning how to count. Here, for instance, is what we'll call the computer's memory. And we've seen this grid before. And we can number, recall, all of the bytes in your computer's memory. We might call this byte number 0, 1, 2, 3, 4, all the way up to byte 15, and so forth. But it turns out, when talking about computers' memories, computer scientists and programmers actually don't tend to use decimal. They definitely don't tend to use binary at that low level. Instead, they tend to use, just for convention's sake, something called hexadecimal. Hexadecimal is a different base system that, instead of using 10 digits or 2 digits, uses 16 instead. And so a computer scientist, when numbering things like bytes in a computer's memory, would still do 0, 1, 2, 3, 4, 5, 6, 7, 8, 9. 
But after that, instead of going onward with decimal to, say, 10, 11, 12, 13, 14, 15, they instead, conventionally, would start using a few letters of the alphabet. And so, in hexadecimal, this different base system, base 16, you start counting at 0 still. You count up to and through 9. But when you want to keep counting higher, you then go to A, B, C, D, E, and F. And the upside of this is that, within hexadecimal-- and that hex implies 16-- you have 16 total individual digits, 0 through 9, and also now, A through F. So we don't have to introduce second digits just to count up as high as 15. We can use individual digits 0 through F. And we can keep counting up further by using multiple hexadecimal digits. But to get there, let's introduce this vocabulary. So in binary, of course, we use 0's and 1's. In decimal, of course, we use 0 through 9. And in hexadecimal, to be clear, we're going to use 0 through F, otherwise known as base-16. And it's just a convention that we use A through F. We could have used any other six symbols. But these are what humans have chosen. So hexadecimal works quite similarly to our familiar decimal system. And it's even similar to, now, what you know as the binary system, as follows. Let's consider a two-digit value using hexadecimal instead of decimal and instead of binary. Well, just like in the world of decimal, we used base-10, or in the world of binary, we used base-2, we're just going to use, now, base-16, ergo, hexadecimal. So this column is 16 to the 0. This one is 16 to the first. And of course, if we multiply that out, it's just the ones column and now the 16's column. And so if you want to count up in hexadecimal, you still start with 00 as usual, then 01, 02, 03, 04, 05, 06, 07, 08, 09. And then things get interesting. Now, you don't go to 10, because that would be incorrect. 10, in this base system, would be 16 times 1 plus 1 times 0, which is 16. That's not what we want. 
After the number we know as 9, we now count up to A, B, C, D, E, F. And now, things get interesting again. But just like in the decimal system, when you count up to, like, 99, you have to start carrying the 1, same thing here. If you want to count past F, you carry the 1. And so now, to represent one value greater than F, we use 10, which looks like ten, but is not ten. In hexadecimal, it is one-zero. 16 times 1 gives us 16. 1 times 0 gives us 0. And of course, that gives us the decimal number we now know as 16. So we will no longer introduce more and more base systems. But let me stipulate that just by using these columns that you learned back in grade school, presumably, you can implement any base system now. It just so happens that in the world of computers, and today in the world of memory, and soon, also files, it's just going to be very conventional to be able to recognize and use hexadecimal. And in fact, there's a reason humans like hexadecimal, or at least some humans, computer scientists. Recall that if we count up as high as FF, in this case, we would still do the same math. So 16 times 15 plus 1 times 15 is going to give us, really, this, or of course, 240 plus 15, or 255. And I did that pretty quickly. But that's just the sort of grade school math of multiplying the column by the value that's in it, where again, each of these F's is how we now express 15 using a single digit. But recall that we've seen 255 before. Back when we talked about binary a few weeks ago, 255 also happened to be the pattern that we see here, eight 1 bits using binary. And so the reason that computer scientists tend to like hexadecimal, is that, you know what, in eight bits, there are actually two groups here, four on the left, four on the right. If we sort of scooch these things over, it turns out that because hexadecimal allows you to represent 16 possible values, it's a perfect system for representing four bits at a time. 
After all, if you've got four bits here, each of which can be a 0 or 1, that's 2 times 2 times 2 times 2 possible values for each of those, or 16 total values, which is to say that in the world of computers, if you ever want to talk in units of four bits, it's wonderfully convenient to use hexadecimal instead, only because, conveniently, one hexadecimal digit happens to be equivalent to four binary digits, 0's and 1's. So 0, 0, 0, 0, all the way up through 1, 1, 1, 1. So why do humans do this? It's just now the human convention because of that convenience. Now, some of you may very well have seen hexadecimal before. In fact, recall our discussion in week 0 of RGB, where we discussed the representation of colors using some amount of red, green, and blue. And at the time, we used this example. We took our example out of context. And instead of using hi as a string of text, we reinterpreted 72, 73, and 33 as a sequence of colors. How much red do you want? How much green do you want? How much blue do you want? And that's fine. It's perfectly fine to think and express yourself in terms of decimal. But computer scientists tend not to do it that way in the context of colors and in the context of memory. Instead, they tend to use something called hexadecimal. And hexadecimal, here, would actually just have you change these values from 72, 73, 33, to the equivalent hexadecimal representation. And we won't bother doing the math here. But let me just stipulate that 72, 73, 33 in decimal is the same thing as 48, 49, 21 in hexadecimal. Now, obviously, if you glance at these three numbers, it's not at all obvious if you're looking at hexadecimal digits or decimal digits, because they do use the same subset, 0's through 9's. And so a convention, too, in the computing world, is any time you represent hexadecimal digits, you tend to prefix them, just because, with 0x. And there's no mathematical meaning to the 0 or the x. 
It's just a prefix you put there to make clear to the viewer that these are hexadecimal digits, even if they might otherwise look like decimal digits. So where are we going with this? Well, those of you who might have experimented in the past with making your own web pages and making them colorful, or those of you who are artists and have used programs like Photoshop, odds are, you've seen these codes before. In fact, here are a few screenshots of Photoshop itself. If you click on a color in Photoshop and you pull up this window, you can change the color that you're drawing on the screen to be any of the colors of the rainbow. But more arcanely, if you look down here, you can actually see these hexadecimal codes, because it's become human convention over the years to use hexadecimal to represent different amounts of red, green, and blue. So if you have no red, no green, no blue, otherwise represented as 000000, well, that's going to give you the color we know here as black. It's sort of the absence of any wavelengths of light there. If by contrast, though, you change all of those six digits to the highest possible value, which, again, is F. The range in hexadecimal 0 through F, otherwise in decimal, being 0 through 15, well, with FFFFFF, that's a lot of red, a lot of green, a lot of blue. And when you combine those wavelengths of light, you get the color we see here as white. And you can imagine, now, combining different amounts of red or green or blue. So for instance, in hexadecimal, FF0000, is the color we know as red. 00FF00 is the color we know as green. And finally, 0000FF is the color we know as blue, because again, the system that programmers and artists often but don't always use, is indeed, this system of RGB for red, green, and blue. So we introduced this here not because you have to start thinking any differently, because again, the mathematical mechanism is the same as week 0. 
But you're going to start seeing numbers in examples, in programs, as just appearing in hexadecimal by convention, as opposed to actually being interpreted as decimal. So if we consider, now, our computer's memory, we'll now start thinking of this whole canvas of memory, all of these bytes inside of our computer's memory, as being enumerable as 0, 1, 2, all the way through F. And then if we keep counting, we can go to 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 1A, 1B, 1C, 1D, and so forth. And it's fine if it's not nearly that obvious, as you look at these things, what the decimal equivalents are. That's not a problem. It's just a different way of thinking about the locations, in this case, of a computer's memory, or the representation of one color or another. All right, well, let's now use this as an opportunity to consider what's actually being stored in our computer's memory. And to be clear, I'll start prefixing all of these memory addresses, so to speak, with 0x, just to make clear that we're now talking, indeed, in terms of hexadecimal. So here's a simple line of code. Out of context, we would need to, actually, put this in main or some other function to actually do anything with it. But we've seen this before many times, now, where you declare a variable, for instance, n for number. Declare it as an int for its type. And then, perhaps, even assign it a value. Well, what's actually going on when we use this kind of code in our computer? Well, let's go ahead and whip this thing up in an actual program. Let me create a file called address.c because I want to start experimenting with some addresses in the computer's memory. I'm going to go ahead and include standard io dot h. I'm going to give myself int main void. And down here, I'm going to go ahead and declare exactly that variable, int n equals 50. And then I'm going to go ahead and print out, with percent i and a backslash n, the value of n. 
So nothing interesting there, nothing too complicated. I'm going to go ahead and make address. And then I'm going to go ahead and do dot slash address. And of course, as per week one, we should hopefully see just the number 50. But today, we're going to give you some more tools with which you can actually start poking around your computer's memory. But let's first consider this line of code in the context of your computer's hardware. So if you're writing a program with a line of code like this, that n needs to be somewhere in your computer's memory. That 50 needs to be put somewhere in your computer's memory. So if we, again, consider this to be just part of our computer's memory, a few dozen bytes, well, suppose that that variable, n, happens to end up down here. I've deliberately drawn n as taking up four bytes, four squares, because, recall, an integer typically, at least on CS50 IDE and modern systems, tends to be four bytes. So I made sure to have it fill four complete boxes. And the value actually stored there might be 50. Well, it turns out that within your computer's memory, again, there are these addresses that are implicitly there. So even though, yes, we can refer to this variable, n, based on the variable name I gave it in my code, surely this variable exists at a specific location in memory. I don't know offhand where it is. But let me just propose that maybe it's at location 0x12345678, just an arbitrary address. I have no idea, in actuality, where it is. But it certainly does have an address, because every one of these squares inside of your computer's memory has an address, a unique identifier like 0, 1, 2, and so forth. Maybe the 50 ended up at memory address 0x12345678. Well, what's kind of cool about C is that we can actually begin to see this, no pun intended. 
So let me go ahead and modify this program and introduce a little bit of new syntax that will allow us to start poking around the inside of your computer's memory so we can actually see what's going on underneath. So I'm going to go ahead and change this program to do this instead. I'm going to go ahead and say, you know what? Don't just print out the value, n, which, of course, is 50. Let me see, just out of curiosity, what is the actual address of n. And to do that today, we're going to introduce one new piece of syntax, which happens to be this here. There's two new operators, today, in C. The first is an ampersand, which does not represent a logical and. Recall a couple of weeks ago, we did see that if you want to combine Boolean expressions, this and that, you use two ampersands. It's an unfortunate coincidence that an ampersand, solo like this, will mean something different today. Specifically, this ampersand is going to be our address of operator. By simply prefixing any variable name with an ampersand, we can tell C, please tell me what address this variable is stored in. And this star, not to be confused with multiplication, also has another meaning in today's context. When you use this asterisk, you can actually tell your program to look inside of a particular memory address. So the ampersand tells you what address a variable is at. The star operator, otherwise known as the dereference operator, means, go to the following address. So they sort of are reverse operations. One figures out the address. One goes to the address. And so let's see this for real here. Let me go ahead and change my n in my program here to ampersand n. So I want to print out, not the number in n, but the address of n. And now, how do I print out an address? Well, it is just a number. But actually, printf supports a different format code for addresses. 
You can do percent p, for reasons we'll soon see, that says to print out the address of this variable and interpret it as hexadecimal, again, by convention. So I'm going to go ahead and make address now after only making two changes to this file. Everything seems to compile OK. Now, I'm going to go ahead and run address. And we will see that, in this particular program, address.c, for whatever reason, that variable, n, ended up at crazy location 0x7ffd80792f7c. Now, is that useful? Not in practice, necessarily. We're going to make this become useful by leveraging these addresses. But the specific address is not interesting. I'm glancing at this number. I have no idea what that number is in decimal. I would have to do the math, or frankly, just Google a converter and have it do it for me. So again, that's not the interesting part. The fact that this is in hexadecimal is just an implementation detail. It happens to represent the location of this variable. And again, we won't want to do this, necessarily. But just to be clear that one of these operators, the ampersand, gets the address, and the star operator goes to an address, we can actually undo the effects of these things. For instance, if I print out now, not ampersand n, but just out of curiosity, star ampersand n, I can kind of undo the effects of this operator. Ampersand n is going to say, what is the address of n? Star ampersand n is going to say, go to that address. So this is kind of a pointless exercise, because if I just want what's in n, I can just, obviously, print n like we began. But again, just as an intellectual exercise, if I prefix n with the address of operator, and then use the asterisk and say, go to that address, it's the same exact thing as just printing n itself. So let me change the format code back to an integer: instead of percent p, percent i. Let me go ahead and make address now, seems to compile OK, and run address. And voila, we're back at the 50. 
So as weird as the syntax today might start to feel, realize that these operators, at the end of the day, are relatively simple in what they do. And if you understand that one just kind of undoes the effects of the other, we can start to build up some pretty interesting programs with them. And we're going to do so by leveraging a special type of variable, a variable called a pointer. And there is that p in percent p. A pointer is a variable that contains the address of some other value. So we've seen integers before. We've seen floats and chars and strings and other types as well. Pointers, now, are just a different type of variable that store the address of some value. And you can have pointers to integers, pointers to chars, pointers to bools, or any other data type. A pointer references the specific type of the value that it actually is referring to. So let's see this more concretely. Let me go back, now, to my program here. And let me introduce another variable here. Instead of immediately printing out something like n, let me go ahead and introduce a second variable that is of type int star. And this, I will admit, is probably the most confusing piece of C syntax that we'll, in general, see, just because, my god, star is now used for multiplication, for going to an address, and also, now, declaring a variable. This is, arguably, not the best design decision. But it was made decades ago. So this is what we have. But if I do int star p equals ampersand n, now, what I can do down here, is print out the address of n by temporarily storing it in a variable. So I'm not doing anything new just yet. I'm still declaring on line 5, an integer called n, assigning it the value 50. What's new now on line 6, is that I'm introducing a new type of variable. This type of variable is known as a pointer. A pointer, again, is just a variable that stores the address of some value. 
And the syntax, admittedly weird, for declaring a pointer to an integer, is to literally say int, because that's the type you're pointing to, star, and then the name of the variable you want to create. And I could call this anything, but I'll call it p to keep it succinct. And again, on the right hand side of the equals sign is the same operator as before. If you want to figure out what is the address of n, it's just ampersand n. And so we can store that address, now, somewhere longer-term. Before, I just passed in ampersand n and printf did its thing. Now, I'm temporarily, on line 6, storing that address in a new variable called p. And its type is technically int star, is what a programmer might say. So it would be incorrect to say int p equals ampersand n. And indeed, our compiler, Clang, won't like that. It won't let you compile the code, most likely. And so, instead, I do int star p to make clear that I know what I'm doing. I am storing the address of an int, not an integer, per se. So if I go ahead, now, and save this, recompile with make address. And notice, I changed one other line of code earlier, too. I went back to percent p to print a pointer, that is, an address. And I'm printing out the value of p, no longer the value of n. If I now run dot slash address, voila, there's that cryptic address. And these addresses may very well change over time. Depending on what's going on inside of your program or other things on the system, these addresses might be different each time. And that's to be expected and not something to be relied on. But it's clearly some random cryptic address, similar to my arbitrary 0x12345678 before. But now, let's just undo this operation. Just so we can come full circle here, let me now propose how I can print out the value of n. And let me call on someone for this if I can. If my goal, now, on line 7, is no longer to print the address of n, but to print n itself using p. 
I'm going to go ahead and change, preemptively, the format code to percent i. And a shorthand notation would, obviously, be just print n. But suppose I don't want to print n for this exercise, how can I now print the value in n by referring to it by way of p? What should I literally type as printf's second argument to print out the value of n by using this new variable, p, in some way. Yeah, let's call on Joshua. AUDIENCE: I believe, if you use the ampersand before the p, it will probably do it. DAVID MALAN: OK, ampersand p, let me go ahead and try that. Let's try ampersand p to print out this value. So ampersand p, I'm going to save the file. I'm going to do make address and enter. And it doesn't seem to be the case. Notice that I'm getting an error. It's a little cryptic. Format specifies type int, but the argument has type int star star, more on that another time. So it turns out this was incorrect. Let's take one other suggestion, because the ampersand, recall, gets the address of something. But p is already an address. So Joshua, what you technically proposed, was get me the address of the address. And that's not the direction we want to go. We want to go to what is at that address. Sophia, what do you think? AUDIENCE: We want to add a percent-- or a star p when we print it. DAVID MALAN: Yeah. So I had a little trouble hearing you. But I think if we instead use not the ampersand operator, but the star operator, that's going to be, indeed, the dereference operator, which essentially means, go to the value in p. And if the value in p is an address, I think, let's try this, make address. Yep, that compiled OK this time. Now, if I do dot slash address, hopefully, I will now see, indeed, the number 50. So again, we don't seem to have made any fundamental progress. At the end of the day, I'm still just printing out the value of n. 
But we've introduced this new primitive, this new puzzle piece, if you will, that allows you, programmatically, to figure out the address of something in the computer's memory and to actually go to that address. And we'll soon exercise more sophisticated control over it as well. But let's come back to a pictorial representation of this and consider what it is we just did in the context, now, of this code. So inside of my main, the two interesting lines of code, really, were these two lines first before we made Sophia's addition and actually dereferenced p and printed it out with printf. But let's consider, for a moment, what these values now look like in a computer's memory. And again, the syntax is a little cryptic because we now have a star and an ampersand. But again, that just means, now, we get to start thinking in terms of the computer's memory. So for instance, here's a grid of memory inside of my computer. And maybe, for instance, the 50 and the n end up down there. They could end up anywhere, not even pictured on the screen here. They end up somewhere in the computer's memory, for our purposes thus far. But it technically lives at an address. And let me simplify the address just so it's quicker to say. This 50, now, stored in the variable n, maybe it actually lives at address 0x123. I have no idea where it is, but we've clearly seen that it can live at a seemingly random address like that. Now, what about p? p is technically a variable itself. It's a variable that stores the address of something else. But it's still a variable, which means, when you declare p with the code earlier, it actually does take up some bytes of memory on the screen. And so let me go ahead and propose that p happens to end up in memory here. Now, p is deliberately drawn to be longer here. I'm consuming eight total bytes this time, because it turns out, on modern computer systems, including CS50 IDE, pointers tend to take up eight bytes. 
So not one, not four, but eight bytes, so I've simply drawn it to be bigger. So what is actually stored in the variable p? Well, it turns out that, again, it's just storing the address of some value. So if the integer n, which itself is storing 50, is at location 0x123, and pointer p is being assigned that address, it's just like saying, well, stored in this variable p, is literally just a number represented here in hexadecimal notation, 0x123. So that's all that's going on inside the computer's memory with those two lines of code. There's nothing fundamentally new, except the fact that we have new syntax with which to refer to these addresses explicitly. This is n down here. This is p up here. And the value of p just happens to be an address. Now, I keep saying that these addresses are a little cryptic. They're a little arbitrary. And they are. And honestly, it is rarely, if ever, going to be enlightening to know, as a human, what address this integer n is actually at. Who cares if it's at 0x123 or 0x456? Generally, we don't. And so computer scientists, when talking about computers' memory, tend not to talk about these low-level details, in terms of actual numbers. Instead, they tend to simplify the picture, sort of abstract away all of the other memory, which frankly, is not relevant to the discussion thus far, and just say, you know what, I know that p is storing an address. And that address happens to be that of 50 down here. But I really don't care, in my everyday programming life, what these specific addresses are. So you know what? Let's just abstract it away as an arrow. And again, abstraction is all about simplifying lower level details that you may very well need to understand but you don't necessarily need to keep thinking about. You don't need to keep thinking at this level. It suffices to think at this level. So we might as well draw a pointer, pictorially, as pointing at some value and irrespective of what the actual address is. 
And so this is very much the case in our human world. We have very similar conventions whether or not it might be obvious at first glance, such that we may very well be using these same mechanisms in our everyday lives. So for instance, if you happen to have a mailbox out in the street on your home or down in the basement of Harvard Science Center when on campus, it may very well look like something like this, at least more residentially. And suppose that this mailbox here is representing, in this case, p, in the story. It's storing a pointer, that is, the address of something else. Well, if there's a whole bunch of other mailboxes on the street, well, we can put anything we want in these mailboxes. We can put postcards, letters, packages even. And just as in the real world, can we do the same in the virtual. I can store chars or integers or other things, including addresses. So for instance, Brian, I think you have your own mailbox somewhere else. And Brian, of course, has a mailbox that itself has a unique address. So Brian, for instance, what happens to be the unique address of the mailbox on your street there? BRIAN: Yeah, so here is my mailbox. It's labeled n. And its address is over here. The address of my mailbox appears to be 0x123. DAVID MALAN: Yeah, so my mailbox, too, has an address. Frankly, again, I don't really care about it. So I've not even put it on the mailbox here. But if my mailbox represents p, a pointer, and Brian's mailbox represents n, an integer, well, it should mean that if I look inside the contents of my pointer and I see the value 0x123, that is now my clue, a breadcrumb of sorts, that can now let me go look inside of Brian's mailbox. And Brian, if you wouldn't mind doing that for us, what do you have at that address? BRIAN: And if I look in my mailbox at address 0x123, I have the number 50 inside of this mailbox. DAVID MALAN: Yeah, indeed. So in this case, he happens to be storing an int. But it could be anything else. 
And again, we don't typically care about these specific addresses. Once you understand the metaphor, really, we can do something silly and really just think of this mailbox as storing a value that's pointing at Brian's mailbox. It's some kind of direction drawn there, pictorially as an arrow, here as a silly foam finger. Or if you prefer, a foam Yale finger pointing, instead, at Brian's mailbox, just as a sort of breadcrumb leading us to some other value on the screen. So when we talk today and beyond about addresses, that's all we're talking about. We humans in the real world have been using addresses for eons, now, to uniquely identify our homes or businesses or the like. Computers do the exact same thing at a lower level using their computer's memory. So let me pause here to see if there are any questions on pointers, variables that store addresses, or on these new operators, like the ampersand or the asterisk, which now has a new meaning today onward. Nothing yet. All right, seeing none, well, let's consider now, the same story in the context of a completely different data type. Thus far, we've played only with ints. But consider strings. We've spent a lot of time on strings, using them for encryption and implementing electoral algorithms using users' input. So let's consider a fundamentally different data type that stores, not individual integers, but strings of text instead. So for instance, in any program involving a string, you might have a line of code that looks like this. string s equals, quote unquote, "HI!" in all caps with an exclamation point. So that may very well be a line of code that we've seen thus far. What's actually going on inside of the computer's memory? Well, let me propose that when you type in quote unquote, "HI!" in a computer, it ends up somewhere in your computer's memory. So HI exclamation point, plus, per two weeks ago, a backslash 0, which is how a computer represents the end of that string. 
But let's look a little more carefully at what is going on underneath this hood here. Technically speaking, I could address those individual characters, as we have seen as of week two, by using bracket notation like s bracket 0, s bracket 1, s bracket 2, and s bracket 3. We use the square bracket notation to treat a string as though it's an array of characters. And it was, and it still is. But it turns out, strings can also be manipulated by way of their addresses as well. And so for instance, maybe this same exact string, HI, is stored at memory address 0x123 and then 0x124, 0x125, and 0x126. Notice that they're deliberately contiguous addresses, back to back to back. And they're only one byte apart, because each of these chars, of course, is just one byte in C. So those numbers are not important, specifically. But the fact that they're one byte apart from each other is important, because that's the definition of a string, and indeed, an array, to have memory back to back to back. Now, what exactly, though, is s? s was the name of the variable I gave a moment ago in that line of code, string s equals, quote unquote, "HI!" Well, what is s? s is a variable that has to go somewhere in the computer's memory. And suppose that s is, indeed, HI with an exclamation point. And the HI happens to live at this location here. You know what? You can now think of s as being, at a high level, a string, but at a lower level, just the address of a string. More specifically, let's start thinking about a string as technically being just the address of the first character in the string. Now, that might give you pause for a moment, because why the first character? How are you going to remember that, wait a minute, this string isn't at and only at 0x123. It also continues at 0x124, 0x125, and so forth. 
But let me pause and ask the group here, why might it very well be sufficient for a computer and us programmers to just think of strings in terms of being the address of the very first byte. Like, why is it sufficient, no matter how long the string is, even if it's a whole paragraph of text, why is it very cleverly sufficient to think of a string like S as just being identical to the address of the first byte? Ginni, is it? AUDIENCE: Possibly because it happens that strings, whenever we are defining a new string, that is altogether. Suppose, if I'm writing my name, Ginni, so it will be G-I-N-N-I altogether. So it will be sufficient if something is pointed towards just first character of my name, so that I can just follow up for the first character and then get all the characters afterwards. DAVID MALAN: Perfect. So all of these basic definitions we had over the past couple of weeks now come together. If a string is just an array of characters-- and by definition of array, those characters are back to back to back, and per two weeks ago, every string ends with this conventional backslash zero or nul character. All you need to do when thinking about a string is just to know where does the string begin, because you can use a for loop or a while loop or some other heuristic with a condition and a Boolean expression to figure out where the string ends without even knowing, in advance, its length. So that is to say, let's start, for the moment, thinking about strings as being quite simply that, just the address of the first character in the string. And if we then take that as fact, let's go ahead, now, and start playing with a program that doesn't use integers, but instead, uses strings, using this basic primitive. So let me go ahead and delete the code I'd written before, in address.c. Let me just change it up to be string s equals quote unquote, "HI!" semicolon. And notice, I'm not manually typing any backslash 0's. C does that for us automatically. 
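The idea that the address of the first character is enough can be sketched in a few lines of C. The helper name `my_strlen` is an illustrative assumption, not something from the lecture; it simply walks forward from the starting address until it meets the nul character.

```c
#include <stddef.h>

// Hypothetical helper (the name is an assumption for illustration):
// count the characters in a string given only the address of its first one.
size_t my_strlen(const char *s)
{
    size_t n = 0;

    // Step forward one byte at a time until the nul terminator appears.
    while (s[n] != '\0')
    {
        n++;
    }
    return n;
}
```

Calling `my_strlen("HI!")` should count three characters, since the nul terminator itself is not counted.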
When you close the quote, the compiler takes care of adding that backslash 0 for you. Now, I'm going to go ahead on the next line and go ahead and print out percent s backslash n comma s, if I want to print out that string. Now, this program is not at all interesting anymore. Back in week one, we wrote something like-- OK, yes it is interesting because I screwed up. So five errors. I've written seven lines of code and five errors. And let's see what's going on. As always, always go to the top, because odds are, there's just some confusing cascading effect. The very first error I see is use of undeclared identifier string. Did I mean stdin? I didn't mean stdin, string, string, string. So I could run help50 as my frontier, but honestly, I make this mistake often enough that I kind of know now that I forgot to include cs50.h. And indeed, if I now do this and recompile make address-- OK, all five errors are gone just by that one simple change. And if I run address now, it's just going to, quite simply, say HI. But let's now start to consider what's going on underneath the hood of this program. Suppose I am curious and want to print out what is actually the address at which this string lives. Well, it turns out-- let me be clever here. Let me print out, not a format code of percent s, but percent p. Show me this same string as an address. Let me go ahead and recompile, make address, seems to compile OK. Let me run dot slash address. And again, I'm still printing s, but I'm asking printf to present it as though it's a pointer. And interesting, it's not the same as before. But again, that's reasonable because the memory addresses aren't going to always be the same. But it doesn't matter what it is. But that's kind of interesting. All this time, any time you've been using strings, had you just changed your percent s to a percent p, you could have seen where, in memory, that string actually starts. It's not functionally useful to us just yet. 
But it's been there this whole time. And let me go ahead and do the following now. Suppose I get a little curious further, and I do printf. Let me go ahead and print out another address followed by a new line. And let me go ahead and print out the address of the first character. So again, this is a little weird to do. And we wouldn't typically do this that often. But again, just to make the point that these operators give us very simple answers to questions like, what is the address of this thing? If s bracket 1, as of week two in CS50, represented the second character in s, because 0 index means s bracket 0 is the first, s bracket 1 is the second. If I play around with today's new operator, this ampersand, I bet I can see the address of that second character. And in fact, let me go ahead and be more explicit. Let me change this first s to be s bracket 0 and put an ampersand here. And let me go ahead, now, and make this program, make address. OK, a little funky-- I just missed a semicolon. So easy fix there. Let me go ahead and recompile with make address. Let me go ahead and run dot slash address. And interesting, well, maybe-- interesting to me. So you see, now, two addresses, the first of which is 0x4006a4, which apparently, is the address of the first character in s. But notice what's curious about the next one. It's almost the same except the byte is one further away. And I bet if I do this, not just for the h and the i, but also the exclamation point-- let me do one more line of almost identical code, just to make the point that all this time it's, indeed, been the case that all characters in a string are back to back to back. And you can now see it in code. b4, b5, b6, are just one byte apart. So we see some visual confirmation, now, that strings are indeed laid out in memory just like this. Now, again, this is not a very useful programmatic exercise to look at the address of individual characters. 
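A minimal sketch of this experiment follows. The helper name `print_char_addresses` is an assumption made for illustration, and the cast to `void *` is added here because `%p` formally expects that type; the addresses printed will differ from machine to machine and run to run.

```c
#include <stdio.h>

// Hypothetical helper (not from the lecture): print the address of every
// character in s, then report how many bytes separate the first two.
// Assumes s holds at least one character before its nul terminator.
long print_char_addresses(const char *s)
{
    for (int i = 0; s[i] != '\0'; i++)
    {
        // & yields the address of the i-th character; %p prints it in hex.
        printf("%c lives at %p\n", s[i], (void *) &s[i]);
    }

    // Subtracting two char pointers gives their distance in bytes.
    return (long) (&s[1] - &s[0]);
}
```

Whatever addresses appear, the returned gap should be 1, matching the back-to-back layout described above.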
But again, this is just to emphasize that underneath the hood, some relatively simple operations are being enabled by way of this new ampersand, and in turn, star operator. So let's consider for a moment what this really looks like inside the computer's memory. At a low level, yes, s is technically an address. And yes, it's technically the address of the first byte, which in the actual computer, looked different. But in my slide here, I just arbitrarily proposed that it's at 0x123, 0x124, 0x125. But again, let's not care about that level of detail. Let's just kind of wave our hands and abstract away these addresses and just now start thinking of s, that is a string, as technically just being a pointer. A pointer. So it turns out that even though it's very useful and very common to think of strings as, obviously, just being sequences of characters. And that's been true since week one. And you can also think of them as arrays, back to back sequences of characters. You can also, it turns out, starting today, think of them as just being pointers, that is, the address of a character somewhere in the computer's memory. And as Ginni notes, because all of the characters in a string are, by definition, back to back to back, and because, by definition, all strings end with a backslash 0, that is literally the smallest and only amount of information you need to keep around in a computer to know where all of your strings are. Just remember the address of the very first character therein, because you can find your way to the end by remembering that this backslash 0 is, really, just eight 0 bits, otherwise represented as backslash 0. And so we could certainly have an if condition, much like we did two weeks ago when playing around with the lengths of strings, that allows us to check for precisely that. And so when I say we're taking off some training wheels, here they go. 
So up until now, we've been using, again, the CS50 library, which gives us, conveniently, functions like get string and get int and get float and so forth. But all this time, the CS50 library, specifically the file, cs50.h, had a little bit of a pedagogical simplification in it. Recall last week, that you can define your own custom data types. Well, it turns out that all this time, we've been claiming that strings exist and they're something you can use in your programs. And strings do exist in C. They do exist in Python, in JavaScript, in Java, and C++, in many, many, many other languages. This is not a CS50 term. But string, technically, does not exist as a data type in C. It instead, is more cryptically and more low-level known as char star. Char star, now what does that mean? Well, char star, much like our int star a few minutes ago, just represents the address of a character, much like int star represents the address of an int. And if, again, you kind of agree with me now, that you can think of strings as sequences of characters, or more specifically, arrays of characters, or more specifically, as of today, the address of just the first character, then it's, indeed, the case that we now can apply this new terminology, today, of pointer, to our old familiar friends, strings. String is the same thing as a synonym, if you will, for char star. And it's in the CS50 library that we, essentially, have a line of code that simplifies or abstracts away char star, which honestly, no one wants to think about or struggle with in the first week of a class, let alone the first two or three weeks of a class. It's a simplification, a custom data type, that we name string, just so you don't have to think about, what is this star? What is it to the character? What is it an address of? But today, we can remove those training wheels and reveal that, all this time, you've just been manipulating characters at specific addresses. 
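The simplification described here can be sketched with `typedef`, the same feature used for last week's person type. This is an illustration of the idea rather than a copy of the real header, and `first_char` is a hypothetical helper added only to show that the two type names are interchangeable.

```c
// A sketch of the kind of synonym the CS50 library is described as
// providing: "string" becomes another name for "char *".
typedef char *string;

// Hypothetical helper (an assumption for illustration): accept a string,
// treat it as the char * it really is, and go to that address.
char first_char(string s)
{
    char *p = s;   // no conversion needed; the types are identical
    return *p;     // the character stored at the first address
}
```

With this synonym in place, `first_char("HI!")` yields the character H whether the parameter is spelled `string` or `char *`.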
And we've used this kind of technique before, abstracting away these lower level details. For instance, recall last week, that we introduced this notion of a struct, a data type that you can customize to be your own. We implemented a better phone book by wrapping together a name and a number inside of a custom data type, encapsulating them if you will, inside of something we called person. And every person we claimed had a structure that contains a name and a number. And by the way of this feature of C, typedef, we can define a new type. And the name of that type, last week, was just person. So we're using, already, and we have been sort of secretly using since the first week of C in the class, a line of code that actually looks like this. And this is, indeed, one of the lines of code inside of cs50.h. It says typedef, which means give me a custom type. And it creates a synonym for char star called string. And it's just a way where we can hide the funky char star. We can hide the asterisk, in particular, which would not be fun to play with in the first few days, but without changing the definition of what a string is. So strings exist in C. But there's no data type called string in C until you use a library like CS50's, which makes it exist by way of that kind of definition. All right, let me pause here to see if there's any questions, then, about what strings are or these new ways of thinking about them. Any questions about strings or char stars? All right, well, if no questions here, why don't we go ahead and take our 5 minute break here first. And we'll be back in 5 and take another look at what we can now do with these new primitives. All right, we're back. And we have, now, this ability in code to get the address of some variable and also to go to an address using ampersand and the asterisk, respectively. We've thought about strings as being not only contiguous sequences of characters, but also arrays. 
And then of course, as of today now, actual addresses, the address of the first character and then, from there, we can find our way, programmatically, to the end, thanks to that nul character. But it turns out there's one other thing we can do with these addresses or with pointers more generally. And that's known as pointer arithmetic. So anything that's a number, of course, we can do math on. And the math is not going to be complicated, but it is going to be powerful for us here. So I'm going to go back to my most recent state of address.c. And let me go ahead, now, and reiterate that we can print out the individual characters in a string, just like we did back in week two, by using our square bracket notation. So I'm getting rid of all evidence of those addresses for now. I'm recompiling this program as make address. And then I'm going to run dot slash address now. And I see HI exclamation point, one character per line. But now, consider that there doesn't need to be a string data type. In fact, we can take this training wheel off. And while it might feel a little uncomfortable at first, if I delete this first line altogether, as I've accidentally omitted anyway sometimes, I don't need to keep calling things strings. I can describe them as strings verbally. I can think of them as strings, because string is a thing in many different programming languages. But by default, in C, it just doesn't exist as a type. Instead, the type is somewhat cryptically named, char star. But again, all that means is that the star means here's the address of something. Char means it's the address of a char. So char star gives you a pointer variable that's going to point to a character. So now, if s is that, I can actually treat it the same. There's no reason I can't keep using s like a string was back in week two, using our square bracket notation. And I can keep printing out HI exclamation point using that same square bracket syntax. But there's one other way I can do this. 
If I now know that s is really just an address, I can get rid of this square bracket notation. And I can actually just do star s, because recall that star, in addition to being the new symbol that we use when declaring a pointer up here, it's also the same symbol, confusingly, admittedly, that we used to go to an address. So if s is storing an address, which it is by definition of being a pointer, star s means go to that address. And per my picture earlier, it would seem to be the case that s is most likely at an address beginning at 0x123. It's not going to be the same in my actual IDE here. It will be whatever the computer has ordained. But it's going to be the same exact idea. So let me go ahead and go to star s. And just for kicks, let me leave it as just that one line. So let me go ahead and rerun this as make address. All right, and now dot slash address. I should see, hopefully, a capital H and only an H. But watch this. If I know that s, a string, is technically just an address, I can actually now do math on it. And I can go ahead and print out another character, followed by a new line. And I can go to, not s, but how about s plus 1. So I can do some very simple arithmetic, if you will, on that pointer. And let me go ahead and now recompile this. So make address, compiles OK, dot slash address. And I should see HI. And if I do one more line of code like this, printf, percent c, backslash n, star s plus 2, I can now go to the character that is two bytes away from whatever s is, which again, is the start of the string. So now, I've reprinted HI with the exclamation point character by character, but not by using this fancy square bracket notation, fancy only in the sense that it was sort of an abstraction for us, if you will. I'm instead, manipulating s for what it really is, which is just an address. And so here, too, and I've used this phrase before, that square bracket notation that we introduced in week two, is technically just syntactic sugar. 
It's not doing anything fundamentally different from these asterisks and these addresses. It's just doing it, honestly, in a much more user-friendly way. I still prefer, personally, the square bracket notation from week two. But it's the same thing as using the star and doing this math yourself. So C is just providing us with this handy feature of using square brackets that does all of this so-called pointer arithmetic for you. But again, we're going to this low level just to emphasize what it is that's going on ultimately underneath the hood here. All right, let me pause here for any questions. And Brian, please do feel free to verbalize any on your end. BRIAN: I see a question that came in about what would happen if you tried to print star s plus 3. DAVID MALAN: So I'm pretty sure that's going to print out the nul character. But let's go ahead and confirm as much here, percent c backslash n star s plus 3. All right, I'm getting a little adventurous here by looking at things I maybe shouldn't be looking at, because that's a low level implementation detail. But let's see what happens. It compiles OK, dot slash address. And it seems to be blank. Now, maybe that's the nul character. Honestly, it's not meant to be a printable character. It's this special sentinel value that indicates the end of the string. But I could do this. I know from week two that chars are integers and integers are chars if I want to think of them that way. So let me change only the very last character to use the format code percent i. Let me recompile my code. Let me go ahead and run address. And voila, HI exclamation 0. And there is the all 0 bits represented here as one single decimal digit thanks to percent i. Now, I can get really crazy here. And why don't we go ahead and print out not just what characters are right after this sequence, HI exclamation point nul character, why don't we go to-- oh heck, how about address 1,000 bytes away, and really get nosy inside of my computer? 
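The equivalence between square brackets and pointer arithmetic can be sketched as follows. Both helper names are assumptions for illustration. Note that what is spoken aloud as "star s plus 1" is written with parentheses, `*(s + 1)`; without them, `*s + 1` would go to the first character and then add 1 to its numeric value.

```c
// Hypothetical helpers (names are assumptions): fetch the i-th character
// of a string in the two ways discussed above.

// Week two's square bracket notation.
char nth_with_brackets(const char *s, int i)
{
    return s[i];
}

// The same thing spelled out: move i bytes past s, then go to that address.
char nth_with_star(const char *s, int i)
{
    return *(s + i);
}
```

For "HI!", both helpers should agree at every index from 0 through 3, with index 3 giving the nul character, whose integer value is 0.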
Let me recompile that dot slash address. OK, nothing really going on over there. How about 10,000 bytes away? Let me go ahead and make address. Let me go ahead and run this segmentation fault. All right, that's bad. And you might be among the fortunate few who have seen this error before by touching memory you shouldn't. And we're going to deliberately consider this today. But a segmentation fault, indeed, means that you have done something wrong somewhere in your code. And it tends to mean that you touched a segment of memory that you shouldn't have. And I have no business, honestly, looking 10,000 bytes away from the memory that I know belongs to the string. That's like arbitrarily looking anywhere in your computer's memory, which probably, it seems, is not a good idea. But more on that in just a bit. So let's consider, now, some of the implications of these underlying implementation details and consider, now, from last week, why we did a few things the way we did in the past few weeks, in fact. So string is just a char star. And let's, now, consider an example. Let me zoom out on my memory, just so I can cram more in at once. Let's consider an example where I might want to write a program that compares two strings. Let me go ahead and write some new code here in a new file this time, called, for instance, compare.c. My goal with this program, quite simply, is going to be to print out the contents of-- or rather to compare two strings that the user might input. I'm going to go ahead and include cs50.h, not because I want string, per se, anymore, but because I want to use get string just for convenience. But we'll take that training wheel off in a bit, too. And in this program, I'm going to go ahead and first use, not get string yet. Let me go ahead and keep it simple and start with get int. And I'll ask the user for a variable i. And let me do another one of these with get int and ask the user for a value for j. 
And then let me go ahead and quite simply say, if i equals equals j, then go ahead and print out same else. Let me go ahead and print out different. So this is week one stuff, where I'm using a couple of variables. I'm using a condition with two branches, and I'm using printf to print out whether those two variables, i and j, are the same. So let's go ahead and compile this. All is well. Run compare, and let me give it digits 1 and 2. And indeed, they're different. And let me go ahead and give it 1 and 1, and they're the same. So I think, logically, proof by example, if you will, this program looks correct. But let me quickly make it seemingly incorrect, by not using integers. But how about, by using strings instead. Let me go ahead and give myself a string. Although, no, I don't need that training wheel anymore. Let's just do char star s equals get string of s. But again, even though I'm calling it char star, it's still a string like it was weeks ago. Let me give myself another string called t, just to keep the name short. And s will get-- t will get that value there. And let me just, very naively but kind of reasonably, say if s equals equals t, let's go ahead and print out same. And otherwise, let's go ahead and print out different. So same exact code, just different data types, and using get string instead of get int. Let me go ahead and make compare, seems to compile OK, dot slash compare. Let me go ahead and type in HI!-- whoops, HI!. Let me go ahead and type in HI! again. And voila, different. And I forgot my backslash n's, but that seems to be the least of my problems. Let me recompile this, make compare, and now, let me run it again. How about, let's do a quick test. David, Brian, these are definitely different. OK, good. So the program seems to work. How about David, David? Also different. Huh, let me try again. Brian, Brian, also different. But I'm pretty sure those strings are the same. Why might this program be flawed? 
What is wrong with this program right now? BRIAN: A couple of people in the chat are saying that we're not actually comparing the characters, we're comparing the addresses. DAVID MALAN: Yeah, so that's sort of the logical conclusion from today's definition of what a string really is. If a string is just the address of its first character, then if you're literally doing s equals equals t, you're comparing those two addresses. And they are probably going to be different, even if I type in the same thing, because every time we've called get int or get string, it's kind of plopped the user's input somewhere in my computer's memory. But we now have the tools, honestly, to answer this or vet this answer ourselves. Let me go ahead and simplify this program. And let's, just as a quick sanity check, print out s. And let's go ahead and print out t using a new line after each, just so we can see what the strings are. So let me go ahead and do this again, make compare, compiles OK, dot slash compare. Let me type in HI, HI. And they seem to be visually the same. But recall that, now, I have this other format code, such that I can now start treating strings as the addresses they technically are. So let me change percent s to percent p in both places. Let me then recompile the program, and now, rerun compare with both HI and HI identically typed. But notice, they've ended up at slightly different memory locations. Even though I have coincidentally typed the same thing, C and my computer are not going to be so presumptuous as to use the same bytes for both strings. That's not going to give me much flexibility if I want to change one or the other. It's going to very simplistically put one in this chunk of memory and the other in this chunk of memory. And indeed, those addresses are respectively, but arbitrarily, 0x22fe670 and 0x22fe6b0. So they are spread apart some distance. But again, it's up to the computer to decide where to actually put those. 
So what's actually going on inside of the computer's memory? Well, let's consider if, for instance, this is s, my pointer, or really, my string. But it's just a pointer now. It's the address of something. Notice that I've drawn it as taking up eight squares, because again, a pointer on modern systems is eight bytes. So that's why this thing is so big. Meanwhile, when I type in something like HI with the exclamation point, then it ends up somewhere in memory. We don't really know or care where it is. So let's just arbitrarily say it happens to end up there in my computer's memory. Now, each of those bytes, of course, has an address. I don't necessarily know or care what they are. But for explanation's sake, let's just number them again like before, 0x123, 0x124, 0x125, 0x126. When I then assign s on the left the value from get string on the right, get string, what is it going to do? Well, all of this time since week one, since you've been using it, it is, yes, getting a string and handing it back to you as a return value. But what does that really mean? Well, if a string is just an address, the return value of a function like get string is, not the string per se, because that's kind of a high level concept. What get string has always been doing for us is returning the address of the string, or more specifically, the address of the first character in the string. And so what is technically stored in s, to be clear, is that address, 0x123. It's not returning the whole string, the H, the I, the exclamation point. Rather, it's returning just one value to you. It's returning to you only the address of the first character of that string. But again, this is all very good for just s. What's going on with t? t is kind of the same story, because I'm calling get string again. t is going to get assigned the address of the first character of this version of HI. And let's just arbitrarily say it's at 0x456, 0x457, 0x458, and 0x459. 
And at this point, t is going to take on the value of 0x456. And now, at this point, honestly, we're really getting into the weeds. Let's just start abstracting all of this away and use arrows to point at the values. And indeed, these arrows just represent pointers when we stop caring about the particular addresses. So s is really just a pointer, a variable pointing at the first character of HI here. t is just a variable pointing at the first character of HI there. And so when you are comparing two strings as I was before in the earlier version of my program, where I was checking if s equals equals t, I was, indeed, comparing s and t. What are s and t? s and t, respectively, are 0x123 and 0x456, or whatever the actual values happen to be, which are not going to be the same because they happen to point to different chunks of memory. All right, well who cares? This is all kind of a nice intellectual exercise. But who cares? Well, how do we solve this problem? Let's consider what I actually did in a previous demo. I sort of preemptively mentioned that there's this function, string compare, that allows you to compare two strings. And I promised that we would eventually explain why we use str compare as opposed to just using the equal equal sign. Well, to use this function, I'm going to need to add in string.h up here per last time. But if string compare s t, let me go ahead and recompile this, make compare, dot slash compare. Now, let me type HI! and HI! identically. Now, they still seem to be different. And dammit, I made the same stupid mistake as I did last time. Does anyone know what mistake I made when comparing two strings? Somehow I seem to be very good at making this mistake. BRIAN: Ibrahim is suggesting that you add an equal equal zero. DAVID MALAN: Thank you. Ibrahim is quite right. 
The return value, recall, of str compare, is to return 0 if they're the same, a negative number if one comes before the other, and a positive number if one comes after the other, as in ASCIIbetical order. So what I should have done, both last time and this time, is check for equality with 0. Let me go ahead and recompile this program. OK, good. Now, let me rerun this program with HI! twice. Voila, they're the same. And just to make sure, let me do one other check. Let me do David and Brian, which should be, indeed, different. So now, again, I haven't really done anything different from that last time. But I'm now thinking about these strings as being fundamentally just their addresses. And so, now, let's make this actually germane. Let me go ahead and create a new file altogether. And let's, pretty reasonably, try to copy one string and make changes to it. So I'm going to go ahead here. And just for convenience, I'm going to still use the CS50 library, not for the string data type, but just for the get string function, which we'll see is more handy than other things-- than other ways of doing things. And I'm going to go ahead and include standard io dot h. And I'm going to go ahead and include, how about, string.h. Let me go ahead and do int main void. And let me go ahead, in this program, and get myself a string. But note, we won't call it string anymore. We'll just call it char star. So again, start taking off that training wheel. And I'm going to go ahead and get a string called s. And then I'm going to get another string. But I won't call it that. I'll call it char star t. And I want to copy s. And so you might think, based on week one, week two, and since, that OK, if you want to copy a variable, just do it. I mean, we've used the assignment operator to copy a variable from right to left for integers, for chars, and for other data types, perhaps, too. I'm going to go ahead, now, and make a change to the original string. So let me go ahead and do this. 
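The difference between comparing addresses and comparing characters can be sketched like this. The helper names `same_address` and `same_text` are assumptions for illustration; `strcmp` is the standard function from string.h that the lecture calls str compare.

```c
#include <stdbool.h>
#include <string.h>

// Hypothetical helper: what s == t really asks. Are these two pointers
// holding the very same address?
bool same_address(const char *s, const char *t)
{
    return s == t;
}

// Hypothetical helper: what was actually intended. strcmp walks both
// strings character by character and returns 0 when they match.
bool same_text(const char *s, const char *t)
{
    return strcmp(s, t) == 0;
}
```

Two separate arrays that both hold HI! should make `same_text` true and `same_address` false, which mirrors the behavior seen with two calls to get string.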
Let me go ahead and say, let's change the first character of t to be uppercase. Recall that there's this function, to upper, which takes, as input, a character, like the first character in t, and returns the uppercase version. Now, to use to upper, I need another header file, which I recall from a couple of weeks ago now, I need ctype.h. So let me preemptively go back and put that there. And now, let me go ahead and print these two strings. Let me go ahead and print out s as being this percent s. And let me go ahead and print out the value of t with percent s as follows. So again, what I'm doing is I'm getting a string from the user. And the only new thing here is char star today, which is synonymous with string. On line 10 here, I'm copying the string from right to left. And then I'm capitalizing only the first letter in the copy, otherwise known as t. And then I'm just printing both out. So let me go ahead and make copy, compiles OK. Make cop-- dot slash copy. Let me go ahead and type in hi! in lowercase, all lowercase, and then enter. And voila, huh. It would seem that I somehow capitalized both S and T, even though I only called to upper on T. Brian, any thoughts from the group on why I've accidentally and erroneously capitalized both somehow? BRIAN: A couple of people are saying that t is just an alias of s. DAVID MALAN: Just an alias of s, that's a reasonable way of thinking of it, sure. And more precisely, any other thoughts on why this is incorrect somehow? BRIAN: Peter is now suggesting that they have the same address. DAVID MALAN: So yeah, more specifically, all I've done is copy s into t. But again, what is s as of today? It's just an address. So yes, I have copied s. But I've copied it literally, which means copying its address, 0x123, or whatever it is. And then on line 12, notice that I'm changing t by uppercasing it. But t is at the same address as s. So really, I'm changing one and the same string. 
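The aliasing effect can be reproduced in a small sketch. The helper name `capitalize_via_alias` is an assumption for illustration, and it assumes the caller passes writable memory such as a char array, standing in for what get string would return.

```c
#include <ctype.h>

// Hypothetical helper (not the lecture's exact program): "copy" s into t
// with plain assignment, then capitalize through t.
void capitalize_via_alias(char *s)
{
    // Only the address is copied, so t and s point at the same bytes.
    char *t = s;

    // Changing t[0] therefore changes s[0] as well.
    t[0] = (char) toupper((unsigned char) t[0]);
}
```

After calling this on an array holding hi!, the original array itself should read Hi!, because no second copy of the characters ever existed.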
So if we think about this in terms of the computer's memory, let's consider what I've just done. Let me clear the computer's memory. Let me put s down as before. Let me put hi! down as before, but all lowercase this time. And recall that it might be at addresses 0x123, 124, 125, and 126. And now, if we consider that s technically contains the address of that first character, 0x123, and I proceed to create a new variable, t, and assign t the value of s, I got to take that statement literally. I'm literally just putting 0x123 here. And if we now abstract away these details just to make it more clear visually what's going on, that's pretty much like saying that both s and t point to the same location in memory. So yes, in that sense, t is just an alias for s, which is a reasonable way of thinking of it. But really, t is just identical to s. So when you use the square bracket notation to go to the first character of t, you are equivalently going to the first character in s. They are one and the same. So when I call to upper, I'm calling it on this character, which of course, is the one and only h in the story. And when I print s and I print t, printf is following those same breadcrumbs, if you will, and ultimately displaying the same value as having changed. So we would seem to need to fundamentally rethink how we are copying strings. And let me ask, if this is the wrong way to copy one string into the other, what is the right way? Even if you don't have the functions in mind or the right vocabulary, just intuitively, if we want to copy a string in the way that a human would think of copying one into the other, like a photograph or a photocopy, how do we want to do this? Any thoughts, Brian? BRIAN: Yeah, Sophia suggested we would want to somehow loop over the elements in s and put them into t. DAVID MALAN: Yeah, I like that. So loop over the elements of s and put them into t. So it sounds like more work. 
But that's, again, what we're going to have to do if we want to think of these-- if we want to accept the fact that these things, s and t, are just addresses, we're going to now have to go and follow those breadcrumbs. So let's go ahead and consider a variant of this program. Let me go ahead, here, and change this such that I'm still getting a string s. But now, let me go ahead and propose exactly that, that we copy the individual characters. But I need to copy them somewhere. So I feel like another step in this process of copying a string has to be to give myself some additional memory. If I have H i exclamation point and the nul character, I need to, now, somehow take control of this situation and tell the computer somehow, in code, give me four more bytes of memory so that I have a location for t in which to copy those characters. So here's a new function today. If I want to create a string t, otherwise known today as a char star, there is a new function we can use called malloc, which represents memory allocation. This is a pretty fancy function that, fortunately, is pretty simple to use. It takes, as input, just a number. How many bytes of memory do you want to ask the computer for? So how do I do this? Well, H i exclamation point backslash 0, I could literally just say four. But this doesn't feel very dynamic. I think I can programmatically implement this a little more elegantly. Let me go ahead and say, give me as many bytes as there are characters in s plus 1. Plus 1, why am I doing this? Well, H i exclamation point nul character, that's technically what's stored underneath the hood. But what do you and I think of the length of Hi! as being? Well, odds are, in the human world, it's H i exclamation point. And who cares about this low level detail, this nul terminator. You don't include that in the length of an English word or any word. You only think of the actual characters you can see. So the length of H, i, exclamation point is 3. 
But I do need to cleverly add one more byte, a fourth, for the nul character, because I'm going to have to copy that over as well. Otherwise, if I don't have an identical nul character, t is not going to have an obvious ending. So how do I copy, now, one string into the other? Well, let me go ahead and take out our old friend, the for loop, from week one. And say, for i equals 0-- how about, actually, n equals string length of s. We've done this trick before. i is less than n, i++. Let me go ahead and, quite simply, say t bracket i gets s bracket i. So this will literally copy, from s, each of the characters one at a time into t. But I need to be a little smarter now. Even though we almost always do i less than n, I'm actually going to very aggressively say i less than or equal to n. Why? Why am I going one step further than I feel we normally do when iterating over strings, and one step further than you probably did when iterating over a caesar cipher or a string in that context? Brian, any thoughts here? Why am I going from i less than or equal to n kind of for the first time here? BRIAN: Celina is suggesting that we need to include the nul character. DAVID MALAN: Yeah, so if I-- and now I understand how strings work. So it's not sufficient to just copy the H, I, exclamation point. I need to go one step further, one more than the length of the string. And the easiest way to do that would be less than or equal to n. Or I could just do a plus 1 there. Or I can do this any number of ways. Doesn't matter how you do it. But I think a less than or equal to is one reasonable way to do it. And now, let's go down to the bottom here and now actually do this capitalization. Let's now change the first character in t to be the result of calling toupper on the first character of t. And then, as before, let's go ahead and print out whatever s is. And like before, let's go ahead and print out whatever t is and hope now that only t has been capitalized. 
But I do need to make one change now. It turns out that this function, malloc, comes in a file called standard lib dot h. And again, this is the kind of thing that you can jot down in notes. You can always Google these kinds of things. Even I forget what header files these functions are sometimes declared in. But it happens to be a new one called standard lib for library that gives you access to malloc. So let me go ahead, now, and make compare. All right, so far so good. Dot slash compare-- sorry, this is not compare. The old program works fine. Make copy-- oh my god, seven mistakes. What'd I do wrong here? Oh, it looks like I forgot the type of i and n. So let me go into my for loop and add the int. That was my fault. Let me make copy again. OK, all seven errors, thankfully, went away. Make copy, let's go ahead and type in hi! in lower case and hit Enter. And voila, now I have capitalized only the copy of s, a.k.a. t. And just to be clear, I've kind of regressed back to my square bracket notation, honestly, because it's perfectly acceptable. It's very readable. But notice, if I really want to show off, I could say something like, well, go to t's plus i location. And then do this, which again, I don't necessarily recommend for readability. But again, there is this equivalence. The square bracket notation is the same thing as pointer arithmetic. So if you want to go to the address at t plus whatever i is to offset yourself one or more bytes, you can totally do that. And if I want to be fancy, I can go down here and say, go to the first character in t and capitalize it. But again, I would argue that even though, yes, you're very clever and that you understand pointers and addresses at this point if you're writing code like this. Honestly, it's not necessarily as readable. So sticking with week two syntax of the square bracket notation, totally reasonable, totally correct, totally well-designed, and perhaps preferable, though I should be careful here. 
This line of code is a little bit risky for me because what if the user just hits Enter and they don't type hi or David or Brian. What if they type nothing except Enter? In that case, the length of the string might be 0. And then I probably shouldn't be capitalizing the first character in a string that doesn't really even exist. So I should probably have some error checking, like if, for instance, the string length of t is at least greater than 0, then go ahead and safely do that. But again, this is just one example of some additional error checking I can add to the program. There's actually one more piece of error checking I should really do in a fully correct program, as you should do in problem sets. Sometimes things can go wrong. And if your program is so big, so fancy, and so memory-hungry that you're mallocing lots and lots of memory, which you won't do in a program this small, but over time you might need more and more memory, we should also make sure that t actually has a valid address. It turns out that malloc, most of the time, is going to return to you the address of a chunk of memory it has allocated for you. Just like get string, it will return to you the address of the first byte of the chunk of memory that it has found space for. However, sometimes things can go wrong. Sometimes your computer can be out of memory. You've probably seen your Mac or PC freeze or hang or reboot itself. That is very often the result of memory errors. So we should actually check something like this. If t equals equals this special value NULL, then I'm going to go ahead and just bail out and return one, quit, let's get out of the program. It's not going to work. This might only happen one out of a million times. But it's more correct to check for NULL. Now, unfortunately, the designers of C kind of used-- or programmers more generally, use this word, which is almost the same as N-U-L, otherwise known as backslash 0. Unfortunately, this is a different value. 
N-U-L-L represents a null pointer. It is a bogus address. It is the absence of an address. Technically, it's address 0. It is different from backslash 0. You use N-U-L-L in the context of pointers, as we are doing today. You use backslash 0, otherwise known verbally, as an N-U-L, or nul, in the context of characters. So backslash 0 is for characters. N-U-L-L in all caps is for pointers. And it's just a new symbol we're introducing today that comes with this standard lib dot h file. All right, so it turns out, honestly, I don't need to do some of this work. It turns out that if I want to copy one string to another, there is a function for that. And increasingly, you will not have to write as many lines of code as you previously did, because if you look up in the manual pages or you've heard about or find online that there's another function, like one called strcpy, you can actually, more simply, do something like this. So even though I really liked the idea, and it was correct to use a for loop to copy all of the characters from s into t, there's a function for that. It's called strcpy. It takes two arguments, the destination followed by the source. And it will just handle all of the looping for us, all of the copying for us, including the backslash 0, so that I can focus on what I want to do, which in this case, is actually capitalize things. So if we consider, now, this example, in the context of my computer's memory, we'll see that it's laid out a little differently. But there's one more bug I do want to fix first. And this is something we've not had to do yet. It turns out that any time you allocate memory with malloc, you ask the computer for memory, the onus is on you, the programmer, to eventually give it back. 
And by that, I mean if you allocate four bytes, or who knows, four million bytes of memory for an even bigger program, you'd better give it back to the computer, more specifically, the operating system, be it Linux or Mac OS or Windows, so that your computer eventually doesn't run out of memory. If all you ever do is ask for more memory, ask for more memory, it stands to reason that eventually your computer will run out, because it only has a finite amount of memory. It's got a finite amount of hardware, recall. So when you're done with memory, it should be your best practice to free it afterward as well. And the opposite of malloc is just a function called free, which takes, as its input, whatever the output of malloc was. And recall that the output of malloc, the return value of malloc, is just the address of the first byte of memory that it has allocated for you. So if you ask it for four bytes, like I did a few lines ago with malloc, you're going to get back the address of the first of those bytes. And it's up to you to remember how many bytes you asked for. In the case of free, all you have to do is tell free via its input what the address was that malloc gave you. So if you stored that address as I did, in this variable called t, it suffices when you're done with that memory just to call free t. And the computer will go about freeing up that memory for you. And you might very well get it back later on. But at least your computer won't run out of memory as quickly, because it can now reuse that space for something else. All right, let me go ahead, then, and propose that we draw a picture of this now-new program's memory, where we copy things. So recall, this is where we left off before when comparing two strings. If this was s and s was pointing to h, i, exclamation point in lowercase, this new version of my code in copy.c, notice, still gives me another pointer called t. So that part of the story hasn't changed. But I call malloc now. 
And malloc is going to return to me some new chunk of memory. I don't know in advance where it is. But malloc's return value is going to be the address of the first byte of that memory. So for instance, 0x456 or whatever it is. And the subsequent bytes are going to be increasing by one byte at a time, 0x457, 0x458, 0x459. So what is, ultimately, stored in t when I assign it the return value of malloc? It's whatever that address is. Again, I could technically write 0x456 up here. But again, we're kind of past that. That's very 30 minutes ago. Let's now focus on just the abstraction that is a pointer. A pointer is just an arrow pointing from the variable to the actual location in memory. So now, if I go about copying s into t using strcpy, or more manually, using my for loop, what happens? Well, I'm copying the h over from s into t. I'm copying the i over from s into t, the exclamation point from s into t. And then lastly, the terminating nul character from s into t. So the picture is now fundamentally different. t is not pointing at the same thing. It's pointing at its own chunk of memory that now, one step at a time, duplicates whatever was at the address in s. And so this is what you and I as humans would consider, presumably, to be a proper copy of the string. Any questions, then, on what we've just done by introducing malloc and free? The first of which allocates memory and gives you the address of the first byte of memory that you can now use, the latter of which hands it back to your operating system and says, I'm done with this. It can now be reused for something else, some other variable, maybe, down the road, if our program were longer. Brian, any questions or confusion I can help with? BRIAN: Someone asked, even if you're using strcpy to copy the string instead of copying the characters one at a time yourself, do you still need to free the memory? DAVID MALAN: Good question. Even if you're using strcpy, you do need to still use free. 
Yes, anytime you use malloc henceforth, you must use free. Anytime you use malloc, you must use free in order to free up that memory. strcpy is copying the contents of one chunk of memory to the other. It is not allocating or managing that memory for you. It is just implementing, essentially, that for loop. And it's, perhaps, time too, where I can take off another training wheel verbally. It turns out that get string, all this time, is kind of magical. One of the things that get string does from the CS50 library is it itself uses malloc. Consider, after all, when we, the staff, wrote get string years ago, we have no idea how long your names are going to be this year. We have no idea what sentences you're going to type, what paragraphs you're going to type, what text you're going to analyze for a program like readability. So we had to implement get string in such a way that you can type as few or as many characters at your keyboard as you want. And we will make sure there's enough memory for that string. So get string, underneath the hood, if you look at the code we, the staff, wrote someday, you'll see that we use malloc. And we call malloc in order to get enough memory to fit that string. And then, what the CS50 library is also secretly doing, is it is also calling free for you. There's, essentially, a fancy way where you can write a program that, as soon as main is about to quit or return to your blinking prompt, some special code we wrote swoops in at that final moment, frees any of the memory that we, the library, allocated so that you don't run out of memory because of us. But you all, when using malloc, will have to call free, because the library is not going to do that for you. And indeed, the goal of today and next week and beyond is to stop using the CS50 library, ultimately, altogether. 
All right, well let's-- it would be unfair, I think, if we introduced all of these fancy new techniques but don't necessarily provide you with any sort of tools with which to chase down bugs in your new fancy code or solve problems, now, that are related to memory. And thankfully, there are programs via which you can chase down memory-related bugs. This is in addition to printf, that function, and help50 and check50 and debug50 and debuggers more generally. This program-- and it's really the last of the new tools we'll introduce you to in C-- is called valgrind. And this is a program that exists in CS50 IDE. But it exists on Macs and PCs and Linux computers anywhere, where you can run it on your own code to detect if you're doing anything wrong with memory. What might you do wrong with memory? Well, previously, remember, I triggered that segmentation fault. I touched memory that I should not. Valgrind is a tool that can help you figure out, where did you touch memory that you shouldn't have, so as to focus your own human attention on whatever lines of code might be buggy. Valgrind can also detect if you forget to call free. If you call malloc one or more times, but don't call free a corresponding number of times, valgrind is a program that can notice that and tell you that you have what's called a memory leak. And indeed, this is germane to our own Macs and PCs. Again, if you've been using your Mac or PC or sometimes even your phone for a long, long time, and maybe running lots of different programs at once, lots of browser tabs open, lots of different programs open at once, your Mac or PC might very well have begun to slow to a crawl. It might be annoying, if not impossible to use, because everything is so darn slow. That may very well be because one or more of the programs you're using has some bug in it whereby a programmer kept allocating memory and never got around to calling free. 
Maybe it's a bug, maybe it was deliberate, they didn't expect you to have so many windows open. But valgrind can detect errors like that. And honestly, some of you, if you're like me, you might very well have 10, 20, 50 different browser tabs open at once, thinking oh, I'm going to come back to that someday, even though we never do. Each of those tabs takes up memory. Literally, any time you open a browser tab, think of it, really, as Chrome or Edge or Firefox or whatever you're using, underneath the hood, they're probably calling a function on Mac OS or Windows like malloc to give you more memory to contain the contents of that web page temporarily. And if you keep opening more and more browser tabs, it's like calling malloc, malloc, malloc. Eventually, you're going to run out. And computers can be smart these days. They can kind of temporarily remove things from memory to free up space. This is called virtual memory. But eventually, something is going to break. And it might very well be your user experience when things get so slow that you literally have to quit the program or maybe even reboot your computer. So how do we use valgrind? Well, let me go ahead and write a short program that doesn't do anything useful, but demonstrates multiple memory-related mistakes. I'll call this file memory.c. I'm going to go ahead and open up the file memory.c and include at the top standard io dot h. And then I'm going to also, preemptively, include standard lib dot h, which, recall, is where malloc lives, int main void. And I'm going to keep this one simple. I'm going to go ahead and just give myself a whole bunch of integers. So this is actually kind of cool. It turns out that-- well, let's go ahead. Yeah, I can do this. Let's go ahead and do this. Char star s gets malloc. And let me go ahead and give myself, how about three of these. Let me go ahead and allocate space for three chars. Or actually, let's give me four, just like before. 
Now, I'm going to go ahead and say s bracket 0 equals 72. s bracket 1-- actually, I'll just do this manually. Let's do h. Let's do i. Let's do our usual exclamation point. And then just for good measure, s bracket 3 gets quote unquote, backslash 0. This is the very manual way of actually-- this is the very manual way of actually building up a string. But let me introduce a mistake. Let me accidentally allocate only three bytes, even though I clearly need a fourth for that terminating nul character. And notice too, the absence of free. I'm going to, very sloppily, not bother calling free. Now, I'm going to go ahead and compile this program, make memory. OK, it compiles OK, so that's good, dot slash memory. OK, nothing happens, but that kind of makes sense because I didn't tell it to do anything. Just for kicks, let's print out that string just like we always do. Let me now recompile memory, still compiles. Let me run dot slash memory. OK, it seems to work. So at first glance, you might be really proud of yourself. You've written another correct program, seems to pass check50. You submit. You go about your day. And you're very disappointed some days later when you realize, dammit, I did not get full credit on this because there's actually a latent bug. So sometimes, indeed, there are bugs in your code that you don't necessarily see visually, you don't necessarily experience when running it yourself, but eventually, there might be an error when running it enough times. Eventually, a computer might notice that you're doing something wrong. And thankfully, tools exist like valgrind, that can allow you to detect that. So let me go ahead and just increase the size of my terminal window here. And let me go ahead and run valgrind on dot slash memory. So it's just like debug50. Instead of running debug50 and then dot slash whatever the program is, you run valgrind dot slash memory. This one, unfortunately, is only a command line interface. 
There's no graphical user interface like debug50. And honestly, it's a hideous sequence of output. This should overwhelm you at first glance. There's crazy cryptic-ness here. It's not the best-designed program. It really was meant for the most comfortable people. But there are some useful tidbits we can take away from it. As always, let me scroll all the way to the top to the very first line of output. And I'll draw your attention to a couple of things that will start to jump out to you. And help50 can help you with this. If you're confused by valgrind's output, rerun it. But put help50 at the beginning. And just like I will do now verbally, so can help50 help you notice the important things in this crazy mess of output. This is worrisome. Valgrind is noting on this line here, invalid write of size 1. And that's on line 10 of memory.c. So we'll look at that in a moment. If I scroll down further, invalid read of size 1. And that also seems to be on here, it looks like, on line 11 of memory.c. And then if I keep scrolling, keep scrolling, keep scrolling, I'm not liking this. 3 bytes in 1 blocks are definitely lost in loss record, whatever that is. But three bytes in 1 blocks are definitely lost. And then down here, leak summary, definitely lost, 3 bytes in 1 blocks. Incidentally, 1 blocks, obviously not correct grammar. This is what happens when your program doesn't have an if condition that checks if the number is 1 or positive or 0. You could fix this, grammatically, honestly, with a simple if condition. They did not when writing this program years ago. So there's two or three mistakes here. One is some kind of invalid read or write. And another is this leak. Well, what is a write? A write just refers to changing a value. A read just refers to reading or using or printing a value. So let's focus on line 10. If I scroll back down to my code and look on line 10, this was an invalid write, invalid write. Well, why is it invalid? 
Well, per today's definition, if you are allocating 3 bytes, you are welcome to touch the first byte, the second byte, and the third byte. But you have no business touching the fourth byte if you've only asked for three. This is like a small scale version of the very adventurous and inappropriate poking around I did when I looked at 10,000 bytes away. Even looking one byte away is a potential bug and can cause a program to crash. Meanwhile, line 11 is also problematic, which is an invalid read, because now, you're saying go print out this string. But that string contains a memory address that you should not have touched in the first place. And the memory leak, the third problem, stems from the fact that I didn't free that memory. So again, it'll take some practice and experience, some mistakes of your own, to notice and understand these bugs. But let me fix the first two like this. Let me just give myself four bytes. And let me fix the second one or the third one, really, by freeing s at the very end, because again, any time you use malloc you must use free. Let me go ahead and recompile memory, seems to compile. Let me rerun it, still works the same, visually. But now, let's rerun valgrind on it and see if there are any errors now, so valgrind dot slash memory, Enter. The output's still going to look pretty cryptic. But notice all heap blocks were freed, whatever that means. No leaks are possible. It doesn't really get more explicit than that. That's a good thing. And if I scroll up, I see no mention of those invalid reads or writes. So starting with this week's problems and next week's in C, not only are you going to want to use tools like help50 and printf and debug50 and check50, but even if you think your code's right, the output looks right, you might have a latent bug. And even when your programs are small, they might not crash the computer. They might not cause that segmentation fault. Eventually, they will. 
And you do want to use tools like this to chase down any such mistakes. Otherwise, bad things can happen. And what might happen? Well, let me go ahead and reveal an example here that presents some code that's a little dangerous. So here, for instance, is an example where I'm declaring at the top of the function, int star x and int star y. So what does that mean? Well, per today's parlance, this just means give me a pointer to an integer called x. Give me a pointer to an integer called y. Put another way, give me a variable called x that I can store the address of an int in. Give me a variable called y that I can store the address of another int in. But notice what I am not doing on these first two lines. I'm not actually assigning them a value until line 3. On line 3, even though this is weird-- this is not how we've allocated space for integers before-- there's no reason that you can't use malloc and say, give me enough space for the size of an integer. sizeof is new. It's just an operator in C that tells you the size of a data type, like a size of an int. So maybe you forgot that an int is 4. And indeed, an int is usually 4, but not always 4 in all systems. So size of int just makes sure that it will always give you the right answer, whether you're using a modern computer or an old one. So this just means, really, allocate 4 bytes to me on a modern system. And it stores the address of the first byte in x. Would someone mind translating to layman's terms, what is star x equal 42 doing? Star, again, is the dereference operator. It means go to the address. And do what? How would you describe, with a verbal comment, what star x equals 42 is doing? Brian, would you mind verbalizing any thoughts? BRIAN: Yeah, so Sophia suggested that at that address, we are going to place 42. DAVID MALAN: Perfect. At that address put 42. Equivalently, go to that address in x and put the number 42 there. 
It's like going to Brian's mailbox and putting the 42 in his mailbox, instead of what we previously had there, which was the number 50. How about this next fifth line, star y equals 13? Brian, could you verbalize someone else's? What does star y equals 13 do for us? And it's not an accident that 13 tends to be unlucky. BRIAN: Peter says, put 13 at the address y. DAVID MALAN: Good, put 13 at the address in y. Or put another way, go to the address in y and put 13 there. But there's a logical problem here. What is in y? If I rewind, I never actually assign y a value. I don't initially, and I don't eventually. At least with x, even though I didn't give it a value in declaring it up here as a variable, I eventually got around to storing in it the actual address. Now, just to be really nit picky, I should probably even, in this program, check for NULL just in case anything went wrong. But that's a whole other problem. It is a more damning problem that I haven't even given y a value. And here's where we can reveal one other detail about a computer. Thus far, we've been taking for granted that you and I almost always initialize our memory. If we want to give ourselves a char, an int, a string, we literally type it out into the program itself so that it's there when we want it. But if we consider this picture here, which is now just a physical incarnation of some of the contents of your computer's memory, playfully labeled with a lot of Oscar the Grouches, this is because you should never trust the contents of your computer's memory if you yourself have not put something there. There's a term of art in programming called garbage values. If you yourself have not put a value somewhere in memory, you should assume, to be safe, that it is a quote unquote, "garbage value." It's not a weird value. It's just a 1, a 2, an A, a B, a C, you just don't know what it is, because if your program is running over time and you're calling functions and functions are returning. 
You're calling other functions and functions are returning. These values in your computer's memory are constantly changing, and your memory gets reused. When you free memory, that doesn't erase it or set it all back to 0's or set it all back to 1's. It just leaves it alone so that you can reuse it, which means over time, your computer contains remnants of all of the variables you've ever used in your program over here, over here, over there. And so in a program like this, where you have not explicitly initialized y to anything, you should assume that Oscar the Grouch, so to speak, is at that location. It is a garbage value that looks like an address but is not a valid address. And so when you say star y equals 13, that means go to that address. But really, go to that bogus address and put something there. And odds are, your program is going to crash. You are going to get a segmentation fault, because by going to some arbitrary garbage value address, it would be like picking up a random piece of paper with a number on it and then going to that mailbox. Why? It doesn't belong to you. If you try to dereference an uninitialized variable, your program may very well crash. And this is, perhaps, no better-presented than by one of our friends, Nick Parlante, a professor at Stanford University who has breathed life into a character in claymation known as Binky. We have just a 2 minute clip from this that paints the picture of bad things indeed happening when you touch memory that you shouldn't. So hopefully, a helpful reminder as to what to do and not to do with pointers. Here we go. [VIDEO PLAYBACK] - Hey, Binky. Wake up, it's time for pointer fun. - What's that? Learn about pointers? Oh, goody! - Well, to get started, I guess we're going to need a couple pointers. - OK, this code allocates two pointers which can point to integers. - OK, well I see the two pointers. But they don't seem to be pointing to anything. - That's right. Initially, pointers don't point to anything. 
The things they point to are called pointees. And setting them up's a separate step. - Oh, right, right. I knew that. The pointees are separate. So how do you allocate a pointee? - OK, well, this code allocates a new integer pointee. And this part sets x to point to it. - Hey, that looks better. So make it do something. - OK, I'll dereference the pointer x to store the number 42 into its pointee. For this trick, I'll need my magic wand of dereferencing. - Your magic wand of dereferencing? That's great. - This is what the code looks like. I'll just set up the number and-- [POP]. - Hey, look, there it goes. So doing a dereference on x follows the arrow to access its pointee. In this case, to store 42 in there. Hey, try using it to store the number 13 through the other pointer, y. - OK. I'll just go over here to y and get the number 13 set up and then take the wand of dereferencing and just-- [HORN] whoa! - Oh, hey, that didn't work. Say, Binky, I don't think dereferencing y is a good idea, because setting up the pointee is a separate step. And I don't think we ever did it. - Hmm, good point. - Yeah, we allocated the pointer y. But we never set it to point to a pointee. - Hmm, very observant. - Hey, you're looking good there, Binky. Can you fix it so that y points to the same pointee as x? - Sure, I'll use my magic wand of pointer assignment. - Is that going to be a problem like before? - No, this doesn't touch the pointees. It just changes one pointer to point to the same thing as another. - Oh, I see. Now, y points to the same place as x. So wait, now y is fixed. It has a pointee. So you can try the wand of dereferencing again to send the 13 over. - Oh, OK. Here it goes. [POP] - Hey, look at that. Now, dereferencing works on y. And because the pointers are sharing that one pointee, they both see the 13. - Yeah, sharing, whatever. So are we going to switch places now? - Oh look, we're out of time. 
- But-- [END PLAYBACK] DAVID MALAN: All right, so we are not quite out of time. But let's go ahead and take our second 5 minute break here. And when we return, we'll take a closer look at Oscar and more. Back in 5. All right, so I claim that there's all these garbage values in your computer's memory. But how can you see them? What Binky did was, of course, try to dereference a garbage value when bad things happen. But we can actually see this with code of our own. So let me go ahead, quickly, and whip up a little program here, just like something we did in week one or week two, but without doing it very well. Let me go ahead and include standard io dot h as usual, int main void. And then let me go ahead and give myself an array of scores. How about an array of three scores? And we've done this before where we collected scores from a user. But this time, I'm going to deliberately make the mistake of not actually initializing those scores or even asking the human for those scores. I'm just going to blindly go about iterating from i equals 0 on up to 3. And on each iteration, I'm just going to presumptuously print whatever is at that location in scores bracket i. So logically, my code is correct in what it's trying to do, print out the values in scores. But notice that I have deliberately not initialized any of the 1, 2, 3 scores in that array. So who knows what's going to be there? Indeed, it should be garbage values of some sort that we couldn't necessarily predict in advance. So let me go ahead and make garbage, since this program is in a file called garbage.c. Compiles OK, but when I now run garbage, we should see three scores, which are cryptically negative, 833060864. Another one is 32765. And the third just happens to be 0. So there are those garbage values, because again, the computer is not going to initialize any of those values for you. Now, there are exceptions. 
We have, on occasion, used a global variable, a constant that is outside the context of main and all of my other functions. Global variables, if you do not set them, are conventionally initialized to 0 or NULL for you. But you should generally not rely on that kind of behavior. Your instinct should be to always initialize values before thinking of touching or reading them, as via printf or some other mechanism. All right, well, let's see how this understanding, now, of memory, can lead us to solve problems, but also encounter new types of problems, but problems that we can now hopefully understand. I'm going to go ahead and create a new program here. And recall from last week that it was very common for us to want to swap values. When Brian was doing our sorts for us, whether it was selection or bubble sort, there was a lot of swapping going on. And yet, we didn't really write any code for those algorithms. And that's fine. But let's consider that very simple primitive of just swapping two values, for instance, swapping two integers. Let me go ahead and give myself the start of a program in swap.c here. I'm going to include standard io dot h, int main void. And inside of main, I'm going to give myself two integers. Let's just give myself an int called x and assign it 1, an int called y and assign it 2. And then let me go ahead and just print out what those values are. I'll just say, literally, x is percent i comma y is percent i backslash n. And then I'm going to go ahead and print out x comma y, respectively. And then I'm eventually going to write a function called swap that swaps x and y. But let's assume, for the moment, that it exists. It doesn't, because what I then want to do right after that is just reprint the same thing, x is now percent i, y is percent i, my presumption being that the values of x and y will be swapped. So how might I swap these two values? Well, let me go ahead and implement my own function. 
I don't think it needs to return anything, so I'm going to say void is the return type. I'll call it swap. It's going to take two arguments as input. We'll call it a and b, both integers. But I could call it anything I want. But a and b seems reasonable. And now, I want to go ahead and swap two values. Now, Brian was kind of doing this with his two hands last week. And that's fine, but we should probably consider this a little more closely. In fact, Brian, instead of numbers, let's do something a little more real world. I think you have a couple of beverages in front of you. BRIAN: Yeah. So right here, I have a red glass and a blue glass, which I guess we can use to represent two variables, for instance. DAVID MALAN: Yeah. Now, let me suppose-- I wish I'd told you in advance. I'd actually prefer that the red liquid be in the blue glass and the blue liquid be in the red glass. So do you mind swapping those two values, just like you swapped numbers last week? BRIAN: Yeah, sure. So I can just take the two glasses, and I can switch their places. DAVID MALAN: OK, wait, OK, that's not exactly-- you took me too literally. I think here, if we think of the glasses, now, as specific locations in memory, you can't just physically move the chips of memory inside of your computer to swap things. So I think I literally need you to move the blue liquid into the red glass and the red liquid into the blue glass so that it's more like a computer's memory. BRIAN: OK, I can try to do that. I'm a little nervous, though, because I feel like I can't just pour the blue liquid into the red glass, because the red liquid's already in there. DAVID MALAN: Yeah, so this probably doesn't end well, if he's got to do some kind of switcheroo between the two glasses. So any thoughts here? Like what is the real world solution to this weird but real problem, where we want to swap the contents of these two locations, just like Brian was swapping the contents of two memory locations last week? 
Brian, if you have your eye on the chat in parallel, might anyone have ideas on how we could swap these two liquids? BRIAN: Yeah, a couple of people are saying that I need a third glass. DAVID MALAN: All right, well Brian, do you happen to have a third glass with you back there behind back stage? BRIAN: In fact, I think I do. So I have a third glass here that just so happens to be empty. DAVID MALAN: OK. And how would you, now, go about swapping these two things? BRIAN: All right, so I want to put the blue liquid inside the red glass. So the first thing I need to do, I think, is just to empty out the red glass to make space for the blue liquid. So I'm going to take the red liquid, and I'm just going to pour it into this extra glass. DAVID MALAN: Temporarily though, right? BRIAN: Temporarily, yeah. DAVID MALAN: OK. BRIAN: Just to keep it to store it there. And now, I think I can just pour the blue liquid into the original red glass, because now I'm free to do so. So I'll pour the blue liquid there. And I think the last thing I need to do now is, now this blue-- this glass that originally held the blue liquid is now empty. So the red liquid, which was inside of this temporary glass over here, I can take the red liquid and just pour it into this glass here. And now, I didn't swap the positions of the glasses. But the liquids have actually switched places. Now, the blue liquid is on the left and the red liquid is on the right. DAVID MALAN: Awesome. Yeah, I think that is a more literal implementation of what you were doing and taking for granted last week, swapping the two values in two separate locations. So it seems pretty straightforward. I just need a little more space. I need a temporary variable in code, if you will. And it seems I need three steps. I need to pour one out, pour the other one out, pour the other one back in. So I think I can translate that into code here. Let me go ahead and give myself a temporary variable, like a glass, like Brian did. 
And I'll call it tmp, T-M-P, which is pretty conventional when you want to swap two things in code. And I'm going to assign it, temporarily, the value of a. I'm going to then change the contents of a to equal whatever the contents of b are. And then I'm going to change b to be whatever the contents of tmp were. So this feels pretty reasonable and pretty correct, because it's just a literal translation into code, now, of what Brian did in the real world. And I think this will compile. So let's start there, make swap. It does-- oh, doesn't compile. OK, previous implicit declaration, oh, so many errors, my god. Implicit declaration of function swap-- wait a minute. I've seen that before. I've made this mistake before. You might have as well. Anytime you see this, recall it's just that you're missing your prototype. Remember that the compiler is going to take you literally. And if it doesn't know the word swap exists when it sees it, it's not going to compile successfully. So we need to include my prototype at the top of my file. Now, let me try this again, make swap. OK, that compiles. Let me go ahead now and run swap and recall that, in main, what I did was initialize x to 1, y to 2. I then print out what x is and what y is. I call swap, and then I print out what x is and y is again. So I should see 1, 2, and then 2, 1. So let's hit Enter. Huh, it does not seem to be working. Well, let's try it again, just in case-- no, not working. Well, let me try this. Let me add some-- printf is my friend. Let me go ahead and say a is percent i. b is percent i backslash n, a, b. So let's print that out. And let's print that out twice. So this would be a reasonable debugging technique. If you want to know what's going on underneath the hood, add some printf's. Let me go ahead and make swap. That compiles, dot slash swap. And let's see, a is 1, b is 2, a is 2, b is 1. But then x and y are unchanged. So I feel like my logic is right. It's switching a and b. 
But it's not actually switching x and y. And I could confirm as much, right? The more powerful way to debug this would be to run debug50, set a break point, for instance, at line 17, step through my code, step by step, stepping into the swap function. But for now, it seems clear that swap works. But main isn't really seeing those results. So what's actually going on? Well, let's consider this real world incarnation of what my memory is so I can actually move things around. And this is all thanks to our friends in the theater's prop shop in back. If we think of this as my computer's memory, initially, it's all garbage values. But I can use this as a canvas to start laying things out in memory. But calling functions is something we've taken for granted thus far. And it turns out, when you call functions, the computer, by default, uses this memory in kind of a standard way. In fact, let me go ahead and draw a more pictorial picture. Let me draw a more literal picture here, if you will, of the computer's memory again. So if this is the computer's memory and we zoom in on one of the chips, and we think of the chip as having a whole bunch of bytes like this. Let's abstract away the actual hardware and think of it as we have been. It's just this big rectangular region of memory, not unlike all of those Oscar the Grouches a moment ago. But by convention, your computer does not just plop things in random locations in memory. It has certain rules of thumb that it adheres to. In particular, it treats different portions of your computer's memory in different ways. It uses it in a standard way so that it's not completely random. For instance, when you run a program by doing dot slash something on CS50 IDE or on Linux more generally, or you double click an icon on Mac OS or Windows, that triggers the computer's-- the program's 0's and 1's stored on your hard drive to be loaded up here, to what we'll call machine code, which again, is the 0's and 1's. 
So if you think again, metaphorically, of your memory as this rectangular region, then the machine code, the 0's and 1's composing your program are loaded into the top part of memory. And again, top, bottom, left, right, it has no fundamental technical meaning. It's just an artist's rendition. But it does go into a standard location. Below that are all of your global variables. So are your constants that you put outside of your functions. Those are going to end up just below the machine code, so again, at the top of your computer's memory. Below that is what's called the heap. And this is a technical term. And it refers to a big chunk of memory that malloc uses to get you some spare memory. Any time you call malloc, you are given the address of some chunk of memory up in this region, below the machine code, below your global variables. And it's kind of a big zone. But the catch is that other parts of your memory are used differently. In fact, whereas the heap is considered to be here on down, somewhat worrisomely, the stack is considered to be here on up. This is to say, when you call malloc and ask for memory, that gets allocated up here. When you call a function, though, those functions use what's called stack space instead of heap space. So any time you call a function, main or swap or strlen or string compare or any of the functions you've used thus far, your computer will automatically store any of the local variables or parameters from those functions down here. Now, this is not necessarily the best design, because you can see the two arrows pointing at one another is like two trains barreling down the tracks at one another. Bad things can eventually happen. Thankfully, we typically have enough memory that these two things don't collide, but more on that in just a bit. So again, when you call functions, memory down here is used. When you use malloc, memory up here is used. Now, for my swap function, I'm not using malloc. 
So I don't think I have to worry about heap. And I don't have any global variables. And I don't really care about my machine code. I just need to know that it's stored somewhere. But let's consider, then, what the stack is all about. The stack, indeed, is this sort of dynamic place where memory keeps getting used and reused. So for instance, when you call main, as you might when this swap program is run, main uses a sliver of memory at the bottom of this picture, if you will. So the local variables in main, like x and y, end up at this bottom portion of memory. When you call swap, swap uses a chunk of memory just above main, pictorially, in this diagram, such as variables a and b and tmp, for that matter. And then, once swap returns and is done executing, that sliver of memory essentially goes away. Now, it doesn't disappear. Obviously, there's still physical memory there. But that's when we get into the discussion of garbage values again. They're still like Oscar the Grouches all over the place. You just don't know, or at this point care, what the values are. But there are values there. And that's why, a moment ago, when I printed out that uninitialized scores array, I did see some bogus values, because there's still going to be 0's and 1's there that are left over from before. The problem, though, is this. Let me go over to this physical incarnation of our memory and consider this as being our stack, so it's growing on up. And in fact, if I want to have two local variables like I do, x and y, let's go ahead and think of this row of memory here as being main, for instance, here. And I'm going to go ahead and replace all these garbage values with an actual value that I care about. And the actual values that I care about, we're going to call x and y, just as before. So each of these Oscars happens to be one byte. But an int is 4 bytes. So thankfully, from our friends in the prop shop, we have these bigger integer-sized blocks. 
And I'm going to go ahead and slide this in here. And we're going to think of this, in a moment, as x. And indeed, I'm going to go ahead and call this x with a marker. And then I'm going to go ahead and give myself another integer, a size 4, and put it down here. And we're going to think of this as y. And recall, what do I initialize these values to? Well, the value 1, initially, and the value 2. But then I called the swap function. And the swap function has two arguments, a and b. And those, by design, become copies of x and y, because I passed in x comma y. And I defined swap as taking a comma b. So I think what I need to do, physically here, is now think of this second row of memory as now belonging to the swap function, not to main. And inside of this second row of memory, I'll think of this as belonging to swap. And within the swap row, I'm going to have another integer of size 4. And we're going to call this one a, as down there, a. And then I'm going to have another chunk of size 4. And we're going to call this b. And again, because those are just the arguments, x comma y, otherwise now known as a comma b, I copy 1 and 2 into those values. But swap has a third variable. Brian proposed a temporary variable. So I'm going to go ahead and give myself four more bytes, thereby getting rid of whatever the garbage value's there and actually setting it to an integer called tmp. So I'm going to go ahead and call this thing tmp, T-M-P. And what did I do first? I set tmp equals to a. So tmp equals to a. So if a is 1, tmp is 1. Then what did I do? I then did a equals b. So b is 2. a is 2 as well. And then lastly, what did I do? I did b gets tmp. So I have to go ahead and change this to be whatever the value of tmp is, which is now the number 1. So you can see that swap is correct insofar as it is swapping the values of a and b. But the moment swap returns, these return to being thought of as garbage values. Main is still in the middle of running. Swap is no longer running. 
But these values stay there. So those are garbage values. We happen to know what they are, but they're no longer valid, because when I go to print out x and y for the second time, what are x and y? They're still the same. And so this is to say, when you actually write code that takes arguments and you pass arguments from one function to another, those arguments are copied from one function to another. And indeed, x and y are copied into a and b. So your code may very well look correct in that it's swapping correctly. But it's only swapping correctly in the context of swap, not touching the original values. So what I think we need to do, fundamentally, is reimplement swap in such a way that we actually change the values of x and y. But how can we do this? Brian, if we could call in someone here. How could I conceptually change my implementation of swap so that it somehow empowers me to change x and y, not change copies of x and y? What could I pass into swap, Brian? BRIAN: Igor is suggesting that we use pointers instead. DAVID MALAN: Yeah, so perhaps the leading question here today. But pointers would seem to give us a solution. If pointers are essentially like a treasure map to a specific address in your computer's memory, what I should really do from main to swap is pass in not x and y literally, but why don't I pass in the address of x and the address of y, so that swap can now go to those addresses and actually do the sort of swap that Brian enacted in person. So give the function a sort of map to those values, pointers to those values, and then go to those values. So how might I do this? Well, the code has to be a little different now. When I call swap this time, what I really need to do is pass in the addresses of these two variables. So I don't necessarily know what those addresses are. But for the sake of the story, we can just assume that this address, for instance, is like, 0x123. And then four bytes away from that might be 0x127, for instance. 
But again, it doesn't really matter what it is. But they do have addresses, x and y. So a pointer, recall, tends to be pretty big. So we needed to get out a bigger piece of wood, eight bytes that represents a pointer. And I actually need to use a bit more memory in swap now. If I now declare a to be, not an integer, but a pointer to an int, that is an int star variable, I could call this thing a now. And I could store, in it, the address of x, like 0x123. If I then change the definition of b to be not an integer, but a pointer to an integer, that is another int star, which happens to be eight bytes. I'm going to use a little more memory for this thing, but that's OK. And its name is going to be b now. And it's going to contain 0x127. I still need a temporary variable, but that's fine. I just need four bytes for that, because the variable itself just needs to store an int, like Brian temporarily stored it in a glass. So I just need an additional four bytes, like before, for that. And now, let's just consider the logic. Here's main. And swap is now using these 3-- 2 and 1/2 rows of memory. And that's fine. It's growing upward as I proposed. x is at address 0x123. y is at address 0x127. Therefore, a and b, I propose conceptually, like Igor proposed, store the addresses of x and y, respectively. And now my code, I think, needs to say this. Go and store, in the variable tmp, whatever is at the address a. So you can kind of think of this as being an arrow down here. Follow the arrow, OK. What is at address 0x123? The number 1. So we put 1 in tmp, just like before. Then what do we do? Well, now, I'm going to go ahead and change, not the value of a, but I'm going to change what is at the location in a to be whatever is at the location in b, which is an arrow pointing down here, 0x127. So I'm going to change this 1, now, to be a 2. 
And the third and final step, recall, is for me, now, to go, not to b, but to go where b points to, which happens to be y, and change that to be the value of tmp, which of course, is up here. And at this point in the story, it's still just three lines of code. They're different types of lines of code. It's three lines of code. But when swap is done executing, notice what we've done. We have successfully swapped x and y by letting swap go to those addresses as opposed to just naively getting copies of the values therein. Now, even though this code is going to look a little cryptic, it's, frankly, just an application of the logic we've seen thus far. I'm going to go ahead and go back to my old buggy version. And I'm going to change the definition of swap to say that it doesn't take two integers, a and b, but two pointers to integers a and b. And the way you declare a pointer recall is the type of variable you point at followed by a star and then the name of it. And we haven't seen it, admittedly, in the context of a function taking parameters yet. But it's quite simply that. I added the stars. Down here, I need to say, store in tmp, whatever is at a. How do I express go to a? Just add a star here. How do I express go to a and put whatever is at b? I add stars there. How do I say, go to b and store whatever is at tmp? I add one star there. So tmp is just a simple integer. It's just an empty glass like Brian had. There's nothing fancy there. So we don't need stars around tmp. But I do, now, need to change how I'm using a and b, because now they are addresses that I actually want to go to. There's no need for the address of operator in this context. But up here, I'm going to need to make a change. I do need to change the prototype to match. So that's just a copy paste. But I bet you can imagine what, lastly, needs to change. When calling swap, I don't want to pass in naively x and y, because again, they're going to get copied. 
I want to pass in the address of x and the address of y, so that swap now has sort of special access to the contents of those locations in memory so that it actually can make some changes therein. And that, indeed, if I now recompile this program, make swap, and I do dot slash swap and cross my fingers, voila. Now, I have successfully swapped x and y. So last week, if you were wondering, perhaps, why we didn't show you how to do swap, we could have. And we didn't need a special function. You don't necessarily need pointers if we did all of this in main. But I'm trying to introduce an abstraction, this function that does swap just like Brian swapped those glasses for us. And to pass values from one function to another, you do need to understand what's going on in your computer's memory so that you can actually pass in little breadcrumbs again, treasure maps to those locations in memory, again, thanks to these things called pointers. All right, well let me propose and emphasize, then, that this design of the heap being up at the top, where malloc uses memory and the stack being at the bottom where your own functions use memory, this is a problem clearly waiting to happen. And those problems actually have names. And some of you who have programmed before might know some of these terms, either heap overflow or stack overflow. And in fact, many of you might know stackoverflow.com as just a website. Well, there is an origin story to its name. A stack overflow refers to the process of calling a function so many times that it overflows the heap. That is, every time you call the function, like I did here, you use more and more rows, so to speak, of memory. And if you call so many functions again and again, eventually, you may very well run over the area of memory called heap. And at that point, your program will crash. There is no fundamental solution to that problem other than don't do that. Don't use too much memory. But that can be hard to do. 
And indeed, that's one of the dangers of programming today. And we can actually induce this a little bit deliberately ourselves. And in fact, I thought we could revisit, for instance, where we left off with Mario last time, which was this picture here. Recall that this was a pyramid, of course, simpler than the one you might have played with for problems at 0. But it's a recursive pyramid in that you can define a pyramid of height 4, in terms of a pyramid of height 3, in terms of a pyramid of height 2 and a height 1. And indeed, I built that last week using these very blocks. Well, you can implement Mario's pyramid like this in a couple of different ways. One is just using week one style iteration, using a loop. And in fact, let me go ahead and whip up a quick solution that does exactly that. Let me go ahead and call this mario.c. And I'm going to go ahead and include cs50.h. So we can use one of our get functions. I'm going to use standard io dot h. And I'm going to do int main void. And all I want to do is print out this pyramid. But I want to ask the user for the height. So I'm going to say int height equals get int. And we'll ask the user for the height, just like you did for problem set 1. And then I'm going to go ahead and draw a pyramid of that height. Now, draw doesn't exist. But that's fine. I'm going to go ahead and draw this now, implement draw myself. It doesn't need to return a value, because I'm just printing stuff on the screen. Function's called draw, and it's going to take an input called h, for instance. h for height, but I could call its argument anything I want. And then I'm just going to do this, for int i gets 1, i less than or equal to h, i++. And then inside of this, this is where you might recall, from problem set one, have found a nested loop to be useful. Let me do int j gets 1, j less than or equal to i, j++. 
This will be similar but not identical to either the less comfortable or more comfortable version of Mario from the past, because this pyramid is shaped in a different direction. Now, you print a hash there. And then let me go ahead and print a new line here. So I did this super quickly. But logically, what I'm doing is iterating over every row, so from 1 through h, so row 1, 2, 3, 4, for instance. And then on each row, I'm deliberately iterating from 1 through i. So I print 1, then 2, then 3, then 4. And again, I could zero index if I want. I find that in this context, more user friendly, more intelligible to me to index from 1, totally reasonable if you think there's a compelling design argument. So let me go ahead and make Mario. Ah, darn it. Oh, I missed my prototype. So notice, it's not understanding draw. So the fix for that is to either move the whole function or, as we've preached instead, to just put your prototype up top. Let me recompile Mario. OK, now successful. Mario, let's do a height of 4, and voila. Now, I have a relatively simple-- though I certainly did it faster than you might without some practice-- implementation of Mario's pyramid. But here's where things get kind of cool. Let me stipulate that that is a correct iterative solution, even if it might take you some number of steps or trial and error to get that iterative loop-based code correct. Let me change this, now, to be recursive. And recall, a recursive function is one that calls itself. How do you print a pyramid of height h? Well, recall that you print a pyramid of height h minus 1, and then you proceed to print one more row of blocks. So let me take that literally. for int i gets zero. i is less than h, i++. Let me go ahead and just print that extra row of bricks like this, followed by a new line. So now, I did this kind of fast. But what am I doing here? Well, if the height equals 1, I want this loop to iterate one time. 
If the height equals 2, I wanted to iterate two times, 3, and so forth. So I think, using my zero-indexing technique here, this will work too. But if you prefer, I could certainly just change this to a 1 and change this 2. But I'm going to go ahead and-- actually, no. In this case, I want to leave it as such, zero index, just like we typically do. All right, let me go ahead and compile this, make Mario. OK, oops, interesting. All paths through this function will call itself. So clang is being kind of smart here, whereby, it's noticing that in my draw function, I'm calling my draw function. And that's a process that never changes. In fact, let me see if I can override that. Let me use clang manually and compile a program called mario using mario.c. And let me go ahead and link in cs50. So I'm using our old school syntax from week two. OK, that compiled. And why did that compile? Well, make is, again, a program that uses your compiler clang. And we've configured make to be a little more user-friendly and a little more protective of you by turning on special features where we detect problems like that. By using clang directly now, I'm disabling those special checks. And watch what happens when I run Mario now for height of 4, for instance. Boom, it crashed. It didn't even print anything. It crashed pretty quickly. And again, a segmentation fault means you touched memory that you shouldn't. So what's going on? Well, if you think of this memory as representing main still, but then draw, draw, draw, draw, draw, draw. If every one of your calls to draw just cause draw again, why would it ever stop? It wouldn't seem to stop here, necessarily. So it seems that I'm missing a key detail in my recursive version. You know what? If there's nothing to draw, if height equals equals 0, let me go ahead, then, and just return immediately. Otherwise, I'll go ahead and draw part of the pyramid and then add the new row. 
So you need this so-called base case, which you literally choose to equal some simple value, like height of 0, height of 1, any hardcoded value, so that eventually, draw does not call itself. So let me go ahead and recompile this with clang or make. Let me rerun it, height of 4, and voila. It's still working just like the interior version, but it's now using recursion. So here's a sort of design question. Is iteration better than recursion? It depends. Iteration will always work. When using the iterative version, I will never overflow the stack and hit the heap. Why? Because I'm not calling functions again and again. There's only main and one invocation of draw. But with the recursive version, it's kind of a cool, powerful way to do things. Like, oh, I can draw you a pyramid of height h. Let me just have you draw me a pyramid of height h minus 1, and then I'll add a row. It's kind of this clever, cyclical argument that does work very elegantly. But there's a danger. And in fact, even though this base case ensures that it doesn't go forever, it could go on so long-- maybe let's try 10,000 invocations. So that worked OK. It's a little slow. I'm losing control over my keyboard. So Control C is your friend. Let me try this once more. Let me go ahead and do something like 2 billion and see if that works. Boom. So even that doesn't work. So there's this inherent danger with recursion, whereby, even though it empowered us last week to solve a problem even more efficiently with merge sort, we kind of got lucky, in that we weren't trying to crazy big things on Brian's shelf, because it would seem if you use recursion and call yourself again and again and again and again, even finitely many times, you might eventually touch memory you shouldn't. And what's the solution here? Unfortunately, it's don't do that. Design your algorithms, choose your inputs in such a way that there just isn't that risk. 
And we'll use recursion again in a few weeks time when we look at more sophisticated data structures. But again, there's always this trade-off. Just because you can design something a little more elegantly doesn't necessarily mean that it's always going to work for you. But more commonly, you are likely to run into other problems as well. There's something called a buffer overflow. And this you will surely trip over in the coming weeks. A buffer overflow is when you allocate an array and go too far past the end of it. Or you use malloc and you, nonetheless, go farther than the end of the chunk of memory that you allocated. A buffer is just a chunk of memory, so to speak, that you can use as you see fit. Buffer overflow means going beyond the boundaries of that array. You might use-- you're using, right now, video. You might know the phrase buffering from videos, like sort of buffering and annoying you on Netflix, because there's a spinning icon or whatnot. Well, that means exactly this. A buffer, in the context of YouTube or Zoom or Netflix, means some chunk of memory that was retrieved via malloc or some similar tool that gets filled with bytes comprising your video. And it's finite, which is why you can only buffer so many seconds or minutes of video before, eventually, if you're offline, you run out of video content to watch. And the stupid icon comes up, and you can watch no more, because a buffer is just a chunk of memory, an array of memory. And if Netflix or Google or others were to implement their code unsafely, they might very well go too far past that boundary as well. So with all this said, let's consider, in some of our final minutes here today, just what else we've been getting from these training wheels, because we do want to take them mostly off for you. So the CS50 library not only provides you with this abstraction of a string type, which again, doesn't give you any new functionality. Strings in C exist, just not by that name. 
They're known more properly as char stars. But all of these functions in the CS50 library can be implemented with other actual C functions that weren't from CS50, namely using one called scanf. But you're going to see, immediately, some of the dangers of using something like scanf, which is an old school function. It was not designed to be self-defensive like CS50's library. And so it's very easy to make mistakes. Let me go ahead, for instance, and create a file called scanf.c, just to demonstrate this function. I'm not going to use the CS50 library, just standard io dot h. And I'm going to give myself int main void. And I'm going to go ahead and give myself a variable x. And I'm going to go ahead and print out quote unquote, "x:" just like CS50's get int function does. And then I'm going to call scanf. And I'm going to go ahead and say, scan from the user's keyboard, an integer, and store it in the location of x. Then, I'm going to go ahead and print out, again, x, and a colon and a percent i backslash n. And I'm going to print x. So what's going on here? In line 5, I'm declaring a variable called x, just like in week one. Line 6, just using printf, like in week one. The interesting stuff seems to be in line 7. Scanf is a function that takes input from the user, just like get int, get string, get float, and so forth. But it does it only by you having to understand pointers, because recall from our swap example, if you want to have a function change the contents of a variable, as we did with a and b and x and y, you have to pass in the address of the variable whose value you want to change. You can't just pass in x itself. So if we didn't use the CS50 library in week one, you would have been writing code like this just to get an int from the user. And you would have had to understand pointers. And you would have to understand ampersands and stars and so forth. 
It's just too much, when all we care about in the first weeks are loops and variables and conditions and sort of the fundamentals. But here, we now have the ability to call scanf, tell it to scan from the user's keyboard, so to speak, an integer, or percent f would give us a float or other such codes, and pass in the address of x so that scanf can go to that address and put the integer from the user's keyboard there. Line 8 is like week one stuff. I'm just printing out the value. And this is pretty safe. I'm going to go ahead and make scanf. It compiles OK. I'm going to go ahead and run it. I'm going to type in 50. And voila, it prints out a 50. But there's some weirdness, because if you run this program, too, and type in cat, well then x is 0. And there's no error checking. So immediately, you should glimpse that one of the features of the CS50 library, recall, is that we keep prompting the user again and again if they're not cooperating and giving you an int. So that's one feature you get from the library. But it turns out that get string is even more powerful, because if I go and change this program now, not to get an int, but something fancier like a string-- or wait, we're calling it char star now. I'm going to go ahead and do something very similar. I'm going to prompt the user for string s. And I'm going to use scanf. And I'm going to use percent s, just like printf uses percent s. And I'm going to pass in s. Now, to be clear, I don't need to do ampersand s here, because now, we all know that s is fundamentally an address. So it suffices just to pass in the address that you already have. Now, I'm going to go ahead and print out s colon, percent s backslash n, and print out s. But when I compile this, make scanf, it doesn't like it: variable s is uninitialized when used here. All right, well if I really want to be sort of adventurous, I can override make's protections. And I can just compile this manually myself using scanf-- using clang directly. 
That worked, dot slash scanf. Let me go ahead and type in, for instance, "HI!" and you see weirdness, null. Well, fortunately, make, and in turn clang, were kind of helping us help ourselves there. It was pointing out that you declared s. So you declared 8 bytes for a pointer. But there's nothing there. It's a garbage value. And so there's nowhere to put this. And thankfully, printf and scanf are being smart enough by not just blindly going there and plopping H, I, exclamation point and a nul character. They're just leaving it alone. And this parenthetical null is just a printf feature saying, you screwed up. If you see null, you've done something wrong. It's just being generous and not crashing on you. If I actually want to get user's input, I need to be smarter than this. And I need to either allocate myself 4 bytes with malloc, as we've done earlier today. Or I could go back to week two stuff and say something like, give me an array of 4 chars. This, though, gives me 4 bytes on the stack somewhere down here in main's frame, so to speak. These rows are called frames. If I use malloc instead, it comes from the so-called heap, which not pictured, is sort of up here. And the only difference is that if I'm using malloc, I have to use free. If I'm using the stack, as I did in week two, I don't have to use free. It's automatically managed for me. So frankly, there's so much new stuff today. I like the idea of sticking with the old school arrays. So now, though, if I go ahead and make scanf, now it compiles with make. If I then run scanf and type in, HI!, voila, it seems to work. But that's because I was smart and anticipated that H, I, exclamation point, plus the nul character, is four characters. I gave myself 4 bytes. But what if the user types in, HI THERE, DAVID, HOW ARE YOU? Clearly, more than four bytes. And I hit Enter now, something weird there happened. The rest is just lost. And this would really be annoying and very frustrating if you were trying to get user input in the first week of the class. Get string avoids this for you. 
Get string calls malloc for you. And it calls it for as big a chunk of memory as the string the human types in. Long story short, we sort of watch what they're typing character by character by character. And we make sure to allocate or reallocate just enough memory to fit whatever it is the human has typed in. So scanf is, essentially, how a function like the CS50 library works underneath the hood. But it is doing all of this for you. And as soon as you take away training wheels like that, or frankly, libraries like that, which it really is at the end of the day. It's not just a teaching tool. It's a useful library. You have to start implementing more of this low-level stuff yourself. So again, there is a trade off. If you don't want to use something like the CS50 library, that's fine. Now, the onus is on you to avoid all of these possible error conditions. All right, with that said, we have one final feature to give you in order to motivate this week's problems, wherein you'll actually explore and manipulate and write code to change files. And for that, we need one final topic of file I/O. File I/O is the term of art that describes taking input and output from files. Pretty much every program we've written thus far just uses memory, like this here, whereby, you can put stuff in memory. But as soon as your program ends, boom. It's gone. The contents of memory are gone. Files, of course, are where you and I in the computing world save our essays and documents and resumes and all of that permanently on your computer. In C, you have the ability, certainly, to write code yourself that saves files long term. So for instance, let me go ahead and write my own program here, a phonebook program that stores names and numbers in a file. I'm going to go ahead and include, just for convenience, the CS50 library again, because I don't want to deal with scanf. I'm going to go ahead and save this, incidentally, as phonebook.c. 
I'm going to go ahead and include, not just the CS50 library, but standard io. And preemptively, I'm going to go ahead and include string.h as well. And I'm going to go ahead in my main function. And I'm going to use a few new functions that we'll see only briefly here. But in the next problem set, you will explore these in more detail. I'm going to give myself a pointer to a file. It turns out, weirdly, that in all caps, FILE, this is a new data type that does come with C that represents a file. So I'm going to go ahead and give myself a pointer to a file, the address of a file. And I'm going to call the variable file. I could call it f. I could call it x. I'm going to call it lowercase file, just to be clear. And I'm going to use a new function called fopen, which means file open. And file open takes two arguments. It takes the first argument, which is the name of a file you want to open. I'm going to open a file called phonebook.csv. And then I'm going to go ahead and open it, specifically, in append mode. Long story short, you can open files in different ways, to read them, that is just look at their contents, to write them, which is to change their contents entirely, or to append to them, a, which means to add row by row to them, so to keep tacking on more information to them. I'm going to go ahead and, just to be safe, I'm going to say if file equals equals NULL, because recall that NULL signifies something went wrong, let's just return now. Maybe I mistyped the name of the file. Maybe it doesn't exist. Something went wrong, potentially. I'm going to check for that by saying, if file equals equals NULL, just quit out of the program now. But after that, I'm going to go ahead and get a string. But we can call that char star now, called name. And I'm going to ask the user for a name. And we've done this before. I'm going to go ahead and ask them for a number, phone number. And we've done this before. The only difference, now, is I'm calling string char star. 
And now, here's the cool part. It turns out, if I want to save this name and number to that file permanently in a CSV-- if unfamiliar, popular in the consulting world, the analytics world. It's just a spreadsheet, a comma-separated value file that you can open in Excel or numbers or Google spreadsheet. I'm going to go ahead and, not printf, but fprintf to that file, a string followed by a comma, followed by a string, followed by a new line, plugging in the name and the number. And then down here, I'm going to close the file. So this is new. fprintf is not printf, which prints to your screen. fprintf prints to a file. So you have to pass in one more argument, the first one, which is the pointer to the file that you want to send these new strings to. Then you still provide a format string, which says, hey fprintf, this is the kind of data I want to print to the file. And then you plug in the variables, just like we've always done with printf. And then lastly, we close the file. So in short, this program would seem to prompt a human for a name and number. And then it's going to go ahead and write those names and numbers to the file. So let me go ahead and make phonebook. OK, no mistake so far, dot slash phonebook, David, 949-468-2750. OK, let me run it once more, even though nothing seems to have happened. Brian, how about 617-495-1000, Enter. Let me check my file browser here. Notice, all of the files we've created today, including, if I zoom in, not just phonebook.c, but phonebook.csv. And if I double click that, notice what's inside of this. Voila, David's name, Brian's name, and each of our numbers. And even cooler than that, let me go ahead and close this. Let me go ahead and download this file using the IDE. And that's going to put it into my Downloads folder. Let me go ahead and click on it. And it's going to open Excel or Numbers or whatever you happen to have on your Mac or PC. I'm going to go ahead and just proceed. 
And voila, looks a little stupid in this formatting here. But I've opened up a spreadsheet that I, myself, generated using fopen, fprintf, and fclose. So already, now that we have pointers at our disposal, can we actually manipulate things like files, which is quite cool. But we're going to do that this week, not with text, but with actual specific types of files. And indeed, recall this kind of thinking here. If you glance at this, it's probably pretty cryptic. It looks like machine code, but it's not. This is, perhaps, the simplest representation of a smiley face inside of a file. If you have a bitmap file, a map of bits, a grid of bits, those bits, quite simply, could literally be 0's and 1's. And if you assign the color black to 0 and the color white to 1, you could actually think of this same grid of 0's and 1's as representing, indeed, a smiley face. In other words, here are some pixels. We talked about pixels in week zero. Pixels are just the dots that compose a graphic file on your computer. And pixels are everywhere. All of us, now, tuning in live via Zoom or YouTube or the like, we're watching streams of pixels, which compose multiple images and multiple images compose video that appears to be moving at, like, 20 something or 30 frames per second, images per second. Now, of course, there's only so much fidelity in these kinds of images. And it's quite common in the case on TV and in movies, if there's some bad guy that's been picked up with some surveillance footage or the like, invariably, the folks on Law & Order and the like can just kind of enhance the video and zoom in and see exactly the glint in the person's eye that reveals who committed some crime. Well, that's all kind of nonsense. And it derives from some of the primitives we introduced in week zero. In fact, just to poke fun at this, let me go ahead and play on a few seconds of this TV show here in the US called CSI, just to give you a sense of just how commonplace this kind of logic is. 
[VIDEO PLAYBACK] - We know. - That at 9:15, Ray Santoya was at the ATM. - So the question is, what was he doing at 9:16? - Shooting the 9 millimeter at something. Maybe he saw the sniper. - Or was working with him. - Wait, go back one. - What do you see? [CLICKING] - Bring his face up, full screen. - His glasses. - There's a reflection. [TYPING] - That's Neuvitas baseball team. That's their logo. - And he's talking to whoever's wearing that jacket. - We may have a witness. - To both shootings. [END PLAYBACK] DAVID MALAN: So unfortunately, today will rather ruin a lot of TV and movies for you, because you can't just zoom in infinitely and see more information if that information is not there. At the end of the day, there's only a finite number of bits. And case in point, here's a photograph of Brian. And you might see that, oh, there's a glint in his eye. Let's see what was being reflected in his eye there. And so if we zoom in on this image here of Brian, and maybe we zoom in a little further, that's all that's actually there. You can't just click the enhance button and see more, because at the end of the day, these are just pixels. And pixels, per week zero, are just 0's and 1's, and finitely many so. So what you see is what you get. Now, with that said-- and actually, we can poke fun at this, too, here. Let me just play one other short clip from Futurama, which kind of hammers home this point as well, but more playfully so. [VIDEO PLAYBACK] - Magnify that death sphere. Why is it still blurry? - That's all the resolution we have. Making it bigger doesn't make it clearer. - It does on CSI: Miami. - [SIGH] [END PLAYBACK] DAVID MALAN: So there, we have two clips talking, rather, to one another. But I have to update things for 2020. 
You can't really pick up the internet these days or magazine these days, if you even would, that doesn't somehow mention machine learning and artificial intelligence and fancy algorithms via which you can do things that previously weren't quite possible. And that's actually kinda sorta the case. You might recall from week zero, that we found this beautiful watercolor painting in the Harvard archives that's only about 11 inches tall total. And yet somehow, it's 13 feet tall here behind me. Now, normally, if you were to just enhance this watercolor painting, it would start to look pretty stupid pretty quickly with lots and lots of pixelation, even if you used a very fancy camera, as the archives do, to capture the original image. But we wanted to blow it up to 13 feet tall so that it would stand at high quality behind us this whole time. And there, we actually did use enhance, in some sense. So using, long story short, fancier algorithms than those last week, you can use artificial intelligence, machine learning, to actually analyze data and find patterns where there weren't-- that aren't necessarily visible to the human eye. So for instance, if we take the original here and start to zoom in, it looks pretty good at this resolution. But it's pretty smooth. You don't really see the fact that this was paint on an actual canvas. So this was just zooming in on Photoshop. But when you actually run an image like this through fancy machine learning-based software, artificial intelligence, you can begin to improve it and actually see, not just this window from the top of one of the buildings, which is pretty glossed over here in Photoshop, you can start to see more detail. So this is literally the before, just zooming in Photoshop. This is after, actually applying fancy artificial intelligence algorithms that notice, wait a minute, there's a little discoloration there. Wait, there's a little discoloration there. And nowadays, enhance is increasingly becoming a thing. 
It's still inferring. It's not resurrecting information that was necessarily there. It's doing its best guess, really, algorithmically, to reconstruct what the image actually was. And if we zoom in further, you can, perhaps, see that this is really starting to get blurry if you just use Photoshop and keep zooming in. But if you run it through fancy enough algorithms and start to notice slight discolorations that aren't super visible to the human eye, we can enhance that even further. And you can't do it infinitely so. And in some sense, we're creating information where there isn't necessarily that information there. So whether or not these kinds of things hold up in court is another question. But it can improve the fidelity of images like this. And indeed, it allowed us to zoom in from 11 inches to 13 feet instead. So when it comes to manipulating images, ultimately, we do have some programmatic capabilities, including this file pointer, like we just saw, and also, a few other functions as well. And our final examples, here, will lay the foundation for what you'll do this coming week, which is manipulate your very own graphical files with a newfound understanding of pointers and addresses and now files and input and output. For instance, I'm going to go ahead and open up a program here called-- give me just one second. I'm going to open up a program here called jpeg.c. And this program, jpeg.c, which I wrote in advance, which is on the course's website, does the following. It first declares a type called byte. It turns out, in C, there's no common definition of what a byte is. A byte, as we know it, is 8 bits. And it turns out, the simplest way to create a byte is to define our own, just like we've defined a string, just like we've defined other types too, like a student, in order-- a person, rather, in order to give us a byte. So this first line of code just declares a data type called byte, using another, more arcane data type called uint8 underscore t. 
But more on that in the problem set. This just invents something called byte. Notice, in this program, I'm resurrecting the idea from week two of command line arguments, where we can take input from the user. Notice that I'm checking if the user typed in two arguments. And if not, I'm returning 1 immediately to signify error. In line 17, I'm using my new technique. I'm opening a file using the name of the file that the human typed at the command line. And this time, I'm opening it to read it with quote unquote, r instead of a. But if there's not a file-- so if bang file, that is, if exclamation point file, or if file equals equals NULL, those mean the same thing. I can go ahead and return 1, signifying an error. Down here, I'm doing something a little clever. It turns out that with very high probability, you can determine if any file is a jpeg by looking only at its first three bytes. A lot of file formats have what are called magic numbers at the beginning of their files. And these are industry standard numbers, 1 or 2 or 3 or more of them, that are just commonly expected to be at the beginning of a file, so that a program can quickly check, is this a jpeg? Is this a gif? Is this a Word document? Is this an Excel file? They tend to have these numbers at the beginning of them. And jpegs have a sequence of bytes that we're about to see. This line of code 24 here, as you'll see in the next problem set, is how you might give yourself a buffer of bytes, specifically an array of three bytes. This next line of code, as you'll see this coming week, is called fread. fread, as the name suggests, reads from a file. That is, it grabs bytes from a file. And it's a little fancy to use, but you'll get more comfortable with this over time. It reads into this buffer, its first argument, the size of this data type, the size of a byte. And it reads in this many of those data types from this file. So again, it's four arguments, which is kind of a lot from what we've seen. 
But it reads from this file, three bytes into this array, a.k.a. buffer, called bytes. So this is just how you write code that doesn't put data in a file, but reads data from it. And then here, notice our hexadecimal. So we've come full circle. If bytes bracket 0 equals equals 0xff and bytes bracket 1 equals equals 0xd8 and bytes bracket 2 equals equals 0xff, this definitely looks cryptic to you. But that's just because I looked up in the manual for jpegs, and it turns out that almost any jpeg, rather, must start with 0xff, 0xd8, 0xff. Those are the first three bytes of any jpeg on your Mac, your PC, on the internet. There are always those three bytes. It turns out, the fourth byte further decides whether or not a file is actually a jpeg. But the algorithm for that's a little fancier, so I kept it simple. If the first three bytes of a file are those, maybe you have a jpeg. But if you don't have exactly those three bytes, you definitely don't have a jpeg. And so what I can do, here, is as follows. In today's code-- let me go ahead and grab two other files that I brought with me. And one happens to be a photograph again. Give me one second. I brought with me a few files, one of which is called brian.jpeg, which is the same photo of Brian. And then I have a gif, which of course, is not a jpeg, that is this cat typing here. And what I, effectively, have in front of me now is a program that if I do make jpeg, because this file is jpeg.c, and I run dot slash jpeg, I can type in something like cat.gif at the command line as an argument, hit Enter, and I should see no. By contrast, if I pass in Brian's jpeg at the command line as an argument, I see maybe. And again, maybe only because the algorithm for actually adjudicating whether something is a jpeg is a little more complicated than that. But indeed, I can now access the individual bytes, and therefore pixels, it would seem, of an image file. And in fact, we can even do this. 
Let me go ahead and show you one last program that we wrote deliberately in advance, just to give you a taste of what's coming with the next problem set. This program is a reimplementation of the program you've probably used one or more times called cp. Recall that cp is a program in the IDE and in Linux, more generally, that allows you to copy a file. You do cp, space, the filename, space, the new filename. How does this work? I now have all of the building blocks with which to copy files myself. So again, I'm defining a byte up here. I'm defining main as taking command line arguments here. And notice one change. I'm not using the CS50 library. So even what was previously string in week two is now char star. Even here for argv, I'm making sure that the human types in three words, the program's name and the source file and the destination file. I'm using fopen again. I'm opening the source file here from argv bracket 1. I'm making sure it's not NULL. And then I'm quitting if it is. I'm then-- here's something new, opening the destination file here, also with fopen. But I'm using quote unquote, "w." I'm opening one file with r, one file with w, because I want to read from one and write to the other. And then down here, this loop is a clever way of copying one file to another. I'm giving myself a buffer of one byte, so just a temporary variable, just like Brian's temp or empty glass. And I'm using this function, fread. I'm reading into that buffer via its address, the size of a byte, specifically one byte from the source file. And then, in that same loop, I'm writing from that buffer, the size of a byte, specifically one byte, to the destination. So literally, the cp program you might have seen me use or you yourself have used to copy files, is literally doing this. It's opening one file, iterating over all of its bytes, and copying them from source to destination. And then lastly, it's closing the file. 
And these last two examples were deliberately fast, because this whole week will be spent diving into file I/O and images thereof. But all that we've done is use these functions, fopen, fread, fwrite, and fclose, to manipulate those very files. So for instance, if I now do this, let me do make cp. OK, seems to compile, dot slash cp, brian.jpeg. How about brian2.jpeg? And hit Enter. Nothing seems to happen. But if I go in here and double click on brian2, we see that we have a second copy of Brian's actual file. So this coming week, you'll experiment with multiple file formats for images. The first is jpegs. And we will give you a so-called forensic image of a whole bunch of photographs from a digital memory card. In fact, it's very common these days, certainly in law enforcement, to take forensic copies of hard drives, of media sticks, of phones and other devices, and then analyze them for data that's been lost or corrupted or deleted. We'll do exactly that, whereby, you'll write a program that recovers jpegs that have been accidentally deleted from a digital memory card. And we'll give you all copies of that memory card by making a forensic image of it, that is copying all of the 0's and 1's from a camera and giving them to you in a file that you can fread and then fwrite from. We'll also introduce you to bitmap files, BMPs, popularized by the Windows operating system for wallpapers and the like. But we'll use them to implement using pointers and using file I/O, your very own Instagram-like filter. So we'll take this picture, here, of the Weeks footbridge here in Cambridge, Massachusetts by Harvard. 
And we'll have you implement a number of filters, taking this original image, for instance, and desaturating it, making it black and white, by iterating over all of the pixels top to bottom, left to right, and recognizing any colors, like red or green or blue or anything in between, and changing them to some shade of gray, doing a sepia filter, making things look old school, like this photo was taken many years ago, by similarly applying a heuristic that alters the colors of all of the pixels in this picture. We'll have you flip it around so you have to put this pixel over here and this pixel over there. And you'll appreciate exactly how files are implemented within your own hard drive and phone. And you'll even implement, for instance, a blur filter, which no accident, makes it harder to see what's going on here, because you're starting to, now, average together pixels that are nearby each other to kind of gloss things over and deliberately make it harder to see here. And so we'll even, if you so choose, have you implement edge detection, if feeling more comfortable, where you find the edges of all of the physical objects in these pictures, in order to actually detect them in code and create visual art like this. Now, this was a lot. And I know pointers are generally considered to be among the more challenging features of C, and certainly, programming in general. So if you're feeling like it's been quite a bit, it was. But you do now have the ability, either today or in the very near term, to understand even XKCD comics like this that most any computer scientist out there has seen. So our final look for you, today, is on this joke here. And even though I can't necessarily hear you from afar, I'll just assume, in our final moments today, that everyone is breaking out into a very geeky laughter. And I see some smiles, at least, which is reassuring. This was, then, CS50. We'll see you next time. [MUSIC PLAYING]