[MUSIC PLAYING] [VIDEO PLAYBACK] -He's lying. -About what? -I don't know. -So what do we know? -That at 9:15, Ray Santoya was at the ATM. -Yeah. So the question is, what was he doing at 9:16? -Shooting the 9 millimeter at something. Maybe he saw the sniper. -Or was working with him. -Wait. Go back one. -What do you see? -Bring his face up full screen. -His glasses. -There's a reflection. -It's the Nuevitas baseball team. That's their logo. -And he's talking to whoever's wearing that jacket. [END PLAYBACK] DAVID MALAN: All right. This is CS50 and this is a bit more of [INAUDIBLE] with which you're dabbling with problem set four. Today we start to look a little more deeply at these things called pointers, which even though it's a pretty arcane topic, it turns out that it's going to be the means by which we can start building and assembling much more sophisticated programs. But we did it on last Wednesday by way of some claymation first. So this, recall, is Binky and we used him to take a look at a program that didn't really do anything interesting, but it did reveal a few problems. So to begin today, why don't we walk quickly through a few of these steps, try to distill into human's terms exactly what's going on here and why this is bad, and then move on and actually start building something with this technique? So these were the first two lines in this program and in layman's terms, what are these two lines doing? Someone who's reasonably comfortable with what's declared on the screen? What are these two lines doing? It's not all that different from week one, but there is some new special symbol. Yeah? Back there. AUDIENCE: Declaring pointers? DAVID MALAN: Say again? AUDIENCE: Declaring pointers? DAVID MALAN: Declaring pointers and let's refine it a little bit more. AUDIENCE: [INAUDIBLE] address x and then y. DAVID MALAN: And then address. So specifically what we're doing is we are declaring two variables. These variables, though, are going to be of type int star, which more specifically means they are going to store the address of an int, respectively, x and y. Now are there any values? Are there any actual addresses in these two variables at this point in time? No. It's just so-called garbage values. If you don't actually assign a variable, whatever was in RAM previously is going to fill with zeros and ones both of those variables. But we don't yet know what they are and that's going to be key to why Binky lost his head last week. So this was the claymation incarnation of this whereby you have just two variables, little circular pieces of clay, that can store variables, but as the wrapped up arrows suggest, they're not actually pointing to anywhere known per se. So then we had this line, and this was new last week, malloc for memory allocation, which is just a fancy way of telling the operating system, Linux or Mac OS or Windows, hey, give me some memory, and all you have to tell the operating system is what when asking it for memory. It's not going to care what you're going to do with it, but you do need to tell the operating system what by way of malloc. Yeah? AUDIENCE: How much? DAVID MALAN: How much? How much in bytes, and so, this, again, a contrived example, is just saying, give me the size of an int. Now, the size of an int is four bytes or 32 bits. So this is just a way of saying, hey, operating system, give me four bytes of memory that I can use at my disposal, and specifically, what does malloc return with respect to that chunk of four bytes? AUDIENCE: Address? DAVID MALAN: The address. The address of that chunk of four bytes. Exactly. And so that's what's stored ultimately in x and that's why we don't really care what the number of that address is, whether it's ox1 or ox2 or some cryptic hexadecimal address. We just care pictorially that that variable x is now pointing to that chunk of memory. So the arrow represents a pointer, or more specifically, a memory address. But again, we don't typically care what those actual addresses are. Now, this line says what in layman's terms? Star x gets 42 semicolon. What does this mean? You wanna go? Don't scratch your neck. AUDIENCE: The address of x is at the 42. DAVID MALAN: The address of x is at 42. Not quite. So close, but not quite, because there's the star that's prefixing this x. So we need to tweak a little bit. Yeah? AUDIENCE: The value that the pointer x is pointing to is 42. DAVID MALAN: OK. The value that the pointer x is pointing to, let's say, shall be 42, or put another way, the star x says, go to whatever address is in x, whether it's 1 Oxford Street or 33 Oxford Street or ox1 or ox33, whatever that numeric address is, star x is the dereferencing of x. So go to that address and then put the number 42 there. So that would be an equivalent way of saying that. So that's all fine and then we would represent the picture as follows where we've added the 42 to that chunk of four bytes on the right-hand side, but this line was where things went awry and Binky's head popped off at this point, because bad things happen when you dereference garbage values or you dereference invalid pointers, and I say invalid because at this point in the story, what is inside of y? What's the value of y based on the past few steps? Yeah? What's that? AUDIENCE: An address. DAVID MALAN: An address. It should be an address but have I initialized it? So I haven't yet. So what is known to be in there? It's just some garbage value. It could be any address from zero to 2 billion if you have two gigs of RAM, or zero to 4 billion if you've got four gigabytes of RAM. It's some garbage value, but the problem is that the operating system, if it hasn't given you that chunk of memory specifically that you're trying to go to, it's generally going to cause what we've seen as a segmentation fault. So in fact, any of you who have struggled at problems at office hours or in problems that's more generally with trying to figure out a segmentation fault, that generally means you're touching a segment of memory that you shouldn't be. You're touching memory that the operating system has not allowed you to touch, whether it's by going too far in your array or starting now, whether it's because you're touching memory that just is some garbage value. So doing star x here is sort of undefined behavior. You should never do it because odds are, the program's just going to crash, because you're saying, go to this address and you have no idea where that address actually is. So the operating system is likely going to crash your program as a result and indeed, that's what happened there to Binky. So ultimately, Binky fixed this problem with this. So that program itself was flawed. But if you sort of forge ahead and execute this line instead, y equals x just means whatever address is an x, also put it in y. And so pictorially, we've represented this with two arrows from x and from y pointing to the same place. So semantically, x is equal to y because both of those are storing the same address, ergo pointing at 42, and now, when you say star y, go to the address in y, this has an interesting side effect. So the address in y is the same thing as the address in x. So if you say go to the address in y and change the value to 13, who else is affected? X is, point D, so to speak, should be affected as well. And indeed, how Nick drew this picture in claymation was exactly that. Even though we follow the pointer y, we ended up in the same place, and so if we were to print out x or y's pointee, then we would see the value of 13. Now, I say pointee to be consistent with the video. Programmers, to my knowledge, never actually say the word pointee, that which is pointed at, but for consistency with the video, realize that's all that was meant in that situation. So any questions on claymation or pointers or malloc just yet? No? All right. So without further ado, let's take a look at where this has actually been used for some time. So we've had this CS50 library that's got all of these functions. We've used GetInt a lot, GetString, probably GetLongLong earlier in my PSet one or so, but what's actually been going on? Well, let's take a quick look underneath the hood at a program that inspires why we give you the CS50 library, and indeed as of last week, we started taking those training wheels off. So this is now sorted of a postmortem of what has been going on inside the CS50 library, even though we now will start moving away from it for most programs. So this is a program called scanf 0. It's super short. It just has these lines, but it introduces a function called scanf that we're actually going to see in a moment inside of the CS50 library, albeit in a slightly different form. So this program on line 16 is declaring a variable x. So give me four bytes for an int. It's been telling user, number please, and then this is an interesting line that actually ties together last week and this. Scanf, and then notice it takes a format string, just like printf, %i means an int, and then it takes a second argument which looks a little funky. It's ampersand x, and to recall, we only saw this once last week. What does ampersand x represent? What does ampersand do in C? Yeah? AUDIENCE: The address of. DAVID MALAN: The address of. So it's the opposite of the star operator, whereas the star operator says, go to this address, the ampersand operator says, figure out the address of this variable, and so this is key, because scanf's purpose in life is to scan the user's input from the keyboard, depending on whatever he or she types, and then read that user's input into a variable, but we saw in the past two weeks that that swap function that we tried effortlessly to implement was just broken. Recall that with the swap function, if we just declared A and B as ints, we did successfully swap the two variables inside of swap just like with the milk and OJ, but as soon as swap returned, what was the result with respect to x and y, the original values? Nothing. Yeah. Nothing happened that time, because swaps change only its local copies, which is to say, all this time, whenever we've been passing in arguments to functions, we're just passing copies of those arguments. You can do with that whatever you want with them, but they're going to have no effect on the original values. So this is problematic if you want to have a function like scanf in life, whose purpose is to scan the user's input from the keyboard and then fill in the blanks, so to speak, that is, give a variable like x a value, because if I were to just pass x to scanf, if you consider the logic of last week, scanf can do whatever it wants with a copy of x, but it couldn't permanently change x unless we give scanf a treasure map, so to speak, where x marks the spot, whereby we pass in the address of x so that scanf can go there and actually change the value of x. And so indeed, all that this program does if I make scanf 0, in my source 5m directory, make scanf 0, dot slash scanf, number please 50, thanks for the 50. So it's not all that interesting, but what's indeed happening is that as soon as I call scanf here, the value of x is being permanently changed. Now, this seems nice and good, and in fact, it seems like we don't really need the CS50 library at all anymore. For instance, let's run this once more here. Let me reopen it for a second. Let's try a number please and instead of saying 50 like before, let's just say no. OK, that's a little weird. OK. And just some nonsense here. So it doesn't seem to handle erroneous situations. So we need to minimally start adding some error-checking to make sure that the user has typed in an actual number like 50, because apparently typing words isn't detected as problematic, but it probably should be. Let's look at this version now that's my attempt to reimplement GetString. If scanf has all this functionality built in, why have we been dabbling with these training wheels like GetString? Well, here is perhaps my own simple version of GetString whereby a week ago, I might have said, give me a string and call it buffer. Today, I'm going to start just saying char star, which, recall, it's just synonymous. It looks scarier but it's the exact same thing. So give me a variable called buffer that's going to store a string, tell the user string please, and then, just like before, let's try to borrow this lesson scanf %s this time and then pass in buffer. Now, a quick sanity check. Why am I not saying ampersand buffer this time? Infer from the previous example. AUDIENCE: Char star is a pointer. DAVID MALAN: Exactly, because this time, char star is already a pointer, an address, by definition of that star being there. And if scanf expects an address, it suffices just to pass in buffer. I don't need to say ampersand buffer. For the curious, you could do something like this. It would have different meaning. This would give you a pointer to a pointer, which is actually a valid thing in C, but for now, let's keep it simple and keep the story consistent. I'm just going to pass in buffer and that's correct. The problem though is this. Let me go ahead and run this program after compiling it. Make scanf 1. Damn it, my compiler's catching my error. Give me one second. Clang. Let's say scanf-1.c. OK. There we go. I need it. CS50 ID has various configuration settings that protect you against yourself. I needed to disable those by running clang manually this time. So string please. I'm going to go ahead and type in my favorite hello world. OK, null. That's not what I typed. So it's indicative of something being wrong. Let me go ahead and type in a really long string. Thanks for the null and I don't know if I'm going to be able to crash it. Let's try a little copy paste and see if this helps. Just paste a lot of this. It's definitely a bigger string than usual. Let's just really write it. No. Damn it. Command not found. So that's unrelated. That's because I pasted some bad characters, but this turns out is not going to work. Let's try this once more, because it's more fun if we actually crash it. Let's type this and now, I'm going to copy a really long string and now let's see if we can crash this thing. Notice I omitted spaces and new lines and semicolons and all funky characters. Enter. And now the network's just being slow. I held down Command-V too long, clearly. Damn it! Command not found. OK. Well, the point is nonetheless the following. So what is actually going on with this declaration of char star buffer on line 16? So what am I getting when I declare a pointer? All I'm getting is a four byte value called buffer, but what's inside of it at the moment? It's just some garbage value. Because any time you declare a variable in C, it's just some garbage value, and we're starting to trip over this reality. Now, when I tell scanf, go to this address and put whatever the user types in. If the user types in hello world, well, where do I put it? Buffer is a garbage value. So that's kind of like an arrow that's pointing who knows where. Maybe it's pointing right here in my memory. And so when the user types in hello world, the program tries to put the string hello world backslash 0 in that chunk of memory. But with high probability, but clearly not 100% probability, the computer is going to then crash the program because this is not memory I should be allowed to touch. So in short, this program is flawed for exactly that reason. I'm fundamentally not doing what? What steps have I omitted, just like we omitted with Binky's first example? Yeah? AUDIENCE: Memory allocation? DAVID MALAN: Memory allocation. I haven't actually allocated any memory for that string. So we can fix this in a couple of ways. One, we can keep it simple and in fact, now you're going to start to see a blurring of the lines between what an array is, what a string is, what a char star is, what an array of chars is. Here's a second example involving strings and notice all I've done on line 16 is, instead of saying that buffer is going to be a char star, a pointer to a chunk of memory, I'm going to very proactively give myself a buffer for 16 characters, and in fact, if you're familiar with the term buffering, probably from the world of videos, where a video is buffering, buffering, buffering. Well, what's the connection here? Well, Inside of YouTube and inside of video players generally is an array that's bigger than 16. It might be an array of size one megabyte, maybe 10 megabytes, and into that array does your browser download a whole bunch of bytes, a whole bunch of megabytes of video, and the video player, YouTube's or whoever's, starts reading the bytes from that array, and any time you see the word buffering, buffering, that means the player has gotten to the end of that array. The network is so slow that it hasn't refilled the array with more bytes and so you're out of bits to display to the user. So buffer is an apt term here in that it's just an array, a chunk of memory. And this will fix it because it turns out that you can treat arrays as though they are addresses, even though buffer is just a symbol, it's a sequence of characters, buffer, that's useful for me, the programmer, you can pass its name around as though it were a pointer, as though it were the address of a chunk of memory for 16 chars. So that's to say, I can pass the scanf exactly that word and so now, if I make this program, make scanf 2, dot slash scanf 2, and type in hello world, Enter, that time-- Hmm, what happened? String please. What did I do wrong? Hello world, buffer. Hello world. Ah, I know what it's doing. OK. So it's reading up until the first space. So let's cheat for just a moment and say I just wanted to type something really long like this is a long sentence that's one, two, three, four, five, six, seven, eight, nine, 10, 11, 12, 13, 14, 15, 16. OK. It is indeed a long sentence. So this sentence is longer than 16 characters and so when I hit Enter, what's going to happen? Well, in this case of the story, I had declared buffer to actually being an array with 16 chars ready to go. So one, two, three, four, five, six, seven, eight, nine, 10, 11, 12, 13, 14, 15, 16. So 16 characters, and now, when I read in something like this is a long sentence, what's going to happen is that I'm going to read in this is a long S-E-N-T-E-N-C-E, sentence. So this is deliberately a bad thing that I keep writing beyond the boundaries of my array, beyond the boundaries of my buffer. I could get lucky and the program will keep on running and not care, but generally speaking, this will indeed crash my program, and it is a bug in my code the moment I step beyond the boundaries of that array, because I don't know if it's necessarily going to crash or if I'm just going to get lucky. So this is problematic because in this case, it does seem to work and let's tempt fate here, even though the IDE seems to tolerate quite a bit of-- There we go. Finally. So I'm the only one that can see this. So I just had a lot of fun typing out a really long actual phrase that it certainly exceeded 16 bytes, because I typed in this crazy long multi-line phrase, and then notice what happened. The program tried printing it and then got a segmentation fault and segmentation faults is when something like this happens and the operating system says no, cannot touch that memory. We're going to kill the program altogether. So this seems problematic. I've improved the program whereby at least have some memory, but this would seem to confine the function GetString to getting strings of some finite length 16. So if you want to support longer sentences than 16 characters, what do you do? Well, you can increase the size of this buffer to 32 or that seems kind of short. Why don't we just make it 1,000 but push back. What's the response intuitively of just avoiding this problem by making my buffer bigger, like 1,000 chars? By implementing GetString this way. What's good or bad here? Yeah? AUDIENCE: If you bind up a lot of space and you don't use it, then you can't reallocate that space. DAVID MALAN: Absolutely. It's wasteful insofar as if you don't actually need 900 of those bytes and yet you're asking for 1,000 in total anyway, you're just consuming more memory on the user's computer than you need to, and after all, some of you've already encountered in life that when you're running lots of programs and they're eating up lots of memory, this can actually impact performance and the user's experience on the computer. So that's kind of a lazy solution, for sure, and conversely, it's not only wasteful, what problem still remains, even if I make my buffer 1,000? Yeah? AUDIENCE: The string is length 1,001. DAVID MALAN: Exactly. If your string is length 1,001, you have the exact same problem, and by my argument, I would just then make it 2000, but you don't know in advance how big it should be, and yet, I do have to compile my program before letting people use and download it. So this is exactly the kind of stuff that the CS50 library tries to help us with and we'll only glance at some of the underlying implementation here, but this is CS50 dot C. This is the file that's been on CS50 IDE all these weeks that you've been using. It's pre-compiled and you've been using it automatically by nature of having the dash L CS50 flag with clang, but if I scroll down through all of these functions, here's GetString, and just to give you a taste of what's going on, let's take a quick look at the relative complexity. It's not a super long function, but we didn't have to think all hard about how to go about getting strings. So here's my buffer and I apparently initialize it to null. This, of course, is the same thing as char star, but I decided in implementing the CS50 library that if we're going to be completely dynamic, I don't know in advance how big of a string users are going to want to get. So I'm going to start with just an empty string and I'm going to build up as much memory as I need to fit the user string and if I don't have enough, I'm going to ask the operating system for more memory. I'm going to move their string into a bigger chunk of memory and I'm going to release or free the insufficiently large chunk of memory and we're just going to do this iteratively. So a quick glance, here's just a variable with which I'm going to keep track of the capacity of my buffer. How many bytes can I fit? Here's a variable n with which I'm going to keep track of how many bytes are actually in the buffer or that the user has typed. If you've not seen this before, you can specify that a variable like an int is unsigned, which as the name suggests, means it's non-negative, and why would I ever want to bother specifying that an int isn't just an int, but it's an unsigned int? It's a non-negative int. What does the [INAUDIBLE] mean? AUDIENCE: It's describing an amount of memory that can be [INAUDIBLE]. DAVID MALAN: Yeah. So if I say unsigned, this is actually giving you one bit of extra memory and it seems kind of silly, but if you have one bit of additional memory, that means you have twice as many values you can represent, because it can be a 0 or a 1. So by default, an int can be roughly negative 2 billion all the way up to positive 2 billion. Those are big ranges, but it's still kind of wasteful if you only care about sizes, which just intuitively should be non-negative or positive or 0, well then, why are you wasting 2 billion possible values for negative numbers if you're never going to use them? So by saying unsigned, now my int can be between 0 and roughly 4 billion. So here's just an int C for reasons we won't get into just now as to why it's an int instead of a char, but here is the gist of what's going on, and some of you might be using, for instance, the fgetc function even in PSet four or thereafter, we'll see it again in problem set five, fgetc is nice because as the name kind of, sort of arcanely suggests, it's a function that gets a character and so, what's fundamentally different about what we're doing in GetString is we're not using scanf in the same way. We are just creeping along step-by-step over whatever the user has typed in, because we can always allocate one char, and so we can always safely look at one char at a time, and the magic starts to happen here. I'm going to scroll down to the middle of this function just to introduce briefly this function. Much like there's a malloc function, there's a realloc function where realloc lets you reallocate a chunk of memory and make it bigger or smaller. So long story short and with a wave of my hand for today, know that what GetString is doing is it's sort of magically growing or shrinking the buffer as the user types in his or her string. So if the user types a short string, this code only allocates enough memory to fit the string. If the user keeps typing as I did it again and again and again, well, if the buffer's initially this big and the program realizes, to wait a minute, I'm out of space, it's going to double the size of the buffer and then double the size of the buffer and the code that does the doubling, if we look at it here, it's just this clever one-liner. You might not have seen this syntax before, but if you say star equals, this is the same thing as saying capacity times 2. So it just keeps doubling the capacity of the buffer and then telling realloc to give itself that much more memory. Now, as an aside, there are other functions in here that we won't look into any detail other than to show in GetInt, we use GetString in GetInt. We check that it's not null, which, recall, is the special value that means something went wrong. We're out of memory. Better check for that. And we return a sentinel value. But I'll defer to the comments as to why and then we use this cousin of scanf called sscanf and it turns out that sscanf, or string scanf, lets you take a look at the line that the user has typed in and let you analyze it essentially and what I'm doing here is I'm telling sscanf, analyze whatever the user has typed in and make sure %i, there is an integer in it, and we won't get into today exactly why there's also a %c here, but that in a nutshell allows us to detect if the user has typed in something bogus after the number. So the reason that GetInt and GetString tell you to retry, retry, retry is because of all of that code we've written, it's kind of looking at the user's input in making sure it's entirely numeric or it's an actual floating point value or the like, depending on what value function you're using. Whew. OK. That was a mouthful but the point here is that the reason we had those training wheels on is because at the lowest level, there is just so many things that can go wrong that we wanted to preemptively handle those things certainly in the earliest weeks of the class, but now with PSet four and PSet five and beyond will you see that it's more unto you but also you're more capable of solving those kinds of problems yourself. Any questions on GetString or GetInt? Yeah? AUDIENCE: Why would you double the capacity of the buffer rather than just increasing it by the exact amount? DAVID MALAN: Good question. Why would we double the capacity of the buffer as opposed to just increasing it by some constant value? It was a design decision. We just decided that because it tends to be a little expensive time-wise to ask the operating system for memory, we didn't want to end up getting into a situation for big strings that we were asking the OS again and again and again and again in rapid succession for memory. So we just decided, somewhat arbitrarily but we hope reasonably, that, you know what, let's try to get ahead of ourselves and just keep doubling it so that we minimize the amount of times we have to call malloc or realloc, but a total judgment call in the absence of knowing what users might want to type in. Both ways could be arguable. Arguably good. So let's take a look at a couple of other side effects of memory, things that can go wrong and tools that you can use to catch these kinds of mistakes. It turns out all of you, even though check50 has not told you as much, have been writing buggy code since week one, even if all check50 tests are passed, and even if you and your TF are super confident that your code works as intended. Your code has been buggy or flawed in that all of you, in using the CS50 library, have been leaking memory. You've been asking the operating system for memory in most of the programs you've written, but you've never actually given it back. You've called GetString and GetInt and GetFloat, but with GetString, you've never called unGetString or Give String Back or the like, but we've seen that GetString does allocate memory by way of malloc or this function realloc, which is just very similar in spirit, and yet, we've been asking the operating system for memory and memory again and again but never giving it back. Now, as an aside, it turns out that when a program quits, all of the memory is automatically freed. So it's not been a huge deal. It's not going to break the IDE or slow things down, but when programs do generally leak memory and they're running for a long time. If you've ever seen the stupid little beach ball in Mac OS or the hourglass on Windows where it's kind of slowing down or thinking or thinking or just really starts to slow to a crawl, it very possibly could be the result of a memory leak. The programmers who wrote the software you're using ask the operating system for memory every few minutes, every hour. But if you're running the software, even if it's minimized in your computer for hours or days on end, you might be asking for more and more memory and never actually using it and so your code might be, or programs might be leaking memory, and if you start to leak memory, there's less memory for other programs, and the effect is to slow everything down. Now, this is by far one of the most atrocious programs you will have opportunities to run in CS50 insofar as its output is even more esoteric than clang's or make's or any of the command line programs we've run before but thankfully, embedded in its output is some super helpful tips that will be useful either for PSet four or certainly PSet five. So valgrind is a tool that can be used to look for memory leaks in your program. It's relatively simple to run. You run valgrind and then, even though it's a little verbose, dash dash leak check equals full, and then dot slash and the name of your program. So valgrind will then run your program and at the very end of your program running before it quits and gives you another prompt, it's going to analyze your program while it's been running and tell you did you leak any memory and better yet, did you touch memory that didn't belong to you? It can't catch everything, but it's pretty good at catching most things. So here's an example of my having run this program, having run valgrind, on a program called memory, and I'm going to highlight the lines that are ultimately of interest to us. So there's even more distractions that I've deleted from the slide. But let's just see what this program is capable of telling us. It's capable of telling us things like invalid write of size 4. In other words, if you touch memory, specifically 4 bytes of memory that you shouldn't have, valgrind can tell you that. Invalid write of size 4. You touched four bytes that you shouldn't have. Where did you do that? This is the beauty. Memory dot c line 21 is where you screwed up and that's why it's helpful. Much like GDB, it can help point you at the actual error. Now, this one's a little more verbose, if not confusing. 40 bytes in 1 blocks are definitely lost in loss record 1 of 1. What does that mean? Well, it just means you asked for 40 bytes and you never gave it back. You called malloc or you called GetString and the operating system gave you 40 bytes, but you never freed or released that memory, and to be fair, we've never show you how to give back memory. Turns out there's a super simple function called free. Takes one argument, the thing you want to free or give back, but 40 bytes, apparently, in this program have been lost at line 20 of memory dot c. So let's see this program. It's super useless. It only demonstrates this particular error. So let's take a look. Here is main and main, notice, calls a function called f and then returns. So not all that interesting. What does f do? Notice I didn't bother with a prototype. I wanted to keep the code as minimal as possible. So I put f above main and that's fine, certainly, for short programs like this. So f does not return anything and does not take anything, but it does do this. It declares, much like in the Binky example, a pointer called x that's going to store the address of an int. So that's the left-hand side. In English, what is the right-hand side doing? Anyone? What is this doing for us? Yeah? AUDIENCE: [INAUDIBLE] times the size of an int which is 10 times that [INAUDIBLE] DAVID MALAN: Good and let me summarize. So allocate enough space for 10 integers or 10, what's the size of an int, it's four bytes, so 10 times 4 is 40, so that right-hand side that I've highlighted is give me 40 bytes and store the address of the first byte into x. And now lastly, and here's where this program is buggy, what's wrong with line 21 based on that logic? What's wrong with line 21? Yeah? AUDIENCE: You can't index into x [INAUDIBLE]. DAVID MALAN: Yeah. I shouldn't index into x like that. So syntactically, that's OK. What's nice is, much like you can treat the name of an array as though it's a pointer, similarly can you treat a pointer as though it's an array, and so I can syntactically say x bracket something, x bracket i, but the 10 is problematic. Why? AUDIENCE: Because it's not inside. DAVID MALAN: It's not inside that chunk of memory. What's the largest value I should be putting in those square brackets? 9, 0 through 9. Because of zero indexing. So 0 through 9 would be fine. Bracket 10 is not good and but, recall though, every time I seem to try to make CS50 IDE crash by typing in bogus values, it doesn't always cooperate, and indeed, you often get lucky just because the operating system doesn't notice that you ever so slightly pass some chunk of memory, because you stayed within technically your segment, but more on that in an operating systems class, and so something like this could very easily go undetected. Your program's never going to crash consistently but maybe once in awhile. And so let's try valgrind on this, and here's where we'll get overwhelmed by the output momentarily. So make memory valgrind leak check equals full dot slash memory. And here's why I promise this would overwhelm. Here's what valgrind, here's what a programmer, some years ago- decided it would be a good idea for the output to look like. So let's make sense of this. So all the way on the left-hand side for no good reason is the process ID of the program we just run, the unique identifier for the program we just ran. We deleted that from the slide, but there is some useful information in here. Let's scroll up to the very top. Here's where we began. So it's not all that much output. Here's that invalid write of size 4 on line 21. Well, what was line 21? Line 21 was exactly this and it makes sense that I'm in validly writing 4 bytes because I'm trying to put this integer, which could be anything, it just happens to be zero, but I'm trying to put it at a location that doesn't belong to me. Moreover, down here, 40 bytes in one blocks are definitely lost in record 1. That's because when I call malloc here, I never actually free the memory. So how can we fix this? Let me go ahead and be a little safer and do 9 there and let me here free x. This is the new function for today. If I now rerun make memory dot slash, let's run valgrind on it again, maximize my window and hit Enter. Now, it's good. They bury the good news in all of this output. All heap blocks were free. We'll come back to what the heap is, but no leaks are possible. So this is just another tool for your tool kit with which you can start to find now errors like that. But let's see what more can go wrong here. Let's transition now to actually solving a problem. As an aside, if this will relieve a little bit of confusion or tension, this is now funny. Yeah. That's pretty good. Because pointers are addresses and addresses are generally by convention written with hexadecimal. Ha, ha, this is funny now. Anyhow, so let's now actually solve a problem. This has been super, super low-level thus far, and we can actually do useful things with these low-level details. So we introduced a few weeks ago the notion of an array. An array was nice because it's hard to clean up our code because if we wanted to write a program with multiple students or multiple names and houses and dorms and colleges and all of that, we could store everything more cleanly inside of an array. But propose one downside of an array thus far. Even if you've not suffered it yourself in a program, just instinctively, what is a bad thing about an array, perhaps? I hear some murmurs. AUDIENCE: It's difficult to change the size. DAVID MALAN: It's difficult to change the size. You can't change the size of an array, in fact, per se in C. You can allocate another array, move everything from the old one into the new, and now have some extra space, but it's not like a language like Java or Python or any number of other languages with which some of you might be familiar where you can just keep adding things ad nauseam to the end of an array. When you have an array of size 6, that is its size, and so much like the idea earlier having a buffer of a certain size, you have to guess out of the gate what size do you want it to be? If you guess too big, you're wasting space. If you guess too small, you can't store that data, at least without a lot more work. So today, thanks to pointers, we can start stitching together our own custom data structures, and in fact, here is something that looks a little more cryptic at first glance, but this is what we'll call a linked list, and its name kind of summarizes it. It's a list of numbers, or in this case, a list of numbers, but it could be a list of anything, but it's linked together by way of arrows, and just take a guess with what technique are we going to be able to stitch together, sort of like popcorn with a thread, a linked lists rectangles here? Its numbers? What's the underlying language feature? AUDIENCE: A pointer. DAVID MALAN: A pointer. So each of these arrows here represents a pointer or just an address. So in other words, if I want to store a list of numbers, I can't just store it if I want the ability to grow and shrink my data structure in an array. So I need to have a little more sophistication, but notice that this picture kind of suggests that if you've just got little threads connecting everything together, probably isn't that hard to make space in between two of those rectangles or two of those nodes, as we'll start calling them, put in a new node, and then with some new thread, just ditch the three nodes together, the first one, the last one, and the one that you just inserted into the middle. And indeed a linked list, unlike an array, is dynamic. It can grow and it can shrink and you don't have to know or care in advance how much data you're going to be storing, but it turns out we have to be a little careful about how to implement this. So first let's consider how we implement one of these little rectangles. It's easy to implement an int. You just say int n and then you get 4 bytes for an int, but how do I get an int, call it n, and then a pointer, let's call it next. We could call these things anything we want but I need a custom data structure. Yeah? AUDIENCE: Ampersand [INAUDIBLE]. DAVID MALAN: So ampersand we will use to get the address of a node potentially. But we need another feature of C in order to give me the ability to create this custom rectangle, this custom variable if you will, in memory. AUDIENCE: A struct. DAVID MALAN: A struct. Recall from last week, we introduced struct, this relatively simple keyword that lets us make things like this. C did not come with a data structure called student. It comes with int and float and char and such, but it doesn't come with student, but we can create a student data type, a student structure, with this syntax here. And you'll see this again and again. So don't worry about memorizing the keywords, but the keyword that's important is just the fact that we said struct and then we called it student and inside of the student was a name and a house or a dorm or the like. And so now today, let's propose this. I've added a few words, but if I want to implement this rectangle that's got both an int and a pointer, you know what, I'm going to declare a struct called node. I'm also, inside of it, going to say that a node, this rectangle, has an int and we'll call it n and it has a next pointer. And this is a little verbose, but if you think about it, the arrows that were in the picture a moment ago are of what data type? Where each of those arrows is pointing to what type of data structure? It's not pointing just to an int per se. It's pointing to the whole rectangular thing and that rectangular thing, we said, is called a node. And so we kind of have to recursively define this such that a node, we shall say, will contain an int called n and a pointer called next and the type of data structure to which that pointer points is apparently going to be struct node. So this is annoyingly verbose and just to be pedantic, the reason why we can't just say this, which frankly looks a lot more readable, is because recall that C read things top to bottom, left to right. It's not until we get the semicolon that the keyword node actually exists. So if we want to have this sort of cyclical reference inside of the data structure, we have to do this, where we say struct node at the top, which gives us a longer way of describing this thing, then inside we say struct node, and then at the very last line we say, all right, C, by the way, just call this whole damn thing a node and stop using the keyword struct altogether. So this is just sort of a syntactic trick that ultimately lets us create something that looks exactly like this. So if we assume now we can implement this thing in C, how do we actually start traversing this? Well, in fact, all we have to do is iterate from left to right and just kind of insert nodes or delete nodes or search for things wherever we want, but to do this, let's go ahead and make things a little more real because this has been super low-level thus far. Would anyone literally like to be first? OK. Come on up. What's your name? DAVID: David. DAVID MALAN: David. Nice to meet you. Me too. All right. And we need a number 9. Not as good as first, perhaps. OK, number 9. A number 17, please. Let me go back a little farther. Number 22, please, and how about farther back if I can see any hands with all the light or no. Someone's being volunteered right there. Do you want to come up? Your forearm is forcibly going up. OK, 17. 22. 26 is coming down. Would anyone else like to forcefully-- Come on up. An actual volunteer. So very quickly, if you guys could arrange yourselves just like the nodes on the screen. Thank you. And you'll be 26. All right and quick introductions. So I'm David and you are also? DAVID: David. DAVID MALAN: And you are? JAKE: Jake. SUE: Sue. ALEX: Alex. RAPHAEL: Raphael. TAYLOR: Taylor. DAVID MALAN: Taylor. Excellent. So these are our volunteers for today and go ahead and shift a little that way, and just go ahead and keep holding your numbers as you are or your first sign and using your left hand, go ahead and just implement these arrows, just so that your left hand is literally pointing at whatever you should point at, and give yourself some room so that we can visually see your arms actually pointing, and you can just point sort of at the ground is fine. So here we have a linked list of one, two, three, four, five nodes initially, and notice we have this special pointer at the beginning who's key because we have to keep track of the whole length list somehow. These guys, even though they're left to right, back to back in memory, they can actually be anywhere in the computer's memory. So these guys could be standing anywhere on the stage and that's fine, so long as they're actually pointing at one another, but to keep things clean and simple, we'll just draw them left to right like this, but there could be massive gaps in between those nodes. Now, if I want to actually insert some new value, let's go ahead and do this. We have an opportunity now to choose another node. Say let's start off with mallocing 55. Would someone mind being malloc? OK, come on up. What's your name? RAINBOW: Rainbow. DAVID MALAN: Rainbow? All right. Malloc Rainbow. Come on up. So now we have to ask ourselves algorithmically where we can put 55. So all of us know, obviously, where she probably belongs if we're trying to keep this sorted and if you guys could take one step back so we don't fall off the stage, that would be great. So actually, Rainbow, start over here with me, because we as the computer now can only see one variable at a time. So if this is the first node. Notice he's not a node, he's just a pointer, and that's why he's drawn to be only the size of a pointer, not one of those full rectangles. So we're going to check at each iteration is 55 less than 9? No. Is 55 less than 17? No. Less than 22? Less than 26? Less than 34? And so now, obviously Rainbow belongs at the end. So to be clear, and what was your name, Taylor? TAYLOR: Taylor. DAVID MALAN: So among Taylor's left hand and Rainbow's hands here, whose hand needs to point at what in order to insert 55 into this list? What do we need to do? Yeah? AUDIENCE: Taylor's hand needs to point left. DAVID MALAN: Exactly. So inserting a node into the end of the list is pretty simple because Taylor just has to point, instead of at the ground or we'll call it null, null is sort of the absence of a pointer or a special zero pointer, you're going to point with your left hand at Rainbow and then Rainbow, where should your left hand probably point? Down. It's not good if her hand is sort of pointing off here or sort of any which way. That would be considered a garbage value, but if she points to some known value, we'll call it zero or null, that's OK because we have a term in this and we know the list now is complete. So what's another relatively simple case? Could we malloc 5? Come on up. What's your name? TIFFANY: Tiffany. DAVID MALAN: I'm sorry? TIFFANY: Tiffany. DAVID MALAN: Tiffany. All right. Tiffany has been malloced with the value 5. Come on up. This one's relatively easy too, but let's consider order of operations now. It was pretty easy with Taylor at the end. Number 5 is of course less than 9, and so we have David, we have Tiffany, and what was your name? JAKE: Jake. DAVID MALAN: Jake. Tiffany, Jake, and David. Whose hand should be updated first? What do you want to do here? There's a couple possible ways, but there's also one or more wrong ways. AUDIENCE: Start with leftmost. DAVID MALAN: Start with the leftmost. Who's the leftmost here then? AUDIENCE: First. DAVID MALAN: OK. So start with first and where do you want to update David's hands to be? AUDIENCE: Towards the 5. DAVID MALAN: OK. So David, point at five or Tiffany here, and now? AUDIENCE: Tiffany points to the 9? DAVID MALAN: Perfect, except Binky's head just kind of fell off, right? Because what's wrong with this picture literally? AUDIENCE: Nothing is pointing. DAVID MALAN: Nothing is pointing to Jake now. We've literally orphaned 9 and 17, and we've literally leaked all of this memory, because by updating David's hand first, that's fine insofar as it's correctly pointing at Tiffany now, but if no one had the foresight to point at Jake, then we've lost the entirety of that list. So let's undo. So that was a good thing to trip over but let's correct now. What should we do first instead? Yeah? AUDIENCE: Tiffany should point at the 9? DAVID MALAN: I can't get that close to you. Who should point at the 9? AUDIENCE: Tiffany. DAVID MALAN: All right. So Tiffany should first point at the 9. So Tiffany should take on an identical value to David, which seems redundant for a moment, but that's fine because now, second step, we can update David's hand to point at Tiffany, and then if we just kind of clean things up as though this is kind of spring-like, now that's a correct insertion. So excellent. So now we're almost there. Let's insert one final value like the value 20. If we could malloc one final volunteer? Come on up. So this one's a little more tricky. But really, the code we're writing, albeit verbally, is just like having a bunch of if conditions now, right? We had a condition checking if it belongs at the end, maybe the beginning. We need some kind of loop to find the spot in the middle. So let's do that with what's your name? ERIC: Eric. DAVID MALAN: Eric? Eric. Nice to meet you. So we have 20. Less than five? No. Less than nine? No. Less than 17? No. OK. He belongs here and your names again are? SUE: Sue. DAVID MALAN: Sue. ALEX: Alex. DAVID MALAN: Sue, Alex, and? ERIC: Eric. DAVID MALAN: Eric. Whose hands need to get updated first? AUDIENCE: Eric. OK. So Eric's should point at where? At 22. Good. And now what's next? Sue can then point at Eric and now, if you guys just make some room, which is fine visually, now we've done the insertion. So let's now consider a question but thank you so much for our volunteers. Very well done. You can keep those, if you like. And we have a lovely parting gift if you'd each like to take a stress ball. Let me just pass this down. So what is the takeaway of this? This seems to be amazing insofar as we have now introduced an alternative to an array that is not so confined to an array of some fixed size. They can grow dynamically. But much like we've seen in weeks past, we never get anything for free, like surely there's a trade-off here. So with an upside of a linked list, is this dynamism? This ability to grow and frankly, we could have done delete and we could shrink as needed. What price are we paying? Twice as much space, first of all. If you look at the picture, no longer am I storing a list of integers. I'm storing a list of integers plus pointers. So I'm doubling the amount of space. Now, maybe that's not such a big deal 4 bytes, 8 bytes, but it could certainly add up for large data sets. What's another downside? Yeah? AUDIENCE: We have to traverse them one-by-one. DAVID MALAN: Yeah. We have to traverse them one-by-one. You know what, we gave up this super convenient feature of square bracket notation, more properly known as random access, where we can just jump to an individual element but now if I still had my volunteers here, if I wanted to find the number 22, I can't just jump to bracket something something. I have to look over the list, much like our searching examples linearly, to find the number 22. So we seem to have paid a price there. But we can nonetheless solve other problems. In fact, let me introduce just a couple of visuals. So if you've been down to Mather's Dining Hall recently, you'll recall that their stacks of trays like this, we borrowed these from Annenberg before class. So this stack of trays, though, is representative actually of a computer science data structure. There is a data structure in computer science known as a stack which very nicely lends itself to exactly this visual. So if each of these trays isn't a tray but like a number and I wanted to store numbers, I could put one down here, and I could put another down here, and continue stacking numbers on top of one another, and what's potentially helpful about this is that what's the implication of this data structure? Which number can I pull out first most conveniently? The most recently one put on there. So this is what we would call in computer science a LIFO data structure. Last in, first out. And we'll see before long why that might be useful but for now, just consider the property. And it's kind of stupid if you think about how the dining hall does it. Every time they clean trays and put the freshest ones on top, you could have a previously clean but eventually very dirty and dusty tray at the very bottom if you never actually get to the bottom of that stack, because you just keep putting the new and the clean ones on top of it. The same thing might happen in a supermarket too. If you have a display case of milk and every time CVS or whoever gets more milk, you just shove the milks you already have to the back and you put the new ones up front, you're going to have some pretty nasty milk at the end of the data structure, because it's always at the bottom or equivalently it's always at the back. But there's another way to think about lining up data and for instance, this. If you're one of those people who likes to line up outside of Apple stores when a new product comes out, you're probably not using a stack data structure because you would alienate everyone else who is lining up to purchase some new toy. Rather, you're probably using what kind of data structure or what kind of system in the real world? Hopefully it's a line, or more properly or more British-like, a queue. And it turns out a queue is also a data structure in computer science, but a queue has a very different property. It's not LIFO. Last in, first out. God forbid. It's instead FIFO. First in, first out. And that's a good thing for fairness' sake certainly when you're lining up super early in the morning. If you get there first, you want to get out first as well. And so all of these data structures, queues and stacks and bunches of others, turns out you can think of this as just an array. This is an array, maybe a fixed size 4, but it'd be kind of nice if we could just pile trays almost infinitely tall if we have that many trays or numbers. So maybe we want to use a linked list here, but the trade-off is going to be potentially that we need more memory, takes a little more time, but we don't limit the height of the stack, much like Mather's display case might limit the size of the stack, and so these are design decisions or options available to us ultimately. So with these data structures, we've started seeing new upper bounds potentially on what previously was super fast and where we'll leave off today and where we'll hope to get to is on Wednesday, we'll start to look at a data structure that lets us search through data in log end time again. And we saw that, recall, in week zero and one with binary search or divide and conquer. It's coming back and better yet, the holy grail for this Wednesday will be to come up with the data structure that runs truly or theoretically in constant time, whereby it doesn't matter how many millions or billions of things we have in the data structure, it will take us constant time, maybe one step or two steps or 10 steps, but constant numbers of steps to search through that data structure. That indeed will be the holy grail but more on that on Wednesday. See ya then. [MUSIC PLAYING]