[ Silence ] >> Alright. Welcome back to CS50. This is the end of week five, so no lecture on Monday because of the holiday, quiz zero is on Wednesday during lecture. Expect an email in the next day or two regarding the locations for quiz zero. We will separate you based on the first letter of your last name. So if you do not receive this email, just realize you should check the course's homepage before showing up anywhere on Wednesday. So the music you heard on your way in today is a bit of a special request. I received this email from one of your classmate yesterday. Hi, David. Could you play the Motion Sick 30 Lives, either the original version or the DDR remix. It would great whatever you think works better. Columbus Day is my second anniversary with my boyfriend Richard, and who's also in CS50 and is an avid gamer. So this song is sort of relevant. This would totally make our day. So, from CS50 to Joanna and Richard, Happy Second Anniversary. [ Cheering ] [ Applause ] >> So for those unfamiliar, this song, 30 lives, is actually poking fun at a game that I grew up with. It's referring to what's called the Konami code, which is the sequence of inputs on the original Nintendo controller whereby when starting this game called Contra, as well as some others, you could hit up, up, down, down, left, right, left, left, right, B, A, and you would immediately get 30 free lives with which to proceed in the game. And it's a funny thing, 30 plus years later, I've still remembered this code. But for those who've never seen the game, I thought I would pull up someone's playing there for just a moment here so that you too now know the sounds that I grew up with. [ Music ] >> Anyhow it's hard to justify too many seconds of that video. But that is what video games used to look like. Now, for those of you haven't really dug into the source code, the HTML as we'll call it in a few weeks, of the course's own homepage, it turns out that our site, like a lot, actually has a little Easter Egg in it, whereby if you go to cs50.net and hit up, up, down, down, left, right, left, left, right, B, A-- [ Laughter ] >> You indeed do get that. So-- [ Applause ] >> Anyhow, alright. So--oh, speaking of games, I got one more for you. So speaking of games, this is a popular cartoon called XKCD that you should not understand. [ Laughter ] >> So, a little bit of geek humor today. So the CS50 library, we finally started peeling back the layers that it is on Monday. And what I thought I'd do is draw your attention specifically to just a couple of the features thereof. It's not important that you understand exactly what every line of code in the CS50 library does right now. Towards semester's end, it's absolutely a set of code that you should be able to read through and understand. And even if you hit some line that you don't quite understand, you should know that you can type man foo on the command prompt to get the man page for some function foo, you can Google for some answer to a function. But by the semester's end, you should certainly be able to understand it at high level and then dive in deeper using the tools that you picked up along the way. So there are these--there are these types that are declared only in CS50's library. And everything seems to be slightly off there. Let me reset. There are two files in the CS50 library, the first one of which is called cs50.h. This is a header file, which if you've ever looked at, it has a whole lot of stuff, but it's mostly comments, and it's mostly the prototypes for all the functions in the .c file, the prototypes for getString, getInt, getLongLong and so forth. But if you've ever looked through here, you'll notice some new features that I'll just point out so that you seen them before. Well, there's this little construct up here. We've seen sharp define in the context of declaring constants. Well, that's kind of what we're doing here. We're defining the constants called _cs50_h. This is just kind of an old school trick to--whereby you define a constant that you can then check for later to make sure you're not including a header file multiple times. So it turns out you haven't needed to do this for your own programs because when you've used sudoku.h, the only file including it was sudoku.c. And same deal with helpers.h, the only file that was including it last week was helpers.c. But when you get to the point of writing programs whereby multiple files, might very well want to include the same header file, or you need to make sure that you only read it into memory once. Otherwise, you're gonna try doubly defining function prototypes, you're gonna try doubly defining constants and so forth. So this is just a little preprocessor director that essentially is an if condition written in a preprocessing language the GCC understands that says if you've already read this file before, don't do this again. So just tuck that away if curios. We happened to include a couple of libraries ourselves so that we can use certain features later on in the file. We happened to include a file called standardbull.h. It's actually a math file that this notion of true and false is defined, which isn't native to C, but it's just a synonym for 0 and 1, and then there it is. We learned typedef a couple of days ago. And typedef allows you to define your own types. Well, here is how we define the keyword string and it maps to Char Star. So the syntax is a little weird. Things are kind of maybe in reverse order than you'd expect and the star is on the string--next to string itself. But that line of code just says declare a keyword called string and henceforth, it's an alias for Char Star. And then the rest of the file is just comments and prototypes. Hey, here comes getChar, here comes getDouble, here comes getInt and so forth. So all this time, when you've been sharp including CS.h, you're telling GCC these are the prototypes for these functions 'cause they are in cs50.h, so don't fret if I, the programmer, start using getString, getInt and getFloat in my own code, because rest assured, they've been declared already in this header file. Now, what's in the C file? Well, the C file, cs50.c, has the implementations or definitions of all of these functions. At the top notice, we include itself, cs50.h. And then if we scroll down, we'll see what we saw Monday, namely getString. Let me go to the function implementation for getString. And recall that we do this little tricks in our code, whereby conceptually, we start with an empty buffer, thinking, well, maybe the user won't get us any input. But as soon as the users starts giving us input, we start growing this buffer by calling malloc, or its cousin, realloc, whereby we get more and more memory from the operating system to fit more and more characters from the user, but at every step, we take extra care never to exceed the boundaries of our own buffer, our own array of characters, because we, one, use that special function called fgetc, which gets one character from a file, F, from standard in, which is just a special keyword referring to the keyboard. And so, by doing this, it feels a little efficient, and arguably, it is. But by just inching our way along the user's input, we make sure that we always have enough storage space to receive it, so that we don't run into the so-called buffer overflow or overrun attack that we described last time. And it was this if condition we pointed out Monday, that you can look over in more detail if you'd like, that takes care of checking; do I have enough space in buffer. If not, let me double it, copy things over with realloc, and then proceed along my way. And there is actually an optimization toward the very end of this function which we didn't look at on Monday. There's a few lines of code that the comment says minimize buffer. So if you want to get better--a better understanding of how malloc works and string copy or stir N copy which copies specific number, N of bytes from array to another, you can read through this code, and even with the skills we've picked up in piece that's 3 and 4, can probably make sense of what it's doing. In this last few lines of code, it's essentially saying, if I completely overcompensated by doubling the size of the buffer, when really, I just needed one more character, well, this clean up process at the bottom of the function is where we say, alright, give me yet another buffer that's exactly the right size, copy the stuff from the original buffer into this new smaller array, and then free, using this function free, the original buffer. So again, I'm pointing out just some of the highlights of the code, not necessarily some of the specifics. But realize, at the end of this function, you have in memory exactly enough bytes plus one for what the user typed in. And what's that plus one for? [ Inaudible Remark ] >> That backlash 0, the terminating--the terminator--the sentinel value that says this is the end of the string. Now, we won't look at all of the functions 'cause they're pretty much identical in spirit. But we will look at getInt. It turns out, we realized, when writing this library, that you know what, almost everyone of this functions involves getting input from the user and then returning a different data type. Well, why not treat the user's input initially as just a string and then check; is this is an int? If so, return in int. Is this is a float? If so return a float. And so, we actually use a trick in getInt, in getFloat, in getLongLong, whereby we first used getString to get this user's input. So notice here, this is getInt, at the very top of this code, we call getString, and we put that string in a variable called line. And then down here, we use a new trick, but there's not all that much to get int. There's this function called sscanf for string scan F. >> The F means format. So this is another cousin of printf, much like fprintf was on Monday. So sscanf does this; you give it a string its first argument, for instance line, and line is what the user just typed in. You then give it a for a format string, which unlike printf tells the computer what to print, this time in the context of scanf, the format string tells the computer what to expect, at least ideally. So this format string is saying to scanf, hopefully, inside of this string called line, there is an int followed by maybe a character. So maybe the user typed the number then hit the spacebar, or hit the spacebar then a number. But in short, this allows us to tell scanf, hopefully, you're gonna get an int, and then you might also get a charge, just if the user is being--making a typo or being difficult. And so, at the end of this line, we check what the return value of scanf--sscanf. Sscanf tells you exactly how many of those placeholders, the percent signs, were actually filled with values from the keyboard. So what we're hoping is that the user actually only types in an integer, in which case only the percent D placeholder will be filled. And so, sscanf will return 1. That's a good thing. Now, if the user is being difficult or makes a mistake at the keyboard and types in 123X, well, we're gonna capture that X in percent C because it's not a digit, it's not a number, it's a character C. And so, sscanf, in that accidental scenario, is gonna instead return what? So, 2. And so, we can detect that, because 2 obviously doesn't equal 1. And so, if that happens, this [inaudible] block executes, we get rid off--we free the string, we free the memory that we just used to store the user's input, we yell at the user retry. So a lot of you have probably seen this retry prompt. This is where it comes from. And because we're in this intentionally infinite loop right now, the whole process repeats. So to summarize, we use this new function called sscanf that just allows us to parse, so to speak, the user's input and tees apart the various data types. And now, there is one new trick that I glossed over. The first argument is the string to parse. The second argument is the format string, what are ampersand N and ampersand C. [ Inaudible Remark ] >> So, yeah, these are the addresses. Ampersand means address if. That's the address of operator, and this returns the address of N and the address of C. Well, what are those? Well, it looks like--ah, here they are right here. These are the local variables on the stack--which is fine 'cause they're temporary--local variables called N and C, but I can't just do this as I've normally do with printf. I can't just do this, because if I omit the ampersands, what gets passed into a sscanf? [ Inaudible Remark ] >> Right, the values. So by default, when you pass arguments to a function, recall that you pass them by value, which means you get a copy. So sure, sscanf has now been passed to copy of N which has some garbage value cause I didn't initialize it. Sscanf has been passed a copy of C which also has some garbage value. But if sscanf doesn't know where those variables live in memory, doesn't know their addresses, he can change it all they want, but it's not permanent. Because as soon as sscanf returns, the changes are gonna be lost. So anytime you want to function in C to be able to mutate, to change the value inside of some variable like an int or a char, you can't just pass in the int or char like this. You actually have to pass in the address. And what that means, to pass the address and recall, is to pass in a pointer to each of these variables. And the ampersand handles all of the hard work for you. It figures out where in ram N and where ram C are, and then passes in those pointers. Yeah? [ Inaudible Remark ] >> Good question. If the user inputs only characters, will sscanf return one? No. It will return zero in that case, because it will not have been able to populate percent D first. So the percent sign are important here. And if we actually flip through the code for getFloat, getLongLong, they're almost identical functions, except we use different format strings, for instance, percent F for a float instead of percent D. So let's actually use this new trick, sscanf. It's not something you have to use all that often. But when we start to manipulate files as we will for the forensics P set later on this semester, when you have to read in lots of words for a dictionary, it's gonna be useful to understand exactly what's going on. And so, this is one file, scanf.c, which is in Monday's print out, it's from last week, but I thought I'd resurrect it because we didn't bother looking at it, this is a file called scanf1.c, and it's pretty darn short. It doesn't even include the CS50 Library. So that training wheel is off. It doesn't take any command-line arguments as suggested by void. It does have a local variable called X, then we print out something fluffy number please, and then we use scanf. So this time, I'm using scanf instead of sscanf, because whereas sscanf, string scanf, expects to be past the string, S--sorry--rather, scanf, without that additional S, reads the input from where presumably, from the keyboard directly. So we're not using getString. So, same idea, this function, but it takes one fewer parameters. The first one is gone. Instead, we just pass in a format string because the keyboard is assumed as in source for the input. So this is actually sort of the old fashioned way or the non-CS50 library way of just getting an int from the user. You call scanf, you pass in the format string of percent D, and you pass scanf ampersand X, where X again is just some local value, and voila, you have now stored a number inside of X. So let's go ahead and compile this. So make scanf1, go ahead and run scanf1, and I'm gonna type in 1, 2, 3, and thanks for the 1, 2, 3. So that in fact, worked. Now, let's go ahead and type A, B, C. Hit and--oh, something went wrong. So this is why, to paint a picture with a very simple example, we introduce the CS50 library early on, because this is--it looks simple. Frankly, this is a lot simpler than some of the code we've looked at, but its' so easily broken just by a difficult user. And so, the trick we use with sscanf, and two format strings and so forth allows us to detect errors for you and then trigger that retry prompt. Moreover, notice how dangerous functions like these are. This is scanf2.c. It's just as short, but this time, instead of using an int, I'm using a char star, aka string, but I'm calling it char star because I'm not using the CS50 library here this time. So a char star is not a buffer, and this variable name is what's intentionally misleading. Generally, a buffer is like an empty array, but an array that has some storage space. It might be--there might not be anything legit in it yet, but it's at least memory that's been allocated. But with this one of code, char start buffer, what has been allocated for me exactly? [ Inaudible Remark ] >> A little louder. [ Inaudible Remark ] >> So just a pointer. Only a pointer. The only thing that's been allocated by that line of code recall is something we always draw as a square that represents four bytes, or 32 bits of storage space for an array, for a string. What has not been allocated by that line of code is anything even resembling those actual storage spaces for the string itself. And so, let's see what I proceed to do with this. Well, I call printf and just say string please, then I call scanf percent S, and that's okay 'cause I want a string to come in from the user, and I am passing the address to--the address of a string to scanf. But what's gonna happen with this line of code, with scanf? Where is it gonna read this bytes from the user's keyboard into? Where are they gonna be put in memory? What--where--whatever is--recall that when you just declare a variable, who knows what this could be. This could be 9, 8, 7. This is just some random garbage value. So when you call scanf, pass in percent S, so here come the string, the user types in a string, and you say put the string at the address called buffer, well, you know where those bytes that the user types in is gonna go? At this address, in memory. But who knows what's actually at that address. And what almost always happens when you just put something wherever you want without regard for it actually being your own memory? [ Inaudible Remark ] >> So, segufault. So let's try this. So let's run scanf2 string please, hello there, segmentation fault. It's that easy, because I have now indeed, read in the string, hello there exclamation point, put it at some address, some bogus address, but there is no storage space, there is no chunk of memory, there is no array of characters waiting to receive that input. So what does it mean to over flow a buffer? Case and point. If I were more clever here and I knew exactly how to write some attack code with just my keyboard and I could write out the 0s and 1s essentially in the right way with ASCII, I could perhaps compromise this machine or trick the program into executing some code that's not expected, because recall, we did exactly that on Monday by just assuming that what's in argv1 is legitimate. And so, we could potentially trigger that thing we called a buffer overrun exploit. It's that easy unless you actually code defensively, and the CS50 library at least is all about coding defensively and protecting you from threats against some user, but also mistakes from you, the programmer. Yeah? [ Inaudible Remark ] >> It is. So that--is the pointer interpreted as the memory address? So, yes. We can actually see this as follows. Let me go ahead and load scanf2 again with my text editor, and let me go ahead and do this. Before I say string please, I'm gonna say I have a storage for a pointer at percent D. So I'm gonna put this pointer--I'm gonna print out the address that is currently in that buffer array--buffer variable. So let me go ahead and compile this, scanf2, let me run scanf2. So apparently, just by chance, the garbage value called buffer that we have is 2719732. So when I guess 987 before, I was a little off. >> That's the garbage value that happens to be in the computers' memory at this point in time. Now, what does that mean? When I call scanf, and I say put whatever the user types at the address in buffer, that's where it's gonna go, and odds are, if my program just started, I certainly have not called malloc, I certainly haven't asked for memory at that address, so who knows what's gonna be there, and bam, segmentation fault, because I've not been proactive in asking for memory. So, how to solve this? Well, anytime you want to actually get something from the user, well, we could actually do this; rather than allocate just the pointer, why don't we allocate a buffer with heck--a ridiculous amount of space so that the user's string, odds are will fit in there. But even this is not a safe assumption. What if the user knows that I only allocated a million or billion or whatever that is number of bytes for my string, what did they have to do in order to crash or compromise potentially my program? Just one more than that. So again, it's not enough to just allocate more space and try to avoid the problem. You need to code against it, as by checking the length, and again, if you look back and getString or getInt, those are the tricks that you use at least and see to code defensively against either mistakes or malicious inputs. Alright. So what's then the relevance of this? Well, thus far, we have been focusing on arrays. And an array is the first of several data structures we're actually gonna see in the course. We'll call that an array is really an improvement over Week 0 stuff, whereby if you wanted storage space for some value, you had to declare a variable in scratch or even in C. And then if you wanted storage space for two variables, you declare two variables. If you want three things in memory, three variables, four variables, five--and this very quickly started to get redundant, a whole lot of copy-paste starts happening quickly. And then also conceptually, if you wanted to start keeping track of a lot of like items, like someone's quiz scores, well, it seems completely ridiculous to have to have a quiz one variable, quiz two variable, quiz three variable, quiz four--right? Wouldn't it be nice to just have a quizzes variable that you could then access individual members of? And array allowed us to do that. Moreover, an array allows us random access. What does it mean for any array to give us random access to data? [ Inaudible Response ] >> So you can get anywhere in that array using this bracket notation, right? You don't have to start at the beginning of the array and check every value to see where you're going. You can just jump to location bracket 1, bracket 2, bracket N minus 1. So an array is nice in terms of efficiency because you can jump anywhere you want. This was useful--recall, for our sorting. When we had students up here on the stage and we needed to arbitrarily eject the person who is here, move them over here, and vice versa, well, we didn't have to constantly walk back and forth in the array spending all these additional steps. Rather, I just had to move this person to bracket 4, bracket 4 to bracket 0, or something like that. So an array is powerful and that it supports random access and it's fairly efficient, but what's at least one thing an array is not good at? What something an array is not good at? [ Inaudible Remark ] >> Inserting things into the middle, right? One of the first scenarios we ran into with the sorting algorithm was we had these 8 people up here, and we needed to put someone, say, here. So what was the solution? Well, the simple solution there was just eject one person and move them out of the way, or you could say, could you please move down this way? Right. In an array, you would have to move all of the elements because there is no space that's empty or free in the middle. And so, you have to actually spend time and spend the cycles to actually move things out of the way. Well, notice too, what getString really highlights a flaw within array. What's one of the really big problems with an array? It's--sorry? >> Finite Size. >> It's a finite size, which means you have to guess in advance, or predict in advance, how big chunk of memory you're gonna need. And you know what, if you're wrong, that's kind of expensive. Because if you again, read through the line in getString, we do call this function called realloc which takes an array and returns you a bigger version of that array. But underneath the hood, what's really going on is realloc is essentially calling malloc most of time, manually copying everything from the old array into the new bigger array as with the four loop, just iterating from over the array, left to right and copying from old to new, then it calls free on the original array, and it returns to the new array. Like, that's a lot of work. One, you have to contact the operating system, and in general, anytime you ask the operating system for help, give me more memory, you incur a penalty, a performance penalty, that normally you don't notice. But if you're doing this again and again and again, your program does start to run noticeably slower. And also, it's just incredibly inefficient. If you knew or had a hunch in the beginning that you were gonna need more than just a single byte or two bytes, why did you only ask me for one or two bytes? Why not ask for a million bytes and then you'll have enough space? Well, what's the down side of that approach? Well, then you end up wasting memory. So it's really hard to find the sweet spot of getting the right amount of memory, but not too much that you're wasting memory unnecessarily, and not too little that you're wasting time copying everything and reallocating memory all of the time. But thankfully, there is a solution to this problem. Suppose we have a data structure that looks a little something more like this. So here is--well, let's start calling a singly linked list, and this is actually the first of a much more interesting data structure that we now have the capabilities of designing ourselves. So this is just a pictorial representation of something that looks a lot like an array, but what does this thing seem to have in it? Well, each of these vertical rectangles has actually two components. One, apparently is an integer, the number 9, and the number 17, the number 22 and so forth, and then there is an arrow at the bottom of almost everyone of this vertical rectangles. What does that arrow presumably represent? So, an address or a pointer, right, arrows or just a pointer. And in fact, that's true. So it turns out, if one of your design goals is to actually have some dynamism in your data structures so that you can add things in the middle without having to move everyone down the stage, if you want to be able to expand the length of this thing without having to copy everything over to a new chunk of memory, well, maybe the solution is don't ask for memory that is back to back to back to back and then paint yourself into this corner. Rather, only ask for memory one node or one cell at a time, and then just stitch them together somehow. We now have pointers, we have the ability to have one variable point to another. So why don't we just include a pointer along with everyone of our pieces of data, or number 9, or number 17, and weave these data structures together as suggested by this arrows so that we can still walk from left to right, but we can also more easily insert something into the middle or into the end, or to the beginning. Now, to define a chunk of memory that has space for an integer like N and then a pointer called next, what piece of syntax do we need for Monday? [ Inaudible Remar k] >> Struct, right? So we introduce the notion of a struct on Monday. And at the time, we introduced struct for sort of a more compelling real world situation. We want to represent students, and students have IDs and houses and names and all of that. And so, rather than have three variables for one student, just felt a lot cleaner to have one variable for a student inside of which are these various fields. Well, this seems to suggest the same idea. I just need a structure of some sort of struct that contains what two data types, apparently? [ Inaudible Remark ] >> And int and-- [ Inaudible Remark ] >> And a pointer for some sort. Now, a pointer to what? Well, let me propose this. So it's a familiar syntax, even though it's little more--that's--actually, that's very familiar syntax. That's what we saw last time. So here is what we defined as a student. Let me propose this modification. So it's almost the same, but this time, I have an int called N instead of ID. I don't need a house and a name, but I do need a pointer. But because this time I'm not pointing at strings, I'm not pointing at a house, I'm not pointing at a name, I wanna point at another one of these structures, we need a slightly new piece of syntax which is this. Type defstruct, and then node. So I actually have to specify this word node or some keyword here because I wanna be able to reflexively refer to myself inside of the curly braces. So if you want a struct like this to contain an integer N and a pointer as suggested by the star, but that pointer needs to be able to point at, not a string, not a house, not a name, but at another such structure. The Syntax is to simply say very explicitly, this pointer, let's call it next, is going to be a pointer to a struct node. Now, this feels a little redundant here. But at the bottom, recall that the very--at the very end of a struct, just as we do with student, we give the whole structure a name, you don't need to say node down here, but it's a nice short hand notation, because if you don't include that name at the bottom, you then have to, henceforth, call your structure struct node in code. Every time you want one of these things, you have to struct node, struct to node, struct node, and it just gets very tedious. If you wanna just call these things nodes, you can--just like we did with student, redundantly saying node down here at the bottom outside the curly braces. So what's the takeaway? The takeaway is just that we have now implemented in C the code with which to implement anyone of these individual rectangles. So that begs the question, who cares? Well, when you actually store data on a hard drive, you probably don't know in advance how big that essay you're writing is going to be. You don't know how many pages it's going to be. When you check your email, you don't know how long an email you've just received, and therefore, you don't know in advance how much disk space is gonna be necessary if you download it locally to store that email. So it turns out on a file system, on Mac, in a PC, in Linux server, you have these things called linked lists being used all over the place to solve very common, very compelling problems, and among which is the implementation of a file on disks. So recall in Monday, we introduce that very simple example for file IO. IO, just this geek speak for input-output. >> So File IO refers to taking files as input and producing them as output. And we did that with that very simple example. As a little sanity check, what function do we use on Monday to write data to a file line by line? [ Inaudible Remark ] >> So we did use fopen to open and create the file, but then line by line? Fprintf. So instead of printf which prints to the screen by default, we used fprintf which prints to a file. So on Monday, we introduce this ability to actually store stuff on disk, which frankly, is just useful in general, 'cause otherwise, almost every program we were in is kind of--completely a femoral. You've quit running the program and bam, any data you've produced, any input you've provided disappears. But now, we have the ability to write things to disk, and certainly is this an omnipresent feature. So for problem set 5, which will--is not going up the door this week--you actually have the week off for the quiz, but will go out next week--you'll actually be handed a number of problems involving file IO. And one of them is going to involve recovering, as I predicted in week 0 or 1, recovery from some JPEGs that I am accidentally going to erase over the next several days, having taken them on campus with a digital camera. And the goal is gonna be to recover these files. Well, these files, thankfully, we're gonna simplify a little bit, and we're gonna make sure to wipe or to zero or to format or erase any number of synonyms here. We're gonna erase the compact flash card that this digital camera has, so that for convenience's sake, when we actually take these photos, even though the camera might not know exactly how big that photo is gonna be in advance based on the lighting and the colors we're trying to store, it's at least gonna store them from top to bottom, it--well, rather it's gonna store them contiguously as though they're an array. But this is not the norm. On your computer, you probably delete files all the time, either by dragging them to the trash or because you have temporary files that things like Microsoft Word are using. And so, hard drives become what's called fragmented. And how many of you have heard the term defragment your hard drive? How many of you actually do this? So if it's actually--reasonable number. Frankly, these days, it's probably not all that necessary, especially now that hard drives have gotten much faster, and they've also gotten much better at managing these details themselves. But linked lists, this topic we're gonna start now exploring, lend themselves to the implementation of files on disk, because suppose you do start writing some essay, and it's pretty short initially, so Microsoft Word creates a file on disk that's ye big, then you come back the next day and you add another chapter to your essay or your thesis. And so, now, you need this much more space, but you know what just happened? Between yesterday and today, you downloaded some MP3s from the internet, and where did they go? They went to the free space on your hard drive, which at that time was right there. So now, you have first chapter of your thesis here, you've got some illegal music here, and now, you need space for the rest of your thesis. Well, this would be unfortunate if Microsoft Word said, "Sorry, you should have finished yesterday." Right? So instead, the computer's got to find some space elsewhere. And that space, you know, it's probably over there. So this is why files become fragmented, because if you don't know a priori how much space you need just like in RAM with erase, well, thankfully, the file system, the operating system, is going to say, "Alright, I'm gonna put chapter 1 here, chapter 2 over here, but I'm going to implement the equivalent on disk of this thing called a pointer." A pointer is generally referred to in RAM only, but you can certainly implement the same idea on disks so that you can weave together big files, even though they are fragmented across multiple locations. Now, what are some of the real world implications of this, and why come up with such a dramatic picture as this one, CSI? Well, if you had files potentially all over your hard drive and wove in together just with these references, these pointers on disk, it makes it very easy for traces of files to be left all over the place. In fact--and this is more detailed than we need to spend time on today. In fact, when you ask an operating system for disk space, even if you need, let's say, 200 bytes for 200 keystrokes, for efficiency reasons, operating systems generally return you a chunk of memory in certain block sizes. You might actually get back 512 bytes just to store those 200 bytes. And the remaining 312, guess what happens to them, completely wasted. Because the operating system realizes, this probably isn't the common case. These days, people download movies and music and write big things. So having lots of small files isn't generally the common case. So this wastefulness doesn't happen all this time. But that wastefulness between the space you ask for and the space you're actually using, is called slack space. And so, in the world of forensics, digital forensics, CSI and these kinds of things, these are the kinds of places forensic or agents look for trying to find remnants of data. 'Cause what's really happening? Well, just to motivate next week's Pset, and also, perhaps send you scrambling to start deleting stuff from your hard drives properly, recall from week 1 that a hard drive is implemented in a bunch of different ways. These days, you have solid state drives which are flash memory. Idea is similar in sphere. The same things happen, 'cause most this is a software issue, not hardware issue, but I'll draw it old school like this floppy disk we tore apart a couple of weeks ago. Inside of a typical hard drive is a platter. It's a metal disk. And just like we saw in that cheesy video a couple of weeks ago, you have all of these bits somewhere on the hard drive, 0s and 1s, but those 0s and 1s are implemented how? A little magnetic particles, right? That if it's oriented this way, it's 1, if it's south north instead, it's a 0. So something arbitrary like that. Well, you probably know--and I think I might have mentioned earlier--that when you actually store files on a disk, they can--they end up somewhere, say, right there. That is some file, some JPEG that I've downloaded or photograph I've taken. But suppose then I delete this. Well, what does it mean to delete a file? We'll, recall that somewhere in your computer, somewhere in the operating system, there is some kind of table that essentially is like a two column excel spreadsheet where the first column is the file name and the second column is the location. And if this file is called foo.jpeg, it might be at location--who knows what this is? One, two, three. That is byte number 1, 2, 3 on the physical hard drive. So it's at location 1, 2, 3. Well, most of you probably know this now. What happens when you actually delete that file by dragging it to your recycle bin first or trash can? Yeah? [ Inaudible Remark ] >> Okay, perfect. So when you drag a file to your trash can or to your recycle bin, depending on what your operating system calls it, these 0s and 1s, they actually don't do anywhere, right? You certainly don't get rid of them. Otherwise, you'd have perpetually declining storage space. They actually stay there, but what happens is the computer just forgets where they are as by just erasing this row in this table. And in fact, the problem's a little worse. Most of you probably know that if you want to delete something, you can't just drag it to your recycle bin or trash can, because where is it at that point? [ Inaudible Remark ] >> It's in your recycle bin or your trash can. You actually have to go to the special menu and say empty recycle bin or empty trash. So it's in that second step when you actually proactively say delete this information, or you wait enough days or months that the computer just does this for you, that this in here gets erased. So when I accidentally format the compact flash card that has next week's photos on it, what's really gonna happen is all of the photos are still gonna be on that compact flash card, but the computer is gonna have no idea where they are. And so, what I'm gonna go ahead and do is make a forensic image of this compact flash card, which means plug it in to a special computer, run some special software that just blindly reads in all of the 0s and 1s from the flash memory which, even though its not circular and metallic, actually has zeros and ones implemented somehow on it, and what you're gonna get is a file that is, as they would use in police scenarios, a forensic copy of that media, so that you can then proceed to recover those JPEGs. And you'll see exactly how this is done in the spec itself, but it's going to amount to a strikingly easy process. And this piece that actually grew out of a real world scenario where three or five--some number of years ago when grad school, a friend of mine--this happened to a friend of mine. He didn't accidentally delete the files, but his compact flash card got corrupted somehow. And it was just, oh, there was something mechanical happened with the camera, so it just got a little corrupted. And so, the source code you guys are gonna write is exactly the code that he and I put together in order to recover his JPEGs several years ago. So there's easier ways to do this, right? You can probably just buy a piece of software to do this if you really need those photos back, but you'll be struct by think, by just how relatively straightforward it is once you understand the concepts. Now, as in the side, just to actually put some scare into folks, it is this easy to erase files from a disk. It is this easy to find remnants of files inside that stuff called slack space. So unfortunately, most operating system don't know how to do this very well. Apple has gone a bit better than has for instance, Windows. But if you've ever wondered what's--let's see, this thing here, top left, in a Mac, you have secure empty trash, what does this option do? These are the empty trash. [ Inaudible Remark ] >> Yeah. So what secure kind of--the Layman's way of explaining this, what that process actually does is it, yes, erases that, and it also doesn't quite erase these, but it turns them all into--what could you do? Just all 1s, or something like that, or all 0s, or maybe better yet, all random. But the problem is--and one of the--we're gonna have you read an academic paper of sorts wherein a buddy of mine from MIT a few years ago, where he actually spent a good deal of money on eBay and similar websites, bought up a whole bunch of hard drives that people had just sold because they didn't need them anymore, and most of these people were nontechnical, or were people who had put their trust in the best buys of the world to often recycle hard drives. But if you've ever read, very often, realize it's a lot cheaper for us to just throw out the drive hard drives than actually securely erase them. So this paper you'll read is meant to scare and it's meant to open your eyes to the fact that he found on hundreds of hard drives thousands of credit card numbers, financial documents, pornography, health document stuff that is may well be in some subset, on your own computers or your own old hard drives. >> So why don't we go ahead and take a five minute break, let that simmer. We'll come back. Okay. So we are back. This is a sneak preview of one of the files that is among your printouts for today. A lot of it is pretty straightforward, but it introduces some new syntax arrow notation. So before we dive into this, which at first glance can actually look a little overwhelming, even though if you take it step by step, it actually is gonna be very--can perfectly consistent with the verbal story we're about to tell. I thought I would first make this a little more human, for which, if you humor me, we need six volunteers. The first one gets to be first literally. [ Inaudible Remark ] >> You would like to be first? Alright. I need five more volunteers. You of course, need to be comfortable on camera here. How about--yeah, ideally five people we've not have before. Come on up. Let's do five new faces. Yeah, come on up. Alright. And I think we need one more. Don't yawn. Okay. The hand'll go up. Alright. And last, yeah, come on down. Let me see if we're out of paper yet. Alright. And if you could line yourselves up right here roughly where you are in the same order as the board. Let's see. Yeah, I think we're good. Yes. We have--no, we have a number for you, number nine. Alright, come on up. Yeah. Alright. So--no, don't follow me. Let's do it over here so we'll all stay in--within view. Okay. So if you wouldn't mind, first of all, order yourselves exactly as these five numbers on the board, and then go ahead and use your left hands as I pointer. We got a gap in our link list there. [ Inaudible Remark ] >> That's okay. Alright, much better. Alright. I'm gonna try to stand out of the way. And actually, go ahead and spread out just a little bit so that pointing is not awkward. So, few feet each of you. Alright. So now--no, [inaudible]--there we go. Okay. It is my first time with this demo too, actually. So now, go ahead, and with the arrows, implement with your left hand. Okay. Excellent. So yes, and you get to stand there with just your hand dangling. So we now have a linked list. And actually--this is actually frankly, I think a lot more easy than working through this in code, because we can actually manipulate the pointers and the values a little more easily. So let's first ask a question before this gets too awkward up here. One is, suppose now, I'm the program, and I'm storing these numbers, not in an array obviously, but as a linked list, and suppose I wanna search this list, and I wanna search for the number 50. How do I go about using a linked list, not necessarily in code, but conceptually finding the number 50 like we did two weeks ago up here? What do I do? I am the program. Okay, sure. [ Inaudible Remark ] >> Okay. So I'd--frankly, because we literally have these pointers pointing at the next element, the next element, I cannot use what fancy technique that we've using since week zero. [ Inaudible Remark ] >> So I can't use binary search or divide and conquer more generally because I have no way of just jumping to this middle of the list because I don't know where it is. The only thing I know about at this point in the story is, as the keyword first suggests, is the very first note in the list. So the means by which you implement in the--in actual code a linked list, is yes, you could remember every address of everyone of this so called nodes. How could you remember so many addresses? I could have an array of pointers, but that would completely defeat the point of this dynamism. So we don't wanna take a step backwards and implement the pointers, your left hands, with an array just so we can access them more quickly, 'cause again, we would be regressing back to an array itself. So I need more dynamism, but I only need one pointer. So first here is the only piece of data, 4 bytes, 32 bits, that I, the main function, actually need to hang on to in order to hang on to this whole list in memory, because as the hand suggests, I can get to any node now, so long as I remember just one of them. And perhaps obviously, I need to remember the first, because if I remember, say, the second, and if you're pointing--if you don't mind just over to number 17--I mean, this now, we've obviously orphaned poor number nine. So just--in terms of common sense, we need to start the list at the first node. So here we go. So go ahead and fix your pointer and your hair, and we have to now find the number 50. So I'm the program, I have a pointer here, I'm gonna need some new syntax in C to actually follow a pointer. On Monday, we use dot notation to get at the data member of a struct. These are structs, but we'll see--we're gonna use slightly new notation today. We're gonna use literally an arrow, a hyphen, and then an angled bracket so that it actually looks like an arrow in code, but more on that in a moment. So I'm gonna essentially follow this--these pointers just in a four loop, and I'll use some new syntax eventually to actually do this. But conceptually, it's one step here is this 50, is this 50, is this 50, is this 50, in this 50, and now, I see his left hand is not actually pointing. In fact, as the picture up there suggests, with the little antenna, he's null. His left hand is null at this point, as suggested by just dangling there. And so, that means we're at the end of the list. Alright. So at this point in the story I know 50 is not present in the list. So let's do something a little more interesting. Suppose I now want to--and you guys stay there--suppose I want to, let's say insert into this list. Come get a 7th volunteer? 55? Anyone? Yeah, come on up. Alright. So now, I have number 55. So I have just called malloc in class. Malloc int. alright. So now, we have--what's your name? >> Jacob. >> Jacob has been malloced, and now, we have another struct in memory, and he, as a malloced chunk of memory, I make sure to malloc, not as Jacob, but enough RAM for both his number and for his left hand. So now, we have what, 8 bytes; 4 bytes for the number, 4 bytes for the pointer. So Jacob here, like every one of these guys here, represents a struct whose total size is 8 bytes. Now, malloc just returns to me a chunk of memory. I can--let me borrow this back. I can very easily--as we'll see in code--put a number inside of the int inside of the structure. But now, I have to do something with his left hand, which right now, is just some garbage value. Who knows where it's pointing off to, right? Same story as before. So how do I do this? Well, malloc returned to me a chunk of memory, so I'm gonna go ahead and play the role of new pointer. So I am now point. I need to allocated myself in code as we'll see, just a new pointer, 32 bits that just remembers where Jacob is. So right now, his left hand--if you could just point off awkwardly--I am gonna point to him because I called malloc, but he--his pointer is just dangling somewhere. So let's figure out what to do with him. Well, let me go ahead and just move him over here, if you don't mind. Because if I want to now insert Jacob, number 55, into this list, what do I have to do? Well, first of all, I'm gonna keep around this pointer and remember where Jacob is, but then, just as before, I have to check. Alright. Does he belong--and stay here for just a moment. Does he belong here? Nine? No. Nine is greater than--55 is greater than 9. So I keep going, I keep going, I keep going, I keep going. Can you meet me at the end of the list? And 34, I have to keep going, but now, this pointer is null, so this actually is a pretty trivial when insertion. What does it take if--Jacob, you don't mind coming over here. So what is it gonna take mechanically to insert Jacob into the right part of the list? I now know he belongs here, what code--what lines of code do I need to implement? [ Inaudible Remark ] >> Yeah. Go ahead. What's your name? >> Mark. >> Mark. Okay. So what--you have the right inclination. What--it's just expressing words. >> In Array. >> So-- >> A pointer. >> So you know--you need to update your left hand, the null pointer at the moment, it's currently zero, need to point to Jacob. And now, what does Jacob need to do? Because this is just dangling. So what should be initialize these two? [ Inaudible Remark ] >. So null. So you can just point downward. So it was really three major steps. One, I had to do some four looping and find the right location. But if that didn't point in the story, I'll just had update 2 pointers. I had to update Mark's left hand, which was null. Now, he is pointing at Jacob, and Jacob had to make sure to initialize his left hand to null, because if he were just kind of pointing out into nowhere, what's the risk in the future? [ Inaudible Remark ] >> So there'll be another link in the list. But if he is just pointing at some garbage value, what's the implication? Well, it means next time, if I start traversing this list, trying to insert the number 65, I would go no, not here, not here, not here, not here, not here, not here, and then I'm just gonna fly off the end of this list going to wherever he is pointing, which is probably an invalid memory address, which means what's gonna happen? [ Inaudible Remark ] >> Segfault. Alright. Let's try one more, 'cause--at least one more 'cause it's gonna get a little trickier than this. Let's go ahead and insert--you guys can stay there. We're just gonna keep adding to the awkwardness up here. So I need someone for number 5. Alright. Come on down. What's your name? >> Kenny. >> Kenny. Malloc, Kenny. Alright. So I'm going to store in Kenny's N--the field's called N and his struct, the number 5. His left hand is just some garbage value right now. I'm gonna do the same story, so if you wanna just kind of tag along for a moment. I start off here at the beginning. The--my list is still at the beginning here. So I check the first element. Nine. So Kenny belongs before number 9 'cause he's number 5. So what do you propose we do to get him into the right place in the list? Okay. So--and what's your name again? >> Herb. >> Herb. Okay. So Herb is pointing now at Kenny-- [ Inaudible Remark ] >> But someone pushed back. [ Inaudible Remark ] >> Okay. So, right. So let's try that first though. So Herb is pointing at Kenny, but now--what's your name again? >> Karen. >> Karen. Who is pointing at Karen? We've lost Karen, right? We've just leaked this entire string of memory, this entire list of memory, because Herb pointed too soon at Kenny, because now, we've orphaned the rest of this list. Right? Case and point, memory leak. And this is a big memory leak. Now, we've lost the entire chain, and there is no way of getting back to this now. So let's fix. What do we have to do first instead? >> I will point at her. >> So Kenny is gonna point at Karen first. So now, it feels redundant. Now, it's like we've got two heads of this list, two starts of the list. >> So now, what do we do? So now, Herb just has to repoint himself, not at Karen, but at Kenny. Alright. So, one more. This one's gonna be a pain. Alright. So who would like to be the most difficult node, 20? Yes, in the back. Come on up. Alright, what's your name? >> Max. >> Max. Malloc, Max. Alright. Let's get this right the first time. Okay. Very nice. Okay. So here we go. I'm gonna do my four loop thing where I try to find the location in this list to insert this new value 20, and nope, doesn't go here. So I keep going, so I keep going, and oh, it belongs somewhere in here. I don't wanna keep going, 'cause what just happen if I went too far, what can I not do at this point? [ Inaudible Remark ] >> I can't go back, which means I'm not gonna be able to update--what's your name? >> Abby. >> I'm not gonna be able to update Abby's left hand, which means yes, I could insert Max here before-- >> Rachel. >> Rachel, but then, I can't fix the beginning of the list. So at this point--and this is gonna be the more--most complicated of the several in code--I--doing all the leg work here, I kind of need two pointers, my left and right hand, to point at both of these guys temporarily, as well as Max, so that we actually can do this correctly. It's only gonna be three steps, but order of operations is going to be important. So what do you propose that we do first to insert max, assuming I now am pointing at those two? [ Inaudible Remark ] >> Okay. Max is gonna point to Rachel. Okay. And Rachel is again is 22. Alright. So that's step one. Okay. So now, you--Abby? Abby, 17, points at Max--and now, conceptually, if you don't mind kind of shuffling along. Are we good? Good then? [ Inaudible Remark ] >> We'll see--it actually maps pretty well to see code. But it's a little more clear, believe it or not, to actually do it with humans, I think. So we'll do that in just a moment. Okay. So anymore updates needs to be done? Alright. So let's now make this even more difficult. I don't think we need any more humans. In fact, it's time to start deleting some of you guys. So let's go ahead and delete from the tale. So, number 20. Okay, Max--I'm sorry, your time is up already. So wait, don't just get out of the list. We need to now delete Max. So how do I delete the number 20? Well, I'm--be the four loop, so I'm gonna have to again, start at the beginning. And here's the downside, right? We're already paying the price for this dynamism and this flexibility of memory management. What's the cost that I keep incurring with every one of these algorithms? [ Inaudible Remark ] >> I have to traverse the whole list, right? It's as though we now have operations that used to be constant time, order of, O(1), where I could just jump right where I want in the list. Now, it seems that every one of these insertions or deletions is at least O(n), because in the worst case, we have to put someone at the very end of the list. So again, there's this theme of tradeoffs in the course and in CS in general, and here's another manifestation of that. So I wanna delete 20. So here we go. This is not 20, this is not 20, this is not 20, should I take the step? >> No. >> So not yet, 'cause if I do, I'm not gonna be able to update the list, and I'm again, gonna orphan part of this list. So now, I am at 17 and 20. I'm gonna go ahead and do what? Shall I free Max? >> No. >> Okay, why? [ Simultaneous Talking ] >> Right. So if I free Max--now take--just to manifest the--can you step back a bit? Suppose I've called the function free and I've passed it Max, as Max being a pointer to the structure now. Now, technically, if I don't do anything else in code, odds are Max's left hand is still pointing where? 'Cause free, just like in the world of file systems and forensics, doesn't actually reclaim that pointer and change it immediately, but it's now no longer safe to trust that his left hand is pointing at any legitimate value. So we better go ahead and undo this and make sure, if we're gonna delete Max, that we first remember where he's pointing. How do we do this? So Abby could point to him. But what just happened? [ Inaudible Remark ] >> Now, we just orphaned-- [ Inaudible Remark ] >> Well, not the rest of the list. Abby is now pointing to Rachel. But who did we orphan? So now, we orphaned Max. So now, we again have a memory leak, right? It's not as bad. We've just kind of lost track of Max, so you know, just--big deal, just one node. But it's still 8 bytes. And if we do this frankly, again and again, that's gonna start to add up. So this isn't good enough too. So you know, I'm gonna have to get involved here, right? I'm gonna have to have a special temporary pointer, maybe new pointer predecessor pointer, what--pred pointer is what I used in the picture there. So let me help out here. So I'm gonna point at Max, me being a special alloc--special local variable, that's a pointer. Now, Abby can go ahead and point as before. So now, the list is fixed, and now, I can call free on what I'm pointing at, and bam, Max disappears, stage left--hand round of applause for Max if we could. [ Applause ] >> Okay. So now, Max is free--actually, you can hang out here with us if you want. It's--we don't have to banish you. So Max has been freed, but the list has been updated, so let's do two more. So let's go ahead now and delete--let's do an easy one. Jacob? Alright, Jacob. I want to free Jacob. So I wanna free the node containing the number 55. So here we go. Four loop, nope, not 55, nope, nope, nope, nope, or O(n), found Jacob. And so, what do I do here? Free Jacob. But what's just happened? [ Inaudible Remark ] >> Now, Mark's left hand is again pointing at some garbage value. He is technically pointing at the memory previously occupied by Jacob. But as soon as we call free on Jacob, that memory might be get reused, so I can't take this N minus 1 step. I have to pause here, and before--if you can come back into the story for moment. Before I actually free Mark--free Jacob, I should probably point at him, then update what first? Mark's left hand. So now, your left hand goes down. I have still retained the pointer to Jacob. And so, now, I can call free on the pointer that I have retained. And so, now, Jacob is out of the story, but you can stay there if you want. So let's free one more just for good measure. So Kenny is number 5, you're about to be banished. So free Kenny, how do I do this? Well, technically, Herb is pointing at Kenny, so I can very easily find--okay, here it is, the number five. So now, let's make the same mistake. Free Kenny. So if you could step back. What's the problem, obviously? [ Inaudible Remark ] >> So what's that? So Kenny is now--well, Kenny has been freed. So technically, he is not orphaned. But who-- [ Simultaneous Talking ] >> The rest, yeah. So now, even though his left hand might, as an artifact, still be pointing at number 9 onward the rest of the list, it's no longer considered valid. So I can't touch that. So I need to again use this trick. I need to get involved with some temporary storage, point at 9, point at 5, and now, I can free Kenny. And now, what do I do here? Herb is still pointing--yeah, follow Kenny. Yeah. So Herb is still pointing at Kenny. And so, what do we need to do to fix the list? [ Inaudible Remark ] >> Now, we update Herb to point to 9 and 17, 22, and so forth. So frankly, it's--you know, it's easy to make mistakes as we did several times here, but it's only two or three operations, and let's go ahead and see. This actually worked at a lot better than I though it would. Let's go--a quick round of applause for these guys, and let's now translate this to code. [ Applause ] >> If you guys could see the TFs on the corner there, that would be great. You can keep these if you want as a souvenir. So let's actually try to translate this now to some actual implementation, and we'll actually see that even though we're focusing for today on this notion of a linked list, now that we have this power to stitch together arbitrary objects in memory, we can really start to do anything we want. So case and point, when you're implementing something, say, entrepreneurial in spirits, something like Google, something like Facebook, where you actually have some real world problems where you have lots of data very often these days, and you wanna be able to search it more effectively, and you're just throwing an array at all of the web pages in the world probably isn't going to solve that problem of finding them very easily. Facebook is still growing at a ridiculous rate, storing every Facebook user in a big array of memory, probably isn't it gonna cut it as well, but no matter. Now, we have this ability to represent a Facebook user with struct, a web page with struct. And using things like pointers, we can now weave these things together in any manner we see fit. And so, we'll start leveraging this all the more to come up with even more effective and clever designs. So let's go ahead and take a look at some code here. In the list 1.h, we have, as promised, the structure I need to use. So every time I malloc a student from the audience, we were essentially asking for this much space. It's 64bits or 8 bytes, the 8 bytes--the first 4 bytes being for the N, and then pointers at least on the cloud are 4 bytes as well or 32 bits. So every struct is probably 8 bytes total. That's how much space each of the humans on stage here represented, except for me who was some special main function, and Herb. How much space was Herb taking up at the very first of the list? So he was just 4 bytes, 'cause Herb was just a pointer, and that's why I intentionally chose yellow paper for him. He was a little bit different from everyone else, because he did not have storage space for an integer. He just pointed at the start of the list. And so, when you implement a linked list, it turns out, all you really need, at least in advance, is one node. So in this program, list 1.c, which is among next week's printouts, notice I've declared for convenience, one global variable, I've called it first, just like we called Herb first, it's a pointer to a node, so node start means pointer to a node. And so, this is how you translate Herb standing over here with his left hand pointing to actual C code. Now, it's initialized to know--which means when Herb first came up, he was like this. But now, I am gonna actually start doing something with this list. So this program is just a relatively simple program in terms of usage, that allows me to play with the linked list. And so, I needed a trivial menu. So what you are seeing now is the list 1 program running. I've just got an infinite wild loop listening for user input, 'cause I just wanted some mechanism whereby we could add and remove numbers from an actual linked list and see them. >> So I'm gonna go ahead and type 3 and hit enter. This is gonna trigger an if condition or a switch case in my main function that's gonna allow me to insert a number. Let me go ahead and insert the number, let's say number--who was the first number before. I'll try to make the same story come together. We had number 9 was the very start of the list. So let input 9, so the list is now 9. Let me go ahead and insert 1 more value, and the next value in the list was 17 before. So now, the list is 9 followed by 17. Let me do one more for good measure. Let's do 34 who was the end. And now, the list is this. So the program itself is not interesting to run or to play with, 'cause all it's gonna do is what you tell it. What is interesting here is to look at the code. So let me go ahead and open up--this is list 1.c, and let's see how an insertion actually works. Well, first, the main function. It's pretty does what I said. It's just got to do a wild loop which his common when you wanna get user input but sanity check it. I've got a menu that I just printed with printf., I'm using getInt to get the user's command, then I've got a switch statement here, and notice, this is a stylistic decision. In the past, you've probably seen us hit enter after the colon in a case, so that these things would take up multiple lines. But frankly, aesthetically, this is pretty pretty. And other--it is a little cleaner looking, this code. So there's no reason to insert line breaks as you might be always if these things can be written very nicely on one line, stylistic decision. Let me scroll down, and it looks like--okay. Apparently, here in the end is how I'm gonna free the linked list, but let's not get ahead of ourselves. So suppose I do type number 3 to insert an element. Let me scroll down to the insert function. Alright. So there's a lot of code here, but I'm gonna draw your attention to the highlights 'cause we'll return to this before long. So here is the insert function. Here is--in the first few lines of codes--where I try to instantiate the node for the number. So, when I stood up here and I said, "Malloc, Max." what happened were these 3 lines. I declare a local variable called new pointer. So this was me taking on the role recall of the new pointer 'cause I just needed one of my hands to be pointing at each new audience member. So now, I've called Malloc, size of node. So it's with this call here, Malloc size of node, that I get back those 8 bytes, where I got back Max and Jacob and other volunteers from the audience. So that's what's going on in these first few lines, but I'm doing a sanity check. If malloc returns null, something has gone wrong, probably I've asked for too many volunteers, we're out of memory, so I'd better bail at this point. And so, I arbitrarily just decided to return. I'm just not gonna try inserting. I'm not gonna inform the user. I'm just not gonna risk following a pointer that's null. So now, I initialize the node. So when I handed--when I handed someone their--actually, maybe I should hang on to some of the numbers. So when I handed the volunteer a number, that corresponds now to these several lines. So here is that new piece of syntax I promised. I first asked the user number to insert, I used getInt to get the int. So here, I was the one playing getInt 'cause I handed the piece of paper out. But here is the new syntax. New pointer at this moment of time, was pointing at the volunteer, at Max or Jacob or someone of our other volunteers. New pointer, arrow N means take this pointer, new pointer, follow it, and then dive into the field called N, and put whatever you want there, in this case, the return value of getInt. Meanwhile, follow new pointer, use this arrow annotation, go into its next field, and have each volunteer put their hand by their side. So those two steps; handing the human the number, and then having them put their left hand by their sides translates to these two lines that are involved this new piece of syntax N arrow notation. And frankly, this is finally the first piece of C syntax that would make conceptual sense. The arrow implies go there, point there. And so, we have something that's at least pretty reasonable this time. Alright, so now, what's going to happen next? Well, I first need to handle a few cases, and this is where it's not easy to code this for the first time because you have to consider all these possible cases, sometimes in a certain order, but I realize through trial and error, that the first thing I wanna do is check, does Herb actually point to anyone form the get-go? Or is he just standing here like this with his left hand down, in which case, he exist, my linked list exist, but it's of size 0. So with this line of code, I'm checking, is Herb's hand pointing down? If so, inserting Max or whoever has come up to the stage at this point is actually pretty easy, because I've already malloced Max or someone, I now have my new pointer pointing at him, so all I need to do is ask Herb, Herb, can you please point at this struct that I just allocated? And so, that's the first case here, and it's easy because there's so little work to be done. Well, let's try a slightly more interesting one. Suppose that the number I'm trying to insert into the list, let's say 22, so suppose that the new nodes, new pointers N field, which is 22, is less than the first pointers N field. Now, this might seem a little contradictory, because recall that Herb does not have what? [ Inaudible Remark ] >> Herb does--well, Herb does not have a value. He does not have N. Herb is just, recall, this special variable called first who's intentionally was written on yellow paper cause he's only 4 bytes, Herb does not have a value N, so it seems that we need to take this arrow literally. We don't have--we don't wanna assume that first itself is a node. First is a pointer to a node. So the fact that it says first arrow means follow the arrow, follow the left hand, and then you arrive at some chunk of memory, at that point you dive into the N field. So we had number nine standing here at the very beginning of the list. So when we followed Herb's first pointer, we then ended up in this node here and looked at the value on N. So it's okay that Herb himself was not a struct, just a pointer. You have to follow the arrow literally. Same deal here. New pointer--again, I chose this yellow intentionally when I malloced Max or someone from the audience, I'm just a single pointer pointing at this new chunk of memory, so this is consistent with that story. Now, how do we translate those hand updates into code? Well, my new--if the node I've just allocated from the audience be--is suppose to be the first node in the list because it's value is less than the current first value, what do I need to do? Well, if the new node I'm inserting actually belongs over here at the very start of the list, well, what do I need to do? I need to ask--I need to take my new node and point at the former start of the list. So this is when I inserted--it was Jacob, number 5. I might be confusing names. But this is when I inserted--no, Kenny--when we inserted 5 at the very start of the list, I had to say, Kenny, please point at the current start of the list, which is this first line of code, and then I asked Herb, hey, Herb, would you mind pointing now at Kenny? Hopefully, I got the names right there. Yeah? [ Inaudible Remark ] >> So, correct. I wanna do new pointer next gets first, because if I first updated the first pointer, what happened? That's when we orphaned the whole list. If we update first prematurely, we lose track of everyone else. So we wanna update the beginning of the list, the pointer called first, or Herb, at the very last moment so we don't orphan everyone else. So if we reverse these steps, we would actually incur the same bug that we did somewhat intentionally earlier. Now, this case here, I'm gonna wave my hand at some of it. This is when frankly, these steps gets a little annoying, and you only get it by thinking it through and trial and errors, starting with very small list. But in this frankly, when we did this with humans, it was pretty straightforward. It is actually pretty straightforward in code if you read through all the cases, but let's not dwell too long on this one because we'll revisit it. So what do I next? If it--if the node does not belong at the--there were two cases we handled already. If the list is not empty and the number doesn't belong in the beginning of the list, I am gonna go ahead and collapse the remaining two cases, middle and end, into one big list. So this wild loop here is--mounts to my--I called it four loop before. But this wild loop does the process of looking and looking and looking and looking for the right place in memory to insert this node, then I have a few cases. One--and we didn't even bother with his verbally. If there is a duplicate, I am gonna arbitrarily decide, Max, go back to your seat. We already have one of you in the least. We don't watch another duplicate in the list. But that's not a feature we had in the human example, but I do have it here on code. Else check for insertion at tail. So what does this mean? Well, this is the line of code that's interesting. For now, notice that pred pointer, assume that pred pointer is one of my hands that I was just taking for granted, pointing, pointing, pointing in each volunteer. If my hand starts pointing at node, what's the implication conceptually? I'm just at the end of the list, right? If I am now pointing at nothing, it means I went too far. But intuitively, then that means that the new node I've alloc--malloced from the audience must belong at the tail of the list. So this is why--and this part of what wave our hands out for today. This is why I have this notion for a pred pointer, previous pointer--new pointer and pred pointer are essentially my left and right hands. When I was doing this walking through the list, I was keeping track of where I was. I need to do that, so that in code, I actually remember where folks are. Let's take a quick look at find, if only because it's nice and straightforward now. So let's assume, even though we glossed over that last big case, which we will come back to before long. Suppose my goal in life now is just to find the number 50, or to find any node in a particular linked list. So here, I prompt the user, enter the number to find, I get the N from the users, store it in a variable called N, and now, notice this trick. So here is a very common approach with pointers--if you have some super special pointer, like this one called first, which again is global, and that's Herb. >> That's Herb pointing at any list we currently have in memory. I'm gonna just declare a pointer, PTR, that's initially identical to Herb's value, because I just need--like I did visually, I just need a finger to point at each of this nodes with, and we're gonna call that PTR here. Now why PTR is not null? So in other words, this is me making sure that I don't walk off the end of the list. If the current pointer, the thing I'm pointing at, arrow N, equals the N I'm looking for--and now notice, this is legit, even though I've given N the same name as this N, the fact that this guy is only accessible via the arrow, it's completely legitimate, I can also call this guy N, but I could have called him X or Y, just simple to go with the same. So if the current note I'm pointing at with my right hand equals, equals N, the value that the user typed in, just print out, found it, sleep for a second just for some silly animation, then break out of this. Else, if I did not find the value N, if I pointed at Kenny, and then again, and again, and again down the list, and I am not finding the number I'm looking for, how do I update this pointer? Well, this is kind of trippy here, but I actually do say, pointer, guess pointer next. So if right now, I'm pointing at a node, that's this rectangle, right? Inside this rectangle is a number and a pointer, and I am PTR pointing at this node, if I wanna move on to the next node, what do I set myself equal to? Well, I just grab this node that's at the bottom of this struct, and I say, I am going now point at whatever this thing is pointing at. And so, I'm now conceptually pointing at the next element. Now, this is just a linked list. Where can you actually take these things? Well, recall this example. So this is a photograph taken by one of the TFs at Mather House, of some trays. We keep referring to stacks of trays. So it turns out that besides linked lists, there's a data structure called a stack. It's coincidence that it's the same thing we've been calling this portion of RAM. But a stack is what's called a LIFO data structure, L-I-F-O, whereby the last element in, L-I, last in, is going to be the first element out, F-O. And so, we can now, using arrays and using pointers, can start to implement structures a little like this. There's this thing here taken by someone on--in gagdet.com, of a bunch of geeks lining up for the iPhone a couple of years ago, in Manhattan. So here, we have--I say that having an iPhone in my pocket--but here--but I wasn't in line--this was--I was not taking the photo. So this is what we generally call a queue. In the real world, you call this a queue or a line to. This is a FIFO data structure, First In First Out. So, queues in the real world tend to be much more appreciated by humans. We don't wanna get screwed over by being the first tray at the bottom of the stack, and the last one to actually leave the lining hall. But with these now, different concepts in the real world, can we start to now implement things in code. And that's where we will go after quiz one, all the more sophisticated data structures, no lecture on Monday, we will see you on Wednesday. [ Silence ] ==== Transcribed by Automatic Sync Technologies ====