[MUSIC PLAYING] DOUG LLOYD: OK, so at this point in the course, we've covered a lot of the basics of C. We know a lot about variables, arrays, pointers, all that good stuff. Those are all sort of built in to see as the fundamentals, but we can do more, right? We can combine things together in interesting ways. And so let's do that, let's start to branch out of what C gives us, and start to create our own data structures using these building blocks together to do something really valuable, useful. One way we can do this is to talk about collections. So so far we've had one kind of data structure for representing collections of like values, similar values. That would be an array. We have collections of integers, or collections of characters and so on. Structures are also sort of a data structure for collecting information, but it's not for collecting like values. It usually mixes different data types together inside of a single box. But it's not itself used to chain together or connect together similar items, like an array. Arrays are great for element look up, but recall that it's very difficult to insert into an array, unless we're inserting at the very end of that array. And the best example I have for that is insertion sort. If you recall our video on insertion sort, there was a lot of expense involved in having to pick up elements, and shift them out of the way to fit something into the middle of your array. Arrays also suffer from another problem, which is inflexibility. When we declare an array, we get one shot at it. We get to say, I want this many elements. Might be 100, it might be 1,000, it might be x where x is a number that the user gave us at a prompt or at the command line. But we only get one shot at it, we don't get to then say oh, actually I needed 101, or I needed x plus 20. Too late, we've already declared the array, and if we want to get 101 or x plus 20, we have to declare an entirely different array, copy all the elements of the array over, and then we have enough. And what if we are wrong again, what if we actually need 102, or x plus 40, we have to do this again. So they're very inflexible for resizing our data, but if we combine together some of the basics that we've already learned about pointers and structures, in particular using dynamic memory allocation with malloc, we can put these pieces together to create a new data structure-- a singly linked list we might say-- that allows us to grow and shrink a collection of values and we won't have any wasted space. So again, we call this idea, this notion, a linked list. In particular, in this video we're talking about singly linked list, and then another video we'll talk about doubly linked lists, which is just a variation on a theme here. But a singly linked list is comprised of nodes, nodes just being an abstract term-- it's just something I'm calling that's a kind of structure, basically, I'm? Just going to call it a node-- and this node has two members, or two fields. It has data, usually an integer, a character float, or could be some other data type that you've defined with a type def. And it contains a pointer to another node of the same type. So we have two things inside of this node, data and a pointer to another node. And if you start to visualize this, you can think about it like a chain of nodes that are connected together. We have the first node, it contains data, and a pointer to the second node, which contains data, and a pointer to the third node. And so that's why we call it a linked list, they're linked together. What does this special node structure look like? Well, if you recall from our video on defining custom types, with type def, we can define a structure-- and type define a structure like this. tyepdef struct sllist, and then I'm using the word value here arbitrarily to indicate any data type really. You could pass on an integer or float, you could have whatever you want. It's not restricted to just integers, or anything like that. So value is just an arbitrary data type, and then a pointer to another node of the same type. Now, there's a little catch here with defining a structure when it's a self referential structure. I have to have a temporary name for my structure. At the end of the day I clearly want to call it sll node, that's ultimately the new name part of my type definition, but I can't use sll node in the middle of this. The reason being, I haven't created a type called sll node until I hit this final point here. Up until that point, I have to have another way to refer to this data type. And this is a self referential data type. It;s a data type of a structure that contains a data, and a pointer to another structure of the same type. So I need to be able to refer to this data type at least temporarily, so giving it a temporary name of struct sllist allows me to then say I want a pointer to another struct sllist, a struct sllist star, and then after I've completed the definition, I can now call this type an sll node. So that's why you see there's a temporary name here, but a permanent name here. Sometimes you might see definitions of structure, for example, that aren't self referential, that don't have a specifier name here. It would just say typedef struct, open curly brace and then define it. But if you're struct is self referential, as this one is, you need to specify a temporary type name. But ultimately, now that we've done this, we can just refer to these nodes, these units, as sll nodes for purposes of the rest of this video. All right, so we know how to create a linked list node. We know how to define a linked list node. Now, if we're going to start using them to collect information, there's a couple of operations we need to understand and work with. We need to know how to create a linked list out of thin air. If there's no list already, we want to start one. So we need to be able to create a linked list, we need to probably search through the link list to find an element we're looking for. We need to be able to insert new things into the list, we want our list to be able to grow. And similarly, we want to be able to delete things from our list, we want our list to be able to shrink. And at the end of our programs, especially if you recall that we're dynamically allocating memory to build these lists typically, we want to free all of that memory when we're done working with it. And so we need to be able to delete an entire linked list in one fail swoop. So let's go through some of these operations and how we might visualize them, talking in pseudocode code specifically. So we want to create a linked list, so maybe we want to define a function with this prototype. sll node star, create, and I'm passing in one argument, some arbitrary data type again, of some arbitrary data type. But I'm returning-- this function should return to me a pointer, to a singly linked list node. Again, we're trying to create a linked list out of thin air, so I need a pointer to that list when I'm done. So what are the steps involved here? Well, first thing I'm going to do is dynamically allocate space for a new node. Again, we're creating it out of thin air, so we need to malloc space for it. And of course, immediately after we malloc, we always check to make sure that our pointer-- we did not get back null. Because if we try and deference a null pointer, we're going to suffer a segfault and we don't want that. Then we want to fill in the field, we want to initialize the value field and initialize the next field. And then we want to-- eventually as the function prototype indicates-- we want to return a pointer to an sll node. So what make this look like visually? Well, first we're going to dynamically allocate space for a new sll node, so we malloc-- that's a visual representation of the node we just created. And we check to make sure it's not null-- in this case, the picture wouldn't have shown up if it was null, we would have run out of memory, so we're good to go there. So now we're on to step C, initialize the nodes value field. Well, based on this function call I'm using here, looks like I want to pass in 6, so I'll 6 in the value field. Now, initialize the next field. Well, what am I going to do there, there is nothing next, right, this is the only thing in the list. So what's the next thing in the list? It shouldn't point to anything, right. There's nothing else there, so what is the concept we know of that's nothing-- pointers to nothing? It should be maybe we want to put a null pointer there, and I'll represent the null pointer as just a red box, we can't go any further. As we'll see a little later on, we will have eventually chains of arrows connecting these nodes together, but when you hit the red box, that's null, we can't go any further, that's the end of the list. And lastly, we just want to return a pointer to this node. So we'll call it new, and will return new so it can be used in whatever function created it. So there we go, We've created a singly linked list node out of thin air, and now we have a list we can work with. Now, let's say we already have a large chain, and we want to find something in it. And we want a function that's going to return true or false, depending on whether a value exists in that list. A function prototype, or declaration for that function, might look like this-- bool find, and then we want to pass in two arguments. The first, is a pointer to the first element of the linked list. This is actually something you'll always want to keep track of, and actually might be something that you even put in a global variable. Once you create a list, you always, always want to keep track of the very first element of the list. That way you can refer to all the other elements by just following the chain, without having to keep pointers intact to every single element. You only need to keep track of the first one if they're all chained together. And then the second thing we're passing in again is arbitrarily some-- whatever data type we're looking for there is inside of hopefully one of the nodes in the list. So what are the steps? Well, the first thing we do is we create a transversal pointer pointing to the lists head. Well, why do we do that, we already have a pointer at the lists head, why don't we just move that one around? Well, like I just said, it's really important for us to always keep track of the very first element in the list. And so it's actually better to create a duplicate of that, and use that to move around so we never accidentally move away, or we always have a pointer at some point that is right on the first element of the list. So it's better to create a second one that we use to move. Then we just compare whether the value field at that node is what we're looking for, and if it's not, we just move to the next node. And we keep doing that over, and over, and over, until we either find the element, or we hit null-- we've reached the end of the list and it isn't there. This should hopefully ring a bell to you as just linear search, we're just replicating it in a singly linked list structure instead of using an array to do it. So here's an example of a singly linked list. This one consists of five nodes, and we have a pointer to the head of the list, which is called list. The first thing we want to do is again, create that traversal pointer. So we have now two pointers that point to the same thing. Now, notice here also, I didn't have to malloc any space for trav. I didn't say trav equals malloc something, that node already exists, that space in memory already exists. So all I'm actually doing is creating another pointer to it. I'm not mallocing an additional space, just have now two pointers pointing to the same thing. So is 2 what I'm looking for? Well, no, so instead I'm going to move to the next one. So basically I would say, trav equals trav next. Is 3 what I'm looking for, no. So I continue to go through, until eventually get to 6 which is what I'm looking for based on the function call I have at the top there, and so I'm done. Now, what if the element I'm looking for isn't in the list, is it still going to work? Well, notice that the list here is subtly different, and this is another thing that's important with linked lists, you don't have to preserve them in any particular order. You can if you want, but you may have already noticed that we're not keeping track of what number element we are at. And that's sort of one trade that we have with linked list verses arrays, is it we don't have random access anymore. We can't just say, I want to go to the 0th element, or the 6th element of my array, which I can do in an array. I can't say I want to go to the 0th element, or the 6th element, or the 25th element of my linked list, there's no index associated with them. And so it doesn't really matter if we preserve our list in order. If you want to you certainly can, but there's no reason why they need to be preserved in any order. So again, let's try and find 6 in this list. Well, we start at the beginning, we don't find 6, and then we continue not finding 6, until we eventually get to here. So right now trav points to the node containing 8, and six is not in there. So the next step would be to go to the next pointer, so say trav equals trav next. Well, trav next, indicated by the red box there, is null. So there's nowhere else to go, and so at this point we can conclude that we've reached the end of the linked list, and 6 isn't in there. And it would be returned false in this case. OK, how do we insert a new node into the linked list? So we've been able to create a linked list out of nowhere, but we probably want to build a chain and not create a bunch of distinct lists. We want to have one list that has a bunch of nodes in it, not a bunch of lists with a single node. So we can't just keep using the Create function we defined earlier, now we want to insert into a list that already exists. So this case, we're going to pass in two arguments, the pointer to the head of that linked list that we want to add to. Again, that's why it's so important that we always keep track of it, because it's the only way we really have to refer to the whole list is just by a pointer to the first element. So we want to pass in a pointer to that first element, and whatever value we want to add to the list. And eventually this function is going to return a pointer to the new head of a linked list. What are the steps involved here? Well, just like with create, we need to dynamically allocate space for a new node, and check to make sure we don't run out of memory, again, because we're using malloc. Then we want to populate and insert the node, so put the number, whatever val is, into the node. We want to insert the node at the beginning of the linked list. There's a reason that I want to do this, and it might be worth taking a second to pause the video here, and think about why I would want to insert at the beginning of a linked list. Again, I mentioned earlier that it doesn't really matter if we preserve it in any order, so maybe that's a clue. And you saw what would happen if we wanted to-- or from just a second ago when we were going through search you could see what might happen if we were trying to insert at the end of the list. Because we don't have a pointer to the end of the list. So the reason that I would want to insert at the beginning, is because I can do it immediately. I have a pointer at the beginning, and we'll see this in a visual in a second. But if I want to insert at the end, I have to start at the beginning, traverse all the way to the end, and then tack it on. So that would mean that inserting at the end of the list would become an o of n operation, going back to our discussion of computational complexity. It'd become an o of n operation, where as the list got bigger, and bigger, and bigger, it'll become more and more difficult to tack something on at the end. But it's always really easy to tack something on at the beginning, you're always at the beginning. And we'll see a visual of this again. And then once we're done, once we've inserted the new node, we want to return our pointer to the new head of a linked list, which since we're inserting at the beginning, will actually be a pointer to the node we just created. Let's visualize this, because I think it'll help. So here's our list, it consists of four elements, a node containing 15, which points to a node containing 9, which points to a node containing 13, which points to a node containing 10, which has the null pointer as its next pointer so that's the end of the list. So we want to insert a new node with the value 12 at the beginning of this list, what do we do? Well, first we malloc space for the node, and then we put 12 in there. So now we've reached a decision point, right? We have a couple of pointers that we could move, which one should we move first? Should we make 12 point to the new head of the list-- or excuse me, should we make 12 point to the old head of the list? Or should we say that the list now begins at 12. There's a distinction there, and we'll look at what happens with both in a second. But this leads to a great topic for sidebar, which is that one of the trickiest things with linked lists is to arrange the pointers in the correct order. If you move things out of order, you can end up accidentally orphaning the rest of the list. And here's an example of that. So let's go with the idea of-- well, we've just created 12. We know 12 is going to be the new head of the list, and so why don't we just move the list pointer to point there. OK, so that's good. So now where does 12 next point? I mean, visually we can see that it will point to 15, as humans it's really obvious to us. How does the computer know? We don't have anything pointing to 15 anymore, right? We've lost any ability to refer to 15. We can't say new arrow next equals something, there's nothing there. In fact, we've orphaned the rest of the list by doing so, we've accidentally broken the chain. And we certainly don't want to do that. So let's go back and try this again. Maybe the right thing to do is to set 12's next pointer to the old head of the list first, then we can move the list over. And in fact, that is the correct order that we need to follow when we're working with singly linked list. We always want to connect the new element into the list, before we take that kind of important step of changing where the head of the linked list is. Again, that's such a fundamental thing, we don't want to lose track of it. So we want to make sure that everything is chained together, before we move that pointer. And so this would be the correct order, which is to connect 12 to the list, then say that the list starts a 12. If we said the list starts at 12 and then tried to connect 12 to the list, we've already seen what happens. We lose the list by mistake. OK, so one more thing to talk about. What if we want to get rid of an entire linked list at once? Again, we're mallocing all this space, and so we need to free it when we're done. So now we want to delete the entire linked list. Well, what do we want to do? If we've reached the null pointer, we want to stop, otherwise, just delete the rest of the list and then free me. Delete the rest of the list, and then free the current node. What does that sound like, what technique have we talked about previously does that sound like? Delete everybody else, then come back and delete me. That's recursion, we've made the problem a little bit smaller, we're saying delete everybody else, then you can delete me. And further down the road, that node will say, delete everybody else. But eventually we'll get to the point where the list is null, and that's our base case. So let's take a look at this, and how this might work. So here's our list, it's the same list we were just talking about, and there's the steps. There's a lot of text here, but hopefully the visualization will help. So we have-- and I also pulled up our stack frames illustration from our video on call stacks, and hopefully all of this together will show you what's going on. So here's our pseudocode code. If we reach a null pointer, stop, otherwise, delete the rest of the list, then free the current node. So right now, list-- the pointer that we're passing in to destroy points to 12. 12 is not a null pointer, so we're going to delete the rest of the list. What is deleting the rest of us involved? Well, it means making a call to destroy, saying that 15 is the beginning of the rest of the list we want to destroy. And so the call to destroy 12 is kind of on hold. It's frozen there, waiting for the call to destroy 15, to finish its job. Well, 15 is not a null pointer, and so it's going to say, all right, well, delete the rest of the list. The rest of the list starts at 9, and so we'll just wait until you delete all that stuff, then come back and delete me. Well 9's going to say, well, I'm not a null pointer, so delete the rest the list from here. And so try and destroy 13. 13 says, I'm not null pointer, same thing, it passes the buck. 10 is not null pointer, 10 contains a null pointer, but 10 is not itself a null pointer right now, and so it passes the buck too. And now list points there, it really would point to some-- if I had more space in the image, it would point to some random space that we don't know what it is. It's the null pointer though, the list is literally now set it's values null. It's pointing right inside that red box. We reached a null pointer, so we can stop, and we're done. And so that purple frame is now-- at the top of stack-- that's the active frame, but it's done. If we've reached a null pointer, stop. We don't do anything, we can't free a null pointer, we didn't malloc any space, and so we're done. So that function frame is destroyed, and we resume-- we pick up where we left off with the next highest one, which is this dark blue frame here. So we pick up right where we left off. We deleted the rest of the list already, so now we're going to free the current nodes. So now we can free this node, and now we've reached the end of the function. And so that function frame is destroyed, and we pick up at the light blue one. So it says-- I've already done-- deleting the rest of the list, so free the current node. And now the yellow frame is back on top of the stack. And so as you see, we're now destroying the list from right to left. What would have happened, though, if we had done things the wrong way? Just like when we tried to add an element. If we messed up the chain, if we didn't connect the pointers in the correct order, if we just freed the first element, if we just freed the head of the list, now we have no way to refer to the rest of the list. And so we would have orphaned everything, we would have had what's called a memory leak. If you recall from our video on dynamic memory allocation, that's not very good thing. So as I said, there are several operations that we need to use to work with linked list effectively. And you may have noticed I omitted one, deleting a single element from a linked list. The reason I did that is it's actually kind of tricky to think about how to delete a single element from a singly linked list. We need to be able to skip over something in the list, which means we get to a point-- we want to delete this node-- but in order to make it so we don't lose any information, we need to connect this node over here, here. So I probably did that wrong from a visual perspective. So we're at the beginning of our list, we're proceeding through, we want to delete this node. If we just delete it, we've broken the chain. This node right here refers to everything else, it contains the chain from here on out. So what we need to do actually after we get to this point, is we need to step back one, and connect this node over to this node, so we can then delete the one in the middle. But singly linked lists don't provide us a way to go backwards. So we need to either keep two pointers, and move them sort of off step, one behind the other as we go, or get to a point and then send another pointer through. And as you can see, it can get a little messy. Fortunately, we have another way to resolve that, when we talk about doubly linked lists. I'm Doug Lloyd, this is CS50.