In programming, we often need to represent lists of values, such as the names of students in a section or their scores on the latest quiz.  

In the C language, declared arrays can be used to store lists. It's easy to enumerate the elements of a list stored in an array, and if you need to access or modify the ith list element for some arbitrary index I, that can be done in constant time, but arrays have disadvantages, too. 

When we declare them, we're required to say up front how big they are, that is, how many elements they can store and how big these elements are, which is determined by their type.  For instance, int arr(10) can store 10 items that are the size of an int.  

We can't change an array's size after declaration. We have to make a new array if we want to store more elements. The reason this limitation exists is that our program stores the whole array as a contiguous chunk of memory. Say this is the buffer where we stored in our array. There might be other variables located right next to the array in memory, so we can't just make the array bigger. 

Sometimes we'd like to trade the array's fast data access speed for a little more flexibility. Enter the linked list, another basic data structure you might not be as familiar with.  At a high level,  a linked list stores data in a sequence of nodes that are connected to each other with links, hence the name 'linked list.' As we'll see, this difference in design leads to different advantages and disadvantages than an array.  

Here's some C code for a very simple linked list of integers. You can see that we have represented each node in the list as a struct which contains 2 things,  an integer to store called 'val' and a link to the next node in the list which we represent as a pointer called 'next.' This way, we can track the entire list with just a single pointer to the 1st node, and then we can follow the next pointers to the 2nd node, to the 3rd node, to the 4th node, and so on, until we get to the end of the list. 

You might be able to see 1 advantage this has over the static array structure--with a linked list, we don't need a big chunk of memory altogether. The 1st node of the list could live at this place in memory, and the 2nd node could be all the way over here. We can get to all the nodes no matter where in memory they are, because starting at the 1st node, each node's next pointer tells us exactly where to go next.  

Additionally, we don't have to say up front how big a linked list will be the way we do with static arrays, since we can keep adding nodes to a list as long as there's space somewhere in memory for new nodes.  Therefore, linked lists are easy to resize dynamically. Say, later in the program we need to add more nodes into our list. To insert a new node into our list on the fly,  all we have to do is allocate memory for that node, plop in the data value, and then place it where we want by adjusting the appropriate pointers.  

For example, if we wanted to place a node in between the 2nd and 3rd nodes of the list,   we wouldn't have to move the 2nd or 3rd nodes at all.  Say we're inserting this red node. All we'd have to do is set the new node's next pointer to point to the 3rd node and then rewire the 2nd node's next pointer to point to our new node. So, we can resize our lists on the fly since our computer doesn't rely on indexing, but rather on linking using pointers to store them.  

However, a disadvantage of linked lists is that, unlike a static array,  the computer can't just jump to the middle of the list. Since the computer has to visit each node in the linked list to get to the next one,  it's going to take longer to find a particular node than it would in an array. To traverse the entire list takes time proportional to the length of the list, or O(n) in asymptotic notation.  On average, reaching any node also takes time proportional to n. 

Now, let's actually write some code that works with linked lists.  Let's say we want a linked list of integers. We can represent a node in our list again as a struct with 2 fields, an integer value called 'val' and a next pointer to the next node of the list.  Well, seems simple enough.  

Let's say we want to write a function which traverses the list and prints out the value stored in the last node of the list.  Well, that means we'll need to traverse all the nodes in the list to find the last one, but since we're not adding or deleting anything, we don't want to change the internal structure of the next pointers in the list.  

So, we'll need a pointer specifically for traversal which we'll call 'crawler.' It will crawl through all the elements of the list by following the chain of next pointers. All we have stored is a pointer to the 1st node, or 'head' of the list.  Head points to the 1st node.  It's of type pointer-to-node. 

To get the actual 1st node in the list,  we have to dereference this pointer, but before we can dereference it, we need to check if the pointer is null first. If it's null, the list is empty, and we should print out a message that, because the list is empty,  there is no last node.  But, let's say the list isn't empty.  If it's not, then we should crawl through the entire list until we get to the last node of the list, and how can we tell if we're looking at the last node in the list? 

Well, if a node's next pointer is null, we know we're at the end since the last next pointer would have no next node in the list to point to.  It's good practice to always keep the last node's next pointer initialized to null to have a standardized property which alerts us when we've reached the end of the list.  

So, if crawler → next is null,  remember that the arrow syntax is a shortcut for dereferencing a pointer to a struct, then accessing its next field equivalent to the awkward:  (*crawler).next.  Once we've found the last node,  we want to print crawler → val, the value in the current node which we know is the last one.  Otherwise, if we're not yet at the last node in the list,  we have to move on to the next node in the list and check if that's the last one.  To do this, we just set our crawler pointer to point to the current node's next value, that is, the next node in the list. This is done by setting  crawler = crawler → next.  Then we repeat this process, with a loop for instance, until we find the last node. So, for instance, if crawler was pointing to head, we set crawler to point to crawler → next, which is the same as the next field of the 1st node. So, now our crawler is pointing to the 2nd node, and, again, we repeat this with a loop, until we've found the last node, that is,  where the node's next pointer is pointing to null.  And there we have it, we've found the last node in the list, and to print its value,  we just use crawler → val.  

Traversing isn't so bad, but what about inserting? Lets say we want to insert an integer into the 4th position in an integer list.  That is between the current 3rd and 4th nodes.  Again, we have to traverse the list just to get to the 3rd element, the one we're inserting after.  So, we create a crawler pointer again to traverse the list, check if our head pointer is null, and if it's not, point our crawler pointer at the head node.  So, we're at the 1st element.  We have to go forward 2 more elements before we can insert, so we can use a for loop int i = 1; i < 3; i++ and in each iteration of the loop, advance our crawler pointer forward by 1 node by checking if the current node's next field is null, and if it's not, move our crawler pointer to the next node by setting it equal to the current node's next pointer.  So, since our for loop says to do that twice,  we've reached the 3rd node, and once our crawler pointer has reached the node after which we want to insert our new integer,  how do we actually do the inserting? 

Well, our new integer has to be inserted into the list as part of its own node struct, since this is really a sequence of nodes. So, let's make a new pointer to node called 'new_node,'  and set it to point to memory that we now allocate on the heap for the node itself, and how much memory do we need to allocate? Well, the size of a node, and we want to set its val field to the integer that we want to insert.  Let's say, 6.  Now, the node contains our integer value.  It's also good practice to initialize the new node's next field to point to null, but now what? 

We have to change the internal structure of the list and the next pointers contained in the list's existing 3rd and 4th nodes.  Since the next pointers determine the order of the list, and since we're inserting our new node right into the middle of the list, it can be a bit tricky.  This is because, remember, our computer only knows the location of nodes in the list because of the next pointers stored in the previous nodes.  So, if we ever lost track of any of these locations,  say by changing one of the next pointers in our list,  for instance, say we changed the 3rd node's next field to point to some node over here.  We'd be out of luck, because we wouldn't have any idea where to find the rest of the list, and that's obviously really bad.  So, we have to be really careful about the order in which we manipulate our next pointers during insertion.  

So, to simplify this, let's say that our first 4 nodes are called A, B, C, and D, with the arrows representing the chain of pointers that connect the nodes.  So, we need to insert our new node in between nodes C and D.  It's critical to do it in the right order, and I'll show you why.  

Let's look at the wrong way to do it first.  Hey, we know the new node has to come right after C, so let's set C's next pointer to point to new_node. All right, seems okay, we just have to finish up now by  making the new node's next pointer point to D, but wait, how can we do that? The only thing that could tell us where D was, was the next pointer previously stored in C, but we just rewrote that pointer to point to the new node, so we no longer have any clue where D is in memory, and we've lost the rest of the list.  Not good at all.  

So, how do we do this right? First, point the new node's next pointer at D. Now, both the new node's and C's next pointers are pointing to the same node, D, but that's fine. Now we can point C's next pointer at the new node.  So, we've done this without losing any data.  In code, C is the current node that the traversal pointer crawler is pointing to, and D is represented by the node pointed to by the current node's next field, or crawler → next.  So, we first set the new node's next pointer to point to crawler → next, the same way we said new_node's next pointer should  point to D in the illustration.  Then, we can set the current node's next pointer to our new node, just as we had to wait to point C to new_node in the drawing.  Now everything's in order, and we didn't lose track of any data, and we were able to just stick our new node in the middle of the list  without rebuilding the whole thing or even shifting any elements the way we would have had to with a fixed-length array.  

So, linked lists are a basic, but important, dynamic data structure which have both advantages and disadvantages compared to arrays and other data structures, and as is often the case in computer science,  it's important to know when to use each tool, so you can pick the right tool for the right job. 

For more practice, try writing functions to delete nodes from a linked list-- remember to be careful about the order in which you rearrange your next pointers to ensure that you don't lose a chunk of your list-- or a function to count the nodes in a linked list, or a fun one, to reverse the order of all of the nodes in a linked list.  

My name is Jackson Steinkamp, this is CS50.