1 00:00:07,260 --> 00:00:10,050 In programming, we often need to represent lists of values, 2 00:00:10,050 --> 00:00:12,840 such as the names of students in a section 3 00:00:12,840 --> 00:00:15,100 or their scores on the latest quiz. 4 00:00:15,100 --> 00:00:17,430 >> In the C language, declared arrays can be used 5 00:00:17,430 --> 00:00:19,160 to store lists. 6 00:00:19,160 --> 00:00:21,200 It's easy to enumerate the elements of a list 7 00:00:21,200 --> 00:00:23,390 stored in an array, and if you need to access 8 00:00:23,390 --> 00:00:25,050 or modify the ith list element 9 00:00:25,050 --> 00:00:27,570 for some arbitrary index i, 10 00:00:27,570 --> 00:00:29,910 that can be done in constant time, 11 00:00:29,910 --> 00:00:31,660 but arrays have disadvantages, too. 12 00:00:31,660 --> 00:00:33,850 >> When we declare them, we're required to say 13 00:00:33,850 --> 00:00:35,900 up front how big they are, 14 00:00:35,900 --> 00:00:38,160 that is, how many elements they can store 15 00:00:38,160 --> 00:00:40,780 and how big these elements are, which is determined by their type. 16 00:00:40,780 --> 00:00:45,450 For instance, int arr[10] 17 00:00:45,450 --> 00:00:48,220 can store 10 items 18 00:00:48,220 --> 00:00:50,200 that are the size of an int. 19 00:00:50,200 --> 00:00:52,590 >> We can't change an array's size after declaration. 20 00:00:52,590 --> 00:00:55,290 We have to make a new array if we want to store more elements. 21 00:00:55,290 --> 00:00:57,410 The reason this limitation exists is that our 22 00:00:57,410 --> 00:00:59,040 program stores the whole array 23 00:00:59,040 --> 00:01:02,310 as a contiguous chunk of memory. 24 00:01:02,310 --> 00:01:04,500 Say this is the buffer where we stored our array. 25 00:01:04,500 --> 00:01:06,910 There might be other variables 26 00:01:06,910 --> 00:01:08,310 located right next to the array 27 00:01:08,310 --> 00:01:10,060 in memory, so we can't 28 00:01:10,060 --> 00:01:12,060 just make the array bigger.
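As a rough illustration of the array trade-off described above, here is a small sketch in C. The names NUM_SCORES, score_at, and copy_scores are made up for this example and do not come from the video; the point is only that indexing is direct, while "growing" a declared array means making a bigger one and copying.

```c
#include <stddef.h>

/* Hypothetical size for this sketch; a declared array's size is fixed. */
#define NUM_SCORES 10

/* Constant-time access: the element's address is computed directly from i. */
int score_at(const int scores[], size_t i)
{
    return scores[i];
}

/* A declared array cannot be resized, so storing more elements means
   declaring a larger array and copying the first n elements into it. */
void copy_scores(const int src[], int dst[], size_t n)
{
    for (size_t k = 0; k < n; k++)
    {
        dst[k] = src[k];
    }
}
```

A caller would declare something like int scores[NUM_SCORES], and later a second, larger array to copy into if more room were needed.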
29 00:01:12,060 --> 00:01:15,700 >> Sometimes we'd like to trade the array's fast data access speed 30 00:01:15,700 --> 00:01:17,650 for a little more flexibility. 31 00:01:17,650 --> 00:01:20,380 Enter the linked list, another basic data structure 32 00:01:20,380 --> 00:01:22,360 you might not be as familiar with. 33 00:01:22,360 --> 00:01:24,200 At a high level, 34 00:01:24,200 --> 00:01:26,840 a linked list stores data in a sequence of nodes 35 00:01:26,840 --> 00:01:29,280 that are connected to each other with links, 36 00:01:29,280 --> 00:01:31,760 hence the name 'linked list.' 37 00:01:31,760 --> 00:01:33,840 As we'll see, this difference in design 38 00:01:33,840 --> 00:01:35,500 leads to different advantages and disadvantages 39 00:01:35,500 --> 00:01:37,000 from those of an array. 40 00:01:37,000 --> 00:01:39,840 >> Here's some C code for a very simple linked list of integers. 41 00:01:39,840 --> 00:01:42,190 You can see that we have represented each node 42 00:01:42,190 --> 00:01:45,520 in the list as a struct which contains 2 things, 43 00:01:45,520 --> 00:01:47,280 an integer to store called 'val' 44 00:01:47,280 --> 00:01:50,460 and a link to the next node in the list 45 00:01:50,460 --> 00:01:52,990 which we represent as a pointer called 'next.' 46 00:01:54,120 --> 00:01:56,780 This way, we can track the entire list 47 00:01:56,780 --> 00:01:58,790 with just a single pointer to the 1st node, 48 00:01:58,790 --> 00:02:01,270 and then we can follow the next pointers 49 00:02:01,270 --> 00:02:03,130 to the 2nd node, 50 00:02:03,130 --> 00:02:05,280 to the 3rd node, 51 00:02:05,280 --> 00:02:07,000 to the 4th node, 52 00:02:07,000 --> 00:02:09,889 and so on, until we get to the end of the list. 53 00:02:10,520 --> 00:02:12,210 >> You might be able to see 1 advantage this has 54 00:02:12,210 --> 00:02:14,490 over the static array structure--with a linked list, 55 00:02:14,490 --> 00:02:16,450 we don't need one big chunk of memory all together.
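The on-screen code is not part of this transcript, so the following is only a sketch of what such a node struct might look like. The field names val and next come from the narration; the typedef name node and the helper link_three are assumptions made for illustration.

```c
#include <stddef.h>

/* One node of a singly linked list of integers. */
typedef struct node
{
    int val;            /* the integer stored in this node */
    struct node *next;  /* link to the next node, or NULL at the end */
} node;

/* Hypothetical helper: chain three caller-provided nodes into a -> b -> c
   and return a pointer to the 1st node, which is all we need to keep
   in order to reach the whole list. */
node *link_three(node *a, node *b, node *c)
{
    a->next = b;
    b->next = c;
    c->next = NULL;  /* the last node points nowhere */
    return a;
}
```

Starting from the returned pointer, head->next is the 2nd node and head->next->next is the 3rd, wherever those nodes happen to sit in memory.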
56 00:02:17,400 --> 00:02:20,530 The 1st node of the list could live at this place in memory, 57 00:02:20,530 --> 00:02:23,160 and the 2nd node could be all the way over here. 58 00:02:23,160 --> 00:02:25,780 We can get to all the nodes no matter where in memory they are, 59 00:02:25,780 --> 00:02:28,890 because starting at the 1st node, each node's next pointer 60 00:02:28,890 --> 00:02:31,700 tells us exactly where to go next. 61 00:02:31,700 --> 00:02:33,670 >> Additionally, we don't have to say up front 62 00:02:33,670 --> 00:02:36,740 how big a linked list will be the way we do with static arrays, 63 00:02:36,740 --> 00:02:39,060 since we can keep adding nodes to a list 64 00:02:39,060 --> 00:02:42,600 as long as there's space somewhere in memory for new nodes. 65 00:02:42,600 --> 00:02:45,370 Therefore, linked lists are easy to resize dynamically. 66 00:02:45,370 --> 00:02:47,950 Say, later in the program we need to add more nodes 67 00:02:47,950 --> 00:02:49,350 into our list. 68 00:02:49,350 --> 00:02:51,480 To insert a new node into our list on the fly, 69 00:02:51,480 --> 00:02:53,740 all we have to do is allocate memory for that node, 70 00:02:53,740 --> 00:02:55,630 plop in the data value, 71 00:02:55,630 --> 00:02:59,070 and then place it where we want by adjusting the appropriate pointers. 72 00:02:59,070 --> 00:03:02,310 >> For example, if we wanted to place a node in between 73 00:03:02,310 --> 00:03:04,020 the 2nd and 3rd nodes of the list, 74 00:03:04,020 --> 00:03:06,800 we wouldn't have to move the 2nd or 3rd nodes at all. 75 00:03:06,800 --> 00:03:09,190 Say we're inserting this red node. 76 00:03:09,190 --> 00:03:12,890 All we'd have to do is set the new node's next pointer 77 00:03:12,890 --> 00:03:14,870 to point to the 3rd node 78 00:03:14,870 --> 00:03:18,580 and then rewire the 2nd node's next pointer 79 00:03:18,580 --> 00:03:20,980 to point to our new node. 
80 00:03:22,340 --> 00:03:24,370 So, we can resize our lists on the fly 81 00:03:24,370 --> 00:03:26,090 since our computer doesn't rely on indexing, 82 00:03:26,090 --> 00:03:28,990 but rather on linking using pointers to store them. 83 00:03:29,120 --> 00:03:31,600 >> However, a disadvantage of linked lists 84 00:03:31,600 --> 00:03:33,370 is that, unlike a static array, 85 00:03:33,370 --> 00:03:36,690 the computer can't just jump to the middle of the list. 86 00:03:38,040 --> 00:03:40,780 Since the computer has to visit each node in the linked list 87 00:03:40,780 --> 00:03:42,330 to get to the next one, 88 00:03:42,330 --> 00:03:44,770 it's going to take longer to find a particular node 89 00:03:44,770 --> 00:03:46,400 than it would in an array. 90 00:03:46,400 --> 00:03:48,660 To traverse the entire list takes time proportional 91 00:03:48,660 --> 00:03:50,580 to the length of the list, 92 00:03:50,580 --> 00:03:54,630 or O(n) in asymptotic notation. 93 00:03:54,630 --> 00:03:56,510 On average, reaching any node 94 00:03:56,510 --> 00:03:58,800 also takes time proportional to n. 95 00:03:58,800 --> 00:04:00,700 >> Now, let's actually write some code 96 00:04:00,700 --> 00:04:02,000 that works with linked lists. 97 00:04:02,000 --> 00:04:04,220 Let's say we want a linked list of integers. 98 00:04:04,220 --> 00:04:06,140 We can represent a node in our list again 99 00:04:06,140 --> 00:04:08,340 as a struct with 2 fields, 100 00:04:08,340 --> 00:04:10,750 an integer value called 'val' 101 00:04:10,750 --> 00:04:13,490 and a next pointer to the next node of the list. 102 00:04:13,490 --> 00:04:15,660 Well, seems simple enough. 103 00:04:15,660 --> 00:04:17,220 >> Let's say we want to write a function 104 00:04:17,220 --> 00:04:19,329 which traverses the list and prints out the 105 00:04:19,329 --> 00:04:22,150 value stored in the last node of the list. 
106 00:04:22,150 --> 00:04:24,850 Well, that means we'll need to traverse all the nodes in the list 107 00:04:24,850 --> 00:04:27,310 to find the last one, but since we're not adding 108 00:04:27,310 --> 00:04:29,250 or deleting anything, we don't want to change 109 00:04:29,250 --> 00:04:32,210 the internal structure of the next pointers in the list. 110 00:04:32,210 --> 00:04:34,790 >> So, we'll need a pointer specifically for traversal 111 00:04:34,790 --> 00:04:36,940 which we'll call 'crawler.' 112 00:04:36,940 --> 00:04:38,870 It will crawl through all the elements of the list 113 00:04:38,870 --> 00:04:41,190 by following the chain of next pointers. 114 00:04:41,190 --> 00:04:43,750 All we have stored is a pointer to the 1st node, 115 00:04:43,750 --> 00:04:45,730 or 'head' of the list. 116 00:04:45,730 --> 00:04:47,370 Head points to the 1st node. 117 00:04:47,370 --> 00:04:49,120 It's of type pointer-to-node. 118 00:04:49,120 --> 00:04:51,280 >> To get the actual 1st node in the list, 119 00:04:51,280 --> 00:04:53,250 we have to dereference this pointer, 120 00:04:53,250 --> 00:04:55,100 but before we can dereference it, we need to check 121 00:04:55,100 --> 00:04:57,180 if the pointer is null first. 122 00:04:57,180 --> 00:04:59,190 If it's null, the list is empty, 123 00:04:59,190 --> 00:05:01,320 and we should print out a message that, because the list is empty, 124 00:05:01,320 --> 00:05:03,250 there is no last node. 125 00:05:03,250 --> 00:05:05,190 But, let's say the list isn't empty. 126 00:05:05,190 --> 00:05:08,340 If it's not, then we should crawl through the entire list 127 00:05:08,340 --> 00:05:10,440 until we get to the last node of the list, 128 00:05:10,440 --> 00:05:13,030 and how can we tell if we're looking at the last node in the list? 
129 00:05:13,670 --> 00:05:16,660 >> Well, if a node's next pointer is null, 130 00:05:16,660 --> 00:05:18,320 we know we're at the end 131 00:05:18,320 --> 00:05:22,390 since the last next pointer would have no next node in the list to point to. 132 00:05:22,390 --> 00:05:26,590 It's good practice to always keep the last node's next pointer initialized to null 133 00:05:26,590 --> 00:05:30,800 to have a standardized property which alerts us when we've reached the end of the list. 134 00:05:30,800 --> 00:05:33,510 >> So, if crawler->next is null, 135 00:05:34,120 --> 00:05:38,270 remember that the arrow syntax is a shortcut for dereferencing 136 00:05:38,270 --> 00:05:40,010 a pointer to a struct, then accessing 137 00:05:40,010 --> 00:05:42,510 its next field, equivalent to the awkward: 138 00:05:42,510 --> 00:05:48,750 (*crawler).next. 139 00:05:49,820 --> 00:05:51,260 Once we've found the last node, 140 00:05:51,260 --> 00:05:53,830 we want to print crawler->val, 141 00:05:53,830 --> 00:05:55,000 the value in the current node 142 00:05:55,000 --> 00:05:57,130 which we know is the last one. 143 00:05:57,130 --> 00:05:59,740 Otherwise, if we're not yet at the last node in the list, 144 00:05:59,740 --> 00:06:02,340 we have to move on to the next node in the list 145 00:06:02,340 --> 00:06:04,750 and check if that's the last one. 146 00:06:04,750 --> 00:06:07,010 To do this, we just set our crawler pointer 147 00:06:07,010 --> 00:06:09,840 to point to the current node's next value, 148 00:06:09,840 --> 00:06:11,680 that is, the next node in the list. 149 00:06:11,680 --> 00:06:13,030 This is done by setting 150 00:06:13,030 --> 00:06:15,280 crawler = crawler->next. 151 00:06:16,050 --> 00:06:18,960 Then we repeat this process, with a loop for instance, 152 00:06:18,960 --> 00:06:20,960 until we find the last node.
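The traversal just described could be written roughly as follows. This is a sketch under assumptions, not the code shown in the video: the function names last_node and print_last and the exact wording of the message are invented here, while crawler, head, val, and next follow the narration.

```c
#include <stdio.h>

typedef struct node
{
    int val;
    struct node *next;
} node;

/* Return a pointer to the last node, or NULL if the list is empty. */
node *last_node(node *head)
{
    /* Check for an empty list before dereferencing anything. */
    if (head == NULL)
    {
        return NULL;
    }

    /* The crawler walks the list; the list's own pointers are never changed. */
    node *crawler = head;
    while (crawler->next != NULL)
    {
        crawler = crawler->next;
    }
    return crawler;
}

/* Print the value in the last node, or a message if there is none. */
void print_last(node *head)
{
    node *last = last_node(head);
    if (last == NULL)
    {
        printf("The list is empty, so there is no last node.\n");
    }
    else
    {
        printf("%d\n", last->val);
    }
}
```

Splitting the search from the printing is a design choice of this sketch; it keeps the loop easy to check on its own.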
153 00:06:20,960 --> 00:06:23,150 So, for instance, if crawler was pointing to head, 154 00:06:24,050 --> 00:06:27,710 we set crawler to point to crawler->next, 155 00:06:27,710 --> 00:06:30,960 which is the same as the next field of the 1st node. 156 00:06:30,960 --> 00:06:33,620 So, now our crawler is pointing to the 2nd node, 157 00:06:33,620 --> 00:06:35,480 and, again, we repeat this with a loop, 158 00:06:37,220 --> 00:06:40,610 until we've found the last node, that is, 159 00:06:40,610 --> 00:06:43,640 where the node's next pointer is pointing to null. 160 00:06:43,640 --> 00:06:45,070 And there we have it, 161 00:06:45,070 --> 00:06:47,620 we've found the last node in the list, and to print its value, 162 00:06:47,620 --> 00:06:50,800 we just use crawler->val. 163 00:06:50,800 --> 00:06:53,130 >> Traversing isn't so bad, but what about inserting? 164 00:06:53,130 --> 00:06:56,290 Let's say we want to insert an integer into the 4th position 165 00:06:56,290 --> 00:06:58,040 in an integer list. 166 00:06:58,040 --> 00:07:01,280 That is, between the current 3rd and 4th nodes. 167 00:07:01,280 --> 00:07:03,760 Again, we have to traverse the list just to 168 00:07:03,760 --> 00:07:06,520 get to the 3rd element, the one we're inserting after. 169 00:07:06,520 --> 00:07:09,300 So, we create a crawler pointer again to traverse the list, 170 00:07:09,300 --> 00:07:11,400 check if our head pointer is null, 171 00:07:11,400 --> 00:07:14,810 and if it's not, point our crawler pointer at the head node. 172 00:07:16,880 --> 00:07:18,060 So, we're at the 1st element.
173 00:07:18,060 --> 00:07:21,020 We have to go forward 2 more elements before we can insert, 174 00:07:21,020 --> 00:07:23,390 so we can use a for loop 175 00:07:23,390 --> 00:07:26,430 int i = 1; i < 3; i++ 176 00:07:26,430 --> 00:07:28,590 and in each iteration of the loop, 177 00:07:28,590 --> 00:07:31,540 advance our crawler pointer forward by 1 node 178 00:07:31,540 --> 00:07:34,570 by checking if the current node's next field is null, 179 00:07:34,570 --> 00:07:37,550 and if it's not, move our crawler pointer to the next node 180 00:07:37,550 --> 00:07:41,810 by setting it equal to the current node's next pointer. 181 00:07:41,810 --> 00:07:45,210 So, since our for loop says to do that 182 00:07:45,210 --> 00:07:47,550 twice, 183 00:07:49,610 --> 00:07:51,190 we've reached the 3rd node, 184 00:07:51,190 --> 00:07:53,110 and once our crawler pointer has reached the node after 185 00:07:53,110 --> 00:07:55,270 which we want to insert our new integer, 186 00:07:55,270 --> 00:07:57,050 how do we actually do the inserting? 187 00:07:57,050 --> 00:07:59,440 >> Well, our new integer has to be inserted into the list 188 00:07:59,440 --> 00:08:01,250 as part of its own node struct, 189 00:08:01,250 --> 00:08:03,140 since this is really a sequence of nodes. 190 00:08:03,140 --> 00:08:05,690 So, let's make a new pointer to node 191 00:08:05,690 --> 00:08:08,910 called 'new_node,' 192 00:08:08,910 --> 00:08:11,800 and set it to point to memory that we now allocate 193 00:08:11,800 --> 00:08:14,270 on the heap for the node itself, 194 00:08:14,270 --> 00:08:16,000 and how much memory do we need to allocate? 195 00:08:16,000 --> 00:08:18,250 Well, the size of a node, 196 00:08:20,450 --> 00:08:23,410 and we want to set its val field to the integer that we want to insert. 197 00:08:23,410 --> 00:08:25,590 Let's say, 6. 198 00:08:25,590 --> 00:08:27,710 Now, the node contains our integer value. 
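A minimal sketch of the allocation step, assuming the node struct from earlier in the video. The helper name make_node is hypothetical, and the check for a failed malloc is an addition that the narration does not mention. The sketch also sets the next field to NULL, a common precaution so the node never holds a garbage pointer.

```c
#include <stdlib.h>

typedef struct node
{
    int val;
    struct node *next;
} node;

/* Allocate one node on the heap and store val in it.
   Returns NULL if the allocation fails (an added safety check). */
node *make_node(int val)
{
    /* Ask for exactly the size of one node. */
    node *new_node = malloc(sizeof(node));
    if (new_node == NULL)
    {
        return NULL;
    }

    new_node->val = val;
    new_node->next = NULL;  /* not linked into any list yet */
    return new_node;
}
```

Calling make_node(6) would give the node holding the integer 6 from the example; whoever owns the list is responsible for eventually passing the node to free.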
199 00:08:27,710 --> 00:08:30,650 It's also good practice to initialize the new node's next field 200 00:08:30,650 --> 00:08:33,690 to point to null, 201 00:08:33,690 --> 00:08:35,080 but now what? 202 00:08:35,080 --> 00:08:37,179 >> We have to change the internal structure of the list 203 00:08:37,179 --> 00:08:40,409 and the next pointers contained in the list's existing 204 00:08:40,409 --> 00:08:42,950 3rd and 4th nodes. 205 00:08:42,950 --> 00:08:46,560 Since the next pointers determine the order of the list, 206 00:08:46,560 --> 00:08:48,650 and since we're inserting our new node 207 00:08:48,650 --> 00:08:50,510 right into the middle of the list, 208 00:08:50,510 --> 00:08:52,010 it can be a bit tricky. 209 00:08:52,010 --> 00:08:54,250 This is because, remember, our computer 210 00:08:54,250 --> 00:08:56,250 only knows the location of nodes in the list 211 00:08:56,250 --> 00:09:00,400 because of the next pointers stored in the previous nodes. 212 00:09:00,400 --> 00:09:03,940 So, if we ever lost track of any of these locations, 213 00:09:03,940 --> 00:09:06,860 say by changing one of the next pointers in our list, 214 00:09:06,860 --> 00:09:09,880 for instance, say we changed 215 00:09:09,880 --> 00:09:12,920 the 3rd node's next field 216 00:09:12,920 --> 00:09:15,610 to point to some node over here. 217 00:09:15,610 --> 00:09:17,920 We'd be out of luck, because we wouldn't 218 00:09:17,920 --> 00:09:20,940 have any idea where to find the rest of the list, 219 00:09:20,940 --> 00:09:23,070 and that's obviously really bad. 220 00:09:23,070 --> 00:09:25,080 So, we have to be really careful about the order 221 00:09:25,080 --> 00:09:28,360 in which we manipulate our next pointers during insertion. 
222 00:09:28,360 --> 00:09:30,540 >> So, to simplify this, let's say that 223 00:09:30,540 --> 00:09:32,220 our first 4 nodes 224 00:09:32,220 --> 00:09:36,200 are called A, B, C, and D, with the arrows representing the chain of pointers 225 00:09:36,200 --> 00:09:38,070 that connect the nodes. 226 00:09:38,070 --> 00:09:40,050 So, we need to insert our new node 227 00:09:40,050 --> 00:09:42,070 in between nodes C and D. 228 00:09:42,070 --> 00:09:45,060 It's critical to do it in the right order, and I'll show you why. 229 00:09:45,060 --> 00:09:47,500 >> Let's look at the wrong way to do it first. 230 00:09:47,500 --> 00:09:49,490 Hey, we know the new node has to come right after C, 231 00:09:49,490 --> 00:09:51,910 so let's set C's next pointer 232 00:09:51,910 --> 00:09:54,700 to point to new_node. 233 00:09:56,530 --> 00:09:59,180 All right, seems okay, we just have to finish up now by 234 00:09:59,180 --> 00:10:01,580 making the new node's next pointer point to D, 235 00:10:01,580 --> 00:10:03,250 but wait, how can we do that? 236 00:10:03,250 --> 00:10:05,170 The only thing that could tell us where D was, 237 00:10:05,170 --> 00:10:07,630 was the next pointer previously stored in C, 238 00:10:07,630 --> 00:10:09,870 but we just rewrote that pointer 239 00:10:09,870 --> 00:10:11,170 to point to the new node, 240 00:10:11,170 --> 00:10:14,230 so we no longer have any clue where D is in memory, 241 00:10:14,230 --> 00:10:17,020 and we've lost the rest of the list. 242 00:10:17,020 --> 00:10:19,000 Not good at all. 243 00:10:19,000 --> 00:10:21,090 >> So, how do we do this right? 244 00:10:22,360 --> 00:10:25,090 First, point the new node's next pointer at D. 245 00:10:26,170 --> 00:10:28,990 Now, both the new node's and C's next pointers 246 00:10:28,990 --> 00:10:30,660 are pointing to the same node, D, 247 00:10:30,660 --> 00:10:32,290 but that's fine. 248 00:10:32,290 --> 00:10:35,680 Now we can point C's next pointer at the new node. 
249 00:10:37,450 --> 00:10:39,670 So, we've done this without losing any data. 250 00:10:39,670 --> 00:10:42,280 In code, C is the current node 251 00:10:42,280 --> 00:10:45,540 that the traversal pointer crawler is pointing to, 252 00:10:45,540 --> 00:10:50,400 and D is represented by the node pointed to by the current node's next field, 253 00:10:50,400 --> 00:10:52,600 or crawler->next. 254 00:10:52,600 --> 00:10:55,460 So, we first set the new node's next pointer 255 00:10:55,460 --> 00:10:57,370 to point to crawler->next, 256 00:10:57,370 --> 00:11:00,880 the same way we said new_node's next pointer should 257 00:11:00,880 --> 00:11:02,780 point to D in the illustration. 258 00:11:02,780 --> 00:11:04,540 Then, we can set the current node's next pointer 259 00:11:04,540 --> 00:11:06,330 to our new node, 260 00:11:06,330 --> 00:11:10,980 just as we had to wait to point C to new_node in the drawing. 261 00:11:10,980 --> 00:11:12,250 Now everything's in order, and we didn't lose 262 00:11:12,250 --> 00:11:14,490 track of any data, and we were able to just 263 00:11:14,490 --> 00:11:16,200 stick our new node in the middle of the list 264 00:11:16,200 --> 00:11:19,330 without rebuilding the whole thing or even shifting any elements 265 00:11:19,330 --> 00:11:22,490 the way we would have had to with a fixed-length array. 266 00:11:22,490 --> 00:11:26,020 >> So, linked lists are a basic, but important, dynamic data structure 267 00:11:26,020 --> 00:11:29,080 that has both advantages and disadvantages 268 00:11:29,080 --> 00:11:31,260 compared to arrays and other data structures, 269 00:11:31,260 --> 00:11:33,350 and as is often the case in computer science, 270 00:11:33,350 --> 00:11:35,640 it's important to know when to use each tool, 271 00:11:35,640 --> 00:11:37,960 so you can pick the right tool for the right job.
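Putting the insertion walkthrough together, one possible version in C is sketched below. The function name insert_after, its pos parameter, and the bool return value are assumptions made for this example; crawler, new_node, val, and next follow the narration. With pos set to 3, the loop matches the for loop from the video and advances the crawler twice.

```c
#include <stdbool.h>
#include <stdlib.h>

typedef struct node
{
    int val;
    struct node *next;
} node;

/* Insert val right after the node at 1-based position pos.
   Returns false if the list is too short or memory runs out. */
bool insert_after(node *head, int pos, int val)
{
    if (head == NULL || pos < 1)
    {
        return false;
    }

    /* Walk forward pos - 1 times, checking next before each step. */
    node *crawler = head;
    for (int i = 1; i < pos; i++)
    {
        if (crawler->next == NULL)
        {
            return false;
        }
        crawler = crawler->next;
    }

    node *new_node = malloc(sizeof(node));
    if (new_node == NULL)
    {
        return false;
    }
    new_node->val = val;

    /* Order matters: first point the new node at D (crawler->next)... */
    new_node->next = crawler->next;
    /* ...and only then point C (the crawler's node) at the new node. */
    crawler->next = new_node;
    return true;
}
```

Swapping the last two assignments would overwrite the only pointer to D before it was saved, which is exactly the mistake shown in the wrong-way illustration.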
272 00:11:37,960 --> 00:11:40,060 >> For more practice, try writing functions to 273 00:11:40,060 --> 00:11:42,080 delete nodes from a linked list-- 274 00:11:42,080 --> 00:11:44,050 remember to be careful about the order in which you rearrange 275 00:11:44,050 --> 00:11:47,430 your next pointers to ensure that you don't lose a chunk of your list-- 276 00:11:47,430 --> 00:11:50,200 or a function to count the nodes in a linked list, 277 00:11:50,200 --> 00:11:53,280 or a fun one, to reverse the order of all of the nodes in a linked list. 278 00:11:53,280 --> 00:11:56,090 >> My name is Jackson Steinkamp, this is CS50.