1 00:00:00,000 --> 00:00:00,996 2 00:00:00,996 --> 00:00:04,482 [MUSIC PLAYING] 3 00:00:04,482 --> 00:01:23,688 4 00:01:23,688 --> 00:01:25,230 DAVID MALAN: All right, this is CS50. 5 00:01:25,230 --> 00:01:27,243 And this is our look today at data structures. 6 00:01:27,243 --> 00:01:29,160 You'll recall last week that we gave ourselves 7 00:01:29,160 --> 00:01:32,310 a few new ingredients and some new syntax in C. 8 00:01:32,310 --> 00:01:35,820 We introduced pointers, the ability to address chunks of memory 9 00:01:35,820 --> 00:01:36,990 by actual addresses-- 10 00:01:36,990 --> 00:01:38,700 0, 1, 2, 3 and on up. 11 00:01:38,700 --> 00:01:42,990 And we introduced the star notation and a few other functions, 12 00:01:42,990 --> 00:01:45,870 malloc among them, free among them, so that you can actually 13 00:01:45,870 --> 00:01:47,310 manage your computer's memory. 14 00:01:47,310 --> 00:01:49,110 Just to hammer this home, let's take a look 15 00:01:49,110 --> 00:01:52,080 at just this small example that from the outset is buggy. 16 00:01:52,080 --> 00:01:52,872 This code is buggy. 17 00:01:52,872 --> 00:01:54,330 And we'll see in just a moment why. 18 00:01:54,330 --> 00:01:56,280 But let's just walk through it step by step. 19 00:01:56,280 --> 00:01:58,920 This first line highlighted in yellow in English 20 00:01:58,920 --> 00:02:02,160 is doing what as you understand it now? 21 00:02:02,160 --> 00:02:05,160 If you're a little rusty since last week, what's that first line of code 22 00:02:05,160 --> 00:02:07,170 doing in English? 23 00:02:07,170 --> 00:02:07,830 Anything? 24 00:02:07,830 --> 00:02:08,030 Yeah. 25 00:02:08,030 --> 00:02:10,039 AUDIENCE: It's creating a point to an int named x. 26 00:02:10,039 --> 00:02:10,680 DAVID MALAN: Perfect. 27 00:02:10,680 --> 00:02:12,930 It's creating a pointer to an integer and calling 28 00:02:12,930 --> 00:02:15,042 that pointer or that variable x. 29 00:02:15,042 --> 00:02:18,000 And let me propose that the next line is doing the same thing giving us 30 00:02:18,000 --> 00:02:21,267 another pointer to an integer but this time calling it y. 31 00:02:21,267 --> 00:02:23,100 The third line of code, someone else, what's 32 00:02:23,100 --> 00:02:26,710 happening in English with this line here? 33 00:02:26,710 --> 00:02:28,431 Yeah. 34 00:02:28,431 --> 00:02:36,830 AUDIENCE: It's creates a memory of size of int and assigns it to x, 35 00:02:36,830 --> 00:02:40,180 but it's more difficult to execute if it's not a pointer, I don't know. 36 00:02:40,180 --> 00:02:42,190 DAVID MALAN: This is not the bug. 37 00:02:42,190 --> 00:02:44,243 But the first part is correct. 38 00:02:44,243 --> 00:02:46,160 malloc is the function we introduced last week 39 00:02:46,160 --> 00:02:47,900 that allocates memory for you. 40 00:02:47,900 --> 00:02:50,960 It takes one argument, the number of bytes that you want to allocate. 41 00:02:50,960 --> 00:02:53,630 Even if you don't recall how many bytes you need for an integer, 42 00:02:53,630 --> 00:02:55,540 you can call this other operator, sizeof, 43 00:02:55,540 --> 00:02:59,062 that we saw briefly last week, that will return, in this case, 4 most likely 44 00:02:59,062 --> 00:03:00,770 depending on the computer that you're on. 45 00:03:00,770 --> 00:03:04,070 So this is saying, hey, computer give me 4 bytes of memory. 46 00:03:04,070 --> 00:03:06,890 And it returns that chunk of memory to conventionally 47 00:03:06,890 --> 00:03:09,740 by way of the first address, so ox something, 48 00:03:09,740 --> 00:03:11,660 wherever those 4 bytes happen to star. 49 00:03:11,660 --> 00:03:14,810 And then it stores that address in x, which is in fact OK, 50 00:03:14,810 --> 00:03:17,240 because as you noted initially, x is in fact a pointer. 51 00:03:17,240 --> 00:03:18,260 That is an address. 52 00:03:18,260 --> 00:03:21,250 So all this is doing is it's declaring a variable called 53 00:03:21,250 --> 00:03:23,810 x and storing in it ultimately the address 54 00:03:23,810 --> 00:03:25,407 of a legitimate chunk of memory. 55 00:03:25,407 --> 00:03:28,030 You wouldn't typically allocate an int like this. 56 00:03:28,030 --> 00:03:32,535 You would allocate an int with just int and semicolon just like in week 1. 57 00:03:32,535 --> 00:03:35,660 But now that we have the ability to allocate addresses and allocate memory, 58 00:03:35,660 --> 00:03:38,150 you could achieve the same idea here. 59 00:03:38,150 --> 00:03:43,400 This line here now, the fourth line of code, says what in English? 60 00:03:43,400 --> 00:03:46,453 Star x equals 42 semicolon. 61 00:03:46,453 --> 00:03:47,370 What's going on there? 62 00:03:47,370 --> 00:03:48,072 Yeah. 63 00:03:48,072 --> 00:03:51,380 AUDIENCE: It goes to the address in the x and then sets it to 42. 64 00:03:51,380 --> 00:03:54,580 DAVID MALAN: Good goes to the address in x and sets it to 42. 65 00:03:54,580 --> 00:03:57,050 So the star operator is the dereference operator, 66 00:03:57,050 --> 00:03:59,390 which is a fancy way of saying go to that address. 67 00:03:59,390 --> 00:04:00,600 And what do you want to do? 68 00:04:00,600 --> 00:04:03,080 Well, per week 1 when we discussed the assignment operator, 69 00:04:03,080 --> 00:04:05,720 it just says put the number 42 there. 70 00:04:05,720 --> 00:04:09,950 So wherever malloc found 4 bytes of available memory for me, 71 00:04:09,950 --> 00:04:13,670 this fourth line of code says, go there and put the number 42 72 00:04:13,670 --> 00:04:16,730 by putting the appropriate zeros and ones. 73 00:04:16,730 --> 00:04:18,200 This last line here-- 74 00:04:18,200 --> 00:04:20,540 and here's the bug, if we finally reveal it here-- 75 00:04:20,540 --> 00:04:22,580 this line is buggy. 76 00:04:22,580 --> 00:04:25,420 Does anyone see why? 77 00:04:25,420 --> 00:04:26,458 Yeah, over here. 78 00:04:26,458 --> 00:04:29,000 AUDIENCE: You haven't allocated memory for that variable yet. 79 00:04:29,000 --> 00:04:29,540 DAVID MALAN: Exactly. 80 00:04:29,540 --> 00:04:31,730 I haven't allocated memory for that variable yet. 81 00:04:31,730 --> 00:04:35,160 And because up here I've just said int star y semicolon. 82 00:04:35,160 --> 00:04:38,870 You can only assume safely that has some garbage value, some unknown value, 83 00:04:38,870 --> 00:04:41,510 maybe remnants from some other part of the program, 84 00:04:41,510 --> 00:04:43,910 that might not necessarily be true here at the beginning of the program. 85 00:04:43,910 --> 00:04:46,493 But for safety's sake assume that if you don't give a variable 86 00:04:46,493 --> 00:04:48,870 a value, who knows what it contains. 87 00:04:48,870 --> 00:04:51,080 It's got some bogus address, such that if you 88 00:04:51,080 --> 00:04:55,220 say star y go to that bogus address something bad is going to happen. 89 00:04:55,220 --> 00:04:57,920 And maybe you experienced this already in P Set 4 or prior, 90 00:04:57,920 --> 00:05:00,050 some kind of memory problem with your code, 91 00:05:00,050 --> 00:05:02,990 or a segmentation fault or seg fault, bad things 92 00:05:02,990 --> 00:05:05,810 happen when you go to addresses that don't exist 93 00:05:05,810 --> 00:05:08,510 or that you don't even know where they are. 94 00:05:08,510 --> 00:05:09,770 So this line of code is bad. 95 00:05:09,770 --> 00:05:10,730 But we can do a little better. 96 00:05:10,730 --> 00:05:12,500 What if instead I do something like this? 97 00:05:12,500 --> 00:05:14,720 I actually assign to y x. 98 00:05:14,720 --> 00:05:18,320 So that just says put in y the same address that's in x. 99 00:05:18,320 --> 00:05:24,023 And then with this last line of code, what if I now say star y equals 13? 100 00:05:24,023 --> 00:05:25,690 What is that-- you're nodding your head. 101 00:05:25,690 --> 00:05:27,867 What am I doing correctly now? 102 00:05:27,867 --> 00:05:30,770 AUDIENCE: Now there's memory allocated for y. 103 00:05:30,770 --> 00:05:31,520 DAVID MALAN: Good. 104 00:05:31,520 --> 00:05:33,080 Now, there's memory allocated for y. 105 00:05:33,080 --> 00:05:36,090 So you're saying go to that address and put 13 there. 106 00:05:36,090 --> 00:05:40,830 However, what did we just need to the 42, just to be clear? 107 00:05:40,830 --> 00:05:41,540 We clobbered it. 108 00:05:41,540 --> 00:05:43,640 We overwrote it with the 13. 109 00:05:43,640 --> 00:05:48,410 Because if x and y are the same address, both this says go to that address 110 00:05:48,410 --> 00:05:51,050 and put 42 there, but then two lines later, we say, no, no, no, 111 00:05:51,050 --> 00:05:53,780 go there and put 13 there instead. 112 00:05:53,780 --> 00:05:57,170 But long story short, bad things happen when you don't necessarily 113 00:05:57,170 --> 00:05:59,917 anticipate what is in memory and you don't allocate it yourself, 114 00:05:59,917 --> 00:06:01,750 so thanks to one of our friends at Stanford, 115 00:06:01,750 --> 00:06:05,870 allow us to take a moment here to hit Play on a short film, a claymation 116 00:06:05,870 --> 00:06:07,760 if you will, that paints the same picture 117 00:06:07,760 --> 00:06:09,615 in a more memorable way perhaps. 118 00:06:09,615 --> 00:06:11,365 If we could dim the lights. 119 00:06:11,365 --> 00:06:14,100 [VIDEO PLAYBACK] 120 00:06:14,100 --> 00:06:15,810 NARRATOR: Hey, Binky, wake up. 121 00:06:15,810 --> 00:06:18,330 It's time for pointer fun. 122 00:06:18,330 --> 00:06:19,480 BINKY: What's that? 123 00:06:19,480 --> 00:06:21,130 Learn about pointers? 124 00:06:21,130 --> 00:06:22,650 Oh, goody. 125 00:06:22,650 --> 00:06:25,900 NARRATOR: Well, to get started, I guess we're going to need a couple pointers. 126 00:06:25,900 --> 00:06:30,460 BINKY: OK, this code allocates two pointers, which can point to integers. 127 00:06:30,460 --> 00:06:32,492 NARRATOR: OK, well, I see the two pointers. 128 00:06:32,492 --> 00:06:34,450 But they don't seem to be pointing to anything. 129 00:06:34,450 --> 00:06:37,390 BINKY: That's right, initially pointers don't point to anything. 130 00:06:37,390 --> 00:06:39,670 The things they point to are call pointees. 131 00:06:39,670 --> 00:06:41,395 And setting them up is a separate step. 132 00:06:41,395 --> 00:06:42,520 NARRATOR: Oh, right, right. 133 00:06:42,520 --> 00:06:43,240 I knew that. 134 00:06:43,240 --> 00:06:45,040 The pointees are separate. 135 00:06:45,040 --> 00:06:47,560 So how do you allocate a pointee? 136 00:06:47,560 --> 00:06:51,310 BINKY: OK, well, this code allocates a new integer pointee. 137 00:06:51,310 --> 00:06:54,155 And this part sets x to point to it. 138 00:06:54,155 --> 00:06:55,530 NARRATOR: Hey, that looks better. 139 00:06:55,530 --> 00:06:57,210 So make it do something. 140 00:06:57,210 --> 00:07:00,570 BINKY: OK, I'll dereference the pointer x to store the number 141 00:07:00,570 --> 00:07:02,880 42 into its pointee. 142 00:07:02,880 --> 00:07:06,450 For this trick, I'll need my magic wand of y dereferencing. 143 00:07:06,450 --> 00:07:09,790 NARRATOR: Your magic wand of dereferencing. 144 00:07:09,790 --> 00:07:11,630 That-- that's great. 145 00:07:11,630 --> 00:07:13,400 BINKY: This is what the code looks like. 146 00:07:13,400 --> 00:07:14,570 I'll just set up the number. 147 00:07:14,570 --> 00:07:16,380 And-- 148 00:07:16,380 --> 00:07:18,450 NARRATOR: Hey, look, there it goes. 149 00:07:18,450 --> 00:07:21,840 So doing a dereference on x, follows the arrow 150 00:07:21,840 --> 00:07:25,440 to access its pointee, in this case, the store 42 in there. 151 00:07:25,440 --> 00:07:30,010 Hey, try using it to store the number 13 through the other pointer, y. 152 00:07:30,010 --> 00:07:31,080 BINKY: OK. 153 00:07:31,080 --> 00:07:35,610 I'll just go over here to y and get the number 13 set up 154 00:07:35,610 --> 00:07:38,985 and then take the wand of dereferencing and just-- 155 00:07:38,985 --> 00:07:41,130 [BUZZER SOUND] oh. 156 00:07:41,130 --> 00:07:43,380 NARRATOR: Oh, hey, that didn't work. 157 00:07:43,380 --> 00:07:48,240 Say, Binky, I don't think dereferencing y is a good idea, because, you know, 158 00:07:48,240 --> 00:07:50,260 setting up the pointee is a separate step. 159 00:07:50,260 --> 00:07:52,445 And I don't think we ever did it. 160 00:07:52,445 --> 00:07:53,910 BINKY: Oh, good point. 161 00:07:53,910 --> 00:07:56,220 NARRATOR: Yeah, we allocated the pointer y. 162 00:07:56,220 --> 00:07:58,990 But we never set it to point to a pointee. 163 00:07:58,990 --> 00:08:00,730 BINKY: Mm, very observant. 164 00:08:00,730 --> 00:08:02,730 NARRATOR: Hey, you're looking good there, Binky. 165 00:08:02,730 --> 00:08:05,700 Can you fix it so that y points to the same pointee as x? 166 00:08:05,700 --> 00:08:06,300 BINKY: Sure. 167 00:08:06,300 --> 00:08:09,090 I'll use my magic wand of pointer assignment. 168 00:08:09,090 --> 00:08:11,280 NARRATOR: Is that going to be a problem like before? 169 00:08:11,280 --> 00:08:13,110 BINKY: No, this doesn't touch the pointees. 170 00:08:13,110 --> 00:08:16,815 It just changes one pointer to point to the same thing as another. 171 00:08:16,815 --> 00:08:17,730 NARRATOR: Oh, I see. 172 00:08:17,730 --> 00:08:20,500 Now y points to the same place as x. 173 00:08:20,500 --> 00:08:22,260 So, wait, now y is fixed. 174 00:08:22,260 --> 00:08:23,430 It has a pointee. 175 00:08:23,430 --> 00:08:26,950 So you can try the wand of dereferencing again to send the 13 over. 176 00:08:26,950 --> 00:08:30,410 BINKY: Uh-- OK, here goes. 177 00:08:30,410 --> 00:08:31,640 NARRATOR: Hey, look at that. 178 00:08:31,640 --> 00:08:33,289 Now dereferencing works on y. 179 00:08:33,289 --> 00:08:37,460 And because the pointers are sharing that one pointee, they both see the 13. 180 00:08:37,460 --> 00:08:39,169 BINKY: Yeah, sharing, whatever. 181 00:08:39,169 --> 00:08:41,090 So are we going to switch places now? 182 00:08:41,090 --> 00:08:43,010 NARRATOR: Oh, look, we're out of time. 183 00:08:43,010 --> 00:08:43,643 BINKY: But-- 184 00:08:43,643 --> 00:08:44,430 [END PLAYBACK] 185 00:08:44,430 --> 00:08:46,610 DAVID MALAN: All right, so now that we do 186 00:08:46,610 --> 00:08:49,040 have this power of pointers and addresses 187 00:08:49,040 --> 00:08:51,800 where we have low level access to the computer's memory, 188 00:08:51,800 --> 00:08:54,732 we can actually solve problems a lot more powerfully 189 00:08:54,732 --> 00:08:56,190 and in a lot more interesting ways. 190 00:08:56,190 --> 00:08:58,232 But first, let's motivate some of these problems. 191 00:08:58,232 --> 00:09:00,290 So back in Week 2, we introduced arrays, which 192 00:09:00,290 --> 00:09:02,780 was the first of our data structures, if you will. 193 00:09:02,780 --> 00:09:05,270 Before then in Week 1, all we had was variables for things 194 00:09:05,270 --> 00:09:07,910 like ints and chars and floats and so forth. 195 00:09:07,910 --> 00:09:09,860 In Week 2, we introduced arrays, which meant 196 00:09:09,860 --> 00:09:13,850 you could store two ints altogether or three or 10 or 100. 197 00:09:13,850 --> 00:09:16,810 So you can kind of encapsulate lots of data together. 198 00:09:16,810 --> 00:09:21,150 So unfortunately, though, arrays aren't quite as powerful as might be ideal. 199 00:09:21,150 --> 00:09:25,670 So, for instance, if we have an array with size 3 and we actually want to go 200 00:09:25,670 --> 00:09:28,580 ahead and store three values in it-- one, two, three-- 201 00:09:28,580 --> 00:09:31,430 suppose that we actually want to now store a fourth value, 202 00:09:31,430 --> 00:09:33,350 but we didn't anticipate that from the get go. 203 00:09:33,350 --> 00:09:37,160 Recall after all that with arrays you have to declare their size upfront. 204 00:09:37,160 --> 00:09:41,300 So you've got to hard code the number 3 or a variable containing the number 3. 205 00:09:41,300 --> 00:09:43,550 But suppose that we want to store the number 4. 206 00:09:43,550 --> 00:09:46,130 You might think that, well, just give me another box 207 00:09:46,130 --> 00:09:48,950 of memory just to the right of the number 3, 208 00:09:48,950 --> 00:09:51,080 so that I can keep all of my numbers together. 209 00:09:51,080 --> 00:09:53,930 But unfortunately, per last week, that's not really 210 00:09:53,930 --> 00:09:56,240 a reliable assumption, because in the context 211 00:09:56,240 --> 00:09:59,150 of the rest of your computer's memory, that 1, 2, 3, 212 00:09:59,150 --> 00:10:01,580 might be here surrounded by other bytes. 213 00:10:01,580 --> 00:10:05,450 And per last week those bytes might be mostly filled with other data 214 00:10:05,450 --> 00:10:07,850 from some other parts of your program. 215 00:10:07,850 --> 00:10:11,270 And yet you would think that in seeing that 1, 2, 3 216 00:10:11,270 --> 00:10:14,060 is kind of painted into this corner, so to speak, 217 00:10:14,060 --> 00:10:16,400 that there's just no room for the number 4, 218 00:10:16,400 --> 00:10:19,970 and therefore you can't add the fourth number to your array, 219 00:10:19,970 --> 00:10:23,012 is there a solution visibly to this problem nonetheless? 220 00:10:23,012 --> 00:10:25,970 221 00:10:25,970 --> 00:10:27,260 Where else could we put it? 222 00:10:27,260 --> 00:10:27,760 Yeah. 223 00:10:27,760 --> 00:10:29,468 AUDIENCE: Move it to off of other memory. 224 00:10:29,468 --> 00:10:31,052 DAVID MALAN: Say that a little louder. 225 00:10:31,052 --> 00:10:32,975 AUDIENCE: We can move it off to other memory. 226 00:10:32,975 --> 00:10:35,600 DAVID MALAN: Yeah, so maybe we can move it off to other memory. 227 00:10:35,600 --> 00:10:38,780 So there's a lot of EMMAs in my memory per last week, but there is still, 228 00:10:38,780 --> 00:10:41,630 it would seem, based on this picture, some unused memory. 229 00:10:41,630 --> 00:10:46,532 So maybe we could resize our array, grow it, not by just moving all of the EMMAs 230 00:10:46,532 --> 00:10:48,740 because frankly that would seem to take a lot of time 231 00:10:48,740 --> 00:10:50,532 if we had to shift all of these characters, 232 00:10:50,532 --> 00:10:53,840 why don't we just relocate the 1, 2, 3 down here, 233 00:10:53,840 --> 00:10:57,020 and that gives us an extra space for at least a number 4. 234 00:10:57,020 --> 00:10:58,970 So indeed even if you're using arrays, you 235 00:10:58,970 --> 00:11:01,940 can achieve this outcome by actually moving memory around. 236 00:11:01,940 --> 00:11:03,500 But consider what's involved in that. 237 00:11:03,500 --> 00:11:05,360 So if you've got our old array at top left, 238 00:11:05,360 --> 00:11:08,390 and we've got our new array at bottom right, that is of size 4. 239 00:11:08,390 --> 00:11:09,830 So we have plenty of room. 240 00:11:09,830 --> 00:11:12,528 How do we go about resizing the array? 241 00:11:12,528 --> 00:11:13,820 Well, it's kind of an illusion. 242 00:11:13,820 --> 00:11:17,112 You can't just resize the array when we have all of these EMMAs surrounding us. 243 00:11:17,112 --> 00:11:20,720 Instead, we actually have to move the array or copy it. 244 00:11:20,720 --> 00:11:22,352 So the 1 gets moved to the new memory. 245 00:11:22,352 --> 00:11:23,810 The 2 gets moved to the new memory. 246 00:11:23,810 --> 00:11:25,520 The 3 gets moved to the new memory. 247 00:11:25,520 --> 00:11:30,350 And then at that point, we can just throw away or free the previously used 248 00:11:30,350 --> 00:11:33,560 memory and now go ahead and add our 4. 249 00:11:33,560 --> 00:11:36,530 Unfortunately, this isn't necessarily the best strategy, right, 250 00:11:36,530 --> 00:11:40,100 because if these three lockers represent our original memory and these four 251 00:11:40,100 --> 00:11:44,090 lockers represents our new memory and they're deliberately far apart, 252 00:11:44,090 --> 00:11:49,010 that is to say that if I want to go ahead and move like these same numbers, 253 00:11:49,010 --> 00:11:52,680 I really have to do something like this, which involves quite a few steps. 254 00:11:52,680 --> 00:11:54,980 Let me go ahead and put the 1 in there now. 255 00:11:54,980 --> 00:11:58,365 Now, let me go ahead and get the 2 here. 256 00:11:58,365 --> 00:12:00,240 And then I can go ahead and put this in here. 257 00:12:00,240 --> 00:12:02,570 So now I've got the 2. 258 00:12:02,570 --> 00:12:05,662 And then lastly, I can go grab the 3. 259 00:12:05,662 --> 00:12:08,120 And so even though I did this pretty quickly on the screen, 260 00:12:08,120 --> 00:12:10,328 the reality is there's a decent amount of work to do. 261 00:12:10,328 --> 00:12:15,440 And then I still, of course, have to go ahead and add the 4 to the mix, which 262 00:12:15,440 --> 00:12:20,480 is to say that I've taken figuratively and physically quite a few steps 263 00:12:20,480 --> 00:12:24,830 in order to resize an array from size 3 to size 4, which 264 00:12:24,830 --> 00:12:26,840 is to say if we now consider the efficiency 265 00:12:26,840 --> 00:12:34,460 or, if you will, inefficiency of that algorithm, what kind of running time 266 00:12:34,460 --> 00:12:39,140 is involved when inserting additional numbers into an array 267 00:12:39,140 --> 00:12:39,920 as I've done here? 268 00:12:39,920 --> 00:12:42,770 Here's our menu of options from a couple of weeks 269 00:12:42,770 --> 00:12:45,140 ago when we focused on algorithms. 270 00:12:45,140 --> 00:12:49,070 What's the running time of insertion into an array 271 00:12:49,070 --> 00:12:52,970 based even on that simple demonstration would you say? 272 00:12:52,970 --> 00:12:54,040 What's the running time? 273 00:12:54,040 --> 00:12:54,540 Yeah. 274 00:12:54,540 --> 00:12:55,280 AUDIENCE: O n squared. 275 00:12:55,280 --> 00:12:55,370 276 00:12:55,370 --> 00:12:56,046 DAVID MALAN: Say it again. 277 00:12:56,046 --> 00:12:56,860 AUDIENCE: O n squared. 278 00:12:56,860 --> 00:12:57,920 DAVID MALAN: O n squared. 279 00:12:57,920 --> 00:13:01,245 So maybe O n squared in that there was a lot of back and forth 280 00:13:01,245 --> 00:13:02,370 and we've seen that before. 281 00:13:02,370 --> 00:13:04,430 We've seen bubble sort and selection sort add up. 282 00:13:04,430 --> 00:13:06,140 It's not quite as bad as that. 283 00:13:06,140 --> 00:13:07,390 It's not quite as bad as that. 284 00:13:07,390 --> 00:13:07,890 Yeah. 285 00:13:07,890 --> 00:13:08,757 AUDIENCE: O of n. 286 00:13:08,757 --> 00:13:09,590 DAVID MALAN: O of n. 287 00:13:09,590 --> 00:13:11,390 And why do you say O of n? 288 00:13:11,390 --> 00:13:16,996 AUDIENCE: Because for as like as many lockers there are in the first one, 289 00:13:16,996 --> 00:13:22,645 you have to increment the same amount of processes to insert them. 290 00:13:22,645 --> 00:13:23,520 DAVID MALAN: Exactly. 291 00:13:23,520 --> 00:13:26,940 Whatever number of lockers you have here-- so that's three specifically-- 292 00:13:26,940 --> 00:13:29,670 but n more generally, it's going to take me n steps 293 00:13:29,670 --> 00:13:31,942 to transfer those numbers over here. 294 00:13:31,942 --> 00:13:34,900 Or technically, it's going to take me 3-- maybe if I go back and forth, 295 00:13:34,900 --> 00:13:35,890 it's like 6 steps. 296 00:13:35,890 --> 00:13:37,770 But it's some multiple of n. 297 00:13:37,770 --> 00:13:38,730 So it's not n squared. 298 00:13:38,730 --> 00:13:41,310 That's when we kept iterating again and again and again. 299 00:13:41,310 --> 00:13:45,120 This time I just have to move 3 numbers to here and then add the fourth number. 300 00:13:45,120 --> 00:13:48,840 So it's indeed, Big O of n when you want to go ahead and insert or search 301 00:13:48,840 --> 00:13:54,180 equivalently an array that's actually implemented-- 302 00:13:54,180 --> 00:13:56,640 sorry, insert is going to take us linear time. 303 00:13:56,640 --> 00:13:58,878 But search recall-- and this was the powerful thing-- 304 00:13:58,878 --> 00:14:01,920 what's the running time of search so long as you keep your number sorted? 305 00:14:01,920 --> 00:14:04,340 Per two weeks ago, that was logarithmic. 306 00:14:04,340 --> 00:14:06,330 So we haven't necessarily sacrificed that. 307 00:14:06,330 --> 00:14:10,020 And that's the appeal of storing our data in an array that's sorted. 308 00:14:10,020 --> 00:14:12,030 You can use binary search. 309 00:14:12,030 --> 00:14:15,930 However, this is expensive and moving things around isn't necessarily 310 00:14:15,930 --> 00:14:16,930 the ideal approach. 311 00:14:16,930 --> 00:14:19,510 So let's just consider what this might look like in code. 312 00:14:19,510 --> 00:14:22,400 Let me go over to CS50 IDE here. 313 00:14:22,400 --> 00:14:26,970 And let me go ahead and create a new file called list.c. 314 00:14:26,970 --> 00:14:30,510 And let's see if we can't represent in code exactly this idea. 315 00:14:30,510 --> 00:14:33,630 So let me go ahead and include for myself standard stdio.h 316 00:14:33,630 --> 00:14:36,300 just so that we can print out some values ultimately. 317 00:14:36,300 --> 00:14:38,482 Let me go ahead then and declare main-- 318 00:14:38,482 --> 00:14:40,470 int main void. 319 00:14:40,470 --> 00:14:42,750 And then down here, let's just arbitrarily start 320 00:14:42,750 --> 00:14:46,620 where we did with three integers, called list and size 3. 321 00:14:46,620 --> 00:14:49,230 So I'm just mimicking exactly where we started pictorially 322 00:14:49,230 --> 00:14:51,780 by having an array that was fixed at size 3. 323 00:14:51,780 --> 00:14:53,940 And then if I went ahead and initialized that list, 324 00:14:53,940 --> 00:14:57,540 I could just hard code-- that is type into the program itself-- 325 00:14:57,540 --> 00:15:03,370 those three values into bracket 0, 1, and 2 the numbers 1, 2, 3 respectively. 326 00:15:03,370 --> 00:15:06,300 So I'm just manually initializing that array to three values. 327 00:15:06,300 --> 00:15:08,970 And then just so that this program has some purpose in life, 328 00:15:08,970 --> 00:15:13,200 let me go ahead and do int i equals 0, i less than 3, i++. 329 00:15:13,200 --> 00:15:16,812 And then, let's just print out these elements just for good measure. 330 00:15:16,812 --> 00:15:17,770 Each of them is an int. 331 00:15:17,770 --> 00:15:19,170 So we'll use %i. 332 00:15:19,170 --> 00:15:22,350 And then I'm going to go ahead and print out list bracket i. 333 00:15:22,350 --> 00:15:25,260 So kind of a Week 2 style program, where All I'm doing 334 00:15:25,260 --> 00:15:28,470 is hard coding an array of size 3, initializing it 335 00:15:28,470 --> 00:15:32,460 with three values, 1, 2, 3; 0 indexed, and then printing them out. 336 00:15:32,460 --> 00:15:37,920 So if I go ahead and save this and make my list and then go ahead and compile 337 00:15:37,920 --> 00:15:41,640 this with ./list, I should see hopefully 1, 2, 3. 338 00:15:41,640 --> 00:15:45,060 But there's a problem with this implementation fundamentally because I 339 00:15:45,060 --> 00:15:47,880 have hardcoded-- that is typed explicitly-- 340 00:15:47,880 --> 00:15:53,040 the size of this array, how can I go about adding a fourth element? 341 00:15:53,040 --> 00:15:54,210 What would I have to do? 342 00:15:54,210 --> 00:15:56,365 Well, I could change the code up here to 4. 343 00:15:56,365 --> 00:15:57,990 And then I could add another line here. 344 00:15:57,990 --> 00:15:59,160 And then I could change this. 345 00:15:59,160 --> 00:16:00,493 But then I have to recompile it. 346 00:16:00,493 --> 00:16:02,145 And so it's certainly not dynamic. 347 00:16:02,145 --> 00:16:04,020 But we did see a function last week that lets 348 00:16:04,020 --> 00:16:06,120 you allocate more memory dynamically. 349 00:16:06,120 --> 00:16:08,340 And just to be sure what was that function? 350 00:16:08,340 --> 00:16:08,970 So malloc. 351 00:16:08,970 --> 00:16:09,470 Right? 352 00:16:09,470 --> 00:16:13,290 Now that we have malloc, you don't have to type into your program source code 353 00:16:13,290 --> 00:16:15,420 from the get go a fixed number. 354 00:16:15,420 --> 00:16:18,160 You can actually allocate some amount of memory dynamically. 355 00:16:18,160 --> 00:16:21,990 Now, here just for demonstration's sake, we'll do it to achieve the same goal, 356 00:16:21,990 --> 00:16:24,833 but in a way that's going to scale a little more effectively. 357 00:16:24,833 --> 00:16:28,000 Recall from last week that if you want to get a chunk of memory from malloc, 358 00:16:28,000 --> 00:16:30,375 it's going to return the address of that chunk of memory. 359 00:16:30,375 --> 00:16:36,000 So that suggests that I can declare a pointer to an integer called list. 360 00:16:36,000 --> 00:16:40,560 And then let me go ahead and allocate, how about, three integers initially 361 00:16:40,560 --> 00:16:43,510 times whatever the size is of an integer. 362 00:16:43,510 --> 00:16:46,680 So this is a little weird looking, but consider what this is doing. 363 00:16:46,680 --> 00:16:49,740 malloc is being asked for 3 times the size of an int. 364 00:16:49,740 --> 00:16:51,900 So give me enough memory to fit three integers. 365 00:16:51,900 --> 00:16:55,000 By definition that returns a pointer, per last week. 366 00:16:55,000 --> 00:16:57,480 So we have to assign it to a pointer on the left. 367 00:16:57,480 --> 00:17:01,660 So list is a variable now, just like x and y from our previous example, 368 00:17:01,660 --> 00:17:04,180 that's storing the address of that chunk of memory. 369 00:17:04,180 --> 00:17:07,530 But what's cool about C is that now that you 370 00:17:07,530 --> 00:17:10,109 know that list is a chunk of memory, we can actually 371 00:17:10,109 --> 00:17:14,069 borrow that same square bracket notation from Week 2. 372 00:17:14,069 --> 00:17:17,849 And this code here doesn't actually need to change. 373 00:17:17,849 --> 00:17:22,890 If you use square bracket notation next to the name of a pointer, what's 374 00:17:22,890 --> 00:17:25,190 going to happen for you automatically is the computer 375 00:17:25,190 --> 00:17:28,620 is going to go to the first byte in that chunk of memory. 376 00:17:28,620 --> 00:17:31,060 This index is going to go to the next chunk of memory. 377 00:17:31,060 --> 00:17:33,060 This is going to go to the next chunk of memory, 378 00:17:33,060 --> 00:17:36,430 all within the scope of what malloc returned for you. 379 00:17:36,430 --> 00:17:39,073 And just as an aside, how many bytes are in an integer? 380 00:17:39,073 --> 00:17:39,760 AUDIENCE: 4. 381 00:17:39,760 --> 00:17:40,740 DAVID MALAN: 4. 382 00:17:40,740 --> 00:17:43,050 And recall I briefly mentioned the expression last week 383 00:17:43,050 --> 00:17:44,400 pointer arithmetic. 384 00:17:44,400 --> 00:17:46,710 What you're also getting sort of magically 385 00:17:46,710 --> 00:17:50,790 with this square bracket notation is that bracket 0, 386 00:17:50,790 --> 00:17:52,530 it happens to be byte 0. 387 00:17:52,530 --> 00:17:54,930 Bracket 1 is not the second byte. 388 00:17:54,930 --> 00:17:57,000 It's actually 4 bytes over. 389 00:17:57,000 --> 00:17:59,550 And bracket 2 is not the third byte. 390 00:17:59,550 --> 00:18:05,910 It's actually 8 bytes over, because you allocated 4 plus 4 plus 4, 12 bytes. 391 00:18:05,910 --> 00:18:08,130 And so this square bracket notation just jumps you 392 00:18:08,130 --> 00:18:11,160 to the right place in that chunk of memory, so that you can fit int, 393 00:18:11,160 --> 00:18:13,350 int, int. 394 00:18:13,350 --> 00:18:14,096 Yeah. 395 00:18:14,096 --> 00:18:17,096 AUDIENCE: Why do you allocate a pointer to an int rather than an pointer 396 00:18:17,096 --> 00:18:18,223 to an int array? 397 00:18:18,223 --> 00:18:21,140 DAVID MALAN: Why do you allocate a pointer to an int and not a pointer 398 00:18:21,140 --> 00:18:23,090 to an int array? 399 00:18:23,090 --> 00:18:28,760 In this context, arrays and pointers are in some sense the same. 400 00:18:28,760 --> 00:18:30,640 A pointer is an address of memory. 401 00:18:30,640 --> 00:18:32,930 An array is just a chunk of memory. 402 00:18:32,930 --> 00:18:36,890 And so even though we used chunks of memory in Week 2 403 00:18:36,890 --> 00:18:39,500 by just calling them arrays, they really are just more 404 00:18:39,500 --> 00:18:42,860 generally chunks of memory that support square bracket notation. 405 00:18:42,860 --> 00:18:45,380 But now that we can allocate as much memory as we want, 406 00:18:45,380 --> 00:18:48,350 we can kind of use these two concepts interchangeably. 407 00:18:48,350 --> 00:18:52,760 And there are some subtleties in C. But this now has the same effect as Week 2. 408 00:18:52,760 --> 00:18:55,495 And this is the only new line from this week. 409 00:18:55,495 --> 00:18:57,620 But now if you're using malloc, even though I'm not 410 00:18:57,620 --> 00:18:59,960 going to do it in a more complicated program here, 411 00:18:59,960 --> 00:19:04,463 you can imagine now the code running in a loop and maybe allocating more memory 412 00:19:04,463 --> 00:19:07,130 and more memory and more memory when you need it, because malloc 413 00:19:07,130 --> 00:19:08,900 allows you to do just that. 414 00:19:08,900 --> 00:19:11,768 And we do need to do a couple of safety checks here. 415 00:19:11,768 --> 00:19:14,810 It turns out, per last week, that malloc can sometimes run out of memory. 416 00:19:14,810 --> 00:19:18,522 If you're Mac or PC or the cloud runs out of memory for your account, 417 00:19:18,522 --> 00:19:20,480 well, you might want to check the return value. 418 00:19:20,480 --> 00:19:22,700 And so good practice would be, well, wait a minute, 419 00:19:22,700 --> 00:19:26,210 if list equals equals null, let me just go ahead and return 1, 420 00:19:26,210 --> 00:19:28,670 something went wrong, because my computer is out of memory 421 00:19:28,670 --> 00:19:29,450 for some reason. 422 00:19:29,450 --> 00:19:32,900 So best practice would say anytime you allocate memory, 423 00:19:32,900 --> 00:19:36,110 always check if you've gotten back null. 424 00:19:36,110 --> 00:19:39,080 Now, let me just do something for the sake of demonstration. 425 00:19:39,080 --> 00:19:41,047 Let me move my window down here. 426 00:19:41,047 --> 00:19:43,130 Let me highlight these lines of code and just make 427 00:19:43,130 --> 00:19:45,920 the claim that highlighted here between lines 5 and 13 428 00:19:45,920 --> 00:19:50,480 are lines of code that simply allocate a list of size 3 429 00:19:50,480 --> 00:19:52,920 and store three values in it. 430 00:19:52,920 --> 00:19:55,460 That's the story where we left off a moment ago. 431 00:19:55,460 --> 00:20:00,350 Suppose now I change my mind and decide after line 13 432 00:20:00,350 --> 00:20:02,690 or maybe elsewhere in this program if it were larger, 433 00:20:02,690 --> 00:20:05,660 you know what, I actually want another integer. 434 00:20:05,660 --> 00:20:08,570 I want to resize that array. 435 00:20:08,570 --> 00:20:10,050 Well, how can I go about doing it? 436 00:20:10,050 --> 00:20:11,467 Well, let me go ahead and do this. 437 00:20:11,467 --> 00:20:15,380 Let me go ahead and allocate, for instance, another address 438 00:20:15,380 --> 00:20:23,000 and say store at that address a chunk of memory corresponding to four integers 439 00:20:23,000 --> 00:20:25,670 using the size of operator as before. 440 00:20:25,670 --> 00:20:27,800 So temporarily, let me go ahead and give myself 441 00:20:27,800 --> 00:20:30,260 a new chunk of memory that is big enough to fit 442 00:20:30,260 --> 00:20:32,347 four integers instead of just three. 443 00:20:32,347 --> 00:20:35,180 Let me practice best practices and say, you know what, just in case, 444 00:20:35,180 --> 00:20:37,967 if temp equals equals null because I'm out of memory, forget it, 445 00:20:37,967 --> 00:20:39,050 I'm done with the program. 446 00:20:39,050 --> 00:20:40,467 We're not going to proceed anyway. 447 00:20:40,467 --> 00:20:42,110 But that's just good practice now. 448 00:20:42,110 --> 00:20:43,580 But now what I want to do? 449 00:20:43,580 --> 00:20:49,340 If I now have two chunks of memory, this one is a size 3, this one is of size 4, 450 00:20:49,340 --> 00:20:52,010 what did we do the last time we wanted to move something around 451 00:20:52,010 --> 00:20:55,282 in the computer's memory, what did I physically do with the lockers? 452 00:20:55,282 --> 00:20:56,240 I think you're nodding. 453 00:20:56,240 --> 00:20:58,268 What did I do? 454 00:20:58,268 --> 00:21:00,750 AUDIENCE: You went through each and move 1 to the-- 455 00:21:00,750 --> 00:21:02,875 DAVID MALAN: Yeah, exactly, I went through each one 456 00:21:02,875 --> 00:21:05,590 and copied the value from left to right, from old to new. 457 00:21:05,590 --> 00:21:07,780 And so let me go ahead and do exactly that. 458 00:21:07,780 --> 00:21:11,890 I think I can do this with a for loop, for int i get 0, i is less than 3, 459 00:21:11,890 --> 00:21:16,220 because that's the size of the old array that size 3, i++. 460 00:21:16,220 --> 00:21:19,360 And then in this iteration, I can just do something like this-- 461 00:21:19,360 --> 00:21:21,820 use this new chunk of memory just like an array, 462 00:21:21,820 --> 00:21:25,420 because I claimed I can use my square bracket notation and store location 463 00:21:25,420 --> 00:21:28,990 i whatever is in the original list at location i. 464 00:21:28,990 --> 00:21:31,570 So this code here now, if I were to comment it, 465 00:21:31,570 --> 00:21:36,700 copy integers from old array into new array. 466 00:21:36,700 --> 00:21:39,940 And that too is just using a for loop from old to new. 467 00:21:39,940 --> 00:21:42,670 But now that's not quite everything I want to do. 468 00:21:42,670 --> 00:21:46,750 I also want to store at the location 3, 0 index, 469 00:21:46,750 --> 00:21:49,540 which means it's the fourth location, another value, number 4. 470 00:21:49,540 --> 00:21:53,260 That's why I put the additional number 4 into those lockers. 471 00:21:53,260 --> 00:21:57,310 So now with these lines of code, I have implemented the physical notion 472 00:21:57,310 --> 00:22:01,460 of copying all of the values from the old array into the new array. 473 00:22:01,460 --> 00:22:03,670 So I'm almost done, except what did we learn 474 00:22:03,670 --> 00:22:07,030 last week that you should do whenever you're done with a chunk of memory? 475 00:22:07,030 --> 00:22:09,188 Do I still need the original chunk of memory? 476 00:22:09,188 --> 00:22:09,730 AUDIENCE: No. 477 00:22:09,730 --> 00:22:09,910 DAVID MALAN: No. 478 00:22:09,910 --> 00:22:12,040 And how do I give it back to the computer? 479 00:22:12,040 --> 00:22:12,760 AUDIENCE: Free. 480 00:22:12,760 --> 00:22:13,510 DAVID MALAN: Free. 481 00:22:13,510 --> 00:22:16,420 So quite simply, I literally just call free, 482 00:22:16,420 --> 00:22:20,958 passing in the address of the chunk of memory that I want to free. 483 00:22:20,958 --> 00:22:22,750 And even though I'm passing in one address, 484 00:22:22,750 --> 00:22:25,480 the computer is going to do the heavy, lifting of remembering 485 00:22:25,480 --> 00:22:27,130 how many bytes I asked for originally. 486 00:22:27,130 --> 00:22:28,588 You don't have to worry about that. 487 00:22:28,588 --> 00:22:31,738 You just say, whatever this is pointing at, go ahead and free it all. 488 00:22:31,738 --> 00:22:34,280 So now, you know what, now that I've gotten rid of that list, 489 00:22:34,280 --> 00:22:38,172 I'm going to update list equal temp, which is just cleaning up the naming. 490 00:22:38,172 --> 00:22:39,880 Temp is kind of a stupid name for a list. 491 00:22:39,880 --> 00:22:45,010 Let me just reuse the original pointer and let list equal temp. 492 00:22:45,010 --> 00:22:49,040 And now down here if I've done everything correctly, 493 00:22:49,040 --> 00:22:51,350 it should suffice to print out that whole list. 494 00:22:51,350 --> 00:22:52,727 So let me save this. 495 00:22:52,727 --> 00:22:54,560 Let me give myself a bigger terminal window. 496 00:22:54,560 --> 00:22:56,500 Do make list again. 497 00:22:56,500 --> 00:22:58,010 A lot of mistakes. 498 00:22:58,010 --> 00:22:59,590 Let's see, first one is up here. 499 00:22:59,590 --> 00:23:04,060 Implicitly declaring library function malloc dot dot dot. 500 00:23:04,060 --> 00:23:07,870 What generally is the solution when you see implicitly declaring something? 501 00:23:07,870 --> 00:23:09,370 AUDIENCE: Header file. 502 00:23:09,370 --> 00:23:10,700 DAVID MALAN: Header file, which one do I want? 503 00:23:10,700 --> 00:23:11,283 Do you recall? 504 00:23:11,283 --> 00:23:14,143 505 00:23:14,143 --> 00:23:16,810 This is subtle and we might not have used it last time if I used 506 00:23:16,810 --> 00:23:19,360 the CS50 library, but it's stdlib.h. 507 00:23:19,360 --> 00:23:20,453 That is where malloc is. 508 00:23:20,453 --> 00:23:21,370 That is where free is. 509 00:23:21,370 --> 00:23:25,598 So stdlib.h is one new header file that contains last week's functions. 510 00:23:25,598 --> 00:23:26,640 So let me try this again. 511 00:23:26,640 --> 00:23:27,670 Let me make list. 512 00:23:27,670 --> 00:23:29,530 Nice, this time it did compile. 513 00:23:29,530 --> 00:23:36,030 ./list and-- hm, I seem to be missing that fourth number. 514 00:23:36,030 --> 00:23:39,560 But I think this is just a stupid mistake on my part. 515 00:23:39,560 --> 00:23:44,440 What did I do wrong if I look at the printing of this array? 516 00:23:44,440 --> 00:23:45,880 AUDIENCE: Size of list. 517 00:23:45,880 --> 00:23:48,790 DAVID MALAN: Yeah, down here the new list is size 4. 518 00:23:48,790 --> 00:23:50,740 So frankly, if you recall a few weeks ago, 519 00:23:50,740 --> 00:23:53,592 I encouraged you, don't just hard code numbers, magic numbers, 520 00:23:53,592 --> 00:23:54,300 in your programs. 521 00:23:54,300 --> 00:23:57,320 We should really be using constants or some other variable. 522 00:23:57,320 --> 00:23:59,860 This is exactly why, because you, just like me, 523 00:23:59,860 --> 00:24:01,300 might overlook a detail like that. 524 00:24:01,300 --> 00:24:02,560 So let me recompile it. 525 00:24:02,560 --> 00:24:03,820 And let me do list. 526 00:24:03,820 --> 00:24:06,070 And, voila, there is my 1, 2, 3, 4. 527 00:24:06,070 --> 00:24:08,230 Now, to be clear, this is kind of a stupid program, 528 00:24:08,230 --> 00:24:11,260 because I sort of decided halfway through writing 529 00:24:11,260 --> 00:24:14,023 this program, wait a minute, I want four integers, not three. 530 00:24:14,023 --> 00:24:15,940 And, of course, at that point, you should just 531 00:24:15,940 --> 00:24:17,420 delete the earlier code you wrote. 532 00:24:17,420 --> 00:24:19,120 So this is just for demonstration sake. 533 00:24:19,120 --> 00:24:22,000 If you imagine this being a bigger program, that just over time 534 00:24:22,000 --> 00:24:25,480 the human decides maybe because get int is called that they need more memory, 535 00:24:25,480 --> 00:24:27,280 this is how you would do it using malloc. 536 00:24:27,280 --> 00:24:29,488 But it turns out there's a function that can actually 537 00:24:29,488 --> 00:24:31,370 make our lives a little easier here. 538 00:24:31,370 --> 00:24:34,580 So let me go ahead and clean this up just a little bit. 539 00:24:34,580 --> 00:24:38,380 It turns out that I don't have to allocate more memory myself, 540 00:24:38,380 --> 00:24:42,460 copy all of these things myself, and then also free it. 541 00:24:42,460 --> 00:24:46,040 I can consolidate a bunch of these lines as follows. 542 00:24:46,040 --> 00:24:50,380 Instead of using malloc, I can actually say realloc, 543 00:24:50,380 --> 00:24:53,288 which as the name suggests, reallocate a chunk of memory. 544 00:24:53,288 --> 00:24:54,580 What do you want to reallocate? 545 00:24:54,580 --> 00:24:57,020 Well, I want to reallocate the list. 546 00:24:57,020 --> 00:25:01,690 And this time I want to do 4 times size of an int. 547 00:25:01,690 --> 00:25:03,842 I'm going to store this in temp temporarily. 548 00:25:03,842 --> 00:25:05,800 I'm going to make sure that nothing went wrong, 549 00:25:05,800 --> 00:25:08,925 as by checking for null, which just means, hey, you might be out of memory. 550 00:25:08,925 --> 00:25:10,810 And then I'm going to return if so. 551 00:25:10,810 --> 00:25:15,670 But down here, if all is well, I'm going to go ahead and do this. 552 00:25:15,670 --> 00:25:25,410 And watch this I now have simplified my code by quite a few lines. 553 00:25:25,410 --> 00:25:29,300 realloc, by definition-- this is another function in stdlib.h-- 554 00:25:29,300 --> 00:25:32,050 handles the process of taking an existing chunk of memory that you 555 00:25:32,050 --> 00:25:34,930 already asked for, resizes it to be this new size, 556 00:25:34,930 --> 00:25:36,730 whether it's bigger or smaller. 557 00:25:36,730 --> 00:25:40,330 It handles the copying of the data from old to new to you. 558 00:25:40,330 --> 00:25:42,700 And you just need to check that nothing went wrong 559 00:25:42,700 --> 00:25:46,000 as by checking for null here and then remembering the new value. 560 00:25:46,000 --> 00:25:48,940 So this code now, which is just six lines of code, 561 00:25:48,940 --> 00:25:51,040 previously was more than that. 562 00:25:51,040 --> 00:25:52,880 And it's just a handy function to use. 563 00:25:52,880 --> 00:25:55,992 All right, a question from earlier. 564 00:25:55,992 --> 00:26:00,384 AUDIENCE: Why can't we just create this to the temp in the beginning, 565 00:26:00,384 --> 00:26:04,776 because if we equate this to the temp, then we equate this to the pointer 566 00:26:04,776 --> 00:26:07,923 perhaps, so this to the point to the 4 bytes of memory? 567 00:26:07,923 --> 00:26:09,340 DAVID MALAN: Really good question. 568 00:26:09,340 --> 00:26:10,950 So let me roll this back by rewinding. 569 00:26:10,950 --> 00:26:13,830 And all of the finished versions are on the course's website 570 00:26:13,830 --> 00:26:15,570 if you want to play with them later. 571 00:26:15,570 --> 00:26:18,360 This was the previous version using just malloc. 572 00:26:18,360 --> 00:26:23,580 If you just do this, update a new chunk of memory, as I think you're asking, 573 00:26:23,580 --> 00:26:27,600 what's happening is you are effectively orphaning the old chunk of memory. 574 00:26:27,600 --> 00:26:29,730 Because if you change what's stored in list 575 00:26:29,730 --> 00:26:32,860 and have it store the new chunk of memory, where'd the old chunk of memory 576 00:26:32,860 --> 00:26:33,360 go? 577 00:26:33,360 --> 00:26:35,280 It's sort of floating there in your computer's memory. 578 00:26:35,280 --> 00:26:36,840 But you've lost all pointers to it. 579 00:26:36,840 --> 00:26:39,242 There's no arrow anymore pointing to it conceptually. 580 00:26:39,242 --> 00:26:41,700 So that's why we have to jump through these hoops of having 581 00:26:41,700 --> 00:26:43,650 a temporary variable just so that we don't 582 00:26:43,650 --> 00:26:45,238 lose track of things we've allocated. 583 00:26:45,238 --> 00:26:48,030 And we'll see this later today with another data structure as well. 584 00:26:48,030 --> 00:26:49,210 Yeah. 585 00:26:49,210 --> 00:26:51,660 AUDIENCE: Somebody asked this, but I don't 586 00:26:51,660 --> 00:26:56,070 understand that if you initialize temp as a pointer toward integer, 587 00:26:56,070 --> 00:26:58,955 then does it not create problems that you use it as an array. 588 00:26:58,955 --> 00:27:00,080 DAVID MALAN: Good question. 589 00:27:00,080 --> 00:27:02,410 If you initialize temp as a pointer to an integer, 590 00:27:02,410 --> 00:27:05,180 does it not create problems that you're using it as an array? 591 00:27:05,180 --> 00:27:09,020 Short answer, no, because again an array by definition from Week 2 592 00:27:09,020 --> 00:27:10,730 is just a chunk of memory. 593 00:27:10,730 --> 00:27:13,170 And in C you can use the square bracket notation 594 00:27:13,170 --> 00:27:18,000 to jump to random parts of that memory using simple arithmetic, bracket 0, 1, 595 00:27:18,000 --> 00:27:19,880 2, and so forth. 596 00:27:19,880 --> 00:27:23,000 Last week when we introduced malloc and free and now realloc, 597 00:27:23,000 --> 00:27:27,690 you now have a more low level way of allocating as much memory as you want. 598 00:27:27,690 --> 00:27:29,910 So it's a more powerful, general purpose mechanism. 599 00:27:29,910 --> 00:27:32,000 But at the end of the day, you're still getting back a chunk of memory, 600 00:27:32,000 --> 00:27:34,910 contiguous memory, bytes that are back to back to back. 601 00:27:34,910 --> 00:27:38,360 So you can certainly still use the square bracket notation 602 00:27:38,360 --> 00:27:41,660 because essentially an array is a chunk of memory. 603 00:27:41,660 --> 00:27:45,380 And malloc gives you a chunk of memory, ergo you can treat it as an array. 604 00:27:45,380 --> 00:27:47,330 They really are equivalent in that sense. 605 00:27:47,330 --> 00:27:50,060 You just don't get as many user friendly features as with arrays, 606 00:27:50,060 --> 00:27:54,440 like them being freed for you, as we never until last week 607 00:27:54,440 --> 00:27:56,000 do we have to free chunks of memory. 608 00:27:56,000 --> 00:27:59,140 Arrays do that for you automatically thanks to the compiler. 609 00:27:59,140 --> 00:27:59,820 Yeah. 610 00:27:59,820 --> 00:28:06,092 AUDIENCE: Do you not have to recreate the list for temp after line 37. 611 00:28:06,092 --> 00:28:06,800 DAVID MALAN: Yes. 612 00:28:06,800 --> 00:28:07,190 Thank you. 613 00:28:07,190 --> 00:28:08,148 So there is a bug here. 614 00:28:08,148 --> 00:28:10,985 And if I ran Valgrind on this code, I would see exactly 615 00:28:10,985 --> 00:28:13,760 that, that I'm leaking some number of bytes. 616 00:28:13,760 --> 00:28:19,640 So indeed, at the end of this program, I want to free the-- 617 00:28:19,640 --> 00:28:22,910 let's make sure-- yep, I want to free now 618 00:28:22,910 --> 00:28:26,420 the new chunk of memory, which is a size 4, to avoid 619 00:28:26,420 --> 00:28:28,040 exactly the problem you identified. 620 00:28:28,040 --> 00:28:29,810 Good catch. 621 00:28:29,810 --> 00:28:34,140 All right , so just for now, even if that's a lot of code, 622 00:28:34,140 --> 00:28:36,740 let's consider now higher level takeaways from this, 623 00:28:36,740 --> 00:28:41,000 just so that we can then motivates an alternative approach that allows us 624 00:28:41,000 --> 00:28:44,840 to stitch together somewhat fancier data structures instead. 625 00:28:44,840 --> 00:28:49,712 So in general, a data structure is just a programming construct in C, C++, 626 00:28:49,712 --> 00:28:52,670 Java, Python-- we'll see them in different languages over the remainder 627 00:28:52,670 --> 00:28:53,480 of the term-- 628 00:28:53,480 --> 00:28:57,710 that allow you to store information differently in your computer's memory. 629 00:28:57,710 --> 00:29:04,010 In C, everything we're about to do today is thanks to these three features of C. 630 00:29:04,010 --> 00:29:06,860 So even though this may feel like a lot of syntax, everything we do 631 00:29:06,860 --> 00:29:11,420 today boils down to three pieces of syntax that you have seen before. 632 00:29:11,420 --> 00:29:14,660 Struct, this recall was a keyword in C that allows 633 00:29:14,660 --> 00:29:16,200 you to create your own structure. 634 00:29:16,200 --> 00:29:19,100 For instance, a couple of weeks ago, we created a person structure, 635 00:29:19,100 --> 00:29:20,840 who had a name and a number. 636 00:29:20,840 --> 00:29:25,160 And that gives us our own data type that is structured to contain two values, 637 00:29:25,160 --> 00:29:27,070 like name and number. 638 00:29:27,070 --> 00:29:28,820 You use structures as well for the problem 639 00:29:28,820 --> 00:29:31,190 set involving bitmaps, the bitmap header and so forth. 640 00:29:31,190 --> 00:29:33,200 Those were data structures as well. 641 00:29:33,200 --> 00:29:35,630 What do we use the dot notation for just to be clear? 642 00:29:35,630 --> 00:29:37,850 And you definitely use this when manipulating 643 00:29:37,850 --> 00:29:39,630 red and green and blue pixels recently. 644 00:29:39,630 --> 00:29:40,130 Yeah. 645 00:29:40,130 --> 00:29:41,900 AUDIENCE: To access a property of a structure. 646 00:29:41,900 --> 00:29:42,320 DAVID MALAN: Exactly. 647 00:29:42,320 --> 00:29:44,160 To access a property of a structure. 648 00:29:44,160 --> 00:29:47,390 So if you want to get at a person's name or get at a person's number, 649 00:29:47,390 --> 00:29:50,090 you use the variable's name and then a dot operator 650 00:29:50,090 --> 00:29:51,760 to get inside of that data structure. 651 00:29:51,760 --> 00:29:52,970 So we've seen that before. 652 00:29:52,970 --> 00:29:56,487 Then last week and again today, we see the star operator, 653 00:29:56,487 --> 00:29:59,070 which can kind of mean different things in different contexts. 654 00:29:59,070 --> 00:30:02,120 But it's always related here to memory as of last week. 655 00:30:02,120 --> 00:30:05,930 This is the dereference operator that allows you to go to a chunk of memory 656 00:30:05,930 --> 00:30:07,820 by way of this thing called a pointer. 657 00:30:07,820 --> 00:30:10,160 So even if today feels like a bit of that fire hose-- 658 00:30:10,160 --> 00:30:12,320 and again per my email, this is where things now 659 00:30:12,320 --> 00:30:16,850 begin to level off-- realize that it all boils down to first principles, 660 00:30:16,850 --> 00:30:19,970 or if you will sort of three scratch-like puzzle pieces 661 00:30:19,970 --> 00:30:23,760 that we're now going to assemble into more interesting solutions to problems. 662 00:30:23,760 --> 00:30:26,930 So allow me to introduce something called a linked list. 663 00:30:26,930 --> 00:30:30,980 A linked list, as we'll see is going to allow you to store a list of values. 664 00:30:30,980 --> 00:30:33,410 Now an array allows you to store a list of values. 665 00:30:33,410 --> 00:30:36,020 But what are some of the downsides with an array? 666 00:30:36,020 --> 00:30:38,480 Well, an array is a fixed chunk of memory. 667 00:30:38,480 --> 00:30:40,730 And if you want to resize that array to add more 668 00:30:40,730 --> 00:30:43,130 values to it, what do you have to do? 669 00:30:43,130 --> 00:30:45,710 Well, you minimally have to allocate more memory. 670 00:30:45,710 --> 00:30:48,740 You need to copy all of those values from old to new. 671 00:30:48,740 --> 00:30:50,870 And then you can go about your business. 672 00:30:50,870 --> 00:30:53,600 Now, realloc is a function that makes that a little simpler. 673 00:30:53,600 --> 00:30:56,090 But realloc is doing the exact same legwork 674 00:30:56,090 --> 00:30:59,540 that I was doing between the lockers of copying value, freeing memory, 675 00:30:59,540 --> 00:31:00,180 and so forth. 676 00:31:00,180 --> 00:31:01,138 So it needs to be done. 677 00:31:01,138 --> 00:31:04,850 And that's why insertion into an array is going to be big O of n, 678 00:31:04,850 --> 00:31:07,160 because it might take you that much time to copy 679 00:31:07,160 --> 00:31:09,303 the whole array into a new space. 680 00:31:09,303 --> 00:31:10,970 So that feels kind of suboptimal, right? 681 00:31:10,970 --> 00:31:12,870 Arrays can be slow in that sense. 682 00:31:12,870 --> 00:31:14,960 But what was the appeal of an array? 683 00:31:14,960 --> 00:31:16,220 Like what's good about arrays? 684 00:31:16,220 --> 00:31:17,930 Because we don't want to abandon them entirely. 685 00:31:17,930 --> 00:31:18,150 Yeah. 686 00:31:18,150 --> 00:31:20,150 AUDIENCE: You can index into them really easily. 687 00:31:20,150 --> 00:31:22,760 DAVID MALAN: You can index into them really easily, not only syntactically 688 00:31:22,760 --> 00:31:25,700 with the square brackets, but you have constant time access-- 689 00:31:25,700 --> 00:31:27,103 this is known as random access. 690 00:31:27,103 --> 00:31:30,020 And it's not random in the sense that you just end up who knows where. 691 00:31:30,020 --> 00:31:33,520 You can just jump to bracket 0 or 1 or 2 instantly. 692 00:31:33,520 --> 00:31:36,270 It took me, the human, more time because I had to physically walk. 693 00:31:36,270 --> 00:31:40,370 But a computer is going to be able to jump to 0, 1, 2, 3 instantly. 694 00:31:40,370 --> 00:31:42,392 And so arrays are super fast. 695 00:31:42,392 --> 00:31:44,600 And they lend themselves to things like binary search 696 00:31:44,600 --> 00:31:46,710 as we've seen now for some time. 697 00:31:46,710 --> 00:31:49,430 But what if we use the canvas that is our computer's 698 00:31:49,430 --> 00:31:51,430 memory like a little more cleverly? 699 00:31:51,430 --> 00:31:54,680 We don't have to just plop things next to each other, next to each other, next 700 00:31:54,680 --> 00:31:56,597 to each other, and then hope for the best hope 701 00:31:56,597 --> 00:31:58,777 that there's still more memory back to back to back. 702 00:31:58,777 --> 00:32:00,860 What if we instead are a bit more clever about it? 703 00:32:00,860 --> 00:32:03,200 And suppose we want to store the number 1. 704 00:32:03,200 --> 00:32:05,930 And that happens to be an address 0x123. 705 00:32:05,930 --> 00:32:06,620 It's arbitrary. 706 00:32:06,620 --> 00:32:09,470 But recall from last week that every byte of memory in your computer 707 00:32:09,470 --> 00:32:10,340 is stored somewhere. 708 00:32:10,340 --> 00:32:13,700 So let's propose that 1 is stored at 0x123. 709 00:32:13,700 --> 00:32:17,270 Suppose now that this represents an array of size 1 710 00:32:17,270 --> 00:32:20,300 and you want to add a second value to this array. 711 00:32:20,300 --> 00:32:22,640 Or let's start calling things more generally a list. 712 00:32:22,640 --> 00:32:25,160 A list like in the real world is just a list of values. 713 00:32:25,160 --> 00:32:26,583 This list is of size 1. 714 00:32:26,583 --> 00:32:29,750 Now maybe there's a lot of EMMAs in this memory that are getting in the way. 715 00:32:29,750 --> 00:32:33,410 But suppose that there is some free space a little lower in your computer's 716 00:32:33,410 --> 00:32:34,490 memory over there. 717 00:32:34,490 --> 00:32:36,360 So it's not here. 718 00:32:36,360 --> 00:32:37,670 It's not here. 719 00:32:37,670 --> 00:32:39,647 It's not available here or here or here. 720 00:32:39,647 --> 00:32:40,730 There's other stuff there. 721 00:32:40,730 --> 00:32:43,670 But suppose the computer does have some available memory over here 722 00:32:43,670 --> 00:32:46,090 in which you can store the number two, just because. 723 00:32:46,090 --> 00:32:48,590 And that address happens to be 456. 724 00:32:48,590 --> 00:32:50,232 Finally, you want to store third value. 725 00:32:50,232 --> 00:32:52,940 And it turns out that the nearest possible location is down here, 726 00:32:52,940 --> 00:32:53,720 number 3. 727 00:32:53,720 --> 00:32:57,320 That happens to be at address 0x789. 728 00:32:57,320 --> 00:33:01,250 So this is not an array by definition, because the 1, the 2, the 3 729 00:33:01,250 --> 00:33:03,560 are not contiguous back to back to back. 730 00:33:03,560 --> 00:33:06,980 You cannot square bracket notation here because square bracket notation 731 00:33:06,980 --> 00:33:11,520 requires, per Week 2, that all of your values be next to each other, 732 00:33:11,520 --> 00:33:14,330 just like the lockers here. 733 00:33:14,330 --> 00:33:18,240 This picture, where 1 is over here, 2 is over here, 3 is over here is more like, 734 00:33:18,240 --> 00:33:20,090 oh, maybe this is 0x123. 735 00:33:20,090 --> 00:33:22,070 Maybe this is 0x456. 736 00:33:22,070 --> 00:33:23,652 Maybe this is 0x789. 737 00:33:23,652 --> 00:33:25,110 They're kind of all over the place. 738 00:33:25,110 --> 00:33:28,520 And that's just because that's what's available in your computer's memory. 739 00:33:28,520 --> 00:33:31,910 But what if I get a little extravagant and I 740 00:33:31,910 --> 00:33:35,090 start to use, not just one chunk of memory 741 00:33:35,090 --> 00:33:38,180 to store each value, like 1, 2, 3, what if I go ahead 742 00:33:38,180 --> 00:33:42,590 and give myself twice as much memory just to give myself some flexibility? 743 00:33:42,590 --> 00:33:46,730 So I now conceptual use this chunk of memory to represent one. 744 00:33:46,730 --> 00:33:49,227 This junk to represent 2, this chunk to represent 3. 745 00:33:49,227 --> 00:33:52,310 But you know what I'm going to use the latter half of each of those chunks 746 00:33:52,310 --> 00:33:54,108 for? 747 00:33:54,108 --> 00:33:54,650 Any thoughts? 748 00:33:54,650 --> 00:33:56,290 AUDIENCE: Address to the next. 749 00:33:56,290 --> 00:33:58,480 DAVID MALAN: An address to the next chunk of memory. 750 00:33:58,480 --> 00:34:01,390 So, for instance, if my goal is to keep this list sorted, 751 00:34:01,390 --> 00:34:05,080 so I want conceptually to have a list that stores 1, 2, 3, 752 00:34:05,080 --> 00:34:08,620 why don't I use this as sort of a map or a breadcrumb, if you will, 753 00:34:08,620 --> 00:34:11,350 that points to the next chunk of memory? 754 00:34:11,350 --> 00:34:14,260 And why don't I use this chunk of memory to point at the next one? 755 00:34:14,260 --> 00:34:17,350 And then this chunk of memory, you know what, I just need a special value here. 756 00:34:17,350 --> 00:34:19,400 What would be a good arbitrary way to say, mm, mm, 757 00:34:19,400 --> 00:34:20,775 there's nothing more in the list? 758 00:34:20,775 --> 00:34:21,400 AUDIENCE: Null. 759 00:34:21,400 --> 00:34:23,067 DAVID MALAN: It's something called null. 760 00:34:23,067 --> 00:34:26,469 And this technically is different from backslash 0, which is a char. 761 00:34:26,469 --> 00:34:28,090 This is something called-- 762 00:34:28,090 --> 00:34:30,380 well, this is in hexadecimal 0. 763 00:34:30,380 --> 00:34:33,010 Now, starting today-- and we saw the super briefly last week-- 764 00:34:33,010 --> 00:34:37,150 this is n-u-l-l with two L's-- this was stupid left hand not really talking 765 00:34:37,150 --> 00:34:40,960 to right hand-- n-u-l is backslash 0, which is a char. 766 00:34:40,960 --> 00:34:43,909 n-u-l-l is a pointer. 767 00:34:43,909 --> 00:34:46,989 But they both equal 0 underneath the hood. 768 00:34:46,989 --> 00:34:50,020 So you just store special value that says that's it for the list. 769 00:34:50,020 --> 00:34:53,469 Now, we last week I proposed who really cares where things are in memory? 770 00:34:53,469 --> 00:34:54,909 So indeed, let's do that again. 771 00:34:54,909 --> 00:34:58,420 Let's just use pointers drawn as arrows in this artist's rendition 772 00:34:58,420 --> 00:35:02,200 to say this list of numbers, 1 2, 3, is now linked. 773 00:35:02,200 --> 00:35:06,760 A linked list is just a data structure containing multiple chunks of memory 774 00:35:06,760 --> 00:35:08,475 that are somehow linked together. 775 00:35:08,475 --> 00:35:11,350 And if underneath the hood, so to speak, they're just linked together 776 00:35:11,350 --> 00:35:14,830 by way of pointers, and the price we pay is 777 00:35:14,830 --> 00:35:18,820 that rather than now in a linked list storing just the numbers 1, 2, 3, which 778 00:35:18,820 --> 00:35:22,990 we could have in an array, now you have to store twice as much information, 1, 779 00:35:22,990 --> 00:35:27,700 2, 3, as well as three pointers, two of which are in use, the other of which 780 00:35:27,700 --> 00:35:31,370 is ready to go if I want to add something to this list. 781 00:35:31,370 --> 00:35:34,120 So this is to say we can now create structures 782 00:35:34,120 --> 00:35:36,190 that look like this in the computer's memory 783 00:35:36,190 --> 00:35:39,085 just by using this new feature of pointers. 784 00:35:39,085 --> 00:35:41,210 What might these structures look like individually? 785 00:35:41,210 --> 00:35:43,600 Well, any one of these numbers has two fields it seems. 786 00:35:43,600 --> 00:35:44,870 One is an integer. 787 00:35:44,870 --> 00:35:45,880 We'll call it number. 788 00:35:45,880 --> 00:35:47,890 And then there's another field here. 789 00:35:47,890 --> 00:35:51,098 That let's call it next by convention, but we could call it anything we want. 790 00:35:51,098 --> 00:35:53,140 It's just another chunk of memory that's pointing 791 00:35:53,140 --> 00:35:54,910 to the next element in the list. 792 00:35:54,910 --> 00:35:57,238 Well a couple of weeks ago, we introduced persons. 793 00:35:57,238 --> 00:35:58,780 And a person had a name and a number. 794 00:35:58,780 --> 00:36:01,630 That's not relevant today, because we're not dealing with names and numbers. 795 00:36:01,630 --> 00:36:03,040 We're just dealing with integers. 796 00:36:03,040 --> 00:36:06,670 So let me propose that we back that up and still use the same syntax 797 00:36:06,670 --> 00:36:08,150 as a couple of weeks ago. 798 00:36:08,150 --> 00:36:12,430 But instead of defining a person, let's call this rectangle a node. 799 00:36:12,430 --> 00:36:15,700 So this is a term of art in computer science node-- n-o-d-e-- 800 00:36:15,700 --> 00:36:20,860 just represents this rectangular concept, a chunk of memory 801 00:36:20,860 --> 00:36:22,690 that you're using for interesting purposes. 802 00:36:22,690 --> 00:36:25,930 It's sort of a node in a graph if familiar from math. 803 00:36:25,930 --> 00:36:27,795 But what do I want this know to store? 804 00:36:27,795 --> 00:36:30,170 Well, let me go ahead and store a couple of things in it. 805 00:36:30,170 --> 00:36:32,253 One, a number, and that's just going to be an int. 806 00:36:32,253 --> 00:36:34,300 And I'm going to go ahead and call it number. 807 00:36:34,300 --> 00:36:39,630 And then any guesses as to what the second field should be declared as? 808 00:36:39,630 --> 00:36:42,400 I want to call it next just because it's conventional. 809 00:36:42,400 --> 00:36:44,938 What should its data type be? 810 00:36:44,938 --> 00:36:45,480 Any thoughts? 811 00:36:45,480 --> 00:36:45,880 Yeah in back. 812 00:36:45,880 --> 00:36:46,780 AUDIENCE: A pointer. 813 00:36:46,780 --> 00:36:47,738 DAVID MALAN: A pointer. 814 00:36:47,738 --> 00:36:49,650 And a pointer to what would you say? 815 00:36:49,650 --> 00:36:50,970 AUDIENCE: The next number. 816 00:36:50,970 --> 00:36:54,850 DAVID MALAN: A pointer to the next number, and not quite the next number 817 00:36:54,850 --> 00:36:58,940 per se, because now we have not numbers only, we now have nodes. 818 00:36:58,940 --> 00:37:02,842 So those three yellow boxes, 1 2, 3, those are now nodes, I would say. 819 00:37:02,842 --> 00:37:03,550 So you know what? 820 00:37:03,550 --> 00:37:06,490 Let's go ahead and call this node star. 821 00:37:06,490 --> 00:37:08,660 But you can't technically quite do this. 822 00:37:08,660 --> 00:37:11,800 It turns out that C, recall, takes you super literally. 823 00:37:11,800 --> 00:37:13,880 And notice, if you read this code top to bottom, 824 00:37:13,880 --> 00:37:15,820 left to right, at which point in the story 825 00:37:15,820 --> 00:37:18,660 does the word node come into existence? 826 00:37:18,660 --> 00:37:20,715 Like not until the very last line. 827 00:37:20,715 --> 00:37:22,090 That's where we mentioned person. 828 00:37:22,090 --> 00:37:23,530 This is where we mentioned node. 829 00:37:23,530 --> 00:37:26,980 So, unfortunately, you can't use a node inside of a node, 830 00:37:26,980 --> 00:37:29,500 because it literally doesn't exist in the computer's mind 831 00:37:29,500 --> 00:37:31,118 until two lines later. 832 00:37:31,118 --> 00:37:32,410 So there's an alternative here. 833 00:37:32,410 --> 00:37:33,415 There is a solution. 834 00:37:33,415 --> 00:37:34,540 It's a little more verbose. 835 00:37:34,540 --> 00:37:39,340 Instead of just saying typedef struct, you actually say typedef struct node. 836 00:37:39,340 --> 00:37:41,360 Just add the word that you want to use. 837 00:37:41,360 --> 00:37:45,897 And now down here, you can say struct node star next. 838 00:37:45,897 --> 00:37:47,230 It's kind of like a work around. 839 00:37:47,230 --> 00:37:48,520 This is the way C works. 840 00:37:48,520 --> 00:37:50,350 But this is the exact same idea. 841 00:37:50,350 --> 00:37:54,460 This means that any node in the data structure, any yellow rectangular box 842 00:37:54,460 --> 00:37:56,440 has a number and a pointer. 843 00:37:56,440 --> 00:38:01,360 And that pointer by definition is a pointer to a node structure. 844 00:38:01,360 --> 00:38:03,610 We just have to express it more verbosely 845 00:38:03,610 --> 00:38:07,360 here, because the shorthand notation node doesn't exist until the bottom. 846 00:38:07,360 --> 00:38:11,260 It's just sort of an annoying reality of the syntax. 847 00:38:11,260 --> 00:38:14,950 All right, any questions on that definition of struct? 848 00:38:14,950 --> 00:38:16,150 Yeah. 849 00:38:16,150 --> 00:38:18,630 AUDIENCE: Do you have to put node twice? 850 00:38:18,630 --> 00:38:20,590 DAVID MALAN: Do you have to put node twice? 851 00:38:20,590 --> 00:38:22,360 You don't have to put node twice. 852 00:38:22,360 --> 00:38:25,780 You can actually use any word here you want. 853 00:38:25,780 --> 00:38:27,500 You can call this x or y or z. 854 00:38:27,500 --> 00:38:30,250 But then you're going to have to make this be struct x or struct y 855 00:38:30,250 --> 00:38:31,240 or struct z. 856 00:38:31,240 --> 00:38:33,800 By convention, I would just reuse the same term. 857 00:38:33,800 --> 00:38:35,830 So this is the formal name for this structure. 858 00:38:35,830 --> 00:38:37,670 It is a struct node. 859 00:38:37,670 --> 00:38:41,390 This is the nickname for the structure, more succinctly, node. 860 00:38:41,390 --> 00:38:42,610 And that's what typedef does. 861 00:38:42,610 --> 00:38:46,240 It gives you an alias from struct node to just node, 862 00:38:46,240 --> 00:38:49,660 just because it's easier to type elsewhere in the program. 863 00:38:49,660 --> 00:38:52,060 Other questions? 864 00:38:52,060 --> 00:38:54,770 All right, so what can we do now with this structure? 865 00:38:54,770 --> 00:38:56,860 Well, let's go ahead and build something up here. 866 00:38:56,860 --> 00:38:59,850 All right, so this is about as scary as the code gets today. 867 00:38:59,850 --> 00:39:02,380 We'll focus primarily on pictures and concepts hereafter. 868 00:39:02,380 --> 00:39:04,690 But let's take a tour through one implementation 869 00:39:04,690 --> 00:39:06,370 of this same idea of a linked list. 870 00:39:06,370 --> 00:39:09,520 How would we go about representing a linked list initially? 871 00:39:09,520 --> 00:39:11,420 Well, initially the list is empty. 872 00:39:11,420 --> 00:39:13,587 And if you want to represent something that's empty, 873 00:39:13,587 --> 00:39:14,830 we minimally need something. 874 00:39:14,830 --> 00:39:17,320 So let me draw it as this empty box. 875 00:39:17,320 --> 00:39:19,660 And this is just a pointer to a node, I claim. 876 00:39:19,660 --> 00:39:23,100 So how do I implement the notion of a linked list that has no numbers in it 877 00:39:23,100 --> 00:39:23,800 yet? 878 00:39:23,800 --> 00:39:26,810 Well, why don't we just use this, which I can implement as follows. 879 00:39:26,810 --> 00:39:30,550 node star, and I'm going to call it list, but then set it equal to NULL. 880 00:39:30,550 --> 00:39:33,550 Right, if there's no numbers available-- there's no 1, there's no 2, 881 00:39:33,550 --> 00:39:34,450 there's no three-- 882 00:39:34,450 --> 00:39:38,710 I should at least have a variable that connotes there is no list. 883 00:39:38,710 --> 00:39:42,490 And the easiest way to do that is in the absence of a value, store 0, 884 00:39:42,490 --> 00:39:45,940 which has this new nickname as of last week and this, called null. 885 00:39:45,940 --> 00:39:49,930 So this variable here represents this picture here. 886 00:39:49,930 --> 00:39:52,780 And notice, there's no numbers, because the list is empty. 887 00:39:52,780 --> 00:39:55,090 But we do initialize it to NULL so that we 888 00:39:55,090 --> 00:39:56,920 don't think that there's an arrow pointing 889 00:39:56,920 --> 00:39:58,510 to some specific chunk of memory yet. 890 00:39:58,510 --> 00:39:59,860 Because there isn't yet. 891 00:39:59,860 --> 00:40:03,370 Now, suppose I want to go ahead and insert a number into this list. 892 00:40:03,370 --> 00:40:05,470 Suppose I want to insert the number 2. 893 00:40:05,470 --> 00:40:08,350 I can't just allocate space for 2 now. 894 00:40:08,350 --> 00:40:11,200 I have to allocate space for 2 and that pointer, 895 00:40:11,200 --> 00:40:14,817 otherwise known together as a node, per the previous slide. 896 00:40:14,817 --> 00:40:16,150 So how do I go about doing this? 897 00:40:16,150 --> 00:40:19,182 Well, in code, I can borrow the same technique 898 00:40:19,182 --> 00:40:21,640 that we've used a couple times now, even though it's uglier 899 00:40:21,640 --> 00:40:24,985 than some past approaches, malloc then an integer. 900 00:40:24,985 --> 00:40:26,110 How many bytes do you want? 901 00:40:26,110 --> 00:40:27,402 I don't know how big a node is. 902 00:40:27,402 --> 00:40:31,360 I could probably do the math and add up the integer and then the pointer. 903 00:40:31,360 --> 00:40:34,720 But, you know what, size of node is just going to answer that question for me. 904 00:40:34,720 --> 00:40:38,300 So this returns that chunk of memory that's big enough to store a node. 905 00:40:38,300 --> 00:40:40,300 And I'm going to store that just for temporarily 906 00:40:40,300 --> 00:40:43,330 in a variable called n, n for node, and that's going to just 907 00:40:43,330 --> 00:40:45,330 be a temporary variable, if you will. 908 00:40:45,330 --> 00:40:47,830 So, again, even though there's some new stuff going on here, 909 00:40:47,830 --> 00:40:48,872 this is just like before. 910 00:40:48,872 --> 00:40:50,870 Previously, I wanted to allocate an integer. 911 00:40:50,870 --> 00:40:52,510 Now, I want more than an integer. 912 00:40:52,510 --> 00:40:53,830 I want an actual node. 913 00:40:53,830 --> 00:40:57,640 And malloc returns an address, which means I must assign it to a variable. 914 00:40:57,640 --> 00:40:59,392 That's an address on the left hand side. 915 00:40:59,392 --> 00:41:00,850 All right, what should I always do? 916 00:41:00,850 --> 00:41:03,290 Slight spoiler because I clicked ahead a moment ago-- 917 00:41:03,290 --> 00:41:05,440 actually, we're going to forge ahead here. 918 00:41:05,440 --> 00:41:07,840 This is the ugliest thing we'll see. 919 00:41:07,840 --> 00:41:10,180 What is this second line of code doing here? 920 00:41:10,180 --> 00:41:13,330 921 00:41:13,330 --> 00:41:15,280 What's going on here do you think? 922 00:41:15,280 --> 00:41:16,280 Yeah, what do you think? 923 00:41:16,280 --> 00:41:18,898 AUDIENCE: It's setting the number of that node to 2. 924 00:41:18,898 --> 00:41:19,690 DAVID MALAN: It is. 925 00:41:19,690 --> 00:41:21,740 It's setting the number of that node to 2. 926 00:41:21,740 --> 00:41:24,640 But why this crazy syntax, which we've never used before? 927 00:41:24,640 --> 00:41:26,650 Well, star n, we did see last week. 928 00:41:26,650 --> 00:41:28,180 That just means go there. 929 00:41:28,180 --> 00:41:30,850 The parentheses are just necessary for order of operations 930 00:41:30,850 --> 00:41:33,517 so that the compiler knows, OK, first go there. 931 00:41:33,517 --> 00:41:36,100 And then once you're there, what do you want to get access to? 932 00:41:36,100 --> 00:41:36,960 The number field. 933 00:41:36,960 --> 00:41:38,470 So use the same dot notation. 934 00:41:38,470 --> 00:41:39,760 So it's super ugly. 935 00:41:39,760 --> 00:41:43,070 But it's just doing two different things that we've seen in isolation. 936 00:41:43,070 --> 00:41:45,520 Go to the address in n, which is that chunk of memory. 937 00:41:45,520 --> 00:41:48,820 And then access the number field and set it equal to 2. 938 00:41:48,820 --> 00:41:53,337 Fortunately, C has some syntactic sugar, just an easier, prettier way 939 00:41:53,337 --> 00:41:53,920 of doing this. 940 00:41:53,920 --> 00:41:57,003 And it happens to look wonderfully like the actual thing we keep drawing-- 941 00:41:57,003 --> 00:41:58,690 this arrow notation. 942 00:41:58,690 --> 00:42:01,760 So if you ever see and you ever write this notation in C-- 943 00:42:01,760 --> 00:42:05,050 and I'm pretty sure this is the last new syntax we'll see-- 944 00:42:05,050 --> 00:42:07,990 this arrow, this very sort of hackish era 945 00:42:07,990 --> 00:42:10,660 where you hit a hyphen and then a greater than sign, 946 00:42:10,660 --> 00:42:12,740 this means the exact same thing as this. 947 00:42:12,740 --> 00:42:13,990 This is just annoying to type. 948 00:42:13,990 --> 00:42:15,040 It's ugly to look at. 949 00:42:15,040 --> 00:42:16,457 This is just slightly more pretty. 950 00:42:16,457 --> 00:42:18,820 And frankly, it's reminiscent of the pictures 951 00:42:18,820 --> 00:42:21,370 we've been drawing with the arrows pointing left and right. 952 00:42:21,370 --> 00:42:22,828 What's the next thing I want to do? 953 00:42:22,828 --> 00:42:25,120 After allocating this new node for the number 2, 954 00:42:25,120 --> 00:42:28,414 what do I want to put as well in that node? 955 00:42:28,414 --> 00:42:29,790 AUDIENCE: Put in the address. 956 00:42:29,790 --> 00:42:31,290 DAVID MALAN: Sorry, a little louder. 957 00:42:31,290 --> 00:42:32,650 AUDIENCE: The next address. 958 00:42:32,650 --> 00:42:33,880 DAVID MALAN: The address of the next node. 959 00:42:33,880 --> 00:42:35,170 But there is no next node yet. 960 00:42:35,170 --> 00:42:36,790 So what value could I use as a placeholder? 961 00:42:36,790 --> 00:42:37,540 AUDIENCE: Null. 962 00:42:37,540 --> 00:42:37,960 DAVID MALAN: Null. 963 00:42:37,960 --> 00:42:40,490 And so indeed, I'm going to do this arrow notation as well. 964 00:42:40,490 --> 00:42:43,420 You never need to do the star and then the dots and the parentheses. 965 00:42:43,420 --> 00:42:46,280 Everyone just writes the code like this in the real world. 966 00:42:46,280 --> 00:42:48,970 So n arrow next gets null. 967 00:42:48,970 --> 00:42:52,650 That now gives me that picture we were drawing. 968 00:42:52,650 --> 00:42:56,170 But, again, sanity check, if you ever use malloc, 969 00:42:56,170 --> 00:42:58,240 you should always check the return value. 970 00:42:58,240 --> 00:43:00,370 So just to be super precise, let me go ahead 971 00:43:00,370 --> 00:43:04,720 and add a couple more lines of code that just check if n is not null, 972 00:43:04,720 --> 00:43:06,400 go ahead and do the following. 973 00:43:06,400 --> 00:43:09,550 Conversely, I could check if n is null and then just exit or return 974 00:43:09,550 --> 00:43:11,350 depending on where I'm using this code. 975 00:43:11,350 --> 00:43:14,860 But you don't want to touch n and use this arrow notation 976 00:43:14,860 --> 00:43:17,958 unless you're sure n is not null. 977 00:43:17,958 --> 00:43:19,000 So what have I just done? 978 00:43:19,000 --> 00:43:20,970 My picture now looks like this. 979 00:43:20,970 --> 00:43:22,720 But this, of course, is not a linked list, 980 00:43:22,720 --> 00:43:24,670 because there's no linkage going on. 981 00:43:24,670 --> 00:43:27,520 I really need to do the equivalent of pointing an arrow 982 00:43:27,520 --> 00:43:30,340 from this pointer to this structure. 983 00:43:30,340 --> 00:43:32,870 I need to implement an arrow that looks like this. 984 00:43:32,870 --> 00:43:34,933 So how can we go about implementing that in code? 985 00:43:34,933 --> 00:43:37,600 Well, let me propose that this is what it ultimately looks like. 986 00:43:37,600 --> 00:43:39,280 We just need to draw that arrow. 987 00:43:39,280 --> 00:43:40,790 How do I do that? 988 00:43:40,790 --> 00:43:42,700 Well, it's as simple as this. 989 00:43:42,700 --> 00:43:47,200 If list is a variable, and it's previously initialized to null-- 990 00:43:47,200 --> 00:43:48,340 it's just a place holder-- 991 00:43:48,340 --> 00:43:53,380 and n is my temporary variable storing the new node, it suffices to say, 992 00:43:53,380 --> 00:43:55,420 well, lists should not be null anymore. 993 00:43:55,420 --> 00:43:59,740 It should literally equal the address of that chunk of memory I just allocated. 994 00:43:59,740 --> 00:44:03,648 And that's how we get this picture now inside of the computer. 995 00:44:03,648 --> 00:44:05,440 Now, let me do a couple of more operations. 996 00:44:05,440 --> 00:44:07,570 Suppose I want to add to the list the number 4. 997 00:44:07,570 --> 00:44:08,860 How do I add the number 4? 998 00:44:08,860 --> 00:44:10,990 Well, the number 4 is inside of its own node. 999 00:44:10,990 --> 00:44:13,030 So I have to go back to code like this. 1000 00:44:13,030 --> 00:44:16,348 I need to allocate another node that installs the number 4 there. 1001 00:44:16,348 --> 00:44:17,140 But that's not all. 1002 00:44:17,140 --> 00:44:18,970 You don't want to just create the node, because it's 1003 00:44:18,970 --> 00:44:21,120 otherwise out there in no man's land, so to speak. 1004 00:44:21,120 --> 00:44:22,670 We now need to add the arrow. 1005 00:44:22,670 --> 00:44:26,700 But now it gets a little non-obvious how you update the arrows, 1006 00:44:26,700 --> 00:44:28,450 right, because I don't want to update list 1007 00:44:28,450 --> 00:44:32,170 to point at 4, because that's going to sort of orphan, so to speak, number 2. 1008 00:44:32,170 --> 00:44:34,090 And it just kind of float away conceptually. 1009 00:44:34,090 --> 00:44:38,140 I really want to update 2's pointer to 4. 1010 00:44:38,140 --> 00:44:39,015 So how can I do that? 1011 00:44:39,015 --> 00:44:41,973 Well, you know what I can do is I can kind of follow these breadcrumbs. 1012 00:44:41,973 --> 00:44:43,480 If I declare a temporary pointer-- 1013 00:44:43,480 --> 00:44:46,300 and I'll do it using a little extravagantly last week 1014 00:44:46,300 --> 00:44:50,500 like this little pointer notation-- if I'm a variable called temp, TMP, 1015 00:44:50,500 --> 00:44:54,130 I can go ahead and point at the same thing that list is pointing at. 1016 00:44:54,130 --> 00:44:58,040 And I'm going to check is this next value null? 1017 00:44:58,040 --> 00:44:59,750 If it is, I found the end of the list. 1018 00:44:59,750 --> 00:45:01,390 So really I can follow that arrow. 1019 00:45:01,390 --> 00:45:03,460 Now, I know that I'm at a null pointer. 1020 00:45:03,460 --> 00:45:05,825 So now, I just want to draw this number up here. 1021 00:45:05,825 --> 00:45:07,450 And I accidentally advanced the screen. 1022 00:45:07,450 --> 00:45:10,460 I want to actually draw this arrow up here. 1023 00:45:10,460 --> 00:45:12,310 So how do we go about doing that? 1024 00:45:12,310 --> 00:45:15,770 1025 00:45:15,770 --> 00:45:18,860 Well, the code there might look like this. 1026 00:45:18,860 --> 00:45:23,710 So if all I want to point at a node, as I just did with the big fuzzy hand, 1027 00:45:23,710 --> 00:45:27,280 I can just initialize this pointer to equal whatever the list itself 1028 00:45:27,280 --> 00:45:28,305 is pointing at. 1029 00:45:28,305 --> 00:45:29,680 Then, I can do like a while loop. 1030 00:45:29,680 --> 00:45:33,010 And it's a little weird looking, because I'm using some of my new syntax. 1031 00:45:33,010 --> 00:45:37,060 But this is just asking the question, while the next field I'm pointing at 1032 00:45:37,060 --> 00:45:40,570 is not NULL, go ahead and follow it. 1033 00:45:40,570 --> 00:45:43,480 So again, this is as complicated as the syntax today will get. 1034 00:45:43,480 --> 00:45:46,810 But this is just saying, whatever I'm pointing at point specifically 1035 00:45:46,810 --> 00:45:48,160 at the next field. 1036 00:45:48,160 --> 00:45:51,640 While it is not NULL, go ahead and update yourself 1037 00:45:51,640 --> 00:45:54,440 to point at whatever it is pointing at. 1038 00:45:54,440 --> 00:45:56,560 So if I advance to the next slide here, this 1039 00:45:56,560 --> 00:45:58,660 is like I'm initially pointing at 2. 1040 00:45:58,660 --> 00:45:59,318 I see an arrow. 1041 00:45:59,318 --> 00:46:00,610 I'm going to follow that arrow. 1042 00:46:00,610 --> 00:46:01,902 I'm going to follow that arrow. 1043 00:46:01,902 --> 00:46:07,000 So however big the list is I just keep moving my temporary pointer to follow 1044 00:46:07,000 --> 00:46:09,340 these breadcrumbs until I hit NULL. 1045 00:46:09,340 --> 00:46:12,212 So here in the story let me propose that we add another number, 5. 1046 00:46:12,212 --> 00:46:14,920 And 5, of course, if we keep it sorted, it's got to go over here. 1047 00:46:14,920 --> 00:46:16,750 And again, they're all over my computer's memory. 1048 00:46:16,750 --> 00:46:18,030 They're not in a perfectly straight line, 1049 00:46:18,030 --> 00:46:20,030 because who knows where there's available space. 1050 00:46:20,030 --> 00:46:24,480 But now that I found this, I want to go ahead and create one more pointer using 1051 00:46:24,480 --> 00:46:28,180 code very similar to what we just saw. 1052 00:46:28,180 --> 00:46:31,380 But now lastly, let's do one more here at the beginning 1053 00:46:31,380 --> 00:46:34,350 and then one more in the middle and see what can go wrong. 1054 00:46:34,350 --> 00:46:37,200 What is worrisome about 1 if we actually want 1055 00:46:37,200 --> 00:46:40,680 to store this list in sorted order? 1056 00:46:40,680 --> 00:46:43,500 What might I be mindful of now if the goal 1057 00:46:43,500 --> 00:46:46,860 is to insert 1 into this linked list? 1058 00:46:46,860 --> 00:46:47,520 Any thoughts? 1059 00:46:47,520 --> 00:46:50,915 1060 00:46:50,915 --> 00:46:52,040 What do I want to do first? 1061 00:46:52,040 --> 00:46:54,080 Well, you know what, let me go ahead and just 1062 00:46:54,080 --> 00:46:56,750 point-- you know what, it's obviously got to go to the start of the list 1063 00:46:56,750 --> 00:46:59,167 if I want to keep it sorted, so that the arrows eventually 1064 00:46:59,167 --> 00:47:00,270 go from left to right. 1065 00:47:00,270 --> 00:47:03,530 So let me go ahead and just use code like this to allocate the new node. 1066 00:47:03,530 --> 00:47:07,300 And let me go ahead and just move that arrow like this. 1067 00:47:07,300 --> 00:47:10,720 This is wrong even though we've not seen the code for it. 1068 00:47:10,720 --> 00:47:12,490 But why is this wrong? 1069 00:47:12,490 --> 00:47:12,990 Yeah. 1070 00:47:12,990 --> 00:47:15,570 AUDIENCE: You're orphaning 2, 4, and 5. 1071 00:47:15,570 --> 00:47:17,070 DAVID MALAN: I'm orphaning 2, 4, 5. 1072 00:47:17,070 --> 00:47:18,100 In what sense? 1073 00:47:18,100 --> 00:47:22,800 I mean literally in my program, the only variables and the variables 1074 00:47:22,800 --> 00:47:24,870 I have are those you see on the board here. 1075 00:47:24,870 --> 00:47:27,570 So if nothing is pointing at 2 anymore, it 1076 00:47:27,570 --> 00:47:30,330 doesn't matter that 2 is pointing at 4 and 4 is pointing at 5, 1077 00:47:30,330 --> 00:47:33,720 we have orphaned 2 and transitively 4 and 5. 1078 00:47:33,720 --> 00:47:35,220 So those are just lost. 1079 00:47:35,220 --> 00:47:36,480 That is a memory leak. 1080 00:47:36,480 --> 00:47:39,545 If you recall using Valgrind and getting yelled at by Valgrind 1081 00:47:39,545 --> 00:47:42,170 because you're leaking memory, it might be because, yes, you've 1082 00:47:42,170 --> 00:47:43,470 forgotten to free memory. 1083 00:47:43,470 --> 00:47:46,260 Or worse, you might have completely forgotten where 1084 00:47:46,260 --> 00:47:48,300 the memory is that you were using. 1085 00:47:48,300 --> 00:47:54,042 And by definition of your own code, you can never access that memory again. 1086 00:47:54,042 --> 00:47:57,000 You've asked the computer for it, but you're never able to give it back 1087 00:47:57,000 --> 00:47:59,820 because you have no variable remembering where it is. 1088 00:47:59,820 --> 00:48:01,210 So we don't want to do that. 1089 00:48:01,210 --> 00:48:03,360 We instead want to do this probably. 1090 00:48:03,360 --> 00:48:07,260 Let's point 1 to 2 first, which is kind of redundant, right? 1091 00:48:07,260 --> 00:48:09,660 Now, we have sort of conflicting beginnings of the list. 1092 00:48:09,660 --> 00:48:12,590 But once 1 is pointing to 2, what can your next update? 1093 00:48:12,590 --> 00:48:13,470 AUDIENCE: The list. 1094 00:48:13,470 --> 00:48:15,840 DAVID MALAN: List to point at 1. 1095 00:48:15,840 --> 00:48:19,020 And you can do this in code if you'd like really with just two steps. 1096 00:48:19,020 --> 00:48:23,190 You can update the next field of your new node, which 1097 00:48:23,190 --> 00:48:25,290 is the one representing 1 that I just allocated, 1098 00:48:25,290 --> 00:48:28,990 and you can initialize it to point at whatever list is pointing at. 1099 00:48:28,990 --> 00:48:31,860 So if you want this thing to point at the same thing 1100 00:48:31,860 --> 00:48:34,170 that this thing was pointing at, you literally just 1101 00:48:34,170 --> 00:48:38,580 say in code n arrow x equals whatever list is pointing at 1102 00:48:38,580 --> 00:48:42,770 and then you say the list should equal n itself. 1103 00:48:42,770 --> 00:48:44,520 And again, you'll see in section this week 1104 00:48:44,520 --> 00:48:46,770 and in the upcoming problem set actual opportunities 1105 00:48:46,770 --> 00:48:48,510 to apply these kinds of lines of code. 1106 00:48:48,510 --> 00:48:50,430 But those are the kinds of thought processes 1107 00:48:50,430 --> 00:48:51,750 that you should be mindful of. 1108 00:48:51,750 --> 00:48:53,790 Now, 3 is the only one that's particularly annoying. 1109 00:48:53,790 --> 00:48:55,290 And we won't look at the code for this. 1110 00:48:55,290 --> 00:48:58,240 If we actually want to put something in sorted order in the middle of the list, 1111 00:48:58,240 --> 00:49:00,490 let's just consider conceptually what's got to happen. 1112 00:49:00,490 --> 00:49:02,610 We've got to allocate memory for the node. 1113 00:49:02,610 --> 00:49:03,960 We then need to update what? 1114 00:49:03,960 --> 00:49:06,900 We probably don't want to point 2 at 3 for the exact same reason 1115 00:49:06,900 --> 00:49:07,590 you identified. 1116 00:49:07,590 --> 00:49:08,820 We then orphan 4 and 5. 1117 00:49:08,820 --> 00:49:11,250 So what should we update first conceptually? 1118 00:49:11,250 --> 00:49:12,338 AUDIENCE: 3 to 4. 1119 00:49:12,338 --> 00:49:14,880 DAVID MALAN: Update 3 to 4, so it is going to look like this. 1120 00:49:14,880 --> 00:49:17,570 And now we can update 2 to 3. 1121 00:49:17,570 --> 00:49:19,750 And I'm going to wave my hand at the code for this 1122 00:49:19,750 --> 00:49:21,930 only because there's multiple steps now. 1123 00:49:21,930 --> 00:49:23,910 You have to probably have some kind of loop 1124 00:49:23,910 --> 00:49:27,960 that iterates over the existing list, finds the appropriate location using 1125 00:49:27,960 --> 00:49:30,540 less than or greater than, trying to find the right spot. 1126 00:49:30,540 --> 00:49:32,910 And then you have to manipulate the pointers to do that. 1127 00:49:32,910 --> 00:49:34,785 You won't need to do something as complicated 1128 00:49:34,785 --> 00:49:36,432 as that for the upcoming problem set 5. 1129 00:49:36,432 --> 00:49:41,520 But it is just boiling down to some loops, some inequality checks, and then 1130 00:49:41,520 --> 00:49:42,750 some updates of the pointers. 1131 00:49:42,750 --> 00:49:44,880 But it's easier generally to add stuff at the end 1132 00:49:44,880 --> 00:49:47,550 and even easier to add things at the beginning, 1133 00:49:47,550 --> 00:49:52,904 especially if you don't care about maintaining any kind of sorted order. 1134 00:49:52,904 --> 00:49:53,850 Phew. 1135 00:49:53,850 --> 00:49:56,650 Any questions on that? 1136 00:49:56,650 --> 00:49:57,836 Yeah. 1137 00:49:57,836 --> 00:50:01,252 AUDIENCE: Back to the beginning, like the code you had, 1138 00:50:01,252 --> 00:50:09,690 what's the difference between node with star and like a pointer n of type node? 1139 00:50:09,690 --> 00:50:14,350 DAVID MALAN: A pointer n of type-- let me just scroll back to the code, here? 1140 00:50:14,350 --> 00:50:14,980 AUDIENCE: Yeah. 1141 00:50:14,980 --> 00:50:17,010 DAVID MALAN: OK, so this is malloc is going 1142 00:50:17,010 --> 00:50:20,700 to give us a chunk of memory that's big enough to store node. 1143 00:50:20,700 --> 00:50:25,870 Node star n gives us a pointer that is the address of a node. 1144 00:50:25,870 --> 00:50:29,200 And therefore we're going to assign the return value of malloc 1145 00:50:29,200 --> 00:50:33,730 to that variable, so that n effectively represents a chunk of memory 1146 00:50:33,730 --> 00:50:35,904 that's big enough to store a node. 1147 00:50:35,904 --> 00:50:38,690 AUDIENCE: So n is node, not a pointer? 1148 00:50:38,690 --> 00:50:41,720 DAVID MALAN: n is a pointer to a node. 1149 00:50:41,720 --> 00:50:44,200 n is a node star, or a pointer to a node. 1150 00:50:44,200 --> 00:50:45,200 And what does that mean? 1151 00:50:45,200 --> 00:50:46,970 n is the address of a node. 1152 00:50:46,970 --> 00:50:50,000 And that should make sense, because malloc returns an address. 1153 00:50:50,000 --> 00:50:52,370 But this is why we're now using arrow notation. 1154 00:50:52,370 --> 00:50:53,210 n is not a node. 1155 00:50:53,210 --> 00:50:56,750 You can't do n dot number and n dot next. 1156 00:50:56,750 --> 00:50:58,970 You have to do the star thing and then the dot. 1157 00:50:58,970 --> 00:51:04,730 Or more succinctly now, you do an arrow number and arrow next. 1158 00:51:04,730 --> 00:51:05,853 Good question. 1159 00:51:05,853 --> 00:51:09,020 All right, let's see if we can't make this a little more real with maybe one 1160 00:51:09,020 --> 00:51:10,460 demonstration here. 1161 00:51:10,460 --> 00:51:14,220 Let me go ahead and put on the screen the end point that we want to get to, 1162 00:51:14,220 --> 00:51:16,310 which is that here with everything in order, 1163 00:51:16,310 --> 00:51:20,192 I think for this we need maybe five volunteers if we could. 1164 00:51:20,192 --> 00:51:21,650 Let me go a little farther in back. 1165 00:51:21,650 --> 00:51:23,750 OK, one over there. 1166 00:51:23,750 --> 00:51:26,258 Maybe two over there. 1167 00:51:26,258 --> 00:51:27,050 Now the hands are-- 1168 00:51:27,050 --> 00:51:31,210 OK, 3 over here if we can go in back up over there. 1169 00:51:31,210 --> 00:51:33,830 OK, 4 being volunteered by your friend. 1170 00:51:33,830 --> 00:51:35,900 And 5 being volunteered by your friends. 1171 00:51:35,900 --> 00:51:37,160 Do you want to come on up? 1172 00:51:37,160 --> 00:51:38,100 All right, come on up. 1173 00:51:38,100 --> 00:51:39,180 Come on up. 1174 00:51:39,180 --> 00:51:41,180 Brian's going to help me run this demonstration. 1175 00:51:41,180 --> 00:51:47,760 If all five of you could come on over, come on over here 1176 00:51:47,760 --> 00:51:50,240 where we have some space. 1177 00:51:50,240 --> 00:51:54,020 All right, let me get you some microphones and introductions. 1178 00:51:54,020 --> 00:51:55,230 OK, thank you. 1179 00:51:55,230 --> 00:51:58,230 Two of them were bravely volunteered by the people sitting next to them. 1180 00:51:58,230 --> 00:51:59,650 So props to both of you. 1181 00:51:59,650 --> 00:52:02,150 You want to say hello and a little something about yourself. 1182 00:52:02,150 --> 00:52:02,817 AUDIENCE: Hello. 1183 00:52:02,817 --> 00:52:03,940 My name [? Siobhana. ?] 1184 00:52:03,940 --> 00:52:05,690 DAVID MALAN: [? Siobhana. ?] And year or-- 1185 00:52:05,690 --> 00:52:06,560 AUDIENCE: I'm a sophomore. 1186 00:52:06,560 --> 00:52:07,030 DAVID MALAN: Sophomore. 1187 00:52:07,030 --> 00:52:07,530 OK. 1188 00:52:07,530 --> 00:52:09,182 Nice to meet you. 1189 00:52:09,182 --> 00:52:10,130 AUDIENCE: Hi. 1190 00:52:10,130 --> 00:52:11,405 I'm a senior. 1191 00:52:11,405 --> 00:52:13,030 DAVID MALAN: It's nice to have you too. 1192 00:52:13,030 --> 00:52:13,530 Yeah. 1193 00:52:13,530 --> 00:52:16,042 AUDIENCE: Hi, I'm Athena a sophomore in FOHO. 1194 00:52:16,042 --> 00:52:17,508 DAVID MALAN: Athena. 1195 00:52:17,508 --> 00:52:18,050 AUDIENCE: Hi. 1196 00:52:18,050 --> 00:52:18,950 I'm Anurag. 1197 00:52:18,950 --> 00:52:21,470 I'm a first year at Matthews. 1198 00:52:21,470 --> 00:52:22,580 AUDIENCE: I'm Ethan. 1199 00:52:22,580 --> 00:52:24,216 I'm a first year at Weld. 1200 00:52:24,216 --> 00:52:25,325 DAVID MALAN: OK, and-- 1201 00:52:25,325 --> 00:52:26,200 AUDIENCE: I'm Sarika. 1202 00:52:26,200 --> 00:52:27,710 I'm first year in Thayer. 1203 00:52:27,710 --> 00:52:28,190 DAVID MALAN: Wonderful. 1204 00:52:28,190 --> 00:52:29,600 All right, thank you all for volunteering. 1205 00:52:29,600 --> 00:52:30,725 Let's go ahead and do this. 1206 00:52:30,725 --> 00:52:33,990 You for the moment represent a heap of memory, if you will. 1207 00:52:33,990 --> 00:52:36,410 So if you could maybe all back up over here just 1208 00:52:36,410 --> 00:52:38,240 to where we have some available space. 1209 00:52:38,240 --> 00:52:41,390 We're going to need one of you to represent the list. 1210 00:52:41,390 --> 00:52:42,520 Siobhan was it? 1211 00:52:42,520 --> 00:52:42,810 AUDIENCE: [? Siobhana. ?] 1212 00:52:42,810 --> 00:52:44,120 DAVID MALAN: [? Siobhana, ?] come on up. [? Siobhana, ?] do 1213 00:52:44,120 --> 00:52:45,787 you want to go ahead and represent list. 1214 00:52:45,787 --> 00:52:49,310 And to represent our actual list, we have-- 1215 00:52:49,310 --> 00:52:53,870 or Brian-- yeah, we have a name tag, hello, my name is list. 1216 00:52:53,870 --> 00:52:56,990 So you're going to represent the rectangle on the left that 1217 00:52:56,990 --> 00:52:59,150 represents the linked list itself. 1218 00:52:59,150 --> 00:53:01,868 And now initially we're going to go ahead initialize you to null. 1219 00:53:01,868 --> 00:53:04,160 So you can just go ahead and put that behind your back. 1220 00:53:04,160 --> 00:53:05,240 So you're not pointing at anything. 1221 00:53:05,240 --> 00:53:06,198 But you represent list. 1222 00:53:06,198 --> 00:53:08,810 And there's nothing in the list, no numbers in the list. 1223 00:53:08,810 --> 00:53:09,800 What was the next step? 1224 00:53:09,800 --> 00:53:17,540 If the goal at hand is to insert 2, 4, 5, 1, 3, we want to do what first, 1225 00:53:17,540 --> 00:53:20,430 what lower level operation to get 2 in there? 1226 00:53:20,430 --> 00:53:21,842 What was the first line of code? 1227 00:53:21,842 --> 00:53:22,550 AUDIENCE: malloc. 1228 00:53:22,550 --> 00:53:23,383 DAVID MALAN: malloc. 1229 00:53:23,383 --> 00:53:25,242 So we want to malloc a node for 2. 1230 00:53:25,242 --> 00:53:26,450 So let's go ahead and malloc. 1231 00:53:26,450 --> 00:53:27,770 OK, come on up. 1232 00:53:27,770 --> 00:53:28,400 So malloc. 1233 00:53:28,400 --> 00:53:29,355 And what's your name again? 1234 00:53:29,355 --> 00:53:29,690 AUDIENCE: Ethan. 1235 00:53:29,690 --> 00:53:30,140 DAVID MALAN: Ethan. 1236 00:53:30,140 --> 00:53:30,710 OK. 1237 00:53:30,710 --> 00:53:32,127 And what do we need to give Ethan? 1238 00:53:32,127 --> 00:53:34,740 Ethan has two values or two fields. 1239 00:53:34,740 --> 00:53:36,800 The first one is the number 2. 1240 00:53:36,800 --> 00:53:38,240 Thank you. 1241 00:53:38,240 --> 00:53:42,110 The next one is a pointer called next. 1242 00:53:42,110 --> 00:53:43,980 Now, you're not pointing at anyone else. 1243 00:53:43,980 --> 00:53:44,920 So you'll put it behind your back. 1244 00:53:44,920 --> 00:53:47,030 And now what do we want to do with? [? Siobhana, ?] what do we have to do? 1245 00:53:47,030 --> 00:53:47,863 AUDIENCE: Point to-- 1246 00:53:47,863 --> 00:53:48,780 DAVID MALAN: Point to? 1247 00:53:48,780 --> 00:53:49,400 AUDIENCE: 2. 1248 00:53:49,400 --> 00:53:50,733 DAVID MALAN: Him, yes, number 2. 1249 00:53:50,733 --> 00:53:55,070 OK, so this now represents the picture where we have list here, 2 here, 1250 00:53:55,070 --> 00:53:56,430 but the null pointer as well. 1251 00:53:56,430 --> 00:53:58,520 All right next we wanted to add 4 to the list. 1252 00:53:58,520 --> 00:53:59,820 How do we go ahead and do this? 1253 00:53:59,820 --> 00:54:01,862 Well, with 4, we're going to go ahead and malloc. 1254 00:54:01,862 --> 00:54:03,230 malloc, all right. 1255 00:54:03,230 --> 00:54:06,500 And now, Brian has a lovely number 4 for you and a pointer. 1256 00:54:06,500 --> 00:54:08,744 What do we want to do with your pointer? 1257 00:54:08,744 --> 00:54:10,022 AUDIENCE: Not point it. 1258 00:54:10,022 --> 00:54:11,480 DAVID MALAN: Not point at anything. 1259 00:54:11,480 --> 00:54:12,950 Now, it's a little more work. 1260 00:54:12,950 --> 00:54:14,330 And I need a temporary variable. 1261 00:54:14,330 --> 00:54:15,560 So I'll play this role. 1262 00:54:15,560 --> 00:54:17,810 I'm going to go ahead and point at wherever Siobhana is pointing 1263 00:54:17,810 --> 00:54:19,268 at in sort of unnaturally this way. 1264 00:54:19,268 --> 00:54:20,060 That's OK. 1265 00:54:20,060 --> 00:54:22,602 We couldn't get hands that point the other way physically. 1266 00:54:22,602 --> 00:54:24,560 So we're going to point at the same thing here. 1267 00:54:24,560 --> 00:54:25,643 You're both pointing at 2. 1268 00:54:25,643 --> 00:54:28,435 And what am I looking for in order to decide where to put 4? 1269 00:54:28,435 --> 00:54:30,105 AUDIENCE: If it's greater than. 1270 00:54:30,105 --> 00:54:31,980 DAVID MALAN: If it's greater than some value. 1271 00:54:31,980 --> 00:54:32,897 So I'm going to check. 1272 00:54:32,897 --> 00:54:34,190 Well, 4 is greater than 2. 1273 00:54:34,190 --> 00:54:35,060 So I'm going to keep going. 1274 00:54:35,060 --> 00:54:35,870 And your name was Eric? 1275 00:54:35,870 --> 00:54:36,190 AUDIENCE: Ethan. 1276 00:54:36,190 --> 00:54:36,920 DAVID MALAN: Ethan, sorry. 1277 00:54:36,920 --> 00:54:38,640 So, Ethan, what are you pointing at? 1278 00:54:38,640 --> 00:54:39,140 Nothing. 1279 00:54:39,140 --> 00:54:40,170 So that's an opportunity. 1280 00:54:40,170 --> 00:54:41,240 There's nothing to his right. 1281 00:54:41,240 --> 00:54:43,865 So let me go ahead and have Ethan point at-- what's you're name again? 1282 00:54:43,865 --> 00:54:44,573 AUDIENCE: Athena. 1283 00:54:44,573 --> 00:54:45,710 DAVID MALAN: Athena. 1284 00:54:45,710 --> 00:54:47,390 Also, unnaturally, but that's fine. 1285 00:54:47,390 --> 00:54:50,360 And so now does Athena need to update her pointer? 1286 00:54:50,360 --> 00:54:51,110 No, she's good. 1287 00:54:51,110 --> 00:54:52,440 She represents the end of the list. 1288 00:54:52,440 --> 00:54:54,100 So her pointer can stay behind her back. 1289 00:54:54,100 --> 00:54:55,725 All right, let's go ahead and malloc 5. 1290 00:54:55,725 --> 00:54:57,080 You want to be our 5? 1291 00:54:57,080 --> 00:54:58,250 So now we need a 5. 1292 00:54:58,250 --> 00:55:01,070 So we need to hand you the number 5. 1293 00:55:01,070 --> 00:55:02,280 And what's your name again? 1294 00:55:02,280 --> 00:55:02,780 AUDIENCE: Sarika. 1295 00:55:02,780 --> 00:55:03,613 DAVID MALAN: Sarika. 1296 00:55:03,613 --> 00:55:06,182 All right, so Sarika's holding the number 5. 1297 00:55:06,182 --> 00:55:08,140 She also is going to get a pointer called next. 1298 00:55:08,140 --> 00:55:09,650 What should Sarika be pointing at? 1299 00:55:09,650 --> 00:55:10,430 AUDIENCE: Nothing. 1300 00:55:10,430 --> 00:55:10,910 DAVID MALAN: Nothing. 1301 00:55:10,910 --> 00:55:13,070 And now how to do I insert her into the right place? 1302 00:55:13,070 --> 00:55:14,400 Well, I have to do the same thing. 1303 00:55:14,400 --> 00:55:16,670 So I'm going to get involved again and be a temporary variable. 1304 00:55:16,670 --> 00:55:19,503 I'm going to point at the same thing [? Siobhana ?] is pointing out, 1305 00:55:19,503 --> 00:55:20,130 which is Ethan. 1306 00:55:20,130 --> 00:55:21,920 I'm going to follow this and see, ooh, wait a minute, 1307 00:55:21,920 --> 00:55:23,470 he's actually pointing at someone else. 1308 00:55:23,470 --> 00:55:24,530 So I'm going to follow that. 1309 00:55:24,530 --> 00:55:25,370 It's still number 4. 1310 00:55:25,370 --> 00:55:26,210 So I want to keep going. 1311 00:55:26,210 --> 00:55:27,055 Oh, wait a minute. 1312 00:55:27,055 --> 00:55:28,430 Athena is not pointing at anyone. 1313 00:55:28,430 --> 00:55:32,125 This is an opportunity now to have Athena point at 5 and voila. 1314 00:55:32,125 --> 00:55:34,000 But are you going to change your pointer yet? 1315 00:55:34,000 --> 00:55:34,610 No. 1316 00:55:34,610 --> 00:55:36,318 Now things get a little more interesting. 1317 00:55:36,318 --> 00:55:38,345 Could we go ahead and malloc 1? 1318 00:55:38,345 --> 00:55:39,510 And what's your name again? 1319 00:55:39,510 --> 00:55:39,780 AUDIENCE: Emma. 1320 00:55:39,780 --> 00:55:40,530 DAVID MALAN: Emma. 1321 00:55:40,530 --> 00:55:42,810 OK, Emma, we have the number 1 for you from Brian. 1322 00:55:42,810 --> 00:55:46,980 You have a pointer, which should be initialized as well to null. 1323 00:55:46,980 --> 00:55:49,713 And now we have a couple of steps involved here. 1324 00:55:49,713 --> 00:55:50,880 What do we want to do first? 1325 00:55:50,880 --> 00:55:53,395 1326 00:55:53,395 --> 00:55:54,270 What's your proposal? 1327 00:55:54,270 --> 00:55:55,080 AUDIENCE: Temporary pointer. 1328 00:55:55,080 --> 00:55:55,890 DAVID MALAN: Temporary pointer. 1329 00:55:55,890 --> 00:55:58,050 So I'm going to point at the same things [? Siobhana ?] is pointing at, 1330 00:55:58,050 --> 00:55:59,430 which is Ethan here. 1331 00:55:59,430 --> 00:56:01,320 But I see that 2 is greater than 1. 1332 00:56:01,320 --> 00:56:02,880 So what do I actually want to do? 1333 00:56:02,880 --> 00:56:05,422 Well, let me incorrectly for a moment-- [? Siobhana, ?] could 1334 00:56:05,422 --> 00:56:06,532 you point at number 1? 1335 00:56:06,532 --> 00:56:07,740 What have we just done wrong? 1336 00:56:07,740 --> 00:56:09,310 We've orphaned everyone else. 1337 00:56:09,310 --> 00:56:11,790 And even more visibly now, no one is pointing 1338 00:56:11,790 --> 00:56:15,270 at Ethan or beyond, which means we've just leaked that memory, never 1339 00:56:15,270 --> 00:56:16,383 to be recovered or free. 1340 00:56:16,383 --> 00:56:17,550 So we don't want to do that. 1341 00:56:17,550 --> 00:56:18,870 Undo, Control-Z. 1342 00:56:18,870 --> 00:56:20,132 What do we want to do instead? 1343 00:56:20,132 --> 00:56:21,090 What's your name again? 1344 00:56:21,090 --> 00:56:21,430 AUDIENCE: Emma. 1345 00:56:21,430 --> 00:56:22,180 DAVID MALAN: Emma. 1346 00:56:22,180 --> 00:56:23,430 What do you want to point at? 1347 00:56:23,430 --> 00:56:25,685 AUDIENCE: I want to point at [? Siobhana. ?] 1348 00:56:25,685 --> 00:56:27,810 DAVID MALAN: At that the same thing, [? Siobhana ?] 1349 00:56:27,810 --> 00:56:31,312 is pointing at, which is equivalent then to Ethan. 1350 00:56:31,312 --> 00:56:32,770 So go ahead and do that with your-- 1351 00:56:32,770 --> 00:56:34,650 OK, sort of like Twister now. 1352 00:56:34,650 --> 00:56:35,610 That's OK. 1353 00:56:35,610 --> 00:56:37,980 And then [? Siobhana, ?] what I do want to point at? 1354 00:56:37,980 --> 00:56:38,700 Perfect. 1355 00:56:38,700 --> 00:56:42,033 So again a bunch of steps involved, but it really is just two or three steps 1356 00:56:42,033 --> 00:56:43,950 depending on which pointers we want to update. 1357 00:56:43,950 --> 00:56:45,720 And then lastly, let's go head to malloc 3. 1358 00:56:45,720 --> 00:56:46,440 And your name was again? 1359 00:56:46,440 --> 00:56:47,150 AUDIENCE: Anurag. 1360 00:56:47,150 --> 00:56:47,910 DAVID MALAN: Anurag. 1361 00:56:47,910 --> 00:56:49,470 So then we have a 3 for you from Brian. 1362 00:56:49,470 --> 00:56:50,553 We have a pointer for you. 1363 00:56:50,553 --> 00:56:52,510 It's initialized initially to null. 1364 00:56:52,510 --> 00:56:54,130 So you can put that behind your back. 1365 00:56:54,130 --> 00:56:57,390 I'm going to point at the same thing as [? Siobhana. ?] And here we go. 1366 00:56:57,390 --> 00:56:58,560 1 is smaller. 1367 00:56:58,560 --> 00:56:59,670 2 is smaller. 1368 00:56:59,670 --> 00:57:01,000 4 is larger. 1369 00:57:01,000 --> 00:57:02,280 So let's get this right. 1370 00:57:02,280 --> 00:57:06,249 And who do we want to point at whom first? 1371 00:57:06,249 --> 00:57:08,570 AUDIENCE: 3 points at 4. 1372 00:57:08,570 --> 00:57:10,190 DAVID MALAN: 3 should point at 4. 1373 00:57:10,190 --> 00:57:11,190 So go ahead and do that. 1374 00:57:11,190 --> 00:57:14,232 And you can step a little forward just so it looks a little less awkward. 1375 00:57:14,232 --> 00:57:17,390 And then lastly, big finale, Ethan, who do you want to point at? 1376 00:57:17,390 --> 00:57:18,500 Number 3. 1377 00:57:18,500 --> 00:57:21,357 And thankfully, all these steps later, we have a linked list. 1378 00:57:21,357 --> 00:57:23,190 We have wonderful souvenirs for each of you. 1379 00:57:23,190 --> 00:57:24,020 We just need the numbers back. 1380 00:57:24,020 --> 00:57:26,266 Thank you to our volunteers here if we could. 1381 00:57:26,266 --> 00:57:28,760 OK, you can keep that. 1382 00:57:28,760 --> 00:57:31,340 You can put the numbers on the desk here. 1383 00:57:31,340 --> 00:57:35,480 So as these folks head off the stage, let me point out now one 1384 00:57:35,480 --> 00:57:38,480 of the shortcomings of the approach-- thank you all so much-- 1385 00:57:38,480 --> 00:57:40,220 of what we've just done here. 1386 00:57:40,220 --> 00:57:44,870 Even though you all had the luxury of seeing 1 and 2 and 3 and 4 and 5 1387 00:57:44,870 --> 00:57:47,750 and you could even sort of immediately figure out where things go, 1388 00:57:47,750 --> 00:57:51,380 even these linked lists are implemented with chunks of memory. 1389 00:57:51,380 --> 00:57:54,290 And just like with arrays, or equivalently lockers, 1390 00:57:54,290 --> 00:57:56,840 the computer can only look at one piece of memory 1391 00:57:56,840 --> 00:58:00,980 at a time, which is to say that to a computer, that same linked list 1392 00:58:00,980 --> 00:58:02,030 kind of looks like this. 1393 00:58:02,030 --> 00:58:05,150 It's sort of blind to the specific numbers in that linked list 1394 00:58:05,150 --> 00:58:07,930 until it opens each of those doors of memory. 1395 00:58:07,930 --> 00:58:13,670 So to find where 3 goes or to find where 5 goes or to find where 1 goes, 1396 00:58:13,670 --> 00:58:17,360 all of those doors maximally might need to be opened one 1397 00:58:17,360 --> 00:58:19,470 at a time to find that value. 1398 00:58:19,470 --> 00:58:22,220 So with linked lists we have gained a feature. 1399 00:58:22,220 --> 00:58:26,930 We have gained the ability to add dynamically to the list, right. 1400 00:58:26,930 --> 00:58:30,020 I just kept malloc-ing, malloc-ing, malloc-ing additional students 1401 00:58:30,020 --> 00:58:31,040 and additional numbers. 1402 00:58:31,040 --> 00:58:33,590 So the list can grow as big as we wanted it to. 1403 00:58:33,590 --> 00:58:34,730 And that case, we had five. 1404 00:58:34,730 --> 00:58:38,630 We could have done it 50 times or 500 to add more and more numbers. 1405 00:58:38,630 --> 00:58:43,250 That's an upside, because we don't have to waste time inserting into an array 1406 00:58:43,250 --> 00:58:46,340 by resizing it and moving all of the original contents. 1407 00:58:46,340 --> 00:58:49,850 None of our volunteers had to move, technically speaking, just 1408 00:58:49,850 --> 00:58:51,800 to insert a number 5 or number 3. 1409 00:58:51,800 --> 00:58:56,450 They just had to point at someone else in memory or someone else on stage. 1410 00:58:56,450 --> 00:58:59,970 So if your data structure now is a linked list that looks like this, 1411 00:58:59,970 --> 00:59:01,790 we've paid a price for that dynamism. 1412 00:59:01,790 --> 00:59:07,400 We've paid a price for the ability to resize our list without moving 1413 00:59:07,400 --> 00:59:09,500 everything around that's already there. 1414 00:59:09,500 --> 00:59:14,630 What is a downside that you might perceive of a linked list? 1415 00:59:14,630 --> 00:59:17,030 What have we lost or given up here? 1416 00:59:17,030 --> 00:59:18,470 AUDIENCE: We lost random access. 1417 00:59:18,470 --> 00:59:18,770 DAVID MALAN: Sorry. 1418 00:59:18,770 --> 00:59:19,430 Say again. 1419 00:59:19,430 --> 00:59:20,780 AUDIENCE: We lost random access. 1420 00:59:20,780 --> 00:59:22,238 DAVID MALAN: We lost random access. 1421 00:59:22,238 --> 00:59:23,090 That's spot on. 1422 00:59:23,090 --> 00:59:24,420 We've lost random access. 1423 00:59:24,420 --> 00:59:24,920 Why? 1424 00:59:24,920 --> 00:59:28,250 Because the way you get from the beginning of the list to the end 1425 00:59:28,250 --> 00:59:30,170 is by following these pointers, following 1426 00:59:30,170 --> 00:59:33,440 these arrows, these breadcrumbs, you can't just jump to the middle elements, 1427 00:59:33,440 --> 00:59:36,560 even though obviously this one here on the screen to all of us humans, 1428 00:59:36,560 --> 00:59:37,700 that's the middle element. 1429 00:59:37,700 --> 00:59:39,680 You don't know that if you're the computer. 1430 00:59:39,680 --> 00:59:42,740 If the main variable that's storing this data structure 1431 00:59:42,740 --> 00:59:44,900 is the pointer, like [? Siobhana, ?] pointing 1432 00:59:44,900 --> 00:59:46,940 to the first elements of the list, you're 1433 00:59:46,940 --> 00:59:49,490 going to have to follow all of these arrows, 1434 00:59:49,490 --> 00:59:52,530 frankly counting them up and then retroactively realizing, 1435 00:59:52,530 --> 00:59:53,440 oh, there was 5. 1436 00:59:53,440 --> 00:59:55,490 I passed the middle one earlier. 1437 00:59:55,490 --> 00:59:57,020 You've glossed random access. 1438 00:59:57,020 --> 01:00:00,530 And what algorithm have we used wonderfully in the past 1439 01:00:00,530 --> 01:00:02,270 when we do have random access? 1440 01:00:02,270 --> 01:00:03,290 AUDIENCE: Binary search. 1441 01:00:03,290 --> 01:00:04,430 DAVID MALAN: Binary search. 1442 01:00:04,430 --> 01:00:08,570 So we've lost the ability now to do binary search as efficiently as we once 1443 01:00:08,570 --> 01:00:09,500 were able to. 1444 01:00:09,500 --> 01:00:12,260 And so if we consider now the running time of linked lists, 1445 01:00:12,260 --> 01:00:14,120 unfortunately, we've paid that price. 1446 01:00:14,120 --> 01:00:17,000 Searching now has gone back up to linear. 1447 01:00:17,000 --> 01:00:20,467 We no longer have logarithmic running time because of the fact 1448 01:00:20,467 --> 01:00:22,550 that we're stitching together this data structure. 1449 01:00:22,550 --> 01:00:25,610 And the only way to find the end of the list, the middle of the list 1450 01:00:25,610 --> 01:00:27,600 is to follow all of these arrows. 1451 01:00:27,600 --> 01:00:29,510 You can't just jump to one location. 1452 01:00:29,510 --> 01:00:33,290 Meanwhile inserting into the list, in the worst case, big O of n 1453 01:00:33,290 --> 01:00:36,740 is going to be linear as well, because you have to walk through the whole list 1454 01:00:36,740 --> 01:00:40,140 to actually find a spot for the given number, 1455 01:00:40,140 --> 01:00:42,623 especially if you're trying to keep it sorted. 1456 01:00:42,623 --> 01:00:44,540 So it would seem that even though we've gained 1457 01:00:44,540 --> 01:00:46,742 this feature of much more dynamic insertion 1458 01:00:46,742 --> 01:00:49,200 and we're building up something more interesting in memory, 1459 01:00:49,200 --> 01:00:51,770 and you can imagine this just taking much less time overall, 1460 01:00:51,770 --> 01:00:53,960 because you have to keep moving everything around 1461 01:00:53,960 --> 01:00:56,570 like we did with realloc, it's unfortunately 1462 01:00:56,570 --> 01:00:58,518 something we're paying a price for. 1463 01:00:58,518 --> 01:00:59,310 But that was a lot. 1464 01:00:59,310 --> 01:01:00,540 Let's go ahead and take our 5-minute break. 1465 01:01:00,540 --> 01:01:01,415 Fruit awaits outside. 1466 01:01:01,415 --> 01:01:02,770 And we'll be back. 1467 01:01:02,770 --> 01:01:09,020 All right, so during break, I whipped up one final example of our list program. 1468 01:01:09,020 --> 01:01:11,390 This one uses all of those building blocks. 1469 01:01:11,390 --> 01:01:14,780 And let's see if we can't follow along pictorially and code-wise what 1470 01:01:14,780 --> 01:01:17,220 it is we just built with all of these humans on stage. 1471 01:01:17,220 --> 01:01:19,250 So here is list list3.c. 1472 01:01:19,250 --> 01:01:20,260 It's available online. 1473 01:01:20,260 --> 01:01:22,720 So you can follow along at home afterward if you'd like. 1474 01:01:22,720 --> 01:01:25,762 And let's just walk through the lines that are written for us in advance. 1475 01:01:25,762 --> 01:01:27,500 One, I'm using standard I/O for printf. 1476 01:01:27,500 --> 01:01:30,350 And I'm using stdlib for malloc and free, 1477 01:01:30,350 --> 01:01:32,660 our new friends that give us dynamic memory. 1478 01:01:32,660 --> 01:01:37,760 Here is the definition of a node that again has a number inside of it 1479 01:01:37,760 --> 01:01:41,730 and a pointer, specifically a pointer to another node structure. 1480 01:01:41,730 --> 01:01:45,650 So that's what each of our humans represented, this time now in C. 1481 01:01:45,650 --> 01:01:47,180 What is my main program going to do? 1482 01:01:47,180 --> 01:01:49,490 Just for the sake of demonstration, the goal at hand 1483 01:01:49,490 --> 01:01:53,480 is just to write a program that initializes a linked list to nothing 1484 01:01:53,480 --> 01:01:56,960 initially, then adds a node with 1, then adds a node with 2, 1485 01:01:56,960 --> 01:01:58,010 then add a node with 3. 1486 01:01:58,010 --> 01:02:00,710 We'll keep it simple and not add 4 or 5 this time. 1487 01:02:00,710 --> 01:02:01,950 So how am I going to do this? 1488 01:02:01,950 --> 01:02:07,490 Well, on line 17, I propose that we create a variable called list 1489 01:02:07,490 --> 01:02:09,110 and have it be the address of a node. 1490 01:02:09,110 --> 01:02:10,907 So if I were to draw this now pictorially, 1491 01:02:10,907 --> 01:02:12,740 it's going to be just like our demonstration 1492 01:02:12,740 --> 01:02:16,010 a bit ago, where I have a rectangle here called list. 1493 01:02:16,010 --> 01:02:17,970 And initially, it's not pointing to anything. 1494 01:02:17,970 --> 01:02:20,840 So I'm just going to leave the box blank to represent NULL. 1495 01:02:20,840 --> 01:02:23,210 So that's that line 17 right here. 1496 01:02:23,210 --> 01:02:25,100 Now, let me go ahead and do the following. 1497 01:02:25,100 --> 01:02:27,260 Add a number to the list as follows. 1498 01:02:27,260 --> 01:02:29,930 Line 20 just gives me enough memory for a node. 1499 01:02:29,930 --> 01:02:33,710 And it stores that memory's address in a variable called n. 1500 01:02:33,710 --> 01:02:37,100 Lines 21 through 24 are just a safety check. 1501 01:02:37,100 --> 01:02:38,060 Did anything go wrong? 1502 01:02:38,060 --> 01:02:40,400 If so, just return 1 and stop the program. 1503 01:02:40,400 --> 01:02:42,470 We ran out of memory for some reason. 1504 01:02:42,470 --> 01:02:45,440 But these two lines now should look a little more familiar. 1505 01:02:45,440 --> 01:02:49,310 This now is going to go ahead and install 1 and NULL into that structure 1506 01:02:49,310 --> 01:02:49,980 as follows. 1507 01:02:49,980 --> 01:02:50,960 So let's recap. 1508 01:02:50,960 --> 01:02:55,220 This line here 20 is the same thing as allocating really 1509 01:02:55,220 --> 01:02:58,220 a node that looks like this in memory that has two halves. 1510 01:02:58,220 --> 01:03:01,730 One of those fields is called number, which I'll write there. 1511 01:03:01,730 --> 01:03:03,770 The other field is called next. 1512 01:03:03,770 --> 01:03:06,740 And then if we go back to the code, these two lines 1513 01:03:06,740 --> 01:03:09,420 are all about just installing values in that structure. 1514 01:03:09,420 --> 01:03:11,420 So if I go ahead to number and put the number 1, 1515 01:03:11,420 --> 01:03:14,003 I'm not going to bother drawing anything for next, because I'm 1516 01:03:14,003 --> 01:03:15,620 going to leave it implicitly as NULL. 1517 01:03:15,620 --> 01:03:17,360 So that's what's going on now. 1518 01:03:17,360 --> 01:03:18,740 What do I next want to do? 1519 01:03:18,740 --> 01:03:21,470 Well, the last line of code here under this comment that 1520 01:03:21,470 --> 01:03:24,530 says add number to list, I set list equal 1521 01:03:24,530 --> 01:03:28,320 to n where n again is pointing at this new node. 1522 01:03:28,320 --> 01:03:31,010 So that's the same thing as saying, well, list 1523 01:03:31,010 --> 01:03:34,307 is going to go ahead and point at that new node. 1524 01:03:34,307 --> 01:03:36,890 So after those lines of code, I've created a picture in memory 1525 01:03:36,890 --> 01:03:39,450 that effectively looks like this. 1526 01:03:39,450 --> 01:03:42,290 Now, let's go ahead and add the number 2 to the list. 1527 01:03:42,290 --> 01:03:43,507 It's almost the same. 1528 01:03:43,507 --> 01:03:46,340 So here's the chunk of code that's going to go and add a second node 1529 01:03:46,340 --> 01:03:47,630 to the list, this time containing 2. 1530 01:03:47,630 --> 01:03:48,830 Let's do it step by step. 1531 01:03:48,830 --> 01:03:51,445 Line 30, I'm going to reuse n as my temporary variable. 1532 01:03:51,445 --> 01:03:52,820 So I don't have to re-declare it. 1533 01:03:52,820 --> 01:03:54,695 It's the same n as before, but it's now going 1534 01:03:54,695 --> 01:03:57,200 to get a different address of memory thanks to malloc. 1535 01:03:57,200 --> 01:03:59,310 So that gives me another box like this that I'm 1536 01:03:59,310 --> 01:04:02,280 going to go ahead and draw like that with nothing in it initially. 1537 01:04:02,280 --> 01:04:05,510 I'm going to make sure per lines 31 to 34 that nothing went wrong. 1538 01:04:05,510 --> 01:04:07,170 But that's just as before. 1539 01:04:07,170 --> 01:04:10,430 And now in lines 35 and 36, I'm going to put 2 in there and NULL. 1540 01:04:10,430 --> 01:04:13,130 So let me go over there and let me go ahead and put 2 in there. 1541 01:04:13,130 --> 01:04:14,780 And I'm going to leave NULL blank implicitly. 1542 01:04:14,780 --> 01:04:15,990 That's the end of the list. 1543 01:04:15,990 --> 01:04:19,820 But now I, of course, conceptually have to link the node for 1 1544 01:04:19,820 --> 01:04:21,140 to the node for 2. 1545 01:04:21,140 --> 01:04:25,700 And here's where C syntax, even though it's new, kind of finally makes sense. 1546 01:04:25,700 --> 01:04:30,530 Notice here, I'm saying list arrow next equals NULL. 1547 01:04:30,530 --> 01:04:32,840 That maps perfectly to the picture. 1548 01:04:32,840 --> 01:04:37,910 List arrow x equals what? n. 1549 01:04:37,910 --> 01:04:39,680 Well, n is this thing over here. 1550 01:04:39,680 --> 01:04:41,450 So I just draw the arrow there. 1551 01:04:41,450 --> 01:04:46,040 And so the code actually finally lines up even though it's new for today. 1552 01:04:46,040 --> 01:04:48,480 So now I've drawn the picture as follows with 1 and 2. 1553 01:04:48,480 --> 01:04:50,397 Let's go ahead and add a third and final node. 1554 01:04:50,397 --> 01:04:53,640 This one containing the number 3, using these lines here. 1555 01:04:53,640 --> 01:04:56,067 So line 40 gives me a new node with malloc. 1556 01:04:56,067 --> 01:04:57,650 So that's going to give me a new node. 1557 01:04:57,650 --> 01:04:59,030 I'll draw it as a rectangle over here. 1558 01:04:59,030 --> 01:05:00,410 I'm drawing it left to right, but these things 1559 01:05:00,410 --> 01:05:01,993 could be all over the place in memory. 1560 01:05:01,993 --> 01:05:03,590 It doesn't matter where they end up. 1561 01:05:03,590 --> 01:05:06,440 I'm going to go ahead and check as before that's 1562 01:05:06,440 --> 01:05:07,915 it's not NULL, just to be safe. 1563 01:05:07,915 --> 01:05:10,040 Then I'm going to go ahead and install the number 3 1564 01:05:10,040 --> 01:05:11,943 and NULL in there just as before. 1565 01:05:11,943 --> 01:05:13,610 So that means let's go ahead and draw 3. 1566 01:05:13,610 --> 01:05:16,130 I'm going to leave that blank because it's going to be NULL. 1567 01:05:16,130 --> 01:05:18,740 And then the last line, you wouldn't typically 1568 01:05:18,740 --> 01:05:21,710 hard code this or write this explicitly in a program. 1569 01:05:21,710 --> 01:05:23,840 This is a bit more verbose than you need to. 1570 01:05:23,840 --> 01:05:27,860 Let me propose that you would probably use some kind of loop instead 1571 01:05:27,860 --> 01:05:31,730 and walk through the data structure step by step as I proposed earlier. 1572 01:05:31,730 --> 01:05:34,430 But if we really want to do this just for demonstration's sake, 1573 01:05:34,430 --> 01:05:38,030 notice, start at list, follow an arrow and go to next. 1574 01:05:38,030 --> 01:05:40,040 Follow another arrow and go to next. 1575 01:05:40,040 --> 01:05:42,540 We can literally do that with our picture. 1576 01:05:42,540 --> 01:05:43,460 So here we go. 1577 01:05:43,460 --> 01:05:48,020 Let me start at list and follow an arrow and go to next. 1578 01:05:48,020 --> 01:05:50,270 Follow an arrow, go to next. 1579 01:05:50,270 --> 01:05:51,570 And now this is NULL. 1580 01:05:51,570 --> 01:05:57,890 So what I want to update is exactly this, as with line 47, which said 1581 01:05:57,890 --> 01:06:01,100 follow two arrows, look at two next fields 1582 01:06:01,100 --> 01:06:05,077 interchangeably and then set it equal to n. 1583 01:06:05,077 --> 01:06:06,410 All right, so what remains here? 1584 01:06:06,410 --> 01:06:09,500 Well, this program's whole purpose in life was just to print a list out. 1585 01:06:09,500 --> 01:06:11,600 Here's a way where you can actually use a for loop 1586 01:06:11,600 --> 01:06:14,000 to iterate over a linked list. 1587 01:06:14,000 --> 01:06:18,500 It's kind of funky because we don't have i and ints and i++ and so forth. 1588 01:06:18,500 --> 01:06:21,500 But a for loop doesn't need to use integers or i's. 1589 01:06:21,500 --> 01:06:24,860 Remember that before the first semicolon, you have initialization. 1590 01:06:24,860 --> 01:06:27,440 In between the semicolons, you have a condition. 1591 01:06:27,440 --> 01:06:29,923 And then you have an update that happens over here. 1592 01:06:29,923 --> 01:06:32,840 So you'll get more experience with this with Problem Set 5 ultimately. 1593 01:06:32,840 --> 01:06:35,300 But for today's purposes, high level, notice 1594 01:06:35,300 --> 01:06:38,870 that this gives me a temporary pointer, like my big red hand earlier. 1595 01:06:38,870 --> 01:06:40,342 That's a node star pointer. 1596 01:06:40,342 --> 01:06:42,800 And that's why I was able to point with the big fuzzy hand. 1597 01:06:42,800 --> 01:06:44,270 And I set that equal to list. 1598 01:06:44,270 --> 01:06:48,800 So whatever the list was pointing at so was my temporary fuzzy hand. 1599 01:06:48,800 --> 01:06:53,000 I'm going to follow the following loop and so long as temp does not equal 1600 01:06:53,000 --> 01:06:53,540 NULL. 1601 01:06:53,540 --> 01:06:55,880 So earlier when I was wearing the big fuzzy hand, 1602 01:06:55,880 --> 01:06:57,470 I kept pointing, pointing, pointing. 1603 01:06:57,470 --> 01:07:00,170 And I stopped once it equaled NULL. 1604 01:07:00,170 --> 01:07:03,470 So this is saying keep doing the following until it equals NULL. 1605 01:07:03,470 --> 01:07:04,580 What do I want to do? 1606 01:07:04,580 --> 01:07:06,710 I want to just print out the integer that's 1607 01:07:06,710 --> 01:07:10,550 inside of whatever I'm pointing at inside of it's number field. 1608 01:07:10,550 --> 01:07:13,130 So go to whatever I'm pointing at, follow the arrow, 1609 01:07:13,130 --> 01:07:14,360 and go to the number field. 1610 01:07:14,360 --> 01:07:16,640 That's how we get at the data inside. 1611 01:07:16,640 --> 01:07:20,220 Once I've printed that out, for loops say that you just update a variable. 1612 01:07:20,220 --> 01:07:24,150 So what is that variable temp equals temp arrow next. 1613 01:07:24,150 --> 01:07:26,510 So if my fuzzy hand is pointing at someone 1614 01:07:26,510 --> 01:07:29,990 and I need to update it to point at temp arrow next, 1615 01:07:29,990 --> 01:07:33,080 that means go to whatever I'm pointing at, follow the arrow. 1616 01:07:33,080 --> 01:07:35,090 There's the next field and point at whatever 1617 01:07:35,090 --> 01:07:37,567 the next field was pointing at. 1618 01:07:37,567 --> 01:07:39,650 So you just keep updating what you're pointing at. 1619 01:07:39,650 --> 01:07:40,850 That prints out the list. 1620 01:07:40,850 --> 01:07:43,350 And then-- and we'll defer this ultimately to Problem 1621 01:07:43,350 --> 01:07:46,580 Set 5-- we will need to free this memory. 1622 01:07:46,580 --> 01:07:50,247 And actually you have to be a little clever about how you free memory, 1623 01:07:50,247 --> 01:07:52,580 but I'm going to use a while loop there, which turns out 1624 01:07:52,580 --> 01:07:55,010 to be a little cleaner, a little easier, ultimately 1625 01:07:55,010 --> 01:07:57,710 to free all of this mess I made in my computer's memory. 1626 01:07:57,710 --> 01:07:59,960 I kind of need to do the equivalent of freeing things, 1627 01:07:59,960 --> 01:08:03,200 but I need to free what's behind me, not what's in front of me. 1628 01:08:03,200 --> 01:08:07,500 Once you free memory, you should not touch it, traverse it, and so forth. 1629 01:08:07,500 --> 01:08:11,002 But again more on that final note in P Set 5. 1630 01:08:11,002 --> 01:08:13,210 All right, any questions on a high level on the code? 1631 01:08:13,210 --> 01:08:15,200 It's fine if it looks quite new. 1632 01:08:15,200 --> 01:08:18,170 We make it available so that you have a starting point when it comes 1633 01:08:18,170 --> 01:08:22,520 to using this kind of code yourself. 1634 01:08:22,520 --> 01:08:24,910 Any questions? 1635 01:08:24,910 --> 01:08:26,660 All right, so someone came up during break 1636 01:08:26,660 --> 01:08:29,930 and noted that this actually seems to be a regression in that arrays gave us 1637 01:08:29,930 --> 01:08:32,930 the ability to resize, even though it was a little expensive because you 1638 01:08:32,930 --> 01:08:35,149 got to copy everything from one place to another. 1639 01:08:35,149 --> 01:08:39,090 But we had random access and therefore binary search and therefore 1640 01:08:39,090 --> 01:08:42,149 logarithmic running time for things like searching assorted lists. 1641 01:08:42,149 --> 01:08:43,399 We seem to have given that up. 1642 01:08:43,399 --> 01:08:47,420 Linked lists give us dynamism where we and shrink things without wasting time 1643 01:08:47,420 --> 01:08:48,890 by moving things around. 1644 01:08:48,890 --> 01:08:50,540 But we've lost random access. 1645 01:08:50,540 --> 01:08:51,470 But you know what? 1646 01:08:51,470 --> 01:08:54,770 Now, that we have this ability using pointers and data structures to kind 1647 01:08:54,770 --> 01:08:59,128 of stitch things together in memory, connect things with arrows, 1648 01:08:59,128 --> 01:09:00,920 you know what, we can build fancier things. 1649 01:09:00,920 --> 01:09:02,878 Most of you are probably familiar with the idea 1650 01:09:02,878 --> 01:09:06,529 of a family tree, which is this hierarchical two dimensional structure. 1651 01:09:06,529 --> 01:09:08,420 And indeed, that's our inspiration here. 1652 01:09:08,420 --> 01:09:11,779 What if we don't just keep making one dimensional data structures, 1653 01:09:11,779 --> 01:09:15,350 arrays that go left and right, linked lists that kind of go left to right? 1654 01:09:15,350 --> 01:09:18,950 What if we actually use a vertical notion too and lay things 1655 01:09:18,950 --> 01:09:20,060 out more interestingly. 1656 01:09:20,060 --> 01:09:21,779 What can we gain from this? 1657 01:09:21,779 --> 01:09:26,880 Well, let me propose that anytime we've seen an array, 1658 01:09:26,880 --> 01:09:29,484 we can actually re-implement an array, but get 1659 01:09:29,484 --> 01:09:32,359 the best of both worlds, the best of arrays, the best of linked lists 1660 01:09:32,359 --> 01:09:33,000 as follows. 1661 01:09:33,000 --> 01:09:37,250 Here is an array, back from Week 1 or even Week 0 1662 01:09:37,250 --> 01:09:39,140 when we were searching behind doors. 1663 01:09:39,140 --> 01:09:43,790 And here, Week 2, when we were searching behind doors, 1664 01:09:43,790 --> 01:09:46,880 let's go ahead and note that if we were to do binary search on this 1665 01:09:46,880 --> 01:09:50,720 looking for some value, as before, many times you look in the middle first. 1666 01:09:50,720 --> 01:09:52,622 And then you decide, do you go left or right? 1667 01:09:52,622 --> 01:09:55,580 And if you go left or right, you'd look in the middle element over here 1668 01:09:55,580 --> 01:09:57,280 or the middle element over here. 1669 01:09:57,280 --> 01:09:58,280 And then what do you do? 1670 01:09:58,280 --> 01:10:01,010 You go left or right, looking at the middle element over here 1671 01:10:01,010 --> 01:10:03,587 or over here or over here or over here. 1672 01:10:03,587 --> 01:10:04,170 You know what? 1673 01:10:04,170 --> 01:10:06,500 Let me just kind of explode this picture because all of this 1674 01:10:06,500 --> 01:10:07,910 is happening in one dimension. 1675 01:10:07,910 --> 01:10:11,150 We can actually think of this is happening really in two dimensions. 1676 01:10:11,150 --> 01:10:13,850 Let me draw my same array, 1, 2, 3, 4, 5, 6, 1677 01:10:13,850 --> 01:10:17,060 7, but let me represent it on different levels that's 1678 01:10:17,060 --> 01:10:18,470 indicative of what's happening. 1679 01:10:18,470 --> 01:10:19,550 I start in the middle. 1680 01:10:19,550 --> 01:10:21,292 And I go left or I go right. 1681 01:10:21,292 --> 01:10:23,000 I then go ahead and look at this element. 1682 01:10:23,000 --> 01:10:25,160 And then I go left or I go right. 1683 01:10:25,160 --> 01:10:28,160 So it's the same thing, but it's a two-dimensional rendition 1684 01:10:28,160 --> 01:10:32,270 of what we've been doing for a few weeks whenever we've done binary search. 1685 01:10:32,270 --> 01:10:34,220 Well, you know what this kind of looks like? 1686 01:10:34,220 --> 01:10:37,550 It kind of looks like a linked list, albeit without the arrows. 1687 01:10:37,550 --> 01:10:40,940 But you know what, I don't think I want to stitch this together from 1 to 2 1688 01:10:40,940 --> 01:10:44,270 to 3 to 4 to 5 to 6 to 7, because that's just going to be a linked list. 1689 01:10:44,270 --> 01:10:47,870 But what if I use my new-found familiarity with pointers, 1690 01:10:47,870 --> 01:10:49,260 use a few more of them? 1691 01:10:49,260 --> 01:10:53,240 So I spend more space and stitch this data structure together 1692 01:10:53,240 --> 01:10:55,490 in two dimensions conceptually. 1693 01:10:55,490 --> 01:10:57,800 Every node represented here is a rectangle. 1694 01:10:57,800 --> 01:10:59,770 It doesn't have to have just one pointer. 1695 01:10:59,770 --> 01:11:03,740 There's nothing stopping me from creating a new struct, a new definition 1696 01:11:03,740 --> 01:11:05,870 of node that has two pointers. 1697 01:11:05,870 --> 01:11:07,070 Maybe it's called left. 1698 01:11:07,070 --> 01:11:08,660 Maybe it's called right. 1699 01:11:08,660 --> 01:11:10,700 Previously, we had just one we called it next. 1700 01:11:10,700 --> 01:11:12,992 But there's nothing stopping us from creating a fancier 1701 01:11:12,992 --> 01:11:14,480 structure that actually has two. 1702 01:11:14,480 --> 01:11:17,917 And so we might make it look not like this as before for a linked list, 1703 01:11:17,917 --> 01:11:19,500 but let's get rid of the next pointer. 1704 01:11:19,500 --> 01:11:20,750 Let's make a little more room. 1705 01:11:20,750 --> 01:11:24,750 And let's actually give myself two pointers, left and right. 1706 01:11:24,750 --> 01:11:27,320 And I claim that this structure now in C could 1707 01:11:27,320 --> 01:11:32,810 be used to implement the tree that I just described, the family-like tree, 1708 01:11:32,810 --> 01:11:37,170 more properly called a binary search tree, in the following way. 1709 01:11:37,170 --> 01:11:39,260 This is a binary search tree. 1710 01:11:39,260 --> 01:11:43,670 One, because every node in the tree has at most two children, 1711 01:11:43,670 --> 01:11:45,980 hence the bi in binary, meaning maximally two. 1712 01:11:45,980 --> 01:11:49,100 It has zero children, as like these down here. 1713 01:11:49,100 --> 01:11:51,050 Or it has maximally two children. 1714 01:11:51,050 --> 01:11:52,760 Hence, the bi in binary search tree. 1715 01:11:52,760 --> 01:11:57,500 It's a search tree in the sense that I have taken care with this data 1716 01:11:57,500 --> 01:11:59,660 to sort things properly. 1717 01:11:59,660 --> 01:12:01,760 Notice the following definition. 1718 01:12:01,760 --> 01:12:06,920 For any node in the tree, every element to the left is smaller than it. 1719 01:12:06,920 --> 01:12:10,730 And every element to the right is greater than it. 1720 01:12:10,730 --> 01:12:14,390 That's a recursive definition, because watch, look at this node. 1721 01:12:14,390 --> 01:12:17,060 Everything to the left of it is smaller. 1722 01:12:17,060 --> 01:12:19,730 Everything to the right of it is larger. 1723 01:12:19,730 --> 01:12:20,750 Let's look at 6. 1724 01:12:20,750 --> 01:12:22,850 Everything to the left of it is smaller. 1725 01:12:22,850 --> 01:12:24,770 Everything to the right of it is larger. 1726 01:12:24,770 --> 01:12:27,860 So it's recursive in the sense that no matter what node you look at, 1727 01:12:27,860 --> 01:12:31,370 no matter what rectangle you look at, what I just said correctly 1728 01:12:31,370 --> 01:12:36,800 is true of both the left child or subtree and the right child or subtree. 1729 01:12:36,800 --> 01:12:39,917 So this is to say if you have a list of numbers, for instance, 1730 01:12:39,917 --> 01:12:41,750 or a list of anything and you actually store 1731 01:12:41,750 --> 01:12:45,590 them using nodes that look like this, but conceptually what you're really 1732 01:12:45,590 --> 01:12:48,380 doing is stitching them together two dimensionally 1733 01:12:48,380 --> 01:12:53,030 like this, guess what feature we just gain back? 1734 01:12:53,030 --> 01:12:54,925 What have we just improved? 1735 01:12:54,925 --> 01:12:56,300 I heard some murmuring over here. 1736 01:12:56,300 --> 01:12:57,360 AUDIENCE: Binary search. 1737 01:12:57,360 --> 01:12:59,235 DAVID MALAN: We've gotten back binary search. 1738 01:12:59,235 --> 01:13:01,400 So we still have dynamism, like a linked list. 1739 01:13:01,400 --> 01:13:02,690 We're still using pointers. 1740 01:13:02,690 --> 01:13:05,570 And suppose we want to add the number 0 or the number 8, 1741 01:13:05,570 --> 01:13:08,870 you could imagine 0 going over here, 8 going over here. 1742 01:13:08,870 --> 01:13:12,080 So we could still just plug them in without having to move everything 1743 01:13:12,080 --> 01:13:13,790 around like we would for an array. 1744 01:13:13,790 --> 01:13:16,670 But because you're stitching things together with additional arrows 1745 01:13:16,670 --> 01:13:20,690 wherever they are in memory, so long as you keep track of this data structure, 1746 01:13:20,690 --> 01:13:23,740 called a tree, with one pointer to the so-called root-- 1747 01:13:23,740 --> 01:13:27,040 the root being upside down in this world of computer science-- 1748 01:13:27,040 --> 01:13:29,440 this is the root of this binary search tree, 1749 01:13:29,440 --> 01:13:32,150 guess what you do if you're looking for the number 7? 1750 01:13:32,150 --> 01:13:33,400 Well, you see 4. 1751 01:13:33,400 --> 01:13:34,900 You know it's greater than 4. 1752 01:13:34,900 --> 01:13:36,070 So what do you do? 1753 01:13:36,070 --> 01:13:40,180 You move to the right, thereby ignoring the other half of this tree, 1754 01:13:40,180 --> 01:13:42,880 just like the other half of the phone book in Week 0. 1755 01:13:42,880 --> 01:13:45,550 Once you get to 6, you consider, I'm looking for 7. 1756 01:13:45,550 --> 01:13:46,280 What do I know? 1757 01:13:46,280 --> 01:13:47,740 It's got to be to the right. 1758 01:13:47,740 --> 01:13:48,657 And so you go. 1759 01:13:48,657 --> 01:13:50,740 The height of this tree happens to be logarithmic, 1760 01:13:50,740 --> 01:13:53,140 for those familiar, log base 2 of n, which 1761 01:13:53,140 --> 01:13:56,590 is to say I have 8 elements or 7 elements in this tree. 1762 01:13:56,590 --> 01:14:01,630 But it only takes me 1, 2, 3 steps to find the value. 1763 01:14:01,630 --> 01:14:07,040 It does not take big O of n, or a linear number of steps. 1764 01:14:07,040 --> 01:14:09,460 And if you want your mind really to be blown here, 1765 01:14:09,460 --> 01:14:13,762 it turns out this is actually the best application for recursion, 1766 01:14:13,762 --> 01:14:15,970 which might have felt a little forced previously when 1767 01:14:15,970 --> 01:14:17,980 we built Mario's pyramid with recursion where 1768 01:14:17,980 --> 01:14:23,800 you did factorial or product or sum or something 1769 01:14:23,800 --> 01:14:25,720 like that in section recursively. 1770 01:14:25,720 --> 01:14:29,230 It turns out that now that we have data structures that exist conceptually 1771 01:14:29,230 --> 01:14:32,040 in two dimensions that are recursively defined-- 1772 01:14:32,040 --> 01:14:36,550 and by recursively defined, I mean for any given node, left is smaller, 1773 01:14:36,550 --> 01:14:40,270 right is bigger, and you can make that statement about any node in the tree-- 1774 01:14:40,270 --> 01:14:44,020 watch what we can do in terms of implementing binary search. 1775 01:14:44,020 --> 01:14:47,200 If I have here a function called search, whose purpose in life 1776 01:14:47,200 --> 01:14:50,650 is to return true or false if the number 50 is in the tree. 1777 01:14:50,650 --> 01:14:52,210 How do you search a tree? 1778 01:14:52,210 --> 01:14:53,860 Well, it takes as input the tree. 1779 01:14:53,860 --> 01:14:56,027 More specifically, it takes the address of the tree. 1780 01:14:56,027 --> 01:14:59,950 More specifically, it takes the address of the root of the tree. 1781 01:14:59,950 --> 01:15:03,010 That is when you want to search a tree, you literally just hand it 1782 01:15:03,010 --> 01:15:06,610 the address of the very first tip top node called the root. 1783 01:15:06,610 --> 01:15:08,920 And from there, you can get everywhere else. 1784 01:15:08,920 --> 01:15:11,720 Just like with the list, we just need the beginning of the list. 1785 01:15:11,720 --> 01:15:13,930 So how do I go about searching a tree? 1786 01:15:13,930 --> 01:15:15,820 Well, let's consider the easy case first. 1787 01:15:15,820 --> 01:15:18,550 Suppose the address you're handed is null, 1788 01:15:18,550 --> 01:15:21,970 what should you do if you're looking for 50, 1789 01:15:21,970 --> 01:15:24,942 but you're handed the empty address, zeros? 1790 01:15:24,942 --> 01:15:25,900 AUDIENCE: Return false. 1791 01:15:25,900 --> 01:15:27,650 DAVID MALAN: Probably return false, right. 1792 01:15:27,650 --> 01:15:30,820 If I hand you no tree and I say it's 50 in here, it's an easy answer. 1793 01:15:30,820 --> 01:15:33,220 No, there's no 50, because there's no tree. 1794 01:15:33,220 --> 01:15:36,220 So that's our base case, if you recall that nomenclature 1795 01:15:36,220 --> 01:15:37,930 from our discussion of recursion. 1796 01:15:37,930 --> 01:15:38,710 You hard code. 1797 01:15:38,710 --> 01:15:42,460 You type in manually one explicit case that just gets you out of the program. 1798 01:15:42,460 --> 01:15:47,530 Next case, if 50 is less than the tree, follow the arrow 1799 01:15:47,530 --> 01:15:49,308 to the number field, then what do know? 1800 01:15:49,308 --> 01:15:51,100 50 is less than the node you're looking at. 1801 01:15:51,100 --> 01:15:53,033 What direction do you want to go conceptually? 1802 01:15:53,033 --> 01:15:53,950 AUDIENCE: To the left. 1803 01:15:53,950 --> 01:15:56,110 DAVID MALAN: You want to go to the left. 1804 01:15:56,110 --> 01:16:01,930 So this line here searches the tree's left child, so to speak, 1805 01:16:01,930 --> 01:16:04,390 in the family tree sense, the left subtree. 1806 01:16:04,390 --> 01:16:06,310 So if we go back to the picture a moment ago, 1807 01:16:06,310 --> 01:16:09,370 if I'm looking for 50 in that story-- or let's make it more real, 1808 01:16:09,370 --> 01:16:11,920 if I'm looking for 1 in the current story, 1809 01:16:11,920 --> 01:16:14,030 I see that 1 is less than the current node. 1810 01:16:14,030 --> 01:16:17,110 So I go ahead and just search the left subtree. 1811 01:16:17,110 --> 01:16:18,550 And notice, this is a tree. 1812 01:16:18,550 --> 01:16:21,550 But so is this if you look at it in isolation. 1813 01:16:21,550 --> 01:16:22,780 And so is this. 1814 01:16:22,780 --> 01:16:25,180 And therein lies the recursive opportunity. 1815 01:16:25,180 --> 01:16:28,263 So again here, if 50 is less than the tree's number, 1816 01:16:28,263 --> 01:16:29,680 then go ahead and search the left. 1817 01:16:29,680 --> 01:16:33,700 Else if 50 is greater than the tree's current number, search the right. 1818 01:16:33,700 --> 01:16:38,650 Else logically what must be the case if the tree exists and it's not less 1819 01:16:38,650 --> 01:16:41,740 than and it's not greater than the number you're looking at? 1820 01:16:41,740 --> 01:16:44,920 It must equal the number you're looking for, 50, in which case 1821 01:16:44,920 --> 01:16:45,940 we can return true. 1822 01:16:45,940 --> 01:16:49,150 But you recall perhaps from scratch, we don't really need that explicit case. 1823 01:16:49,150 --> 01:16:51,640 We can just call it else instead. 1824 01:16:51,640 --> 01:16:53,707 Any questions then on this use of code? 1825 01:16:53,707 --> 01:16:55,040 We won't actually run this code. 1826 01:16:55,040 --> 01:16:57,010 But this is how you can implement recursively 1827 01:16:57,010 --> 01:16:59,560 an algorithm that is reminiscent of Week 0 1828 01:16:59,560 --> 01:17:03,190 searching for Mike Smith in the phone book, this time now searching a data 1829 01:17:03,190 --> 01:17:07,410 structure that itself is recursive. 1830 01:17:07,410 --> 01:17:09,910 All right, so what do we gain back in terms of running time, 1831 01:17:09,910 --> 01:17:12,730 in terms of searching a binary search tree. 1832 01:17:12,730 --> 01:17:16,000 To be clear, what's an upper bound on the running time? 1833 01:17:16,000 --> 01:17:18,760 We're back to log n, which was the goal. 1834 01:17:18,760 --> 01:17:22,277 And what about inserting into a binary search tree? 1835 01:17:22,277 --> 01:17:25,360 This one we're going to defer to a higher level CS class, because it turns 1836 01:17:25,360 --> 01:17:29,292 out you don't want to just go ahead and put 0 over there, and 8 over there, 1837 01:17:29,292 --> 01:17:32,500 because if you keep doing that, putting smaller and smaller numbers or bigger 1838 01:17:32,500 --> 01:17:35,860 and bigger numbers, you could imagine your tree getting very lanky, 1839 01:17:35,860 --> 01:17:39,730 like very tall over here or maybe very tall over here 1840 01:17:39,730 --> 01:17:42,820 and therefore not nearly as balanced as the tree we drew. 1841 01:17:42,820 --> 01:17:45,190 And so it turns out there are algorithms that let 1842 01:17:45,190 --> 01:17:47,528 you keep a binary search tree balanced. 1843 01:17:47,528 --> 01:17:50,320 So even as you add elements to it, you kind of shift things around. 1844 01:17:50,320 --> 01:17:51,838 You don't remove them in memory. 1845 01:17:51,838 --> 01:17:54,130 You just update the pointer, so that the data structure 1846 01:17:54,130 --> 01:17:56,000 itself does not get terribly high. 1847 01:17:56,000 --> 01:18:00,190 But that too is log n, which means we had arrays, which gave us binary search 1848 01:18:00,190 --> 01:18:01,822 capabilities in logarithmic time. 1849 01:18:01,822 --> 01:18:04,780 We then introduced the linked list, which gave us dynamism, the ability 1850 01:18:04,780 --> 01:18:06,670 to grow and, if we want, shrink. 1851 01:18:06,670 --> 01:18:08,650 But we sacrifice binary search. 1852 01:18:08,650 --> 01:18:11,860 But if we spend a little more space and use not one pointer for every node, 1853 01:18:11,860 --> 01:18:16,090 but two, we can actually tip the scales again, spend more space 1854 01:18:16,090 --> 01:18:20,770 and save time by searching the data structure, this time using 1855 01:18:20,770 --> 01:18:23,420 something logarithmic. 1856 01:18:23,420 --> 01:18:25,940 All right, so what would the ideal, though, be? 1857 01:18:25,940 --> 01:18:28,100 Every time we talk about running time, it 1858 01:18:28,100 --> 01:18:31,930 feels like we want to be low on this list and not high. 1859 01:18:31,930 --> 01:18:33,300 n squared was slow. 1860 01:18:33,300 --> 01:18:35,120 Big O of 1 is constant time. 1861 01:18:35,120 --> 01:18:36,050 That's fast. 1862 01:18:36,050 --> 01:18:38,150 Wouldn't it be nice throughout this story 1863 01:18:38,150 --> 01:18:42,890 if we actually found our way to a data structure that gave us constant time? 1864 01:18:42,890 --> 01:18:44,870 Like, my god, if we could just insert something 1865 01:18:44,870 --> 01:18:48,560 into a data structure with one step and find something in a data structure 1866 01:18:48,560 --> 01:18:51,200 with one step, that's sort of the holy grail, so to speak, 1867 01:18:51,200 --> 01:18:54,530 because you don't have to worry about big O of n or big O of log n. 1868 01:18:54,530 --> 01:18:57,082 You just jump immediately to the value you want. 1869 01:18:57,082 --> 01:18:58,790 Well, it turns out, theoretically there's 1870 01:18:58,790 --> 01:19:01,910 something that allows you to achieve that called a hash table. 1871 01:19:01,910 --> 01:19:04,770 But how you implement that is not necessarily obvious. 1872 01:19:04,770 --> 01:19:06,440 And it takes some expertise. 1873 01:19:06,440 --> 01:19:09,140 And indeed, in Problem Set 5 among the goals at hand 1874 01:19:09,140 --> 01:19:12,290 is to implement exactly this notion of a hash table that lets 1875 01:19:12,290 --> 01:19:14,870 you spell check a document super fast. 1876 01:19:14,870 --> 01:19:17,420 A word processing program would be so slow 1877 01:19:17,420 --> 01:19:21,080 if every time you wanted to check a word for whether it's spelled correctly 1878 01:19:21,080 --> 01:19:24,410 or incorrectly, if you had to search linearly or even longer rhythmically 1879 01:19:24,410 --> 01:19:28,340 a big dictionary file, it might actually be really slow to spell check a file. 1880 01:19:28,340 --> 01:19:31,570 But using a hash table, we can probably do much better. 1881 01:19:31,570 --> 01:19:37,673 A hash table is a combination of an array and linked lists inside of it. 1882 01:19:37,673 --> 01:19:40,340 So I'm going to go ahead and just for convenience draw my array, 1883 01:19:40,340 --> 01:19:42,080 this time vertically instead of horizontally. 1884 01:19:42,080 --> 01:19:42,870 But it's the same thing. 1885 01:19:42,870 --> 01:19:44,690 And it's just an artist's rendition anyway. 1886 01:19:44,690 --> 01:19:49,322 And suppose the goal at hand is to keep track efficiently of like a name tags. 1887 01:19:49,322 --> 01:19:50,780 So maybe we're holding a big event. 1888 01:19:50,780 --> 01:19:53,360 We've made some name tags in advance, which we indeed have. 1889 01:19:53,360 --> 01:19:55,940 And we want people to be able to pick up these name tags super efficiently. 1890 01:19:55,940 --> 01:19:58,100 It would be really annoying and pretty dumb 1891 01:19:58,100 --> 01:19:59,870 if we just made a big stack and name tags, 1892 01:19:59,870 --> 01:20:03,500 even if it's alphabetical, A to Z, then had everyone in the room line up 1893 01:20:03,500 --> 01:20:06,620 and look through all of the darn name tags looking for their name. 1894 01:20:06,620 --> 01:20:08,330 That's not a very inefficient system. 1895 01:20:08,330 --> 01:20:11,630 Fortunately, we've come prepared with some buckets, all of which are labeled, 1896 01:20:11,630 --> 01:20:14,297 because wouldn't it be nice if you're looking for your name tag, 1897 01:20:14,297 --> 01:20:17,090 you don't look through the whole darn list of name tags or stack? 1898 01:20:17,090 --> 01:20:18,800 You actually just go to your bucket. 1899 01:20:18,800 --> 01:20:21,290 And you jump instantly to your name, where hopefully you're 1900 01:20:21,290 --> 01:20:23,770 the only person with a name that starts with some letter. 1901 01:20:23,770 --> 01:20:25,520 And then you can just reach in and get it. 1902 01:20:25,520 --> 01:20:27,560 Well, how do we implement this conceptually? 1903 01:20:27,560 --> 01:20:30,320 Well, it's very common with a hash table if the inputs 1904 01:20:30,320 --> 01:20:34,730 are things like words or names to look at the characters in those words 1905 01:20:34,730 --> 01:20:38,760 to decide where to put those names or those name tags, if you will. 1906 01:20:38,760 --> 01:20:41,223 So here's an array of size 26, from 0 to 25. 1907 01:20:41,223 --> 01:20:43,640 But, you know what, It's convenient to think of this array 1908 01:20:43,640 --> 01:20:47,960 as maybe being indexed from A through Z. So still 26 buckets, 1909 01:20:47,960 --> 01:20:53,930 but this array is really just of size 26, 0 through 25 ultimately. 1910 01:20:53,930 --> 01:20:57,140 And suppose the goal at hand now is to go ahead and store these name 1911 01:20:57,140 --> 01:20:58,080 tags in advance. 1912 01:20:58,080 --> 01:20:59,780 So this is what the staff and I would do in advance. 1913 01:20:59,780 --> 01:21:01,490 And, Brian, if you wouldn't mind helping out with this. 1914 01:21:01,490 --> 01:21:04,490 The goal at hand is quite simply to get the name tags ready for students 1915 01:21:04,490 --> 01:21:05,130 to pick up. 1916 01:21:05,130 --> 01:21:07,970 And so where do I want to go ahead and put the first one? 1917 01:21:07,970 --> 01:21:10,250 So Albus is the first one whose name tag we made. 1918 01:21:10,250 --> 01:21:12,500 I'm going to go ahead and jump immediately to bucket 0 1919 01:21:12,500 --> 01:21:15,050 and put Albus's name right there in one step. 1920 01:21:15,050 --> 01:21:17,420 Meanwhile I've got Zacharias, and so even 1921 01:21:17,420 --> 01:21:21,390 though it's taking me a bunch of steps to go over here, if this is an array, 1922 01:21:21,390 --> 01:21:25,100 I have random access, as a human, and so I can immediately, 1923 01:21:25,100 --> 01:21:27,590 instantly put Zacharias over there. 1924 01:21:27,590 --> 01:21:30,260 It's a little laborious for my feet, but a computer could just 1925 01:21:30,260 --> 01:21:33,410 jump to 0 or 25 or anything in between. 1926 01:21:33,410 --> 01:21:34,720 All right, so Hermione-- 1927 01:21:34,720 --> 01:21:37,428 maybe you're noticing the pattern-- so Hermione is going to be H, 1928 01:21:37,428 --> 01:21:39,710 or which is 7, which is going to be over here. 1929 01:21:39,710 --> 01:21:41,450 Ginny is 6, which is over here. 1930 01:21:41,450 --> 01:21:43,830 Ron is 17, which is over here. 1931 01:21:43,830 --> 01:21:47,870 So think of each of my multiple steps taking actually one step. 1932 01:21:47,870 --> 01:21:49,885 Fred is going to go over here. 1933 01:21:49,885 --> 01:21:52,010 As an aside, the staff and I discussed this morning 1934 01:21:52,010 --> 01:21:54,427 how we probably should've put the buckets closer together. 1935 01:21:54,427 --> 01:21:55,670 But that's OK. 1936 01:21:55,670 --> 01:21:57,560 Severus is going to go over here. 1937 01:21:57,560 --> 01:21:59,150 Petunia is going to go over here. 1938 01:21:59,150 --> 01:22:03,430 Draco is way over here, but doesn't matter, constant time, bracket 3. 1939 01:22:03,430 --> 01:22:05,930 James is bracket 9. 1940 01:22:05,930 --> 01:22:09,620 Cedric is bracket 2. 1941 01:22:09,620 --> 01:22:11,390 Perhaps play this part in 2x speed. 1942 01:22:11,390 --> 01:22:13,850 Luna is bucket 11. 1943 01:22:13,850 --> 01:22:15,980 Neville bucket 13. 1944 01:22:15,980 --> 01:22:18,690 Kingsley bucket 10. 1945 01:22:18,690 --> 01:22:20,570 Kingsley, there we go. 1946 01:22:20,570 --> 01:22:22,580 Minerva bucket 12. 1947 01:22:22,580 --> 01:22:25,095 Vernon-- ironically, we don't actually need this many names 1948 01:22:25,095 --> 01:22:26,720 to make the point we're trying to make. 1949 01:22:26,720 --> 01:22:30,230 But Vernon-- we got a little carried away with the names we recognized. 1950 01:22:30,230 --> 01:22:31,877 And now, the list is pretty full. 1951 01:22:31,877 --> 01:22:33,710 All right, so that's a whole bunch of names. 1952 01:22:33,710 --> 01:22:36,110 I filled up most of the buckets with a name tag. 1953 01:22:36,110 --> 01:22:37,400 But-- why am I out of breath? 1954 01:22:37,400 --> 01:22:43,550 But what's really convenient now is that if Cedric or Albus or Draco or Fred 1955 01:22:43,550 --> 01:22:47,180 or Ginny come into the room, they can index instantly, 1956 01:22:47,180 --> 01:22:50,030 randomly, to their pocket, get their name tag, and go. 1957 01:22:50,030 --> 01:22:50,680 Nothing linear. 1958 01:22:50,680 --> 01:22:53,180 They don't have to flip through the whole stack of name tags 1959 01:22:53,180 --> 01:22:55,800 with which I actually began the story. 1960 01:22:55,800 --> 01:22:57,920 But there's a problem ahead. 1961 01:22:57,920 --> 01:23:01,175 We very deliberately ordered the name tags thus far in such a way 1962 01:23:01,175 --> 01:23:03,050 that we don't create a problem for ourselves. 1963 01:23:03,050 --> 01:23:06,770 But among the more famous characters we've not heard from yet is Harry. 1964 01:23:06,770 --> 01:23:08,840 So Harry's name tag is still here. 1965 01:23:08,840 --> 01:23:09,990 Where does this go? 1966 01:23:09,990 --> 01:23:12,470 Well, Harry is going to go in bucket 7. 1967 01:23:12,470 --> 01:23:15,230 But wait a minute, there's already someone there. 1968 01:23:15,230 --> 01:23:16,880 So what do I do? 1969 01:23:16,880 --> 01:23:20,180 If I were only using an array, Harry's kind of out of luck. 1970 01:23:20,180 --> 01:23:23,430 Like Hermione is already in that location in the array. 1971 01:23:23,430 --> 01:23:26,520 And we would have to decide, either Hermione goes there or Harry, 1972 01:23:26,520 --> 01:23:28,110 but we can't just put them both. 1973 01:23:28,110 --> 01:23:32,130 But if we implement this new data structure called a hash table using 1974 01:23:32,130 --> 01:23:37,485 an array that's conceptually vertical, but that horizontally is a linked list, 1975 01:23:37,485 --> 01:23:38,610 you know what, that's fine. 1976 01:23:38,610 --> 01:23:41,707 We're just going to go ahead and link Hermione's and Harry's together. 1977 01:23:41,707 --> 01:23:44,790 So, yes, it's going to take both of them or one of them at least two steps 1978 01:23:44,790 --> 01:23:45,960 to find their name tag. 1979 01:23:45,960 --> 01:23:49,020 But it's not going to take big O of n steps to find their name tag, 1980 01:23:49,020 --> 01:23:51,210 at least if there's only two in this bucket. 1981 01:23:51,210 --> 01:23:53,460 All right, Hagrid, dammit, so he came in the door too. 1982 01:23:53,460 --> 01:23:55,780 So now that linked list is getting a little longer. 1983 01:23:55,780 --> 01:23:59,010 We now have a chain, if you will, a linked list of size 3. 1984 01:23:59,010 --> 01:24:02,400 Sirius is going to go over here in bucket 18. 1985 01:24:02,400 --> 01:24:04,260 But Severus is already there too. 1986 01:24:04,260 --> 01:24:05,100 Awkward. 1987 01:24:05,100 --> 01:24:08,730 Remus is 17. 1988 01:24:08,730 --> 01:24:11,510 Remus is going to go and link together with Ron there. 1989 01:24:11,510 --> 01:24:14,880 George is going to go into bucket 6, which is over here. 1990 01:24:14,880 --> 01:24:19,107 Lily is also going to collide, so to speak with Luna. 1991 01:24:19,107 --> 01:24:20,940 And this is a collision in computer science. 1992 01:24:20,940 --> 01:24:23,460 Anytime you have a value that you're trying to put in one place 1993 01:24:23,460 --> 01:24:26,418 but there's something there, you need to resolve the collision somehow. 1994 01:24:26,418 --> 01:24:28,980 So I'm proposing that we actually just link these together. 1995 01:24:28,980 --> 01:24:33,630 Or as we're doing here, to bucketize values in computer science conceptually 1996 01:24:33,630 --> 01:24:37,740 means to throw the value into a bucket, or physically as we've done here. 1997 01:24:37,740 --> 01:24:40,270 Lucius finally is going to go in bucket 11 too. 1998 01:24:40,270 --> 01:24:44,080 And lastly, Lavender goes in that same bucket. 1999 01:24:44,080 --> 01:24:44,580 Phew. 2000 01:24:44,580 --> 01:24:46,860 So thank you to Brian for helping choreograph that. 2001 01:24:46,860 --> 01:24:51,270 So this structure that you're looking at is what is called a hash table. 2002 01:24:51,270 --> 01:24:56,160 It is an array that you index into using what's called a hash function. 2003 01:24:56,160 --> 01:24:59,940 A hash function is like any function that we've seen thus far, 2004 01:24:59,940 --> 01:25:01,500 any program we've seen thus far-- 2005 01:25:01,500 --> 01:25:04,780 something that takes input and produces output. 2006 01:25:04,780 --> 01:25:06,900 So if we consider our original picture from Week 0 2007 01:25:06,900 --> 01:25:09,900 of what computer science in itself is when it comes to solving problems, 2008 01:25:09,900 --> 01:25:13,320 hash function for today's purpose it's just this function, this process, 2009 01:25:13,320 --> 01:25:16,600 this algorithm in between that decides, given a name 2010 01:25:16,600 --> 01:25:19,560 tag, what bucket to put that name tag in. 2011 01:25:19,560 --> 01:25:21,990 And quite obviously in the real world, what algorithm 2012 01:25:21,990 --> 01:25:25,770 was I using to bucketize a name tag upon reading the name? 2013 01:25:25,770 --> 01:25:27,060 AUDIENCE: First letters. 2014 01:25:27,060 --> 01:25:28,300 DAVID MALAN: Looking at the first letter. 2015 01:25:28,300 --> 01:25:28,800 Why? 2016 01:25:28,800 --> 01:25:29,650 It's simple. 2017 01:25:29,650 --> 01:25:30,690 It's pretty efficient. 2018 01:25:30,690 --> 01:25:34,140 It means I can store a relatively small array of size 26 2019 01:25:34,140 --> 01:25:37,060 and just immediately put the name tags there. 2020 01:25:37,060 --> 01:25:41,040 So in this case, we might have fed in Albus to that hash function. 2021 01:25:41,040 --> 01:25:44,970 And it might return 0, representing A, if we're 0 indexing the array. 2022 01:25:44,970 --> 01:25:48,090 Or for someone like Zacharias, we might get out 25 2023 01:25:48,090 --> 01:25:50,130 just because the first letter of his name is z. 2024 01:25:50,130 --> 01:25:52,290 But this is kind of simplistic, right. 2025 01:25:52,290 --> 01:25:53,350 And we've seen a problem. 2026 01:25:53,350 --> 01:25:57,090 What is the problem with just looking, of course, at the users first letter 2027 01:25:57,090 --> 01:25:59,310 of their name? 2028 01:25:59,310 --> 01:26:00,320 What problem arose? 2029 01:26:00,320 --> 01:26:00,500 Yeah. 2030 01:26:00,500 --> 01:26:03,300 AUDIENCE: There might be more than one name with the first letter. 2031 01:26:03,300 --> 01:26:05,520 DAVID MALAN: There might be more than one name with the first letter. 2032 01:26:05,520 --> 01:26:06,720 And you know in the extreme-- 2033 01:26:06,720 --> 01:26:08,610 and computer scientists and software engineers 2034 01:26:08,610 --> 01:26:09,860 often think about the extreme. 2035 01:26:09,860 --> 01:26:10,870 What is the corner case? 2036 01:26:10,870 --> 01:26:11,950 What could go wrong? 2037 01:26:11,950 --> 01:26:14,730 What if by chance there's just a lot of characters 2038 01:26:14,730 --> 01:26:18,900 in this universe whose names start with h or l, and maybe all of their names 2039 01:26:18,900 --> 01:26:20,550 just happened to start with h or l? 2040 01:26:20,550 --> 01:26:22,380 It doesn't matter how fancy your hash table 2041 01:26:22,380 --> 01:26:26,130 is, it's pretty stupid if all of the name tags are stacked up in a bucket. 2042 01:26:26,130 --> 01:26:28,260 So in that sense, a hash table, even though this 2043 01:26:28,260 --> 01:26:32,040 feels like it's pretty efficient, in the worst case, big O of n, 2044 01:26:32,040 --> 01:26:34,170 when it comes to inserting and searching, 2045 01:26:34,170 --> 01:26:37,380 because you could just get unlucky and get a huge stack of names 2046 01:26:37,380 --> 01:26:40,860 that by nature of the class just all start with the same letter. 2047 01:26:40,860 --> 01:26:42,855 So how can we mitigate this? 2048 01:26:42,855 --> 01:26:43,980 How could we mitigate this? 2049 01:26:43,980 --> 01:26:45,630 Well, you know what, rather than naively only 2050 01:26:45,630 --> 01:26:48,475 looking at the first name, let's leverage some probabilities here. 2051 01:26:48,475 --> 01:26:50,850 Why don't we look not at just the first letter, but maybe 2052 01:26:50,850 --> 01:26:52,532 the first two letters? 2053 01:26:52,532 --> 01:26:54,240 I bet if we look at the first two letters 2054 01:26:54,240 --> 01:26:55,948 we're not going to get as many collisions 2055 01:26:55,948 --> 01:26:58,260 as many people belonging to the same bucket. 2056 01:26:58,260 --> 01:27:01,590 So Hermione, Harry, and Hagrid was a problem we identified earlier, 2057 01:27:01,590 --> 01:27:03,000 not to mention a few other names. 2058 01:27:03,000 --> 01:27:07,733 But that was because we were looking only at h for the hash function, only 2059 01:27:07,733 --> 01:27:09,150 at the first letter in their name. 2060 01:27:09,150 --> 01:27:11,040 What if instead we look at the first two, 2061 01:27:11,040 --> 01:27:16,110 so we have a bucket for HA, HB, HC, HD, HE, HF? 2062 01:27:16,110 --> 01:27:18,870 And so Hermione now goes in this bucket specifically. 2063 01:27:18,870 --> 01:27:20,370 So we're going to need more buckets. 2064 01:27:20,370 --> 01:27:21,600 And they're not pictured on the screen. 2065 01:27:21,600 --> 01:27:23,433 And they're also not pictured here on stage. 2066 01:27:23,433 --> 01:27:25,380 We need more than 26 buckets. 2067 01:27:25,380 --> 01:27:27,900 Frankly, if we're looking at two letters, we need 26 times 2068 01:27:27,900 --> 01:27:30,690 26, like 676 buckets now. 2069 01:27:30,690 --> 01:27:33,330 So more space, but we're hopefully going to decrease 2070 01:27:33,330 --> 01:27:34,690 the probability of collisions. 2071 01:27:34,690 --> 01:27:35,340 Why? 2072 01:27:35,340 --> 01:27:37,540 Well, the next name I might put in here is Harry. 2073 01:27:37,540 --> 01:27:39,780 He's going to end up in a different bucket this time. 2074 01:27:39,780 --> 01:27:42,600 That's great, because it would seem that now I can get access 2075 01:27:42,600 --> 01:27:44,940 to his name tag in constant time. 2076 01:27:44,940 --> 01:27:47,530 Unfortunately, Hagrid is still in the story. 2077 01:27:47,530 --> 01:27:49,650 And so we're going to have a collision with HA. 2078 01:27:49,650 --> 01:27:52,920 So even looking at the first two letters is not ideal. 2079 01:27:52,920 --> 01:27:57,445 So even though we have 676 buckets in this story, 26 times 26, which 2080 01:27:57,445 --> 01:27:59,820 is a lot of buckets, we're still going to get collisions. 2081 01:27:59,820 --> 01:28:02,767 So what would maybe the next evolution of this idea be? 2082 01:28:02,767 --> 01:28:05,850 Well, don't look at the first letter, don't look at the first two letters. 2083 01:28:05,850 --> 01:28:07,300 Why don't we look at the first three letters. 2084 01:28:07,300 --> 01:28:09,720 Surely, that's going to drive down the probability. 2085 01:28:09,720 --> 01:28:13,410 Unfortunately, that's going to drive up the number of cells in the array 2086 01:28:13,410 --> 01:28:17,715 and buckets on stage to 10,000 plus buckets this time around. 2087 01:28:17,715 --> 01:28:18,840 So that's a lot of buckets. 2088 01:28:18,840 --> 01:28:27,630 But suppose we use not HA, but maybe HAA, HAB, HAC, HAD, HAE, HAF, HAG, dot 2089 01:28:27,630 --> 01:28:35,340 dot dot, HAQ, HAR, HAS, dot dot dot, HEQ, HER, HES. 2090 01:28:35,340 --> 01:28:38,280 So we have a lot of buckets and even more in between not pictured. 2091 01:28:38,280 --> 01:28:43,170 Now we can go ahead and hash on Harry's name, Hagrid's name, Hermione's name. 2092 01:28:43,170 --> 01:28:47,385 And this time, by design, they're going to end up in different buckets, which 2093 01:28:47,385 --> 01:28:48,510 seems to be an improvement. 2094 01:28:48,510 --> 01:28:51,870 And indeed, it is, because now if I go searching 2095 01:28:51,870 --> 01:28:56,160 for Hermione or Hagrid or Harry's name tag, or they do themselves, 2096 01:28:56,160 --> 01:29:00,270 they're going to be able to find it in constant time. 2097 01:29:00,270 --> 01:29:03,990 But that's assuming there's not a lot of other kids with the name starting 2098 01:29:03,990 --> 01:29:04,800 with H. 2099 01:29:04,800 --> 01:29:08,468 And so a hash table still technically is big O of n 2100 01:29:08,468 --> 01:29:10,260 because you could just get unlucky and have 2101 01:29:10,260 --> 01:29:14,040 a big pile up of similar inputs, all of which produce the same output, 2102 01:29:14,040 --> 01:29:17,253 even if you're using a fancier hash function like this. 2103 01:29:17,253 --> 01:29:18,420 And there's a trade off too. 2104 01:29:18,420 --> 01:29:22,483 My god, we're using like almost 20,000 buckets now just 2105 01:29:22,483 --> 01:29:24,150 to store these names to speed things up. 2106 01:29:24,150 --> 01:29:26,430 At some point, you know, it's probably cheaper 2107 01:29:26,430 --> 01:29:29,730 to just let Harry and Hermione and Hagrid form a line 2108 01:29:29,730 --> 01:29:31,480 and find their name tag more slowly. 2109 01:29:31,480 --> 01:29:33,443 So there's this trade-off of time and space. 2110 01:29:33,443 --> 01:29:35,610 But if you have what's called an ideal hash function 2111 01:29:35,610 --> 01:29:38,430 and you figure out some magical algorithm written 2112 01:29:38,430 --> 01:29:42,630 in code that ensures uniqueness that no name tag will end up 2113 01:29:42,630 --> 01:29:46,830 colliding with another, then you can achieve this holy grail of big O 2114 01:29:46,830 --> 01:29:50,080 of one time, constant time for a hash function. 2115 01:29:50,080 --> 01:29:53,160 So it's this sort of tension between how much space do you want to spend? 2116 01:29:53,160 --> 01:29:54,780 How much effort do you want to spend figuring out 2117 01:29:54,780 --> 01:29:56,190 what that ideal hash function is? 2118 01:29:56,190 --> 01:29:58,620 So in the real world, and we'll see this in Python, 2119 01:29:58,620 --> 01:30:01,560 most computer systems give you a best effort, such 2120 01:30:01,560 --> 01:30:04,920 that a hash table is not big O of n usually. 2121 01:30:04,920 --> 01:30:07,110 It's actually, on average much much, much faster, 2122 01:30:07,110 --> 01:30:09,760 even though there's a theoretical risk that it can be slow. 2123 01:30:09,760 --> 01:30:11,760 And more on that too in a higher level CS course 2124 01:30:11,760 --> 01:30:14,770 where you explore data structures and algorithms more formally. 2125 01:30:14,770 --> 01:30:17,130 So technically speaking, it feels like search 2126 01:30:17,130 --> 01:30:20,100 could get down to big O of 1, constant time, 2127 01:30:20,100 --> 01:30:22,620 if every name tag ends up in a unique bucket. 2128 01:30:22,620 --> 01:30:25,440 But you could still get unlucky if there's a lot of H names 2129 01:30:25,440 --> 01:30:27,010 or L names or the like. 2130 01:30:27,010 --> 01:30:29,910 So technically speaking, a hash table is big O of n. 2131 01:30:29,910 --> 01:30:35,670 But, frankly, three names in a bucket, like Hermione, Hagrid, and Harry, 2132 01:30:35,670 --> 01:30:38,880 is much better than n names in the same bucket. 2133 01:30:38,880 --> 01:30:42,480 So even in the real world if you get rid of this asymptotic hand waviness, 2134 01:30:42,480 --> 01:30:43,160 that's faster. 2135 01:30:43,160 --> 01:30:44,910 That's much faster than putting everything 2136 01:30:44,910 --> 01:30:47,940 in a linked list or an array itself. 2137 01:30:47,940 --> 01:30:53,712 All right, so from there, I bet we can try one other approach here. 2138 01:30:53,712 --> 01:30:55,920 There's another data structure we want to present to, 2139 01:30:55,920 --> 01:30:57,690 not in code, but in pictures. 2140 01:30:57,690 --> 01:30:59,820 This one's called a trie. 2141 01:30:59,820 --> 01:31:02,430 Short for retrieval, even though it's pronounced differently. 2142 01:31:02,430 --> 01:31:05,640 A trie is a data structure that actually is pretty amazing. 2143 01:31:05,640 --> 01:31:09,630 And it follows this pattern of spending one resource to save on another. 2144 01:31:09,630 --> 01:31:12,090 A trie is going to use a lot more memory, 2145 01:31:12,090 --> 01:31:15,090 but it is going to give us actual constant time 2146 01:31:15,090 --> 01:31:19,627 lookup for things like names or words being inserted into the structure. 2147 01:31:19,627 --> 01:31:20,710 So what does it look like? 2148 01:31:20,710 --> 01:31:22,650 It's a little strong, because we need to leave room for ourselves 2149 01:31:22,650 --> 01:31:24,360 on the board with lots of memory. 2150 01:31:24,360 --> 01:31:30,720 A trie is a tree, each of whose nodes is essentially an array. 2151 01:31:30,720 --> 01:31:31,980 So notice the pattern here. 2152 01:31:31,980 --> 01:31:34,230 Computer scientists over time have been kind of clever 2153 01:31:34,230 --> 01:31:37,800 taking this idea, this idea, mashing them together and creating some monster 2154 01:31:37,800 --> 01:31:41,890 data structure, but that gives you some savings of time or space. 2155 01:31:41,890 --> 01:31:44,880 So this array at the very top represents the roots 2156 01:31:44,880 --> 01:31:48,600 of this trie, which again is a tree whose nodes are arrays. 2157 01:31:48,600 --> 01:31:51,510 And notice that the array is of size 26, for the sake of discussion, 2158 01:31:51,510 --> 01:31:53,910 A through Z, or 0 through 25. 2159 01:31:53,910 --> 01:31:55,170 A trie does this. 2160 01:31:55,170 --> 01:31:59,400 If you want to store a name in a trie, what you do, in this case, 2161 01:31:59,400 --> 01:32:03,250 is look at every letter in the word in question. 2162 01:32:03,250 --> 01:32:04,892 So for Harry' it would be H-a-r-r-y. 2163 01:32:04,892 --> 01:32:07,350 We're not just looking at the first, the second, and third. 2164 01:32:07,350 --> 01:32:08,740 We're looking at all of them. 2165 01:32:08,740 --> 01:32:10,260 And what we do is this. 2166 01:32:10,260 --> 01:32:13,650 Suppose the first letter in the person's name or their name tag or the word 2167 01:32:13,650 --> 01:32:17,250 more generally is an H. You go ahead and go to that index. 2168 01:32:17,250 --> 01:32:21,210 And if there's no child node, there's no tree yet below it, 2169 01:32:21,210 --> 01:32:23,850 another branch, if you will, you allocate another node. 2170 01:32:23,850 --> 01:32:25,600 And another node just means another array. 2171 01:32:25,600 --> 01:32:27,960 And so we've drawn two arrays on the board. 2172 01:32:27,960 --> 01:32:30,080 This now has the letter A highlighted. 2173 01:32:30,080 --> 01:32:32,830 All of the letters are technically there, because it's of course 0 2174 01:32:32,830 --> 01:32:33,450 through 25. 2175 01:32:33,450 --> 01:32:35,408 But we're only highlighting the letters we care 2176 01:32:35,408 --> 01:32:37,260 about for the sake of this example. 2177 01:32:37,260 --> 01:32:39,690 Here is H-a-g. 2178 01:32:39,690 --> 01:32:42,870 So it looks like the first name tag I'm trying to install into this data 2179 01:32:42,870 --> 01:32:44,160 structure is Hagrid. 2180 01:32:44,160 --> 01:32:46,830 Notice now that g is inside of that array. 2181 01:32:46,830 --> 01:32:48,933 I want to go now to r for Hagrid. 2182 01:32:48,933 --> 01:32:50,100 That gives me another array. 2183 01:32:50,100 --> 01:32:53,110 Now i, now d. d is the end of his name. 2184 01:32:53,110 --> 01:32:56,760 So I'm going to just color in green, or I can use like a Boolean flag in C code 2185 01:32:56,760 --> 01:32:59,520 that just says someone's name ends here. 2186 01:32:59,520 --> 01:33:03,600 So notice, I've implicitly stored Hagrid name now in this data structure 2187 01:33:03,600 --> 01:33:08,590 by storing one node, that is one array, for every letter in his name. 2188 01:33:08,590 --> 01:33:10,950 But there's this slight efficiency here because there's 2189 01:33:10,950 --> 01:33:12,960 other people in this story besides Hagrid 2190 01:33:12,960 --> 01:33:16,630 whose names are prefixes or share common prefixes. 2191 01:33:16,630 --> 01:33:19,890 So, for instance, suppose I want to install Harry into this data structure. 2192 01:33:19,890 --> 01:33:23,237 He is H-a-r-r-y. 2193 01:33:23,237 --> 01:33:25,070 And so that gives me a couple of more nodes. 2194 01:33:25,070 --> 01:33:29,510 And if I go ahead now and install Hermione in this, notice now 2195 01:33:29,510 --> 01:33:31,250 I have even more nodes in the tree. 2196 01:33:31,250 --> 01:33:33,360 But some of them are shared. 2197 01:33:33,360 --> 01:33:35,540 If you start at the very top and look at the H, 2198 01:33:35,540 --> 01:33:39,800 notice that both Hagrid and Harry and Hermione 2199 01:33:39,800 --> 01:33:44,150 at least share at least one node in common. 2200 01:33:44,150 --> 01:33:46,550 Now, what's cool about this ultimately? 2201 01:33:46,550 --> 01:33:50,870 So what is the running time of searching for someone in this data structure 2202 01:33:50,870 --> 01:33:52,457 if there's n people already in it? 2203 01:33:52,457 --> 01:33:54,790 Right now n equals 3 because there's three people in it, 2204 01:33:54,790 --> 01:33:56,360 even though there's a lot of nodes. 2205 01:33:56,360 --> 01:33:59,300 But what's the running time for searching this data structure to see 2206 01:33:59,300 --> 01:34:01,190 has Harry picked up his name tag already? 2207 01:34:01,190 --> 01:34:02,480 Has Hermione picked up hers? 2208 01:34:02,480 --> 01:34:03,950 Has Hagrid picked up his? 2209 01:34:03,950 --> 01:34:07,220 Well, how many steps does it take to find Harry or Hermione or Hagrid 2210 01:34:07,220 --> 01:34:09,320 in this data structure? 2211 01:34:09,320 --> 01:34:11,750 For Harry, it's H-a-r-r-y. 2212 01:34:11,750 --> 01:34:13,940 So it's five steps maximally. 2213 01:34:13,940 --> 01:34:16,510 For Hagrid it's H-a-g-r-i-d. 2214 01:34:16,510 --> 01:34:18,880 It's six steps maximally. 2215 01:34:18,880 --> 01:34:24,290 And H-e-r-m-i-o-n-e, 8 steps total. 2216 01:34:24,290 --> 01:34:26,920 And it's probably the case that if we read through the books, 2217 01:34:26,920 --> 01:34:30,100 there is going to be some upper bound on the length of someone's name. 2218 01:34:30,100 --> 01:34:31,100 I don't know what it is. 2219 01:34:31,100 --> 01:34:32,390 It's probably 20 characters. 2220 01:34:32,390 --> 01:34:34,200 Maybe 30 if it's crazy long. 2221 01:34:34,200 --> 01:34:36,200 But there is some fixed value. 2222 01:34:36,200 --> 01:34:39,470 Anytime you have a fixed value, that's what you by definition in CS 2223 01:34:39,470 --> 01:34:40,905 and in math call a constant. 2224 01:34:40,905 --> 01:34:42,530 If it's 20, it's 30, it doesn't matter. 2225 01:34:42,530 --> 01:34:43,360 But it's fixed. 2226 01:34:43,360 --> 01:34:45,890 People's names aren't growing every year in length. 2227 01:34:45,890 --> 01:34:47,390 There's some hard upper bound. 2228 01:34:47,390 --> 01:34:51,470 And so technically, if it only takes you five steps or six steps or eight 2229 01:34:51,470 --> 01:34:55,910 steps to find Harry or Hagrid or Harry or Hermione, that 2230 01:34:55,910 --> 01:35:00,860 is technically constant time or, as we've said, Big O of 1. 2231 01:35:00,860 --> 01:35:05,030 So we can actually then achieve, truly for searching this data structure, 2232 01:35:05,030 --> 01:35:08,240 for inserting this data structure, truly what we call big O of k 2233 01:35:08,240 --> 01:35:09,440 where k is some constant. 2234 01:35:09,440 --> 01:35:13,220 But a constant is the same thing, asymptotically, per our discussion 2235 01:35:13,220 --> 01:35:16,910 in Week 3, of big O of 1. 2236 01:35:16,910 --> 01:35:20,000 These are effectively constant time, because to find Harry, 2237 01:35:20,000 --> 01:35:22,430 you look only at H-a-r-r-y. 2238 01:35:22,430 --> 01:35:27,392 It doesn't matter if there's 1 million other characters in that trie already. 2239 01:35:27,392 --> 01:35:30,350 It doesn't matter if there's Hermione and Hagrid and everyone else from 2240 01:35:30,350 --> 01:35:33,517 the seven books in the data structure, because the only nodes you're looking 2241 01:35:33,517 --> 01:35:36,620 at are the ones representing H-a-r-r-y. 2242 01:35:36,620 --> 01:35:38,030 And that's a powerful thing. 2243 01:35:38,030 --> 01:35:40,700 Every other algorithm we've discussed thus far, certainly 2244 01:35:40,700 --> 01:35:44,180 for searching and sorting, has somehow been slowed down 2245 01:35:44,180 --> 01:35:47,570 by how many other names or numbers are in the data structure. 2246 01:35:47,570 --> 01:35:50,120 That is not the case for this one here. 2247 01:35:50,120 --> 01:35:53,510 However, there is a price being paid. 2248 01:35:53,510 --> 01:35:55,880 What appears to be the price that we're paying 2249 01:35:55,880 --> 01:35:58,070 to gain that really low running time? 2250 01:35:58,070 --> 01:35:58,910 AUDIENCE: Memory. 2251 01:35:58,910 --> 01:35:59,270 DAVID MALAN: Memory. 2252 01:35:59,270 --> 01:36:01,400 I mean, my god, it barely fits on the slide. 2253 01:36:01,400 --> 01:36:03,170 And this is just three names. 2254 01:36:03,170 --> 01:36:07,560 You're spending 26 amount of the memory to store one character. 2255 01:36:07,560 --> 01:36:08,900 Now there's some optimizations. 2256 01:36:08,900 --> 01:36:12,140 Over time, if you insert a lot of names, some of these nodes will be shared. 2257 01:36:12,140 --> 01:36:15,000 But this is a very wide, very dense data structure, 2258 01:36:15,000 --> 01:36:19,040 so to speak, because it's using so much memory to give you 2259 01:36:19,040 --> 01:36:23,920 that super amazing running time of theoretically constant time. 2260 01:36:23,920 --> 01:36:25,670 So again this theme of trade-offs is going 2261 01:36:25,670 --> 01:36:29,588 to persist in the remaining weeks of the semester where to gain one resource, 2262 01:36:29,588 --> 01:36:31,130 we're going to have to spend another. 2263 01:36:31,130 --> 01:36:32,900 So that there is a trie. 2264 01:36:32,900 --> 01:36:35,990 So it turns out that now that we have arrays and linked 2265 01:36:35,990 --> 01:36:40,250 lists and trees and tries and hash tables and yet other data 2266 01:36:40,250 --> 01:36:42,440 structures out there, we can actually implement 2267 01:36:42,440 --> 01:36:44,627 what are called abstract data structures, using 2268 01:36:44,627 --> 01:36:45,960 any of those as building blocks. 2269 01:36:45,960 --> 01:36:48,335 What we've kind of done today verbally and pictorially is 2270 01:36:48,335 --> 01:36:52,130 invent more of those pink puzzle pieces in Scratch, those custom puzzle pieces. 2271 01:36:52,130 --> 01:36:54,980 Now we have as building blocks arrays and linked lists and trees 2272 01:36:54,980 --> 01:36:57,650 and hash tables that we can use to solve other problems. 2273 01:36:57,650 --> 01:37:01,346 And one of the problems out there in the real world is something called a queue. 2274 01:37:01,346 --> 01:37:04,298 A queue and certainly in certain cultures immediately comes to mind, 2275 01:37:04,298 --> 01:37:06,590 what's a queue in the real world or an example thereof? 2276 01:37:06,590 --> 01:37:07,525 AUDIENCE: A line. 2277 01:37:07,525 --> 01:37:09,650 DAVID MALAN: So a line, right, lining up at a store 2278 01:37:09,650 --> 01:37:11,360 or a restaurant or a takeout place. 2279 01:37:11,360 --> 01:37:14,690 So a queue actually has a technical meaning and computer science too. 2280 01:37:14,690 --> 01:37:19,250 It's a data structure that is FIFO, first in, first out. 2281 01:37:19,250 --> 01:37:22,250 A queue, by definition should have people hopefully pleasantly lining up 2282 01:37:22,250 --> 01:37:23,720 one person in front of the other. 2283 01:37:23,720 --> 01:37:27,350 And it maintains this FIFO property, first in, first out, 2284 01:37:27,350 --> 01:37:29,830 such that if I'm at the front of the line 2285 01:37:29,830 --> 01:37:33,020 I am going to be served my food first and then the person behind me 2286 01:37:33,020 --> 01:37:34,670 and then the person behind them. 2287 01:37:34,670 --> 01:37:38,270 It'd be really obnoxious if you walked up to Tasty Burger, placed your order, 2288 01:37:38,270 --> 01:37:41,272 and whoever showed up most recently got their food first. 2289 01:37:41,272 --> 01:37:42,980 That would be an opposite data structure. 2290 01:37:42,980 --> 01:37:43,790 That's LIFO. 2291 01:37:43,790 --> 01:37:45,210 Last in, first out. 2292 01:37:45,210 --> 01:37:47,010 Not fair in the real world. 2293 01:37:47,010 --> 01:37:51,020 So you might hope then that the software that companies like Tasty Burger 2294 01:37:51,020 --> 01:37:53,000 use when they type in your order to the system 2295 01:37:53,000 --> 01:37:55,460 actually send those orders to the team working 2296 01:37:55,460 --> 01:37:57,795 in back cooking the food in a queue fashion, 2297 01:37:57,795 --> 01:38:00,170 because it'd be pretty obnoxious too if people behind you 2298 01:38:00,170 --> 01:38:01,420 were getting their food first. 2299 01:38:01,420 --> 01:38:03,980 So hopefully in software, you're implementing that real world 2300 01:38:03,980 --> 01:38:05,680 notion of a queue as well. 2301 01:38:05,680 --> 01:38:09,260 Printing, if you still print on campus sometimes, papers and whatnot 2302 01:38:09,260 --> 01:38:11,640 on printers, they're often shared printers on campus. 2303 01:38:11,640 --> 01:38:13,640 And so they have what are called printer queues. 2304 01:38:13,640 --> 01:38:16,010 You might go to Command-P or Control-P print, 2305 01:38:16,010 --> 01:38:18,320 but then hopefully, in fairness, if there's 2306 01:38:18,320 --> 01:38:21,320 10 people who are trying to print to the same Harvard printer, 2307 01:38:21,320 --> 01:38:24,670 they are printed in the order in which they were requested. 2308 01:38:24,670 --> 01:38:27,670 It would be pretty obnoxious, again, if the order were flipped. 2309 01:38:27,670 --> 01:38:30,040 Well, it turns out with queues in computer science, 2310 01:38:30,040 --> 01:38:32,957 there's two fundamental operations, even though we humans don't really 2311 01:38:32,957 --> 01:38:34,870 think in these terms, enqueue and dequeue. 2312 01:38:34,870 --> 01:38:36,310 To enqueue means to get in line. 2313 01:38:36,310 --> 01:38:38,440 To dequeue means to get out of line, hopefully 2314 01:38:38,440 --> 01:38:41,860 once you've been served your printouts or your food or whatnot. 2315 01:38:41,860 --> 01:38:45,670 Using today's principles, arrays, linked lists, 2316 01:38:45,670 --> 01:38:49,000 you could probably imagine conceptually using them as building blocks 2317 01:38:49,000 --> 01:38:51,490 to implement this notion of a queue. 2318 01:38:51,490 --> 01:38:54,730 The software that Tasty Burger or any fast food place uses 2319 01:38:54,730 --> 01:38:57,790 probably has implemented in code some lines 2320 01:38:57,790 --> 01:39:00,970 that are using an array that's maybe being dynamically resized or better 2321 01:39:00,970 --> 01:39:03,430 yet a linked list that's growing and shrinking as people 2322 01:39:03,430 --> 01:39:05,440 are placing orders and getting orders. 2323 01:39:05,440 --> 01:39:08,290 So there's this one-to-one mapping between some of today's ideas 2324 01:39:08,290 --> 01:39:09,802 and even the real world as well. 2325 01:39:09,802 --> 01:39:12,760 There's kind of the opposite data structure that I referred to a moment 2326 01:39:12,760 --> 01:39:13,260 ago. 2327 01:39:13,260 --> 01:39:15,100 And these are generally known as stacks. 2328 01:39:15,100 --> 01:39:17,558 Stacks in the real world might be in the dining hall right. 2329 01:39:17,558 --> 01:39:18,990 Like here is of trays. 2330 01:39:18,990 --> 01:39:21,240 And they have this fundamentally different properties, 2331 01:39:21,240 --> 01:39:23,710 such that if the staff go ahead and clean the trays 2332 01:39:23,710 --> 01:39:27,250 and put them right here, it would be pretty obnoxious if to get your you 2333 01:39:27,250 --> 01:39:31,330 had to go through a FIFO fashion and get the first they put down 2334 01:39:31,330 --> 01:39:32,660 and take that out first. 2335 01:39:32,660 --> 01:39:34,210 No one does that, realistically. 2336 01:39:34,210 --> 01:39:36,580 If you've got a big stack of trays in the dining hall, 2337 01:39:36,580 --> 01:39:41,190 you probably enforce a LIFO order, last in, first out. 2338 01:39:41,190 --> 01:39:44,035 So if this was the most recently installed or clean tray, 2339 01:39:44,035 --> 01:39:46,710 you're probably, as the human, just going to take the top one, 2340 01:39:46,710 --> 01:39:48,710 even though that's not really fair to the below. 2341 01:39:48,710 --> 01:39:51,130 But it doesn't matter in this particular case. 2342 01:39:51,130 --> 01:39:52,922 So a stack gives you the opposite property. 2343 01:39:52,922 --> 01:39:54,380 And where else might you see these? 2344 01:39:54,380 --> 01:39:55,510 Well, your Gmail inbox. 2345 01:39:55,510 --> 01:39:58,570 If you use Gmail, your inbox, most likely by default 2346 01:39:58,570 --> 01:39:59,845 is configured as a stack. 2347 01:39:59,845 --> 01:40:01,840 Where do your most recent emails end up? 2348 01:40:01,840 --> 01:40:02,740 AUDIENCE: At the top. 2349 01:40:02,740 --> 01:40:03,580 DAVID MALAN: At the top. 2350 01:40:03,580 --> 01:40:06,340 Now, this is wonderful because it's a feature in that you always 2351 01:40:06,340 --> 01:40:07,720 see your newest mail. 2352 01:40:07,720 --> 01:40:11,170 Kind of a downside, though, to your friends who've emailed you an hour ago 2353 01:40:11,170 --> 01:40:14,540 and whose emails are now down here are on page 2 of your email. 2354 01:40:14,540 --> 01:40:16,210 So there's trade-offs here too. 2355 01:40:16,210 --> 01:40:19,780 Stacks might have desirable properties, like just get your tray quickly, 2356 01:40:19,780 --> 01:40:21,460 see your most recent email. 2357 01:40:21,460 --> 01:40:24,370 But if you're like me, as soon as email falls on the page 2, 2358 01:40:24,370 --> 01:40:27,280 you might never get back to it if the stack of trays 2359 01:40:27,280 --> 01:40:28,540 never actually gets exhausted. 2360 01:40:28,540 --> 01:40:31,390 Frankly, there might be in some dining hall on campus some way 2361 01:40:31,390 --> 01:40:34,780 down here that has never been used in years, because they 2362 01:40:34,780 --> 01:40:37,930 keep refilling the stack, and we keep taking from the top. 2363 01:40:37,930 --> 01:40:40,510 So that would be a bad property for a lot of context, 2364 01:40:40,510 --> 01:40:41,927 but not necessarily all. 2365 01:40:41,927 --> 01:40:44,260 So it turns out there's another data structure too-- oh, 2366 01:40:44,260 --> 01:40:46,760 and the operations there are not called enqueue and dequeue. 2367 01:40:46,760 --> 01:40:48,625 By convention they're called push and pop, 2368 01:40:48,625 --> 01:40:51,910 where this means pushing an element onto the stack. 2369 01:40:51,910 --> 01:40:53,830 Even if it's very gentle, that's pushing. 2370 01:40:53,830 --> 01:40:55,870 Popping means removing the top element. 2371 01:40:55,870 --> 01:40:59,290 So it's just nomenclature meaning adding and removing elements. 2372 01:40:59,290 --> 01:41:02,350 But there's one other data structure we'll give mentioned to today. 2373 01:41:02,350 --> 01:41:03,410 And that's known as a dictionary. 2374 01:41:03,410 --> 01:41:06,285 And we'll see this again in a couple of weeks when we look at Python. 2375 01:41:06,285 --> 01:41:10,030 A dictionary is the abstraction that you can get on top of a hash table. 2376 01:41:10,030 --> 01:41:12,460 This hash table literally involved physical buckets 2377 01:41:12,460 --> 01:41:14,800 and in code would involve arrays and linked lists. 2378 01:41:14,800 --> 01:41:16,240 That's like low level plumbing. 2379 01:41:16,240 --> 01:41:18,820 A dictionary more generally in computer science 2380 01:41:18,820 --> 01:41:23,680 is a data structure that has keys and values, words and values, words 2381 01:41:23,680 --> 01:41:26,625 and page numbers, anything that maps one thing to another. 2382 01:41:26,625 --> 01:41:29,500 Physical dictionaries in the human world, like an English Dictionary, 2383 01:41:29,500 --> 01:41:30,940 has lots of words. 2384 01:41:30,940 --> 01:41:34,810 And if a word is correctly spelled in your document, 2385 01:41:34,810 --> 01:41:36,130 it will be in that dictionary. 2386 01:41:36,130 --> 01:41:38,530 And if you have a typo, a misspelling, in your document, 2387 01:41:38,530 --> 01:41:40,300 it will not be in that dictionary. 2388 01:41:40,300 --> 01:41:43,420 So wouldn't it be nice if you could actually implement 2389 01:41:43,420 --> 01:41:47,170 a dictionary using maybe a hash table, but a smart hash table 2390 01:41:47,170 --> 01:41:50,480 that has plenty of buckets, so that you can answer a question, is this a word, 2391 01:41:50,480 --> 01:41:54,220 is this a word, super fast without having a whole stack of name tags 2392 01:41:54,220 --> 01:41:56,692 or, in this case, English words all in the same bucket. 2393 01:41:56,692 --> 01:41:58,900 And, in fact, that's the challenge for Problem Set 5. 2394 01:41:58,900 --> 01:42:02,710 We're going to give you a big text file with 140,000 plus English words. 2395 01:42:02,710 --> 01:42:06,610 And the goal for you is to implement a hash table with your choice of number 2396 01:42:06,610 --> 01:42:09,310 of buckets, your choice of hash functions, 2397 01:42:09,310 --> 01:42:12,670 and implement this notion of an array with linked lists that 2398 01:42:12,670 --> 01:42:16,358 stores those 140,000 plus words. 2399 01:42:16,358 --> 01:42:18,400 Dictionaries, though, do exist in the real world. 2400 01:42:18,400 --> 01:42:21,640 And taken last night at like 9:00 PM before Sweetgreen closed 2401 01:42:21,640 --> 01:42:23,360 in Harvard Square was this photo. 2402 01:42:23,360 --> 01:42:25,390 If you've ever ordered a salad at Sweetgreen, 2403 01:42:25,390 --> 01:42:28,390 they have a pretty clever optimized system so as to pick up your salad. 2404 01:42:28,390 --> 01:42:30,460 If you order on their app in advance, they 2405 01:42:30,460 --> 01:42:34,540 go ahead and put your salad under D for David, for instance, or B for Brian 2406 01:42:34,540 --> 01:42:35,185 and so forth. 2407 01:42:35,185 --> 01:42:37,060 So that when you go into the store, you don't 2408 01:42:37,060 --> 01:42:39,610 have to look through big O of n other salads. 2409 01:42:39,610 --> 01:42:41,920 You can jump immediately to the B section, the D 2410 01:42:41,920 --> 01:42:44,440 section, or any other section and get your salad. 2411 01:42:44,440 --> 01:42:46,960 Now, in the extreme case, maybe Harry and Hermione 2412 01:42:46,960 --> 01:42:48,820 and Hagrid all order at the same time. 2413 01:42:48,820 --> 01:42:50,960 So there's just a big stack at the H's. 2414 01:42:50,960 --> 01:42:53,800 So it's technically still big O of n. 2415 01:42:53,800 --> 01:42:56,650 But if you assume a nice uniform distribution of names, 2416 01:42:56,650 --> 01:42:59,740 this probably does work out pretty well, especially if the salads aren't 2417 01:42:59,740 --> 01:43:01,990 there by design very long. 2418 01:43:01,990 --> 01:43:06,310 But let's use our final minutes together to take a look at one final visual, one 2419 01:43:06,310 --> 01:43:09,700 made by some of our other friends online who put together an animation that 2420 01:43:09,700 --> 01:43:13,180 tells the story of the differences between stacks and queues 2421 01:43:13,180 --> 01:43:15,510 personified as follows. 2422 01:43:15,510 --> 01:43:17,710 [VIDEO PLAYBACK] 2423 01:43:17,710 --> 01:43:20,620 NARRATOR: Once upon a time, there was a guy named Jack. 2424 01:43:20,620 --> 01:43:24,040 When it came to making friends, Jack did not have the knack. 2425 01:43:24,040 --> 01:43:26,920 So Jack went to talk to the most popular guy he knew. 2426 01:43:26,920 --> 01:43:29,230 He went up to Lou and asked, what do I do? 2427 01:43:29,230 --> 01:43:32,140 Lou saw that his friend was really distressed. 2428 01:43:32,140 --> 01:43:34,405 Well, Lou began, just look how you're dressed. 2429 01:43:34,405 --> 01:43:37,420 Don't you have any with clothes with a different look? 2430 01:43:37,420 --> 01:43:39,550 Yes, said Jack, I sure do. 2431 01:43:39,550 --> 01:43:42,010 Come to my house and I'll show them to you. 2432 01:43:42,010 --> 01:43:43,270 So they went off to Jack's. 2433 01:43:43,270 --> 01:43:46,240 And Jack showed Lou the box where he kept all his shirts 2434 01:43:46,240 --> 01:43:47,650 and his pants and his socks. 2435 01:43:47,650 --> 01:43:50,470 Lou said, I see you have all your clothes in a pile. 2436 01:43:50,470 --> 01:43:53,530 Why don't you wear some others once in a while? 2437 01:43:53,530 --> 01:43:56,740 Jack said, well, when I remove and socks, 2438 01:43:56,740 --> 01:43:59,470 I wash them and put them away in the box. 2439 01:43:59,470 --> 01:44:01,960 Then comes the next morning and up I hop. 2440 01:44:01,960 --> 01:44:04,825 I go to the box and get my off the top. 2441 01:44:04,825 --> 01:44:07,810 Lou quickly realized the problem with Jack. 2442 01:44:07,810 --> 01:44:10,170 He kept clothes, CDs, and books in a stack. 2443 01:44:10,170 --> 01:44:13,150 When he reached for something to read or to wear, 2444 01:44:13,150 --> 01:44:15,790 he chose the top book or underwear. 2445 01:44:15,790 --> 01:44:18,160 Then when he was done, he would put it right back. 2446 01:44:18,160 --> 01:44:20,645 Back it would go, on top of the stack. 2447 01:44:20,645 --> 01:44:23,056 I know the solution, said a triumphant Lou. 2448 01:44:23,056 --> 01:44:25,300 You need to learn to start using a queue. 2449 01:44:25,300 --> 01:44:28,600 Lou took Jack's and hung them in a closet. 2450 01:44:28,600 --> 01:44:31,420 And when he had emptied the box, he just tossed it. 2451 01:44:31,420 --> 01:44:35,170 Then he said, now Jack, at the end of day, put your clothes on the left 2452 01:44:35,170 --> 01:44:36,770 when you put them away. 2453 01:44:36,770 --> 01:44:39,160 Then tomorrow morning, when you see the sun shine, 2454 01:44:39,160 --> 01:44:41,830 get your clothes from the right, from the end of the line. 2455 01:44:41,830 --> 01:44:45,060 Don't you see, said Lou, it will be so nice. 2456 01:44:45,060 --> 01:44:48,490 You'll wear everything once before you wear something twice. 2457 01:44:48,490 --> 01:44:51,340 And with everything in queues in his closet and shelf, 2458 01:44:51,340 --> 01:44:54,370 Jack started to feel quite sure of himself, all thanks 2459 01:44:54,370 --> 01:44:56,777 to Lou and his wonderful queue. 2460 01:44:56,777 --> 01:44:57,850 [END PLAYBACK] 2461 01:44:57,850 --> 01:44:59,600 DAVID MALAN: All right that's it for CS50. 2462 01:44:59,600 --> 01:45:01,620 We'll see you next time. 2463 01:45:01,620 --> 01:45:02,405