WEBVTT X-TIMESTAMP-MAP=LOCAL:00:00:00.000,MPEGTS:900000 00:00:00.996 --> 00:00:04.482 [MUSIC PLAYING] 00:01:23.688 --> 00:01:25.230 DAVID MALAN: All right, this is CS50. 00:01:25.230 --> 00:01:27.243 And this is our look today at data structures. 00:01:27.243 --> 00:01:29.160 You'll recall last week that we gave ourselves 00:01:29.160 --> 00:01:32.310 a few new ingredients and some new syntax in C. 00:01:32.310 --> 00:01:35.820 We introduced pointers, the ability to address chunks of memory 00:01:35.820 --> 00:01:36.990 by actual addresses-- 00:01:36.990 --> 00:01:38.700 0, 1, 2, 3 and on up. 00:01:38.700 --> 00:01:42.990 And we introduced the star notation and a few other functions, 00:01:42.990 --> 00:01:45.870 malloc among them, free among them, so that you can actually 00:01:45.870 --> 00:01:47.310 manage your computer's memory. 00:01:47.310 --> 00:01:49.110 Just to hammer this home, let's take a look 00:01:49.110 --> 00:01:52.080 at just this small example that from the outset is buggy. 00:01:52.080 --> 00:01:52.872 This code is buggy. 00:01:52.872 --> 00:01:54.330 And we'll see in just a moment why. 00:01:54.330 --> 00:01:56.280 But let's just walk through it step by step. 00:01:56.280 --> 00:01:58.920 This first line highlighted in yellow in English 00:01:58.920 --> 00:02:02.160 is doing what as you understand it now? 00:02:02.160 --> 00:02:05.160 If you're a little rusty since last week, what's that first line of code 00:02:05.160 --> 00:02:07.170 doing in English? 00:02:07.170 --> 00:02:07.830 Anything? 00:02:07.830 --> 00:02:08.030 Yeah. 00:02:08.030 --> 00:02:10.039 AUDIENCE: It's creating a point to an int named x. 00:02:10.039 --> 00:02:10.680 DAVID MALAN: Perfect. 00:02:10.680 --> 00:02:12.930 It's creating a pointer to an integer and calling 00:02:12.930 --> 00:02:15.042 that pointer or that variable x. 00:02:15.042 --> 00:02:18.000 And let me propose that the next line is doing the same thing giving us 00:02:18.000 --> 00:02:21.267 another pointer to an integer but this time calling it y. 00:02:21.267 --> 00:02:23.100 The third line of code, someone else, what's 00:02:23.100 --> 00:02:26.710 happening in English with this line here? 00:02:26.710 --> 00:02:28.431 Yeah. 00:02:28.431 --> 00:02:36.830 AUDIENCE: It's creates a memory of size of int and assigns it to x, 00:02:36.830 --> 00:02:40.180 but it's more difficult to execute if it's not a pointer, I don't know. 00:02:40.180 --> 00:02:42.190 DAVID MALAN: This is not the bug. 00:02:42.190 --> 00:02:44.243 But the first part is correct. 00:02:44.243 --> 00:02:46.160 malloc is the function we introduced last week 00:02:46.160 --> 00:02:47.900 that allocates memory for you. 00:02:47.900 --> 00:02:50.960 It takes one argument, the number of bytes that you want to allocate. 00:02:50.960 --> 00:02:53.630 Even if you don't recall how many bytes you need for an integer, 00:02:53.630 --> 00:02:55.540 you can call this other operator, sizeof, 00:02:55.540 --> 00:02:59.062 that we saw briefly last week, that will return, in this case, 4 most likely 00:02:59.062 --> 00:03:00.770 depending on the computer that you're on. 00:03:00.770 --> 00:03:04.070 So this is saying, hey, computer give me 4 bytes of memory. 00:03:04.070 --> 00:03:06.890 And it returns that chunk of memory to conventionally 00:03:06.890 --> 00:03:09.740 by way of the first address, so ox something, 00:03:09.740 --> 00:03:11.660 wherever those 4 bytes happen to star. 00:03:11.660 --> 00:03:14.810 And then it stores that address in x, which is in fact OK, 00:03:14.810 --> 00:03:17.240 because as you noted initially, x is in fact a pointer. 00:03:17.240 --> 00:03:18.260 That is an address. 00:03:18.260 --> 00:03:21.250 So all this is doing is it's declaring a variable called 00:03:21.250 --> 00:03:23.810 x and storing in it ultimately the address 00:03:23.810 --> 00:03:25.407 of a legitimate chunk of memory. 00:03:25.407 --> 00:03:28.030 You wouldn't typically allocate an int like this. 00:03:28.030 --> 00:03:32.535 You would allocate an int with just int and semicolon just like in week 1. 00:03:32.535 --> 00:03:35.660 But now that we have the ability to allocate addresses and allocate memory, 00:03:35.660 --> 00:03:38.150 you could achieve the same idea here. 00:03:38.150 --> 00:03:43.400 This line here now, the fourth line of code, says what in English? 00:03:43.400 --> 00:03:46.453 Star x equals 42 semicolon. 00:03:46.453 --> 00:03:47.370 What's going on there? 00:03:47.370 --> 00:03:48.072 Yeah. 00:03:48.072 --> 00:03:51.380 AUDIENCE: It goes to the address in the x and then sets it to 42. 00:03:51.380 --> 00:03:54.580 DAVID MALAN: Good goes to the address in x and sets it to 42. 00:03:54.580 --> 00:03:57.050 So the star operator is the dereference operator, 00:03:57.050 --> 00:03:59.390 which is a fancy way of saying go to that address. 00:03:59.390 --> 00:04:00.600 And what do you want to do? 00:04:00.600 --> 00:04:03.080 Well, per week 1 when we discussed the assignment operator, 00:04:03.080 --> 00:04:05.720 it just says put the number 42 there. 00:04:05.720 --> 00:04:09.950 So wherever malloc found 4 bytes of available memory for me, 00:04:09.950 --> 00:04:13.670 this fourth line of code says, go there and put the number 42 00:04:13.670 --> 00:04:16.730 by putting the appropriate zeros and ones. 00:04:16.730 --> 00:04:18.200 This last line here-- 00:04:18.200 --> 00:04:20.540 and here's the bug, if we finally reveal it here-- 00:04:20.540 --> 00:04:22.580 this line is buggy. 00:04:22.580 --> 00:04:25.420 Does anyone see why? 00:04:25.420 --> 00:04:26.458 Yeah, over here. 00:04:26.458 --> 00:04:29.000 AUDIENCE: You haven't allocated memory for that variable yet. 00:04:29.000 --> 00:04:29.540 DAVID MALAN: Exactly. 00:04:29.540 --> 00:04:31.730 I haven't allocated memory for that variable yet. 00:04:31.730 --> 00:04:35.160 And because up here I've just said int star y semicolon. 00:04:35.160 --> 00:04:38.870 You can only assume safely that has some garbage value, some unknown value, 00:04:38.870 --> 00:04:41.510 maybe remnants from some other part of the program, 00:04:41.510 --> 00:04:43.910 that might not necessarily be true here at the beginning of the program. 00:04:43.910 --> 00:04:46.493 But for safety's sake assume that if you don't give a variable 00:04:46.493 --> 00:04:48.870 a value, who knows what it contains. 00:04:48.870 --> 00:04:51.080 It's got some bogus address, such that if you 00:04:51.080 --> 00:04:55.220 say star y go to that bogus address something bad is going to happen. 00:04:55.220 --> 00:04:57.920 And maybe you experienced this already in P Set 4 or prior, 00:04:57.920 --> 00:05:00.050 some kind of memory problem with your code, 00:05:00.050 --> 00:05:02.990 or a segmentation fault or seg fault, bad things 00:05:02.990 --> 00:05:05.810 happen when you go to addresses that don't exist 00:05:05.810 --> 00:05:08.510 or that you don't even know where they are. 00:05:08.510 --> 00:05:09.770 So this line of code is bad. 00:05:09.770 --> 00:05:10.730 But we can do a little better. 00:05:10.730 --> 00:05:12.500 What if instead I do something like this? 00:05:12.500 --> 00:05:14.720 I actually assign to y x. 00:05:14.720 --> 00:05:18.320 So that just says put in y the same address that's in x. 00:05:18.320 --> 00:05:24.023 And then with this last line of code, what if I now say star y equals 13? 00:05:24.023 --> 00:05:25.690 What is that-- you're nodding your head. 00:05:25.690 --> 00:05:27.867 What am I doing correctly now? 00:05:27.867 --> 00:05:30.770 AUDIENCE: Now there's memory allocated for y. 00:05:30.770 --> 00:05:31.520 DAVID MALAN: Good. 00:05:31.520 --> 00:05:33.080 Now, there's memory allocated for y. 00:05:33.080 --> 00:05:36.090 So you're saying go to that address and put 13 there. 00:05:36.090 --> 00:05:40.830 However, what did we just need to the 42, just to be clear? 00:05:40.830 --> 00:05:41.540 We clobbered it. 00:05:41.540 --> 00:05:43.640 We overwrote it with the 13. 00:05:43.640 --> 00:05:48.410 Because if x and y are the same address, both this says go to that address 00:05:48.410 --> 00:05:51.050 and put 42 there, but then two lines later, we say, no, no, no, 00:05:51.050 --> 00:05:53.780 go there and put 13 there instead. 00:05:53.780 --> 00:05:57.170 But long story short, bad things happen when you don't necessarily 00:05:57.170 --> 00:05:59.917 anticipate what is in memory and you don't allocate it yourself, 00:05:59.917 --> 00:06:01.750 so thanks to one of our friends at Stanford, 00:06:01.750 --> 00:06:05.870 allow us to take a moment here to hit Play on a short film, a claymation 00:06:05.870 --> 00:06:07.760 if you will, that paints the same picture 00:06:07.760 --> 00:06:09.615 in a more memorable way perhaps. 00:06:09.615 --> 00:06:11.365 If we could dim the lights. 00:06:11.365 --> 00:06:14.100 [VIDEO PLAYBACK] 00:06:14.100 --> 00:06:15.810 NARRATOR: Hey, Binky, wake up. 00:06:15.810 --> 00:06:18.330 It's time for pointer fun. 00:06:18.330 --> 00:06:19.480 BINKY: What's that? 00:06:19.480 --> 00:06:21.130 Learn about pointers? 00:06:21.130 --> 00:06:22.650 Oh, goody. 00:06:22.650 --> 00:06:25.900 NARRATOR: Well, to get started, I guess we're going to need a couple pointers. 00:06:25.900 --> 00:06:30.460 BINKY: OK, this code allocates two pointers, which can point to integers. 00:06:30.460 --> 00:06:32.492 NARRATOR: OK, well, I see the two pointers. 00:06:32.492 --> 00:06:34.450 But they don't seem to be pointing to anything. 00:06:34.450 --> 00:06:37.390 BINKY: That's right, initially pointers don't point to anything. 00:06:37.390 --> 00:06:39.670 The things they point to are call pointees. 00:06:39.670 --> 00:06:41.395 And setting them up is a separate step. 00:06:41.395 --> 00:06:42.520 NARRATOR: Oh, right, right. 00:06:42.520 --> 00:06:43.240 I knew that. 00:06:43.240 --> 00:06:45.040 The pointees are separate. 00:06:45.040 --> 00:06:47.560 So how do you allocate a pointee? 00:06:47.560 --> 00:06:51.310 BINKY: OK, well, this code allocates a new integer pointee. 00:06:51.310 --> 00:06:54.155 And this part sets x to point to it. 00:06:54.155 --> 00:06:55.530 NARRATOR: Hey, that looks better. 00:06:55.530 --> 00:06:57.210 So make it do something. 00:06:57.210 --> 00:07:00.570 BINKY: OK, I'll dereference the pointer x to store the number 00:07:00.570 --> 00:07:02.880 42 into its pointee. 00:07:02.880 --> 00:07:06.450 For this trick, I'll need my magic wand of y dereferencing. 00:07:06.450 --> 00:07:09.790 NARRATOR: Your magic wand of dereferencing. 00:07:09.790 --> 00:07:11.630 That-- that's great. 00:07:11.630 --> 00:07:13.400 BINKY: This is what the code looks like. 00:07:13.400 --> 00:07:14.570 I'll just set up the number. 00:07:14.570 --> 00:07:16.380 And-- 00:07:16.380 --> 00:07:18.450 NARRATOR: Hey, look, there it goes. 00:07:18.450 --> 00:07:21.840 So doing a dereference on x, follows the arrow 00:07:21.840 --> 00:07:25.440 to access its pointee, in this case, the store 42 in there. 00:07:25.440 --> 00:07:30.010 Hey, try using it to store the number 13 through the other pointer, y. 00:07:30.010 --> 00:07:31.080 BINKY: OK. 00:07:31.080 --> 00:07:35.610 I'll just go over here to y and get the number 13 set up 00:07:35.610 --> 00:07:38.985 and then take the wand of dereferencing and just-- 00:07:38.985 --> 00:07:41.130 [BUZZER SOUND] oh. 00:07:41.130 --> 00:07:43.380 NARRATOR: Oh, hey, that didn't work. 00:07:43.380 --> 00:07:48.240 Say, Binky, I don't think dereferencing y is a good idea, because, you know, 00:07:48.240 --> 00:07:50.260 setting up the pointee is a separate step. 00:07:50.260 --> 00:07:52.445 And I don't think we ever did it. 00:07:52.445 --> 00:07:53.910 BINKY: Oh, good point. 00:07:53.910 --> 00:07:56.220 NARRATOR: Yeah, we allocated the pointer y. 00:07:56.220 --> 00:07:58.990 But we never set it to point to a pointee. 00:07:58.990 --> 00:08:00.730 BINKY: Mm, very observant. 00:08:00.730 --> 00:08:02.730 NARRATOR: Hey, you're looking good there, Binky. 00:08:02.730 --> 00:08:05.700 Can you fix it so that y points to the same pointee as x? 00:08:05.700 --> 00:08:06.300 BINKY: Sure. 00:08:06.300 --> 00:08:09.090 I'll use my magic wand of pointer assignment. 00:08:09.090 --> 00:08:11.280 NARRATOR: Is that going to be a problem like before? 00:08:11.280 --> 00:08:13.110 BINKY: No, this doesn't touch the pointees. 00:08:13.110 --> 00:08:16.815 It just changes one pointer to point to the same thing as another. 00:08:16.815 --> 00:08:17.730 NARRATOR: Oh, I see. 00:08:17.730 --> 00:08:20.500 Now y points to the same place as x. 00:08:20.500 --> 00:08:22.260 So, wait, now y is fixed. 00:08:22.260 --> 00:08:23.430 It has a pointee. 00:08:23.430 --> 00:08:26.950 So you can try the wand of dereferencing again to send the 13 over. 00:08:26.950 --> 00:08:30.410 BINKY: Uh-- OK, here goes. 00:08:30.410 --> 00:08:31.640 NARRATOR: Hey, look at that. 00:08:31.640 --> 00:08:33.289 Now dereferencing works on y. 00:08:33.289 --> 00:08:37.460 And because the pointers are sharing that one pointee, they both see the 13. 00:08:37.460 --> 00:08:39.169 BINKY: Yeah, sharing, whatever. 00:08:39.169 --> 00:08:41.090 So are we going to switch places now? 00:08:41.090 --> 00:08:43.010 NARRATOR: Oh, look, we're out of time. 00:08:43.010 --> 00:08:43.643 BINKY: But-- 00:08:43.643 --> 00:08:44.430 [END PLAYBACK] 00:08:44.430 --> 00:08:46.610 DAVID MALAN: All right, so now that we do 00:08:46.610 --> 00:08:49.040 have this power of pointers and addresses 00:08:49.040 --> 00:08:51.800 where we have low level access to the computer's memory, 00:08:51.800 --> 00:08:54.732 we can actually solve problems a lot more powerfully 00:08:54.732 --> 00:08:56.190 and in a lot more interesting ways. 00:08:56.190 --> 00:08:58.232 But first, let's motivate some of these problems. 00:08:58.232 --> 00:09:00.290 So back in Week 2, we introduced arrays, which 00:09:00.290 --> 00:09:02.780 was the first of our data structures, if you will. 00:09:02.780 --> 00:09:05.270 Before then in Week 1, all we had was variables for things 00:09:05.270 --> 00:09:07.910 like ints and chars and floats and so forth. 00:09:07.910 --> 00:09:09.860 In Week 2, we introduced arrays, which meant 00:09:09.860 --> 00:09:13.850 you could store two ints altogether or three or 10 or 100. 00:09:13.850 --> 00:09:16.810 So you can kind of encapsulate lots of data together. 00:09:16.810 --> 00:09:21.150 So unfortunately, though, arrays aren't quite as powerful as might be ideal. 00:09:21.150 --> 00:09:25.670 So, for instance, if we have an array with size 3 and we actually want to go 00:09:25.670 --> 00:09:28.580 ahead and store three values in it-- one, two, three-- 00:09:28.580 --> 00:09:31.430 suppose that we actually want to now store a fourth value, 00:09:31.430 --> 00:09:33.350 but we didn't anticipate that from the get go. 00:09:33.350 --> 00:09:37.160 Recall after all that with arrays you have to declare their size upfront. 00:09:37.160 --> 00:09:41.300 So you've got to hard code the number 3 or a variable containing the number 3. 00:09:41.300 --> 00:09:43.550 But suppose that we want to store the number 4. 00:09:43.550 --> 00:09:46.130 You might think that, well, just give me another box 00:09:46.130 --> 00:09:48.950 of memory just to the right of the number 3, 00:09:48.950 --> 00:09:51.080 so that I can keep all of my numbers together. 00:09:51.080 --> 00:09:53.930 But unfortunately, per last week, that's not really 00:09:53.930 --> 00:09:56.240 a reliable assumption, because in the context 00:09:56.240 --> 00:09:59.150 of the rest of your computer's memory, that 1, 2, 3, 00:09:59.150 --> 00:10:01.580 might be here surrounded by other bytes. 00:10:01.580 --> 00:10:05.450 And per last week those bytes might be mostly filled with other data 00:10:05.450 --> 00:10:07.850 from some other parts of your program. 00:10:07.850 --> 00:10:11.270 And yet you would think that in seeing that 1, 2, 3 00:10:11.270 --> 00:10:14.060 is kind of painted into this corner, so to speak, 00:10:14.060 --> 00:10:16.400 that there's just no room for the number 4, 00:10:16.400 --> 00:10:19.970 and therefore you can't add the fourth number to your array, 00:10:19.970 --> 00:10:23.012 is there a solution visibly to this problem nonetheless? 00:10:25.970 --> 00:10:27.260 Where else could we put it? 00:10:27.260 --> 00:10:27.760 Yeah. 00:10:27.760 --> 00:10:29.468 AUDIENCE: Move it to off of other memory. 00:10:29.468 --> 00:10:31.052 DAVID MALAN: Say that a little louder. 00:10:31.052 --> 00:10:32.975 AUDIENCE: We can move it off to other memory. 00:10:32.975 --> 00:10:35.600 DAVID MALAN: Yeah, so maybe we can move it off to other memory. 00:10:35.600 --> 00:10:38.780 So there's a lot of EMMAs in my memory per last week, but there is still, 00:10:38.780 --> 00:10:41.630 it would seem, based on this picture, some unused memory. 00:10:41.630 --> 00:10:46.532 So maybe we could resize our array, grow it, not by just moving all of the EMMAs 00:10:46.532 --> 00:10:48.740 because frankly that would seem to take a lot of time 00:10:48.740 --> 00:10:50.532 if we had to shift all of these characters, 00:10:50.532 --> 00:10:53.840 why don't we just relocate the 1, 2, 3 down here, 00:10:53.840 --> 00:10:57.020 and that gives us an extra space for at least a number 4. 00:10:57.020 --> 00:10:58.970 So indeed even if you're using arrays, you 00:10:58.970 --> 00:11:01.940 can achieve this outcome by actually moving memory around. 00:11:01.940 --> 00:11:03.500 But consider what's involved in that. 00:11:03.500 --> 00:11:05.360 So if you've got our old array at top left, 00:11:05.360 --> 00:11:08.390 and we've got our new array at bottom right, that is of size 4. 00:11:08.390 --> 00:11:09.830 So we have plenty of room. 00:11:09.830 --> 00:11:12.528 How do we go about resizing the array? 00:11:12.528 --> 00:11:13.820 Well, it's kind of an illusion. 00:11:13.820 --> 00:11:17.112 You can't just resize the array when we have all of these EMMAs surrounding us. 00:11:17.112 --> 00:11:20.720 Instead, we actually have to move the array or copy it. 00:11:20.720 --> 00:11:22.352 So the 1 gets moved to the new memory. 00:11:22.352 --> 00:11:23.810 The 2 gets moved to the new memory. 00:11:23.810 --> 00:11:25.520 The 3 gets moved to the new memory. 00:11:25.520 --> 00:11:30.350 And then at that point, we can just throw away or free the previously used 00:11:30.350 --> 00:11:33.560 memory and now go ahead and add our 4. 00:11:33.560 --> 00:11:36.530 Unfortunately, this isn't necessarily the best strategy, right, 00:11:36.530 --> 00:11:40.100 because if these three lockers represent our original memory and these four 00:11:40.100 --> 00:11:44.090 lockers represents our new memory and they're deliberately far apart, 00:11:44.090 --> 00:11:49.010 that is to say that if I want to go ahead and move like these same numbers, 00:11:49.010 --> 00:11:52.680 I really have to do something like this, which involves quite a few steps. 00:11:52.680 --> 00:11:54.980 Let me go ahead and put the 1 in there now. 00:11:54.980 --> 00:11:58.365 Now, let me go ahead and get the 2 here. 00:11:58.365 --> 00:12:00.240 And then I can go ahead and put this in here. 00:12:00.240 --> 00:12:02.570 So now I've got the 2. 00:12:02.570 --> 00:12:05.662 And then lastly, I can go grab the 3. 00:12:05.662 --> 00:12:08.120 And so even though I did this pretty quickly on the screen, 00:12:08.120 --> 00:12:10.328 the reality is there's a decent amount of work to do. 00:12:10.328 --> 00:12:15.440 And then I still, of course, have to go ahead and add the 4 to the mix, which 00:12:15.440 --> 00:12:20.480 is to say that I've taken figuratively and physically quite a few steps 00:12:20.480 --> 00:12:24.830 in order to resize an array from size 3 to size 4, which 00:12:24.830 --> 00:12:26.840 is to say if we now consider the efficiency 00:12:26.840 --> 00:12:34.460 or, if you will, inefficiency of that algorithm, what kind of running time 00:12:34.460 --> 00:12:39.140 is involved when inserting additional numbers into an array 00:12:39.140 --> 00:12:39.920 as I've done here? 00:12:39.920 --> 00:12:42.770 Here's our menu of options from a couple of weeks 00:12:42.770 --> 00:12:45.140 ago when we focused on algorithms. 00:12:45.140 --> 00:12:49.070 What's the running time of insertion into an array 00:12:49.070 --> 00:12:52.970 based even on that simple demonstration would you say? 00:12:52.970 --> 00:12:54.040 What's the running time? 00:12:54.040 --> 00:12:54.540 Yeah. 00:12:54.540 --> 00:12:55.280 AUDIENCE: O n squared. 00:12:55.370 --> 00:12:56.046 DAVID MALAN: Say it again. 00:12:56.046 --> 00:12:56.860 AUDIENCE: O n squared. 00:12:56.860 --> 00:12:57.920 DAVID MALAN: O n squared. 00:12:57.920 --> 00:13:01.245 So maybe O n squared in that there was a lot of back and forth 00:13:01.245 --> 00:13:02.370 and we've seen that before. 00:13:02.370 --> 00:13:04.430 We've seen bubble sort and selection sort add up. 00:13:04.430 --> 00:13:06.140 It's not quite as bad as that. 00:13:06.140 --> 00:13:07.390 It's not quite as bad as that. 00:13:07.390 --> 00:13:07.890 Yeah. 00:13:07.890 --> 00:13:08.757 AUDIENCE: O of n. 00:13:08.757 --> 00:13:09.590 DAVID MALAN: O of n. 00:13:09.590 --> 00:13:11.390 And why do you say O of n? 00:13:11.390 --> 00:13:16.996 AUDIENCE: Because for as like as many lockers there are in the first one, 00:13:16.996 --> 00:13:22.645 you have to increment the same amount of processes to insert them. 00:13:22.645 --> 00:13:23.520 DAVID MALAN: Exactly. 00:13:23.520 --> 00:13:26.940 Whatever number of lockers you have here-- so that's three specifically-- 00:13:26.940 --> 00:13:29.670 but n more generally, it's going to take me n steps 00:13:29.670 --> 00:13:31.942 to transfer those numbers over here. 00:13:31.942 --> 00:13:34.900 Or technically, it's going to take me 3-- maybe if I go back and forth, 00:13:34.900 --> 00:13:35.890 it's like 6 steps. 00:13:35.890 --> 00:13:37.770 But it's some multiple of n. 00:13:37.770 --> 00:13:38.730 So it's not n squared. 00:13:38.730 --> 00:13:41.310 That's when we kept iterating again and again and again. 00:13:41.310 --> 00:13:45.120 This time I just have to move 3 numbers to here and then add the fourth number. 00:13:45.120 --> 00:13:48.840 So it's indeed, Big O of n when you want to go ahead and insert or search 00:13:48.840 --> 00:13:54.180 equivalently an array that's actually implemented-- 00:13:54.180 --> 00:13:56.640 sorry, insert is going to take us linear time. 00:13:56.640 --> 00:13:58.878 But search recall-- and this was the powerful thing-- 00:13:58.878 --> 00:14:01.920 what's the running time of search so long as you keep your number sorted? 00:14:01.920 --> 00:14:04.340 Per two weeks ago, that was logarithmic. 00:14:04.340 --> 00:14:06.330 So we haven't necessarily sacrificed that. 00:14:06.330 --> 00:14:10.020 And that's the appeal of storing our data in an array that's sorted. 00:14:10.020 --> 00:14:12.030 You can use binary search. 00:14:12.030 --> 00:14:15.930 However, this is expensive and moving things around isn't necessarily 00:14:15.930 --> 00:14:16.930 the ideal approach. 00:14:16.930 --> 00:14:19.510 So let's just consider what this might look like in code. 00:14:19.510 --> 00:14:22.400 Let me go over to CS50 IDE here. 00:14:22.400 --> 00:14:26.970 And let me go ahead and create a new file called list.c. 00:14:26.970 --> 00:14:30.510 And let's see if we can't represent in code exactly this idea. 00:14:30.510 --> 00:14:33.630 So let me go ahead and include for myself standard stdio.h 00:14:33.630 --> 00:14:36.300 just so that we can print out some values ultimately. 00:14:36.300 --> 00:14:38.482 Let me go ahead then and declare main-- 00:14:38.482 --> 00:14:40.470 int main void. 00:14:40.470 --> 00:14:42.750 And then down here, let's just arbitrarily start 00:14:42.750 --> 00:14:46.620 where we did with three integers, called list and size 3. 00:14:46.620 --> 00:14:49.230 So I'm just mimicking exactly where we started pictorially 00:14:49.230 --> 00:14:51.780 by having an array that was fixed at size 3. 00:14:51.780 --> 00:14:53.940 And then if I went ahead and initialized that list, 00:14:53.940 --> 00:14:57.540 I could just hard code-- that is type into the program itself-- 00:14:57.540 --> 00:15:03.370 those three values into bracket 0, 1, and 2 the numbers 1, 2, 3 respectively. 00:15:03.370 --> 00:15:06.300 So I'm just manually initializing that array to three values. 00:15:06.300 --> 00:15:08.970 And then just so that this program has some purpose in life, 00:15:08.970 --> 00:15:13.200 let me go ahead and do int i equals 0, i less than 3, i++. 00:15:13.200 --> 00:15:16.812 And then, let's just print out these elements just for good measure. 00:15:16.812 --> 00:15:17.770 Each of them is an int. 00:15:17.770 --> 00:15:19.170 So we'll use %i. 00:15:19.170 --> 00:15:22.350 And then I'm going to go ahead and print out list bracket i. 00:15:22.350 --> 00:15:25.260 So kind of a Week 2 style program, where All I'm doing 00:15:25.260 --> 00:15:28.470 is hard coding an array of size 3, initializing it 00:15:28.470 --> 00:15:32.460 with three values, 1, 2, 3; 0 indexed, and then printing them out. 00:15:32.460 --> 00:15:37.920 So if I go ahead and save this and make my list and then go ahead and compile 00:15:37.920 --> 00:15:41.640 this with ./list, I should see hopefully 1, 2, 3. 00:15:41.640 --> 00:15:45.060 But there's a problem with this implementation fundamentally because I 00:15:45.060 --> 00:15:47.880 have hardcoded-- that is typed explicitly-- 00:15:47.880 --> 00:15:53.040 the size of this array, how can I go about adding a fourth element? 00:15:53.040 --> 00:15:54.210 What would I have to do? 00:15:54.210 --> 00:15:56.365 Well, I could change the code up here to 4. 00:15:56.365 --> 00:15:57.990 And then I could add another line here. 00:15:57.990 --> 00:15:59.160 And then I could change this. 00:15:59.160 --> 00:16:00.493 But then I have to recompile it. 00:16:00.493 --> 00:16:02.145 And so it's certainly not dynamic. 00:16:02.145 --> 00:16:04.020 But we did see a function last week that lets 00:16:04.020 --> 00:16:06.120 you allocate more memory dynamically. 00:16:06.120 --> 00:16:08.340 And just to be sure what was that function? 00:16:08.340 --> 00:16:08.970 So malloc. 00:16:08.970 --> 00:16:09.470 Right? 00:16:09.470 --> 00:16:13.290 Now that we have malloc, you don't have to type into your program source code 00:16:13.290 --> 00:16:15.420 from the get go a fixed number. 00:16:15.420 --> 00:16:18.160 You can actually allocate some amount of memory dynamically. 00:16:18.160 --> 00:16:21.990 Now, here just for demonstration's sake, we'll do it to achieve the same goal, 00:16:21.990 --> 00:16:24.833 but in a way that's going to scale a little more effectively. 00:16:24.833 --> 00:16:28.000 Recall from last week that if you want to get a chunk of memory from malloc, 00:16:28.000 --> 00:16:30.375 it's going to return the address of that chunk of memory. 00:16:30.375 --> 00:16:36.000 So that suggests that I can declare a pointer to an integer called list. 00:16:36.000 --> 00:16:40.560 And then let me go ahead and allocate, how about, three integers initially 00:16:40.560 --> 00:16:43.510 times whatever the size is of an integer. 00:16:43.510 --> 00:16:46.680 So this is a little weird looking, but consider what this is doing. 00:16:46.680 --> 00:16:49.740 malloc is being asked for 3 times the size of an int. 00:16:49.740 --> 00:16:51.900 So give me enough memory to fit three integers. 00:16:51.900 --> 00:16:55.000 By definition that returns a pointer, per last week. 00:16:55.000 --> 00:16:57.480 So we have to assign it to a pointer on the left. 00:16:57.480 --> 00:17:01.660 So list is a variable now, just like x and y from our previous example, 00:17:01.660 --> 00:17:04.180 that's storing the address of that chunk of memory. 00:17:04.180 --> 00:17:07.530 But what's cool about C is that now that you 00:17:07.530 --> 00:17:10.109 know that list is a chunk of memory, we can actually 00:17:10.109 --> 00:17:14.069 borrow that same square bracket notation from Week 2. 00:17:14.069 --> 00:17:17.849 And this code here doesn't actually need to change. 00:17:17.849 --> 00:17:22.890 If you use square bracket notation next to the name of a pointer, what's 00:17:22.890 --> 00:17:25.190 going to happen for you automatically is the computer 00:17:25.190 --> 00:17:28.620 is going to go to the first byte in that chunk of memory. 00:17:28.620 --> 00:17:31.060 This index is going to go to the next chunk of memory. 00:17:31.060 --> 00:17:33.060 This is going to go to the next chunk of memory, 00:17:33.060 --> 00:17:36.430 all within the scope of what malloc returned for you. 00:17:36.430 --> 00:17:39.073 And just as an aside, how many bytes are in an integer? 00:17:39.073 --> 00:17:39.760 AUDIENCE: 4. 00:17:39.760 --> 00:17:40.740 DAVID MALAN: 4. 00:17:40.740 --> 00:17:43.050 And recall I briefly mentioned the expression last week 00:17:43.050 --> 00:17:44.400 pointer arithmetic. 00:17:44.400 --> 00:17:46.710 What you're also getting sort of magically 00:17:46.710 --> 00:17:50.790 with this square bracket notation is that bracket 0, 00:17:50.790 --> 00:17:52.530 it happens to be byte 0. 00:17:52.530 --> 00:17:54.930 Bracket 1 is not the second byte. 00:17:54.930 --> 00:17:57.000 It's actually 4 bytes over. 00:17:57.000 --> 00:17:59.550 And bracket 2 is not the third byte. 00:17:59.550 --> 00:18:05.910 It's actually 8 bytes over, because you allocated 4 plus 4 plus 4, 12 bytes. 00:18:05.910 --> 00:18:08.130 And so this square bracket notation just jumps you 00:18:08.130 --> 00:18:11.160 to the right place in that chunk of memory, so that you can fit int, 00:18:11.160 --> 00:18:13.350 int, int. 00:18:13.350 --> 00:18:14.096 Yeah. 00:18:14.096 --> 00:18:17.096 AUDIENCE: Why do you allocate a pointer to an int rather than an pointer 00:18:17.096 --> 00:18:18.223 to an int array? 00:18:18.223 --> 00:18:21.140 DAVID MALAN: Why do you allocate a pointer to an int and not a pointer 00:18:21.140 --> 00:18:23.090 to an int array? 00:18:23.090 --> 00:18:28.760 In this context, arrays and pointers are in some sense the same. 00:18:28.760 --> 00:18:30.640 A pointer is an address of memory. 00:18:30.640 --> 00:18:32.930 An array is just a chunk of memory. 00:18:32.930 --> 00:18:36.890 And so even though we used chunks of memory in Week 2 00:18:36.890 --> 00:18:39.500 by just calling them arrays, they really are just more 00:18:39.500 --> 00:18:42.860 generally chunks of memory that support square bracket notation. 00:18:42.860 --> 00:18:45.380 But now that we can allocate as much memory as we want, 00:18:45.380 --> 00:18:48.350 we can kind of use these two concepts interchangeably. 00:18:48.350 --> 00:18:52.760 And there are some subtleties in C. But this now has the same effect as Week 2. 00:18:52.760 --> 00:18:55.495 And this is the only new line from this week. 00:18:55.495 --> 00:18:57.620 But now if you're using malloc, even though I'm not 00:18:57.620 --> 00:18:59.960 going to do it in a more complicated program here, 00:18:59.960 --> 00:19:04.463 you can imagine now the code running in a loop and maybe allocating more memory 00:19:04.463 --> 00:19:07.130 and more memory and more memory when you need it, because malloc 00:19:07.130 --> 00:19:08.900 allows you to do just that. 00:19:08.900 --> 00:19:11.768 And we do need to do a couple of safety checks here. 00:19:11.768 --> 00:19:14.810 It turns out, per last week, that malloc can sometimes run out of memory. 00:19:14.810 --> 00:19:18.522 If you're Mac or PC or the cloud runs out of memory for your account, 00:19:18.522 --> 00:19:20.480 well, you might want to check the return value. 00:19:20.480 --> 00:19:22.700 And so good practice would be, well, wait a minute, 00:19:22.700 --> 00:19:26.210 if list equals equals null, let me just go ahead and return 1, 00:19:26.210 --> 00:19:28.670 something went wrong, because my computer is out of memory 00:19:28.670 --> 00:19:29.450 for some reason. 00:19:29.450 --> 00:19:32.900 So best practice would say anytime you allocate memory, 00:19:32.900 --> 00:19:36.110 always check if you've gotten back null. 00:19:36.110 --> 00:19:39.080 Now, let me just do something for the sake of demonstration. 00:19:39.080 --> 00:19:41.047 Let me move my window down here. 00:19:41.047 --> 00:19:43.130 Let me highlight these lines of code and just make 00:19:43.130 --> 00:19:45.920 the claim that highlighted here between lines 5 and 13 00:19:45.920 --> 00:19:50.480 are lines of code that simply allocate a list of size 3 00:19:50.480 --> 00:19:52.920 and store three values in it. 00:19:52.920 --> 00:19:55.460 That's the story where we left off a moment ago. 00:19:55.460 --> 00:20:00.350 Suppose now I change my mind and decide after line 13 00:20:00.350 --> 00:20:02.690 or maybe elsewhere in this program if it were larger, 00:20:02.690 --> 00:20:05.660 you know what, I actually want another integer. 00:20:05.660 --> 00:20:08.570 I want to resize that array. 00:20:08.570 --> 00:20:10.050 Well, how can I go about doing it? 00:20:10.050 --> 00:20:11.467 Well, let me go ahead and do this. 00:20:11.467 --> 00:20:15.380 Let me go ahead and allocate, for instance, another address 00:20:15.380 --> 00:20:23.000 and say store at that address a chunk of memory corresponding to four integers 00:20:23.000 --> 00:20:25.670 using the size of operator as before. 00:20:25.670 --> 00:20:27.800 So temporarily, let me go ahead and give myself 00:20:27.800 --> 00:20:30.260 a new chunk of memory that is big enough to fit 00:20:30.260 --> 00:20:32.347 four integers instead of just three. 00:20:32.347 --> 00:20:35.180 Let me practice best practices and say, you know what, just in case, 00:20:35.180 --> 00:20:37.967 if temp equals equals null because I'm out of memory, forget it, 00:20:37.967 --> 00:20:39.050 I'm done with the program. 00:20:39.050 --> 00:20:40.467 We're not going to proceed anyway. 00:20:40.467 --> 00:20:42.110 But that's just good practice now. 00:20:42.110 --> 00:20:43.580 But now what I want to do? 00:20:43.580 --> 00:20:49.340 If I now have two chunks of memory, this one is a size 3, this one is of size 4, 00:20:49.340 --> 00:20:52.010 what did we do the last time we wanted to move something around 00:20:52.010 --> 00:20:55.282 in the computer's memory, what did I physically do with the lockers? 00:20:55.282 --> 00:20:56.240 I think you're nodding. 00:20:56.240 --> 00:20:58.268 What did I do? 00:20:58.268 --> 00:21:00.750 AUDIENCE: You went through each and move 1 to the-- 00:21:00.750 --> 00:21:02.875 DAVID MALAN: Yeah, exactly, I went through each one 00:21:02.875 --> 00:21:05.590 and copied the value from left to right, from old to new. 00:21:05.590 --> 00:21:07.780 And so let me go ahead and do exactly that. 00:21:07.780 --> 00:21:11.890 I think I can do this with a for loop, for int i get 0, i is less than 3, 00:21:11.890 --> 00:21:16.220 because that's the size of the old array that size 3, i++. 00:21:16.220 --> 00:21:19.360 And then in this iteration, I can just do something like this-- 00:21:19.360 --> 00:21:21.820 use this new chunk of memory just like an array, 00:21:21.820 --> 00:21:25.420 because I claimed I can use my square bracket notation and store location 00:21:25.420 --> 00:21:28.990 i whatever is in the original list at location i. 00:21:28.990 --> 00:21:31.570 So this code here now, if I were to comment it, 00:21:31.570 --> 00:21:36.700 copy integers from old array into new array. 00:21:36.700 --> 00:21:39.940 And that too is just using a for loop from old to new. 00:21:39.940 --> 00:21:42.670 But now that's not quite everything I want to do. 00:21:42.670 --> 00:21:46.750 I also want to store at the location 3, 0 index, 00:21:46.750 --> 00:21:49.540 which means it's the fourth location, another value, number 4. 00:21:49.540 --> 00:21:53.260 That's why I put the additional number 4 into those lockers. 00:21:53.260 --> 00:21:57.310 So now with these lines of code, I have implemented the physical notion 00:21:57.310 --> 00:22:01.460 of copying all of the values from the old array into the new array. 00:22:01.460 --> 00:22:03.670 So I'm almost done, except what did we learn 00:22:03.670 --> 00:22:07.030 last week that you should do whenever you're done with a chunk of memory? 00:22:07.030 --> 00:22:09.188 Do I still need the original chunk of memory? 00:22:09.188 --> 00:22:09.730 AUDIENCE: No. 00:22:09.730 --> 00:22:09.910 DAVID MALAN: No. 00:22:09.910 --> 00:22:12.040 And how do I give it back to the computer? 00:22:12.040 --> 00:22:12.760 AUDIENCE: Free. 00:22:12.760 --> 00:22:13.510 DAVID MALAN: Free. 00:22:13.510 --> 00:22:16.420 So quite simply, I literally just call free, 00:22:16.420 --> 00:22:20.958 passing in the address of the chunk of memory that I want to free. 00:22:20.958 --> 00:22:22.750 And even though I'm passing in one address, 00:22:22.750 --> 00:22:25.480 the computer is going to do the heavy, lifting of remembering 00:22:25.480 --> 00:22:27.130 how many bytes I asked for originally. 00:22:27.130 --> 00:22:28.588 You don't have to worry about that. 00:22:28.588 --> 00:22:31.738 You just say, whatever this is pointing at, go ahead and free it all. 00:22:31.738 --> 00:22:34.280 So now, you know what, now that I've gotten rid of that list, 00:22:34.280 --> 00:22:38.172 I'm going to update list equal temp, which is just cleaning up the naming. 00:22:38.172 --> 00:22:39.880 Temp is kind of a stupid name for a list. 00:22:39.880 --> 00:22:45.010 Let me just reuse the original pointer and let list equal temp. 00:22:45.010 --> 00:22:49.040 And now down here if I've done everything correctly, 00:22:49.040 --> 00:22:51.350 it should suffice to print out that whole list. 00:22:51.350 --> 00:22:52.727 So let me save this. 00:22:52.727 --> 00:22:54.560 Let me give myself a bigger terminal window. 00:22:54.560 --> 00:22:56.500 Do make list again. 00:22:56.500 --> 00:22:58.010 A lot of mistakes. 00:22:58.010 --> 00:22:59.590 Let's see, first one is up here. 00:22:59.590 --> 00:23:04.060 Implicitly declaring library function malloc dot dot dot. 00:23:04.060 --> 00:23:07.870 What generally is the solution when you see implicitly declaring something? 00:23:07.870 --> 00:23:09.370 AUDIENCE: Header file. 00:23:09.370 --> 00:23:10.700 DAVID MALAN: Header file, which one do I want? 00:23:10.700 --> 00:23:11.283 Do you recall? 00:23:14.143 --> 00:23:16.810 This is subtle and we might not have used it last time if I used 00:23:16.810 --> 00:23:19.360 the CS50 library, but it's stdlib.h. 00:23:19.360 --> 00:23:20.453 That is where malloc is. 00:23:20.453 --> 00:23:21.370 That is where free is. 00:23:21.370 --> 00:23:25.598 So stdlib.h is one new header file that contains last week's functions. 00:23:25.598 --> 00:23:26.640 So let me try this again. 00:23:26.640 --> 00:23:27.670 Let me make list. 00:23:27.670 --> 00:23:29.530 Nice, this time it did compile. 00:23:29.530 --> 00:23:36.030 ./list and-- hm, I seem to be missing that fourth number. 00:23:36.030 --> 00:23:39.560 But I think this is just a stupid mistake on my part. 00:23:39.560 --> 00:23:44.440 What did I do wrong if I look at the printing of this array? 00:23:44.440 --> 00:23:45.880 AUDIENCE: Size of list. 00:23:45.880 --> 00:23:48.790 DAVID MALAN: Yeah, down here the new list is size 4. 00:23:48.790 --> 00:23:50.740 So frankly, if you recall a few weeks ago, 00:23:50.740 --> 00:23:53.592 I encouraged you, don't just hard code numbers, magic numbers, 00:23:53.592 --> 00:23:54.300 in your programs. 00:23:54.300 --> 00:23:57.320 We should really be using constants or some other variable. 00:23:57.320 --> 00:23:59.860 This is exactly why, because you, just like me, 00:23:59.860 --> 00:24:01.300 might overlook a detail like that. 00:24:01.300 --> 00:24:02.560 So let me recompile it. 00:24:02.560 --> 00:24:03.820 And let me do list. 00:24:03.820 --> 00:24:06.070 And, voila, there is my 1, 2, 3, 4. 00:24:06.070 --> 00:24:08.230 Now, to be clear, this is kind of a stupid program, 00:24:08.230 --> 00:24:11.260 because I sort of decided halfway through writing 00:24:11.260 --> 00:24:14.023 this program, wait a minute, I want four integers, not three. 00:24:14.023 --> 00:24:15.940 And, of course, at that point, you should just 00:24:15.940 --> 00:24:17.420 delete the earlier code you wrote. 00:24:17.420 --> 00:24:19.120 So this is just for demonstration sake. 00:24:19.120 --> 00:24:22.000 If you imagine this being a bigger program, that just over time 00:24:22.000 --> 00:24:25.480 the human decides maybe because get int is called that they need more memory, 00:24:25.480 --> 00:24:27.280 this is how you would do it using malloc. 00:24:27.280 --> 00:24:29.488 But it turns out there's a function that can actually 00:24:29.488 --> 00:24:31.370 make our lives a little easier here. 00:24:31.370 --> 00:24:34.580 So let me go ahead and clean this up just a little bit. 00:24:34.580 --> 00:24:38.380 It turns out that I don't have to allocate more memory myself, 00:24:38.380 --> 00:24:42.460 copy all of these things myself, and then also free it. 00:24:42.460 --> 00:24:46.040 I can consolidate a bunch of these lines as follows. 00:24:46.040 --> 00:24:50.380 Instead of using malloc, I can actually say realloc, 00:24:50.380 --> 00:24:53.288 which as the name suggests, reallocate a chunk of memory. 00:24:53.288 --> 00:24:54.580 What do you want to reallocate? 00:24:54.580 --> 00:24:57.020 Well, I want to reallocate the list. 00:24:57.020 --> 00:25:01.690 And this time I want to do 4 times size of an int. 00:25:01.690 --> 00:25:03.842 I'm going to store this in temp temporarily. 00:25:03.842 --> 00:25:05.800 I'm going to make sure that nothing went wrong, 00:25:05.800 --> 00:25:08.925 as by checking for null, which just means, hey, you might be out of memory. 00:25:08.925 --> 00:25:10.810 And then I'm going to return if so. 00:25:10.810 --> 00:25:15.670 But down here, if all is well, I'm going to go ahead and do this. 00:25:15.670 --> 00:25:25.410 And watch this I now have simplified my code by quite a few lines. 00:25:25.410 --> 00:25:29.300 realloc, by definition-- this is another function in stdlib.h-- 00:25:29.300 --> 00:25:32.050 handles the process of taking an existing chunk of memory that you 00:25:32.050 --> 00:25:34.930 already asked for, resizes it to be this new size, 00:25:34.930 --> 00:25:36.730 whether it's bigger or smaller. 00:25:36.730 --> 00:25:40.330 It handles the copying of the data from old to new to you. 00:25:40.330 --> 00:25:42.700 And you just need to check that nothing went wrong 00:25:42.700 --> 00:25:46.000 as by checking for null here and then remembering the new value. 00:25:46.000 --> 00:25:48.940 So this code now, which is just six lines of code, 00:25:48.940 --> 00:25:51.040 previously was more than that. 00:25:51.040 --> 00:25:52.880 And it's just a handy function to use. 00:25:52.880 --> 00:25:55.992 All right, a question from earlier. 00:25:55.992 --> 00:26:00.384 AUDIENCE: Why can't we just create this to the temp in the beginning, 00:26:00.384 --> 00:26:04.776 because if we equate this to the temp, then we equate this to the pointer 00:26:04.776 --> 00:26:07.923 perhaps, so this to the point to the 4 bytes of memory? 00:26:07.923 --> 00:26:09.340 DAVID MALAN: Really good question. 00:26:09.340 --> 00:26:10.950 So let me roll this back by rewinding. 00:26:10.950 --> 00:26:13.830 And all of the finished versions are on the course's website 00:26:13.830 --> 00:26:15.570 if you want to play with them later. 00:26:15.570 --> 00:26:18.360 This was the previous version using just malloc. 00:26:18.360 --> 00:26:23.580 If you just do this, update a new chunk of memory, as I think you're asking, 00:26:23.580 --> 00:26:27.600 what's happening is you are effectively orphaning the old chunk of memory. 00:26:27.600 --> 00:26:29.730 Because if you change what's stored in list 00:26:29.730 --> 00:26:32.860 and have it store the new chunk of memory, where'd the old chunk of memory 00:26:32.860 --> 00:26:33.360 go? 00:26:33.360 --> 00:26:35.280 It's sort of floating there in your computer's memory. 00:26:35.280 --> 00:26:36.840 But you've lost all pointers to it. 00:26:36.840 --> 00:26:39.242 There's no arrow anymore pointing to it conceptually. 00:26:39.242 --> 00:26:41.700 So that's why we have to jump through these hoops of having 00:26:41.700 --> 00:26:43.650 a temporary variable just so that we don't 00:26:43.650 --> 00:26:45.238 lose track of things we've allocated. 00:26:45.238 --> 00:26:48.030 And we'll see this later today with another data structure as well. 00:26:48.030 --> 00:26:49.210 Yeah. 00:26:49.210 --> 00:26:51.660 AUDIENCE: Somebody asked this, but I don't 00:26:51.660 --> 00:26:56.070 understand that if you initialize temp as a pointer toward integer, 00:26:56.070 --> 00:26:58.955 then does it not create problems that you use it as an array. 00:26:58.955 --> 00:27:00.080 DAVID MALAN: Good question. 00:27:00.080 --> 00:27:02.410 If you initialize temp as a pointer to an integer, 00:27:02.410 --> 00:27:05.180 does it not create problems that you're using it as an array? 00:27:05.180 --> 00:27:09.020 Short answer, no, because again an array by definition from Week 2 00:27:09.020 --> 00:27:10.730 is just a chunk of memory. 00:27:10.730 --> 00:27:13.170 And in C you can use the square bracket notation 00:27:13.170 --> 00:27:18.000 to jump to random parts of that memory using simple arithmetic, bracket 0, 1, 00:27:18.000 --> 00:27:19.880 2, and so forth. 00:27:19.880 --> 00:27:23.000 Last week when we introduced malloc and free and now realloc, 00:27:23.000 --> 00:27:27.690 you now have a more low level way of allocating as much memory as you want. 00:27:27.690 --> 00:27:29.910 So it's a more powerful, general purpose mechanism. 00:27:29.910 --> 00:27:32.000 But at the end of the day, you're still getting back a chunk of memory, 00:27:32.000 --> 00:27:34.910 contiguous memory, bytes that are back to back to back. 00:27:34.910 --> 00:27:38.360 So you can certainly still use the square bracket notation 00:27:38.360 --> 00:27:41.660 because essentially an array is a chunk of memory. 00:27:41.660 --> 00:27:45.380 And malloc gives you a chunk of memory, ergo you can treat it as an array. 00:27:45.380 --> 00:27:47.330 They really are equivalent in that sense. 00:27:47.330 --> 00:27:50.060 You just don't get as many user friendly features as with arrays, 00:27:50.060 --> 00:27:54.440 like them being freed for you, as we never until last week 00:27:54.440 --> 00:27:56.000 do we have to free chunks of memory. 00:27:56.000 --> 00:27:59.140 Arrays do that for you automatically thanks to the compiler. 00:27:59.140 --> 00:27:59.820 Yeah. 00:27:59.820 --> 00:28:06.092 AUDIENCE: Do you not have to recreate the list for temp after line 37. 00:28:06.092 --> 00:28:06.800 DAVID MALAN: Yes. 00:28:06.800 --> 00:28:07.190 Thank you. 00:28:07.190 --> 00:28:08.148 So there is a bug here. 00:28:08.148 --> 00:28:10.985 And if I ran Valgrind on this code, I would see exactly 00:28:10.985 --> 00:28:13.760 that, that I'm leaking some number of bytes. 00:28:13.760 --> 00:28:19.640 So indeed, at the end of this program, I want to free the-- 00:28:19.640 --> 00:28:22.910 let's make sure-- yep, I want to free now 00:28:22.910 --> 00:28:26.420 the new chunk of memory, which is a size 4, to avoid 00:28:26.420 --> 00:28:28.040 exactly the problem you identified. 00:28:28.040 --> 00:28:29.810 Good catch. 00:28:29.810 --> 00:28:34.140 All right , so just for now, even if that's a lot of code, 00:28:34.140 --> 00:28:36.740 let's consider now higher level takeaways from this, 00:28:36.740 --> 00:28:41.000 just so that we can then motivates an alternative approach that allows us 00:28:41.000 --> 00:28:44.840 to stitch together somewhat fancier data structures instead. 00:28:44.840 --> 00:28:49.712 So in general, a data structure is just a programming construct in C, C++, 00:28:49.712 --> 00:28:52.670 Java, Python-- we'll see them in different languages over the remainder 00:28:52.670 --> 00:28:53.480 of the term-- 00:28:53.480 --> 00:28:57.710 that allow you to store information differently in your computer's memory. 00:28:57.710 --> 00:29:04.010 In C, everything we're about to do today is thanks to these three features of C. 00:29:04.010 --> 00:29:06.860 So even though this may feel like a lot of syntax, everything we do 00:29:06.860 --> 00:29:11.420 today boils down to three pieces of syntax that you have seen before. 00:29:11.420 --> 00:29:14.660 Struct, this recall was a keyword in C that allows 00:29:14.660 --> 00:29:16.200 you to create your own structure. 00:29:16.200 --> 00:29:19.100 For instance, a couple of weeks ago, we created a person structure, 00:29:19.100 --> 00:29:20.840 who had a name and a number. 00:29:20.840 --> 00:29:25.160 And that gives us our own data type that is structured to contain two values, 00:29:25.160 --> 00:29:27.070 like name and number. 00:29:27.070 --> 00:29:28.820 You use structures as well for the problem 00:29:28.820 --> 00:29:31.190 set involving bitmaps, the bitmap header and so forth. 00:29:31.190 --> 00:29:33.200 Those were data structures as well. 00:29:33.200 --> 00:29:35.630 What do we use the dot notation for just to be clear? 00:29:35.630 --> 00:29:37.850 And you definitely use this when manipulating 00:29:37.850 --> 00:29:39.630 red and green and blue pixels recently. 00:29:39.630 --> 00:29:40.130 Yeah. 00:29:40.130 --> 00:29:41.900 AUDIENCE: To access a property of a structure. 00:29:41.900 --> 00:29:42.320 DAVID MALAN: Exactly. 00:29:42.320 --> 00:29:44.160 To access a property of a structure. 00:29:44.160 --> 00:29:47.390 So if you want to get at a person's name or get at a person's number, 00:29:47.390 --> 00:29:50.090 you use the variable's name and then a dot operator 00:29:50.090 --> 00:29:51.760 to get inside of that data structure. 00:29:51.760 --> 00:29:52.970 So we've seen that before. 00:29:52.970 --> 00:29:56.487 Then last week and again today, we see the star operator, 00:29:56.487 --> 00:29:59.070 which can kind of mean different things in different contexts. 00:29:59.070 --> 00:30:02.120 But it's always related here to memory as of last week. 00:30:02.120 --> 00:30:05.930 This is the dereference operator that allows you to go to a chunk of memory 00:30:05.930 --> 00:30:07.820 by way of this thing called a pointer. 00:30:07.820 --> 00:30:10.160 So even if today feels like a bit of that fire hose-- 00:30:10.160 --> 00:30:12.320 and again per my email, this is where things now 00:30:12.320 --> 00:30:16.850 begin to level off-- realize that it all boils down to first principles, 00:30:16.850 --> 00:30:19.970 or if you will sort of three scratch-like puzzle pieces 00:30:19.970 --> 00:30:23.760 that we're now going to assemble into more interesting solutions to problems. 00:30:23.760 --> 00:30:26.930 So allow me to introduce something called a linked list. 00:30:26.930 --> 00:30:30.980 A linked list, as we'll see is going to allow you to store a list of values. 00:30:30.980 --> 00:30:33.410 Now an array allows you to store a list of values. 00:30:33.410 --> 00:30:36.020 But what are some of the downsides with an array? 00:30:36.020 --> 00:30:38.480 Well, an array is a fixed chunk of memory. 00:30:38.480 --> 00:30:40.730 And if you want to resize that array to add more 00:30:40.730 --> 00:30:43.130 values to it, what do you have to do? 00:30:43.130 --> 00:30:45.710 Well, you minimally have to allocate more memory. 00:30:45.710 --> 00:30:48.740 You need to copy all of those values from old to new. 00:30:48.740 --> 00:30:50.870 And then you can go about your business. 00:30:50.870 --> 00:30:53.600 Now, realloc is a function that makes that a little simpler. 00:30:53.600 --> 00:30:56.090 But realloc is doing the exact same legwork 00:30:56.090 --> 00:30:59.540 that I was doing between the lockers of copying value, freeing memory, 00:30:59.540 --> 00:31:00.180 and so forth. 00:31:00.180 --> 00:31:01.138 So it needs to be done. 00:31:01.138 --> 00:31:04.850 And that's why insertion into an array is going to be big O of n, 00:31:04.850 --> 00:31:07.160 because it might take you that much time to copy 00:31:07.160 --> 00:31:09.303 the whole array into a new space. 00:31:09.303 --> 00:31:10.970 So that feels kind of suboptimal, right? 00:31:10.970 --> 00:31:12.870 Arrays can be slow in that sense. 00:31:12.870 --> 00:31:14.960 But what was the appeal of an array? 00:31:14.960 --> 00:31:16.220 Like what's good about arrays? 00:31:16.220 --> 00:31:17.930 Because we don't want to abandon them entirely. 00:31:17.930 --> 00:31:18.150 Yeah. 00:31:18.150 --> 00:31:20.150 AUDIENCE: You can index into them really easily. 00:31:20.150 --> 00:31:22.760 DAVID MALAN: You can index into them really easily, not only syntactically 00:31:22.760 --> 00:31:25.700 with the square brackets, but you have constant time access-- 00:31:25.700 --> 00:31:27.103 this is known as random access. 00:31:27.103 --> 00:31:30.020 And it's not random in the sense that you just end up who knows where. 00:31:30.020 --> 00:31:33.520 You can just jump to bracket 0 or 1 or 2 instantly. 00:31:33.520 --> 00:31:36.270 It took me, the human, more time because I had to physically walk. 00:31:36.270 --> 00:31:40.370 But a computer is going to be able to jump to 0, 1, 2, 3 instantly. 00:31:40.370 --> 00:31:42.392 And so arrays are super fast. 00:31:42.392 --> 00:31:44.600 And they lend themselves to things like binary search 00:31:44.600 --> 00:31:46.710 as we've seen now for some time. 00:31:46.710 --> 00:31:49.430 But what if we use the canvas that is our computer's 00:31:49.430 --> 00:31:51.430 memory like a little more cleverly? 00:31:51.430 --> 00:31:54.680 We don't have to just plop things next to each other, next to each other, next 00:31:54.680 --> 00:31:56.597 to each other, and then hope for the best hope 00:31:56.597 --> 00:31:58.777 that there's still more memory back to back to back. 00:31:58.777 --> 00:32:00.860 What if we instead are a bit more clever about it? 00:32:00.860 --> 00:32:03.200 And suppose we want to store the number 1. 00:32:03.200 --> 00:32:05.930 And that happens to be an address 0x123. 00:32:05.930 --> 00:32:06.620 It's arbitrary. 00:32:06.620 --> 00:32:09.470 But recall from last week that every byte of memory in your computer 00:32:09.470 --> 00:32:10.340 is stored somewhere. 00:32:10.340 --> 00:32:13.700 So let's propose that 1 is stored at 0x123. 00:32:13.700 --> 00:32:17.270 Suppose now that this represents an array of size 1 00:32:17.270 --> 00:32:20.300 and you want to add a second value to this array. 00:32:20.300 --> 00:32:22.640 Or let's start calling things more generally a list. 00:32:22.640 --> 00:32:25.160 A list like in the real world is just a list of values. 00:32:25.160 --> 00:32:26.583 This list is of size 1. 00:32:26.583 --> 00:32:29.750 Now maybe there's a lot of EMMAs in this memory that are getting in the way. 00:32:29.750 --> 00:32:33.410 But suppose that there is some free space a little lower in your computer's 00:32:33.410 --> 00:32:34.490 memory over there. 00:32:34.490 --> 00:32:36.360 So it's not here. 00:32:36.360 --> 00:32:37.670 It's not here. 00:32:37.670 --> 00:32:39.647 It's not available here or here or here. 00:32:39.647 --> 00:32:40.730 There's other stuff there. 00:32:40.730 --> 00:32:43.670 But suppose the computer does have some available memory over here 00:32:43.670 --> 00:32:46.090 in which you can store the number two, just because. 00:32:46.090 --> 00:32:48.590 And that address happens to be 456. 00:32:48.590 --> 00:32:50.232 Finally, you want to store third value. 00:32:50.232 --> 00:32:52.940 And it turns out that the nearest possible location is down here, 00:32:52.940 --> 00:32:53.720 number 3. 00:32:53.720 --> 00:32:57.320 That happens to be at address 0x789. 00:32:57.320 --> 00:33:01.250 So this is not an array by definition, because the 1, the 2, the 3 00:33:01.250 --> 00:33:03.560 are not contiguous back to back to back. 00:33:03.560 --> 00:33:06.980 You cannot square bracket notation here because square bracket notation 00:33:06.980 --> 00:33:11.520 requires, per Week 2, that all of your values be next to each other, 00:33:11.520 --> 00:33:14.330 just like the lockers here. 00:33:14.330 --> 00:33:18.240 This picture, where 1 is over here, 2 is over here, 3 is over here is more like, 00:33:18.240 --> 00:33:20.090 oh, maybe this is 0x123. 00:33:20.090 --> 00:33:22.070 Maybe this is 0x456. 00:33:22.070 --> 00:33:23.652 Maybe this is 0x789. 00:33:23.652 --> 00:33:25.110 They're kind of all over the place. 00:33:25.110 --> 00:33:28.520 And that's just because that's what's available in your computer's memory. 00:33:28.520 --> 00:33:31.910 But what if I get a little extravagant and I 00:33:31.910 --> 00:33:35.090 start to use, not just one chunk of memory 00:33:35.090 --> 00:33:38.180 to store each value, like 1, 2, 3, what if I go ahead 00:33:38.180 --> 00:33:42.590 and give myself twice as much memory just to give myself some flexibility? 00:33:42.590 --> 00:33:46.730 So I now conceptual use this chunk of memory to represent one. 00:33:46.730 --> 00:33:49.227 This junk to represent 2, this chunk to represent 3. 00:33:49.227 --> 00:33:52.310 But you know what I'm going to use the latter half of each of those chunks 00:33:52.310 --> 00:33:54.108 for? 00:33:54.108 --> 00:33:54.650 Any thoughts? 00:33:54.650 --> 00:33:56.290 AUDIENCE: Address to the next. 00:33:56.290 --> 00:33:58.480 DAVID MALAN: An address to the next chunk of memory. 00:33:58.480 --> 00:34:01.390 So, for instance, if my goal is to keep this list sorted, 00:34:01.390 --> 00:34:05.080 so I want conceptually to have a list that stores 1, 2, 3, 00:34:05.080 --> 00:34:08.620 why don't I use this as sort of a map or a breadcrumb, if you will, 00:34:08.620 --> 00:34:11.350 that points to the next chunk of memory? 00:34:11.350 --> 00:34:14.260 And why don't I use this chunk of memory to point at the next one? 00:34:14.260 --> 00:34:17.350 And then this chunk of memory, you know what, I just need a special value here. 00:34:17.350 --> 00:34:19.400 What would be a good arbitrary way to say, mm, mm, 00:34:19.400 --> 00:34:20.775 there's nothing more in the list? 00:34:20.775 --> 00:34:21.400 AUDIENCE: Null. 00:34:21.400 --> 00:34:23.067 DAVID MALAN: It's something called null. 00:34:23.067 --> 00:34:26.469 And this technically is different from backslash 0, which is a char. 00:34:26.469 --> 00:34:28.090 This is something called-- 00:34:28.090 --> 00:34:30.380 well, this is in hexadecimal 0. 00:34:30.380 --> 00:34:33.010 Now, starting today-- and we saw the super briefly last week-- 00:34:33.010 --> 00:34:37.150 this is n-u-l-l with two L's-- this was stupid left hand not really talking 00:34:37.150 --> 00:34:40.960 to right hand-- n-u-l is backslash 0, which is a char. 00:34:40.960 --> 00:34:43.909 n-u-l-l is a pointer. 00:34:43.909 --> 00:34:46.989 But they both equal 0 underneath the hood. 00:34:46.989 --> 00:34:50.020 So you just store special value that says that's it for the list. 00:34:50.020 --> 00:34:53.469 Now, we last week I proposed who really cares where things are in memory? 00:34:53.469 --> 00:34:54.909 So indeed, let's do that again. 00:34:54.909 --> 00:34:58.420 Let's just use pointers drawn as arrows in this artist's rendition 00:34:58.420 --> 00:35:02.200 to say this list of numbers, 1 2, 3, is now linked. 00:35:02.200 --> 00:35:06.760 A linked list is just a data structure containing multiple chunks of memory 00:35:06.760 --> 00:35:08.475 that are somehow linked together. 00:35:08.475 --> 00:35:11.350 And if underneath the hood, so to speak, they're just linked together 00:35:11.350 --> 00:35:14.830 by way of pointers, and the price we pay is 00:35:14.830 --> 00:35:18.820 that rather than now in a linked list storing just the numbers 1, 2, 3, which 00:35:18.820 --> 00:35:22.990 we could have in an array, now you have to store twice as much information, 1, 00:35:22.990 --> 00:35:27.700 2, 3, as well as three pointers, two of which are in use, the other of which 00:35:27.700 --> 00:35:31.370 is ready to go if I want to add something to this list. 00:35:31.370 --> 00:35:34.120 So this is to say we can now create structures 00:35:34.120 --> 00:35:36.190 that look like this in the computer's memory 00:35:36.190 --> 00:35:39.085 just by using this new feature of pointers. 00:35:39.085 --> 00:35:41.210 What might these structures look like individually? 00:35:41.210 --> 00:35:43.600 Well, any one of these numbers has two fields it seems. 00:35:43.600 --> 00:35:44.870 One is an integer. 00:35:44.870 --> 00:35:45.880 We'll call it number. 00:35:45.880 --> 00:35:47.890 And then there's another field here. 00:35:47.890 --> 00:35:51.098 That let's call it next by convention, but we could call it anything we want. 00:35:51.098 --> 00:35:53.140 It's just another chunk of memory that's pointing 00:35:53.140 --> 00:35:54.910 to the next element in the list. 00:35:54.910 --> 00:35:57.238 Well a couple of weeks ago, we introduced persons. 00:35:57.238 --> 00:35:58.780 And a person had a name and a number. 00:35:58.780 --> 00:36:01.630 That's not relevant today, because we're not dealing with names and numbers. 00:36:01.630 --> 00:36:03.040 We're just dealing with integers. 00:36:03.040 --> 00:36:06.670 So let me propose that we back that up and still use the same syntax 00:36:06.670 --> 00:36:08.150 as a couple of weeks ago. 00:36:08.150 --> 00:36:12.430 But instead of defining a person, let's call this rectangle a node. 00:36:12.430 --> 00:36:15.700 So this is a term of art in computer science node-- n-o-d-e-- 00:36:15.700 --> 00:36:20.860 just represents this rectangular concept, a chunk of memory 00:36:20.860 --> 00:36:22.690 that you're using for interesting purposes. 00:36:22.690 --> 00:36:25.930 It's sort of a node in a graph if familiar from math. 00:36:25.930 --> 00:36:27.795 But what do I want this know to store? 00:36:27.795 --> 00:36:30.170 Well, let me go ahead and store a couple of things in it. 00:36:30.170 --> 00:36:32.253 One, a number, and that's just going to be an int. 00:36:32.253 --> 00:36:34.300 And I'm going to go ahead and call it number. 00:36:34.300 --> 00:36:39.630 And then any guesses as to what the second field should be declared as? 00:36:39.630 --> 00:36:42.400 I want to call it next just because it's conventional. 00:36:42.400 --> 00:36:44.938 What should its data type be? 00:36:44.938 --> 00:36:45.480 Any thoughts? 00:36:45.480 --> 00:36:45.880 Yeah in back. 00:36:45.880 --> 00:36:46.780 AUDIENCE: A pointer. 00:36:46.780 --> 00:36:47.738 DAVID MALAN: A pointer. 00:36:47.738 --> 00:36:49.650 And a pointer to what would you say? 00:36:49.650 --> 00:36:50.970 AUDIENCE: The next number. 00:36:50.970 --> 00:36:54.850 DAVID MALAN: A pointer to the next number, and not quite the next number 00:36:54.850 --> 00:36:58.940 per se, because now we have not numbers only, we now have nodes. 00:36:58.940 --> 00:37:02.842 So those three yellow boxes, 1 2, 3, those are now nodes, I would say. 00:37:02.842 --> 00:37:03.550 So you know what? 00:37:03.550 --> 00:37:06.490 Let's go ahead and call this node star. 00:37:06.490 --> 00:37:08.660 But you can't technically quite do this. 00:37:08.660 --> 00:37:11.800 It turns out that C, recall, takes you super literally. 00:37:11.800 --> 00:37:13.880 And notice, if you read this code top to bottom, 00:37:13.880 --> 00:37:15.820 left to right, at which point in the story 00:37:15.820 --> 00:37:18.660 does the word node come into existence? 00:37:18.660 --> 00:37:20.715 Like not until the very last line. 00:37:20.715 --> 00:37:22.090 That's where we mentioned person. 00:37:22.090 --> 00:37:23.530 This is where we mentioned node. 00:37:23.530 --> 00:37:26.980 So, unfortunately, you can't use a node inside of a node, 00:37:26.980 --> 00:37:29.500 because it literally doesn't exist in the computer's mind 00:37:29.500 --> 00:37:31.118 until two lines later. 00:37:31.118 --> 00:37:32.410 So there's an alternative here. 00:37:32.410 --> 00:37:33.415 There is a solution. 00:37:33.415 --> 00:37:34.540 It's a little more verbose. 00:37:34.540 --> 00:37:39.340 Instead of just saying typedef struct, you actually say typedef struct node. 00:37:39.340 --> 00:37:41.360 Just add the word that you want to use. 00:37:41.360 --> 00:37:45.897 And now down here, you can say struct node star next. 00:37:45.897 --> 00:37:47.230 It's kind of like a work around. 00:37:47.230 --> 00:37:48.520 This is the way C works. 00:37:48.520 --> 00:37:50.350 But this is the exact same idea. 00:37:50.350 --> 00:37:54.460 This means that any node in the data structure, any yellow rectangular box 00:37:54.460 --> 00:37:56.440 has a number and a pointer. 00:37:56.440 --> 00:38:01.360 And that pointer by definition is a pointer to a node structure. 00:38:01.360 --> 00:38:03.610 We just have to express it more verbosely 00:38:03.610 --> 00:38:07.360 here, because the shorthand notation node doesn't exist until the bottom. 00:38:07.360 --> 00:38:11.260 It's just sort of an annoying reality of the syntax. 00:38:11.260 --> 00:38:14.950 All right, any questions on that definition of struct? 00:38:14.950 --> 00:38:16.150 Yeah. 00:38:16.150 --> 00:38:18.630 AUDIENCE: Do you have to put node twice? 00:38:18.630 --> 00:38:20.590 DAVID MALAN: Do you have to put node twice? 00:38:20.590 --> 00:38:22.360 You don't have to put node twice. 00:38:22.360 --> 00:38:25.780 You can actually use any word here you want. 00:38:25.780 --> 00:38:27.500 You can call this x or y or z. 00:38:27.500 --> 00:38:30.250 But then you're going to have to make this be struct x or struct y 00:38:30.250 --> 00:38:31.240 or struct z. 00:38:31.240 --> 00:38:33.800 By convention, I would just reuse the same term. 00:38:33.800 --> 00:38:35.830 So this is the formal name for this structure. 00:38:35.830 --> 00:38:37.670 It is a struct node. 00:38:37.670 --> 00:38:41.390 This is the nickname for the structure, more succinctly, node. 00:38:41.390 --> 00:38:42.610 And that's what typedef does. 00:38:42.610 --> 00:38:46.240 It gives you an alias from struct node to just node, 00:38:46.240 --> 00:38:49.660 just because it's easier to type elsewhere in the program. 00:38:49.660 --> 00:38:52.060 Other questions? 00:38:52.060 --> 00:38:54.770 All right, so what can we do now with this structure? 00:38:54.770 --> 00:38:56.860 Well, let's go ahead and build something up here. 00:38:56.860 --> 00:38:59.850 All right, so this is about as scary as the code gets today. 00:38:59.850 --> 00:39:02.380 We'll focus primarily on pictures and concepts hereafter. 00:39:02.380 --> 00:39:04.690 But let's take a tour through one implementation 00:39:04.690 --> 00:39:06.370 of this same idea of a linked list. 00:39:06.370 --> 00:39:09.520 How would we go about representing a linked list initially? 00:39:09.520 --> 00:39:11.420 Well, initially the list is empty. 00:39:11.420 --> 00:39:13.587 And if you want to represent something that's empty, 00:39:13.587 --> 00:39:14.830 we minimally need something. 00:39:14.830 --> 00:39:17.320 So let me draw it as this empty box. 00:39:17.320 --> 00:39:19.660 And this is just a pointer to a node, I claim. 00:39:19.660 --> 00:39:23.100 So how do I implement the notion of a linked list that has no numbers in it 00:39:23.100 --> 00:39:23.800 yet? 00:39:23.800 --> 00:39:26.810 Well, why don't we just use this, which I can implement as follows. 00:39:26.810 --> 00:39:30.550 node star, and I'm going to call it list, but then set it equal to NULL. 00:39:30.550 --> 00:39:33.550 Right, if there's no numbers available-- there's no 1, there's no 2, 00:39:33.550 --> 00:39:34.450 there's no three-- 00:39:34.450 --> 00:39:38.710 I should at least have a variable that connotes there is no list. 00:39:38.710 --> 00:39:42.490 And the easiest way to do that is in the absence of a value, store 0, 00:39:42.490 --> 00:39:45.940 which has this new nickname as of last week and this, called null. 00:39:45.940 --> 00:39:49.930 So this variable here represents this picture here. 00:39:49.930 --> 00:39:52.780 And notice, there's no numbers, because the list is empty. 00:39:52.780 --> 00:39:55.090 But we do initialize it to NULL so that we 00:39:55.090 --> 00:39:56.920 don't think that there's an arrow pointing 00:39:56.920 --> 00:39:58.510 to some specific chunk of memory yet. 00:39:58.510 --> 00:39:59.860 Because there isn't yet. 00:39:59.860 --> 00:40:03.370 Now, suppose I want to go ahead and insert a number into this list. 00:40:03.370 --> 00:40:05.470 Suppose I want to insert the number 2. 00:40:05.470 --> 00:40:08.350 I can't just allocate space for 2 now. 00:40:08.350 --> 00:40:11.200 I have to allocate space for 2 and that pointer, 00:40:11.200 --> 00:40:14.817 otherwise known together as a node, per the previous slide. 00:40:14.817 --> 00:40:16.150 So how do I go about doing this? 00:40:16.150 --> 00:40:19.182 Well, in code, I can borrow the same technique 00:40:19.182 --> 00:40:21.640 that we've used a couple times now, even though it's uglier 00:40:21.640 --> 00:40:24.985 than some past approaches, malloc then an integer. 00:40:24.985 --> 00:40:26.110 How many bytes do you want? 00:40:26.110 --> 00:40:27.402 I don't know how big a node is. 00:40:27.402 --> 00:40:31.360 I could probably do the math and add up the integer and then the pointer. 00:40:31.360 --> 00:40:34.720 But, you know what, size of node is just going to answer that question for me. 00:40:34.720 --> 00:40:38.300 So this returns that chunk of memory that's big enough to store a node. 00:40:38.300 --> 00:40:40.300 And I'm going to store that just for temporarily 00:40:40.300 --> 00:40:43.330 in a variable called n, n for node, and that's going to just 00:40:43.330 --> 00:40:45.330 be a temporary variable, if you will. 00:40:45.330 --> 00:40:47.830 So, again, even though there's some new stuff going on here, 00:40:47.830 --> 00:40:48.872 this is just like before. 00:40:48.872 --> 00:40:50.870 Previously, I wanted to allocate an integer. 00:40:50.870 --> 00:40:52.510 Now, I want more than an integer. 00:40:52.510 --> 00:40:53.830 I want an actual node. 00:40:53.830 --> 00:40:57.640 And malloc returns an address, which means I must assign it to a variable. 00:40:57.640 --> 00:40:59.392 That's an address on the left hand side. 00:40:59.392 --> 00:41:00.850 All right, what should I always do? 00:41:00.850 --> 00:41:03.290 Slight spoiler because I clicked ahead a moment ago-- 00:41:03.290 --> 00:41:05.440 actually, we're going to forge ahead here. 00:41:05.440 --> 00:41:07.840 This is the ugliest thing we'll see. 00:41:07.840 --> 00:41:10.180 What is this second line of code doing here? 00:41:13.330 --> 00:41:15.280 What's going on here do you think? 00:41:15.280 --> 00:41:16.280 Yeah, what do you think? 00:41:16.280 --> 00:41:18.898 AUDIENCE: It's setting the number of that node to 2. 00:41:18.898 --> 00:41:19.690 DAVID MALAN: It is. 00:41:19.690 --> 00:41:21.740 It's setting the number of that node to 2. 00:41:21.740 --> 00:41:24.640 But why this crazy syntax, which we've never used before? 00:41:24.640 --> 00:41:26.650 Well, star n, we did see last week. 00:41:26.650 --> 00:41:28.180 That just means go there. 00:41:28.180 --> 00:41:30.850 The parentheses are just necessary for order of operations 00:41:30.850 --> 00:41:33.517 so that the compiler knows, OK, first go there. 00:41:33.517 --> 00:41:36.100 And then once you're there, what do you want to get access to? 00:41:36.100 --> 00:41:36.960 The number field. 00:41:36.960 --> 00:41:38.470 So use the same dot notation. 00:41:38.470 --> 00:41:39.760 So it's super ugly. 00:41:39.760 --> 00:41:43.070 But it's just doing two different things that we've seen in isolation. 00:41:43.070 --> 00:41:45.520 Go to the address in n, which is that chunk of memory. 00:41:45.520 --> 00:41:48.820 And then access the number field and set it equal to 2. 00:41:48.820 --> 00:41:53.337 Fortunately, C has some syntactic sugar, just an easier, prettier way 00:41:53.337 --> 00:41:53.920 of doing this. 00:41:53.920 --> 00:41:57.003 And it happens to look wonderfully like the actual thing we keep drawing-- 00:41:57.003 --> 00:41:58.690 this arrow notation. 00:41:58.690 --> 00:42:01.760 So if you ever see and you ever write this notation in C-- 00:42:01.760 --> 00:42:05.050 and I'm pretty sure this is the last new syntax we'll see-- 00:42:05.050 --> 00:42:07.990 this arrow, this very sort of hackish era 00:42:07.990 --> 00:42:10.660 where you hit a hyphen and then a greater than sign, 00:42:10.660 --> 00:42:12.740 this means the exact same thing as this. 00:42:12.740 --> 00:42:13.990 This is just annoying to type. 00:42:13.990 --> 00:42:15.040 It's ugly to look at. 00:42:15.040 --> 00:42:16.457 This is just slightly more pretty. 00:42:16.457 --> 00:42:18.820 And frankly, it's reminiscent of the pictures 00:42:18.820 --> 00:42:21.370 we've been drawing with the arrows pointing left and right. 00:42:21.370 --> 00:42:22.828 What's the next thing I want to do? 00:42:22.828 --> 00:42:25.120 After allocating this new node for the number 2, 00:42:25.120 --> 00:42:28.414 what do I want to put as well in that node? 00:42:28.414 --> 00:42:29.790 AUDIENCE: Put in the address. 00:42:29.790 --> 00:42:31.290 DAVID MALAN: Sorry, a little louder. 00:42:31.290 --> 00:42:32.650 AUDIENCE: The next address. 00:42:32.650 --> 00:42:33.880 DAVID MALAN: The address of the next node. 00:42:33.880 --> 00:42:35.170 But there is no next node yet. 00:42:35.170 --> 00:42:36.790 So what value could I use as a placeholder? 00:42:36.790 --> 00:42:37.540 AUDIENCE: Null. 00:42:37.540 --> 00:42:37.960 DAVID MALAN: Null. 00:42:37.960 --> 00:42:40.490 And so indeed, I'm going to do this arrow notation as well. 00:42:40.490 --> 00:42:43.420 You never need to do the star and then the dots and the parentheses. 00:42:43.420 --> 00:42:46.280 Everyone just writes the code like this in the real world. 00:42:46.280 --> 00:42:48.970 So n arrow next gets null. 00:42:48.970 --> 00:42:52.650 That now gives me that picture we were drawing. 00:42:52.650 --> 00:42:56.170 But, again, sanity check, if you ever use malloc, 00:42:56.170 --> 00:42:58.240 you should always check the return value. 00:42:58.240 --> 00:43:00.370 So just to be super precise, let me go ahead 00:43:00.370 --> 00:43:04.720 and add a couple more lines of code that just check if n is not null, 00:43:04.720 --> 00:43:06.400 go ahead and do the following. 00:43:06.400 --> 00:43:09.550 Conversely, I could check if n is null and then just exit or return 00:43:09.550 --> 00:43:11.350 depending on where I'm using this code. 00:43:11.350 --> 00:43:14.860 But you don't want to touch n and use this arrow notation 00:43:14.860 --> 00:43:17.958 unless you're sure n is not null. 00:43:17.958 --> 00:43:19.000 So what have I just done? 00:43:19.000 --> 00:43:20.970 My picture now looks like this. 00:43:20.970 --> 00:43:22.720 But this, of course, is not a linked list, 00:43:22.720 --> 00:43:24.670 because there's no linkage going on. 00:43:24.670 --> 00:43:27.520 I really need to do the equivalent of pointing an arrow 00:43:27.520 --> 00:43:30.340 from this pointer to this structure. 00:43:30.340 --> 00:43:32.870 I need to implement an arrow that looks like this. 00:43:32.870 --> 00:43:34.933 So how can we go about implementing that in code? 00:43:34.933 --> 00:43:37.600 Well, let me propose that this is what it ultimately looks like. 00:43:37.600 --> 00:43:39.280 We just need to draw that arrow. 00:43:39.280 --> 00:43:40.790 How do I do that? 00:43:40.790 --> 00:43:42.700 Well, it's as simple as this. 00:43:42.700 --> 00:43:47.200 If list is a variable, and it's previously initialized to null-- 00:43:47.200 --> 00:43:48.340 it's just a place holder-- 00:43:48.340 --> 00:43:53.380 and n is my temporary variable storing the new node, it suffices to say, 00:43:53.380 --> 00:43:55.420 well, lists should not be null anymore. 00:43:55.420 --> 00:43:59.740 It should literally equal the address of that chunk of memory I just allocated. 00:43:59.740 --> 00:44:03.648 And that's how we get this picture now inside of the computer. 00:44:03.648 --> 00:44:05.440 Now, let me do a couple of more operations. 00:44:05.440 --> 00:44:07.570 Suppose I want to add to the list the number 4. 00:44:07.570 --> 00:44:08.860 How do I add the number 4? 00:44:08.860 --> 00:44:10.990 Well, the number 4 is inside of its own node. 00:44:10.990 --> 00:44:13.030 So I have to go back to code like this. 00:44:13.030 --> 00:44:16.348 I need to allocate another node that installs the number 4 there. 00:44:16.348 --> 00:44:17.140 But that's not all. 00:44:17.140 --> 00:44:18.970 You don't want to just create the node, because it's 00:44:18.970 --> 00:44:21.120 otherwise out there in no man's land, so to speak. 00:44:21.120 --> 00:44:22.670 We now need to add the arrow. 00:44:22.670 --> 00:44:26.700 But now it gets a little non-obvious how you update the arrows, 00:44:26.700 --> 00:44:28.450 right, because I don't want to update list 00:44:28.450 --> 00:44:32.170 to point at 4, because that's going to sort of orphan, so to speak, number 2. 00:44:32.170 --> 00:44:34.090 And it just kind of float away conceptually. 00:44:34.090 --> 00:44:38.140 I really want to update 2's pointer to 4. 00:44:38.140 --> 00:44:39.015 So how can I do that? 00:44:39.015 --> 00:44:41.973 Well, you know what I can do is I can kind of follow these breadcrumbs. 00:44:41.973 --> 00:44:43.480 If I declare a temporary pointer-- 00:44:43.480 --> 00:44:46.300 and I'll do it using a little extravagantly last week 00:44:46.300 --> 00:44:50.500 like this little pointer notation-- if I'm a variable called temp, TMP, 00:44:50.500 --> 00:44:54.130 I can go ahead and point at the same thing that list is pointing at. 00:44:54.130 --> 00:44:58.040 And I'm going to check is this next value null? 00:44:58.040 --> 00:44:59.750 If it is, I found the end of the list. 00:44:59.750 --> 00:45:01.390 So really I can follow that arrow. 00:45:01.390 --> 00:45:03.460 Now, I know that I'm at a null pointer. 00:45:03.460 --> 00:45:05.825 So now, I just want to draw this number up here. 00:45:05.825 --> 00:45:07.450 And I accidentally advanced the screen. 00:45:07.450 --> 00:45:10.460 I want to actually draw this arrow up here. 00:45:10.460 --> 00:45:12.310 So how do we go about doing that? 00:45:15.770 --> 00:45:18.860 Well, the code there might look like this. 00:45:18.860 --> 00:45:23.710 So if all I want to point at a node, as I just did with the big fuzzy hand, 00:45:23.710 --> 00:45:27.280 I can just initialize this pointer to equal whatever the list itself 00:45:27.280 --> 00:45:28.305 is pointing at. 00:45:28.305 --> 00:45:29.680 Then, I can do like a while loop. 00:45:29.680 --> 00:45:33.010 And it's a little weird looking, because I'm using some of my new syntax. 00:45:33.010 --> 00:45:37.060 But this is just asking the question, while the next field I'm pointing at 00:45:37.060 --> 00:45:40.570 is not NULL, go ahead and follow it. 00:45:40.570 --> 00:45:43.480 So again, this is as complicated as the syntax today will get. 00:45:43.480 --> 00:45:46.810 But this is just saying, whatever I'm pointing at point specifically 00:45:46.810 --> 00:45:48.160 at the next field. 00:45:48.160 --> 00:45:51.640 While it is not NULL, go ahead and update yourself 00:45:51.640 --> 00:45:54.440 to point at whatever it is pointing at. 00:45:54.440 --> 00:45:56.560 So if I advance to the next slide here, this 00:45:56.560 --> 00:45:58.660 is like I'm initially pointing at 2. 00:45:58.660 --> 00:45:59.318 I see an arrow. 00:45:59.318 --> 00:46:00.610 I'm going to follow that arrow. 00:46:00.610 --> 00:46:01.902 I'm going to follow that arrow. 00:46:01.902 --> 00:46:07.000 So however big the list is I just keep moving my temporary pointer to follow 00:46:07.000 --> 00:46:09.340 these breadcrumbs until I hit NULL. 00:46:09.340 --> 00:46:12.212 So here in the story let me propose that we add another number, 5. 00:46:12.212 --> 00:46:14.920 And 5, of course, if we keep it sorted, it's got to go over here. 00:46:14.920 --> 00:46:16.750 And again, they're all over my computer's memory. 00:46:16.750 --> 00:46:18.030 They're not in a perfectly straight line, 00:46:18.030 --> 00:46:20.030 because who knows where there's available space. 00:46:20.030 --> 00:46:24.480 But now that I found this, I want to go ahead and create one more pointer using 00:46:24.480 --> 00:46:28.180 code very similar to what we just saw. 00:46:28.180 --> 00:46:31.380 But now lastly, let's do one more here at the beginning 00:46:31.380 --> 00:46:34.350 and then one more in the middle and see what can go wrong. 00:46:34.350 --> 00:46:37.200 What is worrisome about 1 if we actually want 00:46:37.200 --> 00:46:40.680 to store this list in sorted order? 00:46:40.680 --> 00:46:43.500 What might I be mindful of now if the goal 00:46:43.500 --> 00:46:46.860 is to insert 1 into this linked list? 00:46:46.860 --> 00:46:47.520 Any thoughts? 00:46:50.915 --> 00:46:52.040 What do I want to do first? 00:46:52.040 --> 00:46:54.080 Well, you know what, let me go ahead and just 00:46:54.080 --> 00:46:56.750 point-- you know what, it's obviously got to go to the start of the list 00:46:56.750 --> 00:46:59.167 if I want to keep it sorted, so that the arrows eventually 00:46:59.167 --> 00:47:00.270 go from left to right. 00:47:00.270 --> 00:47:03.530 So let me go ahead and just use code like this to allocate the new node. 00:47:03.530 --> 00:47:07.300 And let me go ahead and just move that arrow like this. 00:47:07.300 --> 00:47:10.720 This is wrong even though we've not seen the code for it. 00:47:10.720 --> 00:47:12.490 But why is this wrong? 00:47:12.490 --> 00:47:12.990 Yeah. 00:47:12.990 --> 00:47:15.570 AUDIENCE: You're orphaning 2, 4, and 5. 00:47:15.570 --> 00:47:17.070 DAVID MALAN: I'm orphaning 2, 4, 5. 00:47:17.070 --> 00:47:18.100 In what sense? 00:47:18.100 --> 00:47:22.800 I mean literally in my program, the only variables and the variables 00:47:22.800 --> 00:47:24.870 I have are those you see on the board here. 00:47:24.870 --> 00:47:27.570 So if nothing is pointing at 2 anymore, it 00:47:27.570 --> 00:47:30.330 doesn't matter that 2 is pointing at 4 and 4 is pointing at 5, 00:47:30.330 --> 00:47:33.720 we have orphaned 2 and transitively 4 and 5. 00:47:33.720 --> 00:47:35.220 So those are just lost. 00:47:35.220 --> 00:47:36.480 That is a memory leak. 00:47:36.480 --> 00:47:39.545 If you recall using Valgrind and getting yelled at by Valgrind 00:47:39.545 --> 00:47:42.170 because you're leaking memory, it might be because, yes, you've 00:47:42.170 --> 00:47:43.470 forgotten to free memory. 00:47:43.470 --> 00:47:46.260 Or worse, you might have completely forgotten where 00:47:46.260 --> 00:47:48.300 the memory is that you were using. 00:47:48.300 --> 00:47:54.042 And by definition of your own code, you can never access that memory again. 00:47:54.042 --> 00:47:57.000 You've asked the computer for it, but you're never able to give it back 00:47:57.000 --> 00:47:59.820 because you have no variable remembering where it is. 00:47:59.820 --> 00:48:01.210 So we don't want to do that. 00:48:01.210 --> 00:48:03.360 We instead want to do this probably. 00:48:03.360 --> 00:48:07.260 Let's point 1 to 2 first, which is kind of redundant, right? 00:48:07.260 --> 00:48:09.660 Now, we have sort of conflicting beginnings of the list. 00:48:09.660 --> 00:48:12.590 But once 1 is pointing to 2, what can your next update? 00:48:12.590 --> 00:48:13.470 AUDIENCE: The list. 00:48:13.470 --> 00:48:15.840 DAVID MALAN: List to point at 1. 00:48:15.840 --> 00:48:19.020 And you can do this in code if you'd like really with just two steps. 00:48:19.020 --> 00:48:23.190 You can update the next field of your new node, which 00:48:23.190 --> 00:48:25.290 is the one representing 1 that I just allocated, 00:48:25.290 --> 00:48:28.990 and you can initialize it to point at whatever list is pointing at. 00:48:28.990 --> 00:48:31.860 So if you want this thing to point at the same thing 00:48:31.860 --> 00:48:34.170 that this thing was pointing at, you literally just 00:48:34.170 --> 00:48:38.580 say in code n arrow x equals whatever list is pointing at 00:48:38.580 --> 00:48:42.770 and then you say the list should equal n itself. 00:48:42.770 --> 00:48:44.520 And again, you'll see in section this week 00:48:44.520 --> 00:48:46.770 and in the upcoming problem set actual opportunities 00:48:46.770 --> 00:48:48.510 to apply these kinds of lines of code. 00:48:48.510 --> 00:48:50.430 But those are the kinds of thought processes 00:48:50.430 --> 00:48:51.750 that you should be mindful of. 00:48:51.750 --> 00:48:53.790 Now, 3 is the only one that's particularly annoying. 00:48:53.790 --> 00:48:55.290 And we won't look at the code for this. 00:48:55.290 --> 00:48:58.240 If we actually want to put something in sorted order in the middle of the list, 00:48:58.240 --> 00:49:00.490 let's just consider conceptually what's got to happen. 00:49:00.490 --> 00:49:02.610 We've got to allocate memory for the node. 00:49:02.610 --> 00:49:03.960 We then need to update what? 00:49:03.960 --> 00:49:06.900 We probably don't want to point 2 at 3 for the exact same reason 00:49:06.900 --> 00:49:07.590 you identified. 00:49:07.590 --> 00:49:08.820 We then orphan 4 and 5. 00:49:08.820 --> 00:49:11.250 So what should we update first conceptually? 00:49:11.250 --> 00:49:12.338 AUDIENCE: 3 to 4. 00:49:12.338 --> 00:49:14.880 DAVID MALAN: Update 3 to 4, so it is going to look like this. 00:49:14.880 --> 00:49:17.570 And now we can update 2 to 3. 00:49:17.570 --> 00:49:19.750 And I'm going to wave my hand at the code for this 00:49:19.750 --> 00:49:21.930 only because there's multiple steps now. 00:49:21.930 --> 00:49:23.910 You have to probably have some kind of loop 00:49:23.910 --> 00:49:27.960 that iterates over the existing list, finds the appropriate location using 00:49:27.960 --> 00:49:30.540 less than or greater than, trying to find the right spot. 00:49:30.540 --> 00:49:32.910 And then you have to manipulate the pointers to do that. 00:49:32.910 --> 00:49:34.785 You won't need to do something as complicated 00:49:34.785 --> 00:49:36.432 as that for the upcoming problem set 5. 00:49:36.432 --> 00:49:41.520 But it is just boiling down to some loops, some inequality checks, and then 00:49:41.520 --> 00:49:42.750 some updates of the pointers. 00:49:42.750 --> 00:49:44.880 But it's easier generally to add stuff at the end 00:49:44.880 --> 00:49:47.550 and even easier to add things at the beginning, 00:49:47.550 --> 00:49:52.904 especially if you don't care about maintaining any kind of sorted order. 00:49:52.904 --> 00:49:53.850 Phew. 00:49:53.850 --> 00:49:56.650 Any questions on that? 00:49:56.650 --> 00:49:57.836 Yeah. 00:49:57.836 --> 00:50:01.252 AUDIENCE: Back to the beginning, like the code you had, 00:50:01.252 --> 00:50:09.690 what's the difference between node with star and like a pointer n of type node? 00:50:09.690 --> 00:50:14.350 DAVID MALAN: A pointer n of type-- let me just scroll back to the code, here? 00:50:14.350 --> 00:50:14.980 AUDIENCE: Yeah. 00:50:14.980 --> 00:50:17.010 DAVID MALAN: OK, so this is malloc is going 00:50:17.010 --> 00:50:20.700 to give us a chunk of memory that's big enough to store node. 00:50:20.700 --> 00:50:25.870 Node star n gives us a pointer that is the address of a node. 00:50:25.870 --> 00:50:29.200 And therefore we're going to assign the return value of malloc 00:50:29.200 --> 00:50:33.730 to that variable, so that n effectively represents a chunk of memory 00:50:33.730 --> 00:50:35.904 that's big enough to store a node. 00:50:35.904 --> 00:50:38.690 AUDIENCE: So n is node, not a pointer? 00:50:38.690 --> 00:50:41.720 DAVID MALAN: n is a pointer to a node. 00:50:41.720 --> 00:50:44.200 n is a node star, or a pointer to a node. 00:50:44.200 --> 00:50:45.200 And what does that mean? 00:50:45.200 --> 00:50:46.970 n is the address of a node. 00:50:46.970 --> 00:50:50.000 And that should make sense, because malloc returns an address. 00:50:50.000 --> 00:50:52.370 But this is why we're now using arrow notation. 00:50:52.370 --> 00:50:53.210 n is not a node. 00:50:53.210 --> 00:50:56.750 You can't do n dot number and n dot next. 00:50:56.750 --> 00:50:58.970 You have to do the star thing and then the dot. 00:50:58.970 --> 00:51:04.730 Or more succinctly now, you do an arrow number and arrow next. 00:51:04.730 --> 00:51:05.853 Good question. 00:51:05.853 --> 00:51:09.020 All right, let's see if we can't make this a little more real with maybe one 00:51:09.020 --> 00:51:10.460 demonstration here. 00:51:10.460 --> 00:51:14.220 Let me go ahead and put on the screen the end point that we want to get to, 00:51:14.220 --> 00:51:16.310 which is that here with everything in order, 00:51:16.310 --> 00:51:20.192 I think for this we need maybe five volunteers if we could. 00:51:20.192 --> 00:51:21.650 Let me go a little farther in back. 00:51:21.650 --> 00:51:23.750 OK, one over there. 00:51:23.750 --> 00:51:26.258 Maybe two over there. 00:51:26.258 --> 00:51:27.050 Now the hands are-- 00:51:27.050 --> 00:51:31.210 OK, 3 over here if we can go in back up over there. 00:51:31.210 --> 00:51:33.830 OK, 4 being volunteered by your friend. 00:51:33.830 --> 00:51:35.900 And 5 being volunteered by your friends. 00:51:35.900 --> 00:51:37.160 Do you want to come on up? 00:51:37.160 --> 00:51:38.100 All right, come on up. 00:51:38.100 --> 00:51:39.180 Come on up. 00:51:39.180 --> 00:51:41.180 Brian's going to help me run this demonstration. 00:51:41.180 --> 00:51:47.760 If all five of you could come on over, come on over here 00:51:47.760 --> 00:51:50.240 where we have some space. 00:51:50.240 --> 00:51:54.020 All right, let me get you some microphones and introductions. 00:51:54.020 --> 00:51:55.230 OK, thank you. 00:51:55.230 --> 00:51:58.230 Two of them were bravely volunteered by the people sitting next to them. 00:51:58.230 --> 00:51:59.650 So props to both of you. 00:51:59.650 --> 00:52:02.150 You want to say hello and a little something about yourself. 00:52:02.150 --> 00:52:02.817 AUDIENCE: Hello. 00:52:02.817 --> 00:52:03.940 My name [? Siobhana. ?] 00:52:03.940 --> 00:52:05.690 DAVID MALAN: [? Siobhana. ?] And year or-- 00:52:05.690 --> 00:52:06.560 AUDIENCE: I'm a sophomore. 00:52:06.560 --> 00:52:07.030 DAVID MALAN: Sophomore. 00:52:07.030 --> 00:52:07.530 OK. 00:52:07.530 --> 00:52:09.182 Nice to meet you. 00:52:09.182 --> 00:52:10.130 AUDIENCE: Hi. 00:52:10.130 --> 00:52:11.405 I'm a senior. 00:52:11.405 --> 00:52:13.030 DAVID MALAN: It's nice to have you too. 00:52:13.030 --> 00:52:13.530 Yeah. 00:52:13.530 --> 00:52:16.042 AUDIENCE: Hi, I'm Athena a sophomore in FOHO. 00:52:16.042 --> 00:52:17.508 DAVID MALAN: Athena. 00:52:17.508 --> 00:52:18.050 AUDIENCE: Hi. 00:52:18.050 --> 00:52:18.950 I'm Anurag. 00:52:18.950 --> 00:52:21.470 I'm a first year at Matthews. 00:52:21.470 --> 00:52:22.580 AUDIENCE: I'm Ethan. 00:52:22.580 --> 00:52:24.216 I'm a first year at Weld. 00:52:24.216 --> 00:52:25.325 DAVID MALAN: OK, and-- 00:52:25.325 --> 00:52:26.200 AUDIENCE: I'm Sarika. 00:52:26.200 --> 00:52:27.710 I'm first year in Thayer. 00:52:27.710 --> 00:52:28.190 DAVID MALAN: Wonderful. 00:52:28.190 --> 00:52:29.600 All right, thank you all for volunteering. 00:52:29.600 --> 00:52:30.725 Let's go ahead and do this. 00:52:30.725 --> 00:52:33.990 You for the moment represent a heap of memory, if you will. 00:52:33.990 --> 00:52:36.410 So if you could maybe all back up over here just 00:52:36.410 --> 00:52:38.240 to where we have some available space. 00:52:38.240 --> 00:52:41.390 We're going to need one of you to represent the list. 00:52:41.390 --> 00:52:42.520 Siobhan was it? 00:52:42.520 --> 00:52:42.810 AUDIENCE: [? Siobhana. ?] 00:52:42.810 --> 00:52:44.120 DAVID MALAN: [? Siobhana, ?] come on up. [? Siobhana, ?] do 00:52:44.120 --> 00:52:45.787 you want to go ahead and represent list. 00:52:45.787 --> 00:52:49.310 And to represent our actual list, we have-- 00:52:49.310 --> 00:52:53.870 or Brian-- yeah, we have a name tag, hello, my name is list. 00:52:53.870 --> 00:52:56.990 So you're going to represent the rectangle on the left that 00:52:56.990 --> 00:52:59.150 represents the linked list itself. 00:52:59.150 --> 00:53:01.868 And now initially we're going to go ahead initialize you to null. 00:53:01.868 --> 00:53:04.160 So you can just go ahead and put that behind your back. 00:53:04.160 --> 00:53:05.240 So you're not pointing at anything. 00:53:05.240 --> 00:53:06.198 But you represent list. 00:53:06.198 --> 00:53:08.810 And there's nothing in the list, no numbers in the list. 00:53:08.810 --> 00:53:09.800 What was the next step? 00:53:09.800 --> 00:53:17.540 If the goal at hand is to insert 2, 4, 5, 1, 3, we want to do what first, 00:53:17.540 --> 00:53:20.430 what lower level operation to get 2 in there? 00:53:20.430 --> 00:53:21.842 What was the first line of code? 00:53:21.842 --> 00:53:22.550 AUDIENCE: malloc. 00:53:22.550 --> 00:53:23.383 DAVID MALAN: malloc. 00:53:23.383 --> 00:53:25.242 So we want to malloc a node for 2. 00:53:25.242 --> 00:53:26.450 So let's go ahead and malloc. 00:53:26.450 --> 00:53:27.770 OK, come on up. 00:53:27.770 --> 00:53:28.400 So malloc. 00:53:28.400 --> 00:53:29.355 And what's your name again? 00:53:29.355 --> 00:53:29.690 AUDIENCE: Ethan. 00:53:29.690 --> 00:53:30.140 DAVID MALAN: Ethan. 00:53:30.140 --> 00:53:30.710 OK. 00:53:30.710 --> 00:53:32.127 And what do we need to give Ethan? 00:53:32.127 --> 00:53:34.740 Ethan has two values or two fields. 00:53:34.740 --> 00:53:36.800 The first one is the number 2. 00:53:36.800 --> 00:53:38.240 Thank you. 00:53:38.240 --> 00:53:42.110 The next one is a pointer called next. 00:53:42.110 --> 00:53:43.980 Now, you're not pointing at anyone else. 00:53:43.980 --> 00:53:44.920 So you'll put it behind your back. 00:53:44.920 --> 00:53:47.030 And now what do we want to do with? [? Siobhana, ?] what do we have to do? 00:53:47.030 --> 00:53:47.863 AUDIENCE: Point to-- 00:53:47.863 --> 00:53:48.780 DAVID MALAN: Point to? 00:53:48.780 --> 00:53:49.400 AUDIENCE: 2. 00:53:49.400 --> 00:53:50.733 DAVID MALAN: Him, yes, number 2. 00:53:50.733 --> 00:53:55.070 OK, so this now represents the picture where we have list here, 2 here, 00:53:55.070 --> 00:53:56.430 but the null pointer as well. 00:53:56.430 --> 00:53:58.520 All right next we wanted to add 4 to the list. 00:53:58.520 --> 00:53:59.820 How do we go ahead and do this? 00:53:59.820 --> 00:54:01.862 Well, with 4, we're going to go ahead and malloc. 00:54:01.862 --> 00:54:03.230 malloc, all right. 00:54:03.230 --> 00:54:06.500 And now, Brian has a lovely number 4 for you and a pointer. 00:54:06.500 --> 00:54:08.744 What do we want to do with your pointer? 00:54:08.744 --> 00:54:10.022 AUDIENCE: Not point it. 00:54:10.022 --> 00:54:11.480 DAVID MALAN: Not point at anything. 00:54:11.480 --> 00:54:12.950 Now, it's a little more work. 00:54:12.950 --> 00:54:14.330 And I need a temporary variable. 00:54:14.330 --> 00:54:15.560 So I'll play this role. 00:54:15.560 --> 00:54:17.810 I'm going to go ahead and point at wherever Siobhana is pointing 00:54:17.810 --> 00:54:19.268 at in sort of unnaturally this way. 00:54:19.268 --> 00:54:20.060 That's OK. 00:54:20.060 --> 00:54:22.602 We couldn't get hands that point the other way physically. 00:54:22.602 --> 00:54:24.560 So we're going to point at the same thing here. 00:54:24.560 --> 00:54:25.643 You're both pointing at 2. 00:54:25.643 --> 00:54:28.435 And what am I looking for in order to decide where to put 4? 00:54:28.435 --> 00:54:30.105 AUDIENCE: If it's greater than. 00:54:30.105 --> 00:54:31.980 DAVID MALAN: If it's greater than some value. 00:54:31.980 --> 00:54:32.897 So I'm going to check. 00:54:32.897 --> 00:54:34.190 Well, 4 is greater than 2. 00:54:34.190 --> 00:54:35.060 So I'm going to keep going. 00:54:35.060 --> 00:54:35.870 And your name was Eric? 00:54:35.870 --> 00:54:36.190 AUDIENCE: Ethan. 00:54:36.190 --> 00:54:36.920 DAVID MALAN: Ethan, sorry. 00:54:36.920 --> 00:54:38.640 So, Ethan, what are you pointing at? 00:54:38.640 --> 00:54:39.140 Nothing. 00:54:39.140 --> 00:54:40.170 So that's an opportunity. 00:54:40.170 --> 00:54:41.240 There's nothing to his right. 00:54:41.240 --> 00:54:43.865 So let me go ahead and have Ethan point at-- what's you're name again? 00:54:43.865 --> 00:54:44.573 AUDIENCE: Athena. 00:54:44.573 --> 00:54:45.710 DAVID MALAN: Athena. 00:54:45.710 --> 00:54:47.390 Also, unnaturally, but that's fine. 00:54:47.390 --> 00:54:50.360 And so now does Athena need to update her pointer? 00:54:50.360 --> 00:54:51.110 No, she's good. 00:54:51.110 --> 00:54:52.440 She represents the end of the list. 00:54:52.440 --> 00:54:54.100 So her pointer can stay behind her back. 00:54:54.100 --> 00:54:55.725 All right, let's go ahead and malloc 5. 00:54:55.725 --> 00:54:57.080 You want to be our 5? 00:54:57.080 --> 00:54:58.250 So now we need a 5. 00:54:58.250 --> 00:55:01.070 So we need to hand you the number 5. 00:55:01.070 --> 00:55:02.280 And what's your name again? 00:55:02.280 --> 00:55:02.780 AUDIENCE: Sarika. 00:55:02.780 --> 00:55:03.613 DAVID MALAN: Sarika. 00:55:03.613 --> 00:55:06.182 All right, so Sarika's holding the number 5. 00:55:06.182 --> 00:55:08.140 She also is going to get a pointer called next. 00:55:08.140 --> 00:55:09.650 What should Sarika be pointing at? 00:55:09.650 --> 00:55:10.430 AUDIENCE: Nothing. 00:55:10.430 --> 00:55:10.910 DAVID MALAN: Nothing. 00:55:10.910 --> 00:55:13.070 And now how to do I insert her into the right place? 00:55:13.070 --> 00:55:14.400 Well, I have to do the same thing. 00:55:14.400 --> 00:55:16.670 So I'm going to get involved again and be a temporary variable. 00:55:16.670 --> 00:55:19.503 I'm going to point at the same thing [? Siobhana ?] is pointing out, 00:55:19.503 --> 00:55:20.130 which is Ethan. 00:55:20.130 --> 00:55:21.920 I'm going to follow this and see, ooh, wait a minute, 00:55:21.920 --> 00:55:23.470 he's actually pointing at someone else. 00:55:23.470 --> 00:55:24.530 So I'm going to follow that. 00:55:24.530 --> 00:55:25.370 It's still number 4. 00:55:25.370 --> 00:55:26.210 So I want to keep going. 00:55:26.210 --> 00:55:27.055 Oh, wait a minute. 00:55:27.055 --> 00:55:28.430 Athena is not pointing at anyone. 00:55:28.430 --> 00:55:32.125 This is an opportunity now to have Athena point at 5 and voila. 00:55:32.125 --> 00:55:34.000 But are you going to change your pointer yet? 00:55:34.000 --> 00:55:34.610 No. 00:55:34.610 --> 00:55:36.318 Now things get a little more interesting. 00:55:36.318 --> 00:55:38.345 Could we go ahead and malloc 1? 00:55:38.345 --> 00:55:39.510 And what's your name again? 00:55:39.510 --> 00:55:39.780 AUDIENCE: Emma. 00:55:39.780 --> 00:55:40.530 DAVID MALAN: Emma. 00:55:40.530 --> 00:55:42.810 OK, Emma, we have the number 1 for you from Brian. 00:55:42.810 --> 00:55:46.980 You have a pointer, which should be initialized as well to null. 00:55:46.980 --> 00:55:49.713 And now we have a couple of steps involved here. 00:55:49.713 --> 00:55:50.880 What do we want to do first? 00:55:53.395 --> 00:55:54.270 What's your proposal? 00:55:54.270 --> 00:55:55.080 AUDIENCE: Temporary pointer. 00:55:55.080 --> 00:55:55.890 DAVID MALAN: Temporary pointer. 00:55:55.890 --> 00:55:58.050 So I'm going to point at the same things [? Siobhana ?] is pointing at, 00:55:58.050 --> 00:55:59.430 which is Ethan here. 00:55:59.430 --> 00:56:01.320 But I see that 2 is greater than 1. 00:56:01.320 --> 00:56:02.880 So what do I actually want to do? 00:56:02.880 --> 00:56:05.422 Well, let me incorrectly for a moment-- [? Siobhana, ?] could 00:56:05.422 --> 00:56:06.532 you point at number 1? 00:56:06.532 --> 00:56:07.740 What have we just done wrong? 00:56:07.740 --> 00:56:09.310 We've orphaned everyone else. 00:56:09.310 --> 00:56:11.790 And even more visibly now, no one is pointing 00:56:11.790 --> 00:56:15.270 at Ethan or beyond, which means we've just leaked that memory, never 00:56:15.270 --> 00:56:16.383 to be recovered or free. 00:56:16.383 --> 00:56:17.550 So we don't want to do that. 00:56:17.550 --> 00:56:18.870 Undo, Control-Z. 00:56:18.870 --> 00:56:20.132 What do we want to do instead? 00:56:20.132 --> 00:56:21.090 What's your name again? 00:56:21.090 --> 00:56:21.430 AUDIENCE: Emma. 00:56:21.430 --> 00:56:22.180 DAVID MALAN: Emma. 00:56:22.180 --> 00:56:23.430 What do you want to point at? 00:56:23.430 --> 00:56:25.685 AUDIENCE: I want to point at [? Siobhana. ?] 00:56:25.685 --> 00:56:27.810 DAVID MALAN: At that the same thing, [? Siobhana ?] 00:56:27.810 --> 00:56:31.312 is pointing at, which is equivalent then to Ethan. 00:56:31.312 --> 00:56:32.770 So go ahead and do that with your-- 00:56:32.770 --> 00:56:34.650 OK, sort of like Twister now. 00:56:34.650 --> 00:56:35.610 That's OK. 00:56:35.610 --> 00:56:37.980 And then [? Siobhana, ?] what I do want to point at? 00:56:37.980 --> 00:56:38.700 Perfect. 00:56:38.700 --> 00:56:42.033 So again a bunch of steps involved, but it really is just two or three steps 00:56:42.033 --> 00:56:43.950 depending on which pointers we want to update. 00:56:43.950 --> 00:56:45.720 And then lastly, let's go head to malloc 3. 00:56:45.720 --> 00:56:46.440 And your name was again? 00:56:46.440 --> 00:56:47.150 AUDIENCE: Anurag. 00:56:47.150 --> 00:56:47.910 DAVID MALAN: Anurag. 00:56:47.910 --> 00:56:49.470 So then we have a 3 for you from Brian. 00:56:49.470 --> 00:56:50.553 We have a pointer for you. 00:56:50.553 --> 00:56:52.510 It's initialized initially to null. 00:56:52.510 --> 00:56:54.130 So you can put that behind your back. 00:56:54.130 --> 00:56:57.390 I'm going to point at the same thing as [? Siobhana. ?] And here we go. 00:56:57.390 --> 00:56:58.560 1 is smaller. 00:56:58.560 --> 00:56:59.670 2 is smaller. 00:56:59.670 --> 00:57:01.000 4 is larger. 00:57:01.000 --> 00:57:02.280 So let's get this right. 00:57:02.280 --> 00:57:06.249 And who do we want to point at whom first? 00:57:06.249 --> 00:57:08.570 AUDIENCE: 3 points at 4. 00:57:08.570 --> 00:57:10.190 DAVID MALAN: 3 should point at 4. 00:57:10.190 --> 00:57:11.190 So go ahead and do that. 00:57:11.190 --> 00:57:14.232 And you can step a little forward just so it looks a little less awkward. 00:57:14.232 --> 00:57:17.390 And then lastly, big finale, Ethan, who do you want to point at? 00:57:17.390 --> 00:57:18.500 Number 3. 00:57:18.500 --> 00:57:21.357 And thankfully, all these steps later, we have a linked list. 00:57:21.357 --> 00:57:23.190 We have wonderful souvenirs for each of you. 00:57:23.190 --> 00:57:24.020 We just need the numbers back. 00:57:24.020 --> 00:57:26.266 Thank you to our volunteers here if we could. 00:57:26.266 --> 00:57:28.760 OK, you can keep that. 00:57:28.760 --> 00:57:31.340 You can put the numbers on the desk here. 00:57:31.340 --> 00:57:35.480 So as these folks head off the stage, let me point out now one 00:57:35.480 --> 00:57:38.480 of the shortcomings of the approach-- thank you all so much-- 00:57:38.480 --> 00:57:40.220 of what we've just done here. 00:57:40.220 --> 00:57:44.870 Even though you all had the luxury of seeing 1 and 2 and 3 and 4 and 5 00:57:44.870 --> 00:57:47.750 and you could even sort of immediately figure out where things go, 00:57:47.750 --> 00:57:51.380 even these linked lists are implemented with chunks of memory. 00:57:51.380 --> 00:57:54.290 And just like with arrays, or equivalently lockers, 00:57:54.290 --> 00:57:56.840 the computer can only look at one piece of memory 00:57:56.840 --> 00:58:00.980 at a time, which is to say that to a computer, that same linked list 00:58:00.980 --> 00:58:02.030 kind of looks like this. 00:58:02.030 --> 00:58:05.150 It's sort of blind to the specific numbers in that linked list 00:58:05.150 --> 00:58:07.930 until it opens each of those doors of memory. 00:58:07.930 --> 00:58:13.670 So to find where 3 goes or to find where 5 goes or to find where 1 goes, 00:58:13.670 --> 00:58:17.360 all of those doors maximally might need to be opened one 00:58:17.360 --> 00:58:19.470 at a time to find that value. 00:58:19.470 --> 00:58:22.220 So with linked lists we have gained a feature. 00:58:22.220 --> 00:58:26.930 We have gained the ability to add dynamically to the list, right. 00:58:26.930 --> 00:58:30.020 I just kept malloc-ing, malloc-ing, malloc-ing additional students 00:58:30.020 --> 00:58:31.040 and additional numbers. 00:58:31.040 --> 00:58:33.590 So the list can grow as big as we wanted it to. 00:58:33.590 --> 00:58:34.730 And that case, we had five. 00:58:34.730 --> 00:58:38.630 We could have done it 50 times or 500 to add more and more numbers. 00:58:38.630 --> 00:58:43.250 That's an upside, because we don't have to waste time inserting into an array 00:58:43.250 --> 00:58:46.340 by resizing it and moving all of the original contents. 00:58:46.340 --> 00:58:49.850 None of our volunteers had to move, technically speaking, just 00:58:49.850 --> 00:58:51.800 to insert a number 5 or number 3. 00:58:51.800 --> 00:58:56.450 They just had to point at someone else in memory or someone else on stage. 00:58:56.450 --> 00:58:59.970 So if your data structure now is a linked list that looks like this, 00:58:59.970 --> 00:59:01.790 we've paid a price for that dynamism. 00:59:01.790 --> 00:59:07.400 We've paid a price for the ability to resize our list without moving 00:59:07.400 --> 00:59:09.500 everything around that's already there. 00:59:09.500 --> 00:59:14.630 What is a downside that you might perceive of a linked list? 00:59:14.630 --> 00:59:17.030 What have we lost or given up here? 00:59:17.030 --> 00:59:18.470 AUDIENCE: We lost random access. 00:59:18.470 --> 00:59:18.770 DAVID MALAN: Sorry. 00:59:18.770 --> 00:59:19.430 Say again. 00:59:19.430 --> 00:59:20.780 AUDIENCE: We lost random access. 00:59:20.780 --> 00:59:22.238 DAVID MALAN: We lost random access. 00:59:22.238 --> 00:59:23.090 That's spot on. 00:59:23.090 --> 00:59:24.420 We've lost random access. 00:59:24.420 --> 00:59:24.920 Why? 00:59:24.920 --> 00:59:28.250 Because the way you get from the beginning of the list to the end 00:59:28.250 --> 00:59:30.170 is by following these pointers, following 00:59:30.170 --> 00:59:33.440 these arrows, these breadcrumbs, you can't just jump to the middle elements, 00:59:33.440 --> 00:59:36.560 even though obviously this one here on the screen to all of us humans, 00:59:36.560 --> 00:59:37.700 that's the middle element. 00:59:37.700 --> 00:59:39.680 You don't know that if you're the computer. 00:59:39.680 --> 00:59:42.740 If the main variable that's storing this data structure 00:59:42.740 --> 00:59:44.900 is the pointer, like [? Siobhana, ?] pointing 00:59:44.900 --> 00:59:46.940 to the first elements of the list, you're 00:59:46.940 --> 00:59:49.490 going to have to follow all of these arrows, 00:59:49.490 --> 00:59:52.530 frankly counting them up and then retroactively realizing, 00:59:52.530 --> 00:59:53.440 oh, there was 5. 00:59:53.440 --> 00:59:55.490 I passed the middle one earlier. 00:59:55.490 --> 00:59:57.020 You've glossed random access. 00:59:57.020 --> 01:00:00.530 And what algorithm have we used wonderfully in the past 01:00:00.530 --> 01:00:02.270 when we do have random access? 01:00:02.270 --> 01:00:03.290 AUDIENCE: Binary search. 01:00:03.290 --> 01:00:04.430 DAVID MALAN: Binary search. 01:00:04.430 --> 01:00:08.570 So we've lost the ability now to do binary search as efficiently as we once 01:00:08.570 --> 01:00:09.500 were able to. 01:00:09.500 --> 01:00:12.260 And so if we consider now the running time of linked lists, 01:00:12.260 --> 01:00:14.120 unfortunately, we've paid that price. 01:00:14.120 --> 01:00:17.000 Searching now has gone back up to linear. 01:00:17.000 --> 01:00:20.467 We no longer have logarithmic running time because of the fact 01:00:20.467 --> 01:00:22.550 that we're stitching together this data structure. 01:00:22.550 --> 01:00:25.610 And the only way to find the end of the list, the middle of the list 01:00:25.610 --> 01:00:27.600 is to follow all of these arrows. 01:00:27.600 --> 01:00:29.510 You can't just jump to one location. 01:00:29.510 --> 01:00:33.290 Meanwhile inserting into the list, in the worst case, big O of n 01:00:33.290 --> 01:00:36.740 is going to be linear as well, because you have to walk through the whole list 01:00:36.740 --> 01:00:40.140 to actually find a spot for the given number, 01:00:40.140 --> 01:00:42.623 especially if you're trying to keep it sorted. 01:00:42.623 --> 01:00:44.540 So it would seem that even though we've gained 01:00:44.540 --> 01:00:46.742 this feature of much more dynamic insertion 01:00:46.742 --> 01:00:49.200 and we're building up something more interesting in memory, 01:00:49.200 --> 01:00:51.770 and you can imagine this just taking much less time overall, 01:00:51.770 --> 01:00:53.960 because you have to keep moving everything around 01:00:53.960 --> 01:00:56.570 like we did with realloc, it's unfortunately 01:00:56.570 --> 01:00:58.518 something we're paying a price for. 01:00:58.518 --> 01:00:59.310 But that was a lot. 01:00:59.310 --> 01:01:00.540 Let's go ahead and take our 5-minute break. 01:01:00.540 --> 01:01:01.415 Fruit awaits outside. 01:01:01.415 --> 01:01:02.770 And we'll be back. 01:01:02.770 --> 01:01:09.020 All right, so during break, I whipped up one final example of our list program. 01:01:09.020 --> 01:01:11.390 This one uses all of those building blocks. 01:01:11.390 --> 01:01:14.780 And let's see if we can't follow along pictorially and code-wise what 01:01:14.780 --> 01:01:17.220 it is we just built with all of these humans on stage. 01:01:17.220 --> 01:01:19.250 So here is list list3.c. 01:01:19.250 --> 01:01:20.260 It's available online. 01:01:20.260 --> 01:01:22.720 So you can follow along at home afterward if you'd like. 01:01:22.720 --> 01:01:25.762 And let's just walk through the lines that are written for us in advance. 01:01:25.762 --> 01:01:27.500 One, I'm using standard I/O for printf. 01:01:27.500 --> 01:01:30.350 And I'm using stdlib for malloc and free, 01:01:30.350 --> 01:01:32.660 our new friends that give us dynamic memory. 01:01:32.660 --> 01:01:37.760 Here is the definition of a node that again has a number inside of it 01:01:37.760 --> 01:01:41.730 and a pointer, specifically a pointer to another node structure. 01:01:41.730 --> 01:01:45.650 So that's what each of our humans represented, this time now in C. 01:01:45.650 --> 01:01:47.180 What is my main program going to do? 01:01:47.180 --> 01:01:49.490 Just for the sake of demonstration, the goal at hand 01:01:49.490 --> 01:01:53.480 is just to write a program that initializes a linked list to nothing 01:01:53.480 --> 01:01:56.960 initially, then adds a node with 1, then adds a node with 2, 01:01:56.960 --> 01:01:58.010 then add a node with 3. 01:01:58.010 --> 01:02:00.710 We'll keep it simple and not add 4 or 5 this time. 01:02:00.710 --> 01:02:01.950 So how am I going to do this? 01:02:01.950 --> 01:02:07.490 Well, on line 17, I propose that we create a variable called list 01:02:07.490 --> 01:02:09.110 and have it be the address of a node. 01:02:09.110 --> 01:02:10.907 So if I were to draw this now pictorially, 01:02:10.907 --> 01:02:12.740 it's going to be just like our demonstration 01:02:12.740 --> 01:02:16.010 a bit ago, where I have a rectangle here called list. 01:02:16.010 --> 01:02:17.970 And initially, it's not pointing to anything. 01:02:17.970 --> 01:02:20.840 So I'm just going to leave the box blank to represent NULL. 01:02:20.840 --> 01:02:23.210 So that's that line 17 right here. 01:02:23.210 --> 01:02:25.100 Now, let me go ahead and do the following. 01:02:25.100 --> 01:02:27.260 Add a number to the list as follows. 01:02:27.260 --> 01:02:29.930 Line 20 just gives me enough memory for a node. 01:02:29.930 --> 01:02:33.710 And it stores that memory's address in a variable called n. 01:02:33.710 --> 01:02:37.100 Lines 21 through 24 are just a safety check. 01:02:37.100 --> 01:02:38.060 Did anything go wrong? 01:02:38.060 --> 01:02:40.400 If so, just return 1 and stop the program. 01:02:40.400 --> 01:02:42.470 We ran out of memory for some reason. 01:02:42.470 --> 01:02:45.440 But these two lines now should look a little more familiar. 01:02:45.440 --> 01:02:49.310 This now is going to go ahead and install 1 and NULL into that structure 01:02:49.310 --> 01:02:49.980 as follows. 01:02:49.980 --> 01:02:50.960 So let's recap. 01:02:50.960 --> 01:02:55.220 This line here 20 is the same thing as allocating really 01:02:55.220 --> 01:02:58.220 a node that looks like this in memory that has two halves. 01:02:58.220 --> 01:03:01.730 One of those fields is called number, which I'll write there. 01:03:01.730 --> 01:03:03.770 The other field is called next. 01:03:03.770 --> 01:03:06.740 And then if we go back to the code, these two lines 01:03:06.740 --> 01:03:09.420 are all about just installing values in that structure. 01:03:09.420 --> 01:03:11.420 So if I go ahead to number and put the number 1, 01:03:11.420 --> 01:03:14.003 I'm not going to bother drawing anything for next, because I'm 01:03:14.003 --> 01:03:15.620 going to leave it implicitly as NULL. 01:03:15.620 --> 01:03:17.360 So that's what's going on now. 01:03:17.360 --> 01:03:18.740 What do I next want to do? 01:03:18.740 --> 01:03:21.470 Well, the last line of code here under this comment that 01:03:21.470 --> 01:03:24.530 says add number to list, I set list equal 01:03:24.530 --> 01:03:28.320 to n where n again is pointing at this new node. 01:03:28.320 --> 01:03:31.010 So that's the same thing as saying, well, list 01:03:31.010 --> 01:03:34.307 is going to go ahead and point at that new node. 01:03:34.307 --> 01:03:36.890 So after those lines of code, I've created a picture in memory 01:03:36.890 --> 01:03:39.450 that effectively looks like this. 01:03:39.450 --> 01:03:42.290 Now, let's go ahead and add the number 2 to the list. 01:03:42.290 --> 01:03:43.507 It's almost the same. 01:03:43.507 --> 01:03:46.340 So here's the chunk of code that's going to go and add a second node 01:03:46.340 --> 01:03:47.630 to the list, this time containing 2. 01:03:47.630 --> 01:03:48.830 Let's do it step by step. 01:03:48.830 --> 01:03:51.445 Line 30, I'm going to reuse n as my temporary variable. 01:03:51.445 --> 01:03:52.820 So I don't have to re-declare it. 01:03:52.820 --> 01:03:54.695 It's the same n as before, but it's now going 01:03:54.695 --> 01:03:57.200 to get a different address of memory thanks to malloc. 01:03:57.200 --> 01:03:59.310 So that gives me another box like this that I'm 01:03:59.310 --> 01:04:02.280 going to go ahead and draw like that with nothing in it initially. 01:04:02.280 --> 01:04:05.510 I'm going to make sure per lines 31 to 34 that nothing went wrong. 01:04:05.510 --> 01:04:07.170 But that's just as before. 01:04:07.170 --> 01:04:10.430 And now in lines 35 and 36, I'm going to put 2 in there and NULL. 01:04:10.430 --> 01:04:13.130 So let me go over there and let me go ahead and put 2 in there. 01:04:13.130 --> 01:04:14.780 And I'm going to leave NULL blank implicitly. 01:04:14.780 --> 01:04:15.990 That's the end of the list. 01:04:15.990 --> 01:04:19.820 But now I, of course, conceptually have to link the node for 1 01:04:19.820 --> 01:04:21.140 to the node for 2. 01:04:21.140 --> 01:04:25.700 And here's where C syntax, even though it's new, kind of finally makes sense. 01:04:25.700 --> 01:04:30.530 Notice here, I'm saying list arrow next equals NULL. 01:04:30.530 --> 01:04:32.840 That maps perfectly to the picture. 01:04:32.840 --> 01:04:37.910 List arrow x equals what? n. 01:04:37.910 --> 01:04:39.680 Well, n is this thing over here. 01:04:39.680 --> 01:04:41.450 So I just draw the arrow there. 01:04:41.450 --> 01:04:46.040 And so the code actually finally lines up even though it's new for today. 01:04:46.040 --> 01:04:48.480 So now I've drawn the picture as follows with 1 and 2. 01:04:48.480 --> 01:04:50.397 Let's go ahead and add a third and final node. 01:04:50.397 --> 01:04:53.640 This one containing the number 3, using these lines here. 01:04:53.640 --> 01:04:56.067 So line 40 gives me a new node with malloc. 01:04:56.067 --> 01:04:57.650 So that's going to give me a new node. 01:04:57.650 --> 01:04:59.030 I'll draw it as a rectangle over here. 01:04:59.030 --> 01:05:00.410 I'm drawing it left to right, but these things 01:05:00.410 --> 01:05:01.993 could be all over the place in memory. 01:05:01.993 --> 01:05:03.590 It doesn't matter where they end up. 01:05:03.590 --> 01:05:06.440 I'm going to go ahead and check as before that's 01:05:06.440 --> 01:05:07.915 it's not NULL, just to be safe. 01:05:07.915 --> 01:05:10.040 Then I'm going to go ahead and install the number 3 01:05:10.040 --> 01:05:11.943 and NULL in there just as before. 01:05:11.943 --> 01:05:13.610 So that means let's go ahead and draw 3. 01:05:13.610 --> 01:05:16.130 I'm going to leave that blank because it's going to be NULL. 01:05:16.130 --> 01:05:18.740 And then the last line, you wouldn't typically 01:05:18.740 --> 01:05:21.710 hard code this or write this explicitly in a program. 01:05:21.710 --> 01:05:23.840 This is a bit more verbose than you need to. 01:05:23.840 --> 01:05:27.860 Let me propose that you would probably use some kind of loop instead 01:05:27.860 --> 01:05:31.730 and walk through the data structure step by step as I proposed earlier. 01:05:31.730 --> 01:05:34.430 But if we really want to do this just for demonstration's sake, 01:05:34.430 --> 01:05:38.030 notice, start at list, follow an arrow and go to next. 01:05:38.030 --> 01:05:40.040 Follow another arrow and go to next. 01:05:40.040 --> 01:05:42.540 We can literally do that with our picture. 01:05:42.540 --> 01:05:43.460 So here we go. 01:05:43.460 --> 01:05:48.020 Let me start at list and follow an arrow and go to next. 01:05:48.020 --> 01:05:50.270 Follow an arrow, go to next. 01:05:50.270 --> 01:05:51.570 And now this is NULL. 01:05:51.570 --> 01:05:57.890 So what I want to update is exactly this, as with line 47, which said 01:05:57.890 --> 01:06:01.100 follow two arrows, look at two next fields 01:06:01.100 --> 01:06:05.077 interchangeably and then set it equal to n. 01:06:05.077 --> 01:06:06.410 All right, so what remains here? 01:06:06.410 --> 01:06:09.500 Well, this program's whole purpose in life was just to print a list out. 01:06:09.500 --> 01:06:11.600 Here's a way where you can actually use a for loop 01:06:11.600 --> 01:06:14.000 to iterate over a linked list. 01:06:14.000 --> 01:06:18.500 It's kind of funky because we don't have i and ints and i++ and so forth. 01:06:18.500 --> 01:06:21.500 But a for loop doesn't need to use integers or i's. 01:06:21.500 --> 01:06:24.860 Remember that before the first semicolon, you have initialization. 01:06:24.860 --> 01:06:27.440 In between the semicolons, you have a condition. 01:06:27.440 --> 01:06:29.923 And then you have an update that happens over here. 01:06:29.923 --> 01:06:32.840 So you'll get more experience with this with Problem Set 5 ultimately. 01:06:32.840 --> 01:06:35.300 But for today's purposes, high level, notice 01:06:35.300 --> 01:06:38.870 that this gives me a temporary pointer, like my big red hand earlier. 01:06:38.870 --> 01:06:40.342 That's a node star pointer. 01:06:40.342 --> 01:06:42.800 And that's why I was able to point with the big fuzzy hand. 01:06:42.800 --> 01:06:44.270 And I set that equal to list. 01:06:44.270 --> 01:06:48.800 So whatever the list was pointing at so was my temporary fuzzy hand. 01:06:48.800 --> 01:06:53.000 I'm going to follow the following loop and so long as temp does not equal 01:06:53.000 --> 01:06:53.540 NULL. 01:06:53.540 --> 01:06:55.880 So earlier when I was wearing the big fuzzy hand, 01:06:55.880 --> 01:06:57.470 I kept pointing, pointing, pointing. 01:06:57.470 --> 01:07:00.170 And I stopped once it equaled NULL. 01:07:00.170 --> 01:07:03.470 So this is saying keep doing the following until it equals NULL. 01:07:03.470 --> 01:07:04.580 What do I want to do? 01:07:04.580 --> 01:07:06.710 I want to just print out the integer that's 01:07:06.710 --> 01:07:10.550 inside of whatever I'm pointing at inside of it's number field. 01:07:10.550 --> 01:07:13.130 So go to whatever I'm pointing at, follow the arrow, 01:07:13.130 --> 01:07:14.360 and go to the number field. 01:07:14.360 --> 01:07:16.640 That's how we get at the data inside. 01:07:16.640 --> 01:07:20.220 Once I've printed that out, for loops say that you just update a variable. 01:07:20.220 --> 01:07:24.150 So what is that variable temp equals temp arrow next. 01:07:24.150 --> 01:07:26.510 So if my fuzzy hand is pointing at someone 01:07:26.510 --> 01:07:29.990 and I need to update it to point at temp arrow next, 01:07:29.990 --> 01:07:33.080 that means go to whatever I'm pointing at, follow the arrow. 01:07:33.080 --> 01:07:35.090 There's the next field and point at whatever 01:07:35.090 --> 01:07:37.567 the next field was pointing at. 01:07:37.567 --> 01:07:39.650 So you just keep updating what you're pointing at. 01:07:39.650 --> 01:07:40.850 That prints out the list. 01:07:40.850 --> 01:07:43.350 And then-- and we'll defer this ultimately to Problem 01:07:43.350 --> 01:07:46.580 Set 5-- we will need to free this memory. 01:07:46.580 --> 01:07:50.247 And actually you have to be a little clever about how you free memory, 01:07:50.247 --> 01:07:52.580 but I'm going to use a while loop there, which turns out 01:07:52.580 --> 01:07:55.010 to be a little cleaner, a little easier, ultimately 01:07:55.010 --> 01:07:57.710 to free all of this mess I made in my computer's memory. 01:07:57.710 --> 01:07:59.960 I kind of need to do the equivalent of freeing things, 01:07:59.960 --> 01:08:03.200 but I need to free what's behind me, not what's in front of me. 01:08:03.200 --> 01:08:07.500 Once you free memory, you should not touch it, traverse it, and so forth. 01:08:07.500 --> 01:08:11.002 But again more on that final note in P Set 5. 01:08:11.002 --> 01:08:13.210 All right, any questions on a high level on the code? 01:08:13.210 --> 01:08:15.200 It's fine if it looks quite new. 01:08:15.200 --> 01:08:18.170 We make it available so that you have a starting point when it comes 01:08:18.170 --> 01:08:22.520 to using this kind of code yourself. 01:08:22.520 --> 01:08:24.910 Any questions? 01:08:24.910 --> 01:08:26.660 All right, so someone came up during break 01:08:26.660 --> 01:08:29.930 and noted that this actually seems to be a regression in that arrays gave us 01:08:29.930 --> 01:08:32.930 the ability to resize, even though it was a little expensive because you 01:08:32.930 --> 01:08:35.149 got to copy everything from one place to another. 01:08:35.149 --> 01:08:39.090 But we had random access and therefore binary search and therefore 01:08:39.090 --> 01:08:42.149 logarithmic running time for things like searching assorted lists. 01:08:42.149 --> 01:08:43.399 We seem to have given that up. 01:08:43.399 --> 01:08:47.420 Linked lists give us dynamism where we and shrink things without wasting time 01:08:47.420 --> 01:08:48.890 by moving things around. 01:08:48.890 --> 01:08:50.540 But we've lost random access. 01:08:50.540 --> 01:08:51.470 But you know what? 01:08:51.470 --> 01:08:54.770 Now, that we have this ability using pointers and data structures to kind 01:08:54.770 --> 01:08:59.128 of stitch things together in memory, connect things with arrows, 01:08:59.128 --> 01:09:00.920 you know what, we can build fancier things. 01:09:00.920 --> 01:09:02.878 Most of you are probably familiar with the idea 01:09:02.878 --> 01:09:06.529 of a family tree, which is this hierarchical two dimensional structure. 01:09:06.529 --> 01:09:08.420 And indeed, that's our inspiration here. 01:09:08.420 --> 01:09:11.779 What if we don't just keep making one dimensional data structures, 01:09:11.779 --> 01:09:15.350 arrays that go left and right, linked lists that kind of go left to right? 01:09:15.350 --> 01:09:18.950 What if we actually use a vertical notion too and lay things 01:09:18.950 --> 01:09:20.060 out more interestingly. 01:09:20.060 --> 01:09:21.779 What can we gain from this? 01:09:21.779 --> 01:09:26.880 Well, let me propose that anytime we've seen an array, 01:09:26.880 --> 01:09:29.484 we can actually re-implement an array, but get 01:09:29.484 --> 01:09:32.359 the best of both worlds, the best of arrays, the best of linked lists 01:09:32.359 --> 01:09:33.000 as follows. 01:09:33.000 --> 01:09:37.250 Here is an array, back from Week 1 or even Week 0 01:09:37.250 --> 01:09:39.140 when we were searching behind doors. 01:09:39.140 --> 01:09:43.790 And here, Week 2, when we were searching behind doors, 01:09:43.790 --> 01:09:46.880 let's go ahead and note that if we were to do binary search on this 01:09:46.880 --> 01:09:50.720 looking for some value, as before, many times you look in the middle first. 01:09:50.720 --> 01:09:52.622 And then you decide, do you go left or right? 01:09:52.622 --> 01:09:55.580 And if you go left or right, you'd look in the middle element over here 01:09:55.580 --> 01:09:57.280 or the middle element over here. 01:09:57.280 --> 01:09:58.280 And then what do you do? 01:09:58.280 --> 01:10:01.010 You go left or right, looking at the middle element over here 01:10:01.010 --> 01:10:03.587 or over here or over here or over here. 01:10:03.587 --> 01:10:04.170 You know what? 01:10:04.170 --> 01:10:06.500 Let me just kind of explode this picture because all of this 01:10:06.500 --> 01:10:07.910 is happening in one dimension. 01:10:07.910 --> 01:10:11.150 We can actually think of this is happening really in two dimensions. 01:10:11.150 --> 01:10:13.850 Let me draw my same array, 1, 2, 3, 4, 5, 6, 01:10:13.850 --> 01:10:17.060 7, but let me represent it on different levels that's 01:10:17.060 --> 01:10:18.470 indicative of what's happening. 01:10:18.470 --> 01:10:19.550 I start in the middle. 01:10:19.550 --> 01:10:21.292 And I go left or I go right. 01:10:21.292 --> 01:10:23.000 I then go ahead and look at this element. 01:10:23.000 --> 01:10:25.160 And then I go left or I go right. 01:10:25.160 --> 01:10:28.160 So it's the same thing, but it's a two-dimensional rendition 01:10:28.160 --> 01:10:32.270 of what we've been doing for a few weeks whenever we've done binary search. 01:10:32.270 --> 01:10:34.220 Well, you know what this kind of looks like? 01:10:34.220 --> 01:10:37.550 It kind of looks like a linked list, albeit without the arrows. 01:10:37.550 --> 01:10:40.940 But you know what, I don't think I want to stitch this together from 1 to 2 01:10:40.940 --> 01:10:44.270 to 3 to 4 to 5 to 6 to 7, because that's just going to be a linked list. 01:10:44.270 --> 01:10:47.870 But what if I use my new-found familiarity with pointers, 01:10:47.870 --> 01:10:49.260 use a few more of them? 01:10:49.260 --> 01:10:53.240 So I spend more space and stitch this data structure together 01:10:53.240 --> 01:10:55.490 in two dimensions conceptually. 01:10:55.490 --> 01:10:57.800 Every node represented here is a rectangle. 01:10:57.800 --> 01:10:59.770 It doesn't have to have just one pointer. 01:10:59.770 --> 01:11:03.740 There's nothing stopping me from creating a new struct, a new definition 01:11:03.740 --> 01:11:05.870 of node that has two pointers. 01:11:05.870 --> 01:11:07.070 Maybe it's called left. 01:11:07.070 --> 01:11:08.660 Maybe it's called right. 01:11:08.660 --> 01:11:10.700 Previously, we had just one we called it next. 01:11:10.700 --> 01:11:12.992 But there's nothing stopping us from creating a fancier 01:11:12.992 --> 01:11:14.480 structure that actually has two. 01:11:14.480 --> 01:11:17.917 And so we might make it look not like this as before for a linked list, 01:11:17.917 --> 01:11:19.500 but let's get rid of the next pointer. 01:11:19.500 --> 01:11:20.750 Let's make a little more room. 01:11:20.750 --> 01:11:24.750 And let's actually give myself two pointers, left and right. 01:11:24.750 --> 01:11:27.320 And I claim that this structure now in C could 01:11:27.320 --> 01:11:32.810 be used to implement the tree that I just described, the family-like tree, 01:11:32.810 --> 01:11:37.170 more properly called a binary search tree, in the following way. 01:11:37.170 --> 01:11:39.260 This is a binary search tree. 01:11:39.260 --> 01:11:43.670 One, because every node in the tree has at most two children, 01:11:43.670 --> 01:11:45.980 hence the bi in binary, meaning maximally two. 01:11:45.980 --> 01:11:49.100 It has zero children, as like these down here. 01:11:49.100 --> 01:11:51.050 Or it has maximally two children. 01:11:51.050 --> 01:11:52.760 Hence, the bi in binary search tree. 01:11:52.760 --> 01:11:57.500 It's a search tree in the sense that I have taken care with this data 01:11:57.500 --> 01:11:59.660 to sort things properly. 01:11:59.660 --> 01:12:01.760 Notice the following definition. 01:12:01.760 --> 01:12:06.920 For any node in the tree, every element to the left is smaller than it. 01:12:06.920 --> 01:12:10.730 And every element to the right is greater than it. 01:12:10.730 --> 01:12:14.390 That's a recursive definition, because watch, look at this node. 01:12:14.390 --> 01:12:17.060 Everything to the left of it is smaller. 01:12:17.060 --> 01:12:19.730 Everything to the right of it is larger. 01:12:19.730 --> 01:12:20.750 Let's look at 6. 01:12:20.750 --> 01:12:22.850 Everything to the left of it is smaller. 01:12:22.850 --> 01:12:24.770 Everything to the right of it is larger. 01:12:24.770 --> 01:12:27.860 So it's recursive in the sense that no matter what node you look at, 01:12:27.860 --> 01:12:31.370 no matter what rectangle you look at, what I just said correctly 01:12:31.370 --> 01:12:36.800 is true of both the left child or subtree and the right child or subtree. 01:12:36.800 --> 01:12:39.917 So this is to say if you have a list of numbers, for instance, 01:12:39.917 --> 01:12:41.750 or a list of anything and you actually store 01:12:41.750 --> 01:12:45.590 them using nodes that look like this, but conceptually what you're really 01:12:45.590 --> 01:12:48.380 doing is stitching them together two dimensionally 01:12:48.380 --> 01:12:53.030 like this, guess what feature we just gain back? 01:12:53.030 --> 01:12:54.925 What have we just improved? 01:12:54.925 --> 01:12:56.300 I heard some murmuring over here. 01:12:56.300 --> 01:12:57.360 AUDIENCE: Binary search. 01:12:57.360 --> 01:12:59.235 DAVID MALAN: We've gotten back binary search. 01:12:59.235 --> 01:13:01.400 So we still have dynamism, like a linked list. 01:13:01.400 --> 01:13:02.690 We're still using pointers. 01:13:02.690 --> 01:13:05.570 And suppose we want to add the number 0 or the number 8, 01:13:05.570 --> 01:13:08.870 you could imagine 0 going over here, 8 going over here. 01:13:08.870 --> 01:13:12.080 So we could still just plug them in without having to move everything 01:13:12.080 --> 01:13:13.790 around like we would for an array. 01:13:13.790 --> 01:13:16.670 But because you're stitching things together with additional arrows 01:13:16.670 --> 01:13:20.690 wherever they are in memory, so long as you keep track of this data structure, 01:13:20.690 --> 01:13:23.740 called a tree, with one pointer to the so-called root-- 01:13:23.740 --> 01:13:27.040 the root being upside down in this world of computer science-- 01:13:27.040 --> 01:13:29.440 this is the root of this binary search tree, 01:13:29.440 --> 01:13:32.150 guess what you do if you're looking for the number 7? 01:13:32.150 --> 01:13:33.400 Well, you see 4. 01:13:33.400 --> 01:13:34.900 You know it's greater than 4. 01:13:34.900 --> 01:13:36.070 So what do you do? 01:13:36.070 --> 01:13:40.180 You move to the right, thereby ignoring the other half of this tree, 01:13:40.180 --> 01:13:42.880 just like the other half of the phone book in Week 0. 01:13:42.880 --> 01:13:45.550 Once you get to 6, you consider, I'm looking for 7. 01:13:45.550 --> 01:13:46.280 What do I know? 01:13:46.280 --> 01:13:47.740 It's got to be to the right. 01:13:47.740 --> 01:13:48.657 And so you go. 01:13:48.657 --> 01:13:50.740 The height of this tree happens to be logarithmic, 01:13:50.740 --> 01:13:53.140 for those familiar, log base 2 of n, which 01:13:53.140 --> 01:13:56.590 is to say I have 8 elements or 7 elements in this tree. 01:13:56.590 --> 01:14:01.630 But it only takes me 1, 2, 3 steps to find the value. 01:14:01.630 --> 01:14:07.040 It does not take big O of n, or a linear number of steps. 01:14:07.040 --> 01:14:09.460 And if you want your mind really to be blown here, 01:14:09.460 --> 01:14:13.762 it turns out this is actually the best application for recursion, 01:14:13.762 --> 01:14:15.970 which might have felt a little forced previously when 01:14:15.970 --> 01:14:17.980 we built Mario's pyramid with recursion where 01:14:17.980 --> 01:14:23.800 you did factorial or product or sum or something 01:14:23.800 --> 01:14:25.720 like that in section recursively. 01:14:25.720 --> 01:14:29.230 It turns out that now that we have data structures that exist conceptually 01:14:29.230 --> 01:14:32.040 in two dimensions that are recursively defined-- 01:14:32.040 --> 01:14:36.550 and by recursively defined, I mean for any given node, left is smaller, 01:14:36.550 --> 01:14:40.270 right is bigger, and you can make that statement about any node in the tree-- 01:14:40.270 --> 01:14:44.020 watch what we can do in terms of implementing binary search. 01:14:44.020 --> 01:14:47.200 If I have here a function called search, whose purpose in life 01:14:47.200 --> 01:14:50.650 is to return true or false if the number 50 is in the tree. 01:14:50.650 --> 01:14:52.210 How do you search a tree? 01:14:52.210 --> 01:14:53.860 Well, it takes as input the tree. 01:14:53.860 --> 01:14:56.027 More specifically, it takes the address of the tree. 01:14:56.027 --> 01:14:59.950 More specifically, it takes the address of the root of the tree. 01:14:59.950 --> 01:15:03.010 That is when you want to search a tree, you literally just hand it 01:15:03.010 --> 01:15:06.610 the address of the very first tip top node called the root. 01:15:06.610 --> 01:15:08.920 And from there, you can get everywhere else. 01:15:08.920 --> 01:15:11.720 Just like with the list, we just need the beginning of the list. 01:15:11.720 --> 01:15:13.930 So how do I go about searching a tree? 01:15:13.930 --> 01:15:15.820 Well, let's consider the easy case first. 01:15:15.820 --> 01:15:18.550 Suppose the address you're handed is null, 01:15:18.550 --> 01:15:21.970 what should you do if you're looking for 50, 01:15:21.970 --> 01:15:24.942 but you're handed the empty address, zeros? 01:15:24.942 --> 01:15:25.900 AUDIENCE: Return false. 01:15:25.900 --> 01:15:27.650 DAVID MALAN: Probably return false, right. 01:15:27.650 --> 01:15:30.820 If I hand you no tree and I say it's 50 in here, it's an easy answer. 01:15:30.820 --> 01:15:33.220 No, there's no 50, because there's no tree. 01:15:33.220 --> 01:15:36.220 So that's our base case, if you recall that nomenclature 01:15:36.220 --> 01:15:37.930 from our discussion of recursion. 01:15:37.930 --> 01:15:38.710 You hard code. 01:15:38.710 --> 01:15:42.460 You type in manually one explicit case that just gets you out of the program. 01:15:42.460 --> 01:15:47.530 Next case, if 50 is less than the tree, follow the arrow 01:15:47.530 --> 01:15:49.308 to the number field, then what do know? 01:15:49.308 --> 01:15:51.100 50 is less than the node you're looking at. 01:15:51.100 --> 01:15:53.033 What direction do you want to go conceptually? 01:15:53.033 --> 01:15:53.950 AUDIENCE: To the left. 01:15:53.950 --> 01:15:56.110 DAVID MALAN: You want to go to the left. 01:15:56.110 --> 01:16:01.930 So this line here searches the tree's left child, so to speak, 01:16:01.930 --> 01:16:04.390 in the family tree sense, the left subtree. 01:16:04.390 --> 01:16:06.310 So if we go back to the picture a moment ago, 01:16:06.310 --> 01:16:09.370 if I'm looking for 50 in that story-- or let's make it more real, 01:16:09.370 --> 01:16:11.920 if I'm looking for 1 in the current story, 01:16:11.920 --> 01:16:14.030 I see that 1 is less than the current node. 01:16:14.030 --> 01:16:17.110 So I go ahead and just search the left subtree. 01:16:17.110 --> 01:16:18.550 And notice, this is a tree. 01:16:18.550 --> 01:16:21.550 But so is this if you look at it in isolation. 01:16:21.550 --> 01:16:22.780 And so is this. 01:16:22.780 --> 01:16:25.180 And therein lies the recursive opportunity. 01:16:25.180 --> 01:16:28.263 So again here, if 50 is less than the tree's number, 01:16:28.263 --> 01:16:29.680 then go ahead and search the left. 01:16:29.680 --> 01:16:33.700 Else if 50 is greater than the tree's current number, search the right. 01:16:33.700 --> 01:16:38.650 Else logically what must be the case if the tree exists and it's not less 01:16:38.650 --> 01:16:41.740 than and it's not greater than the number you're looking at? 01:16:41.740 --> 01:16:44.920 It must equal the number you're looking for, 50, in which case 01:16:44.920 --> 01:16:45.940 we can return true. 01:16:45.940 --> 01:16:49.150 But you recall perhaps from scratch, we don't really need that explicit case. 01:16:49.150 --> 01:16:51.640 We can just call it else instead. 01:16:51.640 --> 01:16:53.707 Any questions then on this use of code? 01:16:53.707 --> 01:16:55.040 We won't actually run this code. 01:16:55.040 --> 01:16:57.010 But this is how you can implement recursively 01:16:57.010 --> 01:16:59.560 an algorithm that is reminiscent of Week 0 01:16:59.560 --> 01:17:03.190 searching for Mike Smith in the phone book, this time now searching a data 01:17:03.190 --> 01:17:07.410 structure that itself is recursive. 01:17:07.410 --> 01:17:09.910 All right, so what do we gain back in terms of running time, 01:17:09.910 --> 01:17:12.730 in terms of searching a binary search tree. 01:17:12.730 --> 01:17:16.000 To be clear, what's an upper bound on the running time? 01:17:16.000 --> 01:17:18.760 We're back to log n, which was the goal. 01:17:18.760 --> 01:17:22.277 And what about inserting into a binary search tree? 01:17:22.277 --> 01:17:25.360 This one we're going to defer to a higher level CS class, because it turns 01:17:25.360 --> 01:17:29.292 out you don't want to just go ahead and put 0 over there, and 8 over there, 01:17:29.292 --> 01:17:32.500 because if you keep doing that, putting smaller and smaller numbers or bigger 01:17:32.500 --> 01:17:35.860 and bigger numbers, you could imagine your tree getting very lanky, 01:17:35.860 --> 01:17:39.730 like very tall over here or maybe very tall over here 01:17:39.730 --> 01:17:42.820 and therefore not nearly as balanced as the tree we drew. 01:17:42.820 --> 01:17:45.190 And so it turns out there are algorithms that let 01:17:45.190 --> 01:17:47.528 you keep a binary search tree balanced. 01:17:47.528 --> 01:17:50.320 So even as you add elements to it, you kind of shift things around. 01:17:50.320 --> 01:17:51.838 You don't remove them in memory. 01:17:51.838 --> 01:17:54.130 You just update the pointer, so that the data structure 01:17:54.130 --> 01:17:56.000 itself does not get terribly high. 01:17:56.000 --> 01:18:00.190 But that too is log n, which means we had arrays, which gave us binary search 01:18:00.190 --> 01:18:01.822 capabilities in logarithmic time. 01:18:01.822 --> 01:18:04.780 We then introduced the linked list, which gave us dynamism, the ability 01:18:04.780 --> 01:18:06.670 to grow and, if we want, shrink. 01:18:06.670 --> 01:18:08.650 But we sacrifice binary search. 01:18:08.650 --> 01:18:11.860 But if we spend a little more space and use not one pointer for every node, 01:18:11.860 --> 01:18:16.090 but two, we can actually tip the scales again, spend more space 01:18:16.090 --> 01:18:20.770 and save time by searching the data structure, this time using 01:18:20.770 --> 01:18:23.420 something logarithmic. 01:18:23.420 --> 01:18:25.940 All right, so what would the ideal, though, be? 01:18:25.940 --> 01:18:28.100 Every time we talk about running time, it 01:18:28.100 --> 01:18:31.930 feels like we want to be low on this list and not high. 01:18:31.930 --> 01:18:33.300 n squared was slow. 01:18:33.300 --> 01:18:35.120 Big O of 1 is constant time. 01:18:35.120 --> 01:18:36.050 That's fast. 01:18:36.050 --> 01:18:38.150 Wouldn't it be nice throughout this story 01:18:38.150 --> 01:18:42.890 if we actually found our way to a data structure that gave us constant time? 01:18:42.890 --> 01:18:44.870 Like, my god, if we could just insert something 01:18:44.870 --> 01:18:48.560 into a data structure with one step and find something in a data structure 01:18:48.560 --> 01:18:51.200 with one step, that's sort of the holy grail, so to speak, 01:18:51.200 --> 01:18:54.530 because you don't have to worry about big O of n or big O of log n. 01:18:54.530 --> 01:18:57.082 You just jump immediately to the value you want. 01:18:57.082 --> 01:18:58.790 Well, it turns out, theoretically there's 01:18:58.790 --> 01:19:01.910 something that allows you to achieve that called a hash table. 01:19:01.910 --> 01:19:04.770 But how you implement that is not necessarily obvious. 01:19:04.770 --> 01:19:06.440 And it takes some expertise. 01:19:06.440 --> 01:19:09.140 And indeed, in Problem Set 5 among the goals at hand 01:19:09.140 --> 01:19:12.290 is to implement exactly this notion of a hash table that lets 01:19:12.290 --> 01:19:14.870 you spell check a document super fast. 01:19:14.870 --> 01:19:17.420 A word processing program would be so slow 01:19:17.420 --> 01:19:21.080 if every time you wanted to check a word for whether it's spelled correctly 01:19:21.080 --> 01:19:24.410 or incorrectly, if you had to search linearly or even longer rhythmically 01:19:24.410 --> 01:19:28.340 a big dictionary file, it might actually be really slow to spell check a file. 01:19:28.340 --> 01:19:31.570 But using a hash table, we can probably do much better. 01:19:31.570 --> 01:19:37.673 A hash table is a combination of an array and linked lists inside of it. 01:19:37.673 --> 01:19:40.340 So I'm going to go ahead and just for convenience draw my array, 01:19:40.340 --> 01:19:42.080 this time vertically instead of horizontally. 01:19:42.080 --> 01:19:42.870 But it's the same thing. 01:19:42.870 --> 01:19:44.690 And it's just an artist's rendition anyway. 01:19:44.690 --> 01:19:49.322 And suppose the goal at hand is to keep track efficiently of like a name tags. 01:19:49.322 --> 01:19:50.780 So maybe we're holding a big event. 01:19:50.780 --> 01:19:53.360 We've made some name tags in advance, which we indeed have. 01:19:53.360 --> 01:19:55.940 And we want people to be able to pick up these name tags super efficiently. 01:19:55.940 --> 01:19:58.100 It would be really annoying and pretty dumb 01:19:58.100 --> 01:19:59.870 if we just made a big stack and name tags, 01:19:59.870 --> 01:20:03.500 even if it's alphabetical, A to Z, then had everyone in the room line up 01:20:03.500 --> 01:20:06.620 and look through all of the darn name tags looking for their name. 01:20:06.620 --> 01:20:08.330 That's not a very inefficient system. 01:20:08.330 --> 01:20:11.630 Fortunately, we've come prepared with some buckets, all of which are labeled, 01:20:11.630 --> 01:20:14.297 because wouldn't it be nice if you're looking for your name tag, 01:20:14.297 --> 01:20:17.090 you don't look through the whole darn list of name tags or stack? 01:20:17.090 --> 01:20:18.800 You actually just go to your bucket. 01:20:18.800 --> 01:20:21.290 And you jump instantly to your name, where hopefully you're 01:20:21.290 --> 01:20:23.770 the only person with a name that starts with some letter. 01:20:23.770 --> 01:20:25.520 And then you can just reach in and get it. 01:20:25.520 --> 01:20:27.560 Well, how do we implement this conceptually? 01:20:27.560 --> 01:20:30.320 Well, it's very common with a hash table if the inputs 01:20:30.320 --> 01:20:34.730 are things like words or names to look at the characters in those words 01:20:34.730 --> 01:20:38.760 to decide where to put those names or those name tags, if you will. 01:20:38.760 --> 01:20:41.223 So here's an array of size 26, from 0 to 25. 01:20:41.223 --> 01:20:43.640 But, you know what, It's convenient to think of this array 01:20:43.640 --> 01:20:47.960 as maybe being indexed from A through Z. So still 26 buckets, 01:20:47.960 --> 01:20:53.930 but this array is really just of size 26, 0 through 25 ultimately. 01:20:53.930 --> 01:20:57.140 And suppose the goal at hand now is to go ahead and store these name 01:20:57.140 --> 01:20:58.080 tags in advance. 01:20:58.080 --> 01:20:59.780 So this is what the staff and I would do in advance. 01:20:59.780 --> 01:21:01.490 And, Brian, if you wouldn't mind helping out with this. 01:21:01.490 --> 01:21:04.490 The goal at hand is quite simply to get the name tags ready for students 01:21:04.490 --> 01:21:05.130 to pick up. 01:21:05.130 --> 01:21:07.970 And so where do I want to go ahead and put the first one? 01:21:07.970 --> 01:21:10.250 So Albus is the first one whose name tag we made. 01:21:10.250 --> 01:21:12.500 I'm going to go ahead and jump immediately to bucket 0 01:21:12.500 --> 01:21:15.050 and put Albus's name right there in one step. 01:21:15.050 --> 01:21:17.420 Meanwhile I've got Zacharias, and so even 01:21:17.420 --> 01:21:21.390 though it's taking me a bunch of steps to go over here, if this is an array, 01:21:21.390 --> 01:21:25.100 I have random access, as a human, and so I can immediately, 01:21:25.100 --> 01:21:27.590 instantly put Zacharias over there. 01:21:27.590 --> 01:21:30.260 It's a little laborious for my feet, but a computer could just 01:21:30.260 --> 01:21:33.410 jump to 0 or 25 or anything in between. 01:21:33.410 --> 01:21:34.720 All right, so Hermione-- 01:21:34.720 --> 01:21:37.428 maybe you're noticing the pattern-- so Hermione is going to be H, 01:21:37.428 --> 01:21:39.710 or which is 7, which is going to be over here. 01:21:39.710 --> 01:21:41.450 Ginny is 6, which is over here. 01:21:41.450 --> 01:21:43.830 Ron is 17, which is over here. 01:21:43.830 --> 01:21:47.870 So think of each of my multiple steps taking actually one step. 01:21:47.870 --> 01:21:49.885 Fred is going to go over here. 01:21:49.885 --> 01:21:52.010 As an aside, the staff and I discussed this morning 01:21:52.010 --> 01:21:54.427 how we probably should've put the buckets closer together. 01:21:54.427 --> 01:21:55.670 But that's OK. 01:21:55.670 --> 01:21:57.560 Severus is going to go over here. 01:21:57.560 --> 01:21:59.150 Petunia is going to go over here. 01:21:59.150 --> 01:22:03.430 Draco is way over here, but doesn't matter, constant time, bracket 3. 01:22:03.430 --> 01:22:05.930 James is bracket 9. 01:22:05.930 --> 01:22:09.620 Cedric is bracket 2. 01:22:09.620 --> 01:22:11.390 Perhaps play this part in 2x speed. 01:22:11.390 --> 01:22:13.850 Luna is bucket 11. 01:22:13.850 --> 01:22:15.980 Neville bucket 13. 01:22:15.980 --> 01:22:18.690 Kingsley bucket 10. 01:22:18.690 --> 01:22:20.570 Kingsley, there we go. 01:22:20.570 --> 01:22:22.580 Minerva bucket 12. 01:22:22.580 --> 01:22:25.095 Vernon-- ironically, we don't actually need this many names 01:22:25.095 --> 01:22:26.720 to make the point we're trying to make. 01:22:26.720 --> 01:22:30.230 But Vernon-- we got a little carried away with the names we recognized. 01:22:30.230 --> 01:22:31.877 And now, the list is pretty full. 01:22:31.877 --> 01:22:33.710 All right, so that's a whole bunch of names. 01:22:33.710 --> 01:22:36.110 I filled up most of the buckets with a name tag. 01:22:36.110 --> 01:22:37.400 But-- why am I out of breath? 01:22:37.400 --> 01:22:43.550 But what's really convenient now is that if Cedric or Albus or Draco or Fred 01:22:43.550 --> 01:22:47.180 or Ginny come into the room, they can index instantly, 01:22:47.180 --> 01:22:50.030 randomly, to their pocket, get their name tag, and go. 01:22:50.030 --> 01:22:50.680 Nothing linear. 01:22:50.680 --> 01:22:53.180 They don't have to flip through the whole stack of name tags 01:22:53.180 --> 01:22:55.800 with which I actually began the story. 01:22:55.800 --> 01:22:57.920 But there's a problem ahead. 01:22:57.920 --> 01:23:01.175 We very deliberately ordered the name tags thus far in such a way 01:23:01.175 --> 01:23:03.050 that we don't create a problem for ourselves. 01:23:03.050 --> 01:23:06.770 But among the more famous characters we've not heard from yet is Harry. 01:23:06.770 --> 01:23:08.840 So Harry's name tag is still here. 01:23:08.840 --> 01:23:09.990 Where does this go? 01:23:09.990 --> 01:23:12.470 Well, Harry is going to go in bucket 7. 01:23:12.470 --> 01:23:15.230 But wait a minute, there's already someone there. 01:23:15.230 --> 01:23:16.880 So what do I do? 01:23:16.880 --> 01:23:20.180 If I were only using an array, Harry's kind of out of luck. 01:23:20.180 --> 01:23:23.430 Like Hermione is already in that location in the array. 01:23:23.430 --> 01:23:26.520 And we would have to decide, either Hermione goes there or Harry, 01:23:26.520 --> 01:23:28.110 but we can't just put them both. 01:23:28.110 --> 01:23:32.130 But if we implement this new data structure called a hash table using 01:23:32.130 --> 01:23:37.485 an array that's conceptually vertical, but that horizontally is a linked list, 01:23:37.485 --> 01:23:38.610 you know what, that's fine. 01:23:38.610 --> 01:23:41.707 We're just going to go ahead and link Hermione's and Harry's together. 01:23:41.707 --> 01:23:44.790 So, yes, it's going to take both of them or one of them at least two steps 01:23:44.790 --> 01:23:45.960 to find their name tag. 01:23:45.960 --> 01:23:49.020 But it's not going to take big O of n steps to find their name tag, 01:23:49.020 --> 01:23:51.210 at least if there's only two in this bucket. 01:23:51.210 --> 01:23:53.460 All right, Hagrid, dammit, so he came in the door too. 01:23:53.460 --> 01:23:55.780 So now that linked list is getting a little longer. 01:23:55.780 --> 01:23:59.010 We now have a chain, if you will, a linked list of size 3. 01:23:59.010 --> 01:24:02.400 Sirius is going to go over here in bucket 18. 01:24:02.400 --> 01:24:04.260 But Severus is already there too. 01:24:04.260 --> 01:24:05.100 Awkward. 01:24:05.100 --> 01:24:08.730 Remus is 17. 01:24:08.730 --> 01:24:11.510 Remus is going to go and link together with Ron there. 01:24:11.510 --> 01:24:14.880 George is going to go into bucket 6, which is over here. 01:24:14.880 --> 01:24:19.107 Lily is also going to collide, so to speak with Luna. 01:24:19.107 --> 01:24:20.940 And this is a collision in computer science. 01:24:20.940 --> 01:24:23.460 Anytime you have a value that you're trying to put in one place 01:24:23.460 --> 01:24:26.418 but there's something there, you need to resolve the collision somehow. 01:24:26.418 --> 01:24:28.980 So I'm proposing that we actually just link these together. 01:24:28.980 --> 01:24:33.630 Or as we're doing here, to bucketize values in computer science conceptually 01:24:33.630 --> 01:24:37.740 means to throw the value into a bucket, or physically as we've done here. 01:24:37.740 --> 01:24:40.270 Lucius finally is going to go in bucket 11 too. 01:24:40.270 --> 01:24:44.080 And lastly, Lavender goes in that same bucket. 01:24:44.080 --> 01:24:44.580 Phew. 01:24:44.580 --> 01:24:46.860 So thank you to Brian for helping choreograph that. 01:24:46.860 --> 01:24:51.270 So this structure that you're looking at is what is called a hash table. 01:24:51.270 --> 01:24:56.160 It is an array that you index into using what's called a hash function. 01:24:56.160 --> 01:24:59.940 A hash function is like any function that we've seen thus far, 01:24:59.940 --> 01:25:01.500 any program we've seen thus far-- 01:25:01.500 --> 01:25:04.780 something that takes input and produces output. 01:25:04.780 --> 01:25:06.900 So if we consider our original picture from Week 0 01:25:06.900 --> 01:25:09.900 of what computer science in itself is when it comes to solving problems, 01:25:09.900 --> 01:25:13.320 hash function for today's purpose it's just this function, this process, 01:25:13.320 --> 01:25:16.600 this algorithm in between that decides, given a name 01:25:16.600 --> 01:25:19.560 tag, what bucket to put that name tag in. 01:25:19.560 --> 01:25:21.990 And quite obviously in the real world, what algorithm 01:25:21.990 --> 01:25:25.770 was I using to bucketize a name tag upon reading the name? 01:25:25.770 --> 01:25:27.060 AUDIENCE: First letters. 01:25:27.060 --> 01:25:28.300 DAVID MALAN: Looking at the first letter. 01:25:28.300 --> 01:25:28.800 Why? 01:25:28.800 --> 01:25:29.650 It's simple. 01:25:29.650 --> 01:25:30.690 It's pretty efficient. 01:25:30.690 --> 01:25:34.140 It means I can store a relatively small array of size 26 01:25:34.140 --> 01:25:37.060 and just immediately put the name tags there. 01:25:37.060 --> 01:25:41.040 So in this case, we might have fed in Albus to that hash function. 01:25:41.040 --> 01:25:44.970 And it might return 0, representing A, if we're 0 indexing the array. 01:25:44.970 --> 01:25:48.090 Or for someone like Zacharias, we might get out 25 01:25:48.090 --> 01:25:50.130 just because the first letter of his name is z. 01:25:50.130 --> 01:25:52.290 But this is kind of simplistic, right. 01:25:52.290 --> 01:25:53.350 And we've seen a problem. 01:25:53.350 --> 01:25:57.090 What is the problem with just looking, of course, at the users first letter 01:25:57.090 --> 01:25:59.310 of their name? 01:25:59.310 --> 01:26:00.320 What problem arose? 01:26:00.320 --> 01:26:00.500 Yeah. 01:26:00.500 --> 01:26:03.300 AUDIENCE: There might be more than one name with the first letter. 01:26:03.300 --> 01:26:05.520 DAVID MALAN: There might be more than one name with the first letter. 01:26:05.520 --> 01:26:06.720 And you know in the extreme-- 01:26:06.720 --> 01:26:08.610 and computer scientists and software engineers 01:26:08.610 --> 01:26:09.860 often think about the extreme. 01:26:09.860 --> 01:26:10.870 What is the corner case? 01:26:10.870 --> 01:26:11.950 What could go wrong? 01:26:11.950 --> 01:26:14.730 What if by chance there's just a lot of characters 01:26:14.730 --> 01:26:18.900 in this universe whose names start with h or l, and maybe all of their names 01:26:18.900 --> 01:26:20.550 just happened to start with h or l? 01:26:20.550 --> 01:26:22.380 It doesn't matter how fancy your hash table 01:26:22.380 --> 01:26:26.130 is, it's pretty stupid if all of the name tags are stacked up in a bucket. 01:26:26.130 --> 01:26:28.260 So in that sense, a hash table, even though this 01:26:28.260 --> 01:26:32.040 feels like it's pretty efficient, in the worst case, big O of n, 01:26:32.040 --> 01:26:34.170 when it comes to inserting and searching, 01:26:34.170 --> 01:26:37.380 because you could just get unlucky and get a huge stack of names 01:26:37.380 --> 01:26:40.860 that by nature of the class just all start with the same letter. 01:26:40.860 --> 01:26:42.855 So how can we mitigate this? 01:26:42.855 --> 01:26:43.980 How could we mitigate this? 01:26:43.980 --> 01:26:45.630 Well, you know what, rather than naively only 01:26:45.630 --> 01:26:48.475 looking at the first name, let's leverage some probabilities here. 01:26:48.475 --> 01:26:50.850 Why don't we look not at just the first letter, but maybe 01:26:50.850 --> 01:26:52.532 the first two letters? 01:26:52.532 --> 01:26:54.240 I bet if we look at the first two letters 01:26:54.240 --> 01:26:55.948 we're not going to get as many collisions 01:26:55.948 --> 01:26:58.260 as many people belonging to the same bucket. 01:26:58.260 --> 01:27:01.590 So Hermione, Harry, and Hagrid was a problem we identified earlier, 01:27:01.590 --> 01:27:03.000 not to mention a few other names. 01:27:03.000 --> 01:27:07.733 But that was because we were looking only at h for the hash function, only 01:27:07.733 --> 01:27:09.150 at the first letter in their name. 01:27:09.150 --> 01:27:11.040 What if instead we look at the first two, 01:27:11.040 --> 01:27:16.110 so we have a bucket for HA, HB, HC, HD, HE, HF? 01:27:16.110 --> 01:27:18.870 And so Hermione now goes in this bucket specifically. 01:27:18.870 --> 01:27:20.370 So we're going to need more buckets. 01:27:20.370 --> 01:27:21.600 And they're not pictured on the screen. 01:27:21.600 --> 01:27:23.433 And they're also not pictured here on stage. 01:27:23.433 --> 01:27:25.380 We need more than 26 buckets. 01:27:25.380 --> 01:27:27.900 Frankly, if we're looking at two letters, we need 26 times 01:27:27.900 --> 01:27:30.690 26, like 676 buckets now. 01:27:30.690 --> 01:27:33.330 So more space, but we're hopefully going to decrease 01:27:33.330 --> 01:27:34.690 the probability of collisions. 01:27:34.690 --> 01:27:35.340 Why? 01:27:35.340 --> 01:27:37.540 Well, the next name I might put in here is Harry. 01:27:37.540 --> 01:27:39.780 He's going to end up in a different bucket this time. 01:27:39.780 --> 01:27:42.600 That's great, because it would seem that now I can get access 01:27:42.600 --> 01:27:44.940 to his name tag in constant time. 01:27:44.940 --> 01:27:47.530 Unfortunately, Hagrid is still in the story. 01:27:47.530 --> 01:27:49.650 And so we're going to have a collision with HA. 01:27:49.650 --> 01:27:52.920 So even looking at the first two letters is not ideal. 01:27:52.920 --> 01:27:57.445 So even though we have 676 buckets in this story, 26 times 26, which 01:27:57.445 --> 01:27:59.820 is a lot of buckets, we're still going to get collisions. 01:27:59.820 --> 01:28:02.767 So what would maybe the next evolution of this idea be? 01:28:02.767 --> 01:28:05.850 Well, don't look at the first letter, don't look at the first two letters. 01:28:05.850 --> 01:28:07.300 Why don't we look at the first three letters. 01:28:07.300 --> 01:28:09.720 Surely, that's going to drive down the probability. 01:28:09.720 --> 01:28:13.410 Unfortunately, that's going to drive up the number of cells in the array 01:28:13.410 --> 01:28:17.715 and buckets on stage to 10,000 plus buckets this time around. 01:28:17.715 --> 01:28:18.840 So that's a lot of buckets. 01:28:18.840 --> 01:28:27.630 But suppose we use not HA, but maybe HAA, HAB, HAC, HAD, HAE, HAF, HAG, dot 01:28:27.630 --> 01:28:35.340 dot dot, HAQ, HAR, HAS, dot dot dot, HEQ, HER, HES. 01:28:35.340 --> 01:28:38.280 So we have a lot of buckets and even more in between not pictured. 01:28:38.280 --> 01:28:43.170 Now we can go ahead and hash on Harry's name, Hagrid's name, Hermione's name. 01:28:43.170 --> 01:28:47.385 And this time, by design, they're going to end up in different buckets, which 01:28:47.385 --> 01:28:48.510 seems to be an improvement. 01:28:48.510 --> 01:28:51.870 And indeed, it is, because now if I go searching 01:28:51.870 --> 01:28:56.160 for Hermione or Hagrid or Harry's name tag, or they do themselves, 01:28:56.160 --> 01:29:00.270 they're going to be able to find it in constant time. 01:29:00.270 --> 01:29:03.990 But that's assuming there's not a lot of other kids with the name starting 01:29:03.990 --> 01:29:04.800 with H. 01:29:04.800 --> 01:29:08.468 And so a hash table still technically is big O of n 01:29:08.468 --> 01:29:10.260 because you could just get unlucky and have 01:29:10.260 --> 01:29:14.040 a big pile up of similar inputs, all of which produce the same output, 01:29:14.040 --> 01:29:17.253 even if you're using a fancier hash function like this. 01:29:17.253 --> 01:29:18.420 And there's a trade off too. 01:29:18.420 --> 01:29:22.483 My god, we're using like almost 20,000 buckets now just 01:29:22.483 --> 01:29:24.150 to store these names to speed things up. 01:29:24.150 --> 01:29:26.430 At some point, you know, it's probably cheaper 01:29:26.430 --> 01:29:29.730 to just let Harry and Hermione and Hagrid form a line 01:29:29.730 --> 01:29:31.480 and find their name tag more slowly. 01:29:31.480 --> 01:29:33.443 So there's this trade-off of time and space. 01:29:33.443 --> 01:29:35.610 But if you have what's called an ideal hash function 01:29:35.610 --> 01:29:38.430 and you figure out some magical algorithm written 01:29:38.430 --> 01:29:42.630 in code that ensures uniqueness that no name tag will end up 01:29:42.630 --> 01:29:46.830 colliding with another, then you can achieve this holy grail of big O 01:29:46.830 --> 01:29:50.080 of one time, constant time for a hash function. 01:29:50.080 --> 01:29:53.160 So it's this sort of tension between how much space do you want to spend? 01:29:53.160 --> 01:29:54.780 How much effort do you want to spend figuring out 01:29:54.780 --> 01:29:56.190 what that ideal hash function is? 01:29:56.190 --> 01:29:58.620 So in the real world, and we'll see this in Python, 01:29:58.620 --> 01:30:01.560 most computer systems give you a best effort, such 01:30:01.560 --> 01:30:04.920 that a hash table is not big O of n usually. 01:30:04.920 --> 01:30:07.110 It's actually, on average much much, much faster, 01:30:07.110 --> 01:30:09.760 even though there's a theoretical risk that it can be slow. 01:30:09.760 --> 01:30:11.760 And more on that too in a higher level CS course 01:30:11.760 --> 01:30:14.770 where you explore data structures and algorithms more formally. 01:30:14.770 --> 01:30:17.130 So technically speaking, it feels like search 01:30:17.130 --> 01:30:20.100 could get down to big O of 1, constant time, 01:30:20.100 --> 01:30:22.620 if every name tag ends up in a unique bucket. 01:30:22.620 --> 01:30:25.440 But you could still get unlucky if there's a lot of H names 01:30:25.440 --> 01:30:27.010 or L names or the like. 01:30:27.010 --> 01:30:29.910 So technically speaking, a hash table is big O of n. 01:30:29.910 --> 01:30:35.670 But, frankly, three names in a bucket, like Hermione, Hagrid, and Harry, 01:30:35.670 --> 01:30:38.880 is much better than n names in the same bucket. 01:30:38.880 --> 01:30:42.480 So even in the real world if you get rid of this asymptotic hand waviness, 01:30:42.480 --> 01:30:43.160 that's faster. 01:30:43.160 --> 01:30:44.910 That's much faster than putting everything 01:30:44.910 --> 01:30:47.940 in a linked list or an array itself. 01:30:47.940 --> 01:30:53.712 All right, so from there, I bet we can try one other approach here. 01:30:53.712 --> 01:30:55.920 There's another data structure we want to present to, 01:30:55.920 --> 01:30:57.690 not in code, but in pictures. 01:30:57.690 --> 01:30:59.820 This one's called a trie. 01:30:59.820 --> 01:31:02.430 Short for retrieval, even though it's pronounced differently. 01:31:02.430 --> 01:31:05.640 A trie is a data structure that actually is pretty amazing. 01:31:05.640 --> 01:31:09.630 And it follows this pattern of spending one resource to save on another. 01:31:09.630 --> 01:31:12.090 A trie is going to use a lot more memory, 01:31:12.090 --> 01:31:15.090 but it is going to give us actual constant time 01:31:15.090 --> 01:31:19.627 lookup for things like names or words being inserted into the structure. 01:31:19.627 --> 01:31:20.710 So what does it look like? 01:31:20.710 --> 01:31:22.650 It's a little strong, because we need to leave room for ourselves 01:31:22.650 --> 01:31:24.360 on the board with lots of memory. 01:31:24.360 --> 01:31:30.720 A trie is a tree, each of whose nodes is essentially an array. 01:31:30.720 --> 01:31:31.980 So notice the pattern here. 01:31:31.980 --> 01:31:34.230 Computer scientists over time have been kind of clever 01:31:34.230 --> 01:31:37.800 taking this idea, this idea, mashing them together and creating some monster 01:31:37.800 --> 01:31:41.890 data structure, but that gives you some savings of time or space. 01:31:41.890 --> 01:31:44.880 So this array at the very top represents the roots 01:31:44.880 --> 01:31:48.600 of this trie, which again is a tree whose nodes are arrays. 01:31:48.600 --> 01:31:51.510 And notice that the array is of size 26, for the sake of discussion, 01:31:51.510 --> 01:31:53.910 A through Z, or 0 through 25. 01:31:53.910 --> 01:31:55.170 A trie does this. 01:31:55.170 --> 01:31:59.400 If you want to store a name in a trie, what you do, in this case, 01:31:59.400 --> 01:32:03.250 is look at every letter in the word in question. 01:32:03.250 --> 01:32:04.892 So for Harry' it would be H-a-r-r-y. 01:32:04.892 --> 01:32:07.350 We're not just looking at the first, the second, and third. 01:32:07.350 --> 01:32:08.740 We're looking at all of them. 01:32:08.740 --> 01:32:10.260 And what we do is this. 01:32:10.260 --> 01:32:13.650 Suppose the first letter in the person's name or their name tag or the word 01:32:13.650 --> 01:32:17.250 more generally is an H. You go ahead and go to that index. 01:32:17.250 --> 01:32:21.210 And if there's no child node, there's no tree yet below it, 01:32:21.210 --> 01:32:23.850 another branch, if you will, you allocate another node. 01:32:23.850 --> 01:32:25.600 And another node just means another array. 01:32:25.600 --> 01:32:27.960 And so we've drawn two arrays on the board. 01:32:27.960 --> 01:32:30.080 This now has the letter A highlighted. 01:32:30.080 --> 01:32:32.830 All of the letters are technically there, because it's of course 0 01:32:32.830 --> 01:32:33.450 through 25. 01:32:33.450 --> 01:32:35.408 But we're only highlighting the letters we care 01:32:35.408 --> 01:32:37.260 about for the sake of this example. 01:32:37.260 --> 01:32:39.690 Here is H-a-g. 01:32:39.690 --> 01:32:42.870 So it looks like the first name tag I'm trying to install into this data 01:32:42.870 --> 01:32:44.160 structure is Hagrid. 01:32:44.160 --> 01:32:46.830 Notice now that g is inside of that array. 01:32:46.830 --> 01:32:48.933 I want to go now to r for Hagrid. 01:32:48.933 --> 01:32:50.100 That gives me another array. 01:32:50.100 --> 01:32:53.110 Now i, now d. d is the end of his name. 01:32:53.110 --> 01:32:56.760 So I'm going to just color in green, or I can use like a Boolean flag in C code 01:32:56.760 --> 01:32:59.520 that just says someone's name ends here. 01:32:59.520 --> 01:33:03.600 So notice, I've implicitly stored Hagrid name now in this data structure 01:33:03.600 --> 01:33:08.590 by storing one node, that is one array, for every letter in his name. 01:33:08.590 --> 01:33:10.950 But there's this slight efficiency here because there's 01:33:10.950 --> 01:33:12.960 other people in this story besides Hagrid 01:33:12.960 --> 01:33:16.630 whose names are prefixes or share common prefixes. 01:33:16.630 --> 01:33:19.890 So, for instance, suppose I want to install Harry into this data structure. 01:33:19.890 --> 01:33:23.237 He is H-a-r-r-y. 01:33:23.237 --> 01:33:25.070 And so that gives me a couple of more nodes. 01:33:25.070 --> 01:33:29.510 And if I go ahead now and install Hermione in this, notice now 01:33:29.510 --> 01:33:31.250 I have even more nodes in the tree. 01:33:31.250 --> 01:33:33.360 But some of them are shared. 01:33:33.360 --> 01:33:35.540 If you start at the very top and look at the H, 01:33:35.540 --> 01:33:39.800 notice that both Hagrid and Harry and Hermione 01:33:39.800 --> 01:33:44.150 at least share at least one node in common. 01:33:44.150 --> 01:33:46.550 Now, what's cool about this ultimately? 01:33:46.550 --> 01:33:50.870 So what is the running time of searching for someone in this data structure 01:33:50.870 --> 01:33:52.457 if there's n people already in it? 01:33:52.457 --> 01:33:54.790 Right now n equals 3 because there's three people in it, 01:33:54.790 --> 01:33:56.360 even though there's a lot of nodes. 01:33:56.360 --> 01:33:59.300 But what's the running time for searching this data structure to see 01:33:59.300 --> 01:34:01.190 has Harry picked up his name tag already? 01:34:01.190 --> 01:34:02.480 Has Hermione picked up hers? 01:34:02.480 --> 01:34:03.950 Has Hagrid picked up his? 01:34:03.950 --> 01:34:07.220 Well, how many steps does it take to find Harry or Hermione or Hagrid 01:34:07.220 --> 01:34:09.320 in this data structure? 01:34:09.320 --> 01:34:11.750 For Harry, it's H-a-r-r-y. 01:34:11.750 --> 01:34:13.940 So it's five steps maximally. 01:34:13.940 --> 01:34:16.510 For Hagrid it's H-a-g-r-i-d. 01:34:16.510 --> 01:34:18.880 It's six steps maximally. 01:34:18.880 --> 01:34:24.290 And H-e-r-m-i-o-n-e, 8 steps total. 01:34:24.290 --> 01:34:26.920 And it's probably the case that if we read through the books, 01:34:26.920 --> 01:34:30.100 there is going to be some upper bound on the length of someone's name. 01:34:30.100 --> 01:34:31.100 I don't know what it is. 01:34:31.100 --> 01:34:32.390 It's probably 20 characters. 01:34:32.390 --> 01:34:34.200 Maybe 30 if it's crazy long. 01:34:34.200 --> 01:34:36.200 But there is some fixed value. 01:34:36.200 --> 01:34:39.470 Anytime you have a fixed value, that's what you by definition in CS 01:34:39.470 --> 01:34:40.905 and in math call a constant. 01:34:40.905 --> 01:34:42.530 If it's 20, it's 30, it doesn't matter. 01:34:42.530 --> 01:34:43.360 But it's fixed. 01:34:43.360 --> 01:34:45.890 People's names aren't growing every year in length. 01:34:45.890 --> 01:34:47.390 There's some hard upper bound. 01:34:47.390 --> 01:34:51.470 And so technically, if it only takes you five steps or six steps or eight 01:34:51.470 --> 01:34:55.910 steps to find Harry or Hagrid or Harry or Hermione, that 01:34:55.910 --> 01:35:00.860 is technically constant time or, as we've said, Big O of 1. 01:35:00.860 --> 01:35:05.030 So we can actually then achieve, truly for searching this data structure, 01:35:05.030 --> 01:35:08.240 for inserting this data structure, truly what we call big O of k 01:35:08.240 --> 01:35:09.440 where k is some constant. 01:35:09.440 --> 01:35:13.220 But a constant is the same thing, asymptotically, per our discussion 01:35:13.220 --> 01:35:16.910 in Week 3, of big O of 1. 01:35:16.910 --> 01:35:20.000 These are effectively constant time, because to find Harry, 01:35:20.000 --> 01:35:22.430 you look only at H-a-r-r-y. 01:35:22.430 --> 01:35:27.392 It doesn't matter if there's 1 million other characters in that trie already. 01:35:27.392 --> 01:35:30.350 It doesn't matter if there's Hermione and Hagrid and everyone else from 01:35:30.350 --> 01:35:33.517 the seven books in the data structure, because the only nodes you're looking 01:35:33.517 --> 01:35:36.620 at are the ones representing H-a-r-r-y. 01:35:36.620 --> 01:35:38.030 And that's a powerful thing. 01:35:38.030 --> 01:35:40.700 Every other algorithm we've discussed thus far, certainly 01:35:40.700 --> 01:35:44.180 for searching and sorting, has somehow been slowed down 01:35:44.180 --> 01:35:47.570 by how many other names or numbers are in the data structure. 01:35:47.570 --> 01:35:50.120 That is not the case for this one here. 01:35:50.120 --> 01:35:53.510 However, there is a price being paid. 01:35:53.510 --> 01:35:55.880 What appears to be the price that we're paying 01:35:55.880 --> 01:35:58.070 to gain that really low running time? 01:35:58.070 --> 01:35:58.910 AUDIENCE: Memory. 01:35:58.910 --> 01:35:59.270 DAVID MALAN: Memory. 01:35:59.270 --> 01:36:01.400 I mean, my god, it barely fits on the slide. 01:36:01.400 --> 01:36:03.170 And this is just three names. 01:36:03.170 --> 01:36:07.560 You're spending 26 amount of the memory to store one character. 01:36:07.560 --> 01:36:08.900 Now there's some optimizations. 01:36:08.900 --> 01:36:12.140 Over time, if you insert a lot of names, some of these nodes will be shared. 01:36:12.140 --> 01:36:15.000 But this is a very wide, very dense data structure, 01:36:15.000 --> 01:36:19.040 so to speak, because it's using so much memory to give you 01:36:19.040 --> 01:36:23.920 that super amazing running time of theoretically constant time. 01:36:23.920 --> 01:36:25.670 So again this theme of trade-offs is going 01:36:25.670 --> 01:36:29.588 to persist in the remaining weeks of the semester where to gain one resource, 01:36:29.588 --> 01:36:31.130 we're going to have to spend another. 01:36:31.130 --> 01:36:32.900 So that there is a trie. 01:36:32.900 --> 01:36:35.990 So it turns out that now that we have arrays and linked 01:36:35.990 --> 01:36:40.250 lists and trees and tries and hash tables and yet other data 01:36:40.250 --> 01:36:42.440 structures out there, we can actually implement 01:36:42.440 --> 01:36:44.627 what are called abstract data structures, using 01:36:44.627 --> 01:36:45.960 any of those as building blocks. 01:36:45.960 --> 01:36:48.335 What we've kind of done today verbally and pictorially is 01:36:48.335 --> 01:36:52.130 invent more of those pink puzzle pieces in Scratch, those custom puzzle pieces. 01:36:52.130 --> 01:36:54.980 Now we have as building blocks arrays and linked lists and trees 01:36:54.980 --> 01:36:57.650 and hash tables that we can use to solve other problems. 01:36:57.650 --> 01:37:01.346 And one of the problems out there in the real world is something called a queue. 01:37:01.346 --> 01:37:04.298 A queue and certainly in certain cultures immediately comes to mind, 01:37:04.298 --> 01:37:06.590 what's a queue in the real world or an example thereof? 01:37:06.590 --> 01:37:07.525 AUDIENCE: A line. 01:37:07.525 --> 01:37:09.650 DAVID MALAN: So a line, right, lining up at a store 01:37:09.650 --> 01:37:11.360 or a restaurant or a takeout place. 01:37:11.360 --> 01:37:14.690 So a queue actually has a technical meaning and computer science too. 01:37:14.690 --> 01:37:19.250 It's a data structure that is FIFO, first in, first out. 01:37:19.250 --> 01:37:22.250 A queue, by definition should have people hopefully pleasantly lining up 01:37:22.250 --> 01:37:23.720 one person in front of the other. 01:37:23.720 --> 01:37:27.350 And it maintains this FIFO property, first in, first out, 01:37:27.350 --> 01:37:29.830 such that if I'm at the front of the line 01:37:29.830 --> 01:37:33.020 I am going to be served my food first and then the person behind me 01:37:33.020 --> 01:37:34.670 and then the person behind them. 01:37:34.670 --> 01:37:38.270 It'd be really obnoxious if you walked up to Tasty Burger, placed your order, 01:37:38.270 --> 01:37:41.272 and whoever showed up most recently got their food first. 01:37:41.272 --> 01:37:42.980 That would be an opposite data structure. 01:37:42.980 --> 01:37:43.790 That's LIFO. 01:37:43.790 --> 01:37:45.210 Last in, first out. 01:37:45.210 --> 01:37:47.010 Not fair in the real world. 01:37:47.010 --> 01:37:51.020 So you might hope then that the software that companies like Tasty Burger 01:37:51.020 --> 01:37:53.000 use when they type in your order to the system 01:37:53.000 --> 01:37:55.460 actually send those orders to the team working 01:37:55.460 --> 01:37:57.795 in back cooking the food in a queue fashion, 01:37:57.795 --> 01:38:00.170 because it'd be pretty obnoxious too if people behind you 01:38:00.170 --> 01:38:01.420 were getting their food first. 01:38:01.420 --> 01:38:03.980 So hopefully in software, you're implementing that real world 01:38:03.980 --> 01:38:05.680 notion of a queue as well. 01:38:05.680 --> 01:38:09.260 Printing, if you still print on campus sometimes, papers and whatnot 01:38:09.260 --> 01:38:11.640 on printers, they're often shared printers on campus. 01:38:11.640 --> 01:38:13.640 And so they have what are called printer queues. 01:38:13.640 --> 01:38:16.010 You might go to Command-P or Control-P print, 01:38:16.010 --> 01:38:18.320 but then hopefully, in fairness, if there's 01:38:18.320 --> 01:38:21.320 10 people who are trying to print to the same Harvard printer, 01:38:21.320 --> 01:38:24.670 they are printed in the order in which they were requested. 01:38:24.670 --> 01:38:27.670 It would be pretty obnoxious, again, if the order were flipped. 01:38:27.670 --> 01:38:30.040 Well, it turns out with queues in computer science, 01:38:30.040 --> 01:38:32.957 there's two fundamental operations, even though we humans don't really 01:38:32.957 --> 01:38:34.870 think in these terms, enqueue and dequeue. 01:38:34.870 --> 01:38:36.310 To enqueue means to get in line. 01:38:36.310 --> 01:38:38.440 To dequeue means to get out of line, hopefully 01:38:38.440 --> 01:38:41.860 once you've been served your printouts or your food or whatnot. 01:38:41.860 --> 01:38:45.670 Using today's principles, arrays, linked lists, 01:38:45.670 --> 01:38:49.000 you could probably imagine conceptually using them as building blocks 01:38:49.000 --> 01:38:51.490 to implement this notion of a queue. 01:38:51.490 --> 01:38:54.730 The software that Tasty Burger or any fast food place uses 01:38:54.730 --> 01:38:57.790 probably has implemented in code some lines 01:38:57.790 --> 01:39:00.970 that are using an array that's maybe being dynamically resized or better 01:39:00.970 --> 01:39:03.430 yet a linked list that's growing and shrinking as people 01:39:03.430 --> 01:39:05.440 are placing orders and getting orders. 01:39:05.440 --> 01:39:08.290 So there's this one-to-one mapping between some of today's ideas 01:39:08.290 --> 01:39:09.802 and even the real world as well. 01:39:09.802 --> 01:39:12.760 There's kind of the opposite data structure that I referred to a moment 01:39:12.760 --> 01:39:13.260 ago. 01:39:13.260 --> 01:39:15.100 And these are generally known as stacks. 01:39:15.100 --> 01:39:17.558 Stacks in the real world might be in the dining hall right. 01:39:17.558 --> 01:39:18.990 Like here is of trays. 01:39:18.990 --> 01:39:21.240 And they have this fundamentally different properties, 01:39:21.240 --> 01:39:23.710 such that if the staff go ahead and clean the trays 01:39:23.710 --> 01:39:27.250 and put them right here, it would be pretty obnoxious if to get your you 01:39:27.250 --> 01:39:31.330 had to go through a FIFO fashion and get the first they put down 01:39:31.330 --> 01:39:32.660 and take that out first. 01:39:32.660 --> 01:39:34.210 No one does that, realistically. 01:39:34.210 --> 01:39:36.580 If you've got a big stack of trays in the dining hall, 01:39:36.580 --> 01:39:41.190 you probably enforce a LIFO order, last in, first out. 01:39:41.190 --> 01:39:44.035 So if this was the most recently installed or clean tray, 01:39:44.035 --> 01:39:46.710 you're probably, as the human, just going to take the top one, 01:39:46.710 --> 01:39:48.710 even though that's not really fair to the below. 01:39:48.710 --> 01:39:51.130 But it doesn't matter in this particular case. 01:39:51.130 --> 01:39:52.922 So a stack gives you the opposite property. 01:39:52.922 --> 01:39:54.380 And where else might you see these? 01:39:54.380 --> 01:39:55.510 Well, your Gmail inbox. 01:39:55.510 --> 01:39:58.570 If you use Gmail, your inbox, most likely by default 01:39:58.570 --> 01:39:59.845 is configured as a stack. 01:39:59.845 --> 01:40:01.840 Where do your most recent emails end up? 01:40:01.840 --> 01:40:02.740 AUDIENCE: At the top. 01:40:02.740 --> 01:40:03.580 DAVID MALAN: At the top. 01:40:03.580 --> 01:40:06.340 Now, this is wonderful because it's a feature in that you always 01:40:06.340 --> 01:40:07.720 see your newest mail. 01:40:07.720 --> 01:40:11.170 Kind of a downside, though, to your friends who've emailed you an hour ago 01:40:11.170 --> 01:40:14.540 and whose emails are now down here are on page 2 of your email. 01:40:14.540 --> 01:40:16.210 So there's trade-offs here too. 01:40:16.210 --> 01:40:19.780 Stacks might have desirable properties, like just get your tray quickly, 01:40:19.780 --> 01:40:21.460 see your most recent email. 01:40:21.460 --> 01:40:24.370 But if you're like me, as soon as email falls on the page 2, 01:40:24.370 --> 01:40:27.280 you might never get back to it if the stack of trays 01:40:27.280 --> 01:40:28.540 never actually gets exhausted. 01:40:28.540 --> 01:40:31.390 Frankly, there might be in some dining hall on campus some way 01:40:31.390 --> 01:40:34.780 down here that has never been used in years, because they 01:40:34.780 --> 01:40:37.930 keep refilling the stack, and we keep taking from the top. 01:40:37.930 --> 01:40:40.510 So that would be a bad property for a lot of context, 01:40:40.510 --> 01:40:41.927 but not necessarily all. 01:40:41.927 --> 01:40:44.260 So it turns out there's another data structure too-- oh, 01:40:44.260 --> 01:40:46.760 and the operations there are not called enqueue and dequeue. 01:40:46.760 --> 01:40:48.625 By convention they're called push and pop, 01:40:48.625 --> 01:40:51.910 where this means pushing an element onto the stack. 01:40:51.910 --> 01:40:53.830 Even if it's very gentle, that's pushing. 01:40:53.830 --> 01:40:55.870 Popping means removing the top element. 01:40:55.870 --> 01:40:59.290 So it's just nomenclature meaning adding and removing elements. 01:40:59.290 --> 01:41:02.350 But there's one other data structure we'll give mentioned to today. 01:41:02.350 --> 01:41:03.410 And that's known as a dictionary. 01:41:03.410 --> 01:41:06.285 And we'll see this again in a couple of weeks when we look at Python. 01:41:06.285 --> 01:41:10.030 A dictionary is the abstraction that you can get on top of a hash table. 01:41:10.030 --> 01:41:12.460 This hash table literally involved physical buckets 01:41:12.460 --> 01:41:14.800 and in code would involve arrays and linked lists. 01:41:14.800 --> 01:41:16.240 That's like low level plumbing. 01:41:16.240 --> 01:41:18.820 A dictionary more generally in computer science 01:41:18.820 --> 01:41:23.680 is a data structure that has keys and values, words and values, words 01:41:23.680 --> 01:41:26.625 and page numbers, anything that maps one thing to another. 01:41:26.625 --> 01:41:29.500 Physical dictionaries in the human world, like an English Dictionary, 01:41:29.500 --> 01:41:30.940 has lots of words. 01:41:30.940 --> 01:41:34.810 And if a word is correctly spelled in your document, 01:41:34.810 --> 01:41:36.130 it will be in that dictionary. 01:41:36.130 --> 01:41:38.530 And if you have a typo, a misspelling, in your document, 01:41:38.530 --> 01:41:40.300 it will not be in that dictionary. 01:41:40.300 --> 01:41:43.420 So wouldn't it be nice if you could actually implement 01:41:43.420 --> 01:41:47.170 a dictionary using maybe a hash table, but a smart hash table 01:41:47.170 --> 01:41:50.480 that has plenty of buckets, so that you can answer a question, is this a word, 01:41:50.480 --> 01:41:54.220 is this a word, super fast without having a whole stack of name tags 01:41:54.220 --> 01:41:56.692 or, in this case, English words all in the same bucket. 01:41:56.692 --> 01:41:58.900 And, in fact, that's the challenge for Problem Set 5. 01:41:58.900 --> 01:42:02.710 We're going to give you a big text file with 140,000 plus English words. 01:42:02.710 --> 01:42:06.610 And the goal for you is to implement a hash table with your choice of number 01:42:06.610 --> 01:42:09.310 of buckets, your choice of hash functions, 01:42:09.310 --> 01:42:12.670 and implement this notion of an array with linked lists that 01:42:12.670 --> 01:42:16.358 stores those 140,000 plus words. 01:42:16.358 --> 01:42:18.400 Dictionaries, though, do exist in the real world. 01:42:18.400 --> 01:42:21.640 And taken last night at like 9:00 PM before Sweetgreen closed 01:42:21.640 --> 01:42:23.360 in Harvard Square was this photo. 01:42:23.360 --> 01:42:25.390 If you've ever ordered a salad at Sweetgreen, 01:42:25.390 --> 01:42:28.390 they have a pretty clever optimized system so as to pick up your salad. 01:42:28.390 --> 01:42:30.460 If you order on their app in advance, they 01:42:30.460 --> 01:42:34.540 go ahead and put your salad under D for David, for instance, or B for Brian 01:42:34.540 --> 01:42:35.185 and so forth. 01:42:35.185 --> 01:42:37.060 So that when you go into the store, you don't 01:42:37.060 --> 01:42:39.610 have to look through big O of n other salads. 01:42:39.610 --> 01:42:41.920 You can jump immediately to the B section, the D 01:42:41.920 --> 01:42:44.440 section, or any other section and get your salad. 01:42:44.440 --> 01:42:46.960 Now, in the extreme case, maybe Harry and Hermione 01:42:46.960 --> 01:42:48.820 and Hagrid all order at the same time. 01:42:48.820 --> 01:42:50.960 So there's just a big stack at the H's. 01:42:50.960 --> 01:42:53.800 So it's technically still big O of n. 01:42:53.800 --> 01:42:56.650 But if you assume a nice uniform distribution of names, 01:42:56.650 --> 01:42:59.740 this probably does work out pretty well, especially if the salads aren't 01:42:59.740 --> 01:43:01.990 there by design very long. 01:43:01.990 --> 01:43:06.310 But let's use our final minutes together to take a look at one final visual, one 01:43:06.310 --> 01:43:09.700 made by some of our other friends online who put together an animation that 01:43:09.700 --> 01:43:13.180 tells the story of the differences between stacks and queues 01:43:13.180 --> 01:43:15.510 personified as follows. 01:43:15.510 --> 01:43:17.710 [VIDEO PLAYBACK] 01:43:17.710 --> 01:43:20.620 NARRATOR: Once upon a time, there was a guy named Jack. 01:43:20.620 --> 01:43:24.040 When it came to making friends, Jack did not have the knack. 01:43:24.040 --> 01:43:26.920 So Jack went to talk to the most popular guy he knew. 01:43:26.920 --> 01:43:29.230 He went up to Lou and asked, what do I do? 01:43:29.230 --> 01:43:32.140 Lou saw that his friend was really distressed. 01:43:32.140 --> 01:43:34.405 Well, Lou began, just look how you're dressed. 01:43:34.405 --> 01:43:37.420 Don't you have any with clothes with a different look? 01:43:37.420 --> 01:43:39.550 Yes, said Jack, I sure do. 01:43:39.550 --> 01:43:42.010 Come to my house and I'll show them to you. 01:43:42.010 --> 01:43:43.270 So they went off to Jack's. 01:43:43.270 --> 01:43:46.240 And Jack showed Lou the box where he kept all his shirts 01:43:46.240 --> 01:43:47.650 and his pants and his socks. 01:43:47.650 --> 01:43:50.470 Lou said, I see you have all your clothes in a pile. 01:43:50.470 --> 01:43:53.530 Why don't you wear some others once in a while? 01:43:53.530 --> 01:43:56.740 Jack said, well, when I remove and socks, 01:43:56.740 --> 01:43:59.470 I wash them and put them away in the box. 01:43:59.470 --> 01:44:01.960 Then comes the next morning and up I hop. 01:44:01.960 --> 01:44:04.825 I go to the box and get my off the top. 01:44:04.825 --> 01:44:07.810 Lou quickly realized the problem with Jack. 01:44:07.810 --> 01:44:10.170 He kept clothes, CDs, and books in a stack. 01:44:10.170 --> 01:44:13.150 When he reached for something to read or to wear, 01:44:13.150 --> 01:44:15.790 he chose the top book or underwear. 01:44:15.790 --> 01:44:18.160 Then when he was done, he would put it right back. 01:44:18.160 --> 01:44:20.645 Back it would go, on top of the stack. 01:44:20.645 --> 01:44:23.056 I know the solution, said a triumphant Lou. 01:44:23.056 --> 01:44:25.300 You need to learn to start using a queue. 01:44:25.300 --> 01:44:28.600 Lou took Jack's and hung them in a closet. 01:44:28.600 --> 01:44:31.420 And when he had emptied the box, he just tossed it. 01:44:31.420 --> 01:44:35.170 Then he said, now Jack, at the end of day, put your clothes on the left 01:44:35.170 --> 01:44:36.770 when you put them away. 01:44:36.770 --> 01:44:39.160 Then tomorrow morning, when you see the sun shine, 01:44:39.160 --> 01:44:41.830 get your clothes from the right, from the end of the line. 01:44:41.830 --> 01:44:45.060 Don't you see, said Lou, it will be so nice. 01:44:45.060 --> 01:44:48.490 You'll wear everything once before you wear something twice. 01:44:48.490 --> 01:44:51.340 And with everything in queues in his closet and shelf, 01:44:51.340 --> 01:44:54.370 Jack started to feel quite sure of himself, all thanks 01:44:54.370 --> 01:44:56.777 to Lou and his wonderful queue. 01:44:56.777 --> 01:44:57.850 [END PLAYBACK] 01:44:57.850 --> 01:44:59.600 DAVID MALAN: All right that's it for CS50. 01:44:59.600 --> 01:45:01.620 We'll see you next time.