WEBVTT X-TIMESTAMP-MAP=LOCAL:00:00:00.000,MPEGTS:900000 00:00:00.000 --> 00:00:01.992 [MUSIC PLAYING] 00:01:12.587 --> 00:01:13.670 DAVID J. MALAN: All right. 00:01:13.670 --> 00:01:15.200 This is CS50. 00:01:15.200 --> 00:01:20.270 And this is week five, which is going to be our last week in C. But what 00:01:20.270 --> 00:01:22.265 that means is that we'll have OK-- 00:01:22.265 --> 00:01:25.470 [CHEERING AND APPLAUSE] 00:01:25.470 --> 00:01:28.500 But with this week, with last week, and really all of the weeks 00:01:28.500 --> 00:01:30.330 prior, have you been hopefully acquiring, 00:01:30.330 --> 00:01:33.527 if slowly and with some challenge, some fundamental building 00:01:33.527 --> 00:01:35.610 blocks that are still going to underlie everything 00:01:35.610 --> 00:01:38.370 we continue doing in the semester, even as we transition 00:01:38.370 --> 00:01:40.140 to so-called higher level languages. 00:01:40.140 --> 00:01:43.260 Next week, indeed, we'll introduce Python, a very popular language that 00:01:43.260 --> 00:01:45.660 does not have pointers, does not have memory 00:01:45.660 --> 00:01:47.220 management at this very low level. 00:01:47.220 --> 00:01:49.290 But that's really just because someone else wrote 00:01:49.290 --> 00:01:50.880 the code that will do that for you. 00:01:50.880 --> 00:01:52.553 And it's going to make our lives easier. 00:01:52.553 --> 00:01:55.470 Because it means when you want to solve a problem conceptually up here 00:01:55.470 --> 00:01:58.140 to just get real work done or build something amazing, 00:01:58.140 --> 00:02:00.240 you don't have to really get into the same weeds 00:02:00.240 --> 00:02:03.390 as we have been deliberately this week and now last. 00:02:03.390 --> 00:02:06.600 But the goal ultimately is that you better understand and can better 00:02:06.600 --> 00:02:10.560 harness then all that a computer can do even in those higher level languages. 00:02:10.560 --> 00:02:12.480 So today, we're going to focus particularly 00:02:12.480 --> 00:02:16.170 on data structures, how you might structure your data in memory, which 00:02:16.170 --> 00:02:19.500 really amounts to building things digitally, stitching together 00:02:19.500 --> 00:02:22.405 ideas and concepts in memory using this new building 00:02:22.405 --> 00:02:24.780 block from last week, which of course are these pointers. 00:02:24.780 --> 00:02:28.680 Pointers allow you to store addresses in memory, like in variables. 00:02:28.680 --> 00:02:32.040 But with those simple addresses we can leave these breadcrumbs. 00:02:32.040 --> 00:02:33.600 We can point from here to there. 00:02:33.600 --> 00:02:36.792 And we can conceptually stitch pieces of data together. 00:02:36.792 --> 00:02:39.000 But there's going to be different ways of doing that. 00:02:39.000 --> 00:02:41.010 And today we'll focus first on what's generally 00:02:41.010 --> 00:02:42.960 known as an abstract data type. 00:02:42.960 --> 00:02:46.110 And similar to a type in C more generally, it does actually 00:02:46.110 --> 00:02:47.250 have some properties in it. 00:02:47.250 --> 00:02:50.520 But the underlying implementation details of an abstract data type 00:02:50.520 --> 00:02:52.170 are ultimately up to you. 00:02:52.170 --> 00:02:55.080 That is to say, an abstract data type can represent one thing 00:02:55.080 --> 00:02:56.310 and can do something. 00:02:56.310 --> 00:03:00.420 But how you implement it allows you some discretion underneath the hood. 00:03:00.420 --> 00:03:03.210 So for instance, in the world of computer science, 00:03:03.210 --> 00:03:05.550 a queue is actually a technical term. 00:03:05.550 --> 00:03:09.360 This is a type of data structure that we could theoretically build in code 00:03:09.360 --> 00:03:11.250 in C or really any other language. 00:03:11.250 --> 00:03:14.800 But a queue has properties just like queues in the real world. 00:03:14.800 --> 00:03:16.950 For instance, if you've ever lined up for something 00:03:16.950 --> 00:03:21.060 to get food in a restaurant, or go into a store, 00:03:21.060 --> 00:03:24.540 or wait for the airport to clear you, well, you've lined up in a queue. 00:03:24.540 --> 00:03:26.250 A queue being some sort of line. 00:03:26.250 --> 00:03:30.030 But what's noteworthy about queues are specific properties. 00:03:30.030 --> 00:03:34.920 They are first in, first out data structures, either virtually 00:03:34.920 --> 00:03:36.030 or in the human world. 00:03:36.030 --> 00:03:38.370 Which is to say the first person in the line 00:03:38.370 --> 00:03:40.657 should ideally be served first at the restaurant. 00:03:40.657 --> 00:03:43.740 Or the first person in the line should get through airport security first. 00:03:43.740 --> 00:03:46.650 By contrast, if it weren't first in first out, 00:03:46.650 --> 00:03:48.930 you can imagine how frustrated you would be 00:03:48.930 --> 00:03:51.223 if you have this inherent unfairness. 00:03:51.223 --> 00:03:54.390 In fact, if you've ever been in line at a store, a supermarket, or the like, 00:03:54.390 --> 00:03:56.460 and all of a sudden they maybe open a new line. 00:03:56.460 --> 00:03:59.910 And now the person behind you gets to cut ahead and go forward. 00:03:59.910 --> 00:04:02.730 That's because they've broken the concept of the queue. 00:04:02.730 --> 00:04:06.750 So it has this inherent potential for unfairness unless you maintain 00:04:06.750 --> 00:04:09.310 this first in first out property. 00:04:09.310 --> 00:04:12.570 This would be true too for a to-do list, just for productivity's sake. 00:04:12.570 --> 00:04:16.815 If you're in the habit on paper or virtually making a to-do list, ideally 00:04:16.815 --> 00:04:18.690 you probably want to go through that list top 00:04:18.690 --> 00:04:21.120 to bottom so that you actually get the first stuff you 00:04:21.120 --> 00:04:25.140 thought of done first rather than always focusing on your most recent thought. 00:04:25.140 --> 00:04:28.230 Now, within the world of queues, there's generally 00:04:28.230 --> 00:04:31.300 two operations, two functions, if you will, 00:04:31.300 --> 00:04:34.290 that any queue would have either in the real world or in the virtual. 00:04:34.290 --> 00:04:38.280 Enqueue is usually the technical term to mean adding something to a queue. 00:04:38.280 --> 00:04:41.610 But specifically, it means adding it to the end of the queue 00:04:41.610 --> 00:04:44.610 so that someone isn't cutting in line, for instance, to go up front. 00:04:44.610 --> 00:04:46.380 And then dequeue is just the opposite. 00:04:46.380 --> 00:04:48.480 When it's time for the first person in line 00:04:48.480 --> 00:04:51.970 to be served, the time for the first person in line to go through security, 00:04:51.970 --> 00:04:54.040 they are dequeued, so to speak. 00:04:54.040 --> 00:04:57.600 So a technical concept, ultimately, as it's implemented in code. 00:04:57.600 --> 00:04:59.760 But it's really just a real-world concept. 00:04:59.760 --> 00:05:02.670 And these are in contrast to another abstract data type 00:05:02.670 --> 00:05:04.560 that we might call a stack. 00:05:04.560 --> 00:05:08.370 And a stack, much like a stack of trays in the cafeteria 00:05:08.370 --> 00:05:10.698 has fundamentally different properties. 00:05:10.698 --> 00:05:13.740 You can still add and remove things from them, but consider what happens. 00:05:13.740 --> 00:05:16.980 Whenever they clean all of the trays in the cafeteria or the dining hall, 00:05:16.980 --> 00:05:20.410 they put one of the trays down here and then the next one on top of it. 00:05:20.410 --> 00:05:22.810 And then the next one on top of it, it, it, and so forth. 00:05:22.810 --> 00:05:24.810 But of course, which tray do you presumably 00:05:24.810 --> 00:05:28.740 take as a user of that physical stack? 00:05:28.740 --> 00:05:30.662 The top one, presumably. 00:05:30.662 --> 00:05:33.120 You're not going to futz down here and try to pull one out. 00:05:33.120 --> 00:05:35.328 And so, that would seem to have an opposite property. 00:05:35.328 --> 00:05:40.292 LIFO, last in first out is what characterizes something like a stack. 00:05:40.292 --> 00:05:42.750 And that just makes sense, certainly, in the physical world 00:05:42.750 --> 00:05:45.083 of stacking all of those cafeteria trays because it just 00:05:45.083 --> 00:05:47.730 makes more sense to grab the most recently 00:05:47.730 --> 00:05:50.580 added one, the last added one first. 00:05:50.580 --> 00:05:53.070 And at least in that context, the trays don't necessarily 00:05:53.070 --> 00:05:54.905 care what order they're used in. 00:05:54.905 --> 00:05:57.030 But even then, you could imagine that maybe there's 00:05:57.030 --> 00:05:59.910 some old dirty nasty tray at the very bottom that 00:05:59.910 --> 00:06:03.330 never gets used because you're constantly replenishing the stack. 00:06:03.330 --> 00:06:07.200 So there might very well be side effects of these kinds of features. 00:06:07.200 --> 00:06:09.990 You might be familiar with using Gmail for instance, or really 00:06:09.990 --> 00:06:11.340 any email program. 00:06:11.340 --> 00:06:14.710 What you're looking at in your inbox is technically a stack, 00:06:14.710 --> 00:06:16.710 at least if you've left the defaults configured. 00:06:16.710 --> 00:06:17.210 Why? 00:06:17.210 --> 00:06:18.180 You get a new email? 00:06:18.180 --> 00:06:19.080 Where does it end up? 00:06:19.080 --> 00:06:22.600 Not five pages of emails later, presumably right at the top. 00:06:22.600 --> 00:06:25.600 And the next one's right at the top, right at the top, right at the top. 00:06:25.600 --> 00:06:27.850 And so, if you're like me, you're guilty of eventually 00:06:27.850 --> 00:06:29.110 losing track of some emails. 00:06:29.110 --> 00:06:29.610 Why? 00:06:29.610 --> 00:06:32.430 Because you've pushed so many more emails onto the stack 00:06:32.430 --> 00:06:35.730 that you lose track of the things you got earliest. 00:06:35.730 --> 00:06:37.543 Last in first out though, is maintained. 00:06:37.543 --> 00:06:40.710 The most recent email you get might very well be the first one you reply to. 00:06:40.710 --> 00:06:43.950 But that's not necessarily good for responsiveness to everyone 00:06:43.950 --> 00:06:45.160 else out there. 00:06:45.160 --> 00:06:50.700 Similarly, if you store all of your sweaters in a stack like this, 00:06:50.700 --> 00:06:53.610 like a pile of black ones, below which is a red and then a blue. 00:06:53.610 --> 00:06:56.160 Stack might be perfectly fine for keeping things organized. 00:06:56.160 --> 00:06:59.490 It's the sane way to do it if you just have a shelf in your dorm room or home. 00:06:59.490 --> 00:07:02.160 But what's going to be a side effect of using a stack 00:07:02.160 --> 00:07:06.270 to store your sweaters if there are these in this way as opposed 00:07:06.270 --> 00:07:07.770 to a queue? 00:07:07.770 --> 00:07:08.790 Yeah. 00:07:08.790 --> 00:07:10.275 AUDIENCE: It's harder to get the red and blue ones? 00:07:10.275 --> 00:07:11.830 DAVID J. MALAN: It's harder to get the red and blue ones. 00:07:11.830 --> 00:07:15.010 So presumably you're going to much more often wear, for instance, 00:07:15.010 --> 00:07:16.950 if you will, black instead there. 00:07:16.950 --> 00:07:19.920 Now, the operations for adding things to a stack 00:07:19.920 --> 00:07:22.920 are similar in spirit, but just different vocabulary. 00:07:22.920 --> 00:07:25.950 You push something on top of a stack. 00:07:25.950 --> 00:07:28.920 Even though it's more like in the tray world you place it or rest it. 00:07:28.920 --> 00:07:31.410 But pushing means adding something to the top of the stack. 00:07:31.410 --> 00:07:36.370 And popping means removing something also from the top of the stack. 00:07:36.370 --> 00:07:40.590 So it's not a matter of enqueuing at the end and dequeuing at the beginning. 00:07:40.590 --> 00:07:42.960 With a stack, everything's happening on top. 00:07:42.960 --> 00:07:46.140 You're pushing onto the top and then popping off of the top. 00:07:46.140 --> 00:07:49.480 Now, when it comes to actual code, how might we implement something like this? 00:07:49.480 --> 00:07:51.600 Well, let's just focus on really how you might 00:07:51.600 --> 00:07:53.250 implement the data structure itself. 00:07:53.250 --> 00:07:54.870 And we won't implement any functions. 00:07:54.870 --> 00:07:57.360 You might implement the notion of a stack like this. 00:07:57.360 --> 00:07:58.620 We've seen typedef before. 00:07:58.620 --> 00:08:00.930 It just means define a new type of my own. 00:08:00.930 --> 00:08:03.430 Struct means here comes a structure, a.k.a. 00:08:03.430 --> 00:08:06.780 a data structure of one or more variables within. 00:08:06.780 --> 00:08:11.610 And let's suppose, like last time, we've had-- we defined already a person data 00:08:11.610 --> 00:08:13.110 type using a separate typedef. 00:08:13.110 --> 00:08:15.870 And every person has a name and a number or whatever. 00:08:15.870 --> 00:08:18.330 Let me just stipulate that person exists already. 00:08:18.330 --> 00:08:20.670 So here, you might have to implement a stack, 00:08:20.670 --> 00:08:24.390 an array called people of some capacity. 00:08:24.390 --> 00:08:28.450 Maybe it's an array of size 10, or 100, or whatever. 00:08:28.450 --> 00:08:30.810 This is a constant here in this context, capacity. 00:08:30.810 --> 00:08:34.320 And every element in that array is a person structure. 00:08:34.320 --> 00:08:37.270 And I now have this too, size. 00:08:37.270 --> 00:08:39.990 Now, this almost seems like a synonym for capacity. 00:08:39.990 --> 00:08:44.010 But maybe intuitively, anyone want to propose why I'm apparently 00:08:44.010 --> 00:08:47.010 maintaining separately the capacity of the stack 00:08:47.010 --> 00:08:49.440 but also the size of the stack? 00:08:49.440 --> 00:08:52.720 Why these two distinctions here? 00:08:52.720 --> 00:08:53.800 Yeah. 00:08:53.800 --> 00:08:57.515 AUDIENCE: Capacity is how high the stack could be. 00:08:57.515 --> 00:08:58.390 DAVID J. MALAN: Yeah. 00:08:58.390 --> 00:09:01.650 AUDIENCE: Size is how high it actually is right now. 00:09:01.650 --> 00:09:02.650 DAVID J. MALAN: Perfect. 00:09:02.650 --> 00:09:05.080 The capacity is how high the stack could be, 00:09:05.080 --> 00:09:09.550 like how much room is there for those sweatshirts in my closet, for instance. 00:09:09.550 --> 00:09:12.590 Whereas size is just literally at this moment in time, 00:09:12.590 --> 00:09:14.440 how many sweatshirts are in the stack. 00:09:14.440 --> 00:09:18.260 It's either capacity or fewer presumably in total there. 00:09:18.260 --> 00:09:19.450 So what is capacity? 00:09:19.450 --> 00:09:22.360 Well, we could implement this in perhaps a familiar way. 00:09:22.360 --> 00:09:25.720 I might just define a const somewhere else in my code of type int 00:09:25.720 --> 00:09:28.390 that just defines it to be capacity 50. 00:09:28.390 --> 00:09:30.550 But what perhaps is going to be the downside 00:09:30.550 --> 00:09:34.030 of implementing a stack in this way? 00:09:34.030 --> 00:09:37.120 Of using an array inside here? 00:09:37.120 --> 00:09:42.190 What's a downside now of implementing a stack using an array and this size 00:09:42.190 --> 00:09:45.020 variable within. 00:09:45.020 --> 00:09:48.320 What's a caveat here perhaps? 00:09:48.320 --> 00:09:48.900 Any hands? 00:09:48.900 --> 00:09:49.400 Yeah. 00:09:49.400 --> 00:09:51.372 AUDIENCE: [INAUDIBLE] 00:09:51.600 --> 00:09:53.350 DAVID J. MALAN: So it's going to be harder 00:09:53.350 --> 00:09:55.142 to reach elements that aren't the last one. 00:09:55.142 --> 00:09:56.740 That is the most recently added one. 00:09:56.740 --> 00:09:59.600 So there could be some sweatshirts, so to speak, way down below. 00:09:59.600 --> 00:10:01.120 So we've seen that before too. 00:10:01.120 --> 00:10:03.640 But at some point too a limitation of this design 00:10:03.640 --> 00:10:05.740 is what it's going to be finite. 00:10:05.740 --> 00:10:09.580 I can maximally fit in this example 50 sweatshirts, or 50 emails, 00:10:09.580 --> 00:10:11.800 or 50 cafeteria trays, which is fine. 00:10:11.800 --> 00:10:13.600 But at least it's indeed finite. 00:10:13.600 --> 00:10:15.370 And at least in the computer's memory, it 00:10:15.370 --> 00:10:19.010 might be nice to use more, and more, and more, maybe as more things are 00:10:19.010 --> 00:10:20.260 getting added to the computer. 00:10:20.260 --> 00:10:21.910 So maybe I make this 500. 00:10:21.910 --> 00:10:24.970 Or heck, why don't I make it 5,000 or 50,000? 00:10:24.970 --> 00:10:26.680 Well, what's the trade-off there? 00:10:26.680 --> 00:10:30.040 If I want to have enough room to grow, seems 00:10:30.040 --> 00:10:33.370 like I should just crank up the value of capacity endlessly. 00:10:33.370 --> 00:10:39.480 But why might I not want to change the 50 to 500, or 5,000, or 50,000? 00:10:39.480 --> 00:10:42.630 What's the trade-off there perhaps just intuitively? 00:10:42.630 --> 00:10:43.696 Yeah. 00:10:43.696 --> 00:10:47.170 AUDIENCE: Because I don't want to touch memory that I'm not supposed to. 00:10:47.170 --> 00:10:48.280 DAVID J. MALAN: You don't want to touch memory that you're not 00:10:48.280 --> 00:10:49.480 supposed to be touching. 00:10:49.480 --> 00:10:52.810 And in this case, it wouldn't be-- that wouldn't be a risk per se 00:10:52.810 --> 00:10:54.650 unless you indeed overflow the stack. 00:10:54.650 --> 00:10:58.060 But there's a related issue in asking for that much memory. 00:10:58.060 --> 00:11:00.520 What would another downside be? 00:11:00.520 --> 00:11:06.015 AUDIENCE: You only have one element and you have [INAUDIBLE] 5,000 [INAUDIBLE].. 00:11:06.015 --> 00:11:06.890 DAVID J. MALAN: Yeah. 00:11:06.890 --> 00:11:07.910 [LAUGHS] Exactly. 00:11:07.910 --> 00:11:10.160 So if you've got a capacity of 5,000 but you're only 00:11:10.160 --> 00:11:14.030 using one of those elements, it's awkward to say it non-technically. 00:11:14.030 --> 00:11:16.310 Which is just to say very, very wasteful. 00:11:16.310 --> 00:11:17.450 That's just bad design. 00:11:17.450 --> 00:11:19.760 It's correct, it will work for up to 5,000 elements. 00:11:19.760 --> 00:11:23.030 But my gosh, you're wasting 4,999 extra spots. 00:11:23.030 --> 00:11:25.280 And that's not going to end well, especially if you're 00:11:25.280 --> 00:11:27.170 using more data structures in memory. 00:11:27.170 --> 00:11:29.720 Your Mac, your PC, your phone is surely going to run out 00:11:29.720 --> 00:11:31.220 of memory if you ask for that much. 00:11:31.220 --> 00:11:33.830 So it'd be nice if there was a bit more dynamism there, 00:11:33.830 --> 00:11:36.770 whether it's a stack or a queue, both of which 00:11:36.770 --> 00:11:39.080 might be implemented a little similarly in spirit. 00:11:39.080 --> 00:11:43.700 But let's conclude this abstraction by comparing-- thanks to a friend of ours, 00:11:43.700 --> 00:11:46.880 Professor Shannon Duvall of Elon University, who kindly put together 00:11:46.880 --> 00:11:50.030 this graphical animation that's just under two minutes long that paints 00:11:50.030 --> 00:11:52.640 a picture of these two types of abstract data structures. 00:11:52.640 --> 00:11:57.050 And then we'll dive in underneath to how we might implement problems like these. 00:11:57.050 --> 00:11:58.990 If we could dim the lights dramatically. 00:11:58.990 --> 00:11:59.657 [VIDEO PLAYBACK] 00:11:59.657 --> 00:12:01.220 [MUSIC PLAYING] 00:12:04.010 --> 00:12:06.920 - Once upon a time, there was a guy named Jack. 00:12:06.920 --> 00:12:10.250 When it came to making friends, Jack did not have the knack. 00:12:10.250 --> 00:12:13.250 So Jack went to talk to the most popular guy he knew. 00:12:13.250 --> 00:12:15.920 He went up to Lou and asked, what do I do? 00:12:15.920 --> 00:12:18.410 Lou saw that his friend was really distressed. 00:12:18.410 --> 00:12:21.020 Well, Lou began, just look how you're dressed. 00:12:21.020 --> 00:12:23.630 Don't you have any clothes with a different look? 00:12:23.630 --> 00:12:24.770 Yes, said Jack. 00:12:24.770 --> 00:12:25.820 I sure do. 00:12:25.820 --> 00:12:28.250 Come to my house and I'll show them to you. 00:12:28.250 --> 00:12:29.540 So they went off to Jack's. 00:12:29.540 --> 00:12:33.290 And Jack showed Lou the box where he kept all his shirts, and his pants, 00:12:33.290 --> 00:12:34.250 at his socks. 00:12:34.250 --> 00:12:37.280 Lou said, I see you have all your clothes in a pile. 00:12:37.280 --> 00:12:39.800 Why don't you wear some others once in a while? 00:12:39.800 --> 00:12:43.110 Jack said, well, when I remove clothes and socks, 00:12:43.110 --> 00:12:45.710 I wash them and put them away in the box. 00:12:45.710 --> 00:12:48.260 Then comes the next morning and up I hop. 00:12:48.260 --> 00:12:51.440 I go to the box and get my clothes off the top. 00:12:51.440 --> 00:12:54.050 Lou quickly realized the problem with Jack. 00:12:54.050 --> 00:12:57.020 He kept clothes, CDs, and books in a stack. 00:12:57.020 --> 00:12:59.480 When he'd reach for something to read or to wear, 00:12:59.480 --> 00:13:02.030 he chose a top book or underwear. 00:13:02.030 --> 00:13:04.460 Then when he was done he would put it right back. 00:13:04.460 --> 00:13:06.980 Back it would go, on top of the stack. 00:13:06.980 --> 00:13:09.440 I know the solution, said a triumphant Lou. 00:13:09.440 --> 00:13:11.990 You need to learn to start using a queue. 00:13:11.990 --> 00:13:14.840 Lou took Jack's clothes and hung them in a closet. 00:13:14.840 --> 00:13:17.630 And when he had emptied the box, he just tossed it. 00:13:17.630 --> 00:13:21.560 Then he said, now, Jack, at the end of the day, put your clothes on the left 00:13:21.560 --> 00:13:23.010 when you put them away. 00:13:23.010 --> 00:13:24.770 Then tomorrow morning when you see the sun 00:13:24.770 --> 00:13:28.430 shine, get your clothes from the right from the end of the line. 00:13:28.430 --> 00:13:29.930 Don't you see? said Lou. 00:13:29.930 --> 00:13:31.340 It will be so nice. 00:13:31.340 --> 00:13:34.670 You'll wear everything once before you wear something twice. 00:13:34.670 --> 00:13:37.610 And with everything in queues in his closet and shelf, 00:13:37.610 --> 00:13:40.220 Jack started to feel quite sure of himself. 00:13:40.220 --> 00:13:44.603 All thanks to Lou and his wonderful queue. 00:13:44.603 --> 00:13:45.477 [END PLAYBACK] 00:13:45.477 --> 00:13:46.560 DAVID J. MALAN: All right. 00:13:46.560 --> 00:13:46.890 [LAUGHS] 00:13:46.890 --> 00:13:47.557 AUDIENCE: Great! 00:13:47.557 --> 00:13:48.600 [APPLAUSE] 00:13:48.600 --> 00:13:49.890 DAVID J. MALAN: Sure. 00:13:49.890 --> 00:13:52.558 So that paints a picture of these two abstract data structures. 00:13:52.558 --> 00:13:54.600 But if we really were to dive underneath the hood 00:13:54.600 --> 00:13:56.850 we could implement them in a number of different ways. 00:13:56.850 --> 00:13:59.682 But we really, I think, need some building blocks via which 00:13:59.682 --> 00:14:01.140 we could solve problems like those. 00:14:01.140 --> 00:14:03.220 And we'll see today too, some others as well. 00:14:03.220 --> 00:14:06.983 So let's rewind back to week two when we implement-- we introduced you 00:14:06.983 --> 00:14:08.400 to your very first data structure. 00:14:08.400 --> 00:14:09.330 That is an array. 00:14:09.330 --> 00:14:12.630 And an array, recall, was just a chunk of memory 00:14:12.630 --> 00:14:16.500 whereby elements were stored by design back to back to back. 00:14:16.500 --> 00:14:19.800 It's an array of contiguous memory specifically. 00:14:19.800 --> 00:14:23.400 So with an array, we could certainly store not just one thing but two, 00:14:23.400 --> 00:14:25.210 or three, or even more. 00:14:25.210 --> 00:14:29.280 And so, for instance, if we treat my computer's memory as this abstraction 00:14:29.280 --> 00:14:33.720 here, and pictured here are three bytes or some multiplication thereof, 00:14:33.720 --> 00:14:37.230 suppose we're storing in the computer's memory an array of size three, 00:14:37.230 --> 00:14:39.540 storing the digits one, two, and three. 00:14:39.540 --> 00:14:43.170 Well, remember, that if we zoom out per last week, 00:14:43.170 --> 00:14:44.880 there's other stuff going on in memory. 00:14:44.880 --> 00:14:48.000 So even if we want to add another number to this array 00:14:48.000 --> 00:14:51.360 that we didn't think of when we first started the program, like the number 00:14:51.360 --> 00:14:54.340 four, ideally we would put it right here next to it. 00:14:54.340 --> 00:14:55.900 Otherwise it's no longer an array. 00:14:55.900 --> 00:14:57.650 So by definition, if we're using an array, 00:14:57.650 --> 00:14:59.940 it's got to end up right there after the three. 00:14:59.940 --> 00:15:02.550 But what else is going on inside of your computer's memory? 00:15:02.550 --> 00:15:04.740 Well, assuming your program is of any length, 00:15:04.740 --> 00:15:06.870 and you've got other variables, other functions, 00:15:06.870 --> 00:15:09.420 you've been running it for a while, there's a lot going on. 00:15:09.420 --> 00:15:11.470 And your memory's being used and reused. 00:15:11.470 --> 00:15:13.680 So for instance, somewhere in memory might 00:15:13.680 --> 00:15:18.200 be immediately adjacent to this, Hello, comma, world, backslash zero, 00:15:18.200 --> 00:15:19.950 the null character, just because maybe you 00:15:19.950 --> 00:15:22.230 have another variable somewhere in there that 00:15:22.230 --> 00:15:25.110 is storing that particular string alongside your existing 00:15:25.110 --> 00:15:26.183 array of size three. 00:15:26.183 --> 00:15:28.350 And all of these Oscar the Grouches here really just 00:15:28.350 --> 00:15:31.110 represent what we called last week garbage values. 00:15:31.110 --> 00:15:33.660 There's obviously bits there because they don't disappear. 00:15:33.660 --> 00:15:36.600 They're always going to be inside of the computer somehow implemented. 00:15:36.600 --> 00:15:38.700 But we don't really know or care what they are. 00:15:38.700 --> 00:15:40.500 They're the remnants of those bytes having 00:15:40.500 --> 00:15:44.440 been used for other older variables, previous function calls, or the like. 00:15:44.440 --> 00:15:47.970 But the problem clearly here is that, OK, one, two, three is there. 00:15:47.970 --> 00:15:48.900 But the H is here. 00:15:48.900 --> 00:15:52.380 And unless I want to start taking a bite out of my string 00:15:52.380 --> 00:15:57.060 by overriding the H with a four, we just can't fit it right there. 00:15:57.060 --> 00:15:59.440 And yet, even though there's Oscars all over the place, 00:15:59.440 --> 00:16:00.773 those are indeed garbage values. 00:16:00.773 --> 00:16:04.132 And therefore, we could use that space because it's technically unused. 00:16:04.132 --> 00:16:06.090 We just don't know or care what the values are. 00:16:06.090 --> 00:16:08.940 So where could I put one, two, three, four? 00:16:08.940 --> 00:16:09.660 Well, my gosh. 00:16:09.660 --> 00:16:12.420 I have all this memory down here that's unused. 00:16:12.420 --> 00:16:16.830 I could certainly change those garbage values to be one, two, three, four. 00:16:16.830 --> 00:16:21.720 But to do that, I might need to do a bit of work here. 00:16:21.720 --> 00:16:25.230 It's not just a matter of just saying boom and it happens. 00:16:25.230 --> 00:16:28.680 Now with C and with code, I'd have to do this a little more methodically. 00:16:28.680 --> 00:16:31.410 So let me abstract away everything else that's a distraction. 00:16:31.410 --> 00:16:35.208 Let me assume that there is indeed at least four bytes available for numbers, 00:16:35.208 --> 00:16:38.250 just down here that we could have put them in a bunch of different spots. 00:16:38.250 --> 00:16:44.230 What's involved now in moving the one, two, three to this new chunk of memory 00:16:44.230 --> 00:16:45.423 so we can add the four? 00:16:45.423 --> 00:16:47.340 Well, I think conceptually we're going to have 00:16:47.340 --> 00:16:49.590 to copy the one from old to new. 00:16:49.590 --> 00:16:51.240 Copy the two from old to new. 00:16:51.240 --> 00:16:53.130 Copy the three from old to new. 00:16:53.130 --> 00:16:56.070 And then ultimately, we can get rid of the old memory. 00:16:56.070 --> 00:16:59.160 Those three original bytes could now look like Oscar the Grouch 00:16:59.160 --> 00:17:01.770 and just be garbage values for all intents and purposes. 00:17:01.770 --> 00:17:06.630 But now I have room for a fourth byte wherein I can put the number four. 00:17:06.630 --> 00:17:08.400 So this is nice. 00:17:08.400 --> 00:17:12.450 But what's a downside of this approach? 00:17:12.450 --> 00:17:15.780 What's a downside of solving the problem in this way, where the problem at hand 00:17:15.780 --> 00:17:20.220 is just to grow the array, so to speak, to increase its size, 00:17:20.220 --> 00:17:23.750 to fit one or more numbers? 00:17:23.750 --> 00:17:26.240 Seems pretty straight forward. 00:17:26.240 --> 00:17:27.615 But yeah. 00:17:27.615 --> 00:17:29.198 AUDIENCE: That it's just out of order. 00:17:29.198 --> 00:17:31.653 You move the one, two, three, four to the very bottom. 00:17:31.653 --> 00:17:33.320 DAVID J. MALAN: Maybe it's out of order. 00:17:33.320 --> 00:17:36.445 But I think that's OK because the order is just matters that it's relative. 00:17:36.445 --> 00:17:38.525 So so long it's still contiguous back, to back, 00:17:38.525 --> 00:17:41.150 to back in a different chunk of memory, I think we're OK there. 00:17:41.150 --> 00:17:43.790 It's not like I changed it from four, three, two, one. 00:17:43.790 --> 00:17:44.960 But a reasonable hunch. 00:17:44.960 --> 00:17:45.855 Yeah. 00:17:45.855 --> 00:17:47.635 AUDIENCE: [INAUDIBLE] 00:17:48.970 --> 00:17:50.020 DAVID J. MALAN: Yeah. 00:17:50.020 --> 00:17:51.490 I didn't really plan ahead here. 00:17:51.490 --> 00:17:53.990 If I have to add another number, like five or anything else, 00:17:53.990 --> 00:17:56.198 well, I might have to jump through these hoops again. 00:17:56.198 --> 00:17:58.240 Maybe I get lucky and maybe there's space there. 00:17:58.240 --> 00:18:00.850 But not if I have other variables and other things going on, 00:18:00.850 --> 00:18:03.410 that too might be used at some point. 00:18:03.410 --> 00:18:04.420 Other thoughts? 00:18:04.420 --> 00:18:05.020 Yeah. 00:18:05.020 --> 00:18:06.230 AUDIENCE: Slow efficiency. 00:18:06.230 --> 00:18:08.600 DAVID J. MALAN: Slow efficiency, why? 00:18:08.600 --> 00:18:10.340 AUDIENCE: [INAUDIBLE] 00:18:10.815 --> 00:18:11.690 DAVID J. MALAN: Yeah. 00:18:11.690 --> 00:18:13.920 I mean, it's just inefficient. 00:18:13.920 --> 00:18:16.430 It's bad design arguably. 00:18:16.430 --> 00:18:17.030 Why? 00:18:17.030 --> 00:18:20.398 Because I had a copy all of my original work down here. 00:18:20.398 --> 00:18:22.440 And as you note, if I want to add a fifth number, 00:18:22.440 --> 00:18:24.815 I'm going to have to copy it again, and again, and again. 00:18:24.815 --> 00:18:27.770 And do things n times again and again. 00:18:27.770 --> 00:18:29.600 Now, maybe that's necessary. 00:18:29.600 --> 00:18:30.640 We'll soon see for sure. 00:18:30.640 --> 00:18:33.890 But it feels like this is not going to end well, especially if the array isn't 00:18:33.890 --> 00:18:36.440 of size three or four but 300, 400. 00:18:36.440 --> 00:18:40.490 Your computer ends up spending so much time just spinning its wheels. 00:18:40.490 --> 00:18:42.410 I mean, honestly, better might be this. 00:18:42.410 --> 00:18:46.430 If this is my same array physically incarnated now, one, two, three, 00:18:46.430 --> 00:18:48.120 it's literally on the edge of the shelf. 00:18:48.120 --> 00:18:50.810 So there's no room for the number four. 00:18:50.810 --> 00:18:53.660 Maybe where we could take this story is, well, let's just 00:18:53.660 --> 00:18:55.310 find room for the four. 00:18:55.310 --> 00:18:58.220 Let's just put the four, for instance, over here, 00:18:58.220 --> 00:19:01.790 replacing some available garbage value, some spare byte over here. 00:19:01.790 --> 00:19:03.080 But now, wait a minute. 00:19:03.080 --> 00:19:06.140 I've broken the definition of an array. 00:19:06.140 --> 00:19:08.580 I can't have one, two, three and then four over here. 00:19:08.580 --> 00:19:12.720 So maybe there maybe could be a mechanism if I put this thing on again, 00:19:12.720 --> 00:19:14.990 where when you get to the end of the existing elements 00:19:14.990 --> 00:19:18.980 maybe I just somehow digitally point to the fourth array. 00:19:18.980 --> 00:19:22.700 And maybe we can stitch together all of these different values in memory 00:19:22.700 --> 00:19:25.220 so that if you follow the arrows, so to speak, 00:19:25.220 --> 00:19:28.310 we can reconstruct exactly what the order is even 00:19:28.310 --> 00:19:32.030 without having to find or make room here or pick up all of these numbers 00:19:32.030 --> 00:19:33.990 and move all of them over there. 00:19:33.990 --> 00:19:36.480 So that's perhaps the direction in which we'll go here. 00:19:36.480 --> 00:19:41.220 So let's see how we might get to that spot as follows. 00:19:41.220 --> 00:19:43.770 Let me go ahead and open up, say, VS Code here. 00:19:43.770 --> 00:19:48.110 Let me open up a program called list.c in my terminal. 00:19:48.110 --> 00:19:51.740 And let me go ahead and whip up a relatively simple program 00:19:51.740 --> 00:19:54.320 that just demonstrates what we did back in week two 00:19:54.320 --> 00:19:56.400 when we introduced arrays as follows. 00:19:56.400 --> 00:20:00.590 Let me include stdio.h so we can print stuff out. 00:20:00.590 --> 00:20:02.480 Let me do int main void. 00:20:02.480 --> 00:20:04.310 No command line arguments for now. 00:20:04.310 --> 00:20:10.105 Let me give myself an array called list of size three. 00:20:10.105 --> 00:20:12.980 And I'll just hard-code it to keep it simple for lecture's sake, each 00:20:12.980 --> 00:20:14.438 of which is going to be an integer. 00:20:14.438 --> 00:20:16.970 And now, just so we have some specifics to talk about, 00:20:16.970 --> 00:20:19.670 let me put it list bracket zero the number one, 00:20:19.670 --> 00:20:23.570 list bracket one the number two, and list bracket two the number three. 00:20:23.570 --> 00:20:27.680 So I'm just translating into code what we just had pictorially on the screen 00:20:27.680 --> 00:20:30.740 and also physically here with these numbers on the desk. 00:20:30.740 --> 00:20:33.570 Now, let's just do something mildly useful for this. 00:20:33.570 --> 00:20:38.750 How about we do four int i get zero, i is less than three, i plus, plus. 00:20:38.750 --> 00:20:41.030 Let's just print each of these numbers out just 00:20:41.030 --> 00:20:43.250 to make sure they're indeed in memory as I intended. 00:20:43.250 --> 00:20:47.240 So percent i backslash n comma i and then a semicolon. 00:20:47.240 --> 00:20:48.630 And I think that's it for now. 00:20:48.630 --> 00:20:49.820 So nothing interesting. 00:20:49.820 --> 00:20:51.290 No problem solved just yet. 00:20:51.290 --> 00:20:52.470 Just a proof of concept. 00:20:52.470 --> 00:20:54.710 So that now when I clear my terminal and run 00:20:54.710 --> 00:20:59.090 make list no apparent errors at the terminal. 00:20:59.090 --> 00:21:02.360 And so, when I now do dot slash list, I should see, hopefully from left 00:21:02.360 --> 00:21:04.250 to right, one, two, three. 00:21:04.250 --> 00:21:07.370 But of course, if I want to add a fourth number now, 00:21:07.370 --> 00:21:09.020 there's no mechanism for such. 00:21:09.020 --> 00:21:12.470 Certainly in the code that I just wrote I could go back in here 00:21:12.470 --> 00:21:14.120 and change this to a four. 00:21:14.120 --> 00:21:17.810 I could go down here and change lists bracket three equals four. 00:21:17.810 --> 00:21:20.985 I could just manually change the code, recompile the code. 00:21:20.985 --> 00:21:23.360 But of course, that doesn't give me any additional runway 00:21:23.360 --> 00:21:24.870 for the fifth or sixth number. 00:21:24.870 --> 00:21:27.560 So let me try to take a different approach drawing 00:21:27.560 --> 00:21:29.330 some inspiration from last week. 00:21:29.330 --> 00:21:33.050 If I want to allocate memory dynamically, 00:21:33.050 --> 00:21:37.220 maybe because I don't know when I wrote the program how many bytes I want, 00:21:37.220 --> 00:21:39.380 we have another function as of last week that 00:21:39.380 --> 00:21:42.530 does not require that you commit in advance to a certain number of bytes 00:21:42.530 --> 00:21:47.060 via what function can you just ask the operating system for a chunk of memory. 00:21:47.060 --> 00:21:49.070 So malloc, allocate memory. 00:21:49.070 --> 00:21:51.260 Now, an array is just a chunk of memory. 00:21:51.260 --> 00:21:53.240 And even though since week two we've been 00:21:53.240 --> 00:21:56.420 using this syntactic sugar, this convenience of just using 00:21:56.420 --> 00:21:58.910 square brackets and indexing into it, it's 00:21:58.910 --> 00:22:02.780 just making it easier to manipulate a chunk of memory that's contiguous 00:22:02.780 --> 00:22:04.860 all together back, to back, to back. 00:22:04.860 --> 00:22:08.090 So today, just like last week, we can take those training wheels off 00:22:08.090 --> 00:22:11.570 and maybe be a little more deliberate in how we allocate memory. 00:22:11.570 --> 00:22:14.130 Let me go, for instance, and do this. 00:22:14.130 --> 00:22:17.480 Let me delete my contents of my main function here. 00:22:17.480 --> 00:22:18.920 Go back into main. 00:22:18.920 --> 00:22:24.650 And let me propose now that I declare, for instance, how about my list no 00:22:24.650 --> 00:22:26.600 longer as an array but as a pointer. 00:22:26.600 --> 00:22:28.700 So int star list. 00:22:28.700 --> 00:22:31.670 And I'm going to go ahead and initialize this 00:22:31.670 --> 00:22:34.532 to be how about a chunk of three integers for now? 00:22:34.532 --> 00:22:35.990 So I'm still going to hard-code it. 00:22:35.990 --> 00:22:38.880 But I'm taking a step toward more dynamism for now. 00:22:38.880 --> 00:22:43.033 So let me allocate three times whatever the size is of an int. 00:22:43.033 --> 00:22:45.200 But it's usually going to be four bytes, as we know. 00:22:45.200 --> 00:22:47.900 So this is really going to be 3 times 4 equals 12. 00:22:47.900 --> 00:22:49.500 But it's a little more dynamic. 00:22:49.500 --> 00:22:51.530 And now, what can I do down here? 00:22:51.530 --> 00:22:53.280 Well, this is just a chunk of memory. 00:22:53.280 --> 00:22:58.760 So I can do literally list bracket zero equals one. 00:22:58.760 --> 00:23:01.220 List bracket one equals two. 00:23:01.220 --> 00:23:04.460 List bracket two equals three. 00:23:04.460 --> 00:23:07.220 And voila, achieve the exact same effect. 00:23:07.220 --> 00:23:10.130 Because, again, an array is just a chunk of contiguous memory. 00:23:10.130 --> 00:23:13.950 But malloc gives you any old chunk of contiguous memory 00:23:13.950 --> 00:23:16.890 so you can rather treat one like the other here. 00:23:16.890 --> 00:23:20.280 Now, if you really want to be cool, you can do something like this instead. 00:23:20.280 --> 00:23:24.360 You could dereference the address in list and go there. 00:23:24.360 --> 00:23:29.080 You could go down here and dereference list plus one and go there. 00:23:29.080 --> 00:23:31.360 But honestly, no one really writes code like this. 00:23:31.360 --> 00:23:32.400 It's just too cryptic. 00:23:32.400 --> 00:23:35.080 It's a little too far over the line at least for most people. 00:23:35.080 --> 00:23:38.130 And so, I think the syntactic sugar as I keep describing it, 00:23:38.130 --> 00:23:40.497 just the more user-friendly square bracket notation 00:23:40.497 --> 00:23:43.080 does the exact same thing, figures out the pointer arithmetic, 00:23:43.080 --> 00:23:46.800 and puts each of these integers in the right chunks therein. 00:23:46.800 --> 00:23:50.950 Now, just to be super pedantic, let me make sure if something went wrong. 00:23:50.950 --> 00:23:54.930 So if list equals equals null, that means that something went wrong. 00:23:54.930 --> 00:23:58.200 Like my computer is out of memory, which we should check for typically. 00:23:58.200 --> 00:24:02.490 So let me just immediately return one signaling anything other than zero, 00:24:02.490 --> 00:24:05.250 which means success typically, just to get out of this program 00:24:05.250 --> 00:24:06.490 because something's wrong. 00:24:06.490 --> 00:24:09.780 But now let me propose that I've had a-- 00:24:09.780 --> 00:24:10.620 well, let's do this. 00:24:10.620 --> 00:24:14.880 For int i gets zero, i less than 3, i plus plus. 00:24:14.880 --> 00:24:17.280 Though a better design would always be to use a const. 00:24:17.280 --> 00:24:19.440 But I'm just doing this for demonstration sake. 00:24:19.440 --> 00:24:21.270 Let's print out each of these ints too. 00:24:21.270 --> 00:24:23.530 And just make sure I didn't mess anything up. 00:24:23.530 --> 00:24:26.140 And let me open my terminal window again. 00:24:26.140 --> 00:24:29.680 Let me do make list again. 00:24:29.680 --> 00:24:30.180 Huh. 00:24:30.180 --> 00:24:35.220 Implicitly declaring library function malloc we type void star something, 00:24:35.220 --> 00:24:39.570 something implicitly declaring is the operative words there. 00:24:39.570 --> 00:24:41.070 What did I mess up? 00:24:41.070 --> 00:24:41.640 Yeah. 00:24:41.640 --> 00:24:43.890 AUDIENCE: [INAUDIBLE] need the header [INAUDIBLE].. 00:24:43.890 --> 00:24:44.765 DAVID J. MALAN: Yeah. 00:24:44.765 --> 00:24:46.980 I forgot the header file in which malloc is declared. 00:24:46.980 --> 00:24:50.118 I remember now, OK, that's in standard lib.h. 00:24:50.118 --> 00:24:52.410 And it's fine to look stuff like that up if you forget. 00:24:52.410 --> 00:24:55.170 So let me include standard lib.h. 00:24:55.170 --> 00:24:56.820 Now let me clear my terminal. 00:24:56.820 --> 00:24:59.070 Run make list again. 00:24:59.070 --> 00:25:00.880 Now we're good. dot slash list. 00:25:00.880 --> 00:25:04.630 And now, what did I do wrong? 00:25:07.570 --> 00:25:08.070 Oh. 00:25:08.070 --> 00:25:09.090 [LAUGHS] OK. 00:25:09.090 --> 00:25:10.800 Not intended, but teachable moment. 00:25:10.800 --> 00:25:11.790 What did I do wrong? 00:25:11.790 --> 00:25:13.494 [LAUGHS] Yeah. 00:25:13.494 --> 00:25:15.150 AUDIENCE: You are printing [INAUDIBLE]. 00:25:15.150 --> 00:25:16.025 DAVID J. MALAN: Yeah. 00:25:16.025 --> 00:25:20.080 I'm printing the values of i instead of what is at location i in the array. 00:25:20.080 --> 00:25:22.440 So what I actually meant to do was print this out. 00:25:22.440 --> 00:25:22.990 Thank you. 00:25:22.990 --> 00:25:26.280 So now let me recompile make list dot slash list. 00:25:26.280 --> 00:25:30.700 And now, those are the three values I was expecting, not the indices thereof. 00:25:30.700 --> 00:25:33.090 Now, let me just propose that for the sake of discussion 00:25:33.090 --> 00:25:37.980 that I regret having only allocated space for three integers. 00:25:37.980 --> 00:25:40.985 And maybe I really should have allocated enough space for four. 00:25:40.985 --> 00:25:43.110 Now, this is not how you would do this in practice. 00:25:43.110 --> 00:25:45.193 Because presumably if you have a change of thought 00:25:45.193 --> 00:25:46.800 just go back in and correct the code. 00:25:46.800 --> 00:25:48.900 But let me propose that somewhere in here 00:25:48.900 --> 00:25:52.020 is a more complicated program and time passes, dot, dot, dot. 00:25:52.020 --> 00:25:54.190 There's a lot of other interesting code there. 00:25:54.190 --> 00:25:57.610 But at some point, I might want to give myself more memory. 00:25:57.610 --> 00:25:58.780 So how can I do this? 00:25:58.780 --> 00:26:02.310 Well, let me just ask the operating system now for four new bytes of memory 00:26:02.310 --> 00:26:05.280 so that we can at least in version one implement the idea on the board 00:26:05.280 --> 00:26:08.610 where I just copied the three bytes into the new four bytes 00:26:08.610 --> 00:26:10.360 and then added a fourth value. 00:26:10.360 --> 00:26:11.970 So I'm going to use malloc again. 00:26:11.970 --> 00:26:13.980 And I'm going to say, here's a new pointer. 00:26:13.980 --> 00:26:17.490 I'll call it temp, tmp for short, which is quite common when you just 00:26:17.490 --> 00:26:18.810 need it briefly. 00:26:18.810 --> 00:26:20.670 I'm going to then call malloc again. 00:26:20.670 --> 00:26:24.810 I'm going to say, give me four integers using size of. 00:26:24.810 --> 00:26:26.070 Let me again make sure. 00:26:26.070 --> 00:26:30.397 So if temp equals equals null something went wrong. 00:26:30.397 --> 00:26:31.980 So let me just immediately return one. 00:26:31.980 --> 00:26:35.550 And for good measure, before I return one, 00:26:35.550 --> 00:26:39.370 let me free the original list so that I don't leak memory. 00:26:39.370 --> 00:26:41.250 So I'm not just immediately returning one. 00:26:41.250 --> 00:26:44.160 I'm being a good citizen and remembering, well, 00:26:44.160 --> 00:26:49.440 if this malloc call did succeed and indeed I got as far as line 18, 00:26:49.440 --> 00:26:54.222 but then line 18 failed, I should free the memory that I previously malloc'd. 00:26:54.222 --> 00:26:55.680 So again, that's the rule of thumb. 00:26:55.680 --> 00:26:57.600 If you allocate it, you should be the one 00:26:57.600 --> 00:26:59.880 to free it even before you're about to quit. 00:26:59.880 --> 00:27:03.240 Now, once I've done that, I think I need to do what we did pictorially 00:27:03.240 --> 00:27:06.660 on the screen where I need to copy the one, the two, the three 00:27:06.660 --> 00:27:08.790 from the old array into the new. 00:27:08.790 --> 00:27:10.180 So how might I do this? 00:27:10.180 --> 00:27:11.850 Well, let me give myself a loop. 00:27:11.850 --> 00:27:15.420 So for int i get zero, i less than 3, i plus plus, 00:27:15.420 --> 00:27:18.490 because the size of the original is still the same. 00:27:18.490 --> 00:27:23.880 Let me go ahead and treat the new chunk of memory called temp as an array 00:27:23.880 --> 00:27:24.670 itself. 00:27:24.670 --> 00:27:27.150 And so, I can absolutely use these square brackets just like before. 00:27:27.150 --> 00:27:28.320 It's just a chunk of memory. 00:27:28.320 --> 00:27:29.730 I'm treating it like an array. 00:27:29.730 --> 00:27:34.740 And let me add to that value whatever is at the original list at location i 00:27:34.740 --> 00:27:35.470 as well. 00:27:35.470 --> 00:27:41.790 So this, again, is just this exercise of copying from old to new step by step 00:27:41.790 --> 00:27:44.130 the one, the two, and the three. 00:27:44.130 --> 00:27:46.110 But I still need one additional step. 00:27:46.110 --> 00:27:50.817 If my goal at hand now is to have ultimately a fourth value here, 00:27:50.817 --> 00:27:53.400 well, I'm just going to hard-code this for demonstration sake. 00:27:53.400 --> 00:27:56.520 And I'm going to go to the very last location of temp, 00:27:56.520 --> 00:27:58.320 which is of size four. 00:27:58.320 --> 00:28:04.260 Which means the last element in temp is temp bracket three, 00:28:04.260 --> 00:28:05.580 because it's zero-indexed. 00:28:05.580 --> 00:28:07.310 But there's four total spaces there. 00:28:07.310 --> 00:28:09.810 And I'm just going to arbitrarily for the sake of discussion 00:28:09.810 --> 00:28:11.010 put the number four there. 00:28:11.010 --> 00:28:16.920 And that is what happened when we proposed changing the final garbage 00:28:16.920 --> 00:28:18.480 value there to that four. 00:28:18.480 --> 00:28:22.350 But now I need to do what the slide did for us magically on the screen. 00:28:22.350 --> 00:28:24.510 I should now do a couple of final things. 00:28:24.510 --> 00:28:28.350 I should free the original list, which I've not done yet, because I only 00:28:28.350 --> 00:28:30.420 called free earlier in cases of error. 00:28:30.420 --> 00:28:32.070 And that was just to be safe. 00:28:32.070 --> 00:28:34.110 I can now free the list. 00:28:34.110 --> 00:28:39.390 And now, if I want to inform the computer that I want list, quote 00:28:39.390 --> 00:28:43.740 unquote, my variable called list to point at not the old chunk 00:28:43.740 --> 00:28:46.410 like it originally did but the new chunk, I think 00:28:46.410 --> 00:28:50.280 I can just do this-- list equals tmp. 00:28:50.280 --> 00:28:53.670 And again, that's just saying that if list is a pointer, which it was. 00:28:53.670 --> 00:28:58.230 Because look at the very top line here, on line six 00:28:58.230 --> 00:29:02.790 I declared list to be a pointer to a chunk of memory. 00:29:02.790 --> 00:29:06.370 Temp meanwhile is a separate pointer to a chunk of memory. 00:29:06.370 --> 00:29:10.470 So down here, this line 33 is just a matter of my saying, 00:29:10.470 --> 00:29:13.980 OK, now henceforth, because I've already freed the old chunk of memory, 00:29:13.980 --> 00:29:17.370 my list variable should point not at this chunk of three bytes 00:29:17.370 --> 00:29:22.290 but this chunk of four bytes or really 12 in total now. 00:29:22.290 --> 00:29:26.310 Or rather 16 now because we have four such bytes. 00:29:26.310 --> 00:29:30.090 Questions now on this code, the point of which 00:29:30.090 --> 00:29:33.570 was quite simply to demonstrate how we could implement 00:29:33.570 --> 00:29:38.310 in code this idea of fairly correctly but inefficiently allocating 00:29:38.310 --> 00:29:44.610 a new array of sufficient size and then populating it with a new fourth value. 00:29:44.610 --> 00:29:48.720 Questions on what we've just done here? 00:29:48.720 --> 00:29:49.220 No? 00:29:49.220 --> 00:29:51.140 Yeah. 00:29:51.140 --> 00:29:53.590 AUDIENCE: [INAUDIBLE] 00:29:59.380 --> 00:30:00.630 DAVID J. MALAN: Good question. 00:30:00.630 --> 00:30:03.360 At this point in the story, with line 33, 00:30:03.360 --> 00:30:07.410 do I not have two different variables pointing at the same chunk of memory? 00:30:07.410 --> 00:30:08.520 Short answer, yes. 00:30:08.520 --> 00:30:10.800 But here's where the semantics are perhaps compelling. 00:30:10.800 --> 00:30:14.007 List is the variable that I intend to use longer term 00:30:14.007 --> 00:30:15.090 and keep around in memory. 00:30:15.090 --> 00:30:18.102 And again, assume that there's even more code going on here that we just 00:30:18.102 --> 00:30:18.810 didn't write yet. 00:30:18.810 --> 00:30:20.790 So it's useful to have that variable. 00:30:20.790 --> 00:30:23.280 Temp was just a necessary evil. 00:30:23.280 --> 00:30:27.570 Because up here, it would not have been correct to do this. 00:30:27.570 --> 00:30:31.140 It would not have been correct to say list on line 18 00:30:31.140 --> 00:30:33.720 equals the new chunk of memory because this 00:30:33.720 --> 00:30:35.700 would have represented a memory leak. 00:30:35.700 --> 00:30:42.060 If I prematurely change temp to point not at the old chunk but the new chunk, 00:30:42.060 --> 00:30:44.470 at that point no one's pointing at the old chunk. 00:30:44.470 --> 00:30:46.260 And so I've lost those three bytes. 00:30:46.260 --> 00:30:47.970 Valgrind, for instance, would yell at you 00:30:47.970 --> 00:30:51.070 for having lost as many bytes in memory. 00:30:51.070 --> 00:30:53.730 So in this case here, I do leave this as temp. 00:30:53.730 --> 00:30:55.590 Yes, it's duplicative at this point. 00:30:55.590 --> 00:30:58.380 But it's not a huge deal if it was just meant semantically 00:30:58.380 --> 00:30:59.610 to be a temporary value. 00:30:59.610 --> 00:31:03.900 But down here, at the risk of one more line of code, 00:31:03.900 --> 00:31:07.380 I still want to, to be a good citizen, free list, and maybe 00:31:07.380 --> 00:31:09.690 just for good measure return zero explicitly. 00:31:09.690 --> 00:31:15.870 But notice, it's not doing it twice per se on line 31. 00:31:15.870 --> 00:31:17.130 What am I freeing? 00:31:17.130 --> 00:31:19.620 The original address of list. 00:31:19.620 --> 00:31:21.900 The three-integer version. 00:31:21.900 --> 00:31:23.740 Then I change what list points at. 00:31:23.740 --> 00:31:25.740 So it's pointing at a completely different chunk 00:31:25.740 --> 00:31:27.360 of memory, this one of size four. 00:31:27.360 --> 00:31:31.150 So eventually, when I'm all done using this memory for this demonstration, 00:31:31.150 --> 00:31:32.370 I still need to free list. 00:31:32.370 --> 00:31:34.500 But at this point in the storyline 40, it's 00:31:34.500 --> 00:31:38.190 pointing at the new chunk of memory, which I similarly need to hand back 00:31:38.190 --> 00:31:40.290 to the operating system by free. 00:31:40.290 --> 00:31:41.236 Yeah. 00:31:41.236 --> 00:31:42.634 AUDIENCE: [INAUDIBLE] 00:31:44.458 --> 00:31:46.250 DAVID J. MALAN: When would temp equal null? 00:31:46.250 --> 00:31:48.290 So let me scroll back up slightly. 00:31:48.290 --> 00:31:50.890 This is being a good citizen and a good programmer. 00:31:50.890 --> 00:31:54.250 Whenever it comes to using malloc, malloc 00:31:54.250 --> 00:31:57.755 can return null if the computer is out of memory. 00:31:57.755 --> 00:31:59.380 So this is maybe a much bigger program. 00:31:59.380 --> 00:32:01.040 You've got other things going on in it. 00:32:01.040 --> 00:32:04.450 And so, you just don't have enough memory available to be handed. 00:32:04.450 --> 00:32:06.970 Malloc needs to signal to you that there's some error. 00:32:06.970 --> 00:32:10.090 And so, it will buy convention per the documentation, 00:32:10.090 --> 00:32:12.623 per the manual pages return null. 00:32:12.623 --> 00:32:14.290 So this is just me being a good citizen. 00:32:14.290 --> 00:32:16.900 Otherwise, here's another error that might cause your program 00:32:16.900 --> 00:32:19.690 to crash with a segmentation fault. If you get back 00:32:19.690 --> 00:32:24.820 null but you assume that it's good memory going to address zero, a.k.a. 00:32:24.820 --> 00:32:27.850 null, will crash your program intentionally. 00:32:27.850 --> 00:32:28.390 Yeah. 00:32:28.390 --> 00:32:31.046 AUDIENCE: If you want to write [INAUDIBLE] at the very bottom 00:32:31.046 --> 00:32:33.703 of a program, and you just put [INAUDIBLE],, 00:32:33.703 --> 00:32:36.120 it would [INAUDIBLE] the same [INAUDIBLE],, right? 00:32:36.120 --> 00:32:37.120 DAVID J. MALAN: Correct. 00:32:37.120 --> 00:32:41.820 If I were to change my final line 40 here to be free temp, 00:32:41.820 --> 00:32:44.140 this would also work as well. 00:32:44.140 --> 00:32:46.560 And here, this is really a matter of design. 00:32:46.560 --> 00:32:48.120 It's a very nitpicky thing. 00:32:48.120 --> 00:32:49.380 We could probably debate it. 00:32:49.380 --> 00:32:53.460 But because at this point in the story my main variable 00:32:53.460 --> 00:32:55.770 for remembering where the list is is called list. 00:32:55.770 --> 00:32:59.010 This is the more responsible way to do it, freeing the list, 00:32:59.010 --> 00:33:01.830 just so that my colleagues, my TA doesn't 00:33:01.830 --> 00:33:05.160 wonder why are you freeing temporary memory that you already freed. 00:33:05.160 --> 00:33:06.930 It just is a semantic thing at this point. 00:33:06.930 --> 00:33:07.680 But good instinct. 00:33:07.680 --> 00:33:08.970 It would also work. 00:33:08.970 --> 00:33:09.690 Correct. 00:33:09.690 --> 00:33:12.112 Maybe just not good design. 00:33:12.112 --> 00:33:14.070 So it turns out that this gets annoying quickly 00:33:14.070 --> 00:33:16.980 as it did in the picture of doing all of this duplication. 00:33:16.980 --> 00:33:20.010 And even though technically it's necessary to copy those values, 00:33:20.010 --> 00:33:22.350 if you need a newer, bigger chunk of memory, 00:33:22.350 --> 00:33:27.300 there is at least a function in C that simplifies a lot of this for us. 00:33:27.300 --> 00:33:29.400 And in fact, let me go ahead and do this. 00:33:29.400 --> 00:33:33.840 Instead of using malloc, this second time on line 18 00:33:33.840 --> 00:33:37.170 in addition to the first time I used it on line six, 00:33:37.170 --> 00:33:41.310 I'm actually going to try and introduce another function called realloc, which 00:33:41.310 --> 00:33:44.700 as the name suggests tries to reallocate memory for you. 00:33:44.700 --> 00:33:47.010 And it works a little differently from malloc. 00:33:47.010 --> 00:33:49.330 realloc expects two arguments. 00:33:49.330 --> 00:33:51.840 The first one is what is the chunk of memory 00:33:51.840 --> 00:33:55.650 that you want to try to grow or shrink, that is, reallocate 00:33:55.650 --> 00:33:56.880 to be a different size. 00:33:56.880 --> 00:33:59.580 And then you specify what size you would want. 00:33:59.580 --> 00:34:04.800 And indeed, in this case, I want four times size of int. 00:34:04.800 --> 00:34:10.500 And that will now give me hopefully a new address of a chunk of memory that's 00:34:10.500 --> 00:34:12.960 big enough to fit all four numbers. 00:34:12.960 --> 00:34:17.190 But what's wonderful about realloc is that it 00:34:17.190 --> 00:34:20.020 will handle all of the copying for me. 00:34:20.020 --> 00:34:21.969 So in fact, I'm going to go down here. 00:34:21.969 --> 00:34:25.960 I'm going to get rid of all of this, this extra for loop. 00:34:25.960 --> 00:34:28.710 And what I'm simply going to do instead is this. 00:34:28.710 --> 00:34:34.290 Once I can trust after lines 18 through 23 00:34:34.290 --> 00:34:37.710 that realloc worked, and it didn't return null because I'm out of memory, 00:34:37.710 --> 00:34:41.850 I can just say, OK, just immediately remember that the new list points 00:34:41.850 --> 00:34:44.230 at this new chunk of memory instead. 00:34:44.230 --> 00:34:46.710 And then, I can still now do this line. 00:34:46.710 --> 00:34:50.190 But I can tweak the semantics here and just say list bracket three 00:34:50.190 --> 00:34:56.310 the new final location in the new list is four. 00:34:56.310 --> 00:34:58.020 I don't need to free this here. 00:34:58.020 --> 00:34:59.580 I don't need to do this. 00:34:59.580 --> 00:35:02.280 All I need now at the bottom is the final for loop 00:35:02.280 --> 00:35:03.988 to just print out these values. 00:35:03.988 --> 00:35:06.030 So in short, even though that was somewhat quick, 00:35:06.030 --> 00:35:09.720 using realloc just moves the entire copying 00:35:09.720 --> 00:35:13.330 process that I implemented myself a moment ago using a for loop. 00:35:13.330 --> 00:35:16.500 It just moves it to realloc and lets it deal with the copying for me. 00:35:16.500 --> 00:35:17.520 It's no more efficient. 00:35:17.520 --> 00:35:20.400 But at least means I'm writing less code, which is more pleasant. 00:35:20.400 --> 00:35:22.830 And hopefully, the people who wrote malloc or realloc 00:35:22.830 --> 00:35:27.030 are smarter than me and they just will introduce bugs with lower probability 00:35:27.030 --> 00:35:28.930 too. 00:35:28.930 --> 00:35:29.710 That was a lot. 00:35:29.710 --> 00:35:30.754 Any questions? 00:35:30.754 --> 00:35:32.754 AUDIENCE: Why do you still have to [INAUDIBLE]?? 00:35:34.580 --> 00:35:35.830 DAVID J. MALAN: Good question. 00:35:35.830 --> 00:35:40.750 Why do you still need to make list equal temp as I did on line 24? 00:35:40.750 --> 00:35:44.200 So ideally, I would do this. 00:35:44.200 --> 00:35:48.910 Ideally I would just change this line 18 to be list. 00:35:48.910 --> 00:35:50.350 That is to say, call-- 00:35:50.350 --> 00:35:52.570 or actually, even better, ideally I would just 00:35:52.570 --> 00:35:55.180 say reallocate this list to be of this new size. 00:35:55.180 --> 00:35:57.760 But again, things can go wrong when allocating memory. 00:35:57.760 --> 00:36:01.760 You need to check a return value to see if it was successful or not. 00:36:01.760 --> 00:36:04.220 And so, we need to use a return value. 00:36:04.220 --> 00:36:05.680 So let's not introduce temp. 00:36:05.680 --> 00:36:07.120 Let's just use list. 00:36:07.120 --> 00:36:09.610 But here's where a memory leak might happen. 00:36:09.610 --> 00:36:13.240 In the off chance realloc fails and doesn't 00:36:13.240 --> 00:36:16.240 have enough memory for your four bytes, therefore 00:36:16.240 --> 00:36:18.100 it returns by definition null. 00:36:18.100 --> 00:36:23.110 You can't overwrite the original value of list with null to then check it. 00:36:23.110 --> 00:36:23.680 Why? 00:36:23.680 --> 00:36:27.190 Because now who remembers where the original three bytes were? 00:36:27.190 --> 00:36:30.790 If you prematurely change the value of list, you've lost. 00:36:30.790 --> 00:36:32.780 You've leaked memory in that sense. 00:36:32.780 --> 00:36:34.990 And so, that's why-- let me undo this change-- 00:36:34.990 --> 00:36:39.370 I declare a temporary pointer for the sole purpose of making 00:36:39.370 --> 00:36:41.060 sure I can check the return value. 00:36:41.060 --> 00:36:44.660 And then, once it's good, now I'll update the value of list. 00:36:44.660 --> 00:36:47.590 So it's doing a switcheroo by making sure first 00:36:47.590 --> 00:36:51.070 that you have a new value to swap with the old. 00:36:51.070 --> 00:36:53.630 Other questions on this code? 00:36:53.630 --> 00:36:54.523 Yeah. 00:36:54.523 --> 00:36:59.882 AUDIENCE: Does realloc [INAUDIBLE] memory where [INAUDIBLE] is stored? 00:36:59.882 --> 00:37:00.840 DAVID J. MALAN: Indeed. 00:37:00.840 --> 00:37:03.660 realloc automatically frees the previous memory for you. 00:37:03.660 --> 00:37:05.610 And better yet, it's even smarter than that. 00:37:05.610 --> 00:37:10.230 If you get lucky and there happens to be space right after your existing 00:37:10.230 --> 00:37:13.900 chunk of memory, so one, two, three, garbage value, instead of one, 00:37:13.900 --> 00:37:16.560 two, three, Hello, world, realloc won't even 00:37:16.560 --> 00:37:18.990 bother copying things from old to new. 00:37:18.990 --> 00:37:21.330 It will just say, OK, I'm going to now reserve 00:37:21.330 --> 00:37:24.690 for you more bytes than you originally asked for so it doesn't 00:37:24.690 --> 00:37:26.460 have to waste time doing that copying. 00:37:26.460 --> 00:37:29.310 And so, in that sense, this version is now not only still correct, 00:37:29.310 --> 00:37:32.790 it's even better designed because we're not wasting time with that for loop. 00:37:32.790 --> 00:37:36.060 We might have to resort to it if there is, in fact, Hello, world or something 00:37:36.060 --> 00:37:36.900 else in the way. 00:37:36.900 --> 00:37:39.990 But hopefully, we'll get lucky and save those steps. 00:37:39.990 --> 00:37:46.610 Other questions on this manipulation of code here? 00:37:46.610 --> 00:37:48.038 Yeah, in the middle. 00:37:48.038 --> 00:37:49.442 AUDIENCE: So [INAUDIBLE]? 00:37:53.190 --> 00:37:56.110 DAVID J. MALAN: What if you want to resize a two-dimensional array? 00:37:56.110 --> 00:38:00.540 So very similar in spirit whereby you can use the same trickery. 00:38:00.540 --> 00:38:03.150 Let me wave my hand at that for now just because I 00:38:03.150 --> 00:38:06.270 think that's going to significantly increase the complexity. 00:38:06.270 --> 00:38:08.100 But very same primitives ultimately. 00:38:08.100 --> 00:38:12.570 A two-dimensional array is essentially just a doubly long or quadratically 00:38:12.570 --> 00:38:16.050 longer list of memory that using square bracket notation 00:38:16.050 --> 00:38:18.520 is doing some of that mental math for you. 00:38:18.520 --> 00:38:24.060 But it's fundamentally no different of what's going on underneath the hood. 00:38:24.060 --> 00:38:27.417 So with that said and that code under our belt, 00:38:27.417 --> 00:38:30.000 even though that's not going to be something you'll frequently 00:38:30.000 --> 00:38:33.390 need to code yourself, let's propose now how 00:38:33.390 --> 00:38:36.780 we might go about building some actual data structures ourselves. 00:38:36.780 --> 00:38:39.750 The new ingredient here being this reality 00:38:39.750 --> 00:38:44.520 that if you want to resize a chunk of memory so as to make room for things, 00:38:44.520 --> 00:38:45.780 we now have that ability. 00:38:45.780 --> 00:38:47.040 Memory addresses. 00:38:47.040 --> 00:38:50.160 And pointers just give us the ability to point around at things 00:38:50.160 --> 00:38:51.570 and move things around in memory. 00:38:51.570 --> 00:38:54.060 But now that we have malloc and even realloc, 00:38:54.060 --> 00:38:56.100 you can imagine, maybe rewinding, and you 00:38:56.100 --> 00:39:00.090 could implement that stack, that queue using not an array per 00:39:00.090 --> 00:39:02.790 se, because you have to commit to an array size in advance. 00:39:02.790 --> 00:39:06.840 But if you implement your stack or your queue using a pointer and then 00:39:06.840 --> 00:39:10.680 malloc and realloc, and maybe someone else writes all that code for you, 00:39:10.680 --> 00:39:14.490 perhaps now you can imagine that now the stack can grow 00:39:14.490 --> 00:39:17.400 or shrink by using realloc accordingly. 00:39:17.400 --> 00:39:23.400 You don't have to preemptively say give me five bytes, or 50, or 500, or 5,000. 00:39:23.400 --> 00:39:25.360 You can say, just give me one initially. 00:39:25.360 --> 00:39:27.850 And if I need more, I'll realloc, realloc, realloc. 00:39:27.850 --> 00:39:29.880 And if you keep popping things off the stack, 00:39:29.880 --> 00:39:33.630 you can realloc in the other direction and ask for fewer and fewer bytes 00:39:33.630 --> 00:39:36.550 and the operating system can take that memory back as well. 00:39:36.550 --> 00:39:38.130 So we now have this building block. 00:39:38.130 --> 00:39:39.900 Let's see what we can do with it. 00:39:39.900 --> 00:39:43.050 So we've had a few pieces of syntax in recent weeks, all of which 00:39:43.050 --> 00:39:46.090 we're going to combine now in just a slightly more clever way. 00:39:46.090 --> 00:39:49.920 So struct is this keyword in C that lets us build our own structure in memory, 00:39:49.920 --> 00:39:52.530 like a collection of two or three or more variables, 00:39:52.530 --> 00:39:54.480 like a person that we've seen before. 00:39:54.480 --> 00:40:01.320 The dot operator, recall, we've used when you do have a struct like a person 00:40:01.320 --> 00:40:02.980 and you want to go inside of it. 00:40:02.980 --> 00:40:07.170 So person.name or person.number. 00:40:07.170 --> 00:40:08.520 We did this a few weeks ago. 00:40:08.520 --> 00:40:12.030 Now but the dot operator just allows you to go inside of a structure 00:40:12.030 --> 00:40:14.160 and get the individual variables within. 00:40:14.160 --> 00:40:17.520 And then, the star operator, unfortunately, has a lot of uses now. 00:40:17.520 --> 00:40:18.810 One was multiplication. 00:40:18.810 --> 00:40:20.880 My God, that was easy back in the day. 00:40:20.880 --> 00:40:23.520 Now it's used to declare pointers. 00:40:23.520 --> 00:40:25.680 It's also used to dereference pointer. 00:40:25.680 --> 00:40:29.430 So to make one exist and then go to that address. 00:40:29.430 --> 00:40:31.900 Unfortunately, it's the same symbol for all of those. 00:40:31.900 --> 00:40:33.270 But it's all related. 00:40:33.270 --> 00:40:35.183 But with these three symbols, it turns out-- 00:40:35.183 --> 00:40:37.350 you're going to get one last one today-- and my God, 00:40:37.350 --> 00:40:40.320 it finally looks like the concept. 00:40:40.320 --> 00:40:42.870 It turns out there's a clever way any time 00:40:42.870 --> 00:40:45.870 you want to use the dot and the star together, that is, to go 00:40:45.870 --> 00:40:49.830 somewhere, and go to an address, and then look inside of a structure, 00:40:49.830 --> 00:40:53.523 you can actually literally use an arrow symbol on your keyboard. 00:40:53.523 --> 00:40:54.690 It's not a single keystroke. 00:40:54.690 --> 00:40:57.750 It's a hyphen and then an open angle bracket. 00:40:57.750 --> 00:40:59.520 But at least it looks like an arrow. 00:40:59.520 --> 00:41:02.700 And we'll see indeed in code today, the things I was drawing pictorially 00:41:02.700 --> 00:41:04.680 on the screen last time with yellow arrows, 00:41:04.680 --> 00:41:07.380 you can actually now express as well in code. 00:41:07.380 --> 00:41:12.000 And so, here we have our next data structure called a linked list. 00:41:12.000 --> 00:41:15.330 And this is one of the most useful powerful concepts in C. 00:41:15.330 --> 00:41:18.030 It's the kind of thing that you can take for granted in Java 00:41:18.030 --> 00:41:19.920 and Python and higher level languages. 00:41:19.920 --> 00:41:24.150 But today, we'll see how we or others can actually build these things just 00:41:24.150 --> 00:41:25.830 using these same primitives. 00:41:25.830 --> 00:41:29.430 So a linked list is going to allow us to actually do 00:41:29.430 --> 00:41:32.490 what we used a foam finger for last week allow 00:41:32.490 --> 00:41:35.100 us to link together, for instance, these three values maybe 00:41:35.100 --> 00:41:36.540 with that fourth value over there. 00:41:36.540 --> 00:41:39.390 And then if there's a fifth, maybe this other foam finger 00:41:39.390 --> 00:41:41.790 points even farther over way to that fifth value. 00:41:41.790 --> 00:41:45.930 The key being that you can stitch together fancier data structures 00:41:45.930 --> 00:41:49.440 without having to pick all of these up and find new space. 00:41:49.440 --> 00:41:51.750 You just have to at least connect the dots somehow. 00:41:51.750 --> 00:41:54.568 We just need to somehow point from one to the other. 00:41:54.568 --> 00:41:57.360 And that's going to make things much more efficient, it would seem. 00:41:57.360 --> 00:41:58.802 So how do we get there? 00:41:58.802 --> 00:42:00.510 So here's my computer's memory as always. 00:42:00.510 --> 00:42:03.720 Suppose that I'm storing the value one somewhere in there 00:42:03.720 --> 00:42:06.180 and it's at 0x123 address, whatever. 00:42:06.180 --> 00:42:10.140 And I'm storing the number two somewhere else in memory, 0x456. 00:42:10.140 --> 00:42:13.500 And number three at address 0x789. 00:42:13.500 --> 00:42:15.780 This is not in array by definition. 00:42:15.780 --> 00:42:17.412 Why? 00:42:17.412 --> 00:42:19.620 Even though it's the only three things on the screen, 00:42:19.620 --> 00:42:20.715 what makes this not an array? 00:42:20.715 --> 00:42:21.550 AUDIENCE: It's not contiguous. 00:42:21.550 --> 00:42:23.050 DAVID J. MALAN: It's not contiguous. 00:42:23.050 --> 00:42:25.450 So this violates the definition of an array. 00:42:25.450 --> 00:42:27.880 But you know, especially since they're sequential, 00:42:27.880 --> 00:42:30.250 it kind of looks to a human like a list. 00:42:30.250 --> 00:42:33.120 So it would be nice if there were a data type called list. 00:42:33.120 --> 00:42:35.820 And there isn't in C. There will be in Python. 00:42:35.820 --> 00:42:36.930 But you know what? 00:42:36.930 --> 00:42:40.270 If I could somehow stitch together these three values 00:42:40.270 --> 00:42:42.330 so I can get from one to the next to the next, 00:42:42.330 --> 00:42:45.660 then I think we could achieve the idea the concept of a list 00:42:45.660 --> 00:42:48.420 without this really annoying constraint that they all be 00:42:48.420 --> 00:42:51.210 contiguous as in an array. 00:42:51.210 --> 00:42:52.260 So how do I do that? 00:42:52.260 --> 00:42:55.970 Well, at the end of the day I only have memory at my disposal. 00:42:55.970 --> 00:42:58.880 There's no more training wheels to take off here. 00:42:58.880 --> 00:43:01.620 This is what we've got underneath the hood of a computer. 00:43:01.620 --> 00:43:04.760 So if all I have is memory, I think the solution 00:43:04.760 --> 00:43:08.390 to this problem of stitching together those values in a list 00:43:08.390 --> 00:43:11.090 must be to spend a bit more memory. 00:43:11.090 --> 00:43:13.770 That's literally the only resource we have right now. 00:43:13.770 --> 00:43:17.180 So let me propose that if we want to create a list conceptually out 00:43:17.180 --> 00:43:21.140 of three values that are in random although pictorially pretty 00:43:21.140 --> 00:43:25.280 positions in memory, let me just add a little bit more memory to the picture. 00:43:25.280 --> 00:43:27.200 So in addition to storing the one, I'm going 00:43:27.200 --> 00:43:30.300 to leave myself some room, a little scratch pad if you will, 00:43:30.300 --> 00:43:31.880 to use some other bits as well. 00:43:31.880 --> 00:43:34.310 Same for the two, same for the three. 00:43:34.310 --> 00:43:38.450 And you can perhaps see where this is going based on last week. 00:43:38.450 --> 00:43:43.640 If I want to somehow connect the one to the two, 00:43:43.640 --> 00:43:48.290 any instincts as to what I should write in this box here that 00:43:48.290 --> 00:43:52.460 would lead me effectively from one to the two? 00:43:52.460 --> 00:43:53.390 What could go here? 00:43:53.390 --> 00:43:53.900 Yeah. 00:43:53.900 --> 00:43:55.852 AUDIENCE: [INAUDIBLE] 00:43:56.882 --> 00:43:58.965 DAVID J. MALAN: We could store the address of two. 00:43:58.965 --> 00:44:01.463 And so, specifically, what would you have me write here? 00:44:01.463 --> 00:44:05.220 AUDIENCE: [INAUDIBLE] pointer 0x456. 00:44:05.220 --> 00:44:06.220 DAVID J. MALAN: Perfect. 00:44:06.220 --> 00:44:09.220 Ideally, I would just put in this box another integer, 00:44:09.220 --> 00:44:11.690 one that happens to be represented in hexadecimal. 00:44:11.690 --> 00:44:12.940 But that's just a base system. 00:44:12.940 --> 00:44:15.320 It's just a human thing for us to look at. 00:44:15.320 --> 00:44:18.110 I'm going to put the value 0x456 here. 00:44:18.110 --> 00:44:19.690 So let me go ahead and reveal that. 00:44:19.690 --> 00:44:21.290 0x456 goes there. 00:44:21.290 --> 00:44:23.290 You can perhaps see further where this is going. 00:44:23.290 --> 00:44:25.390 Well, if I want to get from the two to the three, 00:44:25.390 --> 00:44:29.050 I think I need to put below the two the address of the three, which 00:44:29.050 --> 00:44:30.760 gives me 0x789. 00:44:30.760 --> 00:44:33.400 Now if three is the end of the list, I don't 00:44:33.400 --> 00:44:35.230 want to let it be some garbage value. 00:44:35.230 --> 00:44:38.020 Because that would imply that who knows where it's pointing. 00:44:38.020 --> 00:44:39.377 I need some definitive value. 00:44:39.377 --> 00:44:40.960 And just what would your instincts be? 00:44:40.960 --> 00:44:44.920 If I want to make clear with some special sentinel value 00:44:44.920 --> 00:44:47.350 that the buck stops here, what do I put? 00:44:47.350 --> 00:44:48.525 What might my options be? 00:44:48.525 --> 00:44:49.150 AUDIENCE: Null? 00:44:49.150 --> 00:44:49.480 DAVID J. MALAN: Yeah. 00:44:49.480 --> 00:44:50.110 So null. 00:44:50.110 --> 00:44:55.240 Not N-U-L, per se, but N-U-L-L, which was the new keyword we introduced last 00:44:55.240 --> 00:44:58.960 week, which just represents an empty pointer, if you will. 00:44:58.960 --> 00:45:01.780 Technically the address 0x0. 00:45:01.780 --> 00:45:03.760 So literally, the zero address. 00:45:03.760 --> 00:45:06.430 And what humans did years ago, they just decided, you know what? 00:45:06.430 --> 00:45:09.850 Nothing should ever live at address zero in memory. 00:45:09.850 --> 00:45:12.700 We're just going to reserve that one special byte to be 00:45:12.700 --> 00:45:15.850 a special signal, a sentinel value such that if you ever 00:45:15.850 --> 00:45:19.540 see a zero address in a pointer, it just means it's invalid. 00:45:19.540 --> 00:45:20.720 It does not exist. 00:45:20.720 --> 00:45:24.940 Now, we write that though a little more pleasantly for the eyes as just N-U-L-L 00:45:24.940 --> 00:45:25.780 in all caps. 00:45:25.780 --> 00:45:27.500 And that's a keyword in C as well. 00:45:27.500 --> 00:45:31.120 But of course, last week, I claimed that who cares where things are in memory. 00:45:31.120 --> 00:45:34.730 And honestly, this quickly gets tedious even worrying about these values. 00:45:34.730 --> 00:45:37.840 So let me abstract this away and propose that if we 00:45:37.840 --> 00:45:40.780 want to remember where all of these numbers are in memory, 00:45:40.780 --> 00:45:43.570 let's give ourselves one final piece of memory 00:45:43.570 --> 00:45:47.110 that just allows us to start the whole process. 00:45:47.110 --> 00:45:50.800 Let me allocate on the left-hand side here not room for a number, 00:45:50.800 --> 00:45:54.850 like one, two, three, just room for a pointer that henceforth 00:45:54.850 --> 00:45:57.320 I think I'll call list by convention. 00:45:57.320 --> 00:46:00.880 And then store in that one additional pointer 00:46:00.880 --> 00:46:03.340 a value that just kickstarts the whole process. 00:46:03.340 --> 00:46:06.280 This is the treasure map, if you will, that you get handed. 00:46:06.280 --> 00:46:11.630 And this has the address of the very first actual node in memory. 00:46:11.630 --> 00:46:13.630 Now, technically, we could just start with this. 00:46:13.630 --> 00:46:15.940 But it turns out we'll see it's just a little cleaner 00:46:15.940 --> 00:46:18.880 to use a simple single pointer that leads 00:46:18.880 --> 00:46:22.330 to the things you care about, as opposed to just starting with the first element 00:46:22.330 --> 00:46:22.940 y. 00:46:22.940 --> 00:46:25.150 Well, if you ever want to get rid of this element, 00:46:25.150 --> 00:46:27.150 it'd be nice if you could at least still hang on 00:46:27.150 --> 00:46:29.590 to an empty sheet of paper that indicates that the list is 00:46:29.590 --> 00:46:31.430 empty would be one argument for that. 00:46:31.430 --> 00:46:33.700 So again, who cares about these addresses now? 00:46:33.700 --> 00:46:36.940 Now with the wave of the hand, let's just abstract it away. 00:46:36.940 --> 00:46:38.770 And there are our pointers. 00:46:38.770 --> 00:46:42.550 Each of those addresses in the squares at the bottom 00:46:42.550 --> 00:46:46.390 are simply pointing to the next element in the list. 00:46:46.390 --> 00:46:48.730 The jargon to introduce here would be that now 00:46:48.730 --> 00:46:50.890 that we have these integers one, two, three, 00:46:50.890 --> 00:46:53.290 but they're in these wrappers, if you will, 00:46:53.290 --> 00:46:58.660 these structures that have metadata that is additional data that is related to 00:46:58.660 --> 00:47:00.460 but not the data you actually care about. 00:47:00.460 --> 00:47:01.150 This is data. 00:47:01.150 --> 00:47:02.290 This is metadata. 00:47:02.290 --> 00:47:05.200 This thing here, rectangularly we'll call a node-- 00:47:05.200 --> 00:47:08.530 N-O-D-E. And it's just a term of art that means it's like a container 00:47:08.530 --> 00:47:12.130 in code for storing some values. 00:47:12.130 --> 00:47:14.050 This then is a linked list. 00:47:14.050 --> 00:47:17.860 And this then is the graphical incarnation 00:47:17.860 --> 00:47:20.240 of one node pointing to the other. 00:47:20.240 --> 00:47:22.990 In this case, they happen to be by chance and by design 00:47:22.990 --> 00:47:25.450 of this desk contiguous initially. 00:47:25.450 --> 00:47:27.490 But there's no requirement that they be such. 00:47:27.490 --> 00:47:28.420 The one could be over there. 00:47:28.420 --> 00:47:29.080 The two over there. 00:47:29.080 --> 00:47:29.955 The three over there. 00:47:29.955 --> 00:47:33.430 I would just need more foam fingers to point at one to the next. 00:47:33.430 --> 00:47:36.355 Questions on this concept of a linked list? 00:47:36.355 --> 00:47:38.050 AUDIENCE: [INAUDIBLE] 00:47:38.050 --> 00:47:39.547 DAVID J. MALAN: Yeah, in the back. 00:47:39.547 --> 00:47:43.820 AUDIENCE: [INAUDIBLE] array [INAUDIBLE] by pointer that is outside of the array 00:47:43.820 --> 00:47:44.514 itself? 00:47:44.514 --> 00:47:46.139 DAVID J. MALAN: Can you say that again? 00:47:46.139 --> 00:47:49.195 AUDIENCE: Do traditional arrays always start [INAUDIBLE] pointer 00:47:49.195 --> 00:47:50.653 that is outside of the [INAUDIBLE]? 00:47:50.653 --> 00:47:51.987 DAVID J. MALAN: A good question. 00:47:51.987 --> 00:47:53.867 Do traditional arrays start with a pointer 00:47:53.867 --> 00:47:55.200 that's outside of the structure? 00:47:55.200 --> 00:47:56.430 Short answer, no. 00:47:56.430 --> 00:48:00.210 Arrays are special in C and certain other languages. 00:48:00.210 --> 00:48:04.060 And the name of an array is technically a symbol, if you will, 00:48:04.060 --> 00:48:08.700 that the computer-- the program knows maps to a specific location in memory. 00:48:08.700 --> 00:48:12.540 It's just a label, a synonym for an memory address. 00:48:12.540 --> 00:48:14.320 It does not take up space. 00:48:14.320 --> 00:48:17.580 So to be clear, the name of an array does not take up space 00:48:17.580 --> 00:48:19.530 like that extra square on the left. 00:48:19.530 --> 00:48:22.810 But you do need that extra square on the left when implementing a linked list 00:48:22.810 --> 00:48:25.530 so that you can determine if the list is of size 00:48:25.530 --> 00:48:29.650 zero there's nothing being pointed at, or size three in this case. 00:48:29.650 --> 00:48:32.820 We're taking on more responsibility ourselves. 00:48:32.820 --> 00:48:33.390 Yeah. 00:48:33.390 --> 00:48:35.350 AUDIENCE: [INAUDIBLE] 00:48:39.099 --> 00:48:41.307 DAVID J. MALAN: How do you point to the next element? 00:48:41.307 --> 00:48:42.780 Can you elaborate? 00:48:42.780 --> 00:48:47.103 AUDIENCE: These elements pointing to the next one, how does [INAUDIBLE]?? 00:48:47.103 --> 00:48:48.520 DAVID J. MALAN: Ah, good question. 00:48:48.520 --> 00:48:50.603 If each of these elements is pointing to the next, 00:48:50.603 --> 00:48:52.360 how does three point to the others? 00:48:52.360 --> 00:48:53.800 Short answer, it doesn't. 00:48:53.800 --> 00:48:55.960 At least in this design we have more technically 00:48:55.960 --> 00:48:58.300 what's called a singly linked list. 00:48:58.300 --> 00:49:01.490 And as the arrows imply, it only goes in one direction. 00:49:01.490 --> 00:49:04.960 So if you somehow find in code maybe a for loop, maybe a while loop, 00:49:04.960 --> 00:49:09.700 somehow you're in code over here, you have no way in code 00:49:09.700 --> 00:49:13.150 to go backwards unless we change this to a doubly linked list 00:49:13.150 --> 00:49:17.440 where I add another box that lets me have arrows in both directions. 00:49:17.440 --> 00:49:23.080 Or maybe I just make it circular and I connect the three back 00:49:23.080 --> 00:49:24.940 to the one, which you can totally do. 00:49:24.940 --> 00:49:27.183 But that tends to make life harder because now you 00:49:27.183 --> 00:49:30.100 have to figure out when you're stuck in a loop in your data structure. 00:49:30.100 --> 00:49:31.370 But it's doable as well. 00:49:31.370 --> 00:49:34.480 But as is, it's a dead end by design. 00:49:34.480 --> 00:49:36.730 Other questions on this design here? 00:49:40.250 --> 00:49:42.570 Well, how might we implement this structure in code? 00:49:42.570 --> 00:49:46.370 Well, let me just connect the dots to something we've seen before here. 00:49:46.370 --> 00:49:50.780 This is how a couple of weeks ago, we introduced the notion of a person. 00:49:50.780 --> 00:49:53.270 And we claimed a person might have a name and a number. 00:49:53.270 --> 00:49:55.937 Last week, of course, we took off some of these training wheels. 00:49:55.937 --> 00:49:59.120 And a string is really technically a char star in both cases. 00:49:59.120 --> 00:50:01.640 But really, there's no conceptual difference beyond that. 00:50:01.640 --> 00:50:06.110 But let's use this same paradigm to implement a node 00:50:06.110 --> 00:50:07.757 as I described it in that picture. 00:50:07.757 --> 00:50:09.590 So let me get rid of the name and the number 00:50:09.590 --> 00:50:11.360 because that's related only to a person. 00:50:11.360 --> 00:50:15.440 And let me rename this structure for discussion's sake to node. 00:50:15.440 --> 00:50:20.000 That then invites the question, well, what needs to go inside of a node? 00:50:20.000 --> 00:50:22.970 Well, minimally, an integer. 00:50:22.970 --> 00:50:27.417 But this is now where we need to think a little harder, just conceptually. 00:50:27.417 --> 00:50:29.750 Even if you have no idea how to type it at the keyboard, 00:50:29.750 --> 00:50:33.920 what else needs to be part of a node based on these rectangular pictures 00:50:33.920 --> 00:50:36.455 that we've drawn? 00:50:36.455 --> 00:50:37.330 What more do we need? 00:50:37.330 --> 00:50:38.246 Yeah. 00:50:38.246 --> 00:50:39.755 AUDIENCE: [INAUDIBLE] 00:50:39.755 --> 00:50:40.630 DAVID J. MALAN: Yeah. 00:50:40.630 --> 00:50:42.650 We need a pointer to another node. 00:50:42.650 --> 00:50:48.190 So if I don't know how to implement this yet, it could be something like pointer 00:50:48.190 --> 00:50:49.847 to another node, how do I do that? 00:50:49.847 --> 00:50:50.680 Well, you know what? 00:50:50.680 --> 00:50:53.290 It turns out you would ideally say this. 00:50:53.290 --> 00:50:59.260 If you know that the next node is itself a node by definition, well, 00:50:59.260 --> 00:51:02.752 any time we've needed a pointer we just use the data type and a star. 00:51:02.752 --> 00:51:04.960 And I'm going to arbitrarily, but I think reasonably, 00:51:04.960 --> 00:51:08.920 call this second squared at the bottom of those rectangles 00:51:08.920 --> 00:51:11.770 next as the name of my attribute here. 00:51:11.770 --> 00:51:15.760 But node star just connotes that the next variable 00:51:15.760 --> 00:51:19.118 is going to be not a node per se but the address of a node. 00:51:19.118 --> 00:51:20.410 And that's exactly what we did. 00:51:20.410 --> 00:51:26.240 You had me put 0x456, 0x789 in that box, which is the address of another node. 00:51:26.240 --> 00:51:30.730 So the way we would express this in code would be node star next. 00:51:30.730 --> 00:51:33.830 But we could call the variable anything we want. 00:51:33.830 --> 00:51:35.890 Now, this is a bit of a white lie. 00:51:35.890 --> 00:51:37.120 But we'll fix this right now. 00:51:37.120 --> 00:51:39.400 This code won't actually compile. 00:51:39.400 --> 00:51:41.710 C takes you pretty literally, recall. 00:51:41.710 --> 00:51:45.220 And if you use some term at the top of your file 00:51:45.220 --> 00:51:47.405 that you don't define until later in your file, 00:51:47.405 --> 00:51:49.030 you're going to see some error message. 00:51:49.030 --> 00:51:50.738 We've seen this when I've messed up and I 00:51:50.738 --> 00:51:53.500 forgot to include the function prototypes at the top of my code. 00:51:53.500 --> 00:51:55.000 This is related in spirit. 00:51:55.000 --> 00:51:57.880 I seem here on my one, two, three, fourth line of code, 00:51:57.880 --> 00:52:00.880 I'm trying to use this new term of art that I invented here 00:52:00.880 --> 00:52:04.220 in my code called node, even though it's a CS term as well. 00:52:04.220 --> 00:52:08.410 But nowhere above this it would seem did I even define what a node is. 00:52:08.410 --> 00:52:09.613 It's not a data type in C. 00:52:09.613 --> 00:52:11.530 Every computer scientists know what a node is. 00:52:11.530 --> 00:52:13.900 But it doesn't come for free with the language. 00:52:13.900 --> 00:52:16.330 So I need to do something else. 00:52:16.330 --> 00:52:20.420 I need this word here to come first so that I can use it here. 00:52:20.420 --> 00:52:22.270 And so, we have this catch-22. 00:52:22.270 --> 00:52:26.290 How can a structure be self-referential, that is, 00:52:26.290 --> 00:52:30.440 point to another version of itself if the word doesn't yet exist? 00:52:30.440 --> 00:52:34.510 So the solution to this in C, which we didn't need for a person 00:52:34.510 --> 00:52:36.550 because there was no notion of listing-- 00:52:36.550 --> 00:52:37.900 connecting is a list-- 00:52:37.900 --> 00:52:41.380 we need one more keyword here that we didn't need for a person. 00:52:41.380 --> 00:52:43.490 And we reuse that keyword here. 00:52:43.490 --> 00:52:45.250 So kind of an annoying detail. 00:52:45.250 --> 00:52:49.840 But if we preemptively call this whole thing struct node, 00:52:49.840 --> 00:52:54.370 you can now refer to the thing on the inside as a struct node star. 00:52:54.370 --> 00:52:58.630 But then you can shorten the name of the whole thing from struct node 00:52:58.630 --> 00:52:59.920 to just node. 00:52:59.920 --> 00:53:01.570 Sort of an annoying sequence of steps. 00:53:01.570 --> 00:53:05.050 But in short, any time you're building a node, a linked list in memory, 00:53:05.050 --> 00:53:06.340 this is just the paradigm. 00:53:06.340 --> 00:53:10.060 You use typedef, struct, the name of the thing you want to define, like node. 00:53:10.060 --> 00:53:14.020 You use that name on the inside if you want to point from one to another. 00:53:14.020 --> 00:53:18.730 And then you can shorten it down here to just be called node. 00:53:18.730 --> 00:53:23.690 Questions then on this code here? 00:53:23.690 --> 00:53:26.480 Questions on what we just did? 00:53:26.480 --> 00:53:32.120 Well, if I rewind just a moment to that final picture, what would be the upside 00:53:32.120 --> 00:53:34.310 to be clear, of having jumped through these hoops 00:53:34.310 --> 00:53:37.160 and added this complexity if you will? 00:53:37.160 --> 00:53:41.390 What problem did we just solve by linking together these three values 00:53:41.390 --> 00:53:42.050 to be clear? 00:53:42.050 --> 00:53:42.560 Yeah. 00:53:42.560 --> 00:53:44.801 AUDIENCE: Making lists that are not [INAUDIBLE].. 00:53:44.801 --> 00:53:46.426 DAVID J. MALAN: Making lists that are-- 00:53:46.426 --> 00:53:47.805 AUDIENCE: Not [INAUDIBLE]. 00:53:47.805 --> 00:53:50.180 DAVID J. MALAN: Oh, that are not contiguous, if you will. 00:53:50.180 --> 00:53:52.870 So making lists that are not contiguous in memory. 00:53:52.870 --> 00:53:56.440 The upside of which is that if I want to add the number four to this list, 00:53:56.440 --> 00:53:59.710 it looks like I could choose from any chunks of available memory 00:53:59.710 --> 00:54:00.520 on the screen. 00:54:00.520 --> 00:54:04.270 I just need to point from the end of the current list 00:54:04.270 --> 00:54:06.250 to wherever that other one is in memory. 00:54:06.250 --> 00:54:10.270 What I don't need to do, to be clear, is copy the one, the two, or the three. 00:54:10.270 --> 00:54:11.840 Everything can just stay put. 00:54:11.840 --> 00:54:14.890 Which means time-wise I can do this much more quickly, 00:54:14.890 --> 00:54:17.140 it would seem, without copying things again and again. 00:54:17.140 --> 00:54:20.710 And even without using realloc to let it do all of the copying 00:54:20.710 --> 00:54:22.060 potentially for me. 00:54:22.060 --> 00:54:25.990 But as we'll start seeing even more in the coming weeks, every time we benefit 00:54:25.990 --> 00:54:28.450 and solve some problem, we pay a price. 00:54:28.450 --> 00:54:29.680 There's a trade-off. 00:54:29.680 --> 00:54:32.440 What is a downside as you might perceive now 00:54:32.440 --> 00:54:36.040 of using a linked list instead of an array? 00:54:36.040 --> 00:54:36.670 Yeah. 00:54:36.670 --> 00:54:37.900 AUDIENCE: You use twice as much memory. 00:54:37.900 --> 00:54:38.290 DAVID J. MALAN: Yeah. 00:54:38.290 --> 00:54:39.880 I mean, we use twice as much memory. 00:54:39.880 --> 00:54:42.547 Because now in addition to storing the integers one, two, three, 00:54:42.547 --> 00:54:45.040 I also need to store a pointer for each of those. 00:54:45.040 --> 00:54:48.460 And honestly, even this picture is a bit of a simplification. 00:54:48.460 --> 00:54:51.910 Technically, in most systems today, each int would be four bytes. 00:54:51.910 --> 00:54:54.767 Technically today, most pointers though would be eight bytes. 00:54:54.767 --> 00:54:57.100 I just didn't want to draw this weird shape on the board 00:54:57.100 --> 00:54:59.558 where the bottom square is even bigger than the top square. 00:54:59.558 --> 00:55:02.698 But technically, we're using even more than twice as much space 00:55:02.698 --> 00:55:03.490 for these pointers. 00:55:03.490 --> 00:55:04.573 So there's that trade-off. 00:55:04.573 --> 00:55:07.060 Now, thankfully, decades after C was invented, 00:55:07.060 --> 00:55:09.650 memory is generally much cheaper nowadays. 00:55:09.650 --> 00:55:12.180 And so, it's OK to spend more of it if you need to. 00:55:12.180 --> 00:55:14.180 And it depends on what you want to optimize for. 00:55:14.180 --> 00:55:16.420 But that's absolutely here a downside. 00:55:16.420 --> 00:55:20.320 What's another downside of having transitioned to a linked list? 00:55:20.320 --> 00:55:22.070 AUDIENCE: [INAUDIBLE] 00:55:22.070 --> 00:55:23.870 DAVID J. MALAN: You can't index into it. 00:55:23.870 --> 00:55:26.370 Now, I haven't even tried in code. 00:55:26.370 --> 00:55:31.640 But when you have a linked list you can no longer use square bracket notation. 00:55:31.640 --> 00:55:32.780 Because why? 00:55:32.780 --> 00:55:36.410 Well, square bracket notation just assumes the contiguousness of memory. 00:55:36.410 --> 00:55:37.820 Location zero is here. 00:55:37.820 --> 00:55:39.950 Location one is literally one to the right. 00:55:39.950 --> 00:55:42.590 Location two is literally one to the right, one to the right. 00:55:42.590 --> 00:55:45.350 These things, even though I've drawn it from right to left 00:55:45.350 --> 00:55:48.050 to just keep things pretty, there are gaps here. 00:55:48.050 --> 00:55:49.970 And this is just my interpretation of this. 00:55:49.970 --> 00:55:51.225 These gaps could be big. 00:55:51.225 --> 00:55:52.100 They could be narrow. 00:55:52.100 --> 00:55:53.720 They could be down here, up here. 00:55:53.720 --> 00:55:57.320 They could be anywhere so long as we're linking things together in this list. 00:55:57.320 --> 00:56:01.130 The computer can't just use bracket zero, bracket one, bracket two anymore 00:56:01.130 --> 00:56:04.490 because it can't do simple arithmetic and jump to the middle. 00:56:04.490 --> 00:56:06.980 And now, here's perhaps the worst price we've paid. 00:56:06.980 --> 00:56:09.860 If you don't have square bracket notation, or really, 00:56:09.860 --> 00:56:12.920 you don't have contiguousness, what algorithm did 00:56:12.920 --> 00:56:16.540 we just sacrifice for this dynamism? 00:56:16.540 --> 00:56:20.230 If you rewind even back to week zero. 00:56:20.230 --> 00:56:23.680 And we gave it a name in week three. 00:56:23.680 --> 00:56:27.250 What algorithm can we not use now if we can't assume that the memory is 00:56:27.250 --> 00:56:29.603 back, to back, to back, to back? 00:56:29.603 --> 00:56:30.350 AUDIENCE: Binary? 00:56:30.350 --> 00:56:31.940 DAVID J. MALAN: Binary search. 00:56:31.940 --> 00:56:32.570 Why? 00:56:32.570 --> 00:56:36.350 Because binary search, just like the phone book, back in the first week, 00:56:36.350 --> 00:56:39.950 requires being able to arithmetically jump right to the middle. 00:56:39.950 --> 00:56:42.260 Take the total length of it, divide by two, and boom. 00:56:42.260 --> 00:56:44.802 You're right there in the middle with some simple arithmetic. 00:56:44.802 --> 00:56:49.520 Here they might be laid out, again, with these big or small gaps. 00:56:49.520 --> 00:56:52.148 There's no simple math I can do to just jump immediately 00:56:52.148 --> 00:56:53.190 to the one in the middle. 00:56:53.190 --> 00:56:55.100 And in fact, again, if this TV were bigger, 00:56:55.100 --> 00:56:59.038 the two could technically be in memory be way down here or even way over here. 00:56:59.038 --> 00:57:01.580 The foam finger could be pointing in any number of directions 00:57:01.580 --> 00:57:03.380 depending on where malloc put the thing. 00:57:03.380 --> 00:57:05.630 There's just no way to do binary search. 00:57:05.630 --> 00:57:08.360 And so, it would seem that we've paid another price indeed 00:57:08.360 --> 00:57:09.980 in terms of a performance. 00:57:09.980 --> 00:57:13.170 We're now talking about linear time again. 00:57:13.170 --> 00:57:14.580 So that's a regression. 00:57:14.580 --> 00:57:16.128 Now, that's also a lot. 00:57:16.128 --> 00:57:18.920 Feels like a good time for some muffins and fruit out in the lobby. 00:57:18.920 --> 00:57:21.837 And when we come back, we'll try to solve the problem we just created. 00:57:21.837 --> 00:57:23.060 So see you in 10. 00:57:23.060 --> 00:57:24.290 So we are back. 00:57:24.290 --> 00:57:26.900 And let's see if we can't now take some of these higher level 00:57:26.900 --> 00:57:29.930 concepts of stitching together these nodes in memory 00:57:29.930 --> 00:57:31.933 and translate it to some actual code. 00:57:31.933 --> 00:57:34.100 But we'll do it step by step first before I actually 00:57:34.100 --> 00:57:35.360 start writing it in VS Code. 00:57:35.360 --> 00:57:38.485 So if, Carter, you wouldn't mind helping me step through with some visuals, 00:57:38.485 --> 00:57:41.630 let me propose that line by line we solve some of the problems 00:57:41.630 --> 00:57:44.730 that we've just created for ourselves in building this thing in memory. 00:57:44.730 --> 00:57:48.620 So let's go ahead and first consider how we 00:57:48.620 --> 00:57:52.950 could build a linked list containing the numbers indeed one, then two, then 00:57:52.950 --> 00:57:53.450 three. 00:57:53.450 --> 00:57:55.160 And let's translate each of those steps to code 00:57:55.160 --> 00:57:57.510 and then we'll put it all together into something that actually runs. 00:57:57.510 --> 00:58:00.020 So how about the first step here will just be this. 00:58:00.020 --> 00:58:04.460 To declare a pointer called list that's initially has no value, 00:58:04.460 --> 00:58:05.990 at least at this point in the story. 00:58:05.990 --> 00:58:07.640 List is the name of the variable. 00:58:07.640 --> 00:58:09.980 Node star just means that this is essentially 00:58:09.980 --> 00:58:12.590 going to be our little square over here that 00:58:12.590 --> 00:58:14.780 points to the beginning of the list. 00:58:14.780 --> 00:58:17.750 Of course, it's ideal if it ultimately has a value. 00:58:17.750 --> 00:58:21.050 Because when we initially call this line of code, 00:58:21.050 --> 00:58:24.050 it just gives us indeed that square over here on the left. 00:58:24.050 --> 00:58:27.230 But it's got a garbage value because there's no equal sign on the other side 00:58:27.230 --> 00:58:27.780 there. 00:58:27.780 --> 00:58:30.080 So let's propose that we do one more step here 00:58:30.080 --> 00:58:34.160 and actually initialize it to null so that if only 00:58:34.160 --> 00:58:37.177 we know that it's not garbage, it at least has some known value. 00:58:37.177 --> 00:58:40.010 And null is a good way of signifying that at this point in the story 00:58:40.010 --> 00:58:41.240 the list is empty. 00:58:41.240 --> 00:58:44.390 Indeed, null indicates there's no nodes in the list. 00:58:44.390 --> 00:58:46.930 So that picture would now look like this, whereby let's just 00:58:46.930 --> 00:58:48.680 draw-- instead of writing null everywhere, 00:58:48.680 --> 00:58:51.680 I'll just leave the squares blank when it's not a garbage value, per se. 00:58:51.680 --> 00:58:54.320 It's literally 0x0 or null. 00:58:54.320 --> 00:58:57.650 So that's it for building a linked list of size zero. 00:58:57.650 --> 00:58:58.942 We're done then. 00:58:58.942 --> 00:59:01.400 But we want to now add a one, and then a two, then a three. 00:59:01.400 --> 00:59:03.290 So next step here might be this. 00:59:03.290 --> 00:59:08.030 If I want to allocate the first of my rectangles on our previous picture, 00:59:08.030 --> 00:59:09.200 I'm going to call malloc. 00:59:09.200 --> 00:59:12.650 And I'm going to ask for enough memory to fit a whole node. 00:59:12.650 --> 00:59:15.572 Now, technically, I think that's going to be four bytes for the int 00:59:15.572 --> 00:59:17.780 and eight bytes for the pointer even though I did not 00:59:17.780 --> 00:59:19.650 draw it to scale on the board. 00:59:19.650 --> 00:59:21.890 So that's technically going to be what, 12 bytes? 00:59:21.890 --> 00:59:24.740 But again, size of node just figures out how many 00:59:24.740 --> 00:59:26.450 bytes I actually need dynamically. 00:59:26.450 --> 00:59:30.200 That's going to return to me the address of that chunk of memory, which 00:59:30.200 --> 00:59:33.860 apparently I'm going to store inside of a temporary variable called n 00:59:33.860 --> 00:59:35.060 for short for node. 00:59:35.060 --> 00:59:37.010 But let's see what this does pictorially. 00:59:37.010 --> 00:59:39.710 So when this line of code is executed, I first get, 00:59:39.710 --> 00:59:41.930 on the left, that variable n. 00:59:41.930 --> 00:59:45.410 It's got a garbage value by default because I haven't executed 00:59:45.410 --> 00:59:47.143 the whole thing from right to left. 00:59:47.143 --> 00:59:49.310 Meanwhile, on the right-hand side of the expression, 00:59:49.310 --> 00:59:51.320 I've got now a node somewhere in memory. 00:59:51.320 --> 00:59:52.550 It happened to be free here. 00:59:52.550 --> 00:59:54.203 This is where malloc put it for me. 00:59:54.203 --> 00:59:56.120 But it does have two garbage values initially. 00:59:56.120 --> 01:00:01.040 But because it's a node per my typedef earlier, every node I proposed 01:00:01.040 --> 01:00:03.980 is going to have a number and a next pointer. 01:00:03.980 --> 01:00:05.660 So we can see those labeled here. 01:00:05.660 --> 01:00:07.640 But they've got two garbage values initially. 01:00:07.640 --> 01:00:11.120 But all I care about initially is that ultimately n 01:00:11.120 --> 01:00:13.520 is pointing at that chunk of code. 01:00:13.520 --> 01:00:18.600 So initially, if we could back up two steps, we have-- two steps. 01:00:18.600 --> 01:00:20.720 So we have initially-- oh, one step forward. 01:00:20.720 --> 01:00:25.790 We have this line of code gives us this variable here, which has garbage. 01:00:25.790 --> 01:00:29.660 When this side of the expression is executed, that allocates the memory. 01:00:29.660 --> 01:00:32.720 And then, when we copy from right to left the address 01:00:32.720 --> 01:00:36.290 of that chunk of memory, that's what gives us conceptually this arrow. 01:00:36.290 --> 01:00:38.848 And the garbage goes away because that's a valid pointer now. 01:00:38.848 --> 01:00:40.640 Of course, there's still two garbage values 01:00:40.640 --> 01:00:43.940 there because we haven't set this node to store a number like the number one. 01:00:43.940 --> 01:00:46.107 So let's go ahead and execute one other line of code 01:00:46.107 --> 01:00:49.850 like this, which, while cryptic-looking, is just an application of ideas 01:00:49.850 --> 01:00:51.890 we've seen in week four and prior. 01:00:51.890 --> 01:00:56.060 Star n means to start at this variable and go there. 01:00:56.060 --> 01:01:00.240 Follow the arrow is what the star or the dereference operator does for us. 01:01:00.240 --> 01:01:03.440 And then, the dot operator, recall, when we first introduced structs, 01:01:03.440 --> 01:01:06.110 like for a person struct, allows us to go 01:01:06.110 --> 01:01:08.190 at the number field or the next field. 01:01:08.190 --> 01:01:10.910 So if I do star n, and then in parentheses 01:01:10.910 --> 01:01:12.890 to make sure order of operations is preserved, 01:01:12.890 --> 01:01:15.920 dot number, and then assign it the actual number one, 01:01:15.920 --> 01:01:18.270 which puts the one in the top of that rectangle. 01:01:18.270 --> 01:01:20.747 Now, admittedly, this syntax is not very user-friendly. 01:01:20.747 --> 01:01:21.830 It's annoying to remember. 01:01:21.830 --> 01:01:23.070 You have to have the parentheses. 01:01:23.070 --> 01:01:24.660 So there's another syntax for this. 01:01:24.660 --> 01:01:27.335 Whenever you're doing two things like this in code, 01:01:27.335 --> 01:01:31.010 dereferencing a pointer that is going to an address. 01:01:31.010 --> 01:01:34.940 And then further using the dot notation to go inside of the structure, 01:01:34.940 --> 01:01:38.900 you find that wonderfully C gives us this syntax, whereby 01:01:38.900 --> 01:01:41.510 you can just change the star and the parentheses 01:01:41.510 --> 01:01:43.387 and the dot to just be an arrow. 01:01:43.387 --> 01:01:45.720 And again, it's not a single character on your keyboard. 01:01:45.720 --> 01:01:47.690 It's a hyphen and then an open angle bracket. 01:01:47.690 --> 01:01:49.400 But I kind of like the semantics of this. 01:01:49.400 --> 01:01:53.810 Because this code now pretty much matches the picture-- n arrow 01:01:53.810 --> 01:01:56.780 leads you to the value that you want to access or ultimately 01:01:56.780 --> 01:01:58.050 change in this way. 01:01:58.050 --> 01:01:59.930 There's one step, though, we've forgotten, 01:01:59.930 --> 01:02:03.110 of course, which is that we can't leave this garbage value here. 01:02:03.110 --> 01:02:07.070 Because the garbage value is some unknown value that effectively 01:02:07.070 --> 01:02:08.510 is pointing who knows where. 01:02:08.510 --> 01:02:10.580 And we don't want to accidentally misinterpret 01:02:10.580 --> 01:02:14.160 that garbage value as being a valid address and risk going there. 01:02:14.160 --> 01:02:17.660 So of course, what value should we put here instead? 01:02:17.660 --> 01:02:20.780 Our old friend null, just to signify that this is indeed 01:02:20.780 --> 01:02:21.665 the end of the list. 01:02:21.665 --> 01:02:23.790 And we could do that with a line of code like this. 01:02:23.790 --> 01:02:27.300 And again, we'll connote as much by just leaving that empty box blank. 01:02:27.300 --> 01:02:29.450 So now, we have a list of size one. 01:02:29.450 --> 01:02:33.050 Let's go ahead and add the second number to it as with these lines here. 01:02:33.050 --> 01:02:39.150 List equals n allows us to remember that indeed, we have this list here. 01:02:39.150 --> 01:02:40.980 So if we can step one step forward. 01:02:40.980 --> 01:02:42.920 Here's what the picture now looks like. 01:02:42.920 --> 01:02:45.470 And technically, let's go one step further here. 01:02:45.470 --> 01:02:50.270 This is now really what's going on in memory once my list of size one exists. 01:02:50.270 --> 01:02:55.148 My main variable called list is pointing at exactly that first node. 01:02:55.148 --> 01:02:57.440 At this point in the story I don't need to know or care 01:02:57.440 --> 01:03:00.200 about the temporary variable that I called n, even though it 01:03:00.200 --> 01:03:01.550 might very well still be there. 01:03:01.550 --> 01:03:03.950 But indeed, this now represents that linked list. 01:03:03.950 --> 01:03:05.900 Let's now indeed add the number two. 01:03:05.900 --> 01:03:09.590 So with the same line of code as before, I'm going to allocate another node. 01:03:09.590 --> 01:03:10.700 Size of node. 01:03:10.700 --> 01:03:12.620 Ideally, I would be checking for null here. 01:03:12.620 --> 01:03:15.090 But we're doing the juicy parts only on the slides. 01:03:15.090 --> 01:03:16.740 Let's now go ahead and depict that. 01:03:16.740 --> 01:03:17.825 So what happens with this? 01:03:17.825 --> 01:03:21.740 This brings back our n pointer, which might have been there the whole time. 01:03:21.740 --> 01:03:23.250 But we're doing this step by step. 01:03:23.250 --> 01:03:26.120 It's a garbage value though because we haven't yet copied from right to left. 01:03:26.120 --> 01:03:27.980 Malloc, of course, gives us a second chunk 01:03:27.980 --> 01:03:31.100 of memory, which maybe ends up there with two garbage values 01:03:31.100 --> 01:03:34.100 by default. I've omitted the labels now just because they're still going 01:03:34.100 --> 01:03:36.590 to be number and next respectively. 01:03:36.590 --> 01:03:40.610 Once we copy from right to left, the garbage value indeed becomes an arrow. 01:03:40.610 --> 01:03:44.900 Oscar disappears because it's now indeed a valid pointer pointing here. 01:03:44.900 --> 01:03:48.290 Now, the values themselves number and next are invalid-- 01:03:48.290 --> 01:03:49.140 garbage value. 01:03:49.140 --> 01:03:53.570 So here is where we can now start using our new syntax like the arrow notation 01:03:53.570 --> 01:03:55.430 or the star and the dot if you prefer. 01:03:55.430 --> 01:03:57.860 And we can change the value of n-- 01:03:57.860 --> 01:03:59.870 follow the arrow to number. 01:03:59.870 --> 01:04:01.220 And that becomes two. 01:04:01.220 --> 01:04:05.340 Similarly, we can do this again and set n arrow next. 01:04:05.340 --> 01:04:08.930 So start at n, follow the arrow, access the next field, 01:04:08.930 --> 01:04:10.670 and set that equal to null. 01:04:10.670 --> 01:04:13.460 Now, we're not quite done yet because we haven't actually 01:04:13.460 --> 01:04:14.908 linked things together. 01:04:14.908 --> 01:04:16.700 So here's now where things get interesting. 01:04:16.700 --> 01:04:18.470 How do I combine these two? 01:04:18.470 --> 01:04:20.160 Well, let me propose this. 01:04:20.160 --> 01:04:24.080 Let me propose on our next line here we actually update for now 01:04:24.080 --> 01:04:26.420 list equal to n. 01:04:26.420 --> 01:04:29.690 That is to say, whatever address this is, whatever it's pointing at, 01:04:29.690 --> 01:04:33.330 change list to be the same address that is point at the same thing. 01:04:33.330 --> 01:04:37.280 So if n is pointing here, let's change list to point here. 01:04:37.280 --> 01:04:39.470 And go ahead and do that, Carter, if you could. 01:04:39.470 --> 01:04:41.040 I don't like this. 01:04:41.040 --> 01:04:42.650 Can you go one further step? 01:04:42.650 --> 01:04:44.180 This is bad. 01:04:44.180 --> 01:04:48.800 What is wrong about my sequence of operations here where I updated list 01:04:48.800 --> 01:04:51.370 to point at my new node? 01:04:51.370 --> 01:04:52.060 Yeah. 01:04:52.060 --> 01:04:53.105 AUDIENCE: We [INAUDIBLE]. 01:04:53.105 --> 01:04:53.980 DAVID J. MALAN: Yeah. 01:04:53.980 --> 01:04:56.090 We lost the pointer to the other node. 01:04:56.090 --> 01:04:59.830 So I don't even care about the ordering, two one or one two. 01:04:59.830 --> 01:05:03.610 The bigger problem now as the lack of arrows over there suggests 01:05:03.610 --> 01:05:05.260 is that I have a memory leak. 01:05:05.260 --> 01:05:08.140 I have orphaned my original node in the sense 01:05:08.140 --> 01:05:10.425 that nothing is pointing at it anymore. 01:05:10.425 --> 01:05:13.300 Now, absolutely, I could fix this by adding some temporary variables. 01:05:13.300 --> 01:05:14.383 I could add it to the mix. 01:05:14.383 --> 01:05:18.280 But at this point in the story I have not done any such recollection thereof. 01:05:18.280 --> 01:05:19.450 So let me back this up. 01:05:19.450 --> 01:05:20.930 And let's go forward in the slides. 01:05:20.930 --> 01:05:22.660 This is where we left off a moment ago. 01:05:22.660 --> 01:05:25.923 I think I need to take into account order of operations. 01:05:25.923 --> 01:05:27.340 And I'm going to keep this simple. 01:05:27.340 --> 01:05:30.130 I'm not going to care about the order of the numbers for now. 01:05:30.130 --> 01:05:33.260 I'm fine with a list that is two and then one. 01:05:33.260 --> 01:05:36.110 So with that said, let me go ahead and update, 01:05:36.110 --> 01:05:39.800 I think, this box here to point at my original node. 01:05:39.800 --> 01:05:42.280 So let's see how we could do this in code. 01:05:42.280 --> 01:05:43.820 n arrow next. 01:05:43.820 --> 01:05:48.945 So n, arrow, next should equal the current list. 01:05:48.945 --> 01:05:50.320 And this is a little weird again. 01:05:50.320 --> 01:05:51.670 But recall what list is. 01:05:51.670 --> 01:05:54.070 List is this pointer here that just contains 01:05:54.070 --> 01:05:57.700 the address of the original address of the list, 01:05:57.700 --> 01:06:01.370 or equivalently, it contains this arrow, whatever it's pointing at. 01:06:01.370 --> 01:06:05.500 So what this means in this line of code, n bracket next means start at n, 01:06:05.500 --> 01:06:07.900 follow the arrow, access the next pointer, 01:06:07.900 --> 01:06:10.690 and set it equal to whatever list equals. 01:06:10.690 --> 01:06:16.000 So if list is pointing here, then next should point there as well. 01:06:16.000 --> 01:06:17.530 This, I think, is safe. 01:06:17.530 --> 01:06:18.880 Because now we have redundancy. 01:06:18.880 --> 01:06:21.700 Now we've got two pointers pointing at the original list. 01:06:21.700 --> 01:06:26.860 And now I think we can do another step whereby we update list to equal n, 01:06:26.860 --> 01:06:29.150 same line of code before that got us into trouble. 01:06:29.150 --> 01:06:31.150 But I'm doing it second now instead of first. 01:06:31.150 --> 01:06:36.340 When I execute list equals n, this now sets list equal to the same thing 01:06:36.340 --> 01:06:37.570 that n equals. 01:06:37.570 --> 01:06:41.650 And so, now I have successfully inserted my new node 01:06:41.650 --> 01:06:44.440 containing two into the list. 01:06:44.440 --> 01:06:46.480 And in fact, if we advance one more, we can just 01:06:46.480 --> 01:06:48.940 clear up the clutter, assume that the temporary variable is 01:06:48.940 --> 01:06:50.080 gone from the story. 01:06:50.080 --> 01:06:53.067 Now we have a linked list where admittedly ordering is wrong. 01:06:53.067 --> 01:06:54.400 It's two one instead of one two. 01:06:54.400 --> 01:06:56.260 But at least it's linked correctly. 01:06:56.260 --> 01:07:00.630 And I didn't orphan or leak any memory. 01:07:00.630 --> 01:07:04.440 Questions on this sequence of steps here? 01:07:04.440 --> 01:07:05.894 Yeah, in back. 01:07:05.894 --> 01:07:09.858 AUDIENCE: So [INAUDIBLE] this is all [INAUDIBLE] type, right? 01:07:09.858 --> 01:07:11.945 [INAUDIBLE] 01:07:11.945 --> 01:07:12.820 DAVID J. MALAN: Yeah. 01:07:12.820 --> 01:07:13.460 Spot on. 01:07:13.460 --> 01:07:17.020 So this would fall under that category of a stack, if you will. 01:07:17.020 --> 01:07:21.070 Although I've not called it that by name because I just pushed the number two 01:07:21.070 --> 01:07:22.780 onto this data structure, if you will. 01:07:22.780 --> 01:07:26.320 And indeed, it ended up at the beginning of the list instead of the end. 01:07:26.320 --> 01:07:29.182 And so, here's where we see a distinction between an abstract data 01:07:29.182 --> 01:07:30.640 structure, which is where we began. 01:07:30.640 --> 01:07:34.030 A stack is a thing like the pile of sweaters that just has push and pop 01:07:34.030 --> 01:07:37.000 properties and LIFO access-- 01:07:37.000 --> 01:07:39.040 last in first out. 01:07:39.040 --> 01:07:41.320 How do you implement something like that in memory? 01:07:41.320 --> 01:07:45.310 Well, it would seem that you could implement the notion of a stack here 01:07:45.310 --> 01:07:48.350 not for sweaters but for numbers using a linked list. 01:07:48.350 --> 01:07:51.220 So long as you implement insertion a.k.a. 01:07:51.220 --> 01:07:56.030 pushing by pretending new values to the list by pretending again and again. 01:07:56.030 --> 01:07:58.780 And if, Carter, you don't mind hitting the keyboard one more time. 01:07:58.780 --> 01:08:00.970 If I wanted to add the number three now, you 01:08:00.970 --> 01:08:03.610 could imagine prepending it to the list. 01:08:03.610 --> 01:08:04.310 Why? 01:08:04.310 --> 01:08:07.520 Well honestly, especially as this list gets longer and longer, 01:08:07.520 --> 01:08:10.780 I kind of like the appeal of prepending these elements. 01:08:10.780 --> 01:08:11.320 Why? 01:08:11.320 --> 01:08:15.040 Because even if this list gets crazy long and way, way out here, 01:08:15.040 --> 01:08:17.770 you didn't notice me following all of the arrows 01:08:17.770 --> 01:08:19.149 earlier to do the insertions. 01:08:19.149 --> 01:08:22.569 If I want to insert a fourth number, a fifth number, a sixth number, 01:08:22.569 --> 01:08:24.880 all I have to do is insert it here, if you 01:08:24.880 --> 01:08:29.680 will, point it at the original start of the list, then update this pointer, 01:08:29.680 --> 01:08:30.290 and done. 01:08:30.290 --> 01:08:32.830 And I would say that's two steps, give or take. 01:08:32.830 --> 01:08:35.200 It's not going to be n steps as it would be 01:08:35.200 --> 01:08:39.050 if I had to append the new nodes to the end of the list. 01:08:39.050 --> 01:08:41.705 Now, of course, we've sacrificed ordering of these numbers. 01:08:41.705 --> 01:08:44.080 They're literally in the opposite order or whatever order 01:08:44.080 --> 01:08:45.140 they were inserted in. 01:08:45.140 --> 01:08:48.833 But that might very well be OK depending on the goal at hand. 01:08:48.833 --> 01:08:50.750 Thank you to Carter for stepping through this. 01:08:50.750 --> 01:08:52.479 What if now we wanted to translate this-- 01:08:52.479 --> 01:08:52.930 [APPLAUSE] 01:08:52.930 --> 01:08:53.430 Oh, sure. 01:08:53.430 --> 01:08:55.480 Thank you. 01:08:55.480 --> 01:08:59.240 [LAUGHS] It's all for you, none for me in this example. 01:08:59.240 --> 01:09:02.200 So here we have perhaps a way of translating this now 01:09:02.200 --> 01:09:03.430 to some actual code. 01:09:03.430 --> 01:09:06.040 And this will be the last of the intense code 01:09:06.040 --> 01:09:09.160 here just to give you a sense of how we can translate this idea now 01:09:09.160 --> 01:09:10.580 to actual steps. 01:09:10.580 --> 01:09:13.189 So this is list.c in VS Code here. 01:09:13.189 --> 01:09:15.490 Let me go ahead and make a couple of changes up top. 01:09:15.490 --> 01:09:16.420 Let me go ahead. 01:09:16.420 --> 01:09:22.120 And how about declaring a node using typedef struct 01:09:22.120 --> 01:09:25.120 node using our new framing as before. 01:09:25.120 --> 01:09:27.760 I'm going to give every node a number, as I proposed. 01:09:27.760 --> 01:09:31.390 And every node a pointer to the next element, which is 01:09:31.390 --> 01:09:33.850 going to be implemented just as before. 01:09:33.850 --> 01:09:36.380 And I'm going to simplify the whole name as just node. 01:09:36.380 --> 01:09:40.300 So all that is is the exact same typedef that we proposed earlier. 01:09:40.300 --> 01:09:44.830 Now, let me go ahead and get rid of all of this code which we wrote earlier. 01:09:44.830 --> 01:09:48.250 And recall that this was the most recent version that was not a linked list. 01:09:48.250 --> 01:09:52.090 This was just an array that we allocated and then reallocated. 01:09:52.090 --> 01:09:54.350 So this is the old way of doing things. 01:09:54.350 --> 01:09:58.210 But it was inefficient because we might have to lean on a for loop 01:09:58.210 --> 01:10:00.760 or lean on realloc to copy everything around. 01:10:00.760 --> 01:10:04.930 We're now going to re-implement the notion of a list as an actual linked 01:10:04.930 --> 01:10:07.670 list, not as an array. 01:10:07.670 --> 01:10:10.150 So my main function now might do something like this. 01:10:10.150 --> 01:10:12.330 And I'm going to really just copy the lines of code 01:10:12.330 --> 01:10:14.080 that we just stepped through on the board. 01:10:14.080 --> 01:10:17.710 So let me give myself a special variable called list. 01:10:17.710 --> 01:10:19.360 That's going to be initialized to null. 01:10:19.360 --> 01:10:22.540 And this is just my pointer, the square on the left-hand side of the screen 01:10:22.540 --> 01:10:24.235 that represents the start of the list. 01:10:24.235 --> 01:10:26.110 And if it's null, it means the list is empty. 01:10:26.110 --> 01:10:26.830 So done. 01:10:26.830 --> 01:10:30.460 I'm done implementing a linked list of size zero. 01:10:30.460 --> 01:10:32.462 Well, now, how do I want to run this code? 01:10:32.462 --> 01:10:34.420 Well, let me propose for the sake of discussion 01:10:34.420 --> 01:10:37.482 that this version of the program will take command line arguments. 01:10:37.482 --> 01:10:39.440 So I want to be able to do something like this. 01:10:39.440 --> 01:10:43.060 I want to run this program ultimately and type in three command line 01:10:43.060 --> 01:10:44.680 arguments like this, one, two, three. 01:10:44.680 --> 01:10:47.530 And I want my program in a couple of minutes 01:10:47.530 --> 01:10:51.400 to allocate one, two, three nodes and stitch them together just 01:10:51.400 --> 01:10:54.070 like the visualization on the board. 01:10:54.070 --> 01:10:55.290 I could use get int. 01:10:55.290 --> 01:10:58.040 But it's just going to be faster if we use command line arguments. 01:10:58.040 --> 01:11:00.550 So again, I'm just borrowing some concepts from week two. 01:11:00.550 --> 01:11:03.382 But none of that's possible yet until I change my code here. 01:11:03.382 --> 01:11:04.090 So let's do this. 01:11:04.090 --> 01:11:08.560 Int argc string argv. 01:11:08.560 --> 01:11:10.030 But you know what? 01:11:10.030 --> 01:11:12.800 We know that strings are not actually a thing anymore. 01:11:12.800 --> 01:11:16.600 So I can change my command line argument definition to be what it really is. 01:11:16.600 --> 01:11:18.040 It's really char star. 01:11:18.040 --> 01:11:20.230 But it's the exact same thing as in week two, 01:11:20.230 --> 01:11:23.500 just strings are no more, at least without the training wheels 01:11:23.500 --> 01:11:25.360 on anymore like last week. 01:11:25.360 --> 01:11:27.070 And now let me do this. 01:11:27.070 --> 01:11:33.210 For int i equals one, i is less than argc i plus, plus. 01:11:33.210 --> 01:11:34.960 So what I'm doing with this loop is I just 01:11:34.960 --> 01:11:36.918 want to iterate over the command line argument. 01:11:36.918 --> 01:11:39.820 So I have one number at a time from the prompt. 01:11:39.820 --> 01:11:42.980 What else do I want to do here? 01:11:42.980 --> 01:11:48.170 Well, let's go ahead, and how about do this. 01:11:48.170 --> 01:11:49.730 Let's get a number. 01:11:49.730 --> 01:11:53.140 So int number equals argv bracket i. 01:11:53.140 --> 01:11:54.530 So a couple of notes here. 01:11:54.530 --> 01:11:58.390 One, I'm starting for loop at one instead of zero. 01:11:58.390 --> 01:11:59.890 But I'm going up to argc. 01:11:59.890 --> 01:12:02.530 argc is argument count, how many words are at the prompt. 01:12:02.530 --> 01:12:07.930 Why am I starting at one instead of zero though given my goal? 01:12:07.930 --> 01:12:09.220 Why am I starting one? 01:12:09.220 --> 01:12:10.610 Yeah. 01:12:10.610 --> 01:12:15.805 AUDIENCE: Because [INAUDIBLE] the first value [INAUDIBLE].. 01:12:15.805 --> 01:12:16.680 DAVID J. MALAN: Yeah. 01:12:16.680 --> 01:12:19.860 So the first value in argv is actually the name of the program. 01:12:19.860 --> 01:12:21.130 That's obviously not a number. 01:12:21.130 --> 01:12:22.493 So I want the second value. 01:12:22.493 --> 01:12:25.410 So I'm going to start iterating over those command line arguments at i 01:12:25.410 --> 01:12:25.937 equals one. 01:12:25.937 --> 01:12:26.520 So that's all. 01:12:26.520 --> 01:12:29.340 I just want to get the actual numbers at the prompt. 01:12:29.340 --> 01:12:33.910 Unfortunately, argv, bracket i, is a string, a.k.a. 01:12:33.910 --> 01:12:34.410 char star. 01:12:34.410 --> 01:12:35.267 That is not an int. 01:12:35.267 --> 01:12:36.600 So this line of code won't work. 01:12:36.600 --> 01:12:39.690 But can anyone think back to week two where 01:12:39.690 --> 01:12:43.560 we had a function for converting strings to integers? 01:12:43.560 --> 01:12:44.153 Anyone? 01:12:44.153 --> 01:12:44.820 AUDIENCE: a to i 01:12:44.820 --> 01:12:45.695 DAVID J. MALAN: Yeah. 01:12:45.695 --> 01:12:47.820 So a to i is a function that converts ASCII 01:12:47.820 --> 01:12:50.820 to an integer assuming what you give it as an argument looks 01:12:50.820 --> 01:12:52.590 like a number like one, or two, or three. 01:12:52.590 --> 01:12:53.680 So let me fix this. 01:12:53.680 --> 01:12:55.170 Let me actually do the conversion. 01:12:55.170 --> 01:12:57.990 If I were really being careful, I would error-check this, 01:12:57.990 --> 01:13:01.260 make sure that there's no digits, just like you might have in problem set two. 01:13:01.260 --> 01:13:03.760 But for today's purposes, I'm just going to assume the honor 01:13:03.760 --> 01:13:07.800 system that the user, me, is going to run the program correctly. 01:13:07.800 --> 01:13:11.400 So now that I have a variable containing the number from the command line, 01:13:11.400 --> 01:13:12.910 let's just allocate a node for it. 01:13:12.910 --> 01:13:16.950 So let me do node star n, just like we did in the visualization. 01:13:16.950 --> 01:13:21.990 And let's malloc enough space for the size of one such node here. 01:13:21.990 --> 01:13:24.250 I now need to just be super safe. 01:13:24.250 --> 01:13:29.310 So if n equals equals null, like if I'm out of memory, you know what? 01:13:29.310 --> 01:13:32.640 Let me go ahead and just immediately return one here. 01:13:32.640 --> 01:13:35.580 Otherwise, if that's not the case, let me 01:13:35.580 --> 01:13:41.880 go ahead and update the number field of this new node, which at line 24 01:13:41.880 --> 01:13:44.050 does exist because it did not return null. 01:13:44.050 --> 01:13:46.140 So I did not exit early with return. 01:13:46.140 --> 01:13:49.660 And let me just store whatever number this human typed in first. 01:13:49.660 --> 01:13:53.940 So the return value of a to i, which per line 17 is in my variable 01:13:53.940 --> 01:13:54.750 called number. 01:13:54.750 --> 01:14:01.780 And then, let me go ahead and just prepend this to the list. 01:14:01.780 --> 01:14:08.730 Let me go ahead and say that this next field first has a known value null, 01:14:08.730 --> 01:14:11.140 just so that we get rid of that second garbage value. 01:14:11.140 --> 01:14:13.930 And let me go ahead and now prepend it to the list. 01:14:13.930 --> 01:14:16.950 So if I want to prepend it, that means this new node 01:14:16.950 --> 01:14:24.040 must have a next field that points to the current beginning of the list. 01:14:24.040 --> 01:14:26.730 And again, the goal here is to prepend, prepend, prepend. 01:14:26.730 --> 01:14:28.980 So whatever the current list is, let's change it 01:14:28.980 --> 01:14:33.030 so that this new node points to that existing list. 01:14:33.030 --> 01:14:37.680 And now, step two, as before, was to update the actual list 01:14:37.680 --> 01:14:39.330 to point at this node. 01:14:39.330 --> 01:14:42.780 So recall in red on the screen before, I screwed up originally. 01:14:42.780 --> 01:14:46.750 And I only did this line by moving the pointer too early, if you will. 01:14:46.750 --> 01:14:49.290 But I fixed that once Carter helped me rewind 01:14:49.290 --> 01:14:52.050 and we got rid of the red line, which indicated error. 01:14:52.050 --> 01:14:56.790 And I just do n arrow next to change the next field of this new node 01:14:56.790 --> 01:14:58.360 to point to the existing list. 01:14:58.360 --> 01:15:00.510 So I'm not orphaning anything. 01:15:00.510 --> 01:15:06.962 At this point in the story, I think my code is correct, not batting 01:15:06.962 --> 01:15:07.920 very well though today. 01:15:07.920 --> 01:15:09.030 But I think my code is correct. 01:15:09.030 --> 01:15:11.200 But the program doesn't do anything interesting. 01:15:11.200 --> 01:15:14.940 So it would be nice to now iterate over this linked list in memory whatever 01:15:14.940 --> 01:15:16.990 its order is and print things out. 01:15:16.990 --> 01:15:18.160 Well, how do we do that? 01:15:18.160 --> 01:15:21.000 Well, it turns out if you want to iterate over a linked list, 01:15:21.000 --> 01:15:23.490 the general paradigm is to do something like this. 01:15:23.490 --> 01:15:25.950 To define a temporary variable, I could call it temp. 01:15:25.950 --> 01:15:27.992 But another convention that you might as well see 01:15:27.992 --> 01:15:30.120 is called pointer, ptr for short. 01:15:30.120 --> 01:15:31.870 But you can call it anything you want. 01:15:31.870 --> 01:15:33.930 And you can have a temporary variable first point 01:15:33.930 --> 01:15:35.700 at the first node in the list. 01:15:35.700 --> 01:15:37.887 And then in some kind of loop, like a while loop, 01:15:37.887 --> 01:15:39.720 you point it at the second node in the list. 01:15:39.720 --> 01:15:40.980 And then you keep iterating. 01:15:40.980 --> 01:15:42.510 You point it at the last node in the list. 01:15:42.510 --> 01:15:44.340 And then, eventually, you iterate too far, 01:15:44.340 --> 01:15:46.800 effectively pointing at null, at which point 01:15:46.800 --> 01:15:49.000 your while loop can presumably terminate. 01:15:49.000 --> 01:15:52.590 So how do I implement that idea of allocating a temporary pointer that 01:15:52.590 --> 01:15:56.203 just points at each node in the list and lets me print out ultimately 01:15:56.203 --> 01:15:57.120 each of those numbers? 01:15:57.120 --> 01:15:59.040 Well, let's go back to my code here. 01:15:59.040 --> 01:16:00.430 And let me do this. 01:16:00.430 --> 01:16:04.020 Let me go ahead and declare this temporary pointer, 01:16:04.020 --> 01:16:05.680 which is going to be a node star also. 01:16:05.680 --> 01:16:06.180 Why? 01:16:06.180 --> 01:16:07.530 Because it's the address of a node. 01:16:07.530 --> 01:16:08.850 The first, the second, the third. 01:16:08.850 --> 01:16:11.892 And I'm going to set that equal to whatever the beginning of the list is. 01:16:11.892 --> 01:16:15.180 So that is going to be equivalent to this version of the picture 01:16:15.180 --> 01:16:19.570 here, where pointer is just temporarily pointing at the first node in the list. 01:16:19.570 --> 01:16:21.870 It's not pointing at list per se, it's pointing 01:16:21.870 --> 01:16:25.920 at the first node in the list, which list is also pointing at itself. 01:16:25.920 --> 01:16:28.050 Once I've done this, I think I can translate 01:16:28.050 --> 01:16:32.460 this to code that's a little new-- but it's conceptually familiar perhaps 01:16:32.460 --> 01:16:36.700 now-- while that pointer does not equal null. 01:16:36.700 --> 01:16:40.740 So while I have a valid pointer, like my finger or that arrow 01:16:40.740 --> 01:16:44.530 is pointing at an actual node in memory, well, let me go ahead and print it out. 01:16:44.530 --> 01:16:46.920 So let me print out with percent i backslash 01:16:46.920 --> 01:16:52.908 n whatever is in the current node at the number field within. 01:16:52.908 --> 01:16:55.200 And again, this is going to have the effect, hopefully, 01:16:55.200 --> 01:16:57.180 of first printing the three. 01:16:57.180 --> 01:17:01.170 And I think I just need to now update the pointer so 01:17:01.170 --> 01:17:04.150 that on the next iteration it's pointing at the next value. 01:17:04.150 --> 01:17:06.750 So if this is where the story is, how do I 01:17:06.750 --> 01:17:10.150 update pointer to point at the second element of the list? 01:17:10.150 --> 01:17:12.250 Well, I want pointer to point at the two. 01:17:12.250 --> 01:17:14.760 And I want pointer to eventually point at the three. 01:17:14.760 --> 01:17:16.020 Well, how do I do that? 01:17:16.020 --> 01:17:19.230 Well, the way in code I can follow these arrows is as follows. 01:17:19.230 --> 01:17:22.190 If I currently have pointer pointing at this node 01:17:22.190 --> 01:17:27.570 but I want to point it at the next node, I can borrow this pointer here. 01:17:27.570 --> 01:17:30.920 So whatever this address is in the first node, a.k.a. 01:17:30.920 --> 01:17:33.770 the next field, I can copy that into pointer. 01:17:33.770 --> 01:17:36.200 Because then, pointer will point at whatever 01:17:36.200 --> 01:17:39.230 this is pointing at by just setting one equal to the other. 01:17:39.230 --> 01:17:44.570 So once I've done that, the picture will become this. 01:17:44.570 --> 01:17:47.420 And how do I translate that to code? 01:17:47.420 --> 01:17:50.340 Well, new syntax is surprisingly straightforward. 01:17:50.340 --> 01:17:53.060 All I need do is say pointer after printing 01:17:53.060 --> 01:18:00.300 it equals whatever pointer currently is, but grab its next field instead. 01:18:00.300 --> 01:18:03.860 And this is a very common paradigm when iterating over a linked list 01:18:03.860 --> 01:18:06.470 and you're using some temporary variable like pointer, 01:18:06.470 --> 01:18:09.680 you can simply set pointer equal to pointer next. 01:18:09.680 --> 01:18:12.330 And what that means here is as follows. 01:18:12.330 --> 01:18:16.670 If this is pointer pointing from here down to here, 01:18:16.670 --> 01:18:20.160 pointer next is follow the arrow, grab the next field. 01:18:20.160 --> 01:18:22.550 So if you set pointer equal to this thing, 01:18:22.550 --> 01:18:26.570 that's the same thing as pointing this at this same box. 01:18:26.570 --> 01:18:29.862 And indeed, if I advance to the next slide, 01:18:29.862 --> 01:18:31.820 even though the arrows are technically pointing 01:18:31.820 --> 01:18:34.730 at different parts of the rectangles, that's just for graphic's sake, 01:18:34.730 --> 01:18:36.620 pointer is now pointing at the second node. 01:18:36.620 --> 01:18:39.650 And when I do this again on my next iteration, it points at this. 01:18:39.650 --> 01:18:44.840 And then, this last step, notice, when I keep doing pointer equals pointer next, 01:18:44.840 --> 01:18:47.570 this will become eventually this value. 01:18:47.570 --> 01:18:50.620 But what's this value in this linked list? 01:18:50.620 --> 01:18:51.910 It's null, technically. 01:18:51.910 --> 01:18:55.180 So this arrow will eventually take on this value 01:18:55.180 --> 01:18:57.490 when I set pointer equal to pointer next. 01:18:57.490 --> 01:19:02.630 And at that point ptr, my temporary pointer is going to be null. 01:19:02.630 --> 01:19:04.760 So it might as well look like this pictorially. 01:19:04.760 --> 01:19:07.630 And what does that mean for my loop? 01:19:07.630 --> 01:19:11.530 Once pointer is null, because you've walked off the end of the linked list, 01:19:11.530 --> 01:19:17.210 what's going to be true of this loop here started in line 32? 01:19:17.210 --> 01:19:20.870 Any observations here? 01:19:20.870 --> 01:19:22.010 What's going to be true? 01:19:22.010 --> 01:19:24.620 What will happen now as soon as we hit the end of the list? 01:19:24.620 --> 01:19:25.250 Yeah, sorry. 01:19:25.250 --> 01:19:26.810 AUDIENCE: [INAUDIBLE] 01:19:26.810 --> 01:19:28.040 DAVID J. MALAN: The loop is going to break out. 01:19:28.040 --> 01:19:28.540 Why? 01:19:28.540 --> 01:19:30.830 Because line 32, which is constantly asking, 01:19:30.830 --> 01:19:32.570 well, pointer does not equal null. 01:19:32.570 --> 01:19:36.590 Well, if pointer finally equals null three steps later, 01:19:36.590 --> 01:19:38.400 the while loop is now done. 01:19:38.400 --> 01:19:40.580 And so, what I can do at the end of this program 01:19:40.580 --> 01:19:43.103 once I printed out those values, well, first let's go ahead 01:19:43.103 --> 01:19:44.270 and open my terminal window. 01:19:44.270 --> 01:19:46.220 Let's make list. 01:19:46.220 --> 01:19:47.840 Compile dot slash list. 01:19:47.840 --> 01:19:51.080 And let me try the same values, one, and two, and three. 01:19:51.080 --> 01:19:55.130 That's going to again allocate one node, two node, three nodes by prepending, 01:19:55.130 --> 01:19:57.290 prepending, prepending each of those values. 01:19:57.290 --> 01:20:00.060 And it's then going to iterate over them from left to right. 01:20:00.060 --> 01:20:03.380 And so, when I hit Enter now, what should I see on the screen 01:20:03.380 --> 01:20:04.445 if my code is correct? 01:20:07.040 --> 01:20:07.910 What will I see? 01:20:07.910 --> 01:20:10.492 Feel free to just call it out. 01:20:10.492 --> 01:20:11.575 AUDIENCE: Three, two, one. 01:20:11.575 --> 01:20:14.490 DAVID J. MALAN: Three, two, one, because I've prepended presumably. 01:20:14.490 --> 01:20:15.630 And here we go. 01:20:15.630 --> 01:20:17.250 I indeed see three, two, one. 01:20:17.250 --> 01:20:18.510 So the list is backwards. 01:20:18.510 --> 01:20:20.100 But all of the elements are there. 01:20:20.100 --> 01:20:22.230 Now, technically, if I ran Valgrind on this, 01:20:22.230 --> 01:20:25.500 Valgrind would not be happy because I have never freed any of my memory. 01:20:25.500 --> 01:20:27.930 So I should probably now have a second loop here 01:20:27.930 --> 01:20:29.350 that does something like this. 01:20:29.350 --> 01:20:31.918 Let me again set pointer equal to list. 01:20:31.918 --> 01:20:33.960 I don't need to redeclare it because I've already 01:20:33.960 --> 01:20:35.760 created this thing on line 31. 01:20:35.760 --> 01:20:38.552 I just want to reset it to be the beginning of the list again. 01:20:38.552 --> 01:20:40.260 And now, I can do the same kind of thing. 01:20:40.260 --> 01:20:45.640 While prt not equals null, go ahead and do this. 01:20:45.640 --> 01:20:48.510 Well, I don't want to just do free pointer 01:20:48.510 --> 01:20:52.770 and then do pointer gets pointer next. 01:20:52.770 --> 01:20:54.000 Why? 01:20:54.000 --> 01:20:56.212 My goal is to free all of my memory. 01:20:56.212 --> 01:20:58.170 But I think this is going to get me in trouble. 01:20:58.170 --> 01:21:00.587 Pointer equals list just gives me a temporary pointer that 01:21:00.587 --> 01:21:03.630 points at the three, and then eventually the two, and then the one. 01:21:03.630 --> 01:21:04.140 How? 01:21:04.140 --> 01:21:07.090 Well, while pointer not equal null, I'm freeing the pointer. 01:21:07.090 --> 01:21:10.500 So this is like saying to malloc, free that node, free that node, free 01:21:10.500 --> 01:21:11.460 that node. 01:21:11.460 --> 01:21:16.040 But what's the problem with what I've just done here? 01:21:16.040 --> 01:21:18.030 This code is technically buggy. 01:21:18.030 --> 01:21:18.530 Yeah. 01:21:18.530 --> 01:21:21.613 AUDIENCE: After you free your pointer, you have no [INAUDIBLE] what's next 01:21:21.613 --> 01:21:22.260 [INAUDIBLE]. 01:21:22.260 --> 01:21:23.260 DAVID J. MALAN: Exactly. 01:21:23.260 --> 01:21:27.180 After you call free on pointer, you are by social contract 01:21:27.180 --> 01:21:30.240 with C not allowed to touch pointer anymore. 01:21:30.240 --> 01:21:31.080 It is invalid. 01:21:31.080 --> 01:21:32.580 Now it's still going to be a number. 01:21:32.580 --> 01:21:34.288 It's still going to be a pattern of bits. 01:21:34.288 --> 01:21:35.280 But it's invalid. 01:21:35.280 --> 01:21:40.240 And you'll very often get a segmentation fault if you tempt fate in that way. 01:21:40.240 --> 01:21:43.920 So I can't free the pointer and then use it literally the next line. 01:21:43.920 --> 01:21:47.220 The solution here, like our swapping of the liquids last time, 01:21:47.220 --> 01:21:49.480 was to maybe just have a temporary variable. 01:21:49.480 --> 01:21:50.550 So I can do a switcheroo. 01:21:50.550 --> 01:21:54.120 And so, a common way to solve this problem to get the order of operations 01:21:54.120 --> 01:21:55.890 right would be to do something like this. 01:21:55.890 --> 01:21:59.790 Give yourself a temporary pointer like node star next. 01:21:59.790 --> 01:22:05.220 Set it equal to the place you want to go next, so we're one step ahead. 01:22:05.220 --> 01:22:07.290 Now you can free pointer. 01:22:07.290 --> 01:22:10.690 And then you can update pointer to be that next value. 01:22:10.690 --> 01:22:12.900 So essentially, you need two hands now. 01:22:12.900 --> 01:22:15.720 You create on line 41 another pointer that if this 01:22:15.720 --> 01:22:18.300 is pointing at the first node, the three, 01:22:18.300 --> 01:22:20.880 your new pointer is pointing at the two temporarily. 01:22:20.880 --> 01:22:24.960 So now you can tell malloc via free, release this memory. 01:22:24.960 --> 01:22:27.720 But I haven't forgotten where I want to go next. 01:22:27.720 --> 01:22:30.160 And so, I can now continue on. 01:22:30.160 --> 01:22:33.720 So a common paradigm for just iterating over these nodes 01:22:33.720 --> 01:22:36.420 and then freeing them, a couple of observations. 01:22:36.420 --> 01:22:38.850 Strictly speaking, I could have consolidated this. 01:22:38.850 --> 01:22:42.720 I don't need two loops to print the nodes and then free the nodes. 01:22:42.720 --> 01:22:43.948 I could do that all at once. 01:22:43.948 --> 01:22:46.740 But let's assume that there's other stuff of interest in my program 01:22:46.740 --> 01:22:48.615 and I don't want to just immediately free it. 01:22:48.615 --> 01:22:51.420 There's one other bug that I should probably address here. 01:22:51.420 --> 01:22:54.120 There is still a potential memory leak up here. 01:22:54.120 --> 01:22:57.810 And this one is super subtle, though Valgrind would help you find it. 01:22:57.810 --> 01:23:02.730 Notice that in this loop here when I'm calling malloc, this line of code 01:23:02.730 --> 01:23:06.120 is fine if the first line of malloc fails and returns 01:23:06.120 --> 01:23:08.440 null because I immediately return and I'm done. 01:23:08.440 --> 01:23:12.240 But what if the second call-- but not the first, or the third call, but not 01:23:12.240 --> 01:23:13.770 the first or second fail? 01:23:13.770 --> 01:23:16.560 This line of code has me returning immediately. 01:23:16.560 --> 01:23:19.840 You really need to do some garbage collection, so to speak, 01:23:19.840 --> 01:23:23.250 whereby you really need to go in and free any nodes that you 01:23:23.250 --> 01:23:25.110 did allocate successfully earlier. 01:23:25.110 --> 01:23:26.370 Honestly, that's going to be a pain in the neck. 01:23:26.370 --> 01:23:27.450 We won't do that here. 01:23:27.450 --> 01:23:30.840 But probably what I'd want to do is write a function called free list 01:23:30.840 --> 01:23:33.180 or something like that and call that function 01:23:33.180 --> 01:23:35.950 to free any nodes I had previously created. 01:23:35.950 --> 01:23:37.960 So it's not quite at the finish line. 01:23:37.960 --> 01:23:41.160 But the building blocks are indeed here. 01:23:41.160 --> 01:23:44.550 Questions on this code? 01:23:44.550 --> 01:23:48.360 And I think it's safe for me to promise that it won't escalate further 01:23:48.360 --> 01:23:50.880 from that. 01:23:50.880 --> 01:23:54.760 Questions on this? 01:23:54.760 --> 01:23:55.630 No? 01:23:55.630 --> 01:23:58.130 Well, let me show you one alternative that you might prefer. 01:23:58.130 --> 01:24:00.005 And I'm pretty sure this isn't an escalation, 01:24:00.005 --> 01:24:01.600 it's just an alternative formulation. 01:24:01.600 --> 01:24:05.060 Another way you can iterate over nodes in a list could be this. 01:24:05.060 --> 01:24:07.203 Instead of a while loop, for instance, let 01:24:07.203 --> 01:24:09.370 me actually show you one other piece of syntax here. 01:24:09.370 --> 01:24:10.930 You could technically use a for loop. 01:24:10.930 --> 01:24:15.580 You could give yourself a node pointer here that is initialized to the list. 01:24:15.580 --> 01:24:18.530 You can then check in your for loop that it's not equal to null. 01:24:18.530 --> 01:24:21.310 And then, you can do your update as usual like this. 01:24:21.310 --> 01:24:23.360 Either of these are equivalent. 01:24:23.360 --> 01:24:25.480 Even though this one I suspect looks scarier, 01:24:25.480 --> 01:24:28.840 it's doing the exact same thing in one line instead of two. 01:24:28.840 --> 01:24:32.230 But there's no reason we can't use for loops instead of while loops 01:24:32.230 --> 01:24:33.490 to achieve the same idea. 01:24:33.490 --> 01:24:35.320 But I'll leave these two as demonstrations 01:24:35.320 --> 01:24:36.580 of one approach or the other. 01:24:36.580 --> 01:24:39.340 But that's just like in week one, for loops, while loops, 01:24:39.340 --> 01:24:41.860 whatever looks simpler to you. 01:24:41.860 --> 01:24:46.460 Even though admittedly neither of these probably looks super clean. 01:24:46.460 --> 01:24:49.660 So let's take things back to things more conceptual here. 01:24:49.660 --> 01:24:52.900 Up until now, we've been inserting elements into this linked list 01:24:52.900 --> 01:24:54.580 by prepending them. 01:24:54.580 --> 01:24:58.070 Let's consider what the running time then is of these operations. 01:24:58.070 --> 01:25:01.423 So if I've got a linked list of size three or size n more generally, 01:25:01.423 --> 01:25:03.340 time has passed and I've added a lot of things 01:25:03.340 --> 01:25:06.400 to it, what's going to be the running time, for instance, of searching 01:25:06.400 --> 01:25:08.590 a linked list for some value? 01:25:08.590 --> 01:25:10.510 And I'll tell you already, it's not log n. 01:25:10.510 --> 01:25:15.040 Because again, binary search is off the table as per before break. 01:25:15.040 --> 01:25:20.080 So what might the running time be of searching a linked list for some value, 01:25:20.080 --> 01:25:23.350 like two, or three, or one, or 50? 01:25:23.350 --> 01:25:26.545 What might the running time be? 01:25:26.545 --> 01:25:27.420 AUDIENCE: [INAUDIBLE] 01:25:27.420 --> 01:25:28.310 DAVID J. MALAN: O of-- 01:25:28.310 --> 01:25:28.970 I heard it over here. 01:25:28.970 --> 01:25:29.600 O of n. 01:25:29.600 --> 01:25:30.260 And why? 01:25:30.260 --> 01:25:30.862 Who was that? 01:25:30.862 --> 01:25:31.820 Oh, in the middle here? 01:25:31.820 --> 01:25:33.421 Who O of n? 01:25:33.421 --> 01:25:37.260 AUDIENCE: Because you're [INAUDIBLE] item in the list [INAUDIBLE].. 01:25:37.260 --> 01:25:38.260 DAVID J. MALAN: Exactly. 01:25:38.260 --> 01:25:40.030 You're going to have to go through every [? item ?] in the list 01:25:40.030 --> 01:25:42.550 starting from the left from the beginning, which is how we've been 01:25:42.550 --> 01:25:44.175 drawing things and connecting the dots. 01:25:44.175 --> 01:25:47.092 And in the worst case, the element might very well be at the very end. 01:25:47.092 --> 01:25:48.430 So it's going to be big O of n. 01:25:48.430 --> 01:25:49.990 What about insertion? 01:25:49.990 --> 01:25:52.810 How many steps in terms of big O notation 01:25:52.810 --> 01:25:55.780 has it been taking me to insert elements into the linked 01:25:55.780 --> 01:25:59.692 list using this prepend design? 01:25:59.692 --> 01:26:00.540 AUDIENCE: One. 01:26:00.540 --> 01:26:01.415 DAVID J. MALAN: Yeah. 01:26:01.415 --> 01:26:03.900 So it's technically constant time, big O of one. 01:26:03.900 --> 01:26:06.197 And again, one is just representative of any constant. 01:26:06.197 --> 01:26:08.280 It could technically be two steps, or three steps, 01:26:08.280 --> 01:26:10.240 or even 10 steps, or 100 steps. 01:26:10.240 --> 01:26:13.950 But if it's always finite and fixed, then 01:26:13.950 --> 01:26:16.080 indeed you can say it's in big O of one. 01:26:16.080 --> 01:26:17.080 Now, why is that? 01:26:17.080 --> 01:26:19.950 Well, again, no matter how long this list gets, so long 01:26:19.950 --> 01:26:24.240 as there's memory available for me I can just create a little splice 01:26:24.240 --> 01:26:27.570 at the beginning of the list to put in the new node, update the original list, 01:26:27.570 --> 01:26:28.410 and I'm on my way. 01:26:28.410 --> 01:26:31.618 And it keeps getting longer even though it might not be spread out in memory. 01:26:31.618 --> 01:26:34.650 So big O of one is possible with these linked lists 01:26:34.650 --> 01:26:36.690 if I indeed prepend things. 01:26:36.690 --> 01:26:39.300 Of course, if I prepend things, everything's 01:26:39.300 --> 01:26:40.920 going to get out of order potentially. 01:26:40.920 --> 01:26:43.590 And we're going to have maybe the stack property instead of a queue property. 01:26:43.590 --> 01:26:45.715 So we might want to do things slightly differently. 01:26:45.715 --> 01:26:49.620 So instead of doing this, whereby we kept prepending, 01:26:49.620 --> 01:26:53.890 prepending, prepending, suppose we append to the end of the list instead. 01:26:53.890 --> 01:26:56.190 So if we now insert the one, the two, and the three 01:26:56.190 --> 01:26:59.640 as we might want to for a queue, to maintain that fairness property, 01:26:59.640 --> 01:27:01.230 we might start with an empty list. 01:27:01.230 --> 01:27:02.580 We might add the one. 01:27:02.580 --> 01:27:05.820 We might append the two, append the three. 01:27:05.820 --> 01:27:08.263 And so, it just is laid out differently in memory. 01:27:08.263 --> 01:27:10.180 And again, if I can come to you in the middle, 01:27:10.180 --> 01:27:12.180 what's the running time of search, again, 01:27:12.180 --> 01:27:15.592 when the linked list uses this append implementation? 01:27:15.592 --> 01:27:16.830 AUDIENCE: Still big O of n. 01:27:16.830 --> 01:27:17.190 DAVID J. MALAN: Yeah. 01:27:17.190 --> 01:27:17.898 Still big O of n. 01:27:17.898 --> 01:27:21.065 Because in the worst case, you're going to have to go through the whole list 01:27:21.065 --> 01:27:21.960 just to find it. 01:27:21.960 --> 01:27:25.380 And notice, it doesn't matter if you have an intuition now that the bigger 01:27:25.380 --> 01:27:27.420 numbers might very well be at the end. 01:27:27.420 --> 01:27:29.520 You have no way to jump to the end. 01:27:29.520 --> 01:27:33.120 You have no way to jump to the middle or do anything resembling binary search. 01:27:33.120 --> 01:27:38.025 Every search has to start from the left and follow the arrows again and again. 01:27:38.025 --> 01:27:39.900 So I don't think we've done any better there. 01:27:39.900 --> 01:27:43.500 And in fact, what is insertions running time now in big O 01:27:43.500 --> 01:27:45.690 when we're appending to the list in this way, 01:27:45.690 --> 01:27:49.230 as we might to implement a queue instead of a stack? 01:27:49.230 --> 01:27:53.670 What's the running time of inserting a new value? 01:27:53.670 --> 01:27:55.397 Big O of? 01:27:55.397 --> 01:27:55.980 AUDIENCE: One. 01:27:55.980 --> 01:28:00.030 DAVID J. MALAN: So not big O of one in this case, but big O of n. 01:28:00.030 --> 01:28:02.610 Because if I'm appending, by definition I 01:28:02.610 --> 01:28:06.130 have to start here and traverse the whole thing looking for the end. 01:28:06.130 --> 01:28:08.010 Now, this is a bit of an overstatement. 01:28:08.010 --> 01:28:09.900 You could obviously optimize this slightly 01:28:09.900 --> 01:28:14.220 by maybe adding another variable that always points to the last element, sort 01:28:14.220 --> 01:28:17.460 of a cheat sheet or a shortcut that gets you all the way to the end. 01:28:17.460 --> 01:28:18.293 That's totally fine. 01:28:18.293 --> 01:28:20.752 It's not-- it doesn't really fit the traditional definition 01:28:20.752 --> 01:28:21.810 of a singly linked list. 01:28:21.810 --> 01:28:25.330 But there's absolutely smart engineering solutions to these kinds of problems. 01:28:25.330 --> 01:28:27.570 But as designed, it would indeed be big O event 01:28:27.570 --> 01:28:30.480 to insert to if you've got to go all the way to the end 01:28:30.480 --> 01:28:34.260 and you're not using a little extra memory to get yourself there quickly. 01:28:34.260 --> 01:28:38.160 Well, what if we want to take things one last step and not just append 01:28:38.160 --> 01:28:41.160 blindly, because even though I inserted one, two, three, 01:28:41.160 --> 01:28:44.910 if I inserted them in random order, they would end up in random order. 01:28:44.910 --> 01:28:48.660 What if you want to maintain a sorted list from smallest to largest? 01:28:48.660 --> 01:28:51.090 Well, then you might want to insert numbers like this. 01:28:51.090 --> 01:28:54.420 Starting from an empty list, we might have a two, 01:28:54.420 --> 01:28:56.640 then we might try inserting a one. 01:28:56.640 --> 01:28:57.930 But we want to keep it sorted. 01:28:57.930 --> 01:29:00.450 So now we're going to prepend in our code. 01:29:00.450 --> 01:29:02.820 But then you might want to insert a four. 01:29:02.820 --> 01:29:05.400 So you would append the four because you're probably 01:29:05.400 --> 01:29:07.440 going to look for the right spot to insert it. 01:29:07.440 --> 01:29:09.110 Then we're going to insert a three. 01:29:09.110 --> 01:29:10.860 And this one is getting a little annoying. 01:29:10.860 --> 01:29:14.910 Because now you have to iterate over the list, look for the right spot, 01:29:14.910 --> 01:29:17.760 and then do a little smarter of a splice. 01:29:17.760 --> 01:29:18.773 But it's possible. 01:29:18.773 --> 01:29:20.940 But you don't want to orphan the four, for instance. 01:29:20.940 --> 01:29:24.060 And then, ultimately, we get back to this question. 01:29:24.060 --> 01:29:26.640 What would the performance be of your linked list 01:29:26.640 --> 01:29:30.090 if you're trying to maintain sorted order? 01:29:30.090 --> 01:29:34.410 Well, search I think is going to be big O of n for the same reasons as before. 01:29:34.410 --> 01:29:37.170 What about insertion? 01:29:37.170 --> 01:29:44.120 Big O of what for inserting into a sorted linked list? 01:29:44.120 --> 01:29:45.590 Yeah, in the worst case? 01:29:45.590 --> 01:29:46.460 AUDIENCE: Big O of n 01:29:46.460 --> 01:29:46.850 DAVID J. MALAN: Yeah. 01:29:46.850 --> 01:29:48.050 It's still a big O of n. 01:29:48.050 --> 01:29:51.410 So it's no worse than, but it's not really any better than appending. 01:29:51.410 --> 01:29:55.040 But we gain the additional property of maintaining a sorted list, which 01:29:55.040 --> 01:29:58.070 might very well be useful if you're sorting your contacts in your phone 01:29:58.070 --> 01:30:01.850 or something like that where it just makes sense to maintain sorted order. 01:30:01.850 --> 01:30:04.280 Now, in the code for online today, if you take a look 01:30:04.280 --> 01:30:09.203 at some of the final versions of code, like list 6.c and list 5.c 01:30:09.203 --> 01:30:11.120 as we'll post on the website, you can actually 01:30:11.120 --> 01:30:14.660 see code that will solve all three of these problems, the prepend version 01:30:14.660 --> 01:30:17.360 that we wrote live, the append version which we talked through, 01:30:17.360 --> 01:30:19.220 as well as this sorted order one. 01:30:19.220 --> 01:30:22.197 But I think I'll avoid showing it live just because I do 01:30:22.197 --> 01:30:23.780 think that starts to escalate quickly. 01:30:23.780 --> 01:30:25.697 But I think we have enough of a building block 01:30:25.697 --> 01:30:29.660 if we're comfortable with prepending to at least solve some real-world problems 01:30:29.660 --> 01:30:31.970 with these linked lists. 01:30:31.970 --> 01:30:37.520 Questions then on linked lists which we'll now leave behind on their own 01:30:37.520 --> 01:30:44.630 but now use this technique to solve fancier problems but much less code. 01:30:44.630 --> 01:30:47.630 Questions on linked lists? 01:30:47.630 --> 01:30:52.160 So to recap, we've taken a sidestep with linked list. 01:30:52.160 --> 01:30:56.180 We have this dynamism now where we can grow and shrink our chunks of memory 01:30:56.180 --> 01:30:59.150 without over-allocating or accidentally under-allocating 01:30:59.150 --> 01:31:00.620 as in the world of an array. 01:31:00.620 --> 01:31:03.580 We don't have to worry about copying values endlessly 01:31:03.580 --> 01:31:06.830 because once you allocate the node, it can just stay wherever it is in memory. 01:31:06.830 --> 01:31:08.390 And you can just maintain-- 01:31:08.390 --> 01:31:10.190 you can just stitch it together somehow. 01:31:10.190 --> 01:31:13.220 But unfortunately, we've sacrificed what we started the class 01:31:13.220 --> 01:31:16.550 with in week zero, which was binary search, divide and conquer, which 01:31:16.550 --> 01:31:19.550 was like gave us that log and running time, which 01:31:19.550 --> 01:31:23.870 was really compelling if you think back to the demonstrations and the visuals. 01:31:23.870 --> 01:31:26.060 Can we get the best of both worlds? 01:31:26.060 --> 01:31:30.920 Can we get the speed of binary search, something logarithmic, 01:31:30.920 --> 01:31:33.660 but the dynamism of something like a linked list? 01:31:33.660 --> 01:31:35.600 Well, we can, actually, I think, if we start 01:31:35.600 --> 01:31:39.690 to think not in a single dimension, just the x-axis, if you will, 01:31:39.690 --> 01:31:42.410 but two dimensions, such that our data structures 01:31:42.410 --> 01:31:45.120 can maybe now have width and height, if you will. 01:31:45.120 --> 01:31:47.600 And so, a tree is perhaps the right term here. 01:31:47.600 --> 01:31:51.290 Much like a family tree, if you have your elders up here in the tree 01:31:51.290 --> 01:31:54.260 and then the branches below them for their children and grandchildren 01:31:54.260 --> 01:31:54.890 and the like. 01:31:54.890 --> 01:31:56.598 That's actually what a computer scientist 01:31:56.598 --> 01:31:59.840 means when they talk about trees, not a tree that grows up like this, 01:31:59.840 --> 01:32:02.510 but really one that typically is depicted growing down. 01:32:02.510 --> 01:32:05.520 Although this is just an artist's depiction no matter what. 01:32:05.520 --> 01:32:09.080 But there are certain types of trees in the world called binary search 01:32:09.080 --> 01:32:14.090 trees that are structured on paper and in visually like a family tree. 01:32:14.090 --> 01:32:16.850 But they have a special property that lends themselves 01:32:16.850 --> 01:32:19.470 to exactly that feature binary search. 01:32:19.470 --> 01:32:22.580 So for instance, here is an array back from week two. 01:32:22.580 --> 01:32:26.450 And I've sorted a whole bunch of numbers herein from one to seven. 01:32:26.450 --> 01:32:29.540 We know we can do binary search on this structure 01:32:29.540 --> 01:32:31.640 if it's implemented as an array. 01:32:31.640 --> 01:32:37.490 But what feature do arrays, to be clear, not have that linked lists do? 01:32:37.490 --> 01:32:38.720 Today's kind of a seesaw. 01:32:38.720 --> 01:32:45.486 What did we just gain by adding linked lists that arrays do not allow Yeah. 01:32:45.486 --> 01:32:47.755 AUDIENCE: [INAUDIBLE] 01:32:47.755 --> 01:32:48.630 DAVID J. MALAN: Yeah. 01:32:48.630 --> 01:32:51.330 You can insert more elements without having to copy 01:32:51.330 --> 01:32:52.890 or moving everything else around. 01:32:52.890 --> 01:32:57.330 Right now, in this single dimension, if these values to the left and/or right 01:32:57.330 --> 01:32:59.810 are already used, then you have to move everything. 01:32:59.810 --> 01:33:01.560 And that's where we started today's story. 01:33:01.560 --> 01:33:03.450 So arrays kind of paint you into a corner. 01:33:03.450 --> 01:33:06.990 Because you have to, by definition, decide in advance how big they are. 01:33:06.990 --> 01:33:09.390 Well, couldn't we have some kind of array 01:33:09.390 --> 01:33:12.780 that can still grow but still is contiguous so we 01:33:12.780 --> 01:33:15.190 can do binary search in some way? 01:33:15.190 --> 01:33:19.440 Well, yes, if we rethink how we implement binary search. 01:33:19.440 --> 01:33:24.660 Let me propose that this, I've chosen these seven elements in the array 01:33:24.660 --> 01:33:29.910 much like the lockers from week two to be ordered from smallest to largest. 01:33:29.910 --> 01:33:33.040 I've highlighted now in yellow the middle elements here. 01:33:33.040 --> 01:33:36.210 And if we were telling the story of week two going left or going right, 01:33:36.210 --> 01:33:39.090 let me highlight in red the middle elements 01:33:39.090 --> 01:33:40.890 of the left half and the right half. 01:33:40.890 --> 01:33:42.960 And then, let me further highlight in green 01:33:42.960 --> 01:33:45.840 the other elements in between those. 01:33:45.840 --> 01:33:48.778 And there's actually a pattern here, as you might notice. 01:33:48.778 --> 01:33:50.820 Whereby there's one yellow in the middle and then 01:33:50.820 --> 01:33:52.830 there's the two red and the four green. 01:33:52.830 --> 01:33:55.380 There's an implicit structure there, if you will. 01:33:55.380 --> 01:33:57.840 And what if I do start to think in two dimensions, 01:33:57.840 --> 01:34:03.210 and instead of laying out an array of lockers like this on the x-axis only, 01:34:03.210 --> 01:34:09.450 what if I slide the four up, and pull the one, the three, the five down 01:34:09.450 --> 01:34:13.210 and draw this in two dimensions instead? 01:34:13.210 --> 01:34:17.490 Well, let me do that, as by separating these things like this. 01:34:17.490 --> 01:34:20.610 Such that, now, let me propose that each of these squares 01:34:20.610 --> 01:34:22.770 maybe it doesn't have to be contiguous. 01:34:22.770 --> 01:34:24.990 It can be anywhere in the computer's memory. 01:34:24.990 --> 01:34:27.810 But I can't have these crazy gaps among them. 01:34:27.810 --> 01:34:32.100 How could I perhaps keep these things connected conceptually? 01:34:32.100 --> 01:34:34.050 What should I add to the picture, if you will? 01:34:36.615 --> 01:34:37.115 Yeah. 01:34:37.115 --> 01:34:37.992 AUDIENCE: Branches? 01:34:37.992 --> 01:34:39.075 DAVID J. MALAN: Say again? 01:34:39.075 --> 01:34:39.410 AUDIENCE: Branches. 01:34:39.410 --> 01:34:41.500 DAVID J. MALAN: So branches metaphorically here. 01:34:41.500 --> 01:34:44.410 And more technically in the language of C, maybe just 01:34:44.410 --> 01:34:46.420 some arrows, some pointers. 01:34:46.420 --> 01:34:49.090 So I won't bother drawing things as rectangles constantly. 01:34:49.090 --> 01:34:51.940 Let me propose that we're now just abstracting away what a node is. 01:34:51.940 --> 01:34:54.550 But let me claim that each of these squares now is a node. 01:34:54.550 --> 01:34:55.900 And a node might have a number. 01:34:55.900 --> 01:34:57.400 But it might also have a pointer. 01:34:57.400 --> 01:34:59.950 Heck, maybe even two or more pointers. 01:34:59.950 --> 01:35:01.240 And let me draw those now. 01:35:01.240 --> 01:35:04.970 I don't care about addresses like 0x123, 456, 789 anymore. 01:35:04.970 --> 01:35:07.210 Let's just draw our pointers with arrows. 01:35:07.210 --> 01:35:10.570 But now, let me propose that we could very well think 01:35:10.570 --> 01:35:17.860 about this as a tree storing what was previously array data. 01:35:17.860 --> 01:35:21.670 But now, each of these nodes can be anywhere in memory. 01:35:21.670 --> 01:35:24.040 And moreover, even though I've painted myself 01:35:24.040 --> 01:35:26.290 into a corner visually on the screen, so long 01:35:26.290 --> 01:35:30.620 as there's more memory in the computer, I could put the number zero over here. 01:35:30.620 --> 01:35:32.720 I could put the number eight over here. 01:35:32.720 --> 01:35:35.290 And if I'm smart, I could probably, if I want 01:35:35.290 --> 01:35:39.310 to insert other numbers, like 2.5 or 1.54 values in between, 01:35:39.310 --> 01:35:43.660 I bet we could make room by swiveling things around and just 01:35:43.660 --> 01:35:47.330 hanging things off of these branches slightly differently. 01:35:47.330 --> 01:35:49.060 And so, what does this gain me? 01:35:49.060 --> 01:35:52.690 Well, if I instead start to model my data not single dimensionally 01:35:52.690 --> 01:35:56.170 but in two dimensions, and I connect those nodes 01:35:56.170 --> 01:35:59.050 with these pointers, what can I now do? 01:35:59.050 --> 01:36:01.280 I think I just gave myself back binary search. 01:36:01.280 --> 01:36:01.780 Why? 01:36:01.780 --> 01:36:03.530 Suppose I'm searching for the number five. 01:36:03.530 --> 01:36:04.505 How do I find it? 01:36:04.505 --> 01:36:06.880 Well just, like in a family tree where you might visually 01:36:06.880 --> 01:36:09.100 start reading from top to bottom, I'm always 01:36:09.100 --> 01:36:11.990 going to start from the so-called root of a binary search tree. 01:36:11.990 --> 01:36:16.000 This is just like the list pointer that kicks off the whole linked list 01:36:16.000 --> 01:36:16.660 process. 01:36:16.660 --> 01:36:18.610 This is the so-called root. 01:36:18.610 --> 01:36:19.930 Here I am at the number four. 01:36:19.930 --> 01:36:21.760 I want to find the number five. 01:36:21.760 --> 01:36:25.120 What decision can I make when I see that I'm currently at the number four, 01:36:25.120 --> 01:36:27.550 just like the phone book from week zero? 01:36:27.550 --> 01:36:30.790 Where is five not? 01:36:30.790 --> 01:36:32.480 It's not to the left. 01:36:32.480 --> 01:36:35.140 And if I had built a little mobile here or something 01:36:35.140 --> 01:36:38.890 we could very dramatically snip off this branch, this [LAUGHS] 01:36:38.890 --> 01:36:40.640 is very low-budget animation. 01:36:40.640 --> 01:36:43.130 These nodes could fall to the ground. 01:36:43.130 --> 01:36:45.850 And we're left with half of essentially, a tree. 01:36:45.850 --> 01:36:47.050 But what do I now know? 01:36:47.050 --> 01:36:48.667 It's obviously the five to the right. 01:36:48.667 --> 01:36:49.750 So let me go to the right. 01:36:49.750 --> 01:36:52.070 Six is obviously not the one I'm looking for. 01:36:52.070 --> 01:36:53.890 But what do I now know about the five? 01:36:53.890 --> 01:36:56.140 Well, five is less than the six. 01:36:56.140 --> 01:37:00.850 So I can snip this off here because I know it's not going to be down there. 01:37:00.850 --> 01:37:02.830 And I can follow the remaining arrow here. 01:37:02.830 --> 01:37:04.533 And voila, I just found it. 01:37:04.533 --> 01:37:06.700 And now, without getting into the weeds of the math, 01:37:06.700 --> 01:37:08.770 I've got here what, seven elements. 01:37:08.770 --> 01:37:10.610 That's roughly eight if I round up. 01:37:10.610 --> 01:37:13.480 And if I do some log base two, I actually-- one, two, three 01:37:13.480 --> 01:37:14.890 is the key detail here. 01:37:14.890 --> 01:37:17.770 The height of this tree is three. 01:37:17.770 --> 01:37:20.290 Because I took a list of size seven, and I halved it, 01:37:20.290 --> 01:37:24.040 and I halved it in order to let it dangle in these two dimensions, 01:37:24.040 --> 01:37:25.930 plus or minus one for rounding sake. 01:37:25.930 --> 01:37:27.280 So what do I get back? 01:37:27.280 --> 01:37:29.770 I now have binary search. 01:37:29.770 --> 01:37:32.710 But it's not like the middle of the middle of the middle. 01:37:32.710 --> 01:37:35.720 I now follow these arrows in one of two directions. 01:37:35.720 --> 01:37:37.960 So each of these nodes now has an int and maybe 01:37:37.960 --> 01:37:40.023 a left pointer and a right pointer. 01:37:40.023 --> 01:37:41.690 But you can call them anything you want. 01:37:41.690 --> 01:37:45.220 And so, I've gotten back binary search and dynamism. 01:37:45.220 --> 01:37:48.580 Because if you want to add zero, or eight, or nine, or 10, 01:37:48.580 --> 01:37:52.400 we can just dangle them at the bottom of the binary search tree. 01:37:52.400 --> 01:37:54.070 So what would this look like in code? 01:37:54.070 --> 01:37:56.380 But we won't actually implement it line by line. 01:37:56.380 --> 01:38:01.330 Well, here was previously our definition of a node for a linked list, which 01:38:01.330 --> 01:38:02.800 was one-dimensional, if you will. 01:38:02.800 --> 01:38:05.050 Even though it might bounce up and down on the screen. 01:38:05.050 --> 01:38:07.550 It was still just a line, if you will. 01:38:07.550 --> 01:38:10.600 Well, let me get rid of the single pointer in the linked list. 01:38:10.600 --> 01:38:13.060 Let me make a little bit of room here in this typedef. 01:38:13.060 --> 01:38:16.570 And let me propose that we just add two pointers, each of which 01:38:16.570 --> 01:38:18.130 is a struct node star. 01:38:18.130 --> 01:38:20.230 One will be called left by convention. 01:38:20.230 --> 01:38:22.870 One will be called right by convention. 01:38:22.870 --> 01:38:26.200 And so, long as someone, not me, not today, not in class, 01:38:26.200 --> 01:38:29.920 writes the code that stitches together this data structure too, 01:38:29.920 --> 01:38:32.710 handling both the left child and the right child so to speak, 01:38:32.710 --> 01:38:37.040 I think we can indeed stitch together that two-dimensional structure. 01:38:37.040 --> 01:38:39.970 And moreover, once you have this in memory, 01:38:39.970 --> 01:38:44.170 you can translate pretty elegantly to code binary search itself 01:38:44.170 --> 01:38:47.530 using a principle we talked about recently too. 01:38:47.530 --> 01:38:50.530 Here is, for instance, a function that I'll 01:38:50.530 --> 01:38:54.820 write by just clicking through steps called search whose purpose in life 01:38:54.820 --> 01:38:55.840 is to return a Boolean. 01:38:55.840 --> 01:38:59.620 True or false, the number I'm looking for is in the tree? 01:38:59.620 --> 01:39:01.930 This search function therefore takes two arguments, 01:39:01.930 --> 01:39:04.780 the number I'm looking for called number, and then 01:39:04.780 --> 01:39:07.970 a pointer to the tree, the so-called root of the tree. 01:39:07.970 --> 01:39:10.690 Now, how can I implement binary search in code? 01:39:10.690 --> 01:39:13.520 Well, recall our brief discussion of recursion. 01:39:13.520 --> 01:39:16.210 It turns out recursion is a beautiful technique. 01:39:16.210 --> 01:39:18.130 And honestly, more obvious technique when 01:39:18.130 --> 01:39:22.060 you have two-dimensional structures, which finally after five-plus weeks 01:39:22.060 --> 01:39:22.990 we now do. 01:39:22.990 --> 01:39:25.390 Here is maybe my first line of code here. 01:39:25.390 --> 01:39:27.848 If the tree is null, then obviously return false. 01:39:27.848 --> 01:39:29.140 You've handed me an empty tree. 01:39:29.140 --> 01:39:30.160 There's nothing going on. 01:39:30.160 --> 01:39:32.868 Obviously, the number you're looking for is not going to be here. 01:39:32.868 --> 01:39:35.170 So that's my safe base case to make sure I don't 01:39:35.170 --> 01:39:37.420 screw up and recursive infinitely. 01:39:37.420 --> 01:39:39.050 Well, what else might be the case? 01:39:39.050 --> 01:39:44.680 Well, if the number I'm looking for is less than the tree's own number. 01:39:44.680 --> 01:39:46.710 And now, recall that tree is a node star. 01:39:46.710 --> 01:39:48.460 So even though I'm calling it a tree, it's 01:39:48.460 --> 01:39:51.350 really the current node that's been passed in. 01:39:51.350 --> 01:39:55.990 So if the number I'm looking for is less than the current node's number, 01:39:55.990 --> 01:39:59.650 then I must know that the number I'm looking for is to the left, 01:39:59.650 --> 01:40:00.400 so to speak. 01:40:00.400 --> 01:40:01.700 So how can I solve that? 01:40:01.700 --> 01:40:03.610 Well, this is where the magic of recursion. 01:40:03.610 --> 01:40:09.670 Just return whatever the answer is to calling search again but on a subtree, 01:40:09.670 --> 01:40:10.300 if you will. 01:40:10.300 --> 01:40:13.060 This is the equivalent of snipping off half of the tree. 01:40:13.060 --> 01:40:17.470 Pass in the left subtree, if you will, with the same number. 01:40:17.470 --> 01:40:19.600 Else, if the number you're looking for isn't 01:40:19.600 --> 01:40:22.360 less than the current node's number but greater than, 01:40:22.360 --> 01:40:24.890 snip off the other subtree instead. 01:40:24.890 --> 01:40:29.710 And just return whatever search says it finds in the right subtree here. 01:40:29.710 --> 01:40:31.520 And then, there's a fourth and final case. 01:40:31.520 --> 01:40:34.264 What else could be true logically? 01:40:34.264 --> 01:40:35.220 yeah. 01:40:35.220 --> 01:40:37.710 AUDIENCE: Number equals [INAUDIBLE] number? 01:40:37.710 --> 01:40:38.710 DAVID J. MALAN: Perfect. 01:40:38.710 --> 01:40:43.240 If the number you're looking for equals equals the number in this node, 01:40:43.240 --> 01:40:45.310 then I'm just going to return true. 01:40:45.310 --> 01:40:49.600 And you might recall from our recurring discussions of design, 01:40:49.600 --> 01:40:51.820 I don't strictly need to ask that explicitly. 01:40:51.820 --> 01:40:54.580 Either there's no node, it's to the left, it's to the right, 01:40:54.580 --> 01:40:55.580 or you found it. 01:40:55.580 --> 01:40:59.170 So I can just whittle that down as usual to an else. 01:40:59.170 --> 01:41:01.130 And this now returns my true. 01:41:01.130 --> 01:41:04.180 So here too, this is where recursion, once you get comfy with it, 01:41:04.180 --> 01:41:06.970 start it gets pretty elegant and cool in the sense 01:41:06.970 --> 01:41:09.340 that, wow, even though there's a lot of lines here, 01:41:09.340 --> 01:41:11.620 I mean, there's only a few interesting lines. 01:41:11.620 --> 01:41:13.448 A lot of it's curly braces at that, which 01:41:13.448 --> 01:41:14.990 strictly speaking I could get rid of. 01:41:14.990 --> 01:41:17.470 And so, recursion lends itself to elegance 01:41:17.470 --> 01:41:20.980 when it comes to traversing these two-dimensional data 01:41:20.980 --> 01:41:22.280 structures as well. 01:41:22.280 --> 01:41:27.940 So that is in code how you might implement something like search. 01:41:27.940 --> 01:41:32.290 Questions then on these trees? 01:41:32.290 --> 01:41:33.220 We have dynamism. 01:41:33.220 --> 01:41:34.900 We can insert more nodes to them. 01:41:34.900 --> 01:41:37.150 They're faster because we get better your search back. 01:41:37.150 --> 01:41:39.250 But, but, but, there's got to be a price paid. 01:41:39.250 --> 01:41:41.488 Any downsides, or question, or downside? 01:41:41.488 --> 01:41:42.280 AUDIENCE: Question. 01:41:42.280 --> 01:41:44.863 DAVID J. MALAN: OK, let me come back to that and just one sec. 01:41:44.863 --> 01:41:47.290 Downside though, what price have we paid for this dynamism 01:41:47.290 --> 01:41:50.530 and for this binary searchability? 01:41:50.530 --> 01:41:53.867 Even though I've abstracted it away in the picture? 01:41:53.867 --> 01:41:54.907 AUDIENCE: [INAUDIBLE] 01:41:54.907 --> 01:41:55.990 DAVID J. MALAN: Say again? 01:41:55.990 --> 01:41:56.990 AUDIENCE: [INAUDIBLE] 01:41:56.990 --> 01:41:58.880 DAVID J. MALAN: We're using a lot of memory. 01:41:58.880 --> 01:42:01.790 I'm kind of misleading you now because I'm just drawing these little squares 01:42:01.790 --> 01:42:02.690 with the simple numbers. 01:42:02.690 --> 01:42:04.490 But there's actually three things in there. 01:42:04.490 --> 01:42:09.020 A four-byte integer, an eight-byte left pointer, an eight-byte right pointer. 01:42:09.020 --> 01:42:12.740 So we're already up to 16, 20 bytes now to store individual INTs. 01:42:12.740 --> 01:42:15.320 That's probably OK though if memory is relatively 01:42:15.320 --> 01:42:18.000 cheap and voluminous as it nowadays is. 01:42:18.000 --> 01:42:20.270 But these are the kinds of trade-offs. 01:42:20.270 --> 01:42:24.050 And here, too, you see a hint of why some people still do like, 01:42:24.050 --> 01:42:26.720 and you see-- and in fact, it's so omnipresent. 01:42:26.720 --> 01:42:29.125 Because when you have C you can really fine-tune 01:42:29.125 --> 01:42:32.000 how much memory is being used for better or for worse under the hood. 01:42:32.000 --> 01:42:35.630 As we transition soon to Python, these decisions get made for you. 01:42:35.630 --> 01:42:39.140 And you have much, much less control about how much memory 01:42:39.140 --> 01:42:42.500 is being used by your program because someone else made the design decisions 01:42:42.500 --> 01:42:43.380 for you. 01:42:43.380 --> 01:42:44.030 Question. 01:42:44.030 --> 01:42:46.890 AUDIENCE: [INAUDIBLE] 01:42:46.890 --> 01:42:49.740 DAVID J. MALAN: Is it bad if we don't the parent node? 01:42:49.740 --> 01:42:50.670 Not necessarily. 01:42:50.670 --> 01:42:54.820 There's no reason why you need to have pointers in both directions. 01:42:54.820 --> 01:42:57.210 However, that can lend itself to efficiency. 01:42:57.210 --> 01:42:59.970 By spending more space and having arrows go up to 01:42:59.970 --> 01:43:03.360 you can actually save more time when searching the tree and other contexts. 01:43:03.360 --> 01:43:06.810 This though would be the canonical way, the typical way to implement it. 01:43:06.810 --> 01:43:09.150 But absolutely, just like a doubly linked list, 01:43:09.150 --> 01:43:12.630 that could help you solve other problems too. 01:43:12.630 --> 01:43:16.740 So turns out I'm overselling binary search trees. 01:43:16.740 --> 01:43:21.000 There are perversions of them, so to speak, whereby they won't actually 01:43:21.000 --> 01:43:22.410 behave as advertised. 01:43:22.410 --> 01:43:24.700 For instance, here's a good situation. 01:43:24.700 --> 01:43:27.090 Suppose you've got an empty tree initially. 01:43:27.090 --> 01:43:28.498 And you insert the number two. 01:43:28.498 --> 01:43:29.790 Well, it's got to go somewhere. 01:43:29.790 --> 01:43:32.490 So it might as well become the root of this binary search tree. 01:43:32.490 --> 01:43:34.823 And let's assume that someone wrote the code to do this. 01:43:34.823 --> 01:43:36.990 Now you want to insert the number one and you 01:43:36.990 --> 01:43:39.520 want to maintain the searchability of this tree. 01:43:39.520 --> 01:43:44.400 Well, it's important to note that binary search tree is different from tree. 01:43:44.400 --> 01:43:46.320 If you've just got a tree in memory, there 01:43:46.320 --> 01:43:49.500 is no social contract with where the numbers need to go. 01:43:49.500 --> 01:43:51.810 They can be completely random all over the place. 01:43:51.810 --> 01:43:55.320 Binary search tree means that you can do binary search, 01:43:55.320 --> 01:43:59.460 means that any node here is going to be greater than every node here 01:43:59.460 --> 01:44:01.770 and less than every node here. 01:44:01.770 --> 01:44:03.308 And that's a definition. 01:44:03.308 --> 01:44:05.100 It's a recursive structural definition that 01:44:05.100 --> 01:44:08.850 must be true to be a binary search tree or BST. 01:44:08.850 --> 01:44:12.060 So if we maintain that property ourselves, let me insert two, 01:44:12.060 --> 01:44:13.140 let me insert one. 01:44:13.140 --> 01:44:15.060 One belongs there by that definition. 01:44:15.060 --> 01:44:16.590 Let me insert three. 01:44:16.590 --> 01:44:18.540 Three belongs there by that definition. 01:44:18.540 --> 01:44:21.390 But I kind of got lucky in that I in the story 01:44:21.390 --> 01:44:25.020 inserted two and then one and then three. 01:44:25.020 --> 01:44:28.230 Let me propose a sort of perversion of the algorithm whereby we just 01:44:28.230 --> 01:44:33.360 get unlucky let me propose that we insert one first. 01:44:33.360 --> 01:44:34.830 And then, we insert two. 01:44:34.830 --> 01:44:35.850 Well, where does two go? 01:44:35.850 --> 01:44:38.700 Well, logically, it goes to the right because it's larger. 01:44:38.700 --> 01:44:40.620 Now, the user inserts three. 01:44:40.620 --> 01:44:42.760 Where does it go? 01:44:42.760 --> 01:44:44.310 It goes there, logically. 01:44:44.310 --> 01:44:46.110 And how does this story unfold? 01:44:46.110 --> 01:44:48.270 The user inserts four, five, six. 01:44:48.270 --> 01:44:51.180 It's wonderfully sorted in advance by luck. 01:44:51.180 --> 01:44:56.040 But this is a perversion of the structure in what sense? 01:44:56.040 --> 01:44:59.080 It's still technically a binary search tree. 01:44:59.080 --> 01:45:02.970 But what is it look more like? 01:45:02.970 --> 01:45:06.430 It really is devolving, if you will, into a linked list. 01:45:06.430 --> 01:45:09.720 And so, if you, the programmer, don't implement a binary search 01:45:09.720 --> 01:45:13.980 tree with some kind of repairs going on, such that as soon as something 01:45:13.980 --> 01:45:17.440 gets, whoa, a little too long and stringy, I think I can fix this. 01:45:17.440 --> 01:45:20.523 It's going to be an annoying line-- a number of lines of code, which we're 01:45:20.523 --> 01:45:22.110 not going to write here or in a P-set. 01:45:22.110 --> 01:45:24.630 But we could pivot this thing. 01:45:24.630 --> 01:45:28.590 And we could just rejigger things so that the two becomes the new root. 01:45:28.590 --> 01:45:30.053 The one becomes the left child. 01:45:30.053 --> 01:45:31.470 The three becomes the right child. 01:45:31.470 --> 01:45:33.970 But that's what, two, three-plus lines of code. 01:45:33.970 --> 01:45:34.560 It's possible. 01:45:34.560 --> 01:45:35.280 It's doable. 01:45:35.280 --> 01:45:36.660 But it's extra work. 01:45:36.660 --> 01:45:37.620 It's extra code. 01:45:37.620 --> 01:45:42.570 So unless you write that code though and maintain balance of these trees, 01:45:42.570 --> 01:45:46.110 just because it's a binary search tree does not mean its height is 01:45:46.110 --> 01:45:47.520 going to be log base 2 of n. 01:45:47.520 --> 01:45:52.450 The height could be n, in which case you don't get those properties. 01:45:52.450 --> 01:45:56.310 So when it comes to looking up in a balanced binary search tree, yes, 01:45:56.310 --> 01:45:57.180 it's log n. 01:45:57.180 --> 01:46:00.990 But if it's unbalanced, if you don't add that additional logic in those repairs 01:46:00.990 --> 01:46:04.320 so to speak, it could devolve into big O of n. 01:46:04.320 --> 01:46:07.343 And this is a whole category of algorithms and fanciness 01:46:07.343 --> 01:46:10.260 that you would explore in a higher level course on algorithms and data 01:46:10.260 --> 01:46:10.760 structures. 01:46:10.760 --> 01:46:13.410 There's lots of way to do that sort of fixing that I'm alluding 01:46:13.410 --> 01:46:17.800 to in the picture there on the screen. 01:46:17.800 --> 01:46:20.080 A few other data structures, if you will, 01:46:20.080 --> 01:46:23.110 toward an end of a sort of computer science Holy Grail. 01:46:23.110 --> 01:46:27.970 So log n is repeatedly a really good place to end up. 01:46:27.970 --> 01:46:30.190 We started in week zero and we got log n. 01:46:30.190 --> 01:46:33.280 We lost it earlier today by introducing linked list, 01:46:33.280 --> 01:46:37.190 but we just got it back, albeit at the price of spending more space. 01:46:37.190 --> 01:46:39.850 But the Holy Grail, so to speak, when it comes to algorithms 01:46:39.850 --> 01:46:41.710 would not be big O of n, certainly. 01:46:41.710 --> 01:46:45.220 Definitely not n squared like our bubble sorts and selection sorts. 01:46:45.220 --> 01:46:47.110 And not even big O of log n. 01:46:47.110 --> 01:46:50.870 What's better than all of those? 01:46:50.870 --> 01:46:53.223 Just big O of one, constant time. 01:46:53.223 --> 01:46:54.140 That's the Holy Grail. 01:46:54.140 --> 01:46:56.270 Because if we could store huge amounts of data 01:46:56.270 --> 01:46:59.330 but find it instantly in one step, or two steps, 01:46:59.330 --> 01:47:01.070 or heck, even 10 or 20 steps. 01:47:01.070 --> 01:47:05.063 But independent of the size of the data structure that's pretty powerful. 01:47:05.063 --> 01:47:07.730 I mean, that's the secret sauce of the Google's and the Twitters 01:47:07.730 --> 01:47:11.150 of the world trying to get back results really, really fast. 01:47:11.150 --> 01:47:15.165 Well, it turns out another abstract data type or abstract data structure 01:47:15.165 --> 01:47:16.790 might be something called a dictionary. 01:47:16.790 --> 01:47:19.010 Just like the Merriam-Webster Oxford English 01:47:19.010 --> 01:47:23.387 dictionaries that you might know which associate, say, words with definitions. 01:47:23.387 --> 01:47:25.970 Well, you can think of a dictionary really abstractly as this. 01:47:25.970 --> 01:47:28.505 Two columns maybe on a spreadsheet of sorts 01:47:28.505 --> 01:47:31.130 where the left column represents something and the right column 01:47:31.130 --> 01:47:32.240 represents something else. 01:47:32.240 --> 01:47:34.820 The word is on the left and its definition is on the right. 01:47:34.820 --> 01:47:37.470 And that's almost literally what a dictionary is on paper. 01:47:37.470 --> 01:47:40.400 You've got all the words and all the definitions right next to it. 01:47:40.400 --> 01:47:44.120 But more generally, in computing, a dictionary 01:47:44.120 --> 01:47:46.100 really just has not words and definitions 01:47:46.100 --> 01:47:48.363 per se, but key value pairs. 01:47:48.363 --> 01:47:49.280 This is a term of art. 01:47:49.280 --> 01:47:51.822 And we're going to see this again and again, especially as we 01:47:51.822 --> 01:47:54.320 transition to web programming, keys and values. 01:47:54.320 --> 01:47:57.290 Key is what you use to look for something. 01:47:57.290 --> 01:48:01.650 The value is what you find ultimately via that key. 01:48:01.650 --> 01:48:03.110 So that's the generic term there. 01:48:03.110 --> 01:48:05.590 We've seen key value pairs really in the past. 01:48:05.590 --> 01:48:07.340 In week zero we talked about your contacts 01:48:07.340 --> 01:48:10.040 and your iPhone or Android phone being an app 01:48:10.040 --> 01:48:13.250 that has a whole bunch of contacts presumably alphabetized by first name, 01:48:13.250 --> 01:48:14.600 or last name, or the like. 01:48:14.600 --> 01:48:17.270 Well, one of those contact cards ultimately 01:48:17.270 --> 01:48:20.730 has someone's number, for instance, like John Harvard in this case. 01:48:20.730 --> 01:48:23.450 So in that type of application, the keys is 01:48:23.450 --> 01:48:26.810 the name, like John Harvard that you use to find information. 01:48:26.810 --> 01:48:30.020 And the value is the number that you find there. 01:48:30.020 --> 01:48:32.270 Or if there's more information like where he lives 01:48:32.270 --> 01:48:35.360 and email address and the like, the whole contact card 01:48:35.360 --> 01:48:37.490 could be the value thereof. 01:48:37.490 --> 01:48:39.860 The key is what you use to look up John Harvard. 01:48:39.860 --> 01:48:41.780 Now, back in week zero-- 01:48:41.780 --> 01:48:45.200 oh, and rather, the corresponding table then if we draw this in two columns 01:48:45.200 --> 01:48:47.780 wouldn't be word and definition or key value generically, 01:48:47.780 --> 01:48:50.100 it would be name and number, for instance. 01:48:50.100 --> 01:48:52.010 So we're just slapping some new terminology 01:48:52.010 --> 01:48:53.760 on this old contacts problem. 01:48:53.760 --> 01:48:56.660 Well, this is the picture we drew way back in week zero, 01:48:56.660 --> 01:49:00.480 whereby I claim that log of n was really, really good. 01:49:00.480 --> 01:49:02.420 And indeed it was and has been since. 01:49:02.420 --> 01:49:04.970 But the Holy Grail would indeed be something more 01:49:04.970 --> 01:49:08.180 like this in this dashed green line constant time. 01:49:08.180 --> 01:49:11.300 And maybe not literally one step, but a fixed number of steps that even 01:49:11.300 --> 01:49:15.890 as the problem gets huge and you go way, way out on the right of the x-axis, 01:49:15.890 --> 01:49:19.850 the problem does not depend on the size-- 01:49:19.850 --> 01:49:24.080 the time to solve the problem does not depend at all 01:49:24.080 --> 01:49:26.400 on the size of the problem itself. 01:49:26.400 --> 01:49:29.300 You can have 1,000 contacts or 100,000 contacts, 01:49:29.300 --> 01:49:32.960 constant time means it takes the same number of steps no matter what. 01:49:32.960 --> 01:49:34.980 Well, how can we get to that point? 01:49:34.980 --> 01:49:37.260 Well, there's a couple of final building blocks today. 01:49:37.260 --> 01:49:38.660 And there's one called hashing. 01:49:38.660 --> 01:49:40.743 And this is something that will recur a few times. 01:49:40.743 --> 01:49:45.680 But for now, hashing is all about taking as input some value 01:49:45.680 --> 01:49:49.550 and outputting a simpler version thereof. 01:49:49.550 --> 01:49:52.520 So for instance, here's a gratuitously large deck 01:49:52.520 --> 01:49:56.510 of cards, which are all the more visible as a result. And in a deck of cards, 01:49:56.510 --> 01:50:00.200 typically you've got what, 52 cards plus maybe the jokers and whatnot. 01:50:00.200 --> 01:50:03.530 And each of those cards has a number of sorts and a suit on it. 01:50:03.530 --> 01:50:06.260 And here are literally four buckets on the stage. 01:50:06.260 --> 01:50:08.840 And how might I go about sorting these cards? 01:50:08.840 --> 01:50:10.730 Not just by number but also by suit. 01:50:10.730 --> 01:50:14.180 Well, you could certainly spread them all out and make a mess of things 01:50:14.180 --> 01:50:17.180 and just reason your way through it and get everything in order 01:50:17.180 --> 01:50:19.090 according to suit and according by number. 01:50:19.090 --> 01:50:21.590 But most of us, even if you don't have four buckets at home, 01:50:21.590 --> 01:50:24.470 probably are going to do something a little more intuitive, 01:50:24.470 --> 01:50:25.730 feels like an optimization. 01:50:25.730 --> 01:50:28.460 Where if I find the nine of hearts, I'm going 01:50:28.460 --> 01:50:30.012 to put that into the hearts bucket. 01:50:30.012 --> 01:50:32.720 The king of spades, I'm going to put that into the spades bucket. 01:50:32.720 --> 01:50:34.970 The jack of diamonds over here. 01:50:34.970 --> 01:50:39.170 And I'll do this with the queen of diamonds, and the ace of clubs here, 01:50:39.170 --> 01:50:41.630 and the three here, and the 10 here. 01:50:41.630 --> 01:50:45.560 And even though it's still going to be 52 steps, why am I-- and maybe at home, 01:50:45.560 --> 01:50:49.610 why would you perhaps do this step first? 01:50:49.610 --> 01:50:53.040 What's the value of bucketizing the values in this way? 01:50:53.040 --> 01:50:55.720 And that actually is a term of art. 01:50:55.720 --> 01:50:58.950 What's the value of doing this first before you sift through and try 01:50:58.950 --> 01:51:00.120 to sort the numbers? 01:51:00.120 --> 01:51:01.076 Yeah. 01:51:01.076 --> 01:51:03.875 AUDIENCE: It's easier to make sure you're not [INAUDIBLE].. 01:51:03.875 --> 01:51:04.750 DAVID J. MALAN: Yeah. 01:51:04.750 --> 01:51:06.958 It's easier to make sure you're not missing anything. 01:51:06.958 --> 01:51:09.580 And it's taking a problem of size 52 and shrinking it 01:51:09.580 --> 01:51:12.230 into four problems of size 13, if you will. 01:51:12.230 --> 01:51:14.642 And so, that just helps simplify things, maybe 01:51:14.642 --> 01:51:16.600 reduces the probability of errors and the like. 01:51:16.600 --> 01:51:18.808 And what I'm doing here, to give it a technical term, 01:51:18.808 --> 01:51:20.590 is that I'm hashing the values. 01:51:20.590 --> 01:51:22.570 I'm taking as input a card like this. 01:51:22.570 --> 01:51:26.950 And I'm reducing it more simply from a larger domain 01:51:26.950 --> 01:51:29.600 to a much smaller range, if you will. 01:51:29.600 --> 01:51:32.680 So here is a domain of 52 possibilities. 01:51:32.680 --> 01:51:35.980 I want to map that to a range of four possible outcomes-- 01:51:35.980 --> 01:51:39.830 the diamonds, the clubs, the hearts, or the spades here. 01:51:39.830 --> 01:51:42.665 And by doing that I'm just shrinking the size of the problem. 01:51:42.665 --> 01:51:43.540 So hashing does that. 01:51:43.540 --> 01:51:45.790 It's literally an f of x type arrangement 01:51:45.790 --> 01:51:50.300 whereby you pass something in and you get back a simpler known value. 01:51:50.300 --> 01:51:54.940 Well, a hash function more technically is the algorithm, or even the math, 01:51:54.940 --> 01:51:58.180 or even the code that implements that idea, converting 01:51:58.180 --> 01:52:03.040 something bigger to something smaller to this indeed finite range of values. 01:52:03.040 --> 01:52:10.090 And it turns out that hash tables are a wonderful application of arrays 01:52:10.090 --> 01:52:14.080 and linked lists to try to leverage the best of both worlds. 01:52:14.080 --> 01:52:18.740 The goal being theoretically to achieve that Holy Grail of constant time. 01:52:18.740 --> 01:52:20.740 And that's going to be a bit of an overstatement 01:52:20.740 --> 01:52:22.990 because you're not always going to achieve it exactly. 01:52:22.990 --> 01:52:25.420 But at least we can get a little closer there too. 01:52:25.420 --> 01:52:28.510 So with hash tables, you have something that looks like this. 01:52:28.510 --> 01:52:29.860 This is just an array. 01:52:29.860 --> 01:52:33.700 This is an artist's rendition of drawing it vertically instead of horizontally. 01:52:33.700 --> 01:52:36.310 But that's just a detail graphically. 01:52:36.310 --> 01:52:40.960 And this array, for instance, maybe is of size 26. 01:52:40.960 --> 01:52:42.470 And where am I going with this? 01:52:42.470 --> 01:52:46.990 Well, how does Apple, how does Google store your context alphabetically 01:52:46.990 --> 01:52:49.510 in your phone and search for things quickly? 01:52:49.510 --> 01:52:52.630 Well, they might-- they probably alphabetize at least in English 01:52:52.630 --> 01:52:55.210 according to A through Z. Or if we convert that to numbers 01:52:55.210 --> 01:52:56.920 it's like what, 65 through whatever. 01:52:56.920 --> 01:52:59.380 Or really, zero through 25 suffices. 01:52:59.380 --> 01:53:02.710 If we're using an array of size 26, we start counting at zero 01:53:02.710 --> 01:53:04.180 and we count up to 25. 01:53:04.180 --> 01:53:06.830 But let's abstract that away as just letters of the alphabet. 01:53:06.830 --> 01:53:09.670 So maybe what Google and Apple are doing in your phone 01:53:09.670 --> 01:53:12.670 is storing all of the A's up there, all of the Z's down there, 01:53:12.670 --> 01:53:14.810 and everything else in between. 01:53:14.810 --> 01:53:16.610 And so, this works pretty well if you start 01:53:16.610 --> 01:53:18.110 adding your friends and your family. 01:53:18.110 --> 01:53:21.970 So for instance-- and I'll get rid of the letter so as to not distract. 01:53:21.970 --> 01:53:25.750 Alvis might go in that first spot because A, you subtract a 65, 01:53:25.750 --> 01:53:26.530 maps to zero. 01:53:26.530 --> 01:53:29.020 So we put him in the first bucket, the A bucket. 01:53:29.020 --> 01:53:31.450 Maybe Zacharias ends up all the way at the end there. 01:53:31.450 --> 01:53:33.580 And then in the middle might here be Hermione. 01:53:33.580 --> 01:53:36.580 And if we do this dot, dot, dot, you keep adding all of your classmates, 01:53:36.580 --> 01:53:41.500 you might get a contacts database that has all of this data herein. 01:53:41.500 --> 01:53:44.530 Now, each of these nodes, they're drawn differently 01:53:44.530 --> 01:53:46.450 because this is just another artist rendition. 01:53:46.450 --> 01:53:49.870 These rectangles, these long rectangles represent a contact card, 01:53:49.870 --> 01:53:53.200 like John Harvard's that's got the name, maybe email, definitely phone 01:53:53.200 --> 01:53:54.950 number, and things like that. 01:53:54.950 --> 01:53:57.460 So this seems great. 01:53:57.460 --> 01:53:58.120 Why? 01:53:58.120 --> 01:54:00.100 How can I find Alvis? 01:54:00.100 --> 01:54:01.300 Well, I go to the A bucket. 01:54:01.300 --> 01:54:02.380 How do I find Zacharias? 01:54:02.380 --> 01:54:03.582 I go to the Z bucket. 01:54:03.582 --> 01:54:04.540 How do I find Hermione? 01:54:04.540 --> 01:54:06.010 I go to the H bucket. 01:54:06.010 --> 01:54:09.730 But, but, but, I've done this very deliberately. 01:54:09.730 --> 01:54:14.710 What problem will arise eventually assuming you have enough classmates? 01:54:14.710 --> 01:54:16.462 AUDIENCE: [INAUDIBLE] 01:54:17.885 --> 01:54:18.760 DAVID J. MALAN: Yeah. 01:54:18.760 --> 01:54:21.100 There'll be too many people, too many contacts for all 01:54:21.100 --> 01:54:23.860 of the available spaces in the array. 01:54:23.860 --> 01:54:25.540 There's still some room here. 01:54:25.540 --> 01:54:29.440 But I'm pretty sure if I think back to this particular class, 01:54:29.440 --> 01:54:34.090 we've got not Hermione but also Harry, who's also an H. Hagrid, who's 01:54:34.090 --> 01:54:36.430 also an H. So where do put them? 01:54:36.430 --> 01:54:40.010 I could just put them arbitrarily in any of the open spots. 01:54:40.010 --> 01:54:43.450 But then you lose the immediacy of jumping right to the H, right to the A, 01:54:43.450 --> 01:54:44.290 right to the Z. 01:54:44.290 --> 01:54:47.590 But now that we have linked lists, we can combine these ideas. 01:54:47.590 --> 01:54:51.530 Use an array to get to the first letter of the name you care about. 01:54:51.530 --> 01:54:56.300 And then if you have a collision, so to speak, whereby someone's already there, 01:54:56.300 --> 01:54:58.360 you don't do something stupid like put Harry 01:54:58.360 --> 01:55:01.270 down here just because it's available, or maybe Hagrid down here 01:55:01.270 --> 01:55:03.228 just because it's available because then you're 01:55:03.228 --> 01:55:04.870 losing the immediacy of the lookup. 01:55:04.870 --> 01:55:08.890 Why don't you just stitch them together in a link list? 01:55:08.890 --> 01:55:10.220 Now what does this mean. 01:55:10.220 --> 01:55:14.398 This means for most of the characters here, you have constant time lookup. 01:55:14.398 --> 01:55:15.940 You look up Alvis, boom, you're done. 01:55:15.940 --> 01:55:17.230 Zacharias, boom, you're done. 01:55:17.230 --> 01:55:22.400 OK, Harry, Hermione, Hagrid, it might be one, two, or three steps. 01:55:22.400 --> 01:55:24.760 So that's actually devolving into something linear. 01:55:24.760 --> 01:55:28.000 But here we make a distinction today between theoretical running times, 01:55:28.000 --> 01:55:31.750 which we keep talking about, and honestly, a clock on the wall 01:55:31.750 --> 01:55:34.090 running times that actual humans care about. 01:55:34.090 --> 01:55:38.290 This is way faster than a linked list because you 01:55:38.290 --> 01:55:40.120 don't have to search every name. 01:55:40.120 --> 01:55:44.890 It's even faster than an array because you don't need to do binary search. 01:55:44.890 --> 01:55:49.600 You can literally, for most of the names, find them in constant time-- 01:55:49.600 --> 01:55:50.590 one step. 01:55:50.590 --> 01:55:53.020 And again, it's not theoretically constant because these-- 01:55:53.020 --> 01:55:55.210 if you only befriend people who have H names, 01:55:55.210 --> 01:55:57.550 it's going to be a crazy long linked list anyway. 01:55:57.550 --> 01:56:01.840 So again, it really depends on what the nature of the data is here. 01:56:01.840 --> 01:56:04.650 But this is pretty close to constant time. 01:56:04.650 --> 01:56:06.400 And in fact, how could we get even closer? 01:56:06.400 --> 01:56:08.980 How could we reduce the probability of collisions 01:56:08.980 --> 01:56:12.220 for the H's or any other letters? 01:56:12.220 --> 01:56:14.830 How could we avoid putting too many H names together? 01:56:14.830 --> 01:56:16.802 AUDIENCE: [INAUDIBLE] 01:56:18.240 --> 01:56:19.740 DAVID J. MALAN: Say a little louder. 01:56:19.740 --> 01:56:22.626 AUDIENCE: [INAUDIBLE] 01:56:24.538 --> 01:56:25.580 DAVID J. MALAN: OK, yeah. 01:56:25.580 --> 01:56:27.360 So we could add another dimension, if you will. 01:56:27.360 --> 01:56:29.193 But let's not add a third dimension, per se. 01:56:29.193 --> 01:56:32.150 But let's indeed look at not just the first letter of everyone's name 01:56:32.150 --> 01:56:33.650 but the first and the second. 01:56:33.650 --> 01:56:37.310 And in fact, let's see if that gets us a little further along. 01:56:37.310 --> 01:56:40.310 So let me go ahead and propose, if you go through the whole Harry Potter 01:56:40.310 --> 01:56:42.977 universe, there's actually a lot of collisions if we keep going. 01:56:42.977 --> 01:56:45.950 And so, we've got the L's here, the R's, the S's, and so forth. 01:56:45.950 --> 01:56:47.510 Well, let's clean this up here. 01:56:47.510 --> 01:56:50.240 Hermione originally went to the H location. 01:56:50.240 --> 01:56:54.230 But let's decrease the probability of collisions there and everywhere else. 01:56:54.230 --> 01:56:58.050 Instead of putting Hermione, Harry, and Hagrid all together, 01:56:58.050 --> 01:56:59.750 let's go ahead and do this instead. 01:56:59.750 --> 01:57:02.840 Instead of labeling these buckets A through Z, 01:57:02.840 --> 01:57:04.620 let's give ourselves more buckets. 01:57:04.620 --> 01:57:07.940 So in fact, this might be H. Well, instead of H, maybe this should be HA, 01:57:07.940 --> 01:57:11.870 and then this should be HB, HC, HD, HE, HF. 01:57:11.870 --> 01:57:14.000 Now, some of those are a little nonsensical 01:57:14.000 --> 01:57:16.460 because I can't think of names that match most of those. 01:57:16.460 --> 01:57:18.170 But it's deterministic. 01:57:18.170 --> 01:57:20.820 At least we know the bucket will be there, 01:57:20.820 --> 01:57:22.680 which is important even if it's empty. 01:57:22.680 --> 01:57:24.560 And now, we can put Hermione here. 01:57:24.560 --> 01:57:25.760 We can put Harry here. 01:57:25.760 --> 01:57:26.480 But oh-oh. 01:57:26.480 --> 01:57:28.190 We didn't do this perfectly well. 01:57:28.190 --> 01:57:29.420 Hagrid still collide. 01:57:29.420 --> 01:57:30.620 So let me come back to you. 01:57:30.620 --> 01:57:33.788 How can we reduce the probability of Harry and Hagrid colliding? 01:57:33.788 --> 01:57:35.330 AUDIENCE: [INAUDIBLE] another letter? 01:57:35.330 --> 01:57:35.630 DAVID J. MALAN: Yeah. 01:57:35.630 --> 01:57:37.250 So we can look at the third letter. 01:57:37.250 --> 01:57:38.240 So let me try that. 01:57:38.240 --> 01:57:44.030 Instead of HA, let's look at HAA, HAB, HAC, dot, dot, dot, HAQ, dot, dot, dot, 01:57:44.030 --> 01:57:47.610 HEQ, HER, HES, and so forth. 01:57:47.610 --> 01:57:50.870 And now, I think those names and probably all of the others we saw 01:57:50.870 --> 01:57:54.630 are now much more cleanly distributed. 01:57:54.630 --> 01:57:57.140 There is much lower probability of collisions 01:57:57.140 --> 01:58:01.040 unless two people have almost the same names or one is a prefix of the other. 01:58:01.040 --> 01:58:06.140 But, but, but, even though we're now closer than ever to constant time 01:58:06.140 --> 01:58:09.470 because the odds that we hit a collision and have to devolve to a linked list 01:58:09.470 --> 01:58:11.990 are much lower, what's the downside that's 01:58:11.990 --> 01:58:16.840 not completely obvious from how I've depicted this on screen? 01:58:16.840 --> 01:58:18.400 What's the price I'm paying here? 01:58:18.400 --> 01:58:18.910 Yeah. 01:58:18.910 --> 01:58:19.960 AUDIENCE: So much memory. 01:58:19.960 --> 01:58:22.330 DAVID J. MALAN: This is a huge amount of memory. 01:58:22.330 --> 01:58:26.560 The number of cells here in the array is now what, 26 times 26 times 01:58:26.560 --> 01:58:30.730 26 for the first, the second, and the third possible characters 01:58:30.730 --> 01:58:32.500 all combinatorically combined here. 01:58:32.500 --> 01:58:33.100 That's a lot. 01:58:33.100 --> 01:58:34.100 I didn't even draw them. 01:58:34.100 --> 01:58:36.830 I have the dot, dot, dot. [? Could you ?] evoke that instead? 01:58:36.830 --> 01:58:38.260 That's a huge amount of memory. 01:58:38.260 --> 01:58:40.390 This is a very sparse data set now. 01:58:40.390 --> 01:58:43.120 And odds are, you're going to waste so much memory, 01:58:43.120 --> 01:58:47.623 even for the names like HAE, HA, HEQ. 01:58:47.623 --> 01:58:48.790 I can't even think of names. 01:58:48.790 --> 01:58:50.860 So many of those buckets are going to be empty, 01:58:50.860 --> 01:58:55.400 not to mention the AAA and the ZZZ and everything else in between. 01:58:55.400 --> 01:58:56.500 So it's a trade-off. 01:58:56.500 --> 01:58:58.387 And it might be too expensive a trade-off. 01:58:58.387 --> 01:59:00.220 And so, you might have to tolerate something 01:59:00.220 --> 01:59:03.040 like the collisions we had earlier, whereby even though they might 01:59:03.040 --> 01:59:09.520 very well happen, at least you are decreasing the probability by perhaps 01:59:09.520 --> 01:59:10.940 having more buckets like this. 01:59:10.940 --> 01:59:13.910 And in fact, if I rewind now to where we might have gone with this, 01:59:13.910 --> 01:59:17.500 here's how we might represent these nodes in the tree. 01:59:17.500 --> 01:59:19.810 Previously, in the past, we've had a person 01:59:19.810 --> 01:59:23.230 who had a string name and a string number, a.k.a. now char star. 01:59:23.230 --> 01:59:27.250 And so, here now might be how in this hash table 01:59:27.250 --> 01:59:31.630 we represent someone's name and number as well as a pointer 01:59:31.630 --> 01:59:33.610 to the next element in the list. 01:59:33.610 --> 01:59:35.380 Let me rewind just to the picture here. 01:59:35.380 --> 01:59:38.140 We keep drawing different shapes, because, again, these are abstractions. 01:59:38.140 --> 01:59:39.932 Who really cares if they're at a scale now. 01:59:39.932 --> 01:59:42.568 We've got enough room for the person's name, not pictured 01:59:42.568 --> 01:59:44.110 on the screen it's Hermione's number. 01:59:44.110 --> 01:59:45.910 But that's somewhere in this rectangle. 01:59:45.910 --> 01:59:48.100 But yes, pictured here in this little square 01:59:48.100 --> 01:59:50.860 is a pointer to the next node in the list. 01:59:50.860 --> 01:59:54.640 So by storing name and number, maybe her address, maybe her mailing address, 01:59:54.640 --> 01:59:58.360 whatever, in addition to a pointer allows each of these nodes 01:59:58.360 --> 02:00:01.750 to be connectable just like the nodes in the linked list. 02:00:01.750 --> 02:00:04.160 But where they're starting is in an array. 02:00:04.160 --> 02:00:06.580 So the array gets us 126-- 02:00:06.580 --> 02:00:10.630 or gets us-- narrows the problem from size 26 to one, 02:00:10.630 --> 02:00:12.520 gets us to the linked list in question. 02:00:12.520 --> 02:00:16.520 Hopefully, it's a single person or perhaps it has more than that. 02:00:16.520 --> 02:00:19.210 Meanwhile, what is the hash table itself-- 02:00:19.210 --> 02:00:22.960 the hash table, the whole thing is literally just an array. 02:00:22.960 --> 02:00:25.990 I've hardcoded the simplest version of size 26 02:00:25.990 --> 02:00:29.470 but what do each of those boxes in the vertical array 02:00:29.470 --> 02:00:33.520 represent just a pointer to potentially a node, a node in a linked list. 02:00:33.520 --> 02:00:36.130 And if there's no one there, if there's no one in location y, 02:00:36.130 --> 02:00:38.530 or x, or the like in that universe, well, it's 02:00:38.530 --> 02:00:41.890 just a null pointer signifying there's no one there. 02:00:41.890 --> 02:00:45.790 But if there is, it's going to be a pointer to a valid node from which we 02:00:45.790 --> 02:00:48.135 can get to any of the others as well. 02:00:48.135 --> 02:00:50.260 And that so-called hash function, just like the one 02:00:50.260 --> 02:00:52.120 I did with the greeting cards, well, it's 02:00:52.120 --> 02:00:55.780 just a black box, if you will, but implemented somewhere in code, 02:00:55.780 --> 02:01:00.850 like in C. And so, if you pass in Albus, what is the hash value of Albus? 02:01:00.850 --> 02:01:05.260 Well, in the first version of the story with 26 buckets, it should be a zero. 02:01:05.260 --> 02:01:07.900 If you pass in Zacharias, it should be 25. 02:01:07.900 --> 02:01:11.980 And so, just as my cards were being hashed to one of one, two, three, 02:01:11.980 --> 02:01:16.780 four values, now these names are being hashed to one of 26 possibilities, 02:01:16.780 --> 02:01:22.420 or 26 times 26, or 26 to the third power if you have more and more granularity 02:01:22.420 --> 02:01:24.620 than that. 02:01:24.620 --> 02:01:31.850 Questions on this implementation now of this idea of a hash table? 02:01:31.850 --> 02:01:38.090 AUDIENCE: Should we [INAUDIBLE] the value of null and then [INAUDIBLE]?? 02:01:41.470 --> 02:01:43.570 DAVID J. MALAN: If you-- 02:01:43.570 --> 02:01:45.173 say that again with the null? 02:01:45.173 --> 02:01:46.840 AUDIENCE: Could you ever be [INAUDIBLE]? 02:01:55.190 --> 02:01:56.690 DAVID J. MALAN: Oh, a good question. 02:01:56.690 --> 02:01:58.398 So if there's so much sparseness, there's 02:01:58.398 --> 02:02:00.160 all of these empty cells in the array. 02:02:00.160 --> 02:02:03.040 Couldn't you just go in and free them, or delete them, or just 02:02:03.040 --> 02:02:07.690 shrink the array and not have AAA, and AAB, and AAC only 02:02:07.690 --> 02:02:10.240 have the prefixes, two or three characters 02:02:10.240 --> 02:02:12.970 that you need, you absolutely could do that. 02:02:12.970 --> 02:02:16.780 But now, what you lose is the arithmetic benefit of being 02:02:16.780 --> 02:02:22.480 able to map each letter to a number if you start freeing up unused space, 02:02:22.480 --> 02:02:26.347 you don't know that Zacharias is necessarily at location 25. 02:02:26.347 --> 02:02:28.180 Albus is still going to be at location zero. 02:02:28.180 --> 02:02:30.760 But if you've deleted some of the elements in the middle, 02:02:30.760 --> 02:02:33.460 Zacharias could be at 24 if you've deleted one, 02:02:33.460 --> 02:02:35.510 23 if you've deleted another. 02:02:35.510 --> 02:02:37.900 And so, you don't have that arithmetic immediacy 02:02:37.900 --> 02:02:41.620 that you need in order to index into the array with constant time. 02:02:41.620 --> 02:02:45.420 And the same is going to be true if it's two letters or three letters. 02:02:45.420 --> 02:02:47.920 You need to be able to trust that you can do some quick math 02:02:47.920 --> 02:02:50.890 and jump to the right index in constant time. 02:02:50.890 --> 02:02:53.000 And that's, again, the appeal of these arrays. 02:02:53.000 --> 02:02:57.370 So when it comes to the running time of a hash table, inserting values into it, 02:02:57.370 --> 02:03:00.200 searching for values therein, at the end of the day, 02:03:00.200 --> 02:03:01.990 it technically is big O of n. 02:03:01.990 --> 02:03:06.580 Because in the craziest case, you might have a huge fancy hash table. 02:03:06.580 --> 02:03:09.490 But everyone in the universe has a name starting with H. 02:03:09.490 --> 02:03:12.370 And then, it just evolves into a really long linked list 02:03:12.370 --> 02:03:14.870 just like a binary search tree could do the same. 02:03:14.870 --> 02:03:17.620 But if you choose a smarter hash function, maybe you mitigate that 02:03:17.620 --> 02:03:19.570 and you don't rely only on the first letter 02:03:19.570 --> 02:03:23.860 but on the second, or the third as well, or some other combination of that input 02:03:23.860 --> 02:03:26.260 and make your hash function smarter, odds 02:03:26.260 --> 02:03:30.520 are if you get a good hash function you want to get it to be more of order of n 02:03:30.520 --> 02:03:33.620 divided by k, where k means constant mathematically. 02:03:33.620 --> 02:03:35.510 And so, k is the number of buckets. 02:03:35.510 --> 02:03:38.140 So ideally you want a uniform distribution. 02:03:38.140 --> 02:03:41.180 You want this many people here, this many people here. 02:03:41.180 --> 02:03:43.600 You don't want there to be some or no people. 02:03:43.600 --> 02:03:45.965 You want a uniform statistical distribution. 02:03:45.965 --> 02:03:48.340 And maybe you get that from human names, maybe you don't. 02:03:48.340 --> 02:03:50.680 But that's the challenge of a hash function. 02:03:50.680 --> 02:03:53.290 Of course, big O of n over k is not a thing 02:03:53.290 --> 02:03:56.050 because we always throw away constants like k. 02:03:56.050 --> 02:03:58.240 So it's still big O of n. 02:03:58.240 --> 02:04:00.970 But again, the distinction today is that, OK, yes, 02:04:00.970 --> 02:04:03.640 academically you learned in CS50 that, sure, it's big O of n. 02:04:03.640 --> 02:04:08.000 But my God, it's 26 times faster if you do the hash function well 02:04:08.000 --> 02:04:10.600 and you spread everyone out over the hash table. 02:04:10.600 --> 02:04:14.200 And that's the appeal of these kinds of structures. 02:04:14.200 --> 02:04:16.960 And we've got one more for you, if I may. 02:04:16.960 --> 02:04:20.180 And that's something now known as a try. 02:04:20.180 --> 02:04:23.500 So it turns out that a try is even cooler, 02:04:23.500 --> 02:04:29.560 if you like this kind of thing, in that it does not devolve into big O of n. 02:04:29.560 --> 02:04:31.707 It is truly constant time. 02:04:31.707 --> 02:04:33.040 But there's going to be a price. 02:04:33.040 --> 02:04:34.360 There's going to be a gotcha. 02:04:34.360 --> 02:04:38.230 A tri is sort of a fancier tree. 02:04:38.230 --> 02:04:41.200 And it's short for retrieval, but pronounced tri 02:04:41.200 --> 02:04:42.670 for weird historical reasons. 02:04:42.670 --> 02:04:47.980 But a tri is a tree, each of whose nodes is an array. 02:04:47.980 --> 02:04:49.427 So this is all crazy mash-ups now. 02:04:49.427 --> 02:04:52.510 People started inventing data structures just by combining different ones. 02:04:52.510 --> 02:04:54.677 So unfortunately, a lot of the good ideas are taken. 02:04:54.677 --> 02:04:59.643 But you just have benefits from certain aspects of those data structures. 02:04:59.643 --> 02:05:02.810 And combining them, ideally, gives you the best of both worlds, so to speak. 02:05:02.810 --> 02:05:04.940 So here might be the root of a tri. 02:05:04.940 --> 02:05:06.850 It's literally a big node, a big rectangle. 02:05:06.850 --> 02:05:08.480 But it's actually an array. 02:05:08.480 --> 02:05:11.540 So there's 26 locations in this picture here. 02:05:11.540 --> 02:05:14.920 And here's how you use a tri, for instance, the store names 02:05:14.920 --> 02:05:17.410 just like the hash table. 02:05:17.410 --> 02:05:20.680 You treat each of the elements of that array in that node 02:05:20.680 --> 02:05:22.100 as a letter of the alphabet. 02:05:22.100 --> 02:05:24.590 So A through Z or zero through 25. 02:05:24.590 --> 02:05:28.910 And if you want to store someone's name in here, you do so as follows. 02:05:28.910 --> 02:05:34.270 If you want to store an H, you index into the H location. 02:05:34.270 --> 02:05:37.990 And if you want to store the second letter of someone's name, like an A, 02:05:37.990 --> 02:05:40.180 well, you add another node below it. 02:05:40.180 --> 02:05:40.750 And such. 02:05:40.750 --> 02:05:42.190 One is connected to the other. 02:05:42.190 --> 02:05:45.130 And you then identify the A in that array. 02:05:45.130 --> 02:05:49.240 And then you go on and maybe put a G if the goal is to store-- spoiler now-- 02:05:49.240 --> 02:05:51.730 Hagrid in this data structure. 02:05:51.730 --> 02:05:54.782 And then the R, and the I, and then the D. 02:05:54.782 --> 02:05:56.740 But when you get to the D, the end of the name, 02:05:56.740 --> 02:06:00.070 you have to somehow flag that this is the end of a name 02:06:00.070 --> 02:06:02.380 that we've embedded into this data structure. 02:06:02.380 --> 02:06:04.780 So whereas all of these are called out in white just 02:06:04.780 --> 02:06:06.910 to make it obvious what we're connecting to what, 02:06:06.910 --> 02:06:09.460 green has to be like a bool that's true that 02:06:09.460 --> 02:06:11.830 just indicates the buck stops here. 02:06:11.830 --> 02:06:15.160 D is the last letter in someone's actual name. 02:06:15.160 --> 02:06:18.820 And what's kind of cool now about a tri is 02:06:18.820 --> 02:06:21.260 that we can repeat this for other names as well. 02:06:21.260 --> 02:06:24.770 So for instance, here's where we might put Harry as well. 02:06:24.770 --> 02:06:30.100 And notice, they share a common prefix, HA for Hagrid, HA for Harry. 02:06:30.100 --> 02:06:33.430 So we're reusing some of these nodes, some of these arrays. 02:06:33.430 --> 02:06:36.820 We can even slip Hermione in here too borrowing only the H. 02:06:36.820 --> 02:06:39.790 But she gets the H, then the E, then R-- 02:06:39.790 --> 02:06:42.650 R-M-I-O-N-E, and so forth. 02:06:42.650 --> 02:06:46.360 And we mark at the end of her name too that she's in there. 02:06:46.360 --> 02:06:48.440 Now, what's the takeaway here? 02:06:48.440 --> 02:06:51.670 Well what is the running time of a tri? 02:06:51.670 --> 02:06:55.820 How many steps does it take to find someone in this data structure? 02:06:55.820 --> 02:06:59.150 And let me zoom out so that it suddenly becomes a massive data 02:06:59.150 --> 02:07:01.070 structure with even more in it. 02:07:01.070 --> 02:07:02.780 Maybe it looks-- sorry. 02:07:02.780 --> 02:07:04.040 No, I'll keep it on this one. 02:07:04.040 --> 02:07:07.040 Maybe it looks a little something like this with just these three names. 02:07:07.040 --> 02:07:12.560 But how many steps does it take to find Hagrid, or Harry, or Hermione, 02:07:12.560 --> 02:07:15.230 no matter how many names are in this data structure? 02:07:15.230 --> 02:07:16.820 There's three at the moment. 02:07:16.820 --> 02:07:18.230 But it takes what? 02:07:18.230 --> 02:07:23.390 H-A-G-R-I-D, so six steps to find Hagrid. 02:07:23.390 --> 02:07:26.660 H-A-R-R-Y, five steps to find Harry. 02:07:26.660 --> 02:07:31.760 H-E-R-M-I-O-N-E, eight steps to find Hermione. 02:07:31.760 --> 02:07:35.180 But notice, that those steps are only dependent on what 02:07:35.180 --> 02:07:37.130 the lengths of the humans names. 02:07:37.130 --> 02:07:41.180 And let's assume that no one's going to have a infinitely long name. 02:07:41.180 --> 02:07:42.830 It's going to max out at what, eight? 02:07:42.830 --> 02:07:44.870 No, maybe 18, maybe 20, 30. 02:07:44.870 --> 02:07:47.270 There's actually some pretty long human names out there. 02:07:47.270 --> 02:07:49.080 But it's going to be finite. 02:07:49.080 --> 02:07:50.660 You know it's a bounded. 02:07:50.660 --> 02:07:54.360 Whereas most contexts in could grow forever. 02:07:54.360 --> 02:07:57.530 So what's compelling here is if you assume that the longest name is, 02:07:57.530 --> 02:08:00.170 I don't know, 50 for the sake of a theme here, 02:08:00.170 --> 02:08:03.080 then you know that finding anyone in this data structure 02:08:03.080 --> 02:08:05.180 will take more than 50 steps. 02:08:05.180 --> 02:08:10.520 50 is thus a constant, which means you have big O of one running time. 02:08:10.520 --> 02:08:13.190 It doesn't matter if there's a million people in this phonebook 02:08:13.190 --> 02:08:14.990 or a billion people in this phonebook. 02:08:14.990 --> 02:08:17.010 That's going to definitely add more nodes to it. 02:08:17.010 --> 02:08:18.940 But it's still going to take you H-A-R-- 02:08:18.940 --> 02:08:22.880 I'm sorry-- H-A-G-R-I-D, six steps to find Hagrid. 02:08:22.880 --> 02:08:28.250 H-A-R-R-Y, five steps to find Harry even if there is a billion other people 02:08:28.250 --> 02:08:29.820 in that data structure. 02:08:29.820 --> 02:08:32.978 So now, we actually do seem to have constant time 02:08:32.978 --> 02:08:36.020 if you assume that there's going to be a bound on the length of the name. 02:08:36.020 --> 02:08:38.960 Why don't we use tries for everything then? 02:08:38.960 --> 02:08:41.660 What's the price we're paying for this data structure 02:08:41.660 --> 02:08:44.630 even though we've represented just three characters here? 02:08:44.630 --> 02:08:45.200 Yeah. 02:08:45.200 --> 02:08:46.860 AUDIENCE: [INAUDIBLE] 02:08:46.860 --> 02:08:48.580 DAVID J. MALAN: It's a lot of memory. 02:08:48.580 --> 02:08:49.080 Yeah. 02:08:49.080 --> 02:08:51.330 And you can see it even with these three names, most 02:08:51.330 --> 02:08:55.500 of the squares on the screen are empty, bytes and bits that are there 02:08:55.500 --> 02:08:56.460 and are allocated. 02:08:56.460 --> 02:08:58.210 And they need to be there because you need 02:08:58.210 --> 02:09:01.297 to be able to do that arithmetic thing of this being zero, this being 25. 02:09:01.297 --> 02:09:04.380 So you can jump from boom, boom, boom, boom, based on each of the letters. 02:09:04.380 --> 02:09:07.770 But it's a hugely sparse data structure, which means 02:09:07.770 --> 02:09:10.060 it takes up a crazy amount of memory. 02:09:10.060 --> 02:09:12.865 Now, maybe that's tolerable, especially for short names. 02:09:12.865 --> 02:09:14.740 But that's going to be the trade-off as well. 02:09:14.740 --> 02:09:16.470 And this is such a tension in computing. 02:09:16.470 --> 02:09:19.410 Almost any time you want to improve time, 02:09:19.410 --> 02:09:22.560 you want to speed up the efficiency, the speed of your algorithm, 02:09:22.560 --> 02:09:23.880 you're going to spend space. 02:09:23.880 --> 02:09:27.270 If by contrast you want to decrease the amount of space, 02:09:27.270 --> 02:09:30.150 you might very well have to increase the running time. 02:09:30.150 --> 02:09:32.470 And it is indeed, this seesaw back and forth. 02:09:32.470 --> 02:09:34.500 And you, your colleagues, your company need 02:09:34.500 --> 02:09:36.990 to decide what resource is the most precious. 02:09:36.990 --> 02:09:40.590 Heck, it might be much harder to code one of these data structures 02:09:40.590 --> 02:09:41.190 than another. 02:09:41.190 --> 02:09:41.880 You're a human. 02:09:41.880 --> 02:09:42.990 Your time is valuable. 02:09:42.990 --> 02:09:46.170 Do you really want to spend hours implementing a tri when, 02:09:46.170 --> 02:09:49.680 hey, in 30 minutes I can bang out an array nowadays or a linked list even. 02:09:49.680 --> 02:09:52.800 There too, development time is going to be yet another resource and why 02:09:52.800 --> 02:09:57.000 sometimes there's good code or bad code, it depends on what you're prioritizing. 02:09:57.000 --> 02:09:59.880 So what do each of these nodes look like in a tri? 02:09:59.880 --> 02:10:01.380 Well, we can keep calling it a node. 02:10:01.380 --> 02:10:04.710 This is a very generic term for just a container in these data structures. 02:10:04.710 --> 02:10:10.080 In this story though, let me claim that everyone has a number, like a phone 02:10:10.080 --> 02:10:12.840 number, a string, a.k.a char star. 02:10:12.840 --> 02:10:18.330 Every node has 26 children, or technically an array of size 26 02:10:18.330 --> 02:10:20.520 that can point to more of these nodes. 02:10:20.520 --> 02:10:25.020 Notice that I don't need to store the name of someone in a try 02:10:25.020 --> 02:10:28.330 because it's implicit in the path that you take to find them. 02:10:28.330 --> 02:10:29.670 So that's a minor optimization. 02:10:29.670 --> 02:10:31.330 But it saves us some space. 02:10:31.330 --> 02:10:33.750 But this would be just a different data structure 02:10:33.750 --> 02:10:37.740 we could use to actually solve this problem as well, 02:10:37.740 --> 02:10:39.570 albeit at a very expensive cost. 02:10:39.570 --> 02:10:42.480 And what do we need our variable to be that stores the try? 02:10:42.480 --> 02:10:44.880 Just like before, we just need a single pointer 02:10:44.880 --> 02:10:48.000 that hangs on to the root of this structure that's 02:10:48.000 --> 02:10:52.790 null if it's empty or non-null if it's actually pointing at something. 02:10:52.790 --> 02:10:56.210 Any questions then on tries? 02:10:56.210 --> 02:10:59.420 And if it's feeling like a lot, the fire hydrant, it is. 02:10:59.420 --> 02:11:03.350 We started with arrays, then linked list, then tries. 02:11:03.350 --> 02:11:06.230 But questions on how we've just assembled from these basic building 02:11:06.230 --> 02:11:06.730 blocks? 02:11:06.730 --> 02:11:07.326 Yeah. 02:11:07.326 --> 02:11:10.122 AUDIENCE: Why isn't the root of tri [INAUDIBLE]?? 02:11:10.122 --> 02:11:11.520 Why does that matter? 02:11:11.520 --> 02:11:12.853 DAVID J. MALAN: A good question. 02:11:12.853 --> 02:11:14.820 Why is this not size 26? 02:11:14.820 --> 02:11:19.380 It's just like with the linked list before, it just 02:11:19.380 --> 02:11:23.550 tends to be in code convenient to have a separate additional pointer that's 02:11:23.550 --> 02:11:26.220 small that just points to the beginning of the data structure. 02:11:26.220 --> 02:11:29.010 Because that way, it can be null, thereby clearly indicating 02:11:29.010 --> 02:11:30.030 there are no nodes. 02:11:30.030 --> 02:11:31.380 The whole structure is empty. 02:11:31.380 --> 02:11:35.095 If you allocated one of those nodes, you absolutely could. 02:11:35.095 --> 02:11:37.470 But then you'd be just wasting space, even if it's empty. 02:11:37.470 --> 02:11:38.980 And it creates an ambiguity. 02:11:38.980 --> 02:11:41.370 So just having a single pointer linked to the beginnings 02:11:41.370 --> 02:11:44.320 of all of these things is a good thing. 02:11:44.320 --> 02:11:52.750 Other questions now on tries, or trees, or hash tables, or arrays? 02:11:52.750 --> 02:11:55.450 So what problems might arise? 02:11:55.450 --> 02:11:57.520 Well, here's a counterexample. 02:11:57.520 --> 02:12:00.385 What names are manifest in this tri here? 02:12:03.330 --> 02:12:04.580 Feel free to just call it out. 02:12:08.333 --> 02:12:09.000 What do you see? 02:12:09.000 --> 02:12:10.320 AUDIENCE: Danielle. 02:12:10.320 --> 02:12:13.360 DAVID J. MALAN: Danielle and Daniel. 02:12:13.360 --> 02:12:16.530 So presumably, if these are two names here, one of which 02:12:16.530 --> 02:12:19.680 is a prefix of another, notice that the data structure still works. 02:12:19.680 --> 02:12:22.830 And I chose a friend's name and then appended a couple of more characters 02:12:22.830 --> 02:12:26.790 to it that's also a name because we have here D-A-N-I-E-L. 02:12:26.790 --> 02:12:30.255 And the green technically is implemented as a bool or something like that 02:12:30.255 --> 02:12:31.950 that indicates a word stops here. 02:12:31.950 --> 02:12:35.130 But we don't want to preclude storing Danielle as well, who's 02:12:35.130 --> 02:12:37.440 a superstring, if you will, of Daniel. 02:12:37.440 --> 02:12:41.250 And so, that's OK too, so long as the structure allows for the pointers 02:12:41.250 --> 02:12:41.860 to keep going. 02:12:41.860 --> 02:12:46.000 So even that works out OK whereas it might not have otherwise. 02:12:46.000 --> 02:12:49.170 And in terms of the running time, just to be clear, at the end of the day, 02:12:49.170 --> 02:12:55.890 tries do give you actual constant time for insertions, lookups, deletions, 02:12:55.890 --> 02:13:00.210 and the like because it's dependent only on the length of the input, the key, 02:13:00.210 --> 02:13:05.100 if you will, and not on how many other people are in your phone or address 02:13:05.100 --> 02:13:06.060 book. 02:13:06.060 --> 02:13:08.760 And I thought we'd conclude with a visual. 02:13:08.760 --> 02:13:13.140 If you've gotten out into the square, anyone recognize this? 02:13:13.140 --> 02:13:15.120 Sweet Green, a local salad place. 02:13:15.120 --> 02:13:18.427 What are we looking at here and what's its connection to today? 02:13:18.427 --> 02:13:20.760 You're about to become all the geekier in the real world 02:13:20.760 --> 02:13:24.300 because you will start to see data structures everywhere. 02:13:24.300 --> 02:13:25.680 What is this? 02:13:25.680 --> 02:13:29.460 Or how does this work maybe in salad form? 02:13:29.460 --> 02:13:32.287 Who's been to Sweet Green. 02:13:32.287 --> 02:13:32.870 Either of you. 02:13:32.870 --> 02:13:34.145 So how does this work? 02:13:34.145 --> 02:13:35.422 [LAUGHTER] 02:13:35.422 --> 02:13:40.713 AUDIENCE: [INAUDIBLE] They'll put your order [INAUDIBLE] 02:13:40.713 --> 02:13:42.568 your name [INAUDIBLE] L. 02:13:42.568 --> 02:13:43.610 DAVID J. MALAN: OK, good. 02:13:43.610 --> 02:13:46.400 So if you order a salad for someone named L, when it's ready, 02:13:46.400 --> 02:13:48.810 they put it in the L section here. 02:13:48.810 --> 02:13:51.680 And so, this is a set of key value pairs, right? 02:13:51.680 --> 02:13:55.710 If L is the first letter of someone's name, the value hopefully is the salad. 02:13:55.710 --> 02:13:58.880 And so, what you kind of have here is a dictionary, key value pairs, 02:13:58.880 --> 02:14:00.500 where it's not words and definitions. 02:14:00.500 --> 02:14:02.300 It's names and salads. 02:14:02.300 --> 02:14:04.550 And you can think of this too as kind of a hash table. 02:14:04.550 --> 02:14:05.050 Why? 02:14:05.050 --> 02:14:07.940 Even though it actually doesn't fit on one long shelf 02:14:07.940 --> 02:14:10.520 because the store is only so big, this is really an array. 02:14:10.520 --> 02:14:13.580 And apparently, A is missing or maybe it's around the corner. 02:14:13.580 --> 02:14:16.440 But this array just happens to wrap onto multiple lines. 02:14:16.440 --> 02:14:18.680 But it's still conceptually a single dimension. 02:14:18.680 --> 02:14:20.960 But suppose two people have the name L? 02:14:20.960 --> 02:14:23.102 What do they do typically? 02:14:23.102 --> 02:14:26.300 AUDIENCE: They just put the one [INAUDIBLE] second letter [INAUDIBLE].. 02:14:26.300 --> 02:14:27.175 DAVID J. MALAN: Yeah. 02:14:27.175 --> 02:14:29.870 So maybe they-- if they want to put that much effort into it. 02:14:29.870 --> 02:14:31.850 They might look at the second letter and then the third letter. 02:14:31.850 --> 02:14:35.270 Odds are this is not that interesting a problem to solve optimally in that way. 02:14:35.270 --> 02:14:39.000 But they probably do start stacking the salads on top of each other 02:14:39.000 --> 02:14:41.040 or maybe scooching it over just a little bit. 02:14:41.040 --> 02:14:42.390 And so, what do you have there? 02:14:42.390 --> 02:14:45.770 Well, now, if you start to view the lens through CS50 glasses, 02:14:45.770 --> 02:14:46.820 OK, you have an array. 02:14:46.820 --> 02:14:49.140 And then you have these linked lists that are sort of growing here. 02:14:49.140 --> 02:14:50.490 But even then you run into a problem. 02:14:50.490 --> 02:14:50.780 Why? 02:14:50.780 --> 02:14:52.070 Because it's not really a linked list. 02:14:52.070 --> 02:14:54.612 Because at some point you're going to hit the boundary here. 02:14:54.612 --> 02:14:57.320 So it's kind of like an array of arrays because you can only fit, 02:14:57.320 --> 02:14:59.400 what, three or four salads here. 02:14:59.400 --> 02:15:01.370 And so, long story short, we started today 02:15:01.370 --> 02:15:04.610 deliberately talking about real-world things like stacks and queues. 02:15:04.610 --> 02:15:07.910 And even though it did escalate quickly into binary search trees, 02:15:07.910 --> 02:15:11.362 and hash tables, and tries, even those things are everywhere. 02:15:11.362 --> 02:15:13.070 Even though they don't call them as such, 02:15:13.070 --> 02:15:14.612 these are just solutions to problems. 02:15:14.612 --> 02:15:16.910 And now, with this final week of C under your belt, 02:15:16.910 --> 02:15:19.310 you have all the more of a technical toolkit 02:15:19.310 --> 02:15:21.500 by which to implement these things in code. 02:15:21.500 --> 02:15:25.310 Next week we'll be able to trust that someone else solved all these problems. 02:15:25.310 --> 02:15:26.450 We'll introduce Python. 02:15:26.450 --> 02:15:30.083 And lines of code like this will finally become lines of code like that. 02:15:30.083 --> 02:15:31.250 So that's the promise ahead. 02:15:31.250 --> 02:15:32.510 And we'll see you next time. 02:15:32.510 --> 02:15:33.710 [APPLAUSE] 02:15:33.710 --> 02:15:36.460 [MUSIC PLAYING]