1 00:00:00,000 --> 00:00:01,992 [MUSIC PLAYING] 2 00:00:01,992 --> 00:01:12,587 3 00:01:12,587 --> 00:01:13,670 DAVID J. MALAN: All right. 4 00:01:13,670 --> 00:01:15,200 This is CS50. 5 00:01:15,200 --> 00:01:20,270 And this is week five, which is going to be our last week in C. But what 6 00:01:20,270 --> 00:01:22,265 that means is that we'll have OK-- 7 00:01:22,265 --> 00:01:25,470 [CHEERING AND APPLAUSE] 8 00:01:25,470 --> 00:01:28,500 But with this week, with last week, and really all of the weeks 9 00:01:28,500 --> 00:01:30,330 prior, have you been hopefully acquiring, 10 00:01:30,330 --> 00:01:33,527 if slowly and with some challenge, some fundamental building 11 00:01:33,527 --> 00:01:35,610 blocks that are still going to underlie everything 12 00:01:35,610 --> 00:01:38,370 we continue doing in the semester, even as we transition 13 00:01:38,370 --> 00:01:40,140 to so-called higher level languages. 14 00:01:40,140 --> 00:01:43,260 Next week, indeed, we'll introduce Python, a very popular language that 15 00:01:43,260 --> 00:01:45,660 does not have pointers, does not have memory 16 00:01:45,660 --> 00:01:47,220 management at this very low level. 17 00:01:47,220 --> 00:01:49,290 But that's really just because someone else wrote 18 00:01:49,290 --> 00:01:50,880 the code that will do that for you. 19 00:01:50,880 --> 00:01:52,553 And it's going to make our lives easier. 20 00:01:52,553 --> 00:01:55,470 Because it means when you want to solve a problem conceptually up here 21 00:01:55,470 --> 00:01:58,140 to just get real work done or build something amazing, 22 00:01:58,140 --> 00:02:00,240 you don't have to really get into the same weeds 23 00:02:00,240 --> 00:02:03,390 as we have been deliberately this week and now last. 24 00:02:03,390 --> 00:02:06,600 But the goal ultimately is that you better understand and can better 25 00:02:06,600 --> 00:02:10,560 harness then all that a computer can do even in those higher level languages. 26 00:02:10,560 --> 00:02:12,480 So today, we're going to focus particularly 27 00:02:12,480 --> 00:02:16,170 on data structures, how you might structure your data in memory, which 28 00:02:16,170 --> 00:02:19,500 really amounts to building things digitally, stitching together 29 00:02:19,500 --> 00:02:22,405 ideas and concepts in memory using this new building 30 00:02:22,405 --> 00:02:24,780 block from last week, which of course are these pointers. 31 00:02:24,780 --> 00:02:28,680 Pointers allow you to store addresses in memory, like in variables. 32 00:02:28,680 --> 00:02:32,040 But with those simple addresses we can leave these breadcrumbs. 33 00:02:32,040 --> 00:02:33,600 We can point from here to there. 34 00:02:33,600 --> 00:02:36,792 And we can conceptually stitch pieces of data together. 35 00:02:36,792 --> 00:02:39,000 But there's going to be different ways of doing that. 36 00:02:39,000 --> 00:02:41,010 And today we'll focus first on what's generally 37 00:02:41,010 --> 00:02:42,960 known as an abstract data type. 38 00:02:42,960 --> 00:02:46,110 And similar to a type in C more generally, it does actually 39 00:02:46,110 --> 00:02:47,250 have some properties in it. 40 00:02:47,250 --> 00:02:50,520 But the underlying implementation details of an abstract data type 41 00:02:50,520 --> 00:02:52,170 are ultimately up to you. 42 00:02:52,170 --> 00:02:55,080 That is to say, an abstract data type can represent one thing 43 00:02:55,080 --> 00:02:56,310 and can do something. 44 00:02:56,310 --> 00:03:00,420 But how you implement it allows you some discretion underneath the hood. 45 00:03:00,420 --> 00:03:03,210 So for instance, in the world of computer science, 46 00:03:03,210 --> 00:03:05,550 a queue is actually a technical term. 47 00:03:05,550 --> 00:03:09,360 This is a type of data structure that we could theoretically build in code 48 00:03:09,360 --> 00:03:11,250 in C or really any other language. 49 00:03:11,250 --> 00:03:14,800 But a queue has properties just like queues in the real world. 50 00:03:14,800 --> 00:03:16,950 For instance, if you've ever lined up for something 51 00:03:16,950 --> 00:03:21,060 to get food in a restaurant, or go into a store, 52 00:03:21,060 --> 00:03:24,540 or wait for the airport to clear you, well, you've lined up in a queue. 53 00:03:24,540 --> 00:03:26,250 A queue being some sort of line. 54 00:03:26,250 --> 00:03:30,030 But what's noteworthy about queues are specific properties. 55 00:03:30,030 --> 00:03:34,920 They are first in, first out data structures, either virtually 56 00:03:34,920 --> 00:03:36,030 or in the human world. 57 00:03:36,030 --> 00:03:38,370 Which is to say the first person in the line 58 00:03:38,370 --> 00:03:40,657 should ideally be served first at the restaurant. 59 00:03:40,657 --> 00:03:43,740 Or the first person in the line should get through airport security first. 60 00:03:43,740 --> 00:03:46,650 By contrast, if it weren't first in first out, 61 00:03:46,650 --> 00:03:48,930 you can imagine how frustrated you would be 62 00:03:48,930 --> 00:03:51,223 if you have this inherent unfairness. 63 00:03:51,223 --> 00:03:54,390 In fact, if you've ever been in line at a store, a supermarket, or the like, 64 00:03:54,390 --> 00:03:56,460 and all of a sudden they maybe open a new line. 65 00:03:56,460 --> 00:03:59,910 And now the person behind you gets to cut ahead and go forward. 66 00:03:59,910 --> 00:04:02,730 That's because they've broken the concept of the queue. 67 00:04:02,730 --> 00:04:06,750 So it has this inherent potential for unfairness unless you maintain 68 00:04:06,750 --> 00:04:09,310 this first in first out property. 69 00:04:09,310 --> 00:04:12,570 This would be true too for a to-do list, just for productivity's sake. 70 00:04:12,570 --> 00:04:16,815 If you're in the habit on paper or virtually making a to-do list, ideally 71 00:04:16,815 --> 00:04:18,690 you probably want to go through that list top 72 00:04:18,690 --> 00:04:21,120 to bottom so that you actually get the first stuff you 73 00:04:21,120 --> 00:04:25,140 thought of done first rather than always focusing on your most recent thought. 74 00:04:25,140 --> 00:04:28,230 Now, within the world of queues, there's generally 75 00:04:28,230 --> 00:04:31,300 two operations, two functions, if you will, 76 00:04:31,300 --> 00:04:34,290 that any queue would have either in the real world or in the virtual. 77 00:04:34,290 --> 00:04:38,280 Enqueue is usually the technical term to mean adding something to a queue. 78 00:04:38,280 --> 00:04:41,610 But specifically, it means adding it to the end of the queue 79 00:04:41,610 --> 00:04:44,610 so that someone isn't cutting in line, for instance, to go up front. 80 00:04:44,610 --> 00:04:46,380 And then dequeue is just the opposite. 81 00:04:46,380 --> 00:04:48,480 When it's time for the first person in line 82 00:04:48,480 --> 00:04:51,970 to be served, the time for the first person in line to go through security, 83 00:04:51,970 --> 00:04:54,040 they are dequeued, so to speak. 84 00:04:54,040 --> 00:04:57,600 So a technical concept, ultimately, as it's implemented in code. 85 00:04:57,600 --> 00:04:59,760 But it's really just a real-world concept. 86 00:04:59,760 --> 00:05:02,670 And these are in contrast to another abstract data type 87 00:05:02,670 --> 00:05:04,560 that we might call a stack. 88 00:05:04,560 --> 00:05:08,370 And a stack, much like a stack of trays in the cafeteria 89 00:05:08,370 --> 00:05:10,698 has fundamentally different properties. 90 00:05:10,698 --> 00:05:13,740 You can still add and remove things from them, but consider what happens. 91 00:05:13,740 --> 00:05:16,980 Whenever they clean all of the trays in the cafeteria or the dining hall, 92 00:05:16,980 --> 00:05:20,410 they put one of the trays down here and then the next one on top of it. 93 00:05:20,410 --> 00:05:22,810 And then the next one on top of it, it, it, and so forth. 94 00:05:22,810 --> 00:05:24,810 But of course, which tray do you presumably 95 00:05:24,810 --> 00:05:28,740 take as a user of that physical stack? 96 00:05:28,740 --> 00:05:30,662 The top one, presumably. 97 00:05:30,662 --> 00:05:33,120 You're not going to futz down here and try to pull one out. 98 00:05:33,120 --> 00:05:35,328 And so, that would seem to have an opposite property. 99 00:05:35,328 --> 00:05:40,292 LIFO, last in first out is what characterizes something like a stack. 100 00:05:40,292 --> 00:05:42,750 And that just makes sense, certainly, in the physical world 101 00:05:42,750 --> 00:05:45,083 of stacking all of those cafeteria trays because it just 102 00:05:45,083 --> 00:05:47,730 makes more sense to grab the most recently 103 00:05:47,730 --> 00:05:50,580 added one, the last added one first. 104 00:05:50,580 --> 00:05:53,070 And at least in that context, the trays don't necessarily 105 00:05:53,070 --> 00:05:54,905 care what order they're used in. 106 00:05:54,905 --> 00:05:57,030 But even then, you could imagine that maybe there's 107 00:05:57,030 --> 00:05:59,910 some old dirty nasty tray at the very bottom that 108 00:05:59,910 --> 00:06:03,330 never gets used because you're constantly replenishing the stack. 109 00:06:03,330 --> 00:06:07,200 So there might very well be side effects of these kinds of features. 110 00:06:07,200 --> 00:06:09,990 You might be familiar with using Gmail for instance, or really 111 00:06:09,990 --> 00:06:11,340 any email program. 112 00:06:11,340 --> 00:06:14,710 What you're looking at in your inbox is technically a stack, 113 00:06:14,710 --> 00:06:16,710 at least if you've left the defaults configured. 114 00:06:16,710 --> 00:06:17,210 Why? 115 00:06:17,210 --> 00:06:18,180 You get a new email? 116 00:06:18,180 --> 00:06:19,080 Where does it end up? 117 00:06:19,080 --> 00:06:22,600 Not five pages of emails later, presumably right at the top. 118 00:06:22,600 --> 00:06:25,600 And the next one's right at the top, right at the top, right at the top. 119 00:06:25,600 --> 00:06:27,850 And so, if you're like me, you're guilty of eventually 120 00:06:27,850 --> 00:06:29,110 losing track of some emails. 121 00:06:29,110 --> 00:06:29,610 Why? 122 00:06:29,610 --> 00:06:32,430 Because you've pushed so many more emails onto the stack 123 00:06:32,430 --> 00:06:35,730 that you lose track of the things you got earliest. 124 00:06:35,730 --> 00:06:37,543 Last in first out though, is maintained. 125 00:06:37,543 --> 00:06:40,710 The most recent email you get might very well be the first one you reply to. 126 00:06:40,710 --> 00:06:43,950 But that's not necessarily good for responsiveness to everyone 127 00:06:43,950 --> 00:06:45,160 else out there. 128 00:06:45,160 --> 00:06:50,700 Similarly, if you store all of your sweaters in a stack like this, 129 00:06:50,700 --> 00:06:53,610 like a pile of black ones, below which is a red and then a blue. 130 00:06:53,610 --> 00:06:56,160 Stack might be perfectly fine for keeping things organized. 131 00:06:56,160 --> 00:06:59,490 It's the sane way to do it if you just have a shelf in your dorm room or home. 132 00:06:59,490 --> 00:07:02,160 But what's going to be a side effect of using a stack 133 00:07:02,160 --> 00:07:06,270 to store your sweaters if there are these in this way as opposed 134 00:07:06,270 --> 00:07:07,770 to a queue? 135 00:07:07,770 --> 00:07:08,790 Yeah. 136 00:07:08,790 --> 00:07:10,275 AUDIENCE: It's harder to get the red and blue ones? 137 00:07:10,275 --> 00:07:11,830 DAVID J. MALAN: It's harder to get the red and blue ones. 138 00:07:11,830 --> 00:07:15,010 So presumably you're going to much more often wear, for instance, 139 00:07:15,010 --> 00:07:16,950 if you will, black instead there. 140 00:07:16,950 --> 00:07:19,920 Now, the operations for adding things to a stack 141 00:07:19,920 --> 00:07:22,920 are similar in spirit, but just different vocabulary. 142 00:07:22,920 --> 00:07:25,950 You push something on top of a stack. 143 00:07:25,950 --> 00:07:28,920 Even though it's more like in the tray world you place it or rest it. 144 00:07:28,920 --> 00:07:31,410 But pushing means adding something to the top of the stack. 145 00:07:31,410 --> 00:07:36,370 And popping means removing something also from the top of the stack. 146 00:07:36,370 --> 00:07:40,590 So it's not a matter of enqueuing at the end and dequeuing at the beginning. 147 00:07:40,590 --> 00:07:42,960 With a stack, everything's happening on top. 148 00:07:42,960 --> 00:07:46,140 You're pushing onto the top and then popping off of the top. 149 00:07:46,140 --> 00:07:49,480 Now, when it comes to actual code, how might we implement something like this? 150 00:07:49,480 --> 00:07:51,600 Well, let's just focus on really how you might 151 00:07:51,600 --> 00:07:53,250 implement the data structure itself. 152 00:07:53,250 --> 00:07:54,870 And we won't implement any functions. 153 00:07:54,870 --> 00:07:57,360 You might implement the notion of a stack like this. 154 00:07:57,360 --> 00:07:58,620 We've seen typedef before. 155 00:07:58,620 --> 00:08:00,930 It just means define a new type of my own. 156 00:08:00,930 --> 00:08:03,430 Struct means here comes a structure, a.k.a. 157 00:08:03,430 --> 00:08:06,780 a data structure of one or more variables within. 158 00:08:06,780 --> 00:08:11,610 And let's suppose, like last time, we've had-- we defined already a person data 159 00:08:11,610 --> 00:08:13,110 type using a separate typedef. 160 00:08:13,110 --> 00:08:15,870 And every person has a name and a number or whatever. 161 00:08:15,870 --> 00:08:18,330 Let me just stipulate that person exists already. 162 00:08:18,330 --> 00:08:20,670 So here, you might have to implement a stack, 163 00:08:20,670 --> 00:08:24,390 an array called people of some capacity. 164 00:08:24,390 --> 00:08:28,450 Maybe it's an array of size 10, or 100, or whatever. 165 00:08:28,450 --> 00:08:30,810 This is a constant here in this context, capacity. 166 00:08:30,810 --> 00:08:34,320 And every element in that array is a person structure. 167 00:08:34,320 --> 00:08:37,270 And I now have this too, size. 168 00:08:37,270 --> 00:08:39,990 Now, this almost seems like a synonym for capacity. 169 00:08:39,990 --> 00:08:44,010 But maybe intuitively, anyone want to propose why I'm apparently 170 00:08:44,010 --> 00:08:47,010 maintaining separately the capacity of the stack 171 00:08:47,010 --> 00:08:49,440 but also the size of the stack? 172 00:08:49,440 --> 00:08:52,720 Why these two distinctions here? 173 00:08:52,720 --> 00:08:53,800 Yeah. 174 00:08:53,800 --> 00:08:57,515 AUDIENCE: Capacity is how high the stack could be. 175 00:08:57,515 --> 00:08:58,390 DAVID J. MALAN: Yeah. 176 00:08:58,390 --> 00:09:01,650 AUDIENCE: Size is how high it actually is right now. 177 00:09:01,650 --> 00:09:02,650 DAVID J. MALAN: Perfect. 178 00:09:02,650 --> 00:09:05,080 The capacity is how high the stack could be, 179 00:09:05,080 --> 00:09:09,550 like how much room is there for those sweatshirts in my closet, for instance. 180 00:09:09,550 --> 00:09:12,590 Whereas size is just literally at this moment in time, 181 00:09:12,590 --> 00:09:14,440 how many sweatshirts are in the stack. 182 00:09:14,440 --> 00:09:18,260 It's either capacity or fewer presumably in total there. 183 00:09:18,260 --> 00:09:19,450 So what is capacity? 184 00:09:19,450 --> 00:09:22,360 Well, we could implement this in perhaps a familiar way. 185 00:09:22,360 --> 00:09:25,720 I might just define a const somewhere else in my code of type int 186 00:09:25,720 --> 00:09:28,390 that just defines it to be capacity 50. 187 00:09:28,390 --> 00:09:30,550 But what perhaps is going to be the downside 188 00:09:30,550 --> 00:09:34,030 of implementing a stack in this way? 189 00:09:34,030 --> 00:09:37,120 Of using an array inside here? 190 00:09:37,120 --> 00:09:42,190 What's a downside now of implementing a stack using an array and this size 191 00:09:42,190 --> 00:09:45,020 variable within. 192 00:09:45,020 --> 00:09:48,320 What's a caveat here perhaps? 193 00:09:48,320 --> 00:09:48,900 Any hands? 194 00:09:48,900 --> 00:09:49,400 Yeah. 195 00:09:49,400 --> 00:09:51,372 AUDIENCE: [INAUDIBLE] 196 00:09:51,372 --> 00:09:51,600 197 00:09:51,600 --> 00:09:53,350 DAVID J. MALAN: So it's going to be harder 198 00:09:53,350 --> 00:09:55,142 to reach elements that aren't the last one. 199 00:09:55,142 --> 00:09:56,740 That is the most recently added one. 200 00:09:56,740 --> 00:09:59,600 So there could be some sweatshirts, so to speak, way down below. 201 00:09:59,600 --> 00:10:01,120 So we've seen that before too. 202 00:10:01,120 --> 00:10:03,640 But at some point too a limitation of this design 203 00:10:03,640 --> 00:10:05,740 is what it's going to be finite. 204 00:10:05,740 --> 00:10:09,580 I can maximally fit in this example 50 sweatshirts, or 50 emails, 205 00:10:09,580 --> 00:10:11,800 or 50 cafeteria trays, which is fine. 206 00:10:11,800 --> 00:10:13,600 But at least it's indeed finite. 207 00:10:13,600 --> 00:10:15,370 And at least in the computer's memory, it 208 00:10:15,370 --> 00:10:19,010 might be nice to use more, and more, and more, maybe as more things are 209 00:10:19,010 --> 00:10:20,260 getting added to the computer. 210 00:10:20,260 --> 00:10:21,910 So maybe I make this 500. 211 00:10:21,910 --> 00:10:24,970 Or heck, why don't I make it 5,000 or 50,000? 212 00:10:24,970 --> 00:10:26,680 Well, what's the trade-off there? 213 00:10:26,680 --> 00:10:30,040 If I want to have enough room to grow, seems 214 00:10:30,040 --> 00:10:33,370 like I should just crank up the value of capacity endlessly. 215 00:10:33,370 --> 00:10:39,480 But why might I not want to change the 50 to 500, or 5,000, or 50,000? 216 00:10:39,480 --> 00:10:42,630 What's the trade-off there perhaps just intuitively? 217 00:10:42,630 --> 00:10:43,696 Yeah. 218 00:10:43,696 --> 00:10:47,170 AUDIENCE: Because I don't want to touch memory that I'm not supposed to. 219 00:10:47,170 --> 00:10:48,280 DAVID J. MALAN: You don't want to touch memory that you're not 220 00:10:48,280 --> 00:10:49,480 supposed to be touching. 221 00:10:49,480 --> 00:10:52,810 And in this case, it wouldn't be-- that wouldn't be a risk per se 222 00:10:52,810 --> 00:10:54,650 unless you indeed overflow the stack. 223 00:10:54,650 --> 00:10:58,060 But there's a related issue in asking for that much memory. 224 00:10:58,060 --> 00:11:00,520 What would another downside be? 225 00:11:00,520 --> 00:11:06,015 AUDIENCE: You only have one element and you have [INAUDIBLE] 5,000 [INAUDIBLE].. 226 00:11:06,015 --> 00:11:06,890 DAVID J. MALAN: Yeah. 227 00:11:06,890 --> 00:11:07,910 [LAUGHS] Exactly. 228 00:11:07,910 --> 00:11:10,160 So if you've got a capacity of 5,000 but you're only 229 00:11:10,160 --> 00:11:14,030 using one of those elements, it's awkward to say it non-technically. 230 00:11:14,030 --> 00:11:16,310 Which is just to say very, very wasteful. 231 00:11:16,310 --> 00:11:17,450 That's just bad design. 232 00:11:17,450 --> 00:11:19,760 It's correct, it will work for up to 5,000 elements. 233 00:11:19,760 --> 00:11:23,030 But my gosh, you're wasting 4,999 extra spots. 234 00:11:23,030 --> 00:11:25,280 And that's not going to end well, especially if you're 235 00:11:25,280 --> 00:11:27,170 using more data structures in memory. 236 00:11:27,170 --> 00:11:29,720 Your Mac, your PC, your phone is surely going to run out 237 00:11:29,720 --> 00:11:31,220 of memory if you ask for that much. 238 00:11:31,220 --> 00:11:33,830 So it'd be nice if there was a bit more dynamism there, 239 00:11:33,830 --> 00:11:36,770 whether it's a stack or a queue, both of which 240 00:11:36,770 --> 00:11:39,080 might be implemented a little similarly in spirit. 241 00:11:39,080 --> 00:11:43,700 But let's conclude this abstraction by comparing-- thanks to a friend of ours, 242 00:11:43,700 --> 00:11:46,880 Professor Shannon Duvall of Elon University, who kindly put together 243 00:11:46,880 --> 00:11:50,030 this graphical animation that's just under two minutes long that paints 244 00:11:50,030 --> 00:11:52,640 a picture of these two types of abstract data structures. 245 00:11:52,640 --> 00:11:57,050 And then we'll dive in underneath to how we might implement problems like these. 246 00:11:57,050 --> 00:11:58,990 If we could dim the lights dramatically. 247 00:11:58,990 --> 00:11:59,657 [VIDEO PLAYBACK] 248 00:11:59,657 --> 00:12:01,220 [MUSIC PLAYING] 249 00:12:01,220 --> 00:12:04,010 250 00:12:04,010 --> 00:12:06,920 - Once upon a time, there was a guy named Jack. 251 00:12:06,920 --> 00:12:10,250 When it came to making friends, Jack did not have the knack. 252 00:12:10,250 --> 00:12:13,250 So Jack went to talk to the most popular guy he knew. 253 00:12:13,250 --> 00:12:15,920 He went up to Lou and asked, what do I do? 254 00:12:15,920 --> 00:12:18,410 Lou saw that his friend was really distressed. 255 00:12:18,410 --> 00:12:21,020 Well, Lou began, just look how you're dressed. 256 00:12:21,020 --> 00:12:23,630 Don't you have any clothes with a different look? 257 00:12:23,630 --> 00:12:24,770 Yes, said Jack. 258 00:12:24,770 --> 00:12:25,820 I sure do. 259 00:12:25,820 --> 00:12:28,250 Come to my house and I'll show them to you. 260 00:12:28,250 --> 00:12:29,540 So they went off to Jack's. 261 00:12:29,540 --> 00:12:33,290 And Jack showed Lou the box where he kept all his shirts, and his pants, 262 00:12:33,290 --> 00:12:34,250 at his socks. 263 00:12:34,250 --> 00:12:37,280 Lou said, I see you have all your clothes in a pile. 264 00:12:37,280 --> 00:12:39,800 Why don't you wear some others once in a while? 265 00:12:39,800 --> 00:12:43,110 Jack said, well, when I remove clothes and socks, 266 00:12:43,110 --> 00:12:45,710 I wash them and put them away in the box. 267 00:12:45,710 --> 00:12:48,260 Then comes the next morning and up I hop. 268 00:12:48,260 --> 00:12:51,440 I go to the box and get my clothes off the top. 269 00:12:51,440 --> 00:12:54,050 Lou quickly realized the problem with Jack. 270 00:12:54,050 --> 00:12:57,020 He kept clothes, CDs, and books in a stack. 271 00:12:57,020 --> 00:12:59,480 When he'd reach for something to read or to wear, 272 00:12:59,480 --> 00:13:02,030 he chose a top book or underwear. 273 00:13:02,030 --> 00:13:04,460 Then when he was done he would put it right back. 274 00:13:04,460 --> 00:13:06,980 Back it would go, on top of the stack. 275 00:13:06,980 --> 00:13:09,440 I know the solution, said a triumphant Lou. 276 00:13:09,440 --> 00:13:11,990 You need to learn to start using a queue. 277 00:13:11,990 --> 00:13:14,840 Lou took Jack's clothes and hung them in a closet. 278 00:13:14,840 --> 00:13:17,630 And when he had emptied the box, he just tossed it. 279 00:13:17,630 --> 00:13:21,560 Then he said, now, Jack, at the end of the day, put your clothes on the left 280 00:13:21,560 --> 00:13:23,010 when you put them away. 281 00:13:23,010 --> 00:13:24,770 Then tomorrow morning when you see the sun 282 00:13:24,770 --> 00:13:28,430 shine, get your clothes from the right from the end of the line. 283 00:13:28,430 --> 00:13:29,930 Don't you see? said Lou. 284 00:13:29,930 --> 00:13:31,340 It will be so nice. 285 00:13:31,340 --> 00:13:34,670 You'll wear everything once before you wear something twice. 286 00:13:34,670 --> 00:13:37,610 And with everything in queues in his closet and shelf, 287 00:13:37,610 --> 00:13:40,220 Jack started to feel quite sure of himself. 288 00:13:40,220 --> 00:13:44,603 All thanks to Lou and his wonderful queue. 289 00:13:44,603 --> 00:13:45,477 [END PLAYBACK] 290 00:13:45,477 --> 00:13:46,560 DAVID J. MALAN: All right. 291 00:13:46,560 --> 00:13:46,890 [LAUGHS] 292 00:13:46,890 --> 00:13:47,557 AUDIENCE: Great! 293 00:13:47,557 --> 00:13:48,600 [APPLAUSE] 294 00:13:48,600 --> 00:13:49,890 DAVID J. MALAN: Sure. 295 00:13:49,890 --> 00:13:52,558 So that paints a picture of these two abstract data structures. 296 00:13:52,558 --> 00:13:54,600 But if we really were to dive underneath the hood 297 00:13:54,600 --> 00:13:56,850 we could implement them in a number of different ways. 298 00:13:56,850 --> 00:13:59,682 But we really, I think, need some building blocks via which 299 00:13:59,682 --> 00:14:01,140 we could solve problems like those. 300 00:14:01,140 --> 00:14:03,220 And we'll see today too, some others as well. 301 00:14:03,220 --> 00:14:06,983 So let's rewind back to week two when we implement-- we introduced you 302 00:14:06,983 --> 00:14:08,400 to your very first data structure. 303 00:14:08,400 --> 00:14:09,330 That is an array. 304 00:14:09,330 --> 00:14:12,630 And an array, recall, was just a chunk of memory 305 00:14:12,630 --> 00:14:16,500 whereby elements were stored by design back to back to back. 306 00:14:16,500 --> 00:14:19,800 It's an array of contiguous memory specifically. 307 00:14:19,800 --> 00:14:23,400 So with an array, we could certainly store not just one thing but two, 308 00:14:23,400 --> 00:14:25,210 or three, or even more. 309 00:14:25,210 --> 00:14:29,280 And so, for instance, if we treat my computer's memory as this abstraction 310 00:14:29,280 --> 00:14:33,720 here, and pictured here are three bytes or some multiplication thereof, 311 00:14:33,720 --> 00:14:37,230 suppose we're storing in the computer's memory an array of size three, 312 00:14:37,230 --> 00:14:39,540 storing the digits one, two, and three. 313 00:14:39,540 --> 00:14:43,170 Well, remember, that if we zoom out per last week, 314 00:14:43,170 --> 00:14:44,880 there's other stuff going on in memory. 315 00:14:44,880 --> 00:14:48,000 So even if we want to add another number to this array 316 00:14:48,000 --> 00:14:51,360 that we didn't think of when we first started the program, like the number 317 00:14:51,360 --> 00:14:54,340 four, ideally we would put it right here next to it. 318 00:14:54,340 --> 00:14:55,900 Otherwise it's no longer an array. 319 00:14:55,900 --> 00:14:57,650 So by definition, if we're using an array, 320 00:14:57,650 --> 00:14:59,940 it's got to end up right there after the three. 321 00:14:59,940 --> 00:15:02,550 But what else is going on inside of your computer's memory? 322 00:15:02,550 --> 00:15:04,740 Well, assuming your program is of any length, 323 00:15:04,740 --> 00:15:06,870 and you've got other variables, other functions, 324 00:15:06,870 --> 00:15:09,420 you've been running it for a while, there's a lot going on. 325 00:15:09,420 --> 00:15:11,470 And your memory's being used and reused. 326 00:15:11,470 --> 00:15:13,680 So for instance, somewhere in memory might 327 00:15:13,680 --> 00:15:18,200 be immediately adjacent to this, Hello, comma, world, backslash zero, 328 00:15:18,200 --> 00:15:19,950 the null character, just because maybe you 329 00:15:19,950 --> 00:15:22,230 have another variable somewhere in there that 330 00:15:22,230 --> 00:15:25,110 is storing that particular string alongside your existing 331 00:15:25,110 --> 00:15:26,183 array of size three. 332 00:15:26,183 --> 00:15:28,350 And all of these Oscar the Grouches here really just 333 00:15:28,350 --> 00:15:31,110 represent what we called last week garbage values. 334 00:15:31,110 --> 00:15:33,660 There's obviously bits there because they don't disappear. 335 00:15:33,660 --> 00:15:36,600 They're always going to be inside of the computer somehow implemented. 336 00:15:36,600 --> 00:15:38,700 But we don't really know or care what they are. 337 00:15:38,700 --> 00:15:40,500 They're the remnants of those bytes having 338 00:15:40,500 --> 00:15:44,440 been used for other older variables, previous function calls, or the like. 339 00:15:44,440 --> 00:15:47,970 But the problem clearly here is that, OK, one, two, three is there. 340 00:15:47,970 --> 00:15:48,900 But the H is here. 341 00:15:48,900 --> 00:15:52,380 And unless I want to start taking a bite out of my string 342 00:15:52,380 --> 00:15:57,060 by overriding the H with a four, we just can't fit it right there. 343 00:15:57,060 --> 00:15:59,440 And yet, even though there's Oscars all over the place, 344 00:15:59,440 --> 00:16:00,773 those are indeed garbage values. 345 00:16:00,773 --> 00:16:04,132 And therefore, we could use that space because it's technically unused. 346 00:16:04,132 --> 00:16:06,090 We just don't know or care what the values are. 347 00:16:06,090 --> 00:16:08,940 So where could I put one, two, three, four? 348 00:16:08,940 --> 00:16:09,660 Well, my gosh. 349 00:16:09,660 --> 00:16:12,420 I have all this memory down here that's unused. 350 00:16:12,420 --> 00:16:16,830 I could certainly change those garbage values to be one, two, three, four. 351 00:16:16,830 --> 00:16:21,720 But to do that, I might need to do a bit of work here. 352 00:16:21,720 --> 00:16:25,230 It's not just a matter of just saying boom and it happens. 353 00:16:25,230 --> 00:16:28,680 Now with C and with code, I'd have to do this a little more methodically. 354 00:16:28,680 --> 00:16:31,410 So let me abstract away everything else that's a distraction. 355 00:16:31,410 --> 00:16:35,208 Let me assume that there is indeed at least four bytes available for numbers, 356 00:16:35,208 --> 00:16:38,250 just down here that we could have put them in a bunch of different spots. 357 00:16:38,250 --> 00:16:44,230 What's involved now in moving the one, two, three to this new chunk of memory 358 00:16:44,230 --> 00:16:45,423 so we can add the four? 359 00:16:45,423 --> 00:16:47,340 Well, I think conceptually we're going to have 360 00:16:47,340 --> 00:16:49,590 to copy the one from old to new. 361 00:16:49,590 --> 00:16:51,240 Copy the two from old to new. 362 00:16:51,240 --> 00:16:53,130 Copy the three from old to new. 363 00:16:53,130 --> 00:16:56,070 And then ultimately, we can get rid of the old memory. 364 00:16:56,070 --> 00:16:59,160 Those three original bytes could now look like Oscar the Grouch 365 00:16:59,160 --> 00:17:01,770 and just be garbage values for all intents and purposes. 366 00:17:01,770 --> 00:17:06,630 But now I have room for a fourth byte wherein I can put the number four. 367 00:17:06,630 --> 00:17:08,400 So this is nice. 368 00:17:08,400 --> 00:17:12,450 But what's a downside of this approach? 369 00:17:12,450 --> 00:17:15,780 What's a downside of solving the problem in this way, where the problem at hand 370 00:17:15,780 --> 00:17:20,220 is just to grow the array, so to speak, to increase its size, 371 00:17:20,220 --> 00:17:23,750 to fit one or more numbers? 372 00:17:23,750 --> 00:17:26,240 Seems pretty straight forward. 373 00:17:26,240 --> 00:17:27,615 But yeah. 374 00:17:27,615 --> 00:17:29,198 AUDIENCE: That it's just out of order. 375 00:17:29,198 --> 00:17:31,653 You move the one, two, three, four to the very bottom. 376 00:17:31,653 --> 00:17:33,320 DAVID J. MALAN: Maybe it's out of order. 377 00:17:33,320 --> 00:17:36,445 But I think that's OK because the order is just matters that it's relative. 378 00:17:36,445 --> 00:17:38,525 So so long it's still contiguous back, to back, 379 00:17:38,525 --> 00:17:41,150 to back in a different chunk of memory, I think we're OK there. 380 00:17:41,150 --> 00:17:43,790 It's not like I changed it from four, three, two, one. 381 00:17:43,790 --> 00:17:44,960 But a reasonable hunch. 382 00:17:44,960 --> 00:17:45,855 Yeah. 383 00:17:45,855 --> 00:17:47,635 AUDIENCE: [INAUDIBLE] 384 00:17:47,635 --> 00:17:48,970 385 00:17:48,970 --> 00:17:50,020 DAVID J. MALAN: Yeah. 386 00:17:50,020 --> 00:17:51,490 I didn't really plan ahead here. 387 00:17:51,490 --> 00:17:53,990 If I have to add another number, like five or anything else, 388 00:17:53,990 --> 00:17:56,198 well, I might have to jump through these hoops again. 389 00:17:56,198 --> 00:17:58,240 Maybe I get lucky and maybe there's space there. 390 00:17:58,240 --> 00:18:00,850 But not if I have other variables and other things going on, 391 00:18:00,850 --> 00:18:03,410 that too might be used at some point. 392 00:18:03,410 --> 00:18:04,420 Other thoughts? 393 00:18:04,420 --> 00:18:05,020 Yeah. 394 00:18:05,020 --> 00:18:06,230 AUDIENCE: Slow efficiency. 395 00:18:06,230 --> 00:18:08,600 DAVID J. MALAN: Slow efficiency, why? 396 00:18:08,600 --> 00:18:10,340 AUDIENCE: [INAUDIBLE] 397 00:18:10,340 --> 00:18:10,815 398 00:18:10,815 --> 00:18:11,690 DAVID J. MALAN: Yeah. 399 00:18:11,690 --> 00:18:13,920 I mean, it's just inefficient. 400 00:18:13,920 --> 00:18:16,430 It's bad design arguably. 401 00:18:16,430 --> 00:18:17,030 Why? 402 00:18:17,030 --> 00:18:20,398 Because I had a copy all of my original work down here. 403 00:18:20,398 --> 00:18:22,440 And as you note, if I want to add a fifth number, 404 00:18:22,440 --> 00:18:24,815 I'm going to have to copy it again, and again, and again. 405 00:18:24,815 --> 00:18:27,770 And do things n times again and again. 406 00:18:27,770 --> 00:18:29,600 Now, maybe that's necessary. 407 00:18:29,600 --> 00:18:30,640 We'll soon see for sure. 408 00:18:30,640 --> 00:18:33,890 But it feels like this is not going to end well, especially if the array isn't 409 00:18:33,890 --> 00:18:36,440 of size three or four but 300, 400. 410 00:18:36,440 --> 00:18:40,490 Your computer ends up spending so much time just spinning its wheels. 411 00:18:40,490 --> 00:18:42,410 I mean, honestly, better might be this. 412 00:18:42,410 --> 00:18:46,430 If this is my same array physically incarnated now, one, two, three, 413 00:18:46,430 --> 00:18:48,120 it's literally on the edge of the shelf. 414 00:18:48,120 --> 00:18:50,810 So there's no room for the number four. 415 00:18:50,810 --> 00:18:53,660 Maybe where we could take this story is, well, let's just 416 00:18:53,660 --> 00:18:55,310 find room for the four. 417 00:18:55,310 --> 00:18:58,220 Let's just put the four, for instance, over here, 418 00:18:58,220 --> 00:19:01,790 replacing some available garbage value, some spare byte over here. 419 00:19:01,790 --> 00:19:03,080 But now, wait a minute. 420 00:19:03,080 --> 00:19:06,140 I've broken the definition of an array. 421 00:19:06,140 --> 00:19:08,580 I can't have one, two, three and then four over here. 422 00:19:08,580 --> 00:19:12,720 So maybe there maybe could be a mechanism if I put this thing on again, 423 00:19:12,720 --> 00:19:14,990 where when you get to the end of the existing elements 424 00:19:14,990 --> 00:19:18,980 maybe I just somehow digitally point to the fourth array. 425 00:19:18,980 --> 00:19:22,700 And maybe we can stitch together all of these different values in memory 426 00:19:22,700 --> 00:19:25,220 so that if you follow the arrows, so to speak, 427 00:19:25,220 --> 00:19:28,310 we can reconstruct exactly what the order is even 428 00:19:28,310 --> 00:19:32,030 without having to find or make room here or pick up all of these numbers 429 00:19:32,030 --> 00:19:33,990 and move all of them over there. 430 00:19:33,990 --> 00:19:36,480 So that's perhaps the direction in which we'll go here. 431 00:19:36,480 --> 00:19:41,220 So let's see how we might get to that spot as follows. 432 00:19:41,220 --> 00:19:43,770 Let me go ahead and open up, say, VS Code here. 433 00:19:43,770 --> 00:19:48,110 Let me open up a program called list.c in my terminal. 434 00:19:48,110 --> 00:19:51,740 And let me go ahead and whip up a relatively simple program 435 00:19:51,740 --> 00:19:54,320 that just demonstrates what we did back in week two 436 00:19:54,320 --> 00:19:56,400 when we introduced arrays as follows. 437 00:19:56,400 --> 00:20:00,590 Let me include stdio.h so we can print stuff out. 438 00:20:00,590 --> 00:20:02,480 Let me do int main void. 439 00:20:02,480 --> 00:20:04,310 No command line arguments for now. 440 00:20:04,310 --> 00:20:10,105 Let me give myself an array called list of size three. 441 00:20:10,105 --> 00:20:12,980 And I'll just hard-code it to keep it simple for lecture's sake, each 442 00:20:12,980 --> 00:20:14,438 of which is going to be an integer. 443 00:20:14,438 --> 00:20:16,970 And now, just so we have some specifics to talk about, 444 00:20:16,970 --> 00:20:19,670 let me put it list bracket zero the number one, 445 00:20:19,670 --> 00:20:23,570 list bracket one the number two, and list bracket two the number three. 446 00:20:23,570 --> 00:20:27,680 So I'm just translating into code what we just had pictorially on the screen 447 00:20:27,680 --> 00:20:30,740 and also physically here with these numbers on the desk. 448 00:20:30,740 --> 00:20:33,570 Now, let's just do something mildly useful for this. 449 00:20:33,570 --> 00:20:38,750 How about we do four int i get zero, i is less than three, i plus, plus. 450 00:20:38,750 --> 00:20:41,030 Let's just print each of these numbers out just 451 00:20:41,030 --> 00:20:43,250 to make sure they're indeed in memory as I intended. 452 00:20:43,250 --> 00:20:47,240 So percent i backslash n comma i and then a semicolon. 453 00:20:47,240 --> 00:20:48,630 And I think that's it for now. 454 00:20:48,630 --> 00:20:49,820 So nothing interesting. 455 00:20:49,820 --> 00:20:51,290 No problem solved just yet. 456 00:20:51,290 --> 00:20:52,470 Just a proof of concept. 457 00:20:52,470 --> 00:20:54,710 So that now when I clear my terminal and run 458 00:20:54,710 --> 00:20:59,090 make list no apparent errors at the terminal. 459 00:20:59,090 --> 00:21:02,360 And so, when I now do dot slash list, I should see, hopefully from left 460 00:21:02,360 --> 00:21:04,250 to right, one, two, three. 461 00:21:04,250 --> 00:21:07,370 But of course, if I want to add a fourth number now, 462 00:21:07,370 --> 00:21:09,020 there's no mechanism for such. 463 00:21:09,020 --> 00:21:12,470 Certainly in the code that I just wrote I could go back in here 464 00:21:12,470 --> 00:21:14,120 and change this to a four. 465 00:21:14,120 --> 00:21:17,810 I could go down here and change lists bracket three equals four. 466 00:21:17,810 --> 00:21:20,985 I could just manually change the code, recompile the code. 467 00:21:20,985 --> 00:21:23,360 But of course, that doesn't give me any additional runway 468 00:21:23,360 --> 00:21:24,870 for the fifth or sixth number. 469 00:21:24,870 --> 00:21:27,560 So let me try to take a different approach drawing 470 00:21:27,560 --> 00:21:29,330 some inspiration from last week. 471 00:21:29,330 --> 00:21:33,050 If I want to allocate memory dynamically, 472 00:21:33,050 --> 00:21:37,220 maybe because I don't know when I wrote the program how many bytes I want, 473 00:21:37,220 --> 00:21:39,380 we have another function as of last week that 474 00:21:39,380 --> 00:21:42,530 does not require that you commit in advance to a certain number of bytes 475 00:21:42,530 --> 00:21:47,060 via what function can you just ask the operating system for a chunk of memory. 476 00:21:47,060 --> 00:21:49,070 So malloc, allocate memory. 477 00:21:49,070 --> 00:21:51,260 Now, an array is just a chunk of memory. 478 00:21:51,260 --> 00:21:53,240 And even though since week two we've been 479 00:21:53,240 --> 00:21:56,420 using this syntactic sugar, this convenience of just using 480 00:21:56,420 --> 00:21:58,910 square brackets and indexing into it, it's 481 00:21:58,910 --> 00:22:02,780 just making it easier to manipulate a chunk of memory that's contiguous 482 00:22:02,780 --> 00:22:04,860 all together back, to back, to back. 483 00:22:04,860 --> 00:22:08,090 So today, just like last week, we can take those training wheels off 484 00:22:08,090 --> 00:22:11,570 and maybe be a little more deliberate in how we allocate memory. 485 00:22:11,570 --> 00:22:14,130 Let me go, for instance, and do this. 486 00:22:14,130 --> 00:22:17,480 Let me delete my contents of my main function here. 487 00:22:17,480 --> 00:22:18,920 Go back into main. 488 00:22:18,920 --> 00:22:24,650 And let me propose now that I declare, for instance, how about my list no 489 00:22:24,650 --> 00:22:26,600 longer as an array but as a pointer. 490 00:22:26,600 --> 00:22:28,700 So int star list. 491 00:22:28,700 --> 00:22:31,670 And I'm going to go ahead and initialize this 492 00:22:31,670 --> 00:22:34,532 to be how about a chunk of three integers for now? 493 00:22:34,532 --> 00:22:35,990 So I'm still going to hard-code it. 494 00:22:35,990 --> 00:22:38,880 But I'm taking a step toward more dynamism for now. 495 00:22:38,880 --> 00:22:43,033 So let me allocate three times whatever the size is of an int. 496 00:22:43,033 --> 00:22:45,200 But it's usually going to be four bytes, as we know. 497 00:22:45,200 --> 00:22:47,900 So this is really going to be 3 times 4 equals 12. 498 00:22:47,900 --> 00:22:49,500 But it's a little more dynamic. 499 00:22:49,500 --> 00:22:51,530 And now, what can I do down here? 500 00:22:51,530 --> 00:22:53,280 Well, this is just a chunk of memory. 501 00:22:53,280 --> 00:22:58,760 So I can do literally list bracket zero equals one. 502 00:22:58,760 --> 00:23:01,220 List bracket one equals two. 503 00:23:01,220 --> 00:23:04,460 List bracket two equals three. 504 00:23:04,460 --> 00:23:07,220 And voila, achieve the exact same effect. 505 00:23:07,220 --> 00:23:10,130 Because, again, an array is just a chunk of contiguous memory. 506 00:23:10,130 --> 00:23:13,950 But malloc gives you any old chunk of contiguous memory 507 00:23:13,950 --> 00:23:16,890 so you can rather treat one like the other here. 508 00:23:16,890 --> 00:23:20,280 Now, if you really want to be cool, you can do something like this instead. 509 00:23:20,280 --> 00:23:24,360 You could dereference the address in list and go there. 510 00:23:24,360 --> 00:23:29,080 You could go down here and dereference list plus one and go there. 511 00:23:29,080 --> 00:23:31,360 But honestly, no one really writes code like this. 512 00:23:31,360 --> 00:23:32,400 It's just too cryptic. 513 00:23:32,400 --> 00:23:35,080 It's a little too far over the line at least for most people. 514 00:23:35,080 --> 00:23:38,130 And so, I think the syntactic sugar as I keep describing it, 515 00:23:38,130 --> 00:23:40,497 just the more user-friendly square bracket notation 516 00:23:40,497 --> 00:23:43,080 does the exact same thing, figures out the pointer arithmetic, 517 00:23:43,080 --> 00:23:46,800 and puts each of these integers in the right chunks therein. 518 00:23:46,800 --> 00:23:50,950 Now, just to be super pedantic, let me make sure if something went wrong. 519 00:23:50,950 --> 00:23:54,930 So if list equals equals null, that means that something went wrong. 520 00:23:54,930 --> 00:23:58,200 Like my computer is out of memory, which we should check for typically. 521 00:23:58,200 --> 00:24:02,490 So let me just immediately return one signaling anything other than zero, 522 00:24:02,490 --> 00:24:05,250 which means success typically, just to get out of this program 523 00:24:05,250 --> 00:24:06,490 because something's wrong. 524 00:24:06,490 --> 00:24:09,780 But now let me propose that I've had a-- 525 00:24:09,780 --> 00:24:10,620 well, let's do this. 526 00:24:10,620 --> 00:24:14,880 For int i gets zero, i less than 3, i plus plus. 527 00:24:14,880 --> 00:24:17,280 Though a better design would always be to use a const. 528 00:24:17,280 --> 00:24:19,440 But I'm just doing this for demonstration sake. 529 00:24:19,440 --> 00:24:21,270 Let's print out each of these ints too. 530 00:24:21,270 --> 00:24:23,530 And just make sure I didn't mess anything up. 531 00:24:23,530 --> 00:24:26,140 And let me open my terminal window again. 532 00:24:26,140 --> 00:24:29,680 Let me do make list again. 533 00:24:29,680 --> 00:24:30,180 Huh. 534 00:24:30,180 --> 00:24:35,220 Implicitly declaring library function malloc we type void star something, 535 00:24:35,220 --> 00:24:39,570 something implicitly declaring is the operative words there. 536 00:24:39,570 --> 00:24:41,070 What did I mess up? 537 00:24:41,070 --> 00:24:41,640 Yeah. 538 00:24:41,640 --> 00:24:43,890 AUDIENCE: [INAUDIBLE] need the header [INAUDIBLE].. 539 00:24:43,890 --> 00:24:44,765 DAVID J. MALAN: Yeah. 540 00:24:44,765 --> 00:24:46,980 I forgot the header file in which malloc is declared. 541 00:24:46,980 --> 00:24:50,118 I remember now, OK, that's in standard lib.h. 542 00:24:50,118 --> 00:24:52,410 And it's fine to look stuff like that up if you forget. 543 00:24:52,410 --> 00:24:55,170 So let me include standard lib.h. 544 00:24:55,170 --> 00:24:56,820 Now let me clear my terminal. 545 00:24:56,820 --> 00:24:59,070 Run make list again. 546 00:24:59,070 --> 00:25:00,880 Now we're good. dot slash list. 547 00:25:00,880 --> 00:25:04,630 And now, what did I do wrong? 548 00:25:04,630 --> 00:25:07,570 549 00:25:07,570 --> 00:25:08,070 Oh. 550 00:25:08,070 --> 00:25:09,090 [LAUGHS] OK. 551 00:25:09,090 --> 00:25:10,800 Not intended, but teachable moment. 552 00:25:10,800 --> 00:25:11,790 What did I do wrong? 553 00:25:11,790 --> 00:25:13,494 [LAUGHS] Yeah. 554 00:25:13,494 --> 00:25:15,150 AUDIENCE: You are printing [INAUDIBLE]. 555 00:25:15,150 --> 00:25:16,025 DAVID J. MALAN: Yeah. 556 00:25:16,025 --> 00:25:20,080 I'm printing the values of i instead of what is at location i in the array. 557 00:25:20,080 --> 00:25:22,440 So what I actually meant to do was print this out. 558 00:25:22,440 --> 00:25:22,990 Thank you. 559 00:25:22,990 --> 00:25:26,280 So now let me recompile make list dot slash list. 560 00:25:26,280 --> 00:25:30,700 And now, those are the three values I was expecting, not the indices thereof. 561 00:25:30,700 --> 00:25:33,090 Now, let me just propose that for the sake of discussion 562 00:25:33,090 --> 00:25:37,980 that I regret having only allocated space for three integers. 563 00:25:37,980 --> 00:25:40,985 And maybe I really should have allocated enough space for four. 564 00:25:40,985 --> 00:25:43,110 Now, this is not how you would do this in practice. 565 00:25:43,110 --> 00:25:45,193 Because presumably if you have a change of thought 566 00:25:45,193 --> 00:25:46,800 just go back in and correct the code. 567 00:25:46,800 --> 00:25:48,900 But let me propose that somewhere in here 568 00:25:48,900 --> 00:25:52,020 is a more complicated program and time passes, dot, dot, dot. 569 00:25:52,020 --> 00:25:54,190 There's a lot of other interesting code there. 570 00:25:54,190 --> 00:25:57,610 But at some point, I might want to give myself more memory. 571 00:25:57,610 --> 00:25:58,780 So how can I do this? 572 00:25:58,780 --> 00:26:02,310 Well, let me just ask the operating system now for four new bytes of memory 573 00:26:02,310 --> 00:26:05,280 so that we can at least in version one implement the idea on the board 574 00:26:05,280 --> 00:26:08,610 where I just copied the three bytes into the new four bytes 575 00:26:08,610 --> 00:26:10,360 and then added a fourth value. 576 00:26:10,360 --> 00:26:11,970 So I'm going to use malloc again. 577 00:26:11,970 --> 00:26:13,980 And I'm going to say, here's a new pointer. 578 00:26:13,980 --> 00:26:17,490 I'll call it temp, tmp for short, which is quite common when you just 579 00:26:17,490 --> 00:26:18,810 need it briefly. 580 00:26:18,810 --> 00:26:20,670 I'm going to then call malloc again. 581 00:26:20,670 --> 00:26:24,810 I'm going to say, give me four integers using size of. 582 00:26:24,810 --> 00:26:26,070 Let me again make sure. 583 00:26:26,070 --> 00:26:30,397 So if temp equals equals null something went wrong. 584 00:26:30,397 --> 00:26:31,980 So let me just immediately return one. 585 00:26:31,980 --> 00:26:35,550 And for good measure, before I return one, 586 00:26:35,550 --> 00:26:39,370 let me free the original list so that I don't leak memory. 587 00:26:39,370 --> 00:26:41,250 So I'm not just immediately returning one. 588 00:26:41,250 --> 00:26:44,160 I'm being a good citizen and remembering, well, 589 00:26:44,160 --> 00:26:49,440 if this malloc call did succeed and indeed I got as far as line 18, 590 00:26:49,440 --> 00:26:54,222 but then line 18 failed, I should free the memory that I previously malloc'd. 591 00:26:54,222 --> 00:26:55,680 So again, that's the rule of thumb. 592 00:26:55,680 --> 00:26:57,600 If you allocate it, you should be the one 593 00:26:57,600 --> 00:26:59,880 to free it even before you're about to quit. 594 00:26:59,880 --> 00:27:03,240 Now, once I've done that, I think I need to do what we did pictorially 595 00:27:03,240 --> 00:27:06,660 on the screen where I need to copy the one, the two, the three 596 00:27:06,660 --> 00:27:08,790 from the old array into the new. 597 00:27:08,790 --> 00:27:10,180 So how might I do this? 598 00:27:10,180 --> 00:27:11,850 Well, let me give myself a loop. 599 00:27:11,850 --> 00:27:15,420 So for int i get zero, i less than 3, i plus plus, 600 00:27:15,420 --> 00:27:18,490 because the size of the original is still the same. 601 00:27:18,490 --> 00:27:23,880 Let me go ahead and treat the new chunk of memory called temp as an array 602 00:27:23,880 --> 00:27:24,670 itself. 603 00:27:24,670 --> 00:27:27,150 And so, I can absolutely use these square brackets just like before. 604 00:27:27,150 --> 00:27:28,320 It's just a chunk of memory. 605 00:27:28,320 --> 00:27:29,730 I'm treating it like an array. 606 00:27:29,730 --> 00:27:34,740 And let me add to that value whatever is at the original list at location i 607 00:27:34,740 --> 00:27:35,470 as well. 608 00:27:35,470 --> 00:27:41,790 So this, again, is just this exercise of copying from old to new step by step 609 00:27:41,790 --> 00:27:44,130 the one, the two, and the three. 610 00:27:44,130 --> 00:27:46,110 But I still need one additional step. 611 00:27:46,110 --> 00:27:50,817 If my goal at hand now is to have ultimately a fourth value here, 612 00:27:50,817 --> 00:27:53,400 well, I'm just going to hard-code this for demonstration sake. 613 00:27:53,400 --> 00:27:56,520 And I'm going to go to the very last location of temp, 614 00:27:56,520 --> 00:27:58,320 which is of size four. 615 00:27:58,320 --> 00:28:04,260 Which means the last element in temp is temp bracket three, 616 00:28:04,260 --> 00:28:05,580 because it's zero-indexed. 617 00:28:05,580 --> 00:28:07,310 But there's four total spaces there. 618 00:28:07,310 --> 00:28:09,810 And I'm just going to arbitrarily for the sake of discussion 619 00:28:09,810 --> 00:28:11,010 put the number four there. 620 00:28:11,010 --> 00:28:16,920 And that is what happened when we proposed changing the final garbage 621 00:28:16,920 --> 00:28:18,480 value there to that four. 622 00:28:18,480 --> 00:28:22,350 But now I need to do what the slide did for us magically on the screen. 623 00:28:22,350 --> 00:28:24,510 I should now do a couple of final things. 624 00:28:24,510 --> 00:28:28,350 I should free the original list, which I've not done yet, because I only 625 00:28:28,350 --> 00:28:30,420 called free earlier in cases of error. 626 00:28:30,420 --> 00:28:32,070 And that was just to be safe. 627 00:28:32,070 --> 00:28:34,110 I can now free the list. 628 00:28:34,110 --> 00:28:39,390 And now, if I want to inform the computer that I want list, quote 629 00:28:39,390 --> 00:28:43,740 unquote, my variable called list to point at not the old chunk 630 00:28:43,740 --> 00:28:46,410 like it originally did but the new chunk, I think 631 00:28:46,410 --> 00:28:50,280 I can just do this-- list equals tmp. 632 00:28:50,280 --> 00:28:53,670 And again, that's just saying that if list is a pointer, which it was. 633 00:28:53,670 --> 00:28:58,230 Because look at the very top line here, on line six 634 00:28:58,230 --> 00:29:02,790 I declared list to be a pointer to a chunk of memory. 635 00:29:02,790 --> 00:29:06,370 Temp meanwhile is a separate pointer to a chunk of memory. 636 00:29:06,370 --> 00:29:10,470 So down here, this line 33 is just a matter of my saying, 637 00:29:10,470 --> 00:29:13,980 OK, now henceforth, because I've already freed the old chunk of memory, 638 00:29:13,980 --> 00:29:17,370 my list variable should point not at this chunk of three bytes 639 00:29:17,370 --> 00:29:22,290 but this chunk of four bytes or really 12 in total now. 640 00:29:22,290 --> 00:29:26,310 Or rather 16 now because we have four such bytes. 641 00:29:26,310 --> 00:29:30,090 Questions now on this code, the point of which 642 00:29:30,090 --> 00:29:33,570 was quite simply to demonstrate how we could implement 643 00:29:33,570 --> 00:29:38,310 in code this idea of fairly correctly but inefficiently allocating 644 00:29:38,310 --> 00:29:44,610 a new array of sufficient size and then populating it with a new fourth value. 645 00:29:44,610 --> 00:29:48,720 Questions on what we've just done here? 646 00:29:48,720 --> 00:29:49,220 No? 647 00:29:49,220 --> 00:29:51,140 Yeah. 648 00:29:51,140 --> 00:29:53,590 AUDIENCE: [INAUDIBLE] 649 00:29:53,590 --> 00:29:59,380 650 00:29:59,380 --> 00:30:00,630 DAVID J. MALAN: Good question. 651 00:30:00,630 --> 00:30:03,360 At this point in the story, with line 33, 652 00:30:03,360 --> 00:30:07,410 do I not have two different variables pointing at the same chunk of memory? 653 00:30:07,410 --> 00:30:08,520 Short answer, yes. 654 00:30:08,520 --> 00:30:10,800 But here's where the semantics are perhaps compelling. 655 00:30:10,800 --> 00:30:14,007 List is the variable that I intend to use longer term 656 00:30:14,007 --> 00:30:15,090 and keep around in memory. 657 00:30:15,090 --> 00:30:18,102 And again, assume that there's even more code going on here that we just 658 00:30:18,102 --> 00:30:18,810 didn't write yet. 659 00:30:18,810 --> 00:30:20,790 So it's useful to have that variable. 660 00:30:20,790 --> 00:30:23,280 Temp was just a necessary evil. 661 00:30:23,280 --> 00:30:27,570 Because up here, it would not have been correct to do this. 662 00:30:27,570 --> 00:30:31,140 It would not have been correct to say list on line 18 663 00:30:31,140 --> 00:30:33,720 equals the new chunk of memory because this 664 00:30:33,720 --> 00:30:35,700 would have represented a memory leak. 665 00:30:35,700 --> 00:30:42,060 If I prematurely change temp to point not at the old chunk but the new chunk, 666 00:30:42,060 --> 00:30:44,470 at that point no one's pointing at the old chunk. 667 00:30:44,470 --> 00:30:46,260 And so I've lost those three bytes. 668 00:30:46,260 --> 00:30:47,970 Valgrind, for instance, would yell at you 669 00:30:47,970 --> 00:30:51,070 for having lost as many bytes in memory. 670 00:30:51,070 --> 00:30:53,730 So in this case here, I do leave this as temp. 671 00:30:53,730 --> 00:30:55,590 Yes, it's duplicative at this point. 672 00:30:55,590 --> 00:30:58,380 But it's not a huge deal if it was just meant semantically 673 00:30:58,380 --> 00:30:59,610 to be a temporary value. 674 00:30:59,610 --> 00:31:03,900 But down here, at the risk of one more line of code, 675 00:31:03,900 --> 00:31:07,380 I still want to, to be a good citizen, free list, and maybe 676 00:31:07,380 --> 00:31:09,690 just for good measure return zero explicitly. 677 00:31:09,690 --> 00:31:15,870 But notice, it's not doing it twice per se on line 31. 678 00:31:15,870 --> 00:31:17,130 What am I freeing? 679 00:31:17,130 --> 00:31:19,620 The original address of list. 680 00:31:19,620 --> 00:31:21,900 The three-integer version. 681 00:31:21,900 --> 00:31:23,740 Then I change what list points at. 682 00:31:23,740 --> 00:31:25,740 So it's pointing at a completely different chunk 683 00:31:25,740 --> 00:31:27,360 of memory, this one of size four. 684 00:31:27,360 --> 00:31:31,150 So eventually, when I'm all done using this memory for this demonstration, 685 00:31:31,150 --> 00:31:32,370 I still need to free list. 686 00:31:32,370 --> 00:31:34,500 But at this point in the storyline 40, it's 687 00:31:34,500 --> 00:31:38,190 pointing at the new chunk of memory, which I similarly need to hand back 688 00:31:38,190 --> 00:31:40,290 to the operating system by free. 689 00:31:40,290 --> 00:31:41,236 Yeah. 690 00:31:41,236 --> 00:31:42,634 AUDIENCE: [INAUDIBLE] 691 00:31:42,634 --> 00:31:44,458 692 00:31:44,458 --> 00:31:46,250 DAVID J. MALAN: When would temp equal null? 693 00:31:46,250 --> 00:31:48,290 So let me scroll back up slightly. 694 00:31:48,290 --> 00:31:50,890 This is being a good citizen and a good programmer. 695 00:31:50,890 --> 00:31:54,250 Whenever it comes to using malloc, malloc 696 00:31:54,250 --> 00:31:57,755 can return null if the computer is out of memory. 697 00:31:57,755 --> 00:31:59,380 So this is maybe a much bigger program. 698 00:31:59,380 --> 00:32:01,040 You've got other things going on in it. 699 00:32:01,040 --> 00:32:04,450 And so, you just don't have enough memory available to be handed. 700 00:32:04,450 --> 00:32:06,970 Malloc needs to signal to you that there's some error. 701 00:32:06,970 --> 00:32:10,090 And so, it will buy convention per the documentation, 702 00:32:10,090 --> 00:32:12,623 per the manual pages return null. 703 00:32:12,623 --> 00:32:14,290 So this is just me being a good citizen. 704 00:32:14,290 --> 00:32:16,900 Otherwise, here's another error that might cause your program 705 00:32:16,900 --> 00:32:19,690 to crash with a segmentation fault. If you get back 706 00:32:19,690 --> 00:32:24,820 null but you assume that it's good memory going to address zero, a.k.a. 707 00:32:24,820 --> 00:32:27,850 null, will crash your program intentionally. 708 00:32:27,850 --> 00:32:28,390 Yeah. 709 00:32:28,390 --> 00:32:31,046 AUDIENCE: If you want to write [INAUDIBLE] at the very bottom 710 00:32:31,046 --> 00:32:33,703 of a program, and you just put [INAUDIBLE],, 711 00:32:33,703 --> 00:32:36,120 it would [INAUDIBLE] the same [INAUDIBLE],, right? 712 00:32:36,120 --> 00:32:37,120 DAVID J. MALAN: Correct. 713 00:32:37,120 --> 00:32:41,820 If I were to change my final line 40 here to be free temp, 714 00:32:41,820 --> 00:32:44,140 this would also work as well. 715 00:32:44,140 --> 00:32:46,560 And here, this is really a matter of design. 716 00:32:46,560 --> 00:32:48,120 It's a very nitpicky thing. 717 00:32:48,120 --> 00:32:49,380 We could probably debate it. 718 00:32:49,380 --> 00:32:53,460 But because at this point in the story my main variable 719 00:32:53,460 --> 00:32:55,770 for remembering where the list is is called list. 720 00:32:55,770 --> 00:32:59,010 This is the more responsible way to do it, freeing the list, 721 00:32:59,010 --> 00:33:01,830 just so that my colleagues, my TA doesn't 722 00:33:01,830 --> 00:33:05,160 wonder why are you freeing temporary memory that you already freed. 723 00:33:05,160 --> 00:33:06,930 It just is a semantic thing at this point. 724 00:33:06,930 --> 00:33:07,680 But good instinct. 725 00:33:07,680 --> 00:33:08,970 It would also work. 726 00:33:08,970 --> 00:33:09,690 Correct. 727 00:33:09,690 --> 00:33:12,112 Maybe just not good design. 728 00:33:12,112 --> 00:33:14,070 So it turns out that this gets annoying quickly 729 00:33:14,070 --> 00:33:16,980 as it did in the picture of doing all of this duplication. 730 00:33:16,980 --> 00:33:20,010 And even though technically it's necessary to copy those values, 731 00:33:20,010 --> 00:33:22,350 if you need a newer, bigger chunk of memory, 732 00:33:22,350 --> 00:33:27,300 there is at least a function in C that simplifies a lot of this for us. 733 00:33:27,300 --> 00:33:29,400 And in fact, let me go ahead and do this. 734 00:33:29,400 --> 00:33:33,840 Instead of using malloc, this second time on line 18 735 00:33:33,840 --> 00:33:37,170 in addition to the first time I used it on line six, 736 00:33:37,170 --> 00:33:41,310 I'm actually going to try and introduce another function called realloc, which 737 00:33:41,310 --> 00:33:44,700 as the name suggests tries to reallocate memory for you. 738 00:33:44,700 --> 00:33:47,010 And it works a little differently from malloc. 739 00:33:47,010 --> 00:33:49,330 realloc expects two arguments. 740 00:33:49,330 --> 00:33:51,840 The first one is what is the chunk of memory 741 00:33:51,840 --> 00:33:55,650 that you want to try to grow or shrink, that is, reallocate 742 00:33:55,650 --> 00:33:56,880 to be a different size. 743 00:33:56,880 --> 00:33:59,580 And then you specify what size you would want. 744 00:33:59,580 --> 00:34:04,800 And indeed, in this case, I want four times size of int. 745 00:34:04,800 --> 00:34:10,500 And that will now give me hopefully a new address of a chunk of memory that's 746 00:34:10,500 --> 00:34:12,960 big enough to fit all four numbers. 747 00:34:12,960 --> 00:34:17,190 But what's wonderful about realloc is that it 748 00:34:17,190 --> 00:34:20,020 will handle all of the copying for me. 749 00:34:20,020 --> 00:34:21,969 So in fact, I'm going to go down here. 750 00:34:21,969 --> 00:34:25,960 I'm going to get rid of all of this, this extra for loop. 751 00:34:25,960 --> 00:34:28,710 And what I'm simply going to do instead is this. 752 00:34:28,710 --> 00:34:34,290 Once I can trust after lines 18 through 23 753 00:34:34,290 --> 00:34:37,710 that realloc worked, and it didn't return null because I'm out of memory, 754 00:34:37,710 --> 00:34:41,850 I can just say, OK, just immediately remember that the new list points 755 00:34:41,850 --> 00:34:44,230 at this new chunk of memory instead. 756 00:34:44,230 --> 00:34:46,710 And then, I can still now do this line. 757 00:34:46,710 --> 00:34:50,190 But I can tweak the semantics here and just say list bracket three 758 00:34:50,190 --> 00:34:56,310 the new final location in the new list is four. 759 00:34:56,310 --> 00:34:58,020 I don't need to free this here. 760 00:34:58,020 --> 00:34:59,580 I don't need to do this. 761 00:34:59,580 --> 00:35:02,280 All I need now at the bottom is the final for loop 762 00:35:02,280 --> 00:35:03,988 to just print out these values. 763 00:35:03,988 --> 00:35:06,030 So in short, even though that was somewhat quick, 764 00:35:06,030 --> 00:35:09,720 using realloc just moves the entire copying 765 00:35:09,720 --> 00:35:13,330 process that I implemented myself a moment ago using a for loop. 766 00:35:13,330 --> 00:35:16,500 It just moves it to realloc and lets it deal with the copying for me. 767 00:35:16,500 --> 00:35:17,520 It's no more efficient. 768 00:35:17,520 --> 00:35:20,400 But at least means I'm writing less code, which is more pleasant. 769 00:35:20,400 --> 00:35:22,830 And hopefully, the people who wrote malloc or realloc 770 00:35:22,830 --> 00:35:27,030 are smarter than me and they just will introduce bugs with lower probability 771 00:35:27,030 --> 00:35:28,930 too. 772 00:35:28,930 --> 00:35:29,710 That was a lot. 773 00:35:29,710 --> 00:35:30,754 Any questions? 774 00:35:30,754 --> 00:35:32,754 AUDIENCE: Why do you still have to [INAUDIBLE]?? 775 00:35:32,754 --> 00:35:34,580 776 00:35:34,580 --> 00:35:35,830 DAVID J. MALAN: Good question. 777 00:35:35,830 --> 00:35:40,750 Why do you still need to make list equal temp as I did on line 24? 778 00:35:40,750 --> 00:35:44,200 So ideally, I would do this. 779 00:35:44,200 --> 00:35:48,910 Ideally I would just change this line 18 to be list. 780 00:35:48,910 --> 00:35:50,350 That is to say, call-- 781 00:35:50,350 --> 00:35:52,570 or actually, even better, ideally I would just 782 00:35:52,570 --> 00:35:55,180 say reallocate this list to be of this new size. 783 00:35:55,180 --> 00:35:57,760 But again, things can go wrong when allocating memory. 784 00:35:57,760 --> 00:36:01,760 You need to check a return value to see if it was successful or not. 785 00:36:01,760 --> 00:36:04,220 And so, we need to use a return value. 786 00:36:04,220 --> 00:36:05,680 So let's not introduce temp. 787 00:36:05,680 --> 00:36:07,120 Let's just use list. 788 00:36:07,120 --> 00:36:09,610 But here's where a memory leak might happen. 789 00:36:09,610 --> 00:36:13,240 In the off chance realloc fails and doesn't 790 00:36:13,240 --> 00:36:16,240 have enough memory for your four bytes, therefore 791 00:36:16,240 --> 00:36:18,100 it returns by definition null. 792 00:36:18,100 --> 00:36:23,110 You can't overwrite the original value of list with null to then check it. 793 00:36:23,110 --> 00:36:23,680 Why? 794 00:36:23,680 --> 00:36:27,190 Because now who remembers where the original three bytes were? 795 00:36:27,190 --> 00:36:30,790 If you prematurely change the value of list, you've lost. 796 00:36:30,790 --> 00:36:32,780 You've leaked memory in that sense. 797 00:36:32,780 --> 00:36:34,990 And so, that's why-- let me undo this change-- 798 00:36:34,990 --> 00:36:39,370 I declare a temporary pointer for the sole purpose of making 799 00:36:39,370 --> 00:36:41,060 sure I can check the return value. 800 00:36:41,060 --> 00:36:44,660 And then, once it's good, now I'll update the value of list. 801 00:36:44,660 --> 00:36:47,590 So it's doing a switcheroo by making sure first 802 00:36:47,590 --> 00:36:51,070 that you have a new value to swap with the old. 803 00:36:51,070 --> 00:36:53,630 Other questions on this code? 804 00:36:53,630 --> 00:36:54,523 Yeah. 805 00:36:54,523 --> 00:36:59,882 AUDIENCE: Does realloc [INAUDIBLE] memory where [INAUDIBLE] is stored? 806 00:36:59,882 --> 00:37:00,840 DAVID J. MALAN: Indeed. 807 00:37:00,840 --> 00:37:03,660 realloc automatically frees the previous memory for you. 808 00:37:03,660 --> 00:37:05,610 And better yet, it's even smarter than that. 809 00:37:05,610 --> 00:37:10,230 If you get lucky and there happens to be space right after your existing 810 00:37:10,230 --> 00:37:13,900 chunk of memory, so one, two, three, garbage value, instead of one, 811 00:37:13,900 --> 00:37:16,560 two, three, Hello, world, realloc won't even 812 00:37:16,560 --> 00:37:18,990 bother copying things from old to new. 813 00:37:18,990 --> 00:37:21,330 It will just say, OK, I'm going to now reserve 814 00:37:21,330 --> 00:37:24,690 for you more bytes than you originally asked for so it doesn't 815 00:37:24,690 --> 00:37:26,460 have to waste time doing that copying. 816 00:37:26,460 --> 00:37:29,310 And so, in that sense, this version is now not only still correct, 817 00:37:29,310 --> 00:37:32,790 it's even better designed because we're not wasting time with that for loop. 818 00:37:32,790 --> 00:37:36,060 We might have to resort to it if there is, in fact, Hello, world or something 819 00:37:36,060 --> 00:37:36,900 else in the way. 820 00:37:36,900 --> 00:37:39,990 But hopefully, we'll get lucky and save those steps. 821 00:37:39,990 --> 00:37:46,610 Other questions on this manipulation of code here? 822 00:37:46,610 --> 00:37:48,038 Yeah, in the middle. 823 00:37:48,038 --> 00:37:49,442 AUDIENCE: So [INAUDIBLE]? 824 00:37:49,442 --> 00:37:53,190 825 00:37:53,190 --> 00:37:56,110 DAVID J. MALAN: What if you want to resize a two-dimensional array? 826 00:37:56,110 --> 00:38:00,540 So very similar in spirit whereby you can use the same trickery. 827 00:38:00,540 --> 00:38:03,150 Let me wave my hand at that for now just because I 828 00:38:03,150 --> 00:38:06,270 think that's going to significantly increase the complexity. 829 00:38:06,270 --> 00:38:08,100 But very same primitives ultimately. 830 00:38:08,100 --> 00:38:12,570 A two-dimensional array is essentially just a doubly long or quadratically 831 00:38:12,570 --> 00:38:16,050 longer list of memory that using square bracket notation 832 00:38:16,050 --> 00:38:18,520 is doing some of that mental math for you. 833 00:38:18,520 --> 00:38:24,060 But it's fundamentally no different of what's going on underneath the hood. 834 00:38:24,060 --> 00:38:27,417 So with that said and that code under our belt, 835 00:38:27,417 --> 00:38:30,000 even though that's not going to be something you'll frequently 836 00:38:30,000 --> 00:38:33,390 need to code yourself, let's propose now how 837 00:38:33,390 --> 00:38:36,780 we might go about building some actual data structures ourselves. 838 00:38:36,780 --> 00:38:39,750 The new ingredient here being this reality 839 00:38:39,750 --> 00:38:44,520 that if you want to resize a chunk of memory so as to make room for things, 840 00:38:44,520 --> 00:38:45,780 we now have that ability. 841 00:38:45,780 --> 00:38:47,040 Memory addresses. 842 00:38:47,040 --> 00:38:50,160 And pointers just give us the ability to point around at things 843 00:38:50,160 --> 00:38:51,570 and move things around in memory. 844 00:38:51,570 --> 00:38:54,060 But now that we have malloc and even realloc, 845 00:38:54,060 --> 00:38:56,100 you can imagine, maybe rewinding, and you 846 00:38:56,100 --> 00:39:00,090 could implement that stack, that queue using not an array per 847 00:39:00,090 --> 00:39:02,790 se, because you have to commit to an array size in advance. 848 00:39:02,790 --> 00:39:06,840 But if you implement your stack or your queue using a pointer and then 849 00:39:06,840 --> 00:39:10,680 malloc and realloc, and maybe someone else writes all that code for you, 850 00:39:10,680 --> 00:39:14,490 perhaps now you can imagine that now the stack can grow 851 00:39:14,490 --> 00:39:17,400 or shrink by using realloc accordingly. 852 00:39:17,400 --> 00:39:23,400 You don't have to preemptively say give me five bytes, or 50, or 500, or 5,000. 853 00:39:23,400 --> 00:39:25,360 You can say, just give me one initially. 854 00:39:25,360 --> 00:39:27,850 And if I need more, I'll realloc, realloc, realloc. 855 00:39:27,850 --> 00:39:29,880 And if you keep popping things off the stack, 856 00:39:29,880 --> 00:39:33,630 you can realloc in the other direction and ask for fewer and fewer bytes 857 00:39:33,630 --> 00:39:36,550 and the operating system can take that memory back as well. 858 00:39:36,550 --> 00:39:38,130 So we now have this building block. 859 00:39:38,130 --> 00:39:39,900 Let's see what we can do with it. 860 00:39:39,900 --> 00:39:43,050 So we've had a few pieces of syntax in recent weeks, all of which 861 00:39:43,050 --> 00:39:46,090 we're going to combine now in just a slightly more clever way. 862 00:39:46,090 --> 00:39:49,920 So struct is this keyword in C that lets us build our own structure in memory, 863 00:39:49,920 --> 00:39:52,530 like a collection of two or three or more variables, 864 00:39:52,530 --> 00:39:54,480 like a person that we've seen before. 865 00:39:54,480 --> 00:40:01,320 The dot operator, recall, we've used when you do have a struct like a person 866 00:40:01,320 --> 00:40:02,980 and you want to go inside of it. 867 00:40:02,980 --> 00:40:07,170 So person.name or person.number. 868 00:40:07,170 --> 00:40:08,520 We did this a few weeks ago. 869 00:40:08,520 --> 00:40:12,030 Now but the dot operator just allows you to go inside of a structure 870 00:40:12,030 --> 00:40:14,160 and get the individual variables within. 871 00:40:14,160 --> 00:40:17,520 And then, the star operator, unfortunately, has a lot of uses now. 872 00:40:17,520 --> 00:40:18,810 One was multiplication. 873 00:40:18,810 --> 00:40:20,880 My God, that was easy back in the day. 874 00:40:20,880 --> 00:40:23,520 Now it's used to declare pointers. 875 00:40:23,520 --> 00:40:25,680 It's also used to dereference pointer. 876 00:40:25,680 --> 00:40:29,430 So to make one exist and then go to that address. 877 00:40:29,430 --> 00:40:31,900 Unfortunately, it's the same symbol for all of those. 878 00:40:31,900 --> 00:40:33,270 But it's all related. 879 00:40:33,270 --> 00:40:35,183 But with these three symbols, it turns out-- 880 00:40:35,183 --> 00:40:37,350 you're going to get one last one today-- and my God, 881 00:40:37,350 --> 00:40:40,320 it finally looks like the concept. 882 00:40:40,320 --> 00:40:42,870 It turns out there's a clever way any time 883 00:40:42,870 --> 00:40:45,870 you want to use the dot and the star together, that is, to go 884 00:40:45,870 --> 00:40:49,830 somewhere, and go to an address, and then look inside of a structure, 885 00:40:49,830 --> 00:40:53,523 you can actually literally use an arrow symbol on your keyboard. 886 00:40:53,523 --> 00:40:54,690 It's not a single keystroke. 887 00:40:54,690 --> 00:40:57,750 It's a hyphen and then an open angle bracket. 888 00:40:57,750 --> 00:40:59,520 But at least it looks like an arrow. 889 00:40:59,520 --> 00:41:02,700 And we'll see indeed in code today, the things I was drawing pictorially 890 00:41:02,700 --> 00:41:04,680 on the screen last time with yellow arrows, 891 00:41:04,680 --> 00:41:07,380 you can actually now express as well in code. 892 00:41:07,380 --> 00:41:12,000 And so, here we have our next data structure called a linked list. 893 00:41:12,000 --> 00:41:15,330 And this is one of the most useful powerful concepts in C. 894 00:41:15,330 --> 00:41:18,030 It's the kind of thing that you can take for granted in Java 895 00:41:18,030 --> 00:41:19,920 and Python and higher level languages. 896 00:41:19,920 --> 00:41:24,150 But today, we'll see how we or others can actually build these things just 897 00:41:24,150 --> 00:41:25,830 using these same primitives. 898 00:41:25,830 --> 00:41:29,430 So a linked list is going to allow us to actually do 899 00:41:29,430 --> 00:41:32,490 what we used a foam finger for last week allow 900 00:41:32,490 --> 00:41:35,100 us to link together, for instance, these three values maybe 901 00:41:35,100 --> 00:41:36,540 with that fourth value over there. 902 00:41:36,540 --> 00:41:39,390 And then if there's a fifth, maybe this other foam finger 903 00:41:39,390 --> 00:41:41,790 points even farther over way to that fifth value. 904 00:41:41,790 --> 00:41:45,930 The key being that you can stitch together fancier data structures 905 00:41:45,930 --> 00:41:49,440 without having to pick all of these up and find new space. 906 00:41:49,440 --> 00:41:51,750 You just have to at least connect the dots somehow. 907 00:41:51,750 --> 00:41:54,568 We just need to somehow point from one to the other. 908 00:41:54,568 --> 00:41:57,360 And that's going to make things much more efficient, it would seem. 909 00:41:57,360 --> 00:41:58,802 So how do we get there? 910 00:41:58,802 --> 00:42:00,510 So here's my computer's memory as always. 911 00:42:00,510 --> 00:42:03,720 Suppose that I'm storing the value one somewhere in there 912 00:42:03,720 --> 00:42:06,180 and it's at 0x123 address, whatever. 913 00:42:06,180 --> 00:42:10,140 And I'm storing the number two somewhere else in memory, 0x456. 914 00:42:10,140 --> 00:42:13,500 And number three at address 0x789. 915 00:42:13,500 --> 00:42:15,780 This is not in array by definition. 916 00:42:15,780 --> 00:42:17,412 Why? 917 00:42:17,412 --> 00:42:19,620 Even though it's the only three things on the screen, 918 00:42:19,620 --> 00:42:20,715 what makes this not an array? 919 00:42:20,715 --> 00:42:21,550 AUDIENCE: It's not contiguous. 920 00:42:21,550 --> 00:42:23,050 DAVID J. MALAN: It's not contiguous. 921 00:42:23,050 --> 00:42:25,450 So this violates the definition of an array. 922 00:42:25,450 --> 00:42:27,880 But you know, especially since they're sequential, 923 00:42:27,880 --> 00:42:30,250 it kind of looks to a human like a list. 924 00:42:30,250 --> 00:42:33,120 So it would be nice if there were a data type called list. 925 00:42:33,120 --> 00:42:35,820 And there isn't in C. There will be in Python. 926 00:42:35,820 --> 00:42:36,930 But you know what? 927 00:42:36,930 --> 00:42:40,270 If I could somehow stitch together these three values 928 00:42:40,270 --> 00:42:42,330 so I can get from one to the next to the next, 929 00:42:42,330 --> 00:42:45,660 then I think we could achieve the idea the concept of a list 930 00:42:45,660 --> 00:42:48,420 without this really annoying constraint that they all be 931 00:42:48,420 --> 00:42:51,210 contiguous as in an array. 932 00:42:51,210 --> 00:42:52,260 So how do I do that? 933 00:42:52,260 --> 00:42:55,970 Well, at the end of the day I only have memory at my disposal. 934 00:42:55,970 --> 00:42:58,880 There's no more training wheels to take off here. 935 00:42:58,880 --> 00:43:01,620 This is what we've got underneath the hood of a computer. 936 00:43:01,620 --> 00:43:04,760 So if all I have is memory, I think the solution 937 00:43:04,760 --> 00:43:08,390 to this problem of stitching together those values in a list 938 00:43:08,390 --> 00:43:11,090 must be to spend a bit more memory. 939 00:43:11,090 --> 00:43:13,770 That's literally the only resource we have right now. 940 00:43:13,770 --> 00:43:17,180 So let me propose that if we want to create a list conceptually out 941 00:43:17,180 --> 00:43:21,140 of three values that are in random although pictorially pretty 942 00:43:21,140 --> 00:43:25,280 positions in memory, let me just add a little bit more memory to the picture. 943 00:43:25,280 --> 00:43:27,200 So in addition to storing the one, I'm going 944 00:43:27,200 --> 00:43:30,300 to leave myself some room, a little scratch pad if you will, 945 00:43:30,300 --> 00:43:31,880 to use some other bits as well. 946 00:43:31,880 --> 00:43:34,310 Same for the two, same for the three. 947 00:43:34,310 --> 00:43:38,450 And you can perhaps see where this is going based on last week. 948 00:43:38,450 --> 00:43:43,640 If I want to somehow connect the one to the two, 949 00:43:43,640 --> 00:43:48,290 any instincts as to what I should write in this box here that 950 00:43:48,290 --> 00:43:52,460 would lead me effectively from one to the two? 951 00:43:52,460 --> 00:43:53,390 What could go here? 952 00:43:53,390 --> 00:43:53,900 Yeah. 953 00:43:53,900 --> 00:43:55,852 AUDIENCE: [INAUDIBLE] 954 00:43:55,852 --> 00:43:56,882 955 00:43:56,882 --> 00:43:58,965 DAVID J. MALAN: We could store the address of two. 956 00:43:58,965 --> 00:44:01,463 And so, specifically, what would you have me write here? 957 00:44:01,463 --> 00:44:05,220 AUDIENCE: [INAUDIBLE] pointer 0x456. 958 00:44:05,220 --> 00:44:06,220 DAVID J. MALAN: Perfect. 959 00:44:06,220 --> 00:44:09,220 Ideally, I would just put in this box another integer, 960 00:44:09,220 --> 00:44:11,690 one that happens to be represented in hexadecimal. 961 00:44:11,690 --> 00:44:12,940 But that's just a base system. 962 00:44:12,940 --> 00:44:15,320 It's just a human thing for us to look at. 963 00:44:15,320 --> 00:44:18,110 I'm going to put the value 0x456 here. 964 00:44:18,110 --> 00:44:19,690 So let me go ahead and reveal that. 965 00:44:19,690 --> 00:44:21,290 0x456 goes there. 966 00:44:21,290 --> 00:44:23,290 You can perhaps see further where this is going. 967 00:44:23,290 --> 00:44:25,390 Well, if I want to get from the two to the three, 968 00:44:25,390 --> 00:44:29,050 I think I need to put below the two the address of the three, which 969 00:44:29,050 --> 00:44:30,760 gives me 0x789. 970 00:44:30,760 --> 00:44:33,400 Now if three is the end of the list, I don't 971 00:44:33,400 --> 00:44:35,230 want to let it be some garbage value. 972 00:44:35,230 --> 00:44:38,020 Because that would imply that who knows where it's pointing. 973 00:44:38,020 --> 00:44:39,377 I need some definitive value. 974 00:44:39,377 --> 00:44:40,960 And just what would your instincts be? 975 00:44:40,960 --> 00:44:44,920 If I want to make clear with some special sentinel value 976 00:44:44,920 --> 00:44:47,350 that the buck stops here, what do I put? 977 00:44:47,350 --> 00:44:48,525 What might my options be? 978 00:44:48,525 --> 00:44:49,150 AUDIENCE: Null? 979 00:44:49,150 --> 00:44:49,480 DAVID J. MALAN: Yeah. 980 00:44:49,480 --> 00:44:50,110 So null. 981 00:44:50,110 --> 00:44:55,240 Not N-U-L, per se, but N-U-L-L, which was the new keyword we introduced last 982 00:44:55,240 --> 00:44:58,960 week, which just represents an empty pointer, if you will. 983 00:44:58,960 --> 00:45:01,780 Technically the address 0x0. 984 00:45:01,780 --> 00:45:03,760 So literally, the zero address. 985 00:45:03,760 --> 00:45:06,430 And what humans did years ago, they just decided, you know what? 986 00:45:06,430 --> 00:45:09,850 Nothing should ever live at address zero in memory. 987 00:45:09,850 --> 00:45:12,700 We're just going to reserve that one special byte to be 988 00:45:12,700 --> 00:45:15,850 a special signal, a sentinel value such that if you ever 989 00:45:15,850 --> 00:45:19,540 see a zero address in a pointer, it just means it's invalid. 990 00:45:19,540 --> 00:45:20,720 It does not exist. 991 00:45:20,720 --> 00:45:24,940 Now, we write that though a little more pleasantly for the eyes as just N-U-L-L 992 00:45:24,940 --> 00:45:25,780 in all caps. 993 00:45:25,780 --> 00:45:27,500 And that's a keyword in C as well. 994 00:45:27,500 --> 00:45:31,120 But of course, last week, I claimed that who cares where things are in memory. 995 00:45:31,120 --> 00:45:34,730 And honestly, this quickly gets tedious even worrying about these values. 996 00:45:34,730 --> 00:45:37,840 So let me abstract this away and propose that if we 997 00:45:37,840 --> 00:45:40,780 want to remember where all of these numbers are in memory, 998 00:45:40,780 --> 00:45:43,570 let's give ourselves one final piece of memory 999 00:45:43,570 --> 00:45:47,110 that just allows us to start the whole process. 1000 00:45:47,110 --> 00:45:50,800 Let me allocate on the left-hand side here not room for a number, 1001 00:45:50,800 --> 00:45:54,850 like one, two, three, just room for a pointer that henceforth 1002 00:45:54,850 --> 00:45:57,320 I think I'll call list by convention. 1003 00:45:57,320 --> 00:46:00,880 And then store in that one additional pointer 1004 00:46:00,880 --> 00:46:03,340 a value that just kickstarts the whole process. 1005 00:46:03,340 --> 00:46:06,280 This is the treasure map, if you will, that you get handed. 1006 00:46:06,280 --> 00:46:11,630 And this has the address of the very first actual node in memory. 1007 00:46:11,630 --> 00:46:13,630 Now, technically, we could just start with this. 1008 00:46:13,630 --> 00:46:15,940 But it turns out we'll see it's just a little cleaner 1009 00:46:15,940 --> 00:46:18,880 to use a simple single pointer that leads 1010 00:46:18,880 --> 00:46:22,330 to the things you care about, as opposed to just starting with the first element 1011 00:46:22,330 --> 00:46:22,940 y. 1012 00:46:22,940 --> 00:46:25,150 Well, if you ever want to get rid of this element, 1013 00:46:25,150 --> 00:46:27,150 it'd be nice if you could at least still hang on 1014 00:46:27,150 --> 00:46:29,590 to an empty sheet of paper that indicates that the list is 1015 00:46:29,590 --> 00:46:31,430 empty would be one argument for that. 1016 00:46:31,430 --> 00:46:33,700 So again, who cares about these addresses now? 1017 00:46:33,700 --> 00:46:36,940 Now with the wave of the hand, let's just abstract it away. 1018 00:46:36,940 --> 00:46:38,770 And there are our pointers. 1019 00:46:38,770 --> 00:46:42,550 Each of those addresses in the squares at the bottom 1020 00:46:42,550 --> 00:46:46,390 are simply pointing to the next element in the list. 1021 00:46:46,390 --> 00:46:48,730 The jargon to introduce here would be that now 1022 00:46:48,730 --> 00:46:50,890 that we have these integers one, two, three, 1023 00:46:50,890 --> 00:46:53,290 but they're in these wrappers, if you will, 1024 00:46:53,290 --> 00:46:58,660 these structures that have metadata that is additional data that is related to 1025 00:46:58,660 --> 00:47:00,460 but not the data you actually care about. 1026 00:47:00,460 --> 00:47:01,150 This is data. 1027 00:47:01,150 --> 00:47:02,290 This is metadata. 1028 00:47:02,290 --> 00:47:05,200 This thing here, rectangularly we'll call a node-- 1029 00:47:05,200 --> 00:47:08,530 N-O-D-E. And it's just a term of art that means it's like a container 1030 00:47:08,530 --> 00:47:12,130 in code for storing some values. 1031 00:47:12,130 --> 00:47:14,050 This then is a linked list. 1032 00:47:14,050 --> 00:47:17,860 And this then is the graphical incarnation 1033 00:47:17,860 --> 00:47:20,240 of one node pointing to the other. 1034 00:47:20,240 --> 00:47:22,990 In this case, they happen to be by chance and by design 1035 00:47:22,990 --> 00:47:25,450 of this desk contiguous initially. 1036 00:47:25,450 --> 00:47:27,490 But there's no requirement that they be such. 1037 00:47:27,490 --> 00:47:28,420 The one could be over there. 1038 00:47:28,420 --> 00:47:29,080 The two over there. 1039 00:47:29,080 --> 00:47:29,955 The three over there. 1040 00:47:29,955 --> 00:47:33,430 I would just need more foam fingers to point at one to the next. 1041 00:47:33,430 --> 00:47:36,355 Questions on this concept of a linked list? 1042 00:47:36,355 --> 00:47:38,050 AUDIENCE: [INAUDIBLE] 1043 00:47:38,050 --> 00:47:39,547 DAVID J. MALAN: Yeah, in the back. 1044 00:47:39,547 --> 00:47:43,820 AUDIENCE: [INAUDIBLE] array [INAUDIBLE] by pointer that is outside of the array 1045 00:47:43,820 --> 00:47:44,514 itself? 1046 00:47:44,514 --> 00:47:46,139 DAVID J. MALAN: Can you say that again? 1047 00:47:46,139 --> 00:47:49,195 AUDIENCE: Do traditional arrays always start [INAUDIBLE] pointer 1048 00:47:49,195 --> 00:47:50,653 that is outside of the [INAUDIBLE]? 1049 00:47:50,653 --> 00:47:51,987 DAVID J. MALAN: A good question. 1050 00:47:51,987 --> 00:47:53,867 Do traditional arrays start with a pointer 1051 00:47:53,867 --> 00:47:55,200 that's outside of the structure? 1052 00:47:55,200 --> 00:47:56,430 Short answer, no. 1053 00:47:56,430 --> 00:48:00,210 Arrays are special in C and certain other languages. 1054 00:48:00,210 --> 00:48:04,060 And the name of an array is technically a symbol, if you will, 1055 00:48:04,060 --> 00:48:08,700 that the computer-- the program knows maps to a specific location in memory. 1056 00:48:08,700 --> 00:48:12,540 It's just a label, a synonym for an memory address. 1057 00:48:12,540 --> 00:48:14,320 It does not take up space. 1058 00:48:14,320 --> 00:48:17,580 So to be clear, the name of an array does not take up space 1059 00:48:17,580 --> 00:48:19,530 like that extra square on the left. 1060 00:48:19,530 --> 00:48:22,810 But you do need that extra square on the left when implementing a linked list 1061 00:48:22,810 --> 00:48:25,530 so that you can determine if the list is of size 1062 00:48:25,530 --> 00:48:29,650 zero there's nothing being pointed at, or size three in this case. 1063 00:48:29,650 --> 00:48:32,820 We're taking on more responsibility ourselves. 1064 00:48:32,820 --> 00:48:33,390 Yeah. 1065 00:48:33,390 --> 00:48:35,350 AUDIENCE: [INAUDIBLE] 1066 00:48:35,350 --> 00:48:39,099 1067 00:48:39,099 --> 00:48:41,307 DAVID J. MALAN: How do you point to the next element? 1068 00:48:41,307 --> 00:48:42,780 Can you elaborate? 1069 00:48:42,780 --> 00:48:47,103 AUDIENCE: These elements pointing to the next one, how does [INAUDIBLE]?? 1070 00:48:47,103 --> 00:48:48,520 DAVID J. MALAN: Ah, good question. 1071 00:48:48,520 --> 00:48:50,603 If each of these elements is pointing to the next, 1072 00:48:50,603 --> 00:48:52,360 how does three point to the others? 1073 00:48:52,360 --> 00:48:53,800 Short answer, it doesn't. 1074 00:48:53,800 --> 00:48:55,960 At least in this design we have more technically 1075 00:48:55,960 --> 00:48:58,300 what's called a singly linked list. 1076 00:48:58,300 --> 00:49:01,490 And as the arrows imply, it only goes in one direction. 1077 00:49:01,490 --> 00:49:04,960 So if you somehow find in code maybe a for loop, maybe a while loop, 1078 00:49:04,960 --> 00:49:09,700 somehow you're in code over here, you have no way in code 1079 00:49:09,700 --> 00:49:13,150 to go backwards unless we change this to a doubly linked list 1080 00:49:13,150 --> 00:49:17,440 where I add another box that lets me have arrows in both directions. 1081 00:49:17,440 --> 00:49:23,080 Or maybe I just make it circular and I connect the three back 1082 00:49:23,080 --> 00:49:24,940 to the one, which you can totally do. 1083 00:49:24,940 --> 00:49:27,183 But that tends to make life harder because now you 1084 00:49:27,183 --> 00:49:30,100 have to figure out when you're stuck in a loop in your data structure. 1085 00:49:30,100 --> 00:49:31,370 But it's doable as well. 1086 00:49:31,370 --> 00:49:34,480 But as is, it's a dead end by design. 1087 00:49:34,480 --> 00:49:36,730 Other questions on this design here? 1088 00:49:36,730 --> 00:49:40,250 1089 00:49:40,250 --> 00:49:42,570 Well, how might we implement this structure in code? 1090 00:49:42,570 --> 00:49:46,370 Well, let me just connect the dots to something we've seen before here. 1091 00:49:46,370 --> 00:49:50,780 This is how a couple of weeks ago, we introduced the notion of a person. 1092 00:49:50,780 --> 00:49:53,270 And we claimed a person might have a name and a number. 1093 00:49:53,270 --> 00:49:55,937 Last week, of course, we took off some of these training wheels. 1094 00:49:55,937 --> 00:49:59,120 And a string is really technically a char star in both cases. 1095 00:49:59,120 --> 00:50:01,640 But really, there's no conceptual difference beyond that. 1096 00:50:01,640 --> 00:50:06,110 But let's use this same paradigm to implement a node 1097 00:50:06,110 --> 00:50:07,757 as I described it in that picture. 1098 00:50:07,757 --> 00:50:09,590 So let me get rid of the name and the number 1099 00:50:09,590 --> 00:50:11,360 because that's related only to a person. 1100 00:50:11,360 --> 00:50:15,440 And let me rename this structure for discussion's sake to node. 1101 00:50:15,440 --> 00:50:20,000 That then invites the question, well, what needs to go inside of a node? 1102 00:50:20,000 --> 00:50:22,970 Well, minimally, an integer. 1103 00:50:22,970 --> 00:50:27,417 But this is now where we need to think a little harder, just conceptually. 1104 00:50:27,417 --> 00:50:29,750 Even if you have no idea how to type it at the keyboard, 1105 00:50:29,750 --> 00:50:33,920 what else needs to be part of a node based on these rectangular pictures 1106 00:50:33,920 --> 00:50:36,455 that we've drawn? 1107 00:50:36,455 --> 00:50:37,330 What more do we need? 1108 00:50:37,330 --> 00:50:38,246 Yeah. 1109 00:50:38,246 --> 00:50:39,755 AUDIENCE: [INAUDIBLE] 1110 00:50:39,755 --> 00:50:40,630 DAVID J. MALAN: Yeah. 1111 00:50:40,630 --> 00:50:42,650 We need a pointer to another node. 1112 00:50:42,650 --> 00:50:48,190 So if I don't know how to implement this yet, it could be something like pointer 1113 00:50:48,190 --> 00:50:49,847 to another node, how do I do that? 1114 00:50:49,847 --> 00:50:50,680 Well, you know what? 1115 00:50:50,680 --> 00:50:53,290 It turns out you would ideally say this. 1116 00:50:53,290 --> 00:50:59,260 If you know that the next node is itself a node by definition, well, 1117 00:50:59,260 --> 00:51:02,752 any time we've needed a pointer we just use the data type and a star. 1118 00:51:02,752 --> 00:51:04,960 And I'm going to arbitrarily, but I think reasonably, 1119 00:51:04,960 --> 00:51:08,920 call this second squared at the bottom of those rectangles 1120 00:51:08,920 --> 00:51:11,770 next as the name of my attribute here. 1121 00:51:11,770 --> 00:51:15,760 But node star just connotes that the next variable 1122 00:51:15,760 --> 00:51:19,118 is going to be not a node per se but the address of a node. 1123 00:51:19,118 --> 00:51:20,410 And that's exactly what we did. 1124 00:51:20,410 --> 00:51:26,240 You had me put 0x456, 0x789 in that box, which is the address of another node. 1125 00:51:26,240 --> 00:51:30,730 So the way we would express this in code would be node star next. 1126 00:51:30,730 --> 00:51:33,830 But we could call the variable anything we want. 1127 00:51:33,830 --> 00:51:35,890 Now, this is a bit of a white lie. 1128 00:51:35,890 --> 00:51:37,120 But we'll fix this right now. 1129 00:51:37,120 --> 00:51:39,400 This code won't actually compile. 1130 00:51:39,400 --> 00:51:41,710 C takes you pretty literally, recall. 1131 00:51:41,710 --> 00:51:45,220 And if you use some term at the top of your file 1132 00:51:45,220 --> 00:51:47,405 that you don't define until later in your file, 1133 00:51:47,405 --> 00:51:49,030 you're going to see some error message. 1134 00:51:49,030 --> 00:51:50,738 We've seen this when I've messed up and I 1135 00:51:50,738 --> 00:51:53,500 forgot to include the function prototypes at the top of my code. 1136 00:51:53,500 --> 00:51:55,000 This is related in spirit. 1137 00:51:55,000 --> 00:51:57,880 I seem here on my one, two, three, fourth line of code, 1138 00:51:57,880 --> 00:52:00,880 I'm trying to use this new term of art that I invented here 1139 00:52:00,880 --> 00:52:04,220 in my code called node, even though it's a CS term as well. 1140 00:52:04,220 --> 00:52:08,410 But nowhere above this it would seem did I even define what a node is. 1141 00:52:08,410 --> 00:52:09,613 It's not a data type in C. 1142 00:52:09,613 --> 00:52:11,530 Every computer scientists know what a node is. 1143 00:52:11,530 --> 00:52:13,900 But it doesn't come for free with the language. 1144 00:52:13,900 --> 00:52:16,330 So I need to do something else. 1145 00:52:16,330 --> 00:52:20,420 I need this word here to come first so that I can use it here. 1146 00:52:20,420 --> 00:52:22,270 And so, we have this catch-22. 1147 00:52:22,270 --> 00:52:26,290 How can a structure be self-referential, that is, 1148 00:52:26,290 --> 00:52:30,440 point to another version of itself if the word doesn't yet exist? 1149 00:52:30,440 --> 00:52:34,510 So the solution to this in C, which we didn't need for a person 1150 00:52:34,510 --> 00:52:36,550 because there was no notion of listing-- 1151 00:52:36,550 --> 00:52:37,900 connecting is a list-- 1152 00:52:37,900 --> 00:52:41,380 we need one more keyword here that we didn't need for a person. 1153 00:52:41,380 --> 00:52:43,490 And we reuse that keyword here. 1154 00:52:43,490 --> 00:52:45,250 So kind of an annoying detail. 1155 00:52:45,250 --> 00:52:49,840 But if we preemptively call this whole thing struct node, 1156 00:52:49,840 --> 00:52:54,370 you can now refer to the thing on the inside as a struct node star. 1157 00:52:54,370 --> 00:52:58,630 But then you can shorten the name of the whole thing from struct node 1158 00:52:58,630 --> 00:52:59,920 to just node. 1159 00:52:59,920 --> 00:53:01,570 Sort of an annoying sequence of steps. 1160 00:53:01,570 --> 00:53:05,050 But in short, any time you're building a node, a linked list in memory, 1161 00:53:05,050 --> 00:53:06,340 this is just the paradigm. 1162 00:53:06,340 --> 00:53:10,060 You use typedef, struct, the name of the thing you want to define, like node. 1163 00:53:10,060 --> 00:53:14,020 You use that name on the inside if you want to point from one to another. 1164 00:53:14,020 --> 00:53:18,730 And then you can shorten it down here to just be called node. 1165 00:53:18,730 --> 00:53:23,690 Questions then on this code here? 1166 00:53:23,690 --> 00:53:26,480 Questions on what we just did? 1167 00:53:26,480 --> 00:53:32,120 Well, if I rewind just a moment to that final picture, what would be the upside 1168 00:53:32,120 --> 00:53:34,310 to be clear, of having jumped through these hoops 1169 00:53:34,310 --> 00:53:37,160 and added this complexity if you will? 1170 00:53:37,160 --> 00:53:41,390 What problem did we just solve by linking together these three values 1171 00:53:41,390 --> 00:53:42,050 to be clear? 1172 00:53:42,050 --> 00:53:42,560 Yeah. 1173 00:53:42,560 --> 00:53:44,801 AUDIENCE: Making lists that are not [INAUDIBLE].. 1174 00:53:44,801 --> 00:53:46,426 DAVID J. MALAN: Making lists that are-- 1175 00:53:46,426 --> 00:53:47,805 AUDIENCE: Not [INAUDIBLE]. 1176 00:53:47,805 --> 00:53:50,180 DAVID J. MALAN: Oh, that are not contiguous, if you will. 1177 00:53:50,180 --> 00:53:52,870 So making lists that are not contiguous in memory. 1178 00:53:52,870 --> 00:53:56,440 The upside of which is that if I want to add the number four to this list, 1179 00:53:56,440 --> 00:53:59,710 it looks like I could choose from any chunks of available memory 1180 00:53:59,710 --> 00:54:00,520 on the screen. 1181 00:54:00,520 --> 00:54:04,270 I just need to point from the end of the current list 1182 00:54:04,270 --> 00:54:06,250 to wherever that other one is in memory. 1183 00:54:06,250 --> 00:54:10,270 What I don't need to do, to be clear, is copy the one, the two, or the three. 1184 00:54:10,270 --> 00:54:11,840 Everything can just stay put. 1185 00:54:11,840 --> 00:54:14,890 Which means time-wise I can do this much more quickly, 1186 00:54:14,890 --> 00:54:17,140 it would seem, without copying things again and again. 1187 00:54:17,140 --> 00:54:20,710 And even without using realloc to let it do all of the copying 1188 00:54:20,710 --> 00:54:22,060 potentially for me. 1189 00:54:22,060 --> 00:54:25,990 But as we'll start seeing even more in the coming weeks, every time we benefit 1190 00:54:25,990 --> 00:54:28,450 and solve some problem, we pay a price. 1191 00:54:28,450 --> 00:54:29,680 There's a trade-off. 1192 00:54:29,680 --> 00:54:32,440 What is a downside as you might perceive now 1193 00:54:32,440 --> 00:54:36,040 of using a linked list instead of an array? 1194 00:54:36,040 --> 00:54:36,670 Yeah. 1195 00:54:36,670 --> 00:54:37,900 AUDIENCE: You use twice as much memory. 1196 00:54:37,900 --> 00:54:38,290 DAVID J. MALAN: Yeah. 1197 00:54:38,290 --> 00:54:39,880 I mean, we use twice as much memory. 1198 00:54:39,880 --> 00:54:42,547 Because now in addition to storing the integers one, two, three, 1199 00:54:42,547 --> 00:54:45,040 I also need to store a pointer for each of those. 1200 00:54:45,040 --> 00:54:48,460 And honestly, even this picture is a bit of a simplification. 1201 00:54:48,460 --> 00:54:51,910 Technically, in most systems today, each int would be four bytes. 1202 00:54:51,910 --> 00:54:54,767 Technically today, most pointers though would be eight bytes. 1203 00:54:54,767 --> 00:54:57,100 I just didn't want to draw this weird shape on the board 1204 00:54:57,100 --> 00:54:59,558 where the bottom square is even bigger than the top square. 1205 00:54:59,558 --> 00:55:02,698 But technically, we're using even more than twice as much space 1206 00:55:02,698 --> 00:55:03,490 for these pointers. 1207 00:55:03,490 --> 00:55:04,573 So there's that trade-off. 1208 00:55:04,573 --> 00:55:07,060 Now, thankfully, decades after C was invented, 1209 00:55:07,060 --> 00:55:09,650 memory is generally much cheaper nowadays. 1210 00:55:09,650 --> 00:55:12,180 And so, it's OK to spend more of it if you need to. 1211 00:55:12,180 --> 00:55:14,180 And it depends on what you want to optimize for. 1212 00:55:14,180 --> 00:55:16,420 But that's absolutely here a downside. 1213 00:55:16,420 --> 00:55:20,320 What's another downside of having transitioned to a linked list? 1214 00:55:20,320 --> 00:55:22,070 AUDIENCE: [INAUDIBLE] 1215 00:55:22,070 --> 00:55:23,870 DAVID J. MALAN: You can't index into it. 1216 00:55:23,870 --> 00:55:26,370 Now, I haven't even tried in code. 1217 00:55:26,370 --> 00:55:31,640 But when you have a linked list you can no longer use square bracket notation. 1218 00:55:31,640 --> 00:55:32,780 Because why? 1219 00:55:32,780 --> 00:55:36,410 Well, square bracket notation just assumes the contiguousness of memory. 1220 00:55:36,410 --> 00:55:37,820 Location zero is here. 1221 00:55:37,820 --> 00:55:39,950 Location one is literally one to the right. 1222 00:55:39,950 --> 00:55:42,590 Location two is literally one to the right, one to the right. 1223 00:55:42,590 --> 00:55:45,350 These things, even though I've drawn it from right to left 1224 00:55:45,350 --> 00:55:48,050 to just keep things pretty, there are gaps here. 1225 00:55:48,050 --> 00:55:49,970 And this is just my interpretation of this. 1226 00:55:49,970 --> 00:55:51,225 These gaps could be big. 1227 00:55:51,225 --> 00:55:52,100 They could be narrow. 1228 00:55:52,100 --> 00:55:53,720 They could be down here, up here. 1229 00:55:53,720 --> 00:55:57,320 They could be anywhere so long as we're linking things together in this list. 1230 00:55:57,320 --> 00:56:01,130 The computer can't just use bracket zero, bracket one, bracket two anymore 1231 00:56:01,130 --> 00:56:04,490 because it can't do simple arithmetic and jump to the middle. 1232 00:56:04,490 --> 00:56:06,980 And now, here's perhaps the worst price we've paid. 1233 00:56:06,980 --> 00:56:09,860 If you don't have square bracket notation, or really, 1234 00:56:09,860 --> 00:56:12,920 you don't have contiguousness, what algorithm did 1235 00:56:12,920 --> 00:56:16,540 we just sacrifice for this dynamism? 1236 00:56:16,540 --> 00:56:20,230 If you rewind even back to week zero. 1237 00:56:20,230 --> 00:56:23,680 And we gave it a name in week three. 1238 00:56:23,680 --> 00:56:27,250 What algorithm can we not use now if we can't assume that the memory is 1239 00:56:27,250 --> 00:56:29,603 back, to back, to back, to back? 1240 00:56:29,603 --> 00:56:30,350 AUDIENCE: Binary? 1241 00:56:30,350 --> 00:56:31,940 DAVID J. MALAN: Binary search. 1242 00:56:31,940 --> 00:56:32,570 Why? 1243 00:56:32,570 --> 00:56:36,350 Because binary search, just like the phone book, back in the first week, 1244 00:56:36,350 --> 00:56:39,950 requires being able to arithmetically jump right to the middle. 1245 00:56:39,950 --> 00:56:42,260 Take the total length of it, divide by two, and boom. 1246 00:56:42,260 --> 00:56:44,802 You're right there in the middle with some simple arithmetic. 1247 00:56:44,802 --> 00:56:49,520 Here they might be laid out, again, with these big or small gaps. 1248 00:56:49,520 --> 00:56:52,148 There's no simple math I can do to just jump immediately 1249 00:56:52,148 --> 00:56:53,190 to the one in the middle. 1250 00:56:53,190 --> 00:56:55,100 And in fact, again, if this TV were bigger, 1251 00:56:55,100 --> 00:56:59,038 the two could technically be in memory be way down here or even way over here. 1252 00:56:59,038 --> 00:57:01,580 The foam finger could be pointing in any number of directions 1253 00:57:01,580 --> 00:57:03,380 depending on where malloc put the thing. 1254 00:57:03,380 --> 00:57:05,630 There's just no way to do binary search. 1255 00:57:05,630 --> 00:57:08,360 And so, it would seem that we've paid another price indeed 1256 00:57:08,360 --> 00:57:09,980 in terms of a performance. 1257 00:57:09,980 --> 00:57:13,170 We're now talking about linear time again. 1258 00:57:13,170 --> 00:57:14,580 So that's a regression. 1259 00:57:14,580 --> 00:57:16,128 Now, that's also a lot. 1260 00:57:16,128 --> 00:57:18,920 Feels like a good time for some muffins and fruit out in the lobby. 1261 00:57:18,920 --> 00:57:21,837 And when we come back, we'll try to solve the problem we just created. 1262 00:57:21,837 --> 00:57:23,060 So see you in 10. 1263 00:57:23,060 --> 00:57:24,290 So we are back. 1264 00:57:24,290 --> 00:57:26,900 And let's see if we can't now take some of these higher level 1265 00:57:26,900 --> 00:57:29,930 concepts of stitching together these nodes in memory 1266 00:57:29,930 --> 00:57:31,933 and translate it to some actual code. 1267 00:57:31,933 --> 00:57:34,100 But we'll do it step by step first before I actually 1268 00:57:34,100 --> 00:57:35,360 start writing it in VS Code. 1269 00:57:35,360 --> 00:57:38,485 So if, Carter, you wouldn't mind helping me step through with some visuals, 1270 00:57:38,485 --> 00:57:41,630 let me propose that line by line we solve some of the problems 1271 00:57:41,630 --> 00:57:44,730 that we've just created for ourselves in building this thing in memory. 1272 00:57:44,730 --> 00:57:48,620 So let's go ahead and first consider how we 1273 00:57:48,620 --> 00:57:52,950 could build a linked list containing the numbers indeed one, then two, then 1274 00:57:52,950 --> 00:57:53,450 three. 1275 00:57:53,450 --> 00:57:55,160 And let's translate each of those steps to code 1276 00:57:55,160 --> 00:57:57,510 and then we'll put it all together into something that actually runs. 1277 00:57:57,510 --> 00:58:00,020 So how about the first step here will just be this. 1278 00:58:00,020 --> 00:58:04,460 To declare a pointer called list that's initially has no value, 1279 00:58:04,460 --> 00:58:05,990 at least at this point in the story. 1280 00:58:05,990 --> 00:58:07,640 List is the name of the variable. 1281 00:58:07,640 --> 00:58:09,980 Node star just means that this is essentially 1282 00:58:09,980 --> 00:58:12,590 going to be our little square over here that 1283 00:58:12,590 --> 00:58:14,780 points to the beginning of the list. 1284 00:58:14,780 --> 00:58:17,750 Of course, it's ideal if it ultimately has a value. 1285 00:58:17,750 --> 00:58:21,050 Because when we initially call this line of code, 1286 00:58:21,050 --> 00:58:24,050 it just gives us indeed that square over here on the left. 1287 00:58:24,050 --> 00:58:27,230 But it's got a garbage value because there's no equal sign on the other side 1288 00:58:27,230 --> 00:58:27,780 there. 1289 00:58:27,780 --> 00:58:30,080 So let's propose that we do one more step here 1290 00:58:30,080 --> 00:58:34,160 and actually initialize it to null so that if only 1291 00:58:34,160 --> 00:58:37,177 we know that it's not garbage, it at least has some known value. 1292 00:58:37,177 --> 00:58:40,010 And null is a good way of signifying that at this point in the story 1293 00:58:40,010 --> 00:58:41,240 the list is empty. 1294 00:58:41,240 --> 00:58:44,390 Indeed, null indicates there's no nodes in the list. 1295 00:58:44,390 --> 00:58:46,930 So that picture would now look like this, whereby let's just 1296 00:58:46,930 --> 00:58:48,680 draw-- instead of writing null everywhere, 1297 00:58:48,680 --> 00:58:51,680 I'll just leave the squares blank when it's not a garbage value, per se. 1298 00:58:51,680 --> 00:58:54,320 It's literally 0x0 or null. 1299 00:58:54,320 --> 00:58:57,650 So that's it for building a linked list of size zero. 1300 00:58:57,650 --> 00:58:58,942 We're done then. 1301 00:58:58,942 --> 00:59:01,400 But we want to now add a one, and then a two, then a three. 1302 00:59:01,400 --> 00:59:03,290 So next step here might be this. 1303 00:59:03,290 --> 00:59:08,030 If I want to allocate the first of my rectangles on our previous picture, 1304 00:59:08,030 --> 00:59:09,200 I'm going to call malloc. 1305 00:59:09,200 --> 00:59:12,650 And I'm going to ask for enough memory to fit a whole node. 1306 00:59:12,650 --> 00:59:15,572 Now, technically, I think that's going to be four bytes for the int 1307 00:59:15,572 --> 00:59:17,780 and eight bytes for the pointer even though I did not 1308 00:59:17,780 --> 00:59:19,650 draw it to scale on the board. 1309 00:59:19,650 --> 00:59:21,890 So that's technically going to be what, 12 bytes? 1310 00:59:21,890 --> 00:59:24,740 But again, size of node just figures out how many 1311 00:59:24,740 --> 00:59:26,450 bytes I actually need dynamically. 1312 00:59:26,450 --> 00:59:30,200 That's going to return to me the address of that chunk of memory, which 1313 00:59:30,200 --> 00:59:33,860 apparently I'm going to store inside of a temporary variable called n 1314 00:59:33,860 --> 00:59:35,060 for short for node. 1315 00:59:35,060 --> 00:59:37,010 But let's see what this does pictorially. 1316 00:59:37,010 --> 00:59:39,710 So when this line of code is executed, I first get, 1317 00:59:39,710 --> 00:59:41,930 on the left, that variable n. 1318 00:59:41,930 --> 00:59:45,410 It's got a garbage value by default because I haven't executed 1319 00:59:45,410 --> 00:59:47,143 the whole thing from right to left. 1320 00:59:47,143 --> 00:59:49,310 Meanwhile, on the right-hand side of the expression, 1321 00:59:49,310 --> 00:59:51,320 I've got now a node somewhere in memory. 1322 00:59:51,320 --> 00:59:52,550 It happened to be free here. 1323 00:59:52,550 --> 00:59:54,203 This is where malloc put it for me. 1324 00:59:54,203 --> 00:59:56,120 But it does have two garbage values initially. 1325 00:59:56,120 --> 01:00:01,040 But because it's a node per my typedef earlier, every node I proposed 1326 01:00:01,040 --> 01:00:03,980 is going to have a number and a next pointer. 1327 01:00:03,980 --> 01:00:05,660 So we can see those labeled here. 1328 01:00:05,660 --> 01:00:07,640 But they've got two garbage values initially. 1329 01:00:07,640 --> 01:00:11,120 But all I care about initially is that ultimately n 1330 01:00:11,120 --> 01:00:13,520 is pointing at that chunk of code. 1331 01:00:13,520 --> 01:00:18,600 So initially, if we could back up two steps, we have-- two steps. 1332 01:00:18,600 --> 01:00:20,720 So we have initially-- oh, one step forward. 1333 01:00:20,720 --> 01:00:25,790 We have this line of code gives us this variable here, which has garbage. 1334 01:00:25,790 --> 01:00:29,660 When this side of the expression is executed, that allocates the memory. 1335 01:00:29,660 --> 01:00:32,720 And then, when we copy from right to left the address 1336 01:00:32,720 --> 01:00:36,290 of that chunk of memory, that's what gives us conceptually this arrow. 1337 01:00:36,290 --> 01:00:38,848 And the garbage goes away because that's a valid pointer now. 1338 01:00:38,848 --> 01:00:40,640 Of course, there's still two garbage values 1339 01:00:40,640 --> 01:00:43,940 there because we haven't set this node to store a number like the number one. 1340 01:00:43,940 --> 01:00:46,107 So let's go ahead and execute one other line of code 1341 01:00:46,107 --> 01:00:49,850 like this, which, while cryptic-looking, is just an application of ideas 1342 01:00:49,850 --> 01:00:51,890 we've seen in week four and prior. 1343 01:00:51,890 --> 01:00:56,060 Star n means to start at this variable and go there. 1344 01:00:56,060 --> 01:01:00,240 Follow the arrow is what the star or the dereference operator does for us. 1345 01:01:00,240 --> 01:01:03,440 And then, the dot operator, recall, when we first introduced structs, 1346 01:01:03,440 --> 01:01:06,110 like for a person struct, allows us to go 1347 01:01:06,110 --> 01:01:08,190 at the number field or the next field. 1348 01:01:08,190 --> 01:01:10,910 So if I do star n, and then in parentheses 1349 01:01:10,910 --> 01:01:12,890 to make sure order of operations is preserved, 1350 01:01:12,890 --> 01:01:15,920 dot number, and then assign it the actual number one, 1351 01:01:15,920 --> 01:01:18,270 which puts the one in the top of that rectangle. 1352 01:01:18,270 --> 01:01:20,747 Now, admittedly, this syntax is not very user-friendly. 1353 01:01:20,747 --> 01:01:21,830 It's annoying to remember. 1354 01:01:21,830 --> 01:01:23,070 You have to have the parentheses. 1355 01:01:23,070 --> 01:01:24,660 So there's another syntax for this. 1356 01:01:24,660 --> 01:01:27,335 Whenever you're doing two things like this in code, 1357 01:01:27,335 --> 01:01:31,010 dereferencing a pointer that is going to an address. 1358 01:01:31,010 --> 01:01:34,940 And then further using the dot notation to go inside of the structure, 1359 01:01:34,940 --> 01:01:38,900 you find that wonderfully C gives us this syntax, whereby 1360 01:01:38,900 --> 01:01:41,510 you can just change the star and the parentheses 1361 01:01:41,510 --> 01:01:43,387 and the dot to just be an arrow. 1362 01:01:43,387 --> 01:01:45,720 And again, it's not a single character on your keyboard. 1363 01:01:45,720 --> 01:01:47,690 It's a hyphen and then an open angle bracket. 1364 01:01:47,690 --> 01:01:49,400 But I kind of like the semantics of this. 1365 01:01:49,400 --> 01:01:53,810 Because this code now pretty much matches the picture-- n arrow 1366 01:01:53,810 --> 01:01:56,780 leads you to the value that you want to access or ultimately 1367 01:01:56,780 --> 01:01:58,050 change in this way. 1368 01:01:58,050 --> 01:01:59,930 There's one step, though, we've forgotten, 1369 01:01:59,930 --> 01:02:03,110 of course, which is that we can't leave this garbage value here. 1370 01:02:03,110 --> 01:02:07,070 Because the garbage value is some unknown value that effectively 1371 01:02:07,070 --> 01:02:08,510 is pointing who knows where. 1372 01:02:08,510 --> 01:02:10,580 And we don't want to accidentally misinterpret 1373 01:02:10,580 --> 01:02:14,160 that garbage value as being a valid address and risk going there. 1374 01:02:14,160 --> 01:02:17,660 So of course, what value should we put here instead? 1375 01:02:17,660 --> 01:02:20,780 Our old friend null, just to signify that this is indeed 1376 01:02:20,780 --> 01:02:21,665 the end of the list. 1377 01:02:21,665 --> 01:02:23,790 And we could do that with a line of code like this. 1378 01:02:23,790 --> 01:02:27,300 And again, we'll connote as much by just leaving that empty box blank. 1379 01:02:27,300 --> 01:02:29,450 So now, we have a list of size one. 1380 01:02:29,450 --> 01:02:33,050 Let's go ahead and add the second number to it as with these lines here. 1381 01:02:33,050 --> 01:02:39,150 List equals n allows us to remember that indeed, we have this list here. 1382 01:02:39,150 --> 01:02:40,980 So if we can step one step forward. 1383 01:02:40,980 --> 01:02:42,920 Here's what the picture now looks like. 1384 01:02:42,920 --> 01:02:45,470 And technically, let's go one step further here. 1385 01:02:45,470 --> 01:02:50,270 This is now really what's going on in memory once my list of size one exists. 1386 01:02:50,270 --> 01:02:55,148 My main variable called list is pointing at exactly that first node. 1387 01:02:55,148 --> 01:02:57,440 At this point in the story I don't need to know or care 1388 01:02:57,440 --> 01:03:00,200 about the temporary variable that I called n, even though it 1389 01:03:00,200 --> 01:03:01,550 might very well still be there. 1390 01:03:01,550 --> 01:03:03,950 But indeed, this now represents that linked list. 1391 01:03:03,950 --> 01:03:05,900 Let's now indeed add the number two. 1392 01:03:05,900 --> 01:03:09,590 So with the same line of code as before, I'm going to allocate another node. 1393 01:03:09,590 --> 01:03:10,700 Size of node. 1394 01:03:10,700 --> 01:03:12,620 Ideally, I would be checking for null here. 1395 01:03:12,620 --> 01:03:15,090 But we're doing the juicy parts only on the slides. 1396 01:03:15,090 --> 01:03:16,740 Let's now go ahead and depict that. 1397 01:03:16,740 --> 01:03:17,825 So what happens with this? 1398 01:03:17,825 --> 01:03:21,740 This brings back our n pointer, which might have been there the whole time. 1399 01:03:21,740 --> 01:03:23,250 But we're doing this step by step. 1400 01:03:23,250 --> 01:03:26,120 It's a garbage value though because we haven't yet copied from right to left. 1401 01:03:26,120 --> 01:03:27,980 Malloc, of course, gives us a second chunk 1402 01:03:27,980 --> 01:03:31,100 of memory, which maybe ends up there with two garbage values 1403 01:03:31,100 --> 01:03:34,100 by default. I've omitted the labels now just because they're still going 1404 01:03:34,100 --> 01:03:36,590 to be number and next respectively. 1405 01:03:36,590 --> 01:03:40,610 Once we copy from right to left, the garbage value indeed becomes an arrow. 1406 01:03:40,610 --> 01:03:44,900 Oscar disappears because it's now indeed a valid pointer pointing here. 1407 01:03:44,900 --> 01:03:48,290 Now, the values themselves number and next are invalid-- 1408 01:03:48,290 --> 01:03:49,140 garbage value. 1409 01:03:49,140 --> 01:03:53,570 So here is where we can now start using our new syntax like the arrow notation 1410 01:03:53,570 --> 01:03:55,430 or the star and the dot if you prefer. 1411 01:03:55,430 --> 01:03:57,860 And we can change the value of n-- 1412 01:03:57,860 --> 01:03:59,870 follow the arrow to number. 1413 01:03:59,870 --> 01:04:01,220 And that becomes two. 1414 01:04:01,220 --> 01:04:05,340 Similarly, we can do this again and set n arrow next. 1415 01:04:05,340 --> 01:04:08,930 So start at n, follow the arrow, access the next field, 1416 01:04:08,930 --> 01:04:10,670 and set that equal to null. 1417 01:04:10,670 --> 01:04:13,460 Now, we're not quite done yet because we haven't actually 1418 01:04:13,460 --> 01:04:14,908 linked things together. 1419 01:04:14,908 --> 01:04:16,700 So here's now where things get interesting. 1420 01:04:16,700 --> 01:04:18,470 How do I combine these two? 1421 01:04:18,470 --> 01:04:20,160 Well, let me propose this. 1422 01:04:20,160 --> 01:04:24,080 Let me propose on our next line here we actually update for now 1423 01:04:24,080 --> 01:04:26,420 list equal to n. 1424 01:04:26,420 --> 01:04:29,690 That is to say, whatever address this is, whatever it's pointing at, 1425 01:04:29,690 --> 01:04:33,330 change list to be the same address that is point at the same thing. 1426 01:04:33,330 --> 01:04:37,280 So if n is pointing here, let's change list to point here. 1427 01:04:37,280 --> 01:04:39,470 And go ahead and do that, Carter, if you could. 1428 01:04:39,470 --> 01:04:41,040 I don't like this. 1429 01:04:41,040 --> 01:04:42,650 Can you go one further step? 1430 01:04:42,650 --> 01:04:44,180 This is bad. 1431 01:04:44,180 --> 01:04:48,800 What is wrong about my sequence of operations here where I updated list 1432 01:04:48,800 --> 01:04:51,370 to point at my new node? 1433 01:04:51,370 --> 01:04:52,060 Yeah. 1434 01:04:52,060 --> 01:04:53,105 AUDIENCE: We [INAUDIBLE]. 1435 01:04:53,105 --> 01:04:53,980 DAVID J. MALAN: Yeah. 1436 01:04:53,980 --> 01:04:56,090 We lost the pointer to the other node. 1437 01:04:56,090 --> 01:04:59,830 So I don't even care about the ordering, two one or one two. 1438 01:04:59,830 --> 01:05:03,610 The bigger problem now as the lack of arrows over there suggests 1439 01:05:03,610 --> 01:05:05,260 is that I have a memory leak. 1440 01:05:05,260 --> 01:05:08,140 I have orphaned my original node in the sense 1441 01:05:08,140 --> 01:05:10,425 that nothing is pointing at it anymore. 1442 01:05:10,425 --> 01:05:13,300 Now, absolutely, I could fix this by adding some temporary variables. 1443 01:05:13,300 --> 01:05:14,383 I could add it to the mix. 1444 01:05:14,383 --> 01:05:18,280 But at this point in the story I have not done any such recollection thereof. 1445 01:05:18,280 --> 01:05:19,450 So let me back this up. 1446 01:05:19,450 --> 01:05:20,930 And let's go forward in the slides. 1447 01:05:20,930 --> 01:05:22,660 This is where we left off a moment ago. 1448 01:05:22,660 --> 01:05:25,923 I think I need to take into account order of operations. 1449 01:05:25,923 --> 01:05:27,340 And I'm going to keep this simple. 1450 01:05:27,340 --> 01:05:30,130 I'm not going to care about the order of the numbers for now. 1451 01:05:30,130 --> 01:05:33,260 I'm fine with a list that is two and then one. 1452 01:05:33,260 --> 01:05:36,110 So with that said, let me go ahead and update, 1453 01:05:36,110 --> 01:05:39,800 I think, this box here to point at my original node. 1454 01:05:39,800 --> 01:05:42,280 So let's see how we could do this in code. 1455 01:05:42,280 --> 01:05:43,820 n arrow next. 1456 01:05:43,820 --> 01:05:48,945 So n, arrow, next should equal the current list. 1457 01:05:48,945 --> 01:05:50,320 And this is a little weird again. 1458 01:05:50,320 --> 01:05:51,670 But recall what list is. 1459 01:05:51,670 --> 01:05:54,070 List is this pointer here that just contains 1460 01:05:54,070 --> 01:05:57,700 the address of the original address of the list, 1461 01:05:57,700 --> 01:06:01,370 or equivalently, it contains this arrow, whatever it's pointing at. 1462 01:06:01,370 --> 01:06:05,500 So what this means in this line of code, n bracket next means start at n, 1463 01:06:05,500 --> 01:06:07,900 follow the arrow, access the next pointer, 1464 01:06:07,900 --> 01:06:10,690 and set it equal to whatever list equals. 1465 01:06:10,690 --> 01:06:16,000 So if list is pointing here, then next should point there as well. 1466 01:06:16,000 --> 01:06:17,530 This, I think, is safe. 1467 01:06:17,530 --> 01:06:18,880 Because now we have redundancy. 1468 01:06:18,880 --> 01:06:21,700 Now we've got two pointers pointing at the original list. 1469 01:06:21,700 --> 01:06:26,860 And now I think we can do another step whereby we update list to equal n, 1470 01:06:26,860 --> 01:06:29,150 same line of code before that got us into trouble. 1471 01:06:29,150 --> 01:06:31,150 But I'm doing it second now instead of first. 1472 01:06:31,150 --> 01:06:36,340 When I execute list equals n, this now sets list equal to the same thing 1473 01:06:36,340 --> 01:06:37,570 that n equals. 1474 01:06:37,570 --> 01:06:41,650 And so, now I have successfully inserted my new node 1475 01:06:41,650 --> 01:06:44,440 containing two into the list. 1476 01:06:44,440 --> 01:06:46,480 And in fact, if we advance one more, we can just 1477 01:06:46,480 --> 01:06:48,940 clear up the clutter, assume that the temporary variable is 1478 01:06:48,940 --> 01:06:50,080 gone from the story. 1479 01:06:50,080 --> 01:06:53,067 Now we have a linked list where admittedly ordering is wrong. 1480 01:06:53,067 --> 01:06:54,400 It's two one instead of one two. 1481 01:06:54,400 --> 01:06:56,260 But at least it's linked correctly. 1482 01:06:56,260 --> 01:07:00,630 And I didn't orphan or leak any memory. 1483 01:07:00,630 --> 01:07:04,440 Questions on this sequence of steps here? 1484 01:07:04,440 --> 01:07:05,894 Yeah, in back. 1485 01:07:05,894 --> 01:07:09,858 AUDIENCE: So [INAUDIBLE] this is all [INAUDIBLE] type, right? 1486 01:07:09,858 --> 01:07:11,945 [INAUDIBLE] 1487 01:07:11,945 --> 01:07:12,820 DAVID J. MALAN: Yeah. 1488 01:07:12,820 --> 01:07:13,460 Spot on. 1489 01:07:13,460 --> 01:07:17,020 So this would fall under that category of a stack, if you will. 1490 01:07:17,020 --> 01:07:21,070 Although I've not called it that by name because I just pushed the number two 1491 01:07:21,070 --> 01:07:22,780 onto this data structure, if you will. 1492 01:07:22,780 --> 01:07:26,320 And indeed, it ended up at the beginning of the list instead of the end. 1493 01:07:26,320 --> 01:07:29,182 And so, here's where we see a distinction between an abstract data 1494 01:07:29,182 --> 01:07:30,640 structure, which is where we began. 1495 01:07:30,640 --> 01:07:34,030 A stack is a thing like the pile of sweaters that just has push and pop 1496 01:07:34,030 --> 01:07:37,000 properties and LIFO access-- 1497 01:07:37,000 --> 01:07:39,040 last in first out. 1498 01:07:39,040 --> 01:07:41,320 How do you implement something like that in memory? 1499 01:07:41,320 --> 01:07:45,310 Well, it would seem that you could implement the notion of a stack here 1500 01:07:45,310 --> 01:07:48,350 not for sweaters but for numbers using a linked list. 1501 01:07:48,350 --> 01:07:51,220 So long as you implement insertion a.k.a. 1502 01:07:51,220 --> 01:07:56,030 pushing by pretending new values to the list by pretending again and again. 1503 01:07:56,030 --> 01:07:58,780 And if, Carter, you don't mind hitting the keyboard one more time. 1504 01:07:58,780 --> 01:08:00,970 If I wanted to add the number three now, you 1505 01:08:00,970 --> 01:08:03,610 could imagine prepending it to the list. 1506 01:08:03,610 --> 01:08:04,310 Why? 1507 01:08:04,310 --> 01:08:07,520 Well honestly, especially as this list gets longer and longer, 1508 01:08:07,520 --> 01:08:10,780 I kind of like the appeal of prepending these elements. 1509 01:08:10,780 --> 01:08:11,320 Why? 1510 01:08:11,320 --> 01:08:15,040 Because even if this list gets crazy long and way, way out here, 1511 01:08:15,040 --> 01:08:17,770 you didn't notice me following all of the arrows 1512 01:08:17,770 --> 01:08:19,149 earlier to do the insertions. 1513 01:08:19,149 --> 01:08:22,569 If I want to insert a fourth number, a fifth number, a sixth number, 1514 01:08:22,569 --> 01:08:24,880 all I have to do is insert it here, if you 1515 01:08:24,880 --> 01:08:29,680 will, point it at the original start of the list, then update this pointer, 1516 01:08:29,680 --> 01:08:30,290 and done. 1517 01:08:30,290 --> 01:08:32,830 And I would say that's two steps, give or take. 1518 01:08:32,830 --> 01:08:35,200 It's not going to be n steps as it would be 1519 01:08:35,200 --> 01:08:39,050 if I had to append the new nodes to the end of the list. 1520 01:08:39,050 --> 01:08:41,705 Now, of course, we've sacrificed ordering of these numbers. 1521 01:08:41,705 --> 01:08:44,080 They're literally in the opposite order or whatever order 1522 01:08:44,080 --> 01:08:45,140 they were inserted in. 1523 01:08:45,140 --> 01:08:48,833 But that might very well be OK depending on the goal at hand. 1524 01:08:48,833 --> 01:08:50,750 Thank you to Carter for stepping through this. 1525 01:08:50,750 --> 01:08:52,479 What if now we wanted to translate this-- 1526 01:08:52,479 --> 01:08:52,930 [APPLAUSE] 1527 01:08:52,930 --> 01:08:53,430 Oh, sure. 1528 01:08:53,430 --> 01:08:55,480 Thank you. 1529 01:08:55,480 --> 01:08:59,240 [LAUGHS] It's all for you, none for me in this example. 1530 01:08:59,240 --> 01:09:02,200 So here we have perhaps a way of translating this now 1531 01:09:02,200 --> 01:09:03,430 to some actual code. 1532 01:09:03,430 --> 01:09:06,040 And this will be the last of the intense code 1533 01:09:06,040 --> 01:09:09,160 here just to give you a sense of how we can translate this idea now 1534 01:09:09,160 --> 01:09:10,580 to actual steps. 1535 01:09:10,580 --> 01:09:13,189 So this is list.c in VS Code here. 1536 01:09:13,189 --> 01:09:15,490 Let me go ahead and make a couple of changes up top. 1537 01:09:15,490 --> 01:09:16,420 Let me go ahead. 1538 01:09:16,420 --> 01:09:22,120 And how about declaring a node using typedef struct 1539 01:09:22,120 --> 01:09:25,120 node using our new framing as before. 1540 01:09:25,120 --> 01:09:27,760 I'm going to give every node a number, as I proposed. 1541 01:09:27,760 --> 01:09:31,390 And every node a pointer to the next element, which is 1542 01:09:31,390 --> 01:09:33,850 going to be implemented just as before. 1543 01:09:33,850 --> 01:09:36,380 And I'm going to simplify the whole name as just node. 1544 01:09:36,380 --> 01:09:40,300 So all that is is the exact same typedef that we proposed earlier. 1545 01:09:40,300 --> 01:09:44,830 Now, let me go ahead and get rid of all of this code which we wrote earlier. 1546 01:09:44,830 --> 01:09:48,250 And recall that this was the most recent version that was not a linked list. 1547 01:09:48,250 --> 01:09:52,090 This was just an array that we allocated and then reallocated. 1548 01:09:52,090 --> 01:09:54,350 So this is the old way of doing things. 1549 01:09:54,350 --> 01:09:58,210 But it was inefficient because we might have to lean on a for loop 1550 01:09:58,210 --> 01:10:00,760 or lean on realloc to copy everything around. 1551 01:10:00,760 --> 01:10:04,930 We're now going to re-implement the notion of a list as an actual linked 1552 01:10:04,930 --> 01:10:07,670 list, not as an array. 1553 01:10:07,670 --> 01:10:10,150 So my main function now might do something like this. 1554 01:10:10,150 --> 01:10:12,330 And I'm going to really just copy the lines of code 1555 01:10:12,330 --> 01:10:14,080 that we just stepped through on the board. 1556 01:10:14,080 --> 01:10:17,710 So let me give myself a special variable called list. 1557 01:10:17,710 --> 01:10:19,360 That's going to be initialized to null. 1558 01:10:19,360 --> 01:10:22,540 And this is just my pointer, the square on the left-hand side of the screen 1559 01:10:22,540 --> 01:10:24,235 that represents the start of the list. 1560 01:10:24,235 --> 01:10:26,110 And if it's null, it means the list is empty. 1561 01:10:26,110 --> 01:10:26,830 So done. 1562 01:10:26,830 --> 01:10:30,460 I'm done implementing a linked list of size zero. 1563 01:10:30,460 --> 01:10:32,462 Well, now, how do I want to run this code? 1564 01:10:32,462 --> 01:10:34,420 Well, let me propose for the sake of discussion 1565 01:10:34,420 --> 01:10:37,482 that this version of the program will take command line arguments. 1566 01:10:37,482 --> 01:10:39,440 So I want to be able to do something like this. 1567 01:10:39,440 --> 01:10:43,060 I want to run this program ultimately and type in three command line 1568 01:10:43,060 --> 01:10:44,680 arguments like this, one, two, three. 1569 01:10:44,680 --> 01:10:47,530 And I want my program in a couple of minutes 1570 01:10:47,530 --> 01:10:51,400 to allocate one, two, three nodes and stitch them together just 1571 01:10:51,400 --> 01:10:54,070 like the visualization on the board. 1572 01:10:54,070 --> 01:10:55,290 I could use get int. 1573 01:10:55,290 --> 01:10:58,040 But it's just going to be faster if we use command line arguments. 1574 01:10:58,040 --> 01:11:00,550 So again, I'm just borrowing some concepts from week two. 1575 01:11:00,550 --> 01:11:03,382 But none of that's possible yet until I change my code here. 1576 01:11:03,382 --> 01:11:04,090 So let's do this. 1577 01:11:04,090 --> 01:11:08,560 Int argc string argv. 1578 01:11:08,560 --> 01:11:10,030 But you know what? 1579 01:11:10,030 --> 01:11:12,800 We know that strings are not actually a thing anymore. 1580 01:11:12,800 --> 01:11:16,600 So I can change my command line argument definition to be what it really is. 1581 01:11:16,600 --> 01:11:18,040 It's really char star. 1582 01:11:18,040 --> 01:11:20,230 But it's the exact same thing as in week two, 1583 01:11:20,230 --> 01:11:23,500 just strings are no more, at least without the training wheels 1584 01:11:23,500 --> 01:11:25,360 on anymore like last week. 1585 01:11:25,360 --> 01:11:27,070 And now let me do this. 1586 01:11:27,070 --> 01:11:33,210 For int i equals one, i is less than argc i plus, plus. 1587 01:11:33,210 --> 01:11:34,960 So what I'm doing with this loop is I just 1588 01:11:34,960 --> 01:11:36,918 want to iterate over the command line argument. 1589 01:11:36,918 --> 01:11:39,820 So I have one number at a time from the prompt. 1590 01:11:39,820 --> 01:11:42,980 What else do I want to do here? 1591 01:11:42,980 --> 01:11:48,170 Well, let's go ahead, and how about do this. 1592 01:11:48,170 --> 01:11:49,730 Let's get a number. 1593 01:11:49,730 --> 01:11:53,140 So int number equals argv bracket i. 1594 01:11:53,140 --> 01:11:54,530 So a couple of notes here. 1595 01:11:54,530 --> 01:11:58,390 One, I'm starting for loop at one instead of zero. 1596 01:11:58,390 --> 01:11:59,890 But I'm going up to argc. 1597 01:11:59,890 --> 01:12:02,530 argc is argument count, how many words are at the prompt. 1598 01:12:02,530 --> 01:12:07,930 Why am I starting at one instead of zero though given my goal? 1599 01:12:07,930 --> 01:12:09,220 Why am I starting one? 1600 01:12:09,220 --> 01:12:10,610 Yeah. 1601 01:12:10,610 --> 01:12:15,805 AUDIENCE: Because [INAUDIBLE] the first value [INAUDIBLE].. 1602 01:12:15,805 --> 01:12:16,680 DAVID J. MALAN: Yeah. 1603 01:12:16,680 --> 01:12:19,860 So the first value in argv is actually the name of the program. 1604 01:12:19,860 --> 01:12:21,130 That's obviously not a number. 1605 01:12:21,130 --> 01:12:22,493 So I want the second value. 1606 01:12:22,493 --> 01:12:25,410 So I'm going to start iterating over those command line arguments at i 1607 01:12:25,410 --> 01:12:25,937 equals one. 1608 01:12:25,937 --> 01:12:26,520 So that's all. 1609 01:12:26,520 --> 01:12:29,340 I just want to get the actual numbers at the prompt. 1610 01:12:29,340 --> 01:12:33,910 Unfortunately, argv, bracket i, is a string, a.k.a. 1611 01:12:33,910 --> 01:12:34,410 char star. 1612 01:12:34,410 --> 01:12:35,267 That is not an int. 1613 01:12:35,267 --> 01:12:36,600 So this line of code won't work. 1614 01:12:36,600 --> 01:12:39,690 But can anyone think back to week two where 1615 01:12:39,690 --> 01:12:43,560 we had a function for converting strings to integers? 1616 01:12:43,560 --> 01:12:44,153 Anyone? 1617 01:12:44,153 --> 01:12:44,820 AUDIENCE: a to i 1618 01:12:44,820 --> 01:12:45,695 DAVID J. MALAN: Yeah. 1619 01:12:45,695 --> 01:12:47,820 So a to i is a function that converts ASCII 1620 01:12:47,820 --> 01:12:50,820 to an integer assuming what you give it as an argument looks 1621 01:12:50,820 --> 01:12:52,590 like a number like one, or two, or three. 1622 01:12:52,590 --> 01:12:53,680 So let me fix this. 1623 01:12:53,680 --> 01:12:55,170 Let me actually do the conversion. 1624 01:12:55,170 --> 01:12:57,990 If I were really being careful, I would error-check this, 1625 01:12:57,990 --> 01:13:01,260 make sure that there's no digits, just like you might have in problem set two. 1626 01:13:01,260 --> 01:13:03,760 But for today's purposes, I'm just going to assume the honor 1627 01:13:03,760 --> 01:13:07,800 system that the user, me, is going to run the program correctly. 1628 01:13:07,800 --> 01:13:11,400 So now that I have a variable containing the number from the command line, 1629 01:13:11,400 --> 01:13:12,910 let's just allocate a node for it. 1630 01:13:12,910 --> 01:13:16,950 So let me do node star n, just like we did in the visualization. 1631 01:13:16,950 --> 01:13:21,990 And let's malloc enough space for the size of one such node here. 1632 01:13:21,990 --> 01:13:24,250 I now need to just be super safe. 1633 01:13:24,250 --> 01:13:29,310 So if n equals equals null, like if I'm out of memory, you know what? 1634 01:13:29,310 --> 01:13:32,640 Let me go ahead and just immediately return one here. 1635 01:13:32,640 --> 01:13:35,580 Otherwise, if that's not the case, let me 1636 01:13:35,580 --> 01:13:41,880 go ahead and update the number field of this new node, which at line 24 1637 01:13:41,880 --> 01:13:44,050 does exist because it did not return null. 1638 01:13:44,050 --> 01:13:46,140 So I did not exit early with return. 1639 01:13:46,140 --> 01:13:49,660 And let me just store whatever number this human typed in first. 1640 01:13:49,660 --> 01:13:53,940 So the return value of a to i, which per line 17 is in my variable 1641 01:13:53,940 --> 01:13:54,750 called number. 1642 01:13:54,750 --> 01:14:01,780 And then, let me go ahead and just prepend this to the list. 1643 01:14:01,780 --> 01:14:08,730 Let me go ahead and say that this next field first has a known value null, 1644 01:14:08,730 --> 01:14:11,140 just so that we get rid of that second garbage value. 1645 01:14:11,140 --> 01:14:13,930 And let me go ahead and now prepend it to the list. 1646 01:14:13,930 --> 01:14:16,950 So if I want to prepend it, that means this new node 1647 01:14:16,950 --> 01:14:24,040 must have a next field that points to the current beginning of the list. 1648 01:14:24,040 --> 01:14:26,730 And again, the goal here is to prepend, prepend, prepend. 1649 01:14:26,730 --> 01:14:28,980 So whatever the current list is, let's change it 1650 01:14:28,980 --> 01:14:33,030 so that this new node points to that existing list. 1651 01:14:33,030 --> 01:14:37,680 And now, step two, as before, was to update the actual list 1652 01:14:37,680 --> 01:14:39,330 to point at this node. 1653 01:14:39,330 --> 01:14:42,780 So recall in red on the screen before, I screwed up originally. 1654 01:14:42,780 --> 01:14:46,750 And I only did this line by moving the pointer too early, if you will. 1655 01:14:46,750 --> 01:14:49,290 But I fixed that once Carter helped me rewind 1656 01:14:49,290 --> 01:14:52,050 and we got rid of the red line, which indicated error. 1657 01:14:52,050 --> 01:14:56,790 And I just do n arrow next to change the next field of this new node 1658 01:14:56,790 --> 01:14:58,360 to point to the existing list. 1659 01:14:58,360 --> 01:15:00,510 So I'm not orphaning anything. 1660 01:15:00,510 --> 01:15:06,962 At this point in the story, I think my code is correct, not batting 1661 01:15:06,962 --> 01:15:07,920 very well though today. 1662 01:15:07,920 --> 01:15:09,030 But I think my code is correct. 1663 01:15:09,030 --> 01:15:11,200 But the program doesn't do anything interesting. 1664 01:15:11,200 --> 01:15:14,940 So it would be nice to now iterate over this linked list in memory whatever 1665 01:15:14,940 --> 01:15:16,990 its order is and print things out. 1666 01:15:16,990 --> 01:15:18,160 Well, how do we do that? 1667 01:15:18,160 --> 01:15:21,000 Well, it turns out if you want to iterate over a linked list, 1668 01:15:21,000 --> 01:15:23,490 the general paradigm is to do something like this. 1669 01:15:23,490 --> 01:15:25,950 To define a temporary variable, I could call it temp. 1670 01:15:25,950 --> 01:15:27,992 But another convention that you might as well see 1671 01:15:27,992 --> 01:15:30,120 is called pointer, ptr for short. 1672 01:15:30,120 --> 01:15:31,870 But you can call it anything you want. 1673 01:15:31,870 --> 01:15:33,930 And you can have a temporary variable first point 1674 01:15:33,930 --> 01:15:35,700 at the first node in the list. 1675 01:15:35,700 --> 01:15:37,887 And then in some kind of loop, like a while loop, 1676 01:15:37,887 --> 01:15:39,720 you point it at the second node in the list. 1677 01:15:39,720 --> 01:15:40,980 And then you keep iterating. 1678 01:15:40,980 --> 01:15:42,510 You point it at the last node in the list. 1679 01:15:42,510 --> 01:15:44,340 And then, eventually, you iterate too far, 1680 01:15:44,340 --> 01:15:46,800 effectively pointing at null, at which point 1681 01:15:46,800 --> 01:15:49,000 your while loop can presumably terminate. 1682 01:15:49,000 --> 01:15:52,590 So how do I implement that idea of allocating a temporary pointer that 1683 01:15:52,590 --> 01:15:56,203 just points at each node in the list and lets me print out ultimately 1684 01:15:56,203 --> 01:15:57,120 each of those numbers? 1685 01:15:57,120 --> 01:15:59,040 Well, let's go back to my code here. 1686 01:15:59,040 --> 01:16:00,430 And let me do this. 1687 01:16:00,430 --> 01:16:04,020 Let me go ahead and declare this temporary pointer, 1688 01:16:04,020 --> 01:16:05,680 which is going to be a node star also. 1689 01:16:05,680 --> 01:16:06,180 Why? 1690 01:16:06,180 --> 01:16:07,530 Because it's the address of a node. 1691 01:16:07,530 --> 01:16:08,850 The first, the second, the third. 1692 01:16:08,850 --> 01:16:11,892 And I'm going to set that equal to whatever the beginning of the list is. 1693 01:16:11,892 --> 01:16:15,180 So that is going to be equivalent to this version of the picture 1694 01:16:15,180 --> 01:16:19,570 here, where pointer is just temporarily pointing at the first node in the list. 1695 01:16:19,570 --> 01:16:21,870 It's not pointing at list per se, it's pointing 1696 01:16:21,870 --> 01:16:25,920 at the first node in the list, which list is also pointing at itself. 1697 01:16:25,920 --> 01:16:28,050 Once I've done this, I think I can translate 1698 01:16:28,050 --> 01:16:32,460 this to code that's a little new-- but it's conceptually familiar perhaps 1699 01:16:32,460 --> 01:16:36,700 now-- while that pointer does not equal null. 1700 01:16:36,700 --> 01:16:40,740 So while I have a valid pointer, like my finger or that arrow 1701 01:16:40,740 --> 01:16:44,530 is pointing at an actual node in memory, well, let me go ahead and print it out. 1702 01:16:44,530 --> 01:16:46,920 So let me print out with percent i backslash 1703 01:16:46,920 --> 01:16:52,908 n whatever is in the current node at the number field within. 1704 01:16:52,908 --> 01:16:55,200 And again, this is going to have the effect, hopefully, 1705 01:16:55,200 --> 01:16:57,180 of first printing the three. 1706 01:16:57,180 --> 01:17:01,170 And I think I just need to now update the pointer so 1707 01:17:01,170 --> 01:17:04,150 that on the next iteration it's pointing at the next value. 1708 01:17:04,150 --> 01:17:06,750 So if this is where the story is, how do I 1709 01:17:06,750 --> 01:17:10,150 update pointer to point at the second element of the list? 1710 01:17:10,150 --> 01:17:12,250 Well, I want pointer to point at the two. 1711 01:17:12,250 --> 01:17:14,760 And I want pointer to eventually point at the three. 1712 01:17:14,760 --> 01:17:16,020 Well, how do I do that? 1713 01:17:16,020 --> 01:17:19,230 Well, the way in code I can follow these arrows is as follows. 1714 01:17:19,230 --> 01:17:22,190 If I currently have pointer pointing at this node 1715 01:17:22,190 --> 01:17:27,570 but I want to point it at the next node, I can borrow this pointer here. 1716 01:17:27,570 --> 01:17:30,920 So whatever this address is in the first node, a.k.a. 1717 01:17:30,920 --> 01:17:33,770 the next field, I can copy that into pointer. 1718 01:17:33,770 --> 01:17:36,200 Because then, pointer will point at whatever 1719 01:17:36,200 --> 01:17:39,230 this is pointing at by just setting one equal to the other. 1720 01:17:39,230 --> 01:17:44,570 So once I've done that, the picture will become this. 1721 01:17:44,570 --> 01:17:47,420 And how do I translate that to code? 1722 01:17:47,420 --> 01:17:50,340 Well, new syntax is surprisingly straightforward. 1723 01:17:50,340 --> 01:17:53,060 All I need do is say pointer after printing 1724 01:17:53,060 --> 01:18:00,300 it equals whatever pointer currently is, but grab its next field instead. 1725 01:18:00,300 --> 01:18:03,860 And this is a very common paradigm when iterating over a linked list 1726 01:18:03,860 --> 01:18:06,470 and you're using some temporary variable like pointer, 1727 01:18:06,470 --> 01:18:09,680 you can simply set pointer equal to pointer next. 1728 01:18:09,680 --> 01:18:12,330 And what that means here is as follows. 1729 01:18:12,330 --> 01:18:16,670 If this is pointer pointing from here down to here, 1730 01:18:16,670 --> 01:18:20,160 pointer next is follow the arrow, grab the next field. 1731 01:18:20,160 --> 01:18:22,550 So if you set pointer equal to this thing, 1732 01:18:22,550 --> 01:18:26,570 that's the same thing as pointing this at this same box. 1733 01:18:26,570 --> 01:18:29,862 And indeed, if I advance to the next slide, 1734 01:18:29,862 --> 01:18:31,820 even though the arrows are technically pointing 1735 01:18:31,820 --> 01:18:34,730 at different parts of the rectangles, that's just for graphic's sake, 1736 01:18:34,730 --> 01:18:36,620 pointer is now pointing at the second node. 1737 01:18:36,620 --> 01:18:39,650 And when I do this again on my next iteration, it points at this. 1738 01:18:39,650 --> 01:18:44,840 And then, this last step, notice, when I keep doing pointer equals pointer next, 1739 01:18:44,840 --> 01:18:47,570 this will become eventually this value. 1740 01:18:47,570 --> 01:18:50,620 But what's this value in this linked list? 1741 01:18:50,620 --> 01:18:51,910 It's null, technically. 1742 01:18:51,910 --> 01:18:55,180 So this arrow will eventually take on this value 1743 01:18:55,180 --> 01:18:57,490 when I set pointer equal to pointer next. 1744 01:18:57,490 --> 01:19:02,630 And at that point ptr, my temporary pointer is going to be null. 1745 01:19:02,630 --> 01:19:04,760 So it might as well look like this pictorially. 1746 01:19:04,760 --> 01:19:07,630 And what does that mean for my loop? 1747 01:19:07,630 --> 01:19:11,530 Once pointer is null, because you've walked off the end of the linked list, 1748 01:19:11,530 --> 01:19:17,210 what's going to be true of this loop here started in line 32? 1749 01:19:17,210 --> 01:19:20,870 Any observations here? 1750 01:19:20,870 --> 01:19:22,010 What's going to be true? 1751 01:19:22,010 --> 01:19:24,620 What will happen now as soon as we hit the end of the list? 1752 01:19:24,620 --> 01:19:25,250 Yeah, sorry. 1753 01:19:25,250 --> 01:19:26,810 AUDIENCE: [INAUDIBLE] 1754 01:19:26,810 --> 01:19:28,040 DAVID J. MALAN: The loop is going to break out. 1755 01:19:28,040 --> 01:19:28,540 Why? 1756 01:19:28,540 --> 01:19:30,830 Because line 32, which is constantly asking, 1757 01:19:30,830 --> 01:19:32,570 well, pointer does not equal null. 1758 01:19:32,570 --> 01:19:36,590 Well, if pointer finally equals null three steps later, 1759 01:19:36,590 --> 01:19:38,400 the while loop is now done. 1760 01:19:38,400 --> 01:19:40,580 And so, what I can do at the end of this program 1761 01:19:40,580 --> 01:19:43,103 once I printed out those values, well, first let's go ahead 1762 01:19:43,103 --> 01:19:44,270 and open my terminal window. 1763 01:19:44,270 --> 01:19:46,220 Let's make list. 1764 01:19:46,220 --> 01:19:47,840 Compile dot slash list. 1765 01:19:47,840 --> 01:19:51,080 And let me try the same values, one, and two, and three. 1766 01:19:51,080 --> 01:19:55,130 That's going to again allocate one node, two node, three nodes by prepending, 1767 01:19:55,130 --> 01:19:57,290 prepending, prepending each of those values. 1768 01:19:57,290 --> 01:20:00,060 And it's then going to iterate over them from left to right. 1769 01:20:00,060 --> 01:20:03,380 And so, when I hit Enter now, what should I see on the screen 1770 01:20:03,380 --> 01:20:04,445 if my code is correct? 1771 01:20:04,445 --> 01:20:07,040 1772 01:20:07,040 --> 01:20:07,910 What will I see? 1773 01:20:07,910 --> 01:20:10,492 Feel free to just call it out. 1774 01:20:10,492 --> 01:20:11,575 AUDIENCE: Three, two, one. 1775 01:20:11,575 --> 01:20:14,490 DAVID J. MALAN: Three, two, one, because I've prepended presumably. 1776 01:20:14,490 --> 01:20:15,630 And here we go. 1777 01:20:15,630 --> 01:20:17,250 I indeed see three, two, one. 1778 01:20:17,250 --> 01:20:18,510 So the list is backwards. 1779 01:20:18,510 --> 01:20:20,100 But all of the elements are there. 1780 01:20:20,100 --> 01:20:22,230 Now, technically, if I ran Valgrind on this, 1781 01:20:22,230 --> 01:20:25,500 Valgrind would not be happy because I have never freed any of my memory. 1782 01:20:25,500 --> 01:20:27,930 So I should probably now have a second loop here 1783 01:20:27,930 --> 01:20:29,350 that does something like this. 1784 01:20:29,350 --> 01:20:31,918 Let me again set pointer equal to list. 1785 01:20:31,918 --> 01:20:33,960 I don't need to redeclare it because I've already 1786 01:20:33,960 --> 01:20:35,760 created this thing on line 31. 1787 01:20:35,760 --> 01:20:38,552 I just want to reset it to be the beginning of the list again. 1788 01:20:38,552 --> 01:20:40,260 And now, I can do the same kind of thing. 1789 01:20:40,260 --> 01:20:45,640 While prt not equals null, go ahead and do this. 1790 01:20:45,640 --> 01:20:48,510 Well, I don't want to just do free pointer 1791 01:20:48,510 --> 01:20:52,770 and then do pointer gets pointer next. 1792 01:20:52,770 --> 01:20:54,000 Why? 1793 01:20:54,000 --> 01:20:56,212 My goal is to free all of my memory. 1794 01:20:56,212 --> 01:20:58,170 But I think this is going to get me in trouble. 1795 01:20:58,170 --> 01:21:00,587 Pointer equals list just gives me a temporary pointer that 1796 01:21:00,587 --> 01:21:03,630 points at the three, and then eventually the two, and then the one. 1797 01:21:03,630 --> 01:21:04,140 How? 1798 01:21:04,140 --> 01:21:07,090 Well, while pointer not equal null, I'm freeing the pointer. 1799 01:21:07,090 --> 01:21:10,500 So this is like saying to malloc, free that node, free that node, free 1800 01:21:10,500 --> 01:21:11,460 that node. 1801 01:21:11,460 --> 01:21:16,040 But what's the problem with what I've just done here? 1802 01:21:16,040 --> 01:21:18,030 This code is technically buggy. 1803 01:21:18,030 --> 01:21:18,530 Yeah. 1804 01:21:18,530 --> 01:21:21,613 AUDIENCE: After you free your pointer, you have no [INAUDIBLE] what's next 1805 01:21:21,613 --> 01:21:22,260 [INAUDIBLE]. 1806 01:21:22,260 --> 01:21:23,260 DAVID J. MALAN: Exactly. 1807 01:21:23,260 --> 01:21:27,180 After you call free on pointer, you are by social contract 1808 01:21:27,180 --> 01:21:30,240 with C not allowed to touch pointer anymore. 1809 01:21:30,240 --> 01:21:31,080 It is invalid. 1810 01:21:31,080 --> 01:21:32,580 Now it's still going to be a number. 1811 01:21:32,580 --> 01:21:34,288 It's still going to be a pattern of bits. 1812 01:21:34,288 --> 01:21:35,280 But it's invalid. 1813 01:21:35,280 --> 01:21:40,240 And you'll very often get a segmentation fault if you tempt fate in that way. 1814 01:21:40,240 --> 01:21:43,920 So I can't free the pointer and then use it literally the next line. 1815 01:21:43,920 --> 01:21:47,220 The solution here, like our swapping of the liquids last time, 1816 01:21:47,220 --> 01:21:49,480 was to maybe just have a temporary variable. 1817 01:21:49,480 --> 01:21:50,550 So I can do a switcheroo. 1818 01:21:50,550 --> 01:21:54,120 And so, a common way to solve this problem to get the order of operations 1819 01:21:54,120 --> 01:21:55,890 right would be to do something like this. 1820 01:21:55,890 --> 01:21:59,790 Give yourself a temporary pointer like node star next. 1821 01:21:59,790 --> 01:22:05,220 Set it equal to the place you want to go next, so we're one step ahead. 1822 01:22:05,220 --> 01:22:07,290 Now you can free pointer. 1823 01:22:07,290 --> 01:22:10,690 And then you can update pointer to be that next value. 1824 01:22:10,690 --> 01:22:12,900 So essentially, you need two hands now. 1825 01:22:12,900 --> 01:22:15,720 You create on line 41 another pointer that if this 1826 01:22:15,720 --> 01:22:18,300 is pointing at the first node, the three, 1827 01:22:18,300 --> 01:22:20,880 your new pointer is pointing at the two temporarily. 1828 01:22:20,880 --> 01:22:24,960 So now you can tell malloc via free, release this memory. 1829 01:22:24,960 --> 01:22:27,720 But I haven't forgotten where I want to go next. 1830 01:22:27,720 --> 01:22:30,160 And so, I can now continue on. 1831 01:22:30,160 --> 01:22:33,720 So a common paradigm for just iterating over these nodes 1832 01:22:33,720 --> 01:22:36,420 and then freeing them, a couple of observations. 1833 01:22:36,420 --> 01:22:38,850 Strictly speaking, I could have consolidated this. 1834 01:22:38,850 --> 01:22:42,720 I don't need two loops to print the nodes and then free the nodes. 1835 01:22:42,720 --> 01:22:43,948 I could do that all at once. 1836 01:22:43,948 --> 01:22:46,740 But let's assume that there's other stuff of interest in my program 1837 01:22:46,740 --> 01:22:48,615 and I don't want to just immediately free it. 1838 01:22:48,615 --> 01:22:51,420 There's one other bug that I should probably address here. 1839 01:22:51,420 --> 01:22:54,120 There is still a potential memory leak up here. 1840 01:22:54,120 --> 01:22:57,810 And this one is super subtle, though Valgrind would help you find it. 1841 01:22:57,810 --> 01:23:02,730 Notice that in this loop here when I'm calling malloc, this line of code 1842 01:23:02,730 --> 01:23:06,120 is fine if the first line of malloc fails and returns 1843 01:23:06,120 --> 01:23:08,440 null because I immediately return and I'm done. 1844 01:23:08,440 --> 01:23:12,240 But what if the second call-- but not the first, or the third call, but not 1845 01:23:12,240 --> 01:23:13,770 the first or second fail? 1846 01:23:13,770 --> 01:23:16,560 This line of code has me returning immediately. 1847 01:23:16,560 --> 01:23:19,840 You really need to do some garbage collection, so to speak, 1848 01:23:19,840 --> 01:23:23,250 whereby you really need to go in and free any nodes that you 1849 01:23:23,250 --> 01:23:25,110 did allocate successfully earlier. 1850 01:23:25,110 --> 01:23:26,370 Honestly, that's going to be a pain in the neck. 1851 01:23:26,370 --> 01:23:27,450 We won't do that here. 1852 01:23:27,450 --> 01:23:30,840 But probably what I'd want to do is write a function called free list 1853 01:23:30,840 --> 01:23:33,180 or something like that and call that function 1854 01:23:33,180 --> 01:23:35,950 to free any nodes I had previously created. 1855 01:23:35,950 --> 01:23:37,960 So it's not quite at the finish line. 1856 01:23:37,960 --> 01:23:41,160 But the building blocks are indeed here. 1857 01:23:41,160 --> 01:23:44,550 Questions on this code? 1858 01:23:44,550 --> 01:23:48,360 And I think it's safe for me to promise that it won't escalate further 1859 01:23:48,360 --> 01:23:50,880 from that. 1860 01:23:50,880 --> 01:23:54,760 Questions on this? 1861 01:23:54,760 --> 01:23:55,630 No? 1862 01:23:55,630 --> 01:23:58,130 Well, let me show you one alternative that you might prefer. 1863 01:23:58,130 --> 01:24:00,005 And I'm pretty sure this isn't an escalation, 1864 01:24:00,005 --> 01:24:01,600 it's just an alternative formulation. 1865 01:24:01,600 --> 01:24:05,060 Another way you can iterate over nodes in a list could be this. 1866 01:24:05,060 --> 01:24:07,203 Instead of a while loop, for instance, let 1867 01:24:07,203 --> 01:24:09,370 me actually show you one other piece of syntax here. 1868 01:24:09,370 --> 01:24:10,930 You could technically use a for loop. 1869 01:24:10,930 --> 01:24:15,580 You could give yourself a node pointer here that is initialized to the list. 1870 01:24:15,580 --> 01:24:18,530 You can then check in your for loop that it's not equal to null. 1871 01:24:18,530 --> 01:24:21,310 And then, you can do your update as usual like this. 1872 01:24:21,310 --> 01:24:23,360 Either of these are equivalent. 1873 01:24:23,360 --> 01:24:25,480 Even though this one I suspect looks scarier, 1874 01:24:25,480 --> 01:24:28,840 it's doing the exact same thing in one line instead of two. 1875 01:24:28,840 --> 01:24:32,230 But there's no reason we can't use for loops instead of while loops 1876 01:24:32,230 --> 01:24:33,490 to achieve the same idea. 1877 01:24:33,490 --> 01:24:35,320 But I'll leave these two as demonstrations 1878 01:24:35,320 --> 01:24:36,580 of one approach or the other. 1879 01:24:36,580 --> 01:24:39,340 But that's just like in week one, for loops, while loops, 1880 01:24:39,340 --> 01:24:41,860 whatever looks simpler to you. 1881 01:24:41,860 --> 01:24:46,460 Even though admittedly neither of these probably looks super clean. 1882 01:24:46,460 --> 01:24:49,660 So let's take things back to things more conceptual here. 1883 01:24:49,660 --> 01:24:52,900 Up until now, we've been inserting elements into this linked list 1884 01:24:52,900 --> 01:24:54,580 by prepending them. 1885 01:24:54,580 --> 01:24:58,070 Let's consider what the running time then is of these operations. 1886 01:24:58,070 --> 01:25:01,423 So if I've got a linked list of size three or size n more generally, 1887 01:25:01,423 --> 01:25:03,340 time has passed and I've added a lot of things 1888 01:25:03,340 --> 01:25:06,400 to it, what's going to be the running time, for instance, of searching 1889 01:25:06,400 --> 01:25:08,590 a linked list for some value? 1890 01:25:08,590 --> 01:25:10,510 And I'll tell you already, it's not log n. 1891 01:25:10,510 --> 01:25:15,040 Because again, binary search is off the table as per before break. 1892 01:25:15,040 --> 01:25:20,080 So what might the running time be of searching a linked list for some value, 1893 01:25:20,080 --> 01:25:23,350 like two, or three, or one, or 50? 1894 01:25:23,350 --> 01:25:26,545 What might the running time be? 1895 01:25:26,545 --> 01:25:27,420 AUDIENCE: [INAUDIBLE] 1896 01:25:27,420 --> 01:25:28,310 DAVID J. MALAN: O of-- 1897 01:25:28,310 --> 01:25:28,970 I heard it over here. 1898 01:25:28,970 --> 01:25:29,600 O of n. 1899 01:25:29,600 --> 01:25:30,260 And why? 1900 01:25:30,260 --> 01:25:30,862 Who was that? 1901 01:25:30,862 --> 01:25:31,820 Oh, in the middle here? 1902 01:25:31,820 --> 01:25:33,421 Who O of n? 1903 01:25:33,421 --> 01:25:37,260 AUDIENCE: Because you're [INAUDIBLE] item in the list [INAUDIBLE].. 1904 01:25:37,260 --> 01:25:38,260 DAVID J. MALAN: Exactly. 1905 01:25:38,260 --> 01:25:40,030 You're going to have to go through every [? item ?] in the list 1906 01:25:40,030 --> 01:25:42,550 starting from the left from the beginning, which is how we've been 1907 01:25:42,550 --> 01:25:44,175 drawing things and connecting the dots. 1908 01:25:44,175 --> 01:25:47,092 And in the worst case, the element might very well be at the very end. 1909 01:25:47,092 --> 01:25:48,430 So it's going to be big O of n. 1910 01:25:48,430 --> 01:25:49,990 What about insertion? 1911 01:25:49,990 --> 01:25:52,810 How many steps in terms of big O notation 1912 01:25:52,810 --> 01:25:55,780 has it been taking me to insert elements into the linked 1913 01:25:55,780 --> 01:25:59,692 list using this prepend design? 1914 01:25:59,692 --> 01:26:00,540 AUDIENCE: One. 1915 01:26:00,540 --> 01:26:01,415 DAVID J. MALAN: Yeah. 1916 01:26:01,415 --> 01:26:03,900 So it's technically constant time, big O of one. 1917 01:26:03,900 --> 01:26:06,197 And again, one is just representative of any constant. 1918 01:26:06,197 --> 01:26:08,280 It could technically be two steps, or three steps, 1919 01:26:08,280 --> 01:26:10,240 or even 10 steps, or 100 steps. 1920 01:26:10,240 --> 01:26:13,950 But if it's always finite and fixed, then 1921 01:26:13,950 --> 01:26:16,080 indeed you can say it's in big O of one. 1922 01:26:16,080 --> 01:26:17,080 Now, why is that? 1923 01:26:17,080 --> 01:26:19,950 Well, again, no matter how long this list gets, so long 1924 01:26:19,950 --> 01:26:24,240 as there's memory available for me I can just create a little splice 1925 01:26:24,240 --> 01:26:27,570 at the beginning of the list to put in the new node, update the original list, 1926 01:26:27,570 --> 01:26:28,410 and I'm on my way. 1927 01:26:28,410 --> 01:26:31,618 And it keeps getting longer even though it might not be spread out in memory. 1928 01:26:31,618 --> 01:26:34,650 So big O of one is possible with these linked lists 1929 01:26:34,650 --> 01:26:36,690 if I indeed prepend things. 1930 01:26:36,690 --> 01:26:39,300 Of course, if I prepend things, everything's 1931 01:26:39,300 --> 01:26:40,920 going to get out of order potentially. 1932 01:26:40,920 --> 01:26:43,590 And we're going to have maybe the stack property instead of a queue property. 1933 01:26:43,590 --> 01:26:45,715 So we might want to do things slightly differently. 1934 01:26:45,715 --> 01:26:49,620 So instead of doing this, whereby we kept prepending, 1935 01:26:49,620 --> 01:26:53,890 prepending, prepending, suppose we append to the end of the list instead. 1936 01:26:53,890 --> 01:26:56,190 So if we now insert the one, the two, and the three 1937 01:26:56,190 --> 01:26:59,640 as we might want to for a queue, to maintain that fairness property, 1938 01:26:59,640 --> 01:27:01,230 we might start with an empty list. 1939 01:27:01,230 --> 01:27:02,580 We might add the one. 1940 01:27:02,580 --> 01:27:05,820 We might append the two, append the three. 1941 01:27:05,820 --> 01:27:08,263 And so, it just is laid out differently in memory. 1942 01:27:08,263 --> 01:27:10,180 And again, if I can come to you in the middle, 1943 01:27:10,180 --> 01:27:12,180 what's the running time of search, again, 1944 01:27:12,180 --> 01:27:15,592 when the linked list uses this append implementation? 1945 01:27:15,592 --> 01:27:16,830 AUDIENCE: Still big O of n. 1946 01:27:16,830 --> 01:27:17,190 DAVID J. MALAN: Yeah. 1947 01:27:17,190 --> 01:27:17,898 Still big O of n. 1948 01:27:17,898 --> 01:27:21,065 Because in the worst case, you're going to have to go through the whole list 1949 01:27:21,065 --> 01:27:21,960 just to find it. 1950 01:27:21,960 --> 01:27:25,380 And notice, it doesn't matter if you have an intuition now that the bigger 1951 01:27:25,380 --> 01:27:27,420 numbers might very well be at the end. 1952 01:27:27,420 --> 01:27:29,520 You have no way to jump to the end. 1953 01:27:29,520 --> 01:27:33,120 You have no way to jump to the middle or do anything resembling binary search. 1954 01:27:33,120 --> 01:27:38,025 Every search has to start from the left and follow the arrows again and again. 1955 01:27:38,025 --> 01:27:39,900 So I don't think we've done any better there. 1956 01:27:39,900 --> 01:27:43,500 And in fact, what is insertions running time now in big O 1957 01:27:43,500 --> 01:27:45,690 when we're appending to the list in this way, 1958 01:27:45,690 --> 01:27:49,230 as we might to implement a queue instead of a stack? 1959 01:27:49,230 --> 01:27:53,670 What's the running time of inserting a new value? 1960 01:27:53,670 --> 01:27:55,397 Big O of? 1961 01:27:55,397 --> 01:27:55,980 AUDIENCE: One. 1962 01:27:55,980 --> 01:28:00,030 DAVID J. MALAN: So not big O of one in this case, but big O of n. 1963 01:28:00,030 --> 01:28:02,610 Because if I'm appending, by definition I 1964 01:28:02,610 --> 01:28:06,130 have to start here and traverse the whole thing looking for the end. 1965 01:28:06,130 --> 01:28:08,010 Now, this is a bit of an overstatement. 1966 01:28:08,010 --> 01:28:09,900 You could obviously optimize this slightly 1967 01:28:09,900 --> 01:28:14,220 by maybe adding another variable that always points to the last element, sort 1968 01:28:14,220 --> 01:28:17,460 of a cheat sheet or a shortcut that gets you all the way to the end. 1969 01:28:17,460 --> 01:28:18,293 That's totally fine. 1970 01:28:18,293 --> 01:28:20,752 It's not-- it doesn't really fit the traditional definition 1971 01:28:20,752 --> 01:28:21,810 of a singly linked list. 1972 01:28:21,810 --> 01:28:25,330 But there's absolutely smart engineering solutions to these kinds of problems. 1973 01:28:25,330 --> 01:28:27,570 But as designed, it would indeed be big O event 1974 01:28:27,570 --> 01:28:30,480 to insert to if you've got to go all the way to the end 1975 01:28:30,480 --> 01:28:34,260 and you're not using a little extra memory to get yourself there quickly. 1976 01:28:34,260 --> 01:28:38,160 Well, what if we want to take things one last step and not just append 1977 01:28:38,160 --> 01:28:41,160 blindly, because even though I inserted one, two, three, 1978 01:28:41,160 --> 01:28:44,910 if I inserted them in random order, they would end up in random order. 1979 01:28:44,910 --> 01:28:48,660 What if you want to maintain a sorted list from smallest to largest? 1980 01:28:48,660 --> 01:28:51,090 Well, then you might want to insert numbers like this. 1981 01:28:51,090 --> 01:28:54,420 Starting from an empty list, we might have a two, 1982 01:28:54,420 --> 01:28:56,640 then we might try inserting a one. 1983 01:28:56,640 --> 01:28:57,930 But we want to keep it sorted. 1984 01:28:57,930 --> 01:29:00,450 So now we're going to prepend in our code. 1985 01:29:00,450 --> 01:29:02,820 But then you might want to insert a four. 1986 01:29:02,820 --> 01:29:05,400 So you would append the four because you're probably 1987 01:29:05,400 --> 01:29:07,440 going to look for the right spot to insert it. 1988 01:29:07,440 --> 01:29:09,110 Then we're going to insert a three. 1989 01:29:09,110 --> 01:29:10,860 And this one is getting a little annoying. 1990 01:29:10,860 --> 01:29:14,910 Because now you have to iterate over the list, look for the right spot, 1991 01:29:14,910 --> 01:29:17,760 and then do a little smarter of a splice. 1992 01:29:17,760 --> 01:29:18,773 But it's possible. 1993 01:29:18,773 --> 01:29:20,940 But you don't want to orphan the four, for instance. 1994 01:29:20,940 --> 01:29:24,060 And then, ultimately, we get back to this question. 1995 01:29:24,060 --> 01:29:26,640 What would the performance be of your linked list 1996 01:29:26,640 --> 01:29:30,090 if you're trying to maintain sorted order? 1997 01:29:30,090 --> 01:29:34,410 Well, search I think is going to be big O of n for the same reasons as before. 1998 01:29:34,410 --> 01:29:37,170 What about insertion? 1999 01:29:37,170 --> 01:29:44,120 Big O of what for inserting into a sorted linked list? 2000 01:29:44,120 --> 01:29:45,590 Yeah, in the worst case? 2001 01:29:45,590 --> 01:29:46,460 AUDIENCE: Big O of n 2002 01:29:46,460 --> 01:29:46,850 DAVID J. MALAN: Yeah. 2003 01:29:46,850 --> 01:29:48,050 It's still a big O of n. 2004 01:29:48,050 --> 01:29:51,410 So it's no worse than, but it's not really any better than appending. 2005 01:29:51,410 --> 01:29:55,040 But we gain the additional property of maintaining a sorted list, which 2006 01:29:55,040 --> 01:29:58,070 might very well be useful if you're sorting your contacts in your phone 2007 01:29:58,070 --> 01:30:01,850 or something like that where it just makes sense to maintain sorted order. 2008 01:30:01,850 --> 01:30:04,280 Now, in the code for online today, if you take a look 2009 01:30:04,280 --> 01:30:09,203 at some of the final versions of code, like list 6.c and list 5.c 2010 01:30:09,203 --> 01:30:11,120 as we'll post on the website, you can actually 2011 01:30:11,120 --> 01:30:14,660 see code that will solve all three of these problems, the prepend version 2012 01:30:14,660 --> 01:30:17,360 that we wrote live, the append version which we talked through, 2013 01:30:17,360 --> 01:30:19,220 as well as this sorted order one. 2014 01:30:19,220 --> 01:30:22,197 But I think I'll avoid showing it live just because I do 2015 01:30:22,197 --> 01:30:23,780 think that starts to escalate quickly. 2016 01:30:23,780 --> 01:30:25,697 But I think we have enough of a building block 2017 01:30:25,697 --> 01:30:29,660 if we're comfortable with prepending to at least solve some real-world problems 2018 01:30:29,660 --> 01:30:31,970 with these linked lists. 2019 01:30:31,970 --> 01:30:37,520 Questions then on linked lists which we'll now leave behind on their own 2020 01:30:37,520 --> 01:30:44,630 but now use this technique to solve fancier problems but much less code. 2021 01:30:44,630 --> 01:30:47,630 Questions on linked lists? 2022 01:30:47,630 --> 01:30:52,160 So to recap, we've taken a sidestep with linked list. 2023 01:30:52,160 --> 01:30:56,180 We have this dynamism now where we can grow and shrink our chunks of memory 2024 01:30:56,180 --> 01:30:59,150 without over-allocating or accidentally under-allocating 2025 01:30:59,150 --> 01:31:00,620 as in the world of an array. 2026 01:31:00,620 --> 01:31:03,580 We don't have to worry about copying values endlessly 2027 01:31:03,580 --> 01:31:06,830 because once you allocate the node, it can just stay wherever it is in memory. 2028 01:31:06,830 --> 01:31:08,390 And you can just maintain-- 2029 01:31:08,390 --> 01:31:10,190 you can just stitch it together somehow. 2030 01:31:10,190 --> 01:31:13,220 But unfortunately, we've sacrificed what we started the class 2031 01:31:13,220 --> 01:31:16,550 with in week zero, which was binary search, divide and conquer, which 2032 01:31:16,550 --> 01:31:19,550 was like gave us that log and running time, which 2033 01:31:19,550 --> 01:31:23,870 was really compelling if you think back to the demonstrations and the visuals. 2034 01:31:23,870 --> 01:31:26,060 Can we get the best of both worlds? 2035 01:31:26,060 --> 01:31:30,920 Can we get the speed of binary search, something logarithmic, 2036 01:31:30,920 --> 01:31:33,660 but the dynamism of something like a linked list? 2037 01:31:33,660 --> 01:31:35,600 Well, we can, actually, I think, if we start 2038 01:31:35,600 --> 01:31:39,690 to think not in a single dimension, just the x-axis, if you will, 2039 01:31:39,690 --> 01:31:42,410 but two dimensions, such that our data structures 2040 01:31:42,410 --> 01:31:45,120 can maybe now have width and height, if you will. 2041 01:31:45,120 --> 01:31:47,600 And so, a tree is perhaps the right term here. 2042 01:31:47,600 --> 01:31:51,290 Much like a family tree, if you have your elders up here in the tree 2043 01:31:51,290 --> 01:31:54,260 and then the branches below them for their children and grandchildren 2044 01:31:54,260 --> 01:31:54,890 and the like. 2045 01:31:54,890 --> 01:31:56,598 That's actually what a computer scientist 2046 01:31:56,598 --> 01:31:59,840 means when they talk about trees, not a tree that grows up like this, 2047 01:31:59,840 --> 01:32:02,510 but really one that typically is depicted growing down. 2048 01:32:02,510 --> 01:32:05,520 Although this is just an artist's depiction no matter what. 2049 01:32:05,520 --> 01:32:09,080 But there are certain types of trees in the world called binary search 2050 01:32:09,080 --> 01:32:14,090 trees that are structured on paper and in visually like a family tree. 2051 01:32:14,090 --> 01:32:16,850 But they have a special property that lends themselves 2052 01:32:16,850 --> 01:32:19,470 to exactly that feature binary search. 2053 01:32:19,470 --> 01:32:22,580 So for instance, here is an array back from week two. 2054 01:32:22,580 --> 01:32:26,450 And I've sorted a whole bunch of numbers herein from one to seven. 2055 01:32:26,450 --> 01:32:29,540 We know we can do binary search on this structure 2056 01:32:29,540 --> 01:32:31,640 if it's implemented as an array. 2057 01:32:31,640 --> 01:32:37,490 But what feature do arrays, to be clear, not have that linked lists do? 2058 01:32:37,490 --> 01:32:38,720 Today's kind of a seesaw. 2059 01:32:38,720 --> 01:32:45,486 What did we just gain by adding linked lists that arrays do not allow Yeah. 2060 01:32:45,486 --> 01:32:47,755 AUDIENCE: [INAUDIBLE] 2061 01:32:47,755 --> 01:32:48,630 DAVID J. MALAN: Yeah. 2062 01:32:48,630 --> 01:32:51,330 You can insert more elements without having to copy 2063 01:32:51,330 --> 01:32:52,890 or moving everything else around. 2064 01:32:52,890 --> 01:32:57,330 Right now, in this single dimension, if these values to the left and/or right 2065 01:32:57,330 --> 01:32:59,810 are already used, then you have to move everything. 2066 01:32:59,810 --> 01:33:01,560 And that's where we started today's story. 2067 01:33:01,560 --> 01:33:03,450 So arrays kind of paint you into a corner. 2068 01:33:03,450 --> 01:33:06,990 Because you have to, by definition, decide in advance how big they are. 2069 01:33:06,990 --> 01:33:09,390 Well, couldn't we have some kind of array 2070 01:33:09,390 --> 01:33:12,780 that can still grow but still is contiguous so we 2071 01:33:12,780 --> 01:33:15,190 can do binary search in some way? 2072 01:33:15,190 --> 01:33:19,440 Well, yes, if we rethink how we implement binary search. 2073 01:33:19,440 --> 01:33:24,660 Let me propose that this, I've chosen these seven elements in the array 2074 01:33:24,660 --> 01:33:29,910 much like the lockers from week two to be ordered from smallest to largest. 2075 01:33:29,910 --> 01:33:33,040 I've highlighted now in yellow the middle elements here. 2076 01:33:33,040 --> 01:33:36,210 And if we were telling the story of week two going left or going right, 2077 01:33:36,210 --> 01:33:39,090 let me highlight in red the middle elements 2078 01:33:39,090 --> 01:33:40,890 of the left half and the right half. 2079 01:33:40,890 --> 01:33:42,960 And then, let me further highlight in green 2080 01:33:42,960 --> 01:33:45,840 the other elements in between those. 2081 01:33:45,840 --> 01:33:48,778 And there's actually a pattern here, as you might notice. 2082 01:33:48,778 --> 01:33:50,820 Whereby there's one yellow in the middle and then 2083 01:33:50,820 --> 01:33:52,830 there's the two red and the four green. 2084 01:33:52,830 --> 01:33:55,380 There's an implicit structure there, if you will. 2085 01:33:55,380 --> 01:33:57,840 And what if I do start to think in two dimensions, 2086 01:33:57,840 --> 01:34:03,210 and instead of laying out an array of lockers like this on the x-axis only, 2087 01:34:03,210 --> 01:34:09,450 what if I slide the four up, and pull the one, the three, the five down 2088 01:34:09,450 --> 01:34:13,210 and draw this in two dimensions instead? 2089 01:34:13,210 --> 01:34:17,490 Well, let me do that, as by separating these things like this. 2090 01:34:17,490 --> 01:34:20,610 Such that, now, let me propose that each of these squares 2091 01:34:20,610 --> 01:34:22,770 maybe it doesn't have to be contiguous. 2092 01:34:22,770 --> 01:34:24,990 It can be anywhere in the computer's memory. 2093 01:34:24,990 --> 01:34:27,810 But I can't have these crazy gaps among them. 2094 01:34:27,810 --> 01:34:32,100 How could I perhaps keep these things connected conceptually? 2095 01:34:32,100 --> 01:34:34,050 What should I add to the picture, if you will? 2096 01:34:34,050 --> 01:34:36,615 2097 01:34:36,615 --> 01:34:37,115 Yeah. 2098 01:34:37,115 --> 01:34:37,992 AUDIENCE: Branches? 2099 01:34:37,992 --> 01:34:39,075 DAVID J. MALAN: Say again? 2100 01:34:39,075 --> 01:34:39,410 AUDIENCE: Branches. 2101 01:34:39,410 --> 01:34:41,500 DAVID J. MALAN: So branches metaphorically here. 2102 01:34:41,500 --> 01:34:44,410 And more technically in the language of C, maybe just 2103 01:34:44,410 --> 01:34:46,420 some arrows, some pointers. 2104 01:34:46,420 --> 01:34:49,090 So I won't bother drawing things as rectangles constantly. 2105 01:34:49,090 --> 01:34:51,940 Let me propose that we're now just abstracting away what a node is. 2106 01:34:51,940 --> 01:34:54,550 But let me claim that each of these squares now is a node. 2107 01:34:54,550 --> 01:34:55,900 And a node might have a number. 2108 01:34:55,900 --> 01:34:57,400 But it might also have a pointer. 2109 01:34:57,400 --> 01:34:59,950 Heck, maybe even two or more pointers. 2110 01:34:59,950 --> 01:35:01,240 And let me draw those now. 2111 01:35:01,240 --> 01:35:04,970 I don't care about addresses like 0x123, 456, 789 anymore. 2112 01:35:04,970 --> 01:35:07,210 Let's just draw our pointers with arrows. 2113 01:35:07,210 --> 01:35:10,570 But now, let me propose that we could very well think 2114 01:35:10,570 --> 01:35:17,860 about this as a tree storing what was previously array data. 2115 01:35:17,860 --> 01:35:21,670 But now, each of these nodes can be anywhere in memory. 2116 01:35:21,670 --> 01:35:24,040 And moreover, even though I've painted myself 2117 01:35:24,040 --> 01:35:26,290 into a corner visually on the screen, so long 2118 01:35:26,290 --> 01:35:30,620 as there's more memory in the computer, I could put the number zero over here. 2119 01:35:30,620 --> 01:35:32,720 I could put the number eight over here. 2120 01:35:32,720 --> 01:35:35,290 And if I'm smart, I could probably, if I want 2121 01:35:35,290 --> 01:35:39,310 to insert other numbers, like 2.5 or 1.54 values in between, 2122 01:35:39,310 --> 01:35:43,660 I bet we could make room by swiveling things around and just 2123 01:35:43,660 --> 01:35:47,330 hanging things off of these branches slightly differently. 2124 01:35:47,330 --> 01:35:49,060 And so, what does this gain me? 2125 01:35:49,060 --> 01:35:52,690 Well, if I instead start to model my data not single dimensionally 2126 01:35:52,690 --> 01:35:56,170 but in two dimensions, and I connect those nodes 2127 01:35:56,170 --> 01:35:59,050 with these pointers, what can I now do? 2128 01:35:59,050 --> 01:36:01,280 I think I just gave myself back binary search. 2129 01:36:01,280 --> 01:36:01,780 Why? 2130 01:36:01,780 --> 01:36:03,530 Suppose I'm searching for the number five. 2131 01:36:03,530 --> 01:36:04,505 How do I find it? 2132 01:36:04,505 --> 01:36:06,880 Well just, like in a family tree where you might visually 2133 01:36:06,880 --> 01:36:09,100 start reading from top to bottom, I'm always 2134 01:36:09,100 --> 01:36:11,990 going to start from the so-called root of a binary search tree. 2135 01:36:11,990 --> 01:36:16,000 This is just like the list pointer that kicks off the whole linked list 2136 01:36:16,000 --> 01:36:16,660 process. 2137 01:36:16,660 --> 01:36:18,610 This is the so-called root. 2138 01:36:18,610 --> 01:36:19,930 Here I am at the number four. 2139 01:36:19,930 --> 01:36:21,760 I want to find the number five. 2140 01:36:21,760 --> 01:36:25,120 What decision can I make when I see that I'm currently at the number four, 2141 01:36:25,120 --> 01:36:27,550 just like the phone book from week zero? 2142 01:36:27,550 --> 01:36:30,790 Where is five not? 2143 01:36:30,790 --> 01:36:32,480 It's not to the left. 2144 01:36:32,480 --> 01:36:35,140 And if I had built a little mobile here or something 2145 01:36:35,140 --> 01:36:38,890 we could very dramatically snip off this branch, this [LAUGHS] 2146 01:36:38,890 --> 01:36:40,640 is very low-budget animation. 2147 01:36:40,640 --> 01:36:43,130 These nodes could fall to the ground. 2148 01:36:43,130 --> 01:36:45,850 And we're left with half of essentially, a tree. 2149 01:36:45,850 --> 01:36:47,050 But what do I now know? 2150 01:36:47,050 --> 01:36:48,667 It's obviously the five to the right. 2151 01:36:48,667 --> 01:36:49,750 So let me go to the right. 2152 01:36:49,750 --> 01:36:52,070 Six is obviously not the one I'm looking for. 2153 01:36:52,070 --> 01:36:53,890 But what do I now know about the five? 2154 01:36:53,890 --> 01:36:56,140 Well, five is less than the six. 2155 01:36:56,140 --> 01:37:00,850 So I can snip this off here because I know it's not going to be down there. 2156 01:37:00,850 --> 01:37:02,830 And I can follow the remaining arrow here. 2157 01:37:02,830 --> 01:37:04,533 And voila, I just found it. 2158 01:37:04,533 --> 01:37:06,700 And now, without getting into the weeds of the math, 2159 01:37:06,700 --> 01:37:08,770 I've got here what, seven elements. 2160 01:37:08,770 --> 01:37:10,610 That's roughly eight if I round up. 2161 01:37:10,610 --> 01:37:13,480 And if I do some log base two, I actually-- one, two, three 2162 01:37:13,480 --> 01:37:14,890 is the key detail here. 2163 01:37:14,890 --> 01:37:17,770 The height of this tree is three. 2164 01:37:17,770 --> 01:37:20,290 Because I took a list of size seven, and I halved it, 2165 01:37:20,290 --> 01:37:24,040 and I halved it in order to let it dangle in these two dimensions, 2166 01:37:24,040 --> 01:37:25,930 plus or minus one for rounding sake. 2167 01:37:25,930 --> 01:37:27,280 So what do I get back? 2168 01:37:27,280 --> 01:37:29,770 I now have binary search. 2169 01:37:29,770 --> 01:37:32,710 But it's not like the middle of the middle of the middle. 2170 01:37:32,710 --> 01:37:35,720 I now follow these arrows in one of two directions. 2171 01:37:35,720 --> 01:37:37,960 So each of these nodes now has an int and maybe 2172 01:37:37,960 --> 01:37:40,023 a left pointer and a right pointer. 2173 01:37:40,023 --> 01:37:41,690 But you can call them anything you want. 2174 01:37:41,690 --> 01:37:45,220 And so, I've gotten back binary search and dynamism. 2175 01:37:45,220 --> 01:37:48,580 Because if you want to add zero, or eight, or nine, or 10, 2176 01:37:48,580 --> 01:37:52,400 we can just dangle them at the bottom of the binary search tree. 2177 01:37:52,400 --> 01:37:54,070 So what would this look like in code? 2178 01:37:54,070 --> 01:37:56,380 But we won't actually implement it line by line. 2179 01:37:56,380 --> 01:38:01,330 Well, here was previously our definition of a node for a linked list, which 2180 01:38:01,330 --> 01:38:02,800 was one-dimensional, if you will. 2181 01:38:02,800 --> 01:38:05,050 Even though it might bounce up and down on the screen. 2182 01:38:05,050 --> 01:38:07,550 It was still just a line, if you will. 2183 01:38:07,550 --> 01:38:10,600 Well, let me get rid of the single pointer in the linked list. 2184 01:38:10,600 --> 01:38:13,060 Let me make a little bit of room here in this typedef. 2185 01:38:13,060 --> 01:38:16,570 And let me propose that we just add two pointers, each of which 2186 01:38:16,570 --> 01:38:18,130 is a struct node star. 2187 01:38:18,130 --> 01:38:20,230 One will be called left by convention. 2188 01:38:20,230 --> 01:38:22,870 One will be called right by convention. 2189 01:38:22,870 --> 01:38:26,200 And so, long as someone, not me, not today, not in class, 2190 01:38:26,200 --> 01:38:29,920 writes the code that stitches together this data structure too, 2191 01:38:29,920 --> 01:38:32,710 handling both the left child and the right child so to speak, 2192 01:38:32,710 --> 01:38:37,040 I think we can indeed stitch together that two-dimensional structure. 2193 01:38:37,040 --> 01:38:39,970 And moreover, once you have this in memory, 2194 01:38:39,970 --> 01:38:44,170 you can translate pretty elegantly to code binary search itself 2195 01:38:44,170 --> 01:38:47,530 using a principle we talked about recently too. 2196 01:38:47,530 --> 01:38:50,530 Here is, for instance, a function that I'll 2197 01:38:50,530 --> 01:38:54,820 write by just clicking through steps called search whose purpose in life 2198 01:38:54,820 --> 01:38:55,840 is to return a Boolean. 2199 01:38:55,840 --> 01:38:59,620 True or false, the number I'm looking for is in the tree? 2200 01:38:59,620 --> 01:39:01,930 This search function therefore takes two arguments, 2201 01:39:01,930 --> 01:39:04,780 the number I'm looking for called number, and then 2202 01:39:04,780 --> 01:39:07,970 a pointer to the tree, the so-called root of the tree. 2203 01:39:07,970 --> 01:39:10,690 Now, how can I implement binary search in code? 2204 01:39:10,690 --> 01:39:13,520 Well, recall our brief discussion of recursion. 2205 01:39:13,520 --> 01:39:16,210 It turns out recursion is a beautiful technique. 2206 01:39:16,210 --> 01:39:18,130 And honestly, more obvious technique when 2207 01:39:18,130 --> 01:39:22,060 you have two-dimensional structures, which finally after five-plus weeks 2208 01:39:22,060 --> 01:39:22,990 we now do. 2209 01:39:22,990 --> 01:39:25,390 Here is maybe my first line of code here. 2210 01:39:25,390 --> 01:39:27,848 If the tree is null, then obviously return false. 2211 01:39:27,848 --> 01:39:29,140 You've handed me an empty tree. 2212 01:39:29,140 --> 01:39:30,160 There's nothing going on. 2213 01:39:30,160 --> 01:39:32,868 Obviously, the number you're looking for is not going to be here. 2214 01:39:32,868 --> 01:39:35,170 So that's my safe base case to make sure I don't 2215 01:39:35,170 --> 01:39:37,420 screw up and recursive infinitely. 2216 01:39:37,420 --> 01:39:39,050 Well, what else might be the case? 2217 01:39:39,050 --> 01:39:44,680 Well, if the number I'm looking for is less than the tree's own number. 2218 01:39:44,680 --> 01:39:46,710 And now, recall that tree is a node star. 2219 01:39:46,710 --> 01:39:48,460 So even though I'm calling it a tree, it's 2220 01:39:48,460 --> 01:39:51,350 really the current node that's been passed in. 2221 01:39:51,350 --> 01:39:55,990 So if the number I'm looking for is less than the current node's number, 2222 01:39:55,990 --> 01:39:59,650 then I must know that the number I'm looking for is to the left, 2223 01:39:59,650 --> 01:40:00,400 so to speak. 2224 01:40:00,400 --> 01:40:01,700 So how can I solve that? 2225 01:40:01,700 --> 01:40:03,610 Well, this is where the magic of recursion. 2226 01:40:03,610 --> 01:40:09,670 Just return whatever the answer is to calling search again but on a subtree, 2227 01:40:09,670 --> 01:40:10,300 if you will. 2228 01:40:10,300 --> 01:40:13,060 This is the equivalent of snipping off half of the tree. 2229 01:40:13,060 --> 01:40:17,470 Pass in the left subtree, if you will, with the same number. 2230 01:40:17,470 --> 01:40:19,600 Else, if the number you're looking for isn't 2231 01:40:19,600 --> 01:40:22,360 less than the current node's number but greater than, 2232 01:40:22,360 --> 01:40:24,890 snip off the other subtree instead. 2233 01:40:24,890 --> 01:40:29,710 And just return whatever search says it finds in the right subtree here. 2234 01:40:29,710 --> 01:40:31,520 And then, there's a fourth and final case. 2235 01:40:31,520 --> 01:40:34,264 What else could be true logically? 2236 01:40:34,264 --> 01:40:35,220 yeah. 2237 01:40:35,220 --> 01:40:37,710 AUDIENCE: Number equals [INAUDIBLE] number? 2238 01:40:37,710 --> 01:40:38,710 DAVID J. MALAN: Perfect. 2239 01:40:38,710 --> 01:40:43,240 If the number you're looking for equals equals the number in this node, 2240 01:40:43,240 --> 01:40:45,310 then I'm just going to return true. 2241 01:40:45,310 --> 01:40:49,600 And you might recall from our recurring discussions of design, 2242 01:40:49,600 --> 01:40:51,820 I don't strictly need to ask that explicitly. 2243 01:40:51,820 --> 01:40:54,580 Either there's no node, it's to the left, it's to the right, 2244 01:40:54,580 --> 01:40:55,580 or you found it. 2245 01:40:55,580 --> 01:40:59,170 So I can just whittle that down as usual to an else. 2246 01:40:59,170 --> 01:41:01,130 And this now returns my true. 2247 01:41:01,130 --> 01:41:04,180 So here too, this is where recursion, once you get comfy with it, 2248 01:41:04,180 --> 01:41:06,970 start it gets pretty elegant and cool in the sense 2249 01:41:06,970 --> 01:41:09,340 that, wow, even though there's a lot of lines here, 2250 01:41:09,340 --> 01:41:11,620 I mean, there's only a few interesting lines. 2251 01:41:11,620 --> 01:41:13,448 A lot of it's curly braces at that, which 2252 01:41:13,448 --> 01:41:14,990 strictly speaking I could get rid of. 2253 01:41:14,990 --> 01:41:17,470 And so, recursion lends itself to elegance 2254 01:41:17,470 --> 01:41:20,980 when it comes to traversing these two-dimensional data 2255 01:41:20,980 --> 01:41:22,280 structures as well. 2256 01:41:22,280 --> 01:41:27,940 So that is in code how you might implement something like search. 2257 01:41:27,940 --> 01:41:32,290 Questions then on these trees? 2258 01:41:32,290 --> 01:41:33,220 We have dynamism. 2259 01:41:33,220 --> 01:41:34,900 We can insert more nodes to them. 2260 01:41:34,900 --> 01:41:37,150 They're faster because we get better your search back. 2261 01:41:37,150 --> 01:41:39,250 But, but, but, there's got to be a price paid. 2262 01:41:39,250 --> 01:41:41,488 Any downsides, or question, or downside? 2263 01:41:41,488 --> 01:41:42,280 AUDIENCE: Question. 2264 01:41:42,280 --> 01:41:44,863 DAVID J. MALAN: OK, let me come back to that and just one sec. 2265 01:41:44,863 --> 01:41:47,290 Downside though, what price have we paid for this dynamism 2266 01:41:47,290 --> 01:41:50,530 and for this binary searchability? 2267 01:41:50,530 --> 01:41:53,867 Even though I've abstracted it away in the picture? 2268 01:41:53,867 --> 01:41:54,907 AUDIENCE: [INAUDIBLE] 2269 01:41:54,907 --> 01:41:55,990 DAVID J. MALAN: Say again? 2270 01:41:55,990 --> 01:41:56,990 AUDIENCE: [INAUDIBLE] 2271 01:41:56,990 --> 01:41:58,880 DAVID J. MALAN: We're using a lot of memory. 2272 01:41:58,880 --> 01:42:01,790 I'm kind of misleading you now because I'm just drawing these little squares 2273 01:42:01,790 --> 01:42:02,690 with the simple numbers. 2274 01:42:02,690 --> 01:42:04,490 But there's actually three things in there. 2275 01:42:04,490 --> 01:42:09,020 A four-byte integer, an eight-byte left pointer, an eight-byte right pointer. 2276 01:42:09,020 --> 01:42:12,740 So we're already up to 16, 20 bytes now to store individual INTs. 2277 01:42:12,740 --> 01:42:15,320 That's probably OK though if memory is relatively 2278 01:42:15,320 --> 01:42:18,000 cheap and voluminous as it nowadays is. 2279 01:42:18,000 --> 01:42:20,270 But these are the kinds of trade-offs. 2280 01:42:20,270 --> 01:42:24,050 And here, too, you see a hint of why some people still do like, 2281 01:42:24,050 --> 01:42:26,720 and you see-- and in fact, it's so omnipresent. 2282 01:42:26,720 --> 01:42:29,125 Because when you have C you can really fine-tune 2283 01:42:29,125 --> 01:42:32,000 how much memory is being used for better or for worse under the hood. 2284 01:42:32,000 --> 01:42:35,630 As we transition soon to Python, these decisions get made for you. 2285 01:42:35,630 --> 01:42:39,140 And you have much, much less control about how much memory 2286 01:42:39,140 --> 01:42:42,500 is being used by your program because someone else made the design decisions 2287 01:42:42,500 --> 01:42:43,380 for you. 2288 01:42:43,380 --> 01:42:44,030 Question. 2289 01:42:44,030 --> 01:42:46,890 AUDIENCE: [INAUDIBLE] 2290 01:42:46,890 --> 01:42:49,740 DAVID J. MALAN: Is it bad if we don't the parent node? 2291 01:42:49,740 --> 01:42:50,670 Not necessarily. 2292 01:42:50,670 --> 01:42:54,820 There's no reason why you need to have pointers in both directions. 2293 01:42:54,820 --> 01:42:57,210 However, that can lend itself to efficiency. 2294 01:42:57,210 --> 01:42:59,970 By spending more space and having arrows go up to 2295 01:42:59,970 --> 01:43:03,360 you can actually save more time when searching the tree and other contexts. 2296 01:43:03,360 --> 01:43:06,810 This though would be the canonical way, the typical way to implement it. 2297 01:43:06,810 --> 01:43:09,150 But absolutely, just like a doubly linked list, 2298 01:43:09,150 --> 01:43:12,630 that could help you solve other problems too. 2299 01:43:12,630 --> 01:43:16,740 So turns out I'm overselling binary search trees. 2300 01:43:16,740 --> 01:43:21,000 There are perversions of them, so to speak, whereby they won't actually 2301 01:43:21,000 --> 01:43:22,410 behave as advertised. 2302 01:43:22,410 --> 01:43:24,700 For instance, here's a good situation. 2303 01:43:24,700 --> 01:43:27,090 Suppose you've got an empty tree initially. 2304 01:43:27,090 --> 01:43:28,498 And you insert the number two. 2305 01:43:28,498 --> 01:43:29,790 Well, it's got to go somewhere. 2306 01:43:29,790 --> 01:43:32,490 So it might as well become the root of this binary search tree. 2307 01:43:32,490 --> 01:43:34,823 And let's assume that someone wrote the code to do this. 2308 01:43:34,823 --> 01:43:36,990 Now you want to insert the number one and you 2309 01:43:36,990 --> 01:43:39,520 want to maintain the searchability of this tree. 2310 01:43:39,520 --> 01:43:44,400 Well, it's important to note that binary search tree is different from tree. 2311 01:43:44,400 --> 01:43:46,320 If you've just got a tree in memory, there 2312 01:43:46,320 --> 01:43:49,500 is no social contract with where the numbers need to go. 2313 01:43:49,500 --> 01:43:51,810 They can be completely random all over the place. 2314 01:43:51,810 --> 01:43:55,320 Binary search tree means that you can do binary search, 2315 01:43:55,320 --> 01:43:59,460 means that any node here is going to be greater than every node here 2316 01:43:59,460 --> 01:44:01,770 and less than every node here. 2317 01:44:01,770 --> 01:44:03,308 And that's a definition. 2318 01:44:03,308 --> 01:44:05,100 It's a recursive structural definition that 2319 01:44:05,100 --> 01:44:08,850 must be true to be a binary search tree or BST. 2320 01:44:08,850 --> 01:44:12,060 So if we maintain that property ourselves, let me insert two, 2321 01:44:12,060 --> 01:44:13,140 let me insert one. 2322 01:44:13,140 --> 01:44:15,060 One belongs there by that definition. 2323 01:44:15,060 --> 01:44:16,590 Let me insert three. 2324 01:44:16,590 --> 01:44:18,540 Three belongs there by that definition. 2325 01:44:18,540 --> 01:44:21,390 But I kind of got lucky in that I in the story 2326 01:44:21,390 --> 01:44:25,020 inserted two and then one and then three. 2327 01:44:25,020 --> 01:44:28,230 Let me propose a sort of perversion of the algorithm whereby we just 2328 01:44:28,230 --> 01:44:33,360 get unlucky let me propose that we insert one first. 2329 01:44:33,360 --> 01:44:34,830 And then, we insert two. 2330 01:44:34,830 --> 01:44:35,850 Well, where does two go? 2331 01:44:35,850 --> 01:44:38,700 Well, logically, it goes to the right because it's larger. 2332 01:44:38,700 --> 01:44:40,620 Now, the user inserts three. 2333 01:44:40,620 --> 01:44:42,760 Where does it go? 2334 01:44:42,760 --> 01:44:44,310 It goes there, logically. 2335 01:44:44,310 --> 01:44:46,110 And how does this story unfold? 2336 01:44:46,110 --> 01:44:48,270 The user inserts four, five, six. 2337 01:44:48,270 --> 01:44:51,180 It's wonderfully sorted in advance by luck. 2338 01:44:51,180 --> 01:44:56,040 But this is a perversion of the structure in what sense? 2339 01:44:56,040 --> 01:44:59,080 It's still technically a binary search tree. 2340 01:44:59,080 --> 01:45:02,970 But what is it look more like? 2341 01:45:02,970 --> 01:45:06,430 It really is devolving, if you will, into a linked list. 2342 01:45:06,430 --> 01:45:09,720 And so, if you, the programmer, don't implement a binary search 2343 01:45:09,720 --> 01:45:13,980 tree with some kind of repairs going on, such that as soon as something 2344 01:45:13,980 --> 01:45:17,440 gets, whoa, a little too long and stringy, I think I can fix this. 2345 01:45:17,440 --> 01:45:20,523 It's going to be an annoying line-- a number of lines of code, which we're 2346 01:45:20,523 --> 01:45:22,110 not going to write here or in a P-set. 2347 01:45:22,110 --> 01:45:24,630 But we could pivot this thing. 2348 01:45:24,630 --> 01:45:28,590 And we could just rejigger things so that the two becomes the new root. 2349 01:45:28,590 --> 01:45:30,053 The one becomes the left child. 2350 01:45:30,053 --> 01:45:31,470 The three becomes the right child. 2351 01:45:31,470 --> 01:45:33,970 But that's what, two, three-plus lines of code. 2352 01:45:33,970 --> 01:45:34,560 It's possible. 2353 01:45:34,560 --> 01:45:35,280 It's doable. 2354 01:45:35,280 --> 01:45:36,660 But it's extra work. 2355 01:45:36,660 --> 01:45:37,620 It's extra code. 2356 01:45:37,620 --> 01:45:42,570 So unless you write that code though and maintain balance of these trees, 2357 01:45:42,570 --> 01:45:46,110 just because it's a binary search tree does not mean its height is 2358 01:45:46,110 --> 01:45:47,520 going to be log base 2 of n. 2359 01:45:47,520 --> 01:45:52,450 The height could be n, in which case you don't get those properties. 2360 01:45:52,450 --> 01:45:56,310 So when it comes to looking up in a balanced binary search tree, yes, 2361 01:45:56,310 --> 01:45:57,180 it's log n. 2362 01:45:57,180 --> 01:46:00,990 But if it's unbalanced, if you don't add that additional logic in those repairs 2363 01:46:00,990 --> 01:46:04,320 so to speak, it could devolve into big O of n. 2364 01:46:04,320 --> 01:46:07,343 And this is a whole category of algorithms and fanciness 2365 01:46:07,343 --> 01:46:10,260 that you would explore in a higher level course on algorithms and data 2366 01:46:10,260 --> 01:46:10,760 structures. 2367 01:46:10,760 --> 01:46:13,410 There's lots of way to do that sort of fixing that I'm alluding 2368 01:46:13,410 --> 01:46:17,800 to in the picture there on the screen. 2369 01:46:17,800 --> 01:46:20,080 A few other data structures, if you will, 2370 01:46:20,080 --> 01:46:23,110 toward an end of a sort of computer science Holy Grail. 2371 01:46:23,110 --> 01:46:27,970 So log n is repeatedly a really good place to end up. 2372 01:46:27,970 --> 01:46:30,190 We started in week zero and we got log n. 2373 01:46:30,190 --> 01:46:33,280 We lost it earlier today by introducing linked list, 2374 01:46:33,280 --> 01:46:37,190 but we just got it back, albeit at the price of spending more space. 2375 01:46:37,190 --> 01:46:39,850 But the Holy Grail, so to speak, when it comes to algorithms 2376 01:46:39,850 --> 01:46:41,710 would not be big O of n, certainly. 2377 01:46:41,710 --> 01:46:45,220 Definitely not n squared like our bubble sorts and selection sorts. 2378 01:46:45,220 --> 01:46:47,110 And not even big O of log n. 2379 01:46:47,110 --> 01:46:50,870 What's better than all of those? 2380 01:46:50,870 --> 01:46:53,223 Just big O of one, constant time. 2381 01:46:53,223 --> 01:46:54,140 That's the Holy Grail. 2382 01:46:54,140 --> 01:46:56,270 Because if we could store huge amounts of data 2383 01:46:56,270 --> 01:46:59,330 but find it instantly in one step, or two steps, 2384 01:46:59,330 --> 01:47:01,070 or heck, even 10 or 20 steps. 2385 01:47:01,070 --> 01:47:05,063 But independent of the size of the data structure that's pretty powerful. 2386 01:47:05,063 --> 01:47:07,730 I mean, that's the secret sauce of the Google's and the Twitters 2387 01:47:07,730 --> 01:47:11,150 of the world trying to get back results really, really fast. 2388 01:47:11,150 --> 01:47:15,165 Well, it turns out another abstract data type or abstract data structure 2389 01:47:15,165 --> 01:47:16,790 might be something called a dictionary. 2390 01:47:16,790 --> 01:47:19,010 Just like the Merriam-Webster Oxford English 2391 01:47:19,010 --> 01:47:23,387 dictionaries that you might know which associate, say, words with definitions. 2392 01:47:23,387 --> 01:47:25,970 Well, you can think of a dictionary really abstractly as this. 2393 01:47:25,970 --> 01:47:28,505 Two columns maybe on a spreadsheet of sorts 2394 01:47:28,505 --> 01:47:31,130 where the left column represents something and the right column 2395 01:47:31,130 --> 01:47:32,240 represents something else. 2396 01:47:32,240 --> 01:47:34,820 The word is on the left and its definition is on the right. 2397 01:47:34,820 --> 01:47:37,470 And that's almost literally what a dictionary is on paper. 2398 01:47:37,470 --> 01:47:40,400 You've got all the words and all the definitions right next to it. 2399 01:47:40,400 --> 01:47:44,120 But more generally, in computing, a dictionary 2400 01:47:44,120 --> 01:47:46,100 really just has not words and definitions 2401 01:47:46,100 --> 01:47:48,363 per se, but key value pairs. 2402 01:47:48,363 --> 01:47:49,280 This is a term of art. 2403 01:47:49,280 --> 01:47:51,822 And we're going to see this again and again, especially as we 2404 01:47:51,822 --> 01:47:54,320 transition to web programming, keys and values. 2405 01:47:54,320 --> 01:47:57,290 Key is what you use to look for something. 2406 01:47:57,290 --> 01:48:01,650 The value is what you find ultimately via that key. 2407 01:48:01,650 --> 01:48:03,110 So that's the generic term there. 2408 01:48:03,110 --> 01:48:05,590 We've seen key value pairs really in the past. 2409 01:48:05,590 --> 01:48:07,340 In week zero we talked about your contacts 2410 01:48:07,340 --> 01:48:10,040 and your iPhone or Android phone being an app 2411 01:48:10,040 --> 01:48:13,250 that has a whole bunch of contacts presumably alphabetized by first name, 2412 01:48:13,250 --> 01:48:14,600 or last name, or the like. 2413 01:48:14,600 --> 01:48:17,270 Well, one of those contact cards ultimately 2414 01:48:17,270 --> 01:48:20,730 has someone's number, for instance, like John Harvard in this case. 2415 01:48:20,730 --> 01:48:23,450 So in that type of application, the keys is 2416 01:48:23,450 --> 01:48:26,810 the name, like John Harvard that you use to find information. 2417 01:48:26,810 --> 01:48:30,020 And the value is the number that you find there. 2418 01:48:30,020 --> 01:48:32,270 Or if there's more information like where he lives 2419 01:48:32,270 --> 01:48:35,360 and email address and the like, the whole contact card 2420 01:48:35,360 --> 01:48:37,490 could be the value thereof. 2421 01:48:37,490 --> 01:48:39,860 The key is what you use to look up John Harvard. 2422 01:48:39,860 --> 01:48:41,780 Now, back in week zero-- 2423 01:48:41,780 --> 01:48:45,200 oh, and rather, the corresponding table then if we draw this in two columns 2424 01:48:45,200 --> 01:48:47,780 wouldn't be word and definition or key value generically, 2425 01:48:47,780 --> 01:48:50,100 it would be name and number, for instance. 2426 01:48:50,100 --> 01:48:52,010 So we're just slapping some new terminology 2427 01:48:52,010 --> 01:48:53,760 on this old contacts problem. 2428 01:48:53,760 --> 01:48:56,660 Well, this is the picture we drew way back in week zero, 2429 01:48:56,660 --> 01:49:00,480 whereby I claim that log of n was really, really good. 2430 01:49:00,480 --> 01:49:02,420 And indeed it was and has been since. 2431 01:49:02,420 --> 01:49:04,970 But the Holy Grail would indeed be something more 2432 01:49:04,970 --> 01:49:08,180 like this in this dashed green line constant time. 2433 01:49:08,180 --> 01:49:11,300 And maybe not literally one step, but a fixed number of steps that even 2434 01:49:11,300 --> 01:49:15,890 as the problem gets huge and you go way, way out on the right of the x-axis, 2435 01:49:15,890 --> 01:49:19,850 the problem does not depend on the size-- 2436 01:49:19,850 --> 01:49:24,080 the time to solve the problem does not depend at all 2437 01:49:24,080 --> 01:49:26,400 on the size of the problem itself. 2438 01:49:26,400 --> 01:49:29,300 You can have 1,000 contacts or 100,000 contacts, 2439 01:49:29,300 --> 01:49:32,960 constant time means it takes the same number of steps no matter what. 2440 01:49:32,960 --> 01:49:34,980 Well, how can we get to that point? 2441 01:49:34,980 --> 01:49:37,260 Well, there's a couple of final building blocks today. 2442 01:49:37,260 --> 01:49:38,660 And there's one called hashing. 2443 01:49:38,660 --> 01:49:40,743 And this is something that will recur a few times. 2444 01:49:40,743 --> 01:49:45,680 But for now, hashing is all about taking as input some value 2445 01:49:45,680 --> 01:49:49,550 and outputting a simpler version thereof. 2446 01:49:49,550 --> 01:49:52,520 So for instance, here's a gratuitously large deck 2447 01:49:52,520 --> 01:49:56,510 of cards, which are all the more visible as a result. And in a deck of cards, 2448 01:49:56,510 --> 01:50:00,200 typically you've got what, 52 cards plus maybe the jokers and whatnot. 2449 01:50:00,200 --> 01:50:03,530 And each of those cards has a number of sorts and a suit on it. 2450 01:50:03,530 --> 01:50:06,260 And here are literally four buckets on the stage. 2451 01:50:06,260 --> 01:50:08,840 And how might I go about sorting these cards? 2452 01:50:08,840 --> 01:50:10,730 Not just by number but also by suit. 2453 01:50:10,730 --> 01:50:14,180 Well, you could certainly spread them all out and make a mess of things 2454 01:50:14,180 --> 01:50:17,180 and just reason your way through it and get everything in order 2455 01:50:17,180 --> 01:50:19,090 according to suit and according by number. 2456 01:50:19,090 --> 01:50:21,590 But most of us, even if you don't have four buckets at home, 2457 01:50:21,590 --> 01:50:24,470 probably are going to do something a little more intuitive, 2458 01:50:24,470 --> 01:50:25,730 feels like an optimization. 2459 01:50:25,730 --> 01:50:28,460 Where if I find the nine of hearts, I'm going 2460 01:50:28,460 --> 01:50:30,012 to put that into the hearts bucket. 2461 01:50:30,012 --> 01:50:32,720 The king of spades, I'm going to put that into the spades bucket. 2462 01:50:32,720 --> 01:50:34,970 The jack of diamonds over here. 2463 01:50:34,970 --> 01:50:39,170 And I'll do this with the queen of diamonds, and the ace of clubs here, 2464 01:50:39,170 --> 01:50:41,630 and the three here, and the 10 here. 2465 01:50:41,630 --> 01:50:45,560 And even though it's still going to be 52 steps, why am I-- and maybe at home, 2466 01:50:45,560 --> 01:50:49,610 why would you perhaps do this step first? 2467 01:50:49,610 --> 01:50:53,040 What's the value of bucketizing the values in this way? 2468 01:50:53,040 --> 01:50:55,720 And that actually is a term of art. 2469 01:50:55,720 --> 01:50:58,950 What's the value of doing this first before you sift through and try 2470 01:50:58,950 --> 01:51:00,120 to sort the numbers? 2471 01:51:00,120 --> 01:51:01,076 Yeah. 2472 01:51:01,076 --> 01:51:03,875 AUDIENCE: It's easier to make sure you're not [INAUDIBLE].. 2473 01:51:03,875 --> 01:51:04,750 DAVID J. MALAN: Yeah. 2474 01:51:04,750 --> 01:51:06,958 It's easier to make sure you're not missing anything. 2475 01:51:06,958 --> 01:51:09,580 And it's taking a problem of size 52 and shrinking it 2476 01:51:09,580 --> 01:51:12,230 into four problems of size 13, if you will. 2477 01:51:12,230 --> 01:51:14,642 And so, that just helps simplify things, maybe 2478 01:51:14,642 --> 01:51:16,600 reduces the probability of errors and the like. 2479 01:51:16,600 --> 01:51:18,808 And what I'm doing here, to give it a technical term, 2480 01:51:18,808 --> 01:51:20,590 is that I'm hashing the values. 2481 01:51:20,590 --> 01:51:22,570 I'm taking as input a card like this. 2482 01:51:22,570 --> 01:51:26,950 And I'm reducing it more simply from a larger domain 2483 01:51:26,950 --> 01:51:29,600 to a much smaller range, if you will. 2484 01:51:29,600 --> 01:51:32,680 So here is a domain of 52 possibilities. 2485 01:51:32,680 --> 01:51:35,980 I want to map that to a range of four possible outcomes-- 2486 01:51:35,980 --> 01:51:39,830 the diamonds, the clubs, the hearts, or the spades here. 2487 01:51:39,830 --> 01:51:42,665 And by doing that I'm just shrinking the size of the problem. 2488 01:51:42,665 --> 01:51:43,540 So hashing does that. 2489 01:51:43,540 --> 01:51:45,790 It's literally an f of x type arrangement 2490 01:51:45,790 --> 01:51:50,300 whereby you pass something in and you get back a simpler known value. 2491 01:51:50,300 --> 01:51:54,940 Well, a hash function more technically is the algorithm, or even the math, 2492 01:51:54,940 --> 01:51:58,180 or even the code that implements that idea, converting 2493 01:51:58,180 --> 01:52:03,040 something bigger to something smaller to this indeed finite range of values. 2494 01:52:03,040 --> 01:52:10,090 And it turns out that hash tables are a wonderful application of arrays 2495 01:52:10,090 --> 01:52:14,080 and linked lists to try to leverage the best of both worlds. 2496 01:52:14,080 --> 01:52:18,740 The goal being theoretically to achieve that Holy Grail of constant time. 2497 01:52:18,740 --> 01:52:20,740 And that's going to be a bit of an overstatement 2498 01:52:20,740 --> 01:52:22,990 because you're not always going to achieve it exactly. 2499 01:52:22,990 --> 01:52:25,420 But at least we can get a little closer there too. 2500 01:52:25,420 --> 01:52:28,510 So with hash tables, you have something that looks like this. 2501 01:52:28,510 --> 01:52:29,860 This is just an array. 2502 01:52:29,860 --> 01:52:33,700 This is an artist's rendition of drawing it vertically instead of horizontally. 2503 01:52:33,700 --> 01:52:36,310 But that's just a detail graphically. 2504 01:52:36,310 --> 01:52:40,960 And this array, for instance, maybe is of size 26. 2505 01:52:40,960 --> 01:52:42,470 And where am I going with this? 2506 01:52:42,470 --> 01:52:46,990 Well, how does Apple, how does Google store your context alphabetically 2507 01:52:46,990 --> 01:52:49,510 in your phone and search for things quickly? 2508 01:52:49,510 --> 01:52:52,630 Well, they might-- they probably alphabetize at least in English 2509 01:52:52,630 --> 01:52:55,210 according to A through Z. Or if we convert that to numbers 2510 01:52:55,210 --> 01:52:56,920 it's like what, 65 through whatever. 2511 01:52:56,920 --> 01:52:59,380 Or really, zero through 25 suffices. 2512 01:52:59,380 --> 01:53:02,710 If we're using an array of size 26, we start counting at zero 2513 01:53:02,710 --> 01:53:04,180 and we count up to 25. 2514 01:53:04,180 --> 01:53:06,830 But let's abstract that away as just letters of the alphabet. 2515 01:53:06,830 --> 01:53:09,670 So maybe what Google and Apple are doing in your phone 2516 01:53:09,670 --> 01:53:12,670 is storing all of the A's up there, all of the Z's down there, 2517 01:53:12,670 --> 01:53:14,810 and everything else in between. 2518 01:53:14,810 --> 01:53:16,610 And so, this works pretty well if you start 2519 01:53:16,610 --> 01:53:18,110 adding your friends and your family. 2520 01:53:18,110 --> 01:53:21,970 So for instance-- and I'll get rid of the letter so as to not distract. 2521 01:53:21,970 --> 01:53:25,750 Alvis might go in that first spot because A, you subtract a 65, 2522 01:53:25,750 --> 01:53:26,530 maps to zero. 2523 01:53:26,530 --> 01:53:29,020 So we put him in the first bucket, the A bucket. 2524 01:53:29,020 --> 01:53:31,450 Maybe Zacharias ends up all the way at the end there. 2525 01:53:31,450 --> 01:53:33,580 And then in the middle might here be Hermione. 2526 01:53:33,580 --> 01:53:36,580 And if we do this dot, dot, dot, you keep adding all of your classmates, 2527 01:53:36,580 --> 01:53:41,500 you might get a contacts database that has all of this data herein. 2528 01:53:41,500 --> 01:53:44,530 Now, each of these nodes, they're drawn differently 2529 01:53:44,530 --> 01:53:46,450 because this is just another artist rendition. 2530 01:53:46,450 --> 01:53:49,870 These rectangles, these long rectangles represent a contact card, 2531 01:53:49,870 --> 01:53:53,200 like John Harvard's that's got the name, maybe email, definitely phone 2532 01:53:53,200 --> 01:53:54,950 number, and things like that. 2533 01:53:54,950 --> 01:53:57,460 So this seems great. 2534 01:53:57,460 --> 01:53:58,120 Why? 2535 01:53:58,120 --> 01:54:00,100 How can I find Alvis? 2536 01:54:00,100 --> 01:54:01,300 Well, I go to the A bucket. 2537 01:54:01,300 --> 01:54:02,380 How do I find Zacharias? 2538 01:54:02,380 --> 01:54:03,582 I go to the Z bucket. 2539 01:54:03,582 --> 01:54:04,540 How do I find Hermione? 2540 01:54:04,540 --> 01:54:06,010 I go to the H bucket. 2541 01:54:06,010 --> 01:54:09,730 But, but, but, I've done this very deliberately. 2542 01:54:09,730 --> 01:54:14,710 What problem will arise eventually assuming you have enough classmates? 2543 01:54:14,710 --> 01:54:16,462 AUDIENCE: [INAUDIBLE] 2544 01:54:16,462 --> 01:54:17,885 2545 01:54:17,885 --> 01:54:18,760 DAVID J. MALAN: Yeah. 2546 01:54:18,760 --> 01:54:21,100 There'll be too many people, too many contacts for all 2547 01:54:21,100 --> 01:54:23,860 of the available spaces in the array. 2548 01:54:23,860 --> 01:54:25,540 There's still some room here. 2549 01:54:25,540 --> 01:54:29,440 But I'm pretty sure if I think back to this particular class, 2550 01:54:29,440 --> 01:54:34,090 we've got not Hermione but also Harry, who's also an H. Hagrid, who's 2551 01:54:34,090 --> 01:54:36,430 also an H. So where do put them? 2552 01:54:36,430 --> 01:54:40,010 I could just put them arbitrarily in any of the open spots. 2553 01:54:40,010 --> 01:54:43,450 But then you lose the immediacy of jumping right to the H, right to the A, 2554 01:54:43,450 --> 01:54:44,290 right to the Z. 2555 01:54:44,290 --> 01:54:47,590 But now that we have linked lists, we can combine these ideas. 2556 01:54:47,590 --> 01:54:51,530 Use an array to get to the first letter of the name you care about. 2557 01:54:51,530 --> 01:54:56,300 And then if you have a collision, so to speak, whereby someone's already there, 2558 01:54:56,300 --> 01:54:58,360 you don't do something stupid like put Harry 2559 01:54:58,360 --> 01:55:01,270 down here just because it's available, or maybe Hagrid down here 2560 01:55:01,270 --> 01:55:03,228 just because it's available because then you're 2561 01:55:03,228 --> 01:55:04,870 losing the immediacy of the lookup. 2562 01:55:04,870 --> 01:55:08,890 Why don't you just stitch them together in a link list? 2563 01:55:08,890 --> 01:55:10,220 Now what does this mean. 2564 01:55:10,220 --> 01:55:14,398 This means for most of the characters here, you have constant time lookup. 2565 01:55:14,398 --> 01:55:15,940 You look up Alvis, boom, you're done. 2566 01:55:15,940 --> 01:55:17,230 Zacharias, boom, you're done. 2567 01:55:17,230 --> 01:55:22,400 OK, Harry, Hermione, Hagrid, it might be one, two, or three steps. 2568 01:55:22,400 --> 01:55:24,760 So that's actually devolving into something linear. 2569 01:55:24,760 --> 01:55:28,000 But here we make a distinction today between theoretical running times, 2570 01:55:28,000 --> 01:55:31,750 which we keep talking about, and honestly, a clock on the wall 2571 01:55:31,750 --> 01:55:34,090 running times that actual humans care about. 2572 01:55:34,090 --> 01:55:38,290 This is way faster than a linked list because you 2573 01:55:38,290 --> 01:55:40,120 don't have to search every name. 2574 01:55:40,120 --> 01:55:44,890 It's even faster than an array because you don't need to do binary search. 2575 01:55:44,890 --> 01:55:49,600 You can literally, for most of the names, find them in constant time-- 2576 01:55:49,600 --> 01:55:50,590 one step. 2577 01:55:50,590 --> 01:55:53,020 And again, it's not theoretically constant because these-- 2578 01:55:53,020 --> 01:55:55,210 if you only befriend people who have H names, 2579 01:55:55,210 --> 01:55:57,550 it's going to be a crazy long linked list anyway. 2580 01:55:57,550 --> 01:56:01,840 So again, it really depends on what the nature of the data is here. 2581 01:56:01,840 --> 01:56:04,650 But this is pretty close to constant time. 2582 01:56:04,650 --> 01:56:06,400 And in fact, how could we get even closer? 2583 01:56:06,400 --> 01:56:08,980 How could we reduce the probability of collisions 2584 01:56:08,980 --> 01:56:12,220 for the H's or any other letters? 2585 01:56:12,220 --> 01:56:14,830 How could we avoid putting too many H names together? 2586 01:56:14,830 --> 01:56:16,802 AUDIENCE: [INAUDIBLE] 2587 01:56:16,802 --> 01:56:18,240 2588 01:56:18,240 --> 01:56:19,740 DAVID J. MALAN: Say a little louder. 2589 01:56:19,740 --> 01:56:22,626 AUDIENCE: [INAUDIBLE] 2590 01:56:22,626 --> 01:56:24,538 2591 01:56:24,538 --> 01:56:25,580 DAVID J. MALAN: OK, yeah. 2592 01:56:25,580 --> 01:56:27,360 So we could add another dimension, if you will. 2593 01:56:27,360 --> 01:56:29,193 But let's not add a third dimension, per se. 2594 01:56:29,193 --> 01:56:32,150 But let's indeed look at not just the first letter of everyone's name 2595 01:56:32,150 --> 01:56:33,650 but the first and the second. 2596 01:56:33,650 --> 01:56:37,310 And in fact, let's see if that gets us a little further along. 2597 01:56:37,310 --> 01:56:40,310 So let me go ahead and propose, if you go through the whole Harry Potter 2598 01:56:40,310 --> 01:56:42,977 universe, there's actually a lot of collisions if we keep going. 2599 01:56:42,977 --> 01:56:45,950 And so, we've got the L's here, the R's, the S's, and so forth. 2600 01:56:45,950 --> 01:56:47,510 Well, let's clean this up here. 2601 01:56:47,510 --> 01:56:50,240 Hermione originally went to the H location. 2602 01:56:50,240 --> 01:56:54,230 But let's decrease the probability of collisions there and everywhere else. 2603 01:56:54,230 --> 01:56:58,050 Instead of putting Hermione, Harry, and Hagrid all together, 2604 01:56:58,050 --> 01:56:59,750 let's go ahead and do this instead. 2605 01:56:59,750 --> 01:57:02,840 Instead of labeling these buckets A through Z, 2606 01:57:02,840 --> 01:57:04,620 let's give ourselves more buckets. 2607 01:57:04,620 --> 01:57:07,940 So in fact, this might be H. Well, instead of H, maybe this should be HA, 2608 01:57:07,940 --> 01:57:11,870 and then this should be HB, HC, HD, HE, HF. 2609 01:57:11,870 --> 01:57:14,000 Now, some of those are a little nonsensical 2610 01:57:14,000 --> 01:57:16,460 because I can't think of names that match most of those. 2611 01:57:16,460 --> 01:57:18,170 But it's deterministic. 2612 01:57:18,170 --> 01:57:20,820 At least we know the bucket will be there, 2613 01:57:20,820 --> 01:57:22,680 which is important even if it's empty. 2614 01:57:22,680 --> 01:57:24,560 And now, we can put Hermione here. 2615 01:57:24,560 --> 01:57:25,760 We can put Harry here. 2616 01:57:25,760 --> 01:57:26,480 But oh-oh. 2617 01:57:26,480 --> 01:57:28,190 We didn't do this perfectly well. 2618 01:57:28,190 --> 01:57:29,420 Hagrid still collide. 2619 01:57:29,420 --> 01:57:30,620 So let me come back to you. 2620 01:57:30,620 --> 01:57:33,788 How can we reduce the probability of Harry and Hagrid colliding? 2621 01:57:33,788 --> 01:57:35,330 AUDIENCE: [INAUDIBLE] another letter? 2622 01:57:35,330 --> 01:57:35,630 DAVID J. MALAN: Yeah. 2623 01:57:35,630 --> 01:57:37,250 So we can look at the third letter. 2624 01:57:37,250 --> 01:57:38,240 So let me try that. 2625 01:57:38,240 --> 01:57:44,030 Instead of HA, let's look at HAA, HAB, HAC, dot, dot, dot, HAQ, dot, dot, dot, 2626 01:57:44,030 --> 01:57:47,610 HEQ, HER, HES, and so forth. 2627 01:57:47,610 --> 01:57:50,870 And now, I think those names and probably all of the others we saw 2628 01:57:50,870 --> 01:57:54,630 are now much more cleanly distributed. 2629 01:57:54,630 --> 01:57:57,140 There is much lower probability of collisions 2630 01:57:57,140 --> 01:58:01,040 unless two people have almost the same names or one is a prefix of the other. 2631 01:58:01,040 --> 01:58:06,140 But, but, but, even though we're now closer than ever to constant time 2632 01:58:06,140 --> 01:58:09,470 because the odds that we hit a collision and have to devolve to a linked list 2633 01:58:09,470 --> 01:58:11,990 are much lower, what's the downside that's 2634 01:58:11,990 --> 01:58:16,840 not completely obvious from how I've depicted this on screen? 2635 01:58:16,840 --> 01:58:18,400 What's the price I'm paying here? 2636 01:58:18,400 --> 01:58:18,910 Yeah. 2637 01:58:18,910 --> 01:58:19,960 AUDIENCE: So much memory. 2638 01:58:19,960 --> 01:58:22,330 DAVID J. MALAN: This is a huge amount of memory. 2639 01:58:22,330 --> 01:58:26,560 The number of cells here in the array is now what, 26 times 26 times 2640 01:58:26,560 --> 01:58:30,730 26 for the first, the second, and the third possible characters 2641 01:58:30,730 --> 01:58:32,500 all combinatorically combined here. 2642 01:58:32,500 --> 01:58:33,100 That's a lot. 2643 01:58:33,100 --> 01:58:34,100 I didn't even draw them. 2644 01:58:34,100 --> 01:58:36,830 I have the dot, dot, dot. [? Could you ?] evoke that instead? 2645 01:58:36,830 --> 01:58:38,260 That's a huge amount of memory. 2646 01:58:38,260 --> 01:58:40,390 This is a very sparse data set now. 2647 01:58:40,390 --> 01:58:43,120 And odds are, you're going to waste so much memory, 2648 01:58:43,120 --> 01:58:47,623 even for the names like HAE, HA, HEQ. 2649 01:58:47,623 --> 01:58:48,790 I can't even think of names. 2650 01:58:48,790 --> 01:58:50,860 So many of those buckets are going to be empty, 2651 01:58:50,860 --> 01:58:55,400 not to mention the AAA and the ZZZ and everything else in between. 2652 01:58:55,400 --> 01:58:56,500 So it's a trade-off. 2653 01:58:56,500 --> 01:58:58,387 And it might be too expensive a trade-off. 2654 01:58:58,387 --> 01:59:00,220 And so, you might have to tolerate something 2655 01:59:00,220 --> 01:59:03,040 like the collisions we had earlier, whereby even though they might 2656 01:59:03,040 --> 01:59:09,520 very well happen, at least you are decreasing the probability by perhaps 2657 01:59:09,520 --> 01:59:10,940 having more buckets like this. 2658 01:59:10,940 --> 01:59:13,910 And in fact, if I rewind now to where we might have gone with this, 2659 01:59:13,910 --> 01:59:17,500 here's how we might represent these nodes in the tree. 2660 01:59:17,500 --> 01:59:19,810 Previously, in the past, we've had a person 2661 01:59:19,810 --> 01:59:23,230 who had a string name and a string number, a.k.a. now char star. 2662 01:59:23,230 --> 01:59:27,250 And so, here now might be how in this hash table 2663 01:59:27,250 --> 01:59:31,630 we represent someone's name and number as well as a pointer 2664 01:59:31,630 --> 01:59:33,610 to the next element in the list. 2665 01:59:33,610 --> 01:59:35,380 Let me rewind just to the picture here. 2666 01:59:35,380 --> 01:59:38,140 We keep drawing different shapes, because, again, these are abstractions. 2667 01:59:38,140 --> 01:59:39,932 Who really cares if they're at a scale now. 2668 01:59:39,932 --> 01:59:42,568 We've got enough room for the person's name, not pictured 2669 01:59:42,568 --> 01:59:44,110 on the screen it's Hermione's number. 2670 01:59:44,110 --> 01:59:45,910 But that's somewhere in this rectangle. 2671 01:59:45,910 --> 01:59:48,100 But yes, pictured here in this little square 2672 01:59:48,100 --> 01:59:50,860 is a pointer to the next node in the list. 2673 01:59:50,860 --> 01:59:54,640 So by storing name and number, maybe her address, maybe her mailing address, 2674 01:59:54,640 --> 01:59:58,360 whatever, in addition to a pointer allows each of these nodes 2675 01:59:58,360 --> 02:00:01,750 to be connectable just like the nodes in the linked list. 2676 02:00:01,750 --> 02:00:04,160 But where they're starting is in an array. 2677 02:00:04,160 --> 02:00:06,580 So the array gets us 126-- 2678 02:00:06,580 --> 02:00:10,630 or gets us-- narrows the problem from size 26 to one, 2679 02:00:10,630 --> 02:00:12,520 gets us to the linked list in question. 2680 02:00:12,520 --> 02:00:16,520 Hopefully, it's a single person or perhaps it has more than that. 2681 02:00:16,520 --> 02:00:19,210 Meanwhile, what is the hash table itself-- 2682 02:00:19,210 --> 02:00:22,960 the hash table, the whole thing is literally just an array. 2683 02:00:22,960 --> 02:00:25,990 I've hardcoded the simplest version of size 26 2684 02:00:25,990 --> 02:00:29,470 but what do each of those boxes in the vertical array 2685 02:00:29,470 --> 02:00:33,520 represent just a pointer to potentially a node, a node in a linked list. 2686 02:00:33,520 --> 02:00:36,130 And if there's no one there, if there's no one in location y, 2687 02:00:36,130 --> 02:00:38,530 or x, or the like in that universe, well, it's 2688 02:00:38,530 --> 02:00:41,890 just a null pointer signifying there's no one there. 2689 02:00:41,890 --> 02:00:45,790 But if there is, it's going to be a pointer to a valid node from which we 2690 02:00:45,790 --> 02:00:48,135 can get to any of the others as well. 2691 02:00:48,135 --> 02:00:50,260 And that so-called hash function, just like the one 2692 02:00:50,260 --> 02:00:52,120 I did with the greeting cards, well, it's 2693 02:00:52,120 --> 02:00:55,780 just a black box, if you will, but implemented somewhere in code, 2694 02:00:55,780 --> 02:01:00,850 like in C. And so, if you pass in Albus, what is the hash value of Albus? 2695 02:01:00,850 --> 02:01:05,260 Well, in the first version of the story with 26 buckets, it should be a zero. 2696 02:01:05,260 --> 02:01:07,900 If you pass in Zacharias, it should be 25. 2697 02:01:07,900 --> 02:01:11,980 And so, just as my cards were being hashed to one of one, two, three, 2698 02:01:11,980 --> 02:01:16,780 four values, now these names are being hashed to one of 26 possibilities, 2699 02:01:16,780 --> 02:01:22,420 or 26 times 26, or 26 to the third power if you have more and more granularity 2700 02:01:22,420 --> 02:01:24,620 than that. 2701 02:01:24,620 --> 02:01:31,850 Questions on this implementation now of this idea of a hash table? 2702 02:01:31,850 --> 02:01:38,090 AUDIENCE: Should we [INAUDIBLE] the value of null and then [INAUDIBLE]?? 2703 02:01:38,090 --> 02:01:41,470 2704 02:01:41,470 --> 02:01:43,570 DAVID J. MALAN: If you-- 2705 02:01:43,570 --> 02:01:45,173 say that again with the null? 2706 02:01:45,173 --> 02:01:46,840 AUDIENCE: Could you ever be [INAUDIBLE]? 2707 02:01:46,840 --> 02:01:55,190 2708 02:01:55,190 --> 02:01:56,690 DAVID J. MALAN: Oh, a good question. 2709 02:01:56,690 --> 02:01:58,398 So if there's so much sparseness, there's 2710 02:01:58,398 --> 02:02:00,160 all of these empty cells in the array. 2711 02:02:00,160 --> 02:02:03,040 Couldn't you just go in and free them, or delete them, or just 2712 02:02:03,040 --> 02:02:07,690 shrink the array and not have AAA, and AAB, and AAC only 2713 02:02:07,690 --> 02:02:10,240 have the prefixes, two or three characters 2714 02:02:10,240 --> 02:02:12,970 that you need, you absolutely could do that. 2715 02:02:12,970 --> 02:02:16,780 But now, what you lose is the arithmetic benefit of being 2716 02:02:16,780 --> 02:02:22,480 able to map each letter to a number if you start freeing up unused space, 2717 02:02:22,480 --> 02:02:26,347 you don't know that Zacharias is necessarily at location 25. 2718 02:02:26,347 --> 02:02:28,180 Albus is still going to be at location zero. 2719 02:02:28,180 --> 02:02:30,760 But if you've deleted some of the elements in the middle, 2720 02:02:30,760 --> 02:02:33,460 Zacharias could be at 24 if you've deleted one, 2721 02:02:33,460 --> 02:02:35,510 23 if you've deleted another. 2722 02:02:35,510 --> 02:02:37,900 And so, you don't have that arithmetic immediacy 2723 02:02:37,900 --> 02:02:41,620 that you need in order to index into the array with constant time. 2724 02:02:41,620 --> 02:02:45,420 And the same is going to be true if it's two letters or three letters. 2725 02:02:45,420 --> 02:02:47,920 You need to be able to trust that you can do some quick math 2726 02:02:47,920 --> 02:02:50,890 and jump to the right index in constant time. 2727 02:02:50,890 --> 02:02:53,000 And that's, again, the appeal of these arrays. 2728 02:02:53,000 --> 02:02:57,370 So when it comes to the running time of a hash table, inserting values into it, 2729 02:02:57,370 --> 02:03:00,200 searching for values therein, at the end of the day, 2730 02:03:00,200 --> 02:03:01,990 it technically is big O of n. 2731 02:03:01,990 --> 02:03:06,580 Because in the craziest case, you might have a huge fancy hash table. 2732 02:03:06,580 --> 02:03:09,490 But everyone in the universe has a name starting with H. 2733 02:03:09,490 --> 02:03:12,370 And then, it just evolves into a really long linked list 2734 02:03:12,370 --> 02:03:14,870 just like a binary search tree could do the same. 2735 02:03:14,870 --> 02:03:17,620 But if you choose a smarter hash function, maybe you mitigate that 2736 02:03:17,620 --> 02:03:19,570 and you don't rely only on the first letter 2737 02:03:19,570 --> 02:03:23,860 but on the second, or the third as well, or some other combination of that input 2738 02:03:23,860 --> 02:03:26,260 and make your hash function smarter, odds 2739 02:03:26,260 --> 02:03:30,520 are if you get a good hash function you want to get it to be more of order of n 2740 02:03:30,520 --> 02:03:33,620 divided by k, where k means constant mathematically. 2741 02:03:33,620 --> 02:03:35,510 And so, k is the number of buckets. 2742 02:03:35,510 --> 02:03:38,140 So ideally you want a uniform distribution. 2743 02:03:38,140 --> 02:03:41,180 You want this many people here, this many people here. 2744 02:03:41,180 --> 02:03:43,600 You don't want there to be some or no people. 2745 02:03:43,600 --> 02:03:45,965 You want a uniform statistical distribution. 2746 02:03:45,965 --> 02:03:48,340 And maybe you get that from human names, maybe you don't. 2747 02:03:48,340 --> 02:03:50,680 But that's the challenge of a hash function. 2748 02:03:50,680 --> 02:03:53,290 Of course, big O of n over k is not a thing 2749 02:03:53,290 --> 02:03:56,050 because we always throw away constants like k. 2750 02:03:56,050 --> 02:03:58,240 So it's still big O of n. 2751 02:03:58,240 --> 02:04:00,970 But again, the distinction today is that, OK, yes, 2752 02:04:00,970 --> 02:04:03,640 academically you learned in CS50 that, sure, it's big O of n. 2753 02:04:03,640 --> 02:04:08,000 But my God, it's 26 times faster if you do the hash function well 2754 02:04:08,000 --> 02:04:10,600 and you spread everyone out over the hash table. 2755 02:04:10,600 --> 02:04:14,200 And that's the appeal of these kinds of structures. 2756 02:04:14,200 --> 02:04:16,960 And we've got one more for you, if I may. 2757 02:04:16,960 --> 02:04:20,180 And that's something now known as a try. 2758 02:04:20,180 --> 02:04:23,500 So it turns out that a try is even cooler, 2759 02:04:23,500 --> 02:04:29,560 if you like this kind of thing, in that it does not devolve into big O of n. 2760 02:04:29,560 --> 02:04:31,707 It is truly constant time. 2761 02:04:31,707 --> 02:04:33,040 But there's going to be a price. 2762 02:04:33,040 --> 02:04:34,360 There's going to be a gotcha. 2763 02:04:34,360 --> 02:04:38,230 A tri is sort of a fancier tree. 2764 02:04:38,230 --> 02:04:41,200 And it's short for retrieval, but pronounced tri 2765 02:04:41,200 --> 02:04:42,670 for weird historical reasons. 2766 02:04:42,670 --> 02:04:47,980 But a tri is a tree, each of whose nodes is an array. 2767 02:04:47,980 --> 02:04:49,427 So this is all crazy mash-ups now. 2768 02:04:49,427 --> 02:04:52,510 People started inventing data structures just by combining different ones. 2769 02:04:52,510 --> 02:04:54,677 So unfortunately, a lot of the good ideas are taken. 2770 02:04:54,677 --> 02:04:59,643 But you just have benefits from certain aspects of those data structures. 2771 02:04:59,643 --> 02:05:02,810 And combining them, ideally, gives you the best of both worlds, so to speak. 2772 02:05:02,810 --> 02:05:04,940 So here might be the root of a tri. 2773 02:05:04,940 --> 02:05:06,850 It's literally a big node, a big rectangle. 2774 02:05:06,850 --> 02:05:08,480 But it's actually an array. 2775 02:05:08,480 --> 02:05:11,540 So there's 26 locations in this picture here. 2776 02:05:11,540 --> 02:05:14,920 And here's how you use a tri, for instance, the store names 2777 02:05:14,920 --> 02:05:17,410 just like the hash table. 2778 02:05:17,410 --> 02:05:20,680 You treat each of the elements of that array in that node 2779 02:05:20,680 --> 02:05:22,100 as a letter of the alphabet. 2780 02:05:22,100 --> 02:05:24,590 So A through Z or zero through 25. 2781 02:05:24,590 --> 02:05:28,910 And if you want to store someone's name in here, you do so as follows. 2782 02:05:28,910 --> 02:05:34,270 If you want to store an H, you index into the H location. 2783 02:05:34,270 --> 02:05:37,990 And if you want to store the second letter of someone's name, like an A, 2784 02:05:37,990 --> 02:05:40,180 well, you add another node below it. 2785 02:05:40,180 --> 02:05:40,750 And such. 2786 02:05:40,750 --> 02:05:42,190 One is connected to the other. 2787 02:05:42,190 --> 02:05:45,130 And you then identify the A in that array. 2788 02:05:45,130 --> 02:05:49,240 And then you go on and maybe put a G if the goal is to store-- spoiler now-- 2789 02:05:49,240 --> 02:05:51,730 Hagrid in this data structure. 2790 02:05:51,730 --> 02:05:54,782 And then the R, and the I, and then the D. 2791 02:05:54,782 --> 02:05:56,740 But when you get to the D, the end of the name, 2792 02:05:56,740 --> 02:06:00,070 you have to somehow flag that this is the end of a name 2793 02:06:00,070 --> 02:06:02,380 that we've embedded into this data structure. 2794 02:06:02,380 --> 02:06:04,780 So whereas all of these are called out in white just 2795 02:06:04,780 --> 02:06:06,910 to make it obvious what we're connecting to what, 2796 02:06:06,910 --> 02:06:09,460 green has to be like a bool that's true that 2797 02:06:09,460 --> 02:06:11,830 just indicates the buck stops here. 2798 02:06:11,830 --> 02:06:15,160 D is the last letter in someone's actual name. 2799 02:06:15,160 --> 02:06:18,820 And what's kind of cool now about a tri is 2800 02:06:18,820 --> 02:06:21,260 that we can repeat this for other names as well. 2801 02:06:21,260 --> 02:06:24,770 So for instance, here's where we might put Harry as well. 2802 02:06:24,770 --> 02:06:30,100 And notice, they share a common prefix, HA for Hagrid, HA for Harry. 2803 02:06:30,100 --> 02:06:33,430 So we're reusing some of these nodes, some of these arrays. 2804 02:06:33,430 --> 02:06:36,820 We can even slip Hermione in here too borrowing only the H. 2805 02:06:36,820 --> 02:06:39,790 But she gets the H, then the E, then R-- 2806 02:06:39,790 --> 02:06:42,650 R-M-I-O-N-E, and so forth. 2807 02:06:42,650 --> 02:06:46,360 And we mark at the end of her name too that she's in there. 2808 02:06:46,360 --> 02:06:48,440 Now, what's the takeaway here? 2809 02:06:48,440 --> 02:06:51,670 Well what is the running time of a tri? 2810 02:06:51,670 --> 02:06:55,820 How many steps does it take to find someone in this data structure? 2811 02:06:55,820 --> 02:06:59,150 And let me zoom out so that it suddenly becomes a massive data 2812 02:06:59,150 --> 02:07:01,070 structure with even more in it. 2813 02:07:01,070 --> 02:07:02,780 Maybe it looks-- sorry. 2814 02:07:02,780 --> 02:07:04,040 No, I'll keep it on this one. 2815 02:07:04,040 --> 02:07:07,040 Maybe it looks a little something like this with just these three names. 2816 02:07:07,040 --> 02:07:12,560 But how many steps does it take to find Hagrid, or Harry, or Hermione, 2817 02:07:12,560 --> 02:07:15,230 no matter how many names are in this data structure? 2818 02:07:15,230 --> 02:07:16,820 There's three at the moment. 2819 02:07:16,820 --> 02:07:18,230 But it takes what? 2820 02:07:18,230 --> 02:07:23,390 H-A-G-R-I-D, so six steps to find Hagrid. 2821 02:07:23,390 --> 02:07:26,660 H-A-R-R-Y, five steps to find Harry. 2822 02:07:26,660 --> 02:07:31,760 H-E-R-M-I-O-N-E, eight steps to find Hermione. 2823 02:07:31,760 --> 02:07:35,180 But notice, that those steps are only dependent on what 2824 02:07:35,180 --> 02:07:37,130 the lengths of the humans names. 2825 02:07:37,130 --> 02:07:41,180 And let's assume that no one's going to have a infinitely long name. 2826 02:07:41,180 --> 02:07:42,830 It's going to max out at what, eight? 2827 02:07:42,830 --> 02:07:44,870 No, maybe 18, maybe 20, 30. 2828 02:07:44,870 --> 02:07:47,270 There's actually some pretty long human names out there. 2829 02:07:47,270 --> 02:07:49,080 But it's going to be finite. 2830 02:07:49,080 --> 02:07:50,660 You know it's a bounded. 2831 02:07:50,660 --> 02:07:54,360 Whereas most contexts in could grow forever. 2832 02:07:54,360 --> 02:07:57,530 So what's compelling here is if you assume that the longest name is, 2833 02:07:57,530 --> 02:08:00,170 I don't know, 50 for the sake of a theme here, 2834 02:08:00,170 --> 02:08:03,080 then you know that finding anyone in this data structure 2835 02:08:03,080 --> 02:08:05,180 will take more than 50 steps. 2836 02:08:05,180 --> 02:08:10,520 50 is thus a constant, which means you have big O of one running time. 2837 02:08:10,520 --> 02:08:13,190 It doesn't matter if there's a million people in this phonebook 2838 02:08:13,190 --> 02:08:14,990 or a billion people in this phonebook. 2839 02:08:14,990 --> 02:08:17,010 That's going to definitely add more nodes to it. 2840 02:08:17,010 --> 02:08:18,940 But it's still going to take you H-A-R-- 2841 02:08:18,940 --> 02:08:22,880 I'm sorry-- H-A-G-R-I-D, six steps to find Hagrid. 2842 02:08:22,880 --> 02:08:28,250 H-A-R-R-Y, five steps to find Harry even if there is a billion other people 2843 02:08:28,250 --> 02:08:29,820 in that data structure. 2844 02:08:29,820 --> 02:08:32,978 So now, we actually do seem to have constant time 2845 02:08:32,978 --> 02:08:36,020 if you assume that there's going to be a bound on the length of the name. 2846 02:08:36,020 --> 02:08:38,960 Why don't we use tries for everything then? 2847 02:08:38,960 --> 02:08:41,660 What's the price we're paying for this data structure 2848 02:08:41,660 --> 02:08:44,630 even though we've represented just three characters here? 2849 02:08:44,630 --> 02:08:45,200 Yeah. 2850 02:08:45,200 --> 02:08:46,860 AUDIENCE: [INAUDIBLE] 2851 02:08:46,860 --> 02:08:48,580 DAVID J. MALAN: It's a lot of memory. 2852 02:08:48,580 --> 02:08:49,080 Yeah. 2853 02:08:49,080 --> 02:08:51,330 And you can see it even with these three names, most 2854 02:08:51,330 --> 02:08:55,500 of the squares on the screen are empty, bytes and bits that are there 2855 02:08:55,500 --> 02:08:56,460 and are allocated. 2856 02:08:56,460 --> 02:08:58,210 And they need to be there because you need 2857 02:08:58,210 --> 02:09:01,297 to be able to do that arithmetic thing of this being zero, this being 25. 2858 02:09:01,297 --> 02:09:04,380 So you can jump from boom, boom, boom, boom, based on each of the letters. 2859 02:09:04,380 --> 02:09:07,770 But it's a hugely sparse data structure, which means 2860 02:09:07,770 --> 02:09:10,060 it takes up a crazy amount of memory. 2861 02:09:10,060 --> 02:09:12,865 Now, maybe that's tolerable, especially for short names. 2862 02:09:12,865 --> 02:09:14,740 But that's going to be the trade-off as well. 2863 02:09:14,740 --> 02:09:16,470 And this is such a tension in computing. 2864 02:09:16,470 --> 02:09:19,410 Almost any time you want to improve time, 2865 02:09:19,410 --> 02:09:22,560 you want to speed up the efficiency, the speed of your algorithm, 2866 02:09:22,560 --> 02:09:23,880 you're going to spend space. 2867 02:09:23,880 --> 02:09:27,270 If by contrast you want to decrease the amount of space, 2868 02:09:27,270 --> 02:09:30,150 you might very well have to increase the running time. 2869 02:09:30,150 --> 02:09:32,470 And it is indeed, this seesaw back and forth. 2870 02:09:32,470 --> 02:09:34,500 And you, your colleagues, your company need 2871 02:09:34,500 --> 02:09:36,990 to decide what resource is the most precious. 2872 02:09:36,990 --> 02:09:40,590 Heck, it might be much harder to code one of these data structures 2873 02:09:40,590 --> 02:09:41,190 than another. 2874 02:09:41,190 --> 02:09:41,880 You're a human. 2875 02:09:41,880 --> 02:09:42,990 Your time is valuable. 2876 02:09:42,990 --> 02:09:46,170 Do you really want to spend hours implementing a tri when, 2877 02:09:46,170 --> 02:09:49,680 hey, in 30 minutes I can bang out an array nowadays or a linked list even. 2878 02:09:49,680 --> 02:09:52,800 There too, development time is going to be yet another resource and why 2879 02:09:52,800 --> 02:09:57,000 sometimes there's good code or bad code, it depends on what you're prioritizing. 2880 02:09:57,000 --> 02:09:59,880 So what do each of these nodes look like in a tri? 2881 02:09:59,880 --> 02:10:01,380 Well, we can keep calling it a node. 2882 02:10:01,380 --> 02:10:04,710 This is a very generic term for just a container in these data structures. 2883 02:10:04,710 --> 02:10:10,080 In this story though, let me claim that everyone has a number, like a phone 2884 02:10:10,080 --> 02:10:12,840 number, a string, a.k.a char star. 2885 02:10:12,840 --> 02:10:18,330 Every node has 26 children, or technically an array of size 26 2886 02:10:18,330 --> 02:10:20,520 that can point to more of these nodes. 2887 02:10:20,520 --> 02:10:25,020 Notice that I don't need to store the name of someone in a try 2888 02:10:25,020 --> 02:10:28,330 because it's implicit in the path that you take to find them. 2889 02:10:28,330 --> 02:10:29,670 So that's a minor optimization. 2890 02:10:29,670 --> 02:10:31,330 But it saves us some space. 2891 02:10:31,330 --> 02:10:33,750 But this would be just a different data structure 2892 02:10:33,750 --> 02:10:37,740 we could use to actually solve this problem as well, 2893 02:10:37,740 --> 02:10:39,570 albeit at a very expensive cost. 2894 02:10:39,570 --> 02:10:42,480 And what do we need our variable to be that stores the try? 2895 02:10:42,480 --> 02:10:44,880 Just like before, we just need a single pointer 2896 02:10:44,880 --> 02:10:48,000 that hangs on to the root of this structure that's 2897 02:10:48,000 --> 02:10:52,790 null if it's empty or non-null if it's actually pointing at something. 2898 02:10:52,790 --> 02:10:56,210 Any questions then on tries? 2899 02:10:56,210 --> 02:10:59,420 And if it's feeling like a lot, the fire hydrant, it is. 2900 02:10:59,420 --> 02:11:03,350 We started with arrays, then linked list, then tries. 2901 02:11:03,350 --> 02:11:06,230 But questions on how we've just assembled from these basic building 2902 02:11:06,230 --> 02:11:06,730 blocks? 2903 02:11:06,730 --> 02:11:07,326 Yeah. 2904 02:11:07,326 --> 02:11:10,122 AUDIENCE: Why isn't the root of tri [INAUDIBLE]?? 2905 02:11:10,122 --> 02:11:11,520 Why does that matter? 2906 02:11:11,520 --> 02:11:12,853 DAVID J. MALAN: A good question. 2907 02:11:12,853 --> 02:11:14,820 Why is this not size 26? 2908 02:11:14,820 --> 02:11:19,380 It's just like with the linked list before, it just 2909 02:11:19,380 --> 02:11:23,550 tends to be in code convenient to have a separate additional pointer that's 2910 02:11:23,550 --> 02:11:26,220 small that just points to the beginning of the data structure. 2911 02:11:26,220 --> 02:11:29,010 Because that way, it can be null, thereby clearly indicating 2912 02:11:29,010 --> 02:11:30,030 there are no nodes. 2913 02:11:30,030 --> 02:11:31,380 The whole structure is empty. 2914 02:11:31,380 --> 02:11:35,095 If you allocated one of those nodes, you absolutely could. 2915 02:11:35,095 --> 02:11:37,470 But then you'd be just wasting space, even if it's empty. 2916 02:11:37,470 --> 02:11:38,980 And it creates an ambiguity. 2917 02:11:38,980 --> 02:11:41,370 So just having a single pointer linked to the beginnings 2918 02:11:41,370 --> 02:11:44,320 of all of these things is a good thing. 2919 02:11:44,320 --> 02:11:52,750 Other questions now on tries, or trees, or hash tables, or arrays? 2920 02:11:52,750 --> 02:11:55,450 So what problems might arise? 2921 02:11:55,450 --> 02:11:57,520 Well, here's a counterexample. 2922 02:11:57,520 --> 02:12:00,385 What names are manifest in this tri here? 2923 02:12:00,385 --> 02:12:03,330 2924 02:12:03,330 --> 02:12:04,580 Feel free to just call it out. 2925 02:12:04,580 --> 02:12:08,333 2926 02:12:08,333 --> 02:12:09,000 What do you see? 2927 02:12:09,000 --> 02:12:10,320 AUDIENCE: Danielle. 2928 02:12:10,320 --> 02:12:13,360 DAVID J. MALAN: Danielle and Daniel. 2929 02:12:13,360 --> 02:12:16,530 So presumably, if these are two names here, one of which 2930 02:12:16,530 --> 02:12:19,680 is a prefix of another, notice that the data structure still works. 2931 02:12:19,680 --> 02:12:22,830 And I chose a friend's name and then appended a couple of more characters 2932 02:12:22,830 --> 02:12:26,790 to it that's also a name because we have here D-A-N-I-E-L. 2933 02:12:26,790 --> 02:12:30,255 And the green technically is implemented as a bool or something like that 2934 02:12:30,255 --> 02:12:31,950 that indicates a word stops here. 2935 02:12:31,950 --> 02:12:35,130 But we don't want to preclude storing Danielle as well, who's 2936 02:12:35,130 --> 02:12:37,440 a superstring, if you will, of Daniel. 2937 02:12:37,440 --> 02:12:41,250 And so, that's OK too, so long as the structure allows for the pointers 2938 02:12:41,250 --> 02:12:41,860 to keep going. 2939 02:12:41,860 --> 02:12:46,000 So even that works out OK whereas it might not have otherwise. 2940 02:12:46,000 --> 02:12:49,170 And in terms of the running time, just to be clear, at the end of the day, 2941 02:12:49,170 --> 02:12:55,890 tries do give you actual constant time for insertions, lookups, deletions, 2942 02:12:55,890 --> 02:13:00,210 and the like because it's dependent only on the length of the input, the key, 2943 02:13:00,210 --> 02:13:05,100 if you will, and not on how many other people are in your phone or address 2944 02:13:05,100 --> 02:13:06,060 book. 2945 02:13:06,060 --> 02:13:08,760 And I thought we'd conclude with a visual. 2946 02:13:08,760 --> 02:13:13,140 If you've gotten out into the square, anyone recognize this? 2947 02:13:13,140 --> 02:13:15,120 Sweet Green, a local salad place. 2948 02:13:15,120 --> 02:13:18,427 What are we looking at here and what's its connection to today? 2949 02:13:18,427 --> 02:13:20,760 You're about to become all the geekier in the real world 2950 02:13:20,760 --> 02:13:24,300 because you will start to see data structures everywhere. 2951 02:13:24,300 --> 02:13:25,680 What is this? 2952 02:13:25,680 --> 02:13:29,460 Or how does this work maybe in salad form? 2953 02:13:29,460 --> 02:13:32,287 Who's been to Sweet Green. 2954 02:13:32,287 --> 02:13:32,870 Either of you. 2955 02:13:32,870 --> 02:13:34,145 So how does this work? 2956 02:13:34,145 --> 02:13:35,422 [LAUGHTER] 2957 02:13:35,422 --> 02:13:40,713 AUDIENCE: [INAUDIBLE] They'll put your order [INAUDIBLE] 2958 02:13:40,713 --> 02:13:42,568 your name [INAUDIBLE] L. 2959 02:13:42,568 --> 02:13:43,610 DAVID J. MALAN: OK, good. 2960 02:13:43,610 --> 02:13:46,400 So if you order a salad for someone named L, when it's ready, 2961 02:13:46,400 --> 02:13:48,810 they put it in the L section here. 2962 02:13:48,810 --> 02:13:51,680 And so, this is a set of key value pairs, right? 2963 02:13:51,680 --> 02:13:55,710 If L is the first letter of someone's name, the value hopefully is the salad. 2964 02:13:55,710 --> 02:13:58,880 And so, what you kind of have here is a dictionary, key value pairs, 2965 02:13:58,880 --> 02:14:00,500 where it's not words and definitions. 2966 02:14:00,500 --> 02:14:02,300 It's names and salads. 2967 02:14:02,300 --> 02:14:04,550 And you can think of this too as kind of a hash table. 2968 02:14:04,550 --> 02:14:05,050 Why? 2969 02:14:05,050 --> 02:14:07,940 Even though it actually doesn't fit on one long shelf 2970 02:14:07,940 --> 02:14:10,520 because the store is only so big, this is really an array. 2971 02:14:10,520 --> 02:14:13,580 And apparently, A is missing or maybe it's around the corner. 2972 02:14:13,580 --> 02:14:16,440 But this array just happens to wrap onto multiple lines. 2973 02:14:16,440 --> 02:14:18,680 But it's still conceptually a single dimension. 2974 02:14:18,680 --> 02:14:20,960 But suppose two people have the name L? 2975 02:14:20,960 --> 02:14:23,102 What do they do typically? 2976 02:14:23,102 --> 02:14:26,300 AUDIENCE: They just put the one [INAUDIBLE] second letter [INAUDIBLE].. 2977 02:14:26,300 --> 02:14:27,175 DAVID J. MALAN: Yeah. 2978 02:14:27,175 --> 02:14:29,870 So maybe they-- if they want to put that much effort into it. 2979 02:14:29,870 --> 02:14:31,850 They might look at the second letter and then the third letter. 2980 02:14:31,850 --> 02:14:35,270 Odds are this is not that interesting a problem to solve optimally in that way. 2981 02:14:35,270 --> 02:14:39,000 But they probably do start stacking the salads on top of each other 2982 02:14:39,000 --> 02:14:41,040 or maybe scooching it over just a little bit. 2983 02:14:41,040 --> 02:14:42,390 And so, what do you have there? 2984 02:14:42,390 --> 02:14:45,770 Well, now, if you start to view the lens through CS50 glasses, 2985 02:14:45,770 --> 02:14:46,820 OK, you have an array. 2986 02:14:46,820 --> 02:14:49,140 And then you have these linked lists that are sort of growing here. 2987 02:14:49,140 --> 02:14:50,490 But even then you run into a problem. 2988 02:14:50,490 --> 02:14:50,780 Why? 2989 02:14:50,780 --> 02:14:52,070 Because it's not really a linked list. 2990 02:14:52,070 --> 02:14:54,612 Because at some point you're going to hit the boundary here. 2991 02:14:54,612 --> 02:14:57,320 So it's kind of like an array of arrays because you can only fit, 2992 02:14:57,320 --> 02:14:59,400 what, three or four salads here. 2993 02:14:59,400 --> 02:15:01,370 And so, long story short, we started today 2994 02:15:01,370 --> 02:15:04,610 deliberately talking about real-world things like stacks and queues. 2995 02:15:04,610 --> 02:15:07,910 And even though it did escalate quickly into binary search trees, 2996 02:15:07,910 --> 02:15:11,362 and hash tables, and tries, even those things are everywhere. 2997 02:15:11,362 --> 02:15:13,070 Even though they don't call them as such, 2998 02:15:13,070 --> 02:15:14,612 these are just solutions to problems. 2999 02:15:14,612 --> 02:15:16,910 And now, with this final week of C under your belt, 3000 02:15:16,910 --> 02:15:19,310 you have all the more of a technical toolkit 3001 02:15:19,310 --> 02:15:21,500 by which to implement these things in code. 3002 02:15:21,500 --> 02:15:25,310 Next week we'll be able to trust that someone else solved all these problems. 3003 02:15:25,310 --> 02:15:26,450 We'll introduce Python. 3004 02:15:26,450 --> 02:15:30,083 And lines of code like this will finally become lines of code like that. 3005 02:15:30,083 --> 02:15:31,250 So that's the promise ahead. 3006 02:15:31,250 --> 02:15:32,510 And we'll see you next time. 3007 02:15:32,510 --> 02:15:33,710 [APPLAUSE] 3008 02:15:33,710 --> 02:15:36,460 [MUSIC PLAYING] 3009 02:15:36,460 --> 02:16:11,000