1 00:00:00,000 --> 00:01:18,380 [MUSIC PLAYING] 2 00:01:18,380 --> 00:01:20,000 SPEAKER 1: All right. 3 00:01:20,000 --> 00:01:21,830 This is CS50. 4 00:01:21,830 --> 00:01:24,740 And this is already week 5, which means this is actually 5 00:01:24,740 --> 00:01:27,240 our last week in C together. 6 00:01:27,240 --> 00:01:31,070 In fact, in just a few days' time, what has looked like this, 7 00:01:31,070 --> 00:01:33,490 and much more cryptic than this perhaps, is 8 00:01:33,490 --> 00:01:35,990 going to be distilled into something much simpler next week, 9 00:01:35,990 --> 00:01:38,150 when we transition to a language called Python. 10 00:01:38,150 --> 00:01:42,470 And with Python, we'll still have our conditionals, and loops, and functions, 11 00:01:42,470 --> 00:01:43,173 and so forth. 12 00:01:43,173 --> 00:01:46,340 But a lot of the low-level plumbing that you might have been wrestling with, 13 00:01:46,340 --> 00:01:49,020 struggling with, frustrated by, over the past couple of weeks, 14 00:01:49,020 --> 00:01:51,320 especially now that we've introduced pointers, 15 00:01:51,320 --> 00:01:54,200 and it feels like you probably have to do everything yourself. 16 00:01:54,200 --> 00:01:57,060 In Python, and in a lot of higher-level languages, 17 00:01:57,060 --> 00:01:59,450 so to speak-- more modern, more recent languages-- 18 00:01:59,450 --> 00:02:02,540 you'll be able to do so much more with just single lines of code. 19 00:02:02,540 --> 00:02:05,540 And indeed, we're going to start leveraging libraries all the more, code 20 00:02:05,540 --> 00:02:06,980 that other people wrote. 21 00:02:06,980 --> 00:02:10,160 Frameworks, which are collections of libraries that other people wrote. 22 00:02:10,160 --> 00:02:13,610 And on top of all that, you'll be able to make even better, grander, more 23 00:02:13,610 --> 00:02:17,210 impressive projects that actually solve problems of particular interest to you. 
24 00:02:17,210 --> 00:02:20,100 Particularly, by way of your own final project. 25 00:02:20,100 --> 00:02:23,600 So last week though, in week 4, recall that we focused on memory. 26 00:02:23,600 --> 00:02:26,210 And we've been treating this memory inside of your computer 27 00:02:26,210 --> 00:02:27,560 as like a canvas, right. 28 00:02:27,560 --> 00:02:30,770 At the end of the day, it's just zeros and ones, or bytes, really. 29 00:02:30,770 --> 00:02:33,900 And it's really up to you what you do with those bytes. 30 00:02:33,900 --> 00:02:37,400 And how you interconnect them, how you represent information on them. 31 00:02:37,400 --> 00:02:39,478 And arrays were like one of the simplest ways 32 00:02:39,478 --> 00:02:41,270 we started playing around with that memory. 33 00:02:41,270 --> 00:02:43,160 Just contiguous chunks of memory. 34 00:02:43,160 --> 00:02:44,300 Back-to-back-to-back. 35 00:02:44,300 --> 00:02:47,030 But let's consider, for a moment, some of the problems that 36 00:02:47,030 --> 00:02:48,620 pretty quickly arise with arrays. 37 00:02:48,620 --> 00:02:52,190 And then, today, focus on what more generally are called data structures. 38 00:02:52,190 --> 00:02:57,110 Using your computer's memory as a much more versatile canvas, 39 00:02:57,110 --> 00:02:59,380 to create even two-dimensional structures, 40 00:02:59,380 --> 00:03:01,130 to represent information, and, ultimately, 41 00:03:01,130 --> 00:03:03,210 to solve more interesting problems. 42 00:03:03,210 --> 00:03:04,790 So here's an array of size 3. 43 00:03:04,790 --> 00:03:06,590 Maybe, the size of 3 integers. 44 00:03:06,590 --> 00:03:08,838 And suppose that this is inside of a program. 45 00:03:08,838 --> 00:03:11,630 And at this point in the story, you've got 3 numbers in it already. 46 00:03:11,630 --> 00:03:13,040 1, 2 and 3. 47 00:03:13,040 --> 00:03:17,077 And suppose, whatever the context, you need to now add a fourth number 48 00:03:17,077 --> 00:03:17,660 to this array. 
49 00:03:17,660 --> 00:03:18,950 Like, the number 4. 50 00:03:18,950 --> 00:03:21,967 Well, instinctively, where should the number 4 go? 51 00:03:21,967 --> 00:03:24,050 If this is your computer's memory and we currently 52 00:03:24,050 --> 00:03:25,759 have this array 1, 2, 3, from, what, 53 00:03:25,759 --> 00:03:27,110 left to right. 54 00:03:27,110 --> 00:03:30,340 Where should the number 4 just, perhaps, naively go? 55 00:03:30,340 --> 00:03:31,340 Yeah, what do you think? 56 00:03:31,340 --> 00:03:32,420 AUDIENCE: Replace number 1. 57 00:03:32,420 --> 00:03:32,930 SPEAKER 1: Sorry? 58 00:03:32,930 --> 00:03:33,830 AUDIENCE: Replace number 1. 59 00:03:33,830 --> 00:03:34,580 SPEAKER 1: Oh, OK. 60 00:03:34,580 --> 00:03:36,020 So you could replace number 1. 61 00:03:36,020 --> 00:03:37,895 I don't really like that, though, because I'd 62 00:03:37,895 --> 00:03:39,290 like to keep number 1 around. 63 00:03:39,290 --> 00:03:40,580 But that's an option. 64 00:03:40,580 --> 00:03:42,330 But I'm losing, of course, information. 65 00:03:42,330 --> 00:03:44,790 So what else could I do if I want to add the number 4? 66 00:03:44,790 --> 00:03:45,290 Over there? 67 00:03:45,290 --> 00:03:46,665 AUDIENCE: On the right side of 3. 68 00:03:46,665 --> 00:03:47,332 SPEAKER 1: Yeah. 69 00:03:47,332 --> 00:03:49,472 So, I mean, it feels like if there's some ordering 70 00:03:49,472 --> 00:03:51,680 to these, which seems kind of a reasonable inference, 71 00:03:51,680 --> 00:03:53,780 that it probably belongs somewhere over here. 72 00:03:53,780 --> 00:03:57,260 But recall last week, as we started poking around a computer's memory, 73 00:03:57,260 --> 00:03:59,130 there's other stuff potentially going on. 74 00:03:59,130 --> 00:04:02,750 And if we fill that in, ideally, we'd want to just plop the number 4 here. 75 00:04:02,750 --> 00:04:04,580 If we're maintaining this kind of order. 
76 00:04:04,580 --> 00:04:06,980 But recall in the context of your computer's memory, 77 00:04:06,980 --> 00:04:08,420 there might be other stuff there. 78 00:04:08,420 --> 00:04:10,932 Some of these garbage values that might be usable, 79 00:04:10,932 --> 00:04:12,890 but we don't really know or care what they are. 80 00:04:12,890 --> 00:04:14,480 As represented by Oscar here. 81 00:04:14,480 --> 00:04:17,510 But there might actually be useful data in use. 82 00:04:17,510 --> 00:04:20,900 Like, if your program has not just a few integers in this array, 83 00:04:20,900 --> 00:04:23,030 but also a string that says like, "Hello, world." 84 00:04:23,030 --> 00:04:29,090 It could be that your computer has plopped the H-E-L-L-O W-O-R-L-D right 85 00:04:29,090 --> 00:04:30,210 after this array. 86 00:04:30,210 --> 00:04:30,710 Why? 87 00:04:30,710 --> 00:04:32,960 Well, maybe, you created the array in one line of code 88 00:04:32,960 --> 00:04:34,610 and filled it with 1, 2, 3. 89 00:04:34,610 --> 00:04:37,010 Maybe the next line of code used get_string. 90 00:04:37,010 --> 00:04:40,230 Or maybe just hard-coded a string in your code for "Hello, world." 91 00:04:40,230 --> 00:04:42,977 And so you painted yourself into a corner, so to speak. 92 00:04:42,977 --> 00:04:45,560 Now I think you might claim, well, let's just overwrite the H. 93 00:04:45,560 --> 00:04:47,510 But that's problematic for the same reasons. 94 00:04:47,510 --> 00:04:49,230 We don't want to do that. 95 00:04:49,230 --> 00:04:52,130 So where else could the 4 go? 96 00:04:52,130 --> 00:04:55,370 Or how do we solve this problem if we want to add a number, 97 00:04:55,370 --> 00:04:57,080 and there's clearly memory available? 98 00:04:57,080 --> 00:05:00,470 Because those garbage values are junk that we don't care about anymore. 99 00:05:00,470 --> 00:05:02,600 So we could certainly reuse those. 100 00:05:02,600 --> 00:05:06,240 Where could the 4, and perhaps this whole array, go? 
101 00:05:06,240 --> 00:05:06,740 OK. 102 00:05:06,740 --> 00:05:08,570 So I'm hearing we could move it somewhere. 103 00:05:08,570 --> 00:05:10,403 Maybe, replace some of those garbage values. 104 00:05:10,403 --> 00:05:12,420 And honestly, we have a lot of options. 105 00:05:12,420 --> 00:05:14,660 We could use any of these garbage values up here. 106 00:05:14,660 --> 00:05:17,400 We could use any of these down here, or even further down. 107 00:05:17,400 --> 00:05:20,960 The point is there is plenty of memory available as 108 00:05:20,960 --> 00:05:24,410 indicated by these Oscars, where we could put 4, maybe even, 5, 109 00:05:24,410 --> 00:05:25,790 6 or more integers. 110 00:05:25,790 --> 00:05:28,970 The catch is that we chose poorly early on. 111 00:05:28,970 --> 00:05:30,050 Or we just got unlucky. 112 00:05:30,050 --> 00:05:33,686 And 1, 2, 3 ended up back-to-back with some other data that we care about. 113 00:05:33,686 --> 00:05:34,769 All right, so that's fine. 114 00:05:34,769 --> 00:05:37,579 Let's go ahead and assume that we'll abstract away everything else. 115 00:05:37,579 --> 00:05:40,745 And we'll plop the new array in this location here. 116 00:05:40,745 --> 00:05:42,620 So I'm going to go ahead and copy the 1 over. 117 00:05:42,620 --> 00:05:43,520 The 2 over. 118 00:05:43,520 --> 00:05:44,420 The 3 over. 119 00:05:44,420 --> 00:05:47,152 And then, ultimately, once I'm ready to fill the 4, 120 00:05:47,152 --> 00:05:49,610 I can throw away, essentially, the old array at this point. 121 00:05:49,610 --> 00:05:51,620 Because I have it now entirely in duplicate. 122 00:05:51,620 --> 00:05:53,760 And I can populate it with the number 4. 123 00:05:53,760 --> 00:05:54,260 All right. 124 00:05:54,260 --> 00:05:55,130 So problem solved. 125 00:05:55,130 --> 00:05:58,100 That is a correct potential solution to this problem. 126 00:05:58,100 --> 00:05:59,183 But, what's the trade off? 
127 00:05:59,183 --> 00:06:02,142 And this is something we're going to start thinking about all the more. 128 00:06:02,142 --> 00:06:04,820 What's the downside of having solved this problem in this way? 129 00:06:04,820 --> 00:06:06,415 Yeah. 130 00:06:06,415 --> 00:06:07,790 I'm adding a lot of running time. 131 00:06:07,790 --> 00:06:10,580 It took me a lot of effort to copy those additional numbers. 132 00:06:10,580 --> 00:06:12,020 Now, granted, it's a small array. 133 00:06:12,020 --> 00:06:13,020 3 numbers, who cares. 134 00:06:13,020 --> 00:06:14,895 It's going to be over in the blink of an eye. 135 00:06:14,895 --> 00:06:17,580 But if we start talking about interesting data sets, 136 00:06:17,580 --> 00:06:20,190 web application data sets, mobile app data sets. 137 00:06:20,190 --> 00:06:23,670 Where you have not just a few, but maybe a few hundred, few thousand, 138 00:06:23,670 --> 00:06:25,630 a few million pieces of data. 139 00:06:25,630 --> 00:06:28,770 This is probably a suboptimal solution to just, oh, 140 00:06:28,770 --> 00:06:30,752 move all your data from one place to another. 141 00:06:30,752 --> 00:06:32,460 Because who's to say that we're not going 142 00:06:32,460 --> 00:06:34,050 to paint ourselves into a new corner. 143 00:06:34,050 --> 00:06:37,260 And it would feel like you're wasting all of this time moving stuff around. 144 00:06:37,260 --> 00:06:41,110 And, ultimately, just costing yourself a huge amount of time. 145 00:06:41,110 --> 00:06:44,130 In fact, if we put this now into the context of our Big O notation 146 00:06:44,130 --> 00:06:49,050 from a few weeks back, what might the running time now of Search 147 00:06:49,050 --> 00:06:50,160 be for an array? 148 00:06:50,160 --> 00:06:51,270 Let's start simple. 149 00:06:51,270 --> 00:06:53,430 A throwback a couple of weeks ago. 
150 00:06:53,430 --> 00:06:56,580 If you're using an array, to recap, what was the running time 151 00:06:56,580 --> 00:06:59,590 of a Search algorithm in Big O notation? 152 00:06:59,590 --> 00:07:01,770 So, maybe, in the worst case. 153 00:07:01,770 --> 00:07:05,550 If you've got n numbers, 3 in this case or 4, but n more generally. 154 00:07:05,550 --> 00:07:08,320 Big O of what for Search? 155 00:07:08,320 --> 00:07:08,820 Yeah. 156 00:07:08,820 --> 00:07:09,420 What do you think? 157 00:07:09,420 --> 00:07:10,050 AUDIENCE: Big O of n. 158 00:07:10,050 --> 00:07:11,100 SPEAKER 1: Big O of n. 159 00:07:11,100 --> 00:07:12,720 And what's your intuition for that? 160 00:07:12,720 --> 00:07:14,145 AUDIENCE: [INAUDIBLE]. 161 00:07:18,487 --> 00:07:19,070 SPEAKER 1: OK. 162 00:07:19,070 --> 00:07:19,310 Yeah. 163 00:07:19,310 --> 00:07:22,102 So if we go through each element, for instance, from left to right, 164 00:07:22,102 --> 00:07:25,490 then Search is going to take this Big O of n running time. 165 00:07:25,490 --> 00:07:28,520 If, though, we're talking about these numbers, specifically. 166 00:07:28,520 --> 00:07:31,490 And now I'll explicitly stipulate that, yeah, they're sorted. 167 00:07:31,490 --> 00:07:32,660 Does that buy us anything? 168 00:07:32,660 --> 00:07:36,950 What would the Big O notation be for Searching an array in this case, 169 00:07:36,950 --> 00:07:39,440 be it of size 3, or 4, or n, more generally? 170 00:07:39,440 --> 00:07:40,490 AUDIENCE: Big O of n. 171 00:07:40,490 --> 00:07:42,290 SPEAKER 1: Big O of, not n, but rather? 172 00:07:42,290 --> 00:07:42,680 AUDIENCE: Log n. 173 00:07:42,680 --> 00:07:43,700 SPEAKER 1: Log n, right. 174 00:07:43,700 --> 00:07:47,708 Because we could use, per week 0, binary search on an array like this. 175 00:07:47,708 --> 00:07:49,250 We'd have to deal with some rounding, 176 00:07:49,250 --> 00:07:51,440 because there's not a perfect number of elements at the moment. 
177 00:07:51,440 --> 00:07:52,850 But you could use binary search. 178 00:07:52,850 --> 00:07:54,170 Go to the middle roughly. 179 00:07:54,170 --> 00:07:55,910 And then go left or right, left or right, 180 00:07:55,910 --> 00:07:57,660 until you find the element you care about. 181 00:07:57,660 --> 00:08:01,820 So Search remains in Big O of log n when using arrays. 182 00:08:01,820 --> 00:08:03,650 But what about insertion, now? 183 00:08:03,650 --> 00:08:05,690 If we start to think about other operations. 184 00:08:05,690 --> 00:08:09,380 Like, adding a number to this array, or adding a friend to your contacts 185 00:08:09,380 --> 00:08:12,050 app, or Google finding another page on the internet. 186 00:08:12,050 --> 00:08:14,510 So insertion happens all the time. 187 00:08:14,510 --> 00:08:17,330 What's the running time of Insert? 188 00:08:17,330 --> 00:08:20,630 When it comes to inserting into an existing array of size n. 189 00:08:20,630 --> 00:08:23,300 How many steps might that take? 190 00:08:23,300 --> 00:08:24,170 Big O of n. 191 00:08:24,170 --> 00:08:25,220 It would be, indeed, n. 192 00:08:25,220 --> 00:08:25,720 Why? 193 00:08:25,720 --> 00:08:28,580 Because in the worst case, where you're out of space, 194 00:08:28,580 --> 00:08:31,148 you have to allocate, it would seem, a new array. 195 00:08:31,148 --> 00:08:33,440 Maybe, taking over some of the previous garbage values. 196 00:08:33,440 --> 00:08:35,180 But the catch is, even though you're only 197 00:08:35,180 --> 00:08:37,550 inserting one new number, like the number 4, 198 00:08:37,550 --> 00:08:41,070 you have to copy over all the darn existing numbers into the new one. 199 00:08:41,070 --> 00:08:44,060 So if your original array of size n, the copying of that 200 00:08:44,060 --> 00:08:45,930 is going to take Big O of n plus 1. 201 00:08:45,930 --> 00:08:48,930 But we can throw away the plus 1 because of the math we did in the past. 
202 00:08:48,930 --> 00:08:51,860 So Insert now becomes Big O of n. 203 00:08:51,860 --> 00:08:53,720 And that might not be ideal. 204 00:08:53,720 --> 00:08:56,510 Because if you're in the habit of inserting things frequently, 205 00:08:56,510 --> 00:08:58,880 that could start to add up, and add up, and add up. 206 00:08:58,880 --> 00:09:01,820 And this is why computer programs, and websites, and mobile apps 207 00:09:01,820 --> 00:09:02,990 could be slow. 208 00:09:02,990 --> 00:09:06,000 If you're not being mindful of these trade-offs. 209 00:09:06,000 --> 00:09:10,010 So what about, just for good measure, Omega notation? 210 00:09:10,010 --> 00:09:11,270 And maybe, the best case. 211 00:09:11,270 --> 00:09:13,760 Well just to recap here, we could get lucky 212 00:09:13,760 --> 00:09:16,052 and Search could just take one step. 213 00:09:16,052 --> 00:09:18,260 Because you might just get lucky, and boom, the number 214 00:09:18,260 --> 00:09:20,810 you're looking for is right there in the middle, if using binary search. 215 00:09:20,810 --> 00:09:22,670 Or even linear search, for that matter. 216 00:09:22,670 --> 00:09:23,720 And Insert, too. 217 00:09:23,720 --> 00:09:27,710 If there's enough room, and we didn't have to move all of those numbers-- 218 00:09:27,710 --> 00:09:29,247 1, 2, and 3-- to a new location. 219 00:09:29,247 --> 00:09:30,080 You could get lucky. 220 00:09:30,080 --> 00:09:32,240 And we could have, as someone suggested, just 221 00:09:32,240 --> 00:09:34,038 put the number 4 right there at the end. 222 00:09:34,038 --> 00:09:36,080 And if we don't get lucky, it might take n steps. 223 00:09:36,080 --> 00:09:39,960 If we do get lucky, it might just take one, or a constant number, of steps. 224 00:09:39,960 --> 00:09:41,670 In fact, let me go ahead and do this. 225 00:09:41,670 --> 00:09:43,320 How about we do something like this? 226 00:09:43,320 --> 00:09:45,020 Let me switch over to some code here. 
227 00:09:45,020 --> 00:09:48,110 Let me start to make a program called list.c. 228 00:09:48,110 --> 00:09:50,789 And in list.c, let's start with the old way. 229 00:09:50,789 --> 00:09:54,030 So we follow the breadcrumbs we've laid for ourselves as follows. 230 00:09:54,030 --> 00:09:57,470 So in this list.c, I'm going to include stdio.h. 231 00:09:57,470 --> 00:09:59,450 int main(void) as usual. 232 00:09:59,450 --> 00:10:02,780 Then inside of my code here, I'm going to go ahead and give myself 233 00:10:02,780 --> 00:10:04,590 the first version of memory. 234 00:10:04,590 --> 00:10:09,330 So int list[3] is now implemented, at the moment, in an array. 235 00:10:09,330 --> 00:10:11,687 So we're rewinding for now to week 2 style code. 236 00:10:11,687 --> 00:10:13,520 And then, let me just initialize this thing. 237 00:10:13,520 --> 00:10:15,200 At the first location will be 1. 238 00:10:15,200 --> 00:10:17,240 At the next location will be 2. 239 00:10:17,240 --> 00:10:19,910 And at the last location will be 3. 240 00:10:19,910 --> 00:10:22,240 So the array is zero indexed always. 241 00:10:22,240 --> 00:10:23,990 I, for just the sake of discussion though, 242 00:10:23,990 --> 00:10:27,420 am putting in the numbers 1, 2, 3, like a normal person might. 243 00:10:27,420 --> 00:10:27,920 All right. 244 00:10:27,920 --> 00:10:29,337 So now let's just print these out. 245 00:10:29,337 --> 00:10:30,800 for int i gets 0. 246 00:10:30,800 --> 00:10:32,840 i less than 3, i++. 247 00:10:32,840 --> 00:10:35,750 Let's go ahead now and print out using printf. 248 00:10:35,750 --> 00:10:38,660 %i\n, list[i]. 249 00:10:38,660 --> 00:10:42,290 So very simple program, inspired by what we did in week 2. 250 00:10:42,290 --> 00:10:46,200 Just to create and then print out the contents of an array. 251 00:10:46,200 --> 00:10:48,380 So let's make list. 252 00:10:48,380 --> 00:10:52,460 So far, so good. ./list. And voila, we see 1, 2, 3. 
253 00:10:52,460 --> 00:10:57,470 Now let's start to practice some of what we're preaching with this new syntax. 254 00:10:57,470 --> 00:11:02,060 So let me go in now and get rid of the array version. 255 00:11:02,060 --> 00:11:04,910 And let me zoom out a little bit to give ourselves some more space. 256 00:11:04,910 --> 00:11:08,450 And now let's begin to create a list of size 3. 257 00:11:08,450 --> 00:11:11,630 So if I'm going to do this now, dynamically, 258 00:11:11,630 --> 00:11:15,780 so that I'm allocating these things again and again, 259 00:11:15,780 --> 00:11:17,430 let me go ahead and do this. 260 00:11:17,430 --> 00:11:24,470 Let me give myself a list that's of type int* equal to the return value of malloc 261 00:11:24,470 --> 00:11:31,490 of 3 times the size of an int. So what this is going to do for me is give me 262 00:11:31,490 --> 00:11:34,490 enough memory for that very first picture we drew on the board. 263 00:11:34,490 --> 00:11:37,160 Which was the array containing 1, 2, and 3. 264 00:11:37,160 --> 00:11:39,990 But laying the foundation to be able to resize it, 265 00:11:39,990 --> 00:11:41,580 which was ultimately the goal. 266 00:11:41,580 --> 00:11:43,650 So my syntax is a little different here. 267 00:11:43,650 --> 00:11:47,090 I'm going to use malloc and get memory from the so-called "heap", as we 268 00:11:47,090 --> 00:11:48,000 called it last week. 269 00:11:48,000 --> 00:11:51,890 Instead of using the stack by just doing the previous version where I said, 270 00:11:51,890 --> 00:11:54,680 int list[3]. 271 00:11:54,680 --> 00:11:59,090 That is to say this line of code from the first version is in some sense 272 00:11:59,090 --> 00:12:02,630 identical to this line of code in the second version. 273 00:12:02,630 --> 00:12:04,730 But the first line of code puts the memory 274 00:12:04,730 --> 00:12:06,890 on the stack, automatically, for me. 
275 00:12:06,890 --> 00:12:09,800 The second line of code, that I've left here now, 276 00:12:09,800 --> 00:12:13,280 is creating an array of size 3, but it's putting it on the heap. 277 00:12:13,280 --> 00:12:16,900 And that's important because it was only on the heap and via this new function 278 00:12:16,900 --> 00:12:17,830 last week, malloc, 279 00:12:17,830 --> 00:12:20,860 that you can actually ask for more memory, and even give it back. 280 00:12:20,860 --> 00:12:24,760 When you just use the first notation, int list[3], 281 00:12:24,760 --> 00:12:28,150 you have permanently given yourself an array of size 3. 282 00:12:28,150 --> 00:12:31,130 You cannot add to that in code. 283 00:12:31,130 --> 00:12:33,010 So let me go ahead and do this. 284 00:12:33,010 --> 00:12:36,143 If list == NULL, something went wrong. 285 00:12:36,143 --> 00:12:37,310 The computer's out of memory. 286 00:12:37,310 --> 00:12:39,503 So let's just return 1 and quit out of this program. 287 00:12:39,503 --> 00:12:40,670 There's nothing to see here. 288 00:12:40,670 --> 00:12:42,520 So just a good error check there. 289 00:12:42,520 --> 00:12:44,770 Now let me go ahead and initialize this list. 290 00:12:44,770 --> 00:12:46,720 So list[0] will be 1 again. 291 00:12:46,720 --> 00:12:48,070 list[1] will be 2. 292 00:12:48,070 --> 00:12:50,440 And list[2] will be 3. 293 00:12:50,440 --> 00:12:52,810 So that's the same kind of syntax as before. 294 00:12:52,810 --> 00:12:55,930 And notice this equivalence. 295 00:12:55,930 --> 00:13:00,730 Recall that there's this relationship between chunks of memory and arrays. 296 00:13:00,730 --> 00:13:03,550 And arrays are really just doing pointer arithmetic for you, 297 00:13:03,550 --> 00:13:05,260 where the square bracket notation is. 
298 00:13:05,260 --> 00:13:10,030 So if I've asked myself here, in line 5, for enough memory for 3 integers, 299 00:13:10,030 --> 00:13:15,250 it is perfectly OK to treat it now like an array using square bracket notation. 300 00:13:15,250 --> 00:13:17,740 Because the computer will do the arithmetic for me 301 00:13:17,740 --> 00:13:20,440 and find the first location, the second, and the third. 302 00:13:20,440 --> 00:13:24,550 If you really want to be cool and hacker-like, well, 303 00:13:24,550 --> 00:13:31,300 you could say *list = 1, *(list + 1) = 2, *(list + 2) = 3. 304 00:13:33,880 --> 00:13:36,220 That's the same thing using very explicit 305 00:13:36,220 --> 00:13:38,830 pointer arithmetic, which we looked at briefly last week. 306 00:13:38,830 --> 00:13:41,170 But this is atrocious to look at for most people. 307 00:13:41,170 --> 00:13:42,860 It's just not very user friendly. 308 00:13:42,860 --> 00:13:45,790 It's longer to type, so most people, even when 309 00:13:45,790 --> 00:13:48,670 allocating memory dynamically as I did a second ago, 310 00:13:48,670 --> 00:13:52,630 would just use the more familiar notation of an array. 311 00:13:52,630 --> 00:13:53,240 All right. 312 00:13:53,240 --> 00:13:54,310 So let's go on. 313 00:13:54,310 --> 00:13:58,840 Now suppose time passes and I realize, oh shoot, 314 00:13:58,840 --> 00:14:03,820 I really wanted this array to be of size 4 instead of size 3. 315 00:14:03,820 --> 00:14:06,362 Now, obviously, I could just rewind and like fix the program. 316 00:14:06,362 --> 00:14:08,320 But suppose that this is a much larger program. 317 00:14:08,320 --> 00:14:10,690 And I've realized, at this point, that I need 318 00:14:10,690 --> 00:14:14,080 to be able to dynamically add more things to this array for whatever 319 00:14:14,080 --> 00:14:14,740 reason. 320 00:14:14,740 --> 00:14:16,280 Well let me go ahead and do this. 
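[Editor's sketch] The equivalence between square brackets and explicit pointer arithmetic described above can be written out as follows. This is only a sketch; the helper name make_list is hypothetical and not part of the lecture's list.c.

```c
#include <stdlib.h>

// Allocate room for 3 ints on the heap and fill it with 1, 2, 3
// using explicit pointer arithmetic instead of square brackets.
// Returns NULL if malloc fails; the caller must free the result.
int *make_list(void)
{
    int *list = malloc(3 * sizeof(int));
    if (list == NULL)
    {
        return NULL;
    }
    *list = 1;       // same as list[0] = 1
    *(list + 1) = 2; // same as list[1] = 2
    *(list + 2) = 3; // same as list[2] = 3
    return list;
}
```

Note the dereference operator: writing list + 1 on its own only computes an address, while *(list + 1) stores to it, which is exactly what list[1] does on your behalf.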
321 00:14:16,280 --> 00:14:18,670 Let me just say, all right, list should actually 322 00:14:18,670 --> 00:14:24,700 be the result of asking for 4 chunks of memory from malloc. 323 00:14:24,700 --> 00:14:28,735 And then, I could do something like this, list[3] = 4. 324 00:14:31,690 --> 00:14:34,700 Now this is buggy, potentially, in a couple of ways. 325 00:14:34,700 --> 00:14:41,530 But let me ask first, what's really wrong, first, with this code? 326 00:14:41,530 --> 00:14:45,850 The goal at hand is to start with the array of size 3 with the 1, 2, 3. 327 00:14:45,850 --> 00:14:47,660 And I want to add a number 4 to it. 328 00:14:47,660 --> 00:14:53,380 So at the moment, in line 17, I've asked the computer for a chunk of 4 integers. 329 00:14:53,380 --> 00:14:54,940 Just like the picture. 330 00:14:54,940 --> 00:14:57,130 And then I'm adding the number 4 to it. 331 00:14:57,130 --> 00:15:00,610 But I have skipped a few steps and broken this somehow. 332 00:15:00,610 --> 00:15:01,894 Yeah. 333 00:15:01,894 --> 00:15:04,023 AUDIENCE: You don't know exactly [INAUDIBLE]. 334 00:15:04,023 --> 00:15:04,690 SPEAKER 1: Yeah. 335 00:15:04,690 --> 00:15:07,060 I don't necessarily know where this is going to end up in memory. 336 00:15:07,060 --> 00:15:08,560 It's probably not going to be immediately 337 00:15:08,560 --> 00:15:09,910 adjacent to the previous chunk. 338 00:15:09,910 --> 00:15:12,740 And so, yes, even though I'm putting the number 4 there, 339 00:15:12,740 --> 00:15:16,700 I haven't copied the 1, the 2, or the 3 over to this chunk of memory. 340 00:15:16,700 --> 00:15:18,400 So well let me fix-- 341 00:15:18,400 --> 00:15:22,630 well, that's actually, indeed, really the essence of the problem. 342 00:15:22,630 --> 00:15:26,080 I am orphaning the original chunk of memory. 
343 00:15:26,080 --> 00:15:29,260 If you think of the picture that I drew earlier, the line of code 344 00:15:29,260 --> 00:15:35,500 up here on line 5 that allocates space for the initial 3 integers. 345 00:15:35,500 --> 00:15:36,820 This code is fine. 346 00:15:36,820 --> 00:15:38,270 This code is fine. 347 00:15:38,270 --> 00:15:41,650 But as soon as I do this, I'm clobbering the value of list. 348 00:15:41,650 --> 00:15:43,960 And saying no, don't point at this chunk of memory. 349 00:15:43,960 --> 00:15:47,900 Point at this chunk of memory, at which point I've forgotten, if you will, 350 00:15:47,900 --> 00:15:50,230 where the original chunk of memory is. 351 00:15:50,230 --> 00:15:54,820 So the right way to do something like this would be a little more involved. 352 00:15:54,820 --> 00:15:57,398 Let me go ahead and give myself a temporary variable. 353 00:15:57,398 --> 00:15:58,690 And I'll literally call it tmp. 354 00:15:58,690 --> 00:16:00,820 T-M-P, like I did last week. 355 00:16:00,820 --> 00:16:04,120 So that I can now ask the computer for a completely different chunk of memory 356 00:16:04,120 --> 00:16:05,290 of size 4. 357 00:16:05,290 --> 00:16:08,230 I'm going to again say if tmp equals NULL, 358 00:16:08,230 --> 00:16:10,370 I'm going to say bad things happened here. 359 00:16:10,370 --> 00:16:11,560 So let me just return 1. 360 00:16:11,560 --> 00:16:13,840 And you know what, just to be tidy, let me 361 00:16:13,840 --> 00:16:16,542 free the original list before I quit. 362 00:16:16,542 --> 00:16:18,250 Because remember from last week, any time 363 00:16:18,250 --> 00:16:20,650 you use malloc you eventually have to use free. 364 00:16:20,650 --> 00:16:24,040 But this chunk of code here is just a safety check. 365 00:16:24,040 --> 00:16:26,440 If there's no more memory, there's nothing to see here. 366 00:16:26,440 --> 00:16:29,500 I'm just going to clean up my state and quit. 
367 00:16:29,500 --> 00:16:32,840 But now, if I have asked for this chunk of memory, 368 00:16:32,840 --> 00:16:38,200 now I can do this: for int i gets 0. 369 00:16:38,200 --> 00:16:40,600 i is less than 3, i++. 370 00:16:40,600 --> 00:16:42,520 What if I do something like this? 371 00:16:42,520 --> 00:16:46,540 tmp[i] equals list[i]. 372 00:16:46,540 --> 00:16:50,980 That would seem to have the effect of copying all of the memory from one 373 00:16:50,980 --> 00:16:51,800 to the other. 374 00:16:51,800 --> 00:16:55,510 And then, I think I need to do one last thing: tmp[3] 375 00:16:55,510 --> 00:16:57,460 gets the number 4, for instance. 376 00:16:57,460 --> 00:17:01,480 Again, I'm hard-coding the numbers for the sake of discussion. 377 00:17:01,480 --> 00:17:06,460 After I've done this, what could I now do? 378 00:17:06,460 --> 00:17:10,990 I could now set list equal to tmp. 379 00:17:10,990 --> 00:17:14,048 And now, I have updated my list properly. 380 00:17:14,048 --> 00:17:15,340 So let me go ahead and do this. 381 00:17:15,340 --> 00:17:17,080 for int i gets 0. 382 00:17:17,080 --> 00:17:19,480 i is less than 4, i++. 383 00:17:19,480 --> 00:17:24,820 Let me go ahead and print each of these elements out with %i using list[i]. 384 00:17:24,820 --> 00:17:27,890 And then, I'm going to return 0 just to signify that all is successful. 385 00:17:27,890 --> 00:17:31,990 Now so to recap, we initialize the original array 386 00:17:31,990 --> 00:17:35,140 of size 3 and plug in the values 1, 2, 3. 387 00:17:35,140 --> 00:17:35,960 Time passes. 388 00:17:35,960 --> 00:17:38,210 And then, I realize, wait a minute, I need more space. 389 00:17:38,210 --> 00:17:40,585 And so I ask the computer for a second chunk of memory. 390 00:17:40,585 --> 00:17:41,800 This one of size 4. 391 00:17:41,800 --> 00:17:44,467 Just as a safety check, I make sure that tmp doesn't equal NULL. 392 00:17:44,467 --> 00:17:46,008 Because if it does I'm out of memory. 
393 00:17:46,008 --> 00:17:47,590 So I should just quit altogether. 394 00:17:47,590 --> 00:17:50,110 But once I'm sure that it's not null, I'm 395 00:17:50,110 --> 00:17:55,450 going to copy all the values from the old list into the new list. 396 00:17:55,450 --> 00:17:58,910 And then, I'm going to add my new number at the end of that list. 397 00:17:58,910 --> 00:18:02,410 And then, now that I'm done playing around with this temporary variable, 398 00:18:02,410 --> 00:18:05,860 I'm going to remember in my list variable what 399 00:18:05,860 --> 00:18:07,900 the address is of this new chunk of memory. 400 00:18:07,900 --> 00:18:10,570 And then, I'm going to print all of those values out. 401 00:18:10,570 --> 00:18:14,350 So at least, aesthetically, when I make this new version of my list, 402 00:18:14,350 --> 00:18:16,660 except for my missing semicolon. 403 00:18:16,660 --> 00:18:17,590 Let me try this again. 404 00:18:17,590 --> 00:18:19,480 When I make list, oh, OK. 405 00:18:19,480 --> 00:18:20,620 What did I do this time? 406 00:18:20,620 --> 00:18:23,290 Implicitly declaring a library function malloc. 407 00:18:23,290 --> 00:18:27,749 What's my mistake any time you see that kind of error? 408 00:18:27,749 --> 00:18:28,510 AUDIENCE: Library. 409 00:18:28,510 --> 00:18:28,800 SPEAKER 1: Yeah. 410 00:18:28,800 --> 00:18:29,380 A library. 411 00:18:29,380 --> 00:18:34,700 So up here, I forgot to do include stdlib.h, which is where malloc lives. 412 00:18:34,700 --> 00:18:36,490 Let me go ahead and, again, do make list. 413 00:18:36,490 --> 00:18:37,250 There we go. 414 00:18:37,250 --> 00:18:38,950 So I fixed that. ./list. 415 00:18:38,950 --> 00:18:41,829 And I should see 1, 2, 3, 4. 416 00:18:41,829 --> 00:18:45,640 But there's still a bug here. 417 00:18:45,640 --> 00:18:48,310 Does anyone see the-- bug or question? 418 00:18:48,310 --> 00:18:50,100 AUDIENCE: You forgot to free them. 419 00:18:50,100 --> 00:18:50,790 SPEAKER 1: I'm sorry, say again. 
420 00:18:50,790 --> 00:18:52,470 AUDIENCE: You forgot to free them. 421 00:18:52,470 --> 00:18:54,570 SPEAKER 1: I forgot to free the original list. 422 00:18:54,570 --> 00:18:58,170 And we could see this, even if not just with our own eyes or intuition. 423 00:18:58,170 --> 00:19:00,847 If I do something like Valgrind of dot/list, 424 00:19:00,847 --> 00:19:02,430 remember our tool from this past week. 425 00:19:02,430 --> 00:19:05,310 Let me increase the size of my terminal window, temporarily. 426 00:19:05,310 --> 00:19:07,540 The output is crazy cryptic at first. 427 00:19:07,540 --> 00:19:12,780 But, notice that I have definitely lost some number of bytes here. 428 00:19:12,780 --> 00:19:15,150 And indeed, it's even pointing at the line number 429 00:19:15,150 --> 00:19:16,930 in which some of those bytes were lost. 430 00:19:16,930 --> 00:19:18,930 So let me go ahead and back to my code. 431 00:19:18,930 --> 00:19:23,610 And indeed, I think what I need to do is, before I clobber the value of list 432 00:19:23,610 --> 00:19:27,150 pointing it at this new chunk of memory instead of the old, 433 00:19:27,150 --> 00:19:29,910 I think I now need to first, proactively, 434 00:19:29,910 --> 00:19:32,460 say free the old list of memory. 435 00:19:32,460 --> 00:19:34,480 And then, change its value. 436 00:19:34,480 --> 00:19:39,250 So if I now do Make List and do dot /list, the output is still the same. 437 00:19:39,250 --> 00:19:42,450 And, if I cross my fingers and run Valgrind again 438 00:19:42,450 --> 00:19:46,440 after increasing my window size, hopefully here. 439 00:19:46,440 --> 00:19:48,160 Oh, still a bug. 440 00:19:48,160 --> 00:19:49,080 So better. 441 00:19:49,080 --> 00:19:52,020 It seems like less memory is lost. 442 00:19:52,020 --> 00:19:54,450 What have I now forgotten to do? 443 00:19:54,450 --> 00:19:56,430 AUDIENCE: You forgot to free the end. 444 00:19:56,430 --> 00:19:58,740 SPEAKER 1: I forgot to free it at the very end, too. 
445 00:19:58,740 --> 00:20:01,560 Because I still have a chunk of memory that I got from malloc. 446 00:20:01,560 --> 00:20:04,200 So let me go to the very bottom of the program now. 447 00:20:04,200 --> 00:20:09,330 And after I'm done senselessly just printing this thing out, 448 00:20:09,330 --> 00:20:12,450 let me free the new list. 449 00:20:12,450 --> 00:20:15,780 And now let me do Make List, dot/list. 450 00:20:15,780 --> 00:20:17,670 It still works, visually. 451 00:20:17,670 --> 00:20:22,200 Now let's do Valgrind of dot/list, Enter. 452 00:20:22,200 --> 00:20:25,530 And now, hopefully, all heap blocks were freed. 453 00:20:25,530 --> 00:20:27,018 No leaks are possible. 454 00:20:27,018 --> 00:20:30,060 So this is perhaps the best output you can see from a tool like Valgrind. 455 00:20:30,060 --> 00:20:32,950 I used the heap, but I freed all the memory as well. 456 00:20:32,950 --> 00:20:34,630 So there were 2 fixes needed there. 457 00:20:34,630 --> 00:20:35,130 All right. 458 00:20:35,130 --> 00:20:38,910 Any questions then on this array-based approach, the first of which 459 00:20:38,910 --> 00:20:41,530 is statically allocating an array, so to speak. 460 00:20:41,530 --> 00:20:43,230 By just hard coding the number 3. 461 00:20:43,230 --> 00:20:47,190 The second version now is dynamically allocating the array, 462 00:20:47,190 --> 00:20:49,380 using not the stack but the heap. 463 00:20:49,380 --> 00:20:52,800 But it, too, suffers from the slowness we described earlier, 464 00:20:52,800 --> 00:20:55,290 of having to copy all those values from one to the other. 465 00:20:55,290 --> 00:20:55,790 OK. 466 00:20:55,790 --> 00:20:57,183 A hand was over here. 467 00:20:57,183 --> 00:20:59,858 AUDIENCE: Why do you not have to free the TMP? 468 00:20:59,858 --> 00:21:00,900 SPEAKER 1: Good question. 469 00:21:00,900 --> 00:21:02,820 Why did I not have to free the TMP? 470 00:21:02,820 --> 00:21:05,130 I essentially did eventually.
471 00:21:05,130 --> 00:21:10,360 Because TMP was pointing at the chunk of 4 integers. 472 00:21:10,360 --> 00:21:15,810 But on line 33 here, I assigned list to be 473 00:21:15,810 --> 00:21:18,580 identical to what TMP was pointing at. 474 00:21:18,580 --> 00:21:23,173 And so, when I finally freed the list, that was the same thing as freeing TMP. 475 00:21:23,173 --> 00:21:26,340 In fact, if I wanted to, I could say free TMP here and it would be the same. 476 00:21:26,340 --> 00:21:28,080 But conceptually, it's wrong. 477 00:21:28,080 --> 00:21:32,130 Because at this point in the story, I should be freeing the actual list, not 478 00:21:32,130 --> 00:21:33,240 that temporary variable. 479 00:21:33,240 --> 00:21:35,340 But they were the same at that point in the story. 480 00:21:35,340 --> 00:21:35,840 Yeah. 481 00:21:35,840 --> 00:21:37,878 AUDIENCE: Is [? the line ?] part of it? 482 00:21:37,878 --> 00:21:38,920 SPEAKER 1: Good question. 483 00:21:38,920 --> 00:21:41,350 And long story short, everything we're doing thus far 484 00:21:41,350 --> 00:21:42,820 is still in the world of arrays. 485 00:21:42,820 --> 00:21:44,710 The only distinction we're making is that 486 00:21:44,710 --> 00:21:51,220 in version 1, when I said int list [3], that was an array of fixed size. 487 00:21:51,220 --> 00:21:55,150 So-called statically allocated on the stack, as per last week. 488 00:21:55,150 --> 00:21:58,900 This version now is still dealing with arrays, but I'm flexing my muscles 489 00:21:58,900 --> 00:22:00,980 and using dynamic memory allocation. 490 00:22:00,980 --> 00:22:03,498 So that I can still use an array per the first pictures 491 00:22:03,498 --> 00:22:04,540 we started talking about. 492 00:22:04,540 --> 00:22:07,070 But I can at least grow the array if I want. 493 00:22:07,070 --> 00:22:10,990 So we haven't even now solved this, even better in a sense, with linked lists. 494 00:22:10,990 --> 00:22:12,080 That's going to come next. 
495 00:22:12,080 --> 00:22:12,580 Yeah. 496 00:22:12,580 --> 00:22:16,930 AUDIENCE: How are you able to free list and then still make list? 497 00:22:16,930 --> 00:22:19,720 SPEAKER 1: How am I able to free list? 498 00:22:19,720 --> 00:22:24,310 I freed the original address of list. 499 00:22:24,310 --> 00:22:27,220 I, then, changed what list is storing. 500 00:22:27,220 --> 00:22:30,070 I'm moving its arrow to a new chunk of memory. 501 00:22:30,070 --> 00:22:33,550 And that is perfectly reasonable for me to now manipulate 502 00:22:33,550 --> 00:22:37,180 because now list is pointing at the same value of TMP. 503 00:22:37,180 --> 00:22:42,610 And TMP is what was given the return value of malloc, the second time. 504 00:22:42,610 --> 00:22:44,780 So that chunk of memory is valid. 505 00:22:44,780 --> 00:22:48,220 So these are just squares on the board, right. 506 00:22:48,220 --> 00:22:49,970 There's just pointers inside of them. 507 00:22:49,970 --> 00:22:51,887 So what I'm technically saying is, and I'm not 508 00:22:51,887 --> 00:22:54,040 pointing I'm not freeing list per se, I am 509 00:22:54,040 --> 00:22:58,660 freeing the chunk of memory that begins at the address currently in list. 510 00:22:58,660 --> 00:23:04,060 Therefore, if a few lines later, I change what the address is in list. 511 00:23:04,060 --> 00:23:08,080 Totally reasonable to then touch that memory, and eventually free it later. 512 00:23:08,080 --> 00:23:10,390 Because you're not freeing the variable per se, 513 00:23:10,390 --> 00:23:12,790 you're freeing the address in the variable. 514 00:23:12,790 --> 00:23:13,630 Good distinction. 515 00:23:13,630 --> 00:23:14,140 All right. 516 00:23:14,140 --> 00:23:19,750 So let me back up here and now make one final edit. 517 00:23:19,750 --> 00:23:24,190 So let's finish this with one final improvement here. 
518 00:23:24,190 --> 00:23:27,160 Because it turns out, there's a somewhat better way 519 00:23:27,160 --> 00:23:30,610 to actually resize an array as we've been doing here. 520 00:23:30,610 --> 00:23:35,028 And there's another function in stdlib that's called realloc, for re-allocate. 521 00:23:35,028 --> 00:23:37,570 And I'm just going to go in and make a little bit of a change 522 00:23:37,570 --> 00:23:40,578 here so that I can do the following. 523 00:23:40,578 --> 00:23:42,370 Let me go ahead and first comment this now, 524 00:23:42,370 --> 00:23:45,320 just so we can keep track of what's been going on this whole time. 525 00:23:45,320 --> 00:23:51,970 So dynamically allocate an array of size 3. 526 00:23:51,970 --> 00:23:56,650 Assign 3 numbers to that array. 527 00:23:56,650 --> 00:23:58,330 Time passes. 528 00:23:58,330 --> 00:24:03,640 Allocate new array of size 4. 529 00:24:03,640 --> 00:24:09,460 Copy numbers from old array into new array. 530 00:24:09,460 --> 00:24:14,170 And add fourth number to new array. 531 00:24:14,170 --> 00:24:15,895 Free old array. 532 00:24:18,850 --> 00:24:24,460 Remember, if you will, new array using my same list variable. 533 00:24:24,460 --> 00:24:28,960 And now, print new array. 534 00:24:28,960 --> 00:24:31,270 Free new array. 535 00:24:31,270 --> 00:24:32,260 Hopefully, that helps. 536 00:24:32,260 --> 00:24:35,530 And we'll post this code online after, too, which tells a more explicit story. 537 00:24:35,530 --> 00:24:39,220 So it turns out that we can reduce some of the labor involved with this. 538 00:24:39,220 --> 00:24:41,980 Not so much with the printing here, but with this copying. 539 00:24:41,980 --> 00:24:44,260 Turns out C does have a function called realloc, 540 00:24:44,260 --> 00:24:49,580 that can actually handle the resizing of an array for you, as follows. 541 00:24:49,580 --> 00:24:51,700 I'm going to scroll up to where I previously 542 00:24:51,700 --> 00:24:54,820 allocated a new array of size 4.
543 00:24:54,820 --> 00:25:02,020 And I'm instead going to say this, resize old array to be of size 4. 544 00:25:02,020 --> 00:25:04,477 Now, previously this wasn't necessarily possible. 545 00:25:04,477 --> 00:25:06,310 Because recall that we had painted ourselves 546 00:25:06,310 --> 00:25:08,143 into a corner with the example on the screen 547 00:25:08,143 --> 00:25:10,990 where "Hello, world" happened to be right after the original array. 548 00:25:10,990 --> 00:25:12,410 But let me do this. 549 00:25:12,410 --> 00:25:15,340 Let me use realloc, for re-allocate. 550 00:25:15,340 --> 00:25:18,640 And pass in not just the size of memory we want this time, 551 00:25:18,640 --> 00:25:22,330 but also the address that we want to resize. 552 00:25:22,330 --> 00:25:25,940 Which, again, is this array called list. 553 00:25:25,940 --> 00:25:26,440 All right. 554 00:25:26,440 --> 00:25:29,330 The code thereafter is pretty much the same. 555 00:25:29,330 --> 00:25:33,200 But what I don't need to do is this. 556 00:25:33,200 --> 00:25:36,520 So realloc is a pretty handy function that will do the following. 557 00:25:36,520 --> 00:25:39,670 If at the very beginning of class, when we had 1, 2, 3 on the board. 558 00:25:39,670 --> 00:25:43,010 And someone's instinct was to just plop the 4 right at the end of the list. 559 00:25:43,010 --> 00:25:45,760 If there's available memory, realloc will just do that. 560 00:25:45,760 --> 00:25:50,200 And boom, it will just grow the array for you in the computer's memory. 561 00:25:50,200 --> 00:25:54,160 If, though, it realizes, sorry, there's already a string like "Hello, world" 562 00:25:54,160 --> 00:25:57,040 or something else there, realloc will handle 563 00:25:57,040 --> 00:26:00,730 the trouble of moving that whole array from 1 chunk of memory, 564 00:26:00,730 --> 00:26:03,010 originally, to a new chunk of memory. 565 00:26:03,010 --> 00:26:09,400 And then realloc will return to you, the address of that new chunk of memory. 
566 00:26:09,400 --> 00:26:13,550 And it will handle the process of freeing the old chunk for you. 567 00:26:13,550 --> 00:26:15,800 So you do not need to do this yourself. 568 00:26:15,800 --> 00:26:19,130 So in fact, let me go ahead and get rid of this as well. 569 00:26:19,130 --> 00:26:24,100 So realloc just condenses, a lot of what we just did, into a single function. 570 00:26:24,100 --> 00:26:28,110 Whereby, realloc handles it for you. 571 00:26:28,110 --> 00:26:28,610 All right. 572 00:26:28,610 --> 00:26:31,670 So that's the final improvement on this array-based approach. 573 00:26:31,670 --> 00:26:34,450 So what now, knowing what your memory is, 574 00:26:34,450 --> 00:26:37,400 what can we now do with it that solves that kind of problem? 575 00:26:37,400 --> 00:26:39,320 Because the world is going to get really slow. 576 00:26:39,320 --> 00:26:42,320 And our apps, and our phones, and our computers are getting really slow, 577 00:26:42,320 --> 00:26:46,550 if we're just constantly wasting time moving things around in memory. 578 00:26:46,550 --> 00:26:48,410 What could we perhaps do instead? 579 00:26:48,410 --> 00:26:50,480 Well there's one new piece of syntax today 580 00:26:50,480 --> 00:26:53,840 that builds on these 3 pieces of syntax from the past. 581 00:26:53,840 --> 00:26:55,700 Recall, that we've looked at struct, which 582 00:26:55,700 --> 00:26:58,820 is a keyword in C, that just lets you invent your own structure. 583 00:26:58,820 --> 00:27:02,060 Your own variable, if you will, in conjunction with typedef. 584 00:27:02,060 --> 00:27:06,200 Which lets you say a person has a name and a number, or something like that. 585 00:27:06,200 --> 00:27:08,660 Or a candidate has a name and some number of votes. 586 00:27:08,660 --> 00:27:13,040 You can encapsulate multiple pieces of data inside of just one using struct. 587 00:27:13,040 --> 00:27:17,160 What did we use the Dot Notation for now, a couple of times? 
588 00:27:17,160 --> 00:27:20,468 What does the Dot operator do in C? 589 00:27:20,468 --> 00:27:21,760 AUDIENCE: Access the structure. 590 00:27:21,760 --> 00:27:22,150 SPEAKER 1: Perfect. 591 00:27:22,150 --> 00:27:24,200 To access the field inside of a structure. 592 00:27:24,200 --> 00:27:26,325 So if you've got a person with a name and a number, 593 00:27:26,325 --> 00:27:29,350 you could say something like person.name or person.number, 594 00:27:29,350 --> 00:27:31,510 if person is the name of one such variable. 595 00:27:31,510 --> 00:27:33,850 Star, of course, we've seen now in a few ways. 596 00:27:33,850 --> 00:27:37,540 Like way back in week 1, we saw it as like, multiplication. 597 00:27:37,540 --> 00:27:40,750 Last week, we began to see it in the context of pointers, 598 00:27:40,750 --> 00:27:42,970 whereby, you use it to declare a pointer. 599 00:27:42,970 --> 00:27:45,560 Like, int* p, or something like that. 600 00:27:45,560 --> 00:27:48,040 But we also saw it in one other context, which 601 00:27:48,040 --> 00:27:51,380 was like the opposite, which was the dereference operator. 602 00:27:51,380 --> 00:27:53,272 Which says if this is an address, that is 603 00:27:53,272 --> 00:27:56,230 if this is a variable like a pointer, and you put a star in front of it 604 00:27:56,230 --> 00:27:59,980 then with no int or no char, no data type in front of it. 605 00:27:59,980 --> 00:28:01,870 That means go to that address. 606 00:28:01,870 --> 00:28:05,300 And it dereferences the pointer and goes to that location. 607 00:28:05,300 --> 00:28:07,720 So it turns out that using these 3 building blocks, 608 00:28:07,720 --> 00:28:10,760 you can actually start to now use your computer's memory almost any way 609 00:28:10,760 --> 00:28:11,260 you want. 610 00:28:11,260 --> 00:28:13,720 And even next week, when we transition to Python, 611 00:28:13,720 --> 00:28:16,360 and you start to get a lot of features for free. 
612 00:28:16,360 --> 00:28:18,550 Like a single line of code will just do so much 613 00:28:18,550 --> 00:28:23,170 more in Python than it does in C. It boils down to those basic primitives. 614 00:28:23,170 --> 00:28:25,060 And just so you've seen it already. 615 00:28:25,060 --> 00:28:29,770 It turns out that it's so common in C to use this operator 616 00:28:29,770 --> 00:28:33,790 to go inside of a structure and this operator to go to an address, 617 00:28:33,790 --> 00:28:36,250 that there's shorthand notation for it, a.k.a. 618 00:28:36,250 --> 00:28:37,450 syntactic sugar. 619 00:28:37,450 --> 00:28:39,095 That literally looks like an arrow. 620 00:28:39,095 --> 00:28:41,470 So recall last week, I was in the habit of pointing, even 621 00:28:41,470 --> 00:28:42,670 with the big foam finger. 622 00:28:42,670 --> 00:28:47,020 This arrow notation, a hyphen and an angled bracket, 623 00:28:47,020 --> 00:28:53,950 denotes going to an address and looking at a field inside of it. 624 00:28:53,950 --> 00:28:56,240 But we'll see this in practice in just a bit. 625 00:28:56,240 --> 00:28:59,110 So what might be the solution, now, to this problem 626 00:28:59,110 --> 00:29:02,620 we saw a moment ago whereby, we had painted ourselves into a corner. 627 00:29:02,620 --> 00:29:05,900 And our memory, a few moments ago, looked like this. 628 00:29:05,900 --> 00:29:10,720 We could just copy the whole existing array to a new location, add the 4, 629 00:29:10,720 --> 00:29:12,010 and go about our business. 630 00:29:12,010 --> 00:29:15,850 What would another, perhaps better solution longer term 631 00:29:15,850 --> 00:29:21,145 be, that doesn't require constantly moving stuff around? 632 00:29:21,145 --> 00:29:23,020 Maybe hang in there for your instincts if you 633 00:29:23,020 --> 00:29:27,200 know the buzz phrase we're looking for from past experience, hang in there. 
634 00:29:27,200 --> 00:29:29,800 But if we want to avoid moving the 1, 2, and the 3, 635 00:29:29,800 --> 00:29:32,500 but we still want to be able to add endless amounts of data. 636 00:29:32,500 --> 00:29:33,980 What could we do? 637 00:29:33,980 --> 00:29:34,480 Yeah. 638 00:29:34,480 --> 00:29:37,390 So maybe create some kind of list using pointers that 639 00:29:37,390 --> 00:29:39,370 just point at a new location, right. 640 00:29:39,370 --> 00:29:42,490 In an ideal world, even though this piece of memory 641 00:29:42,490 --> 00:29:45,430 is being used by this h in the string "Hello, world", 642 00:29:45,430 --> 00:29:47,980 maybe we could somehow use a pointer from last week. 643 00:29:47,980 --> 00:29:52,330 Like an arrow, that says after the 3, oh I don't know, go down over here 644 00:29:52,330 --> 00:29:54,040 to this location in memory. 645 00:29:54,040 --> 00:29:58,310 And you just stitch together these integers in memory 646 00:29:58,310 --> 00:30:00,340 so that each one leads to the next. 647 00:30:00,340 --> 00:30:03,700 It's not necessarily the case that it's literally back-to-back. 648 00:30:03,700 --> 00:30:05,950 That would have the downside, it would seem, 649 00:30:05,950 --> 00:30:07,510 of costing us a little bit of space. 650 00:30:07,510 --> 00:30:10,120 Like a pointer, which recall, takes up some amount of space. 651 00:30:10,120 --> 00:30:12,400 Typically 8 bytes or 64 bits. 652 00:30:12,400 --> 00:30:16,000 But I don't have to copy potentially a huge amount of data just 653 00:30:16,000 --> 00:30:17,440 to add one more number. 654 00:30:17,440 --> 00:30:19,278 And so these things do have a name. 655 00:30:19,278 --> 00:30:21,070 And indeed, these things are what generally 656 00:30:21,070 --> 00:30:24,820 would be called a linked list. 657 00:30:24,820 --> 00:30:27,340 A linked list captures exactly that intuition 658 00:30:27,340 --> 00:30:29,060 of linking together things in memory. 
659 00:30:29,060 --> 00:30:30,530 So let's take a look at an example. 660 00:30:30,530 --> 00:30:32,322 Here's a computer's memory in the abstract. 661 00:30:32,322 --> 00:30:35,140 Suppose that I'm trying to create an array. 662 00:30:35,140 --> 00:30:38,200 Let's generalize it as a list, now, of numbers. 663 00:30:38,200 --> 00:30:39,880 An array has a very specific meaning. 664 00:30:39,880 --> 00:30:42,610 It's memory that's contiguous, back, to back, to back. 665 00:30:42,610 --> 00:30:46,240 At the end of the day, I as the programmer, just care about the data-- 666 00:30:46,240 --> 00:30:48,340 1, 2, 3, 4, and so forth. 667 00:30:48,340 --> 00:30:52,300 I don't really care how it's stored. 668 00:30:52,300 --> 00:30:54,610 I don't care how it's stored when I'm writing the code, 669 00:30:54,610 --> 00:30:56,443 I just wanted to work at the end of the day. 670 00:30:56,443 --> 00:30:58,570 So suppose that I first insert my number 1. 671 00:30:58,570 --> 00:31:02,110 And, who knows, it ends up, up there at location, 0X123, 672 00:31:02,110 --> 00:31:03,320 for the sake of discussion. 673 00:31:03,320 --> 00:31:03,820 All right. 674 00:31:03,820 --> 00:31:06,070 Maybe there's something already here. 675 00:31:06,070 --> 00:31:08,110 And heck, maybe there's something already here, 676 00:31:08,110 --> 00:31:11,095 but there's plenty of other options for where this thing can go. 677 00:31:11,095 --> 00:31:12,970 And suppose that, for the sake of discussion, 678 00:31:12,970 --> 00:31:14,803 the first available spot for the next number 679 00:31:14,803 --> 00:31:20,612 happens to be over here at location 0X456, for the sake of discussion. 680 00:31:20,612 --> 00:31:22,570 So that's where I'm going to plop the number 2. 681 00:31:22,570 --> 00:31:24,070 And where might the number 3 end up? 682 00:31:24,070 --> 00:31:26,860 Oh I don't know, maybe down over there at 0X789. 
683 00:31:26,860 --> 00:31:31,030 The point being, I don't know what is, or really care about, 684 00:31:31,030 --> 00:31:33,190 everything else that's in the computer's memory. 685 00:31:33,190 --> 00:31:37,240 I just care that there are at least 3 locations available where 686 00:31:37,240 --> 00:31:40,300 I can put my 1, my 2, and my 3. 687 00:31:40,300 --> 00:31:44,020 But the catch is, now that we're not using an array, 688 00:31:44,020 --> 00:31:48,370 we can't just naively assume that you just add 1 to an index and boom, 689 00:31:48,370 --> 00:31:49,510 you're at the next number. 690 00:31:49,510 --> 00:31:52,960 Add 2 to an index, and boom you're at the next, next number. 691 00:31:52,960 --> 00:31:57,370 Now you have to leave these little breadcrumbs, or use the arrow notation, 692 00:31:57,370 --> 00:31:59,680 to lead from one to the other. 693 00:31:59,680 --> 00:32:01,870 And sometimes, it might be close, a few bytes away. 694 00:32:01,870 --> 00:32:05,810 Maybe, it's a whole gigabyte away in an even bigger computer's memory. 695 00:32:05,810 --> 00:32:07,540 So how might I do this? 696 00:32:07,540 --> 00:32:12,770 Like where do these pointers go, as you proposed? 697 00:32:12,770 --> 00:32:13,270 All right. 698 00:32:13,270 --> 00:32:15,340 All I have access to here are bytes. 699 00:32:15,340 --> 00:32:17,410 I've already stored the 1, the 2, and the 3. 700 00:32:17,410 --> 00:32:19,780 So what more should I do? 701 00:32:19,780 --> 00:32:20,480 OK, yeah. 702 00:32:20,480 --> 00:32:23,370 So let me, you put the pointers right next to these numbers. 703 00:32:23,370 --> 00:32:27,410 So let me at least plan ahead, so that when I ask the computer like malloc, 704 00:32:27,410 --> 00:32:30,470 recall from last week, for some memory, I don't just ask it now 705 00:32:30,470 --> 00:32:32,375 for space for just the number. 
706 00:32:32,375 --> 00:32:34,250 Let me start getting into the habit of asking 707 00:32:34,250 --> 00:32:39,350 malloc for enough space for the number and a pointer to another such number. 708 00:32:39,350 --> 00:32:42,060 So it's a little more aggressive of me to ask for more memory. 709 00:32:42,060 --> 00:32:43,340 But I'm planning ahead. 710 00:32:43,340 --> 00:32:45,140 And here is an example of a trade off. 711 00:32:45,140 --> 00:32:48,920 Almost any time in CS, when you start using more space, you can save time. 712 00:32:48,920 --> 00:32:53,180 Or if you try to conserve space, you might have to lose time. 713 00:32:53,180 --> 00:32:54,680 It's being that trade off there. 714 00:32:54,680 --> 00:32:56,910 So how might I solve this? 715 00:32:56,910 --> 00:32:58,460 Well let me abstract this away. 716 00:32:58,460 --> 00:33:01,575 And either next to or below, I'm just drawing it vertically, just 717 00:33:01,575 --> 00:33:02,700 for the sake of discussion. 718 00:33:02,700 --> 00:33:04,670 So the arrows are a bit prettier. 719 00:33:04,670 --> 00:33:07,580 I've asked malloc for now twice as much space, 720 00:33:07,580 --> 00:33:09,590 it would seem, than I previously needed. 721 00:33:09,590 --> 00:33:13,535 But I'm going to use this second chunk of memory to refer to the next number. 722 00:33:13,535 --> 00:33:16,160 And I'm going to use this chunk of memory to refer to the next, 723 00:33:16,160 --> 00:33:17,970 essentially, stitching this thing together. 724 00:33:17,970 --> 00:33:20,030 So what should go in this first box? 725 00:33:20,030 --> 00:33:23,600 Well, I claim the number, 0X456. 726 00:33:23,600 --> 00:33:26,300 And it's written in hex because it represents a memory address. 727 00:33:26,300 --> 00:33:30,320 But this is the equivalent of drawing an arrow from one to the other. 
728 00:33:30,320 --> 00:33:34,070 As a little check here, what should go in this second box 729 00:33:34,070 --> 00:33:37,940 if the goal is to stitch these together in order 1, 2, 3? 730 00:33:37,940 --> 00:33:40,112 Feel free to just shout this out. 731 00:33:40,112 --> 00:33:41,570 AUDIENCE: 0X789. 732 00:33:41,570 --> 00:33:42,990 SPEAKER 1: OK, that worked well. 733 00:33:42,990 --> 00:33:43,915 So 0X789, indeed. 734 00:33:43,915 --> 00:33:46,790 And you can't do that with the hands because I can't count that fast. 735 00:33:46,790 --> 00:33:51,030 So 0X789 should go here because that's like a little breadcrumb to the next. 736 00:33:51,030 --> 00:33:54,290 And then, we don't really have terribly many possibilities here. 737 00:33:54,290 --> 00:33:56,960 This has to have a value, right. 738 00:33:56,960 --> 00:34:01,830 Because at the end of the day, it's got to use its 64 bits in some way. 739 00:34:01,830 --> 00:34:05,170 So what value should go here, if this is the end of this list? 740 00:34:05,170 --> 00:34:06,170 AUDIENCE: 0. 741 00:34:06,170 --> 00:34:08,270 SPEAKER 1: So it could be 0X123. 742 00:34:08,270 --> 00:34:12,050 The implication being that it would be a cyclical list. 743 00:34:12,050 --> 00:34:14,570 Which is OK, but potentially problematic. 744 00:34:14,570 --> 00:34:18,620 If any of you have accidentally lost control over your code space 745 00:34:18,620 --> 00:34:21,680 because you had an infinite loop, this would seem a very easy way 746 00:34:21,680 --> 00:34:26,330 to give yourself the accidental probability of an infinite loop. 747 00:34:26,330 --> 00:34:28,916 What might be simpler than that and ward that off? 748 00:34:28,916 --> 00:34:29,590 AUDIENCE: Null. 749 00:34:29,590 --> 00:34:30,505 SPEAKER 1: Say again? 750 00:34:30,505 --> 00:34:31,130 AUDIENCE: Null. 751 00:34:31,130 --> 00:34:32,840 SPEAKER 1: So just the null character. 752 00:34:32,840 --> 00:34:35,540 Not N-U-L, confusingly, which is at the end of strings. 
753 00:34:35,540 --> 00:34:38,550 But N-U-L-L, as we introduced it last week. 754 00:34:38,550 --> 00:34:40,580 Which is the same as 0x0. 755 00:34:40,580 --> 00:34:43,400 So this is just a special value that programmers decades ago 756 00:34:43,400 --> 00:34:47,510 decided that if you store the address 0, that's not a valid address. 757 00:34:47,510 --> 00:34:50,420 There's never going to be anything useful at 0x0. 758 00:34:50,420 --> 00:34:53,600 Therefore, it's a sentinel value, just a special value, 759 00:34:53,600 --> 00:34:54,800 that indicates that's it. 760 00:34:54,800 --> 00:34:56,870 There's nowhere further to go. 761 00:34:56,870 --> 00:35:00,470 It's OK to come back to your suggestion of making a cyclical list. 762 00:35:00,470 --> 00:35:02,390 But we'd better be smart enough to, maybe, 763 00:35:02,390 --> 00:35:06,380 remember where did the list start so that you can detect cycles. 764 00:35:06,380 --> 00:35:08,940 If you start looping around in this structure, otherwise. 765 00:35:08,940 --> 00:35:09,440 All right. 766 00:35:09,440 --> 00:35:11,640 But these addresses, who really cares at the end of the day 767 00:35:11,640 --> 00:35:12,920 if we abstract this away. 768 00:35:12,920 --> 00:35:14,820 It really just now looks like this. 769 00:35:14,820 --> 00:35:17,778 And indeed, this is how most anyone would draw this on a whiteboard 770 00:35:17,778 --> 00:35:19,070 if having a discussion at work. 771 00:35:19,070 --> 00:35:20,862 Talking about what data structure we should 772 00:35:20,862 --> 00:35:22,790 use to solve some problem in the real world. 773 00:35:22,790 --> 00:35:25,040 We don't care generally about the addresses. 774 00:35:25,040 --> 00:35:27,630 We care that in code we can access them. 775 00:35:27,630 --> 00:35:30,590 But in terms of the concept alone this would be, perhaps, 776 00:35:30,590 --> 00:35:32,239 the right way to think about this. 
777 00:35:32,239 --> 00:35:34,197 All right, let me pause here and see if there's 778 00:35:34,197 --> 00:35:38,420 any questions on this idea of creating a linked list in memory by storing 779 00:35:38,420 --> 00:35:42,540 not just the numbers like 1, 2, 3, but twice as much data. 780 00:35:42,540 --> 00:35:45,110 So that you have little breadcrumbs in the form of pointers 781 00:35:45,110 --> 00:35:48,510 that can lead you from one to the next. 782 00:35:48,510 --> 00:35:50,674 Any questions on these linked lists? 783 00:35:54,130 --> 00:35:54,730 Any questions? 784 00:35:54,730 --> 00:35:55,230 No? 785 00:35:55,230 --> 00:35:55,940 All right. 786 00:35:55,940 --> 00:35:56,440 Oh, yeah. 787 00:35:56,440 --> 00:35:57,431 Over here. 788 00:35:57,431 --> 00:36:02,025 AUDIENCE: So does this take more memory than an array? 789 00:36:02,025 --> 00:36:04,150 SPEAKER 1: This does take more memory than an array 790 00:36:04,150 --> 00:36:06,699 because I now need space for these pointers. 791 00:36:06,699 --> 00:36:10,670 And to be clear, I technically didn't really draw this to scale. 792 00:36:10,670 --> 00:36:13,600 Thus far, in the class, we've generally thought about integers 793 00:36:13,600 --> 00:36:16,510 like, 1, 2 and 3, as being 4 bytes, or 32 bits. 794 00:36:16,510 --> 00:36:19,540 I made the claim last week that on modern computers, pointers 795 00:36:19,540 --> 00:36:22,570 tend to be 8 bytes or 64 bits. 796 00:36:22,570 --> 00:36:25,280 So, technically, this box should actually be a little bigger. 797 00:36:25,280 --> 00:36:26,980 It was just going to look a little stupid in the picture. 798 00:36:26,980 --> 00:36:28,330 So I abstracted it away. 799 00:36:28,330 --> 00:36:31,330 But, indeed, you're using more space as a result. 800 00:36:31,330 --> 00:36:32,787 AUDIENCE: [INAUDIBLE]. 801 00:36:32,787 --> 00:36:34,120 SPEAKER 1: Oh, how does-- sorry. 802 00:36:34,120 --> 00:36:37,970 How does the computer identify useful data from used data?
803 00:36:37,970 --> 00:36:40,780 So, for instance, garbage values or non-garbage values. 804 00:36:40,780 --> 00:36:43,420 For now, think of that as the job of malloc. 805 00:36:43,420 --> 00:36:46,810 So when you ask malloc for memory, as we started to last week, 806 00:36:46,810 --> 00:36:49,990 malloc keeps track of the addresses of the memory 807 00:36:49,990 --> 00:36:52,960 it has handed to you as valid values. 808 00:36:52,960 --> 00:36:55,450 The other type of memory you use, not just from the heap. 809 00:36:55,450 --> 00:36:58,390 Because recall we briefly discussed that malloc uses space 810 00:36:58,390 --> 00:37:01,390 from the heap, which was drawn at the top of the picture, pointing down. 811 00:37:01,390 --> 00:37:05,220 There's also stack memory, which is where all of your local variables go. 812 00:37:05,220 --> 00:37:07,720 And where all of the memory used by individual functions goes. 813 00:37:07,720 --> 00:37:10,053 And that was drawn in the picture as working its way up. 814 00:37:10,053 --> 00:37:12,820 That's just an artist's rendition of direction. 815 00:37:12,820 --> 00:37:16,180 The compiler, essentially, will also help 816 00:37:16,180 --> 00:37:19,868 keep track of which values are valid or not inside of the stack. 817 00:37:19,868 --> 00:37:21,910 Or really the underlying code that you've written 818 00:37:21,910 --> 00:37:23,243 will keep track of that for you. 819 00:37:23,243 --> 00:37:26,210 So it's managed for you at that point. 820 00:37:26,210 --> 00:37:26,710 All right. 821 00:37:26,710 --> 00:37:27,310 Good question. 822 00:37:27,310 --> 00:37:29,040 Sorry it took me a bit to catch on. 823 00:37:29,040 --> 00:37:31,210 So let's now translate this to actual code. 824 00:37:31,210 --> 00:37:34,780 How could we implement this idea of, let's call these things nodes. 825 00:37:34,780 --> 00:37:36,160 And that's a term of art in CS.
826 00:37:36,160 --> 00:37:40,210 Whenever you have some data structure that encapsulates information, node, 827 00:37:40,210 --> 00:37:42,947 N-O-D-E, is the generic term for that. 828 00:37:42,947 --> 00:37:44,780 So each of these might be said to be a node. 829 00:37:44,780 --> 00:37:45,830 Well, how can we do this? 830 00:37:45,830 --> 00:37:48,622 Well a couple of weeks ago, we saw how we could represent something 831 00:37:48,622 --> 00:37:50,260 like a student or a candidate. 832 00:37:50,260 --> 00:37:54,940 And a student, or rather a person, we said has a name and a number. 833 00:37:54,940 --> 00:37:56,680 And we used a few pieces of syntax here. 834 00:37:56,680 --> 00:37:59,890 One, we use the struct keyword, which gives us a data structure. 835 00:37:59,890 --> 00:38:04,420 We use typedef, which defines the name person to be our new data 836 00:38:04,420 --> 00:38:06,850 type representing that whole structure. 837 00:38:06,850 --> 00:38:08,950 So we probably have the right ingredients here 838 00:38:08,950 --> 00:38:11,500 to build up this thing called a node. 839 00:38:11,500 --> 00:38:14,620 And just to be clear, what should go inside of one of these nodes, 840 00:38:14,620 --> 00:38:15,435 do we think? 841 00:38:15,435 --> 00:38:17,560 It's not going to be a name or a number, obviously. 842 00:38:17,560 --> 00:38:22,250 But what should a node have in terms of those fields, perhaps? 843 00:38:22,250 --> 00:38:22,750 Yeah? 844 00:38:22,750 --> 00:38:23,625 AUDIENCE: [? Data. ?] 845 00:38:23,625 --> 00:38:26,600 SPEAKER 1: So a number like a number and a pointer in some form. 846 00:38:26,600 --> 00:38:28,850 So let's translate this to actual code. 847 00:38:28,850 --> 00:38:33,610 So let's rename person to node to capture this notion here. 848 00:38:33,610 --> 00:38:34,865 And the number is easy. 849 00:38:34,865 --> 00:38:36,740 If it's just going to be an int, that's fine. 
850 00:38:36,740 --> 00:38:38,980 We can just say int number, or int n, or whatever 851 00:38:38,980 --> 00:38:41,380 you want to call that particular field. 852 00:38:41,380 --> 00:38:43,072 The next one is a little non-obvious. 853 00:38:43,072 --> 00:38:45,280 And this is where things get a little weird at first, 854 00:38:45,280 --> 00:38:47,830 but, in retrospect, it should all fit together. 855 00:38:47,830 --> 00:38:53,630 Let me propose that, ideally, we would say something like node* next. 856 00:38:53,630 --> 00:38:55,930 And I could call the word next anything I want. 857 00:38:55,930 --> 00:39:00,110 Next just means what comes after me is the notion I'm using it at. 858 00:39:00,110 --> 00:39:02,500 So a lot of CS people would just use next to represent 859 00:39:02,500 --> 00:39:03,880 the name of this pointer. 860 00:39:03,880 --> 00:39:05,260 But there's a catch here. 861 00:39:05,260 --> 00:39:08,440 C and C compilers are pretty naive, recall. 862 00:39:08,440 --> 00:39:11,660 They only look at code top to bottom, left to right. 863 00:39:11,660 --> 00:39:13,840 And any time they encounter a word they have never 864 00:39:13,840 --> 00:39:15,513 seen before, bad things happen. 865 00:39:15,513 --> 00:39:16,930 Like, you can't compile your code. 866 00:39:16,930 --> 00:39:18,920 You get some cryptic error message or the like. 867 00:39:18,920 --> 00:39:21,910 And that seems to be about to happen here. 868 00:39:21,910 --> 00:39:24,970 Because if the compiler is reading this code from top to bottom, 869 00:39:24,970 --> 00:39:27,340 it's going to say, oh, inside of this struct 870 00:39:27,340 --> 00:39:29,140 should be a variable called next. 871 00:39:29,140 --> 00:39:31,000 Which is of type node*. 872 00:39:31,000 --> 00:39:32,200 What the heck is a node? 873 00:39:32,200 --> 00:39:35,470 Because it literally does not find out until 2 lines 874 00:39:35,470 --> 00:39:37,720 later, after that semicolon. 
875 00:39:37,720 --> 00:39:40,330 So the way to avoid this, which we haven't quite seen before, 876 00:39:40,330 --> 00:39:45,220 is that you can temporarily name this whole thing up here, struct node. 877 00:39:45,220 --> 00:39:50,560 And then, down here inside of the data structure, you say struct node*. 878 00:39:50,560 --> 00:39:52,210 And then, you leave the rest alone. 879 00:39:52,210 --> 00:39:56,620 This is a workaround that is possible because now you're 880 00:39:56,620 --> 00:39:59,740 teaching the compiler, from the first line, that here comes 881 00:39:59,740 --> 00:40:01,960 a data structure called struct node. 882 00:40:01,960 --> 00:40:05,420 Down here, you're shortening the name of this whole thing to just node. 883 00:40:05,420 --> 00:40:05,920 Why? 884 00:40:05,920 --> 00:40:09,003 It's just a little more convenient than having to write struct everywhere. 885 00:40:09,003 --> 00:40:12,760 But you do have to write struct node* inside of the data structure. 886 00:40:12,760 --> 00:40:15,730 But that's OK because it's already come into existence 887 00:40:15,730 --> 00:40:17,892 now, as of that first line of code. 888 00:40:17,892 --> 00:40:19,600 So that's the only fundamental difference 889 00:40:19,600 --> 00:40:22,900 between what we did last week with a person or a candidate. 890 00:40:22,900 --> 00:40:27,890 We just now have to use this struct workaround, syntactically. 891 00:40:27,890 --> 00:40:28,390 All right. 892 00:40:28,390 --> 00:40:29,170 Yeah, question. 893 00:40:29,170 --> 00:40:33,010 AUDIENCE: So [INAUDIBLE] have like right next to the [INAUDIBLE] point 894 00:40:33,010 --> 00:40:33,970 to another [INAUDIBLE]. 895 00:40:33,970 --> 00:40:39,070 SPEAKER 1: Why is the next variable a struct node* pointer and not an int* 896 00:40:39,070 --> 00:40:41,150 pointer, for instance? 897 00:40:41,150 --> 00:40:43,870 So think about the picture we are trying to draw.
898 00:40:43,870 --> 00:40:47,740 Technically, yes, each of these arrows I deliberately drew 899 00:40:47,740 --> 00:40:49,240 is pointing at the number. 900 00:40:49,240 --> 00:40:50,500 But that's not alone. 901 00:40:50,500 --> 00:40:53,320 They need to point at the whole data structure in memory. 902 00:40:53,320 --> 00:40:55,600 Because the computer, ultimately, and the compiler, 903 00:40:55,600 --> 00:40:59,470 in turn, needs to know that this chunk of memory is not just an int. 904 00:40:59,470 --> 00:41:01,040 It is a whole node. 905 00:41:01,040 --> 00:41:04,370 Inside of a node is a number and also another pointer. 906 00:41:04,370 --> 00:41:06,770 So when you draw these arrows, it would be 907 00:41:06,770 --> 00:41:09,380 incorrect to point at just the number. 908 00:41:09,380 --> 00:41:11,757 Because that throws away information that 909 00:41:11,757 --> 00:41:14,090 would leave the compiler wondering, OK, I'm at a number. 910 00:41:14,090 --> 00:41:15,200 Where the heck is the pointer? 911 00:41:15,200 --> 00:41:17,450 You have to tell it that it's pointing at a whole node 912 00:41:17,450 --> 00:41:20,857 so it knows a few bytes away is that corresponding pointer. 913 00:41:20,857 --> 00:41:21,440 Good question. 914 00:41:21,440 --> 00:41:23,183 Yeah. 915 00:41:23,183 --> 00:41:24,630 AUDIENCE: How do you [INAUDIBLE]. 916 00:41:24,630 --> 00:41:25,963 SPEAKER 1: Really good question. 917 00:41:25,963 --> 00:41:29,250 It would seem that just as copying the array earlier 918 00:41:29,250 --> 00:41:32,460 required twice as much memory, because we copied from old to new. 919 00:41:32,460 --> 00:41:35,130 So, technically, twice as much plus 1 for the new number. 920 00:41:35,130 --> 00:41:38,520 Here, too, it looks like we're using twice as much memory, also. 
921 00:41:38,520 --> 00:41:41,400 And to my comment earlier, it's even more than twice as much memory 922 00:41:41,400 --> 00:41:45,270 because these pointers are 8 bytes, and not just 4 bytes like a typical integer 923 00:41:45,270 --> 00:41:45,870 is. 924 00:41:45,870 --> 00:41:47,280 The differences are these. 925 00:41:47,280 --> 00:41:50,910 In the context of the array, you were using that memory temporarily. 926 00:41:50,910 --> 00:41:52,750 So, yes, you needed twice as much memory. 927 00:41:52,750 --> 00:41:55,600 But then you were quickly freeing the original array. 928 00:41:55,600 --> 00:41:58,890 So you weren't consuming long-term, more memory than you might need. 929 00:41:58,890 --> 00:42:02,290 The difference here, too, is that, as we'll see in a moment, 930 00:42:02,290 --> 00:42:05,670 it turns out it's going to be relatively quick for me, potentially, 931 00:42:05,670 --> 00:42:07,620 to insert new numbers in here. 932 00:42:07,620 --> 00:42:10,620 Because I'm not going to have to do a huge amount of copying. 933 00:42:10,620 --> 00:42:13,800 And even though I might still have to follow all of these arrows, which 934 00:42:13,800 --> 00:42:16,080 is going to take some amount of time, I'm 935 00:42:16,080 --> 00:42:19,470 not going to have to be asking for more memory, freeing more memory. 936 00:42:19,470 --> 00:42:23,190 And certain operations in the computer, anything involving asking for or giving 937 00:42:23,190 --> 00:42:25,000 back memory, tends to be slower. 938 00:42:25,000 --> 00:42:26,858 So we get to avoid that situation as well. 939 00:42:26,858 --> 00:42:28,650 There's going to be some downsides, though. 940 00:42:28,650 --> 00:42:29,700 This is not all upside. 941 00:42:29,700 --> 00:42:33,760 But we'll see in a bit just what some of those trade offs actually are. 942 00:42:33,760 --> 00:42:34,260 All right. 
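The node described so far can be sketched in C roughly as follows. This is a minimal sketch that assumes the field names number and next used in the lecture; the helper node_fits_both_fields is hypothetical and exists only to make the size claim checkable.

```c
#include <stdbool.h>
#include <stddef.h>

// A node holds one integer plus the address of the next node.
// The struct is given the name "struct node" on its first line so
// that the field "next" can refer to it before the shorter typedef
// name "node" exists at the bottom.
typedef struct node
{
    int number;
    struct node *next;
}
node;

// Hypothetical helper: a node must be at least as large as its two
// fields combined. The compiler may add padding on top of that.
bool node_fits_both_fields(void)
{
    return sizeof(node) >= sizeof(int) + sizeof(struct node *);
}
```

On a typical 64-bit system, where an int is 4 bytes and a pointer is 8 bytes, sizeof(node) commonly comes out larger than 12 because of alignment padding, so the exact figure is best treated as system-dependent.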
943 00:42:34,260 --> 00:42:38,740 So from here, if we go back to the structure in code as we left it, 944 00:42:38,740 --> 00:42:41,820 let's start to now build up a linked list with some actual code. 945 00:42:41,820 --> 00:42:46,200 How do you go about, in C, representing a linked list in code? 946 00:42:46,200 --> 00:42:48,780 Well, at the moment, it would actually be as simple as this. 947 00:42:48,780 --> 00:42:51,930 You declare a variable, called list, for instance. 948 00:42:51,930 --> 00:42:54,970 That itself stores the address of a node. 949 00:42:54,970 --> 00:42:56,010 That's what node* means. 950 00:42:56,010 --> 00:42:57,220 The address of a node. 951 00:42:57,220 --> 00:42:59,880 So if you want to store a linked list in memory, 952 00:42:59,880 --> 00:43:02,397 you just create a variable called list, or whatever else. 953 00:43:02,397 --> 00:43:04,230 And you just say that this variable is going 954 00:43:04,230 --> 00:43:08,430 to be pointing at the first node in a list, wherever it happens to end up. 955 00:43:08,430 --> 00:43:12,270 Because malloc is ultimately going to be the tool that we use just to go 956 00:43:12,270 --> 00:43:16,270 get at any one particular node in memory. 957 00:43:16,270 --> 00:43:16,770 All right. 958 00:43:16,770 --> 00:43:18,690 So let's actually do this in pictorial form. 959 00:43:18,690 --> 00:43:21,690 When you write a line of code, like I just did here-- 960 00:43:21,690 --> 00:43:25,680 and I do not initialize it to anything with the assignment operator, 961 00:43:25,680 --> 00:43:26,730 an equal sign. 962 00:43:26,730 --> 00:43:30,720 It does exist in memory as a box, as I'll draw it here, called list. 963 00:43:30,720 --> 00:43:33,430 But I've deliberately drawn Oscar inside of it. 964 00:43:33,430 --> 00:43:33,930 Why? 965 00:43:33,930 --> 00:43:35,630 To connote what exactly? 966 00:43:35,630 --> 00:43:36,630 AUDIENCE: Garbage value. 967 00:43:36,630 --> 00:43:37,963 SPEAKER 1: It's a garbage value. 
968 00:43:37,963 --> 00:43:42,400 I have been allocated the variable in memory, called list. 969 00:43:42,400 --> 00:43:46,470 Which is going to give me 64 bits or 8 bytes somewhere drawn here 970 00:43:46,470 --> 00:43:47,470 with this box. 971 00:43:47,470 --> 00:43:50,220 But if I myself have not used the assignment operator, 972 00:43:50,220 --> 00:43:53,830 it's not going to get magically initialized to any particular address 973 00:43:53,830 --> 00:43:54,330 for me. 974 00:43:54,330 --> 00:43:56,470 It's not going to even give me a node. 975 00:43:56,470 --> 00:44:01,150 This is literally just going to be an address of a future node that exists. 976 00:44:01,150 --> 00:44:02,760 So what would be a solution here? 977 00:44:02,760 --> 00:44:05,760 Suppose that I'm beginning to create my linked list, 978 00:44:05,760 --> 00:44:07,290 but I don't have any nodes yet. 979 00:44:07,290 --> 00:44:11,302 What would be a sensible thing to initialize the list to, perhaps? 980 00:44:11,302 --> 00:44:12,122 AUDIENCE: Null. 981 00:44:12,122 --> 00:44:13,080 SPEAKER 1: Yeah, again. 982 00:44:13,080 --> 00:44:13,838 AUDIENCE: To null. 983 00:44:13,838 --> 00:44:15,130 SPEAKER 1: So just null, right. 984 00:44:15,130 --> 00:44:16,860 When in doubt with pointers, generally it's 985 00:44:16,860 --> 00:44:18,610 a good thing to initialize things to null, 986 00:44:18,610 --> 00:44:20,160 so at least it's not a garbage value. 987 00:44:20,160 --> 00:44:21,420 It's a known value. 988 00:44:21,420 --> 00:44:22,418 Invalid, yes. 989 00:44:22,418 --> 00:44:24,210 But it's a special value you can then check 990 00:44:24,210 --> 00:44:26,140 for with a conditional, or the like. 991 00:44:26,140 --> 00:44:30,120 So this might be a better way to create a linked list, 992 00:44:30,120 --> 00:44:34,120 even before you've inserted any numbers into the thing itself. 993 00:44:34,120 --> 00:44:34,620 All right. 
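A short sketch of that starting point, assuming the same node type as in the lecture. The helper names empty_list and is_empty are hypothetical and not part of the lecture's code.

```c
#include <stdbool.h>
#include <stddef.h>

typedef struct node
{
    int number;
    struct node *next;
}
node;

// Hypothetical helper: an empty list is just a pointer set to NULL,
// a known value instead of a garbage value.
node *empty_list(void)
{
    return NULL;
}

// Hypothetical helper: because NULL is a known value, a simple
// conditional can check for the empty list.
bool is_empty(const node *list)
{
    return list == NULL;
}
```

In a program this amounts to writing node *list = NULL; and later checking list == NULL before following the pointer.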
994 00:44:34,620 --> 00:44:37,835 So after that, how can we go about adding something to this linked list? 995 00:44:37,835 --> 00:44:39,210 So now the story looks like this. 996 00:44:39,210 --> 00:44:42,150 Oscar is gone because inside of this box is all zero bits. 997 00:44:42,150 --> 00:44:46,050 Just because it's nice and clean, and this represents an empty linked list. 998 00:44:46,050 --> 00:44:50,590 Well, if I want to add the number 1 to this linked list, what could I do? 999 00:44:50,590 --> 00:44:52,590 Well, perhaps I could start with code like this. 1000 00:44:52,590 --> 00:44:54,300 Borrowing inspiration from last week. 1001 00:44:54,300 --> 00:44:58,920 Let's ask malloc for enough space for the size of a node. 1002 00:44:58,920 --> 00:45:03,060 And this gets to your question earlier, like, what is it I'm manipulating here? 1003 00:45:03,060 --> 00:45:06,360 I don't just need space for an int and I don't just need space for a pointer. 1004 00:45:06,360 --> 00:45:07,440 I need space for both. 1005 00:45:07,440 --> 00:45:10,150 And I gave that thing a name, node. 1006 00:45:10,150 --> 00:45:12,930 So sizeof(node) figures out and does the arithmetic for me. 1007 00:45:12,930 --> 00:45:15,390 And gives me back the right number of bytes. 1008 00:45:15,390 --> 00:45:18,930 This, then, stores the address of that chunk of memory 1009 00:45:18,930 --> 00:45:20,880 in what I'll temporarily call n. 1010 00:45:20,880 --> 00:45:23,160 Just to represent a generic new node. 1011 00:45:23,160 --> 00:45:24,870 And it's of type node*. 1012 00:45:24,870 --> 00:45:28,080 Because just like last week when I asked malloc for enough space for an int 1013 00:45:28,080 --> 00:45:30,360 and I stored it in an int* pointer. 1014 00:45:30,360 --> 00:45:32,760 This week, if I'm asking for memory for a node, 1015 00:45:32,760 --> 00:45:35,340 I'm storing it in a node* pointer.
1016 00:45:35,340 --> 00:45:38,520 So technically, nothing new there except for this new term 1017 00:45:38,520 --> 00:45:41,020 of art in data structure called node. 1018 00:45:41,020 --> 00:45:41,520 All right. 1019 00:45:41,520 --> 00:45:42,870 So what does that do for me? 1020 00:45:42,870 --> 00:45:45,660 It essentially draws a picture like this in memory. 1021 00:45:45,660 --> 00:45:49,690 I still have my list variable from my previous line of code initialized 1022 00:45:49,690 --> 00:45:50,190 to null. 1023 00:45:50,190 --> 00:45:51,648 And that's why I've drawn it blank. 1024 00:45:51,648 --> 00:45:54,060 I also now have a temporary variable called 1025 00:45:54,060 --> 00:45:57,570 n, which I initialize to the return value of malloc. 1026 00:45:57,570 --> 00:45:59,650 Which gave me one of these nodes in memory. 1027 00:45:59,650 --> 00:46:02,130 But I've drawn it having garbage values, too, 1028 00:46:02,130 --> 00:46:03,850 because I don't know what int is there. 1029 00:46:03,850 --> 00:46:05,308 I don't know what pointer is there. 1030 00:46:05,308 --> 00:46:09,600 It's garbage values because malloc does not magically initialize memory for me. 1031 00:46:09,600 --> 00:46:11,250 There is another function for that. 1032 00:46:11,250 --> 00:46:14,100 But malloc alone just says, sure, use this chunk of memory. 1033 00:46:14,100 --> 00:46:15,910 Deal with whatever is there. 1034 00:46:15,910 --> 00:46:18,900 So how can I go about initializing this to known values? 1035 00:46:18,900 --> 00:46:23,440 Well, suppose I want to insert the number 1 and then, leave it at that. 1036 00:46:23,440 --> 00:46:27,212 A list of size 1, I could do something like this. 1037 00:46:27,212 --> 00:46:29,920 And this is where you have to think back to some of these basics. 1038 00:46:29,920 --> 00:46:34,060 My conditional here is asking the question if n does not equal null.
1039 00:46:34,060 --> 00:46:37,210 So that is, if malloc gave me valid memory, 1040 00:46:37,210 --> 00:46:40,690 and I don't have to quit altogether because my computer's out of memory. 1041 00:46:40,690 --> 00:46:44,590 If n does not equal null, but is equal to a valid address, 1042 00:46:44,590 --> 00:46:46,070 I'm going to go ahead and do this. 1043 00:46:46,070 --> 00:46:48,820 And this is cryptic looking syntax now. 1044 00:46:48,820 --> 00:46:52,150 But does someone want to take a stab at translating this inside line of code 1045 00:46:52,150 --> 00:46:56,380 to English, in some sense? 1046 00:46:56,380 --> 00:47:00,520 How might you explain what that inner line of code is doing? (*n).number 1047 00:47:00,520 --> 00:47:03,130 equals 1. 1048 00:47:03,130 --> 00:47:05,355 Let me go further back. 1049 00:47:05,355 --> 00:47:06,477 Nope? 1050 00:47:06,477 --> 00:47:07,060 OK, over here. 1051 00:47:07,060 --> 00:47:07,772 Yeah. 1052 00:47:07,772 --> 00:47:09,010 AUDIENCE: [INAUDIBLE]. 1053 00:47:09,010 --> 00:47:09,802 SPEAKER 1: Perfect. 1054 00:47:09,802 --> 00:47:12,160 The place that n is pointing to, set it equal to 1. 1055 00:47:12,160 --> 00:47:16,060 Or using the vernacular of going there, go to the address in n 1056 00:47:16,060 --> 00:47:18,480 and set its number field to 1. 1057 00:47:18,480 --> 00:47:20,480 However you want to think about it, that's fine. 1058 00:47:20,480 --> 00:47:22,930 But the * again is the dereference operator here. 1059 00:47:22,930 --> 00:47:24,730 And we're doing the parentheses, which we 1060 00:47:24,730 --> 00:47:28,240 haven't needed to do before because we haven't dealt with pointers and data 1061 00:47:28,240 --> 00:47:30,010 structures together until today. 1062 00:47:30,010 --> 00:47:32,380 This just means go there first. 1063 00:47:32,380 --> 00:47:34,720 And then once you're there, go access number. 1064 00:47:34,720 --> 00:47:36,830 You don't want to do one thing before the other.
1065 00:47:36,830 --> 00:47:38,890 So this is just enforcing order of operations. 1066 00:47:38,890 --> 00:47:41,300 The parentheses just like in grade school math. 1067 00:47:41,300 --> 00:47:41,800 All right. 1068 00:47:41,800 --> 00:47:43,210 So this line of code is cryptic. 1069 00:47:43,210 --> 00:47:43,982 It's ugly. 1070 00:47:43,982 --> 00:47:45,940 It's not something most people easily remember. 1071 00:47:45,940 --> 00:47:49,750 Thankfully, there's that syntactic sugar that simplifies this line of code 1072 00:47:49,750 --> 00:47:50,857 to just this. 1073 00:47:50,857 --> 00:47:52,690 And this, even though it's new to you today, 1074 00:47:52,690 --> 00:47:54,820 should eventually feel a little more familiar. 1075 00:47:54,820 --> 00:47:58,210 Because this now is shorthand notation for saying, start at n. 1076 00:47:58,210 --> 00:48:00,410 Go there as by following the arrow. 1077 00:48:00,410 --> 00:48:02,530 And when you get there, change the number field. 1078 00:48:02,530 --> 00:48:04,720 In this case, to 1. 1079 00:48:04,720 --> 00:48:07,240 So most people would not write code like this. 1080 00:48:07,240 --> 00:48:08,030 It's just ugly. 1081 00:48:08,030 --> 00:48:09,430 It's a couple extra keystrokes. 1082 00:48:09,430 --> 00:48:13,300 This just looks more like the artist's renditions we've been talking about. 1083 00:48:13,300 --> 00:48:17,530 And how most CS people would think about pointers as really just being arrows 1084 00:48:17,530 --> 00:48:18,710 in some form. 1085 00:48:18,710 --> 00:48:19,210 All right. 1086 00:48:19,210 --> 00:48:20,293 So what have we just done? 1087 00:48:20,293 --> 00:48:24,650 The picture now, after setting number to 1, looks a little something like this. 1088 00:48:24,650 --> 00:48:26,440 So there's still one step missing. 1089 00:48:26,440 --> 00:48:28,720 And that's, of course, to initialize, it would seem, 1090 00:48:28,720 --> 00:48:33,080 the pointer in this new node to something known like null. 
1091 00:48:33,080 --> 00:48:34,735 So I bet we could do this like this. 1092 00:48:34,735 --> 00:48:36,610 With a different line of code, I'm just going 1093 00:48:36,610 --> 00:48:42,880 to say if n does not equal null, then set n's next field to null. 1094 00:48:42,880 --> 00:48:46,540 Or more pedantically, go to n, follow the arrow, 1095 00:48:46,540 --> 00:48:50,440 and then update the next field that you find there to equal null. 1096 00:48:50,440 --> 00:48:52,690 And again, this is just doing some nice bookkeeping. 1097 00:48:52,690 --> 00:48:55,870 Technically speaking, we might not need to set 1098 00:48:55,870 --> 00:48:58,910 this to null if we're going to keep adding more and more numbers to it. 1099 00:48:58,910 --> 00:49:02,110 But I'm doing it step-by-step so that I have a very clean picture. 1100 00:49:02,110 --> 00:49:05,800 And there's no bugs in my code at this point. 1101 00:49:05,800 --> 00:49:07,270 But I'm still not done. 1102 00:49:07,270 --> 00:49:09,730 There's one last thing I'm going to have to do here. 1103 00:49:09,730 --> 00:49:14,950 If the goal, ultimately, was to insert the number 1 into my linked list, 1104 00:49:14,950 --> 00:49:18,860 what's the last step I should, perhaps, do here? 1105 00:49:18,860 --> 00:49:20,050 Just in English is fine. 1106 00:49:20,050 --> 00:49:20,550 Yeah. 1107 00:49:20,550 --> 00:49:23,260 AUDIENCE: Set the pointer value to null. 1108 00:49:23,260 --> 00:49:24,010 SPEAKER 1: Yes. 1109 00:49:24,010 --> 00:49:27,970 I now need to update the actual variable, that represents my linked 1110 00:49:27,970 --> 00:49:31,030 list, to point at this brand new node. 1111 00:49:31,030 --> 00:49:35,317 That is now perfectly initialized as having an integer and a null pointer. 1112 00:49:35,317 --> 00:49:37,400 Yeah, technically, this is already pointing there. 1113 00:49:37,400 --> 00:49:40,090 But I described this deliberately earlier as being temporary.
1114 00:49:40,090 --> 00:49:44,620 I just needed this to get it back from malloc and clean things up, initially. 1115 00:49:44,620 --> 00:49:47,230 This is the long term variable I care about. 1116 00:49:47,230 --> 00:49:49,480 So I'm going to want to do something simple like this. 1117 00:49:49,480 --> 00:49:51,520 List equals n. 1118 00:49:51,520 --> 00:49:53,863 And this seems a little weird that list equals n. 1119 00:49:53,863 --> 00:49:55,780 But again, think about what's inside this box. 1120 00:49:55,780 --> 00:49:57,988 At the moment this is null because there is no linked 1121 00:49:57,988 --> 00:49:59,530 list at the beginning of our story. 1122 00:49:59,530 --> 00:50:03,910 N is the address of the beginning, and it turns out, end of our linked list. 1123 00:50:03,910 --> 00:50:07,300 So it stands to reason that if you set list equal to n, 1124 00:50:07,300 --> 00:50:10,180 that has the effect of copying this address up here. 1125 00:50:10,180 --> 00:50:13,283 Or really just copying the arrow into that same location 1126 00:50:13,283 --> 00:50:14,950 so that now the picture looks like this. 1127 00:50:14,950 --> 00:50:18,340 And heck, if this was a temporary variable, it will eventually go away. 1128 00:50:18,340 --> 00:50:19,870 And now, this is the picture. 1129 00:50:19,870 --> 00:50:22,030 So an annoying number of steps, certainly, 1130 00:50:22,030 --> 00:50:24,520 to walk through verbally like this. 1131 00:50:24,520 --> 00:50:26,680 But it's just malloc to give yourself a node, 1132 00:50:26,680 --> 00:50:31,930 initialize the 2 fields inside of it, update the linked list, and boom, 1133 00:50:31,930 --> 00:50:32,770 you're on your way. 1134 00:50:32,770 --> 00:50:34,910 I didn't have to copy anything. 1135 00:50:34,910 --> 00:50:38,132 I just had to insert something in this case. 1136 00:50:38,132 --> 00:50:40,840 Let me pause here to see if there's any questions on those steps. 
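The steps just walked through (malloc a node, initialize its two fields, then point the list at it) can be sketched in C as below. This is a sketch under assumptions: the helper name make_node is hypothetical, and the lecture writes these lines directly rather than in a function. It uses the long form (*n).number once and the arrow shorthand once, to show that they mean the same thing.

```c
#include <stdlib.h>

typedef struct node
{
    int number;
    struct node *next;
}
node;

// Hypothetical helper: allocates one node and initializes both of
// its fields. Returns the node's address, or NULL if malloc fails.
node *make_node(int number)
{
    // Ask for enough bytes to hold an int and a pointer together
    node *n = malloc(sizeof(node));

    // Only touch the memory if malloc actually returned an address
    if (n != NULL)
    {
        (*n).number = number; // long form: go to n first, then access number
        n->next = NULL;       // arrow shorthand for (*n).next = NULL
    }
    return n;
}
```

A caller would then write node *list = NULL; followed by node *n = make_node(1); and, if n is not NULL, list = n; which copies the address into the long-term variable.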
1137 00:50:40,840 --> 00:50:44,790 And we'll see before long it all in context with some larger code. 1138 00:50:44,790 --> 00:50:48,965 AUDIENCE: So if the statements [INAUDIBLE].. 1139 00:50:48,965 --> 00:50:49,590 SPEAKER 1: Yes. 1140 00:50:49,590 --> 00:50:53,010 I drew them separately just for the sake of the voiceover 1141 00:50:53,010 --> 00:50:55,020 of doing each thing very methodically. 1142 00:50:55,020 --> 00:50:57,090 In real code, as we'll transition to now, 1143 00:50:57,090 --> 00:50:59,220 I could have and should have just done it 1144 00:50:59,220 --> 00:51:03,000 all inside of one conditional after checking if n is not equal to null. 1145 00:51:03,000 --> 00:51:05,310 I could set number to a value like 1. 1146 00:51:05,310 --> 00:51:08,415 And I could set the pointer itself to something like null. 1147 00:51:08,415 --> 00:51:09,030 All right. 1148 00:51:09,030 --> 00:51:12,600 Well let's translate, then, this into some similar code 1149 00:51:12,600 --> 00:51:17,340 that allows us to build up a linked list now using code similar in spirit 1150 00:51:17,340 --> 00:51:18,150 to before. 1151 00:51:18,150 --> 00:51:19,900 But now, using this new primitive. 1152 00:51:19,900 --> 00:51:22,140 So I'm going to go back into VS Code here. 1153 00:51:22,140 --> 00:51:25,470 I'm going to go ahead now and delete the entirety of this old version that 1154 00:51:25,470 --> 00:51:27,270 was entirely array-based. 1155 00:51:27,270 --> 00:51:32,470 And now, inside of my main function, I'm going to go ahead and first do this. 1156 00:51:32,470 --> 00:51:36,180 I'm going to first give myself a list of size 0. 1157 00:51:36,180 --> 00:51:38,610 And I'm going to call that node* list. 1158 00:51:38,610 --> 00:51:41,610 And I'm going to initialize that to null, as we proposed earlier. 1159 00:51:41,610 --> 00:51:44,760 But I'm also now going to have to take the additional step of defining 1160 00:51:44,760 --> 00:51:45,970 what this node is. 
1161 00:51:45,970 --> 00:51:49,500 So recall that I might do something like typedef, struct node. 1162 00:51:49,500 --> 00:51:52,320 Inside of this struct node, I'm going to have a number, which 1163 00:51:52,320 --> 00:51:54,010 I'll call number of type int. 1164 00:51:54,010 --> 00:51:56,160 And I'm going to have a structure called node 1165 00:51:56,160 --> 00:51:59,470 with a * that says the next pointer is called next. 1166 00:51:59,470 --> 00:52:03,150 And I'm going to call this whole thing, more succinctly, node, 1167 00:52:03,150 --> 00:52:04,830 instead of struct node. 1168 00:52:04,830 --> 00:52:07,920 Now as an aside, for those of you wondering what the difference really 1169 00:52:07,920 --> 00:52:09,600 is between struct and node. 1170 00:52:09,600 --> 00:52:12,450 Technically, I could do something like this. 1171 00:52:12,450 --> 00:52:15,960 Not use typedef and not use the word node alone. 1172 00:52:15,960 --> 00:52:19,680 This syntax here would actually create for me a new data 1173 00:52:19,680 --> 00:52:22,830 type called, verbosely, struct node. 1174 00:52:22,830 --> 00:52:25,440 And I could use this throughout my code saying struct node. 1175 00:52:25,440 --> 00:52:26,460 Struct node. 1176 00:52:26,460 --> 00:52:27,840 That just gets a little tedious. 1177 00:52:27,840 --> 00:52:30,715 And it would be nicer just to refer to this thing more simplistically 1178 00:52:30,715 --> 00:52:31,750 as a node. 1179 00:52:31,750 --> 00:52:34,230 So what typedef has been doing for us is it, 1180 00:52:34,230 --> 00:52:37,770 again, lets us invent our own word that's even more succinct. 1181 00:52:37,770 --> 00:52:41,040 And this just has the effect now of calling this whole thing 1182 00:52:41,040 --> 00:52:44,760 node without the need, subsequently, to keep saying struct all over the place. 1183 00:52:44,760 --> 00:52:46,170 Just FYI. 1184 00:52:46,170 --> 00:52:46,680 All right. 
1185 00:52:46,680 --> 00:52:50,050 So now that this thing exists in main, let's go ahead and do this. 1186 00:52:50,050 --> 00:52:52,770 Let's add a number to list. 1187 00:52:52,770 --> 00:52:55,440 And to do this, I'm going to give myself a temporary variable. 1188 00:52:55,440 --> 00:52:57,340 I'll call it n for consistency. 1189 00:52:57,340 --> 00:53:00,540 I'm going to use malloc to give myself the size of a node, 1190 00:53:00,540 --> 00:53:02,080 just like in our slides. 1191 00:53:02,080 --> 00:53:03,540 And then, I'm going to do a little safety check. 1192 00:53:03,540 --> 00:53:06,470 If n equals equals null, I'm going to do the opposite of the slides. 1193 00:53:06,470 --> 00:53:08,220 I'm just going to quit out of this program 1194 00:53:08,220 --> 00:53:10,960 because there's nothing useful to be done at this point. 1195 00:53:10,960 --> 00:53:13,570 But most likely my computer is not going to run out of memory. 1196 00:53:13,570 --> 00:53:16,750 So I'm going to assume we can keep going with some of the logic here. 1197 00:53:16,750 --> 00:53:21,390 If n does not equal null, and that is it's a valid memory address, 1198 00:53:21,390 --> 00:53:23,370 I'm going to say n []-- 1199 00:53:23,370 --> 00:53:24,930 I'm going to build this up backwards. 1200 00:53:24,930 --> 00:53:26,707 Well let's do. 1201 00:53:26,707 --> 00:53:28,290 That's OK, let's go ahead and do this. 1202 00:53:28,290 --> 00:53:30,600 N [number] equals 1. 1203 00:53:30,600 --> 00:53:35,490 And then n [arrow next] equals null. 1204 00:53:35,490 --> 00:53:42,420 And now, update list to point to new node, list equals n. 1205 00:53:42,420 --> 00:53:44,580 So at this point in the story, we've essentially 1206 00:53:44,580 --> 00:53:49,330 constructed what was that first picture, which looks like this. 1207 00:53:49,330 --> 00:53:53,880 This is the corresponding code via which we built up this node in memory. 
1208 00:53:53,880 --> 00:53:56,860 Suppose now, we want to add the number 2 to the list. 1209 00:53:56,860 --> 00:53:58,080 So let's do this again. 1210 00:53:58,080 --> 00:54:02,550 Add a number to list. 1211 00:54:02,550 --> 00:54:03,910 How might I do this? 1212 00:54:03,910 --> 00:54:06,330 Well, I don't need to redeclare n because I can use 1213 00:54:06,330 --> 00:54:08,110 the same temporary variables before. 1214 00:54:08,110 --> 00:54:13,310 So this time, I'm just going to say n equals malloc and the size of a node. 1215 00:54:13,310 --> 00:54:15,060 I'm, again, going to have my safety check. 1216 00:54:15,060 --> 00:54:19,290 So if n equals equals null, then let's just quit out of this altogether. 1217 00:54:19,290 --> 00:54:23,820 But, I have to be a little more careful now. 1218 00:54:23,820 --> 00:54:26,160 Technically speaking, what do I still need 1219 00:54:26,160 --> 00:54:30,540 to do before I quit out of my program to be really proper? 1220 00:54:30,540 --> 00:54:33,880 Free the memory that did succeed a little higher up. 1221 00:54:33,880 --> 00:54:39,280 So I think it suffices to free what is now called list, way at the top. 1222 00:54:39,280 --> 00:54:39,780 All right. 1223 00:54:39,780 --> 00:54:46,260 Now, if all was well, though, let's go ahead and say n [number] equals 2. 1224 00:54:46,260 --> 00:54:51,840 And now, n [arrow next] equals null. 1225 00:54:51,840 --> 00:54:54,900 And now, let's go ahead and add it to the list. 1226 00:54:54,900 --> 00:55:02,910 If I go ahead and do list arrow next equals n, 1227 00:55:02,910 --> 00:55:06,660 I think what we've just done is build up the equivalent, now, 1228 00:55:06,660 --> 00:55:09,660 of this in the computer's memory. 1229 00:55:09,660 --> 00:55:12,180 By going to the list field's next field, which 1230 00:55:12,180 --> 00:55:16,080 is synonymous with the 1 nodes, bottom-most box. 1231 00:55:16,080 --> 00:55:19,540 And store the address of what was n, which a moment ago looked like this. 
1232 00:55:19,540 --> 00:55:22,390 And I'm just throwing away, in the picture, the temporary variable. 1233 00:55:22,390 --> 00:55:22,890 All right. 1234 00:55:22,890 --> 00:55:24,880 One last thing to do. 1235 00:55:24,880 --> 00:55:30,087 Let me go down here and say, add a number to list, n equals malloc. 1236 00:55:30,087 --> 00:55:31,170 Let's do it one more time. 1237 00:55:31,170 --> 00:55:32,340 Size of node. 1238 00:55:32,340 --> 00:55:35,280 And clearly, in a real program, we might want to start using a loop. 1239 00:55:35,280 --> 00:55:39,060 And do this dynamically or a function because it's a lot of repetition now. 1240 00:55:39,060 --> 00:55:42,120 But just to go through the syntax here, this is fine. 1241 00:55:42,120 --> 00:55:45,700 If n equals equals null, out of memory for some reason. 1242 00:55:45,700 --> 00:55:51,650 Let's return 1, but we should free the list itself 1243 00:55:51,650 --> 00:55:55,450 and even the second node, list arrow next. 1244 00:55:55,450 --> 00:55:58,730 But I've deliberately done this poorly. 1245 00:55:58,730 --> 00:55:59,230 All right. 1246 00:55:59,230 --> 00:56:01,240 This is a little more subtle now. 1247 00:56:01,240 --> 00:56:04,570 And let me get rid of the highlighting just so it's a little more visible. 1248 00:56:04,570 --> 00:56:08,890 If n happens to equal equal null, and something really just 1249 00:56:08,890 --> 00:56:15,040 went wrong there, out of memory, why am I freeing 2 addresses now? 1250 00:56:15,040 --> 00:56:17,770 And again, it's not that I'm freeing those variables per se. 1251 00:56:17,770 --> 00:56:21,620 I'm freeing the addresses in those variables. 1252 00:56:21,620 --> 00:56:23,890 But there's also a bug with my code here. 1253 00:56:23,890 --> 00:56:26,290 And it's subtle. 1254 00:56:26,290 --> 00:56:27,580 Let me ask more pointedly. 1255 00:56:27,580 --> 00:56:31,683 This line here, 43, what is that freeing specifically? 1256 00:56:31,683 --> 00:56:32,350 Can I go to you?
1257 00:56:32,350 --> 00:56:34,900 AUDIENCE: You're freeing list 2 times. 1258 00:56:34,900 --> 00:56:36,640 SPEAKER 1: I'm freeing, not so. 1259 00:56:36,640 --> 00:56:37,150 That's OK. 1260 00:56:37,150 --> 00:56:38,740 I'm not freeing list 2 times. 1261 00:56:38,740 --> 00:56:41,530 Technically, I'm freeing list once and list next once. 1262 00:56:41,530 --> 00:56:43,600 But let me just ask the more explicit question. 1263 00:56:43,600 --> 00:56:46,420 What am I freeing with line 43 at the moment? 1264 00:56:46,420 --> 00:56:49,420 Which node? 1265 00:56:49,420 --> 00:56:50,930 I think node number 1. 1266 00:56:50,930 --> 00:56:51,430 Why? 1267 00:56:51,430 --> 00:56:53,440 Because if 1 is at the beginning of the list, 1268 00:56:53,440 --> 00:56:56,530 list contains the address of that number 1 node. 1269 00:56:56,530 --> 00:56:58,280 And so this frees that node. 1270 00:56:58,280 --> 00:57:01,250 This line of code, you might think now intuitively, OK, 1271 00:57:01,250 --> 00:57:03,610 it's probably freeing the node number 2. 1272 00:57:03,610 --> 00:57:04,540 But this is bad. 1273 00:57:04,540 --> 00:57:05,410 And this is subtle. 1274 00:57:05,410 --> 00:57:07,120 Valgrind might help you catch this. 1275 00:57:07,120 --> 00:57:09,520 But by eyeing it, it's not necessarily obvious. 1276 00:57:09,520 --> 00:57:13,990 You should never touch memory that you have already freed. 1277 00:57:13,990 --> 00:57:16,930 And so, the fact that I did in this order, very bad. 1278 00:57:16,930 --> 00:57:19,630 Because I'm telling the operating system, I don't know. 1279 00:57:19,630 --> 00:57:22,150 I don't need the list address anymore. 1280 00:57:22,150 --> 00:57:23,410 Do with it what you want. 1281 00:57:23,410 --> 00:57:25,660 And then, literally one line later, you're saying, wait a minute. 1282 00:57:25,660 --> 00:57:27,730 Let me actually go to that address for a moment 1283 00:57:27,730 --> 00:57:30,400 and look at the next field of that first node. 
1284 00:57:30,400 --> 00:57:31,220 It's too late. 1285 00:57:31,220 --> 00:57:33,710 You've already given up control over the node. 1286 00:57:33,710 --> 00:57:36,730 So it's an easy fix in this case, logically. 1287 00:57:36,730 --> 00:57:39,370 But we should be freeing the second node first 1288 00:57:39,370 --> 00:57:43,060 and then the first one so that we're doing it 1289 00:57:43,060 --> 00:57:45,040 in, essentially, reverse order. 1290 00:57:45,040 --> 00:57:46,957 And again, Valgrind would help you catch that. 1291 00:57:46,957 --> 00:57:49,582 But that's the kind of thing one needs to be careful about when 1292 00:57:49,582 --> 00:57:50,600 touching memory at all. 1293 00:57:50,600 --> 00:57:53,110 You cannot touch memory after you freed it. 1294 00:57:53,110 --> 00:57:54,970 But here is my last step. 1295 00:57:54,970 --> 00:58:00,490 Let me go ahead and update the number field of n to be 3. 1296 00:58:00,490 --> 00:58:03,500 The next node of n to be null. 1297 00:58:03,500 --> 00:58:05,290 And then, just like in the slide earlier, 1298 00:58:05,290 --> 00:58:11,020 I think I can do list next, next equals n. 1299 00:58:11,020 --> 00:58:14,890 And that has the effect now of building up in the computer's memory, 1300 00:58:14,890 --> 00:58:16,990 essentially, this data structure. 1301 00:58:16,990 --> 00:58:17,890 Very manually. 1302 00:58:17,890 --> 00:58:18,820 Very pedantically. 1303 00:58:18,820 --> 00:58:20,860 Like, in a better world, we'd have a loop and some functions 1304 00:58:20,860 --> 00:58:22,420 that are automating this process. 1305 00:58:22,420 --> 00:58:26,680 But, for now, we're doing it just to play around with the syntax. 1306 00:58:26,680 --> 00:58:31,420 So at this point, unfortunately, suppose I want to print the numbers. 1307 00:58:31,420 --> 00:58:36,190 It's no longer as easy as int i equals 0, i less than 3, i++. 1308 00:58:36,190 --> 00:58:43,420 Because you cannot just do something like this. 
1309 00:58:43,420 --> 00:58:48,520 Because pointer arithmetic no longer comes into play 1310 00:58:48,520 --> 00:58:52,750 when it's you, who are stitching together the data structure in memory. 1311 00:58:52,750 --> 00:58:55,450 In all of our past examples with arrays, you've 1312 00:58:55,450 --> 00:58:58,820 been trusting that all of the bytes in the array are back, to back, to back. 1313 00:58:58,820 --> 00:59:01,533 So it's perfectly reasonable for the compiler and the computer 1314 00:59:01,533 --> 00:59:04,450 to just figure out, oh, well if you want [0], that's at the beginning. 1315 00:59:04,450 --> 00:59:06,130 [1], it's one location over. 1316 00:59:06,130 --> 00:59:08,110 [2], it's one location over. 1317 00:59:08,110 --> 00:59:11,030 This is way less obvious now. 1318 00:59:11,030 --> 00:59:14,650 Because even though you might want to go to the first element in the linked 1319 00:59:14,650 --> 00:59:19,270 list, or the second, or the third, you can't just jump to those arithmetically 1320 00:59:19,270 --> 00:59:20,590 by doing a bit of math. 1321 00:59:20,590 --> 00:59:24,040 Instead, you have to follow all of those arrows. 1322 00:59:24,040 --> 00:59:27,340 So with linked lists, you can't use this square bracket notation anymore 1323 00:59:27,340 --> 00:59:30,310 because one node might be here, over here, over here, over here. 1324 00:59:30,310 --> 00:59:33,550 You can't just use some simple offset. 1325 00:59:33,550 --> 00:59:36,340 So I think our code is going to have to be a little fancier. 1326 00:59:36,340 --> 00:59:39,820 And this might look scary at first, but it's just an application 1327 00:59:39,820 --> 00:59:42,160 of some of the basic definitions here. 1328 00:59:42,160 --> 00:59:49,480 Let me do a for-loop that actually uses a node* variable initialized 1329 00:59:49,480 --> 00:59:51,130 to the list itself. 1330 00:59:51,130 --> 00:59:55,780 I'm going to keep doing this, so long as TMP does not equal null. 
1331 00:59:55,780 --> 00:59:58,360 And on each iteration of this loop, I'm going 1332 00:59:58,360 --> 01:00:03,100 to update TMP to be whatever TMP arrow next is. 1333 01:00:03,100 --> 01:00:05,710 And I'll remind you in a moment and explain in more detail. 1334 01:00:05,710 --> 01:00:09,730 But when I print something here with printf, I can still use %i. 1335 01:00:09,730 --> 01:00:12,040 Because it's still a number at the end of the day. 1336 01:00:12,040 --> 01:00:16,640 But what I want to print out is the number in this temporary variable. 1337 01:00:16,640 --> 01:00:19,032 So maybe the ugliest for-loop we've ever seen. 1338 01:00:19,032 --> 01:00:21,490 Because it's mixing, not just the idea of a for-loop, which 1339 01:00:21,490 --> 01:00:23,500 itself was a bit cryptic weeks ago. 1340 01:00:23,500 --> 01:00:26,025 But now, I'm using pointers instead of integers. 1341 01:00:26,025 --> 01:00:28,150 But I'm not violating the definition of a for-loop. 1342 01:00:28,150 --> 01:00:30,940 Recall that a for-loop has 3 main things in parentheses. 1343 01:00:30,940 --> 01:00:32,800 What do you want to initialize first? 1344 01:00:32,800 --> 01:00:35,740 What condition do you want to keep checking again and again? 1345 01:00:35,740 --> 01:00:39,440 And what update do you want to make on every iteration of the loop? 1346 01:00:39,440 --> 01:00:41,860 So with that basic definition in mind, this 1347 01:00:41,860 --> 01:00:44,350 is giving me a temporary variable called TMP 1348 01:00:44,350 --> 01:00:46,520 that is initialized to the beginning of the list. 1349 01:00:46,520 --> 01:00:50,110 So it's like pointing my finger at the number 1 node. 1350 01:00:50,110 --> 01:00:53,530 Then, I'm asking the question, does TMP not equal null? 1351 01:00:53,530 --> 01:00:56,170 Well, hopefully, not because I'm pointing at a valid node 1352 01:00:56,170 --> 01:00:57,710 that is the number 1 node. 1353 01:00:57,710 --> 01:00:59,530 So, of course, it doesn't equal null yet.
1354 01:00:59,530 --> 01:01:02,030 Null won't be until we get to the end of the list. 1355 01:01:02,030 --> 01:01:03,530 So what do I do? 1356 01:01:03,530 --> 01:01:05,260 I start at this TMP variable. 1357 01:01:05,260 --> 01:01:10,270 I follow the arrow and go to the number field therein. 1358 01:01:10,270 --> 01:01:11,350 What do I then do? 1359 01:01:11,350 --> 01:01:15,010 The for-loop says, change TMP to be whatever 1360 01:01:15,010 --> 01:01:19,090 is at TMP, by following the arrow and grabbing the next field. 1361 01:01:19,090 --> 01:01:22,260 That, then, has the result of being checked against this conditional. 1362 01:01:22,260 --> 01:01:24,760 No, of course, it doesn't equal null because the second node 1363 01:01:24,760 --> 01:01:26,050 is the number 2 node. 1364 01:01:26,050 --> 01:01:27,920 Null is still at the very end. 1365 01:01:27,920 --> 01:01:29,710 So I print out the number 2. 1366 01:01:29,710 --> 01:01:33,670 Next step, I update TMP one more time to be whatever is next. 1367 01:01:33,670 --> 01:01:36,230 That, then, does not yet equal null. 1368 01:01:36,230 --> 01:01:38,470 So I go ahead and print out the number 3 node. 1369 01:01:38,470 --> 01:01:44,120 Then one last time, I update TMP to be whatever TMP is in the next field. 1370 01:01:44,120 --> 01:01:47,980 But after 1, 2, 3, that last next field is null. 1371 01:01:47,980 --> 01:01:51,790 And so, I break out of this for-loop altogether. 1372 01:01:51,790 --> 01:01:54,730 So if I do this in pictorial form, all we're 1373 01:01:54,730 --> 01:01:58,300 doing, if I now use my finger to represent the TMP variable. 1374 01:01:58,300 --> 01:02:02,080 I initialize TMP to be whatever list is, so it points here. 1375 01:02:02,080 --> 01:02:04,780 That's obviously not null so I print out whatever 1376 01:02:04,780 --> 01:02:09,100 is at TMP, follow the arrow in number, and I print that out. 1377 01:02:09,100 --> 01:02:11,290 Then I update TMP to point here.
1378 01:02:11,290 --> 01:02:13,077 Then I update TMP to point here. 1379 01:02:13,077 --> 01:02:14,410 Then I update TMP to point here. 1380 01:02:14,410 --> 01:02:15,160 Wait, that's null. 1381 01:02:15,160 --> 01:02:17,480 The for-loop ends. 1382 01:02:17,480 --> 01:02:21,670 So, again, admittedly much more cryptic than our familiar int i equals 0, 1383 01:02:21,670 --> 01:02:22,610 and so forth. 1384 01:02:22,610 --> 01:02:28,855 But it's just a different utilization of the for-loop syntax. 1385 01:02:28,855 --> 01:02:29,355 Yes. 1386 01:02:29,355 --> 01:02:33,140 AUDIENCE: How does it happen that you're always printing out the numbers. 1387 01:02:33,140 --> 01:02:35,018 Because it seems to me that addresses- 1388 01:02:35,018 --> 01:02:36,060 SPEAKER 1: Good question. 1389 01:02:36,060 --> 01:02:39,060 How is it that I'm actually printing numbers and not printing out 1390 01:02:39,060 --> 01:02:40,440 addresses instead. 1391 01:02:40,440 --> 01:02:42,120 The compiler is helping me here. 1392 01:02:42,120 --> 01:02:44,730 Because I taught it, in the very beginning of my program, 1393 01:02:44,730 --> 01:02:45,360 what a node is. 1394 01:02:45,360 --> 01:02:47,730 Which looks like this here. 1395 01:02:47,730 --> 01:02:51,510 The compiler knows that a node has a number field and a next field. 1396 01:02:51,510 --> 01:02:53,430 Down here, in the for-loop, 1397 01:02:53,430 --> 01:02:59,410 because I'm iterating using a node* pointer, and not an int* pointer, 1398 01:02:59,410 --> 01:03:02,160 the compiler knows that any time I'm pointing at something, 1399 01:03:02,160 --> 01:03:03,940 I'm pointing at the whole node. 1400 01:03:03,940 --> 01:03:07,020 Doesn't matter where specifically in the rectangle I'm pointing per se. 1401 01:03:07,020 --> 01:03:09,210 It's, ultimately, pointing at the whole node itself.
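The print loop walked through above can be sketched as follows: initialize a node* called tmp to list, keep going while tmp is not null, and move to tmp arrow next after each pass. The function name print_list and the returned count are assumptions added here (the lecture has the loop inline in main and returns nothing from it); the count just makes the walk easy to check.

```c
#include <stdio.h>

typedef struct node
{
    int number;
    struct node *next;
}
node;

// Hypothetical helper: prints every number in the list, one per line,
// and returns how many nodes were visited.
int print_list(node *list)
{
    int count = 0;

    // tmp starts at the first node; each update follows one arrow
    for (node *tmp = list; tmp != NULL; tmp = tmp->next)
    {
        // tmp is a node*, so tmp->number selects the int field
        printf("%i\n", tmp->number);
        count++;
    }
    return count;
}
```

Because tmp is a node* rather than an int*, the expression tmp->number is an int, which is why %i prints a number and not an address.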
1402 01:03:09,210 --> 01:03:13,320 And the fact that I, then, use TMP arrow number means, OK, 1403 01:03:13,320 --> 01:03:14,490 adjust your finger slightly. 1404 01:03:14,490 --> 01:03:18,510 So you're literally pointing at the number field and not the next field. 1405 01:03:18,510 --> 01:03:22,920 So that's sufficient information for the computer to distinguish the 2. 1406 01:03:22,920 --> 01:03:23,560 Good question. 1407 01:03:23,560 --> 01:03:26,730 Other questions then on this approach here? 1408 01:03:26,730 --> 01:03:28,042 Yeah, in the back. 1409 01:03:28,042 --> 01:03:29,280 AUDIENCE: How would you-- 1410 01:03:29,280 --> 01:03:33,840 SPEAKER 1: How would I use a for-loop to add elements to a linked list? 1411 01:03:33,840 --> 01:03:38,640 You will do something like this, if I may, in problem set 5. 1412 01:03:38,640 --> 01:03:41,730 We will give you some of the scaffolding for doing this. 1413 01:03:41,730 --> 01:03:44,700 But in this coming week's materials, we will guide you to that. 1414 01:03:44,700 --> 01:03:47,293 But let me not spoil it just yet. 1415 01:03:47,293 --> 01:03:48,210 Fair question, though. 1416 01:03:48,210 --> 01:03:48,710 Yeah. 1417 01:03:48,710 --> 01:03:51,077 AUDIENCE: So I had a question about line 49. 1418 01:03:51,077 --> 01:03:51,660 SPEAKER 1: OK. 1419 01:03:51,660 --> 01:03:53,678 AUDIENCE: Is line 49 possible in line 43? 1420 01:03:53,678 --> 01:03:54,720 SPEAKER 1: Good question. 1421 01:03:54,720 --> 01:03:57,900 Is line 49 acceptable, even if we freed it earlier. 1422 01:03:57,900 --> 01:04:00,600 We didn't free it in line 43, in this case, right. 1423 01:04:00,600 --> 01:04:04,800 You can only reach line 49, if n does not equal null. 1424 01:04:04,800 --> 01:04:06,990 And you do not return on line 45. 1425 01:04:06,990 --> 01:04:07,860 So that's safe. 1426 01:04:07,860 --> 01:04:12,180 I was only doing that freeing if I knew on line 45 that I'm out of here 1427 01:04:12,180 --> 01:04:13,620 anyway, at that point.
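The error path being asked about, with the freeing done in the safe order, might be sketched like this. The second node is freed before the first, so list arrow next is read while list is still valid. The name add_third and the 0/1 return codes are assumptions for illustration; the line numbers mentioned in the discussion refer to the on-screen program, not to this fragment.

```c
#include <stdlib.h>

typedef struct node
{
    int number;
    struct node *next;
}
node;

// Hypothetical helper: appends the number 3 to a list that currently
// holds exactly two nodes. Returns 0 on success. On failure it frees
// both existing nodes and returns 1, so the caller must not use list
// again after a nonzero result.
int add_third(node *list)
{
    node *n = malloc(sizeof(node));
    if (n == NULL)
    {
        // Free the second node first: this reads list->next,
        // which is only safe while list itself has not been freed
        free(list->next);
        free(list);
        return 1;
    }
    n->number = 3;
    n->next = NULL;

    // Only reached when n is valid, so nothing above was freed
    list->next->next = n;
    return 0;
}
```

Swapping the two free calls would read list->next after list has been freed, which is the bug discussed above.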
1428 01:04:13,620 --> 01:04:14,400 Good question. 1429 01:04:14,400 --> 01:04:15,030 And, yeah. 1430 01:04:15,030 --> 01:04:16,405 AUDIENCE: I had a quick question. 1431 01:04:16,405 --> 01:04:19,380 Is TMP [INAUDIBLE]. 1432 01:04:19,380 --> 01:04:22,650 SPEAKER 1: Correct You're asking about TMP, because it's in a for-loop, 1433 01:04:22,650 --> 01:04:24,358 does that mean you don't have to free it? 1434 01:04:24,358 --> 01:04:26,760 You never have to free pointers, per se. 1435 01:04:26,760 --> 01:04:31,560 You should only free addresses that were returned to you by malloc. 1436 01:04:31,560 --> 01:04:33,930 So I haven't finished the program, to be fair. 1437 01:04:33,930 --> 01:04:35,880 But you're not freeing variables. 1438 01:04:35,880 --> 01:04:37,740 You're not freeing like, fields. 1439 01:04:37,740 --> 01:04:40,870 You are freeing specific addresses, whatever they may be. 1440 01:04:40,870 --> 01:04:43,770 So the last thing, and I was stalling on showing this 1441 01:04:43,770 --> 01:04:45,450 because it too is a little cryptic. 1442 01:04:45,450 --> 01:04:48,570 Here is how you can free, now, a whole linked list. 1443 01:04:48,570 --> 01:04:51,242 In the world of arrays, recall, it was so easy. 1444 01:04:51,242 --> 01:04:52,200 You just say free list. 1445 01:04:52,200 --> 01:04:53,920 You return 0 and you're done. 1446 01:04:53,920 --> 01:04:55,140 Not with a linked list. 1447 01:04:55,140 --> 01:04:57,000 Because, again, the computer doesn't know 1448 01:04:57,000 --> 01:04:59,700 what you have stitched together using all of these pointers 1449 01:04:59,700 --> 01:05:01,140 all over the computer's memory. 1450 01:05:01,140 --> 01:05:03,180 You need to follow those arrows. 1451 01:05:03,180 --> 01:05:05,920 So one way to do this would be as follows. 1452 01:05:05,920 --> 01:05:10,920 While the list itself is not null, so while there's a list to be freed. 1453 01:05:10,920 --> 01:05:12,240 What do I want to do? 
1454 01:05:12,240 --> 01:05:14,972 I'm going to give myself a temporary variable called TMP again. 1455 01:05:14,972 --> 01:05:17,430 And it's a different TMP because it's in a different scope. 1456 01:05:17,430 --> 01:05:21,210 It's inside of the while loop instead of the for-loop, a few lines earlier. 1457 01:05:21,210 --> 01:05:26,640 I am going to initialize TMP to be the address of the next node. 1458 01:05:26,640 --> 01:05:29,160 Just so I can get one step ahead of things. 1459 01:05:29,160 --> 01:05:30,450 Why am I doing this? 1460 01:05:30,450 --> 01:05:34,330 Because now, I can boldly free the list itself, 1461 01:05:34,330 --> 01:05:35,970 which does not mean the whole list. 1462 01:05:35,970 --> 01:05:38,670 Again, I'm freeing the address in list, which 1463 01:05:38,670 --> 01:05:41,410 is the address of the number 1 node. 1464 01:05:41,410 --> 01:05:42,390 That's what list is. 1465 01:05:42,390 --> 01:05:44,980 It's just the address of the number 1 node. 1466 01:05:44,980 --> 01:05:47,880 So if I first use TMP to point at the number 1467 01:05:47,880 --> 01:05:53,310 2 slightly in the middle of the picture, then it is safe for me on line 61, 1468 01:05:53,310 --> 01:05:55,290 at the moment, to free list. 1469 01:05:55,290 --> 01:05:57,870 That is the address of the first node. 1470 01:05:57,870 --> 01:06:02,160 Now I'm going to say, all right, once I freed the first node in the list, 1471 01:06:02,160 --> 01:06:07,080 I can update the list itself to be literally TMP. 1472 01:06:07,080 --> 01:06:09,120 And now, the loop repeats. 1473 01:06:09,120 --> 01:06:10,450 So what's happening here? 1474 01:06:10,450 --> 01:06:16,140 If you think about this picture, TMP is initially pointing at not the list, 1475 01:06:16,140 --> 01:06:17,550 but list arrow next. 1476 01:06:17,550 --> 01:06:20,940 So TMP, represented by my right hand here, is pointing at the number 2. 1477 01:06:20,940 --> 01:06:25,530 Totally safe and reasonable to free now the list itself a.k.a.
1478 01:06:25,530 --> 01:06:27,150 the address of the number 1 node. 1479 01:06:27,150 --> 01:06:29,880 That has the effect of just throwing away the number 1 node, 1480 01:06:29,880 --> 01:06:32,670 telling the computer you can reuse that memory for you. 1481 01:06:32,670 --> 01:06:36,150 The last line of code I wrote updated list to point at the number 1482 01:06:36,150 --> 01:06:40,560 2, at which point my loop proceeded to do the exact same thing again. 1483 01:06:40,560 --> 01:06:43,590 And only once my finger is literally pointing at nowhere, 1484 01:06:43,590 --> 01:06:46,350 the null symbol, will the loop, by nature of a while 1485 01:06:46,350 --> 01:06:48,990 loop as I'll toggle back to, break out. 1486 01:06:48,990 --> 01:06:51,630 And there's nothing more to be freed. 1487 01:06:51,630 --> 01:06:54,690 So again, what you'll see, ultimately, in problem set 5, 1488 01:06:54,690 --> 01:06:58,690 more on that later, is an opportunity to play around with just this syntax. 1489 01:06:58,690 --> 01:06:59,730 But also these ideas. 1490 01:06:59,730 --> 01:07:02,580 But again, even though the syntax is admittedly pretty cryptic, 1491 01:07:02,580 --> 01:07:06,300 we're still using basics like these for-loops or while loops. 1492 01:07:06,300 --> 01:07:09,960 We're just starting to now follow explicit addresses rather 1493 01:07:09,960 --> 01:07:13,740 than letting the computer do all of the arithmetic for us, 1494 01:07:13,740 --> 01:07:15,635 as we previously benefited from. 1495 01:07:15,635 --> 01:07:18,760 At the very end of this thing, I'm going to return 0 as though all is well. 1496 01:07:18,760 --> 01:07:22,240 And I think, then, we're good to go. 1497 01:07:22,240 --> 01:07:22,740 All right. 1498 01:07:22,740 --> 01:07:25,960 Questions on this linked list code now? 1499 01:07:25,960 --> 01:07:28,710 And again, we'll walk through this again in the coming weeks spec. 1500 01:07:28,710 --> 01:07:29,210 Yeah. 
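The freeing loop described above can be sketched as follows: while list is not null, remember list arrow next in tmp, free list, then set list to tmp. The function name free_list and the returned count are assumptions added for illustration; the lecture writes the while loop inline at the end of main and then returns 0.

```c
#include <stdlib.h>

typedef struct node
{
    int number;
    struct node *next;
}
node;

// Hypothetical helper: frees every node in the list, one at a time,
// and returns how many nodes were freed.
int free_list(node *list)
{
    int count = 0;
    while (list != NULL)
    {
        // Get one step ahead before giving up the current node
        node *tmp = list->next;

        // Frees only the node whose address is in list
        free(list);
        count++;

        // Move on to the node remembered a moment ago
        list = tmp;
    }
    return count;
}
```

Note that list here is a local copy of the caller's pointer, so the caller's own variable still holds the old, now invalid, address afterward.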
1501 01:07:29,210 --> 01:07:33,613 AUDIENCE: Can you explain the while loop [INAUDIBLE] starts in other ways? 1502 01:07:33,613 --> 01:07:34,280 SPEAKER 1: Sure. 1503 01:07:34,280 --> 01:07:37,950 Can we explain this while loop here for freeing the list. 1504 01:07:37,950 --> 01:07:40,580 So notice that, first, I'm just asking the obvious question. 1505 01:07:40,580 --> 01:07:41,420 Is the list null? 1506 01:07:41,420 --> 01:07:45,390 Because if it is, there's no work to be done. 1507 01:07:45,390 --> 01:07:49,460 However, while the list is not null, according to line 58, 1508 01:07:49,460 --> 01:07:50,540 what do we want to do? 1509 01:07:50,540 --> 01:07:54,920 I want to create a temporary variable that points at the same thing 1510 01:07:54,920 --> 01:07:57,540 that list arrow next is pointing at. 1511 01:07:57,540 --> 01:07:58,760 So what does that mean? 1512 01:07:58,760 --> 01:08:00,260 Here is list. 1513 01:08:00,260 --> 01:08:03,690 List arrow next is whatever this thing is here. 1514 01:08:03,690 --> 01:08:06,470 So if my right hand represents the temporary variable, 1515 01:08:06,470 --> 01:08:10,470 I'm literally pointing at the same thing as the list is itself. 1516 01:08:10,470 --> 01:08:13,640 The next line of code, recall, was free the list. 1517 01:08:13,640 --> 01:08:16,400 And unlike, in our world of arrays, like half an hour 1518 01:08:16,400 --> 01:08:19,100 ago where that just meant free the whole darn list, 1519 01:08:19,100 --> 01:08:23,690 you now have taken over control over the computer's memory with a linked list, 1520 01:08:23,690 --> 01:08:25,550 in ways that you didn't with the array. 1521 01:08:25,550 --> 01:08:28,850 The computer knew how to free the whole array because you 1522 01:08:28,850 --> 01:08:30,680 malloc the whole thing at once. 1523 01:08:30,680 --> 01:08:34,580 You are now mallocing the linked list one node at a time. 
1524 01:08:34,580 --> 01:08:37,430 And the operating system does not keep track of for you 1525 01:08:37,430 --> 01:08:38,810 where all these nodes are. 1526 01:08:38,810 --> 01:08:42,470 So when you free list, you are literally freeing 1527 01:08:42,470 --> 01:08:46,430 the value of the list variable, which is just this first node here. 1528 01:08:46,430 --> 01:08:49,820 Then my last line of code, which I'll flip back to in a second, updates 1529 01:08:49,820 --> 01:08:54,500 list to now ignore the free memory and point at 2. 1530 01:08:54,500 --> 01:08:57,080 And the story then repeats. 1531 01:08:57,080 --> 01:09:00,500 So, again, it's just a very pedantic way of using 1532 01:09:00,500 --> 01:09:04,460 this new syntax of star notation, and the arrow notation, and the like, 1533 01:09:04,460 --> 01:09:08,420 to do the equivalent of walking down all of these arrows. 1534 01:09:08,420 --> 01:09:10,640 Following all of these breadcrumbs. 1535 01:09:10,640 --> 01:09:13,940 But it does take admittedly some getting used to. 1536 01:09:13,940 --> 01:09:16,445 Syntax, you only have to do one week. 1537 01:09:16,445 --> 01:09:18,320 But, again, next week in Python we will begin 1538 01:09:18,320 --> 01:09:20,150 to abstract a lot of this complexity away. 1539 01:09:20,150 --> 01:09:22,020 But none of this complexity is going away. 1540 01:09:22,020 --> 01:09:24,770 It's just that someone else, the authors of Python for instance, 1541 01:09:24,770 --> 01:09:26,908 will have automated this stuff for us. 1542 01:09:26,908 --> 01:09:28,700 The goal this week is to understand what it 1543 01:09:28,700 --> 01:09:31,980 is we're going to get for free, so to speak, next week. 1544 01:09:31,980 --> 01:09:32,480 All right. 1545 01:09:32,480 --> 01:09:36,810 Questions on these linked lists. 1546 01:09:36,810 --> 01:09:37,310 All right. 1547 01:09:37,310 --> 01:09:38,450 Just, yeah, in the back.
1548 01:09:38,450 --> 01:09:41,264 AUDIENCE: So are the while loops strictly necessary 1549 01:09:41,264 --> 01:09:42,728 for the freeing [INAUDIBLE]. 1550 01:09:42,728 --> 01:09:43,770 SPEAKER 1: Fair question. 1551 01:09:43,770 --> 01:09:46,353 Let me summarize as, could we have freed this with a for-loop? 1552 01:09:46,353 --> 01:09:47,279 Absolutely. 1553 01:09:47,279 --> 01:09:48,630 It just is a matter of style. 1554 01:09:48,630 --> 01:09:51,670 It's a little more elegant to do it in a while loop, according to me. 1555 01:09:51,670 --> 01:09:53,672 But other people will reasonably disagree. 1556 01:09:53,672 --> 01:09:56,380 Anything you can do with a while loop you can do with a for-loop, 1557 01:09:56,380 --> 01:09:57,390 and vice versa. 1558 01:09:57,390 --> 01:09:59,729 Do while loops, recall, are a little different. 1559 01:09:59,729 --> 01:10:02,372 But they will always do at least one thing. 1560 01:10:02,372 --> 01:10:04,830 But for-loops and while loops behave the same in this case. 1561 01:10:04,830 --> 01:10:05,953 AUDIENCE: Thank you. 1562 01:10:05,953 --> 01:10:06,620 SPEAKER 1: Sure. 1563 01:10:06,620 --> 01:10:08,000 Other questions? 1564 01:10:08,000 --> 01:10:10,399 All right, well let's just vary things a little bit here. 1565 01:10:10,399 --> 01:10:12,482 Just to see what some of the pitfalls might now be 1566 01:10:12,482 --> 01:10:14,240 without getting into the weeds of code. 1567 01:10:14,240 --> 01:10:18,229 Indeed, we'll try to save some of that for problem set 5's exploration. 1568 01:10:18,229 --> 01:10:22,520 But instead, let's imagine that we want to create a list here of our own. 1569 01:10:22,520 --> 01:10:25,700 I can offer, in exchange for a few volunteers, some foam fingers 1570 01:10:25,700 --> 01:10:27,617 to bring to the next game, perhaps. 1571 01:10:27,617 --> 01:10:29,450 Could we get maybe just one volunteer first? 1572 01:10:29,450 --> 01:10:30,109 Come on up.
1573 01:10:30,109 --> 01:10:33,109 You will be our linked list from the get go. 1574 01:10:33,109 --> 01:10:33,913 What's your name? 1575 01:10:33,913 --> 01:10:34,580 AUDIENCE: Pedro. 1576 01:10:34,580 --> 01:10:36,840 SPEAKER 1: Pedro, come on up. 1577 01:10:36,840 --> 01:10:38,090 All right, thank you to Pedro. 1578 01:10:38,090 --> 01:10:41,180 [AUDIENCE CLAPPING] 1579 01:10:41,180 --> 01:10:43,180 And if you want to just stand roughly over here. 1580 01:10:43,180 --> 01:10:45,729 But you are a null pointer so just point sort of at the ground, 1581 01:10:45,729 --> 01:10:46,930 as though you're pointing at 0. 1582 01:10:46,930 --> 01:10:47,430 All right. 1583 01:10:47,430 --> 01:10:50,027 So Pedro is our linked list of size 0, which pictorially 1584 01:10:50,027 --> 01:10:53,319 might look a little something like this for consistency with our past pictures. 1585 01:10:53,319 --> 01:10:58,000 Now suppose that we want to go ahead and malloc, oh, how about the number 2. 1586 01:10:58,000 --> 01:11:00,200 Can we get a volunteer to be on camera here? 1587 01:11:00,200 --> 01:11:00,700 OK. 1588 01:11:00,700 --> 01:11:01,867 You jumped out of your seat. 1589 01:11:01,867 --> 01:11:04,408 Do you want to come up? 1590 01:11:04,408 --> 01:11:06,200 OK, you really want the foam finger, I say. 1591 01:11:06,200 --> 01:11:06,370 All right. 1592 01:11:06,370 --> 01:11:07,450 Round of applause, sure. 1593 01:11:07,450 --> 01:11:12,690 [AUDIENCE CLAPPING] 1594 01:11:12,690 --> 01:11:13,235 OK. 1595 01:11:13,235 --> 01:11:14,110 And what's your name? 1596 01:11:14,110 --> 01:11:14,970 AUDIENCE: Caleb. 1597 01:11:14,970 --> 01:11:15,430 SPEAKER 1: Say again? 1598 01:11:15,430 --> 01:11:15,760 AUDIENCE: Caleb. 1599 01:11:15,760 --> 01:11:16,030 SPEAKER 1: Halen? 1600 01:11:16,030 --> 01:11:16,762 AUDIENCE: Caleb. 1601 01:11:16,762 --> 01:11:17,470 SPEAKER 1: Caleb. 1602 01:11:17,470 --> 01:11:18,770 Caleb, sorry. 1603 01:11:18,770 --> 01:11:19,270 All right. 
1604 01:11:19,270 --> 01:11:21,790 So here is your number 2 for your number field. 1605 01:11:21,790 --> 01:11:23,020 And here is your pointer. 1606 01:11:23,020 --> 01:11:26,115 And come on, let's say that there was room for Caleb like, right there. 1607 01:11:26,115 --> 01:11:26,740 That's perfect. 1608 01:11:26,740 --> 01:11:29,480 So Caleb got malloced, if you will, over here. 1609 01:11:29,480 --> 01:11:33,805 So now if we want to insert Caleb and the number 2 into this linked list, 1610 01:11:33,805 --> 01:11:34,930 well what do we need to do? 1611 01:11:34,930 --> 01:11:36,340 I already initialized you to 2. 1612 01:11:36,340 --> 01:11:38,320 And pointing as you are to the ground means 1613 01:11:38,320 --> 01:11:40,630 you're initialized to null for your next field. 1614 01:11:40,630 --> 01:11:42,400 Pedro, what you should you-- perfect. 1615 01:11:42,400 --> 01:11:43,720 What should Pedro do. 1616 01:11:43,720 --> 01:11:44,620 That's fine, too. 1617 01:11:44,620 --> 01:11:46,195 So Pedro is now pointing at the list. 1618 01:11:46,195 --> 01:11:48,320 So now our list looks a little something like this. 1619 01:11:48,320 --> 01:11:49,540 So far, so good. 1620 01:11:49,540 --> 01:11:50,170 All is well. 1621 01:11:50,170 --> 01:11:52,670 So the first couple of these will be pretty straightforward. 1622 01:11:52,670 --> 01:11:56,180 Let's insert one more, if anyone really wants another foam finger. 1623 01:11:56,180 --> 01:11:57,680 Here, how about right in the middle. 1624 01:11:57,680 --> 01:11:58,870 Come on down. 1625 01:11:58,870 --> 01:12:01,678 And just in anticipation, how about let's malloc someone else. 1626 01:12:01,678 --> 01:12:03,220 OK, your friends are pointing at you. 1627 01:12:03,220 --> 01:12:05,350 Do you want to come down too, preemptively? 1628 01:12:05,350 --> 01:12:07,852 This is a pool of memory, if you will. 1629 01:12:07,852 --> 01:12:08,560 What's your name? 1630 01:12:08,560 --> 01:12:09,130 AUDIENCE: Hannah. 
1631 01:12:09,130 --> 01:12:09,880 SPEAKER 1: Hannah. 1632 01:12:09,880 --> 01:12:10,600 All right, Hannah. 1633 01:12:10,600 --> 01:12:11,440 You are number 4. 1634 01:12:11,440 --> 01:12:13,180 [AUDIENCE CLAPPING] 1635 01:12:13,180 --> 01:12:14,810 And hang there for just a moment. 1636 01:12:14,810 --> 01:12:15,310 All right. 1637 01:12:15,310 --> 01:12:16,870 So we've just malloced Hannah. 1638 01:12:16,870 --> 01:12:20,140 And Hannah, how about Hannah, suppose you ended up over there 1639 01:12:20,140 --> 01:12:21,800 in just some random location. 1640 01:12:21,800 --> 01:12:22,300 All right. 1641 01:12:22,300 --> 01:12:25,960 So what should we now do, if the goal is to keep these things sorted? 1642 01:12:25,960 --> 01:12:26,560 How about? 1643 01:12:26,560 --> 01:12:28,538 So Pedro, do you have to update yourself? 1644 01:12:28,538 --> 01:12:29,080 AUDIENCE: No. 1645 01:12:29,080 --> 01:12:29,410 SPEAKER 1: No. 1646 01:12:29,410 --> 01:12:29,910 All right. 1647 01:12:29,910 --> 01:12:31,300 Caleb, what do you have to do? 1648 01:12:31,300 --> 01:12:31,800 OK. 1649 01:12:31,800 --> 01:12:34,692 And Hannah what should you be doing? 1650 01:12:34,692 --> 01:12:37,900 I would, it's just for you for now, so point at the ground representing null. 1651 01:12:37,900 --> 01:12:38,400 OK. 1652 01:12:38,400 --> 01:12:41,290 So, again demonstrating the fact that, unlike in past weeks where 1653 01:12:41,290 --> 01:12:43,810 we had our nice, clean array back, to back, to back, 1654 01:12:43,810 --> 01:12:46,380 contiguously, these guys are deliberately all over the stage. 1655 01:12:46,380 --> 01:12:47,380 So let's malloc another. 1656 01:12:47,380 --> 01:12:49,012 How about number 5. 1657 01:12:49,012 --> 01:12:49,720 What's your name? 1658 01:12:49,720 --> 01:12:50,440 AUDIENCE: Jonathan. 1659 01:12:50,440 --> 01:12:50,920 SPEAKER 1: Jonathan. 1660 01:12:50,920 --> 01:12:51,753 All right, Jonathan. 1661 01:12:51,753 --> 01:12:53,440 You are our number 5.
1662 01:12:53,440 --> 01:12:55,255 And pick your favorite place in memory. 1663 01:12:55,255 --> 01:12:56,200 [AUDIENCE CLAPPING] 1664 01:12:56,200 --> 01:12:56,700 OK. 1665 01:12:58,820 --> 01:12:59,320 All right. 1666 01:12:59,320 --> 01:13:01,548 So Jonathan's now over there. 1667 01:13:01,548 --> 01:13:02,590 And Hannah is over there. 1668 01:13:02,590 --> 01:13:04,447 So 5, we want to point Hannah at number 5. 1669 01:13:04,447 --> 01:13:06,280 So you, of course, are going to point there. 1670 01:13:06,280 --> 01:13:07,655 And where should you be pointing? 1671 01:13:07,655 --> 01:13:09,500 Down to represent null, as well. 1672 01:13:09,500 --> 01:13:10,000 OK. 1673 01:13:10,000 --> 01:13:11,553 So pretty straightforward. 1674 01:13:11,553 --> 01:13:13,220 But now things get a little interesting. 1675 01:13:13,220 --> 01:13:16,000 And here, we'll use a chance to, without the weeds of code, 1676 01:13:16,000 --> 01:13:19,090 point out how order of operations is really going to matter. 1677 01:13:19,090 --> 01:13:23,320 Suppose that I next want to allocate say, the number 1. 1678 01:13:23,320 --> 01:13:25,510 And I want to insert the number 1 into this list. 1679 01:13:25,510 --> 01:13:26,010 Yes. 1680 01:13:26,010 --> 01:13:27,620 This is what the code would look like. 1681 01:13:27,620 --> 01:13:31,180 But if we act this out-- could we get one more volunteer? 1682 01:13:31,180 --> 01:13:32,990 How about on the end there in the sweater. 1683 01:13:32,990 --> 01:13:33,490 Yeah. 1684 01:13:33,490 --> 01:13:34,780 Come on down. 1685 01:13:34,780 --> 01:13:35,950 We have, what's your name? 1686 01:13:35,950 --> 01:13:36,850 AUDIENCE: Lauren. 1687 01:13:36,850 --> 01:13:37,300 SPEAKER 1: Lauren. 1688 01:13:37,300 --> 01:13:37,540 OK. 1689 01:13:37,540 --> 01:13:38,650 Lauren, come on down. 
1690 01:13:38,650 --> 01:13:43,975 [AUDIENCE CLAPPING] 1691 01:13:43,975 --> 01:13:45,850 And how about, Lauren, why don't you go right 1692 01:13:45,850 --> 01:13:47,470 in here in front, if you don't mind. 1693 01:13:47,470 --> 01:13:48,670 Here is your number. 1694 01:13:48,670 --> 01:13:49,780 Here is your pointer. 1695 01:13:49,780 --> 01:13:51,850 So I've initialized Lauren to the number 1. 1696 01:13:51,850 --> 01:13:54,460 And your pointer will be null, pointing at the ground. 1697 01:13:54,460 --> 01:13:57,003 Where do you belong if we're maintaining sorted order? 1698 01:13:57,003 --> 01:13:58,420 Looks like right at the beginning. 1699 01:13:58,420 --> 01:14:00,920 What should happen here? 1700 01:14:00,920 --> 01:14:01,420 OK. 1701 01:14:01,420 --> 01:14:06,100 So Pedro has presumed to point now at Lauren. 1702 01:14:06,100 --> 01:14:10,330 But how do you know where to point? 1703 01:14:10,330 --> 01:14:11,500 AUDIENCE: He's number 2. 1704 01:14:11,500 --> 01:14:13,400 SPEAKER 1: Pedro's undoing what he did a moment ago. 1705 01:14:13,400 --> 01:14:14,380 So this was deliberate. 1706 01:14:14,380 --> 01:14:17,750 And that was perfect that Pedro presumed to point immediately at Lauren. 1707 01:14:17,750 --> 01:14:18,250 Why? 1708 01:14:18,250 --> 01:14:21,950 You literally just orphaned all of these folks, all of these chunks of memory. 1709 01:14:21,950 --> 01:14:22,450 Why? 1710 01:14:22,450 --> 01:14:26,800 Because if Pedro was our only variable pointing at that chunk of memory, 1711 01:14:26,800 --> 01:14:29,800 this is the danger of using pointers, and dynamic memory allocation, 1712 01:14:29,800 --> 01:14:31,180 and building your own data structures. 1713 01:14:31,180 --> 01:14:33,138 The moment you point temporarily, if you could, 1714 01:14:33,138 --> 01:14:36,490 to Lauren, I have no idea where he's pointing to. 1715 01:14:36,490 --> 01:14:41,260 I have no idea how to get back to Caleb, or Hannah, or anyone else on stage. 
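The ordering hazard acted out on stage can be sketched in C. This is a minimal sketch, not the lecture's own code: the `node` layout follows the number-plus-next description used so far, while the helper name `prepend` and its pointer-to-pointer parameter are assumptions made for illustration.

```c
#include <stdbool.h>
#include <stdlib.h>

// Node layout as described in the lecture: a number and a pointer to the next node.
typedef struct node
{
    int number;
    struct node *next;
} node;

// Hypothetical helper (name and signature are assumptions): insert a new
// node at the front of the list. Returns false if malloc fails, in which
// case the list is left unchanged.
bool prepend(node **list, int number)
{
    node *n = malloc(sizeof(node));
    if (n == NULL)
    {
        return false;
    }
    n->number = number;

    // Step 1: the new node points at whatever the list currently points at.
    n->next = *list;

    // Step 2: only now is it safe to point the list at the new node.
    // Running this line first would overwrite the only pointer to the old
    // nodes and orphan them.
    *list = n;
    return true;
}
```

Called as `prepend(&list, 1)`, the two assignments happen in the safe order: the new node copies the list's pointer before the list is redirected.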
1716 01:14:41,260 --> 01:14:42,040 So that was bad. 1717 01:14:42,040 --> 01:14:43,310 So you did undo it. 1718 01:14:43,310 --> 01:14:44,290 So that's good. 1719 01:14:44,290 --> 01:14:46,300 I think we need Lauren to make a decision first. 1720 01:14:46,300 --> 01:14:47,410 Who should you point at? 1721 01:14:47,410 --> 01:14:47,650 AUDIENCE: Caleb. 1722 01:14:47,650 --> 01:14:48,820 SPEAKER 1: So pointing at Caleb. 1723 01:14:48,820 --> 01:14:49,120 Why? 1724 01:14:49,120 --> 01:14:51,703 Because you're pointing at literally who Pedro is pointing at. 1725 01:14:51,703 --> 01:14:53,490 Pedro, now what are you safe to do? 1726 01:14:53,490 --> 01:14:53,990 Good. 1727 01:14:53,990 --> 01:14:55,730 So order of operations there matters. 1728 01:14:55,730 --> 01:14:59,830 And if we had just done this line of code in red here, list equals n. 1729 01:14:59,830 --> 01:15:02,740 That was like Pedro's first instinct, bad things happen. 1730 01:15:02,740 --> 01:15:04,700 And we orphaned the rest of the list. 1731 01:15:04,700 --> 01:15:08,350 But if we think through it logically and do this, as Lauren did for us, instead, 1732 01:15:08,350 --> 01:15:11,840 we've now updated the list to look a little something more like this. 1733 01:15:11,840 --> 01:15:12,910 Let's do one last one. 1734 01:15:12,910 --> 01:15:15,485 We got one more foam finger here for the number 3. 1735 01:15:15,485 --> 01:15:16,360 How about on the end? 1736 01:15:16,360 --> 01:15:16,860 Yeah. 1737 01:15:16,860 --> 01:15:18,190 You want to come down. 1738 01:15:18,190 --> 01:15:18,850 All right. 1739 01:15:18,850 --> 01:15:19,900 One final volunteer. 1740 01:15:19,900 --> 01:15:26,010 [AUDIENCE CLAPPING] 1741 01:15:26,010 --> 01:15:26,510 All right. 1742 01:15:26,510 --> 01:15:27,385 And what's your name? 1743 01:15:27,385 --> 01:15:28,230 AUDIENCE: Miriam. 1744 01:15:28,230 --> 01:15:28,430 SPEAKER 1: I'm sorry? 1745 01:15:28,430 --> 01:15:28,940 AUDIENCE: Miriam. 
1746 01:15:28,940 --> 01:15:29,480 SPEAKER 1: Miriam. 1747 01:15:29,480 --> 01:15:29,750 All right. 1748 01:15:29,750 --> 01:15:30,860 So here is your number 3. 1749 01:15:30,860 --> 01:15:31,735 Here is your pointer. 1750 01:15:31,735 --> 01:15:35,370 If you want to go maybe in the middle of the stage in a random memory location. 1751 01:15:35,370 --> 01:15:39,270 So here, too, the goal is to maintain sorted order. 1752 01:15:39,270 --> 01:15:44,400 So let's ask the audience, who or what number should point at whom first here? 1753 01:15:44,400 --> 01:15:46,910 So we don't screw up and orphan some of the memory. 1754 01:15:46,910 --> 01:15:50,240 And if we do orphan memory, this is what's called, again per last week, 1755 01:15:50,240 --> 01:15:51,110 a memory leak. 1756 01:15:51,110 --> 01:15:53,420 Your Mac, your PC, your phone can start to slow down 1757 01:15:53,420 --> 01:15:56,610 if you keep asking for memory but never give it back or lose track of it. 1758 01:15:56,610 --> 01:15:58,430 So we want to get this right. 1759 01:15:58,430 --> 01:16:00,140 Who should point at whom? 1760 01:16:00,140 --> 01:16:01,370 Or what number? 1761 01:16:01,370 --> 01:16:02,312 Say again. 1762 01:16:02,312 --> 01:16:03,020 AUDIENCE: 3 to 4. 1763 01:16:03,020 --> 01:16:04,700 SPEAKER 1: 3 should point at 4. 1764 01:16:04,700 --> 01:16:08,090 So 3, do you want to point at 4. 1765 01:16:08,090 --> 01:16:09,800 And not, so, OK, good. 1766 01:16:09,800 --> 01:16:14,960 And how did you know, Miriam, whom to point at? 1767 01:16:14,960 --> 01:16:15,998 AUDIENCE: Copying Caleb. 1768 01:16:15,998 --> 01:16:16,790 SPEAKER 1: Perfect. 1769 01:16:16,790 --> 01:16:18,150 OK, so copying Caleb. 1770 01:16:18,150 --> 01:16:18,650 Why? 1771 01:16:18,650 --> 01:16:22,220 Because if you look at where this list is currently constructed, 1772 01:16:22,220 --> 01:16:25,070 and you can cheat on the board here, 2 is pointing to 4. 
1773 01:16:25,070 --> 01:16:28,640 If you point at whoever Caleb, number 2, is pointing at, 1774 01:16:28,640 --> 01:16:31,460 that, indeed, leads you to Hannah for number 4. 1775 01:16:31,460 --> 01:16:35,600 So now what's the next step to stitch this together? 1776 01:16:35,600 --> 01:16:37,220 Our voice in the crowd. 1777 01:16:37,220 --> 01:16:38,150 AUDIENCE: 2 to 3. 1778 01:16:38,150 --> 01:16:39,260 SPEAKER 1: 2 to 3. 1779 01:16:39,260 --> 01:16:40,310 So, 2 to 3. 1780 01:16:40,310 --> 01:16:42,903 So Caleb, I think it's now safe for you to decouple. 1781 01:16:42,903 --> 01:16:44,820 Because someone is already pointing at Hannah. 1782 01:16:44,820 --> 01:16:45,945 We haven't orphaned anyone. 1783 01:16:45,945 --> 01:16:47,840 So now, if we follow the breadcrumbs, we've 1784 01:16:47,840 --> 01:16:52,870 got Pedro leading to 1, to 2, to 3, to 4, to 5. 1785 01:16:52,870 --> 01:16:55,370 We need the numbers back, but you can keep the foam fingers. 1786 01:16:55,370 --> 01:16:57,537 Thank you to our volunteers here. 1787 01:16:57,537 --> 01:16:58,370 AUDIENCE: Thank you. 1788 01:16:58,370 --> 01:16:58,870 Thank you. 1789 01:16:58,870 --> 01:17:00,260 [AUDIENCE CLAPPING] 1790 01:17:00,260 --> 01:17:03,257 SPEAKER 1: You can just put the numbers here. 1791 01:17:03,257 --> 01:17:04,090 AUDIENCE: Thank you. 1792 01:17:04,090 --> 01:17:05,257 SPEAKER 1: Thank you to all. 1793 01:17:05,257 --> 01:17:09,200 So this is only to say that when you start looking at the code this week 1794 01:17:09,200 --> 01:17:11,763 and in the problem set, it's going to be very easy to lose 1795 01:17:11,763 --> 01:17:13,180 sight of the forest for the trees. 1796 01:17:13,180 --> 01:17:15,220 Because the code does get really dense. 1797 01:17:15,220 --> 01:17:20,240 But the ideas, again, really do bubble up to these higher-level descriptions. 1798 01:17:20,240 --> 01:17:23,300 And if you think about data structures at this level.
1799 01:17:23,300 --> 01:17:25,417 If you go off and program after a class like CS50 1800 01:17:25,417 --> 01:17:28,000 and you're whiteboarding something with a friend or a colleague, 1801 01:17:28,000 --> 01:17:31,030 most people think at and talk at this level. 1802 01:17:31,030 --> 01:17:33,550 And they just assume that, yeah, if we went back and looked 1803 01:17:33,550 --> 01:17:36,890 at our textbooks or class notes, we could figure out how to implement this. 1804 01:17:36,890 --> 01:17:38,740 But the important stuff is the conversation. 1805 01:17:38,740 --> 01:17:40,120 And the ideas up here. 1806 01:17:40,120 --> 01:17:45,080 Even though, this week, we will get some practice with the actual code. 1807 01:17:45,080 --> 01:17:49,090 So when it comes to analyzing an algorithm like this, 1808 01:17:49,090 --> 01:17:51,160 let's consider the following. 1809 01:17:51,160 --> 01:17:58,480 What might be now the running time of operations like searching and inserting 1810 01:17:58,480 --> 01:18:00,100 into a linked list? 1811 01:18:00,100 --> 01:18:01,810 We talked about arrays earlier. 1812 01:18:01,810 --> 01:18:04,810 And we had some binary search possibilities still, as long 1813 01:18:04,810 --> 01:18:05,650 as it's an array. 1814 01:18:05,650 --> 01:18:08,830 But as soon as we have a linked list, these arrows, like our volunteers, 1815 01:18:08,830 --> 01:18:10,180 could be anywhere on stage. 1816 01:18:10,180 --> 01:18:11,888 And so you can't just assume that you can 1817 01:18:11,888 --> 01:18:14,680 jump arithmetically to the middle element, to the middle element, 1818 01:18:14,680 --> 01:18:15,500 to the middle one. 1819 01:18:15,500 --> 01:18:19,090 You pretty much have to follow all of these breadcrumbs again and again. 1820 01:18:19,090 --> 01:18:21,880 So how might that inform what we see? 1821 01:18:21,880 --> 01:18:23,595 Well, consider this too.
1822 01:18:23,595 --> 01:18:26,470 Even though I keep drawing all these pictures with all of the numbers 1823 01:18:26,470 --> 01:18:26,980 exposed. 1824 01:18:26,980 --> 01:18:28,772 And all of us humans in the room can easily 1825 01:18:28,772 --> 01:18:32,360 spot where the 1 is, where the 2 is, where the 3 is, the computer, again, 1826 01:18:32,360 --> 01:18:36,610 just like with our lockers and arrays, can only see one location at a time. 1827 01:18:36,610 --> 01:18:40,510 And the key thing with a linked list is that the only address 1828 01:18:40,510 --> 01:18:44,410 we've fundamentally been remembering is what Pedro represented a moment ago. 1829 01:18:44,410 --> 01:18:47,990 He was the link to all of the other nodes. 1830 01:18:47,990 --> 01:18:49,990 And, in turn, each person led to the next. 1831 01:18:49,990 --> 01:18:54,650 But without Pedro, we would have lost some of, or all of, the linked list. 1832 01:18:54,650 --> 01:18:56,950 So when you start with a linked list, if you 1833 01:18:56,950 --> 01:19:00,730 want to find an element via search, you have to do it linearly. 1834 01:19:00,730 --> 01:19:02,200 Following all of the arrows. 1835 01:19:02,200 --> 01:19:04,210 Following all of the pointers on the stage 1836 01:19:04,210 --> 01:19:06,340 in order to get to the node in question. 1837 01:19:06,340 --> 01:19:09,700 And only once you hit null can you conclude, yep, it was there. 1838 01:19:09,700 --> 01:19:11,500 Or no, it was not. 1839 01:19:11,500 --> 01:19:14,440 So given that a computer, essentially, 1840 01:19:14,440 --> 01:19:18,970 can only see the number 1, or the number 2, or the number 3, or the number 4, 1841 01:19:18,970 --> 01:19:22,270 or the number 5, one at a time, how might we 1842 01:19:22,270 --> 01:19:25,690 think about the running time of search? 1843 01:19:25,690 --> 01:19:27,610 And it is indeed Big O of n. 1844 01:19:27,610 --> 01:19:28,410 But why is that?
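The linear walk described above, following each pointer until the value is found or null is reached, might look like this in C. It is a sketch only; the function name `search` and its signature are assumptions rather than code shown in the lecture.

```c
#include <stdbool.h>
#include <stddef.h>

// Node layout as described in the lecture: a number and a pointer to the next node.
typedef struct node
{
    int number;
    struct node *next;
} node;

// Hypothetical helper: report whether number appears anywhere in the list.
// The loop looks at one node at a time, so in the worst case it visits all
// n nodes before reaching NULL, which is why search is in O(n).
bool search(const node *list, int number)
{
    for (const node *ptr = list; ptr != NULL; ptr = ptr->next)
    {
        if (ptr->number == number)
        {
            return true;
        }
    }
    return false;
}
```

There is no way to start in the middle: the only address available is the head of the list, so every search begins there.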
1845 01:19:28,410 --> 01:19:30,910 Well, in the worst case, the number you might be looking for 1846 01:19:30,910 --> 01:19:32,480 is all the way at the end. 1847 01:19:32,480 --> 01:19:35,710 And so, obviously, you're going to have to search all of the n elements. 1848 01:19:35,710 --> 01:19:37,943 And I drew these things with boxes on top of them. 1849 01:19:37,943 --> 01:19:40,360 Because, again, even though you and I can immediately see, 1850 01:19:40,360 --> 01:19:42,610 where the 5 is for instance, the computer 1851 01:19:42,610 --> 01:19:46,480 can only figure that out by starting at the beginning and going there. 1852 01:19:46,480 --> 01:19:48,400 So there, too, is another trade off. 1853 01:19:48,400 --> 01:19:52,030 It would seem that, overnight, we have lost the ability 1854 01:19:52,030 --> 01:19:57,190 to do a very powerful algorithm from week 0 known as binary search, right. 1855 01:19:57,190 --> 01:19:57,820 It's gone. 1856 01:19:57,820 --> 01:20:01,810 Because there's no way in this picture to jump mathematically 1857 01:20:01,810 --> 01:20:04,375 to the middle node, unless you remember where it is. 1858 01:20:04,375 --> 01:20:06,250 And then, remember where every other node is. 1859 01:20:06,250 --> 01:20:08,042 And at that point, you're back to an array. 1860 01:20:08,042 --> 01:20:12,380 Linked list, by design, only remember the next node in the list. 1861 01:20:12,380 --> 01:20:12,880 All right. 1862 01:20:12,880 --> 01:20:15,370 How about something like insert? 1863 01:20:15,370 --> 01:20:18,190 In the worst case, perhaps, how many steps 1864 01:20:18,190 --> 01:20:21,340 might it take to insert something into a linked list? 1865 01:20:21,340 --> 01:20:22,998 Someone else. 1866 01:20:22,998 --> 01:20:23,540 Someone else. 1867 01:20:23,540 --> 01:20:24,040 Yeah. 1868 01:20:24,040 --> 01:20:25,060 AUDIENCE: N squared. 1869 01:20:25,060 --> 01:20:25,480 SPEAKER 1: Say again? 1870 01:20:25,480 --> 01:20:26,320 AUDIENCE: N squared. 
1871 01:20:26,320 --> 01:20:26,890 SPEAKER 1: N squared. 1872 01:20:26,890 --> 01:20:28,232 Fortunately, it's not that bad. 1873 01:20:28,232 --> 01:20:29,440 It's not as bad as n squared. 1874 01:20:29,440 --> 01:20:31,720 That typically means doing n things, n times. 1875 01:20:31,720 --> 01:20:36,260 And I think we can stay under that, but not a bad thought. 1876 01:20:36,260 --> 01:20:36,760 Yeah. 1877 01:20:36,760 --> 01:20:37,832 AUDIENCE: Is it n? 1878 01:20:37,832 --> 01:20:39,040 SPEAKER 1: Why would it be n? 1879 01:20:39,040 --> 01:20:42,787 AUDIENCE: Because the [INAUDIBLE]. 1880 01:20:42,787 --> 01:20:43,370 SPEAKER 1: OK. 1881 01:20:43,370 --> 01:20:45,650 So to summarize, you're proposing n. 1882 01:20:45,650 --> 01:20:47,513 Because to find where the thing goes, you 1883 01:20:47,513 --> 01:20:49,430 have to traverse, potentially, the whole list. 1884 01:20:49,430 --> 01:20:52,220 Because if I'm inserting the number 6 or the number 99, 1885 01:20:52,220 --> 01:20:54,770 that numerically belongs at the very end, 1886 01:20:54,770 --> 01:20:57,830 I can only find its location by looking at all of them. 1887 01:20:57,830 --> 01:20:59,368 At this point, though, in the term. 1888 01:20:59,368 --> 01:21:01,160 And really, at this point in the story, you 1889 01:21:01,160 --> 01:21:04,590 should start to question these very simplistic questions, to be honest. 1890 01:21:04,590 --> 01:21:08,360 Because the answer is almost always going to depend, right. 1891 01:21:08,360 --> 01:21:10,980 If I've just got a linked list that looks like this, 1892 01:21:10,980 --> 01:21:14,240 the first question back to someone asking this question 1893 01:21:14,240 --> 01:21:17,300 would be, well does the list need to be sorted, right? 1894 01:21:17,300 --> 01:21:19,692 I've drawn it as sorted and it might imply as much. 1895 01:21:19,692 --> 01:21:21,650 So that's a reasonable assumption to have made.
1896 01:21:21,650 --> 01:21:24,320 But if I don't care about maintaining sorted order, 1897 01:21:24,320 --> 01:21:28,190 I could actually insert into a linked list in constant time. 1898 01:21:28,190 --> 01:21:28,730 Why? 1899 01:21:28,730 --> 01:21:31,628 I could just keep inserting into the beginning, into the beginning, 1900 01:21:31,628 --> 01:21:32,420 into the beginning. 1901 01:21:32,420 --> 01:21:34,310 And even though the list is getting longer, 1902 01:21:34,310 --> 01:21:38,270 the number of steps required to insert something between the first element 1903 01:21:38,270 --> 01:21:40,220 is not growing at all. 1904 01:21:40,220 --> 01:21:42,740 You just keep inserting. 1905 01:21:42,740 --> 01:21:44,900 If you want to keep it sorted though, yes, it's 1906 01:21:44,900 --> 01:21:46,310 going to be, indeed, Big O of n. 1907 01:21:46,310 --> 01:21:47,840 But again, these kinds of, now, assumptions 1908 01:21:47,840 --> 01:21:49,048 are going to start to matter. 1909 01:21:49,048 --> 01:21:51,740 So let's for the sake of discussion say it's Big O of n, 1910 01:21:51,740 --> 01:21:53,660 if we do want to maintain sorted order. 1911 01:21:53,660 --> 01:21:56,810 But what about in the case of not caring. 1912 01:21:56,810 --> 01:21:58,628 It might indeed be a Big O of 1. 1913 01:21:58,628 --> 01:22:01,670 And now these are the kinds of decisions that will start to leave to you. 1914 01:22:01,670 --> 01:22:03,200 What about in the best case here? 1915 01:22:03,200 --> 01:22:05,240 If we're thinking about Big Omega notation, 1916 01:22:05,240 --> 01:22:07,632 then, frankly, we could just get lucky in the best case. 1917 01:22:07,632 --> 01:22:10,340 And the element we're looking for happens to be at the beginning. 1918 01:22:10,340 --> 01:22:14,570 Or heck, we just blindly insert to the beginning irrespective of the order 1919 01:22:14,570 --> 01:22:16,500 that we want to keep things in. 1920 01:22:16,500 --> 01:22:17,000 All right. 
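To make the two cases concrete, here is a hedged sketch of the sorted insert, the Big O of n case. The name `insert_sorted` and its signature are assumptions for illustration. The front-of-list branch is the same constant-time work as blindly inserting at the beginning; the loop is what costs up to n steps.

```c
#include <stdbool.h>
#include <stdlib.h>

// Node layout as described in the lecture: a number and a pointer to the next node.
typedef struct node
{
    int number;
    struct node *next;
} node;

// Hypothetical helper: insert number so the list stays in ascending order.
// Returns false if malloc fails, leaving the list unchanged.
bool insert_sorted(node **list, int number)
{
    node *n = malloc(sizeof(node));
    if (n == NULL)
    {
        return false;
    }
    n->number = number;
    n->next = NULL;

    // Front of the list (or empty list): constant time, no traversal needed.
    if (*list == NULL || number < (*list)->number)
    {
        n->next = *list;
        *list = n;
        return true;
    }

    // Otherwise walk to the last node whose successor is missing or not
    // smaller than number. In the worst case this visits every node.
    node *ptr = *list;
    while (ptr->next != NULL && ptr->next->number < number)
    {
        ptr = ptr->next;
    }

    // Same order of operations as on stage: the new node points at the
    // successor first (3 points at 4), then the predecessor is redirected
    // (2 points at 3).
    n->next = ptr->next;
    ptr->next = n;
    return true;
}
```

Inserting 2, 4, 5, 1, 3 in that order, as the volunteers did, leaves the list reading 1, 2, 3, 4, 5 from the head.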
1921 01:22:17,000 --> 01:22:22,418 So besides then, how can we improve further on this design? 1922 01:22:22,418 --> 01:22:23,960 We don't need to stop at linked lists. 1923 01:22:23,960 --> 01:22:26,090 Because, honestly, it's not been a clear win. 1924 01:22:26,090 --> 01:22:28,940 Like, linked lists allow us to use more of our memory 1925 01:22:28,940 --> 01:22:32,430 because we don't need massive growing chunks of contiguous memory. 1926 01:22:32,430 --> 01:22:33,300 So that's a win. 1927 01:22:33,300 --> 01:22:37,310 But they still require Big O of n time to find the end of it, 1928 01:22:37,310 --> 01:22:38,630 if we care about order. 1929 01:22:38,630 --> 01:22:41,870 We're using at least twice as much memory for the darn pointer. 1930 01:22:41,870 --> 01:22:44,120 So that seems like a sidestep. 1931 01:22:44,120 --> 01:22:46,100 It's not really a step forward. 1932 01:22:46,100 --> 01:22:47,840 So can we do better? 1933 01:22:47,840 --> 01:22:52,157 Here's where we can now accelerate the story by just stipulating that, hey, 1934 01:22:52,157 --> 01:22:53,990 even if you haven't used this technique yet, 1935 01:22:53,990 --> 01:22:58,130 we would seem to have an ability to stitch together pieces of memory just 1936 01:22:58,130 --> 01:22:59,120 using pointers. 1937 01:22:59,120 --> 01:23:01,520 And anything you could imagine drawing with arrows, 1938 01:23:01,520 --> 01:23:04,140 you can implement, it would seem, in code. 1939 01:23:04,140 --> 01:23:06,620 So what if we leverage a second dimension. 1940 01:23:06,620 --> 01:23:09,137 Instead of just stringing together things laterally, 1941 01:23:09,137 --> 01:23:10,970 left to right, essentially, even though they 1942 01:23:10,970 --> 01:23:12,620 were bouncing around on the screen. 1943 01:23:12,620 --> 01:23:15,770 What if we start to leverage a second dimension here, so to speak. 1944 01:23:15,770 --> 01:23:19,400 And build more interesting structures in the computer's memory.
1945 01:23:19,400 --> 01:23:22,190 Well it turns out that in a computer's memory, 1946 01:23:22,190 --> 01:23:25,130 we could create a tree, similar to a family tree. 1947 01:23:25,130 --> 01:23:28,880 If you've ever seen or drawn a family tree with grandparents, and parents, 1948 01:23:28,880 --> 01:23:30,170 and siblings, and so forth. 1949 01:23:32,960 --> 01:23:36,170 So an inverted branch of a tree that grows, typically 1950 01:23:36,170 --> 01:23:39,050 when it's drawn, downward instead of upward like a typical tree. 1951 01:23:39,050 --> 01:23:41,540 But that's something we could translate into code as well. 1952 01:23:41,540 --> 01:23:45,240 Specifically, let's do something called a binary search tree. 1953 01:23:45,240 --> 01:23:47,120 Which is a type of tree. 1954 01:23:47,120 --> 01:23:49,670 And what I mean by this is the following. 1955 01:23:49,670 --> 01:23:50,480 Notice this. 1956 01:23:50,480 --> 01:23:53,360 This is an example of an array from like week 2, 1957 01:23:53,360 --> 01:23:54,750 when we first talked about those. 1958 01:23:54,750 --> 01:23:56,450 And we had the lockers on stage. 1959 01:23:56,450 --> 01:24:02,480 And recall that what was nice about an array, if 1, it's sorted. 1960 01:24:02,480 --> 01:24:05,540 And 2, all of its numbers are indeed contiguous, 1961 01:24:05,540 --> 01:24:07,530 which is by definition an array. 1962 01:24:07,530 --> 01:24:09,270 We can just do some simple math. 1963 01:24:09,270 --> 01:24:13,980 For instance, if there are 7 elements in this array, and we do 7 divided by 2, 1964 01:24:13,980 --> 01:24:14,480 that's what? 1965 01:24:14,480 --> 01:24:17,330 3 and 1/2, round down through truncation, that's 3. 1966 01:24:17,330 --> 01:24:18,680 0, 1, 2, 3. 1967 01:24:18,680 --> 01:24:21,933 That gives me the middle element, arithmetically, in this thing.
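The middle-index arithmetic just described is what binary search on a sorted array relies on. Below is a minimal sketch in C; the function name `binary_search` and its parameters are assumptions for illustration. With 7 elements the first middle index is 3, matching the 7 divided by 2 truncation above.

```c
#include <stdbool.h>

// Hypothetical helper: report whether target appears in a sorted array of
// n ints. Each pass halves the remaining range, so the loop runs about
// log base 2 of n times.
bool binary_search(const int numbers[], int n, int target)
{
    int low = 0;
    int high = n - 1;
    while (low <= high)
    {
        // Integer division truncates: with low 0 and high 6 this is index 3.
        int middle = low + (high - low) / 2;
        if (numbers[middle] == target)
        {
            return true;
        }
        else if (numbers[middle] < target)
        {
            low = middle + 1;  // keep only the right half
        }
        else
        {
            high = middle - 1; // keep only the left half
        }
    }
    return false;
}
```

This only works because the elements are contiguous: index arithmetic lands directly on the middle element without following any pointers.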
1968 01:24:21,933 --> 01:24:24,350 And even though I have to be careful about rounding, using 1969 01:24:24,350 --> 01:24:28,430 simple arithmetic, I can very quickly, with a single line of code or math, 1970 01:24:28,430 --> 01:24:30,890 find for you the middle of the left half, of the left half, 1971 01:24:30,890 --> 01:24:32,182 of the right half, or whatever. 1972 01:24:32,182 --> 01:24:33,480 That's the power of arrays. 1973 01:24:33,480 --> 01:24:35,420 And that's what gave us binary search. 1974 01:24:35,420 --> 01:24:36,940 And how did binary search work? 1975 01:24:36,940 --> 01:24:38,190 Well, we looked at the middle. 1976 01:24:38,190 --> 01:24:39,830 And then, we went left or right. 1977 01:24:39,830 --> 01:24:45,080 And then, we went left or right again, implied by this color scheme here. 1978 01:24:45,080 --> 01:24:50,210 Wouldn't it be nice if we somehow preserved the new upsides 1979 01:24:50,210 --> 01:24:53,038 today of dynamic memory allocation, giving ourselves 1980 01:24:53,038 --> 01:24:55,580 the ability to just add another element, add another element, 1981 01:24:55,580 --> 01:24:56,750 add another element. 1982 01:24:56,750 --> 01:24:59,300 But retain the power of binary search. 1983 01:24:59,300 --> 01:25:04,100 Because log of n was much better than n, certainly for large data sets, right. 1984 01:25:04,100 --> 01:25:06,980 Even the phone book demonstrated as much weeks ago. 1985 01:25:06,980 --> 01:25:11,010 So what if I draw this same picture in 2 dimensions. 1986 01:25:11,010 --> 01:25:14,960 And I preserve the color scheme, just so it's obvious what came where. 1987 01:25:14,960 --> 01:25:18,500 What do these things look like now? 1988 01:25:18,500 --> 01:25:21,050 Maybe, like, things we might now call nodes, right. 1989 01:25:21,050 --> 01:25:25,030 A node is just a generic term for like, storing some data. 1990 01:25:25,030 --> 01:25:28,200 What if the data these nodes are storing are numbers.
1991 01:25:28,200 --> 01:25:29,730 So still integers. 1992 01:25:29,730 --> 01:25:33,860 But what if we connected these cleverly, like an old family tree. 1993 01:25:33,860 --> 01:25:39,230 Whereby, every node has not one pointer now, but as many as 2. 1994 01:25:39,230 --> 01:25:42,330 Maybe 0, like the leaves at the bottom in green. 1995 01:25:42,330 --> 01:25:45,450 But other nodes on the interior might have as many as 2. 1996 01:25:45,450 --> 01:25:47,250 Like having 2 children, so to speak. 1997 01:25:47,250 --> 01:25:49,420 And indeed, the vernacular here is exactly that. 1998 01:25:49,420 --> 01:25:51,330 This would be called the root of the tree. 1999 01:25:51,330 --> 01:25:54,270 Or this would be a parent, with respect to these children. 2000 01:25:54,270 --> 01:25:56,910 The green ones would be grandchildren, with respect to these. 2001 01:25:56,910 --> 01:26:01,530 The green ones would be siblings with respect to each other. 2002 01:26:01,530 --> 01:26:02,370 And over there, too. 2003 01:26:02,370 --> 01:26:04,662 So all the same jargon you might use in the real world, 2004 01:26:04,662 --> 01:26:07,920 applies in the world of data structures and CS trees. 2005 01:26:07,920 --> 01:26:12,810 But this is interesting because I think we could build this now, this data 2006 01:26:12,810 --> 01:26:15,300 structure in the computer's memory. 2007 01:26:15,300 --> 01:26:15,840 How? 2008 01:26:15,840 --> 01:26:20,040 Well, suppose that we defined a node to be no longer just 2009 01:26:20,040 --> 01:26:22,110 this, a number and a next field. 2010 01:26:22,110 --> 01:26:24,870 What if we give ourselves a bit more room here? 2011 01:26:24,870 --> 01:26:29,730 And give ourselves a pointer called left and another one called right. 2012 01:26:29,730 --> 01:26:32,080 Each of which is a pointer to a struct node.
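As a sketch of the definition just described, a tree node in C might be declared as follows. The field names `number`, `left`, and `right` follow the lecture's wording; the helper `make_node` is a hypothetical addition for illustration.

```c
#include <stdlib.h>

// A tree node: one number plus two pointers, each to another struct node.
typedef struct node
{
    int number;
    struct node *left;
    struct node *right;
} node;

// Hypothetical helper (not from the lecture): allocate a node with no
// children. Returns NULL if malloc fails.
node *make_node(int number)
{
    node *n = malloc(sizeof(node));
    if (n == NULL)
    {
        return NULL;
    }
    n->number = number;
    n->left = NULL;   // no left child yet
    n->right = NULL;  // no right child yet
    return n;
}
```

A leaf is simply a node whose `left` and `right` are both `NULL`, and the whole tree can be held by a single `node *` pointing at the root.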
2013 01:26:32,080 --> 01:26:36,030 So same idea as before, but now we just make sure we think of these things 2014 01:26:36,030 --> 01:26:39,210 as pointing this way and this way, not just this way. 2015 01:26:39,210 --> 01:26:41,280 Not just a single direction, but 2. 2016 01:26:41,280 --> 01:26:45,180 So you could imagine, in code, building something up like this with a node. 2017 01:26:45,180 --> 01:26:48,570 That creates, in essence, this diagram here. 2018 01:26:48,570 --> 01:26:50,250 But why is this compelling? 2019 01:26:50,250 --> 01:26:52,290 Suppose I want to find the number 3. 2020 01:26:52,290 --> 01:26:54,840 I want to search for the number 3 in this tree. 2021 01:26:54,840 --> 01:26:58,200 It would seem, just like Pedro was the beginning of our linked list, 2022 01:26:58,200 --> 01:27:01,090 in the world of trees, the root, so to speak, 2023 01:27:01,090 --> 01:27:03,090 is the beginning of your data structure. 2024 01:27:03,090 --> 01:27:08,730 You can retain and remember this entire tree just by pointing at the root node, 2025 01:27:08,730 --> 01:27:09,270 ultimately. 2026 01:27:09,270 --> 01:27:12,330 One variable can hang on to this whole tree. 2027 01:27:12,330 --> 01:27:14,520 So how can I find the number 3? 2028 01:27:14,520 --> 01:27:18,660 Well, if I look at the root node and the number I'm looking for is less than. 2029 01:27:18,660 --> 01:27:20,250 Notice, I can go this way. 2030 01:27:20,250 --> 01:27:22,570 Or if it's greater than, I can go this way. 2031 01:27:22,570 --> 01:27:24,750 So I preserve that property of the phone book, 2032 01:27:24,750 --> 01:27:27,000 or just a sorted array in general. 2033 01:27:27,000 --> 01:27:28,320 What's true over here? 2034 01:27:28,320 --> 01:27:31,328 If I'm looking for 3, I can go to the right of the 2 2035 01:27:31,328 --> 01:27:33,120 because that number is going to be greater. 2036 01:27:33,120 --> 01:27:35,680 If I go left, it's going to be smaller instead.
2037 01:27:35,680 --> 01:27:38,430 And here's actually an example of recursion. 2038 01:27:38,430 --> 01:27:42,090 Recursion in a physical sense much like Mario's pyramid. 2039 01:27:42,090 --> 01:27:44,250 Which was recursively defined. 2040 01:27:44,250 --> 01:27:45,300 Notice this. 2041 01:27:45,300 --> 01:27:47,250 I claim this whole thing is a tree. 2042 01:27:47,250 --> 01:27:50,790 Specifically, a binary search tree, which means every node 2043 01:27:50,790 --> 01:27:53,880 has 2, or maybe 1, or maybe 0 children. 2044 01:27:53,880 --> 01:27:55,110 But no more than 2. 2045 01:27:55,110 --> 01:27:56,730 Hence the bi in binary. 2046 01:27:56,730 --> 01:28:02,160 And it's the case that every left child is smaller than the root. 2047 01:28:02,160 --> 01:28:05,130 And every right child is larger than the root. 2048 01:28:05,130 --> 01:28:08,100 That definition certainly works for 2, 4, and 6. 2049 01:28:08,100 --> 01:28:12,930 But it also works recursively for every subtree, or branch of this tree. 2050 01:28:12,930 --> 01:28:14,910 Notice, if you think of this as the root, 2051 01:28:14,910 --> 01:28:16,980 it is indeed bigger than this left child. 2052 01:28:16,980 --> 01:28:19,080 And it's smaller than this right child. 2053 01:28:19,080 --> 01:28:21,600 And if you look even at the leaves, so to speak. 2054 01:28:21,600 --> 01:28:23,010 The grandchildren here. 2055 01:28:23,010 --> 01:28:26,687 This root node is bigger than its left child, if it existed. 2056 01:28:26,687 --> 01:28:28,020 So it's a meaningless statement. 2057 01:28:28,020 --> 01:28:30,210 And it's less than its right child. 2058 01:28:30,210 --> 01:28:33,000 Or it's not greater than, certainly, so that's meaningless too. 2059 01:28:33,000 --> 01:28:36,760 So we haven't violated the definition even for these leaves, as well.
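Because every subtree is itself a binary search tree, the search can be written recursively. The following is a sketch under the assumption that nodes carry `number`, `left`, and `right` fields as described; the function name `search` is illustrative rather than taken from the lecture's files.

```c
#include <stdbool.h>
#include <stddef.h>

// A tree node: one number plus pointers to a left and a right child.
typedef struct node
{
    int number;
    struct node *left;
    struct node *right;
} node;

// Hypothetical helper: report whether number appears in the binary search
// tree rooted at tree.
bool search(const node *tree, int number)
{
    if (tree == NULL)
    {
        // Base case: ran off the bottom of the tree without finding it.
        return false;
    }
    else if (number < tree->number)
    {
        // Smaller values can only be in the left subtree.
        return search(tree->left, number);
    }
    else if (number > tree->number)
    {
        // Larger values can only be in the right subtree.
        return search(tree->right, number);
    }
    else
    {
        return true;
    }
}
```

Each call discards one whole subtree, which is the tree's version of going left or right in the phone book.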
2060 01:28:36,760 --> 01:28:40,230 And so, now, how many steps does it take to find in the worst case 2061 01:28:40,230 --> 01:28:44,580 any number in a binary search tree, it would seem? 2062 01:28:44,580 --> 01:28:46,530 So it seems 2, literally. 2063 01:28:46,530 --> 01:28:48,400 And the height of this thing is actually 3. 2064 01:28:48,400 --> 01:28:51,150 And so long story short, especially, if you're a little less comfy 2065 01:28:51,150 --> 01:28:53,310 with your logarithms from yesteryear. 2066 01:28:53,310 --> 01:28:57,120 Log base 2 is the number of times you can divide something in half, and half, 2067 01:28:57,120 --> 01:28:58,860 and half, until you get down to 1. 2068 01:28:58,860 --> 01:29:01,828 This is like a logarithm in the reverse direction. 2069 01:29:01,828 --> 01:29:03,120 Here's a whole lot of elements. 2070 01:29:03,120 --> 01:29:05,490 And we're halving, we're halving until we get down to 1. 2071 01:29:05,490 --> 01:29:09,643 So the height of this tree, that is to say, is log base 2 of n. 2072 01:29:09,643 --> 01:29:12,810 Which means that even in the worst case, the number you're looking for maybe 2073 01:29:12,810 --> 01:29:14,685 it's all the way at the bottom in the leaves. 2074 01:29:14,685 --> 01:29:15,330 Doesn't matter. 2075 01:29:15,330 --> 01:29:20,220 It's going to take log base 2 of n steps, or log of n steps, 2076 01:29:20,220 --> 01:29:23,830 to find, maximally, any one of those numbers. 2077 01:29:23,830 --> 01:29:28,620 So, again, binary search is back. 2078 01:29:28,620 --> 01:29:30,635 But we've paid a price, right. 2079 01:29:30,635 --> 01:29:32,010 This isn't a linked list anymore. 2080 01:29:32,010 --> 01:29:33,192 It's a tree. 2081 01:29:33,192 --> 01:29:36,150 But we've gained back binary search, which is pretty compelling, right. 2082 01:29:36,150 --> 01:29:38,775 That's where the whole class began, on making that distinction.
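As a worked check on the height claim, here is the arithmetic for the 7-node picture, under the assumption that the tree is perfectly balanced with every level full, as drawn. A tree that is not balanced would not satisfy this relationship.

```latex
\[
n \;=\; \underbrace{1 + 2 + 4 + \cdots + 2^{h-1}}_{h \text{ levels}} \;=\; 2^{h} - 1
\quad\Longrightarrow\quad
h \;=\; \log_2(n + 1),
\qquad
n = 7 \;\Rightarrow\; h = \log_2 8 = 3 .
\]
```

So a search in that tree looks at no more than 3 nodes, which for large n is on the order of log base 2 of n.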
2083 01:29:38,775 --> 01:29:44,020 But what price have we paid to retain binary search in this new world? 2084 01:29:44,020 --> 01:29:44,520 Yeah. 2085 01:29:47,070 --> 01:29:49,050 It's no longer sorted left to right, but this 2086 01:29:49,050 --> 01:29:52,020 is, I claim, sorted according to the binary search tree definition. 2087 01:29:52,020 --> 01:29:56,010 Where, again, left child is smaller than root. 2088 01:29:56,010 --> 01:29:58,440 And right child is greater than root. 2089 01:29:58,440 --> 01:30:01,860 So it is sorted, but it's sorted in a 2-dimensional sense, if you will. 2090 01:30:01,860 --> 01:30:02,910 Not just 1. 2091 01:30:02,910 --> 01:30:05,260 But another price paid? 2092 01:30:05,260 --> 01:30:06,670 AUDIENCE: [INAUDIBLE] nodes now. 2093 01:30:06,670 --> 01:30:07,462 SPEAKER 1: Exactly. 2094 01:30:07,462 --> 01:30:11,830 Every node now needs not one number, but 2, 3 pieces of data. 2095 01:30:11,830 --> 01:30:13,630 A number and now 2 pointers. 2096 01:30:13,630 --> 01:30:15,385 So, again, there's that trade-off again. 2097 01:30:15,385 --> 01:30:17,260 Where, well, if you want to save time, you've 2098 01:30:17,260 --> 01:30:20,080 got to give something. If you start giving space, 2099 01:30:20,080 --> 01:30:22,547 and you start using more space, you can speed up time. 2100 01:30:22,547 --> 01:30:23,380 Like, you've got it. 2101 01:30:23,380 --> 01:30:24,640 There's always a price paid. 2102 01:30:24,640 --> 01:30:30,400 And it's very often in space, or time, or complexity, or developer time, 2103 01:30:30,400 --> 01:30:32,030 the number of bugs you have to solve. 2104 01:30:32,030 --> 01:30:34,060 I mean, all of these are finite resources 2105 01:30:34,060 --> 01:30:35,833 that you have to juggle among. 2106 01:30:35,833 --> 01:30:38,500 So if we consider now the code with which we can implement this, 2107 01:30:38,500 --> 01:30:40,120 here might be the node.
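The node shown on the board at this point is not captured in the transcript, so the following is a sketch of what it plausibly looks like: one number plus two pointers, as just described. The field names `number`, `left`, and `right` follow how the lecture speaks about them, but the exact spelling is an assumption.

```c
#include <stddef.h>

// One node of a binary search tree: a value plus pointers to two children.
// Field names are assumed from how the lecture refers to them.
typedef struct node
{
    int number;          // the value stored in this node
    struct node *left;   // subtree of smaller values, or NULL if there is none
    struct node *right;  // subtree of larger values, or NULL if there is none
} node;
```

Compared with a linked-list node, the only change is a second pointer, which is the extra space being paid for.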
2108 01:30:40,120 --> 01:30:43,070 And how might we actually use something like this? 2109 01:30:43,070 --> 01:30:45,520 Well, let's take a look at, maybe, one final program. 2110 01:30:45,520 --> 01:30:49,640 And see here, before we transition to higher level concepts, ultimately. 2111 01:30:49,640 --> 01:30:54,070 Let me go ahead here and let me just open a program I wrote here in advance. 2112 01:30:54,070 --> 01:30:58,210 So let me, in a moment, copy over file called tree.c. 2113 01:30:58,210 --> 01:31:01,068 Which we'll have on the course's websites. 2114 01:31:01,068 --> 01:31:02,860 And I'll walk you through some of the logic 2115 01:31:02,860 --> 01:31:07,790 here that I've written for tree.c. 2116 01:31:07,790 --> 01:31:08,290 All right. 2117 01:31:08,290 --> 01:31:09,800 So what do we have here first? 2118 01:31:09,800 --> 01:31:14,440 So here is an implementation of a binary search tree for numbers. 2119 01:31:14,440 --> 01:31:18,860 And as before, I've played around and I've inserted the numbers manually. 2120 01:31:18,860 --> 01:31:20,290 So what's going on first? 2121 01:31:20,290 --> 01:31:24,130 Here is my definition of a node for a binary search tree, copied and pasted 2122 01:31:24,130 --> 01:31:27,010 from what I proposed on the board a moment ago. 2123 01:31:27,010 --> 01:31:29,710 Here are 2 prototypes for 2 functions, that I'll 2124 01:31:29,710 --> 01:31:31,780 show you in a moment, that allow me to free 2125 01:31:31,780 --> 01:31:35,170 an entire tree, one node at a time. 2126 01:31:35,170 --> 01:31:37,900 And then, also allow me to print the tree in order. 2127 01:31:37,900 --> 01:31:40,300 So even though they're not sorted left to right, 2128 01:31:40,300 --> 01:31:43,450 I bet if I'm clever about what child I print first, 2129 01:31:43,450 --> 01:31:46,670 I can reconstruct the idea of printing this tree properly. 2130 01:31:46,670 --> 01:31:49,150 So how might I implement a binary search tree? 
2131 01:31:49,150 --> 01:31:50,440 Here's my main function. 2132 01:31:50,440 --> 01:31:53,020 Here is how I might represent a tree of size 0. 2133 01:31:53,020 --> 01:31:55,960 It's just a null pointer called tree. 2134 01:31:55,960 --> 01:31:58,060 Here's how I might add a number to that list. 2135 01:31:58,060 --> 01:32:02,080 So here, for instance, is me mallocing space for a node. 2136 01:32:02,080 --> 01:32:04,210 Storing it in a temporary variable called n. 2137 01:32:04,210 --> 01:32:06,070 Here is me just doing a safety check. 2138 01:32:06,070 --> 01:32:07,780 Make sure n does not equal null. 2139 01:32:07,780 --> 01:32:12,130 And then, here is me initializing this node to contain the number 2, first. 2140 01:32:12,130 --> 01:32:14,860 Then, initializing the left child of that node to be null. 2141 01:32:14,860 --> 01:32:17,510 And the right child of that node to be null. 2142 01:32:17,510 --> 01:32:22,670 And then, initializing the tree itself to be equal to that particular node. 2143 01:32:22,670 --> 01:32:25,840 So at this point in the story, there's just one rectangle on the screen 2144 01:32:25,840 --> 01:32:28,740 containing the number 2 with no children. 2145 01:32:28,740 --> 01:32:29,240 All right. 2146 01:32:29,240 --> 01:32:31,630 Let's just add manually to this a little further. 2147 01:32:31,630 --> 01:32:34,780 Let's add another number to the list, by mallocing another node. 2148 01:32:34,780 --> 01:32:38,140 I don't need to declare n as a node* because it already exists at this 2149 01:32:38,140 --> 01:32:38,780 point. 2150 01:32:38,780 --> 01:32:40,720 Here's a little safety check. 2151 01:32:40,720 --> 01:32:45,280 I'm going to not bother with my, let me do this, free memory here. 2152 01:32:45,280 --> 01:32:47,240 Just to be safe. 2153 01:32:47,240 --> 01:32:49,803 Do I want to do this?
2154 01:32:49,803 --> 01:32:51,970 We want to free memory too, which I've not done here, 2155 01:32:51,970 --> 01:32:53,650 but I'll save that for another time. 2156 01:32:53,650 --> 01:32:55,990 Here, I'm going to initialize the number to 1. 2157 01:32:55,990 --> 01:33:00,100 I'm going to initialize the children of this node to null and null. 2158 01:33:00,100 --> 01:33:01,810 And now, I'm going to do this. 2159 01:33:01,810 --> 01:33:06,280 Initialize the tree's left child to be n. 2160 01:33:06,280 --> 01:33:09,222 So what that's essentially doing here is if this 2161 01:33:09,222 --> 01:33:12,430 is my root node, the single rectangle I described a moment ago that currently 2162 01:33:12,430 --> 01:33:14,530 has no children, neither left nor right. 2163 01:33:14,530 --> 01:33:16,480 Here's my new node with the number 1. 2164 01:33:16,480 --> 01:33:18,620 I want it to become the new left child. 2165 01:33:18,620 --> 01:33:22,150 So that line of code on the screen there, tree left equals n, 2166 01:33:22,150 --> 01:33:26,720 is like stitching these 2 together with a pointer from 2 to the 1. 2167 01:33:26,720 --> 01:33:27,220 All right. 2168 01:33:27,220 --> 01:33:30,100 The next lines of code, you can probably guess, 2169 01:33:30,100 --> 01:33:32,560 are me adding another number to the list. 2170 01:33:32,560 --> 01:33:33,730 Just the number 3. 2171 01:33:33,730 --> 01:33:39,200 So this is a simpler tree with 2, 1, and 3, respectively. 2172 01:33:39,200 --> 01:33:41,710 And this code, let me wave my hands, is almost the same. 2173 01:33:41,710 --> 01:33:45,010 Except for the fact that I'm updating the tree's right child 2174 01:33:45,010 --> 01:33:46,990 to be this new and third node. 2175 01:33:46,990 --> 01:33:50,380 Let's now run the code before looking at those 2 functions. 2176 01:33:50,380 --> 01:33:54,280 Let me do make tree, ./tree. 2177 01:33:54,280 --> 01:33:55,510 And voilà, 1, 2, 3.
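The manual construction walked through above can be sketched as follows. This is not the lecture's tree.c, which does the work inline in main; `make_node` and `build_tree` are hypothetical helper names introduced here so the steps fit in one function, and the failure paths free whatever was already allocated, which the lecture says it skipped.

```c
#include <stdlib.h>

typedef struct node
{
    int number;
    struct node *left;
    struct node *right;
} node;

// Hypothetical helper: allocate one childless node, or return NULL if malloc fails.
static node *make_node(int number)
{
    node *n = malloc(sizeof(node));
    if (n == NULL)
    {
        return NULL;
    }
    n->number = number;
    n->left = NULL;
    n->right = NULL;
    return n;
}

// Hypothetical helper: build the three-node tree by hand, 2 at the root,
// 1 as its left child, 3 as its right child. Returns NULL on failure.
node *build_tree(void)
{
    // A tree of size 0 is just a null pointer; the first node becomes the root
    node *tree = make_node(2);
    if (tree == NULL)
    {
        return NULL;
    }

    // Stitch 1 in as the left child of 2
    node *n = make_node(1);
    if (n == NULL)
    {
        free(tree);
        return NULL;
    }
    tree->left = n;

    // Stitch 3 in as the right child of 2
    n = make_node(3);
    if (n == NULL)
    {
        free(tree->left);
        free(tree);
        return NULL;
    }
    tree->right = n;

    return tree;
}
```

A caller would treat a NULL return as an allocation failure and otherwise own all three nodes, which eventually need to be freed.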
2178 01:33:55,510 --> 01:33:58,930 So it sounds like the data structure is sorted, to your concern earlier. 2179 01:33:58,930 --> 01:34:00,700 But how did I actually print this? 2180 01:34:00,700 --> 01:34:02,590 And then, eventually, free the whole thing? 2181 01:34:02,590 --> 01:34:05,980 Well, let's look first at the definition of print tree. 2182 01:34:05,980 --> 01:34:08,950 And this is where things get interesting. 2183 01:34:08,950 --> 01:34:12,790 Print tree returns nothing so it's a void function. 2184 01:34:12,790 --> 01:34:18,520 But it takes a pointer to a root element as its sole argument, node* root. 2185 01:34:18,520 --> 01:34:19,690 Here's my safety check. 2186 01:34:19,690 --> 01:34:21,790 If root equals equals null, there's obviously 2187 01:34:21,790 --> 01:34:23,110 nothing to print, just return. 2188 01:34:23,110 --> 01:34:24,970 That goes without saying. 2189 01:34:24,970 --> 01:34:27,010 But here's where things get a little magical. 2190 01:34:27,010 --> 01:34:30,280 Otherwise, print your left child. 2191 01:34:30,280 --> 01:34:33,010 Then print your own number. 2192 01:34:33,010 --> 01:34:36,430 Then, print your right child. 2193 01:34:36,430 --> 01:34:41,700 What is this an example of, even though it's not mentioned by name here? 2194 01:34:41,700 --> 01:34:43,320 What programming technique here? 2195 01:34:43,320 --> 01:34:44,250 AUDIENCE: Recursion. 2196 01:34:44,250 --> 01:34:44,917 SPEAKER 1: Yeah. 2197 01:34:44,917 --> 01:34:48,372 So this is actually perhaps the most compelling use of recursion, yet. 2198 01:34:48,372 --> 01:34:50,580 It wasn't really that compelling with the Mario thing 2199 01:34:50,580 --> 01:34:52,710 because we had such an easy implementation with a for loop 2200 01:34:52,710 --> 01:34:53,550 weeks ago. 2201 01:34:53,550 --> 01:34:58,170 But here is a perfect application of recursion, where your data structure 2202 01:34:58,170 --> 01:34:59,910 itself is recursive, right.
2203 01:34:59,910 --> 01:35:02,220 If you take any snip of any branch, it all 2204 01:35:02,220 --> 01:35:04,590 still looks like a tree, just a smaller one. 2205 01:35:04,590 --> 01:35:06,430 That lends itself to recursion. 2206 01:35:06,430 --> 01:35:11,010 So here is this leap of faith where I say, print my left tree, or my left sub 2207 01:35:11,010 --> 01:35:13,830 tree, if you will, via my child at the left. 2208 01:35:13,830 --> 01:35:17,130 Then, I'll print my own root node here in the middle. 2209 01:35:17,130 --> 01:35:19,740 Then, go ahead and print my right sub tree. 2210 01:35:19,740 --> 01:35:24,180 And because we have this base case that makes sure that if the root is null, 2211 01:35:24,180 --> 01:35:26,967 there's nothing to do, you're not going to recurse infinitely. 2212 01:35:26,967 --> 01:35:29,550 You're not going to call yourself again, and again, and again, 2213 01:35:29,550 --> 01:35:31,210 infinitely, many times. 2214 01:35:31,210 --> 01:35:35,400 So it works out and prints the 1, the 2, and the 3. 2215 01:35:35,400 --> 01:35:36,840 And notice what we could do, too. 2216 01:35:36,840 --> 01:35:40,260 If you wanted to print the tree in reverse order, you could do that. 2217 01:35:40,260 --> 01:35:43,050 Print your right tree first, the greater element. 2218 01:35:43,050 --> 01:35:43,950 Then, yourself. 2219 01:35:43,950 --> 01:35:45,330 Then, your smaller sub tree. 2220 01:35:45,330 --> 01:35:47,970 And if I do make tree here and ./tree, well now, 2221 01:35:47,970 --> 01:35:50,100 I've reversed the order of the list. 2222 01:35:50,100 --> 01:35:51,190 And that's pretty cool. 2223 01:35:51,190 --> 01:35:52,940 You can do it with a for-loop in an array. 2224 01:35:52,940 --> 01:35:56,370 But you can also do it, even with this 2-dimensional structure. 2225 01:35:56,370 --> 01:36:00,180 Let's lastly look at this free tree function. 2226 01:36:00,180 --> 01:36:02,160 And this one's almost the same. 
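The in-order print and its reversed variant, as described above, might look like this. It is a sketch that assumes the node layout sketched earlier; `print_tree` matches the function the lecture names, while `print_tree_reversed` is a hypothetical name for the "right first" variant, which the lecture makes by editing print tree in place.

```c
#include <stdio.h>

typedef struct node
{
    int number;
    struct node *left;
    struct node *right;
} node;

// Print the tree smallest to largest: left subtree, then this node, then right subtree.
void print_tree(node *root)
{
    // Base case: an empty tree has nothing to print
    if (root == NULL)
    {
        return;
    }
    print_tree(root->left);
    printf("%i\n", root->number);
    print_tree(root->right);
}

// Same idea with the two recursive calls swapped, so larger values come first.
void print_tree_reversed(node *root)
{
    if (root == NULL)
    {
        return;
    }
    print_tree_reversed(root->right);
    printf("%i\n", root->number);
    print_tree_reversed(root->left);
}
```

The only difference between the two functions is which child is visited first; the base case is what stops the recursion at the leaves.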
2227 01:36:02,160 --> 01:36:05,400 Order doesn't matter in quite the same way, but it does still matter. 2228 01:36:05,400 --> 01:36:07,020 Here's what I did with free tree. 2229 01:36:07,020 --> 01:36:09,978 Well, if the root of the tree is null, there's obviously nothing to do. 2230 01:36:09,978 --> 01:36:10,560 Just return. 2231 01:36:10,560 --> 01:36:15,100 Otherwise, go ahead and free your left child and all of its descendants. 2232 01:36:15,100 --> 01:36:18,090 Then free your right child and all of its descendants. 2233 01:36:18,090 --> 01:36:19,900 And then, free yourself. 2234 01:36:19,900 --> 01:36:25,690 And again, free literally just frees the address in that variable. 2235 01:36:25,690 --> 01:36:27,570 It doesn't free the whole darn thing. 2236 01:36:27,570 --> 01:36:29,850 It just frees literally what's at that address. 2237 01:36:29,850 --> 01:36:33,900 Why was it important that I did line 72 last, though? 2238 01:36:33,900 --> 01:36:36,450 Why did I free the left child and the right child 2239 01:36:36,450 --> 01:36:39,973 before I freed myself, so to speak? 2240 01:36:39,973 --> 01:36:40,890 AUDIENCE: [INAUDIBLE]. 2241 01:36:40,890 --> 01:36:41,682 SPEAKER 1: Exactly. 2242 01:36:41,682 --> 01:36:46,140 If you free yourself first, if I had done incorrectly this line higher up, 2243 01:36:46,140 --> 01:36:50,820 you're not allowed to touch the left child tree or the right child tree. 2244 01:36:50,820 --> 01:36:53,350 Because the memory address is no longer valid at that point. 2245 01:36:53,350 --> 01:36:55,290 You would get some memory error, perhaps. 2246 01:36:55,290 --> 01:36:56,310 The program would crash. 2247 01:36:56,310 --> 01:36:57,990 Valgrind definitely wouldn't like it. 2248 01:36:57,990 --> 01:37:00,060 Bad things would otherwise happen. 2249 01:37:00,060 --> 01:37:01,890 But here, then, is an example of recursion. 2250 01:37:01,890 --> 01:37:06,360 And again, just a recursive use of an actual data structure. 
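The freeing order described above can be sketched like this, again assuming the node layout from earlier. The point to notice is that the call to `free` on the node itself comes last, after both children have been handled.

```c
#include <stdlib.h>

typedef struct node
{
    int number;
    struct node *left;
    struct node *right;
} node;

// Free every node in the tree: children first, then the node itself.
void free_tree(node *root)
{
    // Base case: nothing to free in an empty tree
    if (root == NULL)
    {
        return;
    }
    free_tree(root->left);   // left child and all of its descendants
    free_tree(root->right);  // right child and all of its descendants
    free(root);              // only now is it safe to give up this node
}
```

If `free(root)` came first, the next two lines would read `root->left` and `root->right` from memory that had already been freed, which is the error discussed above.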
2251 01:37:06,360 --> 01:37:09,120 And what's even cooler here is, relatively speaking, 2252 01:37:09,120 --> 01:37:11,640 suppose we wanted to search something like this. 2253 01:37:11,640 --> 01:37:15,720 Binary search actually gets pretty straightforward to implement, too. 2254 01:37:15,720 --> 01:37:16,410 For instance, 2255 01:37:16,410 --> 01:37:20,940 here might be the prototype for a search function for a binary search tree. 2256 01:37:20,940 --> 01:37:25,920 You give me the root of a tree, and you give me a number I'm looking for, 2257 01:37:25,920 --> 01:37:29,880 and I can pretty easily now return true if it's in there or false if it's not. 2258 01:37:29,880 --> 01:37:30,450 How? 2259 01:37:30,450 --> 01:37:32,430 Well, let's first ask a question. 2260 01:37:32,430 --> 01:37:35,395 If tree equals equals null, then you just return false. 2261 01:37:35,395 --> 01:37:38,520 Because if there's no tree, there's no number, so it's obviously not there. 2262 01:37:38,520 --> 01:37:39,860 Return false. 2263 01:37:39,860 --> 01:37:46,560 Else if the number you're looking for is less than the tree's own number, 2264 01:37:46,560 --> 01:37:48,570 which direction should we go? 2265 01:37:48,570 --> 01:37:49,247 AUDIENCE: Left. 2266 01:37:49,247 --> 01:37:50,080 SPEAKER 1: OK, left. 2267 01:37:50,080 --> 01:37:51,190 How do we express that? 2268 01:37:51,190 --> 01:37:54,300 Well, let's just return the answer to this question. 2269 01:37:54,300 --> 01:37:58,440 Search the left subtree, by way of my left child, 2270 01:37:58,440 --> 01:37:59,970 looking for the same number. 2271 01:37:59,970 --> 01:38:02,250 And you just assume through the beauty of recursion 2272 01:38:02,250 --> 01:38:05,400 that you're kicking the can and let yourself figure it out 2273 01:38:05,400 --> 01:38:06,600 with a smaller problem. 2274 01:38:06,600 --> 01:38:09,060 Just that snipped left tree instead.
2275 01:38:09,060 --> 01:38:13,320 Else if the number you're looking for is greater than the tree's own number, 2276 01:38:13,320 --> 01:38:15,160 go to the right, as you might infer. 2277 01:38:15,160 --> 01:38:18,060 So I can just return the answer to this question. 2278 01:38:18,060 --> 01:38:21,150 Search my right subtree for that same number. 2279 01:38:21,150 --> 01:38:23,020 And there's a fourth and final condition. 2280 01:38:23,020 --> 01:38:26,250 What's the fourth scenario we have to consider, explicitly? 2281 01:38:26,250 --> 01:38:26,760 Yeah. 2282 01:38:26,760 --> 01:38:27,780 AUDIENCE: The number. 2283 01:38:27,780 --> 01:38:29,822 SPEAKER 1: If the number, itself, is right there. 2284 01:38:29,822 --> 01:38:33,480 So else if the number I'm looking for equals the tree's own number, 2285 01:38:33,480 --> 01:38:36,250 then and only then, should you return true. 2286 01:38:36,250 --> 01:38:38,490 And if you're thinking quickly here, there's 2287 01:38:38,490 --> 01:38:42,150 an optimization possible, better design opportunity. 2288 01:38:42,150 --> 01:38:43,650 Think back to even our Scratch days. 2289 01:38:43,650 --> 01:38:45,770 What could we do a little better here? 2290 01:38:45,770 --> 01:38:46,710 You're pointing at it. 2291 01:38:46,710 --> 01:38:47,508 AUDIENCE: Else. 2292 01:38:47,508 --> 01:38:48,300 SPEAKER 1: Exactly. 2293 01:38:48,300 --> 01:38:49,140 An else suffices. 2294 01:38:49,140 --> 01:38:51,682 Because if there's logically only 4 things that could happen, 2295 01:38:51,682 --> 01:38:54,540 you're wasting your time by asking a fourth gratuitous question. 2296 01:38:54,540 --> 01:38:55,860 An else here suffices. 2297 01:38:55,860 --> 01:38:59,500 So here too, more so than the Mario example a few weeks ago, 2298 01:38:59,500 --> 01:39:02,100 there's just this elegance arguably to recursion. 2299 01:39:02,100 --> 01:39:02,850 And that's it. 2300 01:39:02,850 --> 01:39:03,960 This is not pseudocode.
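The four cases walked through above, with the final `else` in place of a fourth comparison, can be written out as below. This is a sketch reconstructed from the spoken description, assuming the node layout from earlier and the parameter names `tree` and `number`.

```c
#include <stdbool.h>
#include <stddef.h>

typedef struct node
{
    int number;
    struct node *left;
    struct node *right;
} node;

// Return true if number is somewhere in the binary search tree rooted at tree.
bool search(node *tree, int number)
{
    // No tree, so the number cannot be in it
    if (tree == NULL)
    {
        return false;
    }
    // Smaller than this node: the answer is whatever the left subtree says
    else if (number < tree->number)
    {
        return search(tree->left, number);
    }
    // Larger than this node: the answer is whatever the right subtree says
    else if (number > tree->number)
    {
        return search(tree->right, number);
    }
    // Neither smaller nor larger, so it must be equal
    else
    {
        return true;
    }
}
```

Each recursive call discards one whole subtree, which is where the halving behavior of binary search comes from when the tree is balanced.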
2301 01:39:03,960 --> 01:39:07,950 This is the code for binary search on a binary search tree. 2302 01:39:07,950 --> 01:39:10,020 And so, recursion tends to work in lockstep 2303 01:39:10,020 --> 01:39:14,700 with these kinds of data structures that have this structure to them 2304 01:39:14,700 --> 01:39:16,180 as we're seeing here. 2305 01:39:16,180 --> 01:39:16,680 All right. 2306 01:39:16,680 --> 01:39:22,360 Any questions, then, on binary search as implemented here with a tree? 2307 01:39:22,360 --> 01:39:23,227 Yeah. 2308 01:39:23,227 --> 01:39:25,175 AUDIENCE: About like third years. 2309 01:39:25,175 --> 01:39:26,149 [INAUDIBLE] 2310 01:39:29,688 --> 01:39:30,730 SPEAKER 1: Good question. 2311 01:39:30,730 --> 01:39:36,690 So when returning a Boolean value, true and false are values that are defined 2312 01:39:36,690 --> 01:39:40,350 in a library called Standard Bool, S-T-D-B-O-O-L dot H. 2313 01:39:40,350 --> 01:39:42,480 With a header file that you can use. 2314 01:39:42,480 --> 01:39:49,258 It is the case that true is, it's not well defined what they are. 2315 01:39:49,258 --> 01:39:50,550 But they would map indeed, yes. 2316 01:39:50,550 --> 01:39:51,960 To 0 and 1, essentially. 2317 01:39:51,960 --> 01:39:54,390 But you should not compare them explicitly to 0 and 1. 2318 01:39:54,390 --> 01:39:57,390 When you're using true and false, you should compare them to each other. 2319 01:39:57,390 --> 01:40:01,375 AUDIENCE: I meant if it's in a code return. 2320 01:40:01,375 --> 01:40:02,250 SPEAKER 1: Oh, sorry. 2321 01:40:02,250 --> 01:40:05,850 So if I am in my own code from earlier, a void function, 2322 01:40:05,850 --> 01:40:08,280 it is totally fine to return. 2323 01:40:08,280 --> 01:40:10,950 You just can't return something explicitly. 2324 01:40:10,950 --> 01:40:12,720 So return just means that's it. 2325 01:40:12,720 --> 01:40:14,280 Quit out of this function. 2326 01:40:14,280 --> 01:40:16,150 You're not actually handing back a value.
2327 01:40:16,150 --> 01:40:19,770 So it's a way of short circuiting the execution. 2328 01:40:19,770 --> 01:40:22,050 If you don't like that, and some people do frown 2329 01:40:22,050 --> 01:40:26,760 upon having code return from functions prematurely, you could invert the logic 2330 01:40:26,760 --> 01:40:28,050 and do something like this. 2331 01:40:28,050 --> 01:40:31,740 If the root does not equal null, do all of these things. 2332 01:40:31,740 --> 01:40:34,020 And then, indent all three of these lines underneath. 2333 01:40:34,020 --> 01:40:35,490 That's perfectly fine too. 2334 01:40:35,490 --> 01:40:37,290 I happen to write it the other way just so 2335 01:40:37,290 --> 01:40:40,990 that there was explicitly a base case that I could point to on the screen. 2336 01:40:40,990 --> 01:40:43,920 Whereas, now, it's implicitly there for us only. 2337 01:40:43,920 --> 01:40:45,790 But a good observation too. 2338 01:40:45,790 --> 01:40:46,290 All right. 2339 01:40:46,290 --> 01:40:49,960 So let's ask the question as before about running time of this. 2340 01:40:49,960 --> 01:40:51,930 It would look like binary search is back. 2341 01:40:51,930 --> 01:40:57,600 And we can now do things in logarithmic time, but we should be careful. 2342 01:40:57,600 --> 01:40:59,940 Is this a binary search tree? 2343 01:40:59,940 --> 01:41:01,660 Just to be clear. 2344 01:41:01,660 --> 01:41:04,380 And again, a binary search tree is a tree 2345 01:41:04,380 --> 01:41:11,118 where the root is greater than its left child and smaller than its right child. 2346 01:41:11,118 --> 01:41:11,910 That's the essence. 2347 01:41:11,910 --> 01:41:13,380 So you're nodding your head. 2348 01:41:13,380 --> 01:41:15,280 You agree? 2349 01:41:15,280 --> 01:41:16,020 I agree. 2350 01:41:16,020 --> 01:41:18,030 So this is a binary search tree. 2351 01:41:18,030 --> 01:41:20,390 Is this a binary search tree? 2352 01:41:20,390 --> 01:41:21,330 [INTERPOSING VOICES] 2353 01:41:21,330 --> 01:41:21,830 OK. 
2354 01:41:21,830 --> 01:41:22,860 I'm hearing yeses. 2355 01:41:22,860 --> 01:41:25,710 Or I'm hearing just my delay changing the vote it would seem. 2356 01:41:25,710 --> 01:41:28,080 So this is one of those trick questions. 2357 01:41:28,080 --> 01:41:30,480 This is a binary search tree because I've not 2358 01:41:30,480 --> 01:41:33,390 violated the definition of what I gave you, right. 2359 01:41:33,390 --> 01:41:39,480 Is there any example of a left child that is greater than its parent? 2360 01:41:39,480 --> 01:41:42,480 Or is there any example of a right child that's smaller than its parent? 2361 01:41:42,480 --> 01:41:44,897 That's just the opposite way of describing the same thing. 2362 01:41:44,897 --> 01:41:47,070 No, this is a binary search tree. 2363 01:41:47,070 --> 01:41:50,210 Unfortunately, it also looks like, albeit at a different axis, what? 2364 01:41:50,210 --> 01:41:51,210 AUDIENCE: A linked list. 2365 01:41:51,210 --> 01:41:51,900 SPEAKER 1: A linked list. 2366 01:41:51,900 --> 01:41:53,970 But you could imagine this happening, right. 2367 01:41:53,970 --> 01:41:56,640 Suppose that I hadn't been as thoughtful as I was earlier 2368 01:41:56,640 --> 01:41:59,970 by inserting 2, And then 1, and then 3. 2369 01:41:59,970 --> 01:42:02,160 Which nicely balanced everything out. 2370 01:42:02,160 --> 01:42:04,860 Suppose that instead, because of what the user is typing in 2371 01:42:04,860 --> 01:42:07,980 or whatever you contrive in your own code, suppose you insert a 1, 2372 01:42:07,980 --> 01:42:10,260 and then a 2, and then a 3. 2373 01:42:10,260 --> 01:42:12,850 Like, you've created a problem for yourself. 2374 01:42:12,850 --> 01:42:16,290 Because if we follow the same logic as before, going left or going right, 2375 01:42:16,290 --> 01:42:21,030 this is how you might implement a binary search tree accidentally 2376 01:42:21,030 --> 01:42:24,750 if you just blindly keep following that definition. 
2377 01:42:24,750 --> 01:42:27,030 I mean, this would be better designed as what? 2378 01:42:27,030 --> 01:42:29,490 If we rotated the whole thing around. 2379 01:42:29,490 --> 01:42:30,870 And that's totally fine. 2380 01:42:30,870 --> 01:42:33,060 And those kinds of trees actually have names. 2381 01:42:33,060 --> 01:42:35,400 There are trees called AVL trees in computer science. 2382 01:42:35,400 --> 01:42:37,050 There are red-black trees in computer science. 2383 01:42:37,050 --> 01:42:39,300 There are other types of trees that, additionally, 2384 01:42:39,300 --> 01:42:42,510 add some logic that tells you when you've got to pivot the thing, 2385 01:42:42,510 --> 01:42:46,238 and rotate it, and snip off the root, and fix things in this way. 2386 01:42:46,238 --> 01:42:48,030 But a binary search tree, in and of itself, 2387 01:42:48,030 --> 01:42:51,670 does not guarantee that it will be balanced, so to speak. 2388 01:42:51,670 --> 01:42:54,240 And so, if you consider the worst case scenario 2389 01:42:54,240 --> 01:42:55,860 of even using a binary search tree. 2390 01:42:55,860 --> 01:42:57,960 If you're not smart about the code you're writing 2391 01:42:57,960 --> 01:43:00,180 and you just blindly follow this definition, 2392 01:43:00,180 --> 01:43:04,290 you might accidentally create a crazy, long and stringy binary search 2393 01:43:04,290 --> 01:43:07,050 tree that essentially looks like a linked list. 2394 01:43:07,050 --> 01:43:09,510 Because you're not even using any of the left children. 2395 01:43:09,510 --> 01:43:12,750 So unfortunately, the literal answer to the question 2396 01:43:12,750 --> 01:43:15,480 here is what's the running time of search? 2397 01:43:15,480 --> 01:43:17,400 Well, hopefully, log n. 2398 01:43:17,400 --> 01:43:19,980 But not if you don't maintain the balance of the tree.
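To make the balance problem concrete, here is a sketch of the "blindly follow the definition" insertion. The lecture does not show insertion code, so `insert` and `height` are hypothetical helpers written for illustration, and this `insert` does no rebalancing at all, unlike an AVL or red-black tree.

```c
#include <stdlib.h>

typedef struct node
{
    int number;
    struct node *left;
    struct node *right;
} node;

// Hypothetical naive insert: walk left or right by the BST rule and attach
// a new leaf. Returns the (possibly new) root of the subtree. Duplicates are
// ignored, and if malloc fails the tree is left unchanged.
node *insert(node *root, int number)
{
    if (root == NULL)
    {
        node *n = malloc(sizeof(node));
        if (n == NULL)
        {
            return NULL;
        }
        n->number = number;
        n->left = NULL;
        n->right = NULL;
        return n;
    }
    if (number < root->number)
    {
        root->left = insert(root->left, number);
    }
    else if (number > root->number)
    {
        root->right = insert(root->right, number);
    }
    return root;
}

// Hypothetical helper: number of nodes on the longest path from root to a leaf.
int height(const node *root)
{
    if (root == NULL)
    {
        return 0;
    }
    int l = height(root->left);
    int r = height(root->right);
    return 1 + (l > r ? l : r);
}
```

Inserting 2, then 1, then 3 gives a tree of height 2, while inserting 1, then 2, then 3 gives height 3 with every node hanging off a right pointer. Same three values, but the second tree is a linked list in disguise, so search in it takes on the order of n steps.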
2399 01:43:19,980 --> 01:43:25,290 Both insert and search could actually devolve into, instead of big O of log n, 2400 01:43:25,290 --> 01:43:26,952 literally, big O of n. 2401 01:43:26,952 --> 01:43:29,160 If you don't somehow take into account, and we're not 2402 01:43:29,160 --> 01:43:30,720 going to do the code for that here. 2403 01:43:30,720 --> 01:43:34,140 It's a higher level thing you might explore down the road. 2404 01:43:34,140 --> 01:43:37,930 It can devolve into something that you might not have intended. 2405 01:43:37,930 --> 01:43:40,022 And so, now that we're talking about 2 dimensions, 2406 01:43:40,022 --> 01:43:41,730 really, the onus is on the programmer 2407 01:43:41,730 --> 01:43:44,490 to consider what kinds of perverse situations might happen. 2408 01:43:44,490 --> 01:43:46,860 Where the thing devolves into a structure 2409 01:43:46,860 --> 01:43:50,350 that you don't actually want it to devolve into. 2410 01:43:50,350 --> 01:43:50,850 All right. 2411 01:43:50,850 --> 01:43:52,360 We've got just a few structures to go. 2412 01:43:52,360 --> 01:43:53,940 Let's go ahead and take one more 5 minute break here. 2413 01:43:53,940 --> 01:43:55,410 When we come back, we'll talk at this level 2414 01:43:55,410 --> 01:43:57,030 about some final applications of this. 2415 01:43:57,030 --> 01:43:58,510 See you in 5. 2416 01:43:58,510 --> 01:44:00,270 All right. 2417 01:44:00,270 --> 01:44:01,860 So we are back. 2418 01:44:01,860 --> 01:44:05,250 And as promised, we'll operate now at this higher level. 2419 01:44:05,250 --> 01:44:08,520 Where if we take for granted that, even though you haven't had an opportunity 2420 01:44:08,520 --> 01:44:11,312 to play with these techniques yet, you have the ability now in code 2421 01:44:11,312 --> 01:44:12,780 to stitch things together. 2422 01:44:12,780 --> 01:44:15,630 Both in one dimension and even 2 dimensions, 2423 01:44:15,630 --> 01:44:17,970 to build things like lists and trees.
2424 01:44:17,970 --> 01:44:19,980 So if we have these building blocks. 2425 01:44:19,980 --> 01:44:22,680 Things like now arrays, and lists, and trees, 2426 01:44:22,680 --> 01:44:26,790 what if we start to amalgamate them such that we build things out 2427 01:44:26,790 --> 01:44:28,900 of multiple data structures? 2428 01:44:28,900 --> 01:44:32,360 Can we start to get some of the best of both worlds by way of, for instance, 2429 01:44:32,360 --> 01:44:33,710 something called a hash table? 2430 01:44:33,710 --> 01:44:37,540 So a hash table is a Swiss army knife of data structures 2431 01:44:37,540 --> 01:44:39,310 in that it's so commonly used. 2432 01:44:39,310 --> 01:44:44,000 Because it allows you to associate keys with values, so to speak. 2433 01:44:44,000 --> 01:44:49,060 So, for instance, it allows you to associate a username with a password. 2434 01:44:49,060 --> 01:44:51,070 Or a name with a number. 2435 01:44:51,070 --> 01:44:53,920 Or anything where you have to take something as input, 2436 01:44:53,920 --> 01:44:56,300 and get as output a corresponding piece of information. 2437 01:44:56,300 --> 01:44:59,210 A hash table is often a data structure of choice. 2438 01:44:59,210 --> 01:45:00,460 And here's what it looks like. 2439 01:45:00,460 --> 01:45:02,800 It actually looks like an array, at first glance. 2440 01:45:02,800 --> 01:45:05,990 But for discussion's sake, I've drawn this array vertically, 2441 01:45:05,990 --> 01:45:06,920 which is totally fine. 2442 01:45:06,920 --> 01:45:08,660 It's still just an array. 2443 01:45:08,660 --> 01:45:13,720 But it allows you, a hash table, to jump to any of these locations randomly. 2444 01:45:13,720 --> 01:45:14,740 That is, instantly. 2445 01:45:14,740 --> 01:45:18,130 So, for instance, there's actually 26 locations in this array. 2446 01:45:18,130 --> 01:45:21,100 Because I want to, for instance, store initially 2447 01:45:21,100 --> 01:45:23,980 names of people, for instance.
2448 01:45:23,980 --> 01:45:26,653 And wouldn't it be nice if the person's name starts with A, 2449 01:45:26,653 --> 01:45:27,820 I have a go to place for it. 2450 01:45:27,820 --> 01:45:28,780 Maybe the first box. 2451 01:45:28,780 --> 01:45:30,863 And if it starts with Z, I put them at the bottom. 2452 01:45:30,863 --> 01:45:33,070 So that I can jump instantly, arithmetically, 2453 01:45:33,070 --> 01:45:35,470 using a little bit of ASCII or Unicode fanciness, 2454 01:45:35,470 --> 01:45:38,540 exactly to the location that they need to go. 2455 01:45:38,540 --> 01:45:40,690 So, for instance, here's our array, 0 indexed. 2456 01:45:40,690 --> 01:45:42,130 0 through 25. 2457 01:45:42,130 --> 01:45:44,500 If I think of this, though, as A through Z, 2458 01:45:44,500 --> 01:45:46,370 I'm going to think of these 26 locations, 2459 01:45:46,370 --> 01:45:49,630 now in the context of a hash table, as what we'll generally call buckets. 2460 01:45:49,630 --> 01:45:52,010 So buckets into which you can put values. 2461 01:45:52,010 --> 01:45:56,380 So, for instance, suppose that we want to insert a value, one name 2462 01:45:56,380 --> 01:45:57,590 into this data structure. 2463 01:45:57,590 --> 01:45:59,260 And that name is say, Albus. 2464 01:45:59,260 --> 01:46:03,980 So Albus starting with A. Albus might go at the very beginning of this list. 2465 01:46:03,980 --> 01:46:04,480 All right. 2466 01:46:04,480 --> 01:46:06,188 And then, we want to insert another name. 2467 01:46:06,188 --> 01:46:07,630 This one happens to be Zacharias. 2468 01:46:07,630 --> 01:46:10,690 Starting with Z, so it goes all the way at the end of this data 2469 01:46:10,690 --> 01:46:12,490 structure in location 25 a.k.a. 2470 01:46:12,490 --> 01:46:13,390 Z. 2471 01:46:13,390 --> 01:46:17,260 And then, maybe a third name like Hermione, and that goes at location H 2472 01:46:17,260 --> 01:46:19,310 according to that position in the alphabet.
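The "jump arithmetically to the right bucket" idea can be sketched as a small function. The lecture has not shown this code at this point, so the name `hash` and the choice to return -1 for anything that does not start with a letter are assumptions made for illustration. It also assumes the default C locale and a character set such as ASCII in which 'A' through 'Z' are contiguous.

```c
#include <ctype.h>
#include <stddef.h>

// Map a name to one of 26 buckets by its first letter: A or a -> 0, ..., Z or z -> 25.
// Returns -1 (an assumption of this sketch) if the name is NULL, empty,
// or does not start with a letter.
int hash(const char *name)
{
    if (name == NULL || !isalpha((unsigned char) name[0]))
    {
        return -1;
    }
    // Subtracting 'A' turns the letter's character code into an index 0..25
    return toupper((unsigned char) name[0]) - 'A';
}
```

With this, Albus lands in bucket 0, Hermione in bucket 7, and Zacharias in bucket 25, and computing the index takes the same amount of work no matter how many names are stored.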
2473 01:46:19,310 --> 01:46:22,060 So this is great because in constant time, 2474 01:46:22,060 --> 01:46:26,020 I can insert and conversely search for any of these names, 2475 01:46:26,020 --> 01:46:27,700 based on the first letter of their name. 2476 01:46:27,700 --> 01:46:30,098 A, or Z, or H, in this case. 2477 01:46:30,098 --> 01:46:32,890 Let's fast forward and assume we put a whole bunch of other names-- 2478 01:46:32,890 --> 01:46:34,900 might look familiar, into this hash table. 2479 01:46:34,900 --> 01:46:39,110 It's great because every name has its own location. 2480 01:46:39,110 --> 01:46:43,480 But if you're thinking of names you don't yet see it on the screen, 2481 01:46:43,480 --> 01:46:45,710 we eventually encounter a problem with this, right. 2482 01:46:45,710 --> 01:46:49,480 When could something go wrong using a hash table like this 2483 01:46:49,480 --> 01:46:52,090 if we wanted to insert even more names? 2484 01:46:52,090 --> 01:46:54,290 What's going to eventually happen? 2485 01:46:54,290 --> 01:46:54,790 Yeah. 2486 01:46:54,790 --> 01:46:56,998 There's already someone with the first letter, right. 2487 01:46:56,998 --> 01:46:59,860 Like I haven't even mentioned Harry, for instance, or Hagrid. 2488 01:46:59,860 --> 01:47:01,750 And yet, Hermione's already using that spot. 2489 01:47:01,750 --> 01:47:04,030 So that invites the question, well, what happens? 2490 01:47:04,030 --> 01:47:07,600 Maybe, if we want to insert Harry next, do we maybe cheat and put him 2491 01:47:07,600 --> 01:47:08,710 at location I? 2492 01:47:08,710 --> 01:47:11,323 But then if there's a location I, where do we put them? 2493 01:47:11,323 --> 01:47:13,990 And it just feels like the situation could very quickly devolve. 2494 01:47:13,990 --> 01:47:16,930 But I've deliberately drawn this data structure, 2495 01:47:16,930 --> 01:47:19,990 that I claim as a hash table, in 2 directions. 2496 01:47:19,990 --> 01:47:22,120 An array vertically, here. 
2497 01:47:22,120 --> 01:47:25,300 But what might this be hinting I'm using horizontally, 2498 01:47:25,300 --> 01:47:28,300 even though I'm drawing the rectangles a little differently from before? 2499 01:47:28,300 --> 01:47:29,092 AUDIENCE: An array. 2500 01:47:29,092 --> 01:47:29,758 SPEAKER 1: Yeah. 2501 01:47:29,758 --> 01:47:31,091 Maybe another array, to be fair. 2502 01:47:31,091 --> 01:47:34,258 But, honestly, arrays are such a pain with the allocating, and reallocating, 2503 01:47:34,258 --> 01:47:34,810 and so forth. 2504 01:47:34,810 --> 01:47:38,600 These look like the beginnings of a linked list, if you will. 2505 01:47:38,600 --> 01:47:42,190 Where the name is where the number used to be, even though I'm drawing it 2506 01:47:42,190 --> 01:47:44,200 horizontally now just for discussion's sake. 2507 01:47:44,200 --> 01:47:47,800 And this seems to be a pointer that isn't pointing anywhere yet. 2508 01:47:47,800 --> 01:47:53,080 But it looks like the array is 26 pointers, some of which are null, 2509 01:47:53,080 --> 01:47:53,920 that is, empty. 2510 01:47:53,920 --> 01:47:56,675 Some of which are pointing at the first node in a linked list. 2511 01:47:56,675 --> 01:47:59,050 So that's really what a hash table might be in your mind. 2512 01:47:59,050 --> 01:48:03,828 An amalgam of an array, whose elements are linked lists. 2513 01:48:03,828 --> 01:48:06,370 And in theory, this gives you the best of both worlds, right. 2514 01:48:06,370 --> 01:48:09,430 You get random access with high probability, right. 2515 01:48:09,430 --> 01:48:12,620 You get to jump immediately to the location you want to put someone. 2516 01:48:12,620 --> 01:48:15,430 But, if you run into this perverse situation where there's someone 2517 01:48:15,430 --> 01:48:16,870 already there, OK, fine. 2518 01:48:16,870 --> 01:48:20,350 It starts to devolve into a linked list, but it's at least 26 2519 01:48:20,350 --> 01:48:21,580 smaller linked lists.
2520 01:48:21,580 --> 01:48:24,670 Not one massive linked list, which would be Big O of n. 2521 01:48:24,670 --> 01:48:26,480 And quite slow to search. 2522 01:48:26,480 --> 01:48:28,630 So if Harry gets inserted, and Hagrid. 2523 01:48:28,630 --> 01:48:32,780 Yeah, you have to chain them together, so to speak, in this way. 2524 01:48:32,780 --> 01:48:35,645 But, at least you've not painted yourself into a corner. 2525 01:48:35,645 --> 01:48:38,770 And in fact, if we fast forward and put a whole bunch of familiar names in, 2526 01:48:38,770 --> 01:48:41,120 the data structure starts to look like this. 2527 01:48:41,120 --> 01:48:43,460 So the chains are not terribly long. 2528 01:48:43,460 --> 01:48:46,270 And some of them are actually of size 0 because there's just 2529 01:48:46,270 --> 01:48:49,150 some unpopular letters of the alphabet among these names. 2530 01:48:49,150 --> 01:48:51,100 But it seems better than just putting everyone 2531 01:48:51,100 --> 01:48:53,860 in one big array, or one big linked list. 2532 01:48:53,860 --> 01:48:58,190 We're trying to balance these trade-offs a little bit in the middle here. 2533 01:48:58,190 --> 01:49:00,410 Well, how might we represent something like this? 2534 01:49:00,410 --> 01:49:02,140 Here's how we could describe this thing. 2535 01:49:02,140 --> 01:49:05,320 A node in the context of a linked list could be this. 2536 01:49:05,320 --> 01:49:08,860 I have an array called word of type char. 2537 01:49:08,860 --> 01:49:13,060 And it's big enough to fit the longest word, plus 1. 2538 01:49:13,060 --> 01:49:14,890 And the plus 1 why, probably? 2539 01:49:14,890 --> 01:49:15,760 AUDIENCE: The null. 2540 01:49:15,760 --> 01:49:16,730 SPEAKER 1: The null character. 2541 01:49:16,730 --> 01:49:19,840 So I'm assuming that longest word is like a constant defined elsewhere 2542 01:49:19,840 --> 01:49:20,470 in the story. 2543 01:49:20,470 --> 01:49:22,735 And it's something big like 40, 100, whatever.
2544 01:49:22,735 --> 01:49:25,810 Whatever the longest word in the Harry Potter universe 2545 01:49:25,810 --> 01:49:28,440 is or the English dictionary is. 2546 01:49:28,440 --> 01:49:34,050 Longest word plus 1 should be sufficient to store any name in the story here. 2547 01:49:34,050 --> 01:49:36,360 And then, what else does each of these nodes have? 2548 01:49:36,360 --> 01:49:40,060 Well, it has a pointer to another node. 2549 01:49:40,060 --> 01:49:42,390 So here's how we might implement the notion of a node 2550 01:49:42,390 --> 01:49:46,710 in the context of storing not integers, but names. 2551 01:49:46,710 --> 01:49:48,360 Instead, like this. 2552 01:49:48,360 --> 01:49:51,360 But how do we decide what the hash table itself is? 2553 01:49:51,360 --> 01:49:55,140 Well, if we now have a definition of a node, we could have a variable in main, 2554 01:49:55,140 --> 01:49:57,510 or even globally, called hash table. 2555 01:49:57,510 --> 01:50:02,910 That itself is an array of node* pointers. 2556 01:50:02,910 --> 01:50:05,310 That is an array of pointers to nodes. 2557 01:50:05,310 --> 01:50:07,290 The beginnings of linked lists. 2558 01:50:07,290 --> 01:50:08,950 Number of buckets is up to me. 2559 01:50:08,950 --> 01:50:11,083 I proposed, verbally, that it be 26. 2560 01:50:11,083 --> 01:50:13,500 But honestly, if you get a lot of collisions, so to speak. 2561 01:50:13,500 --> 01:50:15,623 A lot of H names trying to go to the same place. 2562 01:50:15,623 --> 01:50:17,790 Well, maybe, we need to be smarter and not just look 2563 01:50:17,790 --> 01:50:19,207 at the first letter of their name. 2564 01:50:19,207 --> 01:50:20,800 But, maybe, the first and the second. 2565 01:50:20,800 --> 01:50:24,900 So it's H-A and H-E. But wait, no, then Harry and Hagrid still collide.
2566 01:50:24,900 --> 01:50:27,840 But we start to at least make the problem a little less 2567 01:50:27,840 --> 01:50:31,500 impactful by tinkering with something like the number of buckets 2568 01:50:31,500 --> 01:50:32,880 in a hash table like this. 2569 01:50:32,880 --> 01:50:37,560 But how do we decide where someone goes in a hash table in this way? 2570 01:50:37,560 --> 01:50:39,900 Well, it's an old school problem of input and output. 2571 01:50:39,900 --> 01:50:43,260 The input to the problem is going to be something like the name. 2572 01:50:43,260 --> 01:50:45,300 And the algorithm in the middle, as of today, 2573 01:50:45,300 --> 01:50:47,730 is going to be something called a hash function. 2574 01:50:47,730 --> 01:50:49,620 A hash function is generally something that 2575 01:50:49,620 --> 01:50:53,370 takes as input, a string, a number, whatever, and produces 2576 01:50:53,370 --> 01:50:55,860 as output a location in our context. 2577 01:50:55,860 --> 01:50:57,750 Like a number 0 through 25. 2578 01:50:57,750 --> 01:50:59,490 Or 0 through 16,000. 2579 01:50:59,490 --> 01:51:02,190 Or whatever the number of buckets you want is, 2580 01:51:02,190 --> 01:51:06,370 it's going to just tell you where to put that input at a specific location. 2581 01:51:06,370 --> 01:51:10,200 So, for instance, Albus, according to the story thus far, gave me back 0 2582 01:51:10,200 --> 01:51:10,710 as output. 2583 01:51:10,710 --> 01:51:12,570 Zacharias gave me 25. 2584 01:51:12,570 --> 01:51:15,300 So the hash function, in the middle of that black box, 2585 01:51:15,300 --> 01:51:17,760 is pretty simplistic in this story. 2586 01:51:17,760 --> 01:51:21,360 It's just looking at the ASCII value, it seems, of the first letter 2587 01:51:21,360 --> 01:51:22,110 in their name. 2588 01:51:22,110 --> 01:51:25,150 And then, subtracting off what capital A is, 65. 2589 01:51:25,150 --> 01:51:29,470 So like doing some math to get back a number between 0 and 25.
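To make the pieces described so far concrete, here is a minimal sketch in C of a node, the array of 26 buckets, and the first-letter hash function, plus chaining on insert. The names `LONGEST_WORD`, `N_BUCKETS`, `table`, `hash`, `insert`, and `search`, and the value 45, are assumptions chosen for illustration rather than code from the lecture; the sketch also assumes names start with a letter A through Z.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

#define LONGEST_WORD 45 // assumed maximum name length (hypothetical value)
#define N_BUCKETS 26    // one bucket per letter, A through Z

typedef struct node
{
    char word[LONGEST_WORD + 1]; // + 1 leaves room for the terminating '\0'
    struct node *next;           // next name in the same bucket, or NULL
} node;

// Each bucket is the start of a linked list; NULL means the bucket is empty
node *table[N_BUCKETS];

// Hash on the first letter: 'A' or 'a' gives 0, ..., 'Z' or 'z' gives 25
// Assumption: anything that does not start with a letter goes in bucket 0
unsigned int hash(const char *word)
{
    if (!isalpha((unsigned char) word[0]))
    {
        return 0;
    }
    return toupper((unsigned char) word[0]) - 'A';
}

// Prepend a name to its bucket's chain; returns false if it cannot be stored
bool insert(const char *word)
{
    if (strlen(word) > LONGEST_WORD)
    {
        return false;
    }
    node *n = malloc(sizeof(node));
    if (n == NULL)
    {
        return false;
    }
    strcpy(n->word, word);
    unsigned int i = hash(word);
    n->next = table[i]; // chain in front of whoever is already there
    table[i] = n;
    return true;
}

// Jump to the right bucket, then walk only that bucket's chain
bool search(const char *word)
{
    for (node *p = table[hash(word)]; p != NULL; p = p->next)
    {
        if (strcmp(p->word, word) == 0)
        {
            return true;
        }
    }
    return false;
}
```

Prepending in `insert` keeps insertion itself to a constant number of steps even when Hermione, Harry, and Hagrid all land in bucket 7; the cost of a collision shows up in `search`, which has to walk that chain.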
2590 01:51:29,470 --> 01:51:32,610 So that's how we got to this point in the story. 2591 01:51:32,610 --> 01:51:37,440 And how might we, then, resolve the problem further and use 2592 01:51:37,440 --> 01:51:39,060 this notion of hashing more generally? 2593 01:51:39,060 --> 01:51:40,935 Well just for demonstration sake here, here's 2594 01:51:40,935 --> 01:51:43,290 actually some buckets, literally. 2595 01:51:43,290 --> 01:51:46,380 And we've labeled, in advance, these buckets with the suits 2596 01:51:46,380 --> 01:51:47,800 from a deck of cards. 2597 01:51:47,800 --> 01:51:49,770 So we've got some spades. 2598 01:51:49,770 --> 01:51:54,600 And we've got diamonds here. 2599 01:51:54,600 --> 01:51:58,110 And we've got, what else here? 2600 01:51:58,110 --> 01:52:01,890 Clubs and hearts. 2601 01:52:01,890 --> 01:52:04,592 So we have a deck of cards here, for instance, right. 2602 01:52:04,592 --> 01:52:07,050 And this is something you, yourself, might do instinctively 2603 01:52:07,050 --> 01:52:09,420 if you're getting ready to start playing a game of cards. 2604 01:52:09,420 --> 01:52:11,587 You're just cleaning up or you want things in order. 2605 01:52:11,587 --> 01:52:13,963 Like, here is literally a jumbo deck of cards. 2606 01:52:13,963 --> 01:52:16,380 What would be the easiest way for me to sort these things? 2607 01:52:16,380 --> 01:52:19,088 Well we've got a whole bunch of sorting algorithms from the past. 2608 01:52:19,088 --> 01:52:21,630 So I could go through like, here's the 3 of diamonds. 2609 01:52:21,630 --> 01:52:23,880 And I could, here let me throw this up on the screen. 2610 01:52:23,880 --> 01:52:25,570 Just so, if you're far in back. 2611 01:52:25,570 --> 01:52:27,900 So here's diamonds. 2612 01:52:27,900 --> 01:52:28,890 I could put this here. 2613 01:52:28,890 --> 01:52:30,510 3, 4. 2614 01:52:30,510 --> 01:52:32,130 I could do this in order here. 2615 01:52:32,130 --> 01:52:34,540 But a lot of us, honestly, if given a deck of cards. 
2616 01:52:34,540 --> 01:52:37,290 And you just want to clean it up and sort it in order, 2617 01:52:37,290 --> 01:52:38,620 you might do things like this. 2618 01:52:38,620 --> 01:52:42,030 Well here's my input, 3 of diamonds, let's put it in this bucket. 2619 01:52:42,030 --> 01:52:43,770 4 of diamonds, this bucket. 2620 01:52:43,770 --> 01:52:45,640 5 of diamonds, this bucket. 2621 01:52:45,640 --> 01:52:49,500 And if you keep going through the cards, here's seven of hearts, hearts bucket. 2622 01:52:49,500 --> 01:52:51,210 8's bucket. 2623 01:52:51,210 --> 01:52:53,070 Queen of spades over here. 2624 01:52:53,070 --> 01:52:55,020 And it's still going to take you 52 steps. 2625 01:52:55,020 --> 01:52:58,020 But at the end of it, you have hashed all of the cards 2626 01:52:58,020 --> 01:52:59,610 into 4 distinct buckets. 2627 01:52:59,610 --> 01:53:02,490 And now you have problems of size 13, which 2628 01:53:02,490 --> 01:53:06,030 is a little more tenable than doing one massive 52 card problem. 2629 01:53:06,030 --> 01:53:08,070 You can now do 4 size-13 problems. 2630 01:53:08,070 --> 01:53:11,790 And so hashing is something that even you and I might do instinctively. 2631 01:53:11,790 --> 01:53:16,680 Taking as input some card, some name, and producing as output some location. 2632 01:53:16,680 --> 01:53:21,960 A temporary pile in which you want to stage things, so to speak. 2633 01:53:21,960 --> 01:53:24,442 But these collisions are inevitable. 2634 01:53:24,442 --> 01:53:27,150 And honestly, if we kept going through the Harry Potter universe, 2635 01:53:27,150 --> 01:53:29,950 some of these chains would get longer, and longer and longer. 2636 01:53:29,950 --> 01:53:33,330 Which means that instead of getting someone's name quickly, 2637 01:53:33,330 --> 01:53:36,178 searching for them or inserting them might 2638 01:53:36,178 --> 01:53:37,720 start taking a decent amount of time.
2639 01:53:37,720 --> 01:53:40,770 So what could we do instead to resolve situations like this? 2640 01:53:40,770 --> 01:53:44,370 If the problem, fundamentally, is that the first letter is just too darn 2641 01:53:44,370 --> 01:53:47,387 popular, H, we need to take in more input. 2642 01:53:47,387 --> 01:53:49,720 Not just the first letter but maybe the first 2 letters. 2643 01:53:49,720 --> 01:53:52,770 So if we do that, we can go from A through Z 2644 01:53:52,770 --> 01:53:59,200 to something more extreme like maybe H-A, H-B, H-C, H-D, H-E, H-F, and so forth. 2645 01:53:59,200 --> 01:54:02,670 So that now Harry and Hermione end up at different locations. 2646 01:54:02,670 --> 01:54:05,590 But, darn it, Hagrid still collides with Harry. 2647 01:54:05,590 --> 01:54:07,380 So it's better than before. 2648 01:54:07,380 --> 01:54:09,550 The chains aren't quite as long. 2649 01:54:09,550 --> 01:54:11,410 But the problem isn't fundamentally gone. 2650 01:54:11,410 --> 01:54:14,640 And in this case here, anyone know how many buckets we just 2651 01:54:14,640 --> 01:54:22,830 increased to, if we now look at not just A through Z but AA through ZZ, roughly? 2652 01:54:22,830 --> 01:54:24,183 AUDIENCE: 26 squared. 2653 01:54:24,183 --> 01:54:24,850 SPEAKER 1: Yeah. 2654 01:54:24,850 --> 01:54:25,440 OK, good. 2655 01:54:25,440 --> 01:54:28,980 So the easy answer to 26 squared is 676. 2656 01:54:28,980 --> 01:54:30,570 So that's a lot more buckets. 2657 01:54:30,570 --> 01:54:33,040 And this is why I only showed a few of them on the screen. 2658 01:54:33,040 --> 01:54:33,930 So that's a lot more. 2659 01:54:33,930 --> 01:54:37,050 And it spreads things out in particular. 2660 01:54:37,050 --> 01:54:38,640 What if we take this one step further? 2661 01:54:38,640 --> 01:54:44,130 Instead of H-A, we do like H-A-A, H-A-B, H-A-C, H-Z-Z, and so forth. 2662 01:54:44,130 --> 01:54:46,080 Well now, we have an even better situation.
2663 01:54:46,080 --> 01:54:48,480 Because Hermione has her one spot. 2664 01:54:48,480 --> 01:54:49,770 Harry has his one spot. 2665 01:54:49,770 --> 01:54:51,840 Hagrid has his one spot. 2666 01:54:51,840 --> 01:54:53,880 But there's a trade-off here. 2667 01:54:53,880 --> 01:54:57,240 The upside is now, arithmetically, we can find their locations 2668 01:54:57,240 --> 01:54:58,620 in constant time. 2669 01:54:58,620 --> 01:55:00,030 Maybe, technically 3 steps. 2670 01:55:00,030 --> 01:55:03,940 But 3 is constant, no matter how many other names are in here, it would seem. 2671 01:55:03,940 --> 01:55:07,152 But what's the downside here? 2672 01:55:07,152 --> 01:55:07,860 Sorry, say again. 2673 01:55:07,860 --> 01:55:08,490 AUDIENCE: Memory. 2674 01:55:08,490 --> 01:55:09,240 SPEAKER 1: Memory. 2675 01:55:09,240 --> 01:55:10,290 So significantly more. 2676 01:55:10,290 --> 01:55:15,840 We're now up to 17,576 buckets, which itself isn't that big a deal, right. 2677 01:55:15,840 --> 01:55:17,740 Computers have a lot of memory these days. 2678 01:55:17,740 --> 01:55:21,450 But as you can infer, I can't really think 2679 01:55:21,450 --> 01:55:26,160 of someone whose name started with H-E-Q, for instance, in the Harry 2680 01:55:26,160 --> 01:55:26,832 Potter universe. 2681 01:55:26,832 --> 01:55:29,040 And if we keep going, definitely don't know of anyone 2682 01:55:29,040 --> 01:55:32,040 whose name started with Z-Z-Z or A-A-A. There's 2683 01:55:32,040 --> 01:55:37,390 a lot of not useful combinations that have to be there mathematically, 2684 01:55:37,390 --> 01:55:41,040 so that you can do a bit of math and jump randomly, so to speak, 2685 01:55:41,040 --> 01:55:42,292 to the precise location. 2686 01:55:42,292 --> 01:55:43,750 But they're just going to be empty. 2687 01:55:43,750 --> 01:55:47,380 So it's a very sparsely populated array, so to speak. 2688 01:55:47,380 --> 01:55:50,640 So what does that really mean for performance, ultimately?
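The "bit of math" for three letters can be sketched as treating the first three letters like the digits of a base-26 number, which is where 26 × 26 × 26 = 17,576 buckets comes from. The function name `hash3` and the rule that missing or non-letter characters count as A are assumptions made for this sketch only.

```c
#include <ctype.h>
#include <stdbool.h>

#define LETTERS 26

// 26 * 26 * 26 = 17,576 buckets, one for each of AAA through ZZZ
#define N_BUCKETS (LETTERS * LETTERS * LETTERS)

// Treat the first three letters as a three-digit base-26 number
// Assumption: missing or non-letter characters are counted as 'A' (digit 0)
unsigned int hash3(const char *word)
{
    unsigned int index = 0;
    bool ended = false;
    for (int i = 0; i < 3; i++)
    {
        unsigned int digit = 0;
        if (!ended && word[i] == '\0')
        {
            ended = true; // never read past the end of a short name
        }
        if (!ended && isalpha((unsigned char) word[i]))
        {
            digit = toupper((unsigned char) word[i]) - 'A';
        }
        index = index * LETTERS + digit;
    }
    return index;
}
```

With this sketch, Hagrid (H-A-G), Harry (H-A-R), and Hermione (H-E-R) each get a different index, at the cost of an array with 17,576 slots, most of which stay empty.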
2689 01:55:50,640 --> 01:55:53,400 Well let's consider, again, in the context of our Big O notation. 2690 01:55:53,400 --> 01:55:56,790 It turns out that a hash table, technically speaking, 2691 01:55:56,790 --> 01:56:00,870 is still just going to give us Big O of n in the worst case. 2692 01:56:00,870 --> 01:56:01,470 Why? 2693 01:56:01,470 --> 01:56:04,440 If you have some crazy perverse case where everyone in the universe 2694 01:56:04,440 --> 01:56:07,950 has a name that starts with A, or starts with H, or starts with Z, 2695 01:56:07,950 --> 01:56:09,240 you just get really unlucky. 2696 01:56:09,240 --> 01:56:11,117 And your chain is massively long. 2697 01:56:11,117 --> 01:56:13,200 Well then, at that point, it's just a linked list. 2698 01:56:13,200 --> 01:56:14,117 It's not a hash table. 2699 01:56:14,117 --> 01:56:16,380 It's like the perverse situation with the tree, where 2700 01:56:16,380 --> 01:56:22,200 if you insert into it without any mind for keeping it balanced, it just devolves. 2701 01:56:22,200 --> 01:56:26,400 But there's a difference here between a theoretical performance 2702 01:56:26,400 --> 01:56:28,020 and an actual performance. 2703 01:56:28,020 --> 01:56:31,290 If you look back at the hash table here, 2704 01:56:31,290 --> 01:56:37,890 this is absolutely, in practice, going to be faster than a single linked list. 2705 01:56:37,890 --> 01:56:40,860 Mathematically, asymptotically, big O notation, sure. 2706 01:56:40,860 --> 01:56:41,700 It's all the same. 2707 01:56:41,700 --> 01:56:42,630 Big O of n. 2708 01:56:42,630 --> 01:56:46,500 But if what we're really caring about is real humans using our software, 2709 01:56:46,500 --> 01:56:48,990 there's something to be said for crafting a data structure 2710 01:56:48,990 --> 01:56:51,570 that technically, if this data were uniformly distributed, 2711 01:56:51,570 --> 01:56:55,450 is 26 times faster than a linked list alone.
2712 01:56:55,450 --> 01:57:00,720 And so, there's this tension too between systems types of CS 2713 01:57:00,720 --> 01:57:01,847 and theoretical CS. 2714 01:57:01,847 --> 01:57:03,930 Where yeah, theoretically, these are all the same. 2715 01:57:03,930 --> 01:57:06,660 But in practice, for making real-world software, 2716 01:57:06,660 --> 01:57:12,390 improving this speed by a factor of 26 in this case, let alone 676 or more, 2717 01:57:12,390 --> 01:57:14,170 might actually make a big difference. 2718 01:57:14,170 --> 01:57:15,670 But there's going to be a trade-off. 2719 01:57:15,670 --> 01:57:19,540 And that's typically some other resource like giving up more space. 2720 01:57:19,540 --> 01:57:20,040 All right. 2721 01:57:20,040 --> 01:57:23,100 How about another data structure we could build. 2722 01:57:23,100 --> 01:57:26,010 Let me fast forward to something here called a trie. 2723 01:57:26,010 --> 01:57:28,920 So a trie, a weird name in pronunciation. 2724 01:57:28,920 --> 01:57:31,950 Short for retrieval, pronounced trie typically. 2725 01:57:31,950 --> 01:57:37,680 A trie is a tree that actually gives us constant time lookup, 2726 01:57:37,680 --> 01:57:41,040 even for massive data sets. 2727 01:57:41,040 --> 01:57:42,090 What do I mean by this? 2728 01:57:42,090 --> 01:57:47,230 In the world of a trie, you create a tree out of arrays. 2729 01:57:47,230 --> 01:57:49,560 So we're really getting into the Frankenstein territory 2730 01:57:49,560 --> 01:57:52,320 of just building things up with spare parts of data structures 2731 01:57:52,320 --> 01:57:53,500 that we have here. 2732 01:57:53,500 --> 01:57:56,460 But the root of a trie is, itself, an array. 2733 01:57:56,460 --> 01:57:58,530 For instance, of size 26. 2734 01:57:58,530 --> 01:58:04,800 Where each element in that array points to another node, 2735 01:58:04,800 --> 01:58:06,510 which is to say another array.
2736 01:58:06,510 --> 01:58:09,480 And each of those locations in the array represents a letter 2737 01:58:09,480 --> 01:58:10,920 of the alphabet like A through Z. 2738 01:58:10,920 --> 01:58:14,970 So for instance, if you wanted to store the names of the Harry Potter universe, 2739 01:58:14,970 --> 01:58:19,050 not in a hash table, not in a linked list, not in a tree, but in a trie. 2740 01:58:19,050 --> 01:58:23,820 What you would do is hash on every letter in the person's name one 2741 01:58:23,820 --> 01:58:24,640 at a time. 2742 01:58:24,640 --> 01:58:28,050 So a trie is like a multi-tier hash table, in a sense. 2743 01:58:28,050 --> 01:58:29,770 Where you first look at the first letter, 2744 01:58:29,770 --> 01:58:32,478 then the second letter, then the third, and you do the following. 2745 01:58:32,478 --> 01:58:35,940 For instance, each of these locations represents a letter A 2746 01:58:35,940 --> 01:58:39,450 through Z. Suppose I wanted to insert someone's name into this 2747 01:58:39,450 --> 01:58:43,530 that starts with the letter H, like Hagrid for instance. 2748 01:58:43,530 --> 01:58:46,360 Well, I go to the location H. I see it's null, 2749 01:58:46,360 --> 01:58:49,440 which means I need to malloc myself another node or another array. 2750 01:58:49,440 --> 01:58:50,970 And that's depicted here. 2751 01:58:50,970 --> 01:58:54,810 Then, suppose I want to store the second letter in Hagrid's name, 2752 01:58:54,810 --> 01:58:57,432 an A. So I go to that location in the second node. 2753 01:58:57,432 --> 01:58:58,890 And I see, OK, it's currently null. 2754 01:58:58,890 --> 01:58:59,932 There's nothing below it. 2755 01:58:59,932 --> 01:59:02,440 So I allocate another node using malloc or the like. 2756 01:59:02,440 --> 01:59:06,690 And now I have H-A-G. And I continue this with R-I-D. 
2757 01:59:06,690 --> 01:59:10,240 And then, when I get to the bottom of this person's name, 2758 01:59:10,240 --> 01:59:12,840 I just have to indicate here in color, but probably 2759 01:59:12,840 --> 01:59:14,280 with a Boolean value or something. 2760 01:59:14,280 --> 01:59:18,190 Like a true value that says, a name stops here. 2761 01:59:18,190 --> 01:59:23,740 So that it's clear that the person's name is not H-A, or H-A-G, or H-A-G-R, 2762 01:59:23,740 --> 01:59:28,270 or H-A-G-R-I. It's H-A-G-R-I-D. And the D is green, 2763 01:59:28,270 --> 01:59:31,600 just to indicate there's like some other Boolean value that just says, yes. 2764 01:59:31,600 --> 01:59:35,300 This is the node in which the name stops. 2765 01:59:35,300 --> 01:59:40,240 And if I continue this logic, here's how I might insert someone like Harry. 2766 01:59:40,240 --> 01:59:43,420 And here's how I might insert someone like Hermione. 2767 01:59:43,420 --> 01:59:48,010 And what's interesting about the design here is that some of these names 2768 01:59:48,010 --> 01:59:49,930 share a common prefix. 2769 01:59:49,930 --> 01:59:52,990 Which starts to get compelling because you're reusing space. 2770 01:59:52,990 --> 01:59:57,910 You're using the same nodes for names like H-A-G and H-A-R 2771 01:59:57,910 --> 02:00:00,370 because they share H and an A in common. 2772 02:00:00,370 --> 02:00:02,630 And they all share an H in common. 2773 02:00:02,630 --> 02:00:06,340 So you have this data structure now that, itself, is a tree. 2774 02:00:06,340 --> 02:00:10,090 Each node in the tree is, itself, an array. 2775 02:00:10,090 --> 02:00:13,690 And we, therefore, might implement this thing using code like this. 2776 02:00:13,690 --> 02:00:19,195 Every node is containing, I'll do it in reverse order, an array. 2777 02:00:19,195 --> 02:00:21,820 I'll call it children because that's what it really represents. 2778 02:00:21,820 --> 02:00:24,130 Up to 26 children for each of these nodes. 
2779 02:00:24,130 --> 02:00:25,430 Size of the alphabet. 2780 02:00:25,430 --> 02:00:28,360 So I might have used just a constant for number 26, 2781 02:00:28,360 --> 02:00:30,400 to give myself 26 letters of the alphabet. 2782 02:00:30,400 --> 02:00:34,630 And each of those arrays stores that many node stars. 2783 02:00:34,630 --> 02:00:36,550 That many pointers to another node. 2784 02:00:36,550 --> 02:00:38,020 And here's an example of the Bool. 2785 02:00:38,020 --> 02:00:40,750 This is what I represented in green on the slide a moment ago. 2786 02:00:40,750 --> 02:00:42,580 I also need another piece of data. 2787 02:00:42,580 --> 02:00:45,520 Just a 0 or 1, a true or false, that says yes. 2788 02:00:45,520 --> 02:00:50,810 A name stops in this node or it's just a path to the rest of the person's name. 2789 02:00:50,810 --> 02:00:55,090 But the upside of this is that the height of this tree 2790 02:00:55,090 --> 02:00:58,090 is only as tall as the longest person's name. 2791 02:00:58,090 --> 02:01:04,930 H-A-G-R-I-D or H-E-R-M-I-O-N-E. And notice that no matter how many other 2792 02:01:04,930 --> 02:01:08,740 people are in this data structure, there's 3 at the moment, 2793 02:01:08,740 --> 02:01:13,150 if there were 3 million, it would still take me how many steps to search 2794 02:01:13,150 --> 02:01:14,500 for Hermione? 2795 02:01:14,500 --> 02:01:19,750 H-E-R-M-I-O-N-E. So, 8 steps total. 2796 02:01:19,750 --> 02:01:24,580 No matter if there's 2 other people, 2 million, 10 million other people. 2797 02:01:24,580 --> 02:01:28,660 Because the path to her name is always the same path. 2798 02:01:28,660 --> 02:01:33,550 And if you assume that there's a maximum limit on the length of names 2799 02:01:33,550 --> 02:01:34,420 in the human world. 2800 02:01:34,420 --> 02:01:36,510 Maybe it's 40, 100, whatever. 2801 02:01:36,510 --> 02:01:38,260 Whatever the longest name in the world is. 2802 02:01:38,260 --> 02:01:39,160 That's constant.
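The trie node just described, an array of 26 child pointers plus a Boolean that marks where a name stops, might be sketched in C as follows. The identifiers `SIZE_OF_ALPHABET`, `is_word`, `new_node`, `trie_insert`, and `trie_search` are assumptions for illustration, and the sketch only accepts names made of the letters A through Z.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stdlib.h>

#define SIZE_OF_ALPHABET 26

typedef struct node
{
    bool is_word;                            // true if a name stops at this node
    struct node *children[SIZE_OF_ALPHABET]; // one pointer per letter, NULL if unused
} node;

// Allocate a node with is_word false and every child set to NULL
node *new_node(void)
{
    node *n = malloc(sizeof(node));
    if (n == NULL)
    {
        return NULL;
    }
    n->is_word = false;
    for (int i = 0; i < SIZE_OF_ALPHABET; i++)
    {
        n->children[i] = NULL;
    }
    return n;
}

// Map a letter to 0 through 25, or -1 if it is not a letter
static int index_of(char c)
{
    return isalpha((unsigned char) c) ? toupper((unsigned char) c) - 'A' : -1;
}

// Walk one node per letter, allocating nodes that do not exist yet
bool trie_insert(node *root, const char *name)
{
    node *cursor = root;
    for (int i = 0; name[i] != '\0'; i++)
    {
        int k = index_of(name[i]);
        if (k < 0)
        {
            return false;
        }
        if (cursor->children[k] == NULL)
        {
            cursor->children[k] = new_node();
            if (cursor->children[k] == NULL)
            {
                return false;
            }
        }
        cursor = cursor->children[k];
    }
    cursor->is_word = true; // the "green" marker: a name stops here
    return true;
}

// The number of steps depends only on the length of name, not on how many names are stored
bool trie_search(const node *root, const char *name)
{
    const node *cursor = root;
    for (int i = 0; name[i] != '\0'; i++)
    {
        int k = index_of(name[i]);
        if (k < 0 || cursor->children[k] == NULL)
        {
            return false;
        }
        cursor = cursor->children[k];
    }
    return cursor->is_word;
}
```

Searching for Hermione follows 8 pointers whether the trie holds 3 names or 3 million, and a prefix such as H-A-G is reported as absent unless its last node was marked when a name was inserted.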
2803 02:01:39,160 --> 02:01:41,630 Maybe it's 40, 100, but that's constant. 2804 02:01:41,630 --> 02:01:44,840 Which is to say that with a trie, technically speaking, 2805 02:01:44,840 --> 02:01:49,480 it is the case that your lookup time, in Big O notation, 2806 02:01:49,480 --> 02:01:51,520 would be Big O of 1. 2807 02:01:51,520 --> 02:01:54,580 It's constant time, because unlike every other data structure 2808 02:01:54,580 --> 02:01:59,440 we've looked at, with a trie, the amount of time it takes you to find one person 2809 02:01:59,440 --> 02:02:02,920 or insert one person is completely independent of how 2810 02:02:02,920 --> 02:02:07,210 many other pieces of data are already in the data structure. 2811 02:02:07,210 --> 02:02:09,970 And this holds true even if one name is a prefix of another. 2812 02:02:09,970 --> 02:02:13,373 I don't think there was a Daniel or Danielle in the Harry Potter universe 2813 02:02:13,373 --> 02:02:14,290 that I could think of. 2814 02:02:14,290 --> 02:02:18,400 But, D-A-N-I-E-L could be one name. 2815 02:02:18,400 --> 02:02:20,988 And, therefore, we have a true there in green. 2816 02:02:20,988 --> 02:02:22,780 And if there's a longer name like Danielle. 2817 02:02:22,780 --> 02:02:24,760 Then, you keep going until you get to the E. 2818 02:02:24,760 --> 02:02:27,550 So you can still have with a trie, one name that's 2819 02:02:27,550 --> 02:02:29,660 a substring of another name. 2820 02:02:29,660 --> 02:02:32,380 So it's not as though we've created a problem there. 2821 02:02:32,380 --> 02:02:34,052 That, too, is still possible. 2822 02:02:34,052 --> 02:02:36,760 But at the end of the day, it only takes a finite number of steps 2823 02:02:36,760 --> 02:02:38,410 to find any of these people. 2824 02:02:38,410 --> 02:02:41,320 And again, that's what's particularly compelling. 2825 02:02:41,320 --> 02:02:43,398 That you effectively have constant time lookup. 2826 02:02:43,398 --> 02:02:44,440 So that's amazing, right.
2827 02:02:44,440 --> 02:02:48,153 We've gone through this whole story for weeks now of like, linear time. 2828 02:02:48,153 --> 02:02:49,570 And then, it went up to n squared. 2829 02:02:49,570 --> 02:02:50,350 And then, log n. 2830 02:02:50,350 --> 02:02:55,430 And now constant time, what's the price paid for a data structure like this? 2831 02:02:55,430 --> 02:02:58,630 This so-called trie? 2832 02:02:58,630 --> 02:02:59,810 What's the downside here? 2833 02:02:59,810 --> 02:03:01,540 There's got to be a catch. 2834 02:03:01,540 --> 02:03:03,970 And in fact, tries are not actually used that often, 2835 02:03:03,970 --> 02:03:07,500 amazing as they might sound on some CS level here. 2836 02:03:07,500 --> 02:03:08,260 AUDIENCE: Memory. 2837 02:03:08,260 --> 02:03:09,520 SPEAKER 1: Memory. 2838 02:03:09,520 --> 02:03:10,735 In what sense? 2839 02:03:10,735 --> 02:03:12,898 AUDIENCE: Much like a [INAUDIBLE]. 2840 02:03:12,898 --> 02:03:13,690 SPEAKER 1: Exactly. 2841 02:03:13,690 --> 02:03:15,610 If you're storing all of these darn arrays 2842 02:03:15,610 --> 02:03:18,870 it's, again, a sparsely populated data structure. 2843 02:03:18,870 --> 02:03:19,870 And you can see it here. 2844 02:03:19,870 --> 02:03:23,800 Granted there's only 3 names, but most of those boxes, most of those pointers, 2845 02:03:23,800 --> 02:03:25,490 are going to remain null. 2846 02:03:25,490 --> 02:03:28,540 So this is an incredibly wide data structure, if you will. 2847 02:03:28,540 --> 02:03:31,040 It uses a huge amount of memory to store the names. 2848 02:03:31,040 --> 02:03:32,860 But again, you've got to pick a lane. 2849 02:03:32,860 --> 02:03:35,980 Either you're going to minimize space or you're going to minimize time. 2850 02:03:35,980 --> 02:03:39,240 It's not really possible to get truly the best of both worlds. 
2851 02:03:39,240 --> 02:03:41,290 You have to decide where the inflection point is 2852 02:03:41,290 --> 02:03:44,110 for the device you're writing software for, how much memory it has, 2853 02:03:44,110 --> 02:03:45,460 how expensive it is. 2854 02:03:45,460 --> 02:03:48,980 And again, taking all of these things into account. 2855 02:03:48,980 --> 02:03:51,400 So lastly, let's do one further abstraction. 2856 02:03:51,400 --> 02:03:54,910 So even higher level, to discuss things that are generally 2857 02:03:54,910 --> 02:03:56,962 known as abstract data structures. 2858 02:03:56,962 --> 02:03:58,670 It turns out we could spend like all day, 2859 02:03:58,670 --> 02:04:00,250 all week, talking about different things we 2860 02:04:00,250 --> 02:04:01,700 could build with these data structures. 2861 02:04:01,700 --> 02:04:03,658 But for the most part, now that we have arrays. 2862 02:04:03,658 --> 02:04:06,430 Now that we have linked lists or their cousins, trees, which 2863 02:04:06,430 --> 02:04:07,428 are 2-dimensional. 2864 02:04:07,428 --> 02:04:09,220 And beyond that, there's even graphs, where 2865 02:04:09,220 --> 02:04:12,407 the arrows can go in multiple directions, not just down, so to speak. 2866 02:04:12,407 --> 02:04:14,740 Now that we have this ability to stitch things together, 2867 02:04:14,740 --> 02:04:16,790 we can solve all different types of problems. 2868 02:04:16,790 --> 02:04:20,740 So, for instance, a very common type of data structure 2869 02:04:20,740 --> 02:04:24,730 to use in a program, or even our human world, is a thing called a queue. 2870 02:04:24,730 --> 02:04:28,780 A queue being a data structure like a line outside of a store. 2871 02:04:28,780 --> 02:04:30,850 Where it has what's called a FIFO property. 2872 02:04:30,850 --> 02:04:32,240 First In, First Out. 2873 02:04:32,240 --> 02:04:34,660 Which is great for fairness, at least in the human world.
2874 02:04:34,660 --> 02:04:38,800 And if you've ever waited outside of Tasty Burger, or Salsa Fresca, 2875 02:04:38,800 --> 02:04:40,990 or some other restaurant nearby, presumably, 2876 02:04:40,990 --> 02:04:43,780 if you're queuing up at the counter, you want 2877 02:04:43,780 --> 02:04:46,270 the store to maintain a FIFO system. 2878 02:04:46,270 --> 02:04:47,530 First in and first out. 2879 02:04:47,530 --> 02:04:51,160 So that whoever's first in line gets their food first and gets out first. 2880 02:04:51,160 --> 02:04:54,710 So a queue is actually a computer science term, too. 2881 02:04:54,710 --> 02:04:57,460 And even if you're still in the habit of printing things on paper, 2882 02:04:57,460 --> 02:04:59,710 there are things you might have heard called printer 2883 02:04:59,710 --> 02:05:02,050 queues, which also do things in order. 2884 02:05:02,050 --> 02:05:04,467 The first person to send their essay to the printer 2885 02:05:04,467 --> 02:05:06,550 should, ideally, be printed before the last person 2886 02:05:06,550 --> 02:05:08,920 to send their essay to the printer. 2887 02:05:08,920 --> 02:05:10,720 Again, in the interest of fairness. 2888 02:05:10,720 --> 02:05:12,370 But how can you implement a queue? 2889 02:05:12,370 --> 02:05:15,250 Well, you typically have to implement 2 fundamental operations, 2890 02:05:15,250 --> 02:05:16,810 enqueue and dequeue. 2891 02:05:16,810 --> 02:05:19,910 So adding something to it and removing something from it. 2892 02:05:19,910 --> 02:05:23,650 And the interesting thing here is, how do you implement a queue? 2893 02:05:23,650 --> 02:05:26,650 Well in the human world, you would just have literally physical space 2894 02:05:26,650 --> 02:05:29,290 for humans to line up from left to right, or right to left. 2895 02:05:29,290 --> 02:05:30,333 Same in a computer.
2896 02:05:30,333 --> 02:05:33,250 Like a printer queue, if you send a whole bunch of jobs to be printed, 2897 02:05:33,250 --> 02:05:35,350 a whole bunch of essays or documents, well, you 2898 02:05:35,350 --> 02:05:37,430 need a chunk of memory like an array. 2899 02:05:37,430 --> 02:05:37,930 All right. 2900 02:05:37,930 --> 02:05:40,150 Well, if you use an array, what's a problem 2901 02:05:40,150 --> 02:05:43,760 that could happen in the world of printing, for instance? 2902 02:05:43,760 --> 02:05:47,020 If you use an array to store all of the documents that need to be printed. 2903 02:05:47,020 --> 02:05:48,178 AUDIENCE: It can be filled. 2904 02:05:48,178 --> 02:05:49,720 SPEAKER 1: It could be filled, right. 2905 02:05:49,720 --> 02:05:53,020 So if the programmer decided, HP or whoever makes the printer decides, 2906 02:05:53,020 --> 02:05:56,680 oh, you can send like a megabyte worth of documents to this printer at once. 2907 02:05:56,680 --> 02:05:58,730 At some point you might get an error message, 2908 02:05:58,730 --> 02:06:00,100 which says, sorry out of memory. 2909 02:06:00,100 --> 02:06:00,995 Wait a few minutes. 2910 02:06:00,995 --> 02:06:03,370 Which is maybe a reasonable solution, but a little annoying. 2911 02:06:03,370 --> 02:06:07,000 Or HP could write code that maybe dynamically resizes the array 2912 02:06:07,000 --> 02:06:07,670 or so forth. 2913 02:06:07,670 --> 02:06:10,240 But at that point, maybe they should just use a linked list. 2914 02:06:10,240 --> 02:06:11,170 And they could. 2915 02:06:11,170 --> 02:06:14,890 So there, too, you could implement the notion of a queue 2916 02:06:14,890 --> 02:06:16,238 using a linked list instead. 2917 02:06:16,238 --> 02:06:18,280 You're going to spend more memory, but you're not 2918 02:06:18,280 --> 02:06:20,650 going to run out of space in your array. 2919 02:06:20,650 --> 02:06:22,493 Which might be more compelling. 2920 02:06:22,493 --> 02:06:24,160 This happens even in the physical world.
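As a rough illustration of the linked-list approach to a queue described above, here is a minimal sketch in C. The names `node`, `queue`, `enqueue`, and `dequeue`, and the use of an `int` to stand in for a print job, are assumptions made for this example and are not code from the lecture.

```c
#include <stdbool.h>
#include <stdlib.h>

// One job waiting in line (hypothetical type for this sketch).
typedef struct node
{
    int job;
    struct node *next;
} node;

// The queue remembers both ends of the line.
typedef struct
{
    node *head; // front of the line: next job to be dequeued
    node *tail; // back of the line: most recently enqueued job
} queue;

// Add a job to the back of the line. Returns false if malloc fails.
bool enqueue(queue *q, int job)
{
    node *n = malloc(sizeof(node));
    if (n == NULL)
    {
        return false;
    }
    n->job = job;
    n->next = NULL;
    if (q->tail == NULL)
    {
        q->head = n; // the line was empty, so n is also the front
    }
    else
    {
        q->tail->next = n;
    }
    q->tail = n;
    return true;
}

// Remove the job at the front of the line and store it in *job.
// Returns false if the queue is empty.
bool dequeue(queue *q, int *job)
{
    if (q->head == NULL)
    {
        return false;
    }
    node *n = q->head;
    *job = n->job;
    q->head = n->next;
    if (q->head == NULL)
    {
        q->tail = NULL; // the line is empty again
    }
    free(n);
    return true;
}
```

Starting from `queue q = {NULL, NULL};`, enqueuing 1, 2, and 3 and then dequeuing hands them back in that same order, which is the FIFO property. Each job costs an extra pointer of memory, but the only limit on the line's length is what `malloc` can provide.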
2921 02:06:24,160 --> 02:06:27,640 You go to the store and you start having to line up outside and down the road. 2922 02:06:27,640 --> 02:06:31,927 And like, for a really busy store, they run out of space so they make do. 2923 02:06:31,927 --> 02:06:34,510 But in that case, it tends to be more of an array just because 2924 02:06:34,510 --> 02:06:36,965 of the physical notion of humans lining up. 2925 02:06:36,965 --> 02:06:38,590 But there's other data structures, too. 2926 02:06:38,590 --> 02:06:41,715 If you've ever gone to the dining hall and picked up like a Harvard or Yale 2927 02:06:41,715 --> 02:06:46,870 tray, you're typically picking up the last tray that was just cleaned, 2928 02:06:46,870 --> 02:06:48,730 not the first tray that was cleaned. 2929 02:06:48,730 --> 02:06:49,240 Why? 2930 02:06:49,240 --> 02:06:53,170 Because these cafeteria trays stack up on top of each other. 2931 02:06:53,170 --> 02:06:56,410 And indeed a stack is another type of abstract data structure. 2932 02:06:56,410 --> 02:06:58,870 In the physical world, it's literally something physical 2933 02:06:58,870 --> 02:07:01,030 like a stack of trays. 2934 02:07:01,030 --> 02:07:03,940 Which have what we would call a LIFO property. 2935 02:07:03,940 --> 02:07:05,460 Last In, First Out. 2936 02:07:05,460 --> 02:07:07,210 So as these things come out of the washer, 2937 02:07:07,210 --> 02:07:09,520 they're putting the most recent ones on the top. 2938 02:07:09,520 --> 02:07:13,240 And then you, the human, are probably taking the most recently cleaned one. 2939 02:07:13,240 --> 02:07:15,700 Which means in the extreme, no one on campus 2940 02:07:15,700 --> 02:07:19,135 might ever use that very first tray. 2941 02:07:19,135 --> 02:07:21,010 Which is probably fine in the world of trays, 2942 02:07:21,010 --> 02:07:24,970 but would really be bad in the world of Tasty Burger lining up for food if LIFO 2943 02:07:24,970 --> 02:07:26,770 were the property being implemented. 
2944 02:07:26,770 --> 02:07:28,840 But here, too, it could be an array. 2945 02:07:28,840 --> 02:07:29,950 It could be a linked list. 2946 02:07:29,950 --> 02:07:31,533 And you see this, honestly, every day. 2947 02:07:31,533 --> 02:07:33,760 If you're using Gmail and your Gmail inbox. 2948 02:07:33,760 --> 02:07:36,280 That is actually a stack, at least by default, 2949 02:07:36,280 --> 02:07:39,678 where your newest messages, last in, are the first ones 2950 02:07:39,678 --> 02:07:40,720 at the top of the screen. 2951 02:07:40,720 --> 02:07:42,580 That's a LIFO data structure. 2952 02:07:42,580 --> 02:07:44,710 And it means that you see your most recent emails. 2953 02:07:44,710 --> 02:07:47,168 But if you have a busy day, you're getting a lot of emails, 2954 02:07:47,168 --> 02:07:48,430 it might not be a good thing. 2955 02:07:48,430 --> 02:07:50,830 Because now you're ignoring the people who wrote you 2956 02:07:50,830 --> 02:07:53,140 way earlier in the day or the week. 2957 02:07:53,140 --> 02:07:55,600 So LIFO and FIFO are just properties that you 2958 02:07:55,600 --> 02:07:58,360 can achieve with these very specific types of data structures. 2959 02:07:58,360 --> 02:08:00,110 And the parlance in the world of stacks 2960 02:08:00,110 --> 02:08:03,970 is to push something onto a stack or pop something out. 2961 02:08:03,970 --> 02:08:06,160 These are here, for instance, as an example of why 2962 02:08:06,160 --> 02:08:07,450 might you always wear the same color. 2963 02:08:07,450 --> 02:08:09,710 Well, if you're storing all of your clothes in a stack, 2964 02:08:09,710 --> 02:08:11,530 you might not ever get to the different colored 2965 02:08:11,530 --> 02:08:12,970 clothes at the bottom of the list. 2966 02:08:12,970 --> 02:08:17,890 And in fact, to paint this picture, we have a couple of minute video here. 2967 02:08:17,890 --> 02:08:20,890 Just to paint this here, made by a faculty member elsewhere.
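To make push and pop concrete, here is a minimal sketch of a stack backed by a fixed-size array in C. The names `stack`, `push`, and `pop`, and the tiny `CAPACITY` of 3, are assumptions chosen for illustration rather than code from the lecture.

```c
#include <stdbool.h>

// Hypothetical capacity, kept small so the "array is full" case is easy to see.
#define CAPACITY 3

typedef struct
{
    int items[CAPACITY];
    int size; // how many items are currently on the stack
} stack;

// Put an item on top of the stack. Returns false if the array is full.
bool push(stack *s, int item)
{
    if (s->size == CAPACITY)
    {
        return false;
    }
    s->items[s->size] = item;
    s->size++;
    return true;
}

// Take the most recently pushed item off the top and store it in *item.
// Returns false if the stack is empty.
bool pop(stack *s, int *item)
{
    if (s->size == 0)
    {
        return false;
    }
    s->size--;
    *item = s->items[s->size];
    return true;
}
```

Starting from `stack s = {.size = 0};`, pushing 1, 2, and 3 and then popping yields 3, then 2, then 1, which is the LIFO property. A fourth push fails because the array is full, the same trade-off discussed for an array-backed queue.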
2968 02:08:20,890 --> 02:08:23,830 Let's go ahead and dim the lights for just a minute or 2 here. 2969 02:08:23,830 --> 02:08:27,985 So that we can take a look at Jack learning some facts. 2970 02:08:27,985 --> 02:08:28,610 [VIDEO PLAYING] 2971 02:08:28,610 --> 02:08:31,360 SPEAKER 2: Once upon a time, there was a guy named Jack. 2972 02:08:31,360 --> 02:08:34,750 When it came to making friends Jack did not have the knack. 2973 02:08:34,750 --> 02:08:37,720 So Jack went to talk to the most popular guy he knew. 2974 02:08:37,720 --> 02:08:40,390 He went up to Lou and asked, what do I do? 2975 02:08:40,390 --> 02:08:42,850 Lou saw that his friend was really distressed. 2976 02:08:42,850 --> 02:08:45,560 Well, Lou began, just look how you're dressed. 2977 02:08:45,560 --> 02:08:48,130 Don't you have any clothes with a different look? 2978 02:08:48,130 --> 02:08:49,210 Yes, said Jack. 2979 02:08:49,210 --> 02:08:50,530 I sure do. 2980 02:08:50,530 --> 02:08:52,720 Come to my house and I'll show them to you. 2981 02:08:52,720 --> 02:08:54,010 So they went off to Jack's. 2982 02:08:54,010 --> 02:08:57,700 And Jack showed Lou the box, where he kept all his shirts, and his pants, 2983 02:08:57,700 --> 02:08:58,750 and his socks. 2984 02:08:58,750 --> 02:09:01,720 Lou said, I see you have all your clothes in a pile. 2985 02:09:01,720 --> 02:09:04,300 Why don't you wear some others once in a while? 2986 02:09:04,300 --> 02:09:07,450 Jack said, well, when I remove clothes and socks, 2987 02:09:07,450 --> 02:09:10,180 I wash them and put them away in the box. 2988 02:09:10,180 --> 02:09:12,670 Then comes the next morning and up I hop. 2989 02:09:12,670 --> 02:09:15,910 I go to the box and get my clothes off the top. 2990 02:09:15,910 --> 02:09:18,520 Lou quickly realized the problem with Jack. 2991 02:09:18,520 --> 02:09:21,490 He kept clothes, CDs, and books in a stack.
2992 02:09:21,490 --> 02:09:23,920 When he'd reached for something to read or to wear, 2993 02:09:23,920 --> 02:09:26,530 he chose the top book or underwear. 2994 02:09:26,530 --> 02:09:28,920 Then when he was done he would put it right back. 2995 02:09:28,920 --> 02:09:31,500 Back it would go on top of the stack. 2996 02:09:31,500 --> 02:09:33,870 I know the solution, said a triumphant Lou. 2997 02:09:33,870 --> 02:09:36,510 You need to learn to start using a queue. 2998 02:09:36,510 --> 02:09:39,300 Lou took Jack's clothes and hung them in a closet. 2999 02:09:39,300 --> 02:09:42,120 And when he had emptied the box, he just tossed it. 3000 02:09:42,120 --> 02:09:45,990 Then he said, now Jack, at the end of the day, put your clothes on the left 3001 02:09:45,990 --> 02:09:47,470 when you put them away. 3002 02:09:47,470 --> 02:09:50,190 Then tomorrow morning when you see the sunshine, get 3003 02:09:50,190 --> 02:09:52,920 your clothes from the right, from the end of the line. 3004 02:09:52,920 --> 02:09:55,800 Don't you see, said Lou, it will be so nice. 3005 02:09:55,800 --> 02:09:59,130 You'll wear everything once before you wear something twice. 3006 02:09:59,130 --> 02:10:02,070 And with everything in queues in his closet and shelf, 3007 02:10:02,070 --> 02:10:04,680 Jack started to feel quite sure of himself. 3008 02:10:04,680 --> 02:10:07,155 All thanks to Lou and his wonderful queue. 3009 02:10:09,220 --> 02:10:12,220 SPEAKER 1: So just to help you realize that these things are everywhere. 3010 02:10:12,220 --> 02:10:14,830 [AUDIENCE CLAPPING] 3011 02:10:14,830 --> 02:10:16,380 Even in our human world. 3012 02:10:16,380 --> 02:10:18,060 If you've ever lined up at this place. 3013 02:10:18,060 --> 02:10:19,980 Anyone recognize this? 3014 02:10:19,980 --> 02:10:22,800 OK, so sweetgreen, little salad place in the square.
3015 02:10:22,800 --> 02:10:24,690 This is if you order online or in advance, 3016 02:10:24,690 --> 02:10:27,232 your food ends up according to the first letter in your name. 3017 02:10:27,232 --> 02:10:29,482 Which actually sounds awfully reminiscent of something 3018 02:10:29,482 --> 02:10:30,300 like a hash table. 3019 02:10:30,300 --> 02:10:33,360 And in fact, no matter whether you implement a hash table like we 3020 02:10:33,360 --> 02:10:35,130 did, with an array and linked list. 3021 02:10:35,130 --> 02:10:37,335 Or with 3 shelves like this. 3022 02:10:37,335 --> 02:10:40,320 This is actually an abstract data type called a dictionary. 3023 02:10:40,320 --> 02:10:43,680 And a dictionary, just like in our human world, has keys and values. 3024 02:10:43,680 --> 02:10:45,390 Words and their definitions. 3025 02:10:45,390 --> 02:10:49,890 This just has letters of the alphabet and salads as their value. 3026 02:10:49,890 --> 02:10:52,260 But here, too, there's a real world constraint. 3027 02:10:52,260 --> 02:10:55,740 In what kind of scenario does this system at sweetgreen 3028 02:10:55,740 --> 02:10:58,410 devolve into a problem, for instance? 3029 02:10:58,410 --> 02:11:02,100 Because they, too, are using only finite space, finite storage. 3030 02:11:02,100 --> 02:11:03,090 What could go wrong? 3031 02:11:03,090 --> 02:11:03,360 Yeah. 3032 02:11:03,360 --> 02:11:04,290 AUDIENCE: Run out of space. 3033 02:11:04,290 --> 02:11:04,530 SPEAKER 1: Yeah. 3034 02:11:04,530 --> 02:11:05,910 If they run out of space on the shelf and there's 3035 02:11:05,910 --> 02:11:08,380 a lot of people whose names start with D, or E, or whatever. 3036 02:11:08,380 --> 02:11:09,300 And so, they just pile up. 3037 02:11:09,300 --> 02:11:11,880 And then, maybe, they kind of overflow into the E's or the F's. 
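The shelf-per-letter idea can be sketched in C as a dictionary built from an array of linked lists, hashing on the first letter of a name. Everything named here (`entry`, `dictionary`, `hash`, `put`, `get`, and the sample names and salads in the usage note) is a hypothetical illustration, not code from the lecture or any real ordering system.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

// One shelf per letter of the alphabet (an assumption for this sketch).
#define BUCKETS 26

// One order on a shelf: a key (name) and a value (salad).
typedef struct entry
{
    const char *name;
    const char *salad;
    struct entry *next;
} entry;

typedef struct
{
    entry *shelves[BUCKETS];
} dictionary;

// Hash on the first letter of the name; anything else goes to shelf 0.
static unsigned int hash(const char *name)
{
    if (isalpha((unsigned char) name[0]))
    {
        return (unsigned int) (toupper((unsigned char) name[0]) - 'A');
    }
    return 0;
}

// Store a salad under a name. Orders that share a first letter are
// chained together in a linked list rather than overflowing onto the
// next shelf. Returns false if malloc fails.
bool put(dictionary *d, const char *name, const char *salad)
{
    entry *e = malloc(sizeof(entry));
    if (e == NULL)
    {
        return false;
    }
    unsigned int i = hash(name);
    e->name = name;
    e->salad = salad;
    e->next = d->shelves[i];
    d->shelves[i] = e;
    return true;
}

// Look up the salad stored under a name, or NULL if there is none.
const char *get(const dictionary *d, const char *name)
{
    for (entry *e = d->shelves[hash(name)]; e != NULL; e = e->next)
    {
        if (strcmp(e->name, name) == 0)
        {
            return e->salad;
        }
    }
    return NULL;
}
```

With `dictionary d = {{NULL}};`, calling `put(&d, "David", "kale caesar")` and `put(&d, "Doug", "harvest bowl")` places both orders on the D shelf, and `get(&d, "Doug")` walks that shelf's list until the name matches. The sketch stores the caller's string pointers without copying them and never frees its entries, which a fuller version would need to handle.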
3038 02:11:11,880 --> 02:11:13,800 And they probably don't really care because any human 3039 02:11:13,800 --> 02:11:16,290 is going to come by, and just eyeball it, and figure it out anyway. 3040 02:11:16,290 --> 02:11:18,780 But in the world of a computer, you're the one coding 3041 02:11:18,780 --> 02:11:20,670 and have to be ever so precise. 3042 02:11:20,670 --> 02:11:24,240 We thought we would lastly do one final thing here. 3043 02:11:24,240 --> 02:11:28,045 In advance, we prepared a linked list of sorts in the audience. 3044 02:11:28,045 --> 02:11:29,670 Since this has become a bit of a thing. 3045 02:11:29,670 --> 02:11:32,530 I am starting to represent the beginning of this linked list, 3046 02:11:32,530 --> 02:11:37,110 insofar as I have a pointer here with seat location G9. 3047 02:11:37,110 --> 02:11:40,500 Whoever is in G9, would you mind standing up? 3048 02:11:40,500 --> 02:11:43,170 And what letter is on your sheet there? 3049 02:11:43,170 --> 02:11:44,100 AUDIENCE: F15. 3050 02:11:44,100 --> 02:11:46,650 SPEAKER 1: OK, so you have S15 and your letter-- 3051 02:11:46,650 --> 02:11:47,305 AUDIENCE: F15. 3052 02:11:47,305 --> 02:11:48,180 SPEAKER 1: Say again? 3053 02:11:48,180 --> 02:11:48,870 AUDIENCE: F. 3054 02:11:48,870 --> 02:11:49,680 SPEAKER 1: F15. 3055 02:11:49,680 --> 02:11:51,990 So I see you're holding a C in your node. 3056 02:11:51,990 --> 02:11:55,500 You are pointing to, if you could physically, F15. 3057 02:11:55,500 --> 02:11:56,880 F15, what do you hold? 3058 02:11:56,880 --> 02:11:57,780 AUDIENCE: S. 3059 02:11:57,780 --> 02:12:00,390 SPEAKER 1: You have an S. And who should you be pointing at? 3060 02:12:00,390 --> 02:12:01,170 AUDIENCE: F5. 3061 02:12:01,170 --> 02:12:01,930 SPEAKER 1: F5. 3062 02:12:01,930 --> 02:12:03,240 Could you stand up, F5. 3063 02:12:03,240 --> 02:12:04,950 You're holding a 5, I see. 3064 02:12:04,950 --> 02:12:06,030 What address? 3065 02:12:06,030 --> 02:12:07,020 AUDIENCE: F12.
3066 02:12:07,020 --> 02:12:08,040 SPEAKER 1: F12. 3067 02:12:08,040 --> 02:12:08,820 Big finale. 3068 02:12:08,820 --> 02:12:13,020 F12, if you'd like to stand up holding a 0 and null, which means that was CS50. 3069 02:12:13,020 --> 02:12:16,540 [AUDIENCE CLAPPING] 3070 02:12:16,540 --> 02:12:17,040 All right. 3071 02:12:17,040 --> 02:12:19,340 We'll see you next time. 3072 02:12:19,340 --> 02:12:54,000 [MUSIC PLAYING]