1 00:00:00,000 --> 00:00:00,500 2 00:00:00,500 --> 00:00:03,340 [MUSIC PLAYING] 3 00:00:03,340 --> 00:00:11,020 4 00:00:11,020 --> 00:00:14,010 >> DAVID MALAN: This is CS50. 5 00:00:14,010 --> 00:00:18,090 And this is both the start and the end-- like literally-- almost the end 6 00:00:18,090 --> 00:00:18,825 of week six. 7 00:00:18,825 --> 00:00:20,030 8 00:00:20,030 --> 00:00:22,640 >> I thought I'd share a little bit of a fun fact. 9 00:00:22,640 --> 00:00:25,370 I've pulled this up from a past semester's data set. 10 00:00:25,370 --> 00:00:29,710 You may recall that we ask you on every p set form if you've watched online 11 00:00:29,710 --> 00:00:31,580 or if you've attended in person. 12 00:00:31,580 --> 00:00:33,020 And here is the data. 13 00:00:33,020 --> 00:00:34,710 So today was very much predictable. 14 00:00:34,710 --> 00:00:37,126 But we wanted to spend a bit of time with you nonetheless. 15 00:00:37,126 --> 00:00:40,599 Would anyone like to conjecture why this graph is so jaggy, up down, up down, 16 00:00:40,599 --> 00:00:41,265 so consistently? 17 00:00:41,265 --> 00:00:42,980 18 00:00:42,980 --> 00:00:45,130 What do each of the peaks and troughs represent? 19 00:00:45,130 --> 00:00:46,005 >> AUDIENCE: [INAUDIBLE] 20 00:00:46,005 --> 00:00:47,002 21 00:00:47,002 --> 00:00:47,835 DAVID MALAN: Indeed. 22 00:00:47,835 --> 00:00:50,900 23 00:00:50,900 --> 00:00:55,480 And more amusingly, god forbid, we hold one lecture on a Friday 24 00:00:55,480 --> 00:00:58,960 at the beginning of the semester, that's what we see happen. 25 00:00:58,960 --> 00:01:03,430 So today, we partake in a bit more about data structures. 26 00:01:03,430 --> 00:01:06,660 And to give you more of a solid mental model for problems at five, 27 00:01:06,660 --> 00:01:07,450 which is now out. 28 00:01:07,450 --> 00:01:10,817 Misspellings, wherein, we'll hand you a text file some 100,000 29 00:01:10,817 --> 00:01:12,650 plus English words, and you're going to have 30 00:01:12,650 --> 00:01:17,770 to figure out how to cleverly load them into memory, into RAM, using some data 31 00:01:17,770 --> 00:01:19,330 structure of your choice. 32 00:01:19,330 --> 00:01:22,470 >> Now one such data structure could be, but probably shouldn't be, 33 00:01:22,470 --> 00:01:25,630 the fairly simplistic linked list, which we introduced last time. 34 00:01:25,630 --> 00:01:29,220 And a linked list had at least one advantage over an array. 35 00:01:29,220 --> 00:01:32,096 What's one advantage of a linked list arguably? 36 00:01:32,096 --> 00:01:32,950 >> AUDIENCE: Insertion. 37 00:01:32,950 --> 00:01:33,908 >> DAVID MALAN: Insertion. 38 00:01:33,908 --> 00:01:34,155 39 00:01:34,155 --> 00:01:35,196 What do you mean by that? 40 00:01:35,196 --> 00:01:37,872 >> AUDIENCE: Anywhere along the list [INAUDIBLE]. 41 00:01:37,872 --> 00:01:38,770 >> DAVID MALAN: Good. 42 00:01:38,770 --> 00:01:42,090 So you can insert an element wherever you want in the middle of the list 43 00:01:42,090 --> 00:01:45,490 without having to shuffle anything, which we concluded, in our sorting 44 00:01:45,490 --> 00:01:47,630 discussions, isn't necessarily a good thing, 45 00:01:47,630 --> 00:01:51,200 because it takes time to actually move all of those humans left or right. 46 00:01:51,200 --> 00:01:55,540 And so with a linked list, you can just allocate with malloc, a new node, 47 00:01:55,540 --> 00:01:58,385 and then update a couple of pointers-- two, three operations max-- 48 00:01:58,385 --> 00:02:01,480 and we're able to slot someone in anywhere into a list. 49 00:02:01,480 --> 00:02:03,550 >> What else was advantageous about a linked list? 50 00:02:03,550 --> 00:02:04,980 51 00:02:04,980 --> 00:02:05,659 Yeah? 52 00:02:05,659 --> 00:02:06,534 >> AUDIENCE: [INAUDIBLE] 53 00:02:06,534 --> 00:02:07,538 54 00:02:07,538 --> 00:02:08,413 DAVID MALAN: Perfect. 55 00:02:08,413 --> 00:02:10,590 56 00:02:10,590 --> 00:02:11,090 Perfect. 57 00:02:11,090 --> 00:02:12,070 It's really dynamic. 58 00:02:12,070 --> 00:02:15,100 And that you're not committing, in advance, to some fixed size 59 00:02:15,100 --> 00:02:18,750 chunk of memory, like you would have to with an array, the upside of which 60 00:02:18,750 --> 00:02:22,455 is that you can allocate nodes only on demand thereby using only as much space 61 00:02:22,455 --> 00:02:23,330 as you actually need. 62 00:02:23,330 --> 00:02:26,830 By contrast with an array, you might accidentally allocate too little. 63 00:02:26,830 --> 00:02:28,871 And then it's just going to be a pain in the neck 64 00:02:28,871 --> 00:02:32,440 to reallocate a new bigger array, copy everything over, free the old array, 65 00:02:32,440 --> 00:02:33,990 and then move about your business. 66 00:02:33,990 --> 00:02:37,479 Or worse, you might allocate way more memory than you actually need, 67 00:02:37,479 --> 00:02:40,520 and so you're going to have a very sparsely-populated array, so to speak. 68 00:02:40,520 --> 00:02:44,350 >> So a linked list gives you these advantages of dynamism and flexibility 69 00:02:44,350 --> 00:02:46,080 with insertions and deletions. 70 00:02:46,080 --> 00:02:48,000 But surely there must be a price paid. 71 00:02:48,000 --> 00:02:50,000 In fact, one of the themes explored on quiz zero 72 00:02:50,000 --> 00:02:52,430 was a couple of the trade-offs we've seen thus far. 73 00:02:52,430 --> 00:02:56,161 So what's a price paid or a downside of a linked list? 74 00:02:56,161 --> 00:02:56,660 Yeah. 75 00:02:56,660 --> 00:02:57,560 >> AUDIENCE: No random access. 76 00:02:57,560 --> 00:02:58,809 >> DAVID MALAN: No random access. 77 00:02:58,809 --> 00:02:59,540 But who cares? 78 00:02:59,540 --> 00:03:01,546 Random access doesn't sound compelling. 79 00:03:01,546 --> 00:03:02,421 >> AUDIENCE: [INAUDIBLE] 80 00:03:02,421 --> 00:03:04,865 81 00:03:04,865 --> 00:03:05,740 DAVID MALAN: Exactly. 82 00:03:05,740 --> 00:03:07,580 If you want to have a certain algorithm-- 83 00:03:07,580 --> 00:03:10,170 and let me actually propose binary search in particular, which 84 00:03:10,170 --> 00:03:12,600 is one we've used quite a bit-- if you don't have random access, 85 00:03:12,600 --> 00:03:15,516 you can't do that simple arithmetic of finding like the middle element 86 00:03:15,516 --> 00:03:16,530 and jumping right to it. 87 00:03:16,530 --> 00:03:20,239 You instead have to start at the first element and linearly search from left 88 00:03:20,239 --> 00:03:22,780 to right if you want to find the middle or any other element. 89 00:03:22,780 --> 00:03:24,410 >> AUDIENCE: It probably takes more memory. 90 00:03:24,410 --> 00:03:25,040 >> DAVID MALAN: Takes more memory. 91 00:03:25,040 --> 00:03:27,464 Where is that additional cost coming from in memory? 92 00:03:27,464 --> 00:03:28,339 >> AUDIENCE: [INAUDIBLE] 93 00:03:28,339 --> 00:03:32,566 94 00:03:32,566 --> 00:03:33,440 DAVID MALAN: Exactly. 95 00:03:33,440 --> 00:03:35,679 In this case here, we had a linked list for integers, 96 00:03:35,679 --> 00:03:37,470 and yet we're doubling the amount of memory 97 00:03:37,470 --> 00:03:39,680 we need by also storing these pointers. 98 00:03:39,680 --> 00:03:42,090 Now less of a big deal as your structs get larger 99 00:03:42,090 --> 00:03:45,320 and you're storing not a number but maybe a student or some other object. 100 00:03:45,320 --> 00:03:46,880 But the point certainly remains. 101 00:03:46,880 --> 00:03:49,421 And so a number of the operations on linked lists were called 102 00:03:49,421 --> 00:03:50,570 were big O of n-- linear. 103 00:03:50,570 --> 00:03:54,730 Things like insertion or search or deletion in case an element 104 00:03:54,730 --> 00:03:57,720 happened to be at the very end of the list whether it's sorted or not. 105 00:03:57,720 --> 00:04:01,167 >> Sometimes you might get lucky and in so lower bounds on these operations 106 00:04:01,167 --> 00:04:04,250 might also be constant time if you're always looking at the first element, 107 00:04:04,250 --> 00:04:05,070 for instance. 108 00:04:05,070 --> 00:04:09,360 But ultimately, we promised to achieve the holy grail 109 00:04:09,360 --> 00:04:12,630 of data structures, or some approximation thereof, 110 00:04:12,630 --> 00:04:14,290 by way of constant time. 111 00:04:14,290 --> 00:04:17,579 Can we find elements or add elements or remove elements from a list? 112 00:04:17,579 --> 00:04:19,059 We shall see quite soon. 113 00:04:19,059 --> 00:04:21,100 And it turns out that one of the mechanisms we're 114 00:04:21,100 --> 00:04:23,464 going to start to use today, annual use in p set five, 115 00:04:23,464 --> 00:04:24,630 is actually pretty familiar. 116 00:04:24,630 --> 00:04:27,430 For instance, if this is a bunch of exam books, each of which 117 00:04:27,430 --> 00:04:29,660 has a student's first name and last name on it, 118 00:04:29,660 --> 00:04:31,820 and I pick them up from at the end of an exam, 119 00:04:31,820 --> 00:04:33,746 and they're all pretty much in a random order, 120 00:04:33,746 --> 00:04:36,370 and we want to go about sorting these exams so that once graded 121 00:04:36,370 --> 00:04:38,661 it's just a lot easier and faster to hand them back out 122 00:04:38,661 --> 00:04:40,030 to students alphabetically. 123 00:04:40,030 --> 00:04:42,770 What would your instincts be for a pile of exams like this? 124 00:04:42,770 --> 00:04:45,019 >> Well, if you're like me, you might see that this is m, 125 00:04:45,019 --> 00:04:48,505 so I'm going to sort of put this into, if this is my table or my floor where 126 00:04:48,505 --> 00:04:50,650 I'm spreading things out-- or my array really-- 127 00:04:50,650 --> 00:04:52,210 I might put all of the Ms in there. 128 00:04:52,210 --> 00:04:52,710 Oh. 129 00:04:52,710 --> 00:04:55,020 Here's an A. So I might put the As over here. 130 00:04:55,020 --> 00:04:55,520 Oh. 131 00:04:55,520 --> 00:04:57,980 Here's another A. I'm going to put that over here. 132 00:04:57,980 --> 00:05:02,490 Here's a Z. Here is another M. And so I might start making piles like this. 133 00:05:02,490 --> 00:05:06,620 And then maybe I'd go in later and sort of very nitpicky-ly sort 134 00:05:06,620 --> 00:05:07,710 the individual piles. 135 00:05:07,710 --> 00:05:11,300 But the point is I would look at the input that I'm handed 136 00:05:11,300 --> 00:05:14,016 and I would make some calculated decision based on that input. 137 00:05:14,016 --> 00:05:15,640 If it starts with A, put it over there. 138 00:05:15,640 --> 00:05:18,980 If it starts with Z, put it over there, and everything in between. 139 00:05:18,980 --> 00:05:22,730 >> So this is a technique that's generally known as hashing-- H-A-S-H-- 140 00:05:22,730 --> 00:05:26,550 which generally means taking as input and using that input to compute 141 00:05:26,550 --> 00:05:30,940 a value, generally a number, and that number is the index into a storage 142 00:05:30,940 --> 00:05:32,260 container, like an array. 143 00:05:32,260 --> 00:05:35,490 So in other words, I might have a hash function, as I do in my head, 144 00:05:35,490 --> 00:05:37,940 that if I see someone's name who starts with A, 145 00:05:37,940 --> 00:05:40,190 I'm going to map that to zero in my head. 146 00:05:40,190 --> 00:05:44,160 And if I see someone with Z, I'm going to map that to 25 in my head 147 00:05:44,160 --> 00:05:46,220 and then put that into the last most pile. 148 00:05:46,220 --> 00:05:50,990 >> Now, if you think about not my brain but a C program, what numbers could 149 00:05:50,990 --> 00:05:53,170 you rely on to achieve that same result? 150 00:05:53,170 --> 00:05:55,594 In other words, if you had the ASCII character A, 151 00:05:55,594 --> 00:05:57,510 how do you determine what bucket to put it in? 152 00:05:57,510 --> 00:05:59,801 You probably don't want to put it into bucket 65, which 153 00:05:59,801 --> 00:06:01,840 would be like over there for no good reason. 154 00:06:01,840 --> 00:06:04,320 Where do you want to put A in terms of its ASCII value? 155 00:06:04,320 --> 00:06:05,600 156 00:06:05,600 --> 00:06:08,920 Where do you want to do to its ASCII value to come up with a smarter bucket 157 00:06:08,920 --> 00:06:09,480 to put it in? 158 00:06:09,480 --> 00:06:10,206 >> AUDIENCE: Minus A. 159 00:06:10,206 --> 00:06:10,956 >> DAVID MALAN: Yeah. 160 00:06:10,956 --> 00:06:13,190 So minus A or minus specifically 65 if it's 161 00:06:13,190 --> 00:06:18,240 a capital A. Or 98 if it's a lowercase a. 162 00:06:18,240 --> 00:06:21,300 And so that would allow us to, very simply and very arithmetically, 163 00:06:21,300 --> 00:06:23,260 put something into a bucket like that. 164 00:06:23,260 --> 00:06:26,010 So it turns out we actually do this as well even with the quizzes. 165 00:06:26,010 --> 00:06:29,051 >> So you might recall you circled your teaching fellow's name on the cover. 166 00:06:29,051 --> 00:06:32,270 And the TF's names were organized into these columns alphabetically, 167 00:06:32,270 --> 00:06:34,400 well, believe it or not, when all 80 plus of us 168 00:06:34,400 --> 00:06:37,800 got together the other night to grade, the last step in our grading process 169 00:06:37,800 --> 00:06:41,830 is to hash the quizzes into a big space of floor at the [INAUDIBLE] 170 00:06:41,830 --> 00:06:45,110 and to lay everyone's quizzes out in exactly the order of their TF's 171 00:06:45,110 --> 00:06:47,700 names on the cover, because then it's a lot easier for us 172 00:06:47,700 --> 00:06:51,290 to search through that using linear search or some kind of cleverness 173 00:06:51,290 --> 00:06:54,050 for a TF to find his or her students' quizzes. 174 00:06:54,050 --> 00:06:56,060 >> So this idea of hashing that you'll see is 175 00:06:56,060 --> 00:07:00,520 quite powerful is actually pretty commonplace and very intuitive, 176 00:07:00,520 --> 00:07:03,000 much like perhaps divide and conquer was in week zero. 177 00:07:03,000 --> 00:07:05,250 I fast forward to the hackathon a couple of years ago. 178 00:07:05,250 --> 00:07:08,040 This was Zamyla and a couple of other staff greeting students 179 00:07:08,040 --> 00:07:09,030 as they came in. 180 00:07:09,030 --> 00:07:12,680 And we had a whole bunch of folding tables there with name tags. 181 00:07:12,680 --> 00:07:15,380 And we had the name tags organized with like the As over there 182 00:07:15,380 --> 00:07:16,690 and the Zs over there. 183 00:07:16,690 --> 00:07:20,350 And so one of the TFs very cleverly wrote this as the instructions 184 00:07:20,350 --> 00:07:21,030 for the day. 185 00:07:21,030 --> 00:07:24,480 And in week 12 of the semester this all made perfect sense and everyone 186 00:07:24,480 --> 00:07:25,310 knew what to do. 187 00:07:25,310 --> 00:07:27,900 But anytime you've queued in the same way, 188 00:07:27,900 --> 00:07:30,272 you're implementing the same notion of a hash. 189 00:07:30,272 --> 00:07:31,730 So let's formalize it a little bit. 190 00:07:31,730 --> 00:07:32,890 Here is an array. 191 00:07:32,890 --> 00:07:36,820 It's drawn to be a little wide just to depict, visually, 192 00:07:36,820 --> 00:07:38,920 that we might put strings in something like this. 193 00:07:38,920 --> 00:07:41,970 And this array is clearly of size 26 total. 194 00:07:41,970 --> 00:07:43,935 And the thing is called table arbitrarily. 195 00:07:43,935 --> 00:07:48,930 But this is just an artist's rendition of what a hash table might be. 196 00:07:48,930 --> 00:07:52,799 >> So a hash table now is going to be a higher level data structure. 197 00:07:52,799 --> 00:07:54,840 At the end of the day we're about to see that you 198 00:07:54,840 --> 00:07:58,700 can implement a hash table, which is much like the check-in line 199 00:07:58,700 --> 00:08:02,059 at a hackathon much like this table used for sorting exam books. 200 00:08:02,059 --> 00:08:03,850 But a hash table is sort of this high level 201 00:08:03,850 --> 00:08:08,250 concept that could use an array underneath the hood to implement it, 202 00:08:08,250 --> 00:08:11,890 or it could use a length list, or even perhaps some other data structures. 203 00:08:11,890 --> 00:08:15,590 And now that's the theme-- taking some of these fundamental ingredients 204 00:08:15,590 --> 00:08:18,310 like an array and this building block now of a length list 205 00:08:18,310 --> 00:08:21,740 and seeing what else we can build on top of those, like ingredients 206 00:08:21,740 --> 00:08:26,550 into a recipe, making more and more interesting and useful final results. 207 00:08:26,550 --> 00:08:28,680 >> So with the hash table we might implement it 208 00:08:28,680 --> 00:08:32,540 in memory pictorially like this, but how might it actually be coded up? 209 00:08:32,540 --> 00:08:33,789 Well, maybe as simply is this. 210 00:08:33,789 --> 00:08:38,270 If CAPACITY in all caps, is just some constant-- for instance 26, 211 00:08:38,270 --> 00:08:42,030 for 26 letters of the alphabet-- I might call my variable table, 212 00:08:42,030 --> 00:08:45,630 and I might claim that I'm going to put char stars in there, or string. 213 00:08:45,630 --> 00:08:49,880 So it's as simple as this if you want to implement a hash table. 214 00:08:49,880 --> 00:08:51,490 And yet, this is really just an array. 215 00:08:51,490 --> 00:08:53,198 But again, a hash table is now what we'll 216 00:08:53,198 --> 00:08:57,470 call an abstract data type that's just sort of a conceptual layering on top 217 00:08:57,470 --> 00:09:00,780 of something more mundane now like an array. 218 00:09:00,780 --> 00:09:02,960 >> Now, how do we go about solving problems? 219 00:09:02,960 --> 00:09:06,980 Well, earlier I had the luxury of having enough table space here 220 00:09:06,980 --> 00:09:09,460 so that I could put the quizzes anywhere I wanted. 221 00:09:09,460 --> 00:09:10,620 So As might go here. 222 00:09:10,620 --> 00:09:12,100 Zs might go here. 223 00:09:12,100 --> 00:09:13,230 Ms might go here. 224 00:09:13,230 --> 00:09:14,740 And then I had some extra space. 225 00:09:14,740 --> 00:09:18,740 But this is a bit of a cheat right now because this table, if I really 226 00:09:18,740 --> 00:09:22,720 thought of it as an array, is just going to be of some fixed size. 227 00:09:22,720 --> 00:09:25,380 >> So technically, if I pull up another student's quiz 228 00:09:25,380 --> 00:09:28,490 and see, oh, this person's name starts with an A too, 229 00:09:28,490 --> 00:09:30,980 I kind of want to put it there. 230 00:09:30,980 --> 00:09:34,740 But as soon as I put it there, if this table indeed represents an array, 231 00:09:34,740 --> 00:09:37,840 I'm going to be overriding or clobbering whoever this student's quiz is. 232 00:09:37,840 --> 00:09:38,340 Right? 233 00:09:38,340 --> 00:09:41,972 If this is an array, only one thing can go in each of these cells or elements. 234 00:09:41,972 --> 00:09:43,680 And so I kind of have to pick and choose. 235 00:09:43,680 --> 00:09:45,735 >> Now earlier I kind of cheated and did this or I 236 00:09:45,735 --> 00:09:47,526 just kind of stacked them above each other. 237 00:09:47,526 --> 00:09:49,170 But that's not going to fly in code. 238 00:09:49,170 --> 00:09:52,260 So where could I put the second student whose name 239 00:09:52,260 --> 00:09:54,964 is A if all I had is this available table space? 240 00:09:54,964 --> 00:09:57,880 And I've used three slots and it looks like there's just a few others. 241 00:09:57,880 --> 00:09:58,959 What could you do? 242 00:09:58,959 --> 00:09:59,834 AUDIENCE: [INAUDIBLE] 243 00:09:59,834 --> 00:10:00,565 244 00:10:00,565 --> 00:10:01,315 DAVID MALAN: Yeah. 245 00:10:01,315 --> 00:10:02,370 Maybe let's just keep it simple. 246 00:10:02,370 --> 00:10:02,660 Right? 247 00:10:02,660 --> 00:10:04,243 It doesn't fit where I want to put it. 248 00:10:04,243 --> 00:10:07,450 So I'm going to put it technically where a B would go. 249 00:10:07,450 --> 00:10:09,932 Now, of course, I'm starting to paint myself into a corner. 250 00:10:09,932 --> 00:10:11,890 If I get to a student whose name is actually B, 251 00:10:11,890 --> 00:10:14,840 now B is going to be moved a little forward, as might happen, yep, 252 00:10:14,840 --> 00:10:17,530 if this is a B, now it has to go here. 253 00:10:17,530 --> 00:10:20,180 >> And so this very quickly could become problematic, 254 00:10:20,180 --> 00:10:23,850 but it's a technique that actually is referred to as linear probing, 255 00:10:23,850 --> 00:10:26,650 whereby you just consider your array to be along the line. 256 00:10:26,650 --> 00:10:29,680 And you just kind of probe or inspect each available element 257 00:10:29,680 --> 00:10:31,360 looking for an available spot. 258 00:10:31,360 --> 00:10:34,010 And as soon as you find one, you drop it in there. 259 00:10:34,010 --> 00:10:38,390 >> Now, the price being paid now for this solution is what? 260 00:10:38,390 --> 00:10:41,300 We have a fixed size array, and when I insert names 261 00:10:41,300 --> 00:10:44,059 into it, at least initially, what's the running time of insertion 262 00:10:44,059 --> 00:10:46,350 for putting the students' quizzes in the right buckets? 263 00:10:46,350 --> 00:10:48,710 264 00:10:48,710 --> 00:10:50,002 Big O of what? 265 00:10:50,002 --> 00:10:51,147 >> AUDIENCE: n. 266 00:10:51,147 --> 00:10:52,480 DAVID MALAN: I heard big O of n. 267 00:10:52,480 --> 00:10:53,530 268 00:10:53,530 --> 00:10:54,300 Not true. 269 00:10:54,300 --> 00:10:56,490 But we'll tease apart why in just a moment. 270 00:10:56,490 --> 00:10:57,702 What else might it be? 271 00:10:57,702 --> 00:10:58,755 >> AUDIENCE: [INAUDIBLE] 272 00:10:58,755 --> 00:11:00,380 DAVID MALAN: And let me do it visually. 273 00:11:00,380 --> 00:11:04,720 So suppose this is the letter S. 274 00:11:04,720 --> 00:11:05,604 >> AUDIENCE: It's one. 275 00:11:05,604 --> 00:11:06,520 DAVID MALAN: It's one. 276 00:11:06,520 --> 00:11:06,710 Right? 277 00:11:06,710 --> 00:11:08,950 This is an array, which means we have random access. 278 00:11:08,950 --> 00:11:11,790 And if we think of this as zero and this as 25, 279 00:11:11,790 --> 00:11:13,800 and we realize that, oh, here's my input S, 280 00:11:13,800 --> 00:11:16,350 I can certainly convert S, an ASCII character, 281 00:11:16,350 --> 00:11:18,540 to a corresponding number between zero and 25 282 00:11:18,540 --> 00:11:20,910 and then immediately put it where it belongs. 283 00:11:20,910 --> 00:11:26,120 >> But of course, as soon as I get to the second person who's name is A or B or C 284 00:11:26,120 --> 00:11:29,300 eventually, if I've used the linear probing as my solution, 285 00:11:29,300 --> 00:11:31,360 the running time of insertion in the worst case 286 00:11:31,360 --> 00:11:33,120 is actually going to devolve into what? 287 00:11:33,120 --> 00:11:34,270 288 00:11:34,270 --> 00:11:36,045 And I did hear it here correctly early on. 289 00:11:36,045 --> 00:11:36,920 AUDIENCE: [INAUDIBLE] 290 00:11:36,920 --> 00:11:41,620 DAVID MALAN: So it is n indeed once you have a sufficiently large data set. 291 00:11:41,620 --> 00:11:44,410 So, on the one hand, if your array is big enough 292 00:11:44,410 --> 00:11:48,287 and your data is sparse enough, you get this beautiful constant time. 293 00:11:48,287 --> 00:11:50,620 But as soon as you start getting more and more elements, 294 00:11:50,620 --> 00:11:53,200 and just statistically you get more people with the letter 295 00:11:53,200 --> 00:11:56,030 A as their name or the letter B, it could potentially 296 00:11:56,030 --> 00:11:57,900 devolve into something more linear. 297 00:11:57,900 --> 00:11:59,640 So not quite perfect. 298 00:11:59,640 --> 00:12:00,690 So could we do better? 299 00:12:00,690 --> 00:12:03,210 >> Well, what was our solution before when we 300 00:12:03,210 --> 00:12:06,820 want to have more dynamism than something like an array allowed? 301 00:12:06,820 --> 00:12:08,085 302 00:12:08,085 --> 00:12:08,960 AUDIENCE: [INAUDIBLE] 303 00:12:08,960 --> 00:12:10,030 DAVID MALAN: What did we introduce? 304 00:12:10,030 --> 00:12:10,530 Yeah. 305 00:12:10,530 --> 00:12:11,430 So a linked list. 306 00:12:11,430 --> 00:12:14,430 Well, let's see what a linked list might do for us instead. 307 00:12:14,430 --> 00:12:17,630 Well, let me propose that we draw the picture as follows. 308 00:12:17,630 --> 00:12:19,620 Now this is a different picture from an example 309 00:12:19,620 --> 00:12:24,750 from a different text, actually, that is actually using an array of size 31. 310 00:12:24,750 --> 00:12:28,220 And this author simply decided to hash strings 311 00:12:28,220 --> 00:12:32,430 not based on the person's names, but based on their birthdates. 312 00:12:32,430 --> 00:12:35,680 Irrespective of the month, they figured if you're born on the first of a month 313 00:12:35,680 --> 00:12:39,580 or the 31st of a month, the author will hash based on that value, 314 00:12:39,580 --> 00:12:44,154 so as to spread the names out a bit more than just 26 spots might allow. 315 00:12:44,154 --> 00:12:47,320 And perhaps it's a little more uniform than going with alphabetical letters, 316 00:12:47,320 --> 00:12:50,236 because of course there's probably more people in the world with names 317 00:12:50,236 --> 00:12:54,020 that start with A than certainly some other letters of the alphabet. 318 00:12:54,020 --> 00:12:56,380 So maybe this is a little more uniform, assuming 319 00:12:56,380 --> 00:12:58,640 a uniform distribution of babies across a month. 320 00:12:58,640 --> 00:12:59,990 >> But, of course, this is still imperfect. 321 00:12:59,990 --> 00:13:00,370 Right? 322 00:13:00,370 --> 00:13:01,370 We're having collisions. 323 00:13:01,370 --> 00:13:04,680 Multiple people in this data structure are still 324 00:13:04,680 --> 00:13:08,432 having the same birthdate at least you're irrespective of month. 325 00:13:08,432 --> 00:13:09,640 But what has the author done? 326 00:13:09,640 --> 00:13:13,427 Well, it looks like we have an array on the left-hand side drawn vertically, 327 00:13:13,427 --> 00:13:15,010 but that's just an artist's rendition. 328 00:13:15,010 --> 00:13:18,009 It doesn't matter what direction you draw an array, it's still an array. 329 00:13:18,009 --> 00:13:20,225 What is this an array of apparently? 330 00:13:20,225 --> 00:13:21,500 >> AUDIENCE: Linked list. 331 00:13:21,500 --> 00:13:21,650 >> DAVID MALAN: Yeah. 332 00:13:21,650 --> 00:13:23,490 It looks like it's an array of linked list. 333 00:13:23,490 --> 00:13:26,490 So again, to this point of sort of using these data structures now 334 00:13:26,490 --> 00:13:28,550 as ingredients to more interesting solutions, 335 00:13:28,550 --> 00:13:30,862 you can absolutely take a fundamental, like an array, 336 00:13:30,862 --> 00:13:33,320 and then take something more interesting like a linked list 337 00:13:33,320 --> 00:13:36,660 and even combine them into an even more interesting data structure. 338 00:13:36,660 --> 00:13:39,630 And indeed, this too would be called a hash table, 339 00:13:39,630 --> 00:13:42,610 whereby the array is really the hash table, 340 00:13:42,610 --> 00:13:45,600 but that hash table has chains, so to speak, 341 00:13:45,600 --> 00:13:50,220 that can grow or shrink based on the number of elements you want to insert. 342 00:13:50,220 --> 00:13:52,990 >> Now, accordingly, what's the running time now? 343 00:13:52,990 --> 00:13:58,030 If I want to insert someone whose birthday is October 31, 344 00:13:58,030 --> 00:13:59,040 where does he or she go? 345 00:13:59,040 --> 00:14:00,530 346 00:14:00,530 --> 00:14:01,030 All right. 347 00:14:01,030 --> 00:14:02,819 At the very bottom where it says 31. 348 00:14:02,819 --> 00:14:03,610 And that's perfect. 349 00:14:03,610 --> 00:14:05,060 That was constant time. 350 00:14:05,060 --> 00:14:08,760 But what if we find someone else whose birthday is, let's see, 351 00:14:08,760 --> 00:14:10,950 October, November, December 31? 352 00:14:10,950 --> 00:14:12,790 Where is he or she going to go? 353 00:14:12,790 --> 00:14:13,290 Same thing. 354 00:14:13,290 --> 00:14:13,970 Two step though. 355 00:14:13,970 --> 00:14:15,303 That's constant though isn't it? 356 00:14:15,303 --> 00:14:16,360 357 00:14:16,360 --> 00:14:16,860 All right. 358 00:14:16,860 --> 00:14:17,840 At the moment it is. 359 00:14:17,840 --> 00:14:20,570 But in the general case, the more people we add, 360 00:14:20,570 --> 00:14:23,790 probabilistically, we're going to get more and more collisions. 361 00:14:23,790 --> 00:14:26,820 >> Now this is a little better because technically 362 00:14:26,820 --> 00:14:34,580 now my chains could be in the worst case how long? 363 00:14:34,580 --> 00:14:38,890 If I insert n people into this more sophisticated data structure, n people, 364 00:14:38,890 --> 00:14:41,080 in the worst case it's going to be n. 365 00:14:41,080 --> 00:14:41,815 Why? 366 00:14:41,815 --> 00:14:43,332 >> AUDIENCE: Because if everybody has the same birthday, 367 00:14:43,332 --> 00:14:44,545 they're going to be one line. 368 00:14:44,545 --> 00:14:45,420 DAVID MALAN: Perfect. 369 00:14:45,420 --> 00:14:47,480 It might be a little contrived, but truly in the worst case, 370 00:14:47,480 --> 00:14:50,117 if everyone has the same birthday, given the inputs you have, 371 00:14:50,117 --> 00:14:51,950 you're going to have a massively long chain. 372 00:14:51,950 --> 00:14:54,241 And so, you could call it a hash table, but really it's 373 00:14:54,241 --> 00:14:56,810 just a massive linked list with a whole lot of wasted space. 374 00:14:56,810 --> 00:15:00,460 But in general, if we assume that at least birthdays are uniform-- 375 00:15:00,460 --> 00:15:01,750 and it probably isn't. 376 00:15:01,750 --> 00:15:02,587 I'm making that up. 377 00:15:02,587 --> 00:15:04,420 But if we assume, for the sake of discussion 378 00:15:04,420 --> 00:15:07,717 that they are, then in theory, if this is the vertical representation 379 00:15:07,717 --> 00:15:11,050 of the array, well then hopefully you're going to get chains that are, you know, 380 00:15:11,050 --> 00:15:15,880 roughly the same length where each of these represents a day of the month. 381 00:15:15,880 --> 00:15:19,930 >> Now if there's 31 days in the month, that means my running time really 382 00:15:19,930 --> 00:15:25,230 is big O of n over 31, which feels better than linear. 383 00:15:25,230 --> 00:15:27,950 But what was one of our commitments a couple of weeks 384 00:15:27,950 --> 00:15:31,145 ago whenever it came to expressing the running time of an algorithm? 385 00:15:31,145 --> 00:15:33,450 386 00:15:33,450 --> 00:15:35,190 Just only look at the high order term. 387 00:15:35,190 --> 00:15:35,690 Right? 388 00:15:35,690 --> 00:15:37,400 31 is definitely helpful. 389 00:15:37,400 --> 00:15:39,610 But this is still big O of n. 390 00:15:39,610 --> 00:15:41,730 But one of the themes of problem set five 391 00:15:41,730 --> 00:15:43,950 is going to be to acknowledge that absolutely, 392 00:15:43,950 --> 00:15:47,320 asymptotically, theoretically this data structure 393 00:15:47,320 --> 00:15:50,470 is no better than just one massive linked list. 394 00:15:50,470 --> 00:15:53,550 And indeed, in the worst case, this hash table might devolve into that. 395 00:15:53,550 --> 00:15:57,620 >> But in the real world, with us humans that own Macs or PCs or whatever 396 00:15:57,620 --> 00:16:01,240 and are running real world software on real world data, 397 00:16:01,240 --> 00:16:03,260 which algorithm are you going to prefer? 398 00:16:03,260 --> 00:16:09,180 The one that takes end steps or the one that takes n divided by 31 steps 399 00:16:09,180 --> 00:16:12,900 to find some piece of data or to look up some information? 400 00:16:12,900 --> 00:16:16,580 I mean, absolutely the 31 makes a difference in the real world. 401 00:16:16,580 --> 00:16:18,540 It is 31 times faster. 402 00:16:18,540 --> 00:16:20,880 And we humans are certainly going to appreciate that. 403 00:16:20,880 --> 00:16:23,004 >> So realize the dichotomy there between actually 404 00:16:23,004 --> 00:16:25,920 talking about things theoretically and asymptotically which definitely 405 00:16:25,920 --> 00:16:28,760 has value as we've seen, but in the real world, 406 00:16:28,760 --> 00:16:32,930 if you care about just making the human happy for general inputs, 407 00:16:32,930 --> 00:16:36,010 you might very well want to accept the fact that, yes, this is linear, 408 00:16:36,010 --> 00:16:38,360 but it's 31 times faster than linear might be. 409 00:16:38,360 --> 00:16:41,610 And better yet, we don't just have to do something arbitrary like a birthdate, 410 00:16:41,610 --> 00:16:44,030 we could spend a little more time and cleverness 411 00:16:44,030 --> 00:16:47,140 and think about what we might do, given a person's name and maybe 412 00:16:47,140 --> 00:16:50,130 their birthdate to combine those ingredients to figure out something 413 00:16:50,130 --> 00:16:52,720 that is truly more uniform and less jaggy, 414 00:16:52,720 --> 00:16:56,250 so to speak than this picture currently suggests it might be. 415 00:16:56,250 --> 00:16:57,750 How could we implement this in code? 416 00:16:57,750 --> 00:17:00,280 Well, let me propose that we just borrow some syntax we've 417 00:17:00,280 --> 00:17:01,799 used a couple times thus far. 418 00:17:01,799 --> 00:17:03,590 And I'm going to define a node, which again 419 00:17:03,590 --> 00:17:06,812 is a generic term for just some container for some data structure. 420 00:17:06,812 --> 00:17:09,020 I'm going to propose that a string is going in there. 421 00:17:09,020 --> 00:17:11,369 But we're going to start taking those training wheels off now. 422 00:17:11,369 --> 00:17:13,230 >> No more CS50 library really, unless you want 423 00:17:13,230 --> 00:17:15,230 to use it for your final project, which is fine, 424 00:17:15,230 --> 00:17:18,569 but now we're going to pull back the curtain and say it's just a char star. 425 00:17:18,569 --> 00:17:22,069 So the word there is going to be the person's name in question. 426 00:17:22,069 --> 00:17:25,079 And now I have a link here to the next node 427 00:17:25,079 --> 00:17:28,170 so that these represent each of the nodes 428 00:17:28,170 --> 00:17:30,950 in the chain, potentially, of a linked list. 429 00:17:30,950 --> 00:17:34,090 >> And now how do I declare the hash table itself? 430 00:17:34,090 --> 00:17:36,660 How do I declare this whole structure? 431 00:17:36,660 --> 00:17:40,960 Well, really, much like I used a pointer to just the first element of a list 432 00:17:40,960 --> 00:17:44,510 before, similarly can I just say I just need a bunch of pointers 433 00:17:44,510 --> 00:17:46,270 to implement this whole hash table. 434 00:17:46,270 --> 00:17:49,484 I'm going to have an array called table for hash table. 435 00:17:49,484 --> 00:17:50,900 It's going to be of size capacity. 436 00:17:50,900 --> 00:17:52,525 That's how many elements can fit in it. 437 00:17:52,525 --> 00:17:56,180 And each of those elements in this array is going to be a node star. 438 00:17:56,180 --> 00:17:56,810 Why? 439 00:17:56,810 --> 00:18:00,160 Well, per this picture, what I'm implementing the hash table as 440 00:18:00,160 --> 00:18:04,330 effectively in the beginning is just this array that we've drawn vertically, 441 00:18:04,330 --> 00:18:06,820 each of whose squares represents a pointer. 442 00:18:06,820 --> 00:18:09,170 That ones that have slashes through them are just null. 443 00:18:09,170 --> 00:18:11,410 And the ones that have arrows going to the right 444 00:18:11,410 --> 00:18:16,140 are actual pointers to actual nodes, ergo the start of a linked list. 445 00:18:16,140 --> 00:18:19,050 >> So here, then, is how we might implement a hash table that 446 00:18:19,050 --> 00:18:21,580 implements separate chaining. 447 00:18:21,580 --> 00:18:22,840 Now can we do better? 448 00:18:22,840 --> 00:18:25,632 All right I promised last time that we could achieve constant time. 449 00:18:25,632 --> 00:18:27,381 And I kind of gave you constant time here, 450 00:18:27,381 --> 00:18:29,850 but then said not really constant time because it's still 451 00:18:29,850 --> 00:18:31,890 dependent on the total number of elements 452 00:18:31,890 --> 00:18:34,500 you're inputting into the data structure. 453 00:18:34,500 --> 00:18:35,980 But suppose we did this. 454 00:18:35,980 --> 00:18:39,550 Let me go back to the screen over here. 455 00:18:39,550 --> 00:18:44,520 Let me also project this up here, clear the screen, and suppose I did this. 456 00:18:44,520 --> 00:18:49,300 Suppose I wanted to insert the name Daven in into my data structure. 457 00:18:49,300 --> 00:18:52,100 >> So I want to insert a string Daven into the data structure. 458 00:18:52,100 --> 00:18:54,370 What if I don't use a hash table, but I use 459 00:18:54,370 --> 00:18:56,980 something that's more tree-like like a family tree, where 460 00:18:56,980 --> 00:18:59,670 you have some root at the top and then nodes and leaves 461 00:18:59,670 --> 00:19:01,440 that go downward and outward. 462 00:19:01,440 --> 00:19:04,450 Suppose then, that I want to insert Daven's 463 00:19:04,450 --> 00:19:06,430 into what's currently an empty list. 464 00:19:06,430 --> 00:19:09,780 I'm going to do the following: I'm going to create a node in this family 465 00:19:09,780 --> 00:19:15,170 tree-like data structure that looks a little like this, each of which 466 00:19:15,170 --> 00:19:19,640 rectangles has, let's say, for now 26 elements in it. 467 00:19:19,640 --> 00:19:21,650 And each of the cells in this array is going 468 00:19:21,650 --> 00:19:23,470 to represent the letter of an alphabet. 469 00:19:23,470 --> 00:19:28,190 >> Specifically, I'm going to treat this is A, then B, then C, then D, 470 00:19:28,190 --> 00:19:29,310 this one here. 471 00:19:29,310 --> 00:19:32,940 So this is going to effectively represent the letter D. 472 00:19:32,940 --> 00:19:36,040 But to insert all of Daven's name I need to do a bit more. 473 00:19:36,040 --> 00:19:37,840 So I'm first going to hash, so to speak. 474 00:19:37,840 --> 00:19:41,049 I'm going to look at the first letter in Daven's which is obviously a D, 475 00:19:41,049 --> 00:19:42,840 and I'm going to allocate a node that looks 476 00:19:42,840 --> 00:19:45,570 like this-- a big rectangle big enough to fit the whole alphabet. 477 00:19:45,570 --> 00:19:47,140 >> Now D is done. 478 00:19:47,140 --> 00:19:49,720 Now A. D-A-V-E-N is the goal. 479 00:19:49,720 --> 00:19:51,220 So now what I'm going to do is this. 480 00:19:51,220 --> 00:19:54,027 As soon as I started D notice there's no pointer there. 481 00:19:54,027 --> 00:19:56,860 It's garbage values at the moment, or I might initialize it to null. 482 00:19:56,860 --> 00:19:59,630 But let me keep going with this idea of building a tree. 483 00:19:59,630 --> 00:20:04,260 Let me allocate another one of these nodes that has 26 elements in it. 484 00:20:04,260 --> 00:20:05,150 >> And you know what? 485 00:20:05,150 --> 00:20:09,130 If this is just a node in memory that I created with malloc, using a struct 486 00:20:09,130 --> 00:20:11,240 as we'll soon see, I'm going to do this-- 487 00:20:11,240 --> 00:20:14,450 I'm going to draw an arrow from the thing that represented D down 488 00:20:14,450 --> 00:20:15,860 to this new node. 489 00:20:15,860 --> 00:20:19,240 And now, first the next letter in Daven's name, 490 00:20:19,240 --> 00:20:24,150 V-- D-A-V-- I'm going to go ahead and draw another node like this, 491 00:20:24,150 --> 00:20:30,150 whereby, the V elements here, which we'll draw for instance-- whoops. 492 00:20:30,150 --> 00:20:31,020 We won't draw there. 493 00:20:31,020 --> 00:20:31,936 It's going to go here. 494 00:20:31,936 --> 00:20:32,890 495 00:20:32,890 --> 00:20:35,712 >> Then we're going to consider this to be V. 496 00:20:35,712 --> 00:20:44,920 And then down here we're going to index down from V into what we'll consider E. 497 00:20:44,920 --> 00:20:50,100 And then from here we're going to go have one of these nodes here. 498 00:20:50,100 --> 00:20:52,930 And now we have a question to answer. 499 00:20:52,930 --> 00:20:57,840 I need to somehow indicate that we're at the end of the string Daven. 500 00:20:57,840 --> 00:20:59,490 So I could just leave it null. 501 00:20:59,490 --> 00:21:02,670 >> But what if we have Daven's full name also, which 502 00:21:02,670 --> 00:21:04,280 is, as we've said, Davenport? 503 00:21:04,280 --> 00:21:06,970 So what if Daven is actually a substring, 504 00:21:06,970 --> 00:21:08,960 a prefix of a much longer string? 505 00:21:08,960 --> 00:21:11,450 We can't just permanently say nothing is going 506 00:21:11,450 --> 00:21:14,410 to go there, because we could never insert a word like Davenport 507 00:21:14,410 --> 00:21:15,840 into this data Structure 508 00:21:15,840 --> 00:21:19,560 >> So what we could do instead is treat each of these elements 509 00:21:19,560 --> 00:21:22,170 as maybe having two elements inside of them. 510 00:21:22,170 --> 00:21:24,810 One is a pointer, indeed, as I've been doing. 511 00:21:24,810 --> 00:21:27,100 So each of these boxes is not just one cell. 512 00:21:27,100 --> 00:21:29,855 But what if the top one-- the bottom one's 513 00:21:29,855 --> 00:21:32,230 going to be null, because there is no Davenport just yet. 514 00:21:32,230 --> 00:21:34,197 What if the top one is some special value? 515 00:21:34,197 --> 00:21:36,530 And it's going to be a little hard to draw it this size. 516 00:21:36,530 --> 00:21:38,130 But suppose it's just a check mark. 517 00:21:38,130 --> 00:21:38,920 Check. 518 00:21:38,920 --> 00:21:44,230 D-A-V-E-N is a string in this data structure. 519 00:21:44,230 --> 00:21:48,350 >> Meanwhile, if I had more space here, I could do P-O-R-T, 520 00:21:48,350 --> 00:21:52,650 and I could put check in the node that has the letter T at the very end. 521 00:21:52,650 --> 00:21:55,460 So this is a massively complex-looking data structure. 522 00:21:55,460 --> 00:21:57,210 And my handwriting certainly doesn't help. 523 00:21:57,210 --> 00:22:00,043 But if I wanted to insert something else, consider what we would do. 524 00:22:00,043 --> 00:22:03,370 If we wanted to put David in, we'd follow the same logic, D-A-V, 525 00:22:03,370 --> 00:22:08,802 but now I would point in the next element not from E, but from I to D. 526 00:22:08,802 --> 00:22:10,760 So there's going to be more nodes in this tree. 527 00:22:10,760 --> 00:22:12,325 We're going to have call malloc more. 528 00:22:12,325 --> 00:22:14,700 But I don't want to make a complete mess of this picture. 529 00:22:14,700 --> 00:22:17,710 So let's instead look at one that's been pre-formulated 530 00:22:17,710 --> 00:22:21,810 like this with not dot, dot, dots, but just abbreviated arrays. 531 00:22:21,810 --> 00:22:23,950 But each of the nodes in this tree up here 532 00:22:23,950 --> 00:22:26,700 represents the same thing-- an array Ray of size 26. 533 00:22:26,700 --> 00:22:28,860 >> Or if we want to be really proper now, what 534 00:22:28,860 --> 00:22:30,790 if someone's name as an apostrophe, let's 535 00:22:30,790 --> 00:22:35,560 assume that each node actually has like 27 indexes in it, not just 26. 536 00:22:35,560 --> 00:22:42,020 So this now is going to be a data structure called a trie-- T-R-I-E. 537 00:22:42,020 --> 00:22:46,120 A trie, which is supposedly historically a clever name for a tree 538 00:22:46,120 --> 00:22:49,040 that's optimized for retrieval, which of course, 539 00:22:49,040 --> 00:22:50,870 is spelled with an I-E so it's trie. 540 00:22:50,870 --> 00:22:52,710 But that is the history of the trie. 541 00:22:52,710 --> 00:22:55,860 >> So a trie is this tree-like data structure like a family tree 542 00:22:55,860 --> 00:22:57,510 that ultimately behaves like that. 543 00:22:57,510 --> 00:23:00,890 And here is just another example of a whole bunch of other people's names. 544 00:23:00,890 --> 00:23:03,540 But the question now at hand is what have 545 00:23:03,540 --> 00:23:08,070 we gained by introducing arguably a more complicated data structure, and one, 546 00:23:08,070 --> 00:23:09,870 frankly, that uses a lot of memory. 547 00:23:09,870 --> 00:23:11,703 >> Because even though, at the moment, I'm only 548 00:23:11,703 --> 00:23:15,050 using D's pointer and A and V and Es and Ns, 549 00:23:15,050 --> 00:23:16,700 I'm wasting a heck of lot of memory. 550 00:23:16,700 --> 00:23:18,030 551 00:23:18,030 --> 00:23:22,660 But where I spend one resource, I tend to do gain back another. 552 00:23:22,660 --> 00:23:26,020 So if I'm spending more space, what's probably the hope? 553 00:23:26,020 --> 00:23:27,407 That I'm spending less what? 554 00:23:27,407 --> 00:23:28,240 AUDIENCE: Less time. 555 00:23:28,240 --> 00:23:28,990 DAVID MALAN: Time. 556 00:23:28,990 --> 00:23:30,320 Now why might that be? 557 00:23:30,320 --> 00:23:33,880 Well, what is the insertion time, in terms of big O now, 558 00:23:33,880 --> 00:23:37,660 of a name like Daven or Davenport or David? 559 00:23:37,660 --> 00:23:39,340 Well, Daven was five steps. 560 00:23:39,340 --> 00:23:42,350 Davenport would be nine steps, so it would be a few more steps. 561 00:23:42,350 --> 00:23:44,250 David would be five steps as well. 562 00:23:44,250 --> 00:23:47,230 So those are concrete numbers, but surely there's 563 00:23:47,230 --> 00:23:49,550 an upper bound on the length of someone's name. 564 00:23:49,550 --> 00:23:52,240 And indeed, in the problem sets of five specification, 565 00:23:52,240 --> 00:23:54,050 we're going to propose that it's something 566 00:23:54,050 --> 00:23:55,470 that's 40-some-odd characters. 567 00:23:55,470 --> 00:23:58,180 >> Realistically, no one has an infinitely long name, 568 00:23:58,180 --> 00:24:01,542 which is to say that the length of a name or the length of a string we might 569 00:24:01,542 --> 00:24:03,750 have certain the state of structure is arguably what? 570 00:24:03,750 --> 00:24:05,550 571 00:24:05,550 --> 00:24:06,250 It's constant. 572 00:24:06,250 --> 00:24:06,430 Right? 573 00:24:06,430 --> 00:24:09,310 It might be a big constant like 40-something, but it is constant. 574 00:24:09,310 --> 00:24:13,752 And it has no dependency on how many other names are in this data structure. 575 00:24:13,752 --> 00:24:15,460 In other words, if I wanted to now insert 576 00:24:15,460 --> 00:24:20,540 Colton or Gabriel or Rob or Zamyla or Alison or Belinda or any other names 577 00:24:20,540 --> 00:24:23,940 from the staff into this data structure, is the running time 578 00:24:23,940 --> 00:24:26,750 of inserting other names going to be at all impacted 579 00:24:26,750 --> 00:24:30,220 by how many other elements are in the data structure already? 580 00:24:30,220 --> 00:24:31,040 It's not. 581 00:24:31,040 --> 00:24:31,540 Right? 582 00:24:31,540 --> 00:24:36,150 Because we're effectively using this multi-layer hash table. 583 00:24:36,150 --> 00:24:38,280 And the running time of any of these operations 584 00:24:38,280 --> 00:24:41,510 is dependent not on the number of elements that are in the data structure 585 00:24:41,510 --> 00:24:43,090 or that are eventually going to be in the data structure, 586 00:24:43,090 --> 00:24:44,714 but on the length of what specifically? 587 00:24:44,714 --> 00:24:46,500 588 00:24:46,500 --> 00:24:49,200 >> The string being inserted, which does make 589 00:24:49,200 --> 00:24:52,580 this asymptotically constant time-- big O of one. 590 00:24:52,580 --> 00:24:54,720 And frankly, just in the real world, this 591 00:24:54,720 --> 00:24:58,380 means inserting Daven's name takes like five steps, or Davenport nine 592 00:24:58,380 --> 00:25:00,100 steps, or David five steps. 593 00:25:00,100 --> 00:25:03,071 That's pretty darn small running times. 594 00:25:03,071 --> 00:25:05,320 And, indeed, that's a very good thing, especially when 595 00:25:05,320 --> 00:25:08,126 it's not dependent on the total number of elements in there. 596 00:25:08,126 --> 00:25:10,500 So how might we implement this kind of structure in code? 597 00:25:10,500 --> 00:25:12,900 It's a little more complex, but still it's 598 00:25:12,900 --> 00:25:15,050 just an application of basic building blocks. 599 00:25:15,050 --> 00:25:17,830 I'm going to redefine us node as follows: 600 00:25:17,830 --> 00:25:21,100 bool called word-- and this could be called anything. 601 00:25:21,100 --> 00:25:23,970 But the bool represents what I drew as a check mark. 602 00:25:23,970 --> 00:25:24,490 Yes. 603 00:25:24,490 --> 00:25:26,720 This is the end of a string in this data structure. 604 00:25:26,720 --> 00:25:30,702 >> And, of course, the node star there is referring to children. 605 00:25:30,702 --> 00:25:32,410 And, indeed, just like a family tree, you 606 00:25:32,410 --> 00:25:34,370 would consider the nodes that are hanging off 607 00:25:34,370 --> 00:25:36,920 of the bottom of some parent element to be children. 608 00:25:36,920 --> 00:25:40,510 And so the children is going to be an array of 27, the 27th one 609 00:25:40,510 --> 00:25:41,680 just being for apostrophe. 610 00:25:41,680 --> 00:25:43,390 We're going to sort of special case that. 611 00:25:43,390 --> 00:25:45,400 So you can have certain names with apostrophes. 612 00:25:45,400 --> 00:25:47,399 Maybe even hyphen should go in there, but you'll 613 00:25:47,399 --> 00:25:50,330 see in p set 5 we only care about letters and apostrophes. 614 00:25:50,330 --> 00:25:52,990 >> And then how do you represent the data structure itself? 615 00:25:52,990 --> 00:25:56,454 How do you represent the root of this trie, so to speak? 616 00:25:56,454 --> 00:25:59,620 Well, just like with a linked list, you need a pointer to the first element. 617 00:25:59,620 --> 00:26:04,270 With a trie you just need one pointer to the root of this trie. 618 00:26:04,270 --> 00:26:07,290 And from there you can hash your way down deeper and deeper 619 00:26:07,290 --> 00:26:10,460 to every other node in the structure. 620 00:26:10,460 --> 00:26:13,440 So simply with this can we represent that struct. 621 00:26:13,440 --> 00:26:15,877 >> Now Meanwhile-- Oh, question. 622 00:26:15,877 --> 00:26:17,220 >> AUDIENCE: What's bool word? 623 00:26:17,220 --> 00:26:20,490 >> DAVID MALAN: Bool word is just this C incarnation 624 00:26:20,490 --> 00:26:22,920 of what I described in this box here, when 625 00:26:22,920 --> 00:26:26,000 I started splitting each of the array's elements into two pieces. 626 00:26:26,000 --> 00:26:27,600 One is a pointer to the next node. 627 00:26:27,600 --> 00:26:30,280 The other has to be something like a check box 628 00:26:30,280 --> 00:26:33,770 to say yes, there's a word Daven that ends here, 629 00:26:33,770 --> 00:26:35,610 because we don't want, at the moment, Dave. 630 00:26:35,610 --> 00:26:39,320 >> Even though Dave is going to be a legitimate word, he's not in the trie 631 00:26:39,320 --> 00:26:39,830 yet. 632 00:26:39,830 --> 00:26:40,950 And D is not a word. 633 00:26:40,950 --> 00:26:42,770 And D-A is not a word or a name. 634 00:26:42,770 --> 00:26:45,020 So the check mark indicates only once you 635 00:26:45,020 --> 00:26:48,190 hit this node is the previous path of characters 636 00:26:48,190 --> 00:26:50,700 actually a string that you've inserted. 637 00:26:50,700 --> 00:26:53,660 So that's all the bool there is doing for us. 638 00:26:53,660 --> 00:26:55,500 >> Any other questions on tries? 639 00:26:55,500 --> 00:26:56,215 Yeah. 640 00:26:56,215 --> 00:26:58,035 >> AUDIENCE: What is the overlap? 641 00:26:58,035 --> 00:26:59,945 What if you have a Dave and a Daven? 642 00:26:59,945 --> 00:27:00,820 DAVID MALAN: Perfect. 643 00:27:00,820 --> 00:27:02,580 What if you have a Dave and a Daven? 644 00:27:02,580 --> 00:27:06,240 So if we insert, say a nickname, for David-- Dave-- D-A-V-E? 645 00:27:06,240 --> 00:27:07,370 646 00:27:07,370 --> 00:27:08,700 This is actually super simple. 647 00:27:08,700 --> 00:27:10,325 So we're only going to take four steps. 648 00:27:10,325 --> 00:27:11,042 649 00:27:11,042 --> 00:27:15,847 D-A-V-E. And what do I have to do once I hit that fourth node? 650 00:27:15,847 --> 00:27:16,680 Just going to check. 651 00:27:16,680 --> 00:27:18,000 We're already good to go. 652 00:27:18,000 --> 00:27:18,840 Done. 653 00:27:18,840 --> 00:27:19,750 Four steps. 654 00:27:19,750 --> 00:27:21,590 Constant time asymptotically. 655 00:27:21,590 --> 00:27:26,300 And now we've indicated that both Dave and Daven are strings in the structure. 656 00:27:26,300 --> 00:27:27,710 So not a problem. 657 00:27:27,710 --> 00:27:30,200 And notice how the presence of Daven did not make it 658 00:27:30,200 --> 00:27:34,750 take any more time or less time for Dave and vice versa. 659 00:27:34,750 --> 00:27:36,000 >> So what else can we now do? 660 00:27:36,000 --> 00:27:40,680 We've used this metaphor before of trays representing something. 661 00:27:40,680 --> 00:27:43,380 But it turns out that a stack of trays is actually 662 00:27:43,380 --> 00:27:47,187 demonstrative of another abstract data type-- a higher level data structure 663 00:27:47,187 --> 00:27:49,770 that at the end the day is just like an array or a linked list 664 00:27:49,770 --> 00:27:50,970 or something more mundane. 665 00:27:50,970 --> 00:27:53,270 But it's a more interesting conceptual concept. 666 00:27:53,270 --> 00:27:56,440 A stack, like these trays here in Mather, 667 00:27:56,440 --> 00:27:58,750 are generally called just that-- a stack. 668 00:27:58,750 --> 00:28:02,540 >> And in this type of data structure you have two operations-- 669 00:28:02,540 --> 00:28:05,880 you have one called push for adding something to the stack, 670 00:28:05,880 --> 00:28:08,320 like putting another tray back on the top of the stack. 671 00:28:08,320 --> 00:28:11,350 And then pop, which means you take the topmost tray off. 672 00:28:11,350 --> 00:28:16,210 But what's key about a stack is that it's got this curious characteristic. 673 00:28:16,210 --> 00:28:19,560 As the dining hall staff are rearranging the trays for the next meal, 674 00:28:19,560 --> 00:28:21,380 what's going to be true about how students 675 00:28:21,380 --> 00:28:22,856 interact with this data structure? 676 00:28:22,856 --> 00:28:24,480 AUDIENCE: They're going to pop one off. 677 00:28:24,480 --> 00:28:26,550 DAVID MALAN: They're going to pop one off, hopefully the top. 678 00:28:26,550 --> 00:28:28,910 Otherwise it's just kind of stupid to go all the way to the bottom. 679 00:28:28,910 --> 00:28:29,070 Right? 680 00:28:29,070 --> 00:28:31,620 The data structure doesn't really allow you to grab the bottom tray at least 681 00:28:31,620 --> 00:28:32,520 easily. 682 00:28:32,520 --> 00:28:35,040 So there's this curious property to a stack 683 00:28:35,040 --> 00:28:39,730 that the last item in is going to be the first one out. 684 00:28:39,730 --> 00:28:43,400 And computer scientists call this LIFO-- last in, first out. 685 00:28:43,400 --> 00:28:45,540 And it actually does have interesting applications. 686 00:28:45,540 --> 00:28:50,090 It's not necessarily as obvious as some others, but it can, indeed, be useful, 687 00:28:50,090 --> 00:28:54,040 and it can, indeed, be implemented in a couple of different ways. 688 00:28:54,040 --> 00:28:58,550 >> So one, and actually, let me not to dive into that. 689 00:28:58,550 --> 00:28:59,860 Let's do this instead. 690 00:28:59,860 --> 00:29:03,700 Let's look at one that's almost the same idea, but it's a little fairer. 691 00:29:03,700 --> 00:29:04,200 Right? 692 00:29:04,200 --> 00:29:07,560 If you're one of these fan boys or girls that really likes Apple products 693 00:29:07,560 --> 00:29:10,130 and you woke up at 3:00 AM to line up at some store 694 00:29:10,130 --> 00:29:14,150 to get the very latest iPhone, you might have queued up like this. 695 00:29:14,150 --> 00:29:15,800 >> Now a queue is very deliberately named. 696 00:29:15,800 --> 00:29:18,190 It's a line because there's some fairness to it. 697 00:29:18,190 --> 00:29:18,690 Right? 698 00:29:18,690 --> 00:29:21,690 It would kind of sucked if you've got there first at the Apple Store 699 00:29:21,690 --> 00:29:25,700 but you are effectively the bottommost tray because the Apple employees then 700 00:29:25,700 --> 00:29:28,189 pop the last person who actually got in line. 701 00:29:28,189 --> 00:29:31,230 So stacks and queues, even though functionally they're kind of the same-- 702 00:29:31,230 --> 00:29:33,105 it's just this collection of resources that's 703 00:29:33,105 --> 00:29:36,210 going to grow and shrink-- there's this fairness aspect to it, 704 00:29:36,210 --> 00:29:39,634 at least in the real world, where the operations you exercise 705 00:29:39,634 --> 00:29:40,800 are fundamentally different. 706 00:29:40,800 --> 00:29:43,360 A stack-- a queue rather-- is said to have 707 00:29:43,360 --> 00:29:45,320 two operations: n queue and d queue. 708 00:29:45,320 --> 00:29:46,341 709 00:29:46,341 --> 00:29:48,090 Or you can call them any number of things. 710 00:29:48,090 --> 00:29:50,770 But you just want to capture the notion that one is adding 711 00:29:50,770 --> 00:29:53,230 and one is ultimately subtracting. 712 00:29:53,230 --> 00:29:58,840 >> Now underneath the hood, both the stack and a queue could be implemented how? 713 00:29:58,840 --> 00:30:01,390 We won't go into the code of it because the higher level 714 00:30:01,390 --> 00:30:03,387 idea is sort of more obvious. 715 00:30:03,387 --> 00:30:04,470 I mean, what do humans do? 716 00:30:04,470 --> 00:30:07,030 If I'm the first person at the Apple Store and this is the front door, 717 00:30:07,030 --> 00:30:08,130 you know, I'm going to stand here. 718 00:30:08,130 --> 00:30:09,750 And the next person's going to stand here. 719 00:30:09,750 --> 00:30:11,500 And the next person's going to stand here. 720 00:30:11,500 --> 00:30:13,792 So what data structure lends itself to a queue? 721 00:30:13,792 --> 00:30:14,542 >> AUDIENCE: A queue. 722 00:30:14,542 --> 00:30:15,667 DAVID MALAN: Well, a queue. 723 00:30:15,667 --> 00:30:16,390 Sure. 724 00:30:16,390 --> 00:30:16,920 What else? 725 00:30:16,920 --> 00:30:17,600 >> AUDIENCE: A linked list. 726 00:30:17,600 --> 00:30:18,990 >> DAVID MALAN: A linked list you could implement. 727 00:30:18,990 --> 00:30:22,500 And a linked list is nice because then it can grow arbitrarily long as opposed 728 00:30:22,500 --> 00:30:24,880 to having some fixed number of people in the store. 729 00:30:24,880 --> 00:30:27,030 But maybe a fixed number of places is legitimate. 730 00:30:27,030 --> 00:30:30,350 Because if they only have like 20 iPhones on the first day, maybe 731 00:30:30,350 --> 00:30:33,930 they only need an array of size 20 to represent that queue, which 732 00:30:33,930 --> 00:30:37,070 is only to say now once we start talking about these higher level problems, 733 00:30:37,070 --> 00:30:38,890 you can implement it in any number of ways. 734 00:30:38,890 --> 00:30:42,030 And there's probably just going to be a trade off in space and time 735 00:30:42,030 --> 00:30:43,950 or just in your own code complexity. 736 00:30:43,950 --> 00:30:45,380 >> What about a stack? 737 00:30:45,380 --> 00:30:48,190 Well, a stack, we've seen too could just be these trays. 738 00:30:48,190 --> 00:30:50,007 And you could implement this an array. 739 00:30:50,007 --> 00:30:53,090 But at some point if you use an array, what's going to happen to the trays 740 00:30:53,090 --> 00:30:54,173 you're trying to put down? 741 00:30:54,173 --> 00:30:55,170 742 00:30:55,170 --> 00:30:55,670 All right. 743 00:30:55,670 --> 00:30:57,490 You're only going to be able to go so high. 744 00:30:57,490 --> 00:31:00,156 And I think in Mather they're actually recessed in that opening. 745 00:31:00,156 --> 00:31:01,950 So indeed, it's almost like Mather is using 746 00:31:01,950 --> 00:31:03,783 an array of fixed size, because you can only 747 00:31:03,783 --> 00:31:08,302 fit so many trays in that opening in the wall down below people's knees. 748 00:31:08,302 --> 00:31:10,010 And so that might be said to be an array, 749 00:31:10,010 --> 00:31:14,300 but we could certainly implement that more generally with a linked list. 750 00:31:14,300 --> 00:31:16,390 >> Well, what about another data structure? 751 00:31:16,390 --> 00:31:18,760 Let me pull up one other visual here. 752 00:31:18,760 --> 00:31:24,710 Something like how about this one here? 753 00:31:24,710 --> 00:31:28,920 Why might it be useful to have not something as fancy as a trie, which 754 00:31:28,920 --> 00:31:32,370 we saw had these very wide nodes, each of which is in an array? 755 00:31:32,370 --> 00:31:35,740 But what if we do something more simply, like an old school family tree, 756 00:31:35,740 --> 00:31:38,110 each of whose nodes here is just storing a number. 757 00:31:38,110 --> 00:31:42,180 Instead of a name or a descendant is just storing a number like this. 758 00:31:42,180 --> 00:31:45,250 >> Well, the jargon we use in data structures is both tries 759 00:31:45,250 --> 00:31:49,510 and trees, where a trie, again, is just one whose nodes are arrays, 760 00:31:49,510 --> 00:31:51,680 is still what you might use from grade school 761 00:31:51,680 --> 00:31:53,860 when you made a family tree-- leaves and the root 762 00:31:53,860 --> 00:31:57,250 of the tree and children of the parent and siblings thereof. 763 00:31:57,250 --> 00:32:03,670 And we might implement a tree, for instance, as simply as this. 764 00:32:03,670 --> 00:32:07,420 A tree, if it as a node, one of these circles that has a number, 765 00:32:07,420 --> 00:32:09,947 it's not going to have one pointer, but two. 766 00:32:09,947 --> 00:32:11,780 And as soon as you add a second pointer, you 767 00:32:11,780 --> 00:32:13,905 can actually now make sort of two-dimensional data 768 00:32:13,905 --> 00:32:14,780 structures in memory. 769 00:32:14,780 --> 00:32:16,660 Much like a two-dimensional array, you can 770 00:32:16,660 --> 00:32:18,904 have kind of two-dimensional linked lists but ones 771 00:32:18,904 --> 00:32:20,820 that follow a pattern where there's no cycles. 772 00:32:20,820 --> 00:32:24,487 It's truly a tree with one grandparent way up here and then 773 00:32:24,487 --> 00:32:27,320 some parents and children and grandchildren and great-grandchildren. 774 00:32:27,320 --> 00:32:28,370 and so forth. 775 00:32:28,370 --> 00:32:32,390 >> But what's really neat about this too, just to tease you with a bit of code, 776 00:32:32,390 --> 00:32:35,370 recall recursion from awhile back, whereby 777 00:32:35,370 --> 00:32:38,220 you write a function that calls itself. 778 00:32:38,220 --> 00:32:41,140 This is a beautiful opportunity to implement something 779 00:32:41,140 --> 00:32:42,920 like recursion, because consider this. 780 00:32:42,920 --> 00:32:43,860 >> This is a tree. 781 00:32:43,860 --> 00:32:48,040 And I've been a little anal with how I put the integers into the street. 782 00:32:48,040 --> 00:32:51,020 So much so that it has a special name-- a binary search tree. 783 00:32:51,020 --> 00:32:53,460 Now we've heard of binary search, but can you 784 00:32:53,460 --> 00:32:55,180 work backwards from this thing's name? 785 00:32:55,180 --> 00:32:59,280 What is the pattern of how I inserted the integers into this tree? 786 00:32:59,280 --> 00:33:00,696 It's not arbitrary. 787 00:33:00,696 --> 00:33:01,570 There's some pattern. 788 00:33:01,570 --> 00:33:02,090 Yeah. 789 00:33:02,090 --> 00:33:03,370 >> AUDIENCE: Smaller ones on the left. 790 00:33:03,370 --> 00:33:03,690 >> DAVID MALAN: Yeah. 791 00:33:03,690 --> 00:33:05,062 Smaller ones are on the left. 792 00:33:05,062 --> 00:33:06,270 Bigger ones are on the right. 793 00:33:06,270 --> 00:33:12,940 Such that a true statement is a parent is greater than its left child, 794 00:33:12,940 --> 00:33:14,850 but less than its right child. 795 00:33:14,850 --> 00:33:17,750 And that alone is even a recursive verbal definition 796 00:33:17,750 --> 00:33:20,500 because you can apply that same logic to every node 797 00:33:20,500 --> 00:33:23,080 and it only bottoms out, a base case if you 798 00:33:23,080 --> 00:33:25,740 will, when you hit one of the leaves, so to speak, 799 00:33:25,740 --> 00:33:28,580 where a leave has no children further. 800 00:33:28,580 --> 00:33:30,614 >> Now how might you find the number 44? 801 00:33:30,614 --> 00:33:32,280 You would start at the root and say, hm. 802 00:33:32,280 --> 00:33:35,690 55 is not 44 So do I want to go right or do I want to go left? 803 00:33:35,690 --> 00:33:37,190 Well, obviously you want to go left. 804 00:33:37,190 --> 00:33:40,060 And so it's just like the phone book example in binary search 805 00:33:40,060 --> 00:33:41,099 more generally. 806 00:33:41,099 --> 00:33:43,390 But we're implementing it now a little more dynamically 807 00:33:43,390 --> 00:33:45,339 than an array might allow. 808 00:33:45,339 --> 00:33:48,130 And in fact, if you want to look at the code, at first glance sure. 809 00:33:48,130 --> 00:33:49,671 It looks like a whole bunch of lines. 810 00:33:49,671 --> 00:33:51,220 But it's beautifully simple. 811 00:33:51,220 --> 00:33:54,490 If you want to implement a function called search whose purpose in life 812 00:33:54,490 --> 00:33:57,290 is to search for a value like n, an integer, 813 00:33:57,290 --> 00:34:01,756 and you're passed in a one pointer-- a pointer to the node of the roots, 814 00:34:01,756 --> 00:34:04,380 rather, of that tree from which you can access everything else, 815 00:34:04,380 --> 00:34:08,850 notice how straightforwardly you can implement the logic. 816 00:34:08,850 --> 00:34:10,880 If tree is null, obviously it's not there. 817 00:34:10,880 --> 00:34:11,880 Let's just return false. 818 00:34:11,880 --> 00:34:12,000 Right? 819 00:34:12,000 --> 00:34:14,040 If you hand it nothing, there's nothing there. 820 00:34:14,040 --> 00:34:17,900 >> Else, if n is less than tree arrow n-- now arrow n, 821 00:34:17,900 --> 00:34:20,670 recall we introduced super briefly the other day, 822 00:34:20,670 --> 00:34:25,100 and that just means de-reference the pointer and look at the field called n. 823 00:34:25,100 --> 00:34:27,690 So it means go there and look at the field called n. 824 00:34:27,690 --> 00:34:33,810 So if n, the value you're given, is less than the value in the trees integer, 825 00:34:33,810 --> 00:34:35,449 where do you want to go? 826 00:34:35,449 --> 00:34:36,389 To the left. 827 00:34:36,389 --> 00:34:37,780 >> So notice the recursion. 828 00:34:37,780 --> 00:34:39,860 I'm returning-- not true. 829 00:34:39,860 --> 00:34:40,989 Not false. 830 00:34:40,989 --> 00:34:45,670 I'm returning whatever the answer is from a call to myself, passing 831 00:34:45,670 --> 00:34:50,100 an n again, which is redundant, but what's slightly different now? 832 00:34:50,100 --> 00:34:51,989 How am I making the problem smaller? 833 00:34:51,989 --> 00:34:54,920 I'm passing in as the second argument, not the root of the tree, 834 00:34:54,920 --> 00:34:59,616 but the left child in this case. 835 00:34:59,616 --> 00:35:00,990 So I'm passing in the left child. 836 00:35:00,990 --> 00:35:04,720 >> Meanwhile, if n is bigger than the node I'm currently looking at, 837 00:35:04,720 --> 00:35:06,690 I search the right hand side. 838 00:35:06,690 --> 00:35:10,880 Else, if the tree is not null, and if the element's not to the left 839 00:35:10,880 --> 00:35:13,240 and it's not to the right, what is wonderfully the case? 840 00:35:13,240 --> 00:35:14,630 841 00:35:14,630 --> 00:35:18,440 We've actually found the node in question, and so we return true. 842 00:35:18,440 --> 00:35:21,490 >> So we've just scratched the surface now some of these data structures. 843 00:35:21,490 --> 00:35:24,370 In problem set five you'll explore these yet further, 844 00:35:24,370 --> 00:35:27,250 and you'll be given your design choice of how to go about this. 845 00:35:27,250 --> 00:35:30,250 What I'd like to conclude on is just a 30 second teaser 846 00:35:30,250 --> 00:35:32,080 of what awaits next week and beyond. 847 00:35:32,080 --> 00:35:35,390 >> As we begin-- thankfully you might think-- our transition slowly 848 00:35:35,390 --> 00:35:38,680 from the world of C and lower level implementation details, 849 00:35:38,680 --> 00:35:42,090 to a world in which we can take for granted that someone else has finally 850 00:35:42,090 --> 00:35:44,010 implemented these data structures for us, 851 00:35:44,010 --> 00:35:47,570 and we'll start to understand the real world means of implementing 852 00:35:47,570 --> 00:35:50,560 web-based programs and websites more generally 853 00:35:50,560 --> 00:35:52,910 and also the very security implications that we've only 854 00:35:52,910 --> 00:35:54,850 begun to scratch the surface of. 855 00:35:54,850 --> 00:35:57,320 Here is what awaits us in the days to come. 856 00:35:57,320 --> 00:36:00,480 >> [VIDEO PLAYBACK] 857 00:36:00,480 --> 00:36:03,432 858 00:36:03,432 --> 00:36:12,780 >> -He came with a message, with a protocol all his own. 859 00:36:12,780 --> 00:36:26,110 860 00:36:26,110 --> 00:36:30,894 He came to a world of cruel firewalls, uncaring routers, 861 00:36:30,894 --> 00:36:33,368 and dangers far worse than death. 862 00:36:33,368 --> 00:36:35,280 863 00:36:35,280 --> 00:36:36,236 He's fast. 864 00:36:36,236 --> 00:36:37,980 He's strong. 865 00:36:37,980 --> 00:36:42,830 He's TCP/IP, and he's got your address. 866 00:36:42,830 --> 00:36:45,290 867 00:36:45,290 --> 00:36:48,074 "Warriors of the Net." 868 00:36:48,074 --> 00:36:49,660 [END VIDEO PLAYBACK] 869 00:36:49,660 --> 00:36:50,910 DAVID MALAN: Coming next week. 870 00:36:50,910 --> 00:36:51,880 We will see you then. 871 00:36:51,880 --> 00:36:54,615 872 00:36:54,615 --> 00:36:56,060 [VIDEO PLAYBACK] 873 00:36:56,060 --> 00:36:59,240 -And now, "Deep Thoughts" by Daven Farnham. 874 00:36:59,240 --> 00:37:02,030 875 00:37:02,030 --> 00:37:05,820 -David always starts lectures with, "All right." 876 00:37:05,820 --> 00:37:08,750 Why not, "Here's the solution to this week's problem set" 877 00:37:08,750 --> 00:37:12,180 or "We're giving all of you an A?" 878 00:37:12,180 --> 00:37:13,380 [LAUGHING] 879 00:37:13,380 --> 00:37:15,530 [END VIDEO PLAYBACK]