1 00:00:00,000 --> 00:00:11,860 2 00:00:11,860 --> 00:00:13,120 >> SPEAKER 1: All right, so we are back. 3 00:00:13,120 --> 00:00:14,480 Welcome to CS50. 4 00:00:14,480 --> 00:00:16,510 This is the end of week seven. 5 00:00:16,510 --> 00:00:20,200 So recall that last time, we started looking at slightly more sophisticated 6 00:00:20,200 --> 00:00:21,100 data structures. 7 00:00:21,100 --> 00:00:25,110 Since up until now, all we had really at our disposal was this, an array. 8 00:00:25,110 --> 00:00:29,340 >> But before we discard the array as not all that interesting, which indeed it 9 00:00:29,340 --> 00:00:33,570 actually is, what are some of the pluses of this simple data 10 00:00:33,570 --> 00:00:34,560 structure thus far? 11 00:00:34,560 --> 00:00:36,110 What's it good at? 12 00:00:36,110 --> 00:00:39,450 So far as we've seen? 13 00:00:39,450 --> 00:00:42,540 What do you got? 14 00:00:42,540 --> 00:00:44,028 Nothing. 15 00:00:44,028 --> 00:00:45,020 >> STUDENT: [INAUDIBLE]. 16 00:00:45,020 --> 00:00:45,395 >> SPEAKER 1: What's that? 17 00:00:45,395 --> 00:00:46,410 >> STUDENT: [INAUDIBLE]. 18 00:00:46,410 --> 00:00:47,000 >> SPEAKER 1: Fixed size. 19 00:00:47,000 --> 00:00:51,260 OK, so why is fixed size good though? 20 00:00:51,260 --> 00:00:53,180 >> STUDENT: [INAUDIBLE]. 21 00:00:53,180 --> 00:00:56,240 >> SPEAKER 1: OK, so it's efficient in the sense that you can allocate a 22 00:00:56,240 --> 00:01:00,070 fixed amount of space, which hopefully is precisely as much 23 00:01:00,070 --> 00:01:01,180 space as you want. 24 00:01:01,180 --> 00:01:02,720 So that could be absolutely a plus. 25 00:01:02,720 --> 00:01:06,530 >> What's another up side of an array? 26 00:01:06,530 --> 00:01:07,610 Yeah? 27 00:01:07,610 --> 00:01:08,750 >> STUDENT: [INAUDIBLE]. 28 00:01:08,750 --> 00:01:09,550 >> SPEAKER 1: All the-- sorry? 29 00:01:09,550 --> 00:01:11,270 >> STUDENT: [INAUDIBLE]. 30 00:01:11,270 --> 00:01:13,620 >> SPEAKER 1: All the boxes in memory or next to each other. 31 00:01:13,620 --> 00:01:15,220 And that's helpful-- why? 32 00:01:15,220 --> 00:01:15,970 That's quite true. 33 00:01:15,970 --> 00:01:18,611 But how can we exploit that truth? 34 00:01:18,611 --> 00:01:21,500 >> STUDENT: [INAUDIBLE]. 35 00:01:21,500 --> 00:01:24,490 >> SPEAKER 1: Exactly, we can keep track of where everything is just by knowing 36 00:01:24,490 --> 00:01:28,560 one address, namely the address of the first byte of that chunk of memory. 37 00:01:28,560 --> 00:01:30,420 Or in the case of the string, the address of the first 38 00:01:30,420 --> 00:01:31,460 char in that string. 39 00:01:31,460 --> 00:01:33,330 And from there, we can find the end of the string. 40 00:01:33,330 --> 00:01:35,710 We can find the second element, the third element, and so forth. 41 00:01:35,710 --> 00:01:38,740 >> And so the fancy way of describing that feature is that arrays give us 42 00:01:38,740 --> 00:01:40,020 random access. 43 00:01:40,020 --> 00:01:44,330 Just by using the square bracket notation and a number, you can jump to 44 00:01:44,330 --> 00:01:48,070 a specific element in the array in constant time, big O 45 00:01:48,070 --> 00:01:49,810 of one, so to speak. 46 00:01:49,810 --> 00:01:51,080 >> But there's been some downsides. 47 00:01:51,080 --> 00:01:53,110 What an array not do very easily? 48 00:01:53,110 --> 00:01:55,810 49 00:01:55,810 --> 00:01:57,170 What's it not good at? 50 00:01:57,170 --> 00:01:58,810 >> STUDENT: [INAUDIBLE]. 51 00:01:58,810 --> 00:01:59,860 >> SPEAKER 1: What's that? 52 00:01:59,860 --> 00:02:00,530 >> STUDENT: [INAUDIBLE]. 53 00:02:00,530 --> 00:02:01,460 >> SPEAKER 1: Expanding in size. 54 00:02:01,460 --> 00:02:04,800 So the downsides of the array are precisely the opposite of what the 55 00:02:04,800 --> 00:02:05,540 upsides are. 56 00:02:05,540 --> 00:02:07,610 So one of the downsides is that it's a fixed size. 57 00:02:07,610 --> 00:02:09,400 So you can't really grow it. 58 00:02:09,400 --> 00:02:13,510 You can reallocate a bigger chunk of memory, and then move the old elements 59 00:02:13,510 --> 00:02:14,460 into the new array. 60 00:02:14,460 --> 00:02:18,060 And then free the old array, for instance, by using malloc or a similar 61 00:02:18,060 --> 00:02:21,180 function called realloc, which reallocates memory. 62 00:02:21,180 --> 00:02:25,490 >> Realloc, as an aside, tries to give you memory that's next to the array 63 00:02:25,490 --> 00:02:26,610 that you already have. 64 00:02:26,610 --> 00:02:28,740 But it might move things around altogether. 65 00:02:28,740 --> 00:02:30,710 But in short, that's expensive, right? 66 00:02:30,710 --> 00:02:33,440 Because if you have a chunk of memory of this size, but you really want one 67 00:02:33,440 --> 00:02:36,710 of this size, and you want to preserve the original elements, you have 68 00:02:36,710 --> 00:02:40,510 roughly a linear time copying process that needs to happen from 69 00:02:40,510 --> 00:02:41,900 old array to new. 70 00:02:41,900 --> 00:02:44,630 And the reality is asking the operating system again and again and 71 00:02:44,630 --> 00:02:48,340 again for big chunks of memory can start to cost you some time as well. 72 00:02:48,340 --> 00:02:52,250 So it's both a blessing and a curse in disguise, the fact that these arrays 73 00:02:52,250 --> 00:02:53,860 are of fixed size. 74 00:02:53,860 --> 00:02:56,790 But if we introduce instead something like this, which we called a linked 75 00:02:56,790 --> 00:03:00,580 list, we get a few upsides and a few downsides here as well. 76 00:03:00,580 --> 00:03:05,780 >> So a linked list is simply a data structure made up of C structs in this 77 00:03:05,780 --> 00:03:09,850 case, where a struct, recall, is just a container for one or more specific 78 00:03:09,850 --> 00:03:11,100 types of variables. 79 00:03:11,100 --> 00:03:16,110 In this case, what do the data types appear to be inside of the struct that 80 00:03:16,110 --> 00:03:17,600 last time we called a node? 81 00:03:17,600 --> 00:03:19,380 Each of these rectangles is a node. 82 00:03:19,380 --> 00:03:22,660 And each of the smaller rectangles inside of it is a data type. 83 00:03:22,660 --> 00:03:25,300 What types did we say they were on Monday? 84 00:03:25,300 --> 00:03:26,478 Yeah? 85 00:03:26,478 --> 00:03:27,870 >> STUDENT: [INAUDIBLE]. 86 00:03:27,870 --> 00:03:30,721 >> SPEAKER 1: A variable and a pointer, or more specifically, an int, for n, 87 00:03:30,721 --> 00:03:32,180 and a pointer at the bottom. 88 00:03:32,180 --> 00:03:35,360 Both of those happen to be 32 bits, at least on a computer like this CS50 89 00:03:35,360 --> 00:03:37,980 Appliance, and so they're drawn equally in size. 90 00:03:37,980 --> 00:03:42,260 >> So what are using the pointer though for apparently? 91 00:03:42,260 --> 00:03:47,690 Why add this arrow now when arrays were so nice and clean and simple? 92 00:03:47,690 --> 00:03:50,460 What is the pointer doing for us in each of these nodes? 93 00:03:50,460 --> 00:03:52,160 >> STUDENT: [INAUDIBLE]. 94 00:03:52,160 --> 00:03:52,465 >> SPEAKER 1: Exactly. 95 00:03:52,465 --> 00:03:54,120 It's telling you where the next one is. 96 00:03:54,120 --> 00:03:57,350 So I sort of use the analogy of using a thread to sort of 97 00:03:57,350 --> 00:03:59,180 thread these nodes together. 98 00:03:59,180 --> 00:04:01,760 And that's exactly what we're doing with pointers because each of these 99 00:04:01,760 --> 00:04:06,360 chunks of memory may or may not be contiguous, back to back to back 100 00:04:06,360 --> 00:04:09,500 inside of RAM, because each time you call malloc saying, give me enough 101 00:04:09,500 --> 00:04:12,510 bytes for a new node, it might be here or it might be here. 102 00:04:12,510 --> 00:04:13,120 It might be here. 103 00:04:13,120 --> 00:04:13,730 It might be here. 104 00:04:13,730 --> 00:04:14,640 You just don't know. 105 00:04:14,640 --> 00:04:17,880 >> But using pointers in the addresses of those nodes, you can stitch them 106 00:04:17,880 --> 00:04:22,370 together in a way that looks visually like a list even if these things are 107 00:04:22,370 --> 00:04:26,770 all spread out throughout your one or your two or your four gigabytes of RAM 108 00:04:26,770 --> 00:04:28,760 inside of your own computer. 109 00:04:28,760 --> 00:04:33,230 >> So the downside, then, of a linked list is what? 110 00:04:33,230 --> 00:04:34,670 What's a price we're apparently paying? 111 00:04:34,670 --> 00:04:36,010 >> STUDENT: [INAUDIBLE]. 112 00:04:36,010 --> 00:04:36,920 >> SPEAKER 1: More space, right? 113 00:04:36,920 --> 00:04:39,340 We've, in this case, doubled the amount of space because we've gone 114 00:04:39,340 --> 00:04:43,500 from 32 bits for each node, for each int, so now 64 bits because we have to 115 00:04:43,500 --> 00:04:45,050 keep around a pointer as well. 116 00:04:45,050 --> 00:04:48,860 You get more efficiency if your struct is bigger than this simple thing. 117 00:04:48,860 --> 00:04:52,020 If you actually have a student inside of which is a couple of strings for 118 00:04:52,020 --> 00:04:55,430 name and house, maybe an ID number, maybe some other fields altogether. 119 00:04:55,430 --> 00:04:59,000 >> So if you have a large enough struct, then maybe the cost of the pointer is 120 00:04:59,000 --> 00:05:00,010 not such a big deal. 121 00:05:00,010 --> 00:05:03,570 This is a bit of a corner case in that we're storing such a simple primitive 122 00:05:03,570 --> 00:05:04,760 inside of the linked list. 123 00:05:04,760 --> 00:05:05,790 But the point is the same. 124 00:05:05,790 --> 00:05:08,230 You're definitely spending more memory, but you're getting 125 00:05:08,230 --> 00:05:08,990 flexibility. 126 00:05:08,990 --> 00:05:12,280 Because now if I want to add an element at the beginning of this list, 127 00:05:12,280 --> 00:05:14,340 I have to allocate a new node. 128 00:05:14,340 --> 00:05:17,180 And I have to just update those arrows somehow by just moving 129 00:05:17,180 --> 00:05:17,980 some pointers around. 130 00:05:17,980 --> 00:05:20,580 >> If I want to insert something into the middle of the list, I don't have to 131 00:05:20,580 --> 00:05:24,410 push everyone aside like we did in weeks' past with our volunteers who 132 00:05:24,410 --> 00:05:25,700 represented an array. 133 00:05:25,700 --> 00:05:29,470 I can just allocate a new node and then just point the arrows in 134 00:05:29,470 --> 00:05:32,290 different directions because it doesn't have to remain in actual 135 00:05:32,290 --> 00:05:35,670 memory a true line like I've drawn it here on the screen. 136 00:05:35,670 --> 00:05:38,400 >> And then lastly, if you want to insert something at the end of the list, it's 137 00:05:38,400 --> 00:05:39,210 even easier. 138 00:05:39,210 --> 00:05:43,320 This is sort of arbitrary notation, but 34's pointer, take a guess. 139 00:05:43,320 --> 00:05:46,710 What is the value of its pointer most likely drawn sort of like an old 140 00:05:46,710 --> 00:05:47,700 school antenna there? 141 00:05:47,700 --> 00:05:48,920 >> STUDENT: [INAUDIBLE]. 142 00:05:48,920 --> 00:05:49,900 >> SPEAKER 1: It's probably null. 143 00:05:49,900 --> 00:05:52,710 And indeed that is one author's representation of null. 144 00:05:52,710 --> 00:05:56,310 And it's null because you absolutely need to know where the end of a linked 145 00:05:56,310 --> 00:06:00,050 list is, lest you keep following and following and following these arrows 146 00:06:00,050 --> 00:06:01,170 to some garbage value. 147 00:06:01,170 --> 00:06:06,230 So null will signify that there's no more nodes to the right of number 34, 148 00:06:06,230 --> 00:06:07,200 in this case. 149 00:06:07,200 --> 00:06:10,270 >> So we propose that we can implement this node in code. 150 00:06:10,270 --> 00:06:12,130 And we've seen this kind of syntax before. 151 00:06:12,130 --> 00:06:15,090 Typedef just defines a new type for us, gives us a synonym like 152 00:06:15,090 --> 00:06:17,100 string was for char*. 153 00:06:17,100 --> 00:06:21,030 In this case, it's going to give us shorthand notation so that struct node 154 00:06:21,030 --> 00:06:24,010 can instead just be written as node, which is a lot cleaner. 155 00:06:24,010 --> 00:06:25,360 It's a lot less verbose. 156 00:06:25,360 --> 00:06:30,080 >> Inside of a node is apparently an int called n, and then a struct node* 157 00:06:30,080 --> 00:06:34,670 which means exactly what we wanted the arrows to mean, a pointer to another 158 00:06:34,670 --> 00:06:36,940 node of the exact same data type. 159 00:06:36,940 --> 00:06:40,300 And I propose that we could implement a search function like this, which at 160 00:06:40,300 --> 00:06:41,890 first glance might seem a little complex. 161 00:06:41,890 --> 00:06:43,330 But let's see it in context. 162 00:06:43,330 --> 00:06:45,480 >> Let me go over to the appliance here. 163 00:06:45,480 --> 00:06:48,460 Let me open up a file called list zero dot h. 164 00:06:48,460 --> 00:06:53,950 And that only contains the definition we just saw a moment ago for this data 165 00:06:53,950 --> 00:06:55,390 type called a node. 166 00:06:55,390 --> 00:06:57,350 So we've put that into a dot h file. 167 00:06:57,350 --> 00:07:01,430 >> And as an aside, even though this program that you're about to see is 168 00:07:01,430 --> 00:07:05,410 not all that complex, it's indeed convention when writing a program to 169 00:07:05,410 --> 00:07:10,270 put things like data types, to pull constants sometimes, inside of your 170 00:07:10,270 --> 00:07:13,210 header file and not necessarily in your C file, certainly when your 171 00:07:13,210 --> 00:07:17,370 programs get larger and larger, so that you know where to look both for 172 00:07:17,370 --> 00:07:20,840 documentation in some cases, or for basics like this, the 173 00:07:20,840 --> 00:07:22,360 definition of some type. 174 00:07:22,360 --> 00:07:25,680 >> If I now open up list zero dot c, notice a few things. 175 00:07:25,680 --> 00:07:29,090 It includes a few header files, most of which we've seen before. 176 00:07:29,090 --> 00:07:31,980 It includes its own header file. 177 00:07:31,980 --> 00:07:35,200 >> And as an aside, why that's double quotes here, as opposed to the angle 178 00:07:35,200 --> 00:07:38,340 brackets on the line that I've highlighted there? 179 00:07:38,340 --> 00:07:39,180 >> STUDENT: [INAUDIBLE]. 180 00:07:39,180 --> 00:07:40,460 >> SPEAKER 1: Yeah so it's a local file. 181 00:07:40,460 --> 00:07:44,300 So if it's a local file of your own here on line 15, for instance, you use 182 00:07:44,300 --> 00:07:46,570 the double quotes instead of the angled brackets. 183 00:07:46,570 --> 00:07:48,270 >> Now this is kind of interesting. 184 00:07:48,270 --> 00:07:51,830 Notice that I've declared a global variable in this program on line 18 185 00:07:51,830 --> 00:07:55,910 called first, the idea being this is going to be a pointer to the first 186 00:07:55,910 --> 00:07:59,190 node in my linked list, and I've initialized it to null, because I've 187 00:07:59,190 --> 00:08:02,310 not allocated any actual nodes just yet. 188 00:08:02,310 --> 00:08:07,570 >> So this represents, pictorially, what we saw a moment ago in the picture as 189 00:08:07,570 --> 00:08:10,090 that pointer on the far left hand side. 190 00:08:10,090 --> 00:08:12,260 So right now, that pointer does not have an arrow. 191 00:08:12,260 --> 00:08:14,590 It instead is just null. 192 00:08:14,590 --> 00:08:17,880 But it represents what will be the address of the first actual 193 00:08:17,880 --> 00:08:19,480 node in this list. 194 00:08:19,480 --> 00:08:22,120 So I've implemented it is a global because, as you'll see, all this 195 00:08:22,120 --> 00:08:25,310 program does in life is implement a linked list for me. 196 00:08:25,310 --> 00:08:27,050 >> Now I've got a few prototypes here. 197 00:08:27,050 --> 00:08:31,190 I decided to implement features like deletion, insertion, searching, and 198 00:08:31,190 --> 00:08:31,740 traversal-- 199 00:08:31,740 --> 00:08:35,210 the last just being walk across the list, printing out its elements. 200 00:08:35,210 --> 00:08:36,750 And now here's my main routine. 201 00:08:36,750 --> 00:08:39,890 And we won't spend too much time on these since this is sort of, hopefully 202 00:08:39,890 --> 00:08:41,780 old hat by now. 203 00:08:41,780 --> 00:08:45,370 >> I'm going to do the following, while the user cooperates. 204 00:08:45,370 --> 00:08:47,300 So one, I'm going to print out this menu. 205 00:08:47,300 --> 00:08:49,420 And I've formatted it as cleanly as I could. 206 00:08:49,420 --> 00:08:52,240 If the user types in one, that means they want to delete something. 207 00:08:52,240 --> 00:08:54,560 If the user types in two, that means they want to insert something. 208 00:08:54,560 --> 00:08:55,930 And so forth. 209 00:08:55,930 --> 00:08:58,270 I'm going to then prompt then for a command. 210 00:08:58,270 --> 00:08:59,300 And then I'm going to use GetInt. 211 00:08:59,300 --> 00:09:02,790 >> So this is a really simple menuing interface where you just have to type 212 00:09:02,790 --> 00:09:05,270 a number mapping to one of those commands. 213 00:09:05,270 --> 00:09:08,730 And now I have a nice clean switch statement that's going to switch on 214 00:09:08,730 --> 00:09:10,090 whatever the user typed in. 215 00:09:10,090 --> 00:09:12,180 And if they typed one, I'll call delete and break. 216 00:09:12,180 --> 00:09:14,380 If they typed two, I'll call insert and break. 217 00:09:14,380 --> 00:09:16,490 >> And now notice I've put each of these on the same line. 218 00:09:16,490 --> 00:09:18,360 This is just a stylistic decision. 219 00:09:18,360 --> 00:09:20,210 Typically we've seen something like this. 220 00:09:20,210 --> 00:09:23,260 But I just decided, frankly, my program looked more readable because 221 00:09:23,260 --> 00:09:25,980 it was only four cases to just list it like this. 222 00:09:25,980 --> 00:09:28,360 Totally legitimate use of style. 223 00:09:28,360 --> 00:09:31,480 And I'm going to do this so long as the user has not typed zero, which I 224 00:09:31,480 --> 00:09:33,910 decided will mean they want to quit. 225 00:09:33,910 --> 00:09:36,630 >> So now notice what I'm going to do here. 226 00:09:36,630 --> 00:09:38,650 I'm going to free the list apparently. 227 00:09:38,650 --> 00:09:40,230 But more on that in just a moment. 228 00:09:40,230 --> 00:09:41,640 Let's first run this program. 229 00:09:41,640 --> 00:09:45,250 So let me make a bigger terminal window, dot slash list 0. 230 00:09:45,250 --> 00:09:49,510 I'm going to go ahead and insert by typing two, a number like 50, and now 231 00:09:49,510 --> 00:09:51,590 you'll see the list is now 50. 232 00:09:51,590 --> 00:09:53,380 And my text just scrolled up a bit. 233 00:09:53,380 --> 00:09:55,940 So now notice the list contains the number 50. 234 00:09:55,940 --> 00:09:58,220 >> Let's do another insert by taking two. 235 00:09:58,220 --> 00:10:01,630 Let's type in the number like one. 236 00:10:01,630 --> 00:10:03,940 List is now one, followed by 50. 237 00:10:03,940 --> 00:10:06,020 So this is just a textual representation of the list. 238 00:10:06,020 --> 00:10:10,550 And let's insert one more number like the number 42, which is hopefully 239 00:10:10,550 --> 00:10:14,620 going to end up in the middle, because this program in particular sorts it 240 00:10:14,620 --> 00:10:16,320 elements as it inserts them. 241 00:10:16,320 --> 00:10:17,220 So there we have it. 242 00:10:17,220 --> 00:10:20,730 Super simple program that could absolutely have used an array, but I 243 00:10:20,730 --> 00:10:23,280 happen to be using a linked list just so I can dynamically 244 00:10:23,280 --> 00:10:24,610 grow and shrink it. 245 00:10:24,610 --> 00:10:28,470 >> So let's take a look for search, if I run command three, I want to search 246 00:10:28,470 --> 00:10:31,040 for, say, the number 43. 247 00:10:31,040 --> 00:10:34,190 And nothing was apparently found, because I got back no response. 248 00:10:34,190 --> 00:10:35,010 So let's do this again. 249 00:10:35,010 --> 00:10:35,690 Search. 250 00:10:35,690 --> 00:10:39,520 Let's search for 50, or rather search for 42, which has a nice 251 00:10:39,520 --> 00:10:40,850 little subtle meaning. 252 00:10:40,850 --> 00:10:42,610 And I found the meaning of life there. 253 00:10:42,610 --> 00:10:44,990 Number 42, if you don't know the reference, Google it. 254 00:10:44,990 --> 00:10:45,350 All right. 255 00:10:45,350 --> 00:10:47,130 So what has this program done for me? 256 00:10:47,130 --> 00:10:50,660 It's just allowed me to insert thus far and search for elements. 257 00:10:50,660 --> 00:10:53,650 >> Let's fast forward, then, to that function we glanced at 258 00:10:53,650 --> 00:10:55,360 on Monday as a teaser. 259 00:10:55,360 --> 00:10:59,620 So this function, I claim, searches for an element in the list by first 260 00:10:59,620 --> 00:11:03,830 one, prompting the user and then calling GetInt to get an actual int 261 00:11:03,830 --> 00:11:05,060 that you want to search for. 262 00:11:05,060 --> 00:11:06,460 >> Then notice this. 263 00:11:06,460 --> 00:11:10,690 I'm going to create a temporary variable in line 188 called pointer-- 264 00:11:10,690 --> 00:11:11,270 PTR-- 265 00:11:11,270 --> 00:11:12,440 could have called it anything. 266 00:11:12,440 --> 00:11:16,140 And it's a pointer to a node because I said node* there. 267 00:11:16,140 --> 00:11:19,900 And I'm initializing it to be equal to first so that I effectively have my 268 00:11:19,900 --> 00:11:22,860 finger, so to speak, on the very first element of the list. 269 00:11:22,860 --> 00:11:27,460 So if my right hand here is PTR I'm pointing at the same thing that first 270 00:11:27,460 --> 00:11:28,670 is pointing at. 271 00:11:28,670 --> 00:11:31,430 >> So now back in code, what happens next-- 272 00:11:31,430 --> 00:11:35,070 this is a common paradigm when iterating over a structure like a 273 00:11:35,070 --> 00:11:35,970 linked list. 274 00:11:35,970 --> 00:11:40,410 I'm going to do the following while pointer is not equal to null So while 275 00:11:40,410 --> 00:11:47,530 my finger is not pointing at some null value, if pointer arrow n equals n. 276 00:11:47,530 --> 00:11:52,290 We'll notice first that n is what the user typed in per GetInts call here. 277 00:11:52,290 --> 00:11:54,280 >> And pointer arrow n means what? 278 00:11:54,280 --> 00:11:59,020 Well if we go back to the picture here, if I have a finger pointing at 279 00:11:59,020 --> 00:12:02,960 that first node containing nine, the arrow essentially means go to that 280 00:12:02,960 --> 00:12:08,860 node and grab the value at location n, in this case, the data field called n. 281 00:12:08,860 --> 00:12:14,120 >> As an aside-- and we saw this a couple of weeks ago when someone asked-- 282 00:12:14,120 --> 00:12:18,840 this syntax is new, but it doesn't give us powers that we 283 00:12:18,840 --> 00:12:20,040 didn't already have. 284 00:12:20,040 --> 00:12:25,325 What was this phrase equivalent to using dot notation and star a couple 285 00:12:25,325 --> 00:12:29,490 of weeks ago when we peeled back this layer a bit prematurely? 286 00:12:29,490 --> 00:12:31,780 >> STUDENT: [INAUDIBLE]. 287 00:12:31,780 --> 00:12:38,880 >> SPEAKER 1: Exactly, it was star, and then it was star dot n, with 288 00:12:38,880 --> 00:12:41,930 parentheses here, which looks, frankly, I think a lot 289 00:12:41,930 --> 00:12:43,320 more cryptic to read. 290 00:12:43,320 --> 00:12:46,270 But star pointer, as always, means go there. 291 00:12:46,270 --> 00:12:49,090 And once you're there, what data field do you want to access? 292 00:12:49,090 --> 00:12:52,730 Well you use the dot notation to access a structs data field, and I 293 00:12:52,730 --> 00:12:54,140 specifically want n. 294 00:12:54,140 --> 00:12:56,240 >> Frankly, I would argue this is just harder to read. 295 00:12:56,240 --> 00:12:58,080 It's harder to remember where do the parentheses go, the 296 00:12:58,080 --> 00:12:59,030 star and all of that. 297 00:12:59,030 --> 00:13:02,150 So the world adopted some syntactic sugar, so to speak. 298 00:13:02,150 --> 00:13:04,740 Just a sexy way of saying, this is equivalent, and 299 00:13:04,740 --> 00:13:05,970 perhaps more intuitive. 300 00:13:05,970 --> 00:13:09,600 If pointer is indeed a pointer, the arrow notation means go there and find 301 00:13:09,600 --> 00:13:11,890 the field in this case called n. 302 00:13:11,890 --> 00:13:13,660 >> So if I find it, notice what I do. 303 00:13:13,660 --> 00:13:17,430 I simply print out, I found percent i, plugging in the value for that int. 304 00:13:17,430 --> 00:13:20,730 I call sleep for one second just to kind of pause things on the screen to 305 00:13:20,730 --> 00:13:22,900 give the user a second to absorb what just happened. 306 00:13:22,900 --> 00:13:24,290 And then I break. 307 00:13:24,290 --> 00:13:26,330 Otherwise, what do I do? 308 00:13:26,330 --> 00:13:30,960 I update pointer to equal pointer arrow next. 309 00:13:30,960 --> 00:13:35,840 >> So just to be clear, this means go there, using my old-school notation. 310 00:13:35,840 --> 00:13:39,580 So this just means to go to whatever you're pointing at, which in the very 311 00:13:39,580 --> 00:13:43,660 first case is I'm pointing at the struct with nine in it. 312 00:13:43,660 --> 00:13:44,510 So I've gone there. 313 00:13:44,510 --> 00:13:47,880 And then the dot notation means, get the value at next. 314 00:13:47,880 --> 00:13:50,470 >> But the value, even though it's drawn as an narrow, is just a number. 315 00:13:50,470 --> 00:13:51,720 It's a numeric address. 316 00:13:51,720 --> 00:13:55,670 So this one line of code, whether written like this, the more cryptic 317 00:13:55,670 --> 00:14:00,190 way, or like this, the slightly more intuitive way, just means move my hand 318 00:14:00,190 --> 00:14:03,460 from the first node to the next one, and then the next one, and then the 319 00:14:03,460 --> 00:14:05,320 next one, and so forth. 320 00:14:05,320 --> 00:14:09,920 >> So we won't dwell on the other implementations of insert and delete 321 00:14:09,920 --> 00:14:14,030 and traversal, the first two of which are fairly involved. 322 00:14:14,030 --> 00:14:17,010 And I think it's quite easy to get lost when doing it verbally. 323 00:14:17,010 --> 00:14:19,890 But what we can do here is try to determine how 324 00:14:19,890 --> 00:14:21,640 best to do this visually. 325 00:14:21,640 --> 00:14:24,800 Because I would propose that if we want to insert elements into this 326 00:14:24,800 --> 00:14:26,680 existing list, which has five elements-- 327 00:14:26,680 --> 00:14:29,530 9, 17, 22, 26, and 33-- 328 00:14:29,530 --> 00:14:33,300 if I were going to implement this in code, I need to consider how to go 329 00:14:33,300 --> 00:14:34,160 about doing this. 330 00:14:34,160 --> 00:14:37,720 >> And I would propose taking baby steps whereby, in this case I mean, what are 331 00:14:37,720 --> 00:14:41,090 the possible scenarios that we might encounter in general? 332 00:14:41,090 --> 00:14:44,120 When implementing insert for a linked list, this just happens to be a 333 00:14:44,120 --> 00:14:46,090 specific example of size five. 334 00:14:46,090 --> 00:14:50,420 Well if you want to insert a number, like say the number one, and 335 00:14:50,420 --> 00:14:53,380 maintaining sorted order, where obviously does the number one need to 336 00:14:53,380 --> 00:14:55,686 go in this specific example? 337 00:14:55,686 --> 00:14:56,840 Like at the beginning. 338 00:14:56,840 --> 00:15:00,030 >> But what's interesting there is that if you want to insert one into this 339 00:15:00,030 --> 00:15:04,100 list, what special pointer needs to be updated apparently? 340 00:15:04,100 --> 00:15:04,610 First. 341 00:15:04,610 --> 00:15:07,830 So I would argue, this is the first case that we might want to consider, a 342 00:15:07,830 --> 00:15:11,140 scenario involving inserting at the beginning of the list. 343 00:15:11,140 --> 00:15:15,400 >> Let's pluck off maybe an as easy or even easier case, relatively speaking. 344 00:15:15,400 --> 00:15:18,110 Suppose I want to insert the number 35 in sorted order. 345 00:15:18,110 --> 00:15:20,600 It obviously belongs over there. 346 00:15:20,600 --> 00:15:25,320 So what pointer obviously is going to have to be updated in that scenario? 347 00:15:25,320 --> 00:15:30,060 34's pointer becoming not null but the address of the struct 348 00:15:30,060 --> 00:15:31,800 containing the number 35. 349 00:15:31,800 --> 00:15:32,750 So that's case two. 350 00:15:32,750 --> 00:15:36,190 So already, I'm sort of quantizing how much work I have to do here. 351 00:15:36,190 --> 00:15:39,880 >> And finally, the obvious middle case is indeed, in the middle, if I want to 352 00:15:39,880 --> 00:15:45,870 insert something like say 23, that goes between the 23 and the 26, but 353 00:15:45,870 --> 00:15:48,680 now things get a little more involved because what 354 00:15:48,680 --> 00:15:52,800 pointers need to be changed? 355 00:15:52,800 --> 00:15:56,680 So 22 obviously needs to be changed because he can't point to 26 anymore. 356 00:15:56,680 --> 00:16:00,320 He needs to point to the new node that I'll have to allocate by calling 357 00:16:00,320 --> 00:16:01,770 malloc or some equivalent. 358 00:16:01,770 --> 00:16:05,990 >> But then I also need that new node, 23 in this case, to have its pointer 359 00:16:05,990 --> 00:16:07,870 pointing at whom? 360 00:16:07,870 --> 00:16:08,560 26. 361 00:16:08,560 --> 00:16:10,380 And there's going to be an order of operations here. 362 00:16:10,380 --> 00:16:13,410 Because if I do this foolishly, and I for instance start at the beginning of 363 00:16:13,410 --> 00:16:16,040 the list, and my goal is to insert 23. 364 00:16:16,040 --> 00:16:18,610 And I check, does it belong here, near nine? 365 00:16:18,610 --> 00:16:18,950 No. 366 00:16:18,950 --> 00:16:20,670 Does it belong here, next to 17? 367 00:16:20,670 --> 00:16:20,940 No. 368 00:16:20,940 --> 00:16:22,530 Does it belongs here next to 22? 369 00:16:22,530 --> 00:16:23,300 Yes. 370 00:16:23,300 --> 00:16:26,400 >> Now if I'm foolish here, and not thinking this through, I might 371 00:16:26,400 --> 00:16:28,320 allocate my new node for 23. 372 00:16:28,320 --> 00:16:32,080 I might update the pointer from the node called 22, pointing 373 00:16:32,080 --> 00:16:33,080 it at the new node. 374 00:16:33,080 --> 00:16:36,140 And then what do I have to update the new node's pointer to be? 375 00:16:36,140 --> 00:16:38,120 >> STUDENT: [INAUDIBLE]. 376 00:16:38,120 --> 00:16:38,385 >> SPEAKER 1: Exactly. 377 00:16:38,385 --> 00:16:39,710 Pointing at 26. 378 00:16:39,710 --> 00:16:45,590 But dammit if I didn't already update 22's pointer to point at this guy, and 379 00:16:45,590 --> 00:16:48,260 now I have orphans, the rest of the list, so to speak. 380 00:16:48,260 --> 00:16:52,140 So order of operations here is going to be important. 381 00:16:52,140 --> 00:16:55,100 >> To do this could I steal, say, six volunteers. 382 00:16:55,100 --> 00:16:57,650 And let's see if we can't do this visually instead of code-wise. 383 00:16:57,650 --> 00:16:59,330 And we have some lovely stress balls for you today. 384 00:16:59,330 --> 00:17:02,510 OK, how about one, two, in the back-- on the end there. 385 00:17:02,510 --> 00:17:04,530 three, four, both of you guys on the end. 386 00:17:04,530 --> 00:17:05,579 And five, six. 387 00:17:05,579 --> 00:17:05,839 Sure. 388 00:17:05,839 --> 00:17:06,450 Five and six. 389 00:17:06,450 --> 00:17:08,390 All right and we'll come to you guys next time. 390 00:17:08,390 --> 00:17:09,640 All right, come on up. 391 00:17:09,640 --> 00:17:12,010 392 00:17:12,010 --> 00:17:14,819 >> All right, since you're up here first, would you like to be the one awkwardly 393 00:17:14,819 --> 00:17:16,119 in Google Glass here? 394 00:17:16,119 --> 00:17:19,075 All right, so, OK, Glass, record a video. 395 00:17:19,075 --> 00:17:22,720 396 00:17:22,720 --> 00:17:24,589 OK, you're good to go. 397 00:17:24,589 --> 00:17:27,950 >> All right, so if you guys can come over here, I have prepared in advance 398 00:17:27,950 --> 00:17:30,110 some numbers. 399 00:17:30,110 --> 00:17:31,240 All right, come on over here. 400 00:17:31,240 --> 00:17:33,440 And why don't you go a little further that way. 401 00:17:33,440 --> 00:17:35,520 And let's see, what's your name, with the Google Glass? 402 00:17:35,520 --> 00:17:35,910 >> STUDENT: Ben. 403 00:17:35,910 --> 00:17:36,230 >> SPEAKER 1: Ben? 404 00:17:36,230 --> 00:17:38,380 OK, Ben, you will be first, literally. 405 00:17:38,380 --> 00:17:40,580 So we're going to send you to the end of the stage. 406 00:17:40,580 --> 00:17:41,670 All right, and your name? 407 00:17:41,670 --> 00:17:41,990 >> STUDENT: Jason. 408 00:17:41,990 --> 00:17:44,530 >> SPEAKER 1: Jason, OK you'll be number nine. 409 00:17:44,530 --> 00:17:46,700 So if you want to follow Ben that way. 410 00:17:46,700 --> 00:17:47,010 >> STUDENT: Jill. 411 00:17:47,010 --> 00:17:49,630 >> SPEAKER 1: Jill, you're going to be 17, which if I'd done this more 412 00:17:49,630 --> 00:17:51,260 intelligently, I would have started at the other end. 413 00:17:51,260 --> 00:17:52,370 You go that way. 414 00:17:52,370 --> 00:17:53,030 22. 415 00:17:53,030 --> 00:17:53,670 And you are? 416 00:17:53,670 --> 00:17:53,980 >> STUDENT: Mary. 417 00:17:53,980 --> 00:17:56,130 >> SPEAKER 1: Mary, you'll be 22. 418 00:17:56,130 --> 00:17:58,420 And your name is? 419 00:17:58,420 --> 00:17:58,810 >> STUDENT: Chris. 420 00:17:58,810 --> 00:18:00,100 >> SPEAKER 1: Chris, you'll be 26. 421 00:18:00,100 --> 00:18:00,740 And then lastly. 422 00:18:00,740 --> 00:18:01,400 >> STUDENT: Diana. 423 00:18:01,400 --> 00:18:02,670 >> SPEAKER 1: Diana, you'll be 34. 424 00:18:02,670 --> 00:18:03,920 So you come on over here. 425 00:18:03,920 --> 00:18:06,360 >> All right, so perfect sorted order already. 426 00:18:06,360 --> 00:18:09,600 And let's go ahead and do this so that we can really-- 427 00:18:09,600 --> 00:18:11,720 Ben you're just kind of looking out into nowhere there. 428 00:18:11,720 --> 00:18:15,670 OK, so let's go ahead and depict this using arms, much like I was, exactly, 429 00:18:15,670 --> 00:18:16,250 what's going on. 430 00:18:16,250 --> 00:18:19,540 So go ahead and give yourselves a foot or two between yourselves. 431 00:18:19,540 --> 00:18:22,900 And go ahead and point with one hand to whoever you should be pointing at 432 00:18:22,900 --> 00:18:23,470 based on this. 433 00:18:23,470 --> 00:18:25,890 And if you're null just point straight down to the floor. 434 00:18:25,890 --> 00:18:27,690 OK, so good. 435 00:18:27,690 --> 00:18:32,290 >> So now we have a linked list, and let me propose that I'll play the role of 436 00:18:32,290 --> 00:18:35,110 PTR, so I won't bother carrying this around. 437 00:18:35,110 --> 00:18:37,830 And then-- someone stupid convention-- you can call this anything you want-- 438 00:18:37,830 --> 00:18:39,800 predecessor pointer, pred pointer-- 439 00:18:39,800 --> 00:18:43,930 it's just the nickname we gave in our sample code to my left hand. 440 00:18:43,930 --> 00:18:47,240 The other hand that going to be keeping track of who is who in the 441 00:18:47,240 --> 00:18:48,400 following scenarios. 442 00:18:48,400 --> 00:18:52,390 >> So suppose, first, I want to pluck off that first example of inserting, say 443 00:18:52,390 --> 00:18:54,330 20, into the list. 444 00:18:54,330 --> 00:18:57,160 So I'm going to need someone to embody the number 20 for us. 445 00:18:57,160 --> 00:18:58,950 So I need to malloc someone from the audience. 446 00:18:58,950 --> 00:18:59,380 Come on up. 447 00:18:59,380 --> 00:19:00,340 What's your name? 448 00:19:00,340 --> 00:19:01,300 >> STUDENT: Brian. 449 00:19:01,300 --> 00:19:05,270 >> SPEAKER 1: Brian, all right, so you shall be the node containing 20. 450 00:19:05,270 --> 00:19:06,810 All right, come on over here. 451 00:19:06,810 --> 00:19:10,025 And obviously, where does Brian belong? 452 00:19:10,025 --> 00:19:12,190 So, in the middle of-- actually, wait a minute. 453 00:19:12,190 --> 00:19:13,420 We're doing this out of order. 454 00:19:13,420 --> 00:19:17,170 We're making this a lot harder than it needs to be at first. 455 00:19:17,170 --> 00:19:21,210 OK, we're going to free Brian and realloc Brian as five. 456 00:19:21,210 --> 00:19:23,680 >> OK, so now we want to insert Brian as five. 457 00:19:23,680 --> 00:19:25,960 So come on over here next to Ben for just a moment. 458 00:19:25,960 --> 00:19:28,250 And you can presumably tell where this story is going. 459 00:19:28,250 --> 00:19:30,500 But let's think carefully about the order of operations. 460 00:19:30,500 --> 00:19:32,880 And it's precisely this visual that's going to line up 461 00:19:32,880 --> 00:19:34,080 with that sample code. 462 00:19:34,080 --> 00:19:40,120 So here I have PTR pointing initially not at Ben, per se, but at whatever 463 00:19:40,120 --> 00:19:43,245 value he contains, which in this case is-- what's your name again? 464 00:19:43,245 --> 00:19:43,670 >> STUDENT: Jason. 465 00:19:43,670 --> 00:19:47,350 >> SPEAKER 1: Jason, so both Ben and I are pointing at Jason at this moment. 466 00:19:47,350 --> 00:19:49,700 So now I have to determine, where does Brian belong? 467 00:19:49,700 --> 00:19:53,500 So the only thing I have access to right now is his n data item. 468 00:19:53,500 --> 00:19:58,280 So I'm going to check, is Brian less than Jason? 469 00:19:58,280 --> 00:19:59,770 The answer is true. 470 00:19:59,770 --> 00:20:03,680 >> So what now needs to happen, in the correct order? 471 00:20:03,680 --> 00:20:07,120 I need to update how many pointers in total in this story? 472 00:20:07,120 --> 00:20:10,720 Where my hand is still pointing at Jason, and your hand-- if you want to 473 00:20:10,720 --> 00:20:12,930 put your hand like, sort of, I don't know, a question mark. 474 00:20:12,930 --> 00:20:14,070 OK, good. 475 00:20:14,070 --> 00:20:15,670 >> All right, so you have a few candidates. 476 00:20:15,670 --> 00:20:20,500 Either Ben or I or Brian or Jason or everyone else, which 477 00:20:20,500 --> 00:20:21,370 pointers need to change? 478 00:20:21,370 --> 00:20:23,260 How many in total? 479 00:20:23,260 --> 00:20:24,080 >> OK, so two. 480 00:20:24,080 --> 00:20:27,090 My pointer doesn't really matter anymore because I'm just temporary. 481 00:20:27,090 --> 00:20:31,370 So it's these two guys, presumably, both Ben and Brian. 482 00:20:31,370 --> 00:20:34,410 So let me propose that we update Ben, since he's first. 483 00:20:34,410 --> 00:20:36,350 The first element of this list is now going to be Brian. 484 00:20:36,350 --> 00:20:38,070 So Ben point at Brian. 485 00:20:38,070 --> 00:20:39,320 OK, now what? 486 00:20:39,320 --> 00:20:41,950 487 00:20:41,950 --> 00:20:43,460 >> Who gets pointed at whom? 488 00:20:43,460 --> 00:20:44,710 >> STUDENT: [INAUDIBLE]. 489 00:20:44,710 --> 00:20:46,180 >> SPEAKER 1: OK so Brian has to point at Jason. 490 00:20:46,180 --> 00:20:48,360 But have I lost track of that pointer? 491 00:20:48,360 --> 00:20:49,980 Do I know where Jason is? 492 00:20:49,980 --> 00:20:50,790 >> STUDENT: [INAUDIBLE]. 493 00:20:50,790 --> 00:20:52,620 >> SPEAKER 1: I do, since I'm the temporary pointer. 494 00:20:52,620 --> 00:20:55,110 And presumably, I have not changed to point at the new node. 495 00:20:55,110 --> 00:20:58,300 So we can simply have Brian point at whoever I'm pointing at. 496 00:20:58,300 --> 00:20:59,000 And we're done. 497 00:20:59,000 --> 00:21:01,890 So case one, insertion at the beginning of the list. 498 00:21:01,890 --> 00:21:02,950 There were two key steps. 499 00:21:02,950 --> 00:21:06,750 One, we have to update Ben, and then we also have to update Brian. 500 00:21:06,750 --> 00:21:09,230 And then I don't have to bother traipsing through the rest of the 501 00:21:09,230 --> 00:21:12,680 list, because we already found his location, because he belonged to the 502 00:21:12,680 --> 00:21:14,080 left of the first element. 503 00:21:14,080 --> 00:21:15,400 >> All right, so pretty straightforward. 504 00:21:15,400 --> 00:21:18,110 In fact, feels like we're almost making this too complicated. 505 00:21:18,110 --> 00:21:20,240 So let's now pluck off the end of the list, and see where 506 00:21:20,240 --> 00:21:21,380 the complexity starts. 507 00:21:21,380 --> 00:21:24,560 So if now, I alloc from the audience. 508 00:21:24,560 --> 00:21:25,540 Anyone want to play 55? 509 00:21:25,540 --> 00:21:26,700 All right, I saw your hand first. 510 00:21:26,700 --> 00:21:29,620 Come on up. 511 00:21:29,620 --> 00:21:30,030 Yeah. 512 00:21:30,030 --> 00:21:31,177 What's your name? 513 00:21:31,177 --> 00:21:32,310 >> STUDENT: [INAUDIBLE]. 514 00:21:32,310 --> 00:21:33,240 >> SPEAKER 1: Habata. 515 00:21:33,240 --> 00:21:33,890 OK, come on up. 516 00:21:33,890 --> 00:21:35,730 You'll be the number 55. 517 00:21:35,730 --> 00:21:37,820 So you, of course, belong at the end of the list. 518 00:21:37,820 --> 00:21:41,850 So let's replay the simulation with me being the PTR for just a moment. 519 00:21:41,850 --> 00:21:44,050 So I'm first going to point at whatever Ben's pointing at. 520 00:21:44,050 --> 00:21:45,900 We're both pointing now at Brian. 521 00:21:45,900 --> 00:21:48,420 So 55 is not less than five. 522 00:21:48,420 --> 00:21:52,510 So I'm going to update myself by pointing to Brian's next pointer, who 523 00:21:52,510 --> 00:21:54,450 now is of course Jason. 524 00:21:54,450 --> 00:21:57,310 55 is not less than nine, so I'm going to update PTR. 525 00:21:57,310 --> 00:21:58,890 I'm going to update PTR. 526 00:21:58,890 --> 00:22:02,290 I'm going to update PTR I going to update PTR. 527 00:22:02,290 --> 00:22:05,060 And I'm going to-- hmm, what's your name again? 528 00:22:05,060 --> 00:22:05,560 >> STUDENT: Diana. 529 00:22:05,560 --> 00:22:09,190 >> SPEAKER 1: Diana is pointing, of course, at null with her left hand. 530 00:22:09,190 --> 00:22:13,030 So where does Habata actually belong clearly? 531 00:22:13,030 --> 00:22:15,050 To the left, here. 532 00:22:15,050 --> 00:22:19,460 So how do I know to put her Here I think I've screwed up. 533 00:22:19,460 --> 00:22:22,420 Because what is PTR art this moment in time? 534 00:22:22,420 --> 00:22:23,240 Null. 535 00:22:23,240 --> 00:22:25,580 So even though, visually, we can obviously see all of these 536 00:22:25,580 --> 00:22:26,610 guys here on stage. 537 00:22:26,610 --> 00:22:29,680 I've not kept track of the previous person in the list. 538 00:22:29,680 --> 00:22:33,210 I don't have a finger pointing out, in this case, the node number 34. 539 00:22:33,210 --> 00:22:34,760 >> So let's actually start this over. 540 00:22:34,760 --> 00:22:37,560 So now I actually do need a second local variable. 541 00:22:37,560 --> 00:22:40,980 And this is what you'll see in the actual sample C code, where as I go, 542 00:22:40,980 --> 00:22:45,860 when I update my right hand to point Jason, thereby leaving Brian behind, I 543 00:22:45,860 --> 00:22:51,440 better start using my left hand to update where I was, so that as I go 544 00:22:51,440 --> 00:22:52,700 through this list-- 545 00:22:52,700 --> 00:22:55,040 more awkwardly than I intended now here visually-- 546 00:22:55,040 --> 00:22:56,740 I'm going to get to the end of the list. 547 00:22:56,740 --> 00:23:00,020 >> This hand is still null, which is pretty useless, other than to indicate 548 00:23:00,020 --> 00:23:02,980 I'm clearly at the end of the list, but now at least I have this 549 00:23:02,980 --> 00:23:08,270 predecessor pointer pointing here, so now what hands and what pointers need 550 00:23:08,270 --> 00:23:10,150 to be updated? 551 00:23:10,150 --> 00:23:13,214 Whose hand do you want to reconfigure first? 552 00:23:13,214 --> 00:23:15,190 >> STUDENT: [INAUDIBLE]. 553 00:23:15,190 --> 00:23:16,220 >> SPEAKER 1: OK, so Diana's. 554 00:23:16,220 --> 00:23:21,110 Where do you want to point Diana's left pointer at? 555 00:23:21,110 --> 00:23:23,620 At 55, presumably, so that we've inserted there. 556 00:23:23,620 --> 00:23:25,560 And where should 55 pointer go? 557 00:23:25,560 --> 00:23:27,000 Down, representing null. 558 00:23:27,000 --> 00:23:28,890 And my hands, at this point, don't matter because they were just 559 00:23:28,890 --> 00:23:30,070 temporary variables. 560 00:23:30,070 --> 00:23:31,030 So now we're done. 561 00:23:31,030 --> 00:23:34,650 >> So the additional complexity there-- and it's not that hard to implement, 562 00:23:34,650 --> 00:23:38,660 but we need a secondary variable to make sure that before I move my right 563 00:23:38,660 --> 00:23:42,140 hand, I update the value of my left hand, pred pointer in this case, so 564 00:23:42,140 --> 00:23:45,860 that I have a trailing pointer to keep track of where I was. 565 00:23:45,860 --> 00:23:49,360 Now as an aside, if you're thinking this through, this feels like it's a 566 00:23:49,360 --> 00:23:51,490 little annoying to have to keep track of this left hand. 567 00:23:51,490 --> 00:23:54,015 >> What would another solution to this problem have been? 568 00:23:54,015 --> 00:23:56,500 If you got to redesign the data structure we're talking 569 00:23:56,500 --> 00:23:59,630 through right now? 570 00:23:59,630 --> 00:24:02,690 If this just kind of feels a little annoying to have, like, two pointers 571 00:24:02,690 --> 00:24:08,430 going through the list, who else could have, in an ideal world, maintained 572 00:24:08,430 --> 00:24:10,160 information that we need? 573 00:24:10,160 --> 00:24:11,360 Yeah? 574 00:24:11,360 --> 00:24:12,610 >> STUDENT: [INAUDIBLE]. 575 00:24:12,610 --> 00:24:15,160 576 00:24:15,160 --> 00:24:16,150 >> SPEAKER 1: Exactly. 577 00:24:16,150 --> 00:24:19,130 Right so there's actually an interesting germ of an idea. 578 00:24:19,130 --> 00:24:22,470 And this idea of a previous pointer, pointing at the previous element. 579 00:24:22,470 --> 00:24:25,580 What if I just embodied that inside of the list itself? 580 00:24:25,580 --> 00:24:27,810 And it's going to be hard to visualize this without all the paper 581 00:24:27,810 --> 00:24:28,830 falling to the floor. 582 00:24:28,830 --> 00:24:31,860 But suppose that these guys used both of their hands to have a previous 583 00:24:31,860 --> 00:24:35,950 pointer, and a next pointer, thereby implementing what we'll call a doubly 584 00:24:35,950 --> 00:24:36,830 linked list. 585 00:24:36,830 --> 00:24:41,090 That would allow me to sort of rewind, much more easily without me, the 586 00:24:41,090 --> 00:24:43,800 programmer, having to keep track manually-- 587 00:24:43,800 --> 00:24:44,980 truly manually-- 588 00:24:44,980 --> 00:24:47,280 of where I had been previously in the list. 589 00:24:47,280 --> 00:24:48,110 So we won't do that. 590 00:24:48,110 --> 00:24:50,950 We'll keep it simple because that's going to come at a price, twice as 591 00:24:50,950 --> 00:24:53,450 much space for the pointers, if you want a second one. 592 00:24:53,450 --> 00:24:55,760 But that's indeed a common data structure known as a 593 00:24:55,760 --> 00:24:57,410 doubly linked list. 594 00:24:57,410 --> 00:25:01,310 >> Let's do the final example here and put these guys out of their misery. 595 00:25:01,310 --> 00:25:03,270 So malloc 20. 596 00:25:03,270 --> 00:25:05,320 Come on up from the aisle there. 597 00:25:05,320 --> 00:25:06,280 All right, what's your name? 598 00:25:06,280 --> 00:25:07,440 >> STUDENT: [INAUDIBLE]. 599 00:25:07,440 --> 00:25:07,855 >> SPEAKER 1: Sorry? 600 00:25:07,855 --> 00:25:08,480 >> STUDENT: [INAUDIBLE]. 601 00:25:08,480 --> 00:25:09,410 >> SPEAKER 1: Demeron? 602 00:25:09,410 --> 00:25:10,230 OK come on up. 603 00:25:10,230 --> 00:25:11,910 You shall be 20. 604 00:25:11,910 --> 00:25:14,720 You obviously are going to belong between 17 and 22. 605 00:25:14,720 --> 00:25:16,150 So let me learn my lesson. 606 00:25:16,150 --> 00:25:18,150 I'm going to start pointer pointing at Brian. 607 00:25:18,150 --> 00:25:21,190 And I'm going to have my left hand only update to Brian as I move to 608 00:25:21,190 --> 00:25:23,600 Jason, checking does 20 less than nine? 609 00:25:23,600 --> 00:25:24,060 No. 610 00:25:24,060 --> 00:25:25,430 Is 20 less than 17? 611 00:25:25,430 --> 00:25:25,880 No. 612 00:25:25,880 --> 00:25:27,450 Is 20 less than 22? 613 00:25:27,450 --> 00:25:28,440 Yes. 614 00:25:28,440 --> 00:25:34,070 So what pointers or hands need to change where they're pointing now? 615 00:25:34,070 --> 00:25:37,070 >> So we can do 17 pointing at 20. 616 00:25:37,070 --> 00:25:37,860 So that's fine. 617 00:25:37,860 --> 00:25:40,080 Where do we want to point your pointer now? 618 00:25:40,080 --> 00:25:41,330 At 22. 619 00:25:41,330 --> 00:25:45,410 And we know where 22 is, again thanks to my temporary pointer. 620 00:25:45,410 --> 00:25:46,760 So we're OK there. 621 00:25:46,760 --> 00:25:49,440 So because of this temporary storage I've kept track of where everyone is. 622 00:25:49,440 --> 00:25:55,055 And now you can visually go into where you belong, and now we need 1, 2, 3, 623 00:25:55,055 --> 00:25:58,410 4, 5, 6, 7, 8, 9 stress balls, and a round of applause for 624 00:25:58,410 --> 00:25:59,770 these guys, if we could. 625 00:25:59,770 --> 00:26:00,410 Nicely done. 626 00:26:00,410 --> 00:26:05,320 >> [APPLAUSE] 627 00:26:05,320 --> 00:26:06,330 >> SPEAKER 1: All right. 628 00:26:06,330 --> 00:26:09,860 And you may keep the pieces of paper as mementos. 629 00:26:09,860 --> 00:26:15,930 >> All right, so, trust me it's a lot easier to walk through that with 630 00:26:15,930 --> 00:26:17,680 humans than it is with actual code. 631 00:26:17,680 --> 00:26:22,690 But what you'll find in just a moment now, is that same-- oh, thank you. 632 00:26:22,690 --> 00:26:23,630 Thank you-- 633 00:26:23,630 --> 00:26:29,360 is that you'll find that the same data structure, a linked list, can actually 634 00:26:29,360 --> 00:26:33,200 be used as a building block to even more sophisticated data structures. 635 00:26:33,200 --> 00:26:37,620 >> And realize too the theme here is that we've absolutely introduced more 636 00:26:37,620 --> 00:26:40,060 complexity into the implementation of this algorithm. 637 00:26:40,060 --> 00:26:43,940 Insertion, and if we went through it, deletion and searching, is a little 638 00:26:43,940 --> 00:26:46,660 more complicated than it was with an array. 639 00:26:46,660 --> 00:26:48,040 But we gain some dynamism. 640 00:26:48,040 --> 00:26:50,180 We get an adaptive data structure. 641 00:26:50,180 --> 00:26:54,010 >> But again, we pay a price of having some additional complexity, both in 642 00:26:54,010 --> 00:26:54,910 implementing it. 643 00:26:54,910 --> 00:26:56,750 And we're given up random access. 644 00:26:56,750 --> 00:27:00,450 And to be honest, there's not some nice clean slide I can give you that 645 00:27:00,450 --> 00:27:03,120 says here is why a linked list is better than an array. 646 00:27:03,120 --> 00:27:04,100 And leave it at that. 647 00:27:04,100 --> 00:27:07,520 Because the theme reoccurring now, even more so in the coming weeks, is 648 00:27:07,520 --> 00:27:10,200 that there's not necessarily a correct answer. 649 00:27:10,200 --> 00:27:13,830 >> This is why we have the separate axis of design for problem sets. 650 00:27:13,830 --> 00:27:17,700 It will be very context sensitive whether you want to use this data 651 00:27:17,700 --> 00:27:21,750 structure or that one, and it will depend on what matters to you in terms 652 00:27:21,750 --> 00:27:24,620 of resources and complexity. 653 00:27:24,620 --> 00:27:28,830 >> But let me propose that the ideal data structure, the holy grail, would be 654 00:27:28,830 --> 00:27:32,200 something that's constant time, irrespective of how much stuff is 655 00:27:32,200 --> 00:27:36,940 inside it, wouldn't it be amazing if a data structure returned answers in 656 00:27:36,940 --> 00:27:37,920 constant time. 657 00:27:37,920 --> 00:27:38,330 Yes. 658 00:27:38,330 --> 00:27:40,110 This word is in your huge dictionary. 659 00:27:40,110 --> 00:27:41,550 Or no, this word is not. 660 00:27:41,550 --> 00:27:43,270 Or any such problem there. 661 00:27:43,270 --> 00:27:46,360 Well let's see if we can't at least take a step toward that. 662 00:27:46,360 --> 00:27:50,190 >> Let me propose a new data structure that can be used for different things, 663 00:27:50,190 --> 00:27:52,260 in this case called a hash table. 664 00:27:52,260 --> 00:27:55,590 And so we're actually back to glancing at an array, in this case, and 665 00:27:55,590 --> 00:28:00,550 somewhat arbitrarily, I've drawn this hash table as an array with sort of a 666 00:28:00,550 --> 00:28:02,810 two-dimensional array-- 667 00:28:02,810 --> 00:28:05,410 or rather it's depicted here as a two dimensional array-- but this is just 668 00:28:05,410 --> 00:28:10,770 an array of size 26, such that if we call the array table, table bracket 669 00:28:10,770 --> 00:28:12,440 zero is the rectangle at the top. 670 00:28:12,440 --> 00:28:15,090 Table bracket 25 is the rectangle at the bottom. 671 00:28:15,090 --> 00:28:18,620 And this is how I might draw a data structure in which I want to store 672 00:28:18,620 --> 00:28:19,790 people's names. 673 00:28:19,790 --> 00:28:24,370 >> So for instance, and I won't draw the whole thing here on the overhead, if I 674 00:28:24,370 --> 00:28:29,160 had this array, which I'm now going to call a hash table, and this is again 675 00:28:29,160 --> 00:28:31,360 location zero. 676 00:28:31,360 --> 00:28:34,840 This here is location one, and so forth. 677 00:28:34,840 --> 00:28:37,880 I claim that I want to use this data structure, for the sake of discussion, 678 00:28:37,880 --> 00:28:42,600 to store people's names, Alice and Bob and Charlie and other such names. 679 00:28:42,600 --> 00:28:46,110 So think of this now as the beginnings of, say, a dictionary 680 00:28:46,110 --> 00:28:47,520 with lots of words. 681 00:28:47,520 --> 00:28:49,435 They happen to be names in our example here. 682 00:28:49,435 --> 00:28:52,560 And this is all too germane, perhaps, to implementing a spell checker, as we 683 00:28:52,560 --> 00:28:54,400 might for problem set six. 684 00:28:54,400 --> 00:28:59,300 >> So if we have an array of total size 26 so that this is the 25th location 685 00:28:59,300 --> 00:29:03,390 at the bottom, and I claim that Alice is the first word in the dictionary of 686 00:29:03,390 --> 00:29:07,260 names that I want to insert into RAM, into this data structure, where are 687 00:29:07,260 --> 00:29:12,480 instincts telling you that Alice's name should go in this array? 688 00:29:12,480 --> 00:29:13,510 >> We have 26 options. 689 00:29:13,510 --> 00:29:14,990 Where we want to put her? 690 00:29:14,990 --> 00:29:16,200 We want her in bracket zero, right? 691 00:29:16,200 --> 00:29:18,280 A for Alice, let's call that zero. 692 00:29:18,280 --> 00:29:20,110 And B will be one, and C will be two. 693 00:29:20,110 --> 00:29:22,600 So we're going to write Alice's name up here. 694 00:29:22,600 --> 00:29:24,890 If we then insert Bob, his name will go here. 695 00:29:24,890 --> 00:29:27,280 Charlie will go here. 696 00:29:27,280 --> 00:29:30,500 And so forth down through this data structure. 697 00:29:30,500 --> 00:29:32,090 >> This is a wonderful data structure. 698 00:29:32,090 --> 00:29:32,730 Why? 699 00:29:32,730 --> 00:29:37,460 Well what is the running time of inserting a human's name into this 700 00:29:37,460 --> 00:29:39,850 data structure right now? 701 00:29:39,850 --> 00:29:43,702 Given that this table is implemented, truly, as an array. 702 00:29:43,702 --> 00:29:44,940 Well it's constant time. 703 00:29:44,940 --> 00:29:45,800 It's order of one. 704 00:29:45,800 --> 00:29:46,360 Why? 705 00:29:46,360 --> 00:29:48,630 >> Well how do you determine where Alice belongs? 706 00:29:48,630 --> 00:29:51,000 You look at which letter of her name? 707 00:29:51,000 --> 00:29:51,490 The first. 708 00:29:51,490 --> 00:29:54,350 And you can get there, if it's a string, by just looking at string 709 00:29:54,350 --> 00:29:55,200 bracket zero. 710 00:29:55,200 --> 00:29:57,110 So the zeroth character of the string. 711 00:29:57,110 --> 00:29:57,610 That's easy. 712 00:29:57,610 --> 00:30:00,350 We did that in the crypto assignment weeks ago. 713 00:30:00,350 --> 00:30:05,310 And then once you know that Alice's letter is capital A, we can subtract 714 00:30:05,310 --> 00:30:08,160 off 65 or capital A itself, that gives us zero. 715 00:30:08,160 --> 00:30:10,940 So we now know that Alice belongs at location zero. 716 00:30:10,940 --> 00:30:14,240 >> And given a pointer to this data structure, of some sort, how long does 717 00:30:14,240 --> 00:30:18,840 it take me to find location zero in an array? 718 00:30:18,840 --> 00:30:22,080 Just one step, right It's constant time because of the random access we 719 00:30:22,080 --> 00:30:23,780 proposed was a feature of an array. 720 00:30:23,780 --> 00:30:28,570 So in short, figuring out what the index of Alice's name is, which is, in 721 00:30:28,570 --> 00:30:32,610 this case, is A, or let's just resolve that to zero, where B is one and C is 722 00:30:32,610 --> 00:30:34,900 two, figuring that out is constant time. 723 00:30:34,900 --> 00:30:38,510 I just have to look at her first letter, figuring out where zero is an 724 00:30:38,510 --> 00:30:40,460 array is also constant time. 725 00:30:40,460 --> 00:30:42,140 So technically that's like two steps now. 726 00:30:42,140 --> 00:30:43,330 But that's still constant. 727 00:30:43,330 --> 00:30:46,880 So we call that big O of one, so we've inserted Alice into this table in 728 00:30:46,880 --> 00:30:48,440 constant time. 729 00:30:48,440 --> 00:30:50,960 >> But of course, I'm being naive here, right? 730 00:30:50,960 --> 00:30:53,240 What if there's an Aaron in the class? 731 00:30:53,240 --> 00:30:53,990 Or Alicia? 732 00:30:53,990 --> 00:30:57,230 Or any other names starting with A. Where are we going to put 733 00:30:57,230 --> 00:31:00,800 that person, right? 734 00:31:00,800 --> 00:31:03,420 I mean, right now there's only three people on the table, so maybe we 735 00:31:03,420 --> 00:31:07,490 should put Aaron at location zero one two three. 736 00:31:07,490 --> 00:31:09,480 >> Right, I could put A here. 737 00:31:09,480 --> 00:31:13,350 But then, if we try to insert David into this list, where does David go? 738 00:31:13,350 --> 00:31:15,170 Now our system starts breaking down, right? 739 00:31:15,170 --> 00:31:19,210 Because now David ends up here if Aaron is actually here. 740 00:31:19,210 --> 00:31:23,060 And so now this whole idea of having a clean data structure that gives us 741 00:31:23,060 --> 00:31:28,010 constant time insertions is no longer constant time, because I have to 742 00:31:28,010 --> 00:31:31,240 check, oh, damnit, someone's already at Alice's location. 743 00:31:31,240 --> 00:31:35,320 >> Let me probe the rest of this data structure, looking for a spot to put 744 00:31:35,320 --> 00:31:37,130 someone like Aaron's name. 745 00:31:37,130 --> 00:31:39,390 And so that too is starting to take linear time. 746 00:31:39,390 --> 00:31:42,710 Moreover , if you now want to find the Aaron in this data structure, and you 747 00:31:42,710 --> 00:31:45,430 check, and Aaron's name is not here. 748 00:31:45,430 --> 00:31:47,960 Ideally, you would just say Aaron's not in the data structure. 749 00:31:47,960 --> 00:31:51,530 But if you do start making room for Aaron where there should have been a D 750 00:31:51,530 --> 00:31:55,600 or an E, you, worst case, have to check the whole data structure, in 751 00:31:55,600 --> 00:31:59,480 which case it devolves into something linear in the size of the table. 752 00:31:59,480 --> 00:32:00,920 >> So all right, I'll fix this. 753 00:32:00,920 --> 00:32:04,200 The problem here is that I had 26 elements in this array. 754 00:32:04,200 --> 00:32:05,000 Let me change it. 755 00:32:05,000 --> 00:32:06,010 Whoops. 756 00:32:06,010 --> 00:32:10,600 Let me change it so that rather being of size 26 in total, notice the bottom 757 00:32:10,600 --> 00:32:12,720 index is going to change to n minus 1. 758 00:32:12,720 --> 00:32:16,610 If 26 is clearly too small for humans' names, because there's thousands of 759 00:32:16,610 --> 00:32:20,830 names of the world, let's just make in 100 or 1,000 or 10,000. 760 00:32:20,830 --> 00:32:22,960 Let's just allocate a lot more space. 761 00:32:22,960 --> 00:32:27,230 >> Well that doesn't necessarily decrease the probability that we won't have two 762 00:32:27,230 --> 00:32:31,510 people with names starting with A, and so, you were going to try to put A 763 00:32:31,510 --> 00:32:33,120 names at location zero still. 764 00:32:33,120 --> 00:32:36,850 They're still going to collide, which means we still need a solution to put 765 00:32:36,850 --> 00:32:41,020 Alice and Aaron and Alicia and other names starting with A elsewhere. 766 00:32:41,020 --> 00:32:43,460 But how much of a problem is this? 767 00:32:43,460 --> 00:32:46,870 What's the probability that you have collisions in a data 768 00:32:46,870 --> 00:32:48,240 structure like this? 769 00:32:48,240 --> 00:32:52,570 >> Well, let me-- we'll come back to that question here. 770 00:32:52,570 --> 00:32:55,530 And look at how we might solve it first. 771 00:32:55,530 --> 00:32:58,480 Let me pull up this proposal here. 772 00:32:58,480 --> 00:33:02,020 What we just described is an algorithm, a heuristic called linear 773 00:33:02,020 --> 00:33:05,030 probing whereby, if you tried to insert something here in this data 774 00:33:05,030 --> 00:33:08,920 structure, which is called a hash table, and there's no room there, you 775 00:33:08,920 --> 00:33:12,000 truly probe the data structure checking, is this available? 776 00:33:12,000 --> 00:33:13,430 Is this Available is this available? 777 00:33:13,430 --> 00:33:13,980 Is this available? 778 00:33:13,980 --> 00:33:17,550 And when it finally is, you insert the name that you originally intended 779 00:33:17,550 --> 00:33:19,370 elsewhere at that location. 780 00:33:19,370 --> 00:33:23,360 But in the worst case, the only spot might be the very bottom of the data 781 00:33:23,360 --> 00:33:25,090 structure, the very end of the array. 782 00:33:25,090 --> 00:33:30,130 >> So linear probing, in the worst case, devolves into a linear algorithm where 783 00:33:30,130 --> 00:33:34,500 Aaron, if he happens to be inserted last in this data structure, he might 784 00:33:34,500 --> 00:33:39,540 collide with this first location, but then end by bad luck at the very end. 785 00:33:39,540 --> 00:33:43,940 So this is not a constant time holy grail for us. 786 00:33:43,940 --> 00:33:47,650 This approach of inserting elements into a data structure called a hash 787 00:33:47,650 --> 00:33:52,050 table does not seem to be constant time at least not in the general case. 788 00:33:52,050 --> 00:33:54,000 It can devolve into something linear. 789 00:33:54,000 --> 00:33:56,970 >> So what if we resolve collisions somewhat differently? 790 00:33:56,970 --> 00:34:00,740 So here's a more sophisticated approach to what's still 791 00:34:00,740 --> 00:34:02,800 called a hash table. 792 00:34:02,800 --> 00:34:05,890 And by hash, as an aside, what I mean is the index that 793 00:34:05,890 --> 00:34:07,070 I referred to earlier. 794 00:34:07,070 --> 00:34:09,810 To hash something can be thought of as a verb. 795 00:34:09,810 --> 00:34:13,690 >> So if you hash Alice's a name, a hash function, so to speak, 796 00:34:13,690 --> 00:34:14,710 should return a number. 797 00:34:14,710 --> 00:34:18,199 In this case is zero if she belongs at location zero, one if she belongs at 798 00:34:18,199 --> 00:34:20,000 location one, and so forth. 799 00:34:20,000 --> 00:34:24,360 So my hash function thus far has been super simple, only looking at the 800 00:34:24,360 --> 00:34:26,159 first letter in someone's name. 801 00:34:26,159 --> 00:34:29,090 But a hash function takes as input some piece of data, a 802 00:34:29,090 --> 00:34:30,210 string, an int, whatever. 803 00:34:30,210 --> 00:34:32,239 And it spits out typically a number. 804 00:34:32,239 --> 00:34:35,739 And that number is where that data element belongs in a data structure 805 00:34:35,739 --> 00:34:37,800 known here as a hash table. 806 00:34:37,800 --> 00:34:41,400 >> So just intuitively, this is a slightly different context. 807 00:34:41,400 --> 00:34:44,170 This actually is referring to an example involving birthdays, where 808 00:34:44,170 --> 00:34:46,850 there might be as many as 31 days in the month. 809 00:34:46,850 --> 00:34:52,239 But what did this person decide to do in the event of a collision? 810 00:34:52,239 --> 00:34:55,304 Context now being, not a collision of names, but a collision of birthdays, 811 00:34:55,304 --> 00:35:00,760 if two people have the same birthday on the 2nd of October, for instance. 812 00:35:00,760 --> 00:35:02,120 >> STUDENT: [INAUDIBLE]. 813 00:35:02,120 --> 00:35:05,010 >> SPEAKER 1: Yeah, so here we have the leveraging of linked lists. 814 00:35:05,010 --> 00:35:07,830 So it looks a little differently than we drew it earlier. 815 00:35:07,830 --> 00:35:10,790 But we appear to have to an array on the left hand side. 816 00:35:10,790 --> 00:35:13,230 That's one index, for no particular reason. 817 00:35:13,230 --> 00:35:14,630 But it's still an array. 818 00:35:14,630 --> 00:35:16,160 It's an array of pointers. 819 00:35:16,160 --> 00:35:20,670 And each of those elements, each of these circles or slashes--the slash 820 00:35:20,670 --> 00:35:23,970 representing null-- each of these pointers is apparently pointing to 821 00:35:23,970 --> 00:35:25,730 what data structure? 822 00:35:25,730 --> 00:35:26,890 A linked list. 823 00:35:26,890 --> 00:35:30,530 >> So now we have the ability to hard code into our program 824 00:35:30,530 --> 00:35:32,010 the size of the table. 825 00:35:32,010 --> 00:35:35,360 In this case, we know there's never more than 31 days in a month. 826 00:35:35,360 --> 00:35:38,480 So hard coding a value like 31 is reasonable in that context. 827 00:35:38,480 --> 00:35:42,700 In the context of names, hard coding 26 is not unreasonable it people's 828 00:35:42,700 --> 00:35:46,340 names only start with, for instance, the alphabet involving A through Z. 829 00:35:46,340 --> 00:35:50,180 >> We can cram them all into that data structure so long as, when we get a 830 00:35:50,180 --> 00:35:55,330 collision, we don't put the names here, we instead think of these cells 831 00:35:55,330 --> 00:36:00,270 not as strings themselves, but as pointers to, for instance, Alice. 832 00:36:00,270 --> 00:36:03,660 And then Alice can have another pointer to another name starting with 833 00:36:03,660 --> 00:36:06,150 A. And Bob actually goes over here. 834 00:36:06,150 --> 00:36:10,850 >> And if there's another name starting with B, he ends up over here. 835 00:36:10,850 --> 00:36:15,070 And so each of the elements of this table two, if we designed this a 836 00:36:15,070 --> 00:36:17,350 little more cleverly-- 837 00:36:17,350 --> 00:36:18,125 come on-- 838 00:36:18,125 --> 00:36:22,950 if we designed this a little more cleverly, now becomes an adaptive data 839 00:36:22,950 --> 00:36:27,720 structure, where there's no hard limit on how many elements you can insert 840 00:36:27,720 --> 00:36:30,700 into it because if you do have a collision, that's fine. 841 00:36:30,700 --> 00:36:34,690 Just go ahead and append it to what we saw a bit ago was 842 00:36:34,690 --> 00:36:38,290 known as a linked list. 843 00:36:38,290 --> 00:36:39,690 >> Well let's pause for just a moment. 844 00:36:39,690 --> 00:36:42,570 What is the probability of a collision in the first place? 845 00:36:42,570 --> 00:36:45,480 Right, maybe I'm over thinking, maybe I'm over engineering this problem, 846 00:36:45,480 --> 00:36:46,370 because you know what? 847 00:36:46,370 --> 00:36:49,070 Yes, I can come up with arbitrary examples off the top of my head like 848 00:36:49,070 --> 00:36:52,870 Allison and Aaron, but in reality, given a uniform distribution of 849 00:36:52,870 --> 00:36:56,990 inputs, that is some random insertions into a data structure, what really is 850 00:36:56,990 --> 00:36:58,580 the probability of a collision? 851 00:36:58,580 --> 00:37:01,670 Well turns out, it's actually super high. 852 00:37:01,670 --> 00:37:03,850 Let me generalize this problem is as this. 853 00:37:03,850 --> 00:37:08,890 >> So in a room of n CS50 students, what's the probability that at least 854 00:37:08,890 --> 00:37:11,010 two students in the room have the same birthday? 855 00:37:11,010 --> 00:37:13,346 So there's what. a few hund-- 856 00:37:13,346 --> 00:37:16,790 200, 300 people here and several hundred people at home today. 857 00:37:16,790 --> 00:37:20,670 So if you wanted to ask ourselves what's the probability of two people 858 00:37:20,670 --> 00:37:23,930 in this room having the same birthday, we can figure this out. 859 00:37:23,930 --> 00:37:26,250 And I claim actually there are two people with the same birthday. 860 00:37:26,250 --> 00:37:29,560 >> For instance, does anyone have birthday today? 861 00:37:29,560 --> 00:37:31,340 Yesterday? 862 00:37:31,340 --> 00:37:32,590 Tomorrow? 863 00:37:32,590 --> 00:37:35,980 All right, so it feels like I'm going to have to do this 363 or so more 864 00:37:35,980 --> 00:37:39,500 times to actually figure out if we do have a collision. 865 00:37:39,500 --> 00:37:42,350 Or we could just do this mathematically rather than tediously 866 00:37:42,350 --> 00:37:43,200 doing this. 867 00:37:43,200 --> 00:37:44,500 And propose the following. 868 00:37:44,500 --> 00:37:48,740 >> So I propose that we could model the probability of two people having the 869 00:37:48,740 --> 00:37:55,320 same birthday as the probability of 1 minus the probability of no one having 870 00:37:55,320 --> 00:37:56,290 the same birthday. 871 00:37:56,290 --> 00:37:59,960 So to get this, and this is just the fancy way of writing this, for the 872 00:37:59,960 --> 00:38:03,090 first person in the room, he or she can have any one of the possible 873 00:38:03,090 --> 00:38:07,370 birthdays assuming 365 days in the year, with apologies to persons with 874 00:38:07,370 --> 00:38:08,760 the February 29th birthday. 875 00:38:08,760 --> 00:38:13,470 >> So the first person in this room is free to have any number of birthdays 876 00:38:13,470 --> 00:38:18,280 out of the 365 possibilities so that we'll do that 365 divided by 365, 877 00:38:18,280 --> 00:38:18,990 which is one. 878 00:38:18,990 --> 00:38:22,700 The next person in the room, if the goal is to avoid a collision, can only 879 00:38:22,700 --> 00:38:26,460 have his or her birthday on how many different possible days? 880 00:38:26,460 --> 00:38:27,610 364. 881 00:38:27,610 --> 00:38:31,430 So the second term in this expression is essentially doing that math for us 882 00:38:31,430 --> 00:38:33,460 by subtracting off one possible day. 883 00:38:33,460 --> 00:38:36,390 And then the next day, the next day, the next day down to the total number 884 00:38:36,390 --> 00:38:38,100 of people in the room. 885 00:38:38,100 --> 00:38:41,290 >> And if we then consider, what then is the probability not of everyone having 886 00:38:41,290 --> 00:38:45,265 unique birthdays, but again 1 minus that, what we get is an expression 887 00:38:45,265 --> 00:38:47,810 that can very fancifully look like this. 888 00:38:47,810 --> 00:38:50,330 But it's more interesting to look at visually. 889 00:38:50,330 --> 00:38:55,120 This is a chart where on the x-axis is the number of people in the room, the 890 00:38:55,120 --> 00:38:56,180 number of birthdays. 891 00:38:56,180 --> 00:38:59,840 On the y-axis is the probability of a collision, two people 892 00:38:59,840 --> 00:39:01,230 having the same birthday. 893 00:39:01,230 --> 00:39:05,020 >> And the takeaway from this curve is that as soon as you get to like 40 894 00:39:05,020 --> 00:39:11,110 students, you're up at a 90% probability combinatorically of two 895 00:39:11,110 --> 00:39:13,550 people or more having the same birthday. 896 00:39:13,550 --> 00:39:18,600 And once you get to like 58 people it's nearly 100% of a chance the two 897 00:39:18,600 --> 00:39:21,310 people in the room are going to have the same birthday, even though there's 898 00:39:21,310 --> 00:39:26,650 365 or 366 possible buckets, and only 58 people in the room. 899 00:39:26,650 --> 00:39:29,900 Just statistically you're likely to get collisions, which in short 900 00:39:29,900 --> 00:39:31,810 motivates this discussion. 901 00:39:31,810 --> 00:39:35,890 That even if we get fancy here, and start having these chains, we're still 902 00:39:35,890 --> 00:39:36,950 going to have collisions. 903 00:39:36,950 --> 00:39:42,710 >> So that begs the question, what is the cost of doing insertions and deletions 904 00:39:42,710 --> 00:39:44,850 into a data structure like this? 905 00:39:44,850 --> 00:39:46,630 Well let me propose-- 906 00:39:46,630 --> 00:39:51,570 and let me go back to the screen over here-- if we have n elements in the 907 00:39:51,570 --> 00:39:56,330 list, so if we're trying to insert n elements, and we have 908 00:39:56,330 --> 00:39:58,050 how many total buckets? 909 00:39:58,050 --> 00:40:03,450 Let's say 31 total buckets in the case of birthdays. 910 00:40:03,450 --> 00:40:09,240 What's the maximum length of one of these chains potentially? 911 00:40:09,240 --> 00:40:12,670 >> If again there's 31 possible birthdays in a given month. 912 00:40:12,670 --> 00:40:14,580 And we're just clumping everyone-- 913 00:40:14,580 --> 00:40:15,580 actually that's a stupid example. 914 00:40:15,580 --> 00:40:16,960 Let's do 26 instead. 915 00:40:16,960 --> 00:40:20,890 So if actually have people whose names start with A through Z, thereby giving 916 00:40:20,890 --> 00:40:22,780 us 26 possibilities. 917 00:40:22,780 --> 00:40:25,920 And we're using a data structure like the one we just saw, whereby we have 918 00:40:25,920 --> 00:40:30,210 an array of pointers, each of which points to a linked list where the 919 00:40:30,210 --> 00:40:32,360 first list is everyone with the name Alice. 920 00:40:32,360 --> 00:40:35,770 The second list is every with the name starting with A, starting 921 00:40:35,770 --> 00:40:36,980 with B, and so forth. 922 00:40:36,980 --> 00:40:41,020 >> What's the likely length of each of those lists if we assume a nice clean 923 00:40:41,020 --> 00:40:45,410 distribution of names A through Z across the whole data structure? 924 00:40:45,410 --> 00:40:50,210 There's n people in the data structure divided by 26, if they're nicely 925 00:40:50,210 --> 00:40:52,110 spread out over the whole data structure. 926 00:40:52,110 --> 00:40:54,970 So the length of each of these chains is n divided by 26. 927 00:40:54,970 --> 00:40:57,380 But in big O notation, what is that? 928 00:40:57,380 --> 00:41:00,100 929 00:41:00,100 --> 00:41:02,440 What is that really? 930 00:41:02,440 --> 00:41:04,150 So it's really just n, right? 931 00:41:04,150 --> 00:41:06,620 Because we've said in the past, that ugh you divide by 26. 932 00:41:06,620 --> 00:41:08,710 Yes, in reality it is faster. 933 00:41:08,710 --> 00:41:12,720 But in theory, it's not fundamentally all that faster. 934 00:41:12,720 --> 00:41:16,040 >> So we don't seem to be all that much closer to this holy grail. 935 00:41:16,040 --> 00:41:17,750 In fact, this is just linear time. 936 00:41:17,750 --> 00:41:20,790 Heck, at this point, why don't we just use one huge linked list? 937 00:41:20,790 --> 00:41:23,510 Why don't we just use one huge array to store the names of 938 00:41:23,510 --> 00:41:25,010 everyone in the room? 939 00:41:25,010 --> 00:41:28,280 Well, is there still something compelling about a hash table? 940 00:41:28,280 --> 00:41:30,810 Is there still something compelling about a data structure 941 00:41:30,810 --> 00:41:33,940 that looks like this? 942 00:41:33,940 --> 00:41:35,182 This. 943 00:41:35,182 --> 00:41:37,050 >> STUDENT: [INAUDIBLE]. 944 00:41:37,050 --> 00:41:39,840 >> SPEAKER 1: Right, and again if it's just a linear time algorithm, and a 945 00:41:39,840 --> 00:41:42,780 linear time data structure, why don't I just store everyone's name in a big 946 00:41:42,780 --> 00:41:44,210 array, or in a big linked list? 947 00:41:44,210 --> 00:41:47,010 And stop making CS so much harder than it needs to be? 948 00:41:47,010 --> 00:41:49,600 949 00:41:49,600 --> 00:41:53,190 What is compelling about this, even though I scratched it out? 950 00:41:53,190 --> 00:41:54,930 >> STUDENT: [INAUDIBLE]. 951 00:41:54,930 --> 00:41:57,040 >> SPEAKER 1: Insertions aren't? 952 00:41:57,040 --> 00:41:58,140 Expensive anymore. 953 00:41:58,140 --> 00:42:03,390 So insertions potentially could still be constant time, even if your data 954 00:42:03,390 --> 00:42:07,910 structure looks like this, an array of pointers, each of which is pointing at 955 00:42:07,910 --> 00:42:09,550 potentially a linked list. 956 00:42:09,550 --> 00:42:15,220 How could you achieve constant time insertion of names? 957 00:42:15,220 --> 00:42:16,280 Stick it in the front, right? 958 00:42:16,280 --> 00:42:19,290 >> If we sacrifice a design goal from earlier, where we wanted to keep 959 00:42:19,290 --> 00:42:22,650 everyone's name, for instance, sorted, or all of the numbers on stage sorted, 960 00:42:22,650 --> 00:42:25,020 suppose that we have an unsorted linked list. 961 00:42:25,020 --> 00:42:29,960 It only costs us one or two steps, like in the case of Ben and Brian 962 00:42:29,960 --> 00:42:32,750 earlier, to insert an element at the beginning of the list. 963 00:42:32,750 --> 00:42:36,090 So if we don't care about sorting all of the names starting with A or all 964 00:42:36,090 --> 00:42:39,660 the names starting with B, we can still achieve constant time insertion. 965 00:42:39,660 --> 00:42:43,900 Now looking up Alice or Bob or any name more generally is still what? 966 00:42:43,900 --> 00:42:48,100 It's big O of n divided by 26, in the ideal case where everyone's uniformly 967 00:42:48,100 --> 00:42:51,190 distributed, where there's as many A's as there are Z's, which is probably 968 00:42:51,190 --> 00:42:52,220 unrealistic. 969 00:42:52,220 --> 00:42:53,880 But that's still linear. 970 00:42:53,880 --> 00:42:57,120 >> But here, we come back to the point of asymptotic notation being 971 00:42:57,120 --> 00:42:58,600 theoretically true. 972 00:42:58,600 --> 00:43:02,960 But in the real world, if I claim that my program can do something 26 times 973 00:43:02,960 --> 00:43:06,210 faster than yours, whose program are you going to prefer using? 974 00:43:06,210 --> 00:43:09,660 Yours or mine, which is 26 times faster? 975 00:43:09,660 --> 00:43:14,320 Realistically, the person whose is 26 times faster, even if theoretically 976 00:43:14,320 --> 00:43:18,790 our algorithms run in the same asymptotic running time. 977 00:43:18,790 --> 00:43:20,940 >> Let me propose a different solution altogether. 978 00:43:20,940 --> 00:43:24,380 And if this doesn't blow your mind, we're out of data structures. 979 00:43:24,380 --> 00:43:27,420 So this is it a trie-- 980 00:43:27,420 --> 00:43:28,520 kind of a stupid name. 981 00:43:28,520 --> 00:43:32,880 It comes from retrievals, and the word is spelled trie, t-r-i-e, because of 982 00:43:32,880 --> 00:43:34,450 course retrieval sounds like trie. 983 00:43:34,450 --> 00:43:36,580 But that's the history of the word trie. 984 00:43:36,580 --> 00:43:40,980 >> So a trie is indeed some kind of tree, and it's also a play on that word. 985 00:43:40,980 --> 00:43:46,330 And even though you can't quite see it with this visualization, a trie is a 986 00:43:46,330 --> 00:43:50,790 tree structured, like a family tree with one ancestor at the top and lots 987 00:43:50,790 --> 00:43:54,530 of grandchildren and great grandchildren as leaves on the bottom. 988 00:43:54,530 --> 00:43:58,100 But each node in a trie is an array. 989 00:43:58,100 --> 00:44:00,680 And it's in an array-- and let's oversimplify for a moment-- it's an 990 00:44:00,680 --> 00:44:04,600 array, in this case, of size 26, where each node again is an array of size 991 00:44:04,600 --> 00:44:09,000 26, where the zeroth element in that array represents A, and the last 992 00:44:09,000 --> 00:44:11,810 element in each such array represents Z. 993 00:44:11,810 --> 00:44:15,520 >> So I propose, then, that this data structure, known as a trie, can be 994 00:44:15,520 --> 00:44:17,600 used also to store words. 995 00:44:17,600 --> 00:44:21,740 We saw a moment ago how we could store words, or in this case names, and we 996 00:44:21,740 --> 00:44:25,440 saw earlier how we can store numbers, but if we focus on names or strings 997 00:44:25,440 --> 00:44:27,460 here, notice what's interesting. 998 00:44:27,460 --> 00:44:32,210 I claim that the name Maxwell is inside of this data structure. 999 00:44:32,210 --> 00:44:33,730 Where do you see Maxwell? 1000 00:44:33,730 --> 00:44:35,140 >> STUDENT: [INAUDIBLE]. 1001 00:44:35,140 --> 00:44:36,240 >> SPEAKER 1: On the left. 1002 00:44:36,240 --> 00:44:39,910 So what's interesting with this data structure is rather than store the 1003 00:44:39,910 --> 00:44:46,200 string M-A-X-W-E-L-L backslash zero, all contiguously, what you instead do 1004 00:44:46,200 --> 00:44:46,890 is following. 1005 00:44:46,890 --> 00:44:50,510 If this is a trie like data structure, each of whose nodes is again an array, 1006 00:44:50,510 --> 00:44:54,650 and you want to store Maxwell, you first index and so the root's node, so 1007 00:44:54,650 --> 00:44:57,810 to speak, the topmost node, at location M, right, so 1008 00:44:57,810 --> 00:44:59,160 roughly into the middle. 1009 00:44:59,160 --> 00:45:03,740 And then from there, you follow a pointer to a child nodes, so to speak. 1010 00:45:03,740 --> 00:45:06,150 So in the family tree sense, you follow it downward. 1011 00:45:06,150 --> 00:45:09,030 And that lead you to another node on the left there, which is 1012 00:45:09,030 --> 00:45:10,540 just another array. 1013 00:45:10,540 --> 00:45:14,710 >> And then if you want to store Maxwell, you find the pointer that represents 1014 00:45:14,710 --> 00:45:16,430 A, which is this one here. 1015 00:45:16,430 --> 00:45:17,840 Then you go to the next node. 1016 00:45:17,840 --> 00:45:20,100 And notice-- this is why the picture's a little deceiving-- 1017 00:45:20,100 --> 00:45:21,990 this node look super tiny. 1018 00:45:21,990 --> 00:45:26,050 But to the right of this is Y and Z. It's just the author has truncated the 1019 00:45:26,050 --> 00:45:27,630 picture so that you actually see things. 1020 00:45:27,630 --> 00:45:30,400 Otherwise this picture would be hugely wide. 1021 00:45:30,400 --> 00:45:36,180 So now you index into location X, then W, Then E, then L, then L. Then what's 1022 00:45:36,180 --> 00:45:37,380 this curiosity? 1023 00:45:37,380 --> 00:45:41,250 >> Well, if we're using this sort of new take on how to store a string in a 1024 00:45:41,250 --> 00:45:44,500 data structure, you still need to essentially check off in the data 1025 00:45:44,500 --> 00:45:47,250 structure that a word ends here. 1026 00:45:47,250 --> 00:45:50,830 In other words, each of these nodes somehow has to remember that we 1027 00:45:50,830 --> 00:45:53,500 actually followed all of these pointers and are leaving a little 1028 00:45:53,500 --> 00:45:58,370 bread crumb at the bottom here of this structure to indicate M-A-X-W-E-L-L is 1029 00:45:58,370 --> 00:46:00,230 indeed in this data structure. 1030 00:46:00,230 --> 00:46:02,040 >> So we might do this as follows. 1031 00:46:02,040 --> 00:46:06,810 Each of the nodes in the picture we just saw has one, an array of size 27. 1032 00:46:06,810 --> 00:46:10,550 And it's now 27, because in p set six, we'll actually give you an apostrophe, 1033 00:46:10,550 --> 00:46:13,590 so we can have names like O'Reilly and others with apostrophes. 1034 00:46:13,590 --> 00:46:14,820 But same idea. 1035 00:46:14,820 --> 00:46:17,710 Each of those elements in the array points to a struct 1036 00:46:17,710 --> 00:46:19,320 node, so just a node. 1037 00:46:19,320 --> 00:46:21,430 So this is very reminiscent of our linked list. 1038 00:46:21,430 --> 00:46:24,550 >> And then I have a Boolean, which I'll call word, which is just going to be 1039 00:46:24,550 --> 00:46:29,120 true if a word ends at this node in the tree. 1040 00:46:29,120 --> 00:46:32,870 It effectively represents the little triangle we saw a moment ago. 1041 00:46:32,870 --> 00:46:37,190 So if a word ends at that node in the tree, that word field will be true, 1042 00:46:37,190 --> 00:46:41,990 which is conceptually checking off, or we're drawing this triangle, yes there 1043 00:46:41,990 --> 00:46:44,080 is a word here. 1044 00:46:44,080 --> 00:46:45,120 >> So this is a trie. 1045 00:46:45,120 --> 00:46:48,540 And now the question is, what is its running time? 1046 00:46:48,540 --> 00:46:49,930 Is it big O of n? 1047 00:46:49,930 --> 00:46:51,410 Is it something else? 1048 00:46:51,410 --> 00:46:57,330 Well, if you have n names in this data structure, Maxwell being just one of 1049 00:46:57,330 --> 00:47:02,330 them, what is the running time of inserting or finding Maxwell? 1050 00:47:02,330 --> 00:47:06,230 1051 00:47:06,230 --> 00:47:09,050 What's the running time of inserting Maxwell? 1052 00:47:09,050 --> 00:47:11,740 If there's n other names already in the table? 1053 00:47:11,740 --> 00:47:12,507 Yeah? 1054 00:47:12,507 --> 00:47:15,429 >> STUDENT: [INAUDIBLE]. 1055 00:47:15,429 --> 00:47:17,550 >> SPEAKER 1: Yeah, it's the length of the name, right? 1056 00:47:17,550 --> 00:47:24,420 So M-a-x-w-e-l-l so it feels like this algorithm is big O of seven. 1057 00:47:24,420 --> 00:47:26,580 Now, of course, the name will vary in length. 1058 00:47:26,580 --> 00:47:27,380 Maybe it's a short name. 1059 00:47:27,380 --> 00:47:28,600 Maybe it's a longer name. 1060 00:47:28,600 --> 00:47:33,390 But what's key here is that it's a constant number. 1061 00:47:33,390 --> 00:47:36,810 And maybe it's not really constant, but god, if realistically, in a 1062 00:47:36,810 --> 00:47:41,570 dictionary, there's probably some limit on the number of letters in a 1063 00:47:41,570 --> 00:47:43,820 person's name in a particular country. 1064 00:47:43,820 --> 00:47:46,940 >> And so we can assume that value is a constant. 1065 00:47:46,940 --> 00:47:47,750 I don't know what it is. 1066 00:47:47,750 --> 00:47:50,440 It's probably larger than we think it is. 1067 00:47:50,440 --> 00:47:52,720 Because there's always some corner case with a crazy long name. 1068 00:47:52,720 --> 00:47:56,360 So let's call it k, but it's still a constant presumably, because every 1069 00:47:56,360 --> 00:48:00,190 name in the world, at least in a particular country, is that length or 1070 00:48:00,190 --> 00:48:01,780 shorter, so it's constant. 1071 00:48:01,780 --> 00:48:04,490 But when we've said something is big O of a constant value, what's that 1072 00:48:04,490 --> 00:48:07,760 really equivalent to? 1073 00:48:07,760 --> 00:48:10,420 That's really the same thing as saying constant time. 1074 00:48:10,420 --> 00:48:11,530 >> Now we're kind of cheating, right? 1075 00:48:11,530 --> 00:48:15,340 We're kind of leveraging some theory here to say that well, order of k is 1076 00:48:15,340 --> 00:48:17,450 really just order of one, and it's constant time. 1077 00:48:17,450 --> 00:48:18,200 But it really is. 1078 00:48:18,200 --> 00:48:22,550 Because the key insight here is that if we have n names already in this 1079 00:48:22,550 --> 00:48:26,010 data structure, and we insert Maxwell, is the amount of time it takes us to 1080 00:48:26,010 --> 00:48:29,530 insert Maxwell at all affected by how many other people 1081 00:48:29,530 --> 00:48:31,100 are in the data structure? 1082 00:48:31,100 --> 00:48:31,670 Doesn't seem to be. 1083 00:48:31,670 --> 00:48:36,280 If I had a billion more elements to this trie, and then insert Maxwell, is 1084 00:48:36,280 --> 00:48:38,650 he at all affected? 1085 00:48:38,650 --> 00:48:39,050 No. 1086 00:48:39,050 --> 00:48:42,950 And that's unlike any of the day data structures we've seen thus far, where 1087 00:48:42,950 --> 00:48:46,820 the running time of your algorithm is completely independent of how much 1088 00:48:46,820 --> 00:48:51,430 stuff is or isn't already in that data structure. 1089 00:48:51,430 --> 00:48:54,650 >> And so with this affords you now is an opportunity for p set six, which will 1090 00:48:54,650 --> 00:48:58,310 again involve implementing your own spell checker, reading in 150,000 1091 00:48:58,310 --> 00:49:01,050 words, how best to store that isn't necessarily obvious. 1092 00:49:01,050 --> 00:49:04,030 And though I've aspired to find the holy grail, I don't 1093 00:49:04,030 --> 00:49:05,330 claim that a trie is. 1094 00:49:05,330 --> 00:49:09,810 In fact, a hash table may very well prove to be much more efficient. 1095 00:49:09,810 --> 00:49:10,830 But those are just-- 1096 00:49:10,830 --> 00:49:14,620 that's just one of the design decisions you will have to make. 1097 00:49:14,620 --> 00:49:18,920 >> But in closing let's take 50 or so seconds to take a peek at what lies 1098 00:49:18,920 --> 00:49:22,190 ahead next week and beyond we transition from this command line 1099 00:49:22,190 --> 00:49:26,220 world if C programs to things web based and languages like PHP and 1100 00:49:26,220 --> 00:49:30,350 JavaScript and the internet itself, protocols like HTTP, which you've 1101 00:49:30,350 --> 00:49:32,870 taken for granted for years now, and typed most every 1102 00:49:32,870 --> 00:49:34,440 day, perhaps, or seen. 1103 00:49:34,440 --> 00:49:37,420 And we'll start to peel back the layers of what is the internet. 1104 00:49:37,420 --> 00:49:40,650 And what is the code that underlies today's tools. 1105 00:49:40,650 --> 00:49:43,230 So 50 seconds of this teaser here. 1106 00:49:43,230 --> 00:49:46,570 I give you Warriors of the Net. 1107 00:49:46,570 --> 00:49:51,370 >> [VIDEO PLAYBACK] 1108 00:49:51,370 --> 00:49:56,764 >> -He came with a message. 1109 00:49:56,764 --> 00:50:00,687 With a protocol all his own. 1110 00:50:00,687 --> 00:50:13,370 1111 00:50:13,370 --> 00:50:19,780 He came to a world of cruel firewalls, uncaring routers, and dangers far 1112 00:50:19,780 --> 00:50:22,600 worse than death. 1113 00:50:22,600 --> 00:50:23,590 He's fast. 1114 00:50:23,590 --> 00:50:25,300 He's strong. 1115 00:50:25,300 --> 00:50:27,700 He's TCPIP. 1116 00:50:27,700 --> 00:50:30,420 And he's got your address. 1117 00:50:30,420 --> 00:50:32,920 1118 00:50:32,920 --> 00:50:34,590 Warriors of the Net. 1119 00:50:34,590 --> 00:50:35,290 >> [END VIDEO PLAYBACK] 1120 00:50:35,290 --> 00:50:38,070 >> SPEAKER 1: That is how the internet shall work as of next week. 1121 00:50:38,070 --> 00:50:40,406