1 00:00:00,000 --> 00:00:02,994 >> [MUSIC PLAYING] 2 00:00:02,994 --> 00:00:05,426 3 00:00:05,426 --> 00:00:08,550 DOUG LLOYD: So we've been inching closer and closer that holy grail of data 4 00:00:08,550 --> 00:00:13,050 structures, one that we can insert into, delete from, and look up 5 00:00:13,050 --> 00:00:15,440 in constant time. 6 00:00:15,440 --> 00:00:16,270 Right. 7 00:00:16,270 --> 00:00:17,280 That's sort of the goal. 8 00:00:17,280 --> 00:00:19,720 We want to be able to do things very, very quickly. 9 00:00:19,720 --> 00:00:22,580 >> Have we found it here when we're talking about tries? 10 00:00:22,580 --> 00:00:23,670 Well, let's take a look. 11 00:00:23,670 --> 00:00:25,628 So we've seen several different data structures 12 00:00:25,628 --> 00:00:28,680 that handle the mapping of so-called key-value pairs, 13 00:00:28,680 --> 00:00:32,080 mapping some piece of data to some other piece of data 14 00:00:32,080 --> 00:00:36,020 so we know where to find information in the structure. 15 00:00:36,020 --> 00:00:40,060 >> So for array, for example, the key is the element index or array 16 00:00:40,060 --> 00:00:42,600 location 0 or array 1 and so on. 17 00:00:42,600 --> 00:00:46,140 And the value is the data that exists at that location. 18 00:00:46,140 --> 00:00:48,550 So what is stored in array 0? 19 00:00:48,550 --> 00:00:54,290 What is stored in array 1 versus just 0 and 1, which would be the keys. 20 00:00:54,290 --> 00:00:56,360 >> With a hash table it's sort of the same idea. 21 00:00:56,360 --> 00:01:00,690 With a hash table, we have this hash function that generates hash codes. 22 00:01:00,690 --> 00:01:03,670 So the key is the hash code of the data. 23 00:01:03,670 --> 00:01:06,530 And the value, particularly we talked about chaining 24 00:01:06,530 --> 00:01:10,590 in the video on hash tables, is that linked list of data 25 00:01:10,590 --> 00:01:12,550 that hashes to that hashcode. 26 00:01:12,550 --> 00:01:14,050 Right. 27 00:01:14,050 --> 00:01:16,050 What about another approach this method, though? 28 00:01:16,050 --> 00:01:21,000 What about a method where the key is guaranteed to be unique, 29 00:01:21,000 --> 00:01:25,410 unlike a hash table, where we could end up with two pieces of data 30 00:01:25,410 --> 00:01:27,200 having the same hashcode. 31 00:01:27,200 --> 00:01:30,020 And then we have to deal with that by either probing or more 32 00:01:30,020 --> 00:01:33,340 preferably chaining to fix that problem. 33 00:01:33,340 --> 00:01:37,520 >> So now we can guarantee that our key will be unique. 34 00:01:37,520 --> 00:01:39,690 And what if our value was just something as easy 35 00:01:39,690 --> 00:01:44,080 as true and false that tells us whether or not that piece of information 36 00:01:44,080 --> 00:01:45,610 exists in the structure? 37 00:01:45,610 --> 00:01:48,180 A Boolean could be as simple as a bit. 38 00:01:48,180 --> 00:01:52,660 Realistically it's probably a byte more likely than a bit. 39 00:01:52,660 --> 00:01:55,410 But that's a lot smaller than storing maybe a 50-character string, 40 00:01:55,410 --> 00:01:57,360 for example. 41 00:01:57,360 --> 00:02:02,210 >> So tries, similar to hash tables, which combine arrays and linked list, 42 00:02:02,210 --> 00:02:05,790 tries combine arrays, structures, and pointers 43 00:02:05,790 --> 00:02:08,509 together to store data in an interesting way that's 44 00:02:08,509 --> 00:02:11,550 pretty different from anything we've seen so far. 45 00:02:11,550 --> 00:02:16,750 Now we use the data as a roadmap to navigate this data structure. 46 00:02:16,750 --> 00:02:18,710 And if we can follow the roadmap, if we can 47 00:02:18,710 --> 00:02:22,390 follow the data from beginning to end, we'll 48 00:02:22,390 --> 00:02:24,945 know whether that data exist in the trie. 49 00:02:24,945 --> 00:02:28,310 >> And if we can't follow the map from meaning to end at all, 50 00:02:28,310 --> 00:02:30,600 the data can't exist. 51 00:02:30,600 --> 00:02:32,890 Again, the keys here are guaranteed to be unique. 52 00:02:32,890 --> 00:02:36,020 And so unlike a hash table, we'll never have to deal with collisions here. 53 00:02:36,020 --> 00:02:39,090 And no two pieces of data have exactly the same roadmap 54 00:02:39,090 --> 00:02:40,530 unless that data is identical. 55 00:02:40,530 --> 00:02:44,580 >> If we insert John, then we search for John. 56 00:02:44,580 --> 00:02:47,430 That's two identical pieces of data, right, we're looking through. 57 00:02:47,430 --> 00:02:49,880 But otherwise, any two pieces of data are 58 00:02:49,880 --> 00:02:52,750 guaranteed to have unique roadmaps through this data structure. 59 00:02:52,750 --> 00:02:56,210 And we're going to take a look at a visual of this in just a moment. 60 00:02:56,210 --> 00:02:58,810 >> We'll do this by trying to create a new data structure, 61 00:02:58,810 --> 00:03:00,564 mapping the following key value pairs. 62 00:03:00,564 --> 00:03:03,480 In this case, we're not going to use something as simple as a Boolean. 63 00:03:03,480 --> 00:03:06,200 We actually will store the string. 64 00:03:06,200 --> 00:03:08,690 And that string is going to be the name of a university. 65 00:03:08,690 --> 00:03:12,140 >> And the key is going to be the year when that university was founded. 66 00:03:12,140 --> 00:03:15,380 All years for universities are going to be four digits. 67 00:03:15,380 --> 00:03:19,840 And so we'll use those four digits to navigate through this data structure. 68 00:03:19,840 --> 00:03:22,270 And we'll see, again, how we do that in just a second. 69 00:03:22,270 --> 00:03:25,110 >> At the end of the path, we'll see the name 70 00:03:25,110 --> 00:03:30,250 of the university that corresponds to that key, those four digits. 71 00:03:30,250 --> 00:03:34,390 The basic idea behind a trie is we have a central route. 72 00:03:34,390 --> 00:03:35,640 So think about it like a tree. 73 00:03:35,640 --> 00:03:39,211 And this is similar in spelling and in concept to a tree. 74 00:03:39,211 --> 00:03:41,460 Generally when we think about trees in the real world, 75 00:03:41,460 --> 00:03:44,090 they have a root that's in the ground and they grow upward 76 00:03:44,090 --> 00:03:46,830 and they have branches and they have leaves. 77 00:03:46,830 --> 00:03:49,450 And basically the idea of a trie is exactly the same, 78 00:03:49,450 --> 00:03:51,755 except that root is anchored somewhere in the sky. 79 00:03:51,755 --> 00:03:53,130 And the leaves are at the bottom. 80 00:03:53,130 --> 00:03:55,750 >> So it's kind of like taking a tree and just flipping it upside down. 81 00:03:55,750 --> 00:03:56,880 But there are still branches. 82 00:03:56,880 --> 00:03:59,463 And those would be our pathways, those will be our connections 83 00:03:59,463 --> 00:04:02,220 from the root to the leaves. 84 00:04:02,220 --> 00:04:04,200 In this case, those paths, those branches 85 00:04:04,200 --> 00:04:08,490 are labeled with digits that tell us which way to go from where we are. 86 00:04:08,490 --> 00:04:11,800 >> If we see a 0, we go down this branch, if we see a 1, we go down this branch, 87 00:04:11,800 --> 00:04:12,900 and so and so on. 88 00:04:12,900 --> 00:04:14,060 Well, what does this mean? 89 00:04:14,060 --> 00:04:16,519 Well, that means that at every junction point 90 00:04:16,519 --> 00:04:19,260 and every node in the middle and every branch, 91 00:04:19,260 --> 00:04:23,020 there are 10 possible places that we can go. 92 00:04:23,020 --> 00:04:27,690 So there are 10 pointers from every location. 93 00:04:27,690 --> 00:04:30,610 >> And this is where tries can get a little bit intimidating for somebody 94 00:04:30,610 --> 00:04:34,460 who's doesn't have a lot of experience in computer science before. 95 00:04:34,460 --> 00:04:35,960 But tries are really pretty awesome. 96 00:04:35,960 --> 00:04:37,793 And if you have the chance to work with them 97 00:04:37,793 --> 00:04:40,420 and you're willing to dig-in and experiment with them, 98 00:04:40,420 --> 00:04:44,234 they're really quite interesting data structures to work with. 99 00:04:44,234 --> 00:04:46,900 If we want to insert an element into the trie, all we need to do 100 00:04:46,900 --> 00:04:51,360 is build the correct path from the root to the leaf. 101 00:04:51,360 --> 00:04:55,390 Here's what every step along the way might look like. 102 00:04:55,390 --> 00:04:59,660 We're going to define a new data structure for a new node called a trie. 103 00:04:59,660 --> 00:05:02,560 >> And inside of that data structure there are two pieces. 104 00:05:02,560 --> 00:05:05,460 We're going to store the name of a university. 105 00:05:05,460 --> 00:05:09,410 And we're going to store an array of pointers 106 00:05:09,410 --> 00:05:12,190 to other nodes of the same type. 107 00:05:12,190 --> 00:05:14,780 So, again, this is that sort of concept of everywhere 108 00:05:14,780 --> 00:05:18,567 we are, we at 10 possible places we can go. 109 00:05:18,567 --> 00:05:20,150 If we see a 0, we go down this branch. 110 00:05:20,150 --> 00:05:22,690 If we see a 1, this branch, and so on and so on and so on. 111 00:05:22,690 --> 00:05:25,160 If we say 9, we go down this branch. 112 00:05:25,160 --> 00:05:28,220 So at every junction point, we can go 10 possible places. 113 00:05:28,220 --> 00:05:35,740 So every node has to contain 10 pointers to other nodes, to 10 other nodes. 114 00:05:35,740 --> 00:05:39,810 >> And the data we're storing is just the name of the university. 115 00:05:39,810 --> 00:05:41,060 So let's build a trie. 116 00:05:41,060 --> 00:05:44,860 Let's insert a couple of items into our trie. 117 00:05:44,860 --> 00:05:46,740 So at the very top, this is our root node. 118 00:05:46,740 --> 00:05:49,740 This is probably going to be something you're going to globally declare. 119 00:05:49,740 --> 00:05:53,450 And you're going to globally maintain a pointer to this node always. 120 00:05:53,450 --> 00:05:55,360 >> You're going to say, root equals, and you're 121 00:05:55,360 --> 00:05:57,580 going to malloc yourself trie node. 122 00:05:57,580 --> 00:05:59,850 And you're never going to touch root again. 123 00:05:59,850 --> 00:06:02,300 Every time you want to start navigating through, 124 00:06:02,300 --> 00:06:05,802 you set another pointer equal to root, such as trav, 125 00:06:05,802 --> 00:06:07,760 which is the example I use in many of my videos 126 00:06:07,760 --> 00:06:11,090 here on stacks and queues and link lists and so on. 127 00:06:11,090 --> 00:06:13,320 >> You set another pointer called trav for traversal. 128 00:06:13,320 --> 00:06:15,890 And you use trav to navigate through the data structure. 129 00:06:15,890 --> 00:06:17,500 So let's see how this might look. 130 00:06:17,500 --> 00:06:19,880 So right now, what does a node look like? 131 00:06:19,880 --> 00:06:22,920 Well, just as our data structure declaration indicated, 132 00:06:22,920 --> 00:06:26,906 we have a string, which in this case is empty. 133 00:06:26,906 --> 00:06:27,780 There's nothing here. 134 00:06:27,780 --> 00:06:29,550 >> And an array of 10 pointers. 135 00:06:29,550 --> 00:06:31,790 And right now, we only have 1 node in this trie. 136 00:06:31,790 --> 00:06:33,110 There's nothing else in it. 137 00:06:33,110 --> 00:06:36,020 So all 10 of those pointers point to null. 138 00:06:36,020 --> 00:06:38,090 That's what the red indicates. 139 00:06:38,090 --> 00:06:39,500 >> Let's insert the string Harvard. 140 00:06:39,500 --> 00:06:41,999 Let's insert the University of Harvard into this trie, which 141 00:06:41,999 --> 00:06:43,940 was founded in the year 1636. 142 00:06:43,940 --> 00:06:48,220 We want to use the key, 1636, to tell us where we're 143 00:06:48,220 --> 00:06:50,140 going to store Harvard in the trie. 144 00:06:50,140 --> 00:06:51,470 Now, how might we do that? 145 00:06:51,470 --> 00:06:52,886 >> It might look something like this. 146 00:06:52,886 --> 00:06:54,160 We start at the root. 147 00:06:54,160 --> 00:06:56,920 And we have these 10 places we can go. 148 00:06:56,920 --> 00:06:59,900 The root is just like any other node of the trie. 149 00:06:59,900 --> 00:07:02,850 There are 10 places we can go from here. 150 00:07:02,850 --> 00:07:07,215 >> Where do we probably want to go if the key is 1636? 151 00:07:07,215 --> 00:07:08,340 There's really two options. 152 00:07:08,340 --> 00:07:08,450 Right. 153 00:07:08,450 --> 00:07:10,825 We can build the key from right to left and start with 6. 154 00:07:10,825 --> 00:07:14,000 Or we could build the key from left to right and start with 1. 155 00:07:14,000 --> 00:07:16,140 >> It's probably more intuitive as a human being 156 00:07:16,140 --> 00:07:18,110 to understand we'll just go left to right. 157 00:07:18,110 --> 00:07:21,140 And so if I want to insert Harvard into this trie, 158 00:07:21,140 --> 00:07:23,560 I probably want to start by starting at the root, 159 00:07:23,560 --> 00:07:25,720 looking at my 10 options in front of me, and saying 160 00:07:25,720 --> 00:07:28,700 I want to go down the 1 path. 161 00:07:28,700 --> 00:07:29,700 OK. 162 00:07:29,700 --> 00:07:31,810 >> Now, 1 path is currently null. 163 00:07:31,810 --> 00:07:35,920 So if I want to proceed down that path to insert this element into the trie, 164 00:07:35,920 --> 00:07:42,040 I have to malloc a new node, have 1 point there, and then I'm good to go. 165 00:07:42,040 --> 00:07:46,460 >> So I basically am at a point where I'm standing 166 00:07:46,460 --> 00:07:50,270 at the root of the tree or the trie and there are 10 branches. 167 00:07:50,270 --> 00:07:52,260 But each branch has a gate in front of it. 168 00:07:52,260 --> 00:07:53,060 Right. 169 00:07:53,060 --> 00:07:54,850 Because there's nothing else there's. 170 00:07:54,850 --> 00:07:56,522 No safe passage. 171 00:07:56,522 --> 00:07:58,980 That means that there's nothing down any of those branches. 172 00:07:58,980 --> 00:08:02,532 If I want to start building something, I want to remove the gate. 173 00:08:02,532 --> 00:08:04,490 I want to remove the gate in front of number 1. 174 00:08:04,490 --> 00:08:05,698 And I want to walk down that. 175 00:08:05,698 --> 00:08:08,060 And I want to build another place for me to go. 176 00:08:08,060 --> 00:08:09,470 >> And that's what I've done here. 177 00:08:09,470 --> 00:08:11,430 So 1 no longer points to null. 178 00:08:11,430 --> 00:08:13,830 I've said it's safe to go down here now. 179 00:08:13,830 --> 00:08:15,789 I built another node. 180 00:08:15,789 --> 00:08:18,330 And when I get to that node, I have another decision to make. 181 00:08:18,330 --> 00:08:20,890 Where am I going to go from here? 182 00:08:20,890 --> 00:08:22,700 >> Well, I've already gone down the 1. 183 00:08:22,700 --> 00:08:24,470 So now I probably want to go down the 6. 184 00:08:24,470 --> 00:08:24,970 Right. 185 00:08:24,970 --> 00:08:27,100 Again, I have 10 locations I can choose. 186 00:08:27,100 --> 00:08:30,060 So let's now go down number 6. 187 00:08:30,060 --> 00:08:32,280 So I clear the gate in front of number 6. 188 00:08:32,280 --> 00:08:33,250 And I walk down there. 189 00:08:33,250 --> 00:08:34,580 And I build another node. 190 00:08:34,580 --> 00:08:37,630 And I've reached another junction point. 191 00:08:37,630 --> 00:08:40,289 >> Again, I have 10 choices for where I can go. 192 00:08:40,289 --> 00:08:42,799 I've moved from 1 to 6. 193 00:08:42,799 --> 00:08:44,215 So now I probably want to go to 3. 194 00:08:44,215 --> 00:08:45,381 3, there's nowhere I can go. 195 00:08:45,381 --> 00:08:48,980 So I have to clear the way and build myself a new space. 196 00:08:48,980 --> 00:08:50,870 And then from 3, where do I want to go? 197 00:08:50,870 --> 00:08:52,450 I want to go down 6. 198 00:08:52,450 --> 00:08:54,770 >> And, again, I had to clear the way to do it. 199 00:08:54,770 --> 00:08:59,179 So now I've used my key to insert create nodes and start to build this trie. 200 00:08:59,179 --> 00:09:00,220 I've started at the root. 201 00:09:00,220 --> 00:09:03,666 I've gone down 1636. 202 00:09:03,666 --> 00:09:05,540 And now I'm at the bottom there on that node. 203 00:09:05,540 --> 00:09:06,610 And you might be able to see it on your screen. 204 00:09:06,610 --> 00:09:07,735 >> It's highlighted in yellow. 205 00:09:07,735 --> 00:09:10,020 That's where I currently am. 206 00:09:10,020 --> 00:09:11,300 My key is done. 207 00:09:11,300 --> 00:09:13,030 I've exhausted every position in my key. 208 00:09:13,030 --> 00:09:15,040 So I can't go any further. 209 00:09:15,040 --> 00:09:17,720 So at this point, all I really need to do is say, OK. 210 00:09:17,720 --> 00:09:18,990 It's kind of like looking down at the ground, 211 00:09:18,990 --> 00:09:21,115 if you're envisioning yourself as this sort of path 212 00:09:21,115 --> 00:09:22,350 with different connections. 213 00:09:22,350 --> 00:09:25,800 Sort of looking down and sort of spray painting Harvard on the ground. 214 00:09:25,800 --> 00:09:26,800 That's the name of this. 215 00:09:26,800 --> 00:09:28,300 Know that's what's at this location. 216 00:09:28,300 --> 00:09:31,870 If we start at the root and we go down 1 and then 6 and then 3 and then 6, 217 00:09:31,870 --> 00:09:32,780 where are we? 218 00:09:32,780 --> 00:09:35,640 Well if we look down and we see Harvard, then 219 00:09:35,640 --> 00:09:38,960 we know that Harvard was founded in 1636 based on the way 220 00:09:38,960 --> 00:09:41,400 we're implementing this data structure. 221 00:09:41,400 --> 00:09:43,177 So that was hopefully straightforward. 222 00:09:43,177 --> 00:09:44,760 We're going to do two more insertions. 223 00:09:44,760 --> 00:09:50,060 And hopefully it'll help to see this done twice more. 224 00:09:50,060 --> 00:09:52,210 >> Now, let's insert another university. 225 00:09:52,210 --> 00:09:54,630 Let's insert Yale into this trie. 226 00:09:54,630 --> 00:09:57,037 Yale was founded in 1701. 227 00:09:57,037 --> 00:09:58,870 So we'll start at the root, as we always do. 228 00:09:58,870 --> 00:09:59,890 And we set a traversal pointer. 229 00:09:59,890 --> 00:10:01,624 We're going to use that to move through. 230 00:10:01,624 --> 00:10:03,790 The first thing we want to do is go down the 1 path. 231 00:10:03,790 --> 00:10:05,830 That's the first digit of our key. 232 00:10:05,830 --> 00:10:08,420 Fortunately, though, we don't have to do any work this time. 233 00:10:08,420 --> 00:10:09,919 The 1 path has already been cleared. 234 00:10:09,919 --> 00:10:13,520 I cleared it previously when I was inserting Harvard at 1636. 235 00:10:13,520 --> 00:10:18,090 So I can safely move down 1 and just go there. 236 00:10:18,090 --> 00:10:20,150 If can move down the 1. 237 00:10:20,150 --> 00:10:22,930 >> Now, though, I want to go to 7. 238 00:10:22,930 --> 00:10:24,280 I cleared the way at 6. 239 00:10:24,280 --> 00:10:27,050 I know I can safely proceed down the 6 path. 240 00:10:27,050 --> 00:10:29,220 But I need to proceed on the 7 path. 241 00:10:29,220 --> 00:10:30,580 So what do I need to do? 242 00:10:30,580 --> 00:10:35,070 Well, just like before, I just need to clear the gate, get out of the way, 243 00:10:35,070 --> 00:10:38,740 and build a new node from the 7 path. 244 00:10:38,740 --> 00:10:40,250 Just like this. 245 00:10:40,250 --> 00:10:42,930 >> So now I've moved 1 and then 7. 246 00:10:42,930 --> 00:10:45,550 And now notice, I'm sort of on this new subbranch. 247 00:10:45,550 --> 00:10:46,050 Right. 248 00:10:46,050 --> 00:10:49,260 Everything else from 16 on, I don't care about. 249 00:10:49,260 --> 00:10:50,720 I'm not doing 16 anything. 250 00:10:50,720 --> 00:10:51,750 I'm doing 17 stuff. 251 00:10:51,750 --> 00:10:58,380 >> So now from 17 on, I have to sort of blaze new trails here. 252 00:10:58,380 --> 00:11:00,462 The next digit my key is 0. 253 00:11:00,462 --> 00:11:01,670 I clearly can't get anywhere. 254 00:11:01,670 --> 00:11:02,628 I just built this node. 255 00:11:02,628 --> 00:11:04,550 So I know there's no paths forward from here. 256 00:11:04,550 --> 00:11:06,370 So I have to make one myself. 257 00:11:06,370 --> 00:11:09,360 >> So I malloc a new node and have 0 point there. 258 00:11:09,360 --> 00:11:12,770 And then one more time, I malloc a new node and have one point there. 259 00:11:12,770 --> 00:11:15,870 Again, I've exhausted my key, 1701. 260 00:11:15,870 --> 00:11:18,472 So I look down and I spray paint Yale. 261 00:11:18,472 --> 00:11:19,680 That's the name of this node. 262 00:11:19,680 --> 00:11:24,660 >> And so now if I ever need to see if Yale is in this trie, I start at the root, 263 00:11:24,660 --> 00:11:27,060 I go down 1701, and look down. 264 00:11:27,060 --> 00:11:30,030 And if I see Yale spray painted onto the ground, then 265 00:11:30,030 --> 00:11:32,200 I know Yale exists in this trie. 266 00:11:32,200 --> 00:11:32,950 Let's do one more. 267 00:11:32,950 --> 00:11:36,430 Let's insert Dartmouth into this trie, which was founded in 1769. 268 00:11:36,430 --> 00:11:37,750 >> Start at the root again. 269 00:11:37,750 --> 00:11:39,445 My first digit of my key is 1. 270 00:11:39,445 --> 00:11:40,820 I can safely move down that path. 271 00:11:40,820 --> 00:11:42,400 That already exists. 272 00:11:42,400 --> 00:11:44,040 The next digit of my key is 7. 273 00:11:44,040 --> 00:11:45,890 I can safely move down that path. 274 00:11:45,890 --> 00:11:47,540 It exists as well. 275 00:11:47,540 --> 00:11:49,000 >> My next is 6. 276 00:11:49,000 --> 00:11:52,860 From here, from where I currently am in yellow there in that middle node, 277 00:11:52,860 --> 00:11:56,060 6 is currently locked off. 278 00:11:56,060 --> 00:11:58,830 If I want to go down that path, I have to build it myself. 279 00:11:58,830 --> 00:12:02,250 So I'll malloc a new node and have 6 point there. 280 00:12:02,250 --> 00:12:04,250 And then, again, I'm blazing new trails here. 281 00:12:04,250 --> 00:12:10,750 >> So I malloc a new node so that from that node-- path number 9-- and then now 282 00:12:10,750 --> 00:12:13,584 if I travel 1769, and I look down. 283 00:12:13,584 --> 00:12:15,500 There's nothing currently spray painted there. 284 00:12:15,500 --> 00:12:16,930 I can write Dartmouth. 285 00:12:16,930 --> 00:12:20,710 And I've inserted Dartmouth into the trie. 286 00:12:20,710 --> 00:12:23,450 >> So that's inserting things into the trie. 287 00:12:23,450 --> 00:12:25,384 Now we want to search for things. 288 00:12:25,384 --> 00:12:27,050 How do we search for things in the trie? 289 00:12:27,050 --> 00:12:29,170 Well, it's pretty much the same idea. 290 00:12:29,170 --> 00:12:33,620 Now we just use the digits of the key to see if we can navigate from the root 291 00:12:33,620 --> 00:12:37,170 to where we want to go in the trie. 292 00:12:37,170 --> 00:12:41,620 >> If we hit a dead end at any point, then we know that that element can't exists 293 00:12:41,620 --> 00:12:44,500 or else that path would have already been cleared. 294 00:12:44,500 --> 00:12:45,930 If we make it all the way to the end, all we need to do 295 00:12:45,930 --> 00:12:48,471 is look down and see if that's the element we're looking for. 296 00:12:48,471 --> 00:12:49,335 If it is, success. 297 00:12:49,335 --> 00:12:52,610 If it's not, fail. 298 00:12:52,610 --> 00:12:54,940 >> So let's search for Harvard in this trie. 299 00:12:54,940 --> 00:12:56,020 We start at the root. 300 00:12:56,020 --> 00:12:58,228 And, again, we're going to create a traversal pointer 301 00:12:58,228 --> 00:12:59,390 to do our moves for us. 302 00:12:59,390 --> 00:13:02,080 From the root we know that the first place we need to go is 1, 303 00:13:02,080 --> 00:13:03,390 can we do that? 304 00:13:03,390 --> 00:13:03,982 Yes, we can. 305 00:13:03,982 --> 00:13:04,690 If safely exists. 306 00:13:04,690 --> 00:13:06,660 We can go there. 307 00:13:06,660 --> 00:13:08,440 >> Now, the next place we need to go is 6. 308 00:13:08,440 --> 00:13:10,557 Does the 6 path exist from here? 309 00:13:10,557 --> 00:13:11,140 Yeah, it does. 310 00:13:11,140 --> 00:13:12,690 We can go down the 6 path. 311 00:13:12,690 --> 00:13:13,905 And we end up here. 312 00:13:13,905 --> 00:13:16,130 >> Can we go down the 3 path from here? 313 00:13:16,130 --> 00:13:18,450 Well, as it turns out, yes, that exists too. 314 00:13:18,450 --> 00:13:20,790 And can we get on the 6 path from here? 315 00:13:20,790 --> 00:13:21,982 Yes, we can. 316 00:13:21,982 --> 00:13:24,002 >> We haven't quite answered the question yet. 317 00:13:24,002 --> 00:13:25,710 There's still one more step, which is now 318 00:13:25,710 --> 00:13:28,520 we need to look down and see if that's actually-- 319 00:13:28,520 --> 00:13:32,660 if we're looking for Harvard, is that what we find after we exhaust the key? 320 00:13:32,660 --> 00:13:35,430 In the example we're using here, the years are always four digits. 321 00:13:35,430 --> 00:13:40,280 But you might be using the example where you are storing a dictionary of words. 322 00:13:40,280 --> 00:13:44,060 >> And so instead of having 10 pointers for my location, you might have 26. 323 00:13:44,060 --> 00:13:46,040 One for each letter of the alphabet. 324 00:13:46,040 --> 00:13:50,350 And there are some words like bat, which is a subset of batch, for example. 325 00:13:50,350 --> 00:13:53,511 And so even if you get to the end of the key and you look down, 326 00:13:53,511 --> 00:13:55,260 you might not see what you're looking for. 327 00:13:55,260 --> 00:13:58,500 >> So you always have to traverse the entire path and then 328 00:13:58,500 --> 00:14:01,540 if you were successfully able to traverse the entire path, 329 00:14:01,540 --> 00:14:03,440 look down and do one final confirmation. 330 00:14:03,440 --> 00:14:05,120 Is that what I'm looking for? 331 00:14:05,120 --> 00:14:07,740 Well, I look down after starting at the top and going 1636. 332 00:14:07,740 --> 00:14:08,240 I look down. 333 00:14:08,240 --> 00:14:09,400 I see Harvard. 334 00:14:09,400 --> 00:14:11,689 So, yes, I succeeded. 335 00:14:11,689 --> 00:14:13,980 What if what I'm looking for isn't in the trie, though. 336 00:14:13,980 --> 00:14:17,200 What if I'm looking for Princeton, which was founded in 1746. 337 00:14:17,200 --> 00:14:20,875 And so 1746 becomes my key to navigate through the trie. 338 00:14:20,875 --> 00:14:22,040 Well, I start at the root. 339 00:14:22,040 --> 00:14:24,760 And the first place I want to goes down the 1 path. 340 00:14:24,760 --> 00:14:25,590 Can I do it? 341 00:14:25,590 --> 00:14:26,490 Yes, I can. 342 00:14:26,490 --> 00:14:28,730 >> Can I go down the 7 path from there? 343 00:14:28,730 --> 00:14:29,230 Yeah, I can. 344 00:14:29,230 --> 00:14:30,750 That exists too. 345 00:14:30,750 --> 00:14:32,460 But can I go down the 4 path from here? 346 00:14:32,460 --> 00:14:35,550 That's like asking the question, can I proceed down that little square 347 00:14:35,550 --> 00:14:37,114 that I've highlighted in yellow? 348 00:14:37,114 --> 00:14:38,030 There's nothing there. 349 00:14:38,030 --> 00:14:38,610 Right. 350 00:14:38,610 --> 00:14:41,310 >> There's no way forward down the 4 path. 351 00:14:41,310 --> 00:14:46,480 If Princeton was in this trie, that 4 would have been cleared for us already. 352 00:14:46,480 --> 00:14:49,130 And so at this point we've reached a dead end. 353 00:14:49,130 --> 00:14:50,250 We can't go any further. 354 00:14:50,250 --> 00:14:53,440 And so we can say, definitively, no. 355 00:14:53,440 --> 00:14:56,760 Princeton does not exist in this trie. 356 00:14:56,760 --> 00:14:58,860 >> So what does this all mean? 357 00:14:58,860 --> 00:14:59,360 Right. 358 00:14:59,360 --> 00:15:01,000 There's a lot going on here. 359 00:15:01,000 --> 00:15:02,500 There's pointers all over the place. 360 00:15:02,500 --> 00:15:04,249 And, as you can see just from the diagram, 361 00:15:04,249 --> 00:15:07,010 there's a lot of nodes that are kind of flying around. 362 00:15:07,010 --> 00:15:13,480 But notice every time we wanted to check whether something was in the trie, 363 00:15:13,480 --> 00:15:15,000 we only had to make 4 moves. 364 00:15:15,000 --> 00:15:17,208 >> Every time we wanted to insert something in the trie, 365 00:15:17,208 --> 00:15:20,440 we have to make 4 moves, possibly mallocing some stuff along the way. 366 00:15:20,440 --> 00:15:23,482 But as we saw when we inserted Dartmouth into the trie, 367 00:15:23,482 --> 00:15:25,940 sometimes some of the path might already be cleared for us. 368 00:15:25,940 --> 00:15:30,520 And so as our trie gets bigger and bigger, we have do less work every time 369 00:15:30,520 --> 00:15:32,270 to insert new things because we've already 370 00:15:32,270 --> 00:15:35,746 built a lot of the intermediate branches along the way. 371 00:15:35,746 --> 00:15:38,370 If we only ever have to look at 4 things, 4 is just a constant. 372 00:15:38,370 --> 00:15:41,750 We really are kind of approaching constant time insertion 373 00:15:41,750 --> 00:15:44,501 and constant time lookup. 374 00:15:44,501 --> 00:15:47,500 The tradeoff, of course, being that this trie, as you can probably tell, 375 00:15:47,500 --> 00:15:49,030 is huge. 376 00:15:49,030 --> 00:15:51,040 Each one of these nodes takes up a lot of space. 377 00:15:51,040 --> 00:15:52,090 >> But that's the tradeoff. 378 00:15:52,090 --> 00:15:55,260 If we want really quick insertion, really quick deletion, 379 00:15:55,260 --> 00:15:59,630 and really quick lookup, we have to have a lot of data flying around. 380 00:15:59,630 --> 00:16:03,590 We have to set aside a lot of space and memory for that data structure 381 00:16:03,590 --> 00:16:04,290 to exist. 382 00:16:04,290 --> 00:16:05,415 >> And so that's the tradeoff. 383 00:16:05,415 --> 00:16:07,310 But it looks like we might have found it. 384 00:16:07,310 --> 00:16:09,560 We might have found that holy grail of data structures 385 00:16:09,560 --> 00:16:12,264 with quick insertion, deletion, and lookup. 386 00:16:12,264 --> 00:16:14,430 And maybe this will be an appropriate data structure 387 00:16:14,430 --> 00:16:18,890 to use for whatever information we're trying to store. 388 00:16:18,890 --> 00:16:21,860 I'm Doug Lloyd, this is cs50. 389 00:16:21,860 --> 00:16:23,433