KEVIN SCHMID: Sometimes, when building a program, you might want to use a data structure known as a dictionary. A dictionary maps keys, which are usually strings, to values: ints, chars, a pointer to some object, whatever we want. It's just like ordinary dictionaries that map words to definitions. Dictionaries provide us with the ability to store information associated with something and look it up later.

So how do we actually implement a dictionary in, say, C code that we can use in one of our programs? Well, there are a lot of ways that we could implement a dictionary. For one, we could use an array that we dynamically resize, or we could use a linked list, a hash table, or a binary tree. But whatever we choose, we should be mindful of the efficiency and performance of the implementation. We should think about the algorithm used to insert items into and look up items in our data structure. For now, let's assume that we want to use strings as keys.

Let's talk about one possibility, a data structure called a trie. Here's a visual representation of a trie. As the picture suggests, a trie is a tree data structure with nodes linked together. We see that there's clearly a root node with some links extending to other nodes. But what does each node consist of? If we assume that we're storing keys with only alphabetic characters, and we don't care about capitalization, here's a definition of a node that will suffice. An object whose type is struct node has two parts, called data and children. We've left the data part as a comment, to be replaced by an appropriate declaration when struct node is incorporated into a C program. The data part of a node might be a Boolean value to indicate whether or not the node represents the completion of a dictionary key, or it might be a string representing the definition of a word in the dictionary. We'll use a smiley face to indicate when data is present in a node. There are 26 elements in our children array, one index per alphabetic character. We'll see the significance of this soon.

Let's get a closer look at the root node in our diagram, which has no data associated with it, as indicated by the absence of the smiley face in the data portion. The arrows extending from the slots of the children array represent non-null pointers to other nodes. For example, the arrow extending from the second element of children represents the letter B in a dictionary key, and in the larger diagram we label it with a B. Note that in the larger diagram, when we draw a pointer to another node, it doesn't matter where the arrowhead meets that other node. Our sample dictionary trie contains two words, bat and zoom.

Let's walk through an example of looking up data for a key. Suppose we wanted to look up the corresponding value for the key bat. We'll begin our lookup at the root node. Then we'll take the first letter of our key, B, and find the corresponding spot in our children array. Notice that there are exactly 26 spots in the array, one for each letter of the alphabet, and we'll have the spots represent the letters of the alphabet in order. We'll look at the second index, then, index one, for B. In general, if we have some alphabetic character c, we can determine the corresponding spot in the children array using a calculation like the one sketched below. We could have used a larger children array if we wanted to offer lookup of keys with a wider range of characters, such as the entire ASCII character set. In this case, the pointer in our children array at index one is not null.
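Here's a minimal sketch of what that node definition and index calculation might look like in C. The data part is filled in with a simple Boolean flag, one of the two options mentioned above, and the names is_word and child_index are illustrative assumptions.

```c
#include <ctype.h>
#include <stdbool.h>

// A trie node: the "data" part is a Boolean flag here, and children has
// one slot per alphabetic character.
typedef struct node
{
    bool is_word;               // true if some dictionary key ends at this node
    struct node *children[26];  // one pointer per letter, 'a' through 'z'
}
node;

// Map an alphabetic character to its index in the children array,
// ignoring capitalization.
int child_index(char c)
{
    return tolower((unsigned char) c) - 'a';
}
```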
So we'll continue looking up the key bat. If we ever encounter a null pointer at the proper spot in the children array while we traverse the nodes, then we'll have to say that we couldn't find anything for that key. Now, we'll take the second letter of our key, A, and continue following pointers in this way until we reach the end of our key. If we reach the end of the key without hitting any dead ends, null pointers, as is the case here, then we only have to check one more thing: is this key actually in the dictionary? If so, we should find a value, well, a smiley face icon in our diagram, where the word ends. If there is something else stored with the data, then we can return it.

For example, the key zoo is not in the dictionary, even though we could have reached the end of this key without ever hitting a null pointer while iterating through the trie. And if we tried to look up the key bath, the node for the second-to-last letter, T, would have held a null pointer at the index corresponding to the letter H. So bath is not in the dictionary. Notice that a trie is unique in that the keys are never explicitly stored in the data structure.

So how do we insert something into a trie? Let's insert the key zoo into our trie. Remember that a smiley face at a node could correspond in code to a simple Boolean value to indicate that zoo is in the dictionary, or it could correspond to more information that we wish to associate with the key zoo, like the definition of the word or something else. In some ways, the process to insert something into a trie is similar to looking up something in a trie. We'll start with the root node again, following pointers corresponding to the letters of our key. Luckily, we were able to follow pointers all the way until we reached the end of the key. Since zoo is a prefix of the word zoom, which is a member of the dictionary, we don't need to allocate any new nodes. We can simply modify the node to indicate that the path of characters leading to it represents a key in our dictionary.

Now, let's try inserting the key bath into the trie. We'll start at the root node and follow pointers again. But in this situation, we hit a dead end before we're able to get to the end of the key. Now we'll need to allocate some new nodes: one new node for each remaining letter of our key. In this case, we just need to allocate one new node. Then we'll need to make the H index reference this new node. Once again, we can modify the node to indicate that the path of characters leading to it represents a key in our dictionary.

Let's reason about the asymptotic complexity of our procedures for these two operations. We notice that in both cases the number of steps our algorithm took was proportional to the number of letters in the key. That's right. When you want to look up a word in a trie, you just need to iterate through the letters one by one until you either reach the end of the word or hit a dead end in the trie. And when you wish to insert a key-value pair into a trie using the procedure we discussed, the worst case will have you allocating a new node for each letter. And we'll assume that allocation is a constant-time operation. So if we assume that the key length is bounded by a fixed constant, both insertion and lookup are constant-time operations for a trie. If we don't make this assumption that the key length is bounded by a fixed constant, then insertion and lookup, in the worst case, are linear in the length of the key.
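To make the two procedures concrete, here's a sketch of how lookup and insertion might look in C, building on the node type and child_index helper sketched earlier. The function names and the choice of an is_word flag for the data part are assumptions for illustration.

```c
#include <stdbool.h>
#include <stdlib.h>

// Look up a key: follow one pointer per letter, and if we never hit a
// dead end, check whether the final node actually marks a dictionary key.
bool lookup(node *root, const char *key)
{
    node *cursor = root;
    for (int i = 0; key[i] != '\0'; i++)
    {
        node *next = cursor->children[child_index(key[i])];
        if (next == NULL)
        {
            return false;       // dead end: the key is not in the dictionary
        }
        cursor = next;
    }
    return cursor->is_word;     // reached the end of the key: is it a word?
}

// Insert a key: follow pointers, allocating a new node wherever we hit a
// dead end, then mark the final node as the end of a dictionary key.
bool insert(node *root, const char *key)
{
    node *cursor = root;
    for (int i = 0; key[i] != '\0'; i++)
    {
        int index = child_index(key[i]);
        if (cursor->children[index] == NULL)
        {
            cursor->children[index] = calloc(1, sizeof(node));
            if (cursor->children[index] == NULL)
            {
                return false;   // allocation failed
            }
        }
        cursor = cursor->children[index];
    }
    cursor->is_word = true;
    return true;
}
```

Both loops run once per letter of the key, which matches the linear-in-key-length bound described above.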
Notice that the number of items stored in the trie doesn't affect lookup or insertion time. Those operations are affected only by the length of the key. By contrast, adding entries to, say, a hash table tends to make future lookups slower. While this may sound appealing at first, we should keep in mind that a favorable asymptotic complexity doesn't mean that in practice the data structure is necessarily beyond reproach. We must also consider that to store a word in a trie we need, in the worst case, a number of nodes proportional to the length of the word itself. Tries tend to use a lot of space. That's in contrast to a hash table, where we only need one new node to store some key-value pair.

Now, in theory, large space consumption doesn't seem like a big deal, especially given that modern computers have gigabytes and gigabytes of memory. But it turns out that we still have to worry about memory usage and organization for the sake of performance, since modern computers have mechanisms in place under the hood to speed up memory access. These mechanisms work best when memory accesses are made in compact regions, and the nodes of a trie could reside anywhere in the heap. These are trade-offs that we must consider.

Remember that, when choosing a data structure for a certain task, we should think about what kinds of operations the data structure needs to support and how much the performance of each of those operations matters to us. These operations may even extend beyond just basic lookup and insertion. Suppose we wanted to implement a kind of auto-complete functionality, much like Google's search engine does. That is, return all the keys, and potentially values, which have a given prefix. A trie is uniquely useful for this operation. It's straightforward to iterate through the trie for each character of the prefix. Just like a lookup operation, we could follow pointers character by character. Then, when we arrive at the end of the prefix, we could iterate through the remaining portion of the data structure, since any of the keys beyond this point have the prefix (sketched below). It's also easy to obtain this listing in alphabetical order, since the elements of the children array are ordered alphabetically.

So hopefully you'll consider giving tries a try. I'm Kevin Schmid, and this is CS50.

Ah, this is the beginning of the decline. I'm sorry. Sorry. Sorry. Sorry. Strike four. I'm out. Sorry. Sorry. Sorry. Sorry for making the person who has to edit this go crazy. Sorry. Sorry. Sorry. Sorry.

SPEAKER 1: Well done. That was really well done.
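As a follow-up to the auto-complete idea mentioned above, here's a sketch of how a prefix search might look, reusing the node type and child_index helper from the earlier sketches. The function names, the fixed-size buffer, and printing matches to standard output are all illustrative assumptions.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

// Print every key stored in the subtree rooted at cursor, in alphabetical
// order, where buffer[0..depth-1] holds the letters leading to cursor.
static void print_subtree(node *cursor, char buffer[], int depth)
{
    if (depth >= 63)
    {
        return;                     // guard against overflowing the buffer
    }
    if (cursor->is_word)
    {
        buffer[depth] = '\0';
        printf("%s\n", buffer);
    }
    for (int i = 0; i < 26; i++)    // children are already in alphabetical order
    {
        if (cursor->children[i] != NULL)
        {
            buffer[depth] = 'a' + i;
            print_subtree(cursor->children[i], buffer, depth + 1);
        }
    }
}

// Follow the prefix character by character, just like a lookup, then list
// every key in the remaining portion of the trie.
void autocomplete(node *root, const char *prefix)
{
    char buffer[64];                // assumes keys shorter than 64 characters
    size_t length = strlen(prefix);
    if (length >= sizeof(buffer))
    {
        return;
    }
    node *cursor = root;
    for (size_t i = 0; i < length; i++)
    {
        node *next = cursor->children[child_index(prefix[i])];
        if (next == NULL)
        {
            return;                 // no keys in the trie have this prefix
        }
        buffer[i] = (char) tolower((unsigned char) prefix[i]);
        cursor = next;
    }
    print_subtree(cursor, buffer, (int) length);
}
```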