SPEAKER: Your first task in speller is going to be to implement the load function, which is responsible for taking the dictionary and loading it into a hash table. How is this going to work? Well, the load function will take as its argument a char star, or a string, which is the dictionary file that you're going to have to open up and read from in order to load all this data into your hash table. It's going to return a Boolean value. True if you were able to successfully load all the data into your hash table, and false if there was some sort of memory error, like not enough memory available to allocate, for example, or a file that couldn't be opened. So let's take a look at how you're going to do this. Well, ultimately, when you open up this dictionary file and store all of the words in memory, you're going to store them all inside of a hash table. And recall that a hash table is really just an array of individual linked lists. And the way you determine which of those linked lists you're going to insert a word into is based on a hash function, a function that in this case is going to take input, a string, or a word that you're going to include in the dictionary. And the output is going to be some number that you're able to generate from that input, which is going to correspond to which of the linked lists you want to place this particular word into. Let's take a closer look at hash tables to get a better sense for how they work and how they're going to be useful for this load function. Here, for example, I have five words that I would like to store inside of my hash table. Apple, banana, bat, car and book. And I could take all of these words and just put them in a single linked list. But then I'd have a pretty long linked list of five words that I would have to traverse if I wanted to find whether any one particular word was inside of my data structure. Instead, I can put them into a hash table, creating several buckets so to speak. I'll create an A bucket for all words that begin with the letter A, a B bucket for all words that begin with the letter B, and a C bucket for all words that begin with the letter C, and then associate each of those words with a particular bucket. So, for example, apple ends up in the A bucket. Bat, book, and banana end up in the B bucket. Car ends up with the C bucket. What I've used here is a hash function, a function that takes apple, or bat, or book as input, and outputs which of the linked lists-- the A linked list, or the B linked list, or the C linked list-- that I should place this particular word into. So what exactly is a hash function? Well, a hash function in this case is going to take a word as input, and it's going to output a number corresponding to which bucket or element within this array we want to store the word in. Because recall that a hash table really is just an array where the very first linked list in that array is at index 0, and then 1 and, then 2, and then so forth. And so our hash function is going to take the word and output which index into that array we should use to place that particular word in. And at each index inside of the hash table is going to be a linked list. And what is a linked list again? We'll recall that a linked list is a way of storing data that's comprised of nodes, where every node has a value as well as a pointer to the next node. So, for example, in the case of our spell checker, we're going to have a hash table, which has an array of linked lists. And each of those linked list nodes is going to have a value-- some place to store a word-- which in this case is going to be a character array, which can store words up to the maximum length of any word, as well as a pointer to the next word. What does this look like? Well, if you look at the top of dictionary dot C, you'll see the definition of a node here. We've typedef'd a new node to create a new type called node, where every node has a word, which is a character array, as well as a pointer that is going to give us the address of the next node in this linked list. So our hash table is going to give us the first node in the linked list. And from there, we can get to the next node, and the node after that, so on and so forth, until we get to the end of the linked list, where we're just going to find the value null. Then, immediately below that in dictionary dot C, you'll also notice this. We've defined const int N to be equal to 1. In other words, we defined a constant integer, which is called N, which we've initially set to 1. And now said that table is going to represent our hash table. And it's going to be an array of node pointers. An array where every element in that array is a pointer to a node. Right now, this hash table has only one element in the array. In other words, there's only one bucket in the hash table. So you'll likely want to change N. Set it to some higher value so your hash table can have more buckets, and therefore spread data out a little bit more evenly. Initially, though, your hash table is going to be empty. So how are you going to be able to add more data to that hash table? Well, to do so, you're going to need to allocate memory for a new node, and then put some data into that node. In order to allocate memory for a new node, I might have some code that looks something like this. Node star N equals malloc size of node. Malloc here means allocate some memory. Ask for some memory from the computer. And how much memory do I want to allocate? Well, size of node is going to be the number of bytes of memory that I need in order to restore a node. And so malloc size of node is going to give me just enough memory to be able to store a node. And I'm going to store the address of that node inside of N. How do I then put data into that node? Well, remember that a node has two properties to it. It has the word itself, as well as a pointer to the next node. So I might say, strcpy N arrow word, and then hello. What is this going to do? Well, strcpy is short for string copy. It's going to copy a string from a source into a destination. And in this case, it's going to copy the word hello into the character array N arrow word, which is the word field of this node that I've just created some space for. I've copied the word hello into that space of memory. In order to set the next pointer-- if I knew what the next node was, I could set it right away. But if I don't know, I can just set it to null. N arrow next equals null means right now, nothing comes after this particular node. Of course, in our linked list, some of these nodes are going to have other nodes that are the next node in the sequence of elements inside of the linked list. So what exactly do you need to do now inside of this load function? Well, the first thing that you're going to have to do is open up this dictionary file. Next, you're going to read strings from that file one at a time. One word in the dictionary at a time. For each of those words, you're going to create a new node. Some node that is going to have a value and the next pointer. And you're going to hash that word using a hash function to obtain a hash value, and index into your hash table to determine which of those linked lists you should use when you're inserting this node into a linked list. And then, finally, you're going to take each of those words and insert that node into the hash table at the location given by your hash function. So let's take a look now at each of these steps one at a time. Let's start with how you're going to open the dictionary file. Recall that the argument to the load function is a char star or string representing the dictionary file that you're going to use for the loading. You can use the f-open function to open up that file. And then, make sure to check to see if the return value is null to make sure that you were able to successfully open up the file. If you weren't able to successfully open up the file, your load function should return false. Otherwise, you'll be able to keep going to the next step of the load function. The next step is to read strings from the file. And for this, you can use a function called fscanf, short for scanning from a file, where the first argument is going to be that file pointer, the result of calling f-open on the dictionary file. The second argument is %s, meaning you want to read in a string. And the third argument is word, which is going to be some character array. Some place where you're going to read the word into. In other words, some character array inside of memory where you can then access all of the individual characters of that word after you've read it from the file. And you're going to want to repeat this again, and again, and again for each of the words in the dictionary, calling fscanf on the file over and over getting one word at a time until fscanf returns EOF. And that will happen once fscanf reaches the end of the file. So when fscanf returns EOF, that's your sign that you finished reading all of the words in the dictionary. But you'll likely want to have some sort of loop that's running fscanf again and again on one word, after word, after word until you get to the end of the file. What are you going to do for each of those individual words? Well, the next step is going to be to create a new node that is going to store that particular word inside of your hash table. How to do that? We'll recall that the malloc function you can use to allocate memory. So you'll want to allocate enough memory to store a new node. And again, be sure to check if malloc returns null or not, because if malloc doesn't have enough memory to be able to store the node, it will return null and your load function should return false. Once you've created that node, you're going to want to copy the word into that node using the strcpy function, a function which will take a string and copy it from one location into another location. Now that we've created the node, the next step is to figure out how it is we're going to insert this node into the hash table. Recall again that a hash table is just an array of linked lists. And we now need to determine which linked list we should use when we're determining where to place this particular node that we've just created. And to do that, we're going to hash the word. You're going to use the hash function, which is defined in dictionary don't C, which will be a function that takes a string and returns an index, some number that you can use to index into your linked list. Right now, we've given you a hash function that just returns 0 for every particular input. And later in this assignment, we're going to replace this hash function with a better hash function. But for now, you can just call upon the hash function inside of load to determine which index into the hash table you should use when you're inserting this new node. Once you've determined which linked list you should use, the next step is to actually insert that word into the linked list. How are you going to do that? Well, again, recall that the hash table is an array of linked lists. And you're going to want to index into the hash table to get the linked list that you want to use to add this particular word into. And in order to do that, you're going to have to add a new node to a linked list. So you'll want to be sure to make sure you're setting your pointers in the correct order. What do I mean by that? Well, let's take a look at an example. Here up at the top, imagine that head is a variable that points to the start of a linked list. The linked list is bat. And next is book. And next is banana. And banana here points to null, for example. And let's imagine that I have this new node, which is this blue mode that I've just malloced. And I want to add it to this linked list. So if I wanted to make this new node the new head of the linked list, one thing you might imagine that I would do is point head to point at this new node that I've just created. But what's just happened? Now that I've set head to be blue, this very first node inside of this linked list, I've now lost access effectively to all of the other nodes that were previously in the linked list. I have no way to get to them because both head and new node both point to just this blue node. And blue doesn't point to anything else. So what is the actual solution here. Well, the first thing that you're going to want to do is probably to take this new node-- in this case, blue-- and set its next pointer to be the first element in the linked list. Then after that, you can reset head-- in other words, the first element in the linked list-- to be the new node that you've just created. Now the beginning of the linked list is blue. And next comes bat, and then book, and banana. And we've been able to represent all of these words now together inside of a single linked list by adding our new node to the beginning of the linked list. And you're going to repeat this process for every word in the dictionary. Reading that word using fscanf, allocating enough memory in order to store that word, copying the word into the node, and then inserting that node into one of the linked lists in your hash table based on the result of calling your hash function. After you've done that for each of the words in the dictionary, you'll have successfully loaded all of your data into memory into a hash table such that you can now begin the spell checking process.