1 00:00:00,000 --> 00:00:00,360 2 00:00:00,360 --> 00:00:02,310 SPEAKER: Your first task in speller is going 3 00:00:02,310 --> 00:00:04,500 to be to implement the load function, which 4 00:00:04,500 --> 00:00:09,000 is responsible for taking the dictionary and loading it into a hash table. 5 00:00:09,000 --> 00:00:10,230 How is this going to work? 6 00:00:10,230 --> 00:00:14,340 Well, the load function will take as its argument a char star, or a string, 7 00:00:14,340 --> 00:00:16,440 which is the dictionary file that you're going 8 00:00:16,440 --> 00:00:18,750 to have to open up and read from in order 9 00:00:18,750 --> 00:00:21,510 to load all this data into your hash table. 10 00:00:21,510 --> 00:00:23,100 It's going to return a Boolean value. 11 00:00:23,100 --> 00:00:27,300 True if you were able to successfully load all the data into your hash table, 12 00:00:27,300 --> 00:00:29,370 and false if there was some sort of memory error, 13 00:00:29,370 --> 00:00:31,270 like not enough memory available to allocate, 14 00:00:31,270 --> 00:00:34,060 for example, or a file that couldn't be opened. 15 00:00:34,060 --> 00:00:36,400 So let's take a look at how you're going to do this. 16 00:00:36,400 --> 00:00:38,483 Well, ultimately, when you open up this dictionary 17 00:00:38,483 --> 00:00:40,530 file and store all of the words in memory, 18 00:00:40,530 --> 00:00:43,140 you're going to store them all inside of a hash table. 19 00:00:43,140 --> 00:00:47,010 And recall that a hash table is really just an array of individual linked 20 00:00:47,010 --> 00:00:47,820 lists. 21 00:00:47,820 --> 00:00:50,520 And the way you determine which of those linked lists 22 00:00:50,520 --> 00:00:52,740 you're going to insert a word into is based 23 00:00:52,740 --> 00:00:55,290 on a hash function, a function that in this case 24 00:00:55,290 --> 00:00:57,442 is going to take input, a string, or a word 25 00:00:57,442 --> 00:00:59,400 that you're going to include in the dictionary. 26 00:00:59,400 --> 00:01:01,770 And the output is going to be some number 27 00:01:01,770 --> 00:01:04,080 that you're able to generate from that input, which 28 00:01:04,080 --> 00:01:06,720 is going to correspond to which of the linked lists 29 00:01:06,720 --> 00:01:10,320 you want to place this particular word into. 30 00:01:10,320 --> 00:01:13,740 Let's take a closer look at hash tables to get a better sense for how they work 31 00:01:13,740 --> 00:01:16,860 and how they're going to be useful for this load function. 32 00:01:16,860 --> 00:01:19,140 Here, for example, I have five words that I would 33 00:01:19,140 --> 00:01:21,330 like to store inside of my hash table. 34 00:01:21,330 --> 00:01:23,550 Apple, banana, bat, car and book. 35 00:01:23,550 --> 00:01:27,210 And I could take all of these words and just put them in a single linked list. 36 00:01:27,210 --> 00:01:29,030 But then I'd have a pretty long linked list 37 00:01:29,030 --> 00:01:31,080 of five words that I would have to traverse 38 00:01:31,080 --> 00:01:34,590 if I wanted to find whether any one particular word was inside 39 00:01:34,590 --> 00:01:36,030 of my data structure. 40 00:01:36,030 --> 00:01:38,640 Instead, I can put them into a hash table, 41 00:01:38,640 --> 00:01:41,100 creating several buckets so to speak. 42 00:01:41,100 --> 00:01:44,430 I'll create an A bucket for all words that begin with the letter A, 43 00:01:44,430 --> 00:01:47,070 a B bucket for all words that begin with the letter B, 44 00:01:47,070 --> 00:01:49,920 and a C bucket for all words that begin with the letter C, 45 00:01:49,920 --> 00:01:53,670 and then associate each of those words with a particular bucket. 46 00:01:53,670 --> 00:01:55,910 So, for example, apple ends up in the A bucket. 47 00:01:55,910 --> 00:01:58,260 Bat, book, and banana end up in the B bucket. 48 00:01:58,260 --> 00:02:00,270 Car ends up with the C bucket. 49 00:02:00,270 --> 00:02:05,070 What I've used here is a hash function, a function that takes apple, or bat, 50 00:02:05,070 --> 00:02:09,419 or book as input, and outputs which of the linked lists-- the A linked list, 51 00:02:09,419 --> 00:02:11,670 or the B linked list, or the C linked list-- 52 00:02:11,670 --> 00:02:14,700 that I should place this particular word into. 53 00:02:14,700 --> 00:02:17,550 So what exactly is a hash function? 54 00:02:17,550 --> 00:02:20,970 Well, a hash function in this case is going to take a word as input, 55 00:02:20,970 --> 00:02:23,370 and it's going to output a number corresponding 56 00:02:23,370 --> 00:02:27,600 to which bucket or element within this array we want to store the word in. 57 00:02:27,600 --> 00:02:29,820 Because recall that a hash table really is just 58 00:02:29,820 --> 00:02:33,390 an array where the very first linked list in that array is at index 0, 59 00:02:33,390 --> 00:02:35,402 and then 1 and, then 2, and then so forth. 60 00:02:35,402 --> 00:02:37,110 And so our hash function is going to take 61 00:02:37,110 --> 00:02:40,560 the word and output which index into that array 62 00:02:40,560 --> 00:02:42,960 we should use to place that particular word in. 63 00:02:42,960 --> 00:02:47,670 And at each index inside of the hash table is going to be a linked list. 64 00:02:47,670 --> 00:02:49,265 And what is a linked list again? 65 00:02:49,265 --> 00:02:51,390 We'll recall that a linked list is a way of storing 66 00:02:51,390 --> 00:02:55,770 data that's comprised of nodes, where every node has a value as well 67 00:02:55,770 --> 00:02:57,930 as a pointer to the next node. 68 00:02:57,930 --> 00:03:00,330 So, for example, in the case of our spell checker, 69 00:03:00,330 --> 00:03:03,360 we're going to have a hash table, which has an array of linked lists. 70 00:03:03,360 --> 00:03:06,950 And each of those linked list nodes is going to have a value-- 71 00:03:06,950 --> 00:03:08,430 some place to store a word-- 72 00:03:08,430 --> 00:03:11,910 which in this case is going to be a character array, which can store words 73 00:03:11,910 --> 00:03:17,010 up to the maximum length of any word, as well as a pointer to the next word. 74 00:03:17,010 --> 00:03:18,190 What does this look like? 75 00:03:18,190 --> 00:03:21,040 Well, if you look at the top of dictionary dot C, 76 00:03:21,040 --> 00:03:22,980 you'll see the definition of a node here. 77 00:03:22,980 --> 00:03:26,130 We've typedef'd a new node to create a new type called node, where 78 00:03:26,130 --> 00:03:29,130 every node has a word, which is a character array, 79 00:03:29,130 --> 00:03:31,320 as well as a pointer that is going to give us 80 00:03:31,320 --> 00:03:34,900 the address of the next node in this linked list. 81 00:03:34,900 --> 00:03:38,040 So our hash table is going to give us the first node in the linked list. 82 00:03:38,040 --> 00:03:41,125 And from there, we can get to the next node, and the node after that, 83 00:03:41,125 --> 00:03:44,250 so on and so forth, until we get to the end of the linked list, where we're 84 00:03:44,250 --> 00:03:46,890 just going to find the value null. 85 00:03:46,890 --> 00:03:49,380 Then, immediately below that in dictionary dot C, 86 00:03:49,380 --> 00:03:50,880 you'll also notice this. 87 00:03:50,880 --> 00:03:54,010 We've defined const int N to be equal to 1. 88 00:03:54,010 --> 00:03:56,200 In other words, we defined a constant integer, 89 00:03:56,200 --> 00:03:58,800 which is called N, which we've initially set to 1. 90 00:03:58,800 --> 00:04:02,370 And now said that table is going to represent our hash table. 91 00:04:02,370 --> 00:04:05,460 And it's going to be an array of node pointers. 92 00:04:05,460 --> 00:04:09,540 An array where every element in that array is a pointer to a node. 93 00:04:09,540 --> 00:04:12,990 Right now, this hash table has only one element in the array. 94 00:04:12,990 --> 00:04:15,670 In other words, there's only one bucket in the hash table. 95 00:04:15,670 --> 00:04:19,149 So you'll likely want to change N. Set it to some higher value 96 00:04:19,149 --> 00:04:21,060 so your hash table can have more buckets, 97 00:04:21,060 --> 00:04:23,970 and therefore spread data out a little bit more evenly. 98 00:04:23,970 --> 00:04:26,920 Initially, though, your hash table is going to be empty. 99 00:04:26,920 --> 00:04:30,570 So how are you going to be able to add more data to that hash table? 100 00:04:30,570 --> 00:04:34,140 Well, to do so, you're going to need to allocate memory for a new node, 101 00:04:34,140 --> 00:04:36,360 and then put some data into that node. 102 00:04:36,360 --> 00:04:38,310 In order to allocate memory for a new node, 103 00:04:38,310 --> 00:04:40,560 I might have some code that looks something like this. 104 00:04:40,560 --> 00:04:43,680 Node star N equals malloc size of node. 105 00:04:43,680 --> 00:04:45,840 Malloc here means allocate some memory. 106 00:04:45,840 --> 00:04:47,730 Ask for some memory from the computer. 107 00:04:47,730 --> 00:04:50,040 And how much memory do I want to allocate? 108 00:04:50,040 --> 00:04:53,040 Well, size of node is going to be the number of bytes of memory 109 00:04:53,040 --> 00:04:55,620 that I need in order to restore a node. 110 00:04:55,620 --> 00:04:58,620 And so malloc size of node is going to give me just enough memory 111 00:04:58,620 --> 00:05:00,010 to be able to store a node. 112 00:05:00,010 --> 00:05:03,700 And I'm going to store the address of that node inside of N. 113 00:05:03,700 --> 00:05:05,920 How do I then put data into that node? 114 00:05:05,920 --> 00:05:08,450 Well, remember that a node has two properties to it. 115 00:05:08,450 --> 00:05:11,950 It has the word itself, as well as a pointer to the next node. 116 00:05:11,950 --> 00:05:16,600 So I might say, strcpy N arrow word, and then hello. 117 00:05:16,600 --> 00:05:17,770 What is this going to do? 118 00:05:17,770 --> 00:05:19,990 Well, strcpy is short for string copy. 119 00:05:19,990 --> 00:05:23,530 It's going to copy a string from a source into a destination. 120 00:05:23,530 --> 00:05:27,880 And in this case, it's going to copy the word hello into the character array 121 00:05:27,880 --> 00:05:31,780 N arrow word, which is the word field of this node 122 00:05:31,780 --> 00:05:33,430 that I've just created some space for. 123 00:05:33,430 --> 00:05:36,490 I've copied the word hello into that space of memory. 124 00:05:36,490 --> 00:05:37,930 In order to set the next pointer-- 125 00:05:37,930 --> 00:05:40,570 if I knew what the next node was, I could set it right away. 126 00:05:40,570 --> 00:05:43,000 But if I don't know, I can just set it to null. 127 00:05:43,000 --> 00:05:45,460 N arrow next equals null means right now, 128 00:05:45,460 --> 00:05:48,160 nothing comes after this particular node. 129 00:05:48,160 --> 00:05:50,290 Of course, in our linked list, some of these nodes 130 00:05:50,290 --> 00:05:54,190 are going to have other nodes that are the next node in the sequence 131 00:05:54,190 --> 00:05:56,990 of elements inside of the linked list. 132 00:05:56,990 --> 00:06:00,202 So what exactly do you need to do now inside of this load function? 133 00:06:00,202 --> 00:06:02,410 Well, the first thing that you're going to have to do 134 00:06:02,410 --> 00:06:04,630 is open up this dictionary file. 135 00:06:04,630 --> 00:06:08,020 Next, you're going to read strings from that file one at a time. 136 00:06:08,020 --> 00:06:10,840 One word in the dictionary at a time. 137 00:06:10,840 --> 00:06:13,990 For each of those words, you're going to create a new node. 138 00:06:13,990 --> 00:06:16,990 Some node that is going to have a value and the next pointer. 139 00:06:16,990 --> 00:06:21,160 And you're going to hash that word using a hash function to obtain a hash value, 140 00:06:21,160 --> 00:06:24,850 and index into your hash table to determine which of those linked lists 141 00:06:24,850 --> 00:06:28,390 you should use when you're inserting this node into a linked list. 142 00:06:28,390 --> 00:06:31,030 And then, finally, you're going to take each of those words 143 00:06:31,030 --> 00:06:34,320 and insert that node into the hash table at the location 144 00:06:34,320 --> 00:06:36,680 given by your hash function. 145 00:06:36,680 --> 00:06:39,830 So let's take a look now at each of these steps one at a time. 146 00:06:39,830 --> 00:06:43,000 Let's start with how you're going to open the dictionary file. 147 00:06:43,000 --> 00:06:45,040 Recall that the argument to the load function 148 00:06:45,040 --> 00:06:48,550 is a char star or string representing the dictionary file that you're 149 00:06:48,550 --> 00:06:50,290 going to use for the loading. 150 00:06:50,290 --> 00:06:53,708 You can use the f-open function to open up that file. 151 00:06:53,708 --> 00:06:55,750 And then, make sure to check to see if the return 152 00:06:55,750 --> 00:07:00,070 value is null to make sure that you were able to successfully open up the file. 153 00:07:00,070 --> 00:07:02,560 If you weren't able to successfully open up the file, 154 00:07:02,560 --> 00:07:04,780 your load function should return false. 155 00:07:04,780 --> 00:07:09,010 Otherwise, you'll be able to keep going to the next step of the load function. 156 00:07:09,010 --> 00:07:12,250 The next step is to read strings from the file. 157 00:07:12,250 --> 00:07:14,170 And for this, you can use a function called 158 00:07:14,170 --> 00:07:18,640 fscanf, short for scanning from a file, where the first argument is 159 00:07:18,640 --> 00:07:21,790 going to be that file pointer, the result of calling f-open 160 00:07:21,790 --> 00:07:23,270 on the dictionary file. 161 00:07:23,270 --> 00:07:27,010 The second argument is %s, meaning you want to read in a string. 162 00:07:27,010 --> 00:07:31,130 And the third argument is word, which is going to be some character array. 163 00:07:31,130 --> 00:07:33,520 Some place where you're going to read the word into. 164 00:07:33,520 --> 00:07:35,312 In other words, some character array inside 165 00:07:35,312 --> 00:07:38,380 of memory where you can then access all of the individual characters 166 00:07:38,380 --> 00:07:40,625 of that word after you've read it from the file. 167 00:07:40,625 --> 00:07:43,000 And you're going to want to repeat this again, and again, 168 00:07:43,000 --> 00:07:47,260 and again for each of the words in the dictionary, calling fscanf on the file 169 00:07:47,260 --> 00:07:52,960 over and over getting one word at a time until fscanf returns EOF. 170 00:07:52,960 --> 00:07:56,870 And that will happen once fscanf reaches the end of the file. 171 00:07:56,870 --> 00:07:59,590 So when fscanf returns EOF, that's your sign 172 00:07:59,590 --> 00:08:02,410 that you finished reading all of the words in the dictionary. 173 00:08:02,410 --> 00:08:04,750 But you'll likely want to have some sort of loop that's 174 00:08:04,750 --> 00:08:07,300 running fscanf again and again on one word, 175 00:08:07,300 --> 00:08:11,350 after word, after word until you get to the end of the file. 176 00:08:11,350 --> 00:08:14,390 What are you going to do for each of those individual words? 177 00:08:14,390 --> 00:08:16,480 Well, the next step is going to be to create 178 00:08:16,480 --> 00:08:19,720 a new node that is going to store that particular word 179 00:08:19,720 --> 00:08:21,280 inside of your hash table. 180 00:08:21,280 --> 00:08:22,280 How to do that? 181 00:08:22,280 --> 00:08:25,630 We'll recall that the malloc function you can use to allocate memory. 182 00:08:25,630 --> 00:08:28,690 So you'll want to allocate enough memory to store a new node. 183 00:08:28,690 --> 00:08:31,810 And again, be sure to check if malloc returns null or not, 184 00:08:31,810 --> 00:08:35,159 because if malloc doesn't have enough memory to be able to store the node, 185 00:08:35,159 --> 00:08:39,070 it will return null and your load function should return false. 186 00:08:39,070 --> 00:08:40,900 Once you've created that node, you're going 187 00:08:40,900 --> 00:08:43,780 to want to copy the word into that node using 188 00:08:43,780 --> 00:08:47,020 the strcpy function, a function which will take a string 189 00:08:47,020 --> 00:08:50,803 and copy it from one location into another location. 190 00:08:50,803 --> 00:08:52,720 Now that we've created the node, the next step 191 00:08:52,720 --> 00:08:54,910 is to figure out how it is we're going to insert 192 00:08:54,910 --> 00:08:56,930 this node into the hash table. 193 00:08:56,930 --> 00:09:00,160 Recall again that a hash table is just an array of linked lists. 194 00:09:00,160 --> 00:09:02,200 And we now need to determine which linked 195 00:09:02,200 --> 00:09:04,480 list we should use when we're determining 196 00:09:04,480 --> 00:09:07,810 where to place this particular node that we've just created. 197 00:09:07,810 --> 00:09:10,480 And to do that, we're going to hash the word. 198 00:09:10,480 --> 00:09:13,600 You're going to use the hash function, which is defined in dictionary 199 00:09:13,600 --> 00:09:16,240 don't C, which will be a function that takes a string 200 00:09:16,240 --> 00:09:20,590 and returns an index, some number that you can use to index into your linked 201 00:09:20,590 --> 00:09:21,400 list. 202 00:09:21,400 --> 00:09:23,680 Right now, we've given you a hash function that just 203 00:09:23,680 --> 00:09:25,812 returns 0 for every particular input. 204 00:09:25,812 --> 00:09:27,520 And later in this assignment, we're going 205 00:09:27,520 --> 00:09:30,550 to replace this hash function with a better hash function. 206 00:09:30,550 --> 00:09:34,840 But for now, you can just call upon the hash function inside of load 207 00:09:34,840 --> 00:09:37,480 to determine which index into the hash table 208 00:09:37,480 --> 00:09:41,170 you should use when you're inserting this new node. 209 00:09:41,170 --> 00:09:44,200 Once you've determined which linked list you should use, 210 00:09:44,200 --> 00:09:48,190 the next step is to actually insert that word into the linked list. 211 00:09:48,190 --> 00:09:49,430 How are you going to do that? 212 00:09:49,430 --> 00:09:52,480 Well, again, recall that the hash table is an array of linked lists. 213 00:09:52,480 --> 00:09:55,150 And you're going to want to index into the hash table 214 00:09:55,150 --> 00:09:59,045 to get the linked list that you want to use to add this particular word into. 215 00:09:59,045 --> 00:10:00,920 And in order to do that, you're going to have 216 00:10:00,920 --> 00:10:02,912 to add a new node to a linked list. 217 00:10:02,912 --> 00:10:05,120 So you'll want to be sure to make sure you're setting 218 00:10:05,120 --> 00:10:07,530 your pointers in the correct order. 219 00:10:07,530 --> 00:10:08,580 What do I mean by that? 220 00:10:08,580 --> 00:10:10,400 Well, let's take a look at an example. 221 00:10:10,400 --> 00:10:13,430 Here up at the top, imagine that head is a variable that 222 00:10:13,430 --> 00:10:15,500 points to the start of a linked list. 223 00:10:15,500 --> 00:10:16,760 The linked list is bat. 224 00:10:16,760 --> 00:10:17,870 And next is book. 225 00:10:17,870 --> 00:10:18,950 And next is banana. 226 00:10:18,950 --> 00:10:21,510 And banana here points to null, for example. 227 00:10:21,510 --> 00:10:25,400 And let's imagine that I have this new node, which is this blue mode 228 00:10:25,400 --> 00:10:26,510 that I've just malloced. 229 00:10:26,510 --> 00:10:28,650 And I want to add it to this linked list. 230 00:10:28,650 --> 00:10:32,488 So if I wanted to make this new node the new head of the linked list, 231 00:10:32,488 --> 00:10:34,280 one thing you might imagine that I would do 232 00:10:34,280 --> 00:10:38,030 is point head to point at this new node that I've just created. 233 00:10:38,030 --> 00:10:39,380 But what's just happened? 234 00:10:39,380 --> 00:10:43,100 Now that I've set head to be blue, this very first node inside 235 00:10:43,100 --> 00:10:46,130 of this linked list, I've now lost access effectively 236 00:10:46,130 --> 00:10:49,340 to all of the other nodes that were previously in the linked list. 237 00:10:49,340 --> 00:10:53,780 I have no way to get to them because both head and new node both point 238 00:10:53,780 --> 00:10:55,310 to just this blue node. 239 00:10:55,310 --> 00:10:58,050 And blue doesn't point to anything else. 240 00:10:58,050 --> 00:10:59,822 So what is the actual solution here. 241 00:10:59,822 --> 00:11:02,030 Well, the first thing that you're going to want to do 242 00:11:02,030 --> 00:11:03,920 is probably to take this new node-- 243 00:11:03,920 --> 00:11:05,210 in this case, blue-- 244 00:11:05,210 --> 00:11:10,070 and set its next pointer to be the first element in the linked list. 245 00:11:10,070 --> 00:11:13,110 Then after that, you can reset head-- in other words, 246 00:11:13,110 --> 00:11:14,780 the first element in the linked list-- 247 00:11:14,780 --> 00:11:17,280 to be the new node that you've just created. 248 00:11:17,280 --> 00:11:19,580 Now the beginning of the linked list is blue. 249 00:11:19,580 --> 00:11:22,580 And next comes bat, and then book, and banana. 250 00:11:22,580 --> 00:11:24,990 And we've been able to represent all of these words 251 00:11:24,990 --> 00:11:28,940 now together inside of a single linked list by adding our new node 252 00:11:28,940 --> 00:11:31,640 to the beginning of the linked list. 253 00:11:31,640 --> 00:11:35,030 And you're going to repeat this process for every word in the dictionary. 254 00:11:35,030 --> 00:11:38,420 Reading that word using fscanf, allocating enough memory in order 255 00:11:38,420 --> 00:11:41,750 to store that word, copying the word into the node, 256 00:11:41,750 --> 00:11:45,500 and then inserting that node into one of the linked lists in your hash table 257 00:11:45,500 --> 00:11:48,245 based on the result of calling your hash function. 258 00:11:48,245 --> 00:11:50,870 After you've done that for each of the words in the dictionary, 259 00:11:50,870 --> 00:11:53,300 you'll have successfully loaded all of your data 260 00:11:53,300 --> 00:11:55,940 into memory into a hash table such that you can now 261 00:11:55,940 --> 00:11:58,600 begin the spell checking process. 262 00:11:58,600 --> 00:11:59,585