1 00:00:00,000 --> 00:00:11,856 2 00:00:11,856 --> 00:00:13,050 >> ROB BOWDEN: Hi. 3 00:00:13,050 --> 00:00:16,210 I'm Rob, and let's hash this solution out. 4 00:00:16,210 --> 00:00:20,070 So here we're going to implement a general hash table. 5 00:00:20,070 --> 00:00:24,090 We see that the struct node of our hash table is going to look like this. 6 00:00:24,090 --> 00:00:28,710 So it's going to have a char word array of size length plus 1. 7 00:00:28,710 --> 00:00:32,259 Don't forget the 1 since the maximum word in the dictionary is 45 8 00:00:32,259 --> 00:00:35,110 characters, and then we're going to need one extra character for the 9 00:00:35,110 --> 00:00:37,070 backslash 0. 10 00:00:37,070 --> 00:00:40,870 >> And then our hash table in each bucket is going to store a 11 00:00:40,870 --> 00:00:42,320 linked list of nodes. 12 00:00:42,320 --> 00:00:44,420 We're not doing linear probing here. 13 00:00:44,420 --> 00:00:48,430 And so in order to link to the next element in the bucket, we need a 14 00:00:48,430 --> 00:00:51,110 struct node* next. 15 00:00:51,110 --> 00:00:53,090 So that's what a node looks like. 16 00:00:53,090 --> 00:00:56,180 Now, here is the declaration of our hash table. 17 00:00:56,180 --> 00:01:01,910 It's going to have 16,384 buckets, but that number doesn't really matter. 18 00:01:01,910 --> 00:01:05,450 And finally, we're going to have the global variable hashtable_size, which 19 00:01:05,450 --> 00:01:08,640 is going to start off as 0, and it's going to keep track of how many words 20 00:01:08,640 --> 00:01:10,080 were in our dictionary. 21 00:01:10,080 --> 00:01:10,760 All right. 22 00:01:10,760 --> 00:01:13,190 >> So let's take a look at load. 23 00:01:13,190 --> 00:01:16,390 So notice that load, it returns a bool. 24 00:01:16,390 --> 00:01:20,530 You return true if it was successfully loaded and false otherwise. 25 00:01:20,530 --> 00:01:23,990 And it takes a const char* star dictionary, which is the dictionary 26 00:01:23,990 --> 00:01:25,280 that we want to open. 27 00:01:25,280 --> 00:01:27,170 So that's the first thing we're going to do. 28 00:01:27,170 --> 00:01:30,420 We're going to fopen the dictionary for reading, and we're going to have 29 00:01:30,420 --> 00:01:34,700 to make sure that it succeeded so if it returned NULL, then we did not 30 00:01:34,700 --> 00:01:37,440 successfully open the dictionary and we need to return false. 31 00:01:37,440 --> 00:01:41,580 >> But assuming that it did successfully open, then we want to read the 32 00:01:41,580 --> 00:01:42,400 dictionary. 33 00:01:42,400 --> 00:01:46,210 So keep looping until we find some reason to break out of this 34 00:01:46,210 --> 00:01:47,570 loop which we'll see. 35 00:01:47,570 --> 00:01:51,780 So keep looping, and now we're going to malloc a single node. 36 00:01:51,780 --> 00:01:56,800 And of course, we need to error check again so if mallocing did not succeed 37 00:01:56,800 --> 00:02:00,660 and we want to unload any node that we happened to malloc before, close the 38 00:02:00,660 --> 00:02:02,590 dictionary and return false. 39 00:02:02,590 --> 00:02:07,440 >> But ignoring that, assuming we succeeded, then we want to use fscanf 40 00:02:07,440 --> 00:02:12,400 to read a single word from our dictionary into our node. 41 00:02:12,400 --> 00:02:17,310 So remember that entry->word is the char word buffer of size length plus 42 00:02:17,310 --> 00:02:20,300 one that we're going to store the word in. 43 00:02:20,300 --> 00:02:25,280 So fscanf is going to return 1 as long as it was able to successfully read a 44 00:02:25,280 --> 00:02:26,750 word from the file. 45 00:02:26,750 --> 00:02:31,030 >> If either an error happens or we reach the end of the file, it will not 46 00:02:31,030 --> 00:02:34,950 return 1 in which case if it does not return 1, we're finally going to break 47 00:02:34,950 --> 00:02:37,280 out of this while loop. 48 00:02:37,280 --> 00:02:42,770 So we see that once we have successfully read a word into 49 00:02:42,770 --> 00:02:48,270 entry->word, then we're going to hash that word using our hash function. 50 00:02:48,270 --> 00:02:49,580 Let's take a look at the hash function. 51 00:02:49,580 --> 00:02:52,430 52 00:02:52,430 --> 00:02:55,610 >> So you don't really need to understand this. 53 00:02:55,610 --> 00:02:59,460 And actually, we just pulled this hash function from the internet. 54 00:02:59,460 --> 00:03:04,010 The only thing you need to recognize is that this takes a const char* word, 55 00:03:04,010 --> 00:03:08,960 so it's taking a string as input and returning an unsigned int as output. 56 00:03:08,960 --> 00:03:12,360 So that's all a hash function is, is it takes in an input, it gives you an 57 00:03:12,360 --> 00:03:14,490 index into the hash table. 58 00:03:14,490 --> 00:03:18,530 Notice that we're modding by NUM_BUCKETS so the hash value returned 59 00:03:18,530 --> 00:03:21,730 actually is an index into the hash table and doesn't index beyond the 60 00:03:21,730 --> 00:03:24,320 bounds of the array. 61 00:03:24,320 --> 00:03:27,855 >> So given that hash function, we're going to hash the word that we read 62 00:03:27,855 --> 00:03:31,700 from the dictionary and then we're going to use that has to insert the 63 00:03:31,700 --> 00:03:33,750 entry into the hash table. 64 00:03:33,750 --> 00:03:38,830 Now, hashtable hash is the current linked list in the hash table, and 65 00:03:38,830 --> 00:03:41,410 it's very possible that is just NULL. 66 00:03:41,410 --> 00:03:45,640 We want to insert our entry at the beginning of this linked list, and so 67 00:03:45,640 --> 00:03:48,910 we're going to have our current entry point to what the hash table currently 68 00:03:48,910 --> 00:03:54,030 points to and then we're going to store in the hash table at the hash 69 00:03:54,030 --> 00:03:55,350 the current entry. 70 00:03:55,350 --> 00:03:59,320 >> So these two lines successfully insert the entry at the beginning of the 71 00:03:59,320 --> 00:04:02,270 linked list at that index in the hash table. 72 00:04:02,270 --> 00:04:04,900 Once we're done with that, we know that we found another word in the 73 00:04:04,900 --> 00:04:07,800 dictionary and we increment again. 74 00:04:07,800 --> 00:04:13,960 So we keep doing that until fscanf finally returns something non 1 at 75 00:04:13,960 --> 00:04:18,560 which point remember that we need to free entry, so up here, we malloced an 76 00:04:18,560 --> 00:04:21,329 entry and we tried to read something from the dictionary. 77 00:04:21,329 --> 00:04:24,110 And we did not successfully read something from the dictionary in which 78 00:04:24,110 --> 00:04:27,440 case we need to free the entry that we never actually put into the hash table 79 00:04:27,440 --> 00:04:29,110 and finally break. 80 00:04:29,110 --> 00:04:32,750 >> Once we break out, we need to see, well, did we break out because there 81 00:04:32,750 --> 00:04:35,840 was an error reading from the file, or did we break out because we reached 82 00:04:35,840 --> 00:04:37,120 the end of the file? 83 00:04:37,120 --> 00:04:40,150 If there was an error, then we want to return false because load did not 84 00:04:40,150 --> 00:04:43,260 succeed, and in the process, we want to unload all the words that we read 85 00:04:43,260 --> 00:04:45,670 in and close the dictionary file. 86 00:04:45,670 --> 00:04:48,740 Assuming we did succeed, then we just still need to close the dictionary 87 00:04:48,740 --> 00:04:51,970 file, and finally return true since we've successfully loaded the 88 00:04:51,970 --> 00:04:53,040 dictionary. 89 00:04:53,040 --> 00:04:54,420 And that's it for load. 90 00:04:54,420 --> 00:04:59,020 >> So now check, given a loaded hash table, is going to look like this. 91 00:04:59,020 --> 00:05:02,690 So check, it returns a bool, which is going to indicate whether the 92 00:05:02,690 --> 00:05:07,530 passed-in char* word, whether the passed-in string is in our dictionary. 93 00:05:07,530 --> 00:05:10,530 So if it is in the dictionary, if it is in our hash table, we will return 94 00:05:10,530 --> 00:05:13,380 true, and if it's not, we will return false. 95 00:05:13,380 --> 00:05:17,770 Given this passed-in word, we're going to hash the word. 96 00:05:17,770 --> 00:05:22,020 >> Now, an important thing to recognize is that in load, we knew that all of 97 00:05:22,020 --> 00:05:25,820 the words were going to be lower case, but here, we're not so sure. 98 00:05:25,820 --> 00:05:29,510 If we take a look at our hash function, our hash function actually 99 00:05:29,510 --> 00:05:32,700 is lowercasing each character of the word. 100 00:05:32,700 --> 00:05:37,580 So regardless of the capitalization of word, our hash function is going to 101 00:05:37,580 --> 00:05:42,270 return the same index for whatever the capitalization is as it would have 102 00:05:42,270 --> 00:05:45,280 returned for a completely lowercase version of the word. 103 00:05:45,280 --> 00:05:45,950 All right. 104 00:05:45,950 --> 00:05:47,410 >> So that's our index. 105 00:05:47,410 --> 00:05:49,790 It's the hash table for this word. 106 00:05:49,790 --> 00:05:52,940 Now, this for loop is going to over the linked list 107 00:05:52,940 --> 00:05:55,000 that was at that index. 108 00:05:55,000 --> 00:05:59,630 So notice we are initializing entry to point to that index. 109 00:05:59,630 --> 00:06:03,480 We're going to continue while entry does not equal NULL, and remember that 110 00:06:03,480 --> 00:06:08,350 updating the pointer in our linked list entry equals entry->next, so have 111 00:06:08,350 --> 00:06:13,840 our current entry point to the next item in linked list. 112 00:06:13,840 --> 00:06:14,400 All right. 113 00:06:14,400 --> 00:06:19,150 >> So for each entry in the linked list, we're going to use strcasecmp. 114 00:06:19,150 --> 00:06:23,780 It's not strcmp because once again, we want to do things case insensitively. 115 00:06:23,780 --> 00:06:28,830 So we use strcasecmp to compare the word that was passed to this function 116 00:06:28,830 --> 00:06:31,860 against the word that is in this entry. 117 00:06:31,860 --> 00:06:35,570 If it returns 0, that means there was a match, in which case we want to 118 00:06:35,570 --> 00:06:36,630 return true. 119 00:06:36,630 --> 00:06:39,590 We successfully found the word in our hash table. 120 00:06:39,590 --> 00:06:43,040 >> If there was not a match, then we're going to loop again and look at the 121 00:06:43,040 --> 00:06:43,990 next entry. 122 00:06:43,990 --> 00:06:47,640 And we'll continue looping while there are entries in this linked list. 123 00:06:47,640 --> 00:06:50,160 What happens if we break out of this for loop? 124 00:06:50,160 --> 00:06:55,110 That means we did not find an entry that matched this word, in which case 125 00:06:55,110 --> 00:07:00,220 we return false to indicate that our hash table did not contain this word. 126 00:07:00,220 --> 00:07:01,910 And that's it for check. 127 00:07:01,910 --> 00:07:02,540 All right. 128 00:07:02,540 --> 00:07:04,790 >> So let's take a look at size. 129 00:07:04,790 --> 00:07:09,280 Now, size is going to be pretty simple since remember in load, for each word 130 00:07:09,280 --> 00:07:12,880 we found we incremented a global variable hashtable_size. 131 00:07:12,880 --> 00:07:15,830 So the size function is just going to return that global 132 00:07:15,830 --> 00:07:18,150 variable, and that's it. 133 00:07:18,150 --> 00:07:22,300 >> Now finally, we need to unload the dictionary once everything's done. 134 00:07:22,300 --> 00:07:25,340 So how are we going to do that? 135 00:07:25,340 --> 00:07:30,440 Right here, we're looping over all buckets of our hash table. 136 00:07:30,440 --> 00:07:33,240 So there are NUM_BUCKETS buckets. 137 00:07:33,240 --> 00:07:37,490 And for each linked list in our hash table, we're going to loop over the 138 00:07:37,490 --> 00:07:41,070 entirety of the linked list freeing each element. 139 00:07:41,070 --> 00:07:46,070 Now, we need to be careful, so here we have a temporary variable that's 140 00:07:46,070 --> 00:07:49,740 storing the pointer to the next element in the linked list. 141 00:07:49,740 --> 00:07:52,140 And then we're going to free the current element. 142 00:07:52,140 --> 00:07:55,990 >> We need to be sure we do this since we can't just free the current element 143 00:07:55,990 --> 00:07:59,260 and then try to access the next pointer since once we freed it the 144 00:07:59,260 --> 00:08:00,870 memory becomes invalid. 145 00:08:00,870 --> 00:08:04,990 So we need to keep around a pointer to the next element, then we can free the 146 00:08:04,990 --> 00:08:08,360 current element, and then we can update our current element to point to 147 00:08:08,360 --> 00:08:09,590 the next element. 148 00:08:09,590 --> 00:08:12,770 >> We'll loop while there are elements in this linked list. 149 00:08:12,770 --> 00:08:16,450 We'll do that for all linked lists in the hash table, and once we're done 150 00:08:16,450 --> 00:08:20,180 with that, we've completely unloaded the hash table, and we're done. 151 00:08:20,180 --> 00:08:24,050 So it's impossible for unloads to ever return false, and when we're done, we 152 00:08:24,050 --> 00:08:25,320 just return true. 153 00:08:25,320 --> 00:08:27,563