1
00:00:00,000 --> 00:00:11,856

2
00:00:11,856 --> 00:00:13,050
>> ROB BOWDEN: Hi.

3
00:00:13,050 --> 00:00:16,210
I'm Rob, and let's hash
this solution out.

4
00:00:16,210 --> 00:00:20,070
So here we're going to implement
a general hash table.

5
00:00:20,070 --> 00:00:24,090
We see that the struct node of our hash
table is going to look like this.

6
00:00:24,090 --> 00:00:28,710
So it's going to have a char word
array of size length plus 1.

7
00:00:28,710 --> 00:00:32,259
Don't forget the 1 since the maximum
word in the dictionary is 45

8
00:00:32,259 --> 00:00:35,110
characters, and then we're going to
need one extra character for the

9
00:00:35,110 --> 00:00:37,070
backslash 0.

10
00:00:37,070 --> 00:00:40,870
>> And then our hash table in each
bucket is going to store a

11
00:00:40,870 --> 00:00:42,320
linked list of nodes.

12
00:00:42,320 --> 00:00:44,420
We're not doing linear probing here.

13
00:00:44,420 --> 00:00:48,430
And so in order to link to the next
element in the bucket, we need a

14
00:00:48,430 --> 00:00:51,110
struct node* next.

15
00:00:51,110 --> 00:00:53,090
So that's what a node looks like.

16
00:00:53,090 --> 00:00:56,180
Now, here is the declaration
of our hash table.

17
00:00:56,180 --> 00:01:01,910
It's going to have 16,384 buckets, but
that number doesn't really matter.

18
00:01:01,910 --> 00:01:05,450
And finally, we're going to have the
global variable hashtable_size, which

19
00:01:05,450 --> 00:01:08,640
is going to start off as 0, and it's
going to keep track of how many words

20
00:01:08,640 --> 00:01:10,080
were in our dictionary.

21
00:01:10,080 --> 00:01:10,760
All right.

22
00:01:10,760 --> 00:01:13,190
>> So let's take a look at load.

23
00:01:13,190 --> 00:01:16,390
So notice that load,
it returns a bool.

24
00:01:16,390 --> 00:01:20,530
You return true if it was successfully
loaded and false otherwise.

25
00:01:20,530 --> 00:01:23,990
And it takes a const char* star
dictionary, which is the dictionary

26
00:01:23,990 --> 00:01:25,280
that we want to open.

27
00:01:25,280 --> 00:01:27,170
So that's the first thing
we're going to do.

28
00:01:27,170 --> 00:01:30,420
We're going to fopen the dictionary for
reading, and we're going to have

29
00:01:30,420 --> 00:01:34,700
to make sure that it succeeded so if
it returned NULL, then we did not

30
00:01:34,700 --> 00:01:37,440
successfully open the dictionary
and we need to return false.

31
00:01:37,440 --> 00:01:41,580
>> But assuming that it did successfully
open, then we want to read the

32
00:01:41,580 --> 00:01:42,400
dictionary.

33
00:01:42,400 --> 00:01:46,210
So keep looping until we find some
reason to break out of this

34
00:01:46,210 --> 00:01:47,570
loop which we'll see.

35
00:01:47,570 --> 00:01:51,780
So keep looping, and now we're going
to malloc a single node.

36
00:01:51,780 --> 00:01:56,800
And of course, we need to error check
again so if mallocing did not succeed

37
00:01:56,800 --> 00:02:00,660
and we want to unload any node that we
happened to malloc before, close the

38
00:02:00,660 --> 00:02:02,590
dictionary and return false.

39
00:02:02,590 --> 00:02:07,440
>> But ignoring that, assuming we
succeeded, then we want to use fscanf

40
00:02:07,440 --> 00:02:12,400
to read a single word from our
dictionary into our node.

41
00:02:12,400 --> 00:02:17,310
So remember that entry->word is the char
word buffer of size length plus

42
00:02:17,310 --> 00:02:20,300
one that we're going to
store the word in.

43
00:02:20,300 --> 00:02:25,280
So fscanf is going to return 1 as long
as it was able to successfully read a

44
00:02:25,280 --> 00:02:26,750
word from the file.

45
00:02:26,750 --> 00:02:31,030
>> If either an error happens or we reach
the end of the file, it will not

46
00:02:31,030 --> 00:02:34,950
return 1 in which case if it does not
return 1, we're finally going to break

47
00:02:34,950 --> 00:02:37,280
out of this while loop.

48
00:02:37,280 --> 00:02:42,770
So we see that once we have successfully
read a word into

49
00:02:42,770 --> 00:02:48,270
entry->word, then we're going to hash
that word using our hash function.

50
00:02:48,270 --> 00:02:49,580
Let's take a look at
the hash function.

51
00:02:49,580 --> 00:02:52,430

52
00:02:52,430 --> 00:02:55,610
>> So you don't really need
to understand this.

53
00:02:55,610 --> 00:02:59,460
And actually, we just pulled this
hash function from the internet.

54
00:02:59,460 --> 00:03:04,010
The only thing you need to recognize is
that this takes a const char* word,

55
00:03:04,010 --> 00:03:08,960
so it's taking a string as input and
returning an unsigned int as output.

56
00:03:08,960 --> 00:03:12,360
So that's all a hash function is, is it
takes in an input, it gives you an

57
00:03:12,360 --> 00:03:14,490
index into the hash table.

58
00:03:14,490 --> 00:03:18,530
Notice that we're modding by NUM_BUCKETS
so the hash value returned

59
00:03:18,530 --> 00:03:21,730
actually is an index into the hash table
and doesn't index beyond the

60
00:03:21,730 --> 00:03:24,320
bounds of the array.

61
00:03:24,320 --> 00:03:27,855
>> So given that hash function, we're going
to hash the word that we read

62
00:03:27,855 --> 00:03:31,700
from the dictionary and then we're going
to use that has to insert the

63
00:03:31,700 --> 00:03:33,750
entry into the hash table.

64
00:03:33,750 --> 00:03:38,830
Now, hashtable hash is the current
linked list in the hash table, and

65
00:03:38,830 --> 00:03:41,410
it's very possible that is just NULL.

66
00:03:41,410 --> 00:03:45,640
We want to insert our entry at the
beginning of this linked list, and so

67
00:03:45,640 --> 00:03:48,910
we're going to have our current entry
point to what the hash table currently

68
00:03:48,910 --> 00:03:54,030
points to and then we're going to store
in the hash table at the hash

69
00:03:54,030 --> 00:03:55,350
the current entry.

70
00:03:55,350 --> 00:03:59,320
>> So these two lines successfully insert
the entry at the beginning of the

71
00:03:59,320 --> 00:04:02,270
linked list at that index
in the hash table.

72
00:04:02,270 --> 00:04:04,900
Once we're done with that, we know
that we found another word in the

73
00:04:04,900 --> 00:04:07,800
dictionary and we increment again.

74
00:04:07,800 --> 00:04:13,960
So we keep doing that until fscanf
finally returns something non 1 at

75
00:04:13,960 --> 00:04:18,560
which point remember that we need to
free entry, so up here, we malloced an

76
00:04:18,560 --> 00:04:21,329
entry and we tried to read something
from the dictionary.

77
00:04:21,329 --> 00:04:24,110
And we did not successfully read
something from the dictionary in which

78
00:04:24,110 --> 00:04:27,440
case we need to free the entry that we
never actually put into the hash table

79
00:04:27,440 --> 00:04:29,110
and finally break.

80
00:04:29,110 --> 00:04:32,750
>> Once we break out, we need to see, well,
did we break out because there

81
00:04:32,750 --> 00:04:35,840
was an error reading from the file, or
did we break out because we reached

82
00:04:35,840 --> 00:04:37,120
the end of the file?

83
00:04:37,120 --> 00:04:40,150
If there was an error, then we want to
return false because load did not

84
00:04:40,150 --> 00:04:43,260
succeed, and in the process, we want to
unload all the words that we read

85
00:04:43,260 --> 00:04:45,670
in and close the dictionary file.

86
00:04:45,670 --> 00:04:48,740
Assuming we did succeed, then we just
still need to close the dictionary

87
00:04:48,740 --> 00:04:51,970
file, and finally return true since
we've successfully loaded the

88
00:04:51,970 --> 00:04:53,040
dictionary.

89
00:04:53,040 --> 00:04:54,420
And that's it for load.

90
00:04:54,420 --> 00:04:59,020
>> So now check, given a loaded hash table,
is going to look like this.

91
00:04:59,020 --> 00:05:02,690
So check, it returns a bool, which
is going to indicate whether the

92
00:05:02,690 --> 00:05:07,530
passed-in char* word, whether the
passed-in string is in our dictionary.

93
00:05:07,530 --> 00:05:10,530
So if it is in the dictionary, if it is
in our hash table, we will return

94
00:05:10,530 --> 00:05:13,380
true, and if it's not,
we will return false.

95
00:05:13,380 --> 00:05:17,770
Given this passed-in word, we're
going to hash the word.

96
00:05:17,770 --> 00:05:22,020
>> Now, an important thing to recognize is
that in load, we knew that all of

97
00:05:22,020 --> 00:05:25,820
the words were going to be lower case,
but here, we're not so sure.

98
00:05:25,820 --> 00:05:29,510
If we take a look at our hash function,
our hash function actually

99
00:05:29,510 --> 00:05:32,700
is lowercasing each character
of the word.

100
00:05:32,700 --> 00:05:37,580
So regardless of the capitalization of
word, our hash function is going to

101
00:05:37,580 --> 00:05:42,270
return the same index for whatever the
capitalization is as it would have

102
00:05:42,270 --> 00:05:45,280
returned for a completely lowercase
version of the word.

103
00:05:45,280 --> 00:05:45,950
All right.

104
00:05:45,950 --> 00:05:47,410
>> So that's our index.

105
00:05:47,410 --> 00:05:49,790
It's the hash table for this word.

106
00:05:49,790 --> 00:05:52,940
Now, this for loop is going
to over the linked list

107
00:05:52,940 --> 00:05:55,000
that was at that index.

108
00:05:55,000 --> 00:05:59,630
So notice we are initializing entry
to point to that index.

109
00:05:59,630 --> 00:06:03,480
We're going to continue while entry does
not equal NULL, and remember that

110
00:06:03,480 --> 00:06:08,350
updating the pointer in our linked list
entry equals entry->next, so have

111
00:06:08,350 --> 00:06:13,840
our current entry point to the
next item in linked list.

112
00:06:13,840 --> 00:06:14,400
All right.

113
00:06:14,400 --> 00:06:19,150
>> So for each entry in the linked list,
we're going to use strcasecmp.

114
00:06:19,150 --> 00:06:23,780
It's not strcmp because once again, we
want to do things case insensitively.

115
00:06:23,780 --> 00:06:28,830
So we use strcasecmp to compare the word
that was passed to this function

116
00:06:28,830 --> 00:06:31,860
against the word that
is in this entry.

117
00:06:31,860 --> 00:06:35,570
If it returns 0, that means there was
a match, in which case we want to

118
00:06:35,570 --> 00:06:36,630
return true.

119
00:06:36,630 --> 00:06:39,590
We successfully found the
word in our hash table.

120
00:06:39,590 --> 00:06:43,040
>> If there was not a match, then we're
going to loop again and look at the

121
00:06:43,040 --> 00:06:43,990
next entry.

122
00:06:43,990 --> 00:06:47,640
And we'll continue looping while there
are entries in this linked list.

123
00:06:47,640 --> 00:06:50,160
What happens if we break
out of this for loop?

124
00:06:50,160 --> 00:06:55,110
That means we did not find an entry that
matched this word, in which case

125
00:06:55,110 --> 00:07:00,220
we return false to indicate that our
hash table did not contain this word.

126
00:07:00,220 --> 00:07:01,910
And that's it for check.

127
00:07:01,910 --> 00:07:02,540
All right.

128
00:07:02,540 --> 00:07:04,790
>> So let's take a look at size.

129
00:07:04,790 --> 00:07:09,280
Now, size is going to be pretty simple
since remember in load, for each word

130
00:07:09,280 --> 00:07:12,880
we found we incremented a global
variable hashtable_size.

131
00:07:12,880 --> 00:07:15,830
So the size function is just
going to return that global

132
00:07:15,830 --> 00:07:18,150
variable, and that's it.

133
00:07:18,150 --> 00:07:22,300
>> Now finally, we need to unload the
dictionary once everything's done.

134
00:07:22,300 --> 00:07:25,340
So how are we going to do that?

135
00:07:25,340 --> 00:07:30,440
Right here, we're looping over all
buckets of our hash table.

136
00:07:30,440 --> 00:07:33,240
So there are NUM_BUCKETS buckets.

137
00:07:33,240 --> 00:07:37,490
And for each linked list in our hash
table, we're going to loop over the

138
00:07:37,490 --> 00:07:41,070
entirety of the linked list
freeing each element.

139
00:07:41,070 --> 00:07:46,070
Now, we need to be careful, so here we
have a temporary variable that's

140
00:07:46,070 --> 00:07:49,740
storing the pointer to the next
element in the linked list.

141
00:07:49,740 --> 00:07:52,140
And then we're going to free
the current element.

142
00:07:52,140 --> 00:07:55,990
>> We need to be sure we do this since we
can't just free the current element

143
00:07:55,990 --> 00:07:59,260
and then try to access the next pointer
since once we freed it the

144
00:07:59,260 --> 00:08:00,870
memory becomes invalid.

145
00:08:00,870 --> 00:08:04,990
So we need to keep around a pointer to
the next element, then we can free the

146
00:08:04,990 --> 00:08:08,360
current element, and then we can update
our current element to point to

147
00:08:08,360 --> 00:08:09,590
the next element.

148
00:08:09,590 --> 00:08:12,770
>> We'll loop while there are elements
in this linked list.

149
00:08:12,770 --> 00:08:16,450
We'll do that for all linked lists in
the hash table, and once we're done

150
00:08:16,450 --> 00:08:20,180
with that, we've completely unloaded
the hash table, and we're done.

151
00:08:20,180 --> 00:08:24,050
So it's impossible for unloads to ever
return false, and when we're done, we

152
00:08:24,050 --> 00:08:25,320
just return true.

153
00:08:25,320 --> 00:08:27,563