1
00:00:00,000 --> 00:00:00,360


2
00:00:00,360 --> 00:00:02,310
SPEAKER: Your first
task in speller is going

3
00:00:02,310 --> 00:00:04,500
to be to implement the
load function, which

4
00:00:04,500 --> 00:00:09,000
is responsible for taking the dictionary
and loading it into a hash table.

5
00:00:09,000 --> 00:00:10,230
How is this going to work?

6
00:00:10,230 --> 00:00:14,340
Well, the load function will take as
its argument a char star, or a string,

7
00:00:14,340 --> 00:00:16,440
which is the dictionary
file that you're going

8
00:00:16,440 --> 00:00:18,750
to have to open up
and read from in order

9
00:00:18,750 --> 00:00:21,510
to load all this data
into your hash table.

10
00:00:21,510 --> 00:00:23,100
It's going to return a Boolean value.

11
00:00:23,100 --> 00:00:27,300
True if you were able to successfully
load all the data into your hash table,

12
00:00:27,300 --> 00:00:29,370
and false if there was
some sort of memory error,

13
00:00:29,370 --> 00:00:31,270
like not enough memory
available to allocate,

14
00:00:31,270 --> 00:00:34,060
for example, or a file
that couldn't be opened.

15
00:00:34,060 --> 00:00:36,400
So let's take a look at how
you're going to do this.

16
00:00:36,400 --> 00:00:38,483
Well, ultimately, when you
open up this dictionary

17
00:00:38,483 --> 00:00:40,530
file and store all of
the words in memory,

18
00:00:40,530 --> 00:00:43,140
you're going to store them
all inside of a hash table.

19
00:00:43,140 --> 00:00:47,010
And recall that a hash table is really
just an array of individual linked

20
00:00:47,010 --> 00:00:47,820
lists.

21
00:00:47,820 --> 00:00:50,520
And the way you determine
which of those linked lists

22
00:00:50,520 --> 00:00:52,740
you're going to insert
a word into is based

23
00:00:52,740 --> 00:00:55,290
on a hash function, a
function that in this case

24
00:00:55,290 --> 00:00:57,442
is going to take input,
a string, or a word

25
00:00:57,442 --> 00:00:59,400
that you're going to
include in the dictionary.

26
00:00:59,400 --> 00:01:01,770
And the output is
going to be some number

27
00:01:01,770 --> 00:01:04,080
that you're able to generate
from that input, which

28
00:01:04,080 --> 00:01:06,720
is going to correspond to
which of the linked lists

29
00:01:06,720 --> 00:01:10,320
you want to place this
particular word into.

30
00:01:10,320 --> 00:01:13,740
Let's take a closer look at hash tables
to get a better sense for how they work

31
00:01:13,740 --> 00:01:16,860
and how they're going to be
useful for this load function.

32
00:01:16,860 --> 00:01:19,140
Here, for example, I have
five words that I would

33
00:01:19,140 --> 00:01:21,330
like to store inside of my hash table.

34
00:01:21,330 --> 00:01:23,550
Apple, banana, bat, car and book.

35
00:01:23,550 --> 00:01:27,210
And I could take all of these words and
just put them in a single linked list.

36
00:01:27,210 --> 00:01:29,030
But then I'd have a
pretty long linked list

37
00:01:29,030 --> 00:01:31,080
of five words that I
would have to traverse

38
00:01:31,080 --> 00:01:34,590
if I wanted to find whether any
one particular word was inside

39
00:01:34,590 --> 00:01:36,030
of my data structure.

40
00:01:36,030 --> 00:01:38,640
Instead, I can put
them into a hash table,

41
00:01:38,640 --> 00:01:41,100
creating several buckets so to speak.

42
00:01:41,100 --> 00:01:44,430
I'll create an A bucket for all
words that begin with the letter A,

43
00:01:44,430 --> 00:01:47,070
a B bucket for all words
that begin with the letter B,

44
00:01:47,070 --> 00:01:49,920
and a C bucket for all words
that begin with the letter C,

45
00:01:49,920 --> 00:01:53,670
and then associate each of those
words with a particular bucket.

46
00:01:53,670 --> 00:01:55,910
So, for example, apple
ends up in the A bucket.

47
00:01:55,910 --> 00:01:58,260
Bat, book, and banana
end up in the B bucket.

48
00:01:58,260 --> 00:02:00,270
Car ends up with the C bucket.

49
00:02:00,270 --> 00:02:05,070
What I've used here is a hash function,
a function that takes apple, or bat,

50
00:02:05,070 --> 00:02:09,419
or book as input, and outputs which of
the linked lists-- the A linked list,

51
00:02:09,419 --> 00:02:11,670
or the B linked list,
or the C linked list--

52
00:02:11,670 --> 00:02:14,700
that I should place this
particular word into.

53
00:02:14,700 --> 00:02:17,550
So what exactly is a hash function?

54
00:02:17,550 --> 00:02:20,970
Well, a hash function in this case
is going to take a word as input,

55
00:02:20,970 --> 00:02:23,370
and it's going to output
a number corresponding

56
00:02:23,370 --> 00:02:27,600
to which bucket or element within this
array we want to store the word in.

57
00:02:27,600 --> 00:02:29,820
Because recall that a
hash table really is just

58
00:02:29,820 --> 00:02:33,390
an array where the very first linked
list in that array is at index 0,

59
00:02:33,390 --> 00:02:35,402
and then 1 and, then
2, and then so forth.

60
00:02:35,402 --> 00:02:37,110
And so our hash function
is going to take

61
00:02:37,110 --> 00:02:40,560
the word and output which
index into that array

62
00:02:40,560 --> 00:02:42,960
we should use to place
that particular word in.

63
00:02:42,960 --> 00:02:47,670
And at each index inside of the hash
table is going to be a linked list.

64
00:02:47,670 --> 00:02:49,265
And what is a linked list again?

65
00:02:49,265 --> 00:02:51,390
We'll recall that a linked
list is a way of storing

66
00:02:51,390 --> 00:02:55,770
data that's comprised of nodes,
where every node has a value as well

67
00:02:55,770 --> 00:02:57,930
as a pointer to the next node.

68
00:02:57,930 --> 00:03:00,330
So, for example, in the
case of our spell checker,

69
00:03:00,330 --> 00:03:03,360
we're going to have a hash table,
which has an array of linked lists.

70
00:03:03,360 --> 00:03:06,950
And each of those linked list
nodes is going to have a value--

71
00:03:06,950 --> 00:03:08,430
some place to store a word--

72
00:03:08,430 --> 00:03:11,910
which in this case is going to be a
character array, which can store words

73
00:03:11,910 --> 00:03:17,010
up to the maximum length of any word,
as well as a pointer to the next word.

74
00:03:17,010 --> 00:03:18,190
What does this look like?

75
00:03:18,190 --> 00:03:21,040
Well, if you look at the
top of dictionary dot C,

76
00:03:21,040 --> 00:03:22,980
you'll see the definition
of a node here.

77
00:03:22,980 --> 00:03:26,130
We've typedef'd a new node to
create a new type called node, where

78
00:03:26,130 --> 00:03:29,130
every node has a word,
which is a character array,

79
00:03:29,130 --> 00:03:31,320
as well as a pointer
that is going to give us

80
00:03:31,320 --> 00:03:34,900
the address of the next
node in this linked list.

81
00:03:34,900 --> 00:03:38,040
So our hash table is going to give
us the first node in the linked list.

82
00:03:38,040 --> 00:03:41,125
And from there, we can get to the
next node, and the node after that,

83
00:03:41,125 --> 00:03:44,250
so on and so forth, until we get to
the end of the linked list, where we're

84
00:03:44,250 --> 00:03:46,890
just going to find the value null.

85
00:03:46,890 --> 00:03:49,380
Then, immediately below
that in dictionary dot C,

86
00:03:49,380 --> 00:03:50,880
you'll also notice this.

87
00:03:50,880 --> 00:03:54,010
We've defined const
int N to be equal to 1.

88
00:03:54,010 --> 00:03:56,200
In other words, we defined
a constant integer,

89
00:03:56,200 --> 00:03:58,800
which is called N, which
we've initially set to 1.

90
00:03:58,800 --> 00:04:02,370
And now said that table is going
to represent our hash table.

91
00:04:02,370 --> 00:04:05,460
And it's going to be an
array of node pointers.

92
00:04:05,460 --> 00:04:09,540
An array where every element in
that array is a pointer to a node.

93
00:04:09,540 --> 00:04:12,990
Right now, this hash table has
only one element in the array.

94
00:04:12,990 --> 00:04:15,670
In other words, there's only
one bucket in the hash table.

95
00:04:15,670 --> 00:04:19,149
So you'll likely want to change
N. Set it to some higher value

96
00:04:19,149 --> 00:04:21,060
so your hash table
can have more buckets,

97
00:04:21,060 --> 00:04:23,970
and therefore spread data
out a little bit more evenly.

98
00:04:23,970 --> 00:04:26,920
Initially, though, your hash
table is going to be empty.

99
00:04:26,920 --> 00:04:30,570
So how are you going to be able to
add more data to that hash table?

100
00:04:30,570 --> 00:04:34,140
Well, to do so, you're going to need
to allocate memory for a new node,

101
00:04:34,140 --> 00:04:36,360
and then put some data into that node.

102
00:04:36,360 --> 00:04:38,310
In order to allocate
memory for a new node,

103
00:04:38,310 --> 00:04:40,560
I might have some code that
looks something like this.

104
00:04:40,560 --> 00:04:43,680
Node star N equals malloc size of node.

105
00:04:43,680 --> 00:04:45,840
Malloc here means allocate some memory.

106
00:04:45,840 --> 00:04:47,730
Ask for some memory from the computer.

107
00:04:47,730 --> 00:04:50,040
And how much memory
do I want to allocate?

108
00:04:50,040 --> 00:04:53,040
Well, size of node is going to
be the number of bytes of memory

109
00:04:53,040 --> 00:04:55,620
that I need in order to restore a node.

110
00:04:55,620 --> 00:04:58,620
And so malloc size of node is
going to give me just enough memory

111
00:04:58,620 --> 00:05:00,010
to be able to store a node.

112
00:05:00,010 --> 00:05:03,700
And I'm going to store the
address of that node inside of N.

113
00:05:03,700 --> 00:05:05,920
How do I then put data into that node?

114
00:05:05,920 --> 00:05:08,450
Well, remember that a node
has two properties to it.

115
00:05:08,450 --> 00:05:11,950
It has the word itself, as well
as a pointer to the next node.

116
00:05:11,950 --> 00:05:16,600
So I might say, strcpy N
arrow word, and then hello.

117
00:05:16,600 --> 00:05:17,770
What is this going to do?

118
00:05:17,770 --> 00:05:19,990
Well, strcpy is short for string copy.

119
00:05:19,990 --> 00:05:23,530
It's going to copy a string from
a source into a destination.

120
00:05:23,530 --> 00:05:27,880
And in this case, it's going to copy
the word hello into the character array

121
00:05:27,880 --> 00:05:31,780
N arrow word, which is the
word field of this node

122
00:05:31,780 --> 00:05:33,430
that I've just created some space for.

123
00:05:33,430 --> 00:05:36,490
I've copied the word hello
into that space of memory.

124
00:05:36,490 --> 00:05:37,930
In order to set the next pointer--

125
00:05:37,930 --> 00:05:40,570
if I knew what the next node
was, I could set it right away.

126
00:05:40,570 --> 00:05:43,000
But if I don't know, I
can just set it to null.

127
00:05:43,000 --> 00:05:45,460
N arrow next equals
null means right now,

128
00:05:45,460 --> 00:05:48,160
nothing comes after
this particular node.

129
00:05:48,160 --> 00:05:50,290
Of course, in our linked
list, some of these nodes

130
00:05:50,290 --> 00:05:54,190
are going to have other nodes that
are the next node in the sequence

131
00:05:54,190 --> 00:05:56,990
of elements inside of the linked list.

132
00:05:56,990 --> 00:06:00,202
So what exactly do you need to do
now inside of this load function?

133
00:06:00,202 --> 00:06:02,410
Well, the first thing that
you're going to have to do

134
00:06:02,410 --> 00:06:04,630
is open up this dictionary file.

135
00:06:04,630 --> 00:06:08,020
Next, you're going to read strings
from that file one at a time.

136
00:06:08,020 --> 00:06:10,840
One word in the dictionary at a time.

137
00:06:10,840 --> 00:06:13,990
For each of those words, you're
going to create a new node.

138
00:06:13,990 --> 00:06:16,990
Some node that is going to have
a value and the next pointer.

139
00:06:16,990 --> 00:06:21,160
And you're going to hash that word using
a hash function to obtain a hash value,

140
00:06:21,160 --> 00:06:24,850
and index into your hash table to
determine which of those linked lists

141
00:06:24,850 --> 00:06:28,390
you should use when you're inserting
this node into a linked list.

142
00:06:28,390 --> 00:06:31,030
And then, finally, you're going
to take each of those words

143
00:06:31,030 --> 00:06:34,320
and insert that node into the
hash table at the location

144
00:06:34,320 --> 00:06:36,680
given by your hash function.

145
00:06:36,680 --> 00:06:39,830
So let's take a look now at each
of these steps one at a time.

146
00:06:39,830 --> 00:06:43,000
Let's start with how you're going
to open the dictionary file.

147
00:06:43,000 --> 00:06:45,040
Recall that the argument
to the load function

148
00:06:45,040 --> 00:06:48,550
is a char star or string representing
the dictionary file that you're

149
00:06:48,550 --> 00:06:50,290
going to use for the loading.

150
00:06:50,290 --> 00:06:53,708
You can use the f-open
function to open up that file.

151
00:06:53,708 --> 00:06:55,750
And then, make sure to
check to see if the return

152
00:06:55,750 --> 00:07:00,070
value is null to make sure that you were
able to successfully open up the file.

153
00:07:00,070 --> 00:07:02,560
If you weren't able to
successfully open up the file,

154
00:07:02,560 --> 00:07:04,780
your load function should return false.

155
00:07:04,780 --> 00:07:09,010
Otherwise, you'll be able to keep going
to the next step of the load function.

156
00:07:09,010 --> 00:07:12,250
The next step is to read
strings from the file.

157
00:07:12,250 --> 00:07:14,170
And for this, you can
use a function called

158
00:07:14,170 --> 00:07:18,640
fscanf, short for scanning from a
file, where the first argument is

159
00:07:18,640 --> 00:07:21,790
going to be that file pointer,
the result of calling f-open

160
00:07:21,790 --> 00:07:23,270
on the dictionary file.

161
00:07:23,270 --> 00:07:27,010
The second argument is %s, meaning
you want to read in a string.

162
00:07:27,010 --> 00:07:31,130
And the third argument is word, which
is going to be some character array.

163
00:07:31,130 --> 00:07:33,520
Some place where you're
going to read the word into.

164
00:07:33,520 --> 00:07:35,312
In other words, some
character array inside

165
00:07:35,312 --> 00:07:38,380
of memory where you can then access
all of the individual characters

166
00:07:38,380 --> 00:07:40,625
of that word after you've
read it from the file.

167
00:07:40,625 --> 00:07:43,000
And you're going to want to
repeat this again, and again,

168
00:07:43,000 --> 00:07:47,260
and again for each of the words in the
dictionary, calling fscanf on the file

169
00:07:47,260 --> 00:07:52,960
over and over getting one word at
a time until fscanf returns EOF.

170
00:07:52,960 --> 00:07:56,870
And that will happen once fscanf
reaches the end of the file.

171
00:07:56,870 --> 00:07:59,590
So when fscanf returns
EOF, that's your sign

172
00:07:59,590 --> 00:08:02,410
that you finished reading all
of the words in the dictionary.

173
00:08:02,410 --> 00:08:04,750
But you'll likely want to
have some sort of loop that's

174
00:08:04,750 --> 00:08:07,300
running fscanf again
and again on one word,

175
00:08:07,300 --> 00:08:11,350
after word, after word until
you get to the end of the file.

176
00:08:11,350 --> 00:08:14,390
What are you going to do for
each of those individual words?

177
00:08:14,390 --> 00:08:16,480
Well, the next step is
going to be to create

178
00:08:16,480 --> 00:08:19,720
a new node that is going to
store that particular word

179
00:08:19,720 --> 00:08:21,280
inside of your hash table.

180
00:08:21,280 --> 00:08:22,280
How to do that?

181
00:08:22,280 --> 00:08:25,630
We'll recall that the malloc function
you can use to allocate memory.

182
00:08:25,630 --> 00:08:28,690
So you'll want to allocate enough
memory to store a new node.

183
00:08:28,690 --> 00:08:31,810
And again, be sure to check
if malloc returns null or not,

184
00:08:31,810 --> 00:08:35,159
because if malloc doesn't have enough
memory to be able to store the node,

185
00:08:35,159 --> 00:08:39,070
it will return null and your load
function should return false.

186
00:08:39,070 --> 00:08:40,900
Once you've created
that node, you're going

187
00:08:40,900 --> 00:08:43,780
to want to copy the word
into that node using

188
00:08:43,780 --> 00:08:47,020
the strcpy function, a function
which will take a string

189
00:08:47,020 --> 00:08:50,803
and copy it from one location
into another location.

190
00:08:50,803 --> 00:08:52,720
Now that we've created
the node, the next step

191
00:08:52,720 --> 00:08:54,910
is to figure out how it
is we're going to insert

192
00:08:54,910 --> 00:08:56,930
this node into the hash table.

193
00:08:56,930 --> 00:09:00,160
Recall again that a hash table
is just an array of linked lists.

194
00:09:00,160 --> 00:09:02,200
And we now need to
determine which linked

195
00:09:02,200 --> 00:09:04,480
list we should use
when we're determining

196
00:09:04,480 --> 00:09:07,810
where to place this particular
node that we've just created.

197
00:09:07,810 --> 00:09:10,480
And to do that, we're
going to hash the word.

198
00:09:10,480 --> 00:09:13,600
You're going to use the hash function,
which is defined in dictionary

199
00:09:13,600 --> 00:09:16,240
don't C, which will be a
function that takes a string

200
00:09:16,240 --> 00:09:20,590
and returns an index, some number that
you can use to index into your linked

201
00:09:20,590 --> 00:09:21,400
list.

202
00:09:21,400 --> 00:09:23,680
Right now, we've given you
a hash function that just

203
00:09:23,680 --> 00:09:25,812
returns 0 for every particular input.

204
00:09:25,812 --> 00:09:27,520
And later in this
assignment, we're going

205
00:09:27,520 --> 00:09:30,550
to replace this hash function
with a better hash function.

206
00:09:30,550 --> 00:09:34,840
But for now, you can just call upon
the hash function inside of load

207
00:09:34,840 --> 00:09:37,480
to determine which index
into the hash table

208
00:09:37,480 --> 00:09:41,170
you should use when you're
inserting this new node.

209
00:09:41,170 --> 00:09:44,200
Once you've determined which
linked list you should use,

210
00:09:44,200 --> 00:09:48,190
the next step is to actually insert
that word into the linked list.

211
00:09:48,190 --> 00:09:49,430
How are you going to do that?

212
00:09:49,430 --> 00:09:52,480
Well, again, recall that the hash
table is an array of linked lists.

213
00:09:52,480 --> 00:09:55,150
And you're going to want to
index into the hash table

214
00:09:55,150 --> 00:09:59,045
to get the linked list that you want to
use to add this particular word into.

215
00:09:59,045 --> 00:10:00,920
And in order to do that,
you're going to have

216
00:10:00,920 --> 00:10:02,912
to add a new node to a linked list.

217
00:10:02,912 --> 00:10:05,120
So you'll want to be sure
to make sure you're setting

218
00:10:05,120 --> 00:10:07,530
your pointers in the correct order.

219
00:10:07,530 --> 00:10:08,580
What do I mean by that?

220
00:10:08,580 --> 00:10:10,400
Well, let's take a look at an example.

221
00:10:10,400 --> 00:10:13,430
Here up at the top, imagine
that head is a variable that

222
00:10:13,430 --> 00:10:15,500
points to the start of a linked list.

223
00:10:15,500 --> 00:10:16,760
The linked list is bat.

224
00:10:16,760 --> 00:10:17,870
And next is book.

225
00:10:17,870 --> 00:10:18,950
And next is banana.

226
00:10:18,950 --> 00:10:21,510
And banana here points
to null, for example.

227
00:10:21,510 --> 00:10:25,400
And let's imagine that I have this
new node, which is this blue mode

228
00:10:25,400 --> 00:10:26,510
that I've just malloced.

229
00:10:26,510 --> 00:10:28,650
And I want to add it
to this linked list.

230
00:10:28,650 --> 00:10:32,488
So if I wanted to make this new node
the new head of the linked list,

231
00:10:32,488 --> 00:10:34,280
one thing you might
imagine that I would do

232
00:10:34,280 --> 00:10:38,030
is point head to point at this
new node that I've just created.

233
00:10:38,030 --> 00:10:39,380
But what's just happened?

234
00:10:39,380 --> 00:10:43,100
Now that I've set head to be
blue, this very first node inside

235
00:10:43,100 --> 00:10:46,130
of this linked list, I've
now lost access effectively

236
00:10:46,130 --> 00:10:49,340
to all of the other nodes that
were previously in the linked list.

237
00:10:49,340 --> 00:10:53,780
I have no way to get to them because
both head and new node both point

238
00:10:53,780 --> 00:10:55,310
to just this blue node.

239
00:10:55,310 --> 00:10:58,050
And blue doesn't point to anything else.

240
00:10:58,050 --> 00:10:59,822
So what is the actual solution here.

241
00:10:59,822 --> 00:11:02,030
Well, the first thing that
you're going to want to do

242
00:11:02,030 --> 00:11:03,920
is probably to take this new node--

243
00:11:03,920 --> 00:11:05,210
in this case, blue--

244
00:11:05,210 --> 00:11:10,070
and set its next pointer to be the
first element in the linked list.

245
00:11:10,070 --> 00:11:13,110
Then after that, you can
reset head-- in other words,

246
00:11:13,110 --> 00:11:14,780
the first element in the linked list--

247
00:11:14,780 --> 00:11:17,280
to be the new node that
you've just created.

248
00:11:17,280 --> 00:11:19,580
Now the beginning of
the linked list is blue.

249
00:11:19,580 --> 00:11:22,580
And next comes bat, and
then book, and banana.

250
00:11:22,580 --> 00:11:24,990
And we've been able to
represent all of these words

251
00:11:24,990 --> 00:11:28,940
now together inside of a single
linked list by adding our new node

252
00:11:28,940 --> 00:11:31,640
to the beginning of the linked list.

253
00:11:31,640 --> 00:11:35,030
And you're going to repeat this process
for every word in the dictionary.

254
00:11:35,030 --> 00:11:38,420
Reading that word using fscanf,
allocating enough memory in order

255
00:11:38,420 --> 00:11:41,750
to store that word, copying
the word into the node,

256
00:11:41,750 --> 00:11:45,500
and then inserting that node into one
of the linked lists in your hash table

257
00:11:45,500 --> 00:11:48,245
based on the result of
calling your hash function.

258
00:11:48,245 --> 00:11:50,870
After you've done that for each
of the words in the dictionary,

259
00:11:50,870 --> 00:11:53,300
you'll have successfully
loaded all of your data

260
00:11:53,300 --> 00:11:55,940
into memory into a hash
table such that you can now

261
00:11:55,940 --> 00:11:58,600
begin the spell checking process.

262
00:11:58,600 --> 00:11:59,585