1
00:00:00,000 --> 00:00:03,423
>> [MUSIC PLAYING]

2
00:00:03,423 --> 00:00:05,380

3
00:00:05,380 --> 00:00:08,210
>> ANDI PENG: Welcome to week 6 of section.

4
00:00:08,210 --> 00:00:11,620
We deviated from our standard
section time of Tuesday

5
00:00:11,620 --> 00:00:14,130
afternoon to this lovely Sunday morning.

6
00:00:14,130 --> 00:00:17,330
Thank you to everyone who
joined me today, but seriously,

7
00:00:17,330 --> 00:00:18,170
a round of applause.

8
00:00:18,170 --> 00:00:20,600
>> That's a pretty big effort.

9
00:00:20,600 --> 00:00:23,600
I almost didn't even make it
up in time, but it was OK.

10
00:00:23,600 --> 00:00:27,520
So I know that all of you
have just made it to the quiz.

11
00:00:27,520 --> 00:00:30,370
First of all, welcome to
the flip side of that.

12
00:00:30,370 --> 00:00:32,917
>> Secondly, we'll talk about it.

13
00:00:32,917 --> 00:00:34,000
We'll talk about the quiz.

14
00:00:34,000 --> 00:00:35,700
We'll talk about how
you're doing in the class.

15
00:00:35,700 --> 00:00:36,550
You'll be fine.

16
00:00:36,550 --> 00:00:39,080
I have your quizzes for
you at the end of here,

17
00:00:39,080 --> 00:00:42,120
so if you guys want to take
a look at it, totally fine.

18
00:00:42,120 --> 00:00:46,590
>> So quickly before we begin, the
agenda for today is as follows.

19
00:00:46,590 --> 00:00:48,430
As you can see, we're
basically rapid firing

20
00:00:48,430 --> 00:00:52,120
through a whole bunch of data structures
really, really, really quickly.

21
00:00:52,120 --> 00:00:54,380
So as such, it won't be
super interactive today.

22
00:00:54,380 --> 00:00:59,620
It'll just be me kind of shouting
things at you, and if I confuse you,

23
00:00:59,620 --> 00:01:02,680
if I'm going too fast, let me know.

24
00:01:02,680 --> 00:01:05,200
They're just various data
structures, and as part

25
00:01:05,200 --> 00:01:07,070
of your pset for this
upcoming week, you'll

26
00:01:07,070 --> 00:01:10,340
be asked to implement one of them,
perhaps two of them-- two of them

27
00:01:10,340 --> 00:01:12,319
in your pset.

28
00:01:12,319 --> 00:01:14,610
OK, so I'm just going to
start with some announcements.

29
00:01:14,610 --> 00:01:19,070
We'll go over stacks and queues more in
depth than what we did before the quiz.

30
00:01:19,070 --> 00:01:20,990
We'll go over linked
lists again, once again,

31
00:01:20,990 --> 00:01:23,899
more in depth than what
we had before the quiz.

32
00:01:23,899 --> 00:01:26,440
And then we'll talk about hash
tables, trees and tries, which

33
00:01:26,440 --> 00:01:28,890
are all pretty necessary for your pset.

34
00:01:28,890 --> 00:01:32,925
And then we'll go over some
helpful tips for pset5.

35
00:01:32,925 --> 00:01:37,360
>> OK, so quiz 0.

36
00:01:37,360 --> 00:01:41,090
The average was a 58%.

37
00:01:41,090 --> 00:01:45,370
It was very low, and so you guys all
did very, very well in accordance

38
00:01:45,370 --> 00:01:46,510
with that.

39
00:01:46,510 --> 00:01:49,970
>> Pretty much, rule of thumb is if you're
within a standard deviation of the mean

40
00:01:49,970 --> 00:01:52,990
especially since we're in a less
comfy section, you're totally fine.

41
00:01:52,990 --> 00:01:54,120
You're on track.

42
00:01:54,120 --> 00:01:55,190
Life is good.

43
00:01:55,190 --> 00:01:58,952
>> I know it's scary to think that
I got like a 40% on this quiz.

44
00:01:58,952 --> 00:02:00,160
I'm going to fail this class.

45
00:02:00,160 --> 00:02:02,243
I promise you, you're not
going to fail the class.

46
00:02:02,243 --> 00:02:03,680
You're totally fine.

47
00:02:03,680 --> 00:02:06,850
>> For those of you who got over
the mean, impressive, impressive,

48
00:02:06,850 --> 00:02:08,780
like, seriously well done.

49
00:02:08,780 --> 00:02:09,689
I have them with me.

50
00:02:09,689 --> 00:02:11,730
Feel free to come get them
at the end of section.

51
00:02:11,730 --> 00:02:14,520
Let me know if you have any
issues, questions with them.

52
00:02:14,520 --> 00:02:17,204
If we add up your score
wrong, let us know.

53
00:02:17,204 --> 00:02:21,240
>> OK, so pset5, this is a really
weird week for Yale in the sense

54
00:02:21,240 --> 00:02:24,240
that our pset is due
Wednesday at noon including

55
00:02:24,240 --> 00:02:27,317
the late day, so it's actually
theoretically due Tuesday at noon.

56
00:02:27,317 --> 00:02:29,150
Probably no one finished
at Tuesday at noon.

57
00:02:29,150 --> 00:02:30,830
That's totally fine.

58
00:02:30,830 --> 00:02:33,700
We're going to have office hours
tonight as well as Monday night.

59
00:02:33,700 --> 00:02:36,810
And all of the sections this week will
actually be turned into workshops,

60
00:02:36,810 --> 00:02:38,800
so feel free to pop in
any section you want,

61
00:02:38,800 --> 00:02:42,810
and they'll be kind of mini-pset
workshops for help on that.

62
00:02:42,810 --> 00:02:45,620
So as such, this is the only section
where we're teaching material.

63
00:02:45,620 --> 00:02:49,220
All the other sections will be focusing
exclusively on help for the pset.

64
00:02:49,220 --> 00:02:50,146
Yeah?

65
00:02:50,146 --> 00:02:52,000
>> AUDIENCE: Where are office hours?

66
00:02:52,000 --> 00:02:56,120
>> ANDI PENG: Office hours
tonight-- oh, good question.

67
00:02:56,120 --> 00:03:00,580
I think office hours tonight
are in Teal or at Commons.

68
00:03:00,580 --> 00:03:02,984
If you check online CS50
and you go to office hours,

69
00:03:02,984 --> 00:03:05,650
there should be a schedule that
tells you where all of them are.

70
00:03:05,650 --> 00:03:07,954
>> I know either tonight
or tomorrow is Teal,

71
00:03:07,954 --> 00:03:10,120
and I think we may have
Commons for the other night.

72
00:03:10,120 --> 00:03:11,020
I'm not sure.

73
00:03:11,020 --> 00:03:11,700
Good question.

74
00:03:11,700 --> 00:03:14,430
Check on CS50.

75
00:03:14,430 --> 00:03:18,780
>> Cool, any questions regarding the
schedule for the next like three days?

76
00:03:18,780 --> 00:03:21,690
I promise you guys like David
said, this is the top of the hill.

77
00:03:21,690 --> 00:03:23,050
You guys are almost there.

78
00:03:23,050 --> 00:03:24,644
Just three more days.

79
00:03:24,644 --> 00:03:26,310
Get there, and then we'll all come down.

80
00:03:26,310 --> 00:03:28,114
We'll have a nice CS-free break.

81
00:03:28,114 --> 00:03:28,780
We'll come back.

82
00:03:28,780 --> 00:03:30,779
We'll dive into web
programming and development,

83
00:03:30,779 --> 00:03:35,150
things that are very fun compared
to some of the other psets.

84
00:03:35,150 --> 00:03:37,974
And it'll be chill, and
we'll have lots of fun.

85
00:03:37,974 --> 00:03:38,890
We'll have more candy.

86
00:03:38,890 --> 00:03:39,730
Sorry for candy.

87
00:03:39,730 --> 00:03:40,945
I forgot candy.

88
00:03:40,945 --> 00:03:43,310
It was a rough morning.

89
00:03:43,310 --> 00:03:46,340
So you guys are almost there,
and I'm really proud of you guys.

90
00:03:46,340 --> 00:03:49,570
>> OK, so stacks.

91
00:03:49,570 --> 00:03:53,331
Who loved the question about Jack
and his clothing on the quiz?

92
00:03:53,331 --> 00:03:53,830
No one?

93
00:03:53,830 --> 00:03:56,500
OK, that's fine.

94
00:03:56,500 --> 00:04:00,200
>> So essentially as you can
picture Jack, this guy here,

95
00:04:00,200 --> 00:04:03,350
loves to take the clothing
out of the top of the stack,

96
00:04:03,350 --> 00:04:05,750
and he puts it back onto
the stack after he's done.

97
00:04:05,750 --> 00:04:07,600
So in this way, he never
seems to be getting

98
00:04:07,600 --> 00:04:10,090
to the bottom of the
stack in his clothing.

99
00:04:10,090 --> 00:04:12,600
So this kind of describes
the basic data structure

100
00:04:12,600 --> 00:04:16,610
of how a stack is implemented.

101
00:04:16,610 --> 00:04:20,060
>> Essentially, think of a
stack as any stack of objects

102
00:04:20,060 --> 00:04:24,900
where you put things onto the top, and
then you pop them out from the top.

103
00:04:24,900 --> 00:04:28,600
So LIFO is the acronym we like
to use-- Last In, First Out.

104
00:04:28,600 --> 00:04:32,480
And so last in to the top of the
stack is the first one that comes out.

105
00:04:32,480 --> 00:04:34,260
And so the two terms
we like to associate

106
00:04:34,260 --> 00:04:36,190
with that are called push and pop.

107
00:04:36,190 --> 00:04:39,790
When you push something onto the
stack, and you pop it back up.

108
00:04:39,790 --> 00:04:43,422
>> And so I guess this is kind of an
abstract concept for those of you

109
00:04:43,422 --> 00:04:45,630
who want to see like an
actual implementation of this

110
00:04:45,630 --> 00:04:46,740
in the real world.

111
00:04:46,740 --> 00:04:50,170
How many of you have written an essay
maybe like an hour before it was due,

112
00:04:50,170 --> 00:04:54,510
and you accidentally deleted a huge
chunk of it, like accidentally?

113
00:04:54,510 --> 00:04:58,560
And then what control do
we use to put it back?

114
00:04:58,560 --> 00:05:00,030
Control-Z, yeah?

115
00:05:00,030 --> 00:05:03,640
Control-Z, so the amount of times
that Control-Z has saved my life,

116
00:05:03,640 --> 00:05:08,820
has saved my ass, every time
that's implemented through a stack.

117
00:05:08,820 --> 00:05:13,020
>> Essentially all the information
that's on your Word document,

118
00:05:13,020 --> 00:05:15,080
it gets pushed and popped at will.

119
00:05:15,080 --> 00:05:19,460
And so essentially whenever you
delete anything, you pop it back up.

120
00:05:19,460 --> 00:05:22,820
And then if you need it back on, you
push it, which is what Control-Z does.

121
00:05:22,820 --> 00:05:26,770
And so real world function
of how simple data structure

122
00:05:26,770 --> 00:05:28,690
can help with your everyday life.

123
00:05:28,690 --> 00:05:31,710

124
00:05:31,710 --> 00:05:40,150
>> So a struct is the way that
we actually create a stack.

125
00:05:40,150 --> 00:05:44,720
We typedef a struct, and then
we call it stack at the bottom.

126
00:05:44,720 --> 00:05:47,440
And within the stack,
we have two parameters

127
00:05:47,440 --> 00:05:51,580
that we can essentially manipulate,
so we have char* strings[CAPACITY].

128
00:05:51,580 --> 00:05:55,150
>> All that it is doing
is creating an array

129
00:05:55,150 --> 00:05:58,835
in which we can store whatever we want
and whose capacity we can determine.

130
00:05:58,835 --> 00:06:01,990
Capacity is just the max number of
items we can put into this array.

131
00:06:01,990 --> 00:06:05,660
int size is the counter that keeps
track of how many items are currently

132
00:06:05,660 --> 00:06:07,850
in the stack.

133
00:06:07,850 --> 00:06:11,860
So then we can keep track of, A,
both how large the actual stack is,

134
00:06:11,860 --> 00:06:14,850
and, B, how much of that stack
we filled because we don't want

135
00:06:14,850 --> 00:06:18,800
to overflow over what our capacity is.

136
00:06:18,800 --> 00:06:24,340
>> So for example, this lovely
question was on your quiz.

137
00:06:24,340 --> 00:06:28,160
Essentially how do we push
onto the top of a stack.

138
00:06:28,160 --> 00:06:28,830
Pretty simple.

139
00:06:28,830 --> 00:06:30,621
If you look at it,
we'll walk through this.

140
00:06:30,621 --> 00:06:32,640
If [INAUDIBLE] size--
remember, whenever you

141
00:06:32,640 --> 00:06:35,300
want to access any
parameter within a struct,

142
00:06:35,300 --> 00:06:40,320
you do the name of the struct.parameter.

143
00:06:40,320 --> 00:06:42,720
>> In this case, s is
the name of our stack.

144
00:06:42,720 --> 00:06:46,230
We want to access the size
of it, so we do s.size.

145
00:06:46,230 --> 00:06:50,280
So as long as the size is not
equal to capacity or as long

146
00:06:50,280 --> 00:06:52,940
as it's less than capacity,
either would work here.

147
00:06:52,940 --> 00:06:57,180
>> You want to access the inside
of your stack, so s.strings,

148
00:06:57,180 --> 00:07:00,790
and you're going to put that new number
that you want to insert into there.

149
00:07:00,790 --> 00:07:05,030
Let's just say we will want to
insert int n onto the stack,

150
00:07:05,030 --> 00:07:08,905
we could do s.strings,
brackets, s.size equals n.

151
00:07:08,905 --> 00:07:11,030
Because size is where we
currently are in the stack

152
00:07:11,030 --> 00:07:14,590
if we're going to push
it on, we just access

153
00:07:14,590 --> 00:07:17,370
wherever the size is, the
current fullness of the stack,

154
00:07:17,370 --> 00:07:21,729
and we push the int n onto it.

155
00:07:21,729 --> 00:07:24,770
And then we want to make sure that
we're also incrementing size at the end,

156
00:07:24,770 --> 00:07:27,436
so we can keep track of we've
added an extra thing to the stack.

157
00:07:27,436 --> 00:07:29,660
Now we have a greater size.

158
00:07:29,660 --> 00:07:33,196
Does this here make sense to
everybody, how logically it works?

159
00:07:33,196 --> 00:07:34,160
It was kind of quick.

160
00:07:34,160 --> 00:07:39,535

161
00:07:39,535 --> 00:07:42,160
AUDIENCE: Can you go over
the s.strings[s.size] again?

162
00:07:42,160 --> 00:07:45,808
ANDI PENG: Sure, so what does
s.size currently give us?

163
00:07:45,808 --> 00:07:47,440
AUDIENCE: It's the current size.

164
00:07:47,440 --> 00:07:50,890
ANDI PENG: Exactly, so the
current index that our size is at,

165
00:07:50,890 --> 00:07:57,780
and so we want to put the new integer
that we want to insert at index s.size.

166
00:07:57,780 --> 00:07:58,760
Does that make sense?

167
00:07:58,760 --> 00:08:01,110
Because s.strings, all that
is is the name of the array.

168
00:08:01,110 --> 00:08:03,510
All it is is accessing the
array within our struct,

169
00:08:03,510 --> 00:08:06,030
and so if we want to
place n into that index,

170
00:08:06,030 --> 00:08:09,651
we can just access it
using brackets s.size.

171
00:08:09,651 --> 00:08:10,150
Cool.
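
A minimal sketch of the stack and push walked through above, in C. It assumes a file-scope stack named s, as in the s.size and s.strings[s.size] expressions discussed. The CAPACITY value of 5 is a placeholder, and storing char * items (rather than the int n mentioned aloud) follows the char* strings field and is also an assumption.

```c
#include <stdbool.h>

#define CAPACITY 5  // assumed value; the section never states one

typedef struct
{
    char *strings[CAPACITY];  // storage for the items
    int size;                 // how many items are currently stored
}
stack;

stack s;  // file scope, so size starts at 0

// Push str onto the top of the stack.
// Returns false (and stores nothing) if the stack is already full.
bool push(char *str)
{
    if (s.size < CAPACITY)
    {
        s.strings[s.size] = str;  // the next free slot is index size
        s.size++;                 // one more item is now stored
        return true;
    }
    return false;
}
```

Because size counts the stored items, it doubles as the index of the next free slot, which is why the write goes to s.strings[s.size] before the increment.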

172
00:08:10,150 --> 00:08:13,580

173
00:08:13,580 --> 00:08:18,916
>> All right, pop, I pseudocoded it out
for you guys, but similar concept.

174
00:08:18,916 --> 00:08:19,790
Does that make sense?

175
00:08:19,790 --> 00:08:22,310
If the size is greater
than zero, then you

176
00:08:22,310 --> 00:08:25,350
know that you want to take something
out because if the size is not

177
00:08:25,350 --> 00:08:27,620
greater than zero, then you
have nothing in the stack.

178
00:08:27,620 --> 00:08:29,840
>> So you only want to execute
this code, it can only

179
00:08:29,840 --> 00:08:32,320
pop if there is something to pop.

180
00:08:32,320 --> 00:08:35,830
So if the size is greater
than 0, we minus the size.

181
00:08:35,830 --> 00:08:40,020
We decrement the size and then return
whatever is inside of it because

182
00:08:40,020 --> 00:08:42,710
by popping, we want to
access whatever is stored

183
00:08:42,710 --> 00:08:45,694
in the index of the top of the stack.

184
00:08:45,694 --> 00:08:46,610
Everything make sense?

185
00:08:46,610 --> 00:08:49,693
If I made you guys write this out,
would you guys be able to write it out?

186
00:08:49,693 --> 00:08:52,029

187
00:08:52,029 --> 00:08:53,570
OK, you guys can play around with it.

188
00:08:53,570 --> 00:08:55,252
No worries if you don't get it.

189
00:08:55,252 --> 00:08:57,460
We don't have time to code
it out today because we've

190
00:08:57,460 --> 00:08:59,959
got a lot of these structures
to go through, but essentially

191
00:08:59,959 --> 00:09:02,214
pseudocode, very, very similar to push.

192
00:09:02,214 --> 00:09:03,380
Just follow along the logic.

193
00:09:03,380 --> 00:09:06,092
Make sure you're accessing all the
features of your struct correctly.
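
One way the pop pseudocode could be turned into C, as a sketch only. It reuses the same assumed stack layout (file-scope s, CAPACITY of 5, char * items). Returning NULL for an empty stack is a choice made here, not something stated in section.

```c
#include <stddef.h>

#define CAPACITY 5  // assumed value

typedef struct
{
    char *strings[CAPACITY];  // storage for the items
    int size;                 // how many items are currently stored
}
stack;

stack s;  // file scope, so size starts at 0

// Pop the most recently pushed item off the stack.
// Returns NULL if there is nothing to pop.
char *pop(void)
{
    if (s.size > 0)
    {
        s.size--;                  // the top item lives at index size - 1
        return s.strings[s.size];  // after the decrement, that is index size
    }
    return NULL;
}
```

The popped slot is not cleared; decrementing size is enough, since the next push overwrites it.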

194
00:09:06,092 --> 00:09:06,574
Yeah?

195
00:09:06,574 --> 00:09:09,282
>> AUDIENCE: Will these slides and
this whole thing be up today-ish?

196
00:09:09,282 --> 00:09:11,586
ANDI PENG: Always, yep.

197
00:09:11,586 --> 00:09:13,710
I'm going to try to put
this up like an hour after.

198
00:09:13,710 --> 00:09:16,626
I'll email David, David will try to
put it up like an hour after this.

199
00:09:16,626 --> 00:09:20,040

200
00:09:20,040 --> 00:09:25,470
>> OK, so then we move into this other
lovely data structure called a queue.

201
00:09:25,470 --> 00:09:30,140
As you guys can see here, a
queue, for the British amongst us,

202
00:09:30,140 --> 00:09:32,010
all it is is a line.

203
00:09:32,010 --> 00:09:34,680
So contrary to what
you think a stack is,

204
00:09:34,680 --> 00:09:37,750
a queue is exactly what
logically you think it is.

205
00:09:37,750 --> 00:09:41,914
It's held by the rules of FIFO,
which is First In, First Out.

206
00:09:41,914 --> 00:09:43,705
If you're the first
one in the line, you're

207
00:09:43,705 --> 00:09:46,230
the first one that
comes out of the line.

208
00:09:46,230 --> 00:09:49,680
>> So what we like to call this
is dequeueing and enqueueing.

209
00:09:49,680 --> 00:09:52,380
If we want to add something
to our queue, we enqueue.

210
00:09:52,380 --> 00:09:55,690
If we want to dequeue, or take
something away, we dequeue.

211
00:09:55,690 --> 00:10:03,350
>> So same sense that we're kind of
creating fixed-size elements that we

212
00:10:03,350 --> 00:10:06,500
can store certain
things, but we can also

213
00:10:06,500 --> 00:10:10,100
change where we're placing
parameters inside of them

214
00:10:10,100 --> 00:10:13,140
based on what type of
functionality we want.

215
00:10:13,140 --> 00:10:16,700
So stacks, we wanted the last
one in to be the first one out.

216
00:10:16,700 --> 00:10:19,800
Queue is we want the first thing
in to be the first thing out.

217
00:10:19,800 --> 00:10:22,510

218
00:10:22,510 --> 00:10:26,710
>> So the typedef
struct, as you can see,

219
00:10:26,710 --> 00:10:29,470
it's a little bit different
from what the stack was

220
00:10:29,470 --> 00:10:33,120
because not only do we have to keep
track of where the size currently is,

221
00:10:33,120 --> 00:10:37,420
we also want to keep track of the head
as well as where we currently are.

222
00:10:37,420 --> 00:10:39,580
So I think it's easier
if I draw this up.

223
00:10:39,580 --> 00:10:53,270
So let's imagine we've got a queue,
so let's say the head is right here.

224
00:10:53,270 --> 00:10:55,811

225
00:10:55,811 --> 00:10:58,310
The head of the line, let's
just say that's currently there,

226
00:10:58,310 --> 00:11:01,809
and we want to insert
something into the queue.

227
00:11:01,809 --> 00:11:04,350
I'm going to call size essentially
is the same thing as tail,

228
00:11:04,350 --> 00:11:06,314
the end of wherever your queue is.

229
00:11:06,314 --> 00:11:07,730
Let's just say size is right here.

230
00:11:07,730 --> 00:11:14,380

231
00:11:14,380 --> 00:11:18,400
>> So how does one feasibly
insert something into a queue?

232
00:11:18,400 --> 00:11:21,000

233
00:11:21,000 --> 00:11:24,130
What index do we want to place
where we want to insert into.

234
00:11:24,130 --> 00:11:29,320
If this is the beginning of your
queue and this is the end of it

235
00:11:29,320 --> 00:11:31,860
or the size of it, where do we
want to add the next object?

236
00:11:31,860 --> 00:11:32,920
>> AUDIENCE: [INAUDIBLE]

237
00:11:32,920 --> 00:11:35,920
ANDI PENG: Exactly, you want to add
it depending on how you've written it.

238
00:11:35,920 --> 00:11:37,840
Either this is blank or that is blank.

239
00:11:37,840 --> 00:11:42,630
So you want to add it probably
here because if the size is--

240
00:11:42,630 --> 00:11:50,540
if these are all full, you want
to add it right here, right?

241
00:11:50,540 --> 00:11:57,150
>> And so that's, while very, very
simple, not quite always correct

242
00:11:57,150 --> 00:12:00,690
because the main difference
between a queue and a stack

243
00:12:00,690 --> 00:12:04,350
is that the queue can
actually be manipulated

244
00:12:04,350 --> 00:12:06,980
so that the head changes
depending on where you want

245
00:12:06,980 --> 00:12:08,650
the beginning of your queue to start.

246
00:12:08,650 --> 00:12:11,900
And as a result, your tail
is also going to change.

247
00:12:11,900 --> 00:12:14,770
And so take a look at
this code right now.

248
00:12:14,770 --> 00:12:18,620
As you guys were also asked to
write out on the quiz, enqueue.

249
00:12:18,620 --> 00:12:22,580
Maybe we'll talk through why
the answer was what it was.

250
00:12:22,580 --> 00:12:26,790
>> I couldn't quite fit this line on one,
but essentially this piece of code

251
00:12:26,790 --> 00:12:29,030
should be on one line.

252
00:12:29,030 --> 00:12:30,140
Spend like 30 seconds.

253
00:12:30,140 --> 00:12:33,000
Take a look, and see why
this is the way that it is.

254
00:12:33,000 --> 00:12:50,030

255
00:12:50,030 --> 00:12:55,420
>> Very, very similar struct, very, very
similar structure as the previous

256
00:12:55,420 --> 00:12:58,090
stack except for perhaps
one line of code.

257
00:12:58,090 --> 00:13:01,190
And that one line of code
determines the functionality.

258
00:13:01,190 --> 00:13:03,900
And it really differentiates
a queue from a stack.

259
00:13:03,900 --> 00:13:18,510

260
00:13:18,510 --> 00:13:22,010
>> Anyone want to take a stab
at explaining why you've

261
00:13:22,010 --> 00:13:24,980
got this complicated thing in here?

262
00:13:24,980 --> 00:13:27,845
We see the return of our
wonderful friend modulus.

263
00:13:27,845 --> 00:13:31,020
As you guys will soon come
to recognize in programming,

264
00:13:31,020 --> 00:13:34,910
almost anytime you need something
to wrap around anything,

265
00:13:34,910 --> 00:13:36,850
modulus is going to be the way to do it.

266
00:13:36,850 --> 00:13:40,510
So knowing that, does anyone want
to try explaining that line of code?

267
00:13:40,510 --> 00:13:44,060

268
00:13:44,060 --> 00:13:47,507
Yeah, all answers are
accepted and welcome.

269
00:13:47,507 --> 00:13:48,840
AUDIENCE: Are you talking to me?

270
00:13:48,840 --> 00:13:49,506
ANDI PENG: Yeah.

271
00:13:49,506 --> 00:13:56,200
AUDIENCE: Oh, no sorry.

272
00:13:56,200 --> 00:14:00,250
ANDI PENG: OK, so let's
walk through this code.

273
00:14:00,250 --> 00:14:03,642
So when you're trying to
add something onto a queue,

274
00:14:03,642 --> 00:14:08,510
in the lovely case that the head happens
to be right here, it's very easy for us

275
00:14:08,510 --> 00:14:10,960
to just go to the end
insert something, right?

276
00:14:10,960 --> 00:14:14,690
But the whole point of a queue is
that the head can actually dynamically

277
00:14:14,690 --> 00:14:17,280
change depending on where we
want the start of our queue to be,

278
00:14:17,280 --> 00:14:19,880
and as such, the tail
is also going to change.

279
00:14:19,880 --> 00:14:31,100
>> And so imagine that this was not the
queue, but rather this was the queue.

280
00:14:31,100 --> 00:14:37,900

281
00:14:37,900 --> 00:14:39,330
Let's say the head is right here.

282
00:14:39,330 --> 00:14:54,900

283
00:14:54,900 --> 00:14:56,980
Let's say our queue looked like this.

284
00:14:56,980 --> 00:15:00,190
If we wanted to shift where
the beginning of the line is,

285
00:15:00,190 --> 00:15:03,400
let's say we shifted head
this way and size is here.

286
00:15:03,400 --> 00:15:07,100
>> Now we want to add something to
this queue, but as you guys can see,

287
00:15:07,100 --> 00:15:11,150
it's not so simple as to just
add whatever is after the size

288
00:15:11,150 --> 00:15:13,630
because then we run out of
bounds of our actual array.

289
00:15:13,630 --> 00:15:16,190
Where we want to really add is here.

290
00:15:16,190 --> 00:15:18,610
That's the beauty of a queue
is that to us, visually it

291
00:15:18,610 --> 00:15:22,380
looks like the line goes like this,
but when stored in a data structure,

292
00:15:22,380 --> 00:15:29,370
think of it as like a cycle.

293
00:15:29,370 --> 00:15:32,360
It kind of wraps around
to the front the same way

294
00:15:32,360 --> 00:15:34,780
that a line can also wrap
around depending on wherever you

295
00:15:34,780 --> 00:15:36,279
want the beginning of the line to be.

296
00:15:36,279 --> 00:15:38,630
And so if we take a
look down here, let's

297
00:15:38,630 --> 00:15:40,880
say we wanted to create a
function called enqueue.

298
00:15:40,880 --> 00:15:43,980
We wanted to add int n into that queue.

299
00:15:43,980 --> 00:15:49,250
If q.size-- q, we'll call that our data
structure-- if our q.size does not

300
00:15:49,250 --> 00:15:52,520
equal to capacity or if
it's less than capacity,

301
00:15:52,520 --> 00:15:55,120
q.strings is the array within our q.

302
00:15:55,120 --> 00:15:58,380
We're going to index
that at q.head,

303
00:15:58,380 --> 00:16:02,730
which is right here, plus q.size
modulus by the capacity, which

304
00:16:02,730 --> 00:16:04,290
wraps us back around here.

305
00:16:04,290 --> 00:16:08,040
>> So in this example, index
of head is 1, right?

306
00:16:08,040 --> 00:16:11,480
The index of size is 0, 1, 2, 3, 4.

307
00:16:11,480 --> 00:16:19,500
So we can do 1 plus 4 modulus
by our capacity which is 5.

308
00:16:19,500 --> 00:16:20,920
What does that give us?

309
00:16:20,920 --> 00:16:23,270
What is the index that
comes out of this?

310
00:16:23,270 --> 00:16:24,080
>> AUDIENCE: 0.

311
00:16:24,080 --> 00:16:27,870
>> ANDI PENG: 0, which
happens to be right here,

312
00:16:27,870 --> 00:16:30,640
and so we want to be able
to insert into right here.

313
00:16:30,640 --> 00:16:34,730
And so this equation here kind
of just works with any numbers

314
00:16:34,730 --> 00:16:36,750
depending on where your
head and your size are.

315
00:16:36,750 --> 00:16:38,541
If you know what those
things are, you know

316
00:16:38,541 --> 00:16:43,170
exactly where you want to insert
whatever is after your queue.

317
00:16:43,170 --> 00:16:44,640
Does that make sense to everybody?

318
00:16:44,640 --> 00:16:48,560
>> I know kind of a brain
teaser especially since this

319
00:16:48,560 --> 00:16:50,512
came in the aftermath of your quiz.

320
00:16:50,512 --> 00:16:52,220
But hopefully everyone
now can understand

321
00:16:52,220 --> 00:16:57,800
why this solution or this
function is the way that it is.
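
The enqueue logic just discussed can be sketched in C as follows. The queue is assumed to be a file-scope variable named q with fields strings, head and size, matching the names used aloud. CAPACITY is 5 as in the whiteboard example, and the char * item type is an assumption.

```c
#include <stdbool.h>

#define CAPACITY 5  // matches the worked example (1 + 4) % 5

typedef struct
{
    char *strings[CAPACITY];  // storage for the items
    int head;                 // index of the front of the line
    int size;                 // how many items are currently stored
}
queue;

queue q;  // file scope, so head and size start at 0

// Add str to the back of the queue.
// Returns false (and stores nothing) if the queue is already full.
bool enqueue(char *str)
{
    if (q.size < CAPACITY)
    {
        // The back of the line is size slots past head; the modulo wraps
        // that index around to the start of the array when it runs off the end.
        q.strings[(q.head + q.size) % CAPACITY] = str;
        q.size++;
        return true;
    }
    return false;
}
```

With head at 1 and size at 4, the index works out to (1 + 4) % 5 = 0, so the new item lands in the slot at the very front of the array.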

322
00:16:57,800 --> 00:16:59,840
Anyone a bit unclear on that?

323
00:16:59,840 --> 00:17:03,471

324
00:17:03,471 --> 00:17:03,970
OK.

325
00:17:03,970 --> 00:17:07,109

326
00:17:07,109 --> 00:17:09,970
>> And so now, if you
wanted to dequeue, this

327
00:17:09,970 --> 00:17:15,240
is where our head would be shifting
because if we were to dequeue,

328
00:17:15,240 --> 00:17:17,030
we don't take off the end of the queue.

329
00:17:17,030 --> 00:17:19,130
We want to take off the head, right?

330
00:17:19,130 --> 00:17:24,260
So as a result, head is going to change,
and that is why when you enqueue,

331
00:17:24,260 --> 00:17:26,800
you've got to keep track of
where your head and your size

332
00:17:26,800 --> 00:17:29,450
are to be able to insert
into the correct position.

333
00:17:29,450 --> 00:17:32,740
>> And so when you dequeue,
I also pseudocoded it out.

334
00:17:32,740 --> 00:17:35,480
Feel free to attempt
coding this out if you want.

335
00:17:35,480 --> 00:17:36,980
You want to move the head, right?

336
00:17:36,980 --> 00:17:39,320
If I wanted to dequeue, I
would move the head over.

337
00:17:39,320 --> 00:17:40,800
This would be the head.

338
00:17:40,800 --> 00:17:45,617
>> And our current size would
subtract because we no longer

339
00:17:45,617 --> 00:17:46,950
have four elements in the array.

340
00:17:46,950 --> 00:17:51,370
We only have three, and then we want
to return whatever was stored inside

341
00:17:51,370 --> 00:17:56,260
of the head because we want to take this
value out so very similar to the stack.

342
00:17:56,260 --> 00:17:58,010
Just you're taking
from a different place,

343
00:17:58,010 --> 00:18:01,770
and you have to reassign your pointer
to a different place as a result.

344
00:18:01,770 --> 00:18:03,890
Logically, everyone follow?

345
00:18:03,890 --> 00:18:05,690
Great.
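
A hedged sketch of dequeue in C, following the three steps in the pseudocode: move the head, shrink the size, return what was at the old head. The queue layout is the same assumed one (file-scope q, CAPACITY of 5, char * items). Wrapping head with the modulo and returning NULL on an empty queue are additions made here for consistency with enqueue, not something spelled out in section.

```c
#include <stddef.h>

#define CAPACITY 5  // assumed value

typedef struct
{
    char *strings[CAPACITY];  // storage for the items
    int head;                 // index of the front of the line
    int size;                 // how many items are currently stored
}
queue;

queue q;  // file scope, so head and size start at 0

// Remove and return the item at the front of the queue.
// Returns NULL if the queue is empty.
char *dequeue(void)
{
    if (q.size > 0)
    {
        char *first = q.strings[q.head];   // remember the front item
        q.head = (q.head + 1) % CAPACITY;  // the next item becomes the front
        q.size--;                          // one fewer item is stored
        return first;
    }
    return NULL;
}
```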

346
00:18:05,690 --> 00:18:10,156
>> OK, so we're going to talk a bit
more in depth about linked lists

347
00:18:10,156 --> 00:18:13,280
because they'll be very, very valuable
for you in the course of this week's

348
00:18:13,280 --> 00:18:14,964
psets.

349
00:18:14,964 --> 00:18:17,130
Linked lists, as you guys
can remember, all they are

350
00:18:17,130 --> 00:18:22,570
are nodes, each holding
both a value and a pointer,

351
00:18:22,570 --> 00:18:26,290
that are linked together
by those pointers.

352
00:18:26,290 --> 00:18:29,880
And so the struct on how
we create a node here is we

353
00:18:29,880 --> 00:18:33,569
have int n, which is whatever
value you want to store, or string n

354
00:18:33,569 --> 00:18:35,610
or whatever you want to
call it, the char star n.

355
00:18:35,610 --> 00:18:41,482
Struct node star, which is the pointer
that you want to have in each node,

356
00:18:41,482 --> 00:18:43,690
you're going to have that
pointer point towards next.

357
00:18:43,690 --> 00:18:48,207

358
00:18:48,207 --> 00:18:50,040
You'll have the head
of a linked list that's

359
00:18:50,040 --> 00:18:53,140
going to point to the rest of
the values so on and so forth

360
00:18:53,140 --> 00:18:55,290
until you eventually reach the end.

361
00:18:55,290 --> 00:18:58,040
And this last node is just
going to not have a pointer.

362
00:18:58,040 --> 00:18:59,952
It's going to point to
null, and that's when

363
00:18:59,952 --> 00:19:01,910
you know you've hit the
end of your linked list

364
00:19:01,910 --> 00:19:04,076
is when your last pointer
doesn't point to anything.
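
The node described above might look like this in C. This is a sketch using the int n variant; swapping in char *n would give the string version mentioned.

```c
#include <stddef.h>

typedef struct node
{
    int n;              // the value stored in this node
    struct node *next;  // pointer to the next node, or NULL at the end of the list
}
node;
```

The struct needs the name node after the word struct, because the type refers to itself in the next field before the typedef is complete.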

365
00:19:04,076 --> 00:19:06,670

366
00:19:06,670 --> 00:19:10,990
>> So we're going to go a bit more in
depth regarding how one would possibly

367
00:19:10,990 --> 00:19:12,400
search a linked list.

368
00:19:12,400 --> 00:19:15,460
Remember what are some of the
drawbacks of the linked lists

369
00:19:15,460 --> 00:19:19,340
verses an array regarding searches.

370
00:19:19,340 --> 00:19:22,565
An array you can binary search, but
why can't you do that in a linked list?

371
00:19:22,565 --> 00:19:26,834

372
00:19:26,834 --> 00:19:30,320
>> AUDIENCE: Because they're all connected,
but you don't quite know where

373
00:19:30,320 --> 00:19:31,330
[INAUDIBLE].

374
00:19:31,330 --> 00:19:34,600
>> ANDI PENG: Yeah, exactly so remember
that the brilliance of an array

375
00:19:34,600 --> 00:19:37,190
was the fact that we had
random access memory where

376
00:19:37,190 --> 00:19:41,580
if I wanted the value from index
six, I could just say index six,

377
00:19:41,580 --> 00:19:42,407
give me that value.

378
00:19:42,407 --> 00:19:45,240
And that's because arrays are stored
in a contiguous space of memory

379
00:19:45,240 --> 00:19:48,020
in one place, whereas
kind of linked lists

380
00:19:48,020 --> 00:19:52,820
are randomly interspersed all around,
and the only way you can find one

381
00:19:52,820 --> 00:19:56,890
is through a pointer that tells you
the address of where that next node is.

382
00:19:56,890 --> 00:20:00,290
>> And so as a result, the only way
to search through a linked list

383
00:20:00,290 --> 00:20:01,560
is linear search.

384
00:20:01,560 --> 00:20:05,890
Because I don't exactly know where
the 12th value in the linked list is,

385
00:20:05,890 --> 00:20:08,780
I have to traverse the entirety
of that linked list one

386
00:20:08,780 --> 00:20:12,450
by one from the head to the first node,
to the second node, to the third node,

387
00:20:12,450 --> 00:20:17,690
all the way down until I finally get
to where that node I'm looking for is.

388
00:20:17,690 --> 00:20:22,110
And so in this sense, search
on a linked list is always n.

389
00:20:22,110 --> 00:20:23,040
It's always n.

390
00:20:23,040 --> 00:20:25,690
It's always in linear time.

391
00:20:25,690 --> 00:20:28,470
>> And so the code in which
we implement this, and this

392
00:20:28,470 --> 00:20:32,620
is a bit new for you guys since you
guys haven't really talked about or ever

393
00:20:32,620 --> 00:20:35,000
seen pointers in how to
search through pointers,

394
00:20:35,000 --> 00:20:37,670
so we'll walk through
this very, very slowly.

395
00:20:37,670 --> 00:20:40,200
So bool search, right,
let's imagine we want

396
00:20:40,200 --> 00:20:42,820
to create a function called
search that returns true

397
00:20:42,820 --> 00:20:46,820
if you found a value inside the linked
list, and it returns false otherwise.

398
00:20:46,820 --> 00:20:50,030
Node star list is
currently just the pointer

399
00:20:50,030 --> 00:20:52,960
to the first item in your linked list.

400
00:20:52,960 --> 00:20:56,700
int n is the value that you're
searching for in that list.

401
00:20:56,700 --> 00:20:58,770
>> So node star pointer equals list.

402
00:20:58,770 --> 00:21:00,970
That means we're setting
and creating a pointer

403
00:21:00,970 --> 00:21:03,592
to that first node inside of the list.

404
00:21:03,592 --> 00:21:04,300
Everyone with me?

405
00:21:04,300 --> 00:21:06,530
So if we were to go
back here, I would have

406
00:21:06,530 --> 00:21:13,850
initialized a pointer that points to
the head of whatever that list is.

407
00:21:13,850 --> 00:21:18,600
>> And then once you get down here,
while pointer does not equal null,

408
00:21:18,600 --> 00:21:22,160
so that is the loop in which we are
going to be subsequently traversing

409
00:21:22,160 --> 00:21:25,940
the rest of our list because what
happens when pointer equals null?

410
00:21:25,940 --> 00:21:27,550
We know that we have--

411
00:21:27,550 --> 00:21:28,450
>> AUDIENCE: [INAUDIBLE]

412
00:21:28,450 --> 00:21:31,491
>> ANDI PENG: Exactly, so we know that
we've reached the end of list, right?

413
00:21:31,491 --> 00:21:34,470
If you go back here, each node
should be pointing to another node

414
00:21:34,470 --> 00:21:36,550
and so on and so forth
until you hit eventually

415
00:21:36,550 --> 00:21:41,589
the tail of your linked list,
which has a pointer that just

416
00:21:41,589 --> 00:21:43,130
doesn't point anywhere other than null.

417
00:21:43,130 --> 00:21:47,510
And so you basically know that
your list is still there as

418
00:21:47,510 --> 00:21:50,900
long as pointer does not equal
null because once it equals null,

419
00:21:50,900 --> 00:21:53,310
you know that there's no more stuff.

420
00:21:53,310 --> 00:21:56,930
>> So that is the loop in which we're
going to have the actual search.

421
00:21:56,930 --> 00:22:01,690
And if the pointer-- do you see
that kind of arrow operator there?

422
00:22:01,690 --> 00:22:06,930
So if pointer points to n, if
the pointer at n equals equals n,

423
00:22:06,930 --> 00:22:09,180
so that means that if
the n stored in the node

424
00:22:09,180 --> 00:22:13,420
your pointer is currently on
is actually equal to the value

425
00:22:13,420 --> 00:22:15,990
you're looking for, then
you want to return true.

426
00:22:15,990 --> 00:22:19,280
So basically, if you're at a node that
has the value that you're looking for,

427
00:22:19,280 --> 00:22:23,550
you know that you've been
able to successfully search.

428
00:22:23,550 --> 00:22:27,150
>> Otherwise, you want to set
your pointer to the next node.

429
00:22:27,150 --> 00:22:28,850
That is what that line here is doing.

430
00:22:28,850 --> 00:22:31,750
Pointer equals pointer next.

431
00:22:31,750 --> 00:22:33,360
Everyone see how that's working?

432
00:22:33,360 --> 00:22:36,580
>> And essentially you're going to just
traverse the entirety of the list,

433
00:22:36,580 --> 00:22:41,920
resetting your pointer each time until
you eventually hit the end of the list.

434
00:22:41,920 --> 00:22:45,030
And you know that there are no
more nodes to search through,

435
00:22:45,030 --> 00:22:47,999
and then you can return false
because you know that, oh, well,

436
00:22:47,999 --> 00:22:50,540
if I've been able to search
through the entirety of the list.

437
00:22:50,540 --> 00:22:54,530
If in this example, if I wanted
to look for the value of 10,

438
00:22:54,530 --> 00:22:57,250
and I start at the head, and
I search all the way down,

439
00:22:57,250 --> 00:23:00,550
and I eventually got to this, which is
a pointer that points to null,

440
00:23:00,550 --> 00:23:04,415
I know that, crap, I guess 10 isn't in
this list because I couldn't find it.

441
00:23:04,415 --> 00:23:06,520
And I'm at the end of the list.

442
00:23:06,520 --> 00:23:11,040
And in which case you know
I'm going to return false.
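
The search walked through above can be sketched roughly as follows. This is a minimal sketch, not necessarily the exact code on the slide: the `node` struct layout (an `int n` plus a `next` pointer) and the variable name `ptr` are assumptions based on how the function is described.

```c
#include <stdbool.h>
#include <stddef.h>

// Assumed node layout: one int value plus a pointer to the next node.
typedef struct node
{
    int n;
    struct node *next;
} node;

// Returns true if n is stored somewhere in the list, false otherwise.
bool search(node *list, int n)
{
    // Start at the first node of the list.
    node *ptr = list;

    // Keep going until we fall off the end of the list.
    while (ptr != NULL)
    {
        // Found the value we are looking for.
        if (ptr->n == n)
        {
            return true;
        }

        // Otherwise, move on to the next node.
        ptr = ptr->next;
    }

    // Reached NULL without finding n.
    return false;
}
```

Because the loop may have to visit every node, the running time is linear in the length of the list.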

443
00:23:11,040 --> 00:23:12,900
>> Let that soak in for a little bit.

444
00:23:12,900 --> 00:23:17,350
This will be pretty
important for your pset.

445
00:23:17,350 --> 00:23:21,140
The logic of it is very simple, perhaps
syntactically just implementing it.

446
00:23:21,140 --> 00:23:23,365
You guys want to make
sure that you understand.

447
00:23:23,365 --> 00:23:25,870

448
00:23:25,870 --> 00:23:27,650
Cool.

449
00:23:27,650 --> 00:23:32,560
>> OK, so how we would be
inserting nodes, right,

450
00:23:32,560 --> 00:23:35,380
into a list because remember
what is one of the benefits

451
00:23:35,380 --> 00:23:39,230
of having a linked list versus
an array in terms of storage?

452
00:23:39,230 --> 00:23:41,110
>> AUDIENCE: It's dynamic,
so it's easier to--

453
00:23:41,110 --> 00:23:43,180
>> ANDI PENG: Exactly,
so it's dynamic, which

454
00:23:43,180 --> 00:23:46,880
means that it can expand and shrink
depending on the user's needs.

455
00:23:46,880 --> 00:23:56,570
And so, in this sense, we don't need
to waste unnecessary memory because

456
00:23:56,570 --> 00:24:00,850
if I don't know how many values I want
to store, it doesn't make sense for me

457
00:24:00,850 --> 00:24:04,310
to create an array because
if I want to store 10 values

458
00:24:04,310 --> 00:24:08,380
and I create an array of 1,000, that's
a lot of wasted memory, allotted.

459
00:24:08,380 --> 00:24:11,180
That's why we want to use a linked
list to be able to dynamically

460
00:24:11,180 --> 00:24:13,860
change or shrink our size.

461
00:24:13,860 --> 00:24:17,040
>> And so that makes insertion
a bit more complicated.

462
00:24:17,040 --> 00:24:20,810
Since we can't randomly access elements
the way that we would in an array.

463
00:24:20,810 --> 00:24:24,270
If I want to insert an element
into the seventh index,

464
00:24:24,270 --> 00:24:26,930
I just can insert it
into the seventh index.

465
00:24:26,930 --> 00:24:30,020
On a linked list, it doesn't
quite work as easily,

466
00:24:30,020 --> 00:24:34,947
and so if we wanted to insert
the one here in the linked list,

467
00:24:34,947 --> 00:24:36,280
visually, it's very easy to see.

468
00:24:36,280 --> 00:24:39,363
We just want to insert it right there,
right at the beginning of the list,

469
00:24:39,363 --> 00:24:40,840
right after head.

470
00:24:40,840 --> 00:24:44,579
>> But the way in which we have to reassign
the pointers is a bit convoluted

471
00:24:44,579 --> 00:24:47,620
or, logically, it makes sense, but
you want to make sure that you have it

472
00:24:47,620 --> 00:24:50,250
completely down because
the last thing you want

473
00:24:50,250 --> 00:24:52,990
is to reassign a pointer the
way that we're doing here.

474
00:24:52,990 --> 00:24:58,170
If you reassign the
pointer from head to 1,

475
00:24:58,170 --> 00:25:01,086
then all of a sudden the
rest of your linked list

476
00:25:01,086 --> 00:25:04,680
is lost because you haven't actually
created a temporary anything

477
00:25:04,680 --> 00:25:06,220
that's pointed to the 2.

478
00:25:06,220 --> 00:25:10,080
If you reassign the pointer, then the
rest of your list is totally lost.

479
00:25:10,080 --> 00:25:13,310
So you want to be
very, very careful here

480
00:25:13,310 --> 00:25:17,010
to first assign the
pointer from whatever you

481
00:25:17,010 --> 00:25:20,150
want to insert into wherever
you want, and then you

482
00:25:20,150 --> 00:25:22,710
can reassign the pointer that used to lead to the rest of your list.

483
00:25:22,710 --> 00:25:25,250
>> So this applies for wherever
you're trying to insert into.

484
00:25:25,250 --> 00:25:27,520
If you want to insert at the
head, if you want to insert here,

485
00:25:27,520 --> 00:25:29,455
if you want to insert at
the end, well, the end I

486
00:25:29,455 --> 00:25:30,910
guess you would just
have no pointer, but you

487
00:25:30,910 --> 00:25:33,830
want to make sure that you don't
lose the rest of your list.

488
00:25:33,830 --> 00:25:36,640
You always want to make sure
your new node is pointing

489
00:25:36,640 --> 00:25:39,330
towards whatever you
want to insert into,

490
00:25:39,330 --> 00:25:42,170
and then you can add the chaining on.

491
00:25:42,170 --> 00:25:43,330
Everyone clear?

492
00:25:43,330 --> 00:25:45,427
>> This is going to be
one of the real issues.

493
00:25:45,427 --> 00:25:48,010
One of the most major issues
you're going to have on your pset

494
00:25:48,010 --> 00:25:51,340
is that you're going to try to create
a linked list and insert things

495
00:25:51,340 --> 00:25:53,340
but then just lose the
rest of your linked list.

496
00:25:53,340 --> 00:25:54,900
And you're going to be like, I
don't know why this is happening.

497
00:25:54,900 --> 00:25:58,040
And it's a pain to go through
and search all of your pointers.

498
00:25:58,040 --> 00:26:02,100
>> And I guarantee you on this pset,
writing and drawing these nodes out

499
00:26:02,100 --> 00:26:03,344
will be very, very helpful.

500
00:26:03,344 --> 00:26:06,010
So you can completely keep track
of where all your pointers are,

501
00:26:06,010 --> 00:26:08,540
what's going wrong,
where all your nodes are,

502
00:26:08,540 --> 00:26:12,660
what you need to do to access or
insert or delete or any of them.

503
00:26:12,660 --> 00:26:14,550
Everyone good with that?

504
00:26:14,550 --> 00:26:15,050
Cool.

505
00:26:15,050 --> 00:26:19,300

506
00:26:19,300 --> 00:26:22,600
>> So if we wanted to look at the code?

507
00:26:22,600 --> 00:26:24,470
Oh, I don't know if we
can see the-- OK, so

508
00:26:24,470 --> 00:26:27,940
at the top all it is is a function
named insert where we want

509
00:26:27,940 --> 00:26:31,365
to insert int n into the linked list.

510
00:26:31,365 --> 00:26:32,740
We're going to walk through this.

511
00:26:32,740 --> 00:26:34,770
It's a lot of code, a lot of new syntax.

512
00:26:34,770 --> 00:26:36,220
We'll be OK.

513
00:26:36,220 --> 00:26:39,120
>> So up at the top, whenever
we want to create anything

514
00:26:39,120 --> 00:26:42,380
what do we need to do, especially if you
want it to not be stored on the stack

515
00:26:42,380 --> 00:26:43,920
but in the heap?

516
00:26:43,920 --> 00:26:45,460
We go to a malloc, right?

517
00:26:45,460 --> 00:26:48,240
So we're going to create a pointer.

518
00:26:48,240 --> 00:26:52,074
Node, pointer, new equals
malloc the size of a node

519
00:26:52,074 --> 00:26:53,740
because we want that node to be created.

520
00:26:53,740 --> 00:26:56,720
We want the amount of
memory that a node takes up

521
00:26:56,720 --> 00:26:59,300
to be allotted for the
creation of the new node.

522
00:26:59,300 --> 00:27:02,270
>> And then we're going to check to
see if new equals equals null.

523
00:27:02,270 --> 00:27:03,370
Remember what we said?

524
00:27:03,370 --> 00:27:06,470
Whatever you malloc,
what must you always do?

525
00:27:06,470 --> 00:27:09,490
You must always check to see
whether or not that is null.

526
00:27:09,490 --> 00:27:13,620
>> For example, if your operating
system was completely full,

527
00:27:13,620 --> 00:27:17,060
if you had no more memory at
all and you try to malloc,

528
00:27:17,060 --> 00:27:18,410
it would return null for you.

529
00:27:18,410 --> 00:27:21,094
And so if you try to use it
when it was pointing to null,

530
00:27:21,094 --> 00:27:23,260
you're not going to be able
to access that information.

531
00:27:23,260 --> 00:27:27,010
And so as such, we wanted to make
sure that whenever you're mallocing,

532
00:27:27,010 --> 00:27:30,500
you're always checking to see if
that memory given to you is null.

533
00:27:30,500 --> 00:27:33,670
And if it's not, then we can move
on with the rest of our code.

534
00:27:33,670 --> 00:27:36,140
>> So we're going to
initialize the new node.

535
00:27:36,140 --> 00:27:39,050
We're going to do new n equals n.

536
00:27:39,050 --> 00:27:42,390
And then we're going to
set the next pointer on new

537
00:27:42,390 --> 00:27:46,900
to null because right now we don't
want anything for it to point to.

538
00:27:46,900 --> 00:27:48,755
We have no idea where
it's going to point to,

539
00:27:48,755 --> 00:27:50,630
and then if we want to
insert it at the head,

540
00:27:50,630 --> 00:27:53,820
then we can reassign
the pointer to the head.

541
00:27:53,820 --> 00:27:58,530
Does everyone follow the logic
of where that's happening?

542
00:27:58,530 --> 00:28:02,502
>> All we're doing is creating a new
node, setting the pointer to null,

543
00:28:02,502 --> 00:28:04,210
and then reassigning
it to the head if we

544
00:28:04,210 --> 00:28:06,320
know we want to insert it at the head.

545
00:28:06,320 --> 00:28:09,420
And then the head is going to
point towards that new node.

546
00:28:09,420 --> 00:28:11,060
Everyone OK with that?

547
00:28:11,060 --> 00:28:12,380
>> So it's a two-step process.

548
00:28:12,380 --> 00:28:14,760
You've got to first assign
whatever you're creating.

549
00:28:14,760 --> 00:28:18,260
Set that pointer to the
reference, and then you

550
00:28:18,260 --> 00:28:21,400
can kind of reassign
the first pointer

551
00:28:21,400 --> 00:28:22,972
and point it towards the new node.

552
00:28:22,972 --> 00:28:25,680
Wherever you want to insert it,
that logic is going to hold true.

553
00:28:25,680 --> 00:28:27,530
>> It's kind of like assigning
temporary variables.

554
00:28:27,530 --> 00:28:28,700
Remember, you've got
to make sure that you

555
00:28:28,700 --> 00:28:30,346
don't lose track of if you're swapping.

556
00:28:30,346 --> 00:28:33,470
You want to make sure that you have a
temporary variable that kind of keeps

557
00:28:33,470 --> 00:28:35,620
track of where that thing
is stored so that you

558
00:28:35,620 --> 00:28:41,190
don't lose any value in the course
of like messing around with it.
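
The two-step insertion at the head described above might look something like this. It is a sketch under assumptions: the function name `insert_head`, the `node **head` parameter, and the name `new_node` are illustrative choices, and the slide's version may keep the head in a different place.

```c
#include <stdbool.h>
#include <stdlib.h>

// Assumed node layout: one int value plus a pointer to the next node.
typedef struct node
{
    int n;
    struct node *next;
} node;

// Inserts n at the front of the list; returns false if malloc fails.
bool insert_head(node **head, int n)
{
    // Ask for enough heap memory for one node.
    node *new_node = malloc(sizeof(node));

    // Always check whether malloc gave us NULL.
    if (new_node == NULL)
    {
        return false;
    }

    // Initialize the new node.
    new_node->n = n;
    new_node->next = NULL;

    // Step 1: point the new node at the current first node.
    new_node->next = *head;

    // Step 2: only now point head at the new node.
    *head = new_node;

    return true;
}
```

Doing step 2 before step 1 would overwrite the only pointer to the old first node, which is exactly how the rest of the list gets lost.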

559
00:28:41,190 --> 00:28:42,710
>> OK, so code will be here.

560
00:28:42,710 --> 00:28:45,020
You guys take a look after section.

561
00:28:45,020 --> 00:28:48,060
It will be there.

562
00:28:48,060 --> 00:28:50,280
>> So I guess how does
this differ if we wanted

563
00:28:50,280 --> 00:28:52,300
to insert into the middle or the end?

564
00:28:52,300 --> 00:28:57,892
Does anyone have an idea of what's the
pseudocode or the logical approach

565
00:28:57,892 --> 00:29:00,350
that we would take if we wanted
to insert it in the middle?

566
00:29:00,350 --> 00:29:03,391
So if we wanted to insert it at the
head, all we do is create a new node.

567
00:29:03,391 --> 00:29:06,311
We set the pointer of that
new node to whatever the head points to,

568
00:29:06,311 --> 00:29:08,310
and then we set the head
to the new node, right?

569
00:29:08,310 --> 00:29:11,560
If we wanted to insert it in the middle
of the list, what would we have to do?

570
00:29:11,560 --> 00:29:14,108

571
00:29:14,108 --> 00:29:16,110
>> AUDIENCE: It would still
be a similar process

572
00:29:16,110 --> 00:29:19,114
of like assigning pointer and
then assigning that pointer,

573
00:29:19,114 --> 00:29:20,530
but we would have to locate there.

574
00:29:20,530 --> 00:29:23,560
>> ANDI PENG: Exactly, so exactly
the same process except you

575
00:29:23,560 --> 00:29:27,820
have to locate where exactly you
want that new pointer to go into,

576
00:29:27,820 --> 00:29:44,790
so if I want to insert into
the middle of linked list-- OK,

577
00:29:44,790 --> 00:29:46,370
let's say that's our linked list.

578
00:29:46,370 --> 00:29:49,500
If we want to insert it right here,
we're going to create a new node.

579
00:29:49,500 --> 00:29:50,520
We're going to malloc.

580
00:29:50,520 --> 00:29:52,220
We're going to create a new node.

581
00:29:52,220 --> 00:29:55,940
We're going to assign the
pointer of this node here.

582
00:29:55,940 --> 00:29:58,335
>> But the problem that differs
from where the head is

583
00:29:58,335 --> 00:30:00,490
is that we knew exactly
where the head is.

584
00:30:00,490 --> 00:30:01,930
It was right at the first, right?

585
00:30:01,930 --> 00:30:04,870
But here we've got to keep track
of where we're inserting it into.

586
00:30:04,870 --> 00:30:07,930
If we are inserting our
node here, we've got

587
00:30:07,930 --> 00:30:12,270
to make sure that the
one previous to this node

588
00:30:12,270 --> 00:30:14,172
is the one that reassigns the pointer.

589
00:30:14,172 --> 00:30:16,380
So then you have to kind of
keep track of two things.

590
00:30:16,380 --> 00:30:19,420
If you keep track of where this
node currently is inserting into.

591
00:30:19,420 --> 00:30:23,280
You also have to keep track of where
the previous node that you're looking at

592
00:30:23,280 --> 00:30:24,340
was also there.

593
00:30:24,340 --> 00:30:25,830
Everyone good with that?

594
00:30:25,830 --> 00:30:26,500
OK.
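
Keeping track of both the current node and the previous node could be sketched like this. The section does not say how the insertion point is chosen, so this example assumes the list is kept in ascending order; the names `insert_sorted`, `prev`, and `cur` are hypothetical.

```c
#include <stdbool.h>
#include <stdlib.h>

// Assumed node layout: one int value plus a pointer to the next node.
typedef struct node
{
    int n;
    struct node *next;
} node;

// Inserts n so that the list stays in ascending order (an assumption
// made for this sketch); returns false if malloc fails.
bool insert_sorted(node **head, int n)
{
    node *new_node = malloc(sizeof(node));
    if (new_node == NULL)
    {
        return false;
    }
    new_node->n = n;
    new_node->next = NULL;

    // Walk the list, remembering the node just before the current one.
    node *prev = NULL;
    node *cur = *head;
    while (cur != NULL && cur->n < n)
    {
        prev = cur;
        cur = cur->next;
    }

    // First point the new node at whatever comes after it...
    new_node->next = cur;

    // ...then reassign the pointer that comes before it.
    if (prev == NULL)
    {
        // Inserting at the head of the list.
        *head = new_node;
    }
    else
    {
        // Inserting in the middle or at the end.
        prev->next = new_node;
    }
    return true;
}
```

The same loop covers the end of the list: `cur` ends up NULL, so the new node points to NULL and the old last node points to the new node.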

595
00:30:26,500 --> 00:30:28,000
>> How about inserting into the end?

596
00:30:28,000 --> 00:30:34,220
If I wanted to add it here-- if I wanted
to add a new node to the end of a list,

597
00:30:34,220 --> 00:30:37,009
how might I go about doing that?

598
00:30:37,009 --> 00:30:39,300
AUDIENCE: So currently, the
last one's pointed to null.

599
00:30:39,300 --> 00:30:40,960
ANDI PENG: Yeah.

600
00:30:40,960 --> 00:30:43,560
Exactly, so this one
currently is pointed to null,

601
00:30:43,560 --> 00:30:46,720
and so I guess, in this sense, it's
very easy to add to the end of a list.

602
00:30:46,720 --> 00:30:51,810
All you have to do is point it at the new node,
set the new node's pointer to null, and then boom.

603
00:30:51,810 --> 00:30:53,070
Right there, very easy.

604
00:30:53,070 --> 00:30:53,960
Very simple.

605
00:30:53,960 --> 00:30:56,430
>> Very similar to the
head, but logically you

606
00:30:56,430 --> 00:30:59,690
want to make sure that the steps
you take towards doing any of this,

607
00:30:59,690 --> 00:31:01,500
you're following along.

608
00:31:01,500 --> 00:31:04,420
It's very easy to, in the middle
of your code, get caught up on,

609
00:31:04,420 --> 00:31:05,671
oh, I've got so many pointers.

610
00:31:05,671 --> 00:31:07,461
I don't know where
anything is pointing to.

611
00:31:07,461 --> 00:31:09,170
I don't even know which node I'm on.

612
00:31:09,170 --> 00:31:11,490
What's going on?

613
00:31:11,490 --> 00:31:13,620
>> Relax, calm down, take a deep breath.

614
00:31:13,620 --> 00:31:15,530
Draw out your linked list.

615
00:31:15,530 --> 00:31:18,800
If you say, I know where exactly
I need to insert this into

616
00:31:18,800 --> 00:31:22,970
and I know exactly how to reassign my
pointers, much, much easier to picture

617
00:31:22,970 --> 00:31:27,200
out-- much, much easier to not
get lost in the bugs of your code.

618
00:31:27,200 --> 00:31:29,410
Everyone OK with that?

619
00:31:29,410 --> 00:31:31,380
OK.

620
00:31:31,380 --> 00:31:35,120
>> So I guess a concept that we haven't
really talked about before now,

621
00:31:35,120 --> 00:31:38,131
and I guess you probably
won't encounter much yet--

622
00:31:38,131 --> 00:31:40,880
it's kind of an advanced concept--
is that we actually have a data

623
00:31:40,880 --> 00:31:43,900
structure called a doubly linked list.

624
00:31:43,900 --> 00:31:46,390
So as you guys can see,
all we're doing is creating

625
00:31:46,390 --> 00:31:50,400
an actual value, an extra
pointer on each of our nodes

626
00:31:50,400 --> 00:31:52,660
that also points to the previous node.

627
00:31:52,660 --> 00:31:58,170
So not only do we have our
nodes point to the next one.

628
00:31:58,170 --> 00:32:01,430
They also point to the previous one.

629
00:32:01,430 --> 00:32:04,310
I'm going to ignore these two right now.

630
00:32:04,310 --> 00:32:06,740
>> So then you have a chain
that can move both ways,

631
00:32:06,740 --> 00:32:09,630
and then it's a bit easier
to logically follow along.

632
00:32:09,630 --> 00:32:11,896
Like here, instead of
keeping track of, oh, I

633
00:32:11,896 --> 00:32:14,520
have to know that this node is
the one that I have to reassign,

634
00:32:14,520 --> 00:32:17,532
I can just go here and
just pull the previous.

635
00:32:17,532 --> 00:32:19,490
Then I know exactly where
that is, and then you

636
00:32:19,490 --> 00:32:21,130
don't have to traverse the
entirety of the linked list.

637
00:32:21,130 --> 00:32:22,180
It's a bit easier.

638
00:32:22,180 --> 00:32:24,960
>> But as such, you have double
the amount of pointers,

639
00:32:24,960 --> 00:32:26,960
that's double the amount of memory.

640
00:32:26,960 --> 00:32:28,950
It's a lot of pointers to keep track of.

641
00:32:28,950 --> 00:32:32,140
It's a bit more complex, but it's
a bit more user friendly depending

642
00:32:32,140 --> 00:32:34,080
on what you're trying to accomplish.

643
00:32:34,080 --> 00:32:36,910
>> So this type of data
structure totally exists,

644
00:32:36,910 --> 00:32:40,280
and the structure for it is very, very
simple except all you're having is,

645
00:32:40,280 --> 00:32:43,850
instead of just a pointer to next,
you also have a pointer to previous.

646
00:32:43,850 --> 00:32:45,940
That's all the difference was.
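
A doubly linked node, and the way the extra pointer removes the need to track the previous node separately, could be sketched as below. The names `dnode` and `insert_before` are hypothetical and not taken from the slides.

```c
#include <stddef.h>

// Assumed layout: the value plus pointers in both directions.
typedef struct dnode
{
    int n;
    struct dnode *prev;
    struct dnode *next;
} dnode;

// Links new_node in immediately before pos (pos must not be NULL).
// *head is updated if new_node becomes the first node.
void insert_before(dnode **head, dnode *pos, dnode *new_node)
{
    // Point the new node at its two neighbors first.
    new_node->next = pos;
    new_node->prev = pos->prev;

    // Then reassign the neighbor pointers to the new node.
    if (pos->prev != NULL)
    {
        pos->prev->next = new_node;
    }
    else
    {
        *head = new_node;
    }
    pos->prev = new_node;
}
```

The previous node is found through `pos->prev`, at the cost of one extra pointer per node.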

647
00:32:45,940 --> 00:32:47,740
Everyone good with that?

648
00:32:47,740 --> 00:32:48,240
Cool.

649
00:32:48,240 --> 00:32:50,940

650
00:32:50,940 --> 00:32:53,280
>> All right, so now I'm going
to really spend probably

651
00:32:53,280 --> 00:32:56,870
like 15 to 20 minutes or the bulk
of the rest of the time in section

652
00:32:56,870 --> 00:32:58,360
talking about hash tables.

653
00:32:58,360 --> 00:33:02,590
How many of you guys
have read pset5 spec?

654
00:33:02,590 --> 00:33:03,620
All right, good.

655
00:33:03,620 --> 00:33:06,160
That's higher than the 50% of normally.

656
00:33:06,160 --> 00:33:07,560
It's OK.

657
00:33:07,560 --> 00:33:10,345
>> So as you guys will see,
your challenge in pset5

658
00:33:10,345 --> 00:33:16,790
will be to implement a dictionary
where you load over 140,000 words

659
00:33:16,790 --> 00:33:20,610
that we give you and spell check
it against all of the text.

660
00:33:20,610 --> 00:33:22,580
We'll give you random
pieces of literature.

661
00:33:22,580 --> 00:33:23,520
We'll give you The Odyssey.

662
00:33:23,520 --> 00:33:24,561
We'll give you The Iliad.

663
00:33:24,561 --> 00:33:26,350
We'll give you Austin Powers.

664
00:33:26,350 --> 00:33:28,220
>> And your challenge
will be to spell check

665
00:33:28,220 --> 00:33:31,760
every single word in all
of those texts

666
00:33:31,760 --> 00:33:34,960
essentially with our spell checker.

667
00:33:34,960 --> 00:33:38,620
And so there's a few parts
of creating this pset,

668
00:33:38,620 --> 00:33:41,970
first you want to be
able to actually load

669
00:33:41,970 --> 00:33:43,970
all the words into your
dictionary, and then you

670
00:33:43,970 --> 00:33:45,530
want to be able to
spell check all of them.

671
00:33:45,530 --> 00:33:48,780
And so as such, you're going to require
a data structure that can do this fast

672
00:33:48,780 --> 00:33:50,790
and efficiently and dynamically.

673
00:33:50,790 --> 00:33:52,900
>> So I suppose the easiest
way to do this, you

674
00:33:52,900 --> 00:33:55,010
would probably create an array, right?

675
00:33:55,010 --> 00:33:58,910
The easiest way of storage is you
can create an array of 140,000 words

676
00:33:58,910 --> 00:34:03,400
and just place them all there and
then traverse them by binary search

677
00:34:03,400 --> 00:34:06,780
or by selections or not--
sorry that's sorting.

678
00:34:06,780 --> 00:34:10,729
You can sort them and then traverse them
by binary search or just linear search

679
00:34:10,729 --> 00:34:13,730
and just find the words, but that
takes a huge amount of memory,

680
00:34:13,730 --> 00:34:15,190
and it's not very efficient.

681
00:34:15,190 --> 00:34:18,350
>> And so we're going to start
talking about ways of making

682
00:34:18,350 --> 00:34:20,110
our running time more efficient.

683
00:34:20,110 --> 00:34:23,190
And our goal is to get
constant time where

684
00:34:23,190 --> 00:34:25,810
it's almost like arrays, where
you have instantaneous access.

685
00:34:25,810 --> 00:34:28,560
If I wanted to search for anything,
I want to be able to just,

686
00:34:28,560 --> 00:34:30,810
boom, find it exactly, and pull it out.

687
00:34:30,810 --> 00:34:34,100
And so a structure in which
we'll be coming very close

688
00:34:34,100 --> 00:34:37,569
to being able to access constant
time, this holy grail

689
00:34:37,569 --> 00:34:41,370
in programming of constant
time is called a hash table.

690
00:34:41,370 --> 00:34:45,370
And so David previously mentioned the
[INAUDIBLE] a little bit in lecture,

691
00:34:45,370 --> 00:34:49,100
but we're going to really
dive in deep this week

692
00:34:49,100 --> 00:34:51,780
on a piece that's regarding
how a hash table works.

693
00:34:51,780 --> 00:34:53,949
>> So the way that a hash
table works, for example,

694
00:34:53,949 --> 00:35:00,230
if I wanted to store a bunch of words, a
bunch of words in the English language,

695
00:35:00,230 --> 00:35:02,940
I could theoretically put
banana, apple, kiwi, mango, pear,

696
00:35:02,940 --> 00:35:04,980
and cantaloupe all on just an array.

697
00:35:04,980 --> 00:35:07,044
They could all fit in and be fine.

698
00:35:07,044 --> 00:35:09,210
It'd be kind of a pain to
search through and access,

699
00:35:09,210 --> 00:35:12,920
but the easier way of doing this is
that we can create actually a structure

700
00:35:12,920 --> 00:35:15,680
called a hash table where we hash.

701
00:35:15,680 --> 00:35:19,880
We run all of our keys through
a hash function, an equation,

702
00:35:19,880 --> 00:35:22,600
that turns them all into
some sort of a value

703
00:35:22,600 --> 00:35:28,740
that then we can store onto
essentially an array of linked lists.

704
00:35:28,740 --> 00:35:32,570
>> And so here, if we wanted
to store English words,

705
00:35:32,570 --> 00:35:37,250
we could potentially just, I don't
know, turn all the first letters

706
00:35:37,250 --> 00:35:39,630
into some sort of a number.

707
00:35:39,630 --> 00:35:43,140
And so, for example, if I wanted
A to be synonymous with apple--

708
00:35:43,140 --> 00:35:47,460
or with the index of 0, and
B to be synonymous with 1,

709
00:35:47,460 --> 00:35:51,030
we can have 26 entries
that can just store

710
00:35:51,030 --> 00:35:53,610
all of the letters of the
alphabet that we'll start with.

711
00:35:53,610 --> 00:35:56,130
And then we can have
apple at the index of 0.

712
00:35:56,130 --> 00:35:59,160
We can have banana at the index of
1, cantaloupe at the index of 2,

713
00:35:59,160 --> 00:36:00,540
and so on and so forth.

714
00:36:00,540 --> 00:36:04,460
And thus if I wanted to search
my hash table and access apple,

715
00:36:04,460 --> 00:36:07,560
I know apple starts with
an A, and I know exactly

716
00:36:07,560 --> 00:36:10,860
that it must be in the hash
table at index 0 because

717
00:36:10,860 --> 00:36:13,620
of the function previously assigned.

718
00:36:13,620 --> 00:36:16,572
>> So I don't know, we are
a user program where

719
00:36:16,572 --> 00:36:18,780
you'll be charged with
arbitrarily-- not arbitrarily,

720
00:36:18,780 --> 00:36:22,530
with trying to thoughtfully
think of good equations

721
00:36:22,530 --> 00:36:25,460
to be able to spread
out all of your values

722
00:36:25,460 --> 00:36:29,370
in a way that you can easily access
it later on with like an equation

723
00:36:29,370 --> 00:36:31,130
that you, yourself, know.

724
00:36:31,130 --> 00:36:35,210
So in the sense if I wanted to go to
mango, I know, oh, it starts with m.

725
00:36:35,210 --> 00:36:37,134
It must be at the index of 12.

726
00:36:37,134 --> 00:36:38,800
I don't have to search through anything.

727
00:36:38,800 --> 00:36:42,080
I know exactly-- I could just go to
the index of 12 and pull that out.

728
00:36:42,080 --> 00:36:45,520
>> Everyone clear on how a
hash table's function works?

729
00:36:45,520 --> 00:36:48,380
It's kind of just a more complex array.

730
00:36:48,380 --> 00:36:50,010
That's all it is.

731
00:36:50,010 --> 00:36:51,630
OK.

732
00:36:51,630 --> 00:36:57,690
>> So I guess we run into
this issue of what

733
00:36:57,690 --> 00:37:06,390
happens if you have multiple things
that give you the same index?

734
00:37:06,390 --> 00:37:10,570
So say our function, all it
did was take that first letter

735
00:37:10,570 --> 00:37:14,490
and turn that into a
respective 0 through 25 index.

736
00:37:14,490 --> 00:37:17,137
That's totally fine if
you only have one of each.

737
00:37:17,137 --> 00:37:18,970
But the second you start
having more, you're

738
00:37:18,970 --> 00:37:20,910
going to have what's called a collision.

739
00:37:20,910 --> 00:37:25,580
>> So if I try to insert berry into a hash
table that already has banana on it,

740
00:37:25,580 --> 00:37:27,870
what's going to happen when
you try to insert that?

741
00:37:27,870 --> 00:37:30,930
Bad things because banana
already exists within the index

742
00:37:30,930 --> 00:37:33,800
that you want to store it in.

743
00:37:33,800 --> 00:37:35,560
Berry kind of is like, ah, what do I do?

744
00:37:35,560 --> 00:37:37,080
I don't know where to go.

745
00:37:37,080 --> 00:37:38,410
How do I resolve this?

746
00:37:38,410 --> 00:37:41,150
>> And so you guys will kind of
see we do this tricky thing

747
00:37:41,150 --> 00:37:44,810
where we can kind of actually
create linked lists in our arrays.

748
00:37:44,810 --> 00:37:46,840
And so the easiest way
to think about this,

749
00:37:46,840 --> 00:37:50,830
all a hash table is, is an
array of linked lists.

750
00:37:50,830 --> 00:37:55,670
And so, in that sense, you have
this beautiful array of pointers,

751
00:37:55,670 --> 00:37:58,740
and then each pointer in
that value, in that index,

752
00:37:58,740 --> 00:38:00,740
can actually point to other things.

753
00:38:00,740 --> 00:38:05,720
And so you have all these separate
chains coming off of one big array.

754
00:38:05,720 --> 00:38:07,960
>> And so here, if I
wanted to insert berry,

755
00:38:07,960 --> 00:38:11,220
I know, OK, I'm going to input
it through my hash function.

756
00:38:11,220 --> 00:38:15,070
I'm going to end up with the index of
1, and then I'm going to be able to have

757
00:38:15,070 --> 00:38:20,410
just a smaller subset of this
giant 140,000-word dictionary.

758
00:38:20,410 --> 00:38:24,220
And then I can just look
through 1/26 of that.

759
00:38:24,220 --> 00:38:27,910
>> And so then I can just insert
berry either before or after banana

760
00:38:27,910 --> 00:38:28,820
in this case?

761
00:38:28,820 --> 00:38:29,700
After, right?

762
00:38:29,700 --> 00:38:33,920
And so you're going to want to
insert this node after banana,

763
00:38:33,920 --> 00:38:36,667
and so you're going to insert
at the tail of that linked list.

764
00:38:36,667 --> 00:38:38,500
I'm going to go back
to this previous slide,

765
00:38:38,500 --> 00:38:40,680
so you guys can see how
hash function works.

766
00:38:40,680 --> 00:38:43,980
>> So hash function is this equation
that you're running kind of your input

767
00:38:43,980 --> 00:38:46,940
through to get whatever index
you want to assign it towards.

768
00:38:46,940 --> 00:38:51,130
And so, in this example, all we wanted
to do was take the first letter,

769
00:38:51,130 --> 00:38:55,890
turn that into an index, then we
can store that in our hash table.

770
00:38:55,890 --> 00:39:00,160
All we're doing here is we're
converting the first letter.

771
00:39:00,160 --> 00:39:04,770
So keykey[0] is just the first letter
of whatever string we're having,

772
00:39:04,770 --> 00:39:05,720
we're passing in.

773
00:39:05,720 --> 00:39:09,740
We're converting that to upper, and
we're subtracting by uppercase A,

774
00:39:09,740 --> 00:39:11,740
so all that is doing
is giving us a number

775
00:39:11,740 --> 00:39:13,670
in which we can hash our values onto.

776
00:39:13,670 --> 00:39:16,550
>> And then we're going to
return hash modulus SIZE.

777
00:39:16,550 --> 00:39:19,340
Be very, very careful
because, theoretically, here

778
00:39:19,340 --> 00:39:21,870
your hash value could be infinite.

779
00:39:21,870 --> 00:39:23,660
It could just go on and on and on.

780
00:39:23,660 --> 00:39:26,080
It could be some really,
really large value,

781
00:39:26,080 --> 00:39:29,849
but because your hash table that
you've created only has 26 indexes,

782
00:39:29,849 --> 00:39:31,890
you want to make sure you're
modulusing so that you

783
00:39:31,890 --> 00:39:33,848
don't run-- it's the same
thing as your queue--

784
00:39:33,848 --> 00:39:36,320
so that you don't run off the
bottom of your hash table.

785
00:39:36,320 --> 00:39:39,210
>> You want to wrap it back around
the same way in [INAUDIBLE] when

786
00:39:39,210 --> 00:39:41,750
you had like a very,
very large letter, you

787
00:39:41,750 --> 00:39:43,740
didn't want that to
just run off the end.

788
00:39:43,740 --> 00:39:46,948
Same thing here, you want to make sure
it doesn't run off the end by wrapping

789
00:39:46,948 --> 00:39:48,330
around to the top of the table.

790
00:39:48,330 --> 00:39:50,530
So this is just a very
simple hash function.

791
00:39:50,530 --> 00:39:56,570
All that did was take the first
letter of whatever our input was

792
00:39:56,570 --> 00:40:01,660
and turn that into an index that
we could put into our hash table.
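
A minimal sketch of the hash function described here, in C. The names hash, key, and SIZE follow the talk; the exact signature and the unsigned return type are assumptions, since the slide itself is not in the transcript.

```c
#include <ctype.h>

#define SIZE 26  // one bucket per letter, as in the talk

// Map a word to a bucket index using only its first letter.
// Assumes key is a non-empty string that starts with a letter.
unsigned int hash(const char *key)
{
    // 'A' -> 0, 'B' -> 1, ..., 'Z' -> 25; toupper makes this case-insensitive
    unsigned int h = (unsigned int) (toupper((unsigned char) key[0]) - 'A');

    // wrap around so the index always lands inside the table
    return h % SIZE;
}
```

With this sketch, "apple" and "avocado" both land in bucket 0 and "berry" lands in bucket 1, which is the collision the talk goes on to resolve.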

793
00:40:01,660 --> 00:40:05,450
>> Yeah, and so as I said before,
the way that we resolve collisions

794
00:40:05,450 --> 00:40:09,330
in our hash tables is by using
what we call chaining.

795
00:40:09,330 --> 00:40:13,860
So if you try to insert multiple
words that start with the same thing,

796
00:40:13,860 --> 00:40:16,145
you're going to have one hash value.

797
00:40:16,145 --> 00:40:18,770
Avocado and apple, if you
run them through our hash function,

798
00:40:18,770 --> 00:40:21,450
are going to give you the
same number, the number of 0.

799
00:40:21,450 --> 00:40:24,550
And so the way we resolve that is
that we can actually kind of link them

800
00:40:24,550 --> 00:40:27,010
together via linked lists.
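
A rough sketch of chaining in C: an array of 26 list heads, with each new word appended to the tail of its bucket's list. The node layout, the LENGTH limit, and the function names insert and lookup are illustrative assumptions, not the pset's required interface.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

#define SIZE 26
#define LENGTH 45  // assumed maximum word length

typedef struct node
{
    char word[LENGTH + 1];
    struct node *next;
}
node;

// The hash table: an array of pointers, each the head of one chain.
// Globals are zero-initialized, so every chain starts out NULL.
node *table[SIZE];

unsigned int hash(const char *key)
{
    return (unsigned int) (toupper((unsigned char) key[0]) - 'A') % SIZE;
}

// Append word to the tail of its bucket's chain.
// Returns false if the word is too long or malloc fails.
bool insert(const char *word)
{
    if (strlen(word) > LENGTH)
    {
        return false;
    }
    node *n = malloc(sizeof(node));
    if (n == NULL)
    {
        return false;
    }
    strcpy(n->word, word);
    n->next = NULL;

    // follow the chain until we reach the NULL link at its end
    node **link = &table[hash(word)];
    while (*link != NULL)
    {
        link = &(*link)->next;
    }
    *link = n;
    return true;
}

// Search only the one chain that word hashes to.
bool lookup(const char *word)
{
    for (node *cur = table[hash(word)]; cur != NULL; cur = cur->next)
    {
        if (strcmp(cur->word, word) == 0)
        {
            return true;
        }
    }
    return false;
}
```

Inserting "banana" and then "berry" leaves both in the chain at index 1, with berry after banana, and a lookup never touches the other 25 chains.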

801
00:40:27,010 --> 00:40:29,600
>> And so in this sense,
you guys can see kind

802
00:40:29,600 --> 00:40:32,640
of how data structures that
we've been studying previously

803
00:40:32,640 --> 00:40:35,870
like arrays and linked lists kind
of can come together into one.

804
00:40:35,870 --> 00:40:38,860
And then you can create far
more efficient data structures

805
00:40:38,860 --> 00:40:43,350
that can handle larger amounts of
data, that dynamically resize depending

806
00:40:43,350 --> 00:40:44,870
on your needs.

807
00:40:44,870 --> 00:40:45,620
Everyone clear?

808
00:40:45,620 --> 00:40:47,580
Everyone kind of clear
on what happens here?

809
00:40:47,580 --> 00:40:52,110
>> If I wanted to insert-- what's a
fruit that starts with, I don't know,

810
00:40:52,110 --> 00:40:54,726
B, other than berry, banana.

811
00:40:54,726 --> 00:40:55,710
>> AUDIENCE: Blackberry.

812
00:40:55,710 --> 00:40:57,910
>> ANDI PENG: Blackberry, blackberry.

813
00:40:57,910 --> 00:41:00,530
Where does blackberry go here?

814
00:41:00,530 --> 00:41:04,251
Well, we actually haven't sorted
this yet, but theoretically

815
00:41:04,251 --> 00:41:06,250
if we wanted to have this
in alphabetical order,

816
00:41:06,250 --> 00:41:07,944
where should blackberry go?

817
00:41:07,944 --> 00:41:09,210
>> AUDIENCE: [INAUDIBLE]

818
00:41:09,210 --> 00:41:11,100
>> ANDI PENG: Exactly, after here, right?

819
00:41:11,100 --> 00:41:14,950
But since it's very difficult to
reorder-- I guess it's up to you guys.

820
00:41:14,950 --> 00:41:17,920
You guys can totally
implement whatever you want.

821
00:41:17,920 --> 00:41:20,730
The more efficient way
of doing this perhaps

822
00:41:20,730 --> 00:41:24,570
would be to sort your linked
list into alphabetical order,

823
00:41:24,570 --> 00:41:26,520
and so when you're
inserting things, you want

824
00:41:26,520 --> 00:41:28,632
to be sure to insert them
into alphabetical order

825
00:41:28,632 --> 00:41:30,590
so that then when you're
trying to search them,

826
00:41:30,590 --> 00:41:32,410
you don't have to traverse everything.

827
00:41:32,410 --> 00:41:35,290
You know exactly where
it is, and it's easier.
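
One way the sorted-chain idea could look in C, as a sketch only. The names insert_sorted and search_sorted are hypothetical, and the node here just points at the caller's string instead of copying it. Keeping the chain alphabetical lets a search stop as soon as it passes the spot where the word would have been.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

typedef struct node
{
    const char *word;   // points at the caller's string; not copied
    struct node *next;
}
node;

// Insert word into the chain at *head, keeping alphabetical order.
bool insert_sorted(node **head, const char *word)
{
    node *n = malloc(sizeof(node));
    if (n == NULL)
    {
        return false;
    }
    n->word = word;

    // advance past every word that sorts before the new one
    node **link = head;
    while (*link != NULL && strcmp((*link)->word, word) < 0)
    {
        link = &(*link)->next;
    }
    n->next = *link;
    *link = n;
    return true;
}

// Search a sorted chain, giving up early once we pass the target's position.
bool search_sorted(const node *head, const char *word)
{
    for (const node *cur = head; cur != NULL; cur = cur->next)
    {
        int cmp = strcmp(cur->word, word);
        if (cmp == 0)
        {
            return true;
        }
        if (cmp > 0)
        {
            return false;  // everything from here on sorts after word
        }
    }
    return false;
}
```

The trade-off is that insertion now walks part of the chain instead of just attaching at one end, so whether this pays off depends on how often you insert versus search.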

828
00:41:35,290 --> 00:41:39,100
>> But if you kind of have
things interspersed randomly,

829
00:41:39,100 --> 00:41:41,420
you're still going to have
to traverse it anyways.

830
00:41:41,420 --> 00:41:44,990
And so if I wanted to just
insert blackberry here

831
00:41:44,990 --> 00:41:47,470
and I wanted to search for
it, I know, oh, blackberry

832
00:41:47,470 --> 00:41:52,012
must start with the index of 1, so I
know instantaneously just search at 1.

833
00:41:52,012 --> 00:41:53,970
And then I can kind of
traverse the linked list

834
00:41:53,970 --> 00:41:56,120
until I get to blackberry,
and then-- yeah?

835
00:41:56,120 --> 00:41:59,550
>> AUDIENCE: If you're trying to create--
I guess like this is a very simple hash

836
00:41:59,550 --> 00:42:00,050
function.

837
00:42:00,050 --> 00:42:02,835
And if we wanted to do
multiple layers of that like,

838
00:42:02,835 --> 00:42:05,870
OK, we want to separate into
like all the alphabetical letters

839
00:42:05,870 --> 00:42:09,040
and then again to like another set
of alphabetical letters within that,

840
00:42:09,040 --> 00:42:11,715
are we putting like a hash
table within a hash table,

841
00:42:11,715 --> 00:42:13,256
or like a function within a function?

842
00:42:13,256 --> 00:42:14,880
Or is that--

843
00:42:14,880 --> 00:42:17,510
>> ANDI PENG: So your hash
function-- your hash table

844
00:42:17,510 --> 00:42:19,360
can be as large as you want it to.

845
00:42:19,360 --> 00:42:21,930
So in this sense, I thought
it was very easy, very

846
00:42:21,930 --> 00:42:25,320
simple for me to just sort based
on letters of the first word.

847
00:42:25,320 --> 00:42:28,690
And so there's only 26 options.

848
00:42:28,690 --> 00:42:32,650
I can only get 26 options from
0 to 25 because they can only

849
00:42:32,650 --> 00:42:36,510
start from A to Z. But if you wanted
to add, perhaps, more complexity

850
00:42:36,510 --> 00:42:39,260
or faster run time to your
hash table, you absolutely

851
00:42:39,260 --> 00:42:40,760
can do all sorts of things.

852
00:42:40,760 --> 00:42:43,330
You can make your own
equation that gives you

853
00:42:43,330 --> 00:42:48,000
more distribution in your
words, then when you search,

854
00:42:48,000 --> 00:42:49,300
it's going to be faster.

855
00:42:49,300 --> 00:42:52,100
>> It's totally up to you guys
how you want to implement that.

856
00:42:52,100 --> 00:42:55,140
Think of it as just buckets.

857
00:42:55,140 --> 00:42:57,376
If I wanted to have
26 buckets, I'm going

858
00:42:57,376 --> 00:42:59,420
to sort things into those buckets.

859
00:42:59,420 --> 00:43:02,980
But I'm going to have a bunch
of stuff in each bucket,

860
00:43:02,980 --> 00:43:05,890
so if you want to make it
faster and more efficient,

861
00:43:05,890 --> 00:43:07,190
let me have a hundred buckets.

862
00:43:07,190 --> 00:43:09,290
>> But then you have to figure out a
way to sort things so that they are

863
00:43:09,290 --> 00:43:11,040
in the proper bucket they should be in.

864
00:43:11,040 --> 00:43:13,331
But then when you actually
want to look at that bucket,

865
00:43:13,331 --> 00:43:16,410
it's a lot faster because there's
less stuff in each bucket.
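
To make the "more buckets" idea concrete, here is a sketch of a hash function that mixes every letter of the word instead of only the first. This is a common multiply-and-add string hash, not something shown in the section; NUM_BUCKETS, the multiplier 31, and the name hash_all are all arbitrary choices for illustration.

```c
#include <ctype.h>

#define NUM_BUCKETS 1000  // hypothetical table size; tune it for your dictionary

// Hash every character of key so that words sharing a first letter
// still tend to spread across different buckets.
unsigned int hash_all(const char *key)
{
    unsigned int h = 0;
    for (const char *p = key; *p != '\0'; p++)
    {
        // unsigned arithmetic wraps on overflow, which is fine for hashing
        h = h * 31 + (unsigned int) tolower((unsigned char) *p);
    }
    return h % NUM_BUCKETS;
}
```

The modulus still does the same job as before: however large h gets, the result is a valid index into the table.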

866
00:43:16,410 --> 00:43:20,250
And so, yeah, that's actually
the trick for you guys in pset5

867
00:43:20,250 --> 00:43:22,360
is that you'll be
challenged to just create

868
00:43:22,360 --> 00:43:26,170
whatever is the most efficient
function you can think of to be

869
00:43:26,170 --> 00:43:28,520
able to store and check these values.

870
00:43:28,520 --> 00:43:30,840
>> Totally up to you guys
however you want to do it,

871
00:43:30,840 --> 00:43:32,229
but that's a really good point.

872
00:43:32,229 --> 00:43:34,520
That the kind of logic you
want to start thinking about

873
00:43:34,520 --> 00:43:37,236
is, well, why don't I make more buckets.

874
00:43:37,236 --> 00:43:39,527
And then I have to search
less things, and then maybe I

875
00:43:39,527 --> 00:43:41,640
have a different hash function.

876
00:43:41,640 --> 00:43:45,500
>> Yeah, there's a lot of ways to do this
pset, some are faster than others.

877
00:43:45,500 --> 00:43:50,630
I'm totally going to just see how
fast was the fastest you guys will

878
00:43:50,630 --> 00:43:55,170
be able to get your functions to work.

879
00:43:55,170 --> 00:43:58,176
OK, everyone good on
chaining and hash tables?

880
00:43:58,176 --> 00:44:00,800
It's actually like a very simple
concept if you think about it.

881
00:44:00,800 --> 00:44:05,160
All it is is separating whatever
your inputs are into buckets,

882
00:44:05,160 --> 00:44:10,670
sorting them, and then searching the
lists that they're associated with.

883
00:44:10,670 --> 00:44:11,852
>> Cool.

884
00:44:11,852 --> 00:44:18,160
All right, now we have a different sort
of data structure that's called a tree.

885
00:44:18,160 --> 00:44:20,850
Let's go on and talk about tries
which are distinctly different,

886
00:44:20,850 --> 00:44:22,330
but in the same category.

887
00:44:22,330 --> 00:44:29,010
Essentially, all a tree is instead
of organizing data in the linear way

888
00:44:29,010 --> 00:44:32,560
that a hash table does-- you
know, it's got a top and a bottom

889
00:44:32,560 --> 00:44:37,900
and then you kind of link off of it-- a
tree has a top which you call the root,

890
00:44:37,900 --> 00:44:40,220
and then it has leaves all around it.

891
00:44:40,220 --> 00:44:42,390
>> And so all you have here
is just the top node

892
00:44:42,390 --> 00:44:45,980
that points to other nodes, that points
to more nodes, and so on and so forth.

893
00:44:45,980 --> 00:44:48,130
And so you just have splitting branches.

894
00:44:48,130 --> 00:44:53,255
It's just a different way of organizing
data, and because we call it a tree,

895
00:44:53,255 --> 00:44:56,270
you guys just-- it's just
modeled out to look like a tree.

896
00:44:56,270 --> 00:44:57,670
That's why we call it trees.

897
00:44:57,670 --> 00:44:59,370
>> Hash table looks like a table.

898
00:44:59,370 --> 00:45:01,310
A tree just looks like a tree.

899
00:45:01,310 --> 00:45:03,300
All it is is a separate
way of organizing nodes

900
00:45:03,300 --> 00:45:06,020
depending on what your needs are.

901
00:45:06,020 --> 00:45:11,810
>> So you have a root and
then you have leaves.

902
00:45:11,810 --> 00:45:15,380
The way that we can particularly
think about it is a binary tree,

903
00:45:15,380 --> 00:45:18,150
a binary tree is just a
specific type of a tree

904
00:45:18,150 --> 00:45:22,450
where each node only points
to, at max, two other nodes.

905
00:45:22,450 --> 00:45:25,434
And so here you have distinct
symmetry in your tree

906
00:45:25,434 --> 00:45:28,600
that makes it easier to kind of look
at what values you are because then you

907
00:45:28,600 --> 00:45:30,150
have always a left or a right.

908
00:45:30,150 --> 00:45:33,150
There's never like a left third from
the left or a fourth from the left.

909
00:45:33,150 --> 00:45:36,358
It's just you have a left and a right
and you can search either of those two.

910
00:45:36,358 --> 00:45:38,980
And so why is this useful?

911
00:45:38,980 --> 00:45:40,980
The way that this is
useful is if you're looking

912
00:45:40,980 --> 00:45:42,890
to search through values, right?

913
00:45:42,890 --> 00:45:45,640
Rather than implementing binary
search in an array,

914
00:45:45,640 --> 00:45:49,260
if you wanted to be able to insert nodes
and take away nodes at will and also

915
00:45:49,260 --> 00:45:52,185
preserve the search
capacities of binary search.

916
00:45:52,185 --> 00:45:54,560
So in this way, we're kind of
tricking-- remember when we

917
00:45:54,560 --> 00:45:56,530
said linked lists can't binary search?

918
00:45:56,530 --> 00:46:01,700
We're kind of creating a data structure
that tricks that into working.

919
00:46:01,700 --> 00:46:05,034
>> And so because linked lists are linear,
they only link one after the other.

920
00:46:05,034 --> 00:46:06,950
We can kind of have
different sort of pointers

921
00:46:06,950 --> 00:46:09,408
that point to different nodes
that can help us with search.

922
00:46:09,408 --> 00:46:12,590
And so here, if I wanted to
have a binary search tree,

923
00:46:12,590 --> 00:46:14,090
I know that my middle is 55.

924
00:46:14,090 --> 00:46:18,280
I'm just going to create that
as my middle, as my root,

925
00:46:18,280 --> 00:46:20,770
and then I'm going to have
values spin off of it.

926
00:46:20,770 --> 00:46:25,610
>> So here, if I'm going to search for
the value of 66, I can start at 55.

927
00:46:25,610 --> 00:46:27,310
Is 66 greater than 55?

928
00:46:27,310 --> 00:46:30,970
Yes it is, so I know I must search
in the right pointer of this tree.

929
00:46:30,970 --> 00:46:32,440
I go to 77.

930
00:46:32,440 --> 00:46:35,367
OK, is 66 less than or greater than 77?

931
00:46:35,367 --> 00:46:37,950
It's less than, so you know, oh,
that has to be the left node.

932
00:46:37,950 --> 00:46:41,410
>> And so here we're kind of preserving
all of the great things about linked lists,

933
00:46:41,410 --> 00:46:44,420
so like dynamic resizing
of objects, being

934
00:46:44,420 --> 00:46:49,530
able to insert and delete at will,
without having to worry about the fixed

935
00:46:49,530 --> 00:46:50,370
amount of space.

936
00:46:50,370 --> 00:46:52,820
We still preserve all of
those wonderful things

937
00:46:52,820 --> 00:46:57,140
while also being able to preserve the
log n search time of binary search

938
00:46:57,140 --> 00:47:00,450
that we were only previously
able to get with arrays.

939
00:47:00,450 --> 00:47:06,310
>> Cool data structure, kind of
complex to implement, the node.

940
00:47:06,310 --> 00:47:08,311
As you can see, all it
is the struct of the node

941
00:47:08,311 --> 00:47:10,143
is that you have a left
and a right pointer.

942
00:47:10,143 --> 00:47:11,044
That's all it is.

943
00:47:11,044 --> 00:47:12,960
So rather than just
having a next or a previous.

944
00:47:12,960 --> 00:47:15,920
You have a left or a right, and then
you can kind of link them together

945
00:47:15,920 --> 00:47:16,836
however you so choose.
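
A sketch of that node in C, plus one possible way to link nodes together so the tree stays searchable. The field name n and the function name bst_insert are assumptions; only the left and right pointers come from the talk.

```c
#include <stdbool.h>
#include <stdlib.h>

typedef struct node
{
    int n;               // the value stored at this node
    struct node *left;   // subtree of smaller values
    struct node *right;  // subtree of larger values
}
node;

// Insert value into the tree rooted at *root. Duplicates are ignored.
// Returns false only if malloc fails.
bool bst_insert(node **root, int value)
{
    // walk down, going left for smaller values and right for larger ones
    node **link = root;
    while (*link != NULL)
    {
        if (value == (*link)->n)
        {
            return true;  // already present
        }
        link = (value < (*link)->n) ? &(*link)->left : &(*link)->right;
    }

    // we fell off the tree: hang a new leaf on the NULL pointer we reached
    node *leaf = malloc(sizeof(node));
    if (leaf == NULL)
    {
        return false;
    }
    leaf->n = value;
    leaf->left = NULL;
    leaf->right = NULL;
    *link = leaf;
    return true;
}
```

Inserting 55, then 77, then 66 reproduces the path from the example: 55 at the root, 77 as its right child, and 66 as the left child of 77.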

946
00:47:16,836 --> 00:47:21,080

947
00:47:21,080 --> 00:47:24,270
>> OK, we're actually going
to just take a few minutes.

948
00:47:24,270 --> 00:47:25,790
So we're going to go back here.

949
00:47:25,790 --> 00:47:28,270
As I said previously,
I kind of explained

950
00:47:28,270 --> 00:47:31,520
the logic behind how we
would search through this.

951
00:47:31,520 --> 00:47:33,860
We're going to try
pseudocoding this out to see

952
00:47:33,860 --> 00:47:38,000
if we can kind of apply the
same logic of binary search

953
00:47:38,000 --> 00:47:40,055
to a different type of data structure.

954
00:47:40,055 --> 00:47:45,049
If you guys want to take like a couple
minutes to just think about this.

955
00:47:45,049 --> 00:48:45,927

956
00:48:45,927 --> 00:48:46,925
OK.

957
00:48:46,925 --> 00:48:51,407
All right, I'm going to
actually just give you the-- no,

958
00:48:51,407 --> 00:48:52,990
we'll talk about the pseudocode first.

959
00:48:52,990 --> 00:48:56,580
So does anyone want
to give a stab at what

960
00:48:56,580 --> 00:49:02,100
the first thing you want to do when
you're starting out searching is?

961
00:49:02,100 --> 00:49:04,460
If we're looking for
the value of 66, what's

962
00:49:04,460 --> 00:49:07,940
the first thing we want to do if
we want to binary search this tree?

963
00:49:07,940 --> 00:49:10,760
>> AUDIENCE: You want to look right
and look left and see [INAUDIBLE]

964
00:49:10,760 --> 00:49:11,230
greater number.

965
00:49:11,230 --> 00:49:12,271
>> ANDI PENG: Yeah, exactly.

966
00:49:12,271 --> 00:49:15,350
So you're going to look at your root.

967
00:49:15,350 --> 00:49:18,180
There's lots of ways you can call
it, your parent node people say.

968
00:49:18,180 --> 00:49:21,317
I like to say root because
that's like the root of the tree.

969
00:49:21,317 --> 00:49:23,400
You're going to look at
your root node, and you're

970
00:49:23,400 --> 00:49:26,940
going to see is 66 greater
than or less than 55.

971
00:49:26,940 --> 00:49:30,360
And if it's greater than, well, it is
greater than, where do we want to look?

972
00:49:30,360 --> 00:49:32,000
Where do we want to search now, right?

973
00:49:32,000 --> 00:49:34,340
We want to search the
right half of this tree.

974
00:49:34,340 --> 00:49:38,390
>> So we have, conveniently, a
pointer that points to the right.

975
00:49:38,390 --> 00:49:44,325
And so then we can set
our new root to be 77.

976
00:49:44,325 --> 00:49:46,450
We can just go to wherever
the pointer is pointing.

977
00:49:46,450 --> 00:49:49,100
Well, oh, here we're starting
at 77, and we can just

978
00:49:49,100 --> 00:49:51,172
do this recursively again and again.

979
00:49:51,172 --> 00:49:52,880
In this way, you kind
of have a function.

980
00:49:52,880 --> 00:49:57,430
You have a way of searching that you
can just repeat over and over and over,

981
00:49:57,430 --> 00:50:02,720
depending on where you want to look
until you eventually get to the value

982
00:50:02,720 --> 00:50:04,730
that you're searching for.

983
00:50:04,730 --> 00:50:05,230
Make sense?

984
00:50:05,230 --> 00:50:07,800
>> I'm about to show you the actual
code, and it's a lot of code.

985
00:50:07,800 --> 00:50:08,674
No need to freak out.

986
00:50:08,674 --> 00:50:09,910
We'll talk through it.

987
00:50:09,910 --> 00:50:13,410

988
00:50:13,410 --> 00:50:14,020
>> Actually, no.

989
00:50:14,020 --> 00:50:15,061
That was just pseudocode.

990
00:50:15,061 --> 00:50:17,860
OK, that was just the pseudocode,
which is a bit complex,

991
00:50:17,860 --> 00:50:19,751
but it's totally fine.

992
00:50:19,751 --> 00:50:21,000
Everyone following along here?

993
00:50:21,000 --> 00:50:24,260
If the root is null, return
false because that means

994
00:50:24,260 --> 00:50:26,850
you don't even have anything there.

995
00:50:26,850 --> 00:50:31,376
>> If root's n is the value, so if it
happens to be the one you're looking at,

996
00:50:31,376 --> 00:50:34,000
then you're going to return true
because you know you found it.

997
00:50:34,000 --> 00:50:36,250
But if the value is less
than root's n, you're

998
00:50:36,250 --> 00:50:38,332
going to search the left
child or the left leaf,

999
00:50:38,332 --> 00:50:39,540
whatever you want to call it.

1000
00:50:39,540 --> 00:50:41,750
And if the value is greater than root,
you're going to search the right tree,

1001
00:50:41,750 --> 00:50:44,610
then just run the function
through search again.

1002
00:50:44,610 --> 00:50:48,037
>> And if root is null, that
means you've reached the end?

1003
00:50:48,037 --> 00:50:50,120
That means you have no
more leaves to search,

1004
00:50:50,120 --> 00:50:52,230
then you know, oh, I
guess it's not in here

1005
00:50:52,230 --> 00:50:55,063
because after I've looked through
the whole thing and it's not here,

1006
00:50:55,063 --> 00:50:56,930
it just might not be here.
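
The pseudocode above translates almost line for line into C. This is a sketch under assumptions: the node has an int field called n, and the function is named search; the actual slide code is not in the transcript.

```c
#include <stdbool.h>
#include <stddef.h>

typedef struct node
{
    int n;
    struct node *left;
    struct node *right;
}
node;

// Recursive binary search on a binary search tree.
bool search(const node *root, int value)
{
    if (root == NULL)
    {
        return false;  // empty tree, or we ran out of nodes to check
    }
    if (value == root->n)
    {
        return true;   // found it
    }
    if (value < root->n)
    {
        return search(root->left, value);   // smaller values live on the left
    }
    return search(root->right, value);      // larger values live on the right
}
```

Searching for 66 in the example tree visits 55, then 77, then 66, and each call discards one whole side of the remaining tree.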

1007
00:50:56,930 --> 00:50:58,350
>> Does that make sense to everybody?

1008
00:50:58,350 --> 00:51:03,230
So it's like binary search preserving
the capabilities of linked lists.

1009
00:51:03,230 --> 00:51:09,200
Cool, and so the second type
of data structure you guys

1010
00:51:09,200 --> 00:51:13,180
can try implementing on your pset,
you only have to choose one method.

1011
00:51:13,180 --> 00:51:19,430
But perhaps an alternative method to
the hash table is what we call a trie.

1012
00:51:19,430 --> 00:51:24,080
>> All a trie is is a
specific type of tree that

1013
00:51:24,080 --> 00:51:28,600
has values that go to other values.

1014
00:51:28,600 --> 00:51:31,450
So instead of having a binary
tree in the sense that only one

1015
00:51:31,450 --> 00:51:35,940
thing can point to two, you can have
one thing point to many, many things.

1016
00:51:35,940 --> 00:51:39,450
You essentially have arrays
inside of which you store

1017
00:51:39,450 --> 00:51:41,790
pointers that point to other arrays.

1018
00:51:41,790 --> 00:51:45,210

1019
00:51:45,210 --> 00:51:49,460
>> So the node of how we
would define a trie

1020
00:51:49,460 --> 00:51:52,590
is we want to have a
Boolean, is_word, right?

1021
00:51:52,590 --> 00:51:54,920
So the node is Boolean
like true or false,

1022
00:51:54,920 --> 00:51:58,490
first of all at the head of
that array, is this a word?

1023
00:51:58,490 --> 00:52:03,620
Secondly, you want to have pointers
to whatever the rest of them are.

1024
00:52:03,620 --> 00:52:07,470
A bit complex, a bit abstract, but
I will explain what that all means.

1025
00:52:07,470 --> 00:52:13,800
>> So here, at the top, if you
have an array declared already,

1026
00:52:13,800 --> 00:52:17,040
a node where you have a Boolean
value stored at the front

1027
00:52:17,040 --> 00:52:19,490
that tells you is this a word?

1028
00:52:19,490 --> 00:52:20,520
Is this not a word?

1029
00:52:20,520 --> 00:52:23,240
And then you have the
rest of your array that

1030
00:52:23,240 --> 00:52:26,040
actually stores all the
possibilities of what it could be.

1031
00:52:26,040 --> 00:52:28,660
So, for example, like
at the top you have

1032
00:52:28,660 --> 00:52:32,140
the first thing that says true or
false, yes or no, this is a word.

1033
00:52:32,140 --> 00:52:38,130
>> And then you have 0 through 26 of
the letters that you can store.

1034
00:52:38,130 --> 00:52:42,790
If I wanted to search here
for bat, I go to the top

1035
00:52:42,790 --> 00:52:49,200
and I look for B. I find B in my
array, and so I know, OK, is B a word?

1036
00:52:49,200 --> 00:52:53,010
B is not a word, so thus
I must keep searching.

1037
00:52:53,010 --> 00:52:56,410
I go from B, and I look to the
pointer that B points towards

1038
00:52:56,410 --> 00:53:00,900
and I see another array of information,
the same structure that we had before.

1039
00:53:00,900 --> 00:53:05,240
>> And here-- oh, the next
letter in [INAUDIBLE] is A.

1040
00:53:05,240 --> 00:53:07,210
So we look in that array.

1041
00:53:07,210 --> 00:53:10,860
We find the A value,
and then we look to see, oh,

1042
00:53:10,860 --> 00:53:12,840
hey, is that a word, is B-A a word?

1043
00:53:12,840 --> 00:53:13,807
It is not a word.

1044
00:53:13,807 --> 00:53:14,890
We've got to keep looking.

1045
00:53:14,890 --> 00:53:17,850
>> And so then we look to where
the pointer of A points,

1046
00:53:17,850 --> 00:53:21,130
and it points to another array in
which we have more values stored.

1047
00:53:21,130 --> 00:53:24,150
And eventually, we get to
B-A-T, which is a word.

1048
00:53:24,150 --> 00:53:25,970
And so the next time
you look, you're going

1049
00:53:25,970 --> 00:53:30,850
to have that check of, yes,
this Boolean value is true.

1050
00:53:30,850 --> 00:53:35,450
And so in this sense we're kind
of having a tree with arrays.

1051
00:53:35,450 --> 00:53:39,890
>> So then you can kind of search down.

1052
00:53:39,890 --> 00:53:43,650
Rather than hashing a function and
assigning values by linked list,

1053
00:53:43,650 --> 00:53:49,190
you can just implement a
trie that searches downwards.

1054
00:53:49,190 --> 00:53:50,850
Really, really complicated stuff.

1055
00:53:50,850 --> 00:53:54,060
Not easy to think about because I'm like
spitting so many data structures out

1056
00:53:54,060 --> 00:53:58,710
at you, but does everyone kind of
understand how the logic of this works?

1057
00:53:58,710 --> 00:54:01,920
>> OK, cool.

1058
00:54:01,920 --> 00:54:05,600
So B-A-T, and then
you're going to search.

1059
00:54:05,600 --> 00:54:07,940
The next time you're going
to see, oh, hey, it's true,

1060
00:54:07,940 --> 00:54:09,273
thus I know this must be a word.
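
A sketch of the trie node and the lookup just walked through, in C. It assumes 26 children per node, letters only, where the talk mentions slots 0 through 26; the names is_word, children, and check are assumptions for illustration.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

#define ALPHABET 26  // assumption: letters only, one slot per letter

typedef struct node
{
    bool is_word;                     // true if the path to this node spells a word
    struct node *children[ALPHABET];  // children[0] is 'a', ..., children[25] is 'z'
}
node;

// Follow one pointer per letter; the word is present only if the path
// exists AND the node we end on is flagged as a word.
bool check(const node *root, const char *word)
{
    const node *cur = root;
    for (const char *p = word; *p != '\0'; p++)
    {
        if (cur == NULL || !isalpha((unsigned char) *p))
        {
            return false;  // path ends early, or not a letter
        }
        cur = cur->children[tolower((unsigned char) *p) - 'a'];
    }
    return cur != NULL && cur->is_word;
}
```

This is why "ba" is not found even though its nodes exist: the walk succeeds, but is_word at the A node is false.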

1061
00:54:09,273 --> 00:54:12,030

1062
00:54:12,030 --> 00:54:13,770
>> Same thing for zoo.

1063
00:54:13,770 --> 00:54:17,960
So here's the thing right now, if we
wanted to search for zoo, right now,

1064
00:54:17,960 --> 00:54:20,780
currently zoo is not a
word in our dictionary

1065
00:54:20,780 --> 00:54:25,300
because, as you guys can see, the
first place that we have a Boolean

1066
00:54:25,300 --> 00:54:28,590
return true is at the end of zoom.

1067
00:54:28,590 --> 00:54:30,430
We have Z-O-O-M.

1068
00:54:30,430 --> 00:54:33,900
>> And so here, we don't actually have
the word, zoo, in our dictionary

1069
00:54:33,900 --> 00:54:36,070
because this check box is not checked.

1070
00:54:36,070 --> 00:54:39,540
So the computer doesn't
know that zoo is a word

1071
00:54:39,540 --> 00:54:42,430
because the way that we've
stored it, only zoom here

1072
00:54:42,430 --> 00:54:44,920
actually has a Boolean value
that's been turned true.

1073
00:54:44,920 --> 00:54:49,380
So if we want to insert the
word, zoo, into our dictionary,

1074
00:54:49,380 --> 00:54:51,770
how would we go about doing that?

1075
00:54:51,770 --> 00:54:55,960
What do we have to do to make sure our
computer knows that Z-O-O is a word

1076
00:54:55,960 --> 00:54:58,130
and not the first word is Z-O-O-M?

1077
00:54:58,130 --> 00:54:59,360
>> AUDIENCE: [INAUDIBLE]

1078
00:54:59,360 --> 00:55:01,450
>> ANDI PENG: Exactly, we
want to make sure that this

1079
00:55:01,450 --> 00:55:07,890
here, that Boolean value is
checked off that it's true.

1080
00:55:07,890 --> 00:55:13,297
Z-O-O, then we're going to check that,
so we know exactly, hey, zoo is a word.

1081
00:55:13,297 --> 00:55:15,380
I'm going to tell the
computer that it's a word so

1082
00:55:15,380 --> 00:55:18,000
that when the computer checks,
it knows that zoo is a word.

1083
00:55:18,000 --> 00:55:21,269
>> Because remember all these data
structures, it's very easy for us

1084
00:55:21,269 --> 00:55:22,310
to say, oh, bat's a word.

1085
00:55:22,310 --> 00:55:22,851
Zoo's a word.

1086
00:55:22,851 --> 00:55:23,611
Zoom's a word.

1087
00:55:23,611 --> 00:55:25,860
But when you're building it,
the computer has no idea.

1088
00:55:25,860 --> 00:55:28,619
>> So you have to tell it exactly
at what point is this a word?

1089
00:55:28,619 --> 00:55:29,910
At what point is it not a word?

1090
00:55:29,910 --> 00:55:31,784
And at what point do I
need to search things,

1091
00:55:31,784 --> 00:55:34,000
and at what point do I need to go next?

1092
00:55:34,000 --> 00:55:37,010
Everyone clear of that?

1093
00:55:37,010 --> 00:55:39,540
Cool.

1094
00:55:39,540 --> 00:55:42,530
>> And so then comes the
problem of how would we

1095
00:55:42,530 --> 00:55:45,560
go about inserting something
that's actually not there?

1096
00:55:45,560 --> 00:55:49,090
So let's just say we want to insert
the word, bath, into our trie.

1097
00:55:49,090 --> 00:55:53,589
As you guys can see like currently
all we have now is B-A-T,

1098
00:55:53,589 --> 00:55:55,630
and this new data structure
there had a pointer that

1099
00:55:55,630 --> 00:55:59,740
pointed to null because we assume
that, oh, there's no words after B-A-T,

1100
00:55:59,740 --> 00:56:02,530
why do we need to keep
having things after that T.

1101
00:56:02,530 --> 00:56:06,581
>> But the problem arises if we do
want to have a word that comes after

1102
00:56:06,581 --> 00:56:07,080
the T's.

1103
00:56:07,080 --> 00:56:09,500
If you have bath, you're
going to want an H, right?

1104
00:56:09,500 --> 00:56:13,290
And so the way we're going to do that is
we're going to create a separate node.

1105
00:56:13,290 --> 00:56:16,840
We're going to malloc whatever amount
of memory for this new array,

1106
00:56:16,840 --> 00:56:20,720
and we're going to reassign pointers.

1107
00:56:20,720 --> 00:56:22,947
>> We're going to assign the
H. First of all, this null,

1108
00:56:22,947 --> 00:56:24,030
we're going to get rid of.

1109
00:56:24,030 --> 00:56:26,590
We're going to have
the H point downwards.

1110
00:56:26,590 --> 00:56:30,600
If we see an H, we want it
to go to somewhere else.

1111
00:56:30,600 --> 00:56:33,910
>> In here, we can then check off yes.

1112
00:56:33,910 --> 00:56:38,170
If we hit an H after the T, oh,
then we know that this is a word.

1113
00:56:38,170 --> 00:56:41,110
The Boolean is going to return true.
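
Roughly, inserting into a trie could look like the sketch below. It uses calloc where the talk says malloc, so that a fresh node starts with is_word false and every child pointer NULL (true on mainstream platforms, where a zeroed pointer is NULL). The name trie_insert and the 26-letter layout are assumptions.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stdlib.h>

#define ALPHABET 26  // assumption: letters only

typedef struct node
{
    bool is_word;
    struct node *children[ALPHABET];
}
node;

// Insert word below root, creating nodes only where the path is missing.
// Returns false on a non-letter or if allocation fails; nodes created
// before the failure are left in place.
bool trie_insert(node *root, const char *word)
{
    node *cur = root;
    for (const char *p = word; *p != '\0'; p++)
    {
        if (!isalpha((unsigned char) *p))
        {
            return false;
        }
        int i = tolower((unsigned char) *p) - 'a';
        if (cur->children[i] == NULL)
        {
            // replace the NULL with a pointer to a brand-new, zeroed node
            cur->children[i] = calloc(1, sizeof(node));
            if (cur->children[i] == NULL)
            {
                return false;
            }
        }
        cur = cur->children[i];
    }

    // check the box: the path we just walked spells a word
    cur->is_word = true;
    return true;
}
```

Inserting "bath" after "bat" only allocates one node, for the H. Inserting "zoo" after "zoom" allocates nothing at all; it just flips is_word on the second O.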

1114
00:56:41,110 --> 00:56:42,950
Everyone clear on how that happened?

1115
00:56:42,950 --> 00:56:45,110
OK.

1116
00:56:45,110 --> 00:56:47,214
>> So essentially, all of
these data structures

1117
00:56:47,214 --> 00:56:50,130
that we've gone over today, I've
gone over them really, really quickly

1118
00:56:50,130 --> 00:56:52,192
and not in too much
detail, and that's OK.

1119
00:56:52,192 --> 00:56:53,900
Once you start messing
with it, you'll be

1120
00:56:53,900 --> 00:56:55,733
keeping track of where
all the pointers are,

1121
00:56:55,733 --> 00:56:58,060
what's going on in your
data structures, et cetera.

1122
00:56:58,060 --> 00:56:59,810
They'll be very useful,
and it's up to you

1123
00:56:59,810 --> 00:57:03,890
guys to totally figure out how
you want to implement things.

1124
00:57:03,890 --> 00:57:07,650
>> And so pset4, of 5-- oh, that is wrong.

1125
00:57:07,650 --> 00:57:10,140
Pset5 is misspellings.

1126
00:57:10,140 --> 00:57:13,710
As I said before, you're going to, once
again, download source code from us.

1127
00:57:13,710 --> 00:57:16,210
There's going to be three main
things you'll be downloading.

1128
00:57:16,210 --> 00:57:18,470
You'll download dictionaries,
keys, and texts.

1129
00:57:18,470 --> 00:57:21,660
>> All those things are are
either dictionaries of words

1130
00:57:21,660 --> 00:57:25,190
that we want you to check
or texts of information

1131
00:57:25,190 --> 00:57:26,930
that we want you to spell check.

1132
00:57:26,930 --> 00:57:29,670
And so the dictionaries
we give you are going

1133
00:57:29,670 --> 00:57:34,870
to give you actual words that we want
you to store somehow in a way that's

1134
00:57:34,870 --> 00:57:36,530
more efficient than an array.

1135
00:57:36,530 --> 00:57:38,470
And then the texts are
going to be what we're

1136
00:57:38,470 --> 00:57:43,900
asking you to spell check to make sure
all of the words there are real words.

1137
00:57:43,900 --> 00:57:47,970
>> And so the three blocks of
programs that we'll give you

1138
00:57:47,970 --> 00:57:51,130
are called dictionary.c,
dictionary.h, and speller.c.

1139
00:57:51,130 --> 00:57:56,500
And so all dictionary.c does is
what you're asked to implement.

1140
00:57:56,500 --> 00:57:57,880
It loads words.

1141
00:57:57,880 --> 00:58:02,000
It spell checks them, and it makes sure
that everything is inserted properly.

1142
00:58:02,000 --> 00:58:05,180
>> dictionary.h is just a library file
that declares all those functions.

1143
00:58:05,180 --> 00:58:07,650
And speller.c, we're going to give you.

1144
00:58:07,650 --> 00:58:09,290
You don't need to modify any of it.

1145
00:58:09,290 --> 00:58:14,290
All speller.c does is take that,
loads it, checks the speed of it,

1146
00:58:14,290 --> 00:58:19,190
tests the benchmark of like how
quickly you're able to do things.

1147
00:58:19,190 --> 00:58:20,410
>> It's a speller.

1148
00:58:20,410 --> 00:58:23,920
Just don't mess with it, but make
sure you understand what it's doing.

1149
00:58:23,920 --> 00:58:28,090
We use a function called getrusage that
tests the performance of your spell

1150
00:58:28,090 --> 00:58:28,590
checker.

1151
00:58:28,590 --> 00:58:32,200
All it does is basically test the
time of everything in your dictionary,

1152
00:58:32,200 --> 00:58:33,680
so make sure you understand it.

1153
00:58:33,680 --> 00:58:36,660
Be careful to not mess with it or
else things will not run properly.
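
To make the timing idea concrete, here is a minimal sketch of measuring CPU time with getrusage. This is not the actual speller.c. The helper names cpu_seconds, time_call, and spin are made up for illustration, and the sketch assumes a POSIX system.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/resource.h>
#include <sys/time.h>

// Hypothetical helper: CPU seconds (user + system) elapsed between two snapshots.
double cpu_seconds(const struct rusage *before, const struct rusage *after)
{
    assert(before != NULL && after != NULL);

    double user = (double) (after->ru_utime.tv_sec - before->ru_utime.tv_sec)
                + (double) (after->ru_utime.tv_usec - before->ru_utime.tv_usec) / 1e6;
    double sys = (double) (after->ru_stime.tv_sec - before->ru_stime.tv_sec)
               + (double) (after->ru_stime.tv_usec - before->ru_stime.tv_usec) / 1e6;
    return user + sys;
}

// Hypothetical wrapper: take a snapshot, run the work, take another snapshot.
// Returns -1.0 if getrusage itself fails.
double time_call(void (*work)(void))
{
    struct rusage before, after;

    if (getrusage(RUSAGE_SELF, &before) != 0)
    {
        return -1.0;
    }
    work();
    if (getrusage(RUSAGE_SELF, &after) != 0)
    {
        return -1.0;
    }
    return cpu_seconds(&before, &after);
}

// Stand-in workload (an assumption); a real benchmark would call load or check.
void spin(void)
{
    volatile unsigned long total = 0;
    for (unsigned long i = 0; i < 1000000UL; i++)
    {
        total += i;
    }
}
```

Calling time_call(spin) gives the CPU seconds the loop used. The same before-and-after pattern could wrap any function you want to benchmark.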

1154
00:58:36,660 --> 00:58:39,740

1155
00:58:39,740 --> 00:58:44,170
>> And the bulk of this challenge is for
you guys to really modify dictionary.c.

1156
00:58:44,170 --> 00:58:48,526
We're going to give you
140,000 words in a dictionary.

1157
00:58:48,526 --> 00:58:50,900
We're going to give you a text
file that has those words,

1158
00:58:50,900 --> 00:58:54,840
and we want you to be able to organize
them into a hash table or a trie

1159
00:58:54,840 --> 00:58:58,140
because when we ask you to spell
check-- imagine if you're spell

1160
00:58:58,140 --> 00:59:00,690
checking like Homer's Odyssey.

1161
00:59:00,690 --> 00:59:03,010
It's like this huge, huge text.

1162
00:59:03,010 --> 00:59:05,190
>> Imagine if every single
word you had to look

1163
00:59:05,190 --> 00:59:08,100
through an array of 140,000 values.

1164
00:59:08,100 --> 00:59:10,350
That would take forever
for your machine to run.

1165
00:59:10,350 --> 00:59:14,490
That is why we want to organize our
data into more efficient data structures

1166
00:59:14,490 --> 00:59:17,270
such as a hash table or a trie.

1167
00:59:17,270 --> 00:59:20,700
And then you guys can kind
of when you search access

1168
00:59:20,700 --> 00:59:22,570
things more easily and more quickly.

1169
00:59:22,570 --> 00:59:24,934
>> And so be careful to resolve collisions.

1170
00:59:24,934 --> 00:59:27,350
You're going to get a bunch
of words that start with A.

1171
00:59:27,350 --> 00:59:29,957
You're going to get a bunch of words
that start with B. Up to you

1172
00:59:29,957 --> 00:59:31,290
guys how you want to resolve it.

1173
00:59:31,290 --> 00:59:34,144
Perhaps there's a more
efficient hash function

1174
00:59:34,144 --> 00:59:36,810
than just the first letter of
something, and so that's up to you

1175
00:59:36,810 --> 00:59:38,190
guys to kind of do whatever you want.

1176
00:59:38,190 --> 00:59:40,148
>> Maybe you want to add
all the letters together.

1177
00:59:40,148 --> 00:59:43,410
Maybe you want to like do weird things
to account for the number of letters,

1178
00:59:43,410 --> 00:59:43,970
whatever.

1179
00:59:43,970 --> 00:59:45,386
Up to you guys how you want to do it.
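
The two hash ideas mentioned here, first letter versus adding all the letters together, might look something like this in C. It's only a sketch: the function names and the NUM_BUCKETS value are assumptions, and neither function is claimed to be a particularly good hash.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

#define NUM_BUCKETS 1024 // assumed table size, chosen only for illustration

// Hash on the first letter only: every word starting with 'a' lands in
// bucket 0, so at most 26 buckets are ever used.
unsigned int hash_first_letter(const char *word)
{
    assert(word != NULL);

    if (!isalpha((unsigned char) word[0]))
    {
        return 0; // empty or non-alphabetic start: fall back to bucket 0
    }
    return (unsigned int) (tolower((unsigned char) word[0]) - 'a');
}

// Hash on the sum of all (lowercased) characters, wrapped to the table size.
// Words that share a first letter can now land in different buckets.
unsigned int hash_letter_sum(const char *word)
{
    assert(word != NULL);

    unsigned int sum = 0;
    for (size_t i = 0; word[i] != '\0'; i++)
    {
        sum += (unsigned int) tolower((unsigned char) word[i]);
    }
    return sum % NUM_BUCKETS;
}
```

With these, "apple" and "avocado" collide under hash_first_letter but not under hash_letter_sum. The sum version still collides on anagrams, which is one reason people go looking for stronger hash functions.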

1180
00:59:45,386 --> 00:59:49,262
If you want to do a hash table, if you
want to try a trie, totally up to you.

1181
00:59:49,262 --> 00:59:52,470
I will warn you ahead of time that the
trie is typically a bit more difficult

1182
00:59:52,470 --> 00:59:54,520
just because there's a lot
more pointers to keep track of.

1183
00:59:54,520 --> 00:59:55,645
But totally up to you guys.

1184
00:59:55,645 --> 00:59:58,742
It's far more efficient
in most instances.

1185
00:59:58,742 --> 01:00:01,450
You want to really be able to keep
track of all of your pointers.

1186
01:00:01,450 --> 01:00:03,850
Like do the same thing
that I was doing here.

1187
01:00:03,850 --> 01:00:06,871
When you're trying to insert
values into a hash table or delete,

1188
01:00:06,871 --> 01:00:08,620
make sure that you're
really keeping track

1189
01:00:08,620 --> 01:00:11,860
of where everything is because
it's really easy for if I'm

1190
01:00:11,860 --> 01:00:14,727
trying to insert like the word, andy.

1191
01:00:14,727 --> 01:00:16,810
Let's just say that's a
real word, the word, andy,

1192
01:00:16,810 --> 01:00:19,640
into a giant list of A words.

1193
01:00:19,640 --> 01:00:22,450
>> If I just happen to reassign
a pointer wrong, oops,

1194
01:00:22,450 --> 01:00:24,940
there goes the entirety of
the rest of my linked list.

1195
01:00:24,940 --> 01:00:26,897
Now the only word I
have is andy, and now

1196
01:00:26,897 --> 01:00:29,230
all of the other words in the
dictionary have been lost.

1197
01:00:29,230 --> 01:00:31,370
And so you want to make sure you
keep track of all of your pointers

1198
01:00:31,370 --> 01:00:33,661
or else you're going to get
huge problems in your code.

1199
01:00:33,661 --> 01:00:35,840
Draw things out carefully step by step.

1200
01:00:35,840 --> 01:00:37,870
It makes it a lot easier to think of.
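
Here's a small sketch of the pointer ordering being described, for inserting a word at the front of a linked list of A words. The node layout, the LENGTH limit, and the function names are assumptions for illustration, not the pset's required design.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

#define LENGTH 45 // assumed maximum word length

typedef struct node
{
    char word[LENGTH + 1];
    struct node *next;
} node;

// Insert a word at the front of the list. Returns false if the word is too
// long or memory runs out; the existing list is left untouched in that case.
bool insert_word(node **head, const char *word)
{
    assert(head != NULL && word != NULL);

    if (strlen(word) > LENGTH)
    {
        return false;
    }
    node *n = malloc(sizeof(node));
    if (n == NULL)
    {
        return false;
    }
    strcpy(n->word, word);

    // Order matters: first point the new node at the old list...
    n->next = *head;
    // ...and only then move head. Doing *head = n first would lose the only
    // pointer to the rest of the list.
    *head = n;
    return true;
}

// Walk the list looking for an exact match.
bool contains_word(const node *head, const char *word)
{
    for (const node *cur = head; cur != NULL; cur = cur->next)
    {
        if (strcmp(cur->word, word) == 0)
        {
            return true;
        }
    }
    return false;
}

// Free every node, saving next before freeing the current node.
void free_list(node *head)
{
    while (head != NULL)
    {
        node *next = head->next;
        free(head);
        head = next;
    }
}
```

Starting from a list that holds "apple", calling insert_word(&list, "andy") leaves both words reachable, with "andy" at the front.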

1201
01:00:37,870 --> 01:00:40,910
>> And lastly, you want to be able to
test the performance of your program

1202
01:00:40,910 --> 01:00:41,618
on the big board.

1203
01:00:41,618 --> 01:00:43,710
If you guys take a
look at CS50 right now,

1204
01:00:43,710 --> 01:00:45,210
we have what's called the big board.

1205
01:00:45,210 --> 01:00:50,200
It is the score sheet of the fastest
spell checking times across all of CS50.

1206
01:00:50,200 --> 01:00:55,720
Right now, of the top 10 or so times,
I think eight of them are staff.

1207
01:00:55,720 --> 01:00:57,960
We really want you guys to beat us.

1208
01:00:57,960 --> 01:01:00,870
>> All of us were trying to implement
the fastest code possible.

1209
01:01:00,870 --> 01:01:04,880
We want you guys to try to challenge
us and implement faster than all of us

1210
01:01:04,880 --> 01:01:05,550
can.

1211
01:01:05,550 --> 01:01:07,970
And so this is really
the first time that we're

1212
01:01:07,970 --> 01:01:12,680
asking you guys to do a pset that
you can really do in whatever method

1213
01:01:12,680 --> 01:01:13,760
you want.

1214
01:01:13,760 --> 01:01:17,730
>> I always say, this is more akin
to a real-life solution, right?

1215
01:01:17,730 --> 01:01:19,550
I say, hey, I need you to do this.

1216
01:01:19,550 --> 01:01:21,380
Build a program that does this for me.

1217
01:01:21,380 --> 01:01:22,630
Do it however you want.

1218
01:01:22,630 --> 01:01:24,271
I just know that I want it fast.

1219
01:01:24,271 --> 01:01:25,770
That's your challenge for this week.

1220
01:01:25,770 --> 01:01:27,531
You guys, we're going
to give you a task.

1221
01:01:27,531 --> 01:01:29,030
We're going to give you a challenge.

1222
01:01:29,030 --> 01:01:31,559
And then it's up to you guys
to completely just figure out

1223
01:01:31,559 --> 01:01:34,100
what's the quickest and most
efficient way to implement this.

1224
01:01:34,100 --> 01:01:34,600
Yeah?

1225
01:01:34,600 --> 01:01:37,476
>> AUDIENCE: Are we allowed to, if we
wanted to, research faster ways

1226
01:01:37,476 --> 01:01:40,821
to do hash tables online, can we do
that and cite someone else's code?

1227
01:01:40,821 --> 01:01:42,070
ANDI PENG: Yeah, totally fine.

1228
01:01:42,070 --> 01:01:44,320
So if you guys read the
spec, there's a line

1229
01:01:44,320 --> 01:01:48,310
in the spec that says you guys
are totally free to research hash

1230
01:01:48,310 --> 01:01:51,070
functions on what are some
of the quicker hash functions

1231
01:01:51,070 --> 01:01:54,720
to run things through as
long as you cite that code.

1232
01:01:54,720 --> 01:01:57,220
So some people have already
figured out fast ways

1233
01:01:57,220 --> 01:02:00,250
of doing spell checkers, fast
ways of storing information.

1234
01:02:00,250 --> 01:02:02,750
Totally up to you guys if you
want to just take that, right?

1235
01:02:02,750 --> 01:02:04,045
Make sure you're citing.

1236
01:02:04,045 --> 01:02:06,170
The challenge here really
that we're trying to test

1237
01:02:06,170 --> 01:02:09,750
is making sure that you know
your way around pointers.

1238
01:02:09,750 --> 01:02:12,700
As far as you implementing
the actual hash function

1239
01:02:12,700 --> 01:02:15,070
and coming up with like
the math to do that,

1240
01:02:15,070 --> 01:02:17,570
you guys can research whatever
methods online you guys want.

1241
01:02:17,570 --> 01:02:17,996
Yeah?

1242
01:02:17,996 --> 01:02:19,700
>> AUDIENCE: Can we cite just
by using the [INAUDIBLE]?

1243
01:02:19,700 --> 01:02:20,120
>> ANDI PENG: Yeah.

1244
01:02:20,120 --> 01:02:22,328
You can just, in your comment,
you can cite like, oh,

1245
01:02:22,328 --> 01:02:26,127
taken from yada, yada,
yada, hash function.

1246
01:02:26,127 --> 01:02:27,210
Anyone have any questions?

1247
01:02:27,210 --> 01:02:29,694
We actually breezed
through section today.

1248
01:02:29,694 --> 01:02:31,610
I will be up here to
answer questions as well.

1249
01:02:31,610 --> 01:02:36,570
>> Also, as I said, office
hours tonight and tomorrow.

1250
01:02:36,570 --> 01:02:40,307
The spec this week is actually
super easy and super short to read.

1251
01:02:40,307 --> 01:02:43,140
I would suggest taking a look, just
read through the entirety of it.

1252
01:02:43,140 --> 01:02:45,730
>> And Zamyla actually walks you
through each of the functions

1253
01:02:45,730 --> 01:02:49,796
you need to implement, and so it's
very, very clear how to do everything.

1254
01:02:49,796 --> 01:02:51,920
Just make sure you're
keeping track of pointers.

1255
01:02:51,920 --> 01:02:53,650
This is a very challenging pset.

1256
01:02:53,650 --> 01:02:56,744
>> It's not challenging because like,
oh, the concepts are so much more

1257
01:02:56,744 --> 01:02:59,160
difficult, or you have to learn
so much new syntax the way

1258
01:02:59,160 --> 01:03:00,650
that you did for the last pset.

1259
01:03:00,650 --> 01:03:03,320
This pset is difficult because
there are so many pointers,

1260
01:03:03,320 --> 01:03:06,980
and then it's very, very easy to once
you have a bug in your code not be able

1261
01:03:06,980 --> 01:03:08,315
to find where that bug is.

1262
01:03:08,315 --> 01:03:13,200
>> And so complete and utter faith in you
guys to be able to beat our [INAUDIBLE]

1263
01:03:13,200 --> 01:03:13,700
spellings.

1264
01:03:13,700 --> 01:03:16,640
I actually haven't written mine
yet, but I'm about to write mine.

1265
01:03:16,640 --> 01:03:19,070
So while you're writing
yours, I'll be writing mine.

1266
01:03:19,070 --> 01:03:21,070
I'm going to try to make
mine faster than yours.

1267
01:03:21,070 --> 01:03:23,940
We'll see who has the fastest one.

1268
01:03:23,940 --> 01:03:27,340
>> And yeah, I will see all of
you guys here on Tuesday.

1269
01:03:27,340 --> 01:03:29,510
I will run kind of like a pset workshop.

1270
01:03:29,510 --> 01:03:32,640
All of the sections this
week are pset workshops,

1271
01:03:32,640 --> 01:03:36,690
so you guys have lots of opportunities
for help, office hours as always,

1272
01:03:36,690 --> 01:03:41,330
and I really look forward to
reading all of your guys' code.

1273
01:03:41,330 --> 01:03:44,160
I have quizzes up here if you
guys want to come get those.

1274
01:03:44,160 --> 01:03:45,880
That's all.

1275
01:03:45,880 --> 01:03:48,180