1
00:00:00,000 --> 00:00:03,472
[MUSIC PLAYING]

2
00:00:03,472 --> 00:00:17,322


3
00:00:17,322 --> 00:00:20,280
DAVID MALAN: Recall that an algorithm
is just step-by-step instructions

4
00:00:20,280 --> 00:00:21,510
for solving some problem.

5
00:00:21,510 --> 00:00:25,470
Not unlike this problem here wherein I
sought Mike Smith among the whole phone

6
00:00:25,470 --> 00:00:27,210
book of names and numbers.

7
00:00:27,210 --> 00:00:29,280
But up until now, we've
really only focused

8
00:00:29,280 --> 00:00:31,680
on those step-by-step
instructions and not

9
00:00:31,680 --> 00:00:35,163
so much on how the data we
are searching is stored.

10
00:00:35,163 --> 00:00:38,080
Of course, in this version of that
problem, it's stored here on paper,

11
00:00:38,080 --> 00:00:42,150
but in the digital world, it's of course
not going to be paper, but 0's and 1's.

12
00:00:42,150 --> 00:00:45,180
But it's one thing to say that the
numbers and maybe even the names

13
00:00:45,180 --> 00:00:49,950
are stored ultimately as 0's and
1's, but where and how exactly?

14
00:00:49,950 --> 00:00:52,650
There's all those transistors
and they're flipping on and off,

15
00:00:52,650 --> 00:00:56,830
but with respect to each other, are
those numbers laid out left to right,

16
00:00:56,830 --> 00:00:58,950
top to bottom, are they
all over the place?

17
00:00:58,950 --> 00:01:01,440
Let's actually take a
look at that question now

18
00:01:01,440 --> 00:01:03,540
and consider how a
computer leverages what

19
00:01:03,540 --> 00:01:07,800
are called data structures to
facilitate implementation of algorithms.

20
00:01:07,800 --> 00:01:12,060
Indeed, how you lay out a
computer's data inside of its memory

21
00:01:12,060 --> 00:01:15,540
has non-trivial impacts on
the performance or efficiency

22
00:01:15,540 --> 00:01:17,940
of your algorithms, whereas
the algorithm itself

23
00:01:17,940 --> 00:01:21,990
can be correct as we've seen, but
not necessarily efficient logically.

24
00:01:21,990 --> 00:01:25,110
Both space and the representation
underneath the hood of your data

25
00:01:25,110 --> 00:01:27,000
can also make a significant impact.

26
00:01:27,000 --> 00:01:28,590
But let's simplify the world first.

27
00:01:28,590 --> 00:01:32,850
And rather than focus on, say, a
whole phone book of names and numbers,

28
00:01:32,850 --> 00:01:36,060
let's focus just on numbers, and
much smaller numbers that aren't even

29
00:01:36,060 --> 00:01:40,770
phone numbers, but just integers, and
only, say, seven of them at a time.

30
00:01:40,770 --> 00:01:42,940
And I've hidden these
seven numbers, if you will,

31
00:01:42,940 --> 00:01:45,540
behind these seven yellow doors.

32
00:01:45,540 --> 00:01:48,360
And so knocking on and
opening one of these doors

33
00:01:48,360 --> 00:01:49,803
will reveal one number at a time.

34
00:01:49,803 --> 00:01:52,470
And the goal at hand, though, is
to find a very specific number,

35
00:01:52,470 --> 00:01:56,280
just like I sought one specific
phone number before, this time I want

36
00:01:56,280 --> 00:01:59,670
to find the number 50 specifically.

37
00:01:59,670 --> 00:02:01,290
Well, where to begin?

38
00:02:01,290 --> 00:02:04,050
I'll go with the one closest
to me and knock, knock, knock--

39
00:02:04,050 --> 00:02:05,580
15 is the number.

40
00:02:05,580 --> 00:02:06,840
So a little bit low.

41
00:02:06,840 --> 00:02:08,930
Let's proceed from there to see 23.

42
00:02:08,930 --> 00:02:10,320
We seem to be getting closer.

43
00:02:10,320 --> 00:02:14,040
Let's open this door next and-- oh,
we seem to have veered down smaller,

44
00:02:14,040 --> 00:02:15,180
so I'm a little confused.

45
00:02:15,180 --> 00:02:17,130
But I have four doors left to check.

46
00:02:17,130 --> 00:02:23,940
So 50 is not there and 50 is not
there and 50 is, in fact, there.

47
00:02:23,940 --> 00:02:24,690
So not bad.

48
00:02:24,690 --> 00:02:28,910
Within just six steps, have I
found the number in question.

49
00:02:28,910 --> 00:02:31,670
But of course, to be fair,
there were only seven doors.

50
00:02:31,670 --> 00:02:36,050
So if we generalize that to say that
there were n doors where n is just

51
00:02:36,050 --> 00:02:41,540
a number, well that was roughly n
doors I had to open among the n doors

52
00:02:41,540 --> 00:02:44,210
just to find the one that I sought.
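
A minimal sketch of that left-to-right search, assuming Python. The names `linear_search` and `doors` are made up for illustration, and so is the exact door order: 15, 23, 16, 50, and 4 come from the walkthrough, while the values 8 and 42 and their positions are assumptions.

```python
def linear_search(doors, target):
    """Open doors left to right; return (index, steps), or (None, steps) if absent."""
    steps = 0
    for index, value in enumerate(doors):
        steps += 1  # opening one door counts as one step
        if value == target:
            return index, steps
    return None, steps


# Assumed contents: the positions of 8 and 42 are guesses for illustration.
doors = [15, 23, 16, 8, 42, 50, 4]
index, steps = linear_search(doors, 50)
print(index, steps)  # 5 6
```

With n doors in no particular order, the loop may have to open all n of them before it can answer.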

53
00:02:44,210 --> 00:02:45,740
So could I have done better?

54
00:02:45,740 --> 00:02:49,460
You know, my instincts like yours
were perhaps to start at the left

55
00:02:49,460 --> 00:02:53,090
and move to the right, and we seem
to be on a good path initially.

56
00:02:53,090 --> 00:02:56,390
We went from 15 to 23,
and then darn it if 16

57
00:02:56,390 --> 00:02:59,210
didn't throw a wrench in the
works, because I expected it,

58
00:02:59,210 --> 00:03:03,290
perhaps naively, to be bigger
and bigger as I moved right.

59
00:03:03,290 --> 00:03:07,010
But honestly, had I not told you
anything-- and indeed I didn't-- then

60
00:03:07,010 --> 00:03:10,220
you wouldn't have known anything
about these numbers other than maybe

61
00:03:10,220 --> 00:03:12,380
the number 50 is actually there.

62
00:03:12,380 --> 00:03:14,840
I told you nothing as to
the magnitude or the size

63
00:03:14,840 --> 00:03:17,453
of any of the other numbers,
let alone the order,

64
00:03:17,453 --> 00:03:19,370
but in the world of the
phone book, of course,

65
00:03:19,370 --> 00:03:22,280
we were able to take for
granted that those names were

66
00:03:22,280 --> 00:03:27,080
sorted by the phone company for us--
from left to right, from A to Z.

67
00:03:27,080 --> 00:03:31,070
But in this case, if your data is just
added to the computer's memory one

68
00:03:31,070 --> 00:03:33,620
at a time in no
particular order, the onus

69
00:03:33,620 --> 00:03:37,850
is on you, the programmer or
algorithm, to find that number

70
00:03:37,850 --> 00:03:39,560
you're interested in nonetheless.

71
00:03:39,560 --> 00:03:40,820
Now what was left here?

72
00:03:40,820 --> 00:03:43,940
And indeed 4 is even smaller than 50.

73
00:03:43,940 --> 00:03:48,140
So these seven doors were by
design randomly assigned a number.

74
00:03:48,140 --> 00:03:49,847
And so you could do no better.

75
00:03:49,847 --> 00:03:50,930
I might have gotten lucky.

76
00:03:50,930 --> 00:03:53,000
I might not have gone
with my initial instincts

77
00:03:53,000 --> 00:03:55,010
and touched the number 15 at left.

78
00:03:55,010 --> 00:03:59,060
I might have, effectively blindly, gone
and touched 50 and just gotten lucky,

79
00:03:59,060 --> 00:04:00,920
and then it would have
been just one step.

80
00:04:00,920 --> 00:04:04,760
But there's only a one in seven chance
I would have been correct so quickly,

81
00:04:04,760 --> 00:04:07,220
so that's not really an
algorithm that I could

82
00:04:07,220 --> 00:04:11,340
reproduce with the same
efficiency again and again.

83
00:04:11,340 --> 00:04:13,010
So how can I do better?

84
00:04:13,010 --> 00:04:16,490
And how does the phone company
enable us to do better?

85
00:04:16,490 --> 00:04:19,339
Well they, of course, put in a
huge amount of effort upfront

86
00:04:19,339 --> 00:04:23,270
to sort all of those names and
associated numbers from left

87
00:04:23,270 --> 00:04:26,030
to right, from A to Z. And so
that's a huge leg up for us,

88
00:04:26,030 --> 00:04:30,230
because then I can assume I can do
divide and conquer or so-called binary

89
00:04:30,230 --> 00:04:32,570
search, dividing that
phone book in two as

90
00:04:32,570 --> 00:04:36,860
implied by "bi" in "binary,"
halving the problem again and again.

91
00:04:36,860 --> 00:04:40,430
But someone's got to do that work
for us, be it the phone company

92
00:04:40,430 --> 00:04:42,120
or perhaps me with these numbers.

93
00:04:42,120 --> 00:04:44,360
So let's take one more
stab at this problem,

94
00:04:44,360 --> 00:04:47,720
this time presuming that
the seven doors in question

95
00:04:47,720 --> 00:04:52,370
do, in fact, have the numbers behind
them sorted from left to right,

96
00:04:52,370 --> 00:04:54,170
small to big.

97
00:04:54,170 --> 00:04:56,630
So where to find the number 50 now?

98
00:04:56,630 --> 00:04:59,780
I have seven doors behind
which are those same numbers,

99
00:04:59,780 --> 00:05:02,930
but this time they are
sorted from left to right.

100
00:05:02,930 --> 00:05:06,420
And no skipping ahead thinking that,
well, I remember all the other numbers,

101
00:05:06,420 --> 00:05:08,228
so I know immediately where 50 is.

102
00:05:08,228 --> 00:05:10,520
Let's assume for the moment
that we don't know anything

103
00:05:10,520 --> 00:05:14,090
about the other numbers other than
the fact that they are sorted.

104
00:05:14,090 --> 00:05:17,930
Well, my inclination is not to start
at the left with this first door,

105
00:05:17,930 --> 00:05:21,260
much like my inclination ultimately
with that phone book was not to start

106
00:05:21,260 --> 00:05:23,388
with the first page, but the middle.

107
00:05:23,388 --> 00:05:26,180
And indeed, I'm going to go here
to the middle of these doors and--

108
00:05:26,180 --> 00:05:27,050
16.

109
00:05:27,050 --> 00:05:29,030
Not quite the one I want.

110
00:05:29,030 --> 00:05:33,960
But if the doors are sorted now, I know
that that number 50 is not to the left,

111
00:05:33,960 --> 00:05:35,460
and so I'm going to go to the right.

112
00:05:35,460 --> 00:05:37,280
Where do I go to the right?

113
00:05:37,280 --> 00:05:41,090
Well, I have three doors left, I'm
going to follow the same algorithm

114
00:05:41,090 --> 00:05:44,960
and open that door in the
middle and-- oh, so close.

115
00:05:44,960 --> 00:05:47,360
I found only, if you
will, the meaning of life.

116
00:05:47,360 --> 00:05:49,460
So 42, though, is not
the number I care about,

117
00:05:49,460 --> 00:05:52,940
but I do know something about
50-- it's bigger than 42.

118
00:05:52,940 --> 00:05:57,320
And so now, it's quite simply
the case that-- aha, 50 is there,

119
00:05:57,320 --> 00:05:59,840
it's going to be in that last number.

120
00:05:59,840 --> 00:06:03,770
So whereas before it took me up to
six steps to find the number 50,

121
00:06:03,770 --> 00:06:05,750
and only then by luck
did I find it where

122
00:06:05,750 --> 00:06:07,920
it was because it was
just randomly placed,

123
00:06:07,920 --> 00:06:12,990
now I spent 1, 2, 3 steps in total,
which is, of course, fewer than six.

124
00:06:12,990 --> 00:06:16,010
And as these numbers
of doors grow in size

125
00:06:16,010 --> 00:06:18,200
and I have hundreds
or thousands of doors,

126
00:06:18,200 --> 00:06:21,440
surely it will be the case just like
the phone book that halving this problem

127
00:06:21,440 --> 00:06:23,990
again and again is going
to get me to my answer

128
00:06:23,990 --> 00:06:29,430
if it's there in logarithmic
instead of linear time, so to speak.
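
One way to sketch that halving, assuming Python and a list that is already sorted. The function name `binary_search` and the value 8 are assumptions for illustration; the other six numbers appear in the walkthrough.

```python
def binary_search(doors, target):
    """Search a sorted list; return (index, steps), or (None, steps) if absent."""
    low, high = 0, len(doors) - 1
    steps = 0
    while low <= high:
        middle = (low + high) // 2  # round down to a whole door number
        steps += 1
        if doors[middle] == target:
            return middle, steps
        if doors[middle] < target:
            low = middle + 1   # target can only be to the right
        else:
            high = middle - 1  # target can only be to the left
    return None, steps


doors = [4, 8, 15, 16, 23, 42, 50]  # sorted; the 8 is an assumed value
print(binary_search(doors, 50))  # (6, 3)
```

The three steps visit 16, then 42, then 50, the same doors opened in the walkthrough.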

129
00:06:29,430 --> 00:06:33,830
But what's key to the success of
this algorithm-- binary search--

130
00:06:33,830 --> 00:06:39,350
is that the doors are not only sorted,
but they are back-to-back-to-back.

131
00:06:39,350 --> 00:06:42,140
Now I have the luxury of feet
and I can move back and forth

132
00:06:42,140 --> 00:06:46,190
among these numbers, but even my steps
take me some amount of time and energy.

133
00:06:46,190 --> 00:06:50,120
But fortunately, each such step just
takes one unit of energy, if you will,

134
00:06:50,120 --> 00:06:54,840
and I can immediately jump wherever
I would like one step at a time.

135
00:06:54,840 --> 00:06:58,520
But a computer is purely electronic,
and in the context of memory,

136
00:06:58,520 --> 00:07:01,490
doesn't actually need to take any steps.

137
00:07:01,490 --> 00:07:05,270
Electronically a computer can
jump to any location in memory

138
00:07:05,270 --> 00:07:07,460
instantly in so-called constant time.

139
00:07:07,460 --> 00:07:10,077
So just one step, that
might take me several.

140
00:07:10,077 --> 00:07:13,160
And so that's an advantage a computer
has and it's just one of the reasons

141
00:07:13,160 --> 00:07:17,330
why they are so much faster than
us at solving so many problems.

142
00:07:17,330 --> 00:07:22,790
But the key ingredient to laying
out the data for a computer to solve

143
00:07:22,790 --> 00:07:26,712
your problems quickly is that you need
to put your data back-to-back-to-back.

144
00:07:26,712 --> 00:07:28,420
Because a computer at
the end of the day,

145
00:07:28,420 --> 00:07:32,470
yes, stores only 0's and 1's, but
those 0's and 1's are generally

146
00:07:32,470 --> 00:07:34,630
treated in units of, say, eight--

147
00:07:34,630 --> 00:07:36,070
8 bits per byte.

148
00:07:36,070 --> 00:07:39,220
But those bytes, when
storing numbers like this,

149
00:07:39,220 --> 00:07:42,670
need those numbers to be
back-to-back-to-back and not just

150
00:07:42,670 --> 00:07:44,283
jumbled all over the place.

151
00:07:44,283 --> 00:07:46,450
Because it needs to be the
case that the computer is

152
00:07:46,450 --> 00:07:51,280
allowed to do the simplest of
arithmetic to figure out where to look.

153
00:07:51,280 --> 00:07:53,980
Even I in my head am sort of
doing a bit of math figuring out,

154
00:07:53,980 --> 00:07:55,000
well where's the middle?

155
00:07:55,000 --> 00:07:57,812
Even though among few doors you
can pretty much eyeball it quickly.

156
00:07:57,812 --> 00:08:00,520
But a computer's going to have to
do a bit of arithmetic, so what

157
00:08:00,520 --> 00:08:01,600
is that math?

158
00:08:01,600 --> 00:08:05,680
Well if I have 1, 2, 3, 4,
5, 6, 7 doors initially,

159
00:08:05,680 --> 00:08:08,950
and I want to find the middle one,
I'm actually just going to do what?

160
00:08:08,950 --> 00:08:12,520
7 divided by 2, which
gives me 3 and 1/2-- that's

161
00:08:12,520 --> 00:08:14,920
not an integer that's that
useful for counting doors,

162
00:08:14,920 --> 00:08:17,330
so let's just round it down to 3.

163
00:08:17,330 --> 00:08:22,090
So 7 divided by 2 is 3.5
rounded down to 3 suggests

164
00:08:22,090 --> 00:08:26,920
mathematically that the number of the
door that's in the middle of my doors

165
00:08:26,920 --> 00:08:29,088
should be that known as 3.

166
00:08:29,088 --> 00:08:30,880
Now recall that a
computer generally starts

167
00:08:30,880 --> 00:08:34,690
counting at 0 because 0
bits represent 0 in decimal,

168
00:08:34,690 --> 00:08:38,440
and so this is door 0, 1, 2, 3, 4, 5, 6.

169
00:08:38,440 --> 00:08:42,340
So there's still seven doors, but the
first is 0 and the last is called 6.

170
00:08:42,340 --> 00:08:46,458
So if I'm looking for
number 3, that's 0, 1, 2, 3.

171
00:08:46,458 --> 00:08:49,000
And indeed, that's why I jumped
to the middle of these doors,

172
00:08:49,000 --> 00:08:52,960
because I went very
specifically to location 3.

173
00:08:52,960 --> 00:08:56,200
Now why did I jump to 42 next?

174
00:08:56,200 --> 00:08:59,020
Of course, that was in the middle
of the three remaining doors,

175
00:08:59,020 --> 00:09:01,750
but how would a computer know
mathematically where to go,

176
00:09:01,750 --> 00:09:04,480
whereas we can just
rather eyeball it here?

177
00:09:04,480 --> 00:09:09,430
Well if you've got 3 doors divided
by 2, that gives me, of course, 1.5--

178
00:09:09,430 --> 00:09:11,230
let's round that down to 1.

179
00:09:11,230 --> 00:09:15,040
So if we now re-number
these doors, it's 0, 1, 2,

180
00:09:15,040 --> 00:09:20,220
because these are the only three doors
that exist, well door 1 is 0, 1--

181
00:09:20,220 --> 00:09:25,018
the 42, and that's how a computer
would know to jump right to 42.
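
That arithmetic can be sketched roughly as follows, assuming Python's integer division `//` as the "divide and round down" step. The helper `middle_index` and the variable names are hypothetical.

```python
def middle_index(count):
    """Return the 0-based middle position among `count` doors, rounding down."""
    return count // 2


first = middle_index(7)          # 7 / 2 = 3.5, rounded down to 3
remaining_start = first + 1      # the three doors to the right start at door 4
second_local = middle_index(3)   # 3 / 2 = 1.5, rounded down to 1
second = remaining_start + second_local  # door 1 of the remaining three is door 5 overall
print(first, second_local, second)  # 3 1 5
```

Renumbering the remaining doors as 0, 1, 2 and adding back the starting offset is the same calculation seen from two sides.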

182
00:09:25,018 --> 00:09:27,310
Of course, with just one door
left, it's pretty simple.

183
00:09:27,310 --> 00:09:29,980
You needn't even do any of
that math if there's just one,

184
00:09:29,980 --> 00:09:33,820
and so we can immediately
access that in constant time.

185
00:09:33,820 --> 00:09:37,360
In other words, even though my human
feet are taking a bit of energy

186
00:09:37,360 --> 00:09:41,710
to get from one door to another, a
computer has the leg-up, so to speak,

187
00:09:41,710 --> 00:09:45,250
of getting to these doors even
quicker, because all it has to do

188
00:09:45,250 --> 00:09:47,770
is a little bit of division,
maybe some rounding,

189
00:09:47,770 --> 00:09:51,400
and then jump exactly to
that position in memory.

190
00:09:51,400 --> 00:09:56,050
And that is what we call constant
time, but it presupposes, again,

191
00:09:56,050 --> 00:09:59,950
that the data is laid out
back-to-back-to-back so that every one

192
00:09:59,950 --> 00:10:04,255
of these numbers is an equal
distance away from every other.

193
00:10:04,255 --> 00:10:06,130
Because otherwise if
you were to do this math

194
00:10:06,130 --> 00:10:08,278
and come up with the
numbers 3 or 1, you

195
00:10:08,278 --> 00:10:10,570
have to be able to know where
you're jumping in memory,

196
00:10:10,570 --> 00:10:15,880
because that number 42 can't be down
here, it has to be numerically in order

197
00:10:15,880 --> 00:10:18,580
exactly where you expect.

198
00:10:18,580 --> 00:10:23,500
And so in computer science and in
programming is this kind of arrangement

199
00:10:23,500 --> 00:10:27,970
where you have doors or really
data back-to-back-to-back known

200
00:10:27,970 --> 00:10:29,920
as what's called an array.

201
00:10:29,920 --> 00:10:34,900
An array is a contiguous block of
memory wherein values are stored

202
00:10:34,900 --> 00:10:39,280
back-to-back-to-back-to-back--
from left to right conceptually,

203
00:10:39,280 --> 00:10:42,700
although of course, direction has
less meaning once you're inside

204
00:10:42,700 --> 00:10:44,720
of a computer.
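
As a rough illustration of why back-to-back storage allows that jump, here is a sketch that packs the numbers into one contiguous block of bytes with Python's standard `struct` module. The 4-byte little-endian layout, the helper names `pack_doors` and `read_door`, and the value 8 are all assumptions for illustration, not how any particular computer must lay out its memory.

```python
import struct

ITEM_SIZE = struct.calcsize("<i")  # 4 bytes per integer in this assumed layout


def pack_doors(values):
    """Store the integers back-to-back in one contiguous bytes object."""
    return struct.pack(f"<{len(values)}i", *values)


def read_door(block, index):
    """Jump straight to element `index` using only multiplication."""
    offset = index * ITEM_SIZE  # every element is the same distance apart
    (value,) = struct.unpack_from("<i", block, offset)
    return value


block = pack_doors([4, 8, 15, 16, 23, 42, 50])
print(len(block), read_door(block, 3))  # 28 16
```

Because each number occupies the same number of bytes, finding door 3 takes one multiplication, no matter how many doors there are.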

205
00:10:44,720 --> 00:10:47,800
Now it is thanks to these arrays
that we were able to search,

206
00:10:47,800 --> 00:10:49,690
even something like a phone book so quickly.

207
00:10:49,690 --> 00:10:52,000
After all, you can imagine
in the physical world,

208
00:10:52,000 --> 00:10:54,850
a phone book isn't all
that unlike an array,

209
00:10:54,850 --> 00:10:58,480
albeit a more arcane version
here, because its pages are indeed

210
00:10:58,480 --> 00:11:02,120
back-to-back-to-back-to-back from
left to right, which is wonderful.

211
00:11:02,120 --> 00:11:04,120
And you'll recall when
we searched a phone book,

212
00:11:04,120 --> 00:11:07,600
we were already able to describe
the efficiency via which

213
00:11:07,600 --> 00:11:10,330
we were able to search it-- via
each of those three algorithms.

214
00:11:10,330 --> 00:11:13,870
One page at a time, two pages
at a time, and then one-half

215
00:11:13,870 --> 00:11:15,610
of the remaining problem at a time.

216
00:11:15,610 --> 00:11:18,670
Well it turns out that there's
a direct connection even

217
00:11:18,670 --> 00:11:20,860
to the simplification
of that same problem.

218
00:11:20,860 --> 00:11:24,640
If I have n doors and I search them
from left to right, that of course

219
00:11:24,640 --> 00:11:28,720
might take me as many as six, seven total
steps or n if the number I'm seeking

220
00:11:28,720 --> 00:11:30,160
is all the way at the end.

221
00:11:30,160 --> 00:11:33,100
I could have gone two doors at
a time, although that really

222
00:11:33,100 --> 00:11:36,130
would have gone off the rails
with the randomly-sorted numbers,

223
00:11:36,130 --> 00:11:39,970
because there would have been no logic
to just going left to right twice as

224
00:11:39,970 --> 00:11:43,750
fast because I would be missing
every other element never knowing

225
00:11:43,750 --> 00:11:45,070
when to go back.

226
00:11:45,070 --> 00:11:47,800
And in the case of binary
search, my last algorithm where

227
00:11:47,800 --> 00:11:50,050
I started in the middle
and found 16, and then

228
00:11:50,050 --> 00:11:52,540
started in the middle of that
middle and found 42, and then

229
00:11:52,540 --> 00:11:55,720
started in the middle of the
middle and found my last number,

230
00:11:55,720 --> 00:12:00,010
binary search is quite akin to what
we did by tearing that problem in half

231
00:12:00,010 --> 00:12:00,880
and in half.

232
00:12:00,880 --> 00:12:04,150
So how did we describe the efficiency
of that algorithm last time?

233
00:12:04,150 --> 00:12:08,740
Well we proposed that my first algorithm
was linear, this straight line in red

234
00:12:08,740 --> 00:12:12,340
represented here by the label n, because
for every page in the phone book,

235
00:12:12,340 --> 00:12:15,550
in the worst case you might need
one extra step to find someone

236
00:12:15,550 --> 00:12:16,313
like Mike Smith.

237
00:12:16,313 --> 00:12:19,480
And indeed, in the case of these doors,
if there's just one more door added,

238
00:12:19,480 --> 00:12:23,920
you might need one more step to
find that number 50 or any other.

239
00:12:23,920 --> 00:12:26,200
Now I could, once
those doors are sorted,

240
00:12:26,200 --> 00:12:29,270
go through them twice as fast,
looking two doors at a time,

241
00:12:29,270 --> 00:12:34,630
and if I go too far and find, say, 51, I
could double-back and fix that mistake.

242
00:12:34,630 --> 00:12:36,672
But what I ultimately did
was divide and conquer.

243
00:12:36,672 --> 00:12:39,088
Starting in the middle, and
then the middle of the middle,

244
00:12:39,088 --> 00:12:41,020
and the middle of the
middle of the middle,

245
00:12:41,020 --> 00:12:43,420
and that's what gives
me this performance.

246
00:12:43,420 --> 00:12:46,600
This so-called logarithmic
time-- log base 2 of n--

247
00:12:46,600 --> 00:12:50,740
which if nothing else means that we
have a different shape fundamentally

248
00:12:50,740 --> 00:12:52,570
to the performance of this algorithm.

249
00:12:52,570 --> 00:12:57,430
It grows so much more slowly in time
even as the problem gets really big.

250
00:12:57,430 --> 00:13:01,600
And even off the screen here,
imagine that even as n gets huge,

251
00:13:01,600 --> 00:13:04,270
that green line would not
seem to be going very high

252
00:13:04,270 --> 00:13:07,150
even as the red and yellow ones do.
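
To put rough numbers on those three shapes, here is a small sketch assuming Python. The function name `steps_by_strategy` and its keys are hypothetical, and the counts are approximations of worst-case steps rather than exact figures.

```python
import math


def steps_by_strategy(n):
    """Return approximate worst-case step counts for a problem of size n."""
    return {
        "linear": n,               # one at a time
        "two_at_a_time": n / 2,    # twice as fast, but still a straight line
        "halving": math.log2(n),   # divide and conquer
    }


for n in (8, 1024, 1_048_576):
    print(n, steps_by_strategy(n))
```

For about a million items, the first two strategies need on the order of a million and half a million steps, while halving needs about 20.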

253
00:13:07,150 --> 00:13:08,950
So in computer science,
there are actually

254
00:13:08,950 --> 00:13:14,110
formal labels we can apply to this sort
of methodology of analyzing algorithms.

255
00:13:14,110 --> 00:13:18,280
When you talk about upper bounds, on
just how much time an algorithm takes,

256
00:13:18,280 --> 00:13:21,490
you might say this--
big O, quite literally.

257
00:13:21,490 --> 00:13:24,940
That an algorithm is in
big O of some formula.

258
00:13:24,940 --> 00:13:27,760
For instance, among the formulas
it might be are these here--

259
00:13:27,760 --> 00:13:32,170
n squared, or n log n,
or n, or log n, or 1.

260
00:13:32,170 --> 00:13:34,570
Which is to say you can
represent somewhat simply

261
00:13:34,570 --> 00:13:38,860
mathematically using n-- or really
any other placeholder-- as your value,

262
00:13:38,860 --> 00:13:44,110
a variable that represents the
size of the problem in question.

263
00:13:44,110 --> 00:13:47,200
So for instance, in the
case of linear search,

264
00:13:47,200 --> 00:13:49,230
when I'm searching that
phone book left to right

265
00:13:49,230 --> 00:13:51,970
or searching these doors left
to right, in the worst case,

266
00:13:51,970 --> 00:13:55,990
it might take me as many as n
steps to find Mike or that 50,

267
00:13:55,990 --> 00:13:58,990
and so we would say that
that linear algorithm is

268
00:13:58,990 --> 00:14:03,850
in big O of n, which is just a fancier
way of saying quite simply that it's

269
00:14:03,850 --> 00:14:06,070
indeed linear in time.

270
00:14:06,070 --> 00:14:09,340
But sometimes I might get lucky,
and indeed in the best case,

271
00:14:09,340 --> 00:14:12,040
I might find Mike or 50 or
anything else much faster,

272
00:14:12,040 --> 00:14:15,430
and computer scientists also have
ways of expressing lower bounds

273
00:14:15,430 --> 00:14:17,170
on the running times of algorithms.

274
00:14:17,170 --> 00:14:19,630
Whereby in the best case,
perhaps, an algorithm

275
00:14:19,630 --> 00:14:24,010
might take only this much time
and at least this much time.

276
00:14:24,010 --> 00:14:28,450
And we use a capitalized omega to
express that notion of a lower bound,

277
00:14:28,450 --> 00:14:32,410
whereas again, a big O represents
an upper bound on the same.

278
00:14:32,410 --> 00:14:35,650
So we can use these same formulas,
because depending on the algorithm,

279
00:14:35,650 --> 00:14:40,000
it might indeed take n squared steps
or just 1 or a constant number thereof,

280
00:14:40,000 --> 00:14:43,360
but we can consider even linear
search to have a lower bound,

281
00:14:43,360 --> 00:14:47,020
because in the best case,
maybe Mike or maybe 50

282
00:14:47,020 --> 00:14:49,300
or any other inputs
of the problem just so

283
00:14:49,300 --> 00:14:53,200
happens to be at the very beginning
of that book or those doors.

284
00:14:53,200 --> 00:14:56,440
And so in the best case, a lower bound
on the running time of linear search

285
00:14:56,440 --> 00:14:59,170
might indeed be omega of
1 because you might just

286
00:14:59,170 --> 00:15:01,810
get lucky and take one
step or two or three

287
00:15:01,810 --> 00:15:05,590
or terribly few, but
independent of the number n.

288
00:15:05,590 --> 00:15:09,310
And so there, we might express
this lower bound as well.

289
00:15:09,310 --> 00:15:11,830
Now meanwhile there's
one more Greek symbol

290
00:15:11,830 --> 00:15:17,170
here, theta, capitalized here, which
represents a coincidence of upper

291
00:15:17,170 --> 00:15:18,100
and lower bounds.

292
00:15:18,100 --> 00:15:20,980
Whereby if it happens to be
the case for some algorithm

293
00:15:20,980 --> 00:15:24,250
that you have an upper bound and
a lower bound that are the same,

294
00:15:24,250 --> 00:15:27,940
you can equivalently say not both of
those statements, but quite simply

295
00:15:27,940 --> 00:15:30,755
that the algorithm is in
theta of some formula.
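
For reference, these three labels are usually defined as follows. The symbols f(n), for an algorithm's running time, and g(n), for the formula it is compared against, are not from the lecture and are introduced here only for illustration.

```latex
\begin{align*}
f(n) \in O(g(n)) &\iff \exists\, c > 0,\ n_0 : \forall n \ge n_0,\ f(n) \le c \cdot g(n) \\
f(n) \in \Omega(g(n)) &\iff \exists\, c > 0,\ n_0 : \forall n \ge n_0,\ f(n) \ge c \cdot g(n) \\
f(n) \in \Theta(g(n)) &\iff f(n) \in O(g(n)) \ \text{and}\ f(n) \in \Omega(g(n))
\end{align*}
```

In words: big O caps the growth from above, omega from below, and theta applies when the same formula does both.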

296
00:15:30,755 --> 00:15:33,460


297
00:15:33,460 --> 00:15:36,460
Now suffice it to say,
this green line is good.

298
00:15:36,460 --> 00:15:40,630
Indeed, any time we achieve logarithmic
time instead of, say, linear time, we

299
00:15:40,630 --> 00:15:42,370
have made an improvement.

300
00:15:42,370 --> 00:15:44,180
But what did we presuppose?

301
00:15:44,180 --> 00:15:46,510
Well, we presupposed in both
the case of the phone book

302
00:15:46,510 --> 00:15:50,162
and in the case of those doors that
they were sorted in advance for us.

303
00:15:50,162 --> 00:15:52,120
By me in the case of the
doors and by the phone

304
00:15:52,120 --> 00:15:54,370
company in the case of the book.

305
00:15:54,370 --> 00:15:56,560
But what did it cost
me and what did it cost

306
00:15:56,560 --> 00:15:58,960
them to sort all of
those numbers and names

307
00:15:58,960 --> 00:16:03,220
just to enable us ultimately
to search logarithmically?

308
00:16:03,220 --> 00:16:06,640
Well let's consider that in the context
of, again, some numbers, this time

309
00:16:06,640 --> 00:16:09,130
some numbers that I
myself can move around.

310
00:16:09,130 --> 00:16:13,270
Here we have eight cups, and on these
eight cups are eight numbers from 1

311
00:16:13,270 --> 00:16:14,080
through 8.

312
00:16:14,080 --> 00:16:17,500
And they're indeed sorted from smallest
to largest, though I could equivalently

313
00:16:17,500 --> 00:16:20,980
do this problem from largest
to smallest so long as we all

314
00:16:20,980 --> 00:16:22,720
agree what the goal is.

315
00:16:22,720 --> 00:16:25,660
Well let me go ahead and just
randomly shuffle some of these cups

316
00:16:25,660 --> 00:16:28,850
so that not everything
is in order anymore,

317
00:16:28,850 --> 00:16:32,270
and indeed now they're fairly jumbled,
and indeed not in the order I want,

318
00:16:32,270 --> 00:16:34,420
so some work needs to be done.

319
00:16:34,420 --> 00:16:36,610
Now why might they arrive in this order?

320
00:16:36,610 --> 00:16:39,100
Well in the case of the phone
book, certainly new people

321
00:16:39,100 --> 00:16:42,580
are moving into a town every day, and
so they're coming in not themselves

322
00:16:42,580 --> 00:16:44,980
in alphabetical order,
but seemingly random,

323
00:16:44,980 --> 00:16:46,900
and it's up to the phone
company to slot them

324
00:16:46,900 --> 00:16:50,800
into the right place in a phone book
for the sake of next year's print.

325
00:16:50,800 --> 00:16:52,300
And the same thing with those doors.

326
00:16:52,300 --> 00:16:54,700
Were I to add more and more
numbers behind those doors,

327
00:16:54,700 --> 00:16:57,580
I'd need to decide where to put
them, and they're not necessarily

328
00:16:57,580 --> 00:17:01,870
going to arrive from my input
source in the order I want.

329
00:17:01,870 --> 00:17:04,780
So here, then, I have some
randomly-ordered data,

330
00:17:04,780 --> 00:17:07,690
how do I go about sorting it quickly?

331
00:17:07,690 --> 00:17:10,170
Well, let's take a look at
the first problem I see.

332
00:17:10,170 --> 00:17:13,990
2 and 1 are out of order, so let me
just go ahead and swap, so to speak,

333
00:17:13,990 --> 00:17:14,829
those two.

334
00:17:14,829 --> 00:17:16,930
I've now improved
the state of my cups,

335
00:17:16,930 --> 00:17:19,720
and I've made some
progress. Still, 2 and 6

336
00:17:19,720 --> 00:17:23,290
seem OK even though maybe there
should be some cups in between.

337
00:17:23,290 --> 00:17:24,910
So let's look at the next pair now.

338
00:17:24,910 --> 00:17:29,260
We have 6 and 5, which definitely are
out of order, so let's switch those.

339
00:17:29,260 --> 00:17:31,480
6 and 4 are the same, out of order.

340
00:17:31,480 --> 00:17:33,680
6 and 3, just as much.

341
00:17:33,680 --> 00:17:37,820
6 and 8 are not quite
back-to-back, but there's probably

342
00:17:37,820 --> 00:17:41,260
going to be a number in-between, but
they are at least in the right order,

343
00:17:41,260 --> 00:17:43,210
because 6, of course, is less than 8.

344
00:17:43,210 --> 00:17:45,310
And then lastly we have 8 and 7.

345
00:17:45,310 --> 00:17:47,740
Let's swap those here and done--

346
00:17:47,740 --> 00:17:49,120
or are we not?

347
00:17:49,120 --> 00:17:53,680
Well I've made improvements with every
such swap, but some of these cups

348
00:17:53,680 --> 00:17:55,240
still remain out of order.

349
00:17:55,240 --> 00:17:56,410
Now these two are all set.

350
00:17:56,410 --> 00:17:58,780
2 and 5 are as well,
even though ultimately we

351
00:17:58,780 --> 00:18:02,390
might need some numbers between them,
but 4 and 5 are indeed out of order.

352
00:18:02,390 --> 00:18:04,560
3 and 5 just as much.

353
00:18:04,560 --> 00:18:08,930
6 and 5 are OK, 7 and 6 are
OK, and 8 and 7 as well.

354
00:18:08,930 --> 00:18:11,750
So we're almost done there,
but I do see some glitches.

355
00:18:11,750 --> 00:18:14,200
So let's again compare all
of these cups pairwise--

356
00:18:14,200 --> 00:18:17,895
1, 2; 2, 4-- oops, 4,
3, let's swap that.

357
00:18:17,895 --> 00:18:19,270
Let's keep going just to be safe.

358
00:18:19,270 --> 00:18:23,050
4, 5; 5, 6; 6, 7; 7, 8.

359
00:18:23,050 --> 00:18:27,430
And by way of this process, just
comparing cups back-to-back,

360
00:18:27,430 --> 00:18:29,740
we can fix any mistakes we see.

361
00:18:29,740 --> 00:18:31,870
Just for good measure,
let me do this once more.

362
00:18:31,870 --> 00:18:36,640
1, 2; 2, 3; 3, 4; 4,
5; 5, 6; 6, 7; 7, 8.

363
00:18:36,640 --> 00:18:40,000
Now this time that I've gone all
the way from left to right checking

364
00:18:40,000 --> 00:18:44,350
that every cup is in order, I can safely
conclude that these cups are sorted.

365
00:18:44,350 --> 00:18:47,320
After all, if I just went from
left to right and did no work,

366
00:18:47,320 --> 00:18:50,800
why would I presume that if I
do that same algorithm again,

367
00:18:50,800 --> 00:18:51,790
I'd make any changes?

368
00:18:51,790 --> 00:18:55,450
I wouldn't, so I can quit at this point.

369
00:18:55,450 --> 00:18:59,260
So that's all fine and good, but perhaps
we could have sorted these differently.

370
00:18:59,260 --> 00:19:02,650
That felt a little tedious and I
felt like I was doing a lot of work.

371
00:19:02,650 --> 00:19:06,550
What if I just try to select
the cups I want rather than deal

372
00:19:06,550 --> 00:19:08,120
with two cups at a time?

373
00:19:08,120 --> 00:19:11,570
Let's go ahead and randomly shuffle
these again in any old order,

374
00:19:11,570 --> 00:19:17,620
making sure to perturb what
was otherwise left to right.

375
00:19:17,620 --> 00:19:20,137
And here we have now another
random assortment of cups.

376
00:19:20,137 --> 00:19:21,970
But you know what I'm
going to do this time?

377
00:19:21,970 --> 00:19:25,000
I'm just going to select
the smallest I see.

378
00:19:25,000 --> 00:19:28,043
2 is already pretty small, so
I'll start as before on the left.

379
00:19:28,043 --> 00:19:31,210
So let's now check the other cups to
see if there's something smaller that I

380
00:19:31,210 --> 00:19:33,550
might prefer to be in this location.

381
00:19:33,550 --> 00:19:36,760
3, 1-- ooh, 1 is better, I'm going
to make mental note of this one.

382
00:19:36,760 --> 00:19:42,788
5, 8, 7, 6, 4-- all right, so 1
would seem to be the smallest number.

383
00:19:42,788 --> 00:19:45,080
So I'm going to go ahead and
put this where it belongs,

384
00:19:45,080 --> 00:19:47,020
which is right here at the side.

385
00:19:47,020 --> 00:19:49,300
There's really no room
for it, but you know what?

386
00:19:49,300 --> 00:19:51,300
These were randomly-sorted,
let me just go ahead

387
00:19:51,300 --> 00:19:54,552
and evict whatever's there,
too, and put 1 in its place.

388
00:19:54,552 --> 00:19:57,010
Now to be fair, I might have
messed things up a little bit,

389
00:19:57,010 --> 00:20:01,420
but no more so than I might have when
I received these numbers randomly.

390
00:20:01,420 --> 00:20:03,640
In fact, I might even get
lucky-- by evicting a cup,

391
00:20:03,640 --> 00:20:08,960
I might end up putting it in the right
place so it all washes out in the end.

392
00:20:08,960 --> 00:20:11,380
Now let's go ahead and select
the next smallest number,

393
00:20:11,380 --> 00:20:14,140
but not bother looking at
that first one anymore.

394
00:20:14,140 --> 00:20:16,730
So 3 is pretty small, so
I'll keep that in mind.

395
00:20:16,730 --> 00:20:19,990
2 is even smaller, so I'll forget
about 3 and now remember 2.

396
00:20:19,990 --> 00:20:22,780
5 is bigger, 8 and 7 and 6 and 4--

397
00:20:22,780 --> 00:20:26,710
all right, 2 now seems to be the
next smallest number I can select.

398
00:20:26,710 --> 00:20:29,700
I know it belongs there, but 3's
already there, so let's evict 3

399
00:20:29,700 --> 00:20:31,510
and there you go, I got lucky.

400
00:20:31,510 --> 00:20:33,877
Now I have 1 and 2 in the right place.

401
00:20:33,877 --> 00:20:35,710
Let's again select the
next smallest number.

402
00:20:35,710 --> 00:20:38,920
I see 3 here, and again, I don't
necessarily know as a computer

403
00:20:38,920 --> 00:20:41,200
if I'm only looking at
one number at a time

404
00:20:41,200 --> 00:20:44,980
if there is, in fact,
anything smaller to its side.

405
00:20:44,980 --> 00:20:48,340
So let's check-- 5, 8, 7, 6, 4-- nope.

406
00:20:48,340 --> 00:20:52,492
So 3 I shall select, and I got
lucky, I'll leave it alone.

407
00:20:52,492 --> 00:20:53,950
How about the next smallest number?

408
00:20:53,950 --> 00:20:58,030
5 is pretty small, but 8,
7, 6, 4 is even smaller.

409
00:20:58,030 --> 00:21:01,900
Let's select this one, put it
in its place, evicting the 5

410
00:21:01,900 --> 00:21:03,650
and putting it where there's room.

411
00:21:03,650 --> 00:21:05,920
8 is not that small,
but it's all I know now.

412
00:21:05,920 --> 00:21:08,020
But ooh-- 7 is smaller,
I'll remember this.

413
00:21:08,020 --> 00:21:10,655
6 is even smaller, I'll
remember that, and it feels

414
00:21:10,655 --> 00:21:12,280
like I'm creating some work for myself.

415
00:21:12,280 --> 00:21:15,010
5 is the next smallest, 8's in the way.

416
00:21:15,010 --> 00:21:18,070
We'll evict 8 and put 5 right there.

417
00:21:18,070 --> 00:21:22,150
7 is pretty small, but 6 is even
smaller, but still smaller than 8,

418
00:21:22,150 --> 00:21:27,290
so let's pick up 6, evict
7, and put 7 in its place.

419
00:21:27,290 --> 00:21:30,250
Now for good measure, we're
obviously done, but I as the computer

420
00:21:30,250 --> 00:21:33,458
don't know that yet if I'm just looking
at one of these cups or, if you will,

421
00:21:33,458 --> 00:21:34,510
doors at a time.

422
00:21:34,510 --> 00:21:38,740
7's pretty small, 8 is no
smaller, so 7 I've selected

423
00:21:38,740 --> 00:21:40,750
to stay right there in its place.

424
00:21:40,750 --> 00:21:45,520
8 as well, by that same logic,
is now in its right place.

425
00:21:45,520 --> 00:21:48,430
So it turns out that
these two algorithms

426
00:21:48,430 --> 00:21:52,178
that I concocted along the way
actually do have some formal semantics.

427
00:21:52,178 --> 00:21:54,220
In fact, in computer
science, we'd call the first

428
00:21:54,220 --> 00:21:57,310
of those algorithms that
thing here, bubble sort.

429
00:21:57,310 --> 00:22:01,990
Because in fact, as you compare two cups
side-by-side and swap them on occasion

430
00:22:01,990 --> 00:22:05,980
in order to fix transpositions,
well, your largest numbers

431
00:22:05,980 --> 00:22:08,590
would seem to be bubbling
their way up to the top,

432
00:22:08,590 --> 00:22:13,210
or equivalently, the smallest ones
down to the end, and so bubble sort

433
00:22:13,210 --> 00:22:15,130
is the formal name for that algorithm.

434
00:22:15,130 --> 00:22:18,850
How might I express this more
succinctly than my voiceover there?

435
00:22:18,850 --> 00:22:20,380
Well let me propose this pseudocode.

436
00:22:20,380 --> 00:22:23,140
There's no one way to describe
this or any algorithm,

437
00:22:23,140 --> 00:22:26,710
but this was as few English words
as I could come up with and still

438
00:22:26,710 --> 00:22:28,390
be pretty precise.

439
00:22:28,390 --> 00:22:34,720
So repeat until no swaps the
following-- for i from 0 to n minus 2,

440
00:22:34,720 --> 00:22:39,310
if the i-th and i-th plus 1 elements
are out of order, swap them.

441
00:22:39,310 --> 00:22:40,570
Now why this lingo?

442
00:22:40,570 --> 00:22:43,390
Well computational thinking is
all about expressing yourself

443
00:22:43,390 --> 00:22:46,210
very methodically, very
clearly, and ultimately

444
00:22:46,210 --> 00:22:50,060
defining, say, some variables or terms
that you'll need in your arguments.

445
00:22:50,060 --> 00:22:52,390
And so here what I've done
is adopt a convention.

446
00:22:52,390 --> 00:22:54,550
I'm using i to represent an integer--

447
00:22:54,550 --> 00:22:55,840
some sort of counter--

448
00:22:55,840 --> 00:23:01,240
to represent the index of each
of my cups or doors or pages.

449
00:23:01,240 --> 00:23:03,160
And here, we are adopting
the convention, too,

450
00:23:03,160 --> 00:23:04,870
of starting to count from 0.

451
00:23:04,870 --> 00:23:08,030
And so if I want to start
looking at the first cup, a.k.a.

452
00:23:08,030 --> 00:23:13,330
0, I want to keep looking up,
up to the cup called n minus 2,

453
00:23:13,330 --> 00:23:22,210
because if my first cup is cup 0,
and this is then 1, 2, 3, 4, 5, 6, 7,

454
00:23:22,210 --> 00:23:25,900
indeed the cup is labeled
8, but it's in position 7.

455
00:23:25,900 --> 00:23:30,355
And so this position more generally, if
there are n cups, would be n minus 1.

456
00:23:30,355 --> 00:23:36,400
So bubble sort is telling me to start
at 0 and then look up to n minus 2,

457
00:23:36,400 --> 00:23:38,500
because in the next line
of code, I'm supposed

458
00:23:38,500 --> 00:23:43,450
to compare the i-th elements and
the i-th plus 1, so to speak.

459
00:23:43,450 --> 00:23:45,610
So I don't want to look
all the way to the end,

460
00:23:45,610 --> 00:23:49,210
I want to look one shy to the end,
because I know in looking at pairs,

461
00:23:49,210 --> 00:23:53,680
I'm looking at this one as well
as the one to its right, a.k.a.

462
00:23:53,680 --> 00:23:54,948
i plus 1.

463
00:23:54,948 --> 00:23:56,740
So the algorithm
ultimately is just saying,

464
00:23:56,740 --> 00:24:01,360
as you repeat that process again and
again until there are no swaps, just

465
00:24:01,360 --> 00:24:07,150
as I proposed, you're swapping any two
cups that with respect to each other

466
00:24:07,150 --> 00:24:08,440
are out of order.
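That pseudocode can be sketched in Python as follows. This is only a minimal sketch, assuming the cups are a list of integers; the names bubble_sort, cups, and swapped are hypothetical and do not come from the lecture.

```python
def bubble_sort(cups):
    """Sort a list in place by repeatedly swapping adjacent out-of-order pairs."""
    n = len(cups)
    swapped = True
    while swapped:                       # repeat until no swaps
        swapped = False
        for i in range(n - 1):           # i runs from 0 to n - 2
            if cups[i] > cups[i + 1]:    # i-th and (i+1)-th are out of order
                cups[i], cups[i + 1] = cups[i + 1], cups[i]
                swapped = True
    return cups


print(bubble_sort([2, 3, 1, 5, 8, 7, 6, 4]))  # [1, 2, 3, 4, 5, 6, 7, 8]
```

The loop stops at n - 2 because each step also looks at position i + 1, and the last valid position is n - 1.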

467
00:24:08,440 --> 00:24:13,030
And so this, too, is an example more
generally of solving local problems

468
00:24:13,030 --> 00:24:16,600
and achieving ultimately a
global result, if you will.

469
00:24:16,600 --> 00:24:22,030
Because with each swap of those cups,
I'm improving the quality of my data.

470
00:24:22,030 --> 00:24:26,290
And each swap in and of itself doesn't
necessarily solve the big picture,

471
00:24:26,290 --> 00:24:29,560
but together when we aggregate all
of those smaller solutions have we

472
00:24:29,560 --> 00:24:32,540
assembled the final result.

473
00:24:32,540 --> 00:24:34,780
Now what about that
second algorithm, wherein

474
00:24:34,780 --> 00:24:38,170
I started again with some random
cups, and then that time I

475
00:24:38,170 --> 00:24:42,220
selected one at a time the number
I actually wanted in place?

476
00:24:42,220 --> 00:24:43,660
I first sought out the smallest.

477
00:24:43,660 --> 00:24:46,990
I found that to be 1 and I put
it all the way there on the left.

478
00:24:46,990 --> 00:24:49,090
And I then sought out
the next smallest number,

479
00:24:49,090 --> 00:24:52,690
which after checking the remaining
cups, I determined was 2.

480
00:24:52,690 --> 00:24:55,808
And so I put 2 second in place.

481
00:24:55,808 --> 00:24:57,850
And then I repeated that
process again and again,

482
00:24:57,850 --> 00:25:02,380
not necessarily knowing in advance
from anyone what numbers I'd find.

483
00:25:02,380 --> 00:25:04,960
Because I checked each
and every remaining cup,

484
00:25:04,960 --> 00:25:10,000
I was able to conclude safely that I had
indeed found the next smallest element.

485
00:25:10,000 --> 00:25:12,010
And so that algorithm, too, has a name--

486
00:25:12,010 --> 00:25:13,120
selection sort.

487
00:25:13,120 --> 00:25:15,520
And I might describe
it with pseudocode similar

488
00:25:15,520 --> 00:25:18,940
in structure but with
different logic ultimately.

489
00:25:18,940 --> 00:25:23,270
Let me propose that we do
for i from 0 to n minus 1,

490
00:25:23,270 --> 00:25:28,300
where again, n is the number of cups,
and 0 is by convention my first cup,

491
00:25:28,300 --> 00:25:31,600
and n minus 1, therefore, is my last.

492
00:25:31,600 --> 00:25:35,370
And what I then want to do is find
the smallest element between the i-th

493
00:25:35,370 --> 00:25:37,530
element and the n-th plus--

494
00:25:37,530 --> 00:25:39,015
at n minus 1.

495
00:25:39,015 --> 00:25:42,390
That is, find the smallest element
between wherever you've begun

496
00:25:42,390 --> 00:25:44,625
and that last element, n minus 1.

497
00:25:44,625 --> 00:25:48,390
And then if-- when you've
found that smallest element,

498
00:25:48,390 --> 00:25:50,730
you swap it with the i-th element.

499
00:25:50,730 --> 00:25:53,190
And that's why I was picking
up one cup and another

500
00:25:53,190 --> 00:25:57,630
and swapping them in place-- evicting
one and putting one where it belongs.

501
00:25:57,630 --> 00:25:59,670
And you do this again
and again and again,

502
00:25:59,670 --> 00:26:01,830
because each time you're incrementing i.

503
00:26:01,830 --> 00:26:06,780
So whereas the first iteration of this
loop will start here all the way left,

504
00:26:06,780 --> 00:26:09,840
the second iteration will start
here, and the third iteration

505
00:26:09,840 --> 00:26:10,860
will start here.

506
00:26:10,860 --> 00:26:13,950
And so the amount
of problem to be solved

507
00:26:13,950 --> 00:26:20,020
is steadily decreasing until
I have 1 and then 0 cups left.
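The selection sort pseudocode just described might look like this in Python. It is a sketch under the assumption that the cups are a list of integers; selection_sort, cups, and smallest are hypothetical names chosen for illustration.

```python
def selection_sort(cups):
    """Sort a list in place by selecting the smallest remaining element each time."""
    n = len(cups)
    for i in range(n):                   # i runs from 0 to n - 1
        smallest = i                     # index of the smallest cup seen so far
        for j in range(i + 1, n):        # check every remaining cup
            if cups[j] < cups[smallest]:
                smallest = j             # make a mental note of the new smallest
        # Evict whatever sits at position i and put the smallest there.
        cups[i], cups[smallest] = cups[smallest], cups[i]
    return cups


print(selection_sort([2, 3, 1, 5, 8, 7, 6, 4]))  # [1, 2, 3, 4, 5, 6, 7, 8]
```

Each trip through the outer loop starts one position further right, so the inner loop has one fewer cup to check every time.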

508
00:26:20,020 --> 00:26:22,810
Now it certainly took some
work to sort those n cups,

509
00:26:22,810 --> 00:26:24,700
but how much work did it take?

510
00:26:24,700 --> 00:26:28,030
Well in the case of bubble sort,
what was I doing on each pass

511
00:26:28,030 --> 00:26:29,020
through these cups?

512
00:26:29,020 --> 00:26:32,140
Well I was comparing and
then potentially swapping

513
00:26:32,140 --> 00:26:36,100
each adjacent pair of cups, and then
repeating myself again and again.

514
00:26:36,100 --> 00:26:39,100
Well if we have here
n cups, how many pairs

515
00:26:39,100 --> 00:26:41,650
can you create which you
then consider swapping?

516
00:26:41,650 --> 00:26:47,461
Well if I have n cups, I could
seem to make 1, 2, 3, 4, 5, 6,

517
00:26:47,461 --> 00:26:52,750
7 out of 8 pairs at a time,
so more generally n minus 1.

518
00:26:52,750 --> 00:26:57,100
So on each pass here, it would seem
that I'm comparing n minus 1 pairs of cups.

519
00:26:57,100 --> 00:26:59,530
Now how many passes do I
need to ultimately make?

520
00:26:59,530 --> 00:27:02,170
It would seem to be roughly
n, because in the worst case,

521
00:27:02,170 --> 00:27:05,110
these cups might be
completely out of order.

522
00:27:05,110 --> 00:27:09,160
Which is to say, I might indeed
do n things n minus 1 times,

523
00:27:09,160 --> 00:27:13,990
and if you multiply that out, I'm
going to get some factor of n squared.
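One way to check that estimate is to count the work directly. The sketch below is an illustration rather than anything shown in the lecture: the function count_bubble_comparisons is a hypothetical helper that runs the repeat-until-no-swaps procedure on a copy of the input and reports how many passes and comparisons it took.

```python
def count_bubble_comparisons(cups):
    """Return (passes, comparisons) used by repeat-until-no-swaps bubble sort."""
    cups = list(cups)                    # work on a copy, leave the input alone
    n = len(cups)
    passes = 0
    comparisons = 0
    swapped = True
    while swapped:
        swapped = False
        passes += 1
        for i in range(n - 1):
            comparisons += 1             # one comparison per adjacent pair
            if cups[i] > cups[i + 1]:
                cups[i], cups[i + 1] = cups[i + 1], cups[i]
                swapped = True
    return passes, comparisons


# Worst case for 8 cups: completely reversed.
print(count_bubble_comparisons([8, 7, 6, 5, 4, 3, 2, 1]))  # (8, 56)
```

With 8 reversed cups, that is 8 passes of 7 comparisons each, or n times n minus 1, which grows like n squared.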

524
00:27:13,990 --> 00:27:18,040
But what about selection sort, wherein I
instead looked through all of the cups,

525
00:27:18,040 --> 00:27:20,680
selecting first the
smallest, and then repeating

526
00:27:20,680 --> 00:27:23,380
that process for the
next smallest still?

527
00:27:23,380 --> 00:27:25,840
Well in that case, I
started with n cups,

528
00:27:25,840 --> 00:27:28,600
and I might need to
look at all n, and then

529
00:27:28,600 --> 00:27:32,635
once I found that, I might
instead look at n minus 1.

530
00:27:32,635 --> 00:27:37,990
So there, too, I seem to be summing
something like n plus n minus 1

531
00:27:37,990 --> 00:27:40,600
plus n minus 2 and so
forth, so let's see

532
00:27:40,600 --> 00:27:43,640
if we can't now summarize this as well.

533
00:27:43,640 --> 00:27:46,900
Well let me propose more mathematically,
that, say, with selection sort,

534
00:27:46,900 --> 00:27:48,540
what we've done is this.

535
00:27:48,540 --> 00:27:53,200
In looking for that smallest cup, I
had to make n minus 1 comparisons.

536
00:27:53,200 --> 00:27:55,960
Because as I identified the
smallest cup I'd yet seen,

537
00:27:55,960 --> 00:27:59,590
I compared it to no more
than n minus 1 others.

538
00:27:59,590 --> 00:28:04,870
Now if the first selection of a cup took
me n minus 1 steps but then it's done,

539
00:28:04,870 --> 00:28:07,590
the next selection of the
next smallest cup would

540
00:28:07,590 --> 00:28:10,690
have taken me only n minus 2 steps.

541
00:28:10,690 --> 00:28:13,280
And if you continue that
logic with each pass,

542
00:28:13,280 --> 00:28:17,770
you have to do a little bit less
work until you're left with just one

543
00:28:17,770 --> 00:28:20,680
very last cup at the end, such as 8.

544
00:28:20,680 --> 00:28:23,110
So what does this actually sum to?

545
00:28:23,110 --> 00:28:25,840
Well you might not remember
or see it at first glance,

546
00:28:25,840 --> 00:28:28,960
but it turns out, particularly if
you look at one of those charts

547
00:28:28,960 --> 00:28:33,970
at the back of a textbook, does
this summation or series actually

548
00:28:33,970 --> 00:28:38,530
aggregate to n times n
minus 1, all divided by 2.

549
00:28:38,530 --> 00:28:41,560
Now this you can perhaps multiply
out a bit more readily as

550
00:28:41,560 --> 00:28:44,590
in n squared minus n all divided by 2.

551
00:28:44,590 --> 00:28:47,230
And if we factor that out,
we can now get n squared

552
00:28:47,230 --> 00:28:51,520
divided by 2 minus n divided by 2.
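Written out as a single chain, with n the number of cups and k a summation index introduced here only for illustration, the steps just described are:

```latex
(n-1) + (n-2) + \cdots + 1
  \;=\; \sum_{k=1}^{n-1} k
  \;=\; \frac{n(n-1)}{2}
  \;=\; \frac{n^2 - n}{2}
  \;=\; \frac{n^2}{2} - \frac{n}{2}
```

For the 8 cups, that is 7 + 6 + 5 + 4 + 3 + 2 + 1 = 28, which agrees with 8 times 7 divided by 2.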

553
00:28:51,520 --> 00:28:56,620
Now which of these terms, n squared
divided by 2 or n divided by 2,

554
00:28:56,620 --> 00:28:58,450
tends to dominate the other?

555
00:28:58,450 --> 00:29:01,420
That is to say, as n
gets larger and larger,

556
00:29:01,420 --> 00:29:05,500
which of these mathematical
expressions has the biggest effect

557
00:29:05,500 --> 00:29:07,060
on the number of steps?

558
00:29:07,060 --> 00:29:12,760
Well surely it's n squared, albeit
divided by 2, because as n gets large,

559
00:29:12,760 --> 00:29:15,677
n squared is certainly larger than n.

560
00:29:15,677 --> 00:29:18,010
And so what a computer scientist
here would typically do

561
00:29:18,010 --> 00:29:22,330
is just ignore those
lower-order terms, so to speak.

562
00:29:22,330 --> 00:29:26,440
And he would say with a figurative
or literal wave of the hand,

563
00:29:26,440 --> 00:29:31,030
this is on the order of
n squared this algorithm.

564
00:29:31,030 --> 00:29:33,550
That isn't to say it's
precisely that many steps,

565
00:29:33,550 --> 00:29:37,270
but rather as n gets really
large, it is pretty much

566
00:29:37,270 --> 00:29:41,140
that n squared term that
really matters the most.

567
00:29:41,140 --> 00:29:45,310
Now this is not a formal proof, but
rather a proof by example, if you will,

568
00:29:45,310 --> 00:29:49,600
but let's see if I can't convince
you with a single example numerically

569
00:29:49,600 --> 00:29:51,550
of the impact of that square.

570
00:29:51,550 --> 00:29:56,350
Well if we start again with n squared
over 2 minus n over 2 and say n

571
00:29:56,350 --> 00:30:01,660
is maybe 1 million initially-- so not
eight cups, not 1,000 pages in a book,

572
00:30:01,660 --> 00:30:06,550
but 1 million numbers or
any other element itself.

573
00:30:06,550 --> 00:30:08,280
What does this actually sum to?

574
00:30:08,280 --> 00:30:12,820
Well 1 million squared divided
by 2 minus 1 million divided by 2

575
00:30:12,820 --> 00:30:21,970
happens to be 500 billion minus 500,000,
which of course is 499,999,500,000.
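That arithmetic is easy to reproduce. A small Python sketch, assuming the n times n minus 1 over 2 count from before; the function name selection_sort_comparisons is hypothetical:

```python
def selection_sort_comparisons(n):
    """Exact value of n^2/2 - n/2, i.e. n * (n - 1) / 2, using integer math."""
    return n * (n - 1) // 2


n = 1_000_000
exact = selection_sort_comparisons(n)
half_square = n * n // 2                 # the n squared over 2 term on its own

print(exact)                             # 499999500000
print(half_square - exact)               # 500000
```

The gap between the two is only n over 2, which is tiny next to the n squared term.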

576
00:30:21,970 --> 00:30:28,090
Now I daresay that is pretty
darn close to big O of n squared.

577
00:30:28,090 --> 00:30:28,900
Why?

578
00:30:28,900 --> 00:30:31,180
Well if we started with,
say, 1 trillion then

579
00:30:31,180 --> 00:30:36,520
halved it and ended up with 499
billion, that's still pretty close.

580
00:30:36,520 --> 00:30:41,270
Now in real terms, that does not
equal the same number of steps,

581
00:30:41,270 --> 00:30:44,710
but it gives us a general sense it's
on the order of this many steps,

582
00:30:44,710 --> 00:30:47,830
because if we plugged in
larger and larger values for n,

583
00:30:47,830 --> 00:30:51,118
that difference would
not even be as extreme.

584
00:30:51,118 --> 00:30:54,160
Well why don't we take a look now at
these algorithms in a different form

585
00:30:54,160 --> 00:30:58,030
altogether without the physical
limitation of me as the computer?

586
00:30:58,030 --> 00:31:02,080
Pictured here is, if you will, an array
of numbers, but pictured graphically.

587
00:31:02,080 --> 00:31:04,450
Wherein we have vertical
bars, and the taller

588
00:31:04,450 --> 00:31:07,190
the bar, the bigger the
number it represents.

589
00:31:07,190 --> 00:31:10,300
So big bar is big number,
small bar is small number,

590
00:31:10,300 --> 00:31:12,970
but they're clearly,
therefore, unsorted.

591
00:31:12,970 --> 00:31:16,870
Via these algorithms we've
seen, bubble sort and selection sort,

592
00:31:16,870 --> 00:31:20,650
what does it actually look
like to sort so many elements?

593
00:31:20,650 --> 00:31:22,140
Let's take a look.

594
00:31:22,140 --> 00:31:25,290
In this tool where I proceed
to choose my first algorithm,

595
00:31:25,290 --> 00:31:27,630
which shall be, say, bubble sort.

596
00:31:27,630 --> 00:31:30,750
And you'll see rather slowly that
this algorithm is indeed comparing

597
00:31:30,750 --> 00:31:32,280
pairwise elements, and if--

598
00:31:32,280 --> 00:31:36,630
and only if they're out of order,
swapping them again and again.

599
00:31:36,630 --> 00:31:38,730
Now to be fair, this
quickly gets tedious,

600
00:31:38,730 --> 00:31:41,190
so let me increase the
animation speed here.

601
00:31:41,190 --> 00:31:45,630
And now you can rather see that
bubbling up of the largest.

602
00:31:45,630 --> 00:31:48,300
Previously it was my 8 and my 7 and 6.

603
00:31:48,300 --> 00:31:52,890
Here we have 99, 98, 97, but
indeed, those tallest bars

604
00:31:52,890 --> 00:31:54,610
are making their way up.

605
00:31:54,610 --> 00:31:57,720
So let's turn our attention next to
this other algorithm, selection sort,

606
00:31:57,720 --> 00:32:01,440
to see if it looks or perhaps
feels rather different.

607
00:32:01,440 --> 00:32:03,720
Here now we have
selection sort each time

608
00:32:03,720 --> 00:32:07,680
going through the entire list looking
for the smallest possible element.

609
00:32:07,680 --> 00:32:09,810
Highlighted in red for
just a moment here is

610
00:32:09,810 --> 00:32:13,530
9, because we have not
yet until-- oh, now found

611
00:32:13,530 --> 00:32:16,455
a smaller element, now 2, and now 1.

612
00:32:16,455 --> 00:32:19,080
And we'll continue looking through
the rest of the numbers just

613
00:32:19,080 --> 00:32:21,960
to be sure we don't find
something smaller, and once we do,

614
00:32:21,960 --> 00:32:23,340
1 goes into place.

615
00:32:23,340 --> 00:32:26,820
And then we repeat that process,
but we do fewer steps now,

616
00:32:26,820 --> 00:32:30,540
because whereas there are n total bars,
we don't need to look at the leftmost

617
00:32:30,540 --> 00:32:34,455
now because it's sorted, we
only need look at n minus 1.

618
00:32:34,455 --> 00:32:36,150
So this process again will repeat.

619
00:32:36,150 --> 00:32:37,140
We found 2.

620
00:32:37,140 --> 00:32:40,440
We're just double-checking that
there's not something smaller,

621
00:32:40,440 --> 00:32:42,520
and now 2 is in its place.

622
00:32:42,520 --> 00:32:44,850
Now we humans, of course,
have the advantage

623
00:32:44,850 --> 00:32:48,030
of having an aerial view, if
you will, of all this data.

624
00:32:48,030 --> 00:32:51,330
And certainly a computer
could remember more than just

625
00:32:51,330 --> 00:32:53,970
the smallest number it's recently seen.

626
00:32:53,970 --> 00:32:56,985
Why not for efficiency remember
the two smallest numbers?

627
00:32:56,985 --> 00:32:58,110
The three smallest numbers?

628
00:32:58,110 --> 00:32:59,400
The four smallest numbers?

629
00:32:59,400 --> 00:33:03,120
That's fine, but that argument
is quickly devolving into--

630
00:33:03,120 --> 00:33:05,670
just remember all the original numbers.

631
00:33:05,670 --> 00:33:08,280
And so yes, you could
perhaps save some time,

632
00:33:08,280 --> 00:33:11,520
but it sounds like you're
asking for more and more space

633
00:33:11,520 --> 00:33:14,580
with which to remember the
answers to those questions.

634
00:33:14,580 --> 00:33:17,010
Now this, too, would seem
to be taking us all day.

635
00:33:17,010 --> 00:33:20,160
Even if we down here
increase the animation speed,

636
00:33:20,160 --> 00:33:24,000
it now is selecting those
elements a bit faster and faster,

637
00:33:24,000 --> 00:33:26,650
but there's still so
much work to be done.

638
00:33:26,650 --> 00:33:29,820
Indeed, these comparison-based sorts
that are comparing things again

639
00:33:29,820 --> 00:33:34,110
and again and then redoing that work in
some form to improve the problem still

640
00:33:34,110 --> 00:33:36,630
just tend to end up on the order of--

641
00:33:36,630 --> 00:33:38,410
bingo, of n squared.

642
00:33:38,410 --> 00:33:41,220
Which is to say that n
squared or something quadratic

643
00:33:41,220 --> 00:33:42,810
tends to be rather slow.

644
00:33:42,810 --> 00:33:46,650
And this is quite in contrast
to our logarithmic time before,

645
00:33:46,650 --> 00:33:51,600
but that logarithm thus far
was for searching, not sorting.

646
00:33:51,600 --> 00:33:54,435
So let's compare these
two now side by side,

647
00:33:54,435 --> 00:33:57,060
albeit with a different tool that
presents the same information

648
00:33:57,060 --> 00:33:58,770
graphically sideways.

649
00:33:58,770 --> 00:34:01,440
Here again we have bars, and
small bar is small number,

650
00:34:01,440 --> 00:34:06,090
and big bar is big number, but here,
they've simply been rotated 90 degrees.

651
00:34:06,090 --> 00:34:09,090
On the left here we have selection
sort, on the right here bubble sort,

652
00:34:09,090 --> 00:34:12,420
both of whose bars are
randomly sorted so that neither

653
00:34:12,420 --> 00:34:14,880
has an edge necessarily over the other.

654
00:34:14,880 --> 00:34:18,150
Let's go ahead and play all
and see what happens here.

655
00:34:18,150 --> 00:34:20,340
And you'll see that
indeed, bubble's bubbling up

656
00:34:20,340 --> 00:34:23,460
and selection is improving
its selections as we go.

657
00:34:23,460 --> 00:34:27,460
Bubble would seem to have won because
selection's got a bit more work,

658
00:34:27,460 --> 00:34:30,929
but there, too, it's
pretty close to a tie.

659
00:34:30,929 --> 00:34:32,800
So can we do better?

660
00:34:32,800 --> 00:34:37,080
Well it turns out we can, so long as
we use a bit more of that intuition

661
00:34:37,080 --> 00:34:40,830
we had when we started
thinking computationally

662
00:34:40,830 --> 00:34:44,340
and we divided and conquered,
we divided and conquered.

663
00:34:44,340 --> 00:34:49,469
In other words, why not, given
n doors or n cups or n pages,

664
00:34:49,469 --> 00:34:52,927
why don't we divide and conquer
that problem again and again?

665
00:34:52,927 --> 00:34:54,719
In other words, in the
context of the cups,

666
00:34:54,719 --> 00:34:59,220
why don't I simply sort for you the
left half and then the right half,

667
00:34:59,220 --> 00:35:03,167
and then with two sorted halves, just
interweave them for you together.

668
00:35:03,167 --> 00:35:06,000
That would seem to be a little
different from walking back and forth

669
00:35:06,000 --> 00:35:08,722
and back and forth and swapping
elements again and again.

670
00:35:08,722 --> 00:35:10,680
Just do a little bit of
work here, a little bit

671
00:35:10,680 --> 00:35:14,070
more now, and then
reassemble your total work.

672
00:35:14,070 --> 00:35:16,650
Now of course, if I simply
say, I'll sort this left half,

673
00:35:16,650 --> 00:35:18,450
what does it mean to
sort this left half?

674
00:35:18,450 --> 00:35:21,210
Well, I dare say this
left half can be divided

675
00:35:21,210 --> 00:35:25,380
into a left half of the left half,
thereby making the problem smaller.

676
00:35:25,380 --> 00:35:30,120
So somehow or other, we could leverage
that intuition of binary search,

677
00:35:30,120 --> 00:35:31,800
but apply it to sort.

678
00:35:31,800 --> 00:35:35,400
It's not going to be in the end
quite as fast as binary search,

679
00:35:35,400 --> 00:35:38,490
because with sort, you have to
deal with all of the elements,

680
00:35:38,490 --> 00:35:40,590
you can't simply tear
half of the problem

681
00:35:40,590 --> 00:35:44,490
away because you'd be leaving
half of your elements unsorted.

682
00:35:44,490 --> 00:35:47,010
But it turns out there's
many algorithms that

683
00:35:47,010 --> 00:35:50,520
are faster than selection and
bubble sort, and one of those

684
00:35:50,520 --> 00:35:52,290
is called merge sort.

685
00:35:52,290 --> 00:35:55,800
And merge sort leverages
precisely this intuition of dividing

686
00:35:55,800 --> 00:35:59,640
a problem in half and in half, and to
be fair, touching all of those halves

687
00:35:59,640 --> 00:36:04,290
ultimately, but doing it in a way
that's more efficient and less

688
00:36:04,290 --> 00:36:08,280
comparison-based than bubble sort
and selection sort themselves.

689
00:36:08,280 --> 00:36:11,670
So let me go ahead and play all
now with these three sets of bars

690
00:36:11,670 --> 00:36:15,410
and see just which one wins now.

691
00:36:15,410 --> 00:36:17,580
And after just a moment,
there's nothing more

692
00:36:17,580 --> 00:36:22,500
to say-- merge sort has already won, if
you will, even though now bubble has,

693
00:36:22,500 --> 00:36:23,520
and now selection.

694
00:36:23,520 --> 00:36:25,270
And perhaps this was
a fluke-- to be fair,

695
00:36:25,270 --> 00:36:27,810
these numbers are random,
maybe merge sort got lucky.

696
00:36:27,810 --> 00:36:31,500
Let's go ahead and play the test
once more with other numbers.

697
00:36:31,500 --> 00:36:33,720
And indeed it again is done.

698
00:36:33,720 --> 00:36:36,540
Let me play it one third and
final time, but notice the pattern

699
00:36:36,540 --> 00:36:38,520
now that emerges with merge sort.

700
00:36:38,520 --> 00:36:44,930
You can see if you look closely
the actual halving again and again.

701
00:36:44,930 --> 00:36:47,360
And indeed, it seems that
half of the list gets sorted,

702
00:36:47,360 --> 00:36:49,888
and then you reassemble
it at the very end.

703
00:36:49,888 --> 00:36:51,680
And indeed, let's zoom
in on this algorithm

704
00:36:51,680 --> 00:36:54,290
now and look specifically
at merge sort alone.

705
00:36:54,290 --> 00:36:57,290
Here we have merge sort,
and highlighted in colors

706
00:36:57,290 --> 00:37:01,160
as we do work is exactly the elements
you're sorting again and again.

707
00:37:01,160 --> 00:37:03,410
The reason so few of
these bars are being

708
00:37:03,410 --> 00:37:07,880
looked at a time is because again,
logically or recursively, if you will,

709
00:37:07,880 --> 00:37:09,980
we are sorting first the left half.

710
00:37:09,980 --> 00:37:11,750
But no, the left half of the left half.

711
00:37:11,750 --> 00:37:15,020
But no, the left half of the
left half of the left half and so

712
00:37:15,020 --> 00:37:17,120
forth, and what this
really boils down to

713
00:37:17,120 --> 00:37:21,140
ultimately is sorting
eventually individual elements.

714
00:37:21,140 --> 00:37:23,990
But if I hand you one element
and I say, please sort this,

715
00:37:23,990 --> 00:37:27,920
it has no halves, so your work is
done-- you don't need do a thing.

716
00:37:27,920 --> 00:37:30,947
But then if you have two
halves, each of size 1,

717
00:37:30,947 --> 00:37:32,780
there might indeed be
work to be done there,

718
00:37:32,780 --> 00:37:35,900
because if one is smaller than the
other or one is larger than the other,

719
00:37:35,900 --> 00:37:39,410
you do need to interleave
those for me to merge them.

720
00:37:39,410 --> 00:37:41,930
And that's exactly what
merge sort's doing here.
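The idea of sorting each half and then interleaving them can be sketched in Python like so. The lecture shows no code for merge sort, so this is one common way to write it, not the visualization's actual implementation; merge, merge_sort, and the variable names are hypothetical.

```python
def merge(left, right):
    """Interleave two already-sorted lists into one new sorted list."""
    merged = []                          # extra space, separate from the inputs
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])              # at most one of these has leftovers
    merged.extend(right[j:])
    return merged


def merge_sort(cups):
    """Return a new sorted list: sort the left half, the right half, then merge."""
    if len(cups) <= 1:                   # one element has no halves, nothing to do
        return list(cups)
    mid = len(cups) // 2
    return merge(merge_sort(cups[:mid]), merge_sort(cups[mid:]))


print(merge_sort([2, 3, 1, 5, 8, 7, 6, 4]))  # [1, 2, 3, 4, 5, 6, 7, 8]
```

Note that merge builds its result in a second list rather than swapping in place, so this version trades some extra memory for less repeated work.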

721
00:37:41,930 --> 00:37:45,470
Allow me to increase the animation
speed and you'll see as we go,

722
00:37:45,470 --> 00:37:48,410
that half of the list is
getting sorted at a time.

723
00:37:48,410 --> 00:37:50,450
It's not perfect and it's
not perfectly smooth,

724
00:37:50,450 --> 00:37:53,033
because that's-- because half
of the other elements are there,

725
00:37:53,033 --> 00:37:56,600
but now we're re-merging the two halves.

726
00:37:56,600 --> 00:37:58,850
And that was fast,
but it finished faster

727
00:37:58,850 --> 00:38:02,060
indeed than would have been
for bubble and selection sort,

728
00:38:02,060 --> 00:38:05,750
but there was a price being paid.

729
00:38:05,750 --> 00:38:09,170
If you think back to our vertical
visualization of bubble sort

730
00:38:09,170 --> 00:38:13,420
and selection sort, they were
doing all of their work in place.

731
00:38:13,420 --> 00:38:17,870
Merge sort seemed to be getting a
little greedy on us, if you will,

732
00:38:17,870 --> 00:38:21,650
in that it was temporarily putting
some of those bars down here,

733
00:38:21,650 --> 00:38:26,420
effectively using twice as much space
as those first two algorithms, selection

734
00:38:26,420 --> 00:38:27,110
and bubble.

735
00:38:27,110 --> 00:38:30,350
And indeed, that's where merge
sort gets its edge fundamentally.

736
00:38:30,350 --> 00:38:34,880
It's not just a better algorithm,
per se, and better thought-out,

737
00:38:34,880 --> 00:38:40,100
but it actually additionally consumes
more resources-- not time, but space.

738
00:38:40,100 --> 00:38:43,520
By using twice as much space-- not
just the top half of the screen,

739
00:38:43,520 --> 00:38:44,510
but the bottom--

740
00:38:44,510 --> 00:38:47,930
can merge sort temporarily put
some of its work over here,

741
00:38:47,930 --> 00:38:51,710
continue doing some other work,
and then reassemble them together.

742
00:38:51,710 --> 00:38:54,412
Both selection sort and bubble
sort did not have that advantage.

743
00:38:54,412 --> 00:38:56,120
They had to do everything
in place, which

744
00:38:56,120 --> 00:38:59,540
is why we had to swap so
many things so many times.

745
00:38:59,540 --> 00:39:03,290
We had far fewer spots in
which to work on that table.

746
00:39:03,290 --> 00:39:06,110
But with merge sort,
spend a bit more space,

747
00:39:06,110 --> 00:39:10,580
and you can reduce that amount of time.
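
To make that trade-off concrete, here is a minimal sketch of merge sort in Python. It is an illustration under assumptions, not the code behind the animation: the function name `merge_sort` and the sample numbers are made up for this example. The `merged` list is the extra space the algorithm borrows while it interleaves two sorted halves.

```python
def merge_sort(values):
    """Return a new sorted list; the input list is left untouched."""
    # A list of 0 or 1 elements has no halves, so it is already sorted.
    if len(values) <= 1:
        return list(values)

    middle = len(values) // 2
    left = merge_sort(values[:middle])    # sort the left half
    right = merge_sort(values[middle:])   # sort the right half

    # Merge: interleave the two sorted halves into extra space.
    merged = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])    # whatever remains is already in order
    merged.extend(right[j:])
    return merged


print(merge_sort([42, 8, 23, 4, 16, 15]))  # [4, 8, 15, 16, 23, 42]
```

Note that nothing is ever swapped in place here; the work happens in `merged`, which is where the additional space goes.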

748
00:39:10,580 --> 00:39:15,370
Now all of these algorithms assume
that our data is back-to-back-to-back--

749
00:39:15,370 --> 00:39:17,140
that is, stored in an array.

750
00:39:17,140 --> 00:39:21,010
And that's great, because that's
exactly how a computer is so inclined

751
00:39:21,010 --> 00:39:23,170
to store data inherently.

752
00:39:23,170 --> 00:39:26,590
For instance, pictured here
is a stick of memory of RAM--

753
00:39:26,590 --> 00:39:28,190
Random Access Memory.

754
00:39:28,190 --> 00:39:33,550
And indeed, albeit a bit of a misnomer,
that R in RAM, random, actually

755
00:39:33,550 --> 00:39:37,480
means that a computer can jump
in instant or constant time

756
00:39:37,480 --> 00:39:38,920
to a specific byte.

757
00:39:38,920 --> 00:39:41,440
And that's so important
when we want to jump

758
00:39:41,440 --> 00:39:46,330
around our data, our cups, or our pages
in order to get at data instantly,

759
00:39:46,330 --> 00:39:47,440
if you will.

760
00:39:47,440 --> 00:39:52,270
And the reason it is so conducive to
laying out information back-to-back

761
00:39:52,270 --> 00:39:57,370
contiguously in memory is if we consider
one of these black chips on this DIMM--

762
00:39:57,370 --> 00:39:59,380
or Dual In-line Memory Module--

763
00:39:59,380 --> 00:40:02,200
is that we have in this black
chip really, if you will,

764
00:40:02,200 --> 00:40:04,420
an artist's rendition at hand.

765
00:40:04,420 --> 00:40:06,670
That artist's rendition
might propose that if you

766
00:40:06,670 --> 00:40:11,770
have some number of bytes in this
chip, say 1 billion for 1 gigabyte,

767
00:40:11,770 --> 00:40:14,290
it certainly stands to
reason that we humans could

768
00:40:14,290 --> 00:40:16,810
number those bytes from 0 on up--

769
00:40:16,810 --> 00:40:19,810
from 0 to 1 billion, roughly speaking.

770
00:40:19,810 --> 00:40:23,110
And so the top left one here might
be 0, the next one might be 1,

771
00:40:23,110 --> 00:40:25,960
the next one thereafter
should be 2, and so we can

772
00:40:25,960 --> 00:40:28,090
number each and every one of our bytes.

773
00:40:28,090 --> 00:40:33,370
And so when you store a number on
a cup or a number behind a door,

774
00:40:33,370 --> 00:40:37,420
that amounts to just writing those
numbers inside of each of these boxes.

775
00:40:37,420 --> 00:40:40,540
And each is next to the other,
and so with simple arithmetic,

776
00:40:40,540 --> 00:40:43,060
a bit of division and
rounding, might you

777
00:40:43,060 --> 00:40:46,240
be able to jump instantly to
any one of these addresses.

778
00:40:46,240 --> 00:40:49,870
There are no moving parts here to do
any work like my human feet might

779
00:40:49,870 --> 00:40:51,310
have to do in our real world.

780
00:40:51,310 --> 00:40:55,630
Rather the computer can jump
instantly to that so-called address

781
00:40:55,630 --> 00:40:59,030
or index of the array.

782
00:40:59,030 --> 00:41:01,960
Now what can we do when
we have a canvas that

783
00:41:01,960 --> 00:41:05,320
allows us to lay out memory in this way?

784
00:41:05,320 --> 00:41:07,720
We can represent any number of types.

785
00:41:07,720 --> 00:41:11,440
Indeed in Python, there are
all sorts of types of data.

786
00:41:11,440 --> 00:41:15,400
For instance, bool for a Boolean
value and float for a floating point

787
00:41:15,400 --> 00:41:17,450
value, a real number with a decimal.

788
00:41:17,450 --> 00:41:20,530
An int for an integer
and str for a string.

789
00:41:20,530 --> 00:41:23,800
Each of those is laid out in memory
in some particular way that's

790
00:41:23,800 --> 00:41:26,590
conducive to accessing it efficiently.

791
00:41:26,590 --> 00:41:29,440
But that's precisely why,
too, we've run into issues

792
00:41:29,440 --> 00:41:33,130
when using something like a float,
because if you decide a priori to use

793
00:41:33,130 --> 00:41:36,460
only so many bytes, bytes to
the left and to the right,

794
00:41:36,460 --> 00:41:41,050
above and below it might end up getting
used by other parts of your program.

795
00:41:41,050 --> 00:41:46,120
And so if you've only asked for,
say, 32 or 64 bits or 4 or 8 bytes,

796
00:41:46,120 --> 00:41:49,090
because you're then going to
be surrounded by other data,

797
00:41:49,090 --> 00:41:55,030
that floating point value or some other
can only be ultimately so precise.

798
00:41:55,030 --> 00:41:57,190
Because ultimately yes,
we're operating in bits,

799
00:41:57,190 --> 00:42:01,520
but those bits are physically
laid out in some order.
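
As a rough illustration of that limit, the sketch below uses Python's standard `struct` module to squeeze the same value into 4 bytes and into 8 bytes. The choice of 0.1 as the sample value is an assumption made for this example; any value without an exact binary representation would behave similarly.

```python
import struct

value = 0.1

# Pack the value into exactly 4 bytes (32 bits), then read it back.
four_bytes = struct.pack("<f", value)
as_32_bits = struct.unpack("<f", four_bytes)[0]

# Pack the same value into exactly 8 bytes (64 bits), then read it back.
eight_bytes = struct.pack("<d", value)
as_64_bits = struct.unpack("<d", eight_bytes)[0]

print(len(four_bytes), len(eight_bytes))  # 4 8
print(as_32_bits == value)                # False: 32 bits lost some precision
print(as_64_bits == value)                # True: Python floats are 64 bits already
```

Even the 8-byte version is only an approximation of one tenth; it just happens to be the same approximation Python started with.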

800
00:42:01,520 --> 00:42:06,910
So with that said, what are the options
via which we can paint on this canvas?

801
00:42:06,910 --> 00:42:08,860
Surely it would be
nice if we could store

802
00:42:08,860 --> 00:42:12,550
data not necessarily always
back-to-back in this way,

803
00:42:12,550 --> 00:42:15,610
but we can create more
sophisticated data structures

804
00:42:15,610 --> 00:42:20,860
so as to support not only these
types here, but also ones like these.

805
00:42:20,860 --> 00:42:25,030
Dict in Python for dictionary,
otherwise known as a hash table.

806
00:42:25,030 --> 00:42:29,770
And list for a sort of array that
can grow and shrink, and range

807
00:42:29,770 --> 00:42:31,330
for a range of values.

808
00:42:31,330 --> 00:42:33,820
Set for a collection
of values that contain

809
00:42:33,820 --> 00:42:39,160
no duplicates, and tuples, something
like x, y or latitude, longitude.

810
00:42:39,160 --> 00:42:42,640
These concepts-- surely
it would be nice to have

811
00:42:42,640 --> 00:42:46,750
accessible to us in higher
level contexts like Python,

812
00:42:46,750 --> 00:42:50,530
but if at the end of the day all we
have is bytes of memory back-to-back,

813
00:42:50,530 --> 00:42:53,830
we need some layers of
abstraction on top of that memory

814
00:42:53,830 --> 00:42:57,200
so as to implement these more
sophisticated structures.

815
00:42:57,200 --> 00:42:59,275
So we'll take a look at
a few in particular-- int

816
00:42:59,275 --> 00:43:02,380
and str and dict and
list, because all of those

817
00:43:02,380 --> 00:43:08,140
somehow need to be built on top of
these lower-level principles of memory.

818
00:43:08,140 --> 00:43:11,950
So how might this work and
what problems might we solve?

819
00:43:11,950 --> 00:43:15,130
Let's now use the board as
my canvas, drawing on it

820
00:43:15,130 --> 00:43:17,710
that same grid of rows
and columns in order

821
00:43:17,710 --> 00:43:21,940
to divide this screen
into that many bytes.

822
00:43:21,940 --> 00:43:26,020
And I'll go ahead and divide this board
into these squares, each one of which

823
00:43:26,020 --> 00:43:30,640
represents an individual byte, and
each of those bytes, of course,

824
00:43:30,640 --> 00:43:32,740
has some number associated with it.

825
00:43:32,740 --> 00:43:36,790
That number is not the number inside
of that box, per se, not the bits

826
00:43:36,790 --> 00:43:40,000
that compose it, but rather
just metadata-- an index

827
00:43:40,000 --> 00:43:44,350
or address that exists implicitly,
but is not actually stored.

828
00:43:44,350 --> 00:43:47,470
This then might be index
0 or address 0, this

829
00:43:47,470 --> 00:43:52,180
might be 1, this 2, this
3, this one 4, this one 5.

830
00:43:52,180 --> 00:43:55,420
And if we, just for artist's
sake, move to the next row,

831
00:43:55,420 --> 00:43:59,000
we might call this 6 and
this 7, and so forth.

832
00:43:59,000 --> 00:44:03,190
Now suppose we want to store some
actual values in this memory,

833
00:44:03,190 --> 00:44:04,990
well let's go ahead and do just that.

834
00:44:04,990 --> 00:44:09,040
We might store the actual
number 4 here, followed by 8,

835
00:44:09,040 --> 00:44:15,390
followed by 15 and 16, perhaps
followed by 23, and then 42.

836
00:44:15,390 --> 00:44:19,268
And so we have some random
numbers inside of this memory,

837
00:44:19,268 --> 00:44:21,060
and because those
numbers are back-to-back,

838
00:44:21,060 --> 00:44:24,240
we can call this an array of size 6.

839
00:44:24,240 --> 00:44:26,820
Its first index is 0,
its last index is 5,

840
00:44:26,820 --> 00:44:30,150
and between there are six total values.

841
00:44:30,150 --> 00:44:36,630
Now what can we do if we're ready to
add a seventh number to this list?

842
00:44:36,630 --> 00:44:38,580
Well, we could certainly
put it right here

843
00:44:38,580 --> 00:44:41,490
because this is the next
appropriate location,

844
00:44:41,490 --> 00:44:44,570
but it depends whether that
spot is still available.

845
00:44:44,570 --> 00:44:46,320
Because the way a
computer typically works

846
00:44:46,320 --> 00:44:48,070
is that when you're
writing a program, you

847
00:44:48,070 --> 00:44:51,210
need to decide in advance
how much memory you want.

848
00:44:51,210 --> 00:44:54,270
And you tell the computer by
way of the operating system,

849
00:44:54,270 --> 00:44:57,420
be it Windows or macOS,
Linux, or something else,

850
00:44:57,420 --> 00:45:02,640
how many bytes of memory you would like
to allocate to your particular problem.

851
00:45:02,640 --> 00:45:04,740
And if I only had the
foresight to say, I

852
00:45:04,740 --> 00:45:07,868
would like 6 bytes in
which to store 6 numbers,

853
00:45:07,868 --> 00:45:10,410
the operating system might have
handed me that back and said,

854
00:45:10,410 --> 00:45:14,340
fine, here you go, but the
operating system thereafter

855
00:45:14,340 --> 00:45:18,720
might have proceeded to allocate
subsequent adjacent bytes, like 6

856
00:45:18,720 --> 00:45:22,962
and 7, to some other
aspect of your program.

857
00:45:22,962 --> 00:45:25,920
Which is to say, you might have
painted yourself into a bit of a corner

858
00:45:25,920 --> 00:45:32,100
by only in code asking the operating
system for just those initial 6 bytes.

859
00:45:32,100 --> 00:45:35,670
You instead might have
wanted to ask for more bytes

860
00:45:35,670 --> 00:45:37,710
so as to allow yourself
this room to grow,

861
00:45:37,710 --> 00:45:41,760
but if you didn't do that in
code, you might just be unlucky.

862
00:45:41,760 --> 00:45:44,520
But that's the price
you pay for an array.

863
00:45:44,520 --> 00:45:46,710
You have this wonderfully
efficient ability

864
00:45:46,710 --> 00:45:48,900
to search it randomly,
if you will, which

865
00:45:48,900 --> 00:45:51,700
is to say instantly via arithmetic.

866
00:45:51,700 --> 00:45:54,510
You can jump to the beginning
or the end or even the middle,

867
00:45:54,510 --> 00:45:59,520
as we've seen, by just doing perhaps
some addition, subtraction, division,

868
00:45:59,520 --> 00:46:02,400
and rounding, and that gets
you ultimately right where

869
00:46:02,400 --> 00:46:07,030
you want to go in some constant
and very few number of steps.

870
00:46:07,030 --> 00:46:10,050
But unfortunately, because
you wanted all of that memory

871
00:46:10,050 --> 00:46:15,000
back-to-back-to-back, it's up to you
to decide how much of it you want.

872
00:46:15,000 --> 00:46:18,180
And if the operating system, I'm
sorry, has already allocated 6, 7,

873
00:46:18,180 --> 00:46:22,620
and elsewhere on the board to
other parts of the program,

874
00:46:22,620 --> 00:46:25,830
you might be faced with the
decision as to just say, no,

875
00:46:25,830 --> 00:46:31,330
I cannot accept any more data, or
you might say, OK, operating system,

876
00:46:31,330 --> 00:46:35,310
what if I don't mind where I am in
memory-- and you probably don't--

877
00:46:35,310 --> 00:46:39,390
but I would like you to find
me more bytes somewhere else?

878
00:46:39,390 --> 00:46:42,540
Rather like going from a one-bedroom
to a two-bedroom apartment

879
00:46:42,540 --> 00:46:45,990
so that you have more room, you might
physically have to pack your bags

880
00:46:45,990 --> 00:46:47,490
and go somewhere else.

881
00:46:47,490 --> 00:46:51,060
Unfortunately, just like in the
real world, that's not without cost.

882
00:46:51,060 --> 00:46:54,600
You need to pack those bags and
physically move, which takes time,

883
00:46:54,600 --> 00:46:57,780
and so will it take you and
the operating system some time

884
00:46:57,780 --> 00:47:00,600
to relocate every one of your values.

885
00:47:00,600 --> 00:47:04,260
So sure, there might be plenty of
space down here below on multiple rows

886
00:47:04,260 --> 00:47:07,890
and even not pictured, but it's going
to take a non-zero amount of time

887
00:47:07,890 --> 00:47:14,310
to relocate that 4 and 8 and 15 and
that 16 and 23 and 42 to new locations.
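
Here is one way to picture that move in Python. It is a simplified sketch, not how any particular operating system or language actually manages memory: the class `FixedArray` and its `copies` counter are hypothetical, invented to count how many values have to be relocated.

```python
class FixedArray:
    """A toy array whose capacity is decided up front (hypothetical class)."""

    def __init__(self, capacity):
        self.slots = [None] * capacity   # the space we asked for in advance
        self.size = 0                    # how many slots are in use
        self.copies = 0                  # how many values we have had to move

    def append(self, value):
        if self.size == len(self.slots):
            # No room next door: get a bigger block somewhere else
            # and relocate every existing value, one at a time.
            bigger = [None] * (len(self.slots) + 1)
            for i in range(self.size):
                bigger[i] = self.slots[i]
                self.copies += 1
            self.slots = bigger
        self.slots[self.size] = value
        self.size += 1


numbers = FixedArray(6)
for n in [4, 8, 15, 16, 23, 42]:
    numbers.append(n)            # fits in the original six slots

numbers.append(50)               # seventh value forces a move
print(numbers.copies)            # 6
```

Adding the seventh value costs six copies before the new value is even written, and that cost grows with the size of the array.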

888
00:47:14,310 --> 00:47:17,370
That might be your only option
if you want to support more data,

889
00:47:17,370 --> 00:47:20,880
and indeed, most programs would want--
it would be an unfortunate situation

890
00:47:20,880 --> 00:47:24,900
if you had to tell your user or
boss, I'm sorry, I ran out of space,

891
00:47:24,900 --> 00:47:26,460
and that's certainly foolish.

892
00:47:26,460 --> 00:47:31,470
If you actually do have more space,
it's just not right there next to you.

893
00:47:31,470 --> 00:47:34,830
So with an array, you have
the ability physically

894
00:47:34,830 --> 00:47:39,390
to perform very sophisticated, very
efficient algorithms such as we've

895
00:47:39,390 --> 00:47:42,570
seen-- binary search and
bubble sort and selection sort

896
00:47:42,570 --> 00:47:46,860
and merge sort, and do
so in quite fast time.

897
00:47:46,860 --> 00:47:50,160
Even though selection sort and
bubble sort were big O of n squared,

898
00:47:50,160 --> 00:47:55,470
merge sort was actually n
times log n, which is slow--

899
00:47:55,470 --> 00:48:00,210
which is slower than log n
alone, but faster than n squared.

900
00:48:00,210 --> 00:48:03,300
But they all presuppose that you
had random access to elements

901
00:48:03,300 --> 00:48:06,630
arithmetically via their indexes
or addresses, and to do so,
or address, and to do so,

902
00:48:06,630 --> 00:48:09,870
you can with your computer's
memory with arrays,

903
00:48:09,870 --> 00:48:12,210
but you need to commit to some value.

904
00:48:12,210 --> 00:48:13,110
All right, fine.

905
00:48:13,110 --> 00:48:17,190
Let's not ask the operating system for
6 bytes initially, let's say, give me 7

906
00:48:17,190 --> 00:48:19,080
because I'm going to
leave one of them blank.

907
00:48:19,080 --> 00:48:22,470
Now of course, that might buy
you some runway, so to speak,

908
00:48:22,470 --> 00:48:25,230
so that you can accommodate
a seventh element if and when it arrives,

909
00:48:25,230 --> 00:48:26,723
but what about an eighth?

910
00:48:26,723 --> 00:48:29,640
Well, you could ask the operating
system from the get-go, don't get me

911
00:48:29,640 --> 00:48:35,280
6 bytes of space, but give me
8 or give me 16 or give me 100.

912
00:48:35,280 --> 00:48:38,220
But at that point, you're
starting to get a little greedy,

913
00:48:38,220 --> 00:48:41,130
and you're starting to ask for
more memory than you might actually

914
00:48:41,130 --> 00:48:44,520
need anytime soon, and
that, too, is unfortunate,

915
00:48:44,520 --> 00:48:46,030
because now you're being wasteful.

916
00:48:46,030 --> 00:48:49,140
Your computer, of course, only
has a finite amount of space,

917
00:48:49,140 --> 00:48:51,930
and if you're asking for more
of it than you actually need,

918
00:48:51,930 --> 00:48:55,860
that memory, by definition,
is unavailable to other parts

919
00:48:55,860 --> 00:48:58,320
of your program and perhaps even others.

920
00:48:58,320 --> 00:49:00,690
And so your computer
ultimately might not

921
00:49:00,690 --> 00:49:05,670
be able to get as much work done because
it's been holding off to the side

922
00:49:05,670 --> 00:49:07,650
just some empty space.

923
00:49:07,650 --> 00:49:10,680
Empty parking spaces you've
reserved for yourself or empty seats

924
00:49:10,680 --> 00:49:14,220
at a table that might potentially
go unused, it's just wasteful.

925
00:49:14,220 --> 00:49:16,050
And hardware costs money.

926
00:49:16,050 --> 00:49:18,390
And hardware enables
you to solve problems.

927
00:49:18,390 --> 00:49:21,930
And with less hardware available,
can you solve fewer problems at hand,

928
00:49:21,930 --> 00:49:25,060
and so that, too, doesn't
feel like a perfect solution.

929
00:49:25,060 --> 00:49:27,840
So again, this series of
trade-offs, it depends

930
00:49:27,840 --> 00:49:32,430
on what's most important to you--
time or space or money or development

931
00:49:32,430 --> 00:49:35,980
or any number of other scarce resources.

932
00:49:35,980 --> 00:49:39,450
So what can we do instead
as opposed to an array?

933
00:49:39,450 --> 00:49:43,770
How do we go about getting dynamism
that we so clearly want here,

934
00:49:43,770 --> 00:49:47,550
whereas it wouldn't-- wouldn't it be
nice if we could grow these great data

935
00:49:47,550 --> 00:49:50,220
structures, and better
yet, even shrink them?

936
00:49:50,220 --> 00:49:52,380
If I no longer need
some of these numbers,

937
00:49:52,380 --> 00:49:55,320
I'm going to give you back that
memory so that I can use it elsewhere

938
00:49:55,320 --> 00:49:57,390
for more compelling purposes.

939
00:49:57,390 --> 00:49:59,790
Well it turns out that
in computer science,

940
00:49:59,790 --> 00:50:02,670
programmers can create even
fancier data structures

941
00:50:02,670 --> 00:50:04,950
but at a higher level of abstraction.

942
00:50:04,950 --> 00:50:10,360
It turns out, we could start
making lists out of our values.

943
00:50:10,360 --> 00:50:13,530
In fact, if I wanted to add some
number to the screen, and for instance,

944
00:50:13,530 --> 00:50:16,920
maybe these two spots were
blocked off by something else.

945
00:50:16,920 --> 00:50:17,670
But you know what?

946
00:50:17,670 --> 00:50:20,370
I do know there's some room
elsewhere on the screen,

947
00:50:20,370 --> 00:50:22,630
it just happens to be available here.

948
00:50:22,630 --> 00:50:26,977
And so if I want to put the
number 50 in my list of values,

949
00:50:26,977 --> 00:50:29,310
I might just have to say, I
don't care where you put it,

950
00:50:29,310 --> 00:50:31,350
go ahead and put it right there.

951
00:50:31,350 --> 00:50:32,400
Well where is there?

952
00:50:32,400 --> 00:50:38,743
Well if we continue this indexing--
this is 6 and 7 and 8 and 9, 10, 11, 12,

953
00:50:38,743 --> 00:50:46,440
13, 14, and 15, if 50 happens to
end up by chance at location 15

954
00:50:46,440 --> 00:50:50,730
because it's the first byte available,
because not only these two, but maybe

955
00:50:50,730 --> 00:50:54,360
even all of these are taken
for some other reason--

956
00:50:54,360 --> 00:50:58,170
ever since you asked for
your first six, that's OK,

957
00:50:58,170 --> 00:51:02,940
so long as you can somehow link
your original data to the new.

958
00:51:02,940 --> 00:51:07,290
And pictorially here, I might be
inclined just to say, you know what?

959
00:51:07,290 --> 00:51:12,390
Let me just leave a little breadcrumb,
so to speak, and say that after the 42,

960
00:51:12,390 --> 00:51:16,380
I should actually go down
here and follow this arrow.

961
00:51:16,380 --> 00:51:19,200
Sort of Chutes and Ladders
style, if you will.

962
00:51:19,200 --> 00:51:22,530
Now that's fine and you can do that--
after all, at the end of the day,

963
00:51:22,530 --> 00:51:24,840
computers will do what
you want, and if you

964
00:51:24,840 --> 00:51:27,420
can write the code to
implement this idea,

965
00:51:27,420 --> 00:51:30,840
it will, in fact, remember that value.

966
00:51:30,840 --> 00:51:32,067
But how do we achieve this?

967
00:51:32,067 --> 00:51:34,650
Here, too, you have to come back
to the fundamental definition

968
00:51:34,650 --> 00:51:36,450
of what your computer is doing and how.

969
00:51:36,450 --> 00:51:39,930
It's just got that chip of memory,
and those bytes back-to-back,

970
00:51:39,930 --> 00:51:41,700
such as those pictured here.

971
00:51:41,700 --> 00:51:46,410
So this is all you get-- there is no
arrow feature inside of a computer.

972
00:51:46,410 --> 00:51:49,320
You have to implement
that notion yourself.

973
00:51:49,320 --> 00:51:51,940
So how can you go about doing that?

974
00:51:51,940 --> 00:51:54,780
Well, you can implement
this concept of an arrow,

975
00:51:54,780 --> 00:51:58,200
but you need to implement it
ultimately at a lower level or trust

976
00:51:58,200 --> 00:52:00,210
that someone else will for you.

977
00:52:00,210 --> 00:52:04,550
Well, as best I can tell, I do know
that my first several elements happened

978
00:52:04,550 --> 00:52:09,870
to be back-to-back from 4 on up
to 42 in locations 0 through 5.

979
00:52:09,870 --> 00:52:12,360
Because those are contiguous,
I get my random access

980
00:52:12,360 --> 00:52:15,690
and I can immediately jump from
beginning to middle to end.

981
00:52:15,690 --> 00:52:19,680
This 50 and anything after it needs
to be handled a little better.

982
00:52:19,680 --> 00:52:23,700
If I want to implement this
arrow, the only possible way

983
00:52:23,700 --> 00:52:28,650
seems to be to somehow remember
that the next element after 42

984
00:52:28,650 --> 00:52:32,190
is at location 15.

985
00:52:32,190 --> 00:52:33,870
And that location, a.k.a.

986
00:52:33,870 --> 00:52:38,340
address or index, just has
to be something I remember.

987
00:52:38,340 --> 00:52:42,720
Unfortunately I don't have quite
enough room left to remember that.

988
00:52:42,720 --> 00:52:45,930
What I really want to do is not
store this arrow, but by the way,

989
00:52:45,930 --> 00:52:50,340
parenthetically go ahead
and store the number 15--

990
00:52:50,340 --> 00:52:54,420
not as the index of that
cell, but as the next address

991
00:52:54,420 --> 00:52:56,190
that should be followed.

992
00:52:56,190 --> 00:52:58,980
The catch, though, is that I've
not left myself enough room.

993
00:52:58,980 --> 00:53:01,230
I've made mental note
in parentheses here

994
00:53:01,230 --> 00:53:03,550
that we've got to solve
this a bit better.

995
00:53:03,550 --> 00:53:07,260
So let's start over for the
moment, and no longer worry

996
00:53:07,260 --> 00:53:11,340
about this very low level, because
it's too messy at some point.

997
00:53:11,340 --> 00:53:13,050
It's like talking in 0's and 1's--

998
00:53:13,050 --> 00:53:15,640
I don't want to talk
in bytes in this way.

999
00:53:15,640 --> 00:53:17,670
So let's take things up
in abstraction level,

1000
00:53:17,670 --> 00:53:24,390
if you will, and just agree to agree
that you can store values in memory,

1001
00:53:24,390 --> 00:53:27,360
and those values can be
data, like numbers you want--

1002
00:53:27,360 --> 00:53:32,100
4, 8, 15, 16, 23, 42, and now 50.

1003
00:53:32,100 --> 00:53:37,380
And you can also store somehow
the addresses or indexes--

1004
00:53:37,380 --> 00:53:39,600
locations of those values.

1005
00:53:39,600 --> 00:53:43,000
It's just up to you
how to use this canvas.

1006
00:53:43,000 --> 00:53:45,420
So let's do that and clear
the screen and now start

1007
00:53:45,420 --> 00:53:47,160
to build a higher-level concept.

1008
00:53:47,160 --> 00:53:50,265
Not an array, but something
we'll call a linked list.

1009
00:53:50,265 --> 00:53:53,330


1010
00:53:53,330 --> 00:53:55,370
Now what is a linked list?

1011
00:53:55,370 --> 00:53:59,750
A linked list is a data structure that's
a higher-level concept in abstraction

1012
00:53:59,750 --> 00:54:04,820
on top of what ultimately is
just chunks of memory or bytes.

1013
00:54:04,820 --> 00:54:08,840
But this linked list shall enable
me to store more and more values

1014
00:54:08,840 --> 00:54:12,510
and even remove them simply
by linking them together.

1015
00:54:12,510 --> 00:54:16,640
So here, let me go ahead and represent
those same values starting with 4,

1016
00:54:16,640 --> 00:54:23,300
followed by 8 and 15, and then
16 and 23, and finally, 42.

1017
00:54:23,300 --> 00:54:26,840
And now eventually I'm going to want
to store 50, but I've run out of room

1018
00:54:26,840 --> 00:54:32,000
but that's fine, I'm going to go ahead
and write 50 wherever there's space.

1019
00:54:32,000 --> 00:54:35,900
But now let's not worry about that
grid, rows, and columns of memory.

1020
00:54:35,900 --> 00:54:38,480
Let's just stipulate that
yes, that's actually there,

1021
00:54:38,480 --> 00:54:41,180
but it's not useful to
operate at that level.

1022
00:54:41,180 --> 00:54:45,980
Much like it's not useful to continually
talk in terms of 0's and 1's.

1023
00:54:45,980 --> 00:54:50,030
So let me go ahead and wrap these
values with a higher-level idea

1024
00:54:50,030 --> 00:54:52,340
called a node or just a box.

1025
00:54:52,340 --> 00:54:56,660
And this box is going to store
for us each of these values.

1026
00:54:56,660 --> 00:55:02,303
Here I have 4, here I have
8 and 15, here I have 16,

1027
00:55:02,303 --> 00:55:05,960
I have 23, and finally, 42.

1028
00:55:05,960 --> 00:55:09,290
And then when it comes
time to add 50 to the mix,

1029
00:55:09,290 --> 00:55:11,150
it, too, will come in this box.

1030
00:55:11,150 --> 00:55:12,080
Now what is this box?

1031
00:55:12,080 --> 00:55:14,990
It's just an artist's rendition
of the underlying bytes,

1032
00:55:14,990 --> 00:55:18,790
but now I have the ability to draw
a prettier picture, if you will,

1033
00:55:18,790 --> 00:55:22,320
that somehow interlinks
these boxes together.

1034
00:55:22,320 --> 00:55:24,500
Indeed, what I ultimately
want to remember

1035
00:55:24,500 --> 00:55:28,760
is that 4 comes first and 42 comes
last, but then wait, if I had 50,

1036
00:55:28,760 --> 00:55:30,590
it shall now come last.

1037
00:55:30,590 --> 00:55:34,760
So we could do this as an artist quite
simply with those arrows pointing

1038
00:55:34,760 --> 00:55:38,840
each box to the next, implying
that the next element in the list,

1039
00:55:38,840 --> 00:55:45,590
whether it's next door or far away,
happens to be at the end of that arrow.

1040
00:55:45,590 --> 00:55:46,850
But what are those arrows?

1041
00:55:46,850 --> 00:55:49,820
Those are not something that
you can represent in a computer

1042
00:55:49,820 --> 00:55:55,940
if at the end of the day all you have
are blocks of memory and in them bytes.

1043
00:55:55,940 --> 00:55:58,190
If all you have are bytes-- and, therefore, patterns
when, therefore, patterns

1044
00:55:58,190 --> 00:56:00,530
of 0's and 1's, whatever
you store in the computer

1045
00:56:00,530 --> 00:56:04,580
must be representable with those 0's
and 1's, and among the easiest things

1046
00:56:04,580 --> 00:56:10,340
to represent, we know already, is
numbers, like indexes or addresses

1047
00:56:10,340 --> 00:56:11,540
of these nodes.

1048
00:56:11,540 --> 00:56:16,550
So for instance, depending on
where these nodes are in memory,

1049
00:56:16,550 --> 00:56:20,300
we can simply check that
address and store it as well.

1050
00:56:20,300 --> 00:56:23,510
So for instance, if the 4 still
happens to be at address 0,

1051
00:56:23,510 --> 00:56:29,060
and this time 8 is at address 4,
and this one 8, and this one 12,

1052
00:56:29,060 --> 00:56:34,430
and this one 16, and this one 20-- just
by chance back-to-back-to-back 4 bytes

1053
00:56:34,430 --> 00:56:35,120
apart--

1054
00:56:35,120 --> 00:56:38,270
32 bits, well 50 might
be some distance away.

1055
00:56:38,270 --> 00:56:42,710
Maybe it's actually at
location 100, that's OK.

1056
00:56:42,710 --> 00:56:44,150
We can still do this.

1057
00:56:44,150 --> 00:56:48,320
Because if we use part of
this node, part of each box

1058
00:56:48,320 --> 00:56:53,300
to implement those actual arrows, we
can actually store all the information

1059
00:56:53,300 --> 00:56:56,520
we need to know how to get
from one box to another.

1060
00:56:56,520 --> 00:56:59,890
For instance, to get from
4 to the next element,

1061
00:56:59,890 --> 00:57:04,850
you're going to want to coincidentally
go to not number 4, but address 4.

1062
00:57:04,850 --> 00:57:08,510
And if you want to go from
value 8 to the next value, 15,

1063
00:57:08,510 --> 00:57:11,270
you're going to want to go to address 8.

1064
00:57:11,270 --> 00:57:14,300
And if you want to go from
15 to 16, the next address

1065
00:57:14,300 --> 00:57:19,010
is going to be 12, followed
by 16, followed by 20.

1066
00:57:19,010 --> 00:57:21,170
And herein lies the magic--

1067
00:57:21,170 --> 00:57:24,410
if you want to get from 42
to that newest element that's

1068
00:57:24,410 --> 00:57:31,820
just elsewhere at address 100, that's
what gets associated with 42's node.

1069
00:57:31,820 --> 00:57:33,750
As for 50, it's the dead end.

1070
00:57:33,750 --> 00:57:35,990
There's nothing more
there, so we might simply

1071
00:57:35,990 --> 00:57:38,780
draw a line through that
box saying, eh, just

1072
00:57:38,780 --> 00:57:44,660
store in it all 0 bits or some
other convention equivalently.

1073
00:57:44,660 --> 00:57:46,670
So there's so many
numbers now on the screen,

1074
00:57:46,670 --> 00:57:50,330
but to be fair, that's all that's
going on inside of a computer--

1075
00:57:50,330 --> 00:57:52,640
just storing of these bytes.

1076
00:57:52,640 --> 00:57:56,720
But now we can stipulate
that, OK, I can somehow

1077
00:57:56,720 --> 00:58:03,770
store the location of each node in
memory using its index or address.
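
The numbers on the screen can be written down as a small Python sketch. It is only a model under assumptions: a dictionary stands in for memory, and because address 0 is already occupied by the first node in this toy layout, the dead end is marked with `None` rather than with all 0 bits. The function name `walk` is made up for this example.

```python
# address -> (value stored there, address of the next node)
memory = {
    0: (4, 4),
    4: (8, 8),
    8: (15, 12),
    12: (16, 16),
    16: (23, 20),
    20: (42, 100),     # the breadcrumb: after 42, go to address 100
    100: (50, None),   # dead end: nothing follows 50
}


def walk(memory, address):
    """Follow the stored addresses from one node to the next."""
    values = []
    while address is not None:
        value, address = memory[address]
        values.append(value)
    return values


print(walk(memory, 0))  # [4, 8, 15, 16, 23, 42, 50]
```

Each node costs more space than a bare value, because it carries the address of its neighbor along with its own data.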

1078
00:58:03,770 --> 00:58:06,890
It's just frankly not all that
pleasant to stare at these values,

1079
00:58:06,890 --> 00:58:10,910
I'd much rather look at and
draw the arrows graphically,

1080
00:58:10,910 --> 00:58:14,870
thereby representing the same idea
of these pointers, if you will,

1081
00:58:14,870 --> 00:58:18,710
a term of art in some languages
that allows me to remember

1082
00:58:18,710 --> 00:58:21,680
which element goes to which.

1083
00:58:21,680 --> 00:58:25,830
And what is the upside of
all this new complexity?

1084
00:58:25,830 --> 00:58:29,810
Well now we have the ability to
string together all of these nodes.

1085
00:58:29,810 --> 00:58:32,210
And frankly, if we wanted to
remove one of these elements

1086
00:58:32,210 --> 00:58:35,150
from the list, that's fine,
we can rather snip it out.

1087
00:58:35,150 --> 00:58:37,910
And we can simply update what
the arrow is pointing to,

1088
00:58:37,910 --> 00:58:42,140
and equivalently, we can update
the next address in that node.

1089
00:58:42,140 --> 00:58:45,620
And we can certainly add to this list
by drawing more nodes here or perhaps

1090
00:58:45,620 --> 00:58:50,120
over here and just link them with arrows
conceptually, or more specifically,

1091
00:58:50,120 --> 00:58:54,640
by changing that dead end to
the address of the next element.

1092
00:58:54,640 --> 00:58:59,650
And so we can create the idea of
the abstraction of a list using

1093
00:58:59,650 --> 00:59:02,260
just this canvas of memory.
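
The linked list described above can be sketched in Python. This is only an illustration under assumptions: the names `Node`, `append`, `remove`, and `to_list` are hypothetical, Python object references stand in for raw memory addresses, and `None` plays the role of the "dead end."

```python
class Node:
    """One node: the value we care about plus a reference to the next node."""

    def __init__(self, value, next=None):
        self.value = value
        self.next = next  # None is the "dead end"


def to_list(head):
    """Follow the arrows from the first node to the dead end (linear time)."""
    values = []
    node = head
    while node is not None:
        values.append(node.value)
        node = node.next
    return values


def append(head, value):
    """Add a node by changing the last node's dead end to the new node."""
    new_node = Node(value)
    if head is None:
        return new_node
    node = head
    while node.next is not None:
        node = node.next
    node.next = new_node
    return head


def remove(head, value):
    """Snip out the first node holding value by updating the arrow before it."""
    if head is None:
        return None
    if head.value == value:
        return head.next
    node = head
    while node.next is not None:
        if node.next.value == value:
            node.next = node.next.next  # skip over the removed node
            break
        node = node.next
    return head


head = None
for v in [4, 8, 15, 16, 23, 42]:
    head = append(head, v)
head = append(head, 50)   # grow the list without declaring a size up front
head = remove(head, 15)   # snip one element out
print(to_list(head))      # [4, 8, 16, 23, 42, 50]
```

Note that every operation starts at the first node and walks forward, since that first node is all the list remembers.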

1094
00:59:02,260 --> 00:59:04,000
But not all is good here.

1095
00:59:04,000 --> 00:59:06,040
We've surely paid a price, right?

1096
00:59:06,040 --> 00:59:10,120
Surely we couldn't get dynamism
for addition and removal

1097
00:59:10,120 --> 00:59:13,840
and updating of a list
without paying some price.

1098
00:59:13,840 --> 00:59:17,260
This dynamic growth, this ability
to store as many more elements

1099
00:59:17,260 --> 00:59:20,890
as we want without having to tell the
operating system from the get-go how

1100
00:59:20,890 --> 00:59:23,110
many elements we expect.

1101
00:59:23,110 --> 00:59:26,590
And indeed, while we're
lucky at first, perhaps,

1102
00:59:26,590 --> 00:59:29,770
if we know from the get-go we
need at least six values here,

1103
00:59:29,770 --> 00:59:33,100
they might be a consistent
distance apart--

1104
00:59:33,100 --> 00:59:36,250
4 bytes or 32 bits.

1105
00:59:36,250 --> 00:59:38,650
And so I could do arithmetic
on some of these nodes,

1106
00:59:38,650 --> 00:59:42,670
but that is no longer, unfortunately,
a guarantee of this structure.

1107
00:59:42,670 --> 00:59:47,230
Whereas arrays do guarantee you
random access, linked lists do not.

1108
00:59:47,230 --> 00:59:52,900
And linked lists instead require
that you traverse them in linear time

1109
00:59:52,900 --> 00:59:56,470
from the first element potentially
all the way to the last.

1110
00:59:56,470 --> 00:59:58,600
There is no way to jump
to the middle element,

1111
00:59:58,600 --> 01:00:04,140
because frankly, if I do that math as
before, 100 bytes away is the last,

1112
01:00:04,140 --> 01:00:08,950
so 100 divided by 2 is 50--
rounding down, keeping me at 50,

1113
01:00:08,950 --> 01:00:12,590
puts me somewhere over
here, and that's not right.

1114
01:00:12,590 --> 01:00:14,920
The middle element is
earlier, but that's

1115
01:00:14,920 --> 01:00:19,450
because there's now no support for
random access or instant arithmetic

1116
01:00:19,450 --> 01:00:24,250
access to elements like
the first, last, or middle.

1117
01:00:24,250 --> 01:00:27,430
All we'll remember now for the
linked list is that first element,

1118
01:00:27,430 --> 01:00:31,360
and from there, we have to
follow all of those breadcrumbs.

1119
01:00:31,360 --> 01:00:34,100
So that might be too
high of a price to pay.

1120
01:00:34,100 --> 01:00:36,760
And moreover, there's overhead
now, because I'm not storing

1121
01:00:36,760 --> 01:00:39,280
for every node one value, but two--

1122
01:00:39,280 --> 01:00:43,450
the value or data I care about, and
the address or metadata that lets me

1123
01:00:43,450 --> 01:00:45,700
get to the next node.

1124
01:00:45,700 --> 01:00:48,190
So I'm using twice as
much space there, say,

1125
01:00:48,190 --> 01:00:50,560
at least when storing
numbers, but at least

1126
01:00:50,560 --> 01:00:53,500
I'm getting that dynamic
support for growth.

1127
01:00:53,500 --> 01:00:59,710
So again, it depends on that trade-off
and what is less costly to you.

1128
01:00:59,710 --> 01:01:00,490
But never fear.

1129
01:01:00,490 --> 01:01:02,650
This is just another problem to solve.

1130
01:01:02,650 --> 01:01:06,220
To be clear, we'd like to retain
the dynamism that something

1131
01:01:06,220 --> 01:01:10,300
like a linked list offers-- the ability
to grow and even shrink that data

1132
01:01:10,300 --> 01:01:14,770
structure over time without having to
decide a priori just how much memory we

1133
01:01:14,770 --> 01:01:15,400
want.

1134
01:01:15,400 --> 01:01:19,810
But at the moment we've lost the ability
to search it quickly, as with something

1135
01:01:19,810 --> 01:01:21,100
like binary search.

1136
01:01:21,100 --> 01:01:25,570
So wouldn't it be nice if we could
get both properties together?

1137
01:01:25,570 --> 01:01:29,320
The ability to grow and shrink
as well as to search fast?

1138
01:01:29,320 --> 01:01:31,480
Well I daresay we can
if we're just a bit more

1139
01:01:31,480 --> 01:01:34,300
clever about how we draw on our canvas.

1140
01:01:34,300 --> 01:01:36,310
Again, let's stipulate
that we can certainly

1141
01:01:36,310 --> 01:01:40,180
store values anywhere in memory
and somehow stitch them together

1142
01:01:40,180 --> 01:01:41,660
using addresses.

1143
01:01:41,660 --> 01:01:43,720
Now those addresses,
otherwise known as pointers,

1144
01:01:43,720 --> 01:01:48,230
we no longer need draw, because
frankly, they're just now a distraction.

1145
01:01:48,230 --> 01:01:52,450
It suffices to know we can draw them
pictorially as with some arrows,

1146
01:01:52,450 --> 01:01:54,010
so let's do just that.

1147
01:01:54,010 --> 01:01:56,470
Let me go ahead now
and draw those values,

1148
01:01:56,470 --> 01:02:02,680
say 16 up here followed by
my 8 and 15, as well as my 4.

1149
01:02:02,680 --> 01:02:07,330
Over here, I'll draw
that 42 and my 23,

1150
01:02:07,330 --> 01:02:10,450
and now it remains for me to
somehow link these together.

1151
01:02:10,450 --> 01:02:13,870
Since I don't need to leave
room for those actual addresses,

1152
01:02:13,870 --> 01:02:16,420
it suffices now to just draw arrows.

1153
01:02:16,420 --> 01:02:22,630
I'll go ahead and draw just a box around
16 and 8, as well as my 4 and my 15,

1154
01:02:22,630 --> 01:02:26,330
as well as my 23 and my 42.

1155
01:02:26,330 --> 01:02:28,060
Now how should I go about linking them?

1156
01:02:28,060 --> 01:02:31,600
Well let me propose that we no
longer link just from left to right,

1157
01:02:31,600 --> 01:02:37,690
but rather assemble more of a
hierarchy here with 16 pointing at 8,

1158
01:02:37,690 --> 01:02:40,960
and 16 also pointing at 42.

1159
01:02:40,960 --> 01:02:48,160
And 42, meanwhile, pointing at 23
with 8 pointing at 4 as well as 15.

1160
01:02:48,160 --> 01:02:51,190
Now why have I done it this way?

1161
01:02:51,190 --> 01:02:54,430
Well by including these arrows
sometimes bidirectionally,

1162
01:02:54,430 --> 01:02:57,100
have I stitched together
a two-dimensional data

1163
01:02:57,100 --> 01:02:58,300
structure, if you will.

1164
01:02:58,300 --> 01:03:01,720
Now this again surely could be
mapped to that lower level of memory

1165
01:03:01,720 --> 01:03:06,040
just by jotting down the addresses
that each of these arrows represents,

1166
01:03:06,040 --> 01:03:08,470
but I like thinking at
this level of abstraction

1167
01:03:08,470 --> 01:03:12,130
because I now can think in more
sophisticated form about how

1168
01:03:12,130 --> 01:03:14,550
I might lay out my data.

1169
01:03:14,550 --> 01:03:17,740
So what properties do I now
get from this structure?

1170
01:03:17,740 --> 01:03:19,870
Well, dynamism was the
first goal at hand,

1171
01:03:19,870 --> 01:03:21,970
and how might I go about
adding a new value?

1172
01:03:21,970 --> 01:03:25,370
Say it's 50 that I'd like
to add to this structure.

1173
01:03:25,370 --> 01:03:28,090
Well, if I look at the
top here, 16, it's already

1174
01:03:28,090 --> 01:03:33,430
got two arrows, so it's full,
but I know 50 is bigger than 16,

1175
01:03:33,430 --> 01:03:36,310
so let's start to apply
that dynamic and say 50

1176
01:03:36,310 --> 01:03:39,190
shall definitely go down to the right.

1177
01:03:39,190 --> 01:03:43,870
Unfortunately, 42 already has one arrow
off it, but there is room for more,

1178
01:03:43,870 --> 01:03:48,400
and it turns out that 50 is,
in fact, greater than 42.

1179
01:03:48,400 --> 01:03:49,330
So you know what?

1180
01:03:49,330 --> 01:03:55,800
I'm just going to slot 50 right there
and draw 42's second arrow to 50.

1181
01:03:55,800 --> 01:03:58,720
And what picture seems
to be emerging here?

1182
01:03:58,720 --> 01:04:02,130
It's perhaps reminiscent
of a family tree of sorts.

1183
01:04:02,130 --> 01:04:06,390
Indeed, with parents and children,
or a tree more generally with roots.

1184
01:04:06,390 --> 01:04:08,940
Now whereas in our human
world, trees tend to grow up,

1185
01:04:08,940 --> 01:04:12,170
these trees in computer
science tend to grow down.

1186
01:04:12,170 --> 01:04:15,480
But henceforth, let's
call this 16 our root,

1187
01:04:15,480 --> 01:04:19,140
and to its left is its left child, to
its right is its right child, or more

1188
01:04:19,140 --> 01:04:22,740
generally, a whole left subtree
and a whole right subtree.

1189
01:04:22,740 --> 01:04:26,040
Because indeed, starting at 42,
we have another tree of sorts.

1190
01:04:26,040 --> 01:04:31,740
Rooted at 42 is a child called
23, and another child called 50.

1191
01:04:31,740 --> 01:04:35,310
So in this case, if each of
the nodes in our structure,

1192
01:04:35,310 --> 01:04:41,520
otherwise known in computer science as
a tree, has zero, one, or two children,

1193
01:04:41,520 --> 01:04:44,110
you can create the second dimension,

1194
01:04:44,110 --> 01:04:46,140
and you can preserve
not only the ability

1195
01:04:46,140 --> 01:04:49,320
to add data dynamically
like 50, but, but,

1196
01:04:49,320 --> 01:04:53,912
but, we also now gain back
that ability to search.

1197
01:04:53,912 --> 01:04:55,620
After all, if I'm
asked now the question,

1198
01:04:55,620 --> 01:04:57,900
is the number 15 in this structure?

1199
01:04:57,900 --> 01:04:59,190
Well let me check for you.

1200
01:04:59,190 --> 01:05:02,610
Starting at 16, which is where this
structure begins, just like a linked

1201
01:05:02,610 --> 01:05:05,790
list starts conceptually
at the left, I'll

1202
01:05:05,790 --> 01:05:08,940
check if 16 is the value you
want-- it's not, it's too big,

1203
01:05:08,940 --> 01:05:13,000
but I do know that 15, if
it's here, it's to the left.

1204
01:05:13,000 --> 01:05:15,450
Now 8, of course, is not
the value you want either,

1205
01:05:15,450 --> 01:05:19,620
but 8 is smaller than 15,
so I'll now go to the right.

1206
01:05:19,620 --> 01:05:22,470
And indeed, sure enough,
there I now find 15.

1207
01:05:22,470 --> 01:05:26,610
And it only took me one,
two steps, not n to find it,

1208
01:05:26,610 --> 01:05:31,710
because through this second dimension
am I able to lift up some of those nodes

1209
01:05:31,710 --> 01:05:34,890
rather than draw them just
down as a straight line,

1210
01:05:34,890 --> 01:05:37,320
or in the linked list, all
the way from left to right.

1211
01:05:37,320 --> 01:05:41,550
With the second dimension can I
now organize things more tightly.

1212
01:05:41,550 --> 01:05:44,230
And notice the key
characteristics of this tree.

1213
01:05:44,230 --> 01:05:48,210
It is what's generally known,
indeed, as a binary search tree.

1214
01:05:48,210 --> 01:05:51,090
Not only because it's a tree
that lends itself to search,

1215
01:05:51,090 --> 01:05:57,500
but also because each of the nodes
has no more than two, or bi, children--

1216
01:05:57,500 --> 01:05:58,920
zero, one, or two.

1217
01:05:58,920 --> 01:06:02,430
And notice that to the left of
the 16 is not only the value

1218
01:06:02,430 --> 01:06:07,710
8, but every number that can be reached
to the left of 16 happens to be,

1219
01:06:07,710 --> 01:06:10,350
by design, less than 16.

1220
01:06:10,350 --> 01:06:12,000
And that's how we found 15.

1221
01:06:12,000 --> 01:06:17,280
Moreover to the right of 16,
every value is greater than 16,

1222
01:06:17,280 --> 01:06:18,720
just as we have here.

1223
01:06:18,720 --> 01:06:22,140
And that definition can be
applied so-called recursively.

1224
01:06:22,140 --> 01:06:25,800
You can make that claim about every
node in this tree at any level,

1225
01:06:25,800 --> 01:06:30,810
because here, 42, every node to
its left albeit just one is less.

1226
01:06:30,810 --> 01:06:34,800
Every node to its right
albeit one is indeed more.

1227
01:06:34,800 --> 01:06:39,840
So so long as you bring to bear to
our data the same sort of intuition

1228
01:06:39,840 --> 01:06:43,590
we brought to our phone book can
we achieve these same properties

1229
01:06:43,590 --> 01:06:47,340
and goals, this efficiency
of logarithmic time.

1230
01:06:47,340 --> 01:06:52,050
Log base 2 of n is indeed how long
it might take us, big O of that

1231
01:06:52,050 --> 01:06:54,540
to find or insert some value.
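
The search and insertion just described might look like the following in Python. This is a minimal sketch, not a definitive implementation: the names `TreeNode`, `insert`, and `search` are hypothetical, and the insertion order is chosen so that the resulting tree matches the one drawn here, with 16 at the root.

```python
class TreeNode:
    """A node with a value and up to two children."""

    def __init__(self, value):
        self.value = value
        self.left = None   # subtree of smaller values
        self.right = None  # subtree of larger values


def insert(root, value):
    """Insert value below root, going left if smaller and right if larger."""
    if root is None:
        return TreeNode(value)
    if value < root.value:
        root.left = insert(root.left, value)
    elif value > root.value:
        root.right = insert(root.right, value)
    return root  # duplicates are ignored


def search(root, value):
    """Return (found, steps), where steps counts the arrows followed."""
    steps = 0
    node = root
    while node is not None:
        if value == node.value:
            return True, steps
        node = node.left if value < node.value else node.right
        steps += 1
    return False, steps


root = None
for v in [16, 8, 42, 4, 15, 23, 50]:
    root = insert(root, v)

print(search(root, 15))  # (True, 2): left from 16, then right from 8
print(search(root, 7))   # (False, 3): fell off the tree below 4
```

Each comparison discards an entire subtree, which is where the logarithmic behavior comes from, provided the tree stays balanced.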

1232
01:06:54,540 --> 01:06:57,600
Now to be fair, there are
some prices paid here.

1233
01:06:57,600 --> 01:07:00,120
If I'm not careful, a
data structure like this

1234
01:07:00,120 --> 01:07:02,790
could actually devolve
into a linked list

1235
01:07:02,790 --> 01:07:05,790
if I just keep adding,
by coincidence or intent,

1236
01:07:05,790 --> 01:07:08,620
bigger and bigger numbers.

1237
01:07:08,620 --> 01:07:12,060
They might just so happen to get
long and long and long and stringy

1238
01:07:12,060 --> 01:07:16,050
unless we're smart about how we
rebalance the tree occasionally.
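
To see that "long and stringy" failure mode concretely, here is a small Python sketch that builds the same seven values in two different orders and measures the height of each tree. The helper names (`TreeNode`, `insert`, `build`, `height`) are hypothetical, and no rebalancing is attempted.

```python
class TreeNode:
    def __init__(self, value):
        self.value = value
        self.left = None
        self.right = None


def insert(root, value):
    """Plain binary search tree insertion with no rebalancing."""
    if root is None:
        return TreeNode(value)
    if value < root.value:
        root.left = insert(root.left, value)
    elif value > root.value:
        root.right = insert(root.right, value)
    return root


def build(values):
    """Insert the values one at a time, in the order given."""
    root = None
    for v in values:
        root = insert(root, v)
    return root


def height(root):
    """Number of levels in the tree (0 for an empty tree)."""
    if root is None:
        return 0
    return 1 + max(height(root.left), height(root.right))


print(height(build([16, 8, 42, 4, 15, 23, 50])))  # 3 levels: bushy
print(height(build([4, 8, 15, 16, 23, 42, 50])))  # 7 levels: a linked list in disguise
```

With the values arriving in sorted order, every node has only a right child, so a search can take as many steps as there are values.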

1239
01:07:16,050 --> 01:07:18,420
And indeed, there are other
forms of these trees that

1240
01:07:18,420 --> 01:07:22,560
are smart, and with more code, will
rebalance themselves to make sure

1241
01:07:22,560 --> 01:07:26,910
that they don't get long and stringy,
but stay as high up as possible.

1242
01:07:26,910 --> 01:07:30,420
But there's another price paid
beyond that potential gotcha--

1243
01:07:30,420 --> 01:07:31,800
more space.

1244
01:07:31,800 --> 01:07:36,930
Whereas my array used no arrows
whatsoever and thus no extra space,

1245
01:07:36,930 --> 01:07:41,910
my linked list did use one extra
chunk of space for each node--

1246
01:07:41,910 --> 01:07:45,240
storage for that pointer or
address of its neighbor.

1247
01:07:45,240 --> 01:07:48,180
But in a tree structure, if
you're storing multiple children,

1248
01:07:48,180 --> 01:07:52,410
you're using as many as two
additional chunks of memory

1249
01:07:52,410 --> 01:07:55,330
to store as many as two of those arrows.

1250
01:07:55,330 --> 01:07:58,110
And so with a tree structure
are you spending more space,

1251
01:07:58,110 --> 01:08:00,640
but potentially it's saving you time.

1252
01:08:00,640 --> 01:08:02,610
So again, we see this
theme of trade-offs,

1253
01:08:02,610 --> 01:08:06,240
whereby if you really want
less time to be spent,

1254
01:08:06,240 --> 01:08:10,530
you're going to have to
spend more of that space.

1255
01:08:10,530 --> 01:08:12,660
Now can we do even better?

1256
01:08:12,660 --> 01:08:15,720
With an array, we had
instant access to data,

1257
01:08:15,720 --> 01:08:18,180
but we painted ourselves
into that corner.

1258
01:08:18,180 --> 01:08:21,069
With a linked list did we
solve that particular problem,

1259
01:08:21,069 --> 01:08:24,240
but we gave up the ability
to jump right where we want.

1260
01:08:24,240 --> 01:08:27,000
But with trees, particularly
binary search trees,

1261
01:08:27,000 --> 01:08:32,399
can we rearrange our data intelligently
and regain that logarithmic time.

1262
01:08:32,399 --> 01:08:35,340
But wouldn't it be nice if we
could achieve even better, say,

1263
01:08:35,340 --> 01:08:40,590
constant time searches of
data and insertions thereof?

1264
01:08:40,590 --> 01:08:43,830
Well for that, perhaps we could
amalgamate some of the ideas

1265
01:08:43,830 --> 01:08:47,970
we've seen thus far into just
one especially clever structure.

1266
01:08:47,970 --> 01:08:51,569
And let's call that particular
structure a hash table.

1267
01:08:51,569 --> 01:08:55,890
And indeed, this is perhaps, in theory,
the holy grail of data structures,

1268
01:08:55,890 --> 01:09:01,380
insofar as you can store anything
in it in ideally constant time.

1269
01:09:01,380 --> 01:09:02,630
But how best to do this?

1270
01:09:02,630 --> 01:09:06,120
Well let's begin by
drawing ourselves an array.

1271
01:09:06,120 --> 01:09:08,939
And that array this time
I'll draw vertically simply

1272
01:09:08,939 --> 01:09:12,810
to leave ourselves a bit more
room for something clever.

1273
01:09:12,810 --> 01:09:16,830
This array, as always, can be indexed
into by way of these locations

1274
01:09:16,830 --> 01:09:20,970
here where this might be
location 0 and 1, 2, and 3,

1275
01:09:20,970 --> 01:09:23,830
followed by any number of others.

1276
01:09:23,830 --> 01:09:25,770
Now how do I want to use this array?

1277
01:09:25,770 --> 01:09:29,130
Well suppose that I want to
store names and not numbers.

1278
01:09:29,130 --> 01:09:32,229
Those names, of course, could just
be inserted in any old location,

1279
01:09:32,229 --> 01:09:34,430
but if unsorted, we
already know we're going

1280
01:09:34,430 --> 01:09:37,350
to suffer as much as big O of n time--

1281
01:09:37,350 --> 01:09:40,380
linear time with which to find
a particular name in that array

1282
01:09:40,380 --> 01:09:43,957
if you know nothing a
priori about the order.

1283
01:09:43,957 --> 01:09:47,040
Well we know already, too, we could
do better just like the phone company,

1284
01:09:47,040 --> 01:09:50,100
and if we sort the names we're
putting into this structure,

1285
01:09:50,100 --> 01:09:53,490
we can at least then do binary search
and whittle that search time down

1286
01:09:53,490 --> 01:09:56,370
to log base 2 of n.

1287
01:09:56,370 --> 01:09:59,160
But wouldn't it be nice if we
can whittle that down further

1288
01:09:59,160 --> 01:10:04,290
and get to any name we want in nearly
constant time-- one step, maybe two

1289
01:10:04,290 --> 01:10:05,730
or a few?

1290
01:10:05,730 --> 01:10:10,440
Well with a hash table can you
approximately or ideally do that,

1291
01:10:10,440 --> 01:10:14,640
so long as we decide in advance
how to hash those strings.

1292
01:10:14,640 --> 01:10:18,680
In other words, those strings of
characters, here called names,

1293
01:10:18,680 --> 01:10:23,640
they have letters inside of
them, say D-A-V-I-D for my own.

1294
01:10:23,640 --> 01:10:26,580
Well what if we looked
at not the whole name,

1295
01:10:26,580 --> 01:10:29,400
but that first letter, which
is, of course, constant time

1296
01:10:29,400 --> 01:10:31,090
to just look at one value.

1297
01:10:31,090 --> 01:10:35,520
And so if D is the fourth letter in
the English alphabet, what if I store

1298
01:10:35,520 --> 01:10:36,270
DAVID--

1299
01:10:36,270 --> 01:10:40,620
or really, any D name at the
fourth index in my array,

1300
01:10:40,620 --> 01:10:43,960
location 3 if you start counting at 0?

1301
01:10:43,960 --> 01:10:47,730
So here might be the A names, and here
the B names, and here the C names,

1302
01:10:47,730 --> 01:10:53,160
and someone like David now belongs
in this bucket, if you will.

1303
01:10:53,160 --> 01:10:57,210
Now suppose I want to store
other names in this structure.

1304
01:10:57,210 --> 01:11:02,400
Well Alice belongs at location 0,
and Bob, for instance, location 1.

1305
01:11:02,400 --> 01:11:05,310
And we can continue this
logic and can continue

1306
01:11:05,310 --> 01:11:09,690
to insert more and more names
so long as we hash those names

1307
01:11:09,690 --> 01:11:12,240
and jump right to the right location.

1308
01:11:12,240 --> 01:11:15,360
After all, I can in one
step look at A or B or D

1309
01:11:15,360 --> 01:11:18,840
and instantly know 0 or 1 or 3.

1310
01:11:18,840 --> 01:11:19,560
How?

1311
01:11:19,560 --> 01:11:22,560
Well recall that in a computer
you have ASCII or Unicode.

1312
01:11:22,560 --> 01:11:26,190
And we already have numbers
predetermined to map

1313
01:11:26,190 --> 01:11:27,820
to those same characters.

1314
01:11:27,820 --> 01:11:32,310
Now to be fair, capital A, I'm pretty
sure, was 65 in ASCII,

1315
01:11:32,310 --> 01:11:37,020
but we could certainly
subtract 65 from 65 to get 0.

1316
01:11:37,020 --> 01:11:43,230
And if capital B was 66, we could
certainly subtract 65 from 66 to get 1.

1317
01:11:43,230 --> 01:11:48,480
So we can look, then, at the first
letter of any name, convert it to ASCII

1318
01:11:48,480 --> 01:11:51,810
and subtract quite simply
65 if it's capital,

1319
01:11:51,810 --> 01:11:54,810
and get precisely to the index we want.

1320
01:11:54,810 --> 01:11:57,850
So to be fair, that's not one,
but it is two or three steps,

1321
01:11:57,850 --> 01:12:01,410
but that is a constant number of
steps again and again independent

1322
01:12:01,410 --> 01:12:04,380
of n, the total number of names.
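
That first-letter hash function can be sketched in a couple of lines of Python. The function name `hash_name` is hypothetical, and the sketch assumes every name is non-empty and begins with one of the 26 English letters.

```python
def hash_name(name):
    """Map a name to an index 0 through 25 based on its first letter."""
    first = name[0].upper()        # treat lowercase like uppercase
    return ord(first) - ord("A")   # ord("A") is 65, so "A" maps to 0


print(hash_name("Alice"))  # 0
print(hash_name("Bob"))    # 1
print(hash_name("David"))  # 3
```

The work done does not depend on how many names are stored, only on looking at one character and doing one subtraction.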

1323
01:12:04,380 --> 01:12:08,220
Now what's nice about this is that we
have a data structure into which we

1324
01:12:08,220 --> 01:12:13,530
can insert names instantly by
hashing them and getting as output

1325
01:12:13,530 --> 01:12:19,470
that number or index 0 through 25,
in the case of an English alphabet.

1326
01:12:19,470 --> 01:12:22,690
But what problem might arise?

1327
01:12:22,690 --> 01:12:25,740
The catch, though, is that if we
have someone else, like Doug,

1328
01:12:25,740 --> 01:12:28,290
whose name happens to
start with the same letter,

1329
01:12:28,290 --> 01:12:31,890
unfortunately there seems to be
no room at this moment for Doug

1330
01:12:31,890 --> 01:12:33,510
since I'm already there.

1331
01:12:33,510 --> 01:12:37,140
But there we can draw inspiration
from other data structures still.

1332
01:12:37,140 --> 01:12:40,770
We could maybe not just
put David in this array,

1333
01:12:40,770 --> 01:12:45,270
but not even treat this array
as the entire data structure,

1334
01:12:45,270 --> 01:12:47,590
but really the beginning of another.

1335
01:12:47,590 --> 01:12:52,230
In fact, let me go ahead and
put David in his or my own box

1336
01:12:52,230 --> 01:12:54,690
and give Doug his own as well.

1337
01:12:54,690 --> 01:12:58,710
Now Doug and I are really
just nodes in a structure.

1338
01:12:58,710 --> 01:13:03,480
And we can use this array still to
get to the right nodes of interest,

1339
01:13:03,480 --> 01:13:07,290
but now we can use arrows
to stitch them together.

1340
01:13:07,290 --> 01:13:10,760
If I have multiple names,
each of which starts with a D,

1341
01:13:10,760 --> 01:13:12,740
I just need to remember
to link those together,

1342
01:13:12,740 --> 01:13:15,650
thereby allowing myself to
have any number of names

1343
01:13:15,650 --> 01:13:18,800
that start with that same
letter, treating that list really

1344
01:13:18,800 --> 01:13:20,240
as a linked list.

1345
01:13:20,240 --> 01:13:24,680
But I get to that linked list instantly
by looking at that first letter

1346
01:13:24,680 --> 01:13:27,710
and jumping here to the right location.
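
Putting the array and the per-letter lists together might look like this in Python. It is a sketch under stated assumptions: `hash_name`, `insert`, and `contains` are hypothetical names, and each bucket is an ordinary Python list standing in for the linked list drawn here.

```python
NUM_BUCKETS = 26  # one bucket per letter of the English alphabet


def hash_name(name):
    """Index 0 through 25 from the first letter (assumes an English letter)."""
    return ord(name[0].upper()) - ord("A")


# The array: each slot is the start of its own (initially empty) chain.
table = [[] for _ in range(NUM_BUCKETS)]


def insert(table, name):
    """Jump straight to the right bucket, then add the name to its chain."""
    bucket = table[hash_name(name)]
    if name not in bucket:
        bucket.append(name)


def contains(table, name):
    """Jump to the bucket, then search only that bucket's chain."""
    return name in table[hash_name(name)]


for name in ["Alice", "Bob", "David", "Doug"]:
    insert(table, name)

print(table[3])                 # ['David', 'Doug']: a collision, chained together
print(contains(table, "Doug"))  # True
print(contains(table, "Dana"))  # False
```

Finding the bucket takes a constant number of steps, while the search within the bucket is linear in the length of that one chain.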

1347
01:13:27,710 --> 01:13:33,540
And so here I get both dynamic growth
and instant access to that list,

1348
01:13:33,540 --> 01:13:37,140
thereby decreasing
significantly the amount of time

1349
01:13:37,140 --> 01:13:40,950
it takes me to find someone
to maybe 1/26 of the time.

1350
01:13:40,950 --> 01:13:43,020
Now to be fair, wait a
minute, we're already

1351
01:13:43,020 --> 01:13:47,160
seeing collisions, so to speak,
whereby I have multiple inputs hashing

1352
01:13:47,160 --> 01:13:48,600
to the same output--

1353
01:13:48,600 --> 01:13:50,460
three in this instance.

1354
01:13:50,460 --> 01:13:52,950
And in the worst case,
perhaps everyone in the room

1355
01:13:52,950 --> 01:13:55,710
all has a name that starts
with D, which means really,

1356
01:13:55,710 --> 01:13:58,240
you don't have a hash
table or array at all,

1357
01:13:58,240 --> 01:14:02,970
you just have one really long
linked list, and thus, linear.

1358
01:14:02,970 --> 01:14:07,180
But that would be considered a more
perverse scenario, which you should try

1359
01:14:07,180 --> 01:14:09,570
to avoid by way of that hash function.

1360
01:14:09,570 --> 01:14:13,770
If that is the problem you're facing,
then your hash function is just bad.

1361
01:14:13,770 --> 01:14:16,100
You should not have
looked only in that case

1362
01:14:16,100 --> 01:14:18,270
at just the first letter of every name.

1363
01:14:18,270 --> 01:14:21,030
Perhaps you should have looked
at the first two letters

1364
01:14:21,030 --> 01:14:26,510
back-to-back, and put anyone's name
that starts with D-A in one list;

1365
01:14:26,510 --> 01:14:32,310
and D-B, if there is any, in a second
list; and D-C, if there's any of those,

1366
01:14:32,310 --> 01:14:36,930
in some third list altogether;
and D-D and D-E and D-F

1367
01:14:36,930 --> 01:14:42,870
and so forth, and actually have multiple
combinations of every two letters,

1368
01:14:42,870 --> 01:14:47,790
and have as many buckets, so to
speak, as many indexes in your array

1369
01:14:47,790 --> 01:14:51,840
as there are pairs of
two alphabetical letters.
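
A two-letter variant of the hash function could be sketched as follows. The name `hash_two_letters` and the particular formula (first letter times 26 plus second letter) are assumptions for illustration, as is the requirement that every name has at least two English letters.

```python
NUM_BUCKETS = 26 * 26  # 676 buckets, one per pair of letters


def hash_two_letters(name):
    """Map a name to an index 0 through 675 using its first two letters."""
    first = ord(name[0].upper()) - ord("A")
    second = ord(name[1].upper()) - ord("A")
    return first * 26 + second


print(hash_two_letters("David"))  # 78: D is 3, A is 0, so 3 * 26 + 0
print(hash_two_letters("Doug"))   # 92: D is 3, O is 14, so 3 * 26 + 14
```

David and Doug now land in different buckets, at the cost of an array 26 times larger.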

1370
01:14:51,840 --> 01:14:53,730
Now to be fair, you
might have two people

1371
01:14:53,730 --> 01:14:58,770
whose names start with D-A or D-O,
but hopefully there's even fewer.

1372
01:14:58,770 --> 01:15:00,510
And indeed, I say a hash table--

1373
01:15:00,510 --> 01:15:04,440
this whole structure approximates
the idea of constant time

1374
01:15:04,440 --> 01:15:10,680
because it can devolve in places to
linear time with longer lists of names.

1375
01:15:10,680 --> 01:15:14,520
But if your hash function is good
and you don't have these collisions,

1376
01:15:14,520 --> 01:15:19,650
and therefore ideally you don't have
any linked lists, just names, then

1377
01:15:19,650 --> 01:15:23,010
you indeed have a structure that
gives you constant time access,

1378
01:15:23,010 --> 01:15:25,710
ultimately, combining
all of these underlying

1379
01:15:25,710 --> 01:15:29,250
principles of dynamic
growth and random access

1380
01:15:29,250 --> 01:15:34,680
to achieve ultimately the
storage of all your values.

1381
01:15:34,680 --> 01:15:39,150
How, then, might a language like Python
implement data types like int and str?

1382
01:15:39,150 --> 01:15:41,130
Well in the case of
Python's latest version,

1383
01:15:41,130 --> 01:15:45,040
it allows ints to grow as
big as you need them to be.

1384
01:15:45,040 --> 01:15:49,530
And so it surely can't only be using
contiguous memory once allocated

1385
01:15:49,530 --> 01:15:51,190
that stays in the same place.

1386
01:15:51,190 --> 01:15:54,590
If instead you want a
number to grow over time,

1387
01:15:54,590 --> 01:15:58,890
well you're probably going to need to
allocate some variable number of bytes

1388
01:15:58,890 --> 01:15:59,850
in that memory.

1389
01:15:59,850 --> 01:16:01,020
Strings, too, as well.

1390
01:16:01,020 --> 01:16:04,540
If you want to allocate strings, you're
going to need to allow them to grow,

1391
01:16:04,540 --> 01:16:07,080
which means finding
extra space in proximity

1392
01:16:07,080 --> 01:16:10,890
to the characters you already have, or
maybe relocating the whole structure

1393
01:16:10,890 --> 01:16:13,000
so that that value can keep growing.

1394
01:16:13,000 --> 01:16:15,960
But we know now, we can do
this with our canvas of memory.

1395
01:16:15,960 --> 01:16:19,530
How the particular language does it
isn't even necessarily of interest,

1396
01:16:19,530 --> 01:16:23,850
we just know that it can, and
even underneath the hood, how

1397
01:16:23,850 --> 01:16:25,020
it might do so.

1398
01:16:25,020 --> 01:16:29,040
As for these other structures in Python
like dict or dictionary and list,

1399
01:16:29,040 --> 01:16:32,760
well those, too, are exactly
what we've seen here.

1400
01:16:32,760 --> 01:16:36,960
A dictionary in Python is really just
a hash table, some sort of variable

1401
01:16:36,960 --> 01:16:40,770
that has indexes that are not
necessarily numbers, but words,

1402
01:16:40,770 --> 01:16:43,950
and via those words can
you get back a value.

1403
01:16:43,950 --> 01:16:47,580
Indeed, more generally does a
hash table have keys and values.

1404
01:16:47,580 --> 01:16:51,100
The keys are the inputs via
which you produce those outputs.

1405
01:16:51,100 --> 01:16:54,930
So in our data structure, the inputs
might have been names.

1406
01:16:54,930 --> 01:16:59,710
The output of my hash function was
an index value like some number.

1407
01:16:59,710 --> 01:17:03,390
And in Python do you have a
wonderful abstraction in code that

1408
01:17:03,390 --> 01:17:06,540
allows you to express that
idea of associating keys

1409
01:17:06,540 --> 01:17:10,200
with values, names with
yes or no, true or false

1410
01:17:10,200 --> 01:17:14,520
they are present so that you can ask
those questions yourself in your code.

1411
01:17:14,520 --> 01:17:16,740
And as for list, it's quite simply that.

1412
01:17:16,740 --> 01:17:19,440
It's the idea of an array
but with that added dynamism,

1413
01:17:19,440 --> 01:17:22,390
and as such, a linked list of sorts.
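
In Python code, those two abstractions might be used as follows. The variable names and values are illustrative only, and the sketch shows how `dict` and `list` are used, not how any particular Python implementation stores them internally.

```python
# A dict associates keys (names) with values (here, True for "present").
present = {"Alice": True, "Bob": True, "David": True}
present["Doug"] = True               # grows dynamically, no size declared

print("Doug" in present)             # True
print(present.get("Mike", False))    # False: Mike was never added

# A list can be indexed like an array, yet can still grow and shrink.
numbers = [4, 8, 15, 16, 23, 42]
numbers.append(50)
numbers.remove(15)

print(numbers[0], numbers[-1])       # 4 50
print(len(numbers))                  # 6
```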

1414
01:17:22,390 --> 01:17:27,390
And so now at this higher level of code
can you not only think computationally,

1415
01:17:27,390 --> 01:17:31,050
but express yourself
computationally knowing and trusting

1416
01:17:31,050 --> 01:17:32,970
that the computer can do that bidding.

1417
01:17:32,970 --> 01:17:35,250
How the data structures
are organized really

1418
01:17:35,250 --> 01:17:37,950
is the secret sauce of
these languages and tools,

1419
01:17:37,950 --> 01:17:42,240
and indeed, when you have some
database or backend system, too,

1420
01:17:42,240 --> 01:17:45,360
the intellectual property
that underlies those systems

1421
01:17:45,360 --> 01:17:47,820
ultimately boils down not
only to the algorithms

1422
01:17:47,820 --> 01:17:50,100
in use, but also the data structures.

1423
01:17:50,100 --> 01:17:53,340
Because together, they--
and we've seen this--

1424
01:17:53,340 --> 01:17:57,660
together combine to produce not only
the correctness of answers you want,

1425
01:17:57,660 --> 01:18:01,820
but the efficiency with which
you can get to those answers.

1426
01:18:01,820 --> 01:18:06,064