1
00:00:00,000 --> 00:00:11,270

2
00:00:11,270 --> 00:00:14,910
>> SPEAKER: All right, this is CS50.

3
00:00:14,910 --> 00:00:19,020
This is the end of week three, and if
you haven't taken advantage already,

4
00:00:19,020 --> 00:00:21,790
know that there will be lunch
this Friday as usual, where

5
00:00:21,790 --> 00:00:25,430
you can enjoy good conversation
and food at Fire and Ice

6
00:00:25,430 --> 00:00:27,980
with some of CS50's
staff and classmates.

7
00:00:27,980 --> 00:00:30,170
Head to this URL here.

8
00:00:30,170 --> 00:00:33,420
>> Now you may recall, or you
may soon be acquainted with,

9
00:00:33,420 --> 00:00:35,970
these things here, which
are given out at the end

10
00:00:35,970 --> 00:00:37,850
of the semester for many classes.

11
00:00:37,850 --> 00:00:40,870
So-called exam blue books, in which
you write your answers to exams.

12
00:00:40,870 --> 00:00:44,240
Now I have here 26 such
blue books, on each of them

13
00:00:44,240 --> 00:00:47,580
is written a name, A through Z. And
indeed the names are that simple, A

14
00:00:47,580 --> 00:00:50,490
through Z. And one of
the goals at hand today

15
00:00:50,490 --> 00:00:53,910
is going to be to continue what
we started on Monday, which is not

16
00:00:53,910 --> 00:00:57,830
so much looking at code, but really
looking at ideas and problem solving.

17
00:00:57,830 --> 00:01:00,170
One of the goals and
promises of this course

18
00:01:00,170 --> 00:01:02,985
is to teach you to think more
carefully, more methodically,

19
00:01:02,985 --> 00:01:05,400
and to solve problems more efficiently.

20
00:01:05,400 --> 00:01:09,526
And indeed, we can do that really
without even touching a line of code.

21
00:01:09,526 --> 00:01:12,150
So I have a couple of elephants
up here today, orange and blue,

22
00:01:12,150 --> 00:01:15,780
if we could get one volunteer,
maybe from farther back than usual.

23
00:01:15,780 --> 00:01:18,070
How about right there, come on down.

24
00:01:18,070 --> 00:01:24,180
The goal of which is going to be to
help plus administer this exam here.

25
00:01:24,180 --> 00:01:24,935
What's your name?

26
00:01:24,935 --> 00:01:25,768
>> AUDIENCE: Mary Beth.

27
00:01:25,768 --> 00:01:27,560
SPEAKER: Mary Beth, come on up.

28
00:01:27,560 --> 00:01:29,560
Let me get the microphone here for you.

29
00:01:29,560 --> 00:01:32,172

30
00:01:32,172 --> 00:01:32,880
Nice to meet you.

31
00:01:32,880 --> 00:01:34,005
>> AUDIENCE: Nice to meet you.

32
00:01:34,005 --> 00:01:36,790
SPEAKER: All right, so I have
here blue books A through Z,

33
00:01:36,790 --> 00:01:41,680
and I'm going to pretend that
I have one of the students,

34
00:01:41,680 --> 00:01:45,770
and they're coming in somewhat randomly
at the end of a three hour exam block,

35
00:01:45,770 --> 00:01:49,400
so they're ending up in some
semi-random order like this.

36
00:01:49,400 --> 00:01:54,510
Now your job in just a moment is going
to be-- this is actually how they get

37
00:01:54,510 --> 00:01:56,820
turned in at the end of
the class, most likely.

38
00:01:56,820 --> 00:02:01,120
Your job now is going to be, quite
simply, to sort these blue books for us

39
00:02:01,120 --> 00:02:05,220
from A through Z.

40
00:02:05,220 --> 00:02:08,400
>> AUDIENCE: Oh, this is
going to take forever.

41
00:02:08,400 --> 00:02:13,747
>> SPEAKER: And we will watch
as you do this, no pressure.

42
00:02:13,747 --> 00:02:15,330
AUDIENCE: No, no pressure or anything.

43
00:02:15,330 --> 00:02:19,230

44
00:02:19,230 --> 00:02:23,570
>> SPEAKER: And for fun,
let's put up a timer.

45
00:02:23,570 --> 00:02:26,680

46
00:02:26,680 --> 00:02:28,700
>> AUDIENCE: So much fun, so much fun.

47
00:02:28,700 --> 00:02:36,741

48
00:02:36,741 --> 00:02:38,574
>> SPEAKER: I can hold the mic for you.

49
00:02:38,574 --> 00:02:40,240
All right, we've just doubled our speed.

50
00:02:40,240 --> 00:02:44,190

51
00:02:44,190 --> 00:02:49,060
So in the meantime, let me pose what's
going to be the question for Mary Beth

52
00:02:49,060 --> 00:02:51,540
is what is she doing, how is
she going about solving this?

53
00:02:51,540 --> 00:02:54,040
And in fact, you might not have
ever thought about something

54
00:02:54,040 --> 00:02:57,440
so simple as when you pick
up 26 books like this,

55
00:02:57,440 --> 00:02:59,350
which do have a natural
ordering to them.

56
00:02:59,350 --> 00:03:01,335
What is the process
that you actually use?

57
00:03:01,335 --> 00:03:03,770
Is it fairly random just
picking the first one you see

58
00:03:03,770 --> 00:03:05,250
and putting it in its place?

59
00:03:05,250 --> 00:03:09,680
Do you first move your hands around
looking for A then looking for B?

60
00:03:09,680 --> 00:03:11,722
Do you take a look at a
pair of them side by side

61
00:03:11,722 --> 00:03:14,680
and just say, wait a minute, this
isn't right, and then swap the order?

62
00:03:14,680 --> 00:03:16,960
We saw already on Monday
that there's a number of ways

63
00:03:16,960 --> 00:03:22,140
in which we can do this, and
indeed as we near the end here,

64
00:03:22,140 --> 00:03:26,360
I would take note perhaps
of what Mary Beth is doing.

65
00:03:26,360 --> 00:03:30,040
We have a few piles it seems, a
bigger one, three smaller ones.

66
00:03:30,040 --> 00:03:33,790

67
00:03:33,790 --> 00:03:36,415
>> AUDIENCE: I'm ordering them
when I find two letters

68
00:03:36,415 --> 00:03:39,540
that I know are together in a sequence,
I put them together so that I don't

69
00:03:39,540 --> 00:03:42,915
have to worry about keeping
track of a whole row of books.

70
00:03:42,915 --> 00:03:45,706
It's just, oh, A is first,
I've got this stack here.

71
00:03:45,706 --> 00:03:47,580
SPEAKER: So, almost like
a puzzle pieces that

72
00:03:47,580 --> 00:03:49,860
have the right shape to
match up with each other.

73
00:03:49,860 --> 00:03:51,026
AUDIENCE: Pretty much, yeah.

74
00:03:51,026 --> 00:03:55,320
SPEAKER: OK, excellent.

75
00:03:55,320 --> 00:03:59,850
And now each of these
piles is presumably sorted?

76
00:03:59,850 --> 00:04:00,990
>> AUDIENCE: Yeah.

77
00:04:00,990 --> 00:04:09,900
>> SPEAKER: All right, A through Z. All
right, congratulations, you did it.

78
00:04:09,900 --> 00:04:11,461
You have your choice.

79
00:04:11,461 --> 00:04:11,960
Blue?

80
00:04:11,960 --> 00:04:13,530
All right, thank you for that.

81
00:04:13,530 --> 00:04:16,679
So Mary Beth did propose
what her approach was,

82
00:04:16,679 --> 00:04:19,720
but what is another approach how you
might go about sorting these things?

83
00:04:19,720 --> 00:04:21,130
What would you have done?

84
00:04:21,130 --> 00:04:24,060
The record to beat would have been
one minute and 50 or so seconds,

85
00:04:24,060 --> 00:04:26,039
plus the ones I forgot to count.

86
00:04:26,039 --> 00:04:27,080
What would you have done?

87
00:04:27,080 --> 00:04:27,579
Yeah?

88
00:04:27,579 --> 00:04:28,735
AUDIENCE: Take the stack.

89
00:04:28,735 --> 00:04:29,776
Start from the beginning.

90
00:04:29,776 --> 00:04:32,284
Check your papers.

91
00:04:32,284 --> 00:04:36,586
And if the top one is higher
than, maybe, they are,

92
00:04:36,586 --> 00:04:38,980
the bottom one is
higher, then switch them.

93
00:04:38,980 --> 00:04:41,300
>> SPEAKER: OK, so starting
at the top and the bottom,

94
00:04:41,300 --> 00:04:43,716
and then working your way
inward like that, swapping them?

95
00:04:43,716 --> 00:04:46,580
OK, so a little similar
in spirit to bubble sort,

96
00:04:46,580 --> 00:04:49,160
but choosing the extremes
not the adjacent pairs.

97
00:04:49,160 --> 00:04:52,080
But the short of it is that there's
surely a bunch of different ways

98
00:04:52,080 --> 00:04:54,210
we could do this, and
frankly, I think you kind of

99
00:04:54,210 --> 00:04:55,700
adopted a couple approaches, right?

100
00:04:55,700 --> 00:05:00,567
You made sort of four sorted piles, and
then effectively merged them together.

101
00:05:00,567 --> 00:05:02,650
And that's, daresay, another
technique altogether.

102
00:05:02,650 --> 00:05:06,950
You didn't treat it as one big pile,
you divided the problem into four quads,

103
00:05:06,950 --> 00:05:09,820
if you will, and then somehow
merged them in the end.

104
00:05:09,820 --> 00:05:13,410
>> So let's consider, ultimately,
how else we might do this.

105
00:05:13,410 --> 00:05:15,860
We formalized the notion
of bubble sort last time,

106
00:05:15,860 --> 00:05:18,780
and bubble sort recall was an
algorithm that we visualized

107
00:05:18,780 --> 00:05:22,640
with eight of your classmates up here,
seemingly randomly sorted at first.

108
00:05:22,640 --> 00:05:26,110
And we then decided pairwise, if
two elements are out of order,

109
00:05:26,110 --> 00:05:26,950
simply swap them.

110
00:05:26,950 --> 00:05:28,930
So four and two are
obviously out of order,

111
00:05:28,930 --> 00:05:31,080
so those two classmates
switched positions.

112
00:05:31,080 --> 00:05:35,390
And then we repeated with four and six,
then six and eight, on each iteration,

113
00:05:35,390 --> 00:05:36,980
moving to the right.

114
00:05:36,980 --> 00:05:42,590
>> So given eight people, how many pairwise
comparisons did I do while walking from

115
00:05:42,590 --> 00:05:45,220
left to right in one such iteration?

116
00:05:45,220 --> 00:05:48,410
How many comparisons?

117
00:05:48,410 --> 00:05:49,197
Seven, right?

118
00:05:49,197 --> 00:05:51,405
Because if there's eight
people but you have the pair

119
00:05:51,405 --> 00:05:53,880
them and you keep moving
one hop to the right,

120
00:05:53,880 --> 00:05:56,060
you're not going to have eight
comparisons because you can't compare

121
00:05:56,060 --> 00:05:59,226
an element against itself, or it would
just be pointless, so you have seven.

122
00:05:59,226 --> 00:06:01,290
Or more generally, if
we have n people, we

123
00:06:01,290 --> 00:06:04,300
do n minus 1 comparisons
with bubble sort.

124
00:06:04,300 --> 00:06:08,150
>> So let's consider now how good or
bad bubble sort actually was, and try

125
00:06:08,150 --> 00:06:13,570
to give ourselves vocabulary with
which to critique algorithms like this,

126
00:06:13,570 --> 00:06:14,430
and soon our own.

127
00:06:14,430 --> 00:06:16,970
So the first pass through
bubble sort, the first time

128
00:06:16,970 --> 00:06:20,909
I walked from left to right across the
stage, took me n minus 1 comparisons.

129
00:06:20,909 --> 00:06:22,950
And that's going to be my
unit of measure, right?

130
00:06:22,950 --> 00:06:26,170
I was kind of talking and strolling,
somewhat fast, somewhat slow,

131
00:06:26,170 --> 00:06:29,300
so counting my number of seconds
isn't particularly telling,

132
00:06:29,300 --> 00:06:32,260
but counting the number of
operations that I did on Monday,

133
00:06:32,260 --> 00:06:35,900
comparing two people, that feels
like a nice unit of measure.

134
00:06:35,900 --> 00:06:40,980
>> So n minus 1 steps the first time,
but then what happened after that?

135
00:06:40,980 --> 00:06:46,610
What's the one upside of one pass
through an otherwise unsorted list?

136
00:06:46,610 --> 00:06:49,840
What can you tell me about the element
who was all the way over there?

137
00:06:49,840 --> 00:06:51,300
Yeah?

138
00:06:51,300 --> 00:06:52,870
That was the biggest element, right?

139
00:06:52,870 --> 00:06:55,710
Number eight, even though she
started here, every time I

140
00:06:55,710 --> 00:06:57,860
compared her against
a neighbor, she kept

141
00:06:57,860 --> 00:07:00,480
bubbling up to the right
hand side of the list.

142
00:07:00,480 --> 00:07:02,710
And indeed, that's where
the algorithm gets its name.

143
00:07:02,710 --> 00:07:07,630
>> Now by that logic, how many comparisons
need I make on the second time

144
00:07:07,630 --> 00:07:09,800
I make that pass from left to right?

145
00:07:09,800 --> 00:07:10,730
n minus 2, right?

146
00:07:10,730 --> 00:07:14,297
It would just be wasting my time if I
keep comparing eight against someone

147
00:07:14,297 --> 00:07:16,630
else because we already know
she was in the right place.

148
00:07:16,630 --> 00:07:19,760
So that's a bit of an
optimization, so the next pass

149
00:07:19,760 --> 00:07:23,899
is going to be plus n minus two steps,
where n is the number of people.

150
00:07:23,899 --> 00:07:26,940
Now you can kind of extrapolate, even
if you're not a computer scientist,

151
00:07:26,940 --> 00:07:27,680
how this ends.

152
00:07:27,680 --> 00:07:31,259
At the end of this algorithm, presumably
you've got just one comparison left.

153
00:07:31,259 --> 00:07:33,800
You have to kind of fix the
beginning of the list in case two

154
00:07:33,800 --> 00:07:36,540
and one are out of order
and should be one and two,

155
00:07:36,540 --> 00:07:40,330
so this bottoms out at
plus 1 final comparison.

156
00:07:40,330 --> 00:07:44,500
>> Now the dot, dot, dot kind of waves it's
hands at some of the juicier details,

157
00:07:44,500 --> 00:07:46,452
but let's just go ahead and simplify.

158
00:07:46,452 --> 00:07:48,660
If you recall from high
school, frankly, a lot of you

159
00:07:48,660 --> 00:07:50,340
had math books that had
a little cheat sheet

160
00:07:50,340 --> 00:07:52,550
on the front cover or the
back cover that showed you

161
00:07:52,550 --> 00:07:56,400
what series summations like
this ultimately added up to.

162
00:07:56,400 --> 00:07:59,600
In the general case, if you have a
variable like n, and indeed this one,

163
00:07:59,600 --> 00:08:01,634
if you looked at your
old school math book,

164
00:08:01,634 --> 00:08:04,050
you would see that this actually
adds up to this sum here,

165
00:08:04,050 --> 00:08:07,970
n times n minus 1 all divided by 2.

166
00:08:07,970 --> 00:08:11,172
So for now let me just stipulate
this is true, so on a leap of faith,

167
00:08:11,172 --> 00:08:12,880
that's what this sums
up to, and we could

168
00:08:12,880 --> 00:08:14,341
prove that in a more general case.

169
00:08:14,341 --> 00:08:15,590
But now let's expand this out.

170
00:08:15,590 --> 00:08:19,920
So let's multiply this out, so that's
n squared, minus n, all divided by 2.

171
00:08:19,920 --> 00:08:23,200
That's really n squared,
divided by 2, minus n over 2,

172
00:08:23,200 --> 00:08:25,010
so that's all nice and interesting.

173
00:08:25,010 --> 00:08:27,060
But what happens if we
now plug-in a value?

174
00:08:27,060 --> 00:08:29,724
Suppose I didn't have eight
people, but say a million.

175
00:08:29,724 --> 00:08:31,890
And a million just because
it's a pretty big number,

176
00:08:31,890 --> 00:08:34,039
let's plug that in and see what happens.

177
00:08:34,039 --> 00:08:39,039
So if I plug a million into that formula
I'm going to get a million squared,

178
00:08:39,039 --> 00:08:42,868
divided by 2, minus a
million, divided by 2.

179
00:08:42,868 --> 00:08:44,159
Now what's that going to equal?

180
00:08:44,159 --> 00:08:47,354
So 500 billion, minus 500,000.

181
00:08:47,354 --> 00:08:49,270
And if I actually do
that math out, that means

182
00:08:49,270 --> 00:08:53,920
that sorting a million
people with the bubble sort

183
00:08:53,920 --> 00:09:01,800
might take me 499,999,500,000
steps or comparisons in the end,

184
00:09:01,800 --> 00:09:02,900
we're just extrapolating.

185
00:09:02,900 --> 00:09:06,860
>> That feels pretty slow, but frankly
measuring one particular input

186
00:09:06,860 --> 00:09:09,160
like this, isn't all that telling.

187
00:09:09,160 --> 00:09:14,050
But indeed it does suggest that as n
gets larger and larger, this algorithm

188
00:09:14,050 --> 00:09:16,280
kind of feels worse and
worse, or you really

189
00:09:16,280 --> 00:09:20,450
start to feel the pain of that
exponentiation, that n squared,

190
00:09:20,450 --> 00:09:21,770
which adds up pretty fast.

191
00:09:21,770 --> 00:09:25,340
And this detail isn't
lost on people, in fact

192
00:09:25,340 --> 00:09:29,640
some years ago a certain senator who was
campaigning, sat down for an interview

193
00:09:29,640 --> 00:09:32,180
with Google's Eric
Schmidt, CEO at the time,

194
00:09:32,180 --> 00:09:36,380
and was challenged with a question
much like we're exploring today.

195
00:09:36,380 --> 00:09:38,468
Let's take a look.

196
00:09:38,468 --> 00:09:45,280
>> [VIDEO PLAYBACK]

197
00:09:45,280 --> 00:09:48,560
>> -Senator, you're here
at Google, and I like

198
00:09:48,560 --> 00:09:53,382
to think of the presidency
as a job interview.

199
00:09:53,382 --> 00:09:56,434
Now, it's hard to get
a job as president,

200
00:09:56,434 --> 00:09:58,100
and you're going through the rigors now.

201
00:09:58,100 --> 00:10:01,860
It's also hard to get a job at Google.

202
00:10:01,860 --> 00:10:05,490
We have questions, and we
ask our candidates questions,

203
00:10:05,490 --> 00:10:09,770
and this one is from Larry Schwimmer.

204
00:10:09,770 --> 00:10:14,760
What-- you guys think I'm
kidding, it's right here.

205
00:10:14,760 --> 00:10:17,930
What is the most efficient way to
sort a million 32-bit integers?

206
00:10:17,930 --> 00:10:21,800

207
00:10:21,800 --> 00:10:24,350
>> -Well--

208
00:10:24,350 --> 00:10:25,200
>> -I'm sorry, maybe--

209
00:10:25,200 --> 00:10:27,400
>> -No, no, no.

210
00:10:27,400 --> 00:10:30,700
I think the bubble sort
would be the wrong way to go.

211
00:10:30,700 --> 00:10:34,165

212
00:10:34,165 --> 00:10:38,180
>> -Come on, who told him this?

213
00:10:38,180 --> 00:10:40,590
I didn't see computer
science in your background.

214
00:10:40,590 --> 00:10:42,130
>> -We've got our spies in there.

215
00:10:42,130 --> 00:10:44,930

216
00:10:44,930 --> 00:10:48,444
>> -OK, let's ask a different
interview question.

217
00:10:48,444 --> 00:10:49,300
>> [END VIDEO PLAYBACK]

218
00:10:49,300 --> 00:10:52,290
>> SPEAKER: So talking about
specific numbers though,

219
00:10:52,290 --> 00:10:53,890
isn't going to be all that useful.

220
00:10:53,890 --> 00:10:56,810
It is not a life lesson that bubble
sort, given a million inputs,

221
00:10:56,810 --> 00:10:58,590
might take as many as 500 billion steps.

222
00:10:58,590 --> 00:11:01,120
You can't really generalize
too effectively from that

223
00:11:01,120 --> 00:11:03,560
and make good design decisions
when writing programs.

224
00:11:03,560 --> 00:11:07,070
So let's focus though on how
we might simplify this result.

225
00:11:07,070 --> 00:11:11,780
>> So I've highlighted in yellow here
the result of n squared divided by 2,

226
00:11:11,780 --> 00:11:14,330
so a million squared
divided by 2, and then

227
00:11:14,330 --> 00:11:16,710
I've highlighted what
the ultimate answer was

228
00:11:16,710 --> 00:11:20,180
once we subtracted off n divided by 2.

229
00:11:20,180 --> 00:11:24,850
And the claim I'm going to make now is,
who the heck cares if you subtract off

230
00:11:24,850 --> 00:11:30,060
a little old n over 2 when the first
part of this formula is so much bigger?

231
00:11:30,060 --> 00:11:33,910
It dominates the other
term, n squared divided by 2

232
00:11:33,910 --> 00:11:37,510
is so much bigger, clearly, as
n gets large like a million,

233
00:11:37,510 --> 00:11:41,450
that is there really a big difference at
the end of the day between 500 billion

234
00:11:41,450 --> 00:11:45,730
and 499,999,500,000?

235
00:11:45,730 --> 00:11:46,349
Not really.

236
00:11:46,349 --> 00:11:48,640
And so what we're going to
do as computer scientists is

237
00:11:48,640 --> 00:11:53,270
ignore those lower order terms and
take something like this and really

238
00:11:53,270 --> 00:11:56,050
just simplify it to the
term that's going to matter.

239
00:11:56,050 --> 00:12:00,315
The bigger our data sets get, the bigger
our databases get, the more web pages

240
00:12:00,315 --> 00:12:02,690
we have to search, the more
friends you have on Facebook.

241
00:12:02,690 --> 00:12:07,340
>> As n gets larger, we're really
going to care about the largest

242
00:12:07,340 --> 00:12:11,560
term in any such analysis of
our algorithms performance.

243
00:12:11,560 --> 00:12:16,230
And I'm going to say, you know what,
bubble sort is on the order of big O,

244
00:12:16,230 --> 00:12:18,060
on the order of n squared.

245
00:12:18,060 --> 00:12:20,090
It's not exactly n
squared as we've seen,

246
00:12:20,090 --> 00:12:22,060
but who really cares
about those smaller terms,

247
00:12:22,060 --> 00:12:24,390
and frankly, who really
cares if we divide by 2?

248
00:12:24,390 --> 00:12:25,870
That's just a constant factor.

249
00:12:25,870 --> 00:12:29,480
And is 500 billion versus 250
billion really that big of a deal?

250
00:12:29,480 --> 00:12:32,190
I could just wait one year,
let my laptop literally

251
00:12:32,190 --> 00:12:34,810
get twice as fast in hardware,
and that sort of difference

252
00:12:34,810 --> 00:12:36,650
just goes away naturally over time.

253
00:12:36,650 --> 00:12:39,300
>> What we care about is
the expression, the part

254
00:12:39,300 --> 00:12:42,489
of the expression that's going to vary
as our input gets bigger and bigger.

255
00:12:42,489 --> 00:12:45,280
And indeed, in the real world,
that's what's happening increasingly

256
00:12:45,280 --> 00:12:48,330
is the inputs to our problems and
algorithms are getting bigger.

257
00:12:48,330 --> 00:12:53,470
So big O is going to be the notation,
the asymptotic notation, that we just

258
00:12:53,470 --> 00:12:57,160
use as computer scientists to describe
the performance, or the running time,

259
00:12:57,160 --> 00:12:58,130
of an algorithm.

260
00:12:58,130 --> 00:13:00,800
So that we can compare algorithms
on different computers written

261
00:13:00,800 --> 00:13:04,170
by different people, by using
some fundamentally similar metric

262
00:13:04,170 --> 00:13:07,557
like the number of comparisons you're
making, or maybe the number of swaps

263
00:13:07,557 --> 00:13:08,140
you're making.

264
00:13:08,140 --> 00:13:11,910
>> What we're not going to
count is the amount of time

265
00:13:11,910 --> 00:13:13,981
that passes on the clock
on the wall typically.

266
00:13:13,981 --> 00:13:16,230
What we're not going to worry
about is how much memory

267
00:13:16,230 --> 00:13:17,820
you're using today at
least, though that's

268
00:13:17,820 --> 00:13:19,370
another resource we might measure.

269
00:13:19,370 --> 00:13:23,610
We're going to try to base our analyses
on just the basic operations, the ones,

270
00:13:23,610 --> 00:13:25,930
frankly, that you can see most visually.

271
00:13:25,930 --> 00:13:30,700
So with something like big O of n
squared, I claim that O of n squared

272
00:13:30,700 --> 00:13:35,820
is an upper bound on the so-called
running time of bubble sort.

273
00:13:35,820 --> 00:13:38,820
In other words, if you
wanted to claim that there's

274
00:13:38,820 --> 00:13:41,370
this upper limit on how many
steps an algorithm might take,

275
00:13:41,370 --> 00:13:46,240
it's going to be in the big O of n
squared in this case, an upper bound.

276
00:13:46,240 --> 00:13:49,710
>> What if I instead change the
story to be not about bubble sort,

277
00:13:49,710 --> 00:13:50,910
but about this upper bound.

278
00:13:50,910 --> 00:13:54,030
Can you think of an algorithm
that we've looked at already

279
00:13:54,030 --> 00:13:59,530
whose upper bound, maximum
measure of time or operations,

280
00:13:59,530 --> 00:14:04,300
would be said to be bounded
by n, a linear function,

281
00:14:04,300 --> 00:14:07,260
not a quadratic one that's curved?

282
00:14:07,260 --> 00:14:10,780
What's an algorithm that
always takes no more

283
00:14:10,780 --> 00:14:12,860
than like n steps, or
2n steps, or 3n steps?

284
00:14:12,860 --> 00:14:13,360
Yeah?

285
00:14:13,360 --> 00:14:15,030
>> AUDIENCE: Finding the
biggest number in a list?

286
00:14:15,030 --> 00:14:16,930
>> SPEAKER: Perfect, finding
the biggest number in a list.

287
00:14:16,930 --> 00:14:18,940
If I'm given a list of
people for instance,

288
00:14:18,940 --> 00:14:21,440
each of who is holding a number,
what is the maximum number

289
00:14:21,440 --> 00:14:23,770
of steps it should take me,
a reasonably smart person,

290
00:14:23,770 --> 00:14:27,530
to find the largest person in that list?

291
00:14:27,530 --> 00:14:28,100
n, right?

292
00:14:28,100 --> 00:14:31,320
Because in the worst case, where
might the biggest value be?

293
00:14:31,320 --> 00:14:32,700
Right, all the way at the end.

294
00:14:32,700 --> 00:14:34,575
So in the worst case
upper bound, I might

295
00:14:34,575 --> 00:14:36,450
have to go all the way
over here and be like,

296
00:14:36,450 --> 00:14:39,170
oh, here's number eight,
or whatever that value is.

297
00:14:39,170 --> 00:14:41,330
Now it would just be stupid
if I kept going, right?

298
00:14:41,330 --> 00:14:43,840
Looking for more and more elements
if the last of them is over there?

299
00:14:43,840 --> 00:14:45,340
So surely, n is an upper bound.

300
00:14:45,340 --> 00:14:47,420
I don't need to take
more steps than that.

301
00:14:47,420 --> 00:14:51,580
>> So what if instead I proposed that
there are algorithms in this world that

302
00:14:51,580 --> 00:14:57,750
have a running time that's
bounded by big O of log n, log n?

303
00:14:57,750 --> 00:15:00,390
Where have we seen this before?

304
00:15:00,390 --> 00:15:00,890
Yeah?

305
00:15:00,890 --> 00:15:03,309
>> AUDIENCE: In the phone book problem?

306
00:15:03,309 --> 00:15:04,850
SPEAKER: Like the phone book problem.

307
00:15:04,850 --> 00:15:07,754
What was the measure of how
much time or how many tears it

308
00:15:07,754 --> 00:15:10,170
took me to find someone like
Mike Smith in the phone book?

309
00:15:10,170 --> 00:15:13,212
We claimed it was log n, and
even if unfamiliar or it it's

310
00:15:13,212 --> 00:15:15,170
a little hazy what a
logarithm or exponent was,

311
00:15:15,170 --> 00:15:17,650
just remember that log n
generally refers to the process,

312
00:15:17,650 --> 00:15:20,790
in this case, of dividing
something in half again, and again,

313
00:15:20,790 --> 00:15:25,790
and again, and again, such that it
gets increasingly small as you do that.

314
00:15:25,790 --> 00:15:28,470
>> So log of n refers, sure,
to the phone book example,

315
00:15:28,470 --> 00:15:32,662
to binary search in theory, when we
had the virtual doors on the board,

316
00:15:32,662 --> 00:15:34,370
or when Sean was
searching for something.

317
00:15:34,370 --> 00:15:37,374
If he had used binary search, log n
would be the upper bound on how much

318
00:15:37,374 --> 00:15:38,040
time that takes.

319
00:15:38,040 --> 00:15:44,027
But those algorithms that ran in
log n assumed what key detail?

320
00:15:44,027 --> 00:15:45,360
That the list was sorted, right?

321
00:15:45,360 --> 00:15:47,789
Your algorithm is wrong if
your input is not sorted,

322
00:15:47,789 --> 00:15:49,830
and yet you're using
something like binary search

323
00:15:49,830 --> 00:15:51,704
because you might jump
right over the element

324
00:15:51,704 --> 00:15:53,600
without realizing it's indeed there.

325
00:15:53,600 --> 00:15:55,600
>> Now what might this mean, big O of one?

326
00:15:55,600 --> 00:15:59,117
This doesn't mean that your algorithm
takes one and only one step,

327
00:15:59,117 --> 00:16:01,200
it just means it takes a
constant number of steps.

328
00:16:01,200 --> 00:16:04,060
Maybe it's 1, maybe it's
10, maybe it's 1,000,

329
00:16:04,060 --> 00:16:07,750
but it's independent of
the size of the problem.

330
00:16:07,750 --> 00:16:10,850
No matter how big n is,
a constant time algorithm

331
00:16:10,850 --> 00:16:12,747
always takes the same number of steps.

332
00:16:12,747 --> 00:16:15,080
So what might be an algorithm
we've talked about or just

333
00:16:15,080 --> 00:16:20,418
intuitively that comes to you that
always runs in so-called constant time?

334
00:16:20,418 --> 00:16:20,918
Yeah?

335
00:16:20,918 --> 00:16:22,001
>> AUDIENCE: Add two numbers.

336
00:16:22,001 --> 00:16:25,320
SPEAKER: Add two numbers,
2 plus 2 equals 4, done.

337
00:16:25,320 --> 00:16:27,227
So that might work, what else?

338
00:16:27,227 --> 00:16:28,560
How about more real world, yeah?

339
00:16:28,560 --> 00:16:30,686
>> AUDIENCE: Finding the
first thing in a list.

340
00:16:30,686 --> 00:16:32,810
SPEAKER: Finding the first
element in a list, sure.

341
00:16:32,810 --> 00:16:34,540
We've actually been talking
about arrays already,

342
00:16:34,540 --> 00:16:36,540
how do you get at the
first element in an array,

343
00:16:36,540 --> 00:16:40,465
no matter how long the
array is in C code?

344
00:16:40,465 --> 00:16:43,090
You just use like the bracket
zero notation, bam, you're there.

345
00:16:43,090 --> 00:16:46,120
And indeed arrays, as an aside,
support something generally known

346
00:16:46,120 --> 00:16:49,240
as random access, random access
memory, because you can literally

347
00:16:49,240 --> 00:16:50,284
jump to any one place.

348
00:16:50,284 --> 00:16:52,700
We can do this even more simply
we can rewind to week zero

349
00:16:52,700 --> 00:16:53,900
when we did Scratch.

350
00:16:53,900 --> 00:16:59,707
How much time did it take for the
say block in Scratch to execute?

351
00:16:59,707 --> 00:17:00,790
Just constant time, right?

352
00:17:00,790 --> 00:17:03,960
Say something, say
something, it doesn't matter

353
00:17:03,960 --> 00:17:07,359
how big Scratches world is, it's always
going to take the same amount of time

354
00:17:07,359 --> 00:17:08,490
to simply say something.

355
00:17:08,490 --> 00:17:11,089
>> So that's constant time,
but what's the flip side?

356
00:17:11,089 --> 00:17:13,030
If that was upper
bounds, what if we want

357
00:17:13,030 --> 00:17:17,089
to describe the lower bounds
of our algorithms running time?

358
00:17:17,089 --> 00:17:19,852
Almost a best case
potentially, if you will,

359
00:17:19,852 --> 00:17:23,060
though these terms could apply to best
cases, worst cases, average cases more

360
00:17:23,060 --> 00:17:26,359
generally, but let's just focus
on lower bounds more generally.

361
00:17:26,359 --> 00:17:31,920
What's an algorithm that has
a lower bound of n steps,

362
00:17:31,920 --> 00:17:33,350
or 2n steps, or 3n steps?

363
00:17:33,350 --> 00:17:36,241
Some factor of n steps,
that's its lower bound.

364
00:17:36,241 --> 00:17:36,740
Yeah?

365
00:17:36,740 --> 00:17:37,910
>> AUDIENCE: Bubble sort?

366
00:17:37,910 --> 00:17:41,610
>> SPEAKER: Bubble sort takes
you minimally n steps, why?

367
00:17:41,610 --> 00:17:42,279
Why is that?

368
00:17:42,279 --> 00:17:45,320
Why should that start to come to you
intuitively, even if it doesn't just

369
00:17:45,320 --> 00:17:46,530
yet?

370
00:17:46,530 --> 00:17:47,030
Yeah?

371
00:17:47,030 --> 00:17:47,990
>> AUDIENCE: [INAUDIBLE].

372
00:17:47,990 --> 00:17:51,652

373
00:17:51,652 --> 00:17:52,360
SPEAKER: Exactly.

374
00:17:52,360 --> 00:17:55,810
In the best possible scenario of
bubble sort, and a lot of algorithms,

375
00:17:55,810 --> 00:17:58,769
if I hand you eight people
who are already sorted,

376
00:17:58,769 --> 00:18:00,560
it would be foolish
for you, the algorithm,

377
00:18:00,560 --> 00:18:02,202
to go back and forth
more than once, right?

378
00:18:02,202 --> 00:18:04,285
Because as soon as you
walk through the list once,

379
00:18:04,285 --> 00:18:08,090
you should realize, oh, I made no
swaps, this list is sorted, exit.

380
00:18:08,090 --> 00:18:09,700
But that's going to take you n steps.

381
00:18:09,700 --> 00:18:12,033
>> And conversely, what's another
way of thinking about it?

382
00:18:12,033 --> 00:18:15,240
Bubble sort is an omega,
so to speak, of n,

383
00:18:15,240 --> 00:18:19,050
because if you look at
fewer than n elements, what

384
00:18:19,050 --> 00:18:23,009
is the fundamental issue there?

385
00:18:23,009 --> 00:18:24,550
You don't know if it's sorted, right.

386
00:18:24,550 --> 00:18:26,800
We humans might glance at eight
people and be like, oh, it's sorted,

387
00:18:26,800 --> 00:18:28,430
that didn't take me n steps, but it did.

388
00:18:28,430 --> 00:18:30,810
Your eyes, even though you kind
of have a big field of vision,

389
00:18:30,810 --> 00:18:33,184
you looked at eight elements,
you looked at eight people,

390
00:18:33,184 --> 00:18:34,610
that's eight steps effectively.

391
00:18:34,610 --> 00:18:38,612
And only if I walk through the whole
list do I realize, yes, sorted.

392
00:18:38,612 --> 00:18:41,320
If I stop halfway thinking, all
right, it's pretty sorted so far,

393
00:18:41,320 --> 00:18:42,520
what are the odds it's not sorted?

394
00:18:42,520 --> 00:18:44,186
That algorithms not going to be correct.

395
00:18:44,186 --> 00:18:46,250
Might be faster, but incorrect.

396
00:18:46,250 --> 00:18:48,500
>> So now we have a way of
describing a lower bounds,

397
00:18:48,500 --> 00:18:49,710
and what about constant time?

398
00:18:49,710 --> 00:18:54,565
What's an algorithm that has a lower
bound on its running time of one?

399
00:18:54,565 --> 00:18:58,350
1 step, 2 steps, 10 steps, but
constant, independent of n,

400
00:18:58,350 --> 00:18:59,310
the size of the input?

401
00:18:59,310 --> 00:19:03,930

402
00:19:03,930 --> 00:19:04,600
Yeah, in back.

403
00:19:04,600 --> 00:19:05,309
>> AUDIENCE: Printf?

404
00:19:05,309 --> 00:19:06,183
SPEAKER: What's that?

405
00:19:06,183 --> 00:19:07,184
AUDIENCE: Printf?

406
00:19:07,184 --> 00:19:07,850
SPEAKER: Printf.

407
00:19:07,850 --> 00:19:08,400
OK, sure.

408
00:19:08,400 --> 00:19:10,720
So it takes a fixed number of steps.

409
00:19:10,720 --> 00:19:13,170
And I should now-- now that
we're talking about C code

410
00:19:13,170 --> 00:19:16,040
and not Scratch, something
like say, with printf,

411
00:19:16,040 --> 00:19:17,710
we should start to get careful.

412
00:19:17,710 --> 00:19:21,090
Because printf does take
input, it's a string,

413
00:19:21,090 --> 00:19:23,220
and strings do technically have length.

414
00:19:23,220 --> 00:19:25,530
So if we now want to pick
on you, if you don't mind,

415
00:19:25,530 --> 00:19:29,430
technically we could argue that printf
does take a variable length input,

416
00:19:29,430 --> 00:19:32,270
and surely it might take more
time to print a string this long,

417
00:19:32,270 --> 00:19:33,560
than this long.

418
00:19:33,560 --> 00:19:36,570
>> So what if we consider just the
sorting and searching examples?

419
00:19:36,570 --> 00:19:40,450
What about Mike Smith in the phone
book, or binary search more generally?

420
00:19:40,450 --> 00:19:42,220
In the best case, what might happen?

421
00:19:42,220 --> 00:19:45,577
I open the phone book and, bam,
there's Mike Smith's number.

422
00:19:45,577 --> 00:19:46,660
I can call him right away.

423
00:19:46,660 --> 00:19:49,390
>> Took one step, maybe two steps,
but a constant number of steps

424
00:19:49,390 --> 00:19:50,230
if I got lucky.

425
00:19:50,230 --> 00:19:52,570
And frankly, we saw on
Monday your classmate

426
00:19:52,570 --> 00:19:54,710
get quite lucky twice in a row.

427
00:19:54,710 --> 00:19:57,050
And that was indeed constant
time in a lower bounds

428
00:19:57,050 --> 00:20:01,280
on the algorithm in question for finding
the number 50 behind those closed

429
00:20:01,280 --> 00:20:01,830
doors.

430
00:20:01,830 --> 00:20:06,400
>> Now, as an aside, if you discover
that both big O, the upper bound,

431
00:20:06,400 --> 00:20:09,310
and omega, the lower bound,
are one in the same, that

432
00:20:09,310 --> 00:20:11,830
is the same formula in
parentheses, you can also

433
00:20:11,830 --> 00:20:15,170
say, just to be fancy,
that something is in theta

434
00:20:15,170 --> 00:20:18,270
of n or theta of some other value.

435
00:20:18,270 --> 00:20:20,661
That just means when big
O and omega are the same.

436
00:20:20,661 --> 00:20:21,910
Now what about selection sort?

437
00:20:21,910 --> 00:20:23,400
Let's use this new vocabulary.

438
00:20:23,400 --> 00:20:27,407
In selection sort, what were we
doing again, and again, and again?

439
00:20:27,407 --> 00:20:29,990
I was going back and forth through
the list, looking for whom?

440
00:20:29,990 --> 00:20:33,260

441
00:20:33,260 --> 00:20:34,730
The smallest number.

442
00:20:34,730 --> 00:20:37,560
>> So how many steps, how
many comparisons did I

443
00:20:37,560 --> 00:20:43,250
have to make in order to figure out who
the smallest element in the list was?

444
00:20:43,250 --> 00:20:44,437
n minus 1, right?

445
00:20:44,437 --> 00:20:47,770
Because if I just start with the one I'm
given and I start comparing him or her,

446
00:20:47,770 --> 00:20:49,519
then him or her, him
or her, him or her, I

447
00:20:49,519 --> 00:20:52,010
can only pair elements
together n minus 1 times.

448
00:20:52,010 --> 00:20:55,630
So selection sort similarly takes
n minus 1 steps the first time.

449
00:20:55,630 --> 00:20:59,540
>> How many steps does it take me to
find the second smallest element?

450
00:20:59,540 --> 00:21:02,920
n minus 2, because I'm being dumb
if I keep looking at the same people

451
00:21:02,920 --> 00:21:06,280
again if I've already selected him
or her and put them in their place.

452
00:21:06,280 --> 00:21:09,270
And the third step, n
minus 3, then n minus 4.

453
00:21:09,270 --> 00:21:11,020
We've seen this pattern
before, and indeed

454
00:21:11,020 --> 00:21:13,460
selection sort similarly
has an upper bound

455
00:21:13,460 --> 00:21:16,210
of n squared if we do up that summation.

456
00:21:16,210 --> 00:21:19,790
What is its lower bound, selection sort?

457
00:21:19,790 --> 00:21:25,350
Minimally, how much time must selection
sort take, as we defined it on Monday?

458
00:21:25,350 --> 00:21:29,370

459
00:21:29,370 --> 00:21:30,490
Propose two options.

460
00:21:30,490 --> 00:21:32,360
Maybe it's n, as before.

461
00:21:32,360 --> 00:21:35,040
Maybe it's n squared, as it
is now as the upper bound.

462
00:21:35,040 --> 00:21:35,874
>> AUDIENCE: n squared.

463
00:21:35,874 --> 00:21:36,664
SPEAKER: n squared.

464
00:21:36,664 --> 00:21:37,368
Why?

465
00:21:37,368 --> 00:21:40,060
>> AUDIENCE: Because you have
to define [INAUDIBLE].

466
00:21:40,060 --> 00:21:41,510
>> SPEAKER: Exactly.

467
00:21:41,510 --> 00:21:45,077
At least as I defined selection sort
it was pretty naive, keep going,

468
00:21:45,077 --> 00:21:46,160
find the smallest element.

469
00:21:46,160 --> 00:21:47,770
Go again, find the smallest element.

470
00:21:47,770 --> 00:21:49,490
Go again, find the smallest element.

471
00:21:49,490 --> 00:21:51,700
There's no sort of
optimization in there that

472
00:21:51,700 --> 00:21:54,350
might let me abort after
just n or so steps.

473
00:21:54,350 --> 00:21:57,080
So indeed, selection
sort, omega of n squared.

474
00:21:57,080 --> 00:22:00,667
>> What about insertion sort, where I took
who I was given, and then I plopped him

475
00:22:00,667 --> 00:22:01,750
or her in the right place?

476
00:22:01,750 --> 00:22:04,958
Then I proceeded to the second person,
plopped him or her in the right place.

477
00:22:04,958 --> 00:22:07,910
Then the next person, plopped
him or her in the right place.

478
00:22:07,910 --> 00:22:10,537
Notice that this is very
linear, so to speak.

479
00:22:10,537 --> 00:22:12,620
I'm a straight line, I'm
not going back and forth,

480
00:22:12,620 --> 00:22:16,080
I've never looking back really, but
what's happening when I insert him

481
00:22:16,080 --> 00:22:20,302
or her into the beginning of
the list as we did on Monday?

482
00:22:20,302 --> 00:22:21,010
What's happening?

483
00:22:21,010 --> 00:22:21,510
Yeah?

484
00:22:21,510 --> 00:22:23,122
AUDIENCE: [INAUDIBLE].

485
00:22:23,122 --> 00:22:24,830
SPEAKER: Yeah, that
was the catch, right?

486
00:22:24,830 --> 00:22:26,746
You might recall from
your classmates, if they

487
00:22:26,746 --> 00:22:29,670
were making any movement with
their feet, that was an operation.

488
00:22:29,670 --> 00:22:33,610
So if there were three people here and
the new person belonged way over there,

489
00:22:33,610 --> 00:22:37,360
on a long stage like this, sure, he
or she could just go to the very end.

490
00:22:37,360 --> 00:22:40,074
But if we're thinking about a
computer and an array of memory,

491
00:22:40,074 --> 00:22:41,990
these people are going
to have to shuffle over

492
00:22:41,990 --> 00:22:43,260
to make room for that person.

493
00:22:43,260 --> 00:22:46,930
And so that n minus 1 shufflings,
n minus 2 shufflings, n

494
00:22:46,930 --> 00:22:50,660
minus 3 shufflings is just kind of
happening behind me, not in front of me

495
00:22:50,660 --> 00:22:52,710
as before, in some sense.

496
00:22:52,557 --> 00:22:54,640
Now as an aside, and as
you might have seen online

497
00:22:54,640 --> 00:22:57,699
if you start poking around about
sorts, there's so many different ones

498
00:22:57,699 --> 00:22:59,490
out there, some of them
better than others.

499
00:22:59,490 --> 00:23:02,200
Indeed, bogosort is one
that's kind of fun to look up.

500
00:23:02,200 --> 00:23:06,650
Bogosort takes a set of
numbers or say a deck of cards,

501
00:23:06,650 --> 00:23:09,870
randomly shuffles them, and
checks if they're sorted.

502
00:23:09,870 --> 00:23:12,130
And if not, does it again.

503
00:23:12,130 --> 00:23:14,140
And if not, does it again.

504
00:23:14,140 --> 00:23:15,440
If not, does it again.

505
00:23:15,440 --> 00:23:17,060
Incredibly stupid.

506
00:23:17,060 --> 00:23:19,520
>> And indeed, if you read
like the Wikipedia article,

507
00:23:19,520 --> 00:23:21,200
its nickname is stupid sort.

508
00:23:21,200 --> 00:23:25,180
It will eventually work,
hopefully, given enough time,

509
00:23:25,180 --> 00:23:28,240
but that amount of time
could take quite some time.

510
00:23:28,240 --> 00:23:31,650
So if I could, let's speed things
up from Mary Beth's example earlier,

511
00:23:31,650 --> 00:23:35,150
by having a few more elements,
but two more processors.

512
00:23:35,150 --> 00:23:37,100
Two people, if you
wouldn't mind joining me.

513
00:23:37,100 --> 00:23:40,972
How about 1 over here, and
let's go-- no one over there?

514
00:23:40,972 --> 00:23:41,722
No one over there?

515
00:23:41,722 --> 00:23:42,221
OK.

516
00:23:42,221 --> 00:23:44,190
You with the black
shirt, yes, come on down.

517
00:23:44,190 --> 00:23:45,000
All right, what's your name?

518
00:23:45,000 --> 00:23:45,720
>> AUDIENCE: Peter.

519
00:23:45,720 --> 00:23:46,100
>> SPEAKER: What's that?

520
00:23:46,100 --> 00:23:46,766
>> AUDIENCE: Peter.

521
00:23:46,766 --> 00:23:49,450
SPEAKER: Peter, David, nice to meet you.

522
00:23:49,450 --> 00:23:53,670
All right, we have Peter here, if you
want to come onto the table over here.

523
00:23:53,670 --> 00:23:54,550
And what's your name?

524
00:23:54,550 --> 00:23:55,216
>> AUDIENCE: Elena.

525
00:23:55,216 --> 00:23:55,970
SPEAKER: Elena.

526
00:23:55,970 --> 00:23:57,030
OK, nice to meet you.

527
00:23:57,030 --> 00:23:58,060
Elena meet Peter.

528
00:23:58,060 --> 00:23:59,170
Peter, Elena.

529
00:23:59,170 --> 00:24:02,290
And we'll need Andrew
up here as well, please.

530
00:24:02,290 --> 00:24:06,107
And your challenge is going
to be to sort a deck of cards.

531
00:24:06,107 --> 00:24:08,190
And if unfamiliar, deck
of cards should ultimately

532
00:24:08,190 --> 00:24:11,064
be sorted a little something like
this where we'll do the clubs, then

533
00:24:11,064 --> 00:24:13,660
the spades, then the hearts and
diamonds, from ace as a one,

534
00:24:13,660 --> 00:24:15,570
all the way up to king.

535
00:24:15,570 --> 00:24:20,890
>> The cards I'm going to give you
are going to be 52 in quantity.

536
00:24:20,890 --> 00:24:23,160
We're going to similarly
time you, in just a moment.

537
00:24:23,160 --> 00:24:26,410
We're going to throw Andrew
up on the screen here,

538
00:24:26,410 --> 00:24:28,170
so as to watch as you do this.

539
00:24:28,170 --> 00:24:31,070
And so that all of this
is all the more visible,

540
00:24:31,070 --> 00:24:33,490
these are the cards I got on Amazon.

541
00:24:33,490 --> 00:24:42,861
So they are already randomly
sorted, and we're going to time you.

542
00:24:42,861 --> 00:24:44,610
And we're going to
keep it real this time,

543
00:24:44,610 --> 00:24:47,820
so we're going to try to pressure you
because otherwise this will get tedious

544
00:24:47,820 --> 00:24:48,460
quickly.

545
00:24:48,460 --> 00:24:53,860
If you could proceed to sort 52
elements together via some means, now.

546
00:24:53,860 --> 00:25:04,710

547
00:25:04,710 --> 00:25:07,180
>> And again, as we watch these
guys do what, in the end

548
00:25:07,180 --> 00:25:10,200
is going to produce an obvious
result, think about really

549
00:25:10,200 --> 00:25:12,962
how they're each doing it,
how you might describe it.

550
00:25:12,962 --> 00:25:15,045
Because again, these are
all processes, algorithms

551
00:25:15,045 --> 00:25:17,090
that we take for granted as a human.

552
00:25:17,090 --> 00:25:22,349
But you've probably long had
intuition, long before you even

553
00:25:22,349 --> 00:25:24,390
thought about taking a
computer science class you

554
00:25:24,390 --> 00:25:27,223
might have had the intuition with
which to solve problems like this.

555
00:25:27,223 --> 00:25:29,560
But once you recognize
the patterns and begin

556
00:25:29,560 --> 00:25:32,407
to formalize the steps with which
you're solving these problems,

557
00:25:32,407 --> 00:25:35,490
you'll find that you can solve much
more interesting and much more complex

558
00:25:35,490 --> 00:25:39,190
problems quickly.

559
00:25:39,190 --> 00:25:42,351
So someone from the audience, what is
at least one element of the algorithm

560
00:25:42,351 --> 00:25:43,350
that they're using here?

561
00:25:43,350 --> 00:25:44,275
>> AUDIENCE: [INAUDIBLE]

562
00:25:44,275 --> 00:25:45,150
SPEAKER: What's that?

563
00:25:45,150 --> 00:25:47,062
AUDIENCE: By suit.

564
00:25:47,062 --> 00:25:47,770
SPEAKER: By suit.

565
00:25:47,770 --> 00:25:50,630
So first they are clustering
all of the diamonds together

566
00:25:50,630 --> 00:25:52,560
it seems, all of the
hearts together it seems,

567
00:25:52,560 --> 00:25:56,520
and so forth, without respect
for the numbers on the cards.

568
00:25:56,520 --> 00:26:00,900
And now they appear, for instance,
to be sorting them by number.

569
00:26:00,900 --> 00:26:06,870

570
00:26:06,870 --> 00:26:08,910
Very good.

571
00:26:08,910 --> 00:26:12,370
>> All right, so what's going to
be the final step then here?

572
00:26:12,370 --> 00:26:16,950
Once we have four sorted suits, what
do we need to do to the four piles

573
00:26:16,950 --> 00:26:20,059
in order to achieve one
sorted deck, quite simply?

574
00:26:20,059 --> 00:26:21,350
So we need to merge them again.

575
00:26:21,350 --> 00:26:25,160
>> So there's an interesting idea that
again, daresay, is very intuitive even

576
00:26:25,160 --> 00:26:28,140
if you might never have slapped
that kind of label on it.

577
00:26:28,140 --> 00:26:31,900
This fundamental notion of dividing
the problem not in half this time,

578
00:26:31,900 --> 00:26:33,410
but at least into four pieces.

579
00:26:33,410 --> 00:26:36,810
Solving pretty much
fundamentally identical problems

580
00:26:36,810 --> 00:26:40,480
in isolation of each other,
and then merging the results.

581
00:26:40,480 --> 00:26:46,940

582
00:26:46,940 --> 00:26:50,140
And, excellent, done.

583
00:26:50,140 --> 00:26:52,140
All right, a big round
of applause, if we could.

584
00:26:52,140 --> 00:26:56,480
>> [APPLAUSE]

585
00:26:56,480 --> 00:26:59,740
>> SPEAKER: I have no idea what you'll
do with these, but here you go.

586
00:26:59,740 --> 00:27:01,690
Thank you so much.

587
00:27:01,690 --> 00:27:04,660
So let's see, two minutes
and eight seconds,

588
00:27:04,660 --> 00:27:07,490
if you'd like to challenge your friends.

589
00:27:07,490 --> 00:27:12,160
What then is going to
be a take away from this

590
00:27:12,160 --> 00:27:13,830
that we can leverage more generally?

591
00:27:13,830 --> 00:27:16,080
Well, think back to
this array of numbers,

592
00:27:16,080 --> 00:27:19,060
and think back now to some of the
pseudocode we've written in the past,

593
00:27:19,060 --> 00:27:22,080
and this was the pseudocode for
solving the phone book problem.

594
00:27:22,080 --> 00:27:25,150
Whereby in pseudocode I
enumerated a more methodical way

595
00:27:25,150 --> 00:27:28,400
of describing how I did a very intuitive
human algorithm of dividing the phone

596
00:27:28,400 --> 00:27:31,650
book in half, repeat, repeat, repeat,
until I find someone like Mike Smith,

597
00:27:31,650 --> 00:27:33,790
if he is indeed in the phone book.

598
00:27:33,790 --> 00:27:37,610
>> But I kind of used what I'll call
a very iterative approach here,

599
00:27:37,610 --> 00:27:42,160
in particular notice line 8 and line 11.

600
00:27:42,160 --> 00:27:46,750
Those are evidence of an iterative
approach, a looping approach,

601
00:27:46,750 --> 00:27:49,040
because that's exactly
the behavior they induce.

602
00:27:49,040 --> 00:27:52,910
Those lines both say go to
line three, and you can kind of

603
00:27:52,910 --> 00:27:55,140
think of that in your
mind's eye as being a loop.

604
00:27:55,140 --> 00:27:59,080
It's telling you to go back up to step
three and repeat, again, and again,

605
00:27:59,080 --> 00:28:00,010
and again.

606
00:28:00,010 --> 00:28:04,410
>> But what if we leverage a key idea
here that we didn't the last time,

607
00:28:04,410 --> 00:28:10,280
and simplify line 8 and
line 11 and their neighbors

608
00:28:10,280 --> 00:28:12,840
as just this, in yellow.

609
00:28:12,840 --> 00:28:16,480
It's not fundamentally shortening
the pseudocode very much,

610
00:28:16,480 --> 00:28:20,530
but it's fundamentally changing
the nature of my algorithm.

611
00:28:20,530 --> 00:28:24,220
What I'm now saying
in step 7, in step 10,

612
00:28:24,220 --> 00:28:29,140
is to search for Mike
in the exact same way,

613
00:28:29,140 --> 00:28:31,580
but just in the left
half or the right half.

614
00:28:31,580 --> 00:28:33,420
>> So in other words, if
I start from step one,

615
00:28:33,420 --> 00:28:36,150
pick up phone book, open to middle
of phone book, look at names,

616
00:28:36,150 --> 00:28:39,010
if Smith is among
name's, call Mike, else

617
00:28:39,010 --> 00:28:44,340
if Smith is earlier in book, step seven
search for Mike in left half of book.

618
00:28:44,340 --> 00:28:47,130
But that kind of feels like
it's leaving me hanging, right?

619
00:28:47,130 --> 00:28:49,240
In yellow, is an
instruction, but how do I

620
00:28:49,240 --> 00:28:51,870
search for Mike in the left
half of the phone book?

621
00:28:51,870 --> 00:28:54,210
Where do I have an
algorithm with which I

622
00:28:54,210 --> 00:28:57,100
can search for someone like Mike Smith?

623
00:28:57,100 --> 00:28:58,980
Well, it's staring us in the face.

624
00:28:58,980 --> 00:29:03,090
I can literally use the exact same
program effectively going up to the top

625
00:29:03,090 --> 00:29:06,490
again and re-running
the same lines of code.

626
00:29:06,490 --> 00:29:10,610
>> So even though this should feel
like a bit of a cyclical definition

627
00:29:10,610 --> 00:29:13,480
where you're answering someone's
question by just sort of asking

628
00:29:13,480 --> 00:29:15,990
the same question again,
like why, why, why?

629
00:29:15,990 --> 00:29:21,580
The reality is because we've hard coded
a couple of special lines, step 4,

630
00:29:21,580 --> 00:29:25,320
which is an if, and step 12, which
is effectively another branch,

631
00:29:25,320 --> 00:29:30,120
because we have those stopgap measures,
this algorithm will terminate if we

632
00:29:30,120 --> 00:29:32,050
find Mike, or if we don't.

633
00:29:32,050 --> 00:29:36,810
But in step 7 and 10 now, we have
what we'll call a recursive algorithm.

634
00:29:36,810 --> 00:29:40,420
And recursion is indeed a powerful idea
that's a little mind bending at first,

635
00:29:40,420 --> 00:29:42,500
that we can now apply as follows.

636
00:29:42,500 --> 00:29:46,600
>> Merge sort will be the last sort that
we look at, at least in class formally.

637
00:29:46,600 --> 00:29:50,040
And it's fundamentally different
from those last three, and certainly

638
00:29:50,040 --> 00:29:52,140
last four if we include bogosort.

639
00:29:52,140 --> 00:29:54,810
Here's the pseudocode for merge sort.

640
00:29:54,810 --> 00:30:00,170
When on input of n elements, so given
an array of size n, if n is less than 2,

641
00:30:00,170 --> 00:30:01,040
return.

642
00:30:01,040 --> 00:30:03,610
So why do I have that
sanity check first?

643
00:30:03,610 --> 00:30:09,477
What's the implication if I hand you
an array whose length n is less than 2?

644
00:30:09,477 --> 00:30:11,060
It's already sorted, obviously, right?

645
00:30:11,060 --> 00:30:13,640
Because the list either has
one element, which is trivially

646
00:30:13,640 --> 00:30:15,180
sorted because it's
the only thing there.

647
00:30:15,180 --> 00:30:18,138
Or, it's of size zero which means
there's nothing to sort, so by nature

648
00:30:18,138 --> 00:30:18,720
it is sorted.

649
00:30:18,720 --> 00:30:20,410
There's just nothing wrong there.

650
00:30:20,410 --> 00:30:22,310
So that's our so-called base case.

651
00:30:22,310 --> 00:30:24,440
>> That is similar in spirit
to what we did with Mike.

652
00:30:24,440 --> 00:30:26,023
If Mike's in the phone book, call him.

653
00:30:26,023 --> 00:30:27,740
If he's not there, give up.

654
00:30:27,740 --> 00:30:31,240
It's a so-called base case, to make sure
this algorithm at the end of the day

655
00:30:31,240 --> 00:30:33,540
will stop in certain circumstances.

656
00:30:33,540 --> 00:30:37,890
>> But here's the leap of faith now, else,
sort the left half of the elements,

657
00:30:37,890 --> 00:30:39,740
then sort the right
half of the elements,

658
00:30:39,740 --> 00:30:41,189
and then merge the sorted halves.

659
00:30:41,189 --> 00:30:43,230
And here's where it feels
like we're copping out.

660
00:30:43,230 --> 00:30:46,900
I've asked you to sort
n elements, and I'm

661
00:30:46,900 --> 00:30:50,712
saying, OK, do it by sorting
the left and sorting the right.

662
00:30:50,712 --> 00:30:52,420
But I am saying one
other thing, and this

663
00:30:52,420 --> 00:30:55,530
is the key theme it seems
in the intuition thus far,

664
00:30:55,530 --> 00:30:57,380
there's this third step of merging.

665
00:30:57,380 --> 00:31:00,430
Which even though it
seems so dumb in spirit,

666
00:31:00,430 --> 00:31:02,320
like just merge things
together, it seems

667
00:31:02,320 --> 00:31:05,380
to be a key step toward the
reassembly of two problems that

668
00:31:05,380 --> 00:31:07,330
were divided ultimately in half.

669
00:31:07,330 --> 00:31:12,090
>> So merge sort, let's do this, if you'll
humor me, with one more demonstration,

670
00:31:12,090 --> 00:31:14,730
just so that we have some
numbers to work with.

671
00:31:14,730 --> 00:31:19,470
Can I exchange eight stress
balls for eight people?

672
00:31:19,470 --> 00:31:29,320
All right, how about you three, you four
in this section, five, six, and let's

673
00:31:29,320 --> 00:31:30,720
do 7, 8, come on up.

674
00:31:30,720 --> 00:31:35,120

675
00:31:35,120 --> 00:31:36,520
OK, yeah OK.

676
00:31:36,520 --> 00:31:38,640
Minus 8, there we go, plus 1.

677
00:31:38,640 --> 00:31:39,150
Excellent.

678
00:31:39,150 --> 00:31:42,000
All right come on up, let's
quickly give you numbers.

679
00:31:42,000 --> 00:31:50,800
Number two, number three, number four,
number five, six, seven, and eight.

680
00:31:50,800 --> 00:31:52,140
I did eight correctly this time.

681
00:31:52,140 --> 00:31:56,390
>> OK, so go ahead if you could, and
let's sort in the original order

682
00:31:56,390 --> 00:31:59,810
that we had yesterday which looked
like this, if you wouldn't mind.

683
00:31:59,810 --> 00:32:03,620
And let's do it in front of the table.

684
00:32:03,620 --> 00:32:06,510
All right, so merge sort.

685
00:32:06,510 --> 00:32:08,820
This is where it's going
to get kind of interesting,

686
00:32:08,820 --> 00:32:12,800
because I seem to be giving myself
so much less information today.

687
00:32:12,800 --> 00:32:15,149
>> So merge sort first of all
on input of n elements,

688
00:32:15,149 --> 00:32:18,440
and is obviously not less than two, it's
eight, so I have some more work to do.

689
00:32:18,440 --> 00:32:21,140
So now mentally we as a class
are now in the else branch,

690
00:32:21,140 --> 00:32:22,540
which means three steps.

691
00:32:22,540 --> 00:32:25,017
First, I have to sort the
left half of the elements.

692
00:32:25,017 --> 00:32:26,350
So how do I go about doing this?

693
00:32:26,350 --> 00:32:28,950
Well, I'm going to kind of
mentally divide the list here,

694
00:32:28,950 --> 00:32:30,700
you don't have to
physically move, and I'm

695
00:32:30,700 --> 00:32:33,180
going to focus only on the
left half of the elements here.

696
00:32:33,180 --> 00:32:36,770
So how do I go about sorting
a list now of size four?

697
00:32:36,770 --> 00:32:38,730
What's my algorithm?

698
00:32:38,730 --> 00:32:42,580
First I check is n less than two, no,
so I proceed to the else block again.

699
00:32:42,580 --> 00:32:43,900
Sort left half of elements.

700
00:32:43,900 --> 00:32:45,608
>> So now again, mentally,
and this is where

701
00:32:45,608 --> 00:32:49,550
you have to accrue a lot of
mental history, if you will.

702
00:32:49,550 --> 00:32:51,940
Now I'm sorting the left
half of the left half.

703
00:32:51,940 --> 00:32:57,000
All right, so now I call my same merge
sorting algorithm, is n less than two?

704
00:32:57,000 --> 00:33:00,590
No, it is two, so I have to sort
the left half, and the right half.

705
00:33:00,590 --> 00:33:02,042
So here we go, sort the left half.

706
00:33:02,042 --> 00:33:03,750
Why don't you just
take one step forward.

707
00:33:03,750 --> 00:33:04,415
What's your name?

708
00:33:04,415 --> 00:33:04,860
>> AUDIENCE: Darren.

709
00:33:04,860 --> 00:33:05,260
>> SPEAKER: Dan.

710
00:33:05,260 --> 00:33:06,040
Dan has stepped forward.

711
00:33:06,040 --> 00:33:06,748
>> AUDIENCE: Darren.

712
00:33:06,748 --> 00:33:09,000
SPEAKER: Darren, done.

713
00:33:09,000 --> 00:33:10,090
Did you say Darren or Dan?

714
00:33:10,090 --> 00:33:10,550
>> AUDIENCE: Darren.

715
00:33:10,550 --> 00:33:11,216
>> SPEAKER: Darren.

716
00:33:11,216 --> 00:33:14,422
OK, Darren has stepped
forward and he is now sorted.

717
00:33:14,422 --> 00:33:16,130
And this is almost an
inane claim, right?

718
00:33:16,130 --> 00:33:18,862
I don't really seem to be achieving
anything, but let's proceed.

719
00:33:18,862 --> 00:33:20,820
Now let me sort the right
half of the elements.

720
00:33:20,820 --> 00:33:21,200
What's your name?

721
00:33:21,200 --> 00:33:21,690
>> AUDIENCE: Luke.

722
00:33:21,690 --> 00:33:22,273
>> SPEAKER: Luke.

723
00:33:22,273 --> 00:33:23,400
Come on, step forward.

724
00:33:23,400 --> 00:33:25,640
Done, I have sorted Luke.

725
00:33:25,640 --> 00:33:28,570
The left half is now sorted and
the right half is now sorted,

726
00:33:28,570 --> 00:33:30,770
but again, there's a key step here.

727
00:33:30,770 --> 00:33:32,940
What do I next need to do?

728
00:33:32,940 --> 00:33:33,941
Merge the sorted halves.

729
00:33:33,941 --> 00:33:36,648
Now we're going to just have
everyone back and forth in this way,

730
00:33:36,648 --> 00:33:38,620
because I kind of need
some scratch space.

731
00:33:38,620 --> 00:33:40,411
It's almost like these
guys are on a table,

732
00:33:40,411 --> 00:33:42,460
and I need some room
to move them around on.

733
00:33:42,460 --> 00:33:44,170
So I'm going to merge
you guys by looking

734
00:33:44,170 --> 00:33:45,960
at the left half and the right half.

735
00:33:45,960 --> 00:33:48,740
And who obviously comes first,
left half or right half?

736
00:33:48,740 --> 00:33:52,710
So right half, so let's move Luke over
here to Darren's original position.

737
00:33:52,710 --> 00:33:57,640
And now to merge their left half in,
Darren's going to move right there.

738
00:33:57,640 --> 00:33:59,750
>> So feels like almost
a bubble sort effect,

739
00:33:59,750 --> 00:34:02,482
but my fundamental algorithm,
very different this time.

740
00:34:02,482 --> 00:34:04,815
But now's where things get a
little annoying because you

741
00:34:04,815 --> 00:34:06,810
have to rewind mentally
where did I leave off.

742
00:34:06,810 --> 00:34:09,893
I've just merged the sorted halves,
which means I'm where in my algorithm?

743
00:34:09,893 --> 00:34:12,229

744
00:34:12,229 --> 00:34:13,770
I have to sort the right half, right?

745
00:34:13,770 --> 00:34:15,910
>> If you rewind, literally
on the video, you'll

746
00:34:15,910 --> 00:34:18,339
see that we got to this
point of Luke and Darren

747
00:34:18,339 --> 00:34:21,370
by sorting the left
half of the left half.

748
00:34:21,370 --> 00:34:23,430
Then we merged those
sorted halves, which

749
00:34:23,430 --> 00:34:27,941
means the next step is sort the
right half of the left half.

750
00:34:27,941 --> 00:34:29,649
All right, so let's
do this more quickly.

751
00:34:29,649 --> 00:34:33,282
All right, six, I'm going to claim
you are now sorted, come on forward.

752
00:34:33,282 --> 00:34:33,990
What's your name?

753
00:34:33,990 --> 00:34:34,589
>> AUDIENCE: Adriano.

754
00:34:34,589 --> 00:34:35,200
>> SPEAKER: Adriano.

755
00:34:35,200 --> 00:34:36,010
Adriano is now sorted.

756
00:34:36,010 --> 00:34:36,450
And what's your name?

757
00:34:36,450 --> 00:34:37,080
>> AUDIENCE: Alex.

758
00:34:37,080 --> 00:34:38,379
>> SPEAKER: Alex is now sorted.

759
00:34:38,379 --> 00:34:40,750
Left half, right half,
what's the final step?

760
00:34:40,750 --> 00:34:41,250
Merge.

761
00:34:41,250 --> 00:34:44,310
Pretty trivial, so I'm
going to merge in six,

762
00:34:44,310 --> 00:34:46,930
take a step back,
eight, take a step back.

763
00:34:46,930 --> 00:34:49,530
And now notice this is
a useful takeaway, what

764
00:34:49,530 --> 00:34:53,930
is now true about the left half of the
list, irrespective of how we began?

765
00:34:53,930 --> 00:34:55,090
It is sorted.

766
00:34:55,090 --> 00:34:57,750
>> Now it's not sorted in
the big scheme of things,

767
00:34:57,750 --> 00:35:00,250
but it is sorted independently
of the other half.

768
00:35:00,250 --> 00:35:04,100
Now what step am I on if I keep
rewinding how the story began?

769
00:35:04,100 --> 00:35:05,680
Now I have to sort the right half.

770
00:35:05,680 --> 00:35:07,630
So now we're way back at
the beginning of the story,

771
00:35:07,630 --> 00:35:08,921
and let's do this more rapidly.

772
00:35:08,921 --> 00:35:11,320
So I'm going to sort the
right half of the whole list.

773
00:35:11,320 --> 00:35:13,060
What's the next step?

774
00:35:13,060 --> 00:35:15,840
Sort the left half of the right half.

775
00:35:15,840 --> 00:35:18,715
Sort the left half of the
left half of the right half.

776
00:35:18,715 --> 00:35:19,590
And what's your name?

777
00:35:19,590 --> 00:35:20,230
>> AUDIENCE: Omar.

778
00:35:20,230 --> 00:35:21,970
>> SPEAKER: Omar, step forward, done.

779
00:35:21,970 --> 00:35:22,860
Left half is sorted.

780
00:35:22,860 --> 00:35:23,330
And what's your name?

781
00:35:23,330 --> 00:35:23,820
>> AUDIENCE: Chris.

782
00:35:23,820 --> 00:35:25,620
>> SPEAKER: Chris, take a step
forward, you are now sorted.

783
00:35:25,620 --> 00:35:27,010
What's the key step now?

784
00:35:27,010 --> 00:35:27,510
Merge.

785
00:35:27,510 --> 00:35:30,509
So one is going to merge into place
here, if you could take a step back,

786
00:35:30,509 --> 00:35:32,930
and three is going to
take a step back, merge.

787
00:35:32,930 --> 00:35:38,080
So the left half of the
right half, is now sorted.

788
00:35:38,080 --> 00:35:41,747
Frankly, this algorithm feels like we
are wasting way more time than before,

789
00:35:41,747 --> 00:35:44,830
but if we did this in real time, we'll
see what the takeaways going to be.

790
00:35:44,830 --> 00:35:47,970
Now here I am, right
half of the right half,

791
00:35:47,970 --> 00:35:50,170
let me go ahead and sort the left half.

792
00:35:50,170 --> 00:35:51,482
Step forward, what's your name?

793
00:35:51,482 --> 00:35:52,190
AUDIENCE: Ramsey.

794
00:35:52,190 --> 00:35:53,210
SPEAKER: Ramsey is now sorted.

795
00:35:53,210 --> 00:35:53,570
What's your name?

796
00:35:53,570 --> 00:35:54,200
>> AUDIENCE: Marina.

797
00:35:54,200 --> 00:35:57,033
>> SPEAKER: Marina is now sorted as
well, if you take one step forward.

798
00:35:57,033 --> 00:36:00,690
Key step here is now merge, I'm
going to pluck from my two lists,

799
00:36:00,690 --> 00:36:01,720
left and right.

800
00:36:01,720 --> 00:36:05,150
Five is going to come first,
and seven is going to come next.

801
00:36:05,150 --> 00:36:06,410
And again, this is deliberate.

802
00:36:06,410 --> 00:36:08,535
The fact that they're taking
steps forward and back

803
00:36:08,535 --> 00:36:12,997
is meant to represent that we can't
do this algorithm in place as easily

804
00:36:12,997 --> 00:36:15,830
as bubble sort, and selection sort,
and insertion sort where we just

805
00:36:15,830 --> 00:36:16,960
kept swapping people.

806
00:36:16,960 --> 00:36:19,940
I literally need a sort
of scratch paper in which

807
00:36:19,940 --> 00:36:21,827
to put these folks
while I do the merging,

808
00:36:21,827 --> 00:36:23,410
and then I can put them back in place.

809
00:36:23,410 --> 00:36:27,260
And that's key because I'm using a
new resource, space, not just time.

810
00:36:27,260 --> 00:36:28,270
>> OK, this is amazing.

811
00:36:28,270 --> 00:36:32,050
Left half is sorted, right half is
sorted, now that key merging step.

812
00:36:32,050 --> 00:36:33,450
How am I going to merge this?

813
00:36:33,450 --> 00:36:35,470
So if you'll follow my
left hand and right hand,

814
00:36:35,470 --> 00:36:38,930
I'm going to point my left hand
at the left half, my right hand

815
00:36:38,930 --> 00:36:42,680
at the right half, and now I have to
decide step by step whom to merge in.

816
00:36:42,680 --> 00:36:44,650
Who obviously comes first?

817
00:36:44,650 --> 00:36:45,150
Number one.

818
00:36:45,150 --> 00:36:47,327
So come on over here,
here's our scratch pad.

819
00:36:47,327 --> 00:36:49,910
So now number one, and notice
what I'll do with my right hand,

820
00:36:49,910 --> 00:36:54,152
I'm going to move my right hand one
step over to point number three,

821
00:36:54,152 --> 00:36:55,860
and now I have to make
the same decision.

822
00:36:55,860 --> 00:36:58,387
And actually stand right in
front of Luke here if you could,

823
00:36:58,387 --> 00:36:59,720
because this is our scratch pad.

824
00:36:59,720 --> 00:37:00,610
So who comes next?

825
00:37:00,610 --> 00:37:05,000
We have Luke with number two
or Chris with number three.

826
00:37:05,000 --> 00:37:07,460
Obviously Luke, number
two, so you come here.

827
00:37:07,460 --> 00:37:11,270
>> But my left hand now is going to
be incremented to point at Darren,

828
00:37:11,270 --> 00:37:15,160
and here's the key take away with
merging, I'm going to keep doing this,

829
00:37:15,160 --> 00:37:17,340
obviously, if you kind
of follow the logic.

830
00:37:17,340 --> 00:37:19,670
But my hands are never
going to go backwards,

831
00:37:19,670 --> 00:37:23,861
which means I'm only ever moving to
the left with my merging process,

832
00:37:23,861 --> 00:37:26,360
and that's going to be key to
our analysis in just a moment.

833
00:37:26,360 --> 00:37:27,859
>> So now let's finish this up rapidly.

834
00:37:27,859 --> 00:37:31,650
So three comes next,
then four comes next,

835
00:37:31,650 --> 00:37:38,750
and now five comes next, then six,
and seven, and then finally eight.

836
00:37:38,750 --> 00:37:42,960
Feels like the slowest algorithm
yet, but not if we actually

837
00:37:42,960 --> 00:37:45,510
run it at the same sort
of clock speed, so to

838
00:37:45,510 --> 00:37:48,106
speak, with the same
ticking clock as before.

839
00:37:48,106 --> 00:37:48,605
Why?

840
00:37:48,605 --> 00:37:51,100
Well, Let's take a
look at the end result.

841
00:37:51,100 --> 00:37:56,990
>> Let's go back over here, let me
pull up a demonstration visually

842
00:37:56,990 --> 00:37:59,030
of what we just did.

843
00:37:59,030 --> 00:38:06,110
Zooming in here, on this
page here, telling Firefox

844
00:38:06,110 --> 00:38:08,200
that we want to queue
up in this box, let's

845
00:38:08,200 --> 00:38:11,260
say bubble sort, with which
we're now well familiar,

846
00:38:11,260 --> 00:38:14,130
selection sort, which is another
fairly straightforward one,

847
00:38:14,130 --> 00:38:18,250
and now today's merge sort, which
will be our climactic ending.

848
00:38:18,250 --> 00:38:21,530
The reason it took so much longer
here with humans and me verbally is,

849
00:38:21,530 --> 00:38:23,480
obviously, I'm explaining every step.

850
00:38:23,480 --> 00:38:26,920
But if you simply execute this, much
like we did bubble sort and selection

851
00:38:26,920 --> 00:38:30,890
sort not only visually, watch
just how much more efficiently

852
00:38:30,890 --> 00:38:33,330
this leveraging of
division and conquering

853
00:38:33,330 --> 00:38:39,150
can be when applied to a data set that's
not even size eight, but even much,

854
00:38:39,150 --> 00:38:39,970
much bigger.

855
00:38:39,970 --> 00:38:44,585
I give you merge sort, side by
side with these other algorithms.

856
00:38:44,585 --> 00:38:56,364

857
00:38:56,364 --> 00:38:58,530
This is going to get painful
quickly, and the ending

858
00:38:58,530 --> 00:39:00,890
is not particularly climactic,
they just end up sorted.

859
00:39:00,890 --> 00:39:05,280
But the key take away is that
look how much faster merge sort

860
00:39:05,280 --> 00:39:08,110
was, unless you think I'm
just kind of messing with you.

861
00:39:08,110 --> 00:39:13,100
If we do this one final time,
let's reload this, let's go back

862
00:39:13,100 --> 00:39:14,960
and choose bubble sort,
and just for kicks,

863
00:39:14,960 --> 00:39:17,330
let's choose insertion
sort, just for good measure.

864
00:39:17,330 --> 00:39:20,020
And this time again, let's
choose merge sort and let's

865
00:39:20,020 --> 00:39:21,595
actually run these side by side.

866
00:39:21,595 --> 00:39:24,140

867
00:39:24,140 --> 00:39:26,930
>> And it's not, in fact, a fluke.

868
00:39:26,930 --> 00:39:31,140
What I've effectively done is I've
divided my input in half, again,

869
00:39:31,140 --> 00:39:32,240
and again, and again.

870
00:39:32,240 --> 00:39:35,590
And there's only so many times you can
divide your input into halves, left

871
00:39:35,590 --> 00:39:36,240
and right.

872
00:39:36,240 --> 00:39:39,425
What's the formula that we keep seeing
that describes the division in half

873
00:39:39,425 --> 00:39:41,050
again, and again, and again, and again?

874
00:39:41,050 --> 00:39:41,890
>> AUDIENCE: Log n.

875
00:39:41,890 --> 00:39:42,760
>> SPEAKER: Log n.

876
00:39:42,760 --> 00:39:46,300
But then there's one other key step,
this algorithm is not log n steps.

877
00:39:46,300 --> 00:39:48,992
If it were only log n steps,
we would be in the same problem

878
00:39:48,992 --> 00:39:51,200
as before where we can't be
sure everything's sorted.

879
00:39:51,200 --> 00:39:54,480
You have to minimally look at n elements
to be sure n elements are sorted,

880
00:39:54,480 --> 00:39:55,950
otherwise it's a leap of faith.

881
00:39:55,950 --> 00:39:59,810
>> So it's minimally log n steps, but
what about this key merging step

882
00:39:59,810 --> 00:40:04,370
where I merged my left half and right
half and walked across the stage?

883
00:40:04,370 --> 00:40:06,980
How many steps is that to merge?

884
00:40:06,980 --> 00:40:10,150
It's n, but I didn't just
merge the final time.

885
00:40:10,150 --> 00:40:15,089
On each of those nested calls, on each
of those nested merges, I still sorted.

886
00:40:15,089 --> 00:40:18,380
I merged these two guys, then these two
guys, then these two guys and so forth.

887
00:40:18,380 --> 00:40:19,955
>> So I did merging again, and again.

888
00:40:19,955 --> 00:40:20,580
How many times?

889
00:40:20,580 --> 00:40:23,510
So every time I divided the
list in half, I did a merge.

890
00:40:23,510 --> 00:40:25,460
Divide the list in half, do a merge.

891
00:40:25,460 --> 00:40:28,570
So if dividing the list
can be done log n times,

892
00:40:28,570 --> 00:40:33,880
and the merging ultimately takes n
steps, what might be now the upper

893
00:40:33,880 --> 00:40:37,000
bound on the running
time of our algorithm?

894
00:40:37,000 --> 00:40:37,980
n log n.

895
00:40:37,980 --> 00:40:40,560
>> And indeed, that's what
we've achieved here.

896
00:40:40,560 --> 00:40:44,650
So the feel that you see visually when
those three things run side by side

897
00:40:44,650 --> 00:40:47,930
is n squared against n
squared against n log n.

898
00:40:47,930 --> 00:40:51,010
Which fundamentally we'll see,
not only today but in the future,

899
00:40:51,010 --> 00:40:52,760
is much, much faster.

900
00:40:52,760 --> 00:40:56,010
A round of applause for these guys,
I will reward them with stress balls.

901
00:40:56,010 --> 00:41:00,260
Let's adjourn here today, and
we will see you on Monday.

902
00:41:00,260 --> 00:41:02,255