WEBVTT
X-TIMESTAMP-MAP=LOCAL:00:00:00.000,MPEGTS:900000

00:00:00.000 --> 00:01:24.180
[MUSIC PLAYING]

00:01:24.180 --> 00:01:25.890
SPEAKER 1: This is CS50.

00:01:25.890 --> 00:01:27.870
And perhaps not unlike
past lectures, today

00:01:27.870 --> 00:01:30.187
is going to feel a bit
like a fire hose again,

00:01:30.187 --> 00:01:32.520
but realize that it's going
to be a lot less code today.

00:01:32.520 --> 00:01:35.550
So there's less syntax, just a
few new ideas here and there.

00:01:35.550 --> 00:01:38.550
And the goal ultimately today
really is the concepts and the ideas

00:01:38.550 --> 00:01:39.360
that we take away.

00:01:39.360 --> 00:01:41.735
And keep in mind that over
the remainder of the semester,

00:01:41.735 --> 00:01:44.710
we will continue to apply
and reapply these same ideas.

00:01:44.710 --> 00:01:46.380
So the goal today really is exposure.

00:01:46.380 --> 00:01:48.690
And the goal for the
semester is comfort.

00:01:48.690 --> 00:01:52.058
So with that said, let's
consider where we left off

00:01:52.058 --> 00:01:54.600
last time, which was to consider
that inside of our computer.

00:01:54.600 --> 00:01:55.683
Of course, we have memory.

00:01:55.683 --> 00:01:57.360
We have RAM, Random Access Memory.

00:01:57.360 --> 00:02:00.180
And it was convenient, we found,
to start to divide this up

00:02:00.180 --> 00:02:01.440
into individual bytes.

00:02:01.440 --> 00:02:05.520
Like, byte 0 might be in the top left,
and 2 gigabytes, 2 billion bytes,

00:02:05.520 --> 00:02:08.370
might be all the way down there
in the bottom right-hand corner.

00:02:08.370 --> 00:02:10.199
And once we started
to layout our memory,

00:02:10.199 --> 00:02:14.730
both conceptually and technologically,
in this way, left to right, top

00:02:14.730 --> 00:02:18.420
to bottom, we had the ability to
use a data structure of sorts.

00:02:18.420 --> 00:02:21.330
We introduced, recall, arrays,
whereby if we think of our memory

00:02:21.330 --> 00:02:24.180
as just a grid of bytes,
we can start using it

00:02:24.180 --> 00:02:26.610
kind of to our advantage
to solve problems.

00:02:26.610 --> 00:02:30.660
And it turns out that perhaps a physical
incarnation of this idea of an array

00:02:30.660 --> 00:02:32.790
might be like these red lockers here.

00:02:32.790 --> 00:02:35.970
Because even though you and I, every
time we've looked at arrays thus far,

00:02:35.970 --> 00:02:39.330
can kind of get a bird's-eye view
of everything that's in the array,

00:02:39.330 --> 00:02:40.830
computer's actually pretty limited.

00:02:40.830 --> 00:02:43.387
It doesn't have that instant
detection that you and I

00:02:43.387 --> 00:02:46.470
have when we just scan a list of numbers
and [? kind ?] of take it all in.

00:02:46.470 --> 00:02:50.760
A computer is only going to be able to
look at the contents of an array step

00:02:50.760 --> 00:02:53.692
by step by step, consistent with
this whole idea of an algorithm.

00:02:53.692 --> 00:02:56.400
A computer can't just look at all
the numbers and take it all in.

00:02:56.400 --> 00:03:00.000
It can only open, so to speak,
one of these lockers at a time.

00:03:00.000 --> 00:03:03.120
So toward that end, we've gone
ahead and populated these lockers

00:03:03.120 --> 00:03:04.545
with a whole bunch of numbers.

00:03:04.545 --> 00:03:06.420
And the goal at hand is
to solve the problem.

00:03:06.420 --> 00:03:08.693
The goal at hand is to
find one of those numbers.

00:03:08.693 --> 00:03:11.610
So if we distill computer science
into this problem-solving mechanism,

00:03:11.610 --> 00:03:14.190
the input today is these seven lockers.

00:03:14.190 --> 00:03:18.090
And the output is going to
be a Boolean, true or false,

00:03:18.090 --> 00:03:21.480
is the number we're looking
for among those seven lockers?

00:03:21.480 --> 00:03:25.140
And so rather than my kind of poking
around looking for this number,

00:03:25.140 --> 00:03:31.050
might I call on, say, two volunteers to
kick us off with two, say, algorithms?

00:03:31.050 --> 00:03:32.310
Let's see.

00:03:32.310 --> 00:03:33.180
Over here?

00:03:33.180 --> 00:03:33.870
Yeah.

00:03:33.870 --> 00:03:37.710
And let's go over here
in front, if we may.

00:03:37.710 --> 00:03:39.560
Come on up.

00:03:39.560 --> 00:03:40.560
And what are your names?

00:03:40.560 --> 00:03:41.500
AUDIENCE: [? Nizari. ?]

00:03:41.500 --> 00:03:41.930
SPEAKER 1: Lizari?

00:03:41.930 --> 00:03:42.430
AUDIENCE: [? Nizari. ?]

00:03:42.430 --> 00:03:44.050
SPEAKER 1: [? Nizari. ?] OK, David.

00:03:44.050 --> 00:03:44.800
Nice to meet you.

00:03:44.800 --> 00:03:47.670
Come on up, [? Nizari. ?] And
over here, what's your name?

00:03:47.670 --> 00:03:48.467
AUDIENCE: Eric.

00:03:48.467 --> 00:03:49.300
SPEAKER 1: Eric, OK.

00:03:49.300 --> 00:03:49.800
David.

00:03:49.800 --> 00:03:51.210
And come on up, Brian, as well.

00:03:51.210 --> 00:03:52.670
Nice to meet you, Eric.

00:03:52.670 --> 00:03:54.560
Eric, [? Nizari. ?] [? Nizari, ?] Eric.

00:03:54.560 --> 00:03:55.890
[APPLAUSE]

00:03:55.890 --> 00:03:57.630
Come on over here.

00:03:57.630 --> 00:03:59.730
So you came on up the stage first.

00:03:59.730 --> 00:04:01.530
Would you like to go first or go second?

00:04:01.530 --> 00:04:02.270
AUDIENCE: I'll go second.

00:04:02.270 --> 00:04:03.510
SPEAKER 1: You're going to go second.

00:04:03.510 --> 00:04:04.260
So Eric, you're up first.

00:04:04.260 --> 00:04:05.593
Come on over here, if you would.

00:04:05.593 --> 00:04:10.260
So Eric, behind these seven doors we
have placed, in advance, the number 50.

00:04:10.260 --> 00:04:12.570
And we would simply
like you, the computer,

00:04:12.570 --> 00:04:15.560
to search this array for the number 50.

00:04:15.560 --> 00:04:17.440
AUDIENCE: Is it sorted?

00:04:17.440 --> 00:04:21.180
SPEAKER 1: I cannot answer
that question at this time.

00:04:21.180 --> 00:04:23.582
Go.

00:04:23.582 --> 00:04:25.290
Oh, and so that the
audience knows what's

00:04:25.290 --> 00:04:28.170
going on, if you wouldn't mind
taking the numbers out to see.

00:04:28.170 --> 00:04:29.360
AUDIENCE: This is seven.

00:04:29.360 --> 00:04:30.480
SPEAKER 1: Excellent.

00:04:30.480 --> 00:04:31.310
Not 50.

00:04:31.310 --> 00:04:33.160
AUDIENCE: Should I-- can I take it out?

00:04:33.160 --> 00:04:35.827
SPEAKER 1: At this point, yes,
you may do whatever you want now.

00:04:35.827 --> 00:04:37.594
Just find us 50.

00:04:37.594 --> 00:04:38.940
AUDIENCE: Two.

00:04:38.940 --> 00:04:39.860
[LAUGHTER]

00:04:39.860 --> 00:04:42.216
SPEAKER 1: Very good.

00:04:42.216 --> 00:04:43.400
AUDIENCE: One.

00:04:43.400 --> 00:04:45.972
SPEAKER 1: Nice.

00:04:45.972 --> 00:04:47.286
AUDIENCE: Six.

00:04:47.286 --> 00:04:50.094
SPEAKER 1: Very good.

00:04:50.094 --> 00:04:50.800
AUDIENCE: Three.

00:04:51.300 --> 00:04:51.967
SPEAKER 1: Nice.

00:04:51.967 --> 00:04:53.633
AUDIENCE: None of these are close to 50.

00:04:53.633 --> 00:04:55.620
SPEAKER 1: No, none of
them are close to 50.

00:04:55.620 --> 00:04:56.355
AUDIENCE: Four.

00:04:56.355 --> 00:04:57.810
SPEAKER 1: Four?

00:04:57.810 --> 00:04:59.748
And?

00:04:59.748 --> 00:05:00.720
AUDIENCE: 50!

00:05:00.720 --> 00:05:01.770
SPEAKER 1: Amazing!

00:05:01.770 --> 00:05:02.770
Very well done.

00:05:02.770 --> 00:05:04.960
[APPLAUSE]

00:05:05.785 --> 00:05:06.410
Very well done.

00:05:06.410 --> 00:05:08.390
Now, if I may, Eric,
what was the algorithm

00:05:08.390 --> 00:05:10.160
via which you found us the number 50?

00:05:10.160 --> 00:05:11.195
AUDIENCE: Linear search.

00:05:11.195 --> 00:05:13.520
SPEAKER 1: OK, linear
search, meaning what to you?

00:05:13.520 --> 00:05:17.215
AUDIENCE: You just go in a line,
starting from there until there.

00:05:17.215 --> 00:05:19.340
SPEAKER 1: OK, that was a
very sophisticated answer

00:05:19.340 --> 00:05:20.798
to a term we've not yet introduced.

00:05:20.798 --> 00:05:24.290
And that's great, linear search
from left to right, so literally

00:05:24.290 --> 00:05:25.043
following a line.

00:05:25.043 --> 00:05:26.960
And was your algorithm
correct, would you say?

00:05:26.960 --> 00:05:27.605
AUDIENCE: Yes.

00:05:27.605 --> 00:05:29.010
SPEAKER 1: OK, so it was correct.

00:05:29.010 --> 00:05:31.048
But there's these different
parameters that we

00:05:31.048 --> 00:05:33.590
want to optimize solutions for
not just correctness, but what

00:05:33.590 --> 00:05:35.110
other property as well?

00:05:35.110 --> 00:05:35.870
AUDIENCE: Design.

00:05:35.870 --> 00:05:37.953
SPEAKER 1: So maybe design,
right, the efficiency.

00:05:37.953 --> 00:05:40.100
So was that the most
efficient you could have done?

00:05:40.100 --> 00:05:43.160
AUDIENCE: Actually, yeah, I think so.

00:05:43.160 --> 00:05:43.940
[LAUGHTER]

00:05:43.940 --> 00:05:44.940
SPEAKER 1: And why do you say that?

00:05:44.940 --> 00:05:46.950
AUDIENCE: Because-- so
the numbers are sorted.

00:05:46.950 --> 00:05:50.160
So at the end of the day, I have
to look through every single one.

00:05:50.160 --> 00:05:50.360
SPEAKER 1: Yeah.

00:05:50.360 --> 00:05:52.520
AUDIENCE: And it's just by
chance that the 50 was at last.

00:05:52.520 --> 00:05:53.312
SPEAKER 1: Exactly.

00:05:53.312 --> 00:05:55.370
So it's unfortunate that
they were all random.

00:05:55.370 --> 00:05:57.680
And I didn't want to tell you because I
didn't want to bias your algorithm one

00:05:57.680 --> 00:05:58.388
way or the other.

00:05:58.388 --> 00:06:01.400
But not knowing if they're sorted
and them not even being sorted

00:06:01.400 --> 00:06:04.968
means that that is the best you
can do, look at all of the doors

00:06:04.968 --> 00:06:06.260
to find the number in question.

00:06:06.260 --> 00:06:08.908
And maybe you could have gotten
lucky, if we had put 50 here.

00:06:08.908 --> 00:06:10.700
But in the worst case,
Eric was, of course,

00:06:10.700 --> 00:06:13.280
going to have to do exactly
that, searching all of the boxes.

00:06:13.280 --> 00:06:14.480
So thank you, Eric.

00:06:14.480 --> 00:06:16.100
Stay on stage with us, if
you would, for a moment.

00:06:16.100 --> 00:06:18.230
And a round of applause, if we
could, for finding 50 so well.

00:06:18.230 --> 00:06:18.730
[APPLAUSE]

00:06:18.730 --> 00:06:21.120
[? Nizari, ?] could you come on up?

00:06:21.120 --> 00:06:23.390
We need you not to look at
the numbers, because Brian

00:06:23.390 --> 00:06:24.830
needs to do a little bit of magic.

00:06:24.830 --> 00:06:27.170
And he's going to put some of
the numbers back into the locker.

00:06:27.170 --> 00:06:29.253
So literally everyone in
the room will know what's

00:06:29.253 --> 00:06:30.932
going on except you, at the moment.

00:06:30.932 --> 00:06:33.140
But we're going to give you
the added bonus this time

00:06:33.140 --> 00:06:36.080
of sorting the numbers in advance.

00:06:36.080 --> 00:06:38.567
So Brian is in the process of
sorting some numbers for us.

00:06:38.567 --> 00:06:40.400
The goal at hand, in
just a moment, is still

00:06:40.400 --> 00:06:42.380
going to be the find the number 50.

00:06:42.380 --> 00:06:45.780
I'm really just stalling right
now because he's still doing this.

00:06:45.780 --> 00:06:49.730
So I don't really have anything
interesting to say just yet.

00:06:49.730 --> 00:06:50.660
Brian's back now.

00:06:50.660 --> 00:06:51.460
Hold on.

00:06:51.460 --> 00:06:52.790
And would you like to
introduce yourself maybe?

00:06:52.790 --> 00:06:53.580
AUDIENCE: I'm [? Nizari. ?]

00:06:53.580 --> 00:06:54.590
SPEAKER 1: [? Nizari, ?]
and what year are you?

00:06:54.590 --> 00:06:56.615
AUDIENCE: I'm a high
school student, a senior.

00:06:56.615 --> 00:06:57.490
SPEAKER 1: Wonderful.

00:06:57.490 --> 00:06:58.130
At what school?

00:06:58.130 --> 00:06:59.300
AUDIENCE: Cambridge Rindge and
Latin, it's down the street.

00:06:59.300 --> 00:07:00.572
SPEAKER 1: Just down the road.

00:07:00.572 --> 00:07:02.030
So glad you can join us here today.

00:07:02.030 --> 00:07:03.770
And perfect timing, if I may.

00:07:03.770 --> 00:07:06.230
Now we have seven
lockers here behind you.

00:07:06.230 --> 00:07:08.270
And the goal now is to
still find the number 50.

00:07:08.270 --> 00:07:10.860
But I'm going to tell you
that the numbers are sorted.

00:07:10.860 --> 00:07:14.100
So what's going to be your
algorithm, if not the same as Eric?

00:07:14.100 --> 00:07:15.100
AUDIENCE: I will start--

00:07:15.100 --> 00:07:15.510
SPEAKER 1: And here you go.

00:07:15.510 --> 00:07:17.450
AUDIENCE: I'm going to
start in the middle.

00:07:17.450 --> 00:07:19.400
SPEAKER 1: All right, go ahead
and show us what's in the middle.

00:07:19.400 --> 00:07:20.805
AUDIENCE: Middle number is seven.

00:07:20.805 --> 00:07:21.680
SPEAKER 1: All right.

00:07:21.680 --> 00:07:23.695
And now what's your
next step going to be?

00:07:23.695 --> 00:07:25.070
AUDIENCE: So I want to get to 50.

00:07:25.070 --> 00:07:27.770
So assuming that they're sorted,
I'm going to go this way.

00:07:27.770 --> 00:07:28.700
SPEAKER 1: Go to the right, OK.

00:07:28.700 --> 00:07:31.117
So we have three lockers
remaining on the right-hand side.

00:07:31.117 --> 00:07:32.240
What's your instinct now?

00:07:32.240 --> 00:07:34.620
AUDIENCE: Mm, I'm going
to start with this locker.

00:07:34.620 --> 00:07:36.530
SPEAKER 1: OK, this one being
in the middle of those three.

00:07:36.530 --> 00:07:36.910
And you find?

00:07:36.910 --> 00:07:37.940
AUDIENCE: And we got 81.

00:07:37.940 --> 00:07:38.435
SPEAKER 1: 81.

00:07:38.435 --> 00:07:39.800
AUDIENCE: So I know that's too big.

00:07:39.800 --> 00:07:40.620
SPEAKER 1: Way too far.

00:07:40.620 --> 00:07:42.260
AUDIENCE: Hoo, so I'm
going to go with this one.

00:07:42.350 --> 00:07:43.970
SPEAKER 1: Which is now in
the middle of the two lockers.

00:07:43.970 --> 00:07:44.590
AUDIENCE: And I got 50.

00:07:44.590 --> 00:07:47.574
SPEAKER 1: And a round of applause,
if we could, for [? Nizari. ?]

00:07:47.574 --> 00:07:50.240
[APPLAUSE]

00:07:50.240 --> 00:07:52.190
Congratulations and
thank you to you both.

00:07:52.190 --> 00:07:53.340
So thanks to you both.

00:07:53.340 --> 00:07:57.260
So here were two algorithms, dubbed
linear search and binary search.

00:07:57.260 --> 00:08:00.440
And that's all we have
for you right now.

00:08:00.440 --> 00:08:01.730
[LAUGHTER]

00:08:01.730 --> 00:08:04.190
So linear search and
binary search are aptly

00:08:04.190 --> 00:08:06.320
named for exactly the reasons we saw.

00:08:06.320 --> 00:08:09.230
Eric literally walked
across in a line looking

00:08:09.230 --> 00:08:12.110
for some element, where
[? Nizari, ?] instead, actually used

00:08:12.110 --> 00:08:14.510
binary search, "bi"
meaning two, and being

00:08:14.510 --> 00:08:18.080
very reminiscent of our discussion
of phone books in week 0,

00:08:18.080 --> 00:08:20.153
when I did this
divide-and-conquer approach.

00:08:20.153 --> 00:08:22.070
That, too, was called,
even though I might not

00:08:22.070 --> 00:08:24.620
have labeled it as such,
binary search because I

00:08:24.620 --> 00:08:27.995
kept dividing the problem in
two, hence the "bi" in binary.

00:08:27.995 --> 00:08:30.150
Binary search was again
and again and again,

00:08:30.150 --> 00:08:34.773
just as we did here when searching
for 50 the second time around.

00:08:34.773 --> 00:08:36.440
So these, of course, are two algorithms.

00:08:36.440 --> 00:08:39.380
But let's now start to formalize
this discussion a little bit

00:08:39.380 --> 00:08:43.340
and consider how it was each of them
was able to solve the problem correctly

00:08:43.340 --> 00:08:45.470
and then ultimately with better design.

00:08:45.470 --> 00:08:48.740
So linear search, we might
distill as pseudocode like this

00:08:48.740 --> 00:08:52.408
and again, pseudocode, English-like
syntax, no one way to write this.

00:08:52.408 --> 00:08:54.950
Eric, if I can put words in your
mouth, might have done this.

00:08:54.950 --> 00:08:58.190
You might have thought to yourself
for [? i ?] from 0 to n minus 1,

00:08:58.190 --> 00:09:03.230
to very quickly kind of map it to the
idea of code, where this is locker 0,

00:09:03.230 --> 00:09:07.820
and this is locker n minus 1 or 7
or 6, specifically, in this case,

00:09:07.820 --> 00:09:09.350
with 7 total lockers.

00:09:09.350 --> 00:09:13.670
He then checked if the ith elements--
ith just meaning the one he's currently

00:09:13.670 --> 00:09:14.330
looking at--

00:09:14.330 --> 00:09:16.910
happens to be 50, then
go ahead and return true,

00:09:16.910 --> 00:09:19.910
the bool that was meant to be
the output of this algorithm.

00:09:19.910 --> 00:09:22.490
And he kept doing that and
doing that and doing that.

00:09:22.490 --> 00:09:24.710
But suppose 50 were not there.

00:09:24.710 --> 00:09:28.160
And suppose he got all the way
here, to where there is no locker.

00:09:28.160 --> 00:09:29.736
What should he ultimately return?

00:09:29.736 --> 00:09:30.430
AUDIENCE: False.

00:09:30.430 --> 00:09:31.263
SPEAKER 1: So false.

00:09:31.263 --> 00:09:34.910
And so the very last step of this
algorithm not inside of that loop

00:09:34.910 --> 00:09:38.013
has to be kind of a catch all,
where you just say, return false.

00:09:38.013 --> 00:09:40.430
If I got all the way through
this loop and didn't find it,

00:09:40.430 --> 00:09:42.840
it must be the case that
50 is simply not there.

00:09:42.840 --> 00:09:45.660
So that might be one way to write
the pseudocode for this problem.

00:09:45.660 --> 00:09:50.600
But now let's consider for a moment just
how efficient or inefficient that code

00:09:50.600 --> 00:09:53.300
might have been vis a vis the
second algorithm, [? Nizari, ?]

00:09:53.300 --> 00:09:56.470
where she actually divided the
conquer in half in half in half.

00:09:56.470 --> 00:09:58.280
That, of course, was
called binary search.

00:09:58.280 --> 00:10:00.447
And we can write this in
any number of ways as well.

00:10:00.447 --> 00:10:02.120
But in pseudocode, I might propose this.

00:10:02.120 --> 00:10:03.870
Look right in the
middle, just as she did.

00:10:03.870 --> 00:10:07.160
And if that number is 50, what should
she have returned or outputted?

00:10:07.160 --> 00:10:08.150
AUDIENCE: True.

00:10:08.150 --> 00:10:09.650
SPEAKER 1: So true as our bool.

00:10:09.650 --> 00:10:14.600
And so we might have done this, else
if 50 were less than the middle item.

00:10:14.600 --> 00:10:16.940
She probably wanted
a search to the left,

00:10:16.940 --> 00:10:20.300
just as when I was searching for Mike
Smith, I might have gone left or right.

00:10:20.300 --> 00:10:25.250
So if 50 is less than the middle item,
she might want to search the left half.

00:10:25.250 --> 00:10:28.040
Meanwhile, if 50 is greater
than the middle item,

00:10:28.040 --> 00:10:31.160
then she might want to search
instead the right half.

00:10:31.160 --> 00:10:34.730
But there is a fourth
possibility, just to be safe here.

00:10:34.730 --> 00:10:36.050
What else might be the case?

00:10:36.050 --> 00:10:39.742
It's not in the middle, and it's not
to the left, and it's not to the right.

00:10:39.742 --> 00:10:40.700
So it's just not there.

00:10:40.700 --> 00:10:43.550
And so there's actually a fourth case,
and we can express this differently.

00:10:43.550 --> 00:10:45.425
I'm going to go ahead
and just say at the top

00:10:45.425 --> 00:10:49.070
if there's no items in the list, let me
go ahead and just claim return false.

00:10:49.070 --> 00:10:50.120
There's nothing there.

00:10:50.120 --> 00:10:53.360
After all, if I keep dividing a list
in half and half and half and half,

00:10:53.360 --> 00:10:55.250
eventually there's going
to be no list left.

00:10:55.250 --> 00:10:58.160
At which point, I should just
conclude, oh, it clearly wasn't there.

00:10:58.160 --> 00:11:02.623
If I halved it so many times, nothing
is left on the right or the left.

00:11:02.623 --> 00:11:04.040
So how might we now think of this?

00:11:04.040 --> 00:11:07.230
Well, just as in week 0,
we had a picture like this.

00:11:07.230 --> 00:11:09.740
And we claimed that these
algorithms were either

00:11:09.740 --> 00:11:11.600
linear in nature,
literally a straight line,

00:11:11.600 --> 00:11:14.780
like Eric's, or a little
more curved or logarithmic,

00:11:14.780 --> 00:11:16.640
so to speak, like
[? Nizari's. ?] And these

00:11:16.640 --> 00:11:18.098
had fundamentally different shapes.

00:11:18.098 --> 00:11:20.420
And we refer to them really
by the number of steps

00:11:20.420 --> 00:11:22.220
they might take in the worst case.

00:11:22.220 --> 00:11:26.720
If the phone book or today, the
number of lockers was n in total,

00:11:26.720 --> 00:11:29.120
it might take as many as
n steps for Eric or anyone

00:11:29.120 --> 00:11:32.790
to find Mike Smith or the
number 50 from left to right.

00:11:32.790 --> 00:11:37.910
If in week 0, I did two pages at a
time, you can actually speed that up,

00:11:37.910 --> 00:11:39.800
but the shape of the line was the same.

00:11:39.800 --> 00:11:41.360
Eric didn't do that
here, but he could have.

00:11:41.360 --> 00:11:43.970
With two hands, he maybe could
have looked at two lockers at once.

00:11:43.970 --> 00:11:46.970
So that might have been an intermediate
step between those two extremes.

00:11:46.970 --> 00:11:49.160
But logarithmic was
this more curved shape.

00:11:49.160 --> 00:11:52.010
But today, we're going to start
to formalize this a little bit

00:11:52.010 --> 00:11:54.380
so that we don't keep
talking about searching

00:11:54.380 --> 00:11:58.100
and binary search in linear search
alone, but other algorithms as well.

00:11:58.100 --> 00:12:00.470
And computer scientists
now actually have

00:12:00.470 --> 00:12:03.680
terminology with which
to describe algorithms

00:12:03.680 --> 00:12:06.140
and just how well
designed your algorithm is

00:12:06.140 --> 00:12:08.090
or how well implemented your code is.

00:12:08.090 --> 00:12:12.050
And it's generally called big O,
literally a capital, italicized O.

00:12:12.050 --> 00:12:14.870
Big O notation just
means on the order of.

00:12:14.870 --> 00:12:19.490
So if you were asked by someone what
is the efficiency of your algorithm

00:12:19.490 --> 00:12:21.620
or the efficiency of your
code, you could kind of

00:12:21.620 --> 00:12:23.910
wave your hand, literally
and figuratively,

00:12:23.910 --> 00:12:27.710
and give them an approximation of
just how fast or slow your code is.

00:12:27.710 --> 00:12:31.840
So instead of saying literally
n steps or n/2 or log n steps,

00:12:31.840 --> 00:12:33.920
a computer scientists
would typically say, ah,

00:12:33.920 --> 00:12:38.240
that algorithm is on the order
of n or on the order of n/2

00:12:38.240 --> 00:12:40.220
or on the order of log n.

00:12:40.220 --> 00:12:43.160
So this is just cryptic-looking
syntax that you pronounce verbally

00:12:43.160 --> 00:12:44.840
as "on the order of."

00:12:44.840 --> 00:12:48.470
And it's kind of written like a
math function, just as we have here.

00:12:48.470 --> 00:12:51.050
But it turns out that when
you're using big O notation,

00:12:51.050 --> 00:12:52.520
it really is kind of hand-waving.

00:12:52.520 --> 00:12:54.560
Like, it's just meant
to be an approximation.

00:12:54.560 --> 00:12:55.460
And you know what?

00:12:55.460 --> 00:12:59.030
In this case here, these
lines are so similar looking,

00:12:59.030 --> 00:13:01.370
I'm actually going to throw
away the divided by 2.

00:13:01.370 --> 00:13:03.470
And we'll see why this
is OK in just a moment.

00:13:03.470 --> 00:13:06.830
But those are so similar that I'm just
going to call them the same thing.

00:13:06.830 --> 00:13:09.950
And it turns out-- and it's fine if
you don't recall logarithms too well--

00:13:09.950 --> 00:13:11.660
the base 2 there it
doesn't really matter.

00:13:11.660 --> 00:13:12.868
I'm going to throw that away.

00:13:12.868 --> 00:13:14.510
It can be base 2 or 3 or 10.

00:13:14.510 --> 00:13:16.527
They're all within
multiples of one another.

00:13:16.527 --> 00:13:18.110
So that's no big deal either, I claim.

00:13:18.110 --> 00:13:20.330
And if you don't recall, that's OK, too.

00:13:20.330 --> 00:13:23.780
But the reason I claim that this
red line and this yellow line

00:13:23.780 --> 00:13:27.950
are essentially the same thing is
because if the problem gets big enough,

00:13:27.950 --> 00:13:30.500
that is the size of the
problem gets bigger and bigger,

00:13:30.500 --> 00:13:34.580
and I only have so much screen here-- so
let me instead just zoom out so that we

00:13:34.580 --> 00:13:37.010
see more y-axis and more x-axis.

00:13:37.010 --> 00:13:40.160
Notice how much closer the yellow and
red lines even get to one another.

00:13:40.160 --> 00:13:42.080
And honestly, if I kept
zooming out so that we

00:13:42.080 --> 00:13:44.300
could see bigger and
bigger and bigger problems,

00:13:44.300 --> 00:13:46.343
these, frankly, would
look pretty much the same.

00:13:46.343 --> 00:13:49.260
So when a computer scientist describes
the efficiency of an algorithm,

00:13:49.260 --> 00:13:53.930
they say it's on the order of n, even
if it's technically on the order of n/2.

00:13:53.930 --> 00:13:56.930
And here, too, on the order of
[? law, ?] [? again ?] irrespective

00:13:56.930 --> 00:13:57.810
of what the base is.

00:13:57.810 --> 00:13:58.730
So it's kind of nice, right?

00:13:58.730 --> 00:14:01.772
Even though it looks a little mathy,
you can still kind of wave your hand

00:14:01.772 --> 00:14:04.107
and approximate just a little bit.

00:14:04.107 --> 00:14:06.440
So there are different
algorithms, though, in the world.

00:14:06.440 --> 00:14:10.370
And here's kind of a cheat
sheet of common running times.

00:14:10.370 --> 00:14:12.530
A running time is just
how much time it takes

00:14:12.530 --> 00:14:15.140
for your program or
your algorithm to run,

00:14:15.140 --> 00:14:18.350
how many seconds does it take,
how many seconds does it take,

00:14:18.350 --> 00:14:21.680
how many steps does it make,
whenever your unit of measure is.

00:14:21.680 --> 00:14:24.470
And we'll see on the list
here some familiar terms.

00:14:24.470 --> 00:14:27.890
If I were to label this chart now with
a couple of the algorithms we've seen,

00:14:27.890 --> 00:14:30.380
linear search, we'll
say, is in big O of n.

00:14:30.380 --> 00:14:34.070
In the worst case, Eric is going to
have to look at all of the lockers,

00:14:34.070 --> 00:14:38.870
just like a few weeks ago I had to look
at all of the pages in the phone book

00:14:38.870 --> 00:14:40.910
maximally to find Mike Smith.

00:14:40.910 --> 00:14:43.430
And just to be clear,
where's binary search going

00:14:43.430 --> 00:14:46.470
to be in this list of running times?

00:14:46.470 --> 00:14:47.432
AUDIENCE: Log n.

00:14:47.432 --> 00:14:48.140
SPEAKER 1: Log n.

00:14:48.140 --> 00:14:49.280
So it's actually better.

00:14:49.280 --> 00:14:52.792
Lower on this chart is better, at
least in terms of time required,

00:14:52.792 --> 00:14:53.750
than anything above it.

00:14:53.750 --> 00:14:55.170
So we've seen this thus far.

00:14:55.170 --> 00:14:57.170
And now this sort of
invites the question, well,

00:14:57.170 --> 00:14:59.120
what algorithms kind of go here or here?

00:14:59.120 --> 00:15:00.150
Which ones are slower?

00:15:00.150 --> 00:15:01.070
Which ones are faster?

00:15:01.070 --> 00:15:03.395
That'll be one of the things
we look at here today.

00:15:03.395 --> 00:15:06.020
But computer scientist have
another sort of tool in the toolkit

00:15:06.020 --> 00:15:07.650
that we want to introduce you today.

00:15:07.650 --> 00:15:11.570
And this is just a capital
Greek omega, this symbol here.

00:15:11.570 --> 00:15:15.470
And this just refers to not a--

00:15:15.470 --> 00:15:17.450
it's the opposite of big O, if you will.

00:15:17.450 --> 00:15:21.200
Big O is essentially an upper bound on
how much time an algorithm might take.

00:15:21.200 --> 00:15:24.290
It might have taken
Eric n steps, 7 lockers,

00:15:24.290 --> 00:15:27.230
to find the number 50
because of linear search.

00:15:27.230 --> 00:15:29.540
That's big O of n, or on the order of n.

00:15:29.540 --> 00:15:32.310
That's an upper bound,
worst case in this scenario.

00:15:32.310 --> 00:15:35.550
You can use omega, though, to
describe things like best cases.

00:15:35.550 --> 00:15:39.530
So for instance, with Eric's linear
search approach in the worst case,

00:15:39.530 --> 00:15:43.670
it could have and it did take
him n steps, or 7 specifically.

00:15:43.670 --> 00:15:47.263
But in the best case, how few
steps might it have taken him?

00:15:47.263 --> 00:15:47.930
Just one, right?

00:15:47.930 --> 00:15:51.300
He might have gotten lucky, and
50 might have been just there.

00:15:51.300 --> 00:15:53.600
Similarly, when [? Nizari, ?]
when she looked for 50

00:15:53.600 --> 00:15:56.330
in the middle, how few
steps might she have

00:15:56.330 --> 00:15:59.500
needed to find 50 among her 7 lockers?

00:15:59.500 --> 00:16:00.170
AUDIENCE: One.

00:16:00.170 --> 00:16:00.830
SPEAKER 1: One step, too.

00:16:00.830 --> 00:16:02.747
She might have just
gotten lucky because Brian

00:16:02.747 --> 00:16:06.230
might have just, by coincidence or
design, put the number 50 there.

00:16:06.230 --> 00:16:09.800
So whereas you have this upper bound on
how many steps an algorithm might take,

00:16:09.800 --> 00:16:11.030
sometimes you can get lucky.

00:16:11.030 --> 00:16:13.280
And if the inputs are
in a certain order,

00:16:13.280 --> 00:16:15.740
you might get lucky and have
a lower bound on the running

00:16:15.740 --> 00:16:17.193
time that's much, much better.

00:16:17.193 --> 00:16:19.110
So we might have a chart
that looks like this.

00:16:19.110 --> 00:16:22.130
This is the same function,
so to speak, the same math.

00:16:22.130 --> 00:16:25.220
But I'm just using omega
now instead of big O.

00:16:25.220 --> 00:16:29.030
And now let's just apply some of these
algorithms to the chat here, then.

00:16:29.030 --> 00:16:33.380
Linear search is in omega of what,
so to speak, by this definition?

00:16:33.380 --> 00:16:34.422
AUDIENCE: Omega 1.

00:16:34.422 --> 00:16:35.630
SPEAKER 1: Omega of 1, right?

00:16:35.630 --> 00:16:39.705
In the best case, the lower bound
on how much time linear search might

00:16:39.705 --> 00:16:41.330
have taken Eric would just be one step.

00:16:41.330 --> 00:16:44.930
So we're going to call
linear search omega of 1.

00:16:44.930 --> 00:16:46.770
And meanwhile, when
we did binary search,

00:16:46.770 --> 00:16:50.030
secondly, it's not going to
be log n in the best case.

00:16:50.030 --> 00:16:54.710
It might be also omega of 1
because we might just get lucky.

00:16:54.710 --> 00:16:57.050
And so now we have kind
of useful rules of thumb

00:16:57.050 --> 00:16:59.810
for describing just how good or
bad your algorithm or your code

00:16:59.810 --> 00:17:05.849
might be, depending on, at least, the
inputs that are fed to that algorithm.

00:17:05.849 --> 00:17:07.730
So that's big O, and that's omega.

00:17:07.730 --> 00:17:13.130
Any questions on these two
principles, big O or omega?

00:17:13.130 --> 00:17:13.630
Yeah?

00:17:13.630 --> 00:17:22.407
AUDIENCE: [INAUDIBLE] no matter
where it starts [INAUDIBLE]??

00:17:22.407 --> 00:17:23.740
SPEAKER 1: Really good question.

00:17:23.740 --> 00:17:25.698
And we'll touch on a few
such algorithms today.

00:17:25.698 --> 00:17:28.780
But for now the question is, what's
an example of an algorithm that

00:17:28.780 --> 00:17:33.490
might be omega of n, such that in the
best case, no matter how good or bad

00:17:33.490 --> 00:17:35.167
your input is, it takes n steps?

00:17:35.167 --> 00:17:37.000
Maybe counting the
number of lockers, right?

00:17:37.000 --> 00:17:38.110
How do I do that?

00:17:38.110 --> 00:17:42.580
1, 2, 3, 4, 5, 6, 7-- my output is 7.

00:17:42.580 --> 00:17:44.080
How many steps did that take?

00:17:44.080 --> 00:17:47.060
Big O of n because in the worst
case, I had to look at all of them,

00:17:47.060 --> 00:17:49.622
but also omega of n
because in the best case,

00:17:49.622 --> 00:17:51.080
I still had to look at all of them.

00:17:51.080 --> 00:17:53.410
Otherwise, I couldn't have
given you an accurate count.

00:17:53.410 --> 00:17:55.780
So that would be an example
of an omega of n algorithm.

00:17:55.780 --> 00:17:57.103
And we'll see others over time.

00:17:57.103 --> 00:17:57.770
Other questions?

00:17:57.770 --> 00:17:58.884
Yeah?

00:17:58.884 --> 00:18:06.516
AUDIENCE: [INAUDIBLE] omega or
[INAUDIBLE] better omega value

00:18:06.516 --> 00:18:07.887
or a better O value?

00:18:07.887 --> 00:18:09.220
SPEAKER 1: Really good question.

00:18:09.220 --> 00:18:13.240
Is it better to have a really good
omega value or a really good O value?

00:18:13.240 --> 00:18:15.205
The latter, and we'll
see this over time.

00:18:15.205 --> 00:18:17.080
Really what computer
scientists tend to worry

00:18:17.080 --> 00:18:19.900
about is how their code
performs in the worst case,

00:18:19.900 --> 00:18:21.790
or maybe not even that,
in the average case.

00:18:21.790 --> 00:18:25.270
Typically, day today,
best case is nice to have.

00:18:25.270 --> 00:18:28.765
But who really cares if your code
is super fast when the input happens

00:18:28.765 --> 00:18:30.580
to be sorted for you already?

00:18:30.580 --> 00:18:32.600
That would be a corner
case, so to speak.

00:18:32.600 --> 00:18:34.960
So it's a useful tool to
describe your algorithm.

00:18:34.960 --> 00:18:39.260
But a big O and upper bound is typically
what we'll care about a little more.

00:18:39.260 --> 00:18:41.260
So let's go in and make
this a little more real.

00:18:41.260 --> 00:18:43.490
Let me go ahead and
switch over to CS50 IDE.

00:18:43.490 --> 00:18:47.440
And let me go ahead and create
a program here called numbers.c

00:18:47.440 --> 00:18:50.990
that's going to allow us to explore,
for instance, linear search.

00:18:50.990 --> 00:18:54.620
So numbers.c is going to start
off with our usual lines.

00:18:54.620 --> 00:18:57.580
So I'm going to go ahead
and include cs50.h.

00:18:57.580 --> 00:19:01.420
I'm going to go ahead and include
standard io.h, int main void,

00:19:01.420 --> 00:19:03.790
so no command line arguments for now.

00:19:03.790 --> 00:19:06.460
And in here, let me go ahead
and just declare some numbers,

00:19:06.460 --> 00:19:07.960
maybe six numbers total.

00:19:07.960 --> 00:19:11.080
And if I want to declare an array of
six numbers, recall from last week,

00:19:11.080 --> 00:19:12.670
I can literally say this.

00:19:12.670 --> 00:19:14.440
And if I want to
initialize those numbers,

00:19:14.440 --> 00:19:17.830
I can do numbers bracket 0 gets,
for instance, the number 4.

00:19:17.830 --> 00:19:21.190
Numbers bracket 1 gets
the number, say, 8.

00:19:21.190 --> 00:19:25.150
Numbers bracket 2 gets the number 15.

00:19:25.150 --> 00:19:28.000
Numbers-- OK, so this is
getting really tedious.

00:19:28.000 --> 00:19:30.460
Turns out in C, there's
a shorthand notation

00:19:30.460 --> 00:19:34.270
when you know in advance what
values you want to put in an array.

00:19:34.270 --> 00:19:42.070
I can actually go up and do this, 4,
8, 15, 16, 23, 42 with curly braces

00:19:42.070 --> 00:19:43.180
on either side.

00:19:43.180 --> 00:19:46.310
So this is just what's called
a statically initialized array.

00:19:46.310 --> 00:19:48.520
You just know in advance
what the values are.

00:19:48.520 --> 00:19:51.230
And so I can just save some
lines of code that way.

00:19:51.230 --> 00:19:54.130
But it's the same thing as the
road I was going down a moment ago.

00:19:54.130 --> 00:19:56.828
But the curly braces are
new for that little feature.

00:19:56.828 --> 00:19:58.870
Now I'm going to go ahead
and iterate over these.

00:19:58.870 --> 00:20:01.728
So for int, i gets 0, i less than 6.

00:20:01.728 --> 00:20:05.020
And I'm going to cut some corners now so
that we focus on the new stuff and not

00:20:05.020 --> 00:20:05.520
on the old.

00:20:05.520 --> 00:20:09.370
I'm hard coding 6 instead of using
a constant or something like that.

00:20:09.370 --> 00:20:13.120
But all I want to do ultimately
is search for the number 50.

00:20:13.120 --> 00:20:15.520
So what code can I now write
inside of this for loop

00:20:15.520 --> 00:20:21.440
to just ask the question,
is 50 behind this door?

00:20:21.440 --> 00:20:24.400
Someone want to call it out?

00:20:24.400 --> 00:20:24.900
Yeah?

00:20:24.900 --> 00:20:26.178
If?

00:20:26.178 --> 00:20:29.500
AUDIENCE: Number i [INAUDIBLE].

00:20:29.500 --> 00:20:33.175
SPEAKER 1: Numbers i equals and not just
single equals, but equals equals 50.

00:20:33.175 --> 00:20:34.925
I can go ahead now and
return some answer.

00:20:34.925 --> 00:20:37.675
So I'm going to go ahead [? and say ?]
printf, for instance, found

00:20:37.675 --> 00:20:38.510
in a new line.

00:20:38.510 --> 00:20:42.010
And then if I want to say
that, no 50 was found,

00:20:42.010 --> 00:20:45.460
recall, that I want to do this outside
of the loop, just like in my pseudocode

00:20:45.460 --> 00:20:45.970
earlier.

00:20:45.970 --> 00:20:48.230
So not found can go way down there.

00:20:48.230 --> 00:20:51.313
So just to be clear, what
algorithm have I implemented here?

00:20:51.313 --> 00:20:52.280
AUDIENCE: [INAUDIBLE].

00:20:52.280 --> 00:20:52.700
SPEAKER 1: Yeah.

00:20:52.700 --> 00:20:53.742
So this is linear search.

00:20:53.742 --> 00:20:58.910
This is the code incarnation of my
pseudocode in Eric's actual execution

00:20:58.910 --> 00:20:59.645
of his algorithm.

00:20:59.645 --> 00:21:01.020
So let me go ahead and save this.

00:21:01.020 --> 00:21:04.220
Let me go ahead and make numbers,
no error messages, which is good,

00:21:04.220 --> 00:21:08.840
dot slash numbers and Enter, or I
should see what when I hit Enter here?

00:21:08.840 --> 00:21:10.100
AUDIENCE: Not found.

00:21:10.100 --> 00:21:13.640
SPEAKER 1: Hopefully not found because
indeed 50 is not among those numbers.

00:21:13.640 --> 00:21:17.170
So that's interesting, but it's
mostly warm up from last week.

00:21:17.170 --> 00:21:18.920
Why don't we consider
a different problem,

00:21:18.920 --> 00:21:22.090
where now we might want to search not
just for numbers, but maybe names.

00:21:22.090 --> 00:21:24.590
Like, if the goal is to search
a phone book, let me go ahead

00:21:24.590 --> 00:21:28.950
and create names.c that allows me
to search now for names in an array.

00:21:28.950 --> 00:21:31.340
So let me go ahead and include cs50.h.

00:21:31.340 --> 00:21:34.100
Let me go ahead and
include standard io.h.

00:21:34.100 --> 00:21:36.620
Let me go ahead and do it int main void.

00:21:36.620 --> 00:21:40.220
And then down here let me go
ahead and give myself an array,

00:21:40.220 --> 00:21:42.560
so an array of string called names.

00:21:42.560 --> 00:21:44.870
I'm going to go ahead and
give myself four names.

00:21:44.870 --> 00:21:48.170
And just like last time, I can
do names bracket 0 gets Emma.

00:21:48.170 --> 00:21:52.190
Or again, to save myself time,
I can cut a few corners here

00:21:52.190 --> 00:21:57.740
and say Emma, Rodrigo, Brian,
David, just like last week,

00:21:57.740 --> 00:21:59.210
capitalized just because.

00:21:59.210 --> 00:22:04.610
So that's another way of writing the
same code as with more lines than that.

00:22:04.610 --> 00:22:06.410
Now I'm going to do int i gets 0.

00:22:06.410 --> 00:22:10.130
i is less than 5, in
this case, i plus plus.

00:22:10.130 --> 00:22:12.800
And now things get a little
interesting because I

00:22:12.800 --> 00:22:17.370
might want to say if names bracket i
equals equals-- let's not search for 50

00:22:17.370 --> 00:22:17.870
now.

00:22:17.870 --> 00:22:20.030
Let's search for Emma,
just like last week.

00:22:20.030 --> 00:22:24.530
I want to go ahead and
say found if I find Emma,

00:22:24.530 --> 00:22:28.700
else down here I want to say not found.

00:22:28.700 --> 00:22:31.050
The catch is that this will not work.

00:22:31.050 --> 00:22:31.550
Sorry.

00:22:31.550 --> 00:22:32.925
It's a little warm up here today.

00:22:32.925 --> 00:22:36.830
The catch is this will not work,
even though I'm pretty much

00:22:36.830 --> 00:22:39.860
doing exactly what I did last time.

00:22:39.860 --> 00:22:43.310
What might the intuition be, especially
if you've never studied C before,

00:22:43.310 --> 00:22:47.570
as to why line 10 here
won't actually work

00:22:47.570 --> 00:22:51.246
as easily as numbers did a moment ago?

00:22:51.246 --> 00:22:51.746
Yeah?

00:22:51.746 --> 00:22:53.080
AUDIENCE: Difference in data type.

00:22:53.080 --> 00:22:54.370
SPEAKER 1: Difference in
data type, and what do

00:22:54.370 --> 00:22:56.462
what are the differences, to be clear?

00:22:56.462 --> 00:22:58.893
AUDIENCE: [INAUDIBLE]

00:22:58.893 --> 00:22:59.560
SPEAKER 1: Yeah.

00:22:59.560 --> 00:23:05.158
AUDIENCE: [INAUDIBLE] of
the array [INAUDIBLE]..

00:23:05.158 --> 00:23:05.950
SPEAKER 1: Exactly.

00:23:05.950 --> 00:23:08.760
You can't use equals equals
4 strings because, remember,

00:23:08.760 --> 00:23:12.390
a string is not a data type, like
a char, a bool, a float, an int.

00:23:12.390 --> 00:23:15.120
Remember, it's actually
an array and an array

00:23:15.120 --> 00:23:17.070
that likely has multiple characters.

00:23:17.070 --> 00:23:19.320
And odds if you want
to compare two strings,

00:23:19.320 --> 00:23:23.550
you probably intuitively need to compare
all of the characters in those strings,

00:23:23.550 --> 00:23:25.170
not just the whole thing at once.

00:23:25.170 --> 00:23:27.480
In other languages, if
you use Python or Java,

00:23:27.480 --> 00:23:30.030
you can actually do this in
one line, just like this.

00:23:30.030 --> 00:23:32.430
But in C, everything
is much more low level.

00:23:32.430 --> 00:23:35.460
If you want to compare strings,
you can't use equal equals.

00:23:35.460 --> 00:23:37.260
However, it turns out
there's a function,

00:23:37.260 --> 00:23:39.360
and you might have even
used this in p-set 2,

00:23:39.360 --> 00:23:43.200
if you took this approach, where you
can actually compare two strings.

00:23:43.200 --> 00:23:45.400
So I'm going to delete
this line and instead say,

00:23:45.400 --> 00:23:50.430
str comp, for string comparison, names
bracket i being the first string I want

00:23:50.430 --> 00:23:53.850
to compare and then, quote unquote,
"Emma" being the second string

00:23:53.850 --> 00:23:54.808
that I want to compare.

00:23:54.808 --> 00:23:57.017
And you would only know this
from having been told it

00:23:57.017 --> 00:23:58.410
or reading in the documentation.

00:23:58.410 --> 00:24:05.190
This function str compare returns
0 if two strings are the same.

00:24:05.190 --> 00:24:07.098
It happens to return a
positive number if one

00:24:07.098 --> 00:24:09.390
comes after the other
alphabetically or negative number

00:24:09.390 --> 00:24:11.265
if one comes before the
other alphabetically.

00:24:11.265 --> 00:24:15.240
But for today, we're just using it to
test equality of strings, so to speak.

00:24:15.240 --> 00:24:17.350
So let me go ahead and save this.

00:24:17.350 --> 00:24:22.050
Let me go ahead and scroll up
here and do make names this time.

00:24:22.050 --> 00:24:25.350
And unfortunately, I can't just
use this function, it seems.

00:24:25.350 --> 00:24:27.330
And while it's fine
certainly to keep using

00:24:27.330 --> 00:24:30.360
help 50 to understand these
messages, any thoughts as to what

00:24:30.360 --> 00:24:32.964
I've done wrong?

00:24:32.964 --> 00:24:33.910
AUDIENCE: [INAUDIBLE]

00:24:33.910 --> 00:24:34.300
SPEAKER 1: Yeah.

00:24:34.300 --> 00:24:36.520
I mean, I can't quite understand
all of the words on the screen,

00:24:36.520 --> 00:24:37.600
frankly, at first glance.

00:24:37.600 --> 00:24:40.150
But string.h is something
we've seen before.

00:24:40.150 --> 00:24:43.040
And indeed, if you read the
documentation or the manual page,

00:24:43.040 --> 00:24:46.660
you'll see that str compare,
indeed, comes in string.h

00:24:46.660 --> 00:24:48.700
so I need to put this up here.

00:24:48.700 --> 00:24:53.800
And now if I save my file and recompile
my code down here with make names,

00:24:53.800 --> 00:24:54.730
now it compiles.

00:24:54.730 --> 00:24:59.545
And if I do dot slash names, I
should see, hmm, interesting,

00:24:59.545 --> 00:25:02.290
a mixed message, literally.

00:25:02.290 --> 00:25:05.245
So is Emma there or
not there in my array?

00:25:08.000 --> 00:25:09.590
She's obviously there.

00:25:09.590 --> 00:25:11.300
And yet she's somehow not there.

00:25:11.300 --> 00:25:13.010
So what have I done wrong logically.

00:25:13.010 --> 00:25:13.510
Yeah?

00:25:13.510 --> 00:25:20.200
AUDIENCE: Do you have
[INAUDIBLE] if it's found or not.

00:25:20.200 --> 00:25:23.343
So [INAUDIBLE] is not found [INAUDIBLE].

00:25:23.343 --> 00:25:24.010
SPEAKER 1: Yeah.

00:25:24.010 --> 00:25:27.890
So it's this not found that
I'm just blindingly printing

00:25:27.890 --> 00:25:29.510
at the end as a sort of catch all.

00:25:29.510 --> 00:25:34.070
But really, if I execute
found or print found up here,

00:25:34.070 --> 00:25:37.780
what should I really be
doing maybe right after that?

00:25:37.780 --> 00:25:38.330
Returning.

00:25:38.330 --> 00:25:39.663
And we looked at this last week.

00:25:39.663 --> 00:25:42.950
Recall that if you want to go ahead
and return a successful outcome,

00:25:42.950 --> 00:25:45.110
the convention is to return 0.

00:25:45.110 --> 00:25:47.163
And actually down here,
if you're unsuccessful,

00:25:47.163 --> 00:25:48.830
what should be perhaps returned instead?

00:25:48.830 --> 00:25:49.398
AUDIENCE: 1.

00:25:49.398 --> 00:25:49.940
SPEAKER 1: 1.

00:25:49.940 --> 00:25:51.980
And again, these are totally
arbitrary conventions.

00:25:51.980 --> 00:25:53.630
You just kind of learn them as you go.

00:25:53.630 --> 00:25:54.860
But 0 mean success.

00:25:54.860 --> 00:25:57.020
1 tends to mean failure.

00:25:57.020 --> 00:25:58.260
And that now lines up.

00:25:58.260 --> 00:26:01.440
So now my function main
will essentially exit early.

00:26:01.440 --> 00:26:05.030
So if I go ahead and run make names
and then do dot slash names, now

00:26:05.030 --> 00:26:07.580
if I'm searching for Emma
in that array of four names,

00:26:07.580 --> 00:26:11.480
she's found and only found.

00:26:11.480 --> 00:26:15.620
Any questions, then, on this here?

00:26:15.620 --> 00:26:18.560
All right, well, what if I
want to do one further thing

00:26:18.560 --> 00:26:21.740
and combine these two ideas
into one final program, namely

00:26:21.740 --> 00:26:22.762
that of a phone book?

00:26:22.762 --> 00:26:24.470
So let me go ahead
and close these files.

00:26:24.470 --> 00:26:26.430
Let me go ahead and
give myself a new file.

00:26:26.430 --> 00:26:28.430
I'll call it phonebook.c.

00:26:28.430 --> 00:26:31.310
And let's actually integrate
all of these building blocks

00:26:31.310 --> 00:26:33.680
as follows, cs50.h again.

00:26:33.680 --> 00:26:36.200
I'm going to go ahead and
include standard io.h.

00:26:36.200 --> 00:26:39.390
I'm going to go ahead and
include string.h just as before.

00:26:39.390 --> 00:26:41.210
And now I'm going to do int main void.

00:26:41.210 --> 00:26:44.420
And now I want to implement the
idea of searching a phone book,

00:26:44.420 --> 00:26:48.540
just like in week 0, but now doing
it in C. So let's keep it simple.

00:26:48.540 --> 00:26:52.918
And we'll have just four names in this
phone book, so string names 4 equals.

00:26:52.918 --> 00:26:54.710
And I'm going to use
my same new trick just

00:26:54.710 --> 00:26:58.520
to save myself some lines
of code, Emma, Rodrigo,

00:26:58.520 --> 00:27:03.500
and then, quote unquote,
"Brian," quote unquote, "myself."

00:27:03.500 --> 00:27:05.000
But then our numbers.

00:27:05.000 --> 00:27:08.990
So how should we store phone number,
would you propose, what data type?

00:27:11.990 --> 00:27:13.022
AUDIENCE: [INAUDIBLE]

00:27:13.022 --> 00:27:13.730
SPEAKER 1: Sorry?

00:27:13.730 --> 00:27:14.750
AUDIENCE: String.

00:27:14.750 --> 00:27:15.140
SPEAKER 1: String?

00:27:15.140 --> 00:27:15.680
Why string?

00:27:15.680 --> 00:27:18.160
I feel like phone numbers
are numbers and strings--

00:27:18.160 --> 00:27:21.864
AUDIENCE: Maybe if you store it
as a [INAUDIBLE] or an integer,

00:27:21.864 --> 00:27:25.910
then it's implied that you
need to do much [INAUDIBLE]..

00:27:25.910 --> 00:27:30.728
You don't have [INAUDIBLE]
the [INAUDIBLE],,

00:27:30.728 --> 00:27:34.200
like, [? add ?] [INAUDIBLE]
number a dash or something.

00:27:34.200 --> 00:27:36.468
It would be really hard
to manipulate an integer.

00:27:36.468 --> 00:27:37.260
SPEAKER 1: Exactly.

00:27:37.260 --> 00:27:41.040
So to summarize if a phone number has
dashes in it or parentheses or maybe

00:27:41.040 --> 00:27:43.345
plus signs abroad, those are characters.

00:27:43.345 --> 00:27:44.220
Those aren't numbers.

00:27:44.220 --> 00:27:45.803
So they won't fit in ints or in longs.

00:27:45.803 --> 00:27:48.762
So even though we call it a phone
number, now that you're a programmer,

00:27:48.762 --> 00:27:51.720
it's not really a number so much as
a string that looks like a number.

00:27:51.720 --> 00:27:53.820
So string is probably
the better bet here.

00:27:53.820 --> 00:27:56.100
And if you consider, too,
in certain geographies,

00:27:56.100 --> 00:27:59.340
you sometimes have to dial 0 to
dial someone's number if it's local.

00:27:59.340 --> 00:28:01.842
But if it's a 0, it's going
to get dropped mathematically

00:28:01.842 --> 00:28:03.300
because leading zeros don't matter.

00:28:03.300 --> 00:28:07.680
So again, modeling things that look like
numbers but really aren't as integers

00:28:07.680 --> 00:28:08.870
is probably the wrong call.

00:28:08.870 --> 00:28:11.545
So let's indeed do string numbers.

00:28:11.545 --> 00:28:13.170
And I'll give myself four numbers here.

00:28:13.170 --> 00:28:19.170
And let's do 617 555 how about, 0100.

00:28:19.170 --> 00:28:26.580
We'll do 617 555 just
like in the movies, 0101.

00:28:26.580 --> 00:28:27.660
Let me fix that.

00:28:27.660 --> 00:28:34.620
Then we'll do 617 555 [? 0102. ?] And
then lastly my number, which shall be--

00:28:34.620 --> 00:28:39.360
whoops-- which shall be 617 555 0103.

00:28:39.360 --> 00:28:41.550
And I'm doing a same
kind of trick, but this

00:28:41.550 --> 00:28:44.910
is giving me now two arrays, one
called names, one called numbers.

00:28:44.910 --> 00:28:50.580
Here we go, for int i gets 0, i less
than 4 i plus plus, so same quick loop

00:28:50.580 --> 00:28:51.720
as before.

00:28:51.720 --> 00:28:53.400
I'm going to go ahead and compare now.

00:28:53.400 --> 00:28:54.390
I'm searching for Emma.

00:28:54.390 --> 00:28:57.810
And specifically now I'm searching
for her number not just her name.

00:28:57.810 --> 00:29:01.350
So I want to print out her number
this time not just found or not found.

00:29:01.350 --> 00:29:06.990
So as before, I can say if comparing
the two strings at names bracket i

00:29:06.990 --> 00:29:12.480
and, quote unquote, "Emma" equals
equals 0, I know that I found Emma.

00:29:12.480 --> 00:29:16.500
And if I want to go ahead and print
out Emma's phone number, what should

00:29:16.500 --> 00:29:18.930
I do here?

00:29:18.930 --> 00:29:20.670
It's not names.

00:29:20.670 --> 00:29:22.883
It's not numbers.

00:29:22.883 --> 00:29:24.300
What should go between the quotes?

00:29:24.300 --> 00:29:25.623
AUDIENCE: [INAUDIBLE].

00:29:25.623 --> 00:29:28.540
SPEAKER 1: Yeah, so [? %s, ?] remember,
just our familiar place holder

00:29:28.540 --> 00:29:29.130
for strings.

00:29:29.130 --> 00:29:32.760
And then here not names because
I know I'm looking for Emma.

00:29:32.760 --> 00:29:34.500
Here I want to go ahead and put number.

00:29:34.500 --> 00:29:38.520
So it's a separate array, but it's
at the same location, bracket 1.

00:29:38.520 --> 00:29:40.060
Let me go ahead and save that.

00:29:40.060 --> 00:29:42.870
And down here, I'm going to go
ahead and say printf not found,

00:29:42.870 --> 00:29:45.630
if we don't find Emma, even though
we surely will in this case.

00:29:45.630 --> 00:29:47.005
And I'm going to learn my lesson.

00:29:47.005 --> 00:29:51.120
I'm going to return 0 for success and
return 1 for failure in this case.

00:29:51.120 --> 00:29:54.450
Let me save the file, scroll my
terminal window up a little bit,

00:29:54.450 --> 00:29:59.160
do make phone book Enter,
compiles OK dot slash phone book.

00:29:59.160 --> 00:30:02.706
And what should I see when
I run the program now?

00:30:02.706 --> 00:30:04.600
AUDIENCE: [INAUDIBLE]

00:30:04.600 --> 00:30:09.610
SPEAKER 1: 617 555 0100, hopefully.

00:30:09.610 --> 00:30:11.860
So this code is correct.

00:30:11.860 --> 00:30:14.500
And this is an opportunity now
for us to criticize it, though,

00:30:14.500 --> 00:30:16.150
along a different line.

00:30:16.150 --> 00:30:17.150
This is correct.

00:30:17.150 --> 00:30:20.740
I've got two arrays, both of size
4, one with names, one with numbers,

00:30:20.740 --> 00:30:23.710
code finds Emma, prints
her number, returns 0.

00:30:23.710 --> 00:30:26.050
I seem to have done
everything correctly.

00:30:26.050 --> 00:30:30.520
But does anything rub you the wrong way
perhaps about the design of this code?

00:30:30.520 --> 00:30:33.250
Could we do better?

00:30:33.250 --> 00:30:36.490
Is there's something that's a little
arbitrary, a little contrived,

00:30:36.490 --> 00:30:39.680
a little dangerous about this code?

00:30:39.680 --> 00:30:40.690
Any glimpses?

00:30:40.690 --> 00:30:41.734
Yeah, over here?

00:30:41.734 --> 00:30:43.630
AUDIENCE: [INAUDIBLE].

00:30:43.630 --> 00:30:45.134
SPEAKER 1: Sorry, a little louder.

00:30:45.134 --> 00:30:47.444
AUDIENCE: [INAUDIBLE] [? two ?]
[? single digit ?] [? on both sides ?]

00:30:47.444 --> 00:30:48.370
[INAUDIBLE].

00:30:48.370 --> 00:30:50.980
SPEAKER 1: So we could use
a two-dimensional array

00:30:50.980 --> 00:30:52.420
to store data like this.

00:30:52.420 --> 00:30:54.253
I would propose it's
not strictly necessary,

00:30:54.253 --> 00:30:56.378
and it might make things
a little more complicated,

00:30:56.378 --> 00:30:58.030
but a reasonable alternative as well.

00:30:58.030 --> 00:30:58.974
Other thoughts?

00:30:58.974 --> 00:31:01.580
AUDIENCE: [INAUDIBLE]
Emma's number [INAUDIBLE]..

00:31:01.580 --> 00:31:04.600
SPEAKER 1: Yeah, it's assuming that
Emma's number is the first one.

00:31:04.600 --> 00:31:06.220
And that seems reasonable, right?

00:31:06.220 --> 00:31:07.300
Emma's name is first.

00:31:07.300 --> 00:31:08.950
So presumably her number's first.

00:31:08.950 --> 00:31:10.270
Rodrigo's name is second.

00:31:10.270 --> 00:31:12.070
So presumably his number is second.

00:31:12.070 --> 00:31:13.750
And that might be true.

00:31:13.750 --> 00:31:17.380
But frankly, that's the concern,
this sort of honor system

00:31:17.380 --> 00:31:19.810
that I promised to keep the
names in the right order,

00:31:19.810 --> 00:31:23.080
and I promise to keep the numbers
in the right order, when really,

00:31:23.080 --> 00:31:26.410
that is just sort of an unspoken
agreement between me and myself,

00:31:26.410 --> 00:31:28.810
or if I'm working with
colleagues or classmates,

00:31:28.810 --> 00:31:31.273
that we all just agree to
keep those things in sync.

00:31:31.273 --> 00:31:32.440
And that's dangerous, right?

00:31:32.440 --> 00:31:35.710
If you had more numbers than four,
you could imagine things very quickly

00:31:35.710 --> 00:31:37.240
getting slightly out of order.

00:31:37.240 --> 00:31:39.970
Or god forbid, you sort
the names alphabetically,

00:31:39.970 --> 00:31:43.970
how do you go about sorting the numbers
as well and keeping things together?

00:31:43.970 --> 00:31:46.780
So this feels like an opportunity
for one new feature in C

00:31:46.780 --> 00:31:48.820
and in programming
languages more generally,

00:31:48.820 --> 00:31:52.540
whereby we can actually keep these
pieces of data, someone's name

00:31:52.540 --> 00:31:54.010
and number, together.

00:31:54.010 --> 00:31:56.140
And today we give
ourselves the opportunity

00:31:56.140 --> 00:31:58.450
to introduce our own custom types.

00:31:58.450 --> 00:32:01.930
We've seen ints and bools and
floats and longs and strings.

00:32:01.930 --> 00:32:04.390
And string, recall, is
a custom CS50 data type.

00:32:04.390 --> 00:32:07.840
And we'll take that one away in a
couple of weeks as a training wheel.

00:32:07.840 --> 00:32:11.650
But today let's give ourselves
our own data type as follows.

00:32:11.650 --> 00:32:14.030
Typedef is our new keyword today.

00:32:14.030 --> 00:32:16.360
And it literally means define a type.

00:32:16.360 --> 00:32:17.710
It's going to be a structure.

00:32:17.710 --> 00:32:20.740
And so struct in C is an
actual keyword, and it

00:32:20.740 --> 00:32:26.500
refers to a container, inside of which
you can put multiple other data types.

00:32:26.500 --> 00:32:29.620
Struct is a container
for multiple data types.

00:32:29.620 --> 00:32:31.720
What do I want to contain?

00:32:31.720 --> 00:32:34.210
Well, I want to give
myself a name for everyone.

00:32:34.210 --> 00:32:36.580
And I want to give myself
a number for everyone,

00:32:36.580 --> 00:32:39.880
even though it's a string because phone
numbers can have dashes and parentheses

00:32:39.880 --> 00:32:40.900
and so forth.

00:32:40.900 --> 00:32:41.650
And you know what?

00:32:41.650 --> 00:32:45.070
The name I'm going to give to this
structure is going to be person.

00:32:45.070 --> 00:32:46.390
It's a simple person.

00:32:46.390 --> 00:32:49.420
But using this syntax,
I can teach my compiler,

00:32:49.420 --> 00:32:51.490
[INAUDIBLE] in this case,
that not only are there

00:32:51.490 --> 00:32:55.240
ints and floats and chars and
bools and so forth and strings,

00:32:55.240 --> 00:32:59.710
there are also person types now in C.
They didn't come with the language.

00:32:59.710 --> 00:33:04.360
But I'm inventing them now with
typedef struct person, inside of which,

00:33:04.360 --> 00:33:06.700
or encapsulated, so to
speak, inside of which

00:33:06.700 --> 00:33:09.970
is going to be two
things, name and number.

00:33:09.970 --> 00:33:11.570
So what can I do with this?

00:33:11.570 --> 00:33:15.820
Well, my code gets a little different
but better designed, I would argue.

00:33:15.820 --> 00:33:19.150
Down in my code now, I'm going to
give myself an array of people.

00:33:19.150 --> 00:33:20.560
There's four of us on the staff.

00:33:20.560 --> 00:33:22.960
And I want to give myself
an array of four people.

00:33:22.960 --> 00:33:26.140
So I might do literally the
same approach I've always

00:33:26.140 --> 00:33:27.520
done when declaring a data type.

00:33:27.520 --> 00:33:28.645
What data type do you want?

00:33:28.645 --> 00:33:29.470
Person.

00:33:29.470 --> 00:33:31.330
And what should my array be called?

00:33:31.330 --> 00:33:32.860
Well, I could call it persons.

00:33:32.860 --> 00:33:35.320
Or frankly, I could just
call it people in English.

00:33:35.320 --> 00:33:37.450
And how many people do
I want to represent?

00:33:37.450 --> 00:33:38.470
Four.

00:33:38.470 --> 00:33:40.900
So my array is called people.

00:33:40.900 --> 00:33:42.340
It's a size 4.

00:33:42.340 --> 00:33:46.070
And each element in that
array is going to be a person.

00:33:46.070 --> 00:33:48.110
So this syntax is not new.

00:33:48.110 --> 00:33:50.800
This syntax up here is new.

00:33:50.800 --> 00:33:54.220
But as of today now,
persons exist in C. Now,

00:33:54.220 --> 00:33:57.550
my syntax here does have to change
a little bit, but not all that much.

00:33:57.550 --> 00:34:01.450
Now, if I want to go ahead and fill this
array, I can do something like this.

00:34:01.450 --> 00:34:03.700
Emma will be our 0th person.

00:34:03.700 --> 00:34:07.570
But I don't just do something like this
because, quote unquote, "Emma" is not

00:34:07.570 --> 00:34:08.409
a person.

00:34:08.409 --> 00:34:10.989
Quote unquote "Emma" is a name.

00:34:10.989 --> 00:34:15.409
And quote unquote "617
555 0100" is a number.

00:34:15.409 --> 00:34:17.590
So I actually need to be
a little more specific.

00:34:17.590 --> 00:34:23.320
I need to say that
people 0 name is Emma.

00:34:23.320 --> 00:34:27.610
And then people 0 number
is whatever Emma's was,

00:34:27.610 --> 00:34:31.460
which was 617 555 0100 semicolon.

00:34:31.460 --> 00:34:35.620
And now I can do the same thing
again, so people bracket 1 dot name

00:34:35.620 --> 00:34:37.300
gets Rodrigo.

00:34:37.300 --> 00:34:43.719
People bracket 1 dot number
gets 617 555 0101 semicolon.

00:34:43.719 --> 00:34:47.679
People bracket 2 dot name gets Brian.

00:34:47.679 --> 00:34:54.130
And people bracket 2 dot
number gets 617 555--

00:34:54.130 --> 00:34:56.469
555-- 0102.

00:34:56.469 --> 00:34:58.900
And then lastly-- it's
getting tedious quickly.

00:34:58.900 --> 00:35:02.380
But in an ideal world, we would
just ask the human for these inputs.

00:35:02.380 --> 00:35:03.880
Name will be mine.

00:35:03.880 --> 00:35:07.570
And then lastly, people bracket 3
dot number equals, quote unquote,

00:35:07.570 --> 00:35:10.390
"617 555 0103."

00:35:10.390 --> 00:35:11.750
Whew.

00:35:11.750 --> 00:35:14.750
So it's a little more
to write in this case.

00:35:14.750 --> 00:35:16.990
And so it might rub you the
wrong way in that sense.

00:35:16.990 --> 00:35:19.990
But notice that we're now kind of
encapsulating everything together.

00:35:19.990 --> 00:35:23.920
We only have four values,
each of which is a person.

00:35:23.920 --> 00:35:26.950
And each of those persons,
inside of them, so to speak,

00:35:26.950 --> 00:35:28.570
have a name and a number.

00:35:28.570 --> 00:35:30.700
And everything is intricately related.

00:35:30.700 --> 00:35:33.440
So even if I sought
these things by name,

00:35:33.440 --> 00:35:37.790
they're going to end up having the same
associations between numbers and names.

00:35:37.790 --> 00:35:40.720
So now the last thing I have to
do is change my logic down here.

00:35:40.720 --> 00:35:45.940
It's not sufficient anymore to
compare names bracket i against Emma.

00:35:45.940 --> 00:35:48.650
What should I compare name against Emma?

00:35:48.650 --> 00:35:50.090
AUDIENCE: [INAUDIBLE].

00:35:50.090 --> 00:35:51.820
SPEAKER 1: Dot name.

00:35:51.820 --> 00:35:55.180
And then down here, numbers
doesn't even-- oh, and this was--

00:35:55.180 --> 00:35:56.680
this is people.

00:35:56.680 --> 00:35:58.000
Numbers doesn't exist either.

00:35:58.000 --> 00:35:58.840
It's people.

00:35:58.840 --> 00:36:00.340
But I want to print her number here.

00:36:00.340 --> 00:36:02.800
So I do dot number.

00:36:02.800 --> 00:36:04.930
So again, we've add a
little bit of complexity

00:36:04.930 --> 00:36:07.540
by adding typedef and
these dot notations.

00:36:07.540 --> 00:36:11.680
But if I go ahead and make my phone
book now, all too many errors.

00:36:11.680 --> 00:36:13.360
Oh, interesting.

00:36:13.360 --> 00:36:18.310
Array index 4 is past the end of the
array, which contains four elements.

00:36:18.310 --> 00:36:21.340
So I made a stupid mistake here.

00:36:21.340 --> 00:36:22.185
What did I do?

00:36:22.185 --> 00:36:23.060
AUDIENCE: [INAUDIBLE]

00:36:23.060 --> 00:36:23.727
SPEAKER 1: Yeah.

00:36:23.727 --> 00:36:25.990
So I just kept incrementing incorrectly.

00:36:25.990 --> 00:36:28.630
Let me save that, run
make phone book, Enter.

00:36:28.630 --> 00:36:29.650
Now it's good.

00:36:29.650 --> 00:36:35.080
Dot slash phone book, Enter, and
hopefully I will see Emma's number.

00:36:35.080 --> 00:36:37.240
So it's no more correct than before.

00:36:37.240 --> 00:36:40.900
But it's arguably better designed/
and we'll come back to this later

00:36:40.900 --> 00:36:41.630
in the semester.

00:36:41.630 --> 00:36:43.338
[? As ?] you choose
your choice of tracks

00:36:43.338 --> 00:36:47.230
and start implementing applications
for the web or mobile devices or games,

00:36:47.230 --> 00:36:50.020
it's going to be quite common to
encapsulate related information

00:36:50.020 --> 00:36:52.330
like this so that you
keep lots of information

00:36:52.330 --> 00:36:55.000
together, especially when you
use something called a database.

00:36:55.000 --> 00:36:55.738
Yeah?

00:36:55.738 --> 00:36:58.037
AUDIENCE: [INAUDIBLE]

00:36:58.037 --> 00:37:00.620
SPEAKER 1: Is there any shortcut
for writing everything I did?

00:37:00.620 --> 00:37:02.760
Yes, you can actually use
curly bracket notation.

00:37:02.760 --> 00:37:05.510
It gets a little uglier in this case
so I'm not going to bother doing it.

00:37:05.510 --> 00:37:06.927
But, yes, there is a way to do it.

00:37:06.927 --> 00:37:08.968
However, this is, at the
end of the day, realize,

00:37:08.968 --> 00:37:11.180
kind of a silly program
because I'm writing a program

00:37:11.180 --> 00:37:13.550
to find Emma in a list
of names I already wrote.

00:37:13.550 --> 00:37:14.700
So it's not dynamic at all.

00:37:14.700 --> 00:37:18.230
So in an ideal world, we would be using
get string or something fancier anyway.

00:37:18.230 --> 00:37:21.180
Other questions on this?

00:37:21.180 --> 00:37:21.860
All right.

00:37:21.860 --> 00:37:24.650
So this is only to say we
clearly have the ability, then,

00:37:24.650 --> 00:37:30.140
in code to implement these ideas,
like [? Nizari ?] and Eric implemented

00:37:30.140 --> 00:37:33.980
more physically, using something
like this array of lockers.

00:37:33.980 --> 00:37:35.640
So where do we go from here?

00:37:35.640 --> 00:37:38.450
Well, unfortunately, [? Nizari ?]
benefited from the fact

00:37:38.450 --> 00:37:41.690
that the lockers were,
of course, already

00:37:41.690 --> 00:37:43.940
sorted sort of behind her by Brian.

00:37:43.940 --> 00:37:45.638
But there were some price paid, right?

00:37:45.638 --> 00:37:48.680
Indeed, even we had to wait a little
bit of time for all of those numbers

00:37:48.680 --> 00:37:50.513
to get sorted in the
lockers before we could

00:37:50.513 --> 00:37:52.800
proceed to execute that algorithm.

00:37:52.800 --> 00:37:55.350
So a question, then,
reasonable to ask is, well,

00:37:55.350 --> 00:37:57.590
how expensive is it to sort numbers?

00:37:57.590 --> 00:38:00.080
And should you sort
numbers and then search?

00:38:00.080 --> 00:38:02.120
Or should you just jump
right into searching

00:38:02.120 --> 00:38:05.150
and not worry about sorting
the numbers, especially if one

00:38:05.150 --> 00:38:06.650
might be more costly than the other?

00:38:06.650 --> 00:38:08.483
These are going to be
ultimately trade-offs.

00:38:08.483 --> 00:38:09.960
So let's consider them as follows.

00:38:09.960 --> 00:38:13.190
If now the problem at hand is to
provide as input to our problem

00:38:13.190 --> 00:38:16.160
an unsorted list of
numbers, the goal of which

00:38:16.160 --> 00:38:19.010
is to get a sorted list
of numbers back out,

00:38:19.010 --> 00:38:20.990
how do we go about implementing this?

00:38:20.990 --> 00:38:26.720
For instance, if the numbers are
7, 2, 1, 6, 3, 4, 50 in that order,

00:38:26.720 --> 00:38:29.390
that unsorted order, the
goal at hand is to get out

00:38:29.390 --> 00:38:34.700
1, 2, 3, 4, 6, 7, 50 in sorted
order from left to right,

00:38:34.700 --> 00:38:36.570
smallest to largest.

00:38:36.570 --> 00:38:39.710
So how can we go about
implementing that idea?

00:38:39.710 --> 00:38:41.110
Well, let me go ahead and see.

00:38:41.110 --> 00:38:42.470
We have a few stress balls left.

00:38:42.470 --> 00:38:44.803
And we could perhaps do this
a little dramatically maybe

00:38:44.803 --> 00:38:47.330
with eight volunteers, if you will.

00:38:47.330 --> 00:38:48.350
OK, that's a plan.

00:38:48.350 --> 00:38:58.100
OK, so 1, about 2, 3, if we could, OK, 4
in the middle there, 5, 6, 7, and let's

00:38:58.100 --> 00:38:58.820
see--

00:38:58.820 --> 00:39:02.250
and let's see, [INAUDIBLE]
can come up here.

00:39:02.250 --> 00:39:03.170
Can we do it after?

00:39:03.170 --> 00:39:03.808
OK, thanks.

00:39:03.808 --> 00:39:05.850
And how about-- wait, I
saw a hand in the middle.

00:39:05.850 --> 00:39:07.770
How about eight,
volunteered by your friends.

00:39:07.770 --> 00:39:08.780
Come on up.

00:39:08.780 --> 00:39:10.042
So come on up, if you would.

00:39:10.042 --> 00:39:11.750
And Brian, if we could
go ahead and equip

00:39:11.750 --> 00:39:15.230
our volunteers each with a number.

00:39:15.230 --> 00:39:18.320
We're going to go ahead and
see if we can't solve together

00:39:18.320 --> 00:39:25.310
the idea of finding an algorithm
for sorting the numbers at hand.

00:39:25.310 --> 00:39:27.770
So in just a moment, each of
you will be handed a number.

00:39:27.770 --> 00:39:30.520
In the meantime, let's go ahead
and just say a quick introduction,

00:39:30.520 --> 00:39:32.160
who you are, and perhaps your house.

00:39:32.160 --> 00:39:36.080
AUDIENCE: [? Crus, ?]
Dudley House, from Germany.

00:39:36.080 --> 00:39:38.410
AUDIENCE: Curtis, just here visiting.

00:39:38.410 --> 00:39:39.696
SPEAKER 1: Wonderful.

00:39:39.696 --> 00:39:43.775
AUDIENCE: Ali, freshman,
[INAUDIBLE],, from Turkey.

00:39:43.775 --> 00:39:46.110
AUDIENCE: Farah
[? Foho, ?] from Detroit.

00:39:46.910 --> 00:39:47.660
SPEAKER 1: Nice.

00:39:47.660 --> 00:39:52.151
AUDIENCE: Allison, Hollis because
I'm first year, from Cleveland.

00:39:52.151 --> 00:39:53.150
AUDIENCE: I'm Claude.

00:39:53.150 --> 00:39:53.720
I'm in Mauer.

00:39:53.720 --> 00:39:55.290
And I'm from Virginia.

00:39:55.290 --> 00:39:57.260
AUDIENCE: I'm [? Rohil. ?]
I'm in Wigglesworth.

00:39:57.260 --> 00:39:58.946
And I'm from Atlanta.

00:39:58.946 --> 00:40:01.250
AUDIENCE: I'm [? Yowell. ?]
I'm also from Wigglesworth.

00:40:01.250 --> 00:40:02.505
And I'm from New York.

00:40:02.505 --> 00:40:03.390
AUDIENCE: I'm Bonnie.

00:40:03.390 --> 00:40:03.665
I'm in Lowell.

00:40:03.665 --> 00:40:05.457
I'm from Beijing and
[? Ann ?] [? Arbor. ?]

00:40:05.457 --> 00:40:06.530
SPEAKER 1: Wonderful.

00:40:06.530 --> 00:40:10.360
And I'm noticing now, as you might be
too, we have nine volunteers on stage.

00:40:10.360 --> 00:40:12.110
So we're going to go
ahead and solve this.

00:40:12.110 --> 00:40:12.740
That's OK.

00:40:12.740 --> 00:40:13.270
What's your name again?

00:40:13.270 --> 00:40:13.960
AUDIENCE: Bonnie.

00:40:13.960 --> 00:40:14.770
SPEAKER 1: Bonnie, come on over here.

00:40:14.770 --> 00:40:16.937
You're going to be maybe
my assistant, if you could,

00:40:16.937 --> 00:40:18.592
as we sought these elements.

00:40:18.592 --> 00:40:20.300
Let's go ahead and
give you the mic here.

00:40:20.300 --> 00:40:23.360
Each of you has been
handed a number that

00:40:23.360 --> 00:40:27.330
happens to match with this, which
is just an unsorted list of numbers.

00:40:27.330 --> 00:40:30.530
And let me just ask that our eight
volunteers here sort yourselves.

00:40:30.530 --> 00:40:32.752
Go.

00:40:32.752 --> 00:40:35.353
[INTERPOSING VOICES]

00:40:35.353 --> 00:40:37.520
SPEAKER 1: And I'll have
you direct them after this.

00:40:41.200 --> 00:40:41.830
Excellent.

00:40:41.830 --> 00:40:42.910
Very well done.

00:40:42.910 --> 00:40:45.010
[APPLAUSE]

00:40:45.010 --> 00:40:45.727
OK.

00:40:45.727 --> 00:40:47.560
So let me ask any of
you, and we'll hand you

00:40:47.560 --> 00:40:50.200
the mic, if need be, what was the
algorithm you used to sort yourselves?

00:40:50.200 --> 00:40:51.540
AUDIENCE: Human intuition.

00:40:51.540 --> 00:40:53.370
SPEAKER 1: Human intuition, OK.

00:40:53.370 --> 00:40:55.130
[LAUGHTER]

00:40:55.130 --> 00:40:55.630
Nice.

00:40:55.630 --> 00:40:58.080
[APPLAUSE]

00:40:59.320 --> 00:40:59.820
Nice.

00:40:59.820 --> 00:41:01.220
Other formulations?

00:41:01.220 --> 00:41:01.720
Yeah?

00:41:04.642 --> 00:41:09.160
AUDIENCE: I just checked
if the person who's left

00:41:09.160 --> 00:41:13.320
me, who is supposed to be larger
than me is larger than me.

00:41:13.320 --> 00:41:17.880
And if he was larger than
me, then I stayed there.

00:41:17.880 --> 00:41:20.535
And if I was larger than him, I
just switched places with him.

00:41:20.535 --> 00:41:21.660
SPEAKER 1: OK, I like that.

00:41:21.660 --> 00:41:23.400
It's sort [? of a ?]
locally optimum approach,

00:41:23.400 --> 00:41:25.942
where you just kind of look to
the left and right and sort of

00:41:25.942 --> 00:41:27.630
fix any transpositions or mismatches.

00:41:27.630 --> 00:41:30.172
And in fact, let's go ahead and
try and apply that same idea.

00:41:30.172 --> 00:41:32.790
Can all eight of you reorder
yourselves, just like that,

00:41:32.790 --> 00:41:34.620
so that you're standing
below your number

00:41:34.620 --> 00:41:39.820
so that we're undoing the human
intuition that we just executed.

00:41:39.820 --> 00:41:42.450
And now let's go ahead and
say, all right, so, Bonnie,

00:41:42.450 --> 00:41:44.560
if you don't mind
helping direct us there--

00:41:44.560 --> 00:41:47.610
direct us here, we clearly have
now an unsorted list of numbers.

00:41:47.610 --> 00:41:49.928
Let's just bite off this
problem one bit at a time.

00:41:49.928 --> 00:41:51.720
So for instance, you
two, your names again?

00:41:51.720 --> 00:41:52.350
AUDIENCE: Tris.

00:41:52.350 --> 00:41:52.740
SPEAKER 1: Tris.

00:41:52.740 --> 00:41:53.280
AUDIENCE: Curtis.

00:41:53.280 --> 00:41:53.940
SPEAKER 1: And Curtis.

00:41:53.940 --> 00:41:55.655
So you guys are clearly out of order.

00:41:55.655 --> 00:41:57.780
So what would be the locally
optimal solution here.

00:41:57.780 --> 00:41:58.800
AUDIENCE: They would switch orders.

00:41:58.800 --> 00:42:00.050
SPEAKER 1: OK, please do that.

00:42:00.050 --> 00:42:01.885
All right, now let's consider 6 and 8.

00:42:01.885 --> 00:42:02.910
AUDIENCE: They're fine.

00:42:02.910 --> 00:42:03.892
SPEAKER 1: OK, 8 and 5?

00:42:03.892 --> 00:42:05.100
AUDIENCE: Let's switch again.

00:42:05.100 --> 00:42:06.392
SPEAKER 1: Please switch again.

00:42:06.392 --> 00:42:07.140
8 and 2?

00:42:07.140 --> 00:42:08.190
AUDIENCE: Switch.

00:42:08.190 --> 00:42:09.030
SPEAKER 1: OK.

00:42:09.030 --> 00:42:09.840
8 and 7?

00:42:09.840 --> 00:42:11.063
AUDIENCE: Switch.

00:42:11.063 --> 00:42:11.855
SPEAKER 1: 8 and 4?

00:42:11.855 --> 00:42:13.110
AUDIENCE: Switch.

00:42:13.110 --> 00:42:13.860
SPEAKER 1: 8 and--

00:42:13.860 --> 00:42:14.490
AUDIENCE: 1.

00:42:14.685 --> 00:42:14.880
SPEAKER 1: --1?

00:42:14.880 --> 00:42:15.480
AUDIENCE: Switch.

00:42:15.480 --> 00:42:16.355
SPEAKER 1: All right.

00:42:16.355 --> 00:42:17.930
So have we solved the problem?

00:42:17.930 --> 00:42:18.605
AUDIENCE: No.

00:42:18.605 --> 00:42:20.730
SPEAKER 1: OK, no, obviously
not, but is it better?

00:42:20.730 --> 00:42:23.160
Are we closer to the solution?

00:42:23.160 --> 00:42:27.493
I'd argue we are closer because,
right, like 8 somehow made its way all

00:42:27.493 --> 00:42:30.660
the way to the correct destination,
even though we still have kind of a mess

00:42:30.660 --> 00:42:31.920
here to fix.

00:42:31.920 --> 00:42:35.580
But notice that the solution got better
in this direction and a little better

00:42:35.580 --> 00:42:36.210
this direction.

00:42:36.210 --> 00:42:37.270
But we're going to do this again.

00:42:37.270 --> 00:42:38.715
So Bonnie, can you direct us once more?

00:42:38.715 --> 00:42:39.325
AUDIENCE: Yes.

00:42:39.325 --> 00:42:44.100
So if you would proceed from
this order, you two would switch.

00:42:44.100 --> 00:42:45.802
SPEAKER 1: 5 and 6?

00:42:45.802 --> 00:42:47.240
AUDIENCE: Let's switch again.

00:42:47.240 --> 00:42:48.330
SPEAKER 1: 6 and 2?

00:42:48.330 --> 00:42:50.630
AUDIENCE: Remain, and
then the next person--

00:42:50.630 --> 00:42:51.450
SPEAKER 1: 7 and 4?

00:42:51.450 --> 00:42:52.440
AUDIENCE: 7 and 4 switch.

00:42:52.440 --> 00:42:52.620
SPEAKER 1: Nice.

00:42:52.620 --> 00:42:53.120
7 and 1?

00:42:53.120 --> 00:42:54.162
AUDIENCE: 1 and 7 switch.

00:42:54.162 --> 00:42:54.780
And then--

00:42:54.780 --> 00:42:55.560
SPEAKER 1: So now are we done?

00:42:55.560 --> 00:42:56.233
AUDIENCE: No.

00:42:56.233 --> 00:42:58.650
SPEAKER 1: So no, but look,
the problem is getting better.

00:42:58.650 --> 00:43:01.968
It's closer to solution because now
we have 8 in place and 7 in place.

00:43:01.968 --> 00:43:04.260
So we've taken a bite out of
the problem, if you would.

00:43:04.260 --> 00:43:05.670
Now, we can do this a little more rapid.

00:43:05.670 --> 00:43:08.580
So if you want to tell everyone
what to do pairwise, pretty quickly.

00:43:08.580 --> 00:43:09.080
Go.

00:43:09.080 --> 00:43:12.187
AUDIENCE: So everyone, just if you're--

00:43:12.187 --> 00:43:13.650
[LAUGHTER]

00:43:13.650 --> 00:43:15.408
SPEAKER 1: Human
intuition, if you would.

00:43:15.408 --> 00:43:16.450
But let's do it pairwise.

00:43:16.450 --> 00:43:17.210
AUDIENCE: OK.

00:43:17.210 --> 00:43:18.360
Sure.

00:43:18.360 --> 00:43:21.240
Could everyone if the person on
your right is smaller than you,

00:43:21.240 --> 00:43:25.453
switch with them and then do that again.

00:43:25.453 --> 00:43:26.120
SPEAKER 1: Good.

00:43:26.120 --> 00:43:27.540
AUDIENCE: Do that again, again.

00:43:27.540 --> 00:43:29.231
SPEAKER 1: Good.

00:43:29.231 --> 00:43:30.010
AUDIENCE: Again.

00:43:32.670 --> 00:43:33.833
And then one last time.

00:43:33.833 --> 00:43:34.500
SPEAKER 1: Yeah.

00:43:34.500 --> 00:43:36.810
So even though we allowed it to get
a little organic there at the end,

00:43:36.810 --> 00:43:38.100
now is the list sorted?

00:43:38.100 --> 00:43:39.570
AUDIENCE: Yeah.

00:43:39.570 --> 00:43:40.722
SPEAKER 1: [LAUGHS] Yes.

00:43:40.722 --> 00:43:42.930
So maybe a round of applause
for our volunteers here.

00:43:42.930 --> 00:43:44.260
And thank you to Bonnie, especially.

00:43:44.260 --> 00:43:44.760
Thank you.

00:43:44.760 --> 00:43:47.160
[APPLAUSE]

00:43:47.660 --> 00:43:49.950
Brian, here we have a
stressful for each of you.

00:43:49.950 --> 00:43:51.090
And thank you so much.

00:43:51.090 --> 00:43:54.160
So let's see if we can't now formalize--

00:43:54.160 --> 00:43:56.350
feel free to make your
way off to either side.

00:43:56.350 --> 00:44:01.080
Let's see if we can't formalize exactly
what it is these volunteers wonderfully

00:44:01.080 --> 00:44:03.870
did at Bonnie's direction
to get this list sorted.

00:44:03.870 --> 00:44:06.870
It turns out that what
everyone did here has a name.

00:44:06.870 --> 00:44:10.230
It's an algorithm known as bubble
sort because as you notice,

00:44:10.230 --> 00:44:14.110
the 8 initially kind of bubbled
its way up from left to right,

00:44:14.110 --> 00:44:16.837
and then the 7 kind of bubbled
its way up from left to right.

00:44:16.837 --> 00:44:19.920
And as they repeated, even though we
did it more quickly at the [? end, ?]

00:44:19.920 --> 00:44:23.070
[? the ?] bigger numbers bubble their
way all the way up until they were

00:44:23.070 --> 00:44:24.060
in the right place.

00:44:24.060 --> 00:44:26.580
So in pseudocode, I'd argue
that what we did was this.

00:44:26.580 --> 00:44:29.460
Bonnie directed our audience,
at an increasing speed,

00:44:29.460 --> 00:44:32.340
to repeat the following n minus 1 times.

00:44:32.340 --> 00:44:33.390
Why n minus 1?

00:44:33.390 --> 00:44:36.900
Well, if you've got n people and
they're comparing each other,

00:44:36.900 --> 00:44:40.800
you can only compare people n
minus 1 times if you have n people.

00:44:40.800 --> 00:44:44.370
So she told them to do this
n minus 1 times in total

00:44:44.370 --> 00:44:47.130
for i from 0 to n minus 2.

00:44:47.130 --> 00:44:48.960
Now what's that actually referring to?

00:44:48.960 --> 00:44:50.610
So this i is our index.

00:44:50.610 --> 00:44:52.980
So it's kind of like treating
our humans like an array.

00:44:52.980 --> 00:44:54.030
What did we do?

00:44:54.030 --> 00:45:00.130
If the ith person, starting at 0, and
the ith plus 1 person are out of order,

00:45:00.130 --> 00:45:02.910
what did she tell them to do?

00:45:02.910 --> 00:45:05.680
Switch places or swap, so to speak.

00:45:05.680 --> 00:45:07.350
And so this looks pretty technical.

00:45:07.350 --> 00:45:11.790
But it's really just a pseudocode way of
distilling into more succinct English,

00:45:11.790 --> 00:45:15.860
with some numbers involved, what it is
Bonnie was directing everyone to do.

00:45:15.860 --> 00:45:17.610
She said do the following
n minus 1 times.

00:45:17.610 --> 00:45:20.610
That's why it went on for several
rotations, quicker and quicker.

00:45:20.610 --> 00:45:23.160
She then pretty much
treated the first person

00:45:23.160 --> 00:45:27.580
as bracket 0, the next person as bracket
1, bracket 2, just like an array,

00:45:27.580 --> 00:45:28.740
albeit of humans.

00:45:28.740 --> 00:45:31.950
And then she compared them side
by side, calling one person

00:45:31.950 --> 00:45:34.320
i and the person next to them i plus 1.

00:45:34.320 --> 00:45:37.815
And if they were out of order, it
was swapped and again and again

00:45:37.815 --> 00:45:39.440
and again till this algorithm executed.

00:45:39.440 --> 00:45:43.080
Until finally, the whole
thing was hopefully sorted.

00:45:43.080 --> 00:45:46.320
How many times did it--

00:45:46.320 --> 00:45:47.880
how many steps did it take?

00:45:47.880 --> 00:45:48.910
How long did it take?

00:45:48.910 --> 00:45:52.200
What's the running time in
big O notation of bubble sort?

00:45:52.200 --> 00:45:54.810
Well, the outer loop
takes n minus 1 steps.

00:45:54.810 --> 00:45:59.820
The inner loop also takes n minus 1
steps because it's 0 through n minus 2.

00:45:59.820 --> 00:46:02.800
And so if we go ahead and
multiply that out, ala FOIL,

00:46:02.800 --> 00:46:06.640
we have n squared minus
1n minus 1n plus 1.

00:46:06.640 --> 00:46:10.380
If we combine like terms, we now
have n squared minus 2n plus 1.

00:46:10.380 --> 00:46:12.660
But at this point,
what matters ultimately

00:46:12.660 --> 00:46:16.800
is that the highest order term, the n
squared, is what ultimately dominates.

00:46:16.800 --> 00:46:19.860
The bigger n gets, the more
impact that n squared has.

00:46:19.860 --> 00:46:22.410
And so a computer scientist
would say that bubble sort

00:46:22.410 --> 00:46:25.720
is on the order of n squared.

00:46:25.720 --> 00:46:29.970
So if we add to our list from
before the algorithm's upper bounds,

00:46:29.970 --> 00:46:33.750
we can now put bubble sort way up
at the top, unfortunately, which

00:46:33.750 --> 00:46:36.240
is to say that sorting
numbers with bubble sort

00:46:36.240 --> 00:46:41.085
is apparently way more expensive than
linearly searching or binary searching.

00:46:41.085 --> 00:46:42.960
And so it kind of invites
the question, then,

00:46:42.960 --> 00:46:45.168
with Eric and [? Nizari ?]
when they came up earlier.

00:46:45.168 --> 00:46:47.550
Yes, [? Nizari's ?]
algorithm was better.

00:46:47.550 --> 00:46:50.520
But it was better in the
sense that it ran faster.

00:46:50.520 --> 00:46:53.650
But it presupposed
what, just to be clear?

00:46:53.650 --> 00:46:54.583
AUDIENCE: [INAUDIBLE]

00:46:54.583 --> 00:46:56.250
SPEAKER 1: That the numbers were sorted.

00:46:56.250 --> 00:46:59.040
And so it's a little misleading
to say that binary search is

00:46:59.040 --> 00:47:00.570
better than linear search.

00:47:00.570 --> 00:47:02.490
Because if it costs you
a huge amount of time

00:47:02.490 --> 00:47:04.573
to sort those elements so
that, then, [? Nizari ?]

00:47:04.573 --> 00:47:07.530
can go ahead and execute binary
search, it might be a wash,

00:47:07.530 --> 00:47:09.390
or it might even be a net negative.

00:47:09.390 --> 00:47:11.400
So it's really going to
depend on, well, are you

00:47:11.400 --> 00:47:13.698
searching more often, more than once?

00:47:13.698 --> 00:47:15.990
Are you searching lots and
lots and lots of times, such

00:47:15.990 --> 00:47:18.210
that it's worth it to
sort it once and then

00:47:18.210 --> 00:47:20.670
benefit long term by much faster code?

00:47:20.670 --> 00:47:23.470
Well, what about omega for bubble sort?

00:47:23.470 --> 00:47:25.930
Bubble sort's code,
again, looked like this.

00:47:25.930 --> 00:47:30.040
And frankly, it doesn't really take
into account at all good inputs, right?

00:47:30.040 --> 00:47:33.150
Like, the best possible input to
any sorting algorithm most likely

00:47:33.150 --> 00:47:34.817
is it's already sorted for you, right?

00:47:34.817 --> 00:47:36.900
Because if it's already
sorted, presumably there's

00:47:36.900 --> 00:47:38.190
no actual work to be done.

00:47:38.190 --> 00:47:40.020
How lucky would that be?

00:47:40.020 --> 00:47:42.660
But bubble sort, as defined,
is kind of stupid, right?

00:47:42.660 --> 00:47:45.360
It doesn't say if already sorted, quit.

00:47:45.360 --> 00:47:48.450
It just blindly does the
following n minus 1 times

00:47:48.450 --> 00:47:51.480
and then inside of that does
something n minus 2 times.

00:47:51.480 --> 00:47:54.360
So what's the lower bound on
the running time of bubble sort,

00:47:54.360 --> 00:47:58.347
even if you get lucky and the whole
thing is already sorted for you?

00:47:58.347 --> 00:47:59.430
AUDIENCE: [? n squared. ?]

00:47:59.430 --> 00:48:02.760
SPEAKER 1: It's still in squared because
it's still going to take as many steps

00:48:02.760 --> 00:48:03.310
as before.

00:48:03.310 --> 00:48:07.963
And so bubble short as a lower bound,
arguably, has omega of n squared.

00:48:07.963 --> 00:48:10.380
And let's see, Brian, if you
wouldn't mind lending a hand,

00:48:10.380 --> 00:48:13.740
let's see if we can't do better
than by taking maybe a fundamentally

00:48:13.740 --> 00:48:16.680
different approach to
sorting, as by laying out

00:48:16.680 --> 00:48:18.750
something called selection sort.

00:48:18.750 --> 00:48:21.560
So in selection sort, we have
a similar set of numbers,

00:48:21.560 --> 00:48:23.940
but we won't bother using
something as large as 50.

00:48:23.940 --> 00:48:26.200
Brian's going to kindly set
them up in a random order,

00:48:26.200 --> 00:48:28.200
but we happen to have a
cheat sheet on the board

00:48:28.200 --> 00:48:30.240
so that we can try this
again if we need to.

00:48:30.240 --> 00:48:35.400
And these numbers right now are
unsorted from left to right.

00:48:35.400 --> 00:48:39.620
And we have 1 2, 3, 4, 5, 6,
7, 8 numbers in total here.

00:48:39.620 --> 00:48:42.120
So bubble sort was nice because
it leveraged your intuition,

00:48:42.120 --> 00:48:43.770
where it will just look to
the left, look to the right

00:48:43.770 --> 00:48:45.220
and fix those small problems.

00:48:45.220 --> 00:48:47.970
But honestly, a fundamentally
different way to think about sorting

00:48:47.970 --> 00:48:51.570
would be, well, if I know I want
small to large, left to right,

00:48:51.570 --> 00:48:53.610
why don't I just do that?

00:48:53.610 --> 00:48:54.930
What is the smallest number?

00:48:54.930 --> 00:48:58.470
Well, recall that these things, if
they're implemented in an array,

00:48:58.470 --> 00:48:59.760
might as well be in lockers.

00:48:59.760 --> 00:49:01.960
I can't just use a human
intuition in this case.

00:49:01.960 --> 00:49:03.930
I have to look at each
element individually.

00:49:03.930 --> 00:49:05.610
But I'm not going to bother
throwing them back in the locker

00:49:05.610 --> 00:49:07.890
because that's just going
to take unnecessary time.

00:49:07.890 --> 00:49:08.880
But I look at 6.

00:49:08.880 --> 00:49:11.453
6 is the smallest number
I have seen thus far.

00:49:11.453 --> 00:49:13.870
So at the moment, this is the
smallest number in the list.

00:49:13.870 --> 00:49:16.620
So I'm going to remember that
with a variable in my mind.

00:49:16.620 --> 00:49:17.400
Now I see 3.

00:49:17.400 --> 00:49:20.460
3 is obviously less than 6,
so I'm going to forget about 6

00:49:20.460 --> 00:49:23.940
and just remember for now that 3
is the smallest element I've seen.

00:49:23.940 --> 00:49:25.170
8 is no smaller.

00:49:25.170 --> 00:49:26.250
5 is no smaller.

00:49:26.250 --> 00:49:27.493
Ooh, 2 is smaller.

00:49:27.493 --> 00:49:29.160
I'm going to remember 2 is the smallest.

00:49:29.160 --> 00:49:31.050
I'm going to forget about the 3.

00:49:31.050 --> 00:49:34.680
Meanwhile I keep going, 7,
4-- ooh, 1 is even smaller.

00:49:34.680 --> 00:49:36.430
And so I've gotten to
the end of the list.

00:49:36.430 --> 00:49:38.610
The smallest element in this list is 1.

00:49:38.610 --> 00:49:40.388
It obviously belongs over there.

00:49:40.388 --> 00:49:41.430
So what can I do with it?

00:49:41.430 --> 00:49:42.843
AUDIENCE: [INAUDIBLE].

00:49:42.843 --> 00:49:44.760
SPEAKER 1: Yeah, ideally
I could just move it.

00:49:44.760 --> 00:49:46.302
Now, maybe I should make room, right?

00:49:46.302 --> 00:49:48.873
The table's a little small,
or my array is a fixed size.

00:49:48.873 --> 00:49:51.040
So I could start scooching
everything over this way.

00:49:51.040 --> 00:49:51.420
But you know what?

00:49:51.420 --> 00:49:53.460
Frankly, that's going to
take a while, right? [? I ?]

00:49:53.460 --> 00:49:54.990
have to move, like, seven elements.

00:49:54.990 --> 00:49:59.370
Why don't I just kind of
forcefully evict the 6,

00:49:59.370 --> 00:50:03.050
put it over here, because after all, it
was in random order in the first place.

00:50:03.050 --> 00:50:05.850
Who cares if I move it
someplace else even more random?

00:50:05.850 --> 00:50:06.930
I'll deal with it later.

00:50:06.930 --> 00:50:08.263
So you could do either approach.

00:50:08.263 --> 00:50:09.510
You could shift everything.

00:50:09.510 --> 00:50:11.490
But that feels like
it'll take some time.

00:50:11.490 --> 00:50:14.440
Or you can just evict whatever
is in the place you want to be.

00:50:14.440 --> 00:50:18.330
But what's nice now is that
my list is closer to sorted.

00:50:18.330 --> 00:50:21.010
The 1 is in its correct place.

00:50:21.010 --> 00:50:24.535
So now all I have to look at
is n minus one other element.

00:50:24.535 --> 00:50:25.410
So let's take a look.

00:50:25.410 --> 00:50:26.785
What's the next smallest element?

00:50:26.785 --> 00:50:30.070
At the moment, it's 3, still 3, still 3.

00:50:30.070 --> 00:50:31.950
Oh, wait a minute, it looks like 2.

00:50:31.950 --> 00:50:34.680
Now, you might want to just
abort now and rip out the 2.

00:50:34.680 --> 00:50:36.993
But you don't know
necessarily, as the computer,

00:50:36.993 --> 00:50:38.910
if you're only looking
at one value at a time,

00:50:38.910 --> 00:50:41.340
unless you have multiple
variables in your mind, which

00:50:41.340 --> 00:50:42.573
I'm not going to bother with.

00:50:42.573 --> 00:50:44.490
Let me see if there's
anything smaller than 2.

00:50:44.490 --> 00:50:46.900
7, 4, 6-- no.

00:50:46.900 --> 00:50:48.570
So I'm going to grab the 2.

00:50:48.570 --> 00:50:51.312
And where do I want to put it?

00:50:51.312 --> 00:50:52.020
Right over there.

00:50:52.020 --> 00:50:52.832
And you know what?

00:50:52.832 --> 00:50:54.040
This could be a net negative.

00:50:54.040 --> 00:50:55.650
But I think it's going to average out.

00:50:55.650 --> 00:50:59.160
I'm going to move the 3 to where I
do have room and go ahead and claim

00:50:59.160 --> 00:51:01.385
that my 2 is now sorted.

00:51:01.385 --> 00:51:03.510
And I'm going to do this
again and again and again.

00:51:03.510 --> 00:51:05.760
And just like Bonnie did, I'm
going to do it a little faster now,

00:51:05.760 --> 00:51:06.677
walk through the list.

00:51:06.677 --> 00:51:08.140
OK, 3 is the smallest.

00:51:08.140 --> 00:51:11.277
I'm going to go ahead and put it
in sorted order by evicting the 8.

00:51:11.277 --> 00:51:12.360
Now I'm going to go ahead.

00:51:12.360 --> 00:51:14.130
All right, 5 8, 7--

00:51:14.130 --> 00:51:15.690
4 is now the smallest.

00:51:15.690 --> 00:51:18.000
I'm going to go ahead and
evict the 5, move it over

00:51:18.000 --> 00:51:19.980
here, and claim that that's sorted.

00:51:19.980 --> 00:51:23.250
Let me do it once more, 8, 7, 5, 6.

00:51:23.250 --> 00:51:24.900
5 is clearly the smallest.

00:51:24.900 --> 00:51:28.310
Let me go ahead and evict the
8 again, make room for the 5.

00:51:28.310 --> 00:51:31.670
But I only have three
steps left, 7, 8, 6.

00:51:31.670 --> 00:51:36.420
Let me go ahead and move the 7
over here, put the sixth in place.

00:51:36.420 --> 00:51:37.170
8 is the smallest.

00:51:37.170 --> 00:51:38.280
No, 7 is smaller.

00:51:38.280 --> 00:51:40.920
Let me go ahead and put it
in place, evicting the 8.

00:51:40.920 --> 00:51:46.360
Voila, hopefully now, oof, done but
a fundamentally different algorithm,

00:51:46.360 --> 00:51:46.860
right?

00:51:46.860 --> 00:51:50.220
There was no pairwise swapping
back and forth and back and forth.

00:51:50.220 --> 00:51:53.670
Each time I sort of set my mind on a
goal, get the next smallest element,

00:51:53.670 --> 00:51:55.000
get the next smallest element.

00:51:55.000 --> 00:51:59.460
And that is what we shall call
selection sort, where on each iteration

00:51:59.460 --> 00:52:01.510
you select the next smallest element.

00:52:01.510 --> 00:52:05.880
So in pseudocode we might say
this, for i from 0 to n minus 1.

00:52:05.880 --> 00:52:07.580
And again, just adopt this habit now.

00:52:07.580 --> 00:52:11.650
Any time in the life, and certainly
a CS class, when you have n items,

00:52:11.650 --> 00:52:15.300
the first one is ironically
1 but in this case, 0.

00:52:15.300 --> 00:52:17.520
And the last one is n minus 1.

00:52:17.520 --> 00:52:19.920
0 to n minus 1 is how
a computer scientist

00:52:19.920 --> 00:52:22.780
counts from 1 to n in the real world.

00:52:22.780 --> 00:52:26.880
So this just says do the
following n times but use i.

00:52:26.880 --> 00:52:28.530
Start counting from 0.

00:52:28.530 --> 00:52:32.280
Find the smallest item between
the ith item and the last item.

00:52:32.280 --> 00:52:33.510
What am I saying there?

00:52:33.510 --> 00:52:36.030
Well, if I initialize i
initially to 0, that's

00:52:36.030 --> 00:52:38.820
just saying find the smallest
element among all eight

00:52:38.820 --> 00:52:42.810
and grab it, swap the smallest
item with that ith item.

00:52:42.810 --> 00:52:45.030
So wherever I found
the smallest element,

00:52:45.030 --> 00:52:47.160
go ahead and swap it with that one.

00:52:47.160 --> 00:52:48.870
And then this algorithm-- whoops--

00:52:48.870 --> 00:52:51.570
is just going to repeat
again and again and again.

00:52:51.570 --> 00:52:56.440
It's almost a little more succinct
to represent in pseudocode.

00:52:56.440 --> 00:52:59.260
But it invites the question,
then, is this better?

00:52:59.260 --> 00:53:00.330
Is selection sort better?

00:53:00.330 --> 00:53:01.950
Well, what would it mean for
an algorithm to be better?

00:53:01.950 --> 00:53:04.128
We have two rules of
thumb, big O and omega.

00:53:04.128 --> 00:53:04.920
So let's try those.

00:53:04.920 --> 00:53:09.870
So in big O notation, how many steps
does it take to sort a list of numbers

00:53:09.870 --> 00:53:12.870
like I did, where you just
again and again and again select

00:53:12.870 --> 00:53:16.210
the smallest, the smallest,
the smallest element?

00:53:16.210 --> 00:53:18.210
Well, how do you even
begin to think about that?

00:53:18.210 --> 00:53:18.710
Yeah?

00:53:18.710 --> 00:53:23.691
AUDIENCE: [INAUDIBLE] n squared
because you have at iteration n

00:53:23.691 --> 00:53:28.113
[? an ?] n minus 1 [INAUDIBLE].

00:53:28.113 --> 00:53:28.780
SPEAKER 1: Yeah.

00:53:28.780 --> 00:53:29.730
That's the right intuition.

00:53:29.730 --> 00:53:31.820
And let me back up just one
step until we get to that.

00:53:31.820 --> 00:53:33.168
The proposal was its n squared.

00:53:33.168 --> 00:53:34.960
And indeed, that's
going to be the spoiler.

00:53:34.960 --> 00:53:35.770
But why?

00:53:35.770 --> 00:53:37.570
Well, if you actually
started to count up

00:53:37.570 --> 00:53:41.110
how many steps I was taking physically,
right, to find the smallest element,

00:53:41.110 --> 00:53:44.498
it's going to take me maybe
seven steps to find the smallest

00:53:44.498 --> 00:53:46.540
element because I'm going
to look at all of them.

00:53:46.540 --> 00:53:49.450
So in my first pass, I'm
looking at all eight elements,

00:53:49.450 --> 00:53:53.620
or taking almost n steps to find
the smallest number, like 1.

00:53:53.620 --> 00:53:55.300
But after that, the 1 was in place.

00:53:55.300 --> 00:53:58.820
And I turned on its light bulbs,
and that left seven numbers left.

00:53:58.820 --> 00:54:00.490
And how many steps did I then take?

00:54:00.490 --> 00:54:01.570
Well, n minus 1.

00:54:01.570 --> 00:54:04.300
Then after the 2 was in
place, how many steps?

00:54:04.300 --> 00:54:07.090
n minus 2 and then n
minus 3, n minus 4, dot,

00:54:07.090 --> 00:54:09.490
dot, dot, until there
was just one number left.

00:54:09.490 --> 00:54:13.873
So that invites the question,
what, then, does this total up to?

00:54:13.873 --> 00:54:15.790
And indeed, you jumped
to the right intuition.

00:54:15.790 --> 00:54:18.730
If you start with n, and you
add to the n minus 1 steps,

00:54:18.730 --> 00:54:21.160
and you add to that n minus
2 steps, dot, dot, dot,

00:54:21.160 --> 00:54:24.303
one final step once you
get to the end of the list,

00:54:24.303 --> 00:54:25.720
what does this actually sum up to?

00:54:25.720 --> 00:54:26.710
It's actually not obvious.

00:54:26.710 --> 00:54:29.050
And this is one of those things in
life, unless you're a math major,

00:54:29.050 --> 00:54:31.030
you probably would look at
the back of a math textbook

00:54:31.030 --> 00:54:32.860
or a physics textbook for
those little cheat sheets

00:54:32.860 --> 00:54:34.630
that they used to come with,
at least in high school.

00:54:34.630 --> 00:54:37.870
Allow me just to propose for today's
sake, if you actually do out this math

00:54:37.870 --> 00:54:40.840
or look it up at the back of
a book, it ends up being this,

00:54:40.840 --> 00:54:43.877
n times n plus 1 divided by 2.

00:54:43.877 --> 00:54:45.460
And you can prove this mathematically.

00:54:45.460 --> 00:54:48.820
But for our purposes, just
trust me, if you will,

00:54:48.820 --> 00:54:52.730
that adding a number plus the smaller
number plus the smaller number plus

00:54:52.730 --> 00:54:56.230
the smaller number all the way to 1,
gives you this relationship, n times n

00:54:56.230 --> 00:54:57.550
plus 1 divided by 2.

00:54:57.550 --> 00:54:59.745
And it's fine if you
just take that as fact.

00:54:59.745 --> 00:55:01.120
So let me just multiply this out.

00:55:01.120 --> 00:55:03.340
That's n squared plus n/2.

00:55:03.340 --> 00:55:05.890
That, of course, is n squared
divided by 2 plus n/2.

00:55:05.890 --> 00:55:08.140
But, again, who cares?

00:55:08.140 --> 00:55:10.840
Big O notation would propose
that we focus only on what?

00:55:10.840 --> 00:55:11.765
AUDIENCE: n squared.

00:55:11.765 --> 00:55:12.640
SPEAKER 1: n squared.

00:55:12.640 --> 00:55:14.980
This frankly, is on
the order of n squared,

00:55:14.980 --> 00:55:17.650
exactly as you said,
because as n gets large,

00:55:17.650 --> 00:55:20.710
the only factor in that mathematical
expression that we're really

00:55:20.710 --> 00:55:23.890
going to care about is the one that
gets bigger and bigger and bigger

00:55:23.890 --> 00:55:26.030
faster than everything else.

00:55:26.030 --> 00:55:28.870
So in terms of selection
sort, it would seem that we

00:55:28.870 --> 00:55:30.907
have big O of n squared for it as well.

00:55:30.907 --> 00:55:32.740
So it's a fundamentally
different algorithm,

00:55:32.740 --> 00:55:34.720
but mathematically
and in the real world,

00:55:34.720 --> 00:55:36.410
it kind of works out to be the same.

00:55:36.410 --> 00:55:38.080
So we haven't really done better yet.

00:55:38.080 --> 00:55:40.330
What about omega for selection sort?

00:55:40.330 --> 00:55:43.690
If the code for selection
sort is this, does it

00:55:43.690 --> 00:55:48.170
benefit from the list
being sorted already?

00:55:48.170 --> 00:55:51.920
Or is it just going to blindly
do its order of n squared work

00:55:51.920 --> 00:55:54.470
again and again anyway, right?

00:55:54.470 --> 00:55:56.720
Like, this is opportune that
the numbers are currently

00:55:56.720 --> 00:55:59.540
sorted because we can make a point,
well, this is the best case scenario.

00:55:59.540 --> 00:56:01.020
I hand you the numbers 1 through 8.

00:56:01.020 --> 00:56:01.800
They're already sorted.

00:56:01.800 --> 00:56:03.467
And you try to use selection sort on it.

00:56:03.467 --> 00:56:05.698
Well, you might think, ooh,
it's in the right place.

00:56:05.698 --> 00:56:07.490
I'm just going to grab
the smallest number.

00:56:07.490 --> 00:56:10.377
Now I'm going to grab the next
smallest number and so forth.

00:56:10.377 --> 00:56:11.210
But that's not true.

00:56:11.210 --> 00:56:14.780
When I'm the computer, and I open the
first locker, and I see the number 1,

00:56:14.780 --> 00:56:16.810
do I know anything more
about my numbers yet?

00:56:16.810 --> 00:56:17.660
AUDIENCE: No.

00:56:17.660 --> 00:56:18.140
SPEAKER 1: No, right?

00:56:18.140 --> 00:56:21.223
You're using human intuition to see
that, OK, obviously it's the smallest.

00:56:21.223 --> 00:56:25.970
I, the program, do not know that until
I look at the other numbers in the list.

00:56:25.970 --> 00:56:29.360
And so again, if you just iterate
through using selection sort,

00:56:29.360 --> 00:56:31.460
you only know what's
in front of you, which

00:56:31.460 --> 00:56:35.390
means you're going to execute the
exact same code again and again.

00:56:35.390 --> 00:56:36.890
And that means the math is the same.

00:56:36.890 --> 00:56:39.680
Even in this best case, we
are truly wasting our time

00:56:39.680 --> 00:56:45.020
now with selections sort because it is
going to be omega of n squared, too.

00:56:45.020 --> 00:56:48.740
So my god, now we have two
bad solutions to a problem.

00:56:48.740 --> 00:56:49.950
Can we do better?

00:56:49.950 --> 00:56:52.120
Well, let me propose
we revisit bubble sort.

00:56:52.120 --> 00:56:56.390
Bubble sort, again, just has you swap
adjacent elements again and again

00:56:56.390 --> 00:56:59.720
and again and again
until you're all sorted.

00:56:59.720 --> 00:57:02.960
But when might you want to stop
going back and forth the list?

00:57:02.960 --> 00:57:08.550
Like, when might Bonnie have wanted to
say, ooh, that's enough work, I'm done?

00:57:08.550 --> 00:57:11.550
If she walks through the list
looking at every person, i and i

00:57:11.550 --> 00:57:13.950
plus 1 next to each
other, when might she

00:57:13.950 --> 00:57:16.200
conclude that she's done
doing that work of sorting?

00:57:19.984 --> 00:57:20.484
Yeah?

00:57:20.484 --> 00:57:22.400
AUDIENCE: [INAUDIBLE]

00:57:22.400 --> 00:57:24.530
SPEAKER 1: Yeah, if there
was a question she asked,

00:57:24.530 --> 00:57:27.980
or if there was a pass she made
walking through the volunteers

00:57:27.980 --> 00:57:29.360
and didn't have to do any work.

00:57:29.360 --> 00:57:31.485
She doesn't have to keep
doing work again and again

00:57:31.485 --> 00:57:34.280
just because the algorithm said
to repeat it n minus 1 times.

00:57:34.280 --> 00:57:37.820
We kind of want to have a condition in
here, or some way of short circuiting

00:57:37.820 --> 00:57:41.570
the algorithm, so that we stop once
we're really just wasting our time.

00:57:41.570 --> 00:57:43.820
And bubble sort lends
itself to that because we

00:57:43.820 --> 00:57:48.485
can tweak our wording of our pseudocode
as follows, repeat until no swaps.

00:57:48.485 --> 00:57:51.110
So again, it's opportune that
these numbers are already sorted.

00:57:51.110 --> 00:57:52.370
Let's try bubble sort on it.

00:57:52.370 --> 00:57:54.998
So Bonnie probably would
have said, compare 1 and 2.

00:57:54.998 --> 00:57:56.040
They're not out of order.

00:57:56.040 --> 00:57:57.260
So we don't have to swap.

00:57:57.260 --> 00:58:02.570
2 and 3, 3 and 4, 4 and 5,
5 and 6, 6 and 7, 7 and 8,

00:58:02.570 --> 00:58:04.580
she obviously did no swaps.

00:58:04.580 --> 00:58:09.290
It would be stupid for her to go
again just because the algorithm said

00:58:09.290 --> 00:58:13.460
do this n minus 1 times because
she's going to get no, no, no, no,

00:58:13.460 --> 00:58:15.080
again and again as her answer.

00:58:15.080 --> 00:58:19.520
So by saying repeat until no swaps, she
can abort this algorithm early and then

00:58:19.520 --> 00:58:23.327
have taken how many
steps in this best case?

00:58:23.327 --> 00:58:24.202
AUDIENCE: [INAUDIBLE]

00:58:24.202 --> 00:58:26.119
SPEAKER 1: Yeah, technically
n minus 1, right?

00:58:26.119 --> 00:58:31.750
Because if this is n elements, or 8, you
can compare seven pairs, 1, 2, 3, 4, 5,

00:58:31.750 --> 00:58:35.000
6, 7, so n minus 1.

00:58:35.000 --> 00:58:38.480
So she could, in the best
case, then, have a lower bound

00:58:38.480 --> 00:58:41.030
on running time for selection sort--

00:58:41.030 --> 00:58:45.620
of bubble sort no longer
of n squared, but now n.

00:58:45.620 --> 00:58:47.630
So it would seem that
with a bit more cleverness

00:58:47.630 --> 00:58:52.635
we can actually benefit in terms of
the running time of these algorithms.

00:58:52.635 --> 00:58:55.760
Well, let's see if we can't see these
from a slightly different perspective

00:58:55.760 --> 00:58:58.290
now by doing this visualization.

00:58:58.290 --> 00:59:02.990
I'm going to go ahead and open up
a graphical visualization of each

00:59:02.990 --> 00:59:04.890
of these algorithms in turn.

00:59:04.890 --> 00:59:09.890
So what you have here is an
array of numbers, each of which

00:59:09.890 --> 00:59:11.690
is represented by a vertical bar.

00:59:11.690 --> 00:59:14.270
Short bar is small number, like 0, 1, 2.

00:59:14.270 --> 00:59:18.500
Tall bar is big number, like 99
or 100 or anything in between.

00:59:18.500 --> 00:59:20.330
This is a visualization tool online.

00:59:20.330 --> 00:59:21.740
And we'll link this on
the course's website

00:59:21.740 --> 00:59:23.240
so that we can try these algorithms.

00:59:23.240 --> 00:59:25.090
So let's try bubble sort, for instance.

00:59:25.090 --> 00:59:26.660
I'm going to start it kind of slow.

00:59:26.660 --> 00:59:30.860
But you can see highlighted in pink
two elements being compared side

00:59:30.860 --> 00:59:34.220
by side, i and i plus 1 being
swapped if they're out of order.

00:59:34.220 --> 00:59:37.460
So this is the graphical version
of what Bonnie's instructions were

00:59:37.460 --> 00:59:38.810
to our volunteers.

00:59:38.810 --> 00:59:40.800
And now notice, bubble
sort gets its name

00:59:40.800 --> 00:59:43.550
because notice what's happening
to apparently the biggest element.

00:59:43.550 --> 00:59:46.880
It's sort of bubbling its way
up all the way to the end.

00:59:46.880 --> 00:59:48.710
Smaller elements are making progress.

00:59:48.710 --> 00:59:51.410
Like a 15 and a 12 just moved
a little bit to the left.

00:59:51.410 --> 00:59:52.340
But they're not done.

00:59:52.340 --> 00:59:54.020
They're not in their right places yet.

00:59:54.020 --> 00:59:58.100
But the big elements are starting
to bubble all the way to the right.

00:59:58.100 --> 01:00:00.600
Now, this gets a little
tedious pretty quickly.

01:00:00.600 --> 01:00:03.390
So I'm going to go ahead and
speed up the animation speed.

01:00:03.390 --> 01:00:04.650
And if we watch it now--

01:00:04.650 --> 01:00:06.650
same algorithm, it's
just running faster--

01:00:06.650 --> 01:00:08.690
you can really see that
the larger elements are

01:00:08.690 --> 01:00:10.920
accumulating at the right-hand side.

01:00:10.920 --> 01:00:12.830
So this is identical to
our eight volunteers.

01:00:12.830 --> 01:00:15.260
It's just each human now
is represented by a bar.

01:00:15.260 --> 01:00:19.070
And you can really see the larger
numbers bubbling their way up

01:00:19.070 --> 01:00:20.210
to the top.

01:00:20.210 --> 01:00:23.810
But you can see perhaps more
visually there's a lot of work here.

01:00:23.810 --> 01:00:25.940
Bonnie was uttering a lot of sentences.

01:00:25.940 --> 01:00:28.310
She was doing a lot of back
and forth, because, just

01:00:28.310 --> 01:00:32.120
as this pink bar suggests, it's going
back and forth and back and forth,

01:00:32.120 --> 01:00:35.887
doing a lot of work again
and again and again.

01:00:35.887 --> 01:00:36.470
And let's see.

01:00:36.470 --> 01:00:38.512
It's going to start to
speed up now because we're

01:00:38.512 --> 01:00:40.820
nearing the latter half of it.

01:00:40.820 --> 01:00:44.420
But as you can see,
ultimately, this is kind

01:00:44.420 --> 01:00:46.130
of what n squared feels like, right?

01:00:46.130 --> 01:00:47.540
I'm kind of out of words again.

01:00:47.540 --> 01:00:49.050
And I could say some more things.

01:00:49.050 --> 01:00:51.842
But it's really just stalling
because the algorithm's kind of slow.

01:00:51.842 --> 01:00:55.550
n squared is not a good
upper bound on running

01:00:55.550 --> 01:00:58.598
time, especially when your
[? elements ?] are randomly sorted.

01:00:58.598 --> 01:00:59.640
So let's try another one.

01:00:59.640 --> 01:01:02.375
Let's do, in this case, selection sort.

01:01:02.375 --> 01:01:05.520
So I'm going to re-randomize the
numbers just as we started and now do

01:01:05.520 --> 01:01:06.380
selection sort.

01:01:06.380 --> 01:01:08.240
And I'm starting at the faster speed.

01:01:08.240 --> 01:01:10.010
And it's working a little differently.

01:01:10.010 --> 01:01:11.990
Notice that the pink
line is sweeping from

01:01:11.990 --> 01:01:13.990
left to right, looking
for the smallest element.

01:01:13.990 --> 01:01:16.880
And when it finds it, it
highlights the little bar,

01:01:16.880 --> 01:01:19.363
and it moves it all the
way in place to the left.

01:01:19.363 --> 01:01:22.280
So whereas bubble [? sort's ?] large
elements bubbled up to the right,

01:01:22.280 --> 01:01:26.360
selection sort is much more emphatically
grabbing the smallest element

01:01:26.360 --> 01:01:29.340
and putting it into its
place one after the other.

01:01:29.340 --> 01:01:30.890
So this has a different feel.

01:01:30.890 --> 01:01:33.200
But here, too, I'm
going to have to ad lib

01:01:33.200 --> 01:01:35.380
quite a bit because it's taking a while.

01:01:35.380 --> 01:01:37.970
And you can see the pink bars
are really going back and forth

01:01:37.970 --> 01:01:43.610
and back and forth, doing quite a
bit of work, quite a bit of work,

01:01:43.610 --> 01:01:44.960
quite a bit of work.

01:01:44.960 --> 01:01:47.060
And now finally, it's done.

01:01:47.060 --> 01:01:50.690
So in a bit, we'll take a look
at fundamentally faster solutions

01:01:50.690 --> 01:01:52.730
and see why n squared actually is small.

01:01:52.730 --> 01:01:56.270
But first let's take our five-minute
break with mini cupcakes outside.

01:01:59.900 --> 01:02:01.370
So we are back.

01:02:01.370 --> 01:02:03.860
And just as a teaser
for this coming week,

01:02:03.860 --> 01:02:07.280
the week is ultimately about algorithms
and the implementation thereof.

01:02:07.280 --> 01:02:09.528
And it turns out that
certainly on campus

01:02:09.528 --> 01:02:11.570
and in the real world,
our [? election's ?] quite

01:02:11.570 --> 01:02:14.600
often, an algorithmic process
to which there's actually

01:02:14.600 --> 01:02:16.550
multiple possible solutions.

01:02:16.550 --> 01:02:19.450
Indeed, when you vote for someone,
how those votes are tabulated

01:02:19.450 --> 01:02:21.700
can actually differ based
on the algorithm being used.

01:02:21.700 --> 01:02:24.460
And those can actually had
very real-world effects

01:02:24.460 --> 01:02:26.110
on the outcomes of those elections.

01:02:26.110 --> 01:02:28.930
So among the challenges ahead for
the coming week with problem set

01:02:28.930 --> 01:02:32.260
three is to implement a number of
algorithms related to elections.

01:02:32.260 --> 01:02:34.300
For instance, it might
be a very simple ballot,

01:02:34.300 --> 01:02:39.610
whereby whoever has the most votes
among all among all of the candidates,

01:02:39.610 --> 01:02:41.218
or whoever has a plurality wins.

01:02:41.218 --> 01:02:44.260
Or you can implement some kind of
runoff election, whereby you don't just

01:02:44.260 --> 01:02:46.690
vote for one candidate, but
you rank your preferences.

01:02:46.690 --> 01:02:49.720
And then you use software or
a more manual human process

01:02:49.720 --> 01:02:52.660
to adjudicate who wins based on
the ranking of those candidates.

01:02:52.660 --> 01:02:55.150
And there's even more
possibilities that ultimately

01:02:55.150 --> 01:02:57.370
can influence real-world
outcomes, whether it's here

01:02:57.370 --> 01:02:59.090
on campus or in the real world.

01:02:59.090 --> 01:03:02.080
And so that's what we'll
explore this week in code.

01:03:02.080 --> 01:03:04.240
But now let's see if
we can't fundamentally

01:03:04.240 --> 01:03:07.660
do better than both bubble
sort and selection sort.

01:03:07.660 --> 01:03:10.810
And let me stipulate, there's
actually dozens of sorting algorithms.

01:03:10.810 --> 01:03:14.320
We're looking just at a couple of
representative algorithms here.

01:03:14.320 --> 01:03:19.090
But let's see if we can't do
fundamentally better than the n squared

01:03:19.090 --> 01:03:20.863
big O that we kept bumping up against.

01:03:20.863 --> 01:03:23.530
And to do that, let me propose
that we introduce a fundamentally

01:03:23.530 --> 01:03:27.070
new idea that, frankly, among the
ideas we explore in computer science

01:03:27.070 --> 01:03:28.960
will kind of bend your
mind a little bit.

01:03:28.960 --> 01:03:30.460
So again, here comes that fire hose.

01:03:30.460 --> 01:03:33.760
But again, the goal today is
exposure, not yet comfort.

01:03:33.760 --> 01:03:37.600
Comfort will come in the coming weeks
as we apply [? this ?] [? ideas ?]

01:03:37.600 --> 01:03:38.470
and others.

01:03:38.470 --> 01:03:41.608
So let's rewind to week 0, where
everything was very simple at the time.

01:03:41.608 --> 01:03:43.900
And we were just searching
a phone book for Mike Smith.

01:03:43.900 --> 01:03:45.670
And we had this pseudocode here.

01:03:45.670 --> 01:03:48.550
This had an example of a programming
construct that, at the time,

01:03:48.550 --> 01:03:51.460
we highlighted and called
a loop, go back to line 3

01:03:51.460 --> 01:03:54.010
so that you can do
something again and again.

01:03:54.010 --> 01:03:56.503
This is an example of
what's called iteration,

01:03:56.503 --> 01:03:58.420
a word you might have
heard your [? TFs ?] say

01:03:58.420 --> 01:04:02.365
or someone else, where to iterate
just means to loop again and again.

01:04:02.365 --> 01:04:03.740
And this is very straightforward.

01:04:03.740 --> 01:04:05.698
And we could implement
this in code if we want.

01:04:05.698 --> 01:04:08.980
But there there's an opportunity
to design this algorithm not only

01:04:08.980 --> 01:04:11.500
differently, but perhaps better, right?

01:04:11.500 --> 01:04:14.440
After all, let me go ahead
and erase that line there

01:04:14.440 --> 01:04:17.530
and get rid of this iteration and
see if I can't solve the problem

01:04:17.530 --> 01:04:21.230
more elegantly, if you will,
a better design, if you will,

01:04:21.230 --> 01:04:23.230
though there will invariably
be some trade-offs.

01:04:23.230 --> 01:04:26.230
Here, with the open to
the middle of the left--

01:04:26.230 --> 01:04:28.720
here, with open to middle
of left half of book

01:04:28.720 --> 01:04:31.540
and here, open to middle
of right half of book,

01:04:31.540 --> 01:04:35.530
the whole point of opening to the middle
of the left or the middle of the right

01:04:35.530 --> 01:04:39.190
was just a search for Mike Smith
again but in half of the phone book,

01:04:39.190 --> 01:04:40.180
left or right.

01:04:40.180 --> 01:04:43.810
The key detail being it's half
the size of the whole phone book.

01:04:43.810 --> 01:04:45.500
But the algorithm is really the same.

01:04:45.500 --> 01:04:47.593
So in fact, why don't we
simplify our pseudocode

01:04:47.593 --> 01:04:50.260
and not get into the logistics
of like, oh, go back to this line

01:04:50.260 --> 01:04:51.635
and then do this again and again.

01:04:51.635 --> 01:04:54.700
No, let's just say search
the left half of book

01:04:54.700 --> 01:04:56.800
or search the right half of book.

01:04:56.800 --> 01:04:59.620
And in fact, let's tighten up
the code and make it fewer lines

01:04:59.620 --> 01:05:03.250
so that we don't even need to get
into the specific line numbers.

01:05:03.250 --> 01:05:06.070
We can just tell ourselves what to do.

01:05:06.070 --> 01:05:09.040
Now, highlighted in yellow
here are those two new lines.

01:05:09.040 --> 01:05:11.165
And it might seem kind of
like a cyclical argument.

01:05:11.165 --> 01:05:12.790
Well, how do you search for Mike Smith?

01:05:12.790 --> 01:05:14.590
Well, you just search for Mike Smith.

01:05:14.590 --> 01:05:17.620
But the key detail here
is I'm not just telling

01:05:17.620 --> 01:05:19.390
you to do the same thing endlessly.

01:05:19.390 --> 01:05:22.057
I'm telling you, if you want to
search for Mike Smith in a phone

01:05:22.057 --> 01:05:23.140
book of this size, mm-mm.

01:05:23.140 --> 01:05:25.315
Search for Mike Smith in
a phone book of this size.

01:05:25.315 --> 01:05:27.940
And then the next step of that
algorithm becomes search for him

01:05:27.940 --> 01:05:32.320
in a phone book of this size, this
size, when you keep halving the problem.

01:05:32.320 --> 01:05:37.150
So this is an example of a technique in
programming called recursion, whereby

01:05:37.150 --> 01:05:43.810
you implement a program or an algorithm
or code that, in a sense, calls itself.

01:05:43.810 --> 01:05:47.950
If what we're looking at here on the
board is a function called search,

01:05:47.950 --> 01:05:52.600
a function is recursive if it
literally references its own name

01:05:52.600 --> 01:05:53.785
in its own code.

01:05:53.785 --> 01:05:55.910
And this is where your mind
starts to bend perhaps.

01:05:55.910 --> 01:05:57.640
And we'll see this more concretely.

01:05:57.640 --> 01:06:00.910
But recursion is when a
function calls itself.

01:06:00.910 --> 01:06:03.970
So if this is a function implementing
search and highlighted in yellow

01:06:03.970 --> 01:06:07.690
are two lines of code that say
search again but on a smaller

01:06:07.690 --> 01:06:13.330
piece of the problem, that is recursion,
something happening again and again.

01:06:13.330 --> 01:06:14.715
So let's see this in context.

01:06:14.715 --> 01:06:17.590
So let's go back to Mario, where
this is a slightly different pyramid

01:06:17.590 --> 01:06:18.548
that we've seen before.

01:06:18.548 --> 01:06:22.270
Notice that it's left aligned,
and it goes downward to the right.

01:06:22.270 --> 01:06:25.930
Let's, in fact, get rid of the ground
and just focus only on the pyramid.

01:06:25.930 --> 01:06:29.980
How could I go about writing code that
implements this type of Mario pyramid?

01:06:29.980 --> 01:06:33.100
Well, let me go ahead and create
a new file called Mario.c.

01:06:33.100 --> 01:06:35.860
Or actually, no, let's be
even more specific this time.

01:06:35.860 --> 01:06:38.313
Let's call this
iteration.c to make clear

01:06:38.313 --> 01:06:39.730
that this is an iterative program.

01:06:39.730 --> 01:06:42.580
And let me go ahead and include cs50.h.

01:06:42.580 --> 01:06:44.920
And let me include standard io.h.

01:06:44.920 --> 01:06:48.220
And let me go ahead then
and do int main void.

01:06:48.220 --> 01:06:51.460
And in here, let me go ahead and
just get the height of the pyramid

01:06:51.460 --> 01:06:53.673
from the user, using
our old friend get int.

01:06:53.673 --> 01:06:55.840
I'm not going to bother,
for today, doing a do while

01:06:55.840 --> 01:06:57.382
and making sure the human cooperates.

01:06:57.382 --> 01:07:00.670
They need to just behave and
give us a positive integer here.

01:07:00.670 --> 01:07:04.300
And then I'm going to go ahead and
just draw a pyramid of that height

01:07:04.300 --> 01:07:07.690
by using a function that doesn't exist
yet but that's going to be called draw.

01:07:07.690 --> 01:07:10.240
I'm going to implement this
function draw as follows,

01:07:10.240 --> 01:07:13.790
void draw, because it doesn't need
to return a value, per our discussion

01:07:13.790 --> 01:07:14.290
last week.

01:07:14.290 --> 01:07:15.520
It's just going to print something.

01:07:15.520 --> 01:07:17.650
But it is going to take
input, like a number n.

01:07:17.650 --> 01:07:19.450
Or rather, let's call it H for Height.

01:07:19.450 --> 01:07:21.700
That represents the height
of the pyramid to draw.

01:07:21.700 --> 01:07:25.010
And how do I draw a pyramid
that looks like this?

01:07:25.010 --> 01:07:28.630
Well, again, use some intuition, as
you might have four problem set 1,

01:07:28.630 --> 01:07:31.190
even though the pyramid
there was a little trickier.

01:07:31.190 --> 01:07:33.043
On the first row, I
want to print one brick.

01:07:33.043 --> 01:07:34.960
On the second row, I
want to print two bricks.

01:07:34.960 --> 01:07:37.450
On the third row,
three, fourth row, four.

01:07:37.450 --> 01:07:40.210
So it turns out this is an
easier pyramid than the one we

01:07:40.210 --> 01:07:41.950
had you do for problem set 1.

01:07:41.950 --> 01:07:42.610
Sorry.

01:07:42.610 --> 01:07:45.550
So for int, i gets 0.

01:07:45.550 --> 01:07:47.410
i is less than--
actually, you know what?

01:07:47.410 --> 01:07:50.530
Let me make it a little clearer and
more mapping to my verbal pseudocode.

01:07:50.530 --> 01:07:53.080
Let's initialize i to
1 for the first row.

01:07:53.080 --> 01:07:56.140
Let's do this so long as i is
less than or equal to the height.

01:07:56.140 --> 01:07:57.310
And let's do i plus plus.

01:07:57.310 --> 01:08:02.550
So this is the same thing as
starting from 0 but just, surprise--

01:08:02.550 --> 01:08:08.380
[LAUGHS] so I actually didn't make
that mistake, if you didn't see it.

01:08:08.380 --> 01:08:14.860
So I'm going to go ahead and say for
in i gets 1 to represent my first row.

01:08:14.860 --> 01:08:17.439
i is less than or equal
to height, i plus plus.

01:08:17.439 --> 01:08:20.242
This is identical, again,
to starting from 0,

01:08:20.242 --> 01:08:23.200
but it's just nice to start counting
from 1 sometimes, as in this case,

01:08:23.200 --> 01:08:24.279
for the first row.

01:08:24.279 --> 01:08:26.696
And then anytime you want to
do something two dimensional,

01:08:26.696 --> 01:08:28.700
like in Mario, odds
are, if you're like me,

01:08:28.700 --> 01:08:33.310
you probably had an inner nested
loop, maybe calling it j, and doing j

01:08:33.310 --> 01:08:37.593
is less than or equal to
i and then j plus plus.

01:08:37.593 --> 01:08:39.760
And I'll run this so that
it's clear what I'm doing.

01:08:39.760 --> 01:08:43.090
But inside this nested loop, I'm
just going to print one brick.

01:08:43.090 --> 01:08:49.600
And then down here, I'm going to
print my new line backslash n.

01:08:49.600 --> 01:08:51.370
So again, it's a simple draw function.

01:08:51.370 --> 01:08:53.710
And now because it's at
the bottom of my file,

01:08:53.710 --> 01:08:56.950
I need to put its prototype up here,
one of the few times copy and paste

01:08:56.950 --> 01:08:58.720
is reasonable, I would say.

01:08:58.720 --> 01:09:03.055
So let me make iteration,
compiles OK, dot slash iteration.

01:09:03.055 --> 01:09:04.430
And now I'm asked for the height.

01:09:04.430 --> 01:09:06.470
Let's go ahead do a pyramid of size 4.

01:09:06.470 --> 01:09:07.715
And voila, it seems to work.

01:09:07.715 --> 01:09:08.840
And let me do it once more.

01:09:08.840 --> 01:09:12.100
I'll try, for instance,
a pyramid of height 3.

01:09:12.100 --> 01:09:12.950
That works.

01:09:12.950 --> 01:09:15.340
And let me go ahead and
do a pyramid of size 5.

01:09:15.340 --> 01:09:16.420
So it seems to work.

01:09:16.420 --> 01:09:19.180
And this is a very reasonable,
very correct approach

01:09:19.180 --> 01:09:24.040
to implementing that Mario pyramid
using iteration, that is to say,

01:09:24.040 --> 01:09:26.920
using loops, in this case, two loops.

01:09:26.920 --> 01:09:29.380
But you know what's interesting
about this Mario pyramid,

01:09:29.380 --> 01:09:31.540
as well as some of
the others we've seen,

01:09:31.540 --> 01:09:33.609
is there's this common structure, right?

01:09:33.609 --> 01:09:35.649
And if we look at the
pyramid in isolation,

01:09:35.649 --> 01:09:38.950
what is the definition
of a pyramid of height 4?

01:09:38.950 --> 01:09:44.050
Well, arguably, it's a pyramid of
height 3 plus 1 additional row.

01:09:44.050 --> 01:09:46.359
What's the definition of
a pyramid of height 3?

01:09:46.359 --> 01:09:49.359
Well, it's a pyramid of height
2 plus 1 additional row.

01:09:49.359 --> 01:09:51.460
What's the definition
of a pyramid of high 2?

01:09:51.460 --> 01:09:54.730
It's a pyramid of height
1 with an additional row.

01:09:54.730 --> 01:09:59.830
That's a recursive definition of just
a physical object or a virtual object,

01:09:59.830 --> 01:10:04.090
whereby you can describe the structure
of something in terms of itself.

01:10:04.090 --> 01:10:08.020
Now, at some point, I need a
special case, at least one height.

01:10:08.020 --> 01:10:09.970
What is a pyramid of height 0?

01:10:09.970 --> 01:10:10.780
Nothing, right?

01:10:10.780 --> 01:10:16.210
Return or exit or quit, whatever the
right verbiage is for the algorithm.

01:10:16.210 --> 01:10:19.430
So long as you have a so-called
base case, where you manually say,

01:10:19.430 --> 01:10:21.580
oh, in that specific case,
just don't do anything,

01:10:21.580 --> 01:10:23.950
and you don't recursively
call yourself again

01:10:23.950 --> 01:10:28.340
and again, we can use this
principle of code calling itself.

01:10:28.340 --> 01:10:29.630
So let's try this once more.

01:10:29.630 --> 01:10:33.880
Let me go ahead and create
another file called recursion.c.

01:10:33.880 --> 01:10:37.570
I'm again going to go ahead and include
[? cs50.h. ?] And I'm going to go ahead

01:10:37.570 --> 01:10:41.590
and include standard [? io.h. ?] And
then I'm going to go ahead and have int

01:10:41.590 --> 01:10:43.450
main void again.

01:10:43.450 --> 01:10:45.520
And in this program here,
I'm going to again ask

01:10:45.520 --> 01:10:48.430
the user for the height of
interest for their pyramid

01:10:48.430 --> 01:10:52.540
using int height gets get
int and ask them for height.

01:10:52.540 --> 01:10:55.000
I'm not going to bother
error checking here.

01:10:55.000 --> 01:10:57.400
I'm going to go ahead and
draw a pyramid of that height.

01:10:57.400 --> 01:11:00.610
And so what's going to change
this time is my draw function,

01:11:00.610 --> 01:11:04.400
void draw int h as before.

01:11:04.400 --> 01:11:06.070
And now's where things get interesting.

01:11:06.070 --> 01:11:09.460
My goal now is not to
just use nested loops,

01:11:09.460 --> 01:11:13.390
but to define a bigger pyramid
in terms of a small pyramid.

01:11:13.390 --> 01:11:19.030
So suppose that the goal at hand
is to draw a pyramid of size 4.

01:11:19.030 --> 01:11:23.890
What should I do first, according
to this definition of a pyramid?

01:11:23.890 --> 01:11:29.060
How do I draw a pyramid
of size 4 in English?

01:11:29.060 --> 01:11:29.560
Yeah?

01:11:29.560 --> 01:11:33.750
AUDIENCE: Draw a pyramid
of the size 4 minus 1.

01:11:33.750 --> 01:11:37.715
SPEAKER 1: Yeah, draw a pyramid of
size 4 minus 1, or a pyramid of size 3.

01:11:37.715 --> 01:11:39.090
So how do I express this in code?

01:11:39.090 --> 01:11:43.140
Well, wonderfully in code, this
is super simple, h minus 1.

01:11:43.140 --> 01:11:51.060
That will draw me a pyramid of height
h minus 1, or 3 in this specific case.

01:11:51.060 --> 01:11:53.250
Now, it's not done the program, right?

01:11:53.250 --> 01:11:55.500
I can't possibly just
compile this and expect

01:11:55.500 --> 01:11:59.790
it to work because this seems like it's
just going to call itself endlessly.

01:11:59.790 --> 01:12:05.610
Well, what's a pyramid of size 3, 2,
1, 0, negative 1, negative 2, right?

01:12:05.610 --> 01:12:09.360
It would go on endlessly if
I just blindly subtract 1.

01:12:09.360 --> 01:12:10.860
So I need that base case.

01:12:10.860 --> 01:12:13.630
Under what circumstances should
I actually not draw anything?

01:12:13.630 --> 01:12:15.373
AUDIENCE: [INAUDIBLE]

01:12:15.373 --> 01:12:16.040
SPEAKER 1: Yeah.

01:12:16.040 --> 01:12:19.190
So maybe if h equals
equals 0, you know what?

01:12:19.190 --> 01:12:19.880
Just return.

01:12:19.880 --> 01:12:21.290
Don't do anything, right?

01:12:21.290 --> 01:12:23.480
I need a base case, a
hard-coded condition

01:12:23.480 --> 01:12:27.440
that says stop doing this,
this mind-bending [? cyclity ?]

01:12:27.440 --> 01:12:28.700
again and again.

01:12:28.700 --> 01:12:30.330
But I do need to do one more thing.

01:12:30.330 --> 01:12:34.130
So this is just an error check to
make sure I don't do this forever.

01:12:34.130 --> 01:12:36.730
This is this leap of faith,
where somehow I haven't even

01:12:36.730 --> 01:12:38.480
written the function
yet, and somehow it's

01:12:38.480 --> 01:12:40.572
magically going to draw my pyramid.

01:12:40.572 --> 01:12:42.530
But what's the second
step of drawing a pyramid

01:12:42.530 --> 01:12:44.678
of height 4, if I can ask again?

01:12:44.678 --> 01:12:46.630
AUDIENCE: Well, in terms of [INAUDIBLE]?

01:12:46.630 --> 01:12:48.130
SPEAKER 1: Yeah, so what comes next?

01:12:48.130 --> 01:12:49.713
I've just drawn a pyramid of height 3.

01:12:49.713 --> 01:12:51.910
AUDIENCE: Oh, then you
draw a pyramid of height 2.

01:12:51.910 --> 01:12:53.482
SPEAKER 1: Now I draw a--

01:12:53.482 --> 01:12:54.190
say it once more.

01:12:54.190 --> 01:12:55.675
AUDIENCE: Pyramid of height 2.

01:12:55.675 --> 01:12:56.550
SPEAKER 1: Not quite.

01:12:56.550 --> 01:12:57.383
Take this literally.

01:12:57.383 --> 01:13:01.330
If I have just in code
drawn a pyramid of height 3,

01:13:01.330 --> 01:13:03.280
how do I get to a
pyramid of height 4 now?

01:13:03.280 --> 01:13:04.772
AUDIENCE: Oh, you add [INAUDIBLE].

01:13:04.772 --> 01:13:07.480
SPEAKER 1: Yeah, I add that
additional row, right? [? Because, ?]

01:13:07.480 --> 01:13:10.040
again, per our diagram,
what's a pyramid of height 4?

01:13:10.040 --> 01:13:13.440
Well, it's really just a pyramid
of height 3 plus an additional row.

01:13:13.440 --> 01:13:15.830
So if we all just kind of
agree, a leap of faith,

01:13:15.830 --> 01:13:20.150
that somehow or other I have the ability
to draw pyramids of height h minus 1,

01:13:20.150 --> 01:13:26.280
lets you and I do the hard part in code
of drawing that one additional row.

01:13:26.280 --> 01:13:30.560
So if I go back in code here, after
drawing a pyramid of height h minus 1,

01:13:30.560 --> 01:13:37.250
I need to go ahead and for int i gets
0, i is less than h, i plus plus.

01:13:37.250 --> 01:13:43.200
It would seem that I just need
to print out, for instance,

01:13:43.200 --> 01:13:48.800
up here a hash followed by a
new line after that, right?

01:13:48.800 --> 01:13:51.947
So I do need a for loop,
but just one not nested.

01:13:51.947 --> 01:13:53.780
And what does this have
the effect of doing?

01:13:53.780 --> 01:13:59.240
Well, on the fourth row, where h equals
4, how many hashes am I going to print?

01:13:59.240 --> 01:14:04.430
1, 2, 3, 4, if I'm iterating
from 0 on up to h, 0, 1, 2, 3, 4.

01:14:04.430 --> 01:14:08.820
So these lines of code, in the story at
hand, are going to print four hashes.

01:14:08.820 --> 01:14:13.460
This line of code, amazingly, is going
to print everything else above it,

01:14:13.460 --> 01:14:14.855
the pyramid of height 3.

01:14:14.855 --> 01:14:16.730
And the line of code
above that is just going

01:14:16.730 --> 01:14:22.070
to make sure that we don't blindly call
draw forever into the negative numbers.

01:14:22.070 --> 01:14:27.290
I'm literally going to say, if h
equals equals 0, stop doing this magic.

01:14:27.290 --> 01:14:31.190
So let's go ahead and put my
prototype up top, just as before,

01:14:31.190 --> 01:14:36.260
even though it's the same, save
the file, make recursion, Enter.

01:14:36.260 --> 01:14:37.700
It compiles OK.

01:14:37.700 --> 01:14:40.910
Now let me go ahead and run
recursion a height of 4.

01:14:40.910 --> 01:14:47.150
And, oh, my god, I wrote a function
that called itself and somehow magically

01:14:47.150 --> 01:14:47.940
printed a pyramid.

01:14:47.940 --> 01:14:51.620
And yet all I ever explicitly
did was print what?

01:14:51.620 --> 01:14:54.860
A row of bricks myself.

01:14:54.860 --> 01:14:58.280
And the recursion comes from the
fact that I'm calling myself.

01:14:58.280 --> 01:15:02.630
But just like with
binary search, just like

01:15:02.630 --> 01:15:08.780
with any divide-and-conquer approach,
I'm calling myself on a smaller problem

01:15:08.780 --> 01:15:09.860
than I was handed.

01:15:09.860 --> 01:15:14.220
The bites are eating into the
problem again and again and again.

01:15:14.220 --> 01:15:17.570
Any questions on this
technique, a function

01:15:17.570 --> 01:15:19.880
that calls itself is recursive?

01:15:19.880 --> 01:15:20.606
Yeah?

01:15:20.606 --> 01:15:22.550
AUDIENCE: A quick question.

01:15:22.550 --> 01:15:25.952
[INAUDIBLE]

01:15:25.952 --> 01:15:29.283
So [INAUDIBLE] loop, how
does it go back [INAUDIBLE]??

01:15:29.283 --> 01:15:31.450
SPEAKER 1: Really good
question, after the for loop,

01:15:31.450 --> 01:15:32.700
how does it go back and print?

01:15:32.700 --> 01:15:33.240
It doesn't.

01:15:33.240 --> 01:15:34.960
That happens first.

01:15:34.960 --> 01:15:38.220
So if you actually were to
use debug 50 in the [? IDE, ?]

01:15:38.220 --> 01:15:40.740
you would see that when
this line 20 is called,

01:15:40.740 --> 01:15:45.650
and you call draw of a pyramid of
height 3, draw gets called again.

01:15:45.650 --> 01:15:47.400
And then it gets called
again on height 2.

01:15:47.400 --> 01:15:49.140
Then it gets called again on height 1.

01:15:49.140 --> 01:15:51.450
But guess what happens
on a pyramid of height 1?

01:15:51.450 --> 01:15:53.100
It prints a single hash.

01:15:53.100 --> 01:15:55.410
Then if you rewind the
story, what happens next?

01:15:55.410 --> 01:15:57.300
You print a row of two hashes.

01:15:57.300 --> 01:15:58.350
What happens next?

01:15:58.350 --> 01:15:59.940
You print a row of three hashes.

01:15:59.940 --> 01:16:01.080
What happens next?

01:16:01.080 --> 01:16:03.420
You print a row of four hashes.

01:16:03.420 --> 01:16:05.790
And we'll see more of this before long.

01:16:05.790 --> 01:16:07.800
But because I'm printing--

01:16:07.800 --> 01:16:12.660
I'm calling draw before I'm printing the
base, I don't know how this works yet.

01:16:12.660 --> 01:16:14.970
That's the leap of faith
to which I keep alluding.

01:16:14.970 --> 01:16:18.870
But it keeps happening because,
1, I have this base case that

01:16:18.870 --> 01:16:20.880
stops this from happening forever.

01:16:20.880 --> 01:16:26.470
And I have this other case that
adds to my pyramid again and again.

01:16:26.470 --> 01:16:26.970
Yeah?

01:16:26.970 --> 01:16:30.906
AUDIENCE: It's kind of like a layering
of [INAUDIBLE] for iterations.

01:16:30.906 --> 01:16:34.962
But instead of going from top
down, it's going [? down up. ?]

01:16:34.962 --> 01:16:35.670
SPEAKER 1: It is.

01:16:35.670 --> 01:16:36.390
It's going down up.

01:16:36.390 --> 01:16:37.860
And you're referring
actually to a [? concept ?]

01:16:37.860 --> 01:16:40.652
we'll talk about actually in a week
or two's time called the stack.

01:16:40.652 --> 01:16:43.110
We'll see actually how
this magic is working.

01:16:43.110 --> 01:16:45.870
For now let me just stipulate that
functions can call themselves,

01:16:45.870 --> 01:16:48.090
so long as what you
pass them is a smaller

01:16:48.090 --> 01:16:50.545
input than you were handed initially.

01:16:50.545 --> 01:16:52.920
And now just to demonstrate
that computer scientists have

01:16:52.920 --> 01:16:55.080
a sense of humor, if
we Google recursion,

01:16:55.080 --> 01:16:57.810
as you might currently be doing
to understand what this is,

01:16:57.810 --> 01:16:59.032
you'll notice--

01:16:59.032 --> 01:17:01.392
[LAUGHTER]

01:17:02.810 --> 01:17:04.040
Get it?

01:17:04.040 --> 01:17:06.350
Kind of-- OK, anyhow.

01:17:06.350 --> 01:17:10.640
Google has literally hard-coded that
into their source code of google.com.

01:17:10.640 --> 01:17:17.090
So let's now use this to
solve a problem of sorting.

01:17:17.090 --> 01:17:19.997
It turns out there's an algorithm
out there called merge sort.

01:17:19.997 --> 01:17:23.080
And it's representative of sorts that
are actually better than bubble sort

01:17:23.080 --> 01:17:25.550
and better than selection
sort fundamentally.

01:17:25.550 --> 01:17:28.160
In terms of big O
notation, we can do better.

01:17:28.160 --> 01:17:30.708
n squared does not have to be our fate.

01:17:30.708 --> 01:17:32.750
After all, so many things
in our life are sorted.

01:17:32.750 --> 01:17:36.320
Your contacts in your phone, maybe
your friends on Facebook or Instagram,

01:17:36.320 --> 01:17:40.100
or any application using the cloud
typically sorts data in some way.

01:17:40.100 --> 01:17:42.920
It would be a shame if
it's super slow to sort,

01:17:42.920 --> 01:17:44.870
as we saw already, with n squared.

01:17:44.870 --> 01:17:47.402
So merge sort works as follows.

01:17:47.402 --> 01:17:49.610
This is pseudocode for an
algorithm called merge sort

01:17:49.610 --> 01:17:53.000
that if you hand it an array of
numbers or names or anything,

01:17:53.000 --> 01:17:54.050
it acts as follows.

01:17:54.050 --> 01:17:57.850
If there's only one item you're handed
in your array, well, just return.

01:17:57.850 --> 01:17:58.850
There's nothing to sort.

01:17:58.850 --> 01:18:00.050
So that's our base case.

01:18:00.050 --> 01:18:02.000
That's the sort of stupid
case where you have

01:18:02.000 --> 01:18:04.520
to hard code, that is
literally write out,

01:18:04.520 --> 01:18:06.903
if this situation happens, do this.

01:18:06.903 --> 01:18:09.320
And the case is if you just
hand me a list with one thing,

01:18:09.320 --> 01:18:12.070
it's obviously sorted, by definition,
because nothing can possibly

01:18:12.070 --> 01:18:12.950
be out of order.

01:18:12.950 --> 01:18:15.380
Things get more interesting otherwise.

01:18:15.380 --> 01:18:18.800
Merge sort says, just like that
Mario example, you know what?

01:18:18.800 --> 01:18:21.140
If you want me to sort
this whole list, I'm

01:18:21.140 --> 01:18:25.970
going to tell you sort the left
half, then sort the right half,

01:18:25.970 --> 01:18:30.200
and then merge those lists together,
such that you weave them together

01:18:30.200 --> 01:18:33.830
in such a way that the merged
list is sorted as well.

01:18:33.830 --> 01:18:38.330
So merge sort is three steps,
sort left half, sort right half,

01:18:38.330 --> 01:18:41.870
merge those two sorted halves.

01:18:41.870 --> 01:18:43.310
And this is the--

01:18:43.310 --> 01:18:45.800
we were chatting earlier
about an apt metaphor here.

01:18:45.800 --> 01:18:49.770
This is kind of a roller-coaster-type
ride, where you got to hold on.

01:18:49.770 --> 01:18:50.640
You've got to focus.

01:18:50.640 --> 01:18:54.080
It's OK if it doesn't all work out
for the best the first time around.

01:18:54.080 --> 01:18:58.530
But each step will be important here in
the metaphor of the fire hose as well.

01:18:58.530 --> 01:19:00.890
So here is a list of unsorted numbers.

01:19:00.890 --> 01:19:04.470
The goal at hand is to sort them faster
than bubble sort and selection sort

01:19:04.470 --> 01:19:05.140
can.

01:19:05.140 --> 01:19:07.160
So merge sort tells me what?

01:19:07.160 --> 01:19:09.560
Sort left half, sort right half, merge.

01:19:09.560 --> 01:19:10.670
That is it for merge sort.

01:19:10.670 --> 01:19:11.390
That's the magic.

01:19:11.390 --> 01:19:16.220
Just like Mario says, print a pyramid of
height h minus 1, print the base, done.

01:19:16.220 --> 01:19:19.160
That's the essence of this recursive
algorithm, left half, right half,

01:19:19.160 --> 01:19:19.680
merge.

01:19:19.680 --> 01:19:20.680
So what's the left half?

01:19:20.680 --> 01:19:22.100
It's these four elements here.

01:19:22.100 --> 01:19:24.440
Let me go ahead now and
sort those four elements.

01:19:24.440 --> 01:19:27.800
How do I saw a list of four elements?

01:19:27.800 --> 01:19:28.880
Merge sort them, right?

01:19:28.880 --> 01:19:32.175
Sort the left half, then sort the
right half, then merge them together.

01:19:32.175 --> 01:19:33.800
So you're kind of like kicking the can.

01:19:33.800 --> 01:19:34.620
Like, I've done no work.

01:19:34.620 --> 01:19:36.560
You're just telling me to go
sort something [? else. ?]

01:19:36.560 --> 01:19:38.185
But OK, let me follow those directions.

01:19:38.185 --> 01:19:40.310
Let me sort the left half, 7, 4.

01:19:40.310 --> 01:19:41.810
How do I saw a list of size 2?

01:19:41.810 --> 01:19:43.190
AUDIENCE: Swap.

01:19:43.190 --> 01:19:44.555
SPEAKER 1: Not swapping yet.

01:19:44.555 --> 01:19:45.860
AUDIENCE: [INAUDIBLE]

01:19:45.860 --> 01:19:46.910
SPEAKER 1: Merge sort--

01:19:46.910 --> 01:19:49.555
the left half, then the right
half, then merge them together.

01:19:49.555 --> 01:19:51.680
So again, it's kind of
crazy talk because we've not

01:19:51.680 --> 01:19:53.030
done any actual work yet.

01:19:53.030 --> 01:19:54.300
And I claim we're sorting.

01:19:54.300 --> 01:19:55.470
But let's see what happens.

01:19:55.470 --> 01:19:56.900
Here's the left half.

01:19:56.900 --> 01:19:59.550
How do I sort a list of size 1?

01:19:59.550 --> 01:20:00.050
Done.

01:20:00.050 --> 01:20:00.980
That's the return.

01:20:00.980 --> 01:20:03.980
That's the base case to make
sure I don't do this forever.

01:20:03.980 --> 01:20:05.210
What came next?

01:20:05.210 --> 01:20:06.380
I just sorted the left half.

01:20:06.380 --> 01:20:07.977
What was the second step?

01:20:07.977 --> 01:20:08.810
Sort the right half.

01:20:08.810 --> 01:20:10.130
How do I sort this?

01:20:10.130 --> 01:20:11.000
Done.

01:20:11.000 --> 01:20:12.500
Now it gets interesting.

01:20:12.500 --> 01:20:13.800
What was the third step?

01:20:13.800 --> 01:20:14.540
AUDIENCE: Merge.

01:20:14.540 --> 01:20:17.283
SPEAKER 1: Merge two lists of size 1.

01:20:17.283 --> 01:20:18.575
So now I need some extra space.

01:20:18.575 --> 01:20:21.480
So I'm going to give myself an extra
row, some extra memory, if you will,

01:20:21.480 --> 01:20:22.280
in the computer.

01:20:22.280 --> 01:20:23.840
4 obviously comes first.

01:20:23.840 --> 01:20:25.033
7 obviously comes next.

01:20:25.033 --> 01:20:25.950
That's the merge step.

01:20:25.950 --> 01:20:29.075
That's what I mean by merge, take the
smallest element from whichever list,

01:20:29.075 --> 01:20:32.180
and then follow it by the smallest
element in the other list.

01:20:32.180 --> 01:20:34.760
This now is a sorted list of size 2.

01:20:34.760 --> 01:20:39.690
So if you rewind in your mind,
what was the second step now?

01:20:39.690 --> 01:20:41.790
That was sort left half.

01:20:41.790 --> 01:20:43.620
Sort right half, right?

01:20:43.620 --> 01:20:47.008
So you really have to kind of rewind in
the story, like, 30-plus seconds ago.

01:20:47.008 --> 01:20:48.300
How do you sort the right half?

01:20:48.300 --> 01:20:52.470
Well, you sort the left
half, done, right half, done.

01:20:52.470 --> 01:20:54.600
Here's the magic, merge.

01:20:54.600 --> 01:20:56.850
How do I merge these two lists?

01:20:56.850 --> 01:20:58.240
2 comes first.

01:20:58.240 --> 01:20:59.640
5 comes next.

01:20:59.640 --> 01:21:04.022
I have just sorted the
right half of this list.

01:21:04.022 --> 01:21:05.730
So I sorted left half,
sorted right half.

01:21:05.730 --> 01:21:06.647
What's the third step?

01:21:06.647 --> 01:21:07.980
AUDIENCE: Merge.

01:21:07.980 --> 01:21:08.730
SPEAKER 1: Merge.

01:21:08.730 --> 01:21:09.940
So how do I do that?

01:21:09.940 --> 01:21:11.190
Well, I look at the two lists.

01:21:11.190 --> 01:21:13.230
And how do I merge these
together [? interleaving ?] them

01:21:13.230 --> 01:21:14.130
in the right order?

01:21:14.130 --> 01:21:18.810
2 comes first, then 4, then 5, then 7.

01:21:18.810 --> 01:21:21.850
So now I have sorted the left
half of the original list.

01:21:21.850 --> 01:21:24.750
So what was step two originally?

01:21:24.750 --> 01:21:26.370
Sort the right half.

01:21:26.370 --> 01:21:29.820
So sort the right half
means sort the left half.

01:21:29.820 --> 01:21:34.790
And then sort the left half of that,
done, right half of that, done.

01:21:34.790 --> 01:21:38.220
Merge makes it
interesting, 3 and then 6.

01:21:38.220 --> 01:21:40.770
I've now sorted the left
half of four numbers.

01:21:40.770 --> 01:21:41.790
What comes next?

01:21:41.790 --> 01:21:44.250
Sort of right half, so 8 in 1.

01:21:44.250 --> 01:21:47.430
Sort the left half of that,
done, right half of that, done.

01:21:47.430 --> 01:21:50.670
Now merge those two together, 1 and 8.

01:21:50.670 --> 01:21:53.610
I've now sorted the right
half of the four elements.

01:21:53.610 --> 01:21:56.230
What's the third step?

01:21:56.230 --> 01:21:56.730
Merge.

01:21:56.730 --> 01:21:58.980
So it's left half, right
half, merge, again and again.

01:21:58.980 --> 01:22:05.310
So right half, left half,
let's merge them, 1, 3, 6, 8.

01:22:05.310 --> 01:22:07.530
And now if you rewind,
like, two minutes,

01:22:07.530 --> 01:22:11.050
this is the right half
of the whole list.

01:22:11.050 --> 01:22:13.180
So what's step three?

01:22:13.180 --> 01:22:13.680
Merge.

01:22:13.680 --> 01:22:15.570
So let's give ourselves
a little more memory

01:22:15.570 --> 01:22:24.620
and merge these two,
1, 2, 3, 4, 5, 6, 7, 8.

01:22:24.620 --> 01:22:26.550
And my god, it's merged in the end.

01:22:26.550 --> 01:22:28.410
Now, that was a lot of steps.

01:22:28.410 --> 01:22:31.350
But it turns out it was
far fewer than the number

01:22:31.350 --> 01:22:33.940
of steps we were used to thus far.

01:22:33.940 --> 01:22:36.600
In fact, if you consider
what really happened,

01:22:36.600 --> 01:22:39.180
after all of those verbal
gymnastics, what I really did was

01:22:39.180 --> 01:22:42.060
I took a list of size 8 and
broke it down at some point

01:22:42.060 --> 01:22:43.195
into eight lists of size 1.

01:22:43.195 --> 01:22:45.570
And that's when there was no
interesting work to be done.

01:22:45.570 --> 01:22:46.320
We just returned.

01:22:46.820 --> 01:22:50.760
But I did that so that I could
then compose four lists of size 2

01:22:50.760 --> 01:22:51.720
along the way.

01:22:51.720 --> 01:22:54.545
And I did that so I could
compose two lists of size 4.

01:22:54.545 --> 01:22:56.670
And I did that so that I
could aggregate everything

01:22:56.670 --> 01:23:00.810
together and get one list of size 8.

01:23:00.810 --> 01:23:02.550
So notice the pattern here.

01:23:02.550 --> 01:23:05.310
If you go bottom up,
even, here's one list.

01:23:05.310 --> 01:23:06.660
I divide it in half.

01:23:06.660 --> 01:23:07.980
I divided those halves in half.

01:23:07.980 --> 01:23:10.180
I divided those halves in halves.

01:23:10.180 --> 01:23:13.590
So what function or
mathematics have we use

01:23:13.590 --> 01:23:16.110
to describe any process
thus far since week 0,

01:23:16.110 --> 01:23:19.498
where we're doing
something halves at a time?

01:23:19.498 --> 01:23:20.675
AUDIENCE: Logarithm.

01:23:20.675 --> 01:23:21.550
SPEAKER 1: Logarithm.

01:23:21.550 --> 01:23:25.150
So any time you see in CS50 and
really in algorithms is more generally

01:23:25.150 --> 01:23:29.570
a process that is dividing and
dividing and dividing again and again,

01:23:29.570 --> 01:23:31.750
there's a logarithm involved there.

01:23:31.750 --> 01:23:36.040
And indeed, the number of times that
you can chop up a list of size 8

01:23:36.040 --> 01:23:40.600
into eight lists of size 1 is,
by definition, log base 2 of n

01:23:40.600 --> 01:23:43.322
or just, again, with a wave
of the hand, log n, which

01:23:43.322 --> 01:23:46.030
is to say like the height of this
picture, if you will, is log n.

01:23:46.030 --> 01:23:48.405
But again, we don't have to
worry too much about numbers.

01:23:48.405 --> 01:23:52.210
But every time we did that dividing
into smaller lists, we merged, right?

01:23:52.210 --> 01:23:54.430
That was the third and
most important step.

01:23:54.430 --> 01:23:58.810
And every time we merged,
we combined 4 elements

01:23:58.810 --> 01:24:02.740
plus 4 elements or 2 plus 2 plus
2 plus 2 elements or 1 plus 1

01:24:02.740 --> 01:24:04.840
plus 1, 8 elements individually.

01:24:04.840 --> 01:24:07.370
So we touched all n elements.

01:24:07.370 --> 01:24:11.050
So this picture, if you will,
is, like, 8 numbers wide.

01:24:11.050 --> 01:24:14.740
And I-- or n numbers wide,
if we generalize as n.

01:24:14.740 --> 01:24:19.360
And it's log n rows tall, if you will,
because that's how many times you

01:24:19.360 --> 01:24:20.890
can divide things again and again.

01:24:20.890 --> 01:24:24.970
So what is the running time
intuitively, perhaps, of merge sort?

01:24:24.970 --> 01:24:28.240
It's actually n times
log n because you've

01:24:28.240 --> 01:24:31.000
got n numbers that need to be
merged again and again and again.

01:24:31.000 --> 01:24:34.000
But how many times did I say again?

01:24:34.000 --> 01:24:38.620
Log n times, because that's the number
of times you can have things again

01:24:38.620 --> 01:24:40.758
and again and again.

01:24:40.758 --> 01:24:44.050
And if you do the math, log base 2 of 8,
which is the total number of elements,

01:24:44.050 --> 01:24:45.970
indeed is 1, 2, 3.

01:24:45.970 --> 01:24:47.080
So math works out.

01:24:47.080 --> 01:24:49.660
But it's OK if you think
about it more intuitively.

01:24:49.660 --> 01:24:51.640
So this is perhaps the
bigger leap of faith,

01:24:51.640 --> 01:24:55.420
to just believe that, indeed,
that is how that math works out.

01:24:55.420 --> 01:24:58.600
But it turns out that what
this means is the algorithm

01:24:58.600 --> 01:25:01.610
itself is fundamentally faster.

01:25:01.610 --> 01:25:03.568
So if we consider our
little chart from before,

01:25:03.568 --> 01:25:06.318
where bubble sort and selection
sort were way up here at the top--

01:25:06.318 --> 01:25:08.810
and frankly, you can have even
slower algorithms than that,

01:25:08.810 --> 01:25:11.680
especially if the problems are
even more difficult to solve.

01:25:11.680 --> 01:25:16.283
Now we can add to the list
merge sort there at n log n.

01:25:16.283 --> 01:25:16.950
It's in between.

01:25:16.950 --> 01:25:17.450
Why?

01:25:17.450 --> 01:25:20.920
Because, again, even if you're not
100% comfortable with what log in is,

01:25:20.920 --> 01:25:22.270
notice that here's n.

01:25:22.270 --> 01:25:23.420
Here's n squared.

01:25:23.420 --> 01:25:27.910
So n times a slightly smaller
value is in between, or n log n.

01:25:27.910 --> 01:25:31.540
And we'll see in a moment what
this actually means or feels like.

01:25:31.540 --> 01:25:33.310
What about omega?

01:25:33.310 --> 01:25:37.460
In the best case with merge
sort, how much time does it take?

01:25:37.460 --> 01:25:39.730
Well, it, too, does not
have that optimization

01:25:39.730 --> 01:25:42.970
that bubble sort had, which is,
well, if you do no swaps, just quit.

01:25:42.970 --> 01:25:45.940
It does the same thing always, sort
the left half, sort the right half,

01:25:45.940 --> 01:25:48.430
merge, even if it's a bit unnecessary.

01:25:48.430 --> 01:25:53.710
So it turns out that the omega notation
for merge sort is also n log n.

01:25:53.710 --> 01:25:58.540
The newer version of bubble sort,
recall, we could get as good as n steps

01:25:58.540 --> 01:26:01.000
if we stop after seeing no swaps.

01:26:01.000 --> 01:26:02.920
So merge sort, it's a trade-off, right?

01:26:02.920 --> 01:26:05.038
In the worst case, much faster, I claim.

01:26:05.038 --> 01:26:05.830
It's not n squared.

01:26:05.830 --> 01:26:06.760
It's n log n.

01:26:06.760 --> 01:26:09.280
But in the best case, you might
waste a little bit of time.

01:26:09.280 --> 01:26:11.800
And again, that's thematic in
computer science more generally.

01:26:11.800 --> 01:26:13.210
You're not going to
get anything for free.

01:26:13.210 --> 01:26:15.190
If you want to improve
your upper bound, you

01:26:15.190 --> 01:26:18.493
might have to sacrifice
your lower bound as well.

01:26:18.493 --> 01:26:20.410
Now, it turns out with
some algorithms-- and I

01:26:20.410 --> 01:26:24.580
promise this is last Greek
notation for the course.

01:26:24.580 --> 01:26:27.220
This is a capital theta in Greek.

01:26:27.220 --> 01:26:31.000
And it turns out that if an algorithm
has an upper bound and a lower

01:26:31.000 --> 01:26:33.940
bound that are identical, you
can describe it using, just

01:26:33.940 --> 01:26:35.860
for shorthand notation, theta.

01:26:35.860 --> 01:26:38.320
So we've seen two algorithms
that fit this criteria.

01:26:38.320 --> 01:26:40.000
Selection sort was pretty bad.

01:26:40.000 --> 01:26:42.100
It was big O of n squared.

01:26:42.100 --> 01:26:44.320
And it was omega of n
squared because it just

01:26:44.320 --> 01:26:47.590
kept blindly looking for the
smallest elements again and again.

01:26:47.590 --> 01:26:51.910
Merge sort is in theta of n
log n for the same reason.

01:26:51.910 --> 01:26:54.250
It just blindly does the same
algorithm again and again,

01:26:54.250 --> 01:26:58.420
no matter whether the input is
already sorted or completely unsorted.

01:26:58.420 --> 01:27:04.130
But on the whole, n log n is a
pretty powerful, compelling feature.

01:27:04.130 --> 01:27:07.990
So let me go ahead and turn
our attention, finally,

01:27:07.990 --> 01:27:11.530
to a little visualization that
might help this sink in as well.

01:27:11.530 --> 01:27:15.760
What you're about to see is a bunch
of vertical bars, the top of which

01:27:15.760 --> 01:27:17.590
are 100 bars from left to right.

01:27:17.590 --> 01:27:19.840
Small bars equals small number.

01:27:19.840 --> 01:27:21.160
Big bar equals big number.

01:27:21.160 --> 01:27:23.650
And the first algorithm
up here is selection sort.

01:27:23.650 --> 01:27:25.810
The second algorithm
down here is bubble sort.

01:27:25.810 --> 01:27:27.820
And the middle algorithm is merge sort.

01:27:27.820 --> 01:27:30.820
So if you will, we'll
end on this note today.

01:27:30.820 --> 01:27:33.490
We'll time these algorithms
with these simple inputs

01:27:33.490 --> 01:27:39.160
and see just how much better, I
claim, merge sort is, which is to say,

01:27:39.160 --> 01:27:43.330
just how big of a difference does
n squared versus n log n make,

01:27:43.330 --> 01:27:46.120
which is to say when you design
algorithms, making things correct

01:27:46.120 --> 01:27:47.650
is not the ultimate goal.

01:27:47.650 --> 01:27:50.290
It's to make them well designed as well.

01:27:50.290 --> 01:27:53.284
[MUSIC PLAYING]

01:28:53.280 --> 01:28:55.020
That's it for C50 and merge sort.

01:28:55.020 --> 01:28:56.820
We will see you next time.

01:28:56.820 --> 01:28:59.570
[APPLAUSE]