WEBVTT
X-TIMESTAMP-MAP=LOCAL:00:00:00.000,MPEGTS:900000

00:00:00.000 --> 00:00:02.994
[MUSIC PLAYING]

00:01:01.377 --> 00:01:04.590
DAVID MALAN: All right, this is CS50.

00:01:04.590 --> 00:01:05.700
And this is week 3.

00:01:05.700 --> 00:01:09.420
And as promised, we thought we'd take a
bit of a break on new syntax this week

00:01:09.420 --> 00:01:12.300
and focus a lot more on
algorithms and implementation

00:01:12.300 --> 00:01:15.840
thereof because over the past few
weeks, besides Scratch, we've now had C.

00:01:15.840 --> 00:01:17.850
And you have a lot of vocabulary now.

00:01:17.850 --> 00:01:20.460
Even if it might not
seem yet that you fully

00:01:20.460 --> 00:01:23.790
grasp all of the functionality of this
particular language, with practice,

00:01:23.790 --> 00:01:25.030
it'll get easier and easier.

00:01:25.030 --> 00:01:28.410
But today, we'd focus instead
on ideas and ultimately on this,

00:01:28.410 --> 00:01:32.700
how to think algorithmically, and so
to take problems in the real world

00:01:32.700 --> 00:01:35.550
and try to quantize them in a
way that you can map those puzzle

00:01:35.550 --> 00:01:39.210
pieces from week 0 or all of this
new syntax from weeks 1 and 2

00:01:39.210 --> 00:01:41.792
on to actually writing code
to solve those problems.

00:01:41.792 --> 00:01:43.500
To contextualize this,
though, we thought

00:01:43.500 --> 00:01:46.230
we'd remind you of this
here chart from week zero.

00:01:46.230 --> 00:01:48.900
And recall that in week zero,
we painted this picture where

00:01:48.900 --> 00:01:52.320
on the x-axis on the horizontal
was the size of the problem.

00:01:52.320 --> 00:01:54.808
And the number of phone
book pages increased

00:01:54.808 --> 00:01:56.100
as you went from left to right.

00:01:56.100 --> 00:01:59.200
And then on the vertical axis
or y-axis, we had time to solve.

00:01:59.200 --> 00:02:01.900
So this was like how many
seconds, how many page turns,

00:02:01.900 --> 00:02:05.140
how many units of measure--
whatever you're using.

00:02:05.140 --> 00:02:07.690
We might actually describe
the solution there too.

00:02:07.690 --> 00:02:10.660
And the first algorithm we had
in week 0 for the phone book

00:02:10.660 --> 00:02:12.350
was, like, one page at a time.

00:02:12.350 --> 00:02:14.410
So we plotted it with this 1 to 1 slope.

00:02:14.410 --> 00:02:16.990
The second algorithm, we
started doing two pages

00:02:16.990 --> 00:02:19.000
at a time, which did risk a bug.

00:02:19.000 --> 00:02:20.680
I might have to double back one page.

00:02:20.680 --> 00:02:23.120
But it was still going for
the most part twice as fast.

00:02:23.120 --> 00:02:27.100
So it was still a slow but sort of
a 2 to 1 slope instead of 1 to 1.

00:02:27.100 --> 00:02:29.290
But the third and final
algorithm, recall,

00:02:29.290 --> 00:02:31.210
was sort of fundamentally different.

00:02:31.210 --> 00:02:34.250
And it was this logarithmic
curve, so to speak,

00:02:34.250 --> 00:02:38.435
whereby it kept increasing, increasing,
increasing but very, very slowly.

00:02:38.435 --> 00:02:40.810
So even if you, like, doubled
the size of the phone book,

00:02:40.810 --> 00:02:44.810
as by having Cambridge and Allston here
in Massachusetts merge, no big deal.

00:02:44.810 --> 00:02:49.190
It was just one more page turn,
not another 500 or another 1,000.

00:02:49.190 --> 00:02:54.730
So think back today on that particular
idea of how we began to divide

00:02:54.730 --> 00:02:55.900
and conquer the problem.

00:02:55.900 --> 00:02:59.330
And that sort of gave us a
fundamentally better advantage.

00:02:59.330 --> 00:03:02.840
So we thought we'd see if we can
apply this lesson learned as follows.

00:03:02.840 --> 00:03:06.340
If I were to take, like, attendance
today sort of here on stage,

00:03:06.340 --> 00:03:10.870
I could do it old school, like,
1, 2, 3, 4, 5, 6, 7, 8, and so

00:03:10.870 --> 00:03:13.058
forth, so one step at a time.

00:03:13.058 --> 00:03:14.350
I could also double the speed--

00:03:14.350 --> 00:03:18.200
2, 4, 6, 8, 10, 12,
14, 16, and so forth.

00:03:18.200 --> 00:03:20.890
But I dare say we can
learn a bit from week zero.

00:03:20.890 --> 00:03:24.040
And if you'll indulge me
right in place, could everyone

00:03:24.040 --> 00:03:26.865
stand up and think of the number one.

00:03:26.865 --> 00:03:28.990
So right where you are,
just stand up if you could.

00:03:32.650 --> 00:03:33.250
Stand up.

00:03:33.250 --> 00:03:34.872
And think of the number one.

00:03:34.872 --> 00:03:38.080
So at this point, hopefully, everyone's
literally thinking of the number one.

00:03:38.080 --> 00:03:42.010
And the second step of this algorithm,
which I claim ultimately theoretically

00:03:42.010 --> 00:03:45.490
should be much faster than
either my one person at a time

00:03:45.490 --> 00:03:47.740
or two people at a
time, step two is this.

00:03:47.740 --> 00:03:49.240
Pair off with someone standing.

00:03:49.240 --> 00:03:51.340
And add their number to yours.

00:03:51.340 --> 00:03:52.345
And remember the sum.

00:03:59.620 --> 00:04:03.400
Person can be in front of,
behind, left, or right of you.

00:04:06.510 --> 00:04:10.950
All right most likely most
everyone in the room assuming you

00:04:10.950 --> 00:04:13.620
found someone is thinking
of the number two

00:04:13.620 --> 00:04:16.073
now unless you're sort of an
odd person out in the row.

00:04:16.073 --> 00:04:16.740
And that's fine.

00:04:16.740 --> 00:04:18.115
If you're still one, that's fine.

00:04:18.115 --> 00:04:19.800
But most of you are probably two.

00:04:19.800 --> 00:04:23.925
Next step is that one of you
in those pairs should sit down.

00:04:29.290 --> 00:04:34.210
OK, so many of you tried to sit down
as quickly as possible we noticed.

00:04:34.210 --> 00:04:37.898
But so next step now, at this
point, rather, most of you

00:04:37.898 --> 00:04:39.190
are thinking of the number two.

00:04:39.190 --> 00:04:41.023
A few of you are thinking
of the number one.

00:04:41.023 --> 00:04:41.710
And that's OK.

00:04:41.710 --> 00:04:45.130
The next step, and notice we're about
to induce a loop, so the rest of this

00:04:45.130 --> 00:04:49.195
is on you, if still standing,
go back to step two.

00:04:53.584 --> 00:04:56.572
[INTERPOSING VOICES]

00:05:18.030 --> 00:05:20.490
DAVID MALAN: If still standing,
notice that this is a loop.

00:05:20.490 --> 00:05:21.270
So keep going.

00:05:21.270 --> 00:05:24.914
Keep going if still standing.

00:05:24.914 --> 00:05:27.680
[INTERPOSING VOICES]

00:05:29.100 --> 00:05:30.990
DAVID MALAN: There?

00:05:30.990 --> 00:05:33.276
How about there?

00:05:33.276 --> 00:05:35.702
[INTERPOSING VOICES]

00:05:35.702 --> 00:05:36.660
DAVID MALAN: That's OK.

00:05:36.660 --> 00:05:37.570
But now keep going.

00:05:37.570 --> 00:05:38.070
Keep going.

00:05:38.070 --> 00:05:39.900
Keep pairing off, so maybe you two.

00:05:42.610 --> 00:05:45.823
All right, a few more seconds.

00:05:45.823 --> 00:05:48.709
[INTERPOSING VOICES]

00:05:49.922 --> 00:05:51.255
DAVID MALAN: So step two, still.

00:05:55.810 --> 00:05:57.670
All right, keep pairing
if you're standing.

00:06:00.595 --> 00:06:03.529
[INTERPOSING VOICES]

00:06:08.717 --> 00:06:09.675
DAVID MALAN: All right.

00:06:12.420 --> 00:06:16.325
All right, so theoretically, there's
only one person standing left.

00:06:16.325 --> 00:06:17.700
But clearly, that's not the case.

00:06:17.700 --> 00:06:18.450
That's fine.

00:06:18.450 --> 00:06:22.198
I will help with the pairing
because some of you are far away.

00:06:22.198 --> 00:06:22.740
So let's see.

00:06:22.740 --> 00:06:24.450
What's your number here?

00:06:24.450 --> 00:06:25.680
Sorry?

00:06:25.680 --> 00:06:27.620
What's your number?

00:06:27.620 --> 00:06:28.120
Eight.

00:06:28.120 --> 00:06:29.350
OK, go ahead and sit down.

00:06:29.350 --> 00:06:30.160
How about in back?

00:06:30.160 --> 00:06:31.480
What's your number?

00:06:31.480 --> 00:06:32.270
46.

00:06:32.270 --> 00:06:32.770
Nice.

00:06:32.770 --> 00:06:34.000
OK, go ahead and sit down.

00:06:34.000 --> 00:06:35.860
Who else is standing?

00:06:35.860 --> 00:06:38.840
Over here, what's your number?

00:06:38.840 --> 00:06:40.490
You're 16?

00:06:40.490 --> 00:06:42.350
OK, so go ahead and sit down.

00:06:42.350 --> 00:06:43.820
And behind you?

00:06:43.820 --> 00:06:44.480
48.

00:06:44.480 --> 00:06:46.400
OK, go ahead and sit down.

00:06:46.400 --> 00:06:48.290
Is anyone still standing?

00:06:48.290 --> 00:06:49.400
Yeah?

00:06:49.400 --> 00:06:50.270
32.

00:06:50.270 --> 00:06:50.960
Nice.

00:06:50.960 --> 00:06:53.180
Still standing over here.

00:06:53.180 --> 00:06:54.530
43.

00:06:54.530 --> 00:06:55.140
OK, nice.

00:06:55.140 --> 00:06:55.640
Sit down.

00:06:55.640 --> 00:06:56.140
Sit down.

00:06:56.140 --> 00:06:58.220
And anyone else still standing here?

00:06:58.220 --> 00:06:59.480
22.

00:06:59.480 --> 00:07:01.160
Go ahead and sit down.

00:07:01.160 --> 00:07:03.860
Is anyone still standing
and participating?

00:07:03.860 --> 00:07:07.400
Yeah, where-- oh, yeah.

00:07:07.400 --> 00:07:08.357
16.

00:07:08.357 --> 00:07:09.440
OK, go ahead and sit down.

00:07:09.440 --> 00:07:12.360
Anyone else still standing?

00:07:12.360 --> 00:07:14.250
OK, so theoretically,
everyone's paired off.

00:07:14.250 --> 00:07:16.330
You were the last person standing.

00:07:16.330 --> 00:07:19.257
So when I hit Enter here,
having just greased the wheels

00:07:19.257 --> 00:07:21.090
to do all of the remaining
additions myself,

00:07:21.090 --> 00:07:24.810
we should have the total
count of people in the room.

00:07:24.810 --> 00:07:27.330
And recognize that unlike
my algorithm, which

00:07:27.330 --> 00:07:29.910
would have required pointing
at each and every person

00:07:29.910 --> 00:07:33.930
or my second algorithm, which would
mean pointing at every two people

00:07:33.930 --> 00:07:38.010
twice as fast, theoretically, the
algorithm you all just executed,

00:07:38.010 --> 00:07:40.480
I daresay, should've been
fundamentally faster.

00:07:40.480 --> 00:07:40.980
Why?

00:07:40.980 --> 00:07:43.680
Because no matter how many
people in the room-- maybe, like,

00:07:43.680 --> 00:07:45.840
if there were 1,000
people in the room, there

00:07:45.840 --> 00:07:50.280
would then have been 500, just as
there would be 500 pages in week zero.

00:07:50.280 --> 00:07:52.320
Then from 500, there'd be 250--

00:07:52.320 --> 00:07:53.200
125.

00:07:53.200 --> 00:07:53.700
Why?

00:07:53.700 --> 00:07:55.783
Because on each step of
the algorithm, half of you

00:07:55.783 --> 00:07:59.310
theoretically were sitting down,
sitting down, sitting down, dividing

00:07:59.310 --> 00:08:00.880
and conquering that problem.

00:08:00.880 --> 00:08:05.370
So the total number of people in the
room as of now according to your count

00:08:05.370 --> 00:08:08.460
is 231.

00:08:08.460 --> 00:08:11.650
As a backup, though, Carter kindly
did it the old-fashioned way,

00:08:11.650 --> 00:08:12.760
one person at a time.

00:08:12.760 --> 00:08:16.310
And Carter, the actual number
of people in the room is?

00:08:16.310 --> 00:08:18.070
[LAUGHS]

00:08:18.070 --> 00:08:21.220
OK, so our first real
world bug to be fair.

00:08:21.220 --> 00:08:23.510
So theoretically, that
should have worked.

00:08:23.510 --> 00:08:26.230
But clearly, we lost some
numbers along the way, so a bug

00:08:26.230 --> 00:08:27.620
that we can fix today.

00:08:27.620 --> 00:08:31.810
But remember that really,
this is just similar in spirit

00:08:31.810 --> 00:08:34.750
to that algorithm we
indeed did in week zero.

00:08:34.750 --> 00:08:36.532
It's the same as the phone book example.

00:08:36.532 --> 00:08:37.990
It went off the rails in this case.

00:08:37.990 --> 00:08:41.320
But it's the same idea, ultimately
dividing and conquering.

00:08:41.320 --> 00:08:44.402
And any time you have this
halving, halving, halving,

00:08:44.402 --> 00:08:46.360
there's going to be a
logarithm involved there,

00:08:46.360 --> 00:08:48.152
even if you're a little
rusty on your math.

00:08:48.152 --> 00:08:51.220
And that's fundamentally faster
than just doing something n times

00:08:51.220 --> 00:08:55.090
or even n divided by 2 times where n
is the number of people in the room

00:08:55.090 --> 00:08:57.980
or in week zero, the number
of pages in the phone book.

00:08:57.980 --> 00:09:01.450
So even when you're using your iPhone
or Android device later today, like,

00:09:01.450 --> 00:09:04.750
if you search for a
contact using autocomplete,

00:09:04.750 --> 00:09:09.070
it is that so-called divide
and conquer algorithm that's

00:09:09.070 --> 00:09:10.930
finding people in your address book.

00:09:10.930 --> 00:09:13.570
It's not starting top
to bottom or bottom up.

00:09:13.570 --> 00:09:15.520
It's probably going
roughly to the middle

00:09:15.520 --> 00:09:19.300
and then doing the top half or the
bottom half, repeating again and again.

00:09:19.300 --> 00:09:21.040
So these ideas are everywhere.

00:09:21.040 --> 00:09:23.340
And hopefully, you end
up finding just the one

00:09:23.340 --> 00:09:25.090
person you're looking
for or in your case,

00:09:25.090 --> 00:09:28.030
the one person last standing
who should have theoretically

00:09:28.030 --> 00:09:32.350
had the count of everyone because if
each of you started by representing 1,

00:09:32.350 --> 00:09:35.440
effectively handed off your
number, handed off your number,

00:09:35.440 --> 00:09:38.260
handed off your number, it
theoretically should have coalesced

00:09:38.260 --> 00:09:40.090
in that final person standing.

00:09:40.090 --> 00:09:43.730
So let's consider the
connection now between this idea

00:09:43.730 --> 00:09:47.680
and what we introduced last week,
which was this idea of very simple data

00:09:47.680 --> 00:09:52.240
structures in your computer's memory--
like, actually using this memory as

00:09:52.240 --> 00:09:54.700
though it's kind of a grid of bytes.

00:09:54.700 --> 00:09:58.190
Each one of these squares, recall,
represents 1 byte or 8 bits.

00:09:58.190 --> 00:10:00.250
And we can get rid of
the hardware and sort

00:10:00.250 --> 00:10:03.850
of abstract it away as just this
grid of memory or this canvas.

00:10:03.850 --> 00:10:05.680
And then we introduced arrays last week.

00:10:05.680 --> 00:10:08.680
And what was the key
definition of an array?

00:10:08.680 --> 00:10:11.420
How would you describe an array?

00:10:11.420 --> 00:10:14.400
What is it, anyone?

00:10:14.400 --> 00:10:15.540
What's an array?

00:10:15.540 --> 00:10:18.180
Yeah, in the middle.

00:10:18.180 --> 00:10:20.720
A collection.

00:10:20.720 --> 00:10:21.330
A collection.

00:10:21.330 --> 00:10:24.790
And I don't love data types only because
in C, it tends to be the same type.

00:10:24.790 --> 00:10:25.790
So a collection of data.

00:10:25.790 --> 00:10:26.450
I do like that.

00:10:26.450 --> 00:10:28.370
But there's one other
key characteristic.

00:10:28.370 --> 00:10:29.670
Do you want to be more precise?

00:10:29.670 --> 00:10:32.850
It's not just a collection--

00:10:32.850 --> 00:10:35.680
something about where it is.

00:10:35.680 --> 00:10:36.880
Potentially strings.

00:10:36.880 --> 00:10:39.850
But strings are just an example
of putting char, char, char, char.

00:10:39.850 --> 00:10:42.500
It could certainly be integers
or floating point values.

00:10:42.500 --> 00:10:44.410
Another characteristic?

00:10:44.410 --> 00:10:45.220
Sorry?

00:10:45.220 --> 00:10:46.433
It's not necessarily ordered.

00:10:46.433 --> 00:10:48.100
Actually, we'll come back to that today.

00:10:48.100 --> 00:10:49.340
It could be in any order.

00:10:49.340 --> 00:10:52.090
And certainly a string isn't
necessarily in sorted order.

00:10:52.090 --> 00:10:54.890
It's in whatever the word is.

00:10:54.890 --> 00:10:56.450
It's a list in concept.

00:10:56.450 --> 00:10:59.750
But there was something key about
where we put things in memory.

00:10:59.750 --> 00:11:01.580
Yeah?

00:11:01.580 --> 00:11:05.420
Consecutive-- the memory is
consecutive, a.k.a., contiguous.

00:11:05.420 --> 00:11:08.660
An array is important in C because,
yes, it's a list of values.

00:11:08.660 --> 00:11:10.130
Yes, it's a collection of values.

00:11:10.130 --> 00:11:15.080
But the real key distinction in an
array in C is that it's contiguous.

00:11:15.080 --> 00:11:19.740
The bytes are back to back to back
somewhere in the computer's memory,

00:11:19.740 --> 00:11:23.930
at least for any given data type, be
it an int, a float, a bigger string.

00:11:23.930 --> 00:11:27.140
All of the characters are back to
back to back in the computer's memory,

00:11:27.140 --> 00:11:30.150
not spread all out, even if
you have space elsewhere.

00:11:30.150 --> 00:11:35.190
So with that said, we can actually start
to solve problems with that mindset.

00:11:35.190 --> 00:11:39.020
And for instance, if I kind of pare this
down to just an abstract array of size

00:11:39.020 --> 00:11:42.860
1, 2, 3, 4, 5, 6, 7,
for instance, suppose

00:11:42.860 --> 00:11:47.220
that there are these numbers
inside of this array of memory.

00:11:47.220 --> 00:11:51.080
So here are seven integers or
ints in C. I have in this case

00:11:51.080 --> 00:11:55.100
sorted them just to make the numbers pop
out as obviously smallest to largest.

00:11:55.100 --> 00:11:58.070
But the catch with C is
that if I were to ask you

00:11:58.070 --> 00:12:01.650
or if you were to ask the computer
through code to find you the number 50,

00:12:01.650 --> 00:12:03.390
well, obviously, every
human in this room

00:12:03.390 --> 00:12:07.800
just found it obviously right there
because we kind of have this bird's eye

00:12:07.800 --> 00:12:10.500
view of the whole memory at once.

00:12:10.500 --> 00:12:12.990
But the computer
ironically does not have

00:12:12.990 --> 00:12:15.240
that bird's eye view of its own memory.

00:12:15.240 --> 00:12:18.820
It can only look at each
location one at a time.

00:12:18.820 --> 00:12:22.080
So if you really were to do this like
a computer, you would kind of have to,

00:12:22.080 --> 00:12:26.040
like, shield your eye and only look
at one number at a time from left

00:12:26.040 --> 00:12:28.030
to right, from right to
left, or in any order

00:12:28.030 --> 00:12:31.487
in order to find is
the 50 actually there.

00:12:31.487 --> 00:12:32.820
You can't just take a step back.

00:12:32.820 --> 00:12:34.540
And boom, it pops out at you.

00:12:34.540 --> 00:12:37.230
So this is kind of
analogous, this array,

00:12:37.230 --> 00:12:39.690
to being like a set of
gym lockers or school

00:12:39.690 --> 00:12:42.828
lockers like this where the doors
are actually closed by default.

00:12:42.828 --> 00:12:43.870
The numbers are in there.

00:12:43.870 --> 00:12:46.120
But the doors are closed,
which is to say the computer

00:12:46.120 --> 00:12:47.562
and we can't actually look.

00:12:47.562 --> 00:12:49.020
So we couldn't find yellow lockers.

00:12:49.020 --> 00:12:51.010
But we did find red lockers here.

00:12:51.010 --> 00:12:53.850
And so I propose that you think
of these lockers on the stage

00:12:53.850 --> 00:12:57.270
here of which there are seven as
well as representing an array.

00:12:57.270 --> 00:13:01.740
And just so we have some terminology,
notice that I've labeled these bracket

00:13:01.740 --> 00:13:05.220
0, bracket 1, bracket 2, 3, 4, 5 and 6.

00:13:05.220 --> 00:13:07.410
And the bracket notation
recalls the new syntax

00:13:07.410 --> 00:13:10.230
from last week that
lets you index into--

00:13:10.230 --> 00:13:13.450
go to a specific spot
inside of an array.

00:13:13.450 --> 00:13:15.720
And notice that even though
there are seven lockers,

00:13:15.720 --> 00:13:17.520
I only counted as high as six.

00:13:17.520 --> 00:13:20.850
But again, that's just a side effect
of our generally counting from 0.

00:13:20.850 --> 00:13:27.330
So 0 through 6 or 0 through n minus 1
because if there are n equal 7 lockers,

00:13:27.330 --> 00:13:28.590
n minus 1 is 6.

00:13:28.590 --> 00:13:31.900
So that's the left bound and
the right bound respectively.

00:13:31.900 --> 00:13:37.560
So suppose that we were to use these
lockers as representing a problem,

00:13:37.560 --> 00:13:40.320
like we want to find an actual
number behind these doors.

00:13:40.320 --> 00:13:42.820
So this is actually a very
common problem in the real world.

00:13:42.820 --> 00:13:46.650
And you and I take for granted every
day that big companies like Google

00:13:46.650 --> 00:13:49.110
and Microsoft and others,
like, do this for us

00:13:49.110 --> 00:13:52.920
constantly, not to mention AI
doing something similar nowadays,

00:13:52.920 --> 00:13:54.360
searching for information.

00:13:54.360 --> 00:13:56.940
And we'll focus on
some basics first that

00:13:56.940 --> 00:13:59.542
will lead us to more sophisticated
algorithms ultimately.

00:13:59.542 --> 00:14:01.500
But all we're going to
talk about fundamentally

00:14:01.500 --> 00:14:05.160
is this same picture from week zero
and from week one and from week two

00:14:05.160 --> 00:14:07.120
where here is a problem to be solved.

00:14:07.120 --> 00:14:11.295
So if for instance, the input to
this problem is an array of numbers--

00:14:11.295 --> 00:14:12.420
an array of seven numbers--

00:14:12.420 --> 00:14:15.870
I can't see them all at once-- but I'm
looking for something like the number

00:14:15.870 --> 00:14:16.560
50--

00:14:16.560 --> 00:14:18.540
ultimately, I want to get back out.

00:14:18.540 --> 00:14:20.280
I claim true or false.

00:14:20.280 --> 00:14:21.900
Like, the number 50 is there.

00:14:21.900 --> 00:14:22.740
Or it is not.

00:14:22.740 --> 00:14:25.050
That's one way of thinking
about the search problem.

00:14:25.050 --> 00:14:27.390
Find me some piece of
data if it's there.

00:14:27.390 --> 00:14:31.060
Otherwise, tell me that it's
not-- true or false, respectively.

00:14:31.060 --> 00:14:33.772
So the algorithm inside
of this black box,

00:14:33.772 --> 00:14:36.480
though, is where we're going to
actually have to do some thinking

00:14:36.480 --> 00:14:41.097
and actually figure out how best to find
the number or the data we care about.

00:14:41.097 --> 00:14:43.680
And even though we'll use numbers
to keep things simple today,

00:14:43.680 --> 00:14:47.190
you could certainly generalize
this to web pages or contacts

00:14:47.190 --> 00:14:51.450
or any other type of information that
are in some computer or database more

00:14:51.450 --> 00:14:52.240
generally.

00:14:52.240 --> 00:14:56.970
So maybe to keep things
interesting, could we

00:14:56.970 --> 00:14:59.028
get-- how about two volunteers?

00:14:59.028 --> 00:14:59.820
Wow, that was fast.

00:14:59.820 --> 00:15:02.080
OK, come on down.

00:15:02.080 --> 00:15:03.630
And how about one other volunteer?

00:15:03.630 --> 00:15:04.050
I'll go over here.

00:15:04.050 --> 00:15:04.800
OK, how about you?

00:15:04.800 --> 00:15:06.300
Come on down.

00:15:06.300 --> 00:15:08.670
Sure, round of applause
for our volunteers.

00:15:08.670 --> 00:15:10.380
[APPLAUSE]

00:15:10.380 --> 00:15:13.670
OK, welcome.

00:15:13.670 --> 00:15:14.250
Come on over.

00:15:14.250 --> 00:15:15.875
Do you want to introduce
yourself to the group?

00:15:15.875 --> 00:15:16.970
AUDIENCE: [INAUDIBLE].

00:15:16.970 --> 00:15:18.760
DAVID MALAN: Like a few seconds is fine.

00:15:18.760 --> 00:15:20.155
AUDIENCE: Hi, I'm Sam.

00:15:20.155 --> 00:15:21.895
I am not a CS concentration.

00:15:21.895 --> 00:15:24.580
DAVID MALAN: [LAUGHS] So what are you?

00:15:24.580 --> 00:15:25.705
Do you know yet?

00:15:25.705 --> 00:15:26.830
AUDIENCE: Applied math.

00:15:26.830 --> 00:15:27.497
DAVID MALAN: OK.

00:15:27.497 --> 00:15:27.997
Nice.

00:15:27.997 --> 00:15:28.720
Nice to meet you.

00:15:28.720 --> 00:15:29.140
Welcome.

00:15:29.140 --> 00:15:29.640
And?

00:15:29.640 --> 00:15:30.910
AUDIENCE: Hi, I'm Louis.

00:15:30.910 --> 00:15:32.740
I'm from first year, Matthews.

00:15:32.740 --> 00:15:34.705
And I'm going to do Econ with Stats.

00:15:34.705 --> 00:15:36.690
DAVID MALAN: Oh, you're in Matthews too?

00:15:36.690 --> 00:15:37.190
OK.

00:15:37.190 --> 00:15:40.720
[CHUCKLES] I was in Matthews
too, so Matthews South?

00:15:40.720 --> 00:15:41.350
Oh, wow.

00:15:41.350 --> 00:15:42.220
Oh, my god.

00:15:42.220 --> 00:15:43.840
I was room 201.

00:15:43.840 --> 00:15:44.986
AUDIENCE: 103.

00:15:44.986 --> 00:15:46.630
DAVID MALAN: Fifth floor--

00:15:46.630 --> 00:15:48.280
all right, so anyhow.

00:15:48.280 --> 00:15:50.337
And so we have Louis and your name?

00:15:50.337 --> 00:15:50.920
AUDIENCE: Sam.

00:15:50.920 --> 00:15:51.460
DAVID MALAN: Sam.

00:15:51.460 --> 00:15:51.940
Louis and Sam.

00:15:51.940 --> 00:15:54.550
So Louis and I are going to step off to
the side for just a moment because Sam,

00:15:54.550 --> 00:15:56.077
we have a problem for you to solve.

00:15:56.077 --> 00:15:57.910
This feels a little bit
like Price Is Right.

00:15:57.910 --> 00:16:00.010
But behind you are these seven lockers.

00:16:00.010 --> 00:16:03.460
And we'd like you to just
find us the number 50.

00:16:03.460 --> 00:16:05.050
That's all the information you get.

00:16:05.050 --> 00:16:08.205
But we'd like you to then explain
how you go about finding it.

00:16:08.205 --> 00:16:10.122
AUDIENCE: Wait, are they
in order or anything?

00:16:10.122 --> 00:16:12.486
Or [INAUDIBLE] it just
kind of [INAUDIBLE]??

00:16:12.486 --> 00:16:15.540
DAVID MALAN: Find us the number 50.

00:16:15.540 --> 00:16:17.370
And then tell us how you found it.

00:16:20.662 --> 00:16:22.745
OK, what was in there,
just so the audience knows.

00:16:22.745 --> 00:16:23.620
AUDIENCE: [INAUDIBLE]

00:16:23.620 --> 00:16:24.850
DAVID MALAN: Yes.

00:16:24.850 --> 00:16:25.550
[LAUGHTER]

00:16:25.550 --> 00:16:28.818
AUDIENCE: It was $10, a
very, very big $10 bill.

00:16:28.818 --> 00:16:29.485
DAVID MALAN: OK.

00:16:29.485 --> 00:16:30.423
AUDIENCE: Fake money.

00:16:30.423 --> 00:16:31.090
DAVID MALAN: OK.

00:16:33.925 --> 00:16:39.310
AUDIENCE: That was $100.

00:16:39.310 --> 00:16:42.100
$1.

00:16:42.100 --> 00:16:43.570
$5.

00:16:43.570 --> 00:16:45.400
OK, I'm not lucky.

00:16:45.400 --> 00:16:46.810
Oh, I found it.

00:16:46.810 --> 00:16:47.560
DAVID MALAN: Nice.

00:16:47.560 --> 00:16:49.180
Take it out and so people can believe.

00:16:49.180 --> 00:16:50.530
All right, wonderful.

00:16:50.530 --> 00:16:52.390
So you found the 50.

00:16:52.390 --> 00:16:53.260
[APPLAUSE]

00:16:53.260 --> 00:16:55.305
And now if you could explain.

00:16:55.305 --> 00:16:55.930
I'll take that.

00:16:55.930 --> 00:16:58.060
If you could explain,
what was your algorithm?

00:16:58.060 --> 00:17:00.810
What was the step-by-step
approach you took?

00:17:00.810 --> 00:17:03.490
AUDIENCE: I didn't really have one.

00:17:03.490 --> 00:17:04.720
I just started on one end.

00:17:04.720 --> 00:17:07.450
And I went down because it
wasn't in order or anything.

00:17:07.450 --> 00:17:09.400
So I just kept going until I found it.

00:17:09.400 --> 00:17:10.720
DAVID MALAN: OK, so that's
actually pretty fair.

00:17:10.720 --> 00:17:12.262
And in fact, let's step forward here.

00:17:12.262 --> 00:17:14.829
Carter's going to very secretly
kind of shuffle the numbers

00:17:14.829 --> 00:17:16.880
in a particular new arrangement here.

00:17:16.880 --> 00:17:19.390
And so you really went
from right to left.

00:17:19.390 --> 00:17:22.960
And I dare say maybe going from
left to right might be equivalent.

00:17:22.960 --> 00:17:25.750
But could she have done better?

00:17:25.750 --> 00:17:29.590
Could she have done better, because
that took 1, 2, 3, 4, 5 steps?

00:17:29.590 --> 00:17:31.970
Could Sam have done better?

00:17:31.970 --> 00:17:32.470
Sure.

00:17:32.470 --> 00:17:35.270
How?

00:17:35.270 --> 00:17:36.527
OK, so you got lucky.

00:17:36.527 --> 00:17:38.360
You could have found
the number in one step.

00:17:38.360 --> 00:17:40.400
Although, luck isn't
really an algorithm.

00:17:40.400 --> 00:17:42.780
It really isn't a step-by-step approach.

00:17:42.780 --> 00:17:45.053
So another thought?

00:17:45.053 --> 00:17:46.890
AUDIENCE: [INAUDIBLE] dollar bills--

00:17:46.890 --> 00:17:49.656
sort them in order?

00:17:49.656 --> 00:17:51.270
DAVID MALAN: Oh, interesting.

00:17:51.270 --> 00:17:54.492
So you could have taken out all
of the dollar bills, sorted them,

00:17:54.492 --> 00:17:57.200
put them back in, and then you
probably could have done something

00:17:57.200 --> 00:17:59.315
like the divide and conquer approach.

00:17:59.315 --> 00:18:00.520
AUDIENCE: I didn't know
I was allowed to do that.

00:18:00.520 --> 00:18:01.400
DAVID MALAN: No, you weren't allowed to.

00:18:01.400 --> 00:18:02.360
So that's fine.

00:18:02.360 --> 00:18:02.810
AUDIENCE: [INAUDIBLE]

00:18:02.810 --> 00:18:04.852
DAVID MALAN: But that
would be a valid algorithm.

00:18:04.852 --> 00:18:07.970
Although, it sounds very
inefficient to do all this work just

00:18:07.970 --> 00:18:10.128
to find the number 50
because in doing that work,

00:18:10.128 --> 00:18:11.420
she would've found the 50 once.

00:18:11.420 --> 00:18:13.420
But that might actually
be a reasonable solution

00:18:13.420 --> 00:18:17.330
if she plans to search again and again
and again, not just for seven numbers

00:18:17.330 --> 00:18:18.560
but maybe a lot of numbers.

00:18:18.560 --> 00:18:21.690
Maybe you do want to incur some
cost up front and all the data

00:18:21.690 --> 00:18:22.910
so as to find it faster.

00:18:22.910 --> 00:18:24.320
But let's do this a
little more methodically.

00:18:24.320 --> 00:18:25.862
We'll step off to the side once more.

00:18:25.862 --> 00:18:29.120
And just by convention, if you want
to go ahead and-- do we need this?

00:18:29.120 --> 00:18:29.660
OK.

00:18:29.660 --> 00:18:30.368
AUDIENCE: Got it.

00:18:30.368 --> 00:18:32.390
DAVID MALAN: OK, let's
go from left to right.

00:18:32.390 --> 00:18:32.900
Yep.

00:18:32.900 --> 00:18:34.858
And go ahead and show
everyone the numbers just

00:18:34.858 --> 00:18:37.430
to prove that there's
no funny business here.

00:18:37.430 --> 00:18:38.848
AUDIENCE: 20.

00:18:38.848 --> 00:18:41.596
DAVID MALAN: [LAUGHS]

00:18:41.596 --> 00:18:42.960
AUDIENCE: Oh, OK.

00:18:42.960 --> 00:18:44.010
500.

00:18:44.010 --> 00:18:44.760
DAVID MALAN: Nice.

00:18:48.191 --> 00:18:49.340
AUDIENCE: 10.

00:18:49.340 --> 00:18:50.290
DAVID MALAN: Mm-hm.

00:18:50.290 --> 00:18:54.212
So it doesn't sound like it's
sorted, either, this time.

00:18:54.212 --> 00:18:54.945
AUDIENCE: Five.

00:18:54.945 --> 00:18:58.040
DAVID MALAN: Nice.

00:18:58.040 --> 00:18:59.826
And?

00:18:59.826 --> 00:19:00.940
AUDIENCE: 100.

00:19:00.940 --> 00:19:02.280
DAVID MALAN: Oh, so close.

00:19:05.090 --> 00:19:06.057
AUDIENCE: [INAUDIBLE]

00:19:06.057 --> 00:19:06.890
DAVID MALAN: I know.

00:19:06.890 --> 00:19:07.730
Now we're just messing with you.

00:19:07.730 --> 00:19:08.230
AUDIENCE: 1.

00:19:08.230 --> 00:19:10.904
DAVID MALAN: OK, and lastly?

00:19:10.904 --> 00:19:12.380
AUDIENCE: [GASPS] 50.

00:19:12.380 --> 00:19:13.670
DAVID MALAN: Very well done.

00:19:13.670 --> 00:19:15.360
OK, so nicely done.

00:19:15.360 --> 00:19:15.860
[APPLAUSE]

00:19:15.860 --> 00:19:17.870
And so thank you.

00:19:17.870 --> 00:19:20.990
So this is a say whether she goes
from left to right or right to left,

00:19:20.990 --> 00:19:24.080
like, the performance of that
algorithm just going linearly

00:19:24.080 --> 00:19:27.170
from side to side really depends
on where the number ends up being.

00:19:27.170 --> 00:19:28.760
And it kind of does boil down to luck.

00:19:28.760 --> 00:19:31.550
And so that was kind of the best
you could do because I dare say,

00:19:31.550 --> 00:19:34.772
had you gone linearly from right
to left-- and go ahead to reset--

00:19:34.772 --> 00:19:37.730
had you gone from right to left, that
time you would have gotten lucky.

00:19:37.730 --> 00:19:40.080
So on average, it might
just kind of work out.

00:19:40.080 --> 00:19:43.070
And so half the time, it takes
you maybe half as many steps--

00:19:43.070 --> 00:19:45.087
half the number of lockers
to find it on average

00:19:45.087 --> 00:19:46.670
because sometimes it's in the one end.

00:19:46.670 --> 00:19:47.840
Sometimes it's on the other end.

00:19:47.840 --> 00:19:49.620
Sometimes it's smack dab in the middle.

00:19:49.620 --> 00:19:52.490
But we're going to give
you the option now,

00:19:52.490 --> 00:19:54.523
Louis, of knowing that it's sorted.

00:19:54.523 --> 00:19:56.690
So we're going to take away
the microphone from you.

00:19:56.690 --> 00:19:58.400
But stay on up here with us.

00:19:58.400 --> 00:20:00.710
And you are now given the
assumption that the numbers

00:20:00.710 --> 00:20:03.860
are this time sorted from smallest
to largest, left to right.

00:20:03.860 --> 00:20:08.570
And might you want to take more of
a divide and conquer approach here?

00:20:08.570 --> 00:20:09.656
Wait a minute.

00:20:09.656 --> 00:20:10.934
AUDIENCE: [LAUGHS]

00:20:10.934 --> 00:20:14.450
DAVID MALAN: What might you
do as your algorithm, Louis?

00:20:14.450 --> 00:20:16.700
AUDIENCE: Well, I think I
know all the numbers, right?

00:20:16.700 --> 00:20:17.440
It's 1, 5--

00:20:17.440 --> 00:20:18.482
DAVID MALAN: Oh, damn it.

00:20:18.482 --> 00:20:19.610
AUDIENCE: 10, 20, 50.

00:20:19.610 --> 00:20:21.630
DAVID MALAN: OK, so Louis has
some memory, as we'll say.

00:20:21.630 --> 00:20:22.130
OK.

00:20:22.130 --> 00:20:24.320
AUDIENCE: But assuming
if I didn't have memory.

00:20:24.320 --> 00:20:25.310
DAVID MALAN: OK, assuming
if you didn't have

00:20:25.310 --> 00:20:26.480
memory, where would you start first?

00:20:26.480 --> 00:20:27.920
AUDIENCE: I would probably
start in the middle.

00:20:27.920 --> 00:20:28.490
DAVID MALAN: OK, go ahead.

00:20:28.490 --> 00:20:29.220
Go in the middle.

00:20:29.220 --> 00:20:31.505
And what do you see?

00:20:31.505 --> 00:20:32.283
AUDIENCE: 20.

00:20:32.283 --> 00:20:32.950
DAVID MALAN: 20.

00:20:32.950 --> 00:20:34.505
OK, what does that mean now for you?

00:20:34.505 --> 00:20:37.630
AUDIENCE: So now that means that since4
I know there's some numbers bigger,

00:20:37.630 --> 00:20:38.390
I'll go to the right.

00:20:38.390 --> 00:20:38.500
DAVID MALAN: Good.

00:20:38.500 --> 00:20:40.720
So you can tear the problem
in half, so to speak.

00:20:40.720 --> 00:20:43.690
As per week zero, skip all
of the lockers on the left.

00:20:43.690 --> 00:20:45.460
And now go where
relative to these three?

00:20:45.460 --> 00:20:46.540
AUDIENCE: Relative to the middle.

00:20:46.540 --> 00:20:46.720
DAVID MALAN: OK.

00:20:46.720 --> 00:20:48.800
AUDIENCE: So maybe [? you ?] don't
know what's on the right-hand side.

00:20:48.800 --> 00:20:49.765
And so now it's 100.

00:20:49.765 --> 00:20:51.070
DAVID MALAN: OK, nice.

00:20:51.070 --> 00:20:54.250
AUDIENCE: So the 50 must
be between 20 and 100.

00:20:54.250 --> 00:20:56.012
So it must be this one.

00:20:56.012 --> 00:20:56.845
DAVID MALAN: Indeed.

00:20:56.845 --> 00:21:00.342
So a round of applause for Louis
for getting this right this time.

00:21:00.342 --> 00:21:01.790
[APPLAUSE]

00:21:01.790 --> 00:21:05.008
We have some lovely parting gifts,
since we're using monopoly money.

00:21:05.008 --> 00:21:06.050
So it's not actual money.

00:21:06.050 --> 00:21:10.610
But this is the Cambridge edition
Harvard Square and all for both of you.

00:21:10.610 --> 00:21:11.390
So welcome.

00:21:11.390 --> 00:21:12.986
Thank you so much.

00:21:12.986 --> 00:21:17.140
[APPLAUSE]

00:21:17.140 --> 00:21:19.630
So Louis was actually getting
particularly clever there

00:21:19.630 --> 00:21:21.610
when his first instinct
was to just remember

00:21:21.610 --> 00:21:23.800
what were all the numbers
and then sort of deduce

00:21:23.800 --> 00:21:25.600
where the 50 must obviously be.

00:21:25.600 --> 00:21:26.440
So that's on us.

00:21:26.440 --> 00:21:29.050
Like, ideally, we would have
completely changed the dollar amount so

00:21:29.050 --> 00:21:30.633
that he couldn't use that information.

00:21:30.633 --> 00:21:34.540
But it turns out Louie's instinct there
to figure out where the number should

00:21:34.540 --> 00:21:35.980
be and jump right there--

00:21:35.980 --> 00:21:39.160
index into that exact location--
is actually a technique.

00:21:39.160 --> 00:21:42.760
And it's a technique we'll
talk about in future classes

00:21:42.760 --> 00:21:45.280
where you actually do take
into account the information

00:21:45.280 --> 00:21:46.940
and go right where you want.

00:21:46.940 --> 00:21:50.230
It's an example of what we'll
eventually call hashing, so to speak,

00:21:50.230 --> 00:21:51.940
in a concept called hash tables.

00:21:51.940 --> 00:21:54.520
But for now, let's try to
formalize the algorithms

00:21:54.520 --> 00:21:58.240
that both of these volunteers
kind of intuitively

00:21:58.240 --> 00:22:00.070
came up with, first and second.

00:22:00.070 --> 00:22:01.580
And we'll slap some names on them.

00:22:01.580 --> 00:22:05.140
So the first, and I've hinted at this,
is what we would call linear search.

00:22:05.140 --> 00:22:08.590
Anytime you search from left
to right or from right to left,

00:22:08.590 --> 00:22:10.190
it's generally called linear search.

00:22:10.190 --> 00:22:10.690
Why?

00:22:10.690 --> 00:22:13.482
Because you're kind of walking in
a line, no matter which direction

00:22:13.482 --> 00:22:14.080
you're going.

00:22:14.080 --> 00:22:17.390
But now for today's purposes,
let's see if we can't truly

00:22:17.390 --> 00:22:21.440
formalize what our volunteers'
algorithms were by translating them,

00:22:21.440 --> 00:22:24.320
not necessarily to code
yet, but pseudocode.

00:22:24.320 --> 00:22:28.190
See if we can't map it to English-like
syntax that gets the ideas across.

00:22:28.190 --> 00:22:32.220
So I dare say, the first algorithm,
even though she went from right to left,

00:22:32.220 --> 00:22:34.460
then from left to right,
might look like this.

00:22:34.460 --> 00:22:39.800
For each door from left to right,
she checked if 50 is behind the door

00:22:39.800 --> 00:22:40.850
as by looking at it.

00:22:40.850 --> 00:22:44.417
If it was behind the door,
then she returned true.

00:22:44.417 --> 00:22:45.500
Like, yes, this is the 50.

00:22:45.500 --> 00:22:47.583
That didn't happen on the
first iteration, though.

00:22:47.583 --> 00:22:51.750
So she moved on again and again.

00:22:51.750 --> 00:22:53.660
And now notice the
indentation here is just

00:22:53.660 --> 00:22:55.670
as important as it was in week zero.

00:22:55.670 --> 00:22:59.180
Notice that only at the very
bottom of this algorithm

00:22:59.180 --> 00:23:00.620
do I propose returning false.

00:23:00.620 --> 00:23:03.650
But it's not indented
inside of this pseudocode.

00:23:03.650 --> 00:23:04.160
Why?

00:23:04.160 --> 00:23:06.530
Well, because if I had
changed it to be this,

00:23:06.530 --> 00:23:10.740
what would be the logical bug in
this version of that algorithm?

00:23:10.740 --> 00:23:11.240
Yeah?

00:23:17.040 --> 00:23:17.730
Exactly.

00:23:17.730 --> 00:23:22.020
If she had opened the first door, found
it to be the wrong number if it says

00:23:22.020 --> 00:23:24.330
else-- if it's not behind
the door, then return false--

00:23:24.330 --> 00:23:26.760
that would erroneously
conclude the 50's not there,

00:23:26.760 --> 00:23:29.350
even though it could certainly
be among those other doors.

00:23:29.350 --> 00:23:32.370
So this first version of the
code where the return false is

00:23:32.370 --> 00:23:35.670
sort of left indented, so to
speak, and the very last thing

00:23:35.670 --> 00:23:39.690
you do if you don't previously
return true, that just

00:23:39.690 --> 00:23:42.360
makes sure that we're
handling all possible cases.

00:23:42.360 --> 00:23:45.060
But let's make this maybe
a little more technical.

00:23:45.060 --> 00:23:47.580
This is how a computer
scientist or a programmer

00:23:47.580 --> 00:23:49.970
would likely express this instead.

00:23:49.970 --> 00:23:52.260
Instead of just drawing
it in broad strokes,

00:23:52.260 --> 00:23:56.100
it's actually fine to kind of steal
some of the syntax from languages like C

00:23:56.100 --> 00:24:01.530
and actually use some of the indices
or indexes like 0, 1, 2, 3, 4, 5, 6,

00:24:01.530 --> 00:24:04.360
to represent the pieces
of data we care about.

00:24:04.360 --> 00:24:06.150
So this is a little more precise.

00:24:06.150 --> 00:24:12.030
For i-- like a variable i--
from the value 0 to n minus 1,

00:24:12.030 --> 00:24:16.390
so in the case of seven doors, this
is like saying, for i starting at 0,

00:24:16.390 --> 00:24:19.280
going up to 6, do the following.

00:24:19.280 --> 00:24:22.730
If the number 50 is
behind the doors array--

00:24:22.730 --> 00:24:27.250
so I'm using array syntax, even though
this is technically still pseudocode--

00:24:27.250 --> 00:24:32.720
if the ith location of my doors
array has the number 50, return true.

00:24:32.720 --> 00:24:35.680
Otherwise, if you do that
again and again and again--

00:24:35.680 --> 00:24:39.863
n total times-- and you still don't
find it, you want to return false.

00:24:39.863 --> 00:24:40.780
So we introduced this.

00:24:40.780 --> 00:24:44.590
This is just an example of how you can
start to borrow ideas from actual code

00:24:44.590 --> 00:24:47.890
to paint the picture even
more precisely of what

00:24:47.890 --> 00:24:50.200
it is you want a
colleague to do, what it

00:24:50.200 --> 00:24:53.860
is you want your code to
do, ultimately, by sort

00:24:53.860 --> 00:24:58.720
of borrowing these ideas from code and
incorporating it into our pseudocode.

00:24:58.720 --> 00:25:03.430
But what about the
second algorithm here,

00:25:03.430 --> 00:25:07.930
the second algorithm, whereby he
took a divide and conquer approach,

00:25:07.930 --> 00:25:10.660
starting in the middle and then
going right and then going left.

00:25:10.660 --> 00:25:13.077
Well, it turns out this is
generally called binary search.

00:25:13.077 --> 00:25:15.040
Bi implying two, because
you're either going

00:25:15.040 --> 00:25:17.395
with the left half or the
right half again and again.

00:25:17.395 --> 00:25:20.020
This is literally what we've been
talking about since week zero

00:25:20.020 --> 00:25:21.312
when I searched the phone book.

00:25:21.312 --> 00:25:25.450
That too was binary search, dividing and
dividing and dividing in half and half.

00:25:25.450 --> 00:25:27.850
So if we were to draw
some pseudocode for this,

00:25:27.850 --> 00:25:30.250
I would propose that we
could do something like this.

00:25:30.250 --> 00:25:34.900
If 50 is behind the middle
door, then we got lucky.

00:25:34.900 --> 00:25:36.490
Just return true.

00:25:36.490 --> 00:25:40.700
Else if the 50 is less than
the value at the middle door--

00:25:40.700 --> 00:25:42.430
so if it's smaller
than the middle door--

00:25:42.430 --> 00:25:43.810
I want to search to the left.

00:25:43.810 --> 00:25:45.580
So I can say search left half.

00:25:45.580 --> 00:25:50.330
Else if 50 is greater than the middle
door, I want to search to the right.

00:25:50.330 --> 00:25:54.550
And I think that's
almost everything, right?

00:25:54.550 --> 00:25:57.680
Is there a fourth possible case?

00:25:57.680 --> 00:25:58.715
What else could happen?

00:26:03.130 --> 00:26:03.670
Good.

00:26:03.670 --> 00:26:07.720
So taking into account that if
there are no doors left or no doors

00:26:07.720 --> 00:26:09.830
to begin with, we
better handle that case

00:26:09.830 --> 00:26:11.650
so that we don't induce one
of those spinning beach balls,

00:26:11.650 --> 00:26:13.540
so that the computer
doesn't freeze or crash.

00:26:13.540 --> 00:26:17.020
There's really four possible scenarios
in searching for information.

00:26:17.020 --> 00:26:19.990
It's either in the middle or
to the left or to the right.

00:26:19.990 --> 00:26:21.438
Or it's just not there at all.

00:26:21.438 --> 00:26:24.730
And so I sort of slip that in at the end
because technically, that's a question

00:26:24.730 --> 00:26:28.390
you should ask first because if there's
no doors, there's no work to be done.

00:26:28.390 --> 00:26:31.780
But logically, this is the juicy
part-- the other three questions

00:26:31.780 --> 00:26:33.320
that you might ask yourself.

00:26:33.320 --> 00:26:35.860
So this then might be the
pseudocode for binary search.

00:26:35.860 --> 00:26:37.398
And we could make it more technical.

00:26:37.398 --> 00:26:39.940
And this is where it kind of
escalates quickly syntactically.

00:26:39.940 --> 00:26:41.770
But I'm just using the
same kind of syntax.

00:26:41.770 --> 00:26:45.190
If doors in my pseudocode
represents an array of doors,

00:26:45.190 --> 00:26:48.550
well, then doors bracket middle
is just a pseudocode-like way

00:26:48.550 --> 00:26:51.110
of saying go to the
middle door in that array.

00:26:51.110 --> 00:26:55.090
And then notice, else if 50 is
less than-- that middle value--

00:26:55.090 --> 00:26:59.620
then search doors bracket
0, so the leftmost one--

00:26:59.620 --> 00:27:02.380
through doors middle minus 1.

00:27:02.380 --> 00:27:05.780
So you don't need to waste time
researching the middle door.

00:27:05.780 --> 00:27:10.250
So I say middle minus 1 so that I scooch
over slightly to the left, so to speak.

00:27:10.250 --> 00:27:13.550
Else if 50 is greater than
the value at the middle door,

00:27:13.550 --> 00:27:16.700
then you want to search
1 over to the right,

00:27:16.700 --> 00:27:21.770
so middle plus 1 among those doors,
through the last door, which is not n,

00:27:21.770 --> 00:27:24.710
because we start counting
at 0, but n minus 1.

00:27:24.710 --> 00:27:26.460
And the rest of the
algorithm is the same.

00:27:26.460 --> 00:27:27.918
This is just a little more precise.

00:27:27.918 --> 00:27:32.450
And I dare say, when writing a
program in C or any language,

00:27:32.450 --> 00:27:36.200
like, honestly, starting in pseudocode
like this will generally make it

00:27:36.200 --> 00:27:38.370
much easier to write the actual code.

00:27:38.370 --> 00:27:41.360
So in fact, in this and future
problem sets, do get into the habit,

00:27:41.360 --> 00:27:43.610
especially if you're struggling
getting started-- just

00:27:43.610 --> 00:27:45.680
write things out in English
and maybe high level

00:27:45.680 --> 00:27:47.120
English a little more like this.

00:27:47.120 --> 00:27:51.710
Then as a version two, go in with
your keyboard or paper, pencil.

00:27:51.710 --> 00:27:56.240
And make it a little more precise
using some code-like syntax.

00:27:56.240 --> 00:27:58.430
And then I dare say in
version 3, you can now

00:27:58.430 --> 00:28:01.460
translate this pretty
much verbatim to C code.

00:28:01.460 --> 00:28:04.040
The only headache is going
to be rounding issues

00:28:04.040 --> 00:28:06.890
with integers because if you divide
an integer and you a fraction,

00:28:06.890 --> 00:28:08.810
it's going to truncate, so
all that kind of headache.

00:28:08.810 --> 00:28:10.670
But you can work through
that by just thinking

00:28:10.670 --> 00:28:12.440
through what's going
to get truncated when

00:28:12.440 --> 00:28:15.630
you round down or up as a solution.

00:28:15.630 --> 00:28:17.870
Any questions, though,
on this pseudocode

00:28:17.870 --> 00:28:23.420
for either linear or binary
search as we've defined them--

00:28:23.420 --> 00:28:26.340
linear or binary search--

00:28:26.340 --> 00:28:26.840
no?

00:28:26.840 --> 00:28:28.798
All right, well, let's
consider then a bit more

00:28:28.798 --> 00:28:33.000
formally a question that we'll come
back to in the future in future classes

00:28:33.000 --> 00:28:33.500
as well.

00:28:33.500 --> 00:28:35.885
What is the running time
of these algorithms?

00:28:35.885 --> 00:28:38.927
What is, that is to say, the
efficiency of these algorithms?

00:28:38.927 --> 00:28:41.510
And how do we actually measure
the efficiency of an algorithm?

00:28:41.510 --> 00:28:44.017
Is it with a stopwatch or
with some other mechanism?

00:28:44.017 --> 00:28:46.100
Well, I propose that we
think back to this picture

00:28:46.100 --> 00:28:50.660
here, whereby this, again, was
representative of both the phone book

00:28:50.660 --> 00:28:54.140
example from week zero and
theoretically, bug aside,

00:28:54.140 --> 00:28:57.920
the attendance counting algorithm
from earlier today, whereby

00:28:57.920 --> 00:29:02.150
this same green line theoretically
represents how much time it should have

00:29:02.150 --> 00:29:04.380
taken us as a group to count ourselves.

00:29:04.380 --> 00:29:04.880
Why?

00:29:04.880 --> 00:29:07.520
Because if maybe another
class comes in and doubles

00:29:07.520 --> 00:29:11.280
the size of the number of humans
in this room, no big deal.

00:29:11.280 --> 00:29:14.960
That's just one more step or
one more iteration of the loop

00:29:14.960 --> 00:29:17.850
because half of the people
would anyway sit down.

00:29:17.850 --> 00:29:22.410
So this green algorithm still represents
the faster theoretical algorithm today.

00:29:22.410 --> 00:29:26.130
And so recall that we described these
things more mathematically as n.

00:29:26.130 --> 00:29:29.970
So this was one page at a
time or one person at a time.

00:29:29.970 --> 00:29:33.130
This was two people or
two pages at a time.

00:29:33.130 --> 00:29:36.480
So it's twice as fast, so if n
is the number of people or pages

00:29:36.480 --> 00:29:38.850
and divided by 2 is the
total number of steps.

00:29:38.850 --> 00:29:41.820
And then this one got a little
mathy, but log base 2 of n.

00:29:41.820 --> 00:29:47.850
And log base 2 of n just means what is
the value when you take n and divide it

00:29:47.850 --> 00:29:51.360
in two by 2 again and
again and again and again

00:29:51.360 --> 00:29:54.960
until you're left with just one
page or one person standing.

00:29:54.960 --> 00:29:58.500
But in the world of running times,
it turns out that being this precise

00:29:58.500 --> 00:30:00.390
is not that intellectually interesting.

00:30:00.390 --> 00:30:02.370
And it sort of devolves
into lower level math.

00:30:02.370 --> 00:30:04.560
It's just not necessary
when having discussions

00:30:04.560 --> 00:30:07.500
about the efficiency of an algorithm
or even code that you've written.

00:30:07.500 --> 00:30:10.020
So generally, a computer
scientist, when asked

00:30:10.020 --> 00:30:11.730
what's the running
time of your algorithm

00:30:11.730 --> 00:30:13.813
or what's the efficiency
of your algorithm or more

00:30:13.813 --> 00:30:16.560
generally how good or
bad is your algorithm,

00:30:16.560 --> 00:30:20.588
they'll talk about it being on
the order of some number of steps.

00:30:20.588 --> 00:30:23.130
This is a phrase you'll hear
increasingly in computer science

00:30:23.130 --> 00:30:25.020
where you can kind of
wave your hand at it.

00:30:25.020 --> 00:30:27.600
Like, oh, the lower level
details don't matter that much.

00:30:27.600 --> 00:30:31.410
All you care about in broad
strokes are certain numbers

00:30:31.410 --> 00:30:32.940
that will add up the most.

00:30:32.940 --> 00:30:35.250
And in fact, when
computer scientists talk

00:30:35.250 --> 00:30:37.860
about the efficiency of
algorithms, they tend

00:30:37.860 --> 00:30:43.020
to throw away constant factors,
so literally, numbers like 2

00:30:43.020 --> 00:30:46.540
that might be dividing
here or a base here.

00:30:46.540 --> 00:30:49.710
So for instance, these two
algorithms to a computer scientist

00:30:49.710 --> 00:30:50.918
would sort be the same.

00:30:50.918 --> 00:30:52.710
Like, yeah, it's
technically twice as fast.

00:30:52.710 --> 00:30:53.940
But look at the lines.

00:30:53.940 --> 00:30:57.210
I mean, they're practically the same and
this one here too, log base 2-- sure.

00:30:57.210 --> 00:31:01.050
But if you remember from math class,
you can change the base of any logarithm

00:31:01.050 --> 00:31:03.040
from one number to
another pretty easily.

00:31:03.040 --> 00:31:06.120
So ah, let's just
generalize it as log of n.

00:31:06.120 --> 00:31:09.390
It doesn't really matter fundamentally
what the numbers actually are.

00:31:09.390 --> 00:31:14.400
And honestly, if we zoom out slightly
so that the y-axis and the x-axis

00:31:14.400 --> 00:31:18.240
get even bigger, honestly,
these first two algorithms

00:31:18.240 --> 00:31:21.750
really do start to resemble
each other closer and closer.

00:31:21.750 --> 00:31:25.050
And I daresay in your mind's eye,
imagine zooming further and further

00:31:25.050 --> 00:31:26.070
and further out.

00:31:26.070 --> 00:31:29.850
Like, that red and yellow line are
pretty much-- once n is large enough--

00:31:29.850 --> 00:31:31.305
going to be functionally the same.

00:31:31.305 --> 00:31:33.180
Like, they're practically
the same algorithm.

00:31:33.180 --> 00:31:35.700
But this one is still doing
amazingly because it's

00:31:35.700 --> 00:31:37.360
a fundamentally different shape.

00:31:37.360 --> 00:31:40.320
So this is to say when a computer
scientist talks about, thinks

00:31:40.320 --> 00:31:43.890
about the efficiency of algorithms,
we just throw away the constant terms

00:31:43.890 --> 00:31:47.450
that when n gets really large
just don't seem to matter as much.

00:31:47.450 --> 00:31:49.200
They don't add up as
much or fundamentally

00:31:49.200 --> 00:31:51.760
change the picture in a case like this.

00:31:51.760 --> 00:31:55.320
So what I'm describing here with this
capital letter O has a technical term.

00:31:55.320 --> 00:31:57.810
This is called big O notation.

00:31:57.810 --> 00:32:00.180
And this is omnipresent
in computer science

00:32:00.180 --> 00:32:02.250
and often rears its head
even in programming,

00:32:02.250 --> 00:32:05.617
specifically when talking about
the design of some algorithm.

00:32:05.617 --> 00:32:07.950
And this is a little cheat
sheet here on the screen now.

00:32:07.950 --> 00:32:10.890
Very often, algorithms
that you write or you

00:32:10.890 --> 00:32:17.400
use will be describable as being on the
order of one of these running times.

00:32:17.400 --> 00:32:21.180
So n is just representative of the
number of things-- number of people,

00:32:21.180 --> 00:32:22.210
number of pages--

00:32:22.210 --> 00:32:25.830
whatever it is you're
actually doing in code.

00:32:25.830 --> 00:32:30.150
So the mathematical formulas
inside of the parentheses

00:32:30.150 --> 00:32:33.870
describe as a function
of the size of that input

00:32:33.870 --> 00:32:36.360
how fast or slow the
algorithm's going to be.

00:32:36.360 --> 00:32:40.710
So this algorithm in the middle
here, big O of n, so to speak,

00:32:40.710 --> 00:32:43.600
means that it takes linear
time, in other words.

00:32:43.600 --> 00:32:44.760
So my first algorithm--

00:32:44.760 --> 00:32:47.040
1, 2, 3, 4--

00:32:47.040 --> 00:32:48.787
or my first algorithm in week zero--

00:32:48.787 --> 00:32:51.630
1, 2, 3, 4.

00:32:51.630 --> 00:32:53.250
That was a linear search.

00:32:53.250 --> 00:32:56.940
The number of steps it takes is on the
order of n because if there's n pages,

00:32:56.940 --> 00:33:00.065
in the worst case, like, John Harvard's
all the way at the end of the phone

00:33:00.065 --> 00:33:01.318
book, so it takes me n steps.

00:33:01.318 --> 00:33:04.110
In this case, it's always going to
take me n steps to count you all

00:33:04.110 --> 00:33:06.443
because if I want to point
at each and every one of you,

00:33:06.443 --> 00:33:08.550
that is always going to take me n steps.

00:33:08.550 --> 00:33:13.605
So big O represents an upper
bound on the number of steps

00:33:13.605 --> 00:33:14.730
that you might be counting.

00:33:14.730 --> 00:33:17.760
And so we often use it to consider
the worst case and the worst

00:33:17.760 --> 00:33:20.190
case, John Harvard, or whoever might
be all the way at the end of the phone

00:33:20.190 --> 00:33:20.690
book.

00:33:20.690 --> 00:33:23.240
So that linear search
is on the order of n.

00:33:23.240 --> 00:33:25.230
But what about n squared?

00:33:25.230 --> 00:33:27.870
This means n people doing n things.

00:33:27.870 --> 00:33:30.590
So for instance, and
we won't do this today,

00:33:30.590 --> 00:33:33.440
but if we were to ask you
again to stand up and shake

00:33:33.440 --> 00:33:36.140
everyone's hand in the room--

00:33:36.140 --> 00:33:40.340
not good for health nowadays-- but
shake everyone's hand in the room,

00:33:40.340 --> 00:33:42.420
how many handshakes would there be?

00:33:42.420 --> 00:33:46.580
Well, if there's n of you and you've
got to shake everyone else's hand,

00:33:46.580 --> 00:33:48.650
that's technically n times n minus 1.

00:33:48.650 --> 00:33:50.030
Let's throw away the minus 1.

00:33:50.030 --> 00:33:53.600
That's n times n or
n squared handshakes.

00:33:53.600 --> 00:33:54.800
That's a lot of handshakes.

00:33:54.800 --> 00:33:57.140
And so the running time
of shaking everyone's hand

00:33:57.140 --> 00:34:00.860
in the room to introduce yourself
would be on the order of n squared.

00:34:00.860 --> 00:34:04.370
At the other end of the spectrum,
the faster end, big O of 1

00:34:04.370 --> 00:34:07.310
doesn't mean that the algorithm
takes literally one step.

00:34:07.310 --> 00:34:11.040
It could take two steps or
three or even 1,000 steps.

00:34:11.040 --> 00:34:14.159
But what it means is it's
a constant number of steps.

00:34:14.159 --> 00:34:16.520
So it doesn't matter how
many people are in the room,

00:34:16.520 --> 00:34:21.320
this describes something
taking just one step total

00:34:21.320 --> 00:34:23.520
or a constant number of steps total.

00:34:23.520 --> 00:34:26.610
So for instance, earlier when
everyone stood up at the same time,

00:34:26.610 --> 00:34:29.412
that was constant time because
if we had twice as many people

00:34:29.412 --> 00:34:32.370
come into the room, it's not going
to take us twice as long to stand up

00:34:32.370 --> 00:34:34.739
if everyone stands up at the same time.

00:34:34.739 --> 00:34:37.030
That would be a constant time algorithm.

00:34:37.030 --> 00:34:38.290
So this is linear.

00:34:38.290 --> 00:34:39.330
This is constant.

00:34:39.330 --> 00:34:41.460
If you want to get
fancy, this is quadratic.

00:34:41.460 --> 00:34:42.960
This is logarithmic.

00:34:42.960 --> 00:34:44.760
And this is n log n.

00:34:44.760 --> 00:34:48.340
Or there's other fancier terms we
can give it as well, but for now,

00:34:48.340 --> 00:34:53.050
just a common list of running times that
we might apply to certain algorithms.

00:34:53.050 --> 00:34:56.400
So linear search, I
claim, is in big O of n

00:34:56.400 --> 00:34:58.900
because it's going to take
in the worst case n steps.

00:34:58.900 --> 00:35:01.490
What about binary search?

00:35:01.490 --> 00:35:05.870
How many steps does binary
search take on the order of

00:35:05.870 --> 00:35:07.490
according to this chart?

00:35:07.490 --> 00:35:07.990
Yeah?

00:35:07.990 --> 00:35:08.700
AUDIENCE: Log n.

00:35:08.700 --> 00:35:09.630
DAVID MALAN: Log n.

00:35:09.630 --> 00:35:14.250
Yeah, because no matter what with
binary search, you're dividing in half,

00:35:14.250 --> 00:35:15.510
half, half.

00:35:15.510 --> 00:35:19.380
But in the worst case, it might
be in the last door you check.

00:35:19.380 --> 00:35:21.630
But you only took log
n steps to get there.

00:35:21.630 --> 00:35:23.550
But it still might be
the last one you check.

00:35:23.550 --> 00:35:26.897
So binary search indeed would
be on the order of log n.

00:35:26.897 --> 00:35:28.230
But sometimes, you do get lucky.

00:35:28.230 --> 00:35:30.990
And we saw with our volunteers
that sometimes you can get lucky

00:35:30.990 --> 00:35:32.430
and just find things quicker.

00:35:32.430 --> 00:35:35.793
So we don't always want to talk about
things in terms of an upper bound,

00:35:35.793 --> 00:35:38.460
like how many steps in the worst
case might this algorithm take.

00:35:38.460 --> 00:35:40.860
Sometimes it's useful
to know in the best case

00:35:40.860 --> 00:35:43.350
how few steps might an algorithm take.

00:35:43.350 --> 00:35:46.170
So for that, we have this
capital Greek omega, which

00:35:46.170 --> 00:35:48.540
is another symbol in computer science.

00:35:48.540 --> 00:35:53.872
And whereas big O represents upper
bound, omega represents lower bound.

00:35:53.872 --> 00:35:55.080
And it's the exact same idea.

00:35:55.080 --> 00:35:56.820
It's just a different
symbol to represent

00:35:56.820 --> 00:35:58.930
a different idea, the
opposite, in this case.

00:35:58.930 --> 00:36:00.690
So here's a similar cheat sheet here.

00:36:00.690 --> 00:36:02.670
But when you use the
omega symbol, that just

00:36:02.670 --> 00:36:07.240
means that this algorithm might
take as few as this many steps,

00:36:07.240 --> 00:36:09.610
for instance, in the very best case.

00:36:09.610 --> 00:36:12.640
So by that logic, if I
ask about linear search,

00:36:12.640 --> 00:36:17.090
our first demonstration with the
lockers, let's consider the best case.

00:36:17.090 --> 00:36:19.720
How many steps might it
take to search n lockers

00:36:19.720 --> 00:36:21.610
using linear search in the best case?

00:36:21.610 --> 00:36:22.960
You get lucky.

00:36:22.960 --> 00:36:23.680
So I heard it.

00:36:23.680 --> 00:36:24.950
Yeah, just one step.

00:36:24.950 --> 00:36:29.210
So we could say that linear search
is an omega of 1, so to speak.

00:36:29.210 --> 00:36:30.880
What about binary search?

00:36:30.880 --> 00:36:34.150
If you've got n lockers,
in the best case, though,

00:36:34.150 --> 00:36:35.947
how few steps might it take us?

00:36:35.947 --> 00:36:37.780
Again, one, because we
might just get lucky.

00:36:37.780 --> 00:36:40.460
And boom, it happens to be
right there in the middle.

00:36:40.460 --> 00:36:46.000
So you could say that both linear search
and binary search are an omega of 1.

00:36:46.000 --> 00:36:50.620
Now, by contrast, my attendance
algorithm, the first one I proposed--

00:36:50.620 --> 00:36:55.822
1, 2, 3, 4, 5, 6, 7, I claimed a
moment ago that that's in big O of n

00:36:55.822 --> 00:36:58.780
because if there's n people in the
room, I've got to point at everyone.

00:36:58.780 --> 00:37:00.580
But equivalently, if
there's n people in the room

00:37:00.580 --> 00:37:03.400
and I have to point at everyone,
what's the fewest number of steps

00:37:03.400 --> 00:37:08.770
it could take me to take attendance
using this linear approach?

00:37:08.770 --> 00:37:11.730
So still n, right, unless I
guess, which is not an algorithm.

00:37:11.730 --> 00:37:14.230
Like, unless I guess, I'm not
going to get the right answer,

00:37:14.230 --> 00:37:16.120
so I kind of have to point at everyone.

00:37:16.120 --> 00:37:18.210
So in both the best
case and the worst case,

00:37:18.210 --> 00:37:20.640
some algorithms still take n steps.

00:37:20.640 --> 00:37:24.300
And for this, we have what we'll
call theta notation, whereby

00:37:24.300 --> 00:37:28.110
if big O and omega happen to be the
same, which is not always the case

00:37:28.110 --> 00:37:33.190
but can be, then you can say that
algorithm is in theta of such and such.

00:37:33.190 --> 00:37:37.530
So my attendance-taking algorithm,
the very first, 1, 2, 3, 4,

00:37:37.530 --> 00:37:40.020
all the way on up to n
would be in theta of n

00:37:40.020 --> 00:37:42.250
because in both the best
case and the worst case,

00:37:42.250 --> 00:37:48.143
it takes the same number of steps
as per my big O and omega analysis.

00:37:48.143 --> 00:37:51.060
Now, there is a more formal mathematical
definition for both of these.

00:37:51.060 --> 00:37:52.620
And if you take higher
level computer science,

00:37:52.620 --> 00:37:54.495
you'll go more into the
weeds of these ideas.

00:37:54.495 --> 00:37:56.820
But for now, big O, upper bound.

00:37:56.820 --> 00:37:59.100
Omega, lower bound.

00:37:59.100 --> 00:38:03.430
Questions on this symbology?

00:38:03.430 --> 00:38:06.920
It'll be a tool in our tool kit.

00:38:06.920 --> 00:38:07.420
No?

00:38:07.420 --> 00:38:14.260
OK, so with that said, let's see how
we might translate this to actual code

00:38:14.260 --> 00:38:17.050
in something that makes sense
now using C and not so much

00:38:17.050 --> 00:38:20.570
new syntax but applications of
similar ideas from last time.

00:38:20.570 --> 00:38:25.237
So for instance, let me actually go over
to search dot C. And in search dot C,

00:38:25.237 --> 00:38:27.820
I'm going to go ahead and implement
the idea of linear search,

00:38:27.820 --> 00:38:29.980
very simply, using integers initially.

00:38:29.980 --> 00:38:34.150
So to do this, let me go ahead and
give myself the familiar CS50 dot h

00:38:34.150 --> 00:38:36.460
so that I can ask the human
what number to search for.

00:38:36.460 --> 00:38:40.000
Then let me go ahead and
include standard io.h

00:38:40.000 --> 00:38:42.430
so that I can use printf and the like.

00:38:42.430 --> 00:38:45.880
Then I'm going to go ahead and do int
main void without any command line

00:38:45.880 --> 00:38:49.130
arguments because I'm not going to need
any for this particular demonstration.

00:38:49.130 --> 00:38:50.920
And someone asked about
this a while back.

00:38:50.920 --> 00:38:53.560
If I want to declare an
array of values but I

00:38:53.560 --> 00:38:57.650
know in advance what the values are,
there is a special syntax I can use,

00:38:57.650 --> 00:38:58.330
which is this.

00:38:58.330 --> 00:39:01.480
If I want a whole bunch of numbers
but I want those numbers to be stored

00:39:01.480 --> 00:39:04.780
in an array, I can store them in--

00:39:04.780 --> 00:39:06.380
using these curly braces.

00:39:06.380 --> 00:39:09.582
And I'm going to use the same
numbers as the monopoly denominations

00:39:09.582 --> 00:39:10.790
that we've been playing with.

00:39:10.790 --> 00:39:12.690
And I'm just going to put
them in sort of random order.

00:39:12.690 --> 00:39:15.315
But I'm going to deliberately
put the 50 all the way at the end

00:39:15.315 --> 00:39:19.430
just so that I know that it's going to
try all possible steps, so big O of n,

00:39:19.430 --> 00:39:20.210
ultimately.

00:39:20.210 --> 00:39:23.180
Now let's go ahead and ask
the user for a value n.

00:39:23.180 --> 00:39:24.530
And we'll use get int.

00:39:24.530 --> 00:39:28.520
And just ask the user for a number to
search for, be it 50 or something else.

00:39:28.520 --> 00:39:32.000
And then here's how I might
implement in code now linear search,

00:39:32.000 --> 00:39:34.850
translating effectively the
pseudocode from last time.

00:39:34.850 --> 00:39:38.090
For int i equals 0.

00:39:38.090 --> 00:39:40.940
i is less than 7. i plus plus.

00:39:40.940 --> 00:39:43.670
And then inside of this
loop, I can ask a question.

00:39:43.670 --> 00:39:47.610
If the ith number in the numbers array--

00:39:47.610 --> 00:39:51.170
so numbers bracket i--
equals, equals the number n

00:39:51.170 --> 00:39:55.130
that I care about, well, this is
where I could just declare return true

00:39:55.130 --> 00:39:55.930
or return false.

00:39:55.930 --> 00:39:58.430
Here, I'm going to go ahead and
just use a printf statement.

00:39:58.430 --> 00:40:01.670
And I'm going to say found,
backslash n, just to know visually

00:40:01.670 --> 00:40:03.110
that I found the number.

00:40:03.110 --> 00:40:06.920
And else I might do something like this.

00:40:06.920 --> 00:40:09.080
Else if that's not the
case, I'll go ahead

00:40:09.080 --> 00:40:13.910
and print out not found backslash n.

00:40:13.910 --> 00:40:16.410
All right, so let me zoom
out for just a moment.

00:40:16.410 --> 00:40:17.690
Here's all of my code.

00:40:17.690 --> 00:40:25.130
Any concerns with this implementation
of what I claim is now linear search?

00:40:25.130 --> 00:40:28.970
Any concerns with what I
claim is linear search?

00:40:28.970 --> 00:40:29.885
Yeah?

00:40:29.885 --> 00:40:32.855
AUDIENCE: [INAUDIBLE]

00:40:32.855 --> 00:40:33.950
DAVID MALAN: Exactly.

00:40:33.950 --> 00:40:37.917
If I search the first number and it's
not found, it's going to say not found.

00:40:37.917 --> 00:40:40.500
But it's going to keep saying
not found, not found, not found,

00:40:40.500 --> 00:40:41.360
which might be fine.

00:40:41.360 --> 00:40:42.402
But it's a little stupid.

00:40:42.402 --> 00:40:44.325
I probably want to know
if it's found or not.

00:40:44.325 --> 00:40:46.700
So I've made that same mistake
that I called out earlier.

00:40:46.700 --> 00:40:49.700
Like, the else is not the
alternative to not finding

00:40:49.700 --> 00:40:51.570
the number in that first location.

00:40:51.570 --> 00:40:55.890
It's the final decision to make when
I haven't actually found the value.

00:40:55.890 --> 00:40:58.580
So I think what I want to do
is get rid of this else clause.

00:40:58.580 --> 00:41:01.940
And then at the outside
of this loop, I think

00:41:01.940 --> 00:41:03.770
I want to conclude printf not found.

00:41:03.770 --> 00:41:08.420
But here too there's a
new bug that's arisen.

00:41:08.420 --> 00:41:13.310
There's a new bug, even though I fixed
the logical error you just described.

00:41:13.310 --> 00:41:15.290
What symptom are we still going to see?

00:41:18.290 --> 00:41:20.960
Yeah, now it's going to
always print not found,

00:41:20.960 --> 00:41:23.997
even when I have found it
because even once I've found it

00:41:23.997 --> 00:41:26.330
and I finished going through
the array, it's still going

00:41:26.330 --> 00:41:27.740
to assume that I got to the bottom.

00:41:27.740 --> 00:41:28.990
And therefore, it's not found.

00:41:28.990 --> 00:41:32.660
So I need to somehow exit out of
main prematurely, if you will.

00:41:32.660 --> 00:41:35.510
And recall that last week,
we also introduced the idea

00:41:35.510 --> 00:41:39.680
that main all this time does
actually return a value, an integer.

00:41:39.680 --> 00:41:42.110
By default, it's secretly been zero.

00:41:42.110 --> 00:41:46.130
Anytime a program exits, it just
returns zero, an exit status of zero.

00:41:46.130 --> 00:41:47.970
But we do now have control over that.

00:41:47.970 --> 00:41:51.080
And so a convention in
C would be that when

00:41:51.080 --> 00:41:54.500
you want your main function
to end prematurely if so,

00:41:54.500 --> 00:41:56.587
you can literally just return a value.

00:41:56.587 --> 00:41:59.670
And even though this feels a little
backwards, this is just the way it is.

00:41:59.670 --> 00:42:02.450
You return zero to indicate success.

00:42:02.450 --> 00:42:06.120
And you return any other
integer to indicate failure.

00:42:06.120 --> 00:42:09.110
So by convention, people go
to 1 and then 2 and then 3.

00:42:09.110 --> 00:42:11.310
They don't think too hard
about what numbers to use.

00:42:11.310 --> 00:42:14.120
But in this case, I'm going
to go ahead and return 1

00:42:14.120 --> 00:42:16.440
if I do get to the bottom of this file.

00:42:16.440 --> 00:42:21.090
So now if I open back up my
terminal window, I run make search.

00:42:21.090 --> 00:42:23.267
No syntax errors-- dot slash search.

00:42:23.267 --> 00:42:24.850
I'm going to be prompted for a number.

00:42:24.850 --> 00:42:26.683
Let's go ahead and
search for the number 50.

00:42:26.683 --> 00:42:28.650
And I should see found.

00:42:28.650 --> 00:42:31.290
Meanwhile, if I run it
again-- dot slash search--

00:42:31.290 --> 00:42:34.590
I search for the number 13,
that should be not found.

00:42:34.590 --> 00:42:38.850
So I claim that this is now a correct
implementation of linear search

00:42:38.850 --> 00:42:43.620
that's gone from left to right, looking
for a number that may or may not

00:42:43.620 --> 00:42:44.310
be there.

00:42:44.310 --> 00:42:48.420
Any questions on this
version of the code here?

00:42:48.420 --> 00:42:49.578
Yeah?

00:42:49.578 --> 00:42:53.064
AUDIENCE: [INAUDIBLE]

00:42:53.064 --> 00:42:55.760
DAVID MALAN: Return
zero indicates success

00:42:55.760 --> 00:42:58.250
when doing it from main in particular.

00:42:58.250 --> 00:43:02.480
And that's backwards only in the
sense that generally 0 is false.

00:43:02.480 --> 00:43:04.640
And 1 is true.

00:43:04.640 --> 00:43:09.272
But the logic here is that if the
program works correctly, that's zero.

00:43:09.272 --> 00:43:11.730
But there's an infinite number
of things that can go wrong.

00:43:11.730 --> 00:43:15.290
And that's why we need 1 and 2
and 3 and 4 and all the way up.

00:43:15.290 --> 00:43:17.362
Other questions?

00:43:17.362 --> 00:43:19.842
AUDIENCE: [INAUDIBLE]

00:43:19.842 --> 00:43:21.637
DAVID MALAN: Yes.

00:43:21.637 --> 00:43:25.533
AUDIENCE: [INAUDIBLE]

00:43:25.533 --> 00:43:26.590
DAVID MALAN: Correct.

00:43:26.590 --> 00:43:31.270
When you return zero or return any value
from main wherever it is in your code,

00:43:31.270 --> 00:43:34.240
the program will effectively
terminate right then and there.

00:43:34.240 --> 00:43:37.600
No additional code will get executed
at the bottom of the function.

00:43:37.600 --> 00:43:39.310
You'll effectively exit out.

00:43:39.310 --> 00:43:43.240
Just like in a normal function that
isn't main, when you return a value,

00:43:43.240 --> 00:43:46.930
it immediately exits that
function and hands back the value.

00:43:46.930 --> 00:43:47.440
Yeah?

00:43:47.440 --> 00:43:49.352
AUDIENCE: [INAUDIBLE]

00:43:49.352 --> 00:43:52.020
DAVID MALAN: So return
1 is just me being

00:43:52.020 --> 00:43:56.250
pedantic at this point because I'm
frankly not going to really care what

00:43:56.250 --> 00:43:58.110
the exit status is of this program.

00:43:58.110 --> 00:44:03.810
But once I've introduced the idea of
manually returning 0 on this line 14

00:44:03.810 --> 00:44:07.470
to indicate success, it stands
to reason that I should also

00:44:07.470 --> 00:44:10.510
return a different value when
I want to indicate failure.

00:44:10.510 --> 00:44:13.038
And so even though this
does not functionally

00:44:13.038 --> 00:44:15.330
change the program-- it will
still work-- it will still

00:44:15.330 --> 00:44:19.050
print the same things correctly--
it's a lower level detail

00:44:19.050 --> 00:44:22.260
that programmers, teaching
assistants, testing software,

00:44:22.260 --> 00:44:24.930
might appreciate, knowing
what actually happened

00:44:24.930 --> 00:44:28.160
in the program underneath the hood.

00:44:28.160 --> 00:44:30.660
All right, so what about strings?

00:44:30.660 --> 00:44:32.660
So it turns out with
strings, we're going

00:44:32.660 --> 00:44:35.790
to have to think a little harder
about how best to do this.

00:44:35.790 --> 00:44:37.530
So let me actually go ahead and do this.

00:44:37.530 --> 00:44:41.240
Let me go ahead and get rid of
much of this code but transition

00:44:41.240 --> 00:44:44.420
to a different type of array,
this time an array of strings.

00:44:44.420 --> 00:44:47.570
And I'm going to call the array strings
itself plural, just to make clear

00:44:47.570 --> 00:44:48.985
what's in it instead of numbers.

00:44:48.985 --> 00:44:51.110
I'm going to use the square
bracket notation, which

00:44:51.110 --> 00:44:54.235
just means I don't know at the moment
how many elements it's going to have.

00:44:54.235 --> 00:44:56.030
But the compiler can
figure it out for me.

00:44:56.030 --> 00:44:58.910
And in the spirit of Monopoly,
let's go ahead in our curly braces,

00:44:58.910 --> 00:45:00.440
do something like this.

00:45:00.440 --> 00:45:03.200
Battleship is going to
be one of the strings.

00:45:03.200 --> 00:45:05.390
Boot is going to be another.

00:45:05.390 --> 00:45:07.160
Cannon is going to be a third.

00:45:07.160 --> 00:45:09.140
Iron is going to be the fourth.

00:45:09.140 --> 00:45:11.510
Thimble-- and if you've
ever played Monopoly,

00:45:11.510 --> 00:45:14.610
you know where these are coming
from-- and top hat, for instance.

00:45:14.610 --> 00:45:17.510
So this gives me 1, 2, 3, 4, 5, 6.

00:45:17.510 --> 00:45:19.148
I could write the number 6 here.

00:45:19.148 --> 00:45:20.690
And the compiler would appreciate it.

00:45:20.690 --> 00:45:21.607
But it doesn't matter.

00:45:21.607 --> 00:45:25.280
The compiler can figure it out on its
own just based on the number of commas.

00:45:25.280 --> 00:45:28.440
And this also ensures that I
don't write one number to the left

00:45:28.440 --> 00:45:30.130
and then miscount on the right.

00:45:30.130 --> 00:45:32.890
So omitting it is probably
in everyone's benefit.

00:45:32.890 --> 00:45:33.840
Now let's do this.

00:45:33.840 --> 00:45:38.340
Let's ask the user for a string
using get string instead of get int

00:45:38.340 --> 00:45:40.650
for some string to search for.

00:45:40.650 --> 00:45:42.660
Then let's go ahead and
do the exact same thing.

00:45:42.660 --> 00:45:46.620
For i int i equals 0.

00:45:46.620 --> 00:45:50.670
i is less than 6, which is
technically a magic number.

00:45:50.670 --> 00:45:53.070
But let's focus on the
searching algorithm for now.

00:45:53.070 --> 00:45:54.330
i plus plus.

00:45:54.330 --> 00:45:56.850
And then inside of this
loop, let's do this.

00:45:56.850 --> 00:46:04.980
If strings bracket i equals
equals, s, then let's go ahead

00:46:04.980 --> 00:46:08.100
and print out just as before, quote,
unquote, found-- backslash n--

00:46:08.100 --> 00:46:10.125
and proactively return zero this time.

00:46:10.125 --> 00:46:12.000
And if we don't find it
anywhere in the loop,

00:46:12.000 --> 00:46:14.280
let's go ahead and return
1 at the very bottom,

00:46:14.280 --> 00:46:17.820
before which we will print
not found backslash n.

00:46:17.820 --> 00:46:20.670
So it's the exact same
logic at the moment,

00:46:20.670 --> 00:46:23.530
even though I've changed
my ints to strings.

00:46:23.530 --> 00:46:25.950
Let me go ahead and open
up my terminal window now.

00:46:25.950 --> 00:46:30.810
Do make search again and
see, OK, so far, so good.

00:46:30.810 --> 00:46:33.780
Let me now go ahead and
do dot slash search.

00:46:33.780 --> 00:46:37.470
And let's go ahead and search
for-- how about top hat.

00:46:37.470 --> 00:46:39.653
So we should see found.

00:46:39.653 --> 00:46:40.415
Huh.

00:46:40.415 --> 00:46:41.290
All right, not found.

00:46:41.290 --> 00:46:43.332
All right, well, let's do
dot slash search again.

00:46:43.332 --> 00:46:44.320
How about thimble?

00:46:44.320 --> 00:46:47.380
Maybe it's because it's just two words.

00:46:47.380 --> 00:46:50.110
No, thimble's not found, either.

00:46:50.110 --> 00:46:53.230
Dot slash search-- let's search
for the first one, battleship.

00:46:53.230 --> 00:46:55.240
Enter-- still not found.

00:46:55.240 --> 00:46:57.280
Let's search for
something else like cat.

00:46:57.280 --> 00:46:58.600
Not found.

00:46:58.600 --> 00:47:03.550
What is going on because I'm pretty
sure the logic is exactly the same?

00:47:03.550 --> 00:47:09.010
Well, it turns out in C, this
line here, currently line 11,

00:47:09.010 --> 00:47:11.350
is not how you compare strings.

00:47:11.350 --> 00:47:15.670
If you want to compare strings in C,
you don't do it like you did integers.

00:47:15.670 --> 00:47:17.920
You actually need another
technique altogether.

00:47:17.920 --> 00:47:21.070
And for that, we're going to need
to revisit one of our friends, which

00:47:21.070 --> 00:47:24.610
is string.h, which is one of the
header files for the string library

00:47:24.610 --> 00:47:27.610
that we introduced last week
that has in addition to functions

00:47:27.610 --> 00:47:31.390
like strlen, which gives you the
length of a string, it also gives us,

00:47:31.390 --> 00:47:34.330
as per the documentation here,
another function that we'll

00:47:34.330 --> 00:47:38.830
start to find useful here, succinctly
named strcmp, for string compare.

00:47:38.830 --> 00:47:44.750
And string compare will actually tell
us if two strings are the same or not.

00:47:44.750 --> 00:47:46.200
It will indeed compare them.

00:47:46.200 --> 00:47:49.970
And if I use this now,
let me go back to my code

00:47:49.970 --> 00:47:53.990
here and see what I might do differently
if I go back into my code here

00:47:53.990 --> 00:47:55.160
and change this value.

00:47:55.160 --> 00:47:59.570
Instead of using strings bracket i
equals, equals s, let's do str compare.

00:47:59.570 --> 00:48:01.190
And I read the documentation earlier.

00:48:01.190 --> 00:48:04.640
So I know that it takes two arguments,
the first and the second string

00:48:04.640 --> 00:48:09.500
that you want to compare, so
strings bracket i and then s,

00:48:09.500 --> 00:48:12.290
which is the string
that the human typed in.

00:48:12.290 --> 00:48:15.680
But somewhat weirdly,
what I want to check for

00:48:15.680 --> 00:48:19.220
is that str compare returns zero.

00:48:19.220 --> 00:48:24.950
So if str compare when given two strings
is input, strings bracket i and s,

00:48:24.950 --> 00:48:30.750
returns an integer 0, that actually
means the strings are the same.

00:48:30.750 --> 00:48:31.500
So let's try this.

00:48:31.500 --> 00:48:33.020
Let me do make search again.

00:48:33.020 --> 00:48:33.890
Huh.

00:48:33.890 --> 00:48:36.410
What did I do wrong here?

00:48:36.410 --> 00:48:39.200
A whole bunch of errors popped out.

00:48:39.200 --> 00:48:40.100
What did I do wrong?

00:48:40.100 --> 00:48:41.530
Yeah?

00:48:41.530 --> 00:48:44.510
Yeah, so I didn't include the very
header file we're talking about.

00:48:44.510 --> 00:48:46.270
So again, it doesn't necessarily
mean a logical error.

00:48:46.270 --> 00:48:47.920
It just means a stupid error on my part.

00:48:47.920 --> 00:48:49.670
I didn't actually
include the header file.

00:48:49.670 --> 00:48:51.340
So let me go back and actually do that.

00:48:51.340 --> 00:48:54.280
Up at the top in addition
to cs50.h and standard io,

00:48:54.280 --> 00:48:56.890
let's also include string.h.

00:48:56.890 --> 00:48:59.290
Let me clear my terminal
and do make search again.

00:48:59.290 --> 00:49:00.340
Crossing my fingers.

00:49:00.340 --> 00:49:01.390
That time it worked.

00:49:01.390 --> 00:49:06.040
And now if I do dot slash search
and search for top hat like before,

00:49:06.040 --> 00:49:08.500
now, thankfully, it is, in fact, found.

00:49:08.500 --> 00:49:12.340
If I do it once more and search for
battleship, now it's, in fact, found.

00:49:12.340 --> 00:49:15.520
If I do it once more and search for
cat, which should not be in there,

00:49:15.520 --> 00:49:17.870
that is not, in fact, found.

00:49:17.870 --> 00:49:21.050
So now just intuitively, even if
you've never done this before,

00:49:21.050 --> 00:49:25.450
why might it be valuable for
this function called strcmp

00:49:25.450 --> 00:49:30.370
to return zero if the
strings are equal as opposed

00:49:30.370 --> 00:49:34.060
to a simple Boolean like true or false,
which might have been your intuition?

00:49:34.060 --> 00:49:40.540
When you compare two strings,
what are the possible takeaways

00:49:40.540 --> 00:49:43.270
you might have from
comparing two strings?

00:49:43.270 --> 00:49:45.790
It's not just that they're equal or not.

00:49:45.790 --> 00:49:47.970
AUDIENCE: [INAUDIBLE]

00:49:47.970 --> 00:49:51.170
DAVID MALAN: OK, so maybe if the
ASCII values are the same, that

00:49:51.170 --> 00:49:53.150
might imply, indeed, literal equality.

00:49:53.150 --> 00:49:54.720
But something else.

00:49:54.720 --> 00:49:58.044
AUDIENCE: [INAUDIBLE] about how
similar these things are [INAUDIBLE]..

00:49:58.044 --> 00:49:59.020
DAVID MALAN: Ah, nice.

00:49:59.020 --> 00:50:00.940
Like, you and I in English,
certainly, are very much

00:50:00.940 --> 00:50:02.982
in the habit of sorting
information, whether it's

00:50:02.982 --> 00:50:06.490
in a dictionary, in our contacts, in
a phone book, in any such technology.

00:50:06.490 --> 00:50:11.140
And so it's often useful to be able to
know, does this string equal another?

00:50:11.140 --> 00:50:11.740
Sure.

00:50:11.740 --> 00:50:15.760
But does this string come before
another alphabetically or maybe

00:50:15.760 --> 00:50:17.620
after another alphabetically?

00:50:17.620 --> 00:50:20.680
So sometimes, you want functions
to give you back three answers.

00:50:20.680 --> 00:50:24.550
But equals, equals alone can only
give you true or false, yes or no.

00:50:24.550 --> 00:50:27.040
And that might not be
useful enough when you're

00:50:27.040 --> 00:50:29.200
trying to solve some
problem involving strings.

00:50:29.200 --> 00:50:34.280
So it turns out str compare actually
compares the two strings for equality

00:50:34.280 --> 00:50:38.470
but also for what's called,
playfully, ASCII-betical order.

00:50:38.470 --> 00:50:41.590
So not alphabetical order, per
se, but ASCII-betical order

00:50:41.590 --> 00:50:44.680
where it actually compares the
integer values of the letters.

00:50:44.680 --> 00:50:46.450
So if you're comparing
the letter A, It's

00:50:46.450 --> 00:50:51.250
going to compare 65 against some
other letter's integer value-- hence

00:50:51.250 --> 00:50:52.880
ASCII-betical value.

00:50:52.880 --> 00:50:54.790
So we're not doing any
form of sorting here.

00:50:54.790 --> 00:50:56.110
So it's sort of immaterial.

00:50:56.110 --> 00:50:59.000
But as per the documentation,
I do know that str compare

00:50:59.000 --> 00:51:01.880
returns zero if two strings are equal.

00:51:01.880 --> 00:51:06.230
And a little teaser for next week, it
turns out when I was only using equals,

00:51:06.230 --> 00:51:09.400
equals to compare
strings bracket i and s,

00:51:09.400 --> 00:51:13.350
I was not comparing the strings in
the way that you might have thought.

00:51:13.350 --> 00:51:18.087
And if you have programmed in Java
or Python before, equals, equals

00:51:18.087 --> 00:51:20.420
is actually doing something
different in those languages

00:51:20.420 --> 00:51:23.130
than it is actually doing in
C. But more on that next week.

00:51:23.130 --> 00:51:26.600
For now, just take on faith
that str compare is indeed

00:51:26.600 --> 00:51:29.010
how you compare two strings.

00:51:29.010 --> 00:51:33.080
So let's actually put this into play
with some actual additional code.

00:51:33.080 --> 00:51:37.940
Let me propose that we implement a very
simplistic phone book, for instance.

00:51:37.940 --> 00:51:40.640
Let me go ahead and implement here--

00:51:40.640 --> 00:51:42.110
how about in a new file.

00:51:42.110 --> 00:51:46.550
Instead of search dot C, let's
actually do phone book dot c.

00:51:46.550 --> 00:51:50.510
And in this phone book, I'm going to go
ahead and include the same header file,

00:51:50.510 --> 00:51:52.730
so cs50.h, so I can get input--

00:51:52.730 --> 00:51:55.310
standard io.h, so I can use printf--

00:51:55.310 --> 00:51:58.070
string.h so that I can use str compare.

00:51:58.070 --> 00:52:02.250
Let me give myself a main function again
without command line arguments for now.

00:52:02.250 --> 00:52:05.150
And let me go ahead now and
store a proper phone book, which

00:52:05.150 --> 00:52:07.800
has some names and some actual numbers.

00:52:07.800 --> 00:52:09.320
So let's store the names first.

00:52:09.320 --> 00:52:12.560
So string-- names is going
to be the name of my array.

00:52:12.560 --> 00:52:14.580
And let's go ahead and
store Carter's name,

00:52:14.580 --> 00:52:18.260
how about my name, and maybe John
Harvard for phone book throwback.

00:52:18.260 --> 00:52:22.640
Then let's go ahead and give me another
array called numbers wherein I'll put

00:52:22.640 --> 00:52:27.560
our phone number, so
617-495-1000 for Carter--

00:52:27.560 --> 00:52:32.060
617-495-1000 for me-- technically,
directory assistance here.

00:52:32.060 --> 00:52:34.160
And then for John, we'll
give him an actual one.

00:52:34.160 --> 00:52:40.100
So it's actually going
to be 949-468-2750.

00:52:40.100 --> 00:52:42.380
You're welcome to text or
call John when you want.

00:52:42.380 --> 00:52:43.070
Whoops.

00:52:43.070 --> 00:52:46.130
And just for good measure, let's
go ahead and put our country codes

00:52:46.130 --> 00:52:50.990
in here plus 1, even though at the
end of the day, these are strings.

00:52:50.990 --> 00:52:54.320
So indeed notice I'm not using
an integer for these values.

00:52:54.320 --> 00:52:56.510
I kind of sort of should.

00:52:56.510 --> 00:52:59.270
But here's where data types
in C and programming more

00:52:59.270 --> 00:53:01.280
generally might sometimes mislead you.

00:53:01.280 --> 00:53:04.100
Even though we call it,
obviously, a phone number.

00:53:04.100 --> 00:53:08.210
It's probably best to represent
it generally as strings, in fact,

00:53:08.210 --> 00:53:12.110
so that you can have the pluses-- you
can have the dashes so that it doesn't

00:53:12.110 --> 00:53:14.030
get too big and overflow an integer.

00:53:14.030 --> 00:53:16.340
Maybe it's an international number
for which there's even more digits.

00:53:16.340 --> 00:53:18.120
You don't want to risk
overflowing a value.

00:53:18.120 --> 00:53:20.120
And in general, the rule
of thumb in programming

00:53:20.120 --> 00:53:22.940
is even if in English we
call something a number,

00:53:22.940 --> 00:53:26.600
if you wouldn't do math on it ever,
you should probably be storing it

00:53:26.600 --> 00:53:29.060
as a string, not as an integer.

00:53:29.060 --> 00:53:33.060
And it makes no logical sense to
do math on phone numbers, per se.

00:53:33.060 --> 00:53:35.570
So those are best
instinctively left as strings.

00:53:35.570 --> 00:53:37.850
And in this case, even
more simply, this ensures

00:53:37.850 --> 00:53:41.630
that we have pluses and dashes
stored inside of the string.

00:53:41.630 --> 00:53:46.490
All right, so now that I have these
two arrays in parallel, if you will.

00:53:46.490 --> 00:53:49.040
Like, I'm assuming that
Carter's name is first.

00:53:49.040 --> 00:53:50.180
So his number is first.

00:53:50.180 --> 00:53:51.230
David's name is second.

00:53:51.230 --> 00:53:52.850
So his number is second.

00:53:52.850 --> 00:53:54.810
John's is third and thus third.

00:53:54.810 --> 00:53:56.060
So let's actually search this.

00:53:56.060 --> 00:53:58.010
Let's ask the user for--

00:53:58.010 --> 00:54:01.340
how about a name using get string.

00:54:01.340 --> 00:54:03.950
And this will be a name to
search for in the phone book

00:54:03.950 --> 00:54:05.420
just like we did in week zero.

00:54:05.420 --> 00:54:09.290
Let's do for int i
equals 0, i less than 3.

00:54:09.290 --> 00:54:10.738
Again, the 3 is bad practice.

00:54:10.738 --> 00:54:12.530
I should probably store
that in a constant.

00:54:12.530 --> 00:54:15.950
But let's keep the focus for
today on the algorithm alone--

00:54:15.950 --> 00:54:17.330
i plus, plus.

00:54:17.330 --> 00:54:20.240
Then in here, let's do if--

00:54:20.240 --> 00:54:27.800
how about names bracket i
equals, equals name typed in.

00:54:27.800 --> 00:54:28.550
But wait a minute.

00:54:28.550 --> 00:54:30.320
I'm screwing this up again.

00:54:30.320 --> 00:54:32.270
What should I be using here?

00:54:32.270 --> 00:54:37.340
str compare again-- so let's do
str compare, names bracket i comma,

00:54:37.340 --> 00:54:38.930
name, which came from the user.

00:54:38.930 --> 00:54:42.770
And if that return value is
zero, then let's go ahead

00:54:42.770 --> 00:54:45.830
and print out, just like
before, found backslash n.

00:54:45.830 --> 00:54:46.580
But you know what?

00:54:46.580 --> 00:54:48.170
We can do something more interesting.

00:54:48.170 --> 00:54:49.670
Let's actually print out the number.

00:54:49.670 --> 00:54:51.260
So I didn't just find something.

00:54:51.260 --> 00:54:52.550
I found the name.

00:54:52.550 --> 00:54:55.710
So let's actually plug in that
person's corresponding number.

00:54:55.710 --> 00:54:57.900
So now it's a more useful
phonebook or contacts app

00:54:57.900 --> 00:55:00.060
where I'm going to show
the human not just found

00:55:00.060 --> 00:55:02.430
but found this specific number.

00:55:02.430 --> 00:55:06.010
Then I'm going to go ahead as before
and return 0 to indicate success.

00:55:06.010 --> 00:55:09.538
And if we get all the way down here,
I'm going to go ahead and say not found

00:55:09.538 --> 00:55:12.330
and not print out any number because
I obviously haven't found one,

00:55:12.330 --> 00:55:15.880
and return one by convention
to indicate failure.

00:55:15.880 --> 00:55:17.890
So let me open my terminal window.

00:55:17.890 --> 00:55:20.340
Let me do make phone book-- enter.

00:55:20.340 --> 00:55:23.010
So far, so good-- dot slash phone book.

00:55:23.010 --> 00:55:25.080
Let's search for Carter--

00:55:25.080 --> 00:55:27.060
enter-- found his number.

00:55:27.060 --> 00:55:27.840
Let's do it again.

00:55:27.840 --> 00:55:28.757
Dot slash phone book--

00:55:28.757 --> 00:55:30.030
David-- found it.

00:55:30.030 --> 00:55:32.490
Let's do it one more time for John--

00:55:32.490 --> 00:55:33.240
found it.

00:55:33.240 --> 00:55:38.160
And just for good measure, let's
do one other here like Eli--

00:55:38.160 --> 00:55:40.920
enter-- not found in this case.

00:55:40.920 --> 00:55:43.710
All right, so it seems to be
working based on this example here.

00:55:43.710 --> 00:55:48.630
But now we've actually implemented the
idea of, like, a proper phone book.

00:55:48.630 --> 00:55:51.510
But does any aspect of
the design of this code,

00:55:51.510 --> 00:55:54.060
even if you've never
programmed before CS50,

00:55:54.060 --> 00:55:58.120
does anything rub you wrong about
how we're storing our data--

00:55:58.120 --> 00:56:02.690
the phone book itself--
these names and numbers?

00:56:02.690 --> 00:56:04.220
Does anything rub you the wrong way?

00:56:04.220 --> 00:56:06.008
Yeah, in back.

00:56:06.008 --> 00:56:09.326
AUDIENCE: [INAUDIBLE]

00:56:11.222 --> 00:56:13.320
DAVID MALAN: Yeah,
really good observation.

00:56:13.320 --> 00:56:15.510
I'm separating the
names and the numbers,

00:56:15.510 --> 00:56:17.775
which indeed, it looks
a little bit weird.

00:56:17.775 --> 00:56:20.650
And there's actually this technical
term in the world of programming,

00:56:20.650 --> 00:56:22.088
which is code smell, where, like--

00:56:22.088 --> 00:56:25.380
[SNIFFING]---- something smells a little
off about this code in the sense that,

00:56:25.380 --> 00:56:27.270
like, this probably
doesn't end well, right?

00:56:27.270 --> 00:56:31.438
If I add a fourth name, a fourth
number, a fifth name, a fiftieth name

00:56:31.438 --> 00:56:33.480
and number, like, at some
point, they're probably

00:56:33.480 --> 00:56:34.813
going to get out of sync, right?

00:56:34.813 --> 00:56:37.110
So, like, there's something
awry about this design

00:56:37.110 --> 00:56:39.870
that I shouldn't decouple
the names from the numbers.

00:56:39.870 --> 00:56:42.300
So something kind of smells
about this code, so to speak.

00:56:42.300 --> 00:56:44.760
And any time you perceive
that in your code,

00:56:44.760 --> 00:56:48.150
it's probably an opportunity to
go about improving it somehow.

00:56:48.150 --> 00:56:50.893
But to do that, we actually need
another tool in the toolkit.

00:56:50.893 --> 00:56:53.560
And that is, again, this term
that I've used a couple of times--

00:56:53.560 --> 00:56:54.367
data structures.

00:56:54.367 --> 00:56:56.700
Like, arrays have been the
first of our data structures.

00:56:56.700 --> 00:56:57.783
And they're so simplistic.

00:56:57.783 --> 00:57:01.110
It just means storing things back to
back to back contiguously in memory.

00:57:01.110 --> 00:57:03.300
But it turns out C--

00:57:03.300 --> 00:57:06.660
and a little bit of new syntax-- but
it's not a lot of new syntax today--

00:57:06.660 --> 00:57:09.060
a little bit of new
syntax today will allow

00:57:09.060 --> 00:57:13.710
us to create our own data structures,
our own types of variables,

00:57:13.710 --> 00:57:16.330
largely using syntax
we've seen thus far.

00:57:16.330 --> 00:57:19.230
So to do this, let me
propose that in order

00:57:19.230 --> 00:57:22.980
to represent a person in
a phone book-- well, let's

00:57:22.980 --> 00:57:26.250
not just implement them as a list
of names and a list of numbers.

00:57:26.250 --> 00:57:30.660
Wouldn't it be nice if C had a
data type actually called person?

00:57:30.660 --> 00:57:34.560
Because if it did, then I could go
about creating an array called--

00:57:34.560 --> 00:57:36.690
using the pluralized form-- people--

00:57:36.690 --> 00:57:40.470
containing my people in my phone book.

00:57:40.470 --> 00:57:44.070
And maybe a person has
both a name and a number.

00:57:44.070 --> 00:57:46.690
And therefore, we can kind
of keep everything together.

00:57:46.690 --> 00:57:47.740
So how can I do this?

00:57:47.740 --> 00:57:48.960
Well, what is a person?

00:57:48.960 --> 00:57:52.350
Well, a person, really, in this
story is a person has a name.

00:57:52.350 --> 00:57:53.730
And a person has a number.

00:57:53.730 --> 00:57:58.500
So can we create a new data type that
maybe has both of these together?

00:57:58.500 --> 00:58:02.280
Well, we actually can by
using one piece of new syntax

00:58:02.280 --> 00:58:05.100
today, which is just this here.

00:58:05.100 --> 00:58:09.395
Using what's called a struct, we
can create our own data structure

00:58:09.395 --> 00:58:11.020
that actually has some structure in it.

00:58:11.020 --> 00:58:13.150
It's not just one thing
like a string or an int.

00:58:13.150 --> 00:58:14.320
Maybe it's two strings.

00:58:14.320 --> 00:58:15.190
Maybe it's two ints.

00:58:15.190 --> 00:58:16.210
Maybe it's one of each.

00:58:16.210 --> 00:58:21.380
So a structure can be a variable that
contains any number of other variables,

00:58:21.380 --> 00:58:22.060
so to speak.

00:58:22.060 --> 00:58:24.310
And typedef is a cryptic
keyword that just

00:58:24.310 --> 00:58:28.630
means define the following type--
invent the following data type for me.

00:58:28.630 --> 00:58:30.320
And the syntax is a little weird.

00:58:30.320 --> 00:58:32.620
But you say typedef struct curly brace.

00:58:32.620 --> 00:58:35.410
Inside of the curly braces,
you put all of the types

00:58:35.410 --> 00:58:38.090
of variables you want to
associate with this new data type.

00:58:38.090 --> 00:58:40.660
And then after the curly
brace, you invent the name

00:58:40.660 --> 00:58:41.830
that you want to give it.

00:58:41.830 --> 00:58:45.550
And this will create a new
data type in C called person,

00:58:45.550 --> 00:58:47.830
even though no one
thought of this decades

00:58:47.830 --> 00:58:52.670
ago when C was created alongside of
the ints and the floats and so forth.

00:58:52.670 --> 00:58:55.220
So how can I actually use this?

00:58:55.220 --> 00:58:58.330
Well, it turns out that
once you have this building

00:58:58.330 --> 00:59:03.190
block of creating your very own data
structures, I can go back into my code

00:59:03.190 --> 00:59:06.340
and improve it as follows in
direct response to your concern

00:59:06.340 --> 00:59:10.752
about it seeming not ideal that we're
decoupling the names and the numbers.

00:59:10.752 --> 00:59:13.210
Now, it's going to look a little
more complicated at first.

00:59:13.210 --> 00:59:15.320
But it will scale better over time.

00:59:15.320 --> 00:59:17.380
So within my code here,
I'm going to go ahead

00:59:17.380 --> 00:59:21.020
and essentially type out
exactly that same data type.

00:59:21.020 --> 00:59:27.460
So define a structure that has inside
of it a string called name and a string

00:59:27.460 --> 00:59:28.300
called number.

00:59:28.300 --> 00:59:30.700
And let's call this thing a person.

00:59:30.700 --> 00:59:34.270
So these new lines copied and pasted
from the slide a moment ago just

00:59:34.270 --> 00:59:36.370
invent the data type called person.

00:59:36.370 --> 00:59:39.010
So the only thing we need
today is the syntax via which

00:59:39.010 --> 00:59:43.610
we can set a person's name and number--
like, how can we access those values.

00:59:43.610 --> 00:59:47.920
So to do this, I'm going to go ahead
and erase the hard-coded arrays

00:59:47.920 --> 00:59:49.250
that I had before.

00:59:49.250 --> 00:59:53.740
And I'm going to give myself one
array of type person called people

00:59:53.740 --> 00:59:55.610
with room for three people.

00:59:55.610 --> 00:59:57.490
So this seems like a bit of a mouthful.

00:59:57.490 --> 00:59:59.560
But the new data type is called person.

00:59:59.560 --> 01:00:02.048
The array name is called people,
which in English is weird.

01:00:02.048 --> 01:00:04.840
I mean, it could call it persons
to make it a little more parallel.

01:00:04.840 --> 01:00:06.520
But we call them people, generally.

01:00:06.520 --> 01:00:09.880
And I want three such
people in my phone book.

01:00:09.880 --> 01:00:12.670
Now, how do I actually
initialize those people?

01:00:12.670 --> 01:00:16.030
Well, previously, I did something
like this-- names, bracket, 0.

01:00:16.030 --> 01:00:18.520
And then I did numbers, bracket, 0.

01:00:18.520 --> 01:00:20.080
Well, it's a similar idea.

01:00:20.080 --> 01:00:22.630
I do people, bracket, 0.

01:00:22.630 --> 01:00:27.490
But if I want to set this person's
name, the one new piece of syntax today

01:00:27.490 --> 01:00:32.290
is a period, a literal dot operator,
that says go inside of this person

01:00:32.290 --> 01:00:36.400
and access their name field
or their name attribute.

01:00:36.400 --> 01:00:39.190
And set it equal to,
quote, unquote, "Carter."

01:00:39.190 --> 01:00:41.800
Then go into that same person.

01:00:41.800 --> 01:00:43.660
But go into their number field.

01:00:43.660 --> 01:00:48.368
And set that equal to
plus 1, 617-495-1000.

01:00:48.368 --> 01:00:50.410
Then-- and I'll just
separate it with white space

01:00:50.410 --> 01:00:53.500
for readability-- go
into people bracket 1.

01:00:53.500 --> 01:00:56.170
Set their name into mine, David.

01:00:56.170 --> 01:00:58.390
Let's go into people bracket 1 number.

01:00:58.390 --> 01:01:02.000
Set that equal to the same, since we're
both available through the directory,

01:01:02.000 --> 01:01:04.720
so 617-495-1000.

01:01:04.720 --> 01:01:10.390
And then lastly, let's go ahead and
do people bracket 2 dot name equals,

01:01:10.390 --> 01:01:15.210
quote, unquote, "John," and then people
bracket 2 dot number equals, quote,

01:01:15.210 --> 01:01:21.646
unquote, "plus 1, 949-468-2750--

01:01:21.646 --> 01:01:23.548
let me fix my dash--

01:01:23.548 --> 01:01:24.850
semicolon.

01:01:24.850 --> 01:01:29.010
And now I think I can mostly keep
the rest of the code the same

01:01:29.010 --> 01:01:31.500
because if I'm searching
for this person's name,

01:01:31.500 --> 01:01:35.160
I think the only thing I want to change
is this because I don't have a names

01:01:35.160 --> 01:01:36.300
array anymore.

01:01:36.300 --> 01:01:40.440
So what should I change this
highlighted phrase to in order

01:01:40.440 --> 01:01:44.930
to search the ith person's name?

01:01:44.930 --> 01:01:45.800
What should this be?

01:01:45.800 --> 01:01:47.870
Yeah?

01:01:47.870 --> 01:01:51.470
People bracket i dash name because
the whole point of this loop

01:01:51.470 --> 01:01:54.330
is to iterate over each of
these people one at a time.

01:01:54.330 --> 01:01:57.950
So people bracket i gives me the ith
person-- first [INAUDIBLE] people 0,

01:01:57.950 --> 01:01:59.060
people 1, people 2.

01:01:59.060 --> 01:02:01.880
But if on each iteration, I
want to check that person's name

01:02:01.880 --> 01:02:04.010
and compare it against
whatever the human typed in,

01:02:04.010 --> 01:02:05.700
now I can do exactly that.

01:02:05.700 --> 01:02:12.350
But I have to change the output
to be people bracket i dot number,

01:02:12.350 --> 01:02:13.560
in this case.

01:02:13.560 --> 01:02:15.838
So I've added a little
bit of complexity.

01:02:15.838 --> 01:02:19.130
And granted, this is not going to be the
way long term you create a phone book.

01:02:19.130 --> 01:02:21.660
Odds are, we're going to get the
phone book with a loop of some sort.

01:02:21.660 --> 01:02:24.702
We're going to use a constant, so I
know how many people I have room for.

01:02:24.702 --> 01:02:27.440
For demonstration sake, I'm just
typing everything out manually.

01:02:27.440 --> 01:02:30.660
But I think logically now,
we've achieved the same thing.

01:02:30.660 --> 01:02:33.050
So let me do make phone
book for this new version--

01:02:33.050 --> 01:02:35.180
no syntax errors-- dot slash phone book.

01:02:35.180 --> 01:02:36.770
Let's search for Carter-- enter.

01:02:36.770 --> 01:02:38.210
And there indeed is his number.

01:02:38.210 --> 01:02:39.710
Let's go ahead and search for David.

01:02:39.710 --> 01:02:40.860
There's my number.

01:02:40.860 --> 01:02:42.470
And lastly, let's search for John--

01:02:42.470 --> 01:02:42.970
his number.

01:02:42.970 --> 01:02:45.387
And again, we'll search for
someone we know is not there--

01:02:45.387 --> 01:02:46.130
Eli.

01:02:46.130 --> 01:02:47.840
And Eli is not found.

01:02:47.840 --> 01:02:52.160
So what we've done is try to solve this
problem of introducing a brand new data

01:02:52.160 --> 01:02:57.560
type that allows us to represent
this custom data that you and I just

01:02:57.560 --> 01:02:58.220
created.

01:02:58.220 --> 01:03:02.100
And that is by using the struct keyword
to cluster these things together

01:03:02.100 --> 01:03:05.960
and the typedef keyword to give it
a brand new name that we might like.

01:03:05.960 --> 01:03:08.330
Now, as an aside, when it
comes to styling your code,

01:03:08.330 --> 01:03:11.430
you'll actually see that style50
and similar tools will actually

01:03:11.430 --> 01:03:14.610
put the name of the data type on the
same line as the closing curly brace,

01:03:14.610 --> 01:03:15.900
just sort of a curiosity.

01:03:15.900 --> 01:03:16.523
That's fine.

01:03:16.523 --> 01:03:18.690
Even though I wrote it the
first way for consistency

01:03:18.690 --> 01:03:23.530
with Scratch and our prior examples,
this is the right way in styling it.

01:03:23.530 --> 01:03:26.470
Any questions now on this data type?

01:03:26.470 --> 01:03:26.970
Yeah?

01:03:29.675 --> 01:03:33.042
AUDIENCE: [INAUDIBLE]

01:03:34.966 --> 01:03:36.980
DAVID MALAN: The
question is, do you have

01:03:36.980 --> 01:03:39.590
to assign both the name and the
number when creating a person?

01:03:39.590 --> 01:03:42.410
Or can you only get away
with assigning one of them?

01:03:42.410 --> 01:03:43.200
You can.

01:03:43.200 --> 01:03:46.200
But that's going to be one of those
so-called garbage values, typically.

01:03:46.200 --> 01:03:48.800
And so there's just going
to be some bogus data there.

01:03:48.800 --> 01:03:53.270
And unless you are so careful as to
never touch that field thereafter,

01:03:53.270 --> 01:03:56.630
you probably run the risk of some kind
of bug, even a crash in your code,

01:03:56.630 --> 01:03:59.330
if you try to access that value
later, even though you've never

01:03:59.330 --> 01:03:59.960
initialized it.

01:03:59.960 --> 01:04:01.100
So yes, it's possible.

01:04:01.100 --> 01:04:02.520
But no, do not do that.

01:04:02.520 --> 01:04:05.020
Other questions?

01:04:05.020 --> 01:04:05.890
No?

01:04:05.890 --> 01:04:08.470
All right, well, now
that we have the ability

01:04:08.470 --> 01:04:11.570
to sort of represent more
interesting structures, up until now,

01:04:11.570 --> 01:04:14.350
we've sort of assumed that in
order to get to binary search,

01:04:14.350 --> 01:04:16.990
we have a phone book that
someone already sorted for us.

01:04:16.990 --> 01:04:19.540
For our second demonstration
with our volunteers,

01:04:19.540 --> 01:04:21.820
we assumed that someone,
Carter in that case,

01:04:21.820 --> 01:04:24.010
had already sorted the
information for us.

01:04:24.010 --> 01:04:26.050
It was proposed by the
audience that, well,

01:04:26.050 --> 01:04:29.770
what if we sort the information
first and then go find the number 50?

01:04:29.770 --> 01:04:33.580
That invited the question even early
on-- well, how expensive is it to sort?

01:04:33.580 --> 01:04:38.170
How much time and money and inefficiency
do Google and Microsoft and others

01:04:38.170 --> 01:04:41.590
spend to keep their data
and our data sorted?

01:04:41.590 --> 01:04:43.840
Well, let's consider what
the problem really is.

01:04:43.840 --> 01:04:47.380
If this is how we represent
any problem, the unsorted data

01:04:47.380 --> 01:04:50.770
that we might consume by typing
things in randomly to a search engine

01:04:50.770 --> 01:04:54.430
or crawling the internet or just adding
context in any old order to our phone

01:04:54.430 --> 01:04:55.840
is arguably unsorted.

01:04:55.840 --> 01:04:58.060
It's certainly not
alphabetically sorted by default.

01:04:58.060 --> 01:04:59.650
But we want to get it sorted.

01:04:59.650 --> 01:05:02.860
And so somewhere in here in this
black box, we need a set of algorithms

01:05:02.860 --> 01:05:05.130
for actually sorting
information as well.

01:05:05.130 --> 01:05:09.080
For instance, if we have
these integers here unsorted--

01:05:09.080 --> 01:05:14.130
72541603-- effectively random,
the problem of sorting, of course,

01:05:14.130 --> 01:05:18.050
is to turn it into 01234567 instead.

01:05:18.050 --> 01:05:20.100
And there's a bunch of
different ways to do this.

01:05:20.100 --> 01:05:22.730
But before we do that, I think it's
probably time for some brownies.

01:05:22.730 --> 01:05:24.605
So let's go ahead and
take a 10-minute break.

01:05:24.605 --> 01:05:26.570
And we'll see you in 10.

01:05:26.570 --> 01:05:28.040
All right, we are back.

01:05:28.040 --> 01:05:30.470
And where we left off
was this cliffhanger.

01:05:30.470 --> 01:05:32.060
We've got some unsorted numbers.

01:05:32.060 --> 01:05:33.380
We want to make them sorted.

01:05:33.380 --> 01:05:34.430
How do we do this?

01:05:34.430 --> 01:05:37.250
And at the risk of one
too many volunteers,

01:05:37.250 --> 01:05:39.680
could we get eight more volunteers?

01:05:39.680 --> 01:05:41.840
Wow, OK, overwhelming.

01:05:41.840 --> 01:05:45.110
OK, how about 1, 2, 3, 4.

01:05:45.110 --> 01:05:46.250
How about all three of you?

01:05:46.250 --> 01:05:51.290
OK, 5, 6, 7--

01:05:51.290 --> 01:05:51.950
OK, 8.

01:05:51.950 --> 01:05:52.460
Come on.

01:05:52.460 --> 01:05:54.120
All right, one from each section--

01:05:54.120 --> 01:05:56.244
all right, come on down.

01:05:56.244 --> 01:06:00.073
[INTERPOSING VOICES]

01:06:00.073 --> 01:06:01.615
DAVID MALAN: All right, come on down.

01:06:05.410 --> 01:06:06.100
Thank you.

01:06:06.100 --> 01:06:10.695
And if you all could grab a number here.

01:06:10.695 --> 01:06:11.320
So you'll be 7.

01:06:11.320 --> 01:06:13.220
Stand on the left there if you could.

01:06:13.220 --> 01:06:14.470
All right, you'll be number 2.

01:06:14.470 --> 01:06:17.590
Stand to his left.

01:06:17.590 --> 01:06:20.020
OK, keep coming.

01:06:20.020 --> 01:06:21.910
OK, yeah.

01:06:21.910 --> 01:06:24.380
OK, here we go.

01:06:24.380 --> 01:06:26.530
There you go.

01:06:26.530 --> 01:06:28.210
OK, [INAUDIBLE] 0 and 3.

01:06:28.210 --> 01:06:31.810
OK, so let's just make sure
they match what we've got there.

01:06:31.810 --> 01:06:32.810
Good so far.

01:06:32.810 --> 01:06:36.313
OK, so we have eight volunteers here,
an array of volunteers, if you would.

01:06:36.313 --> 01:06:38.230
This time, we've used
eight rather than seven.

01:06:38.230 --> 01:06:41.840
We deliberately had seven lockers
just so that the divide by 2 math

01:06:41.840 --> 01:06:42.340
worked out.

01:06:42.340 --> 01:06:44.890
That was very deliberate that there
was always a [INAUDIBLE] a middle,

01:06:44.890 --> 01:06:45.682
a middle, a middle.

01:06:45.682 --> 01:06:48.682
In this case, it doesn't matter because
we're going to focus on sorting.

01:06:48.682 --> 01:06:51.130
But first, how about some
introductions from each group?

01:06:51.130 --> 01:06:52.520
AUDIENCE: I'm Rebecca.

01:06:52.520 --> 01:06:53.962
I'm a first year in Pennypacker.

01:06:53.962 --> 01:06:56.170
And I'm thinking about
studying environmental science

01:06:56.170 --> 01:06:57.410
and public policy.

01:06:57.410 --> 01:06:57.910
[CHEERING]

01:06:57.910 --> 01:06:59.240
DAVID MALAN: Nice.

01:06:59.240 --> 01:07:00.100
[APPLAUSE]

01:07:00.100 --> 01:07:02.420
AUDIENCE: I'm [? Mariella. ?]
I'm also in Pennypacker.

01:07:02.420 --> 01:07:03.170
I'm a first year.

01:07:03.170 --> 01:07:05.020
And I'm thinking of
studying bioengineering.

01:07:05.020 --> 01:07:06.565
DAVID MALAN: Wonderful.

01:07:06.565 --> 01:07:09.440
AUDIENCE: My name's [? Haron ?]
[? Li. ?] I'm a freshman in Matthews.

01:07:09.440 --> 01:07:11.015
I'm planning on studying
applied mathematics.

01:07:11.015 --> 01:07:12.290
DAVID MALAN: Nice-- Matthews.

01:07:12.290 --> 01:07:13.910
AUDIENCE: My name is Emily.

01:07:13.910 --> 01:07:15.140
I'm a first year in Canada.

01:07:15.140 --> 01:07:17.470
And I'm still deciding what to study.

01:07:17.470 --> 01:07:18.220
DAVID MALAN: Fair.

01:07:18.220 --> 01:07:21.020
AUDIENCE: My name's [? Tanai. ?]
I'm a first year from Toronto.

01:07:21.020 --> 01:07:23.315
And I'm planning on
studying ECON and CS.

01:07:23.315 --> 01:07:24.110
DAVID MALAN: Nice.

01:07:24.110 --> 01:07:26.030
AUDIENCE: My name is Teddy.

01:07:26.030 --> 01:07:27.320
I'm a first year in Hurlbut.

01:07:27.320 --> 01:07:30.320
And I'm planning on concentrating in
computer science with linguistics.

01:07:30.320 --> 01:07:31.070
DAVID MALAN: Nice.

01:07:31.070 --> 01:07:32.000
AUDIENCE: Yeah!

01:07:32.000 --> 01:07:32.750
DAVID MALAN: Nice.

01:07:32.750 --> 01:07:33.690
[APPLAUSE]

01:07:33.690 --> 01:07:36.470
AUDIENCE: My name's [? Lenny. ?]
I'm a first year in Matthews.

01:07:36.470 --> 01:07:39.300
And I'm planning on
concentrating in gov and CS.

01:07:39.300 --> 01:07:40.220
DAVID MALAN: Ah, nice.

01:07:40.220 --> 01:07:40.565
[CHEERING]

01:07:40.565 --> 01:07:41.037
[APPLAUSE]

01:07:41.037 --> 01:07:41.537
And?

01:07:41.537 --> 01:07:42.740
AUDIENCE: My name is Eli.

01:07:42.740 --> 01:07:44.023
I'm a first year in Hollis.

01:07:44.023 --> 01:07:45.440
And I plan on concentrating in CS.

01:07:45.440 --> 01:07:47.540
DAVID MALAN: Eli, we keep
looking for you today.

01:07:47.540 --> 01:07:49.962
OK, so if you guys could scooch
a little bit this way just

01:07:49.962 --> 01:07:51.170
to be a little more centered.

01:07:51.170 --> 01:07:54.750
Notice that this array of
volunteers is entirely unsorted.

01:07:54.750 --> 01:07:56.960
So we thought we'd do
three passes at this.

01:07:56.960 --> 01:08:00.135
Could you all sort yourselves
from smallest to largest?

01:08:00.135 --> 01:08:00.635
Go.

01:08:06.420 --> 01:08:08.190
All right, that was very good.

01:08:08.190 --> 01:08:09.420
So yes, so, sure.

01:08:09.420 --> 01:08:10.003
OK.

01:08:10.003 --> 01:08:10.503
[CHEERING]

01:08:10.503 --> 01:08:12.920
[APPLAUSE]

01:08:12.920 --> 01:08:14.420
And let's see.

01:08:14.420 --> 01:08:17.603
What was your algorithm?

01:08:17.603 --> 01:08:19.520
AUDIENCE: I just kind
of found the person that

01:08:19.520 --> 01:08:21.260
was, like, one lower or one higher.

01:08:21.260 --> 01:08:22.560
So I looked at him.

01:08:22.560 --> 01:08:23.060
He had 2.

01:08:23.060 --> 01:08:24.500
So I knew I had to be on his left.

01:08:24.500 --> 01:08:25.729
And then she had 3.

01:08:25.729 --> 01:08:27.410
So I told her to come to my right.

01:08:27.410 --> 01:08:28.467
And then I knew he had 5.

01:08:28.467 --> 01:08:30.050
DAVID MALAN: OK, nice-- pretty clever.

01:08:30.050 --> 01:08:31.640
And how about your algorithm?

01:08:31.640 --> 01:08:34.935
AUDIENCE: Same thing-- looked for
the number that was lower and higher

01:08:34.935 --> 01:08:35.810
and found the middle.

01:08:35.810 --> 01:08:38.029
DAVID MALAN: OK, interesting, because
I think from the outside view,

01:08:38.029 --> 01:08:39.638
it all seemed very organic.

01:08:39.638 --> 01:08:42.180
And things just kind of worked
themselves out, which is fine.

01:08:42.180 --> 01:08:44.930
But I dare say what you guys did
was probably a little hard for me

01:08:44.930 --> 01:08:46.410
to translate into code.

01:08:46.410 --> 01:08:48.560
So let me propose that we
take two passes at this.

01:08:48.560 --> 01:08:51.380
If you could reset yourselves
to be in that order from left

01:08:51.380 --> 01:08:54.920
to right, which is just the exact
same sequence, just so that we're

01:08:54.920 --> 01:08:58.310
starting from the same point each time.

01:08:58.310 --> 01:08:59.630
[INAUDIBLE]

01:08:59.630 --> 01:09:00.319
Very good.

01:09:00.319 --> 01:09:02.120
All right, so let me propose this.

01:09:02.120 --> 01:09:04.470
We can approach sorting in
a couple of different ways.

01:09:04.470 --> 01:09:05.720
But it needs to be methodical.

01:09:05.720 --> 01:09:09.600
Like, it needs to translate ideally
to pseudocode and eventually code.

01:09:09.600 --> 01:09:12.810
So as much as we can quantize
things to be step by step,

01:09:12.810 --> 01:09:15.100
I think the better this
will scale overall,

01:09:15.100 --> 01:09:19.020
especially when there's not eight people
but maybe there's 80 people or 800.

01:09:19.020 --> 01:09:22.020
Because I dare say that if we had
everyone in the room sort themselves--

01:09:22.020 --> 01:09:23.580
if they were all handling a number--

01:09:23.580 --> 01:09:26.062
like, that probably wouldn't
have worked out very well.

01:09:26.062 --> 01:09:29.229
It probably would have taken forever
because there's just so much more data.

01:09:29.229 --> 01:09:30.700
So let's be a little more methodical.

01:09:30.700 --> 01:09:33.283
So for instance, I don't have a
bird's eye view of the numbers

01:09:33.283 --> 01:09:35.130
just as before because
even though we don't

01:09:35.130 --> 01:09:38.460
have doors in front of our volunteers,
they're effectively lockers too.

01:09:38.460 --> 01:09:42.040
And the computer or the human in my case
can only look at one door at a time.

01:09:42.040 --> 01:09:44.880
So if I want to find the
smallest number and put it

01:09:44.880 --> 01:09:46.830
where it should go on
the left, I can't just

01:09:46.830 --> 01:09:49.500
take a step back and be like,
OK, obviously, there's the one.

01:09:49.500 --> 01:09:51.010
I have to do it more methodically.

01:09:51.010 --> 01:09:52.350
So I'm going to check here--

01:09:52.350 --> 01:09:52.890
7.

01:09:52.890 --> 01:09:55.320
At the moment, this is actually
the smallest number I've seen.

01:09:55.320 --> 01:09:56.945
So I'm going to make mental note of it.

01:09:56.945 --> 01:09:58.230
OK, 2 I see now.

01:09:58.230 --> 01:10:01.140
I can forget about the 7
because 2 is clearly smaller.

01:10:01.140 --> 01:10:02.850
5-- I don't need to remember it.

01:10:02.850 --> 01:10:04.200
4-- I don't need to remember it.

01:10:04.200 --> 01:10:06.330
1-- OK, that's clearly smaller.

01:10:06.330 --> 01:10:08.560
Have I found my smallest number?

01:10:08.560 --> 01:10:10.337
Not even because there actually is a 0.

01:10:10.337 --> 01:10:11.920
So I should go through the whole list.

01:10:11.920 --> 01:10:14.905
But I will remember that my
smallest element is now 1.

01:10:14.905 --> 01:10:18.190
6 is not smaller-- oh, 0 is
smaller, so I'll remember this.

01:10:18.190 --> 01:10:19.400
3 is not smaller.

01:10:19.400 --> 01:10:21.790
And so now, our volunteer
for 0-- what was your name?

01:10:21.790 --> 01:10:22.707
AUDIENCE: [INAUDIBLE].

01:10:22.707 --> 01:10:26.120
DAVID MALAN: Mariana, and where
should we put you clearly?

01:10:26.120 --> 01:10:26.665
So there.

01:10:26.665 --> 01:10:27.790
But what's your name again?

01:10:27.790 --> 01:10:28.665
AUDIENCE: [INAUDIBLE]

01:10:28.665 --> 01:10:30.010
DAVID MALAN: Eli's in the way.

01:10:30.010 --> 01:10:31.745
So we could have Mary Ellen?

01:10:31.745 --> 01:10:32.620
AUDIENCE: [INAUDIBLE]

01:10:32.620 --> 01:10:35.860
DAVID MALAN: [? Mariella ?]
just go to the right of Eli.

01:10:35.860 --> 01:10:37.390
But that's kind of cheating, right?

01:10:37.390 --> 01:10:40.570
Because if this is an array,
recall that they are contiguous.

01:10:40.570 --> 01:10:43.578
But there could be other stuff in
memory to the left and to the right.

01:10:43.578 --> 01:10:44.620
So that's not quite fair.

01:10:44.620 --> 01:10:47.110
We can't just have the luxury of
putting things wherever we want.

01:10:47.110 --> 01:10:49.120
But Eli, you're not even
in the right order, anyway.

01:10:49.120 --> 01:10:50.620
So why don't we just swap you two.

01:10:50.620 --> 01:10:52.720
So [? Mariella ?] and Eli swap.

01:10:52.720 --> 01:10:55.780
But now I've taken one
bite out of this problem.

01:10:55.780 --> 01:10:57.910
I've coincidentally
made it a little better

01:10:57.910 --> 01:11:00.760
by moving Eli closer to where he is.

01:11:00.760 --> 01:11:01.510
But you know what?

01:11:01.510 --> 01:11:02.830
He was in a random location, anyway.

01:11:02.830 --> 01:11:05.372
So I don't think I made the
problem any worse, fundamentally.

01:11:05.372 --> 01:11:06.820
Now I can do this again.

01:11:06.820 --> 01:11:08.870
And I can shave a little
bit of time off of it

01:11:08.870 --> 01:11:11.600
because I don't need to revisit
[? Mariella ?] because if she

01:11:11.600 --> 01:11:14.300
was the smaller and I've already
selected her from the array,

01:11:14.300 --> 01:11:17.600
I can just move on and take one
fewer bites this time around.

01:11:17.600 --> 01:11:20.930
So 2 is the smallest
number, not 5, not 4.

01:11:20.930 --> 01:11:22.620
OK, 1 is the smallest number.

01:11:22.620 --> 01:11:27.320
So I'm going to remember that as
with a mental variable- 6, 7, 3.

01:11:27.320 --> 01:11:28.430
OK, 1 is the smallest.

01:11:28.430 --> 01:11:29.930
So let me select number 1.

01:11:29.930 --> 01:11:32.270
And we're going to have
to evict you, which

01:11:32.270 --> 01:11:34.080
is making the problem slightly worse.

01:11:34.080 --> 01:11:35.780
But I think it'll average out.

01:11:35.780 --> 01:11:38.490
And now 0 and 1 are in the right place.

01:11:38.490 --> 01:11:40.160
So now let me do this again but faster.

01:11:40.160 --> 01:11:42.230
So 5-- OK, 4--

01:11:42.230 --> 01:11:43.340
2 is the smallest--

01:11:43.340 --> 01:11:44.630
6, 7, 3.

01:11:44.630 --> 01:11:48.110
OK, 2, let's put you where
you belong, evicting 5.

01:11:48.110 --> 01:11:50.900
Now I can skip all three
of these volunteers.

01:11:50.900 --> 01:11:52.080
OK, 4, is the smallest.

01:11:52.080 --> 01:11:52.580
Nope.

01:11:52.580 --> 01:11:52.910
Nope.

01:11:52.910 --> 01:11:53.120
Nope.

01:11:53.120 --> 01:11:54.200
3, you're the smallest.

01:11:54.200 --> 01:11:55.880
Let me evict 4.

01:11:55.880 --> 01:11:59.330
All right, and now let me move
in front of these volunteers.

01:11:59.330 --> 01:12:02.250
5, 6, 7, 4, come on back.

01:12:02.250 --> 01:12:04.910
All right, and now let's select 6, 7, 5.

01:12:04.910 --> 01:12:07.140
OK, come on back.

01:12:07.140 --> 01:12:08.340
Oh, no, no cheating, Eli.

01:12:08.340 --> 01:12:09.630
OK, and then let's see.

01:12:09.630 --> 01:12:13.020
5, 7, 6-- OK, 6, come on back.

01:12:13.020 --> 01:12:14.250
OK, now you have to move.

01:12:14.250 --> 01:12:18.847
OK, and now we've selected in turn
all of the numbers from left to right.

01:12:18.847 --> 01:12:20.430
And even though that did feel slower--

01:12:20.430 --> 01:12:22.890
I was doing it a little more
verbally as I stepped through it--

01:12:22.890 --> 01:12:25.510
but I dare say we could translate
that probably more to code.

01:12:25.510 --> 01:12:26.010
Why?

01:12:26.010 --> 01:12:27.690
Because there's a lot of,
like, looping through it

01:12:27.690 --> 01:12:30.270
and probably a lot of conditionals
just asking the question,

01:12:30.270 --> 01:12:34.800
is this number smaller than this one or
conversely greater than this other one.

01:12:34.800 --> 01:12:36.520
All right, let's do this one more time.

01:12:36.520 --> 01:12:41.010
If you guys could reset
yourselves to this ordering.

01:12:41.010 --> 01:12:46.720
All right, so we again have 72541603.

01:12:46.720 --> 01:12:47.220
Good.

01:12:47.220 --> 01:12:48.480
So how about this?

01:12:48.480 --> 01:12:50.550
I liked the intuition
that they originally

01:12:50.550 --> 01:12:53.520
had, funny enough, whereby
they just kind of organically

01:12:53.520 --> 01:12:55.570
looked to the person to
the left and to the right

01:12:55.570 --> 01:12:56.820
and kind of fixed the problem.

01:12:56.820 --> 01:12:59.970
So if they were out of order, they
just kind of swapped locally adjacent

01:12:59.970 --> 01:13:00.610
to each other.

01:13:00.610 --> 01:13:01.500
So let's try this.

01:13:01.500 --> 01:13:03.670
So 7 and 2, you're clearly out of order.

01:13:03.670 --> 01:13:05.340
So let's swap just you two.

01:13:05.340 --> 01:13:06.870
7 and 5, you're out of order.

01:13:06.870 --> 01:13:08.190
Let's swap you two.

01:13:08.190 --> 01:13:10.290
7 and 4, let's swap you two.

01:13:10.290 --> 01:13:12.390
7 and 1, let's swap you two.

01:13:12.390 --> 01:13:15.360
7 and 6, let's swap you two.

01:13:15.360 --> 01:13:17.220
7 and 0, swap you two.

01:13:17.220 --> 01:13:18.900
7 and 3, swap you two.

01:13:18.900 --> 01:13:20.620
OK, that was a lot of swapping.

01:13:20.620 --> 01:13:22.560
But notice what happened.

01:13:22.560 --> 01:13:23.790
I'm not done, clearly.

01:13:23.790 --> 01:13:24.780
It's not sorted yet.

01:13:24.780 --> 01:13:26.230
But I have improved the situation.

01:13:26.230 --> 01:13:26.730
How?

01:13:26.730 --> 01:13:29.910
Who is now definitely
in the right place?

01:13:29.910 --> 01:13:34.510
So, 7-- or Eli has wonderfully bubbled
all the way up to the top of the list,

01:13:34.510 --> 01:13:35.170
so to speak.

01:13:35.170 --> 01:13:37.030
Now I can actually skip
him moving forward.

01:13:37.030 --> 01:13:38.820
So that takes one bite
out of the problem.

01:13:38.820 --> 01:13:39.653
Let's do this again.

01:13:39.653 --> 01:13:40.440
2 and 5 are OK.

01:13:40.440 --> 01:13:42.285
5 and 4, let's swap.

01:13:42.285 --> 01:13:43.020
No, over there.

01:13:43.020 --> 01:13:45.150
[LAUGHS] Thank you.

01:13:45.150 --> 01:13:46.950
5 and 1, let's swap.

01:13:46.950 --> 01:13:48.420
5 and 6 are good.

01:13:48.420 --> 01:13:50.220
6 and 0, let's swap.

01:13:50.220 --> 01:13:52.158
6 and 3, let's swap.

01:13:52.158 --> 01:13:53.700
And we don't have to worry about Eli.

01:13:53.700 --> 01:13:54.720
And now, what's your name again?

01:13:54.720 --> 01:13:54.960
AUDIENCE: [? Haron. ?]

01:13:54.960 --> 01:13:57.700
DAVID MALAN: [? Haron-- ?] we don't
have to worry about him either as well.

01:13:57.700 --> 01:13:59.283
So now I can go back to the beginning.

01:13:59.283 --> 01:14:02.460
And even though it feels like
I'm going back and forth a lot,

01:14:02.460 --> 01:14:05.257
that's OK because the problem's
still getting better and better.

01:14:05.257 --> 01:14:06.840
I'm taking a bite out of it each time.

01:14:06.840 --> 01:14:07.770
2 and 4 are good.

01:14:07.770 --> 01:14:10.140
4 and 1, let's swap.

01:14:10.140 --> 01:14:11.130
4 and 5 are good.

01:14:11.130 --> 01:14:12.420
5 and 0, swap.

01:14:12.420 --> 01:14:13.950
5 and 3, swap.

01:14:13.950 --> 01:14:15.750
And now these three
are in the right place.

01:14:15.750 --> 01:14:16.500
Let's do it again.

01:14:16.500 --> 01:14:17.670
2 and 1-- ah, here we go.

01:14:17.670 --> 01:14:18.330
Let's swap.

01:14:18.330 --> 01:14:19.200
2 and 4 are OK.

01:14:19.200 --> 01:14:20.670
4 and 0, swap.

01:14:20.670 --> 01:14:22.110
4 and 3, swap.

01:14:22.110 --> 01:14:23.820
Now, these four are OK.

01:14:23.820 --> 01:14:24.990
1 and 2, you're good.

01:14:24.990 --> 01:14:26.280
2 and 0, swap.

01:14:26.280 --> 01:14:27.840
2 and 3, you're good.

01:14:27.840 --> 01:14:28.530
You're good.

01:14:28.530 --> 01:14:30.790
OK, now 1 and 0, swap--

01:14:30.790 --> 01:14:33.190
1 and 2 and now 0 and 1.

01:14:33.190 --> 01:14:36.027
A round of applause if we
could for our volunteers.

01:14:36.027 --> 01:14:37.338
[APPLAUSE]

01:14:37.338 --> 01:14:38.622
[CHEERING]

01:14:38.622 --> 01:14:40.830
So if you want to put your
numbers on the tray there,

01:14:40.830 --> 01:14:44.820
we have some lovely Super
Mario Oreos today, which

01:14:44.820 --> 01:14:46.650
maybe drives a lot of the volunteerism.

01:14:46.650 --> 01:14:48.930
But here we go.

01:14:48.930 --> 01:14:51.180
Thank you all so much.

01:14:51.180 --> 01:14:54.060
And maybe, Carter, if you
can help with the reset?

01:14:54.060 --> 01:14:56.020
All right, here we go.

01:14:56.020 --> 01:14:57.840
[INAUDIBLE] yes, all set.

01:14:57.840 --> 01:14:58.740
Thank you very much.

01:14:58.740 --> 01:14:59.430
Thank you.

01:14:59.430 --> 01:15:02.340
Thank you, guys.

01:15:02.340 --> 01:15:04.290
Thank you, yes.

01:15:04.290 --> 01:15:08.880
To recap, let's actually formalize
a little more algorithmically

01:15:08.880 --> 01:15:09.573
what we did.

01:15:09.573 --> 01:15:11.490
And I deliberately kind
of orchestrated things

01:15:11.490 --> 01:15:14.280
there to show two fundamentally
different approaches, one

01:15:14.280 --> 01:15:18.210
where I kind of selected the
element I wanted again and again

01:15:18.210 --> 01:15:19.770
on the basis of how small it was.

01:15:19.770 --> 01:15:22.210
I looked for the smallest, then
the next smallest, and so forth.

01:15:22.210 --> 01:15:24.335
The second time around, I
took a different approach

01:15:24.335 --> 01:15:26.010
by just fixing local problems.

01:15:26.010 --> 01:15:28.710
But I did it again and
again and again until I

01:15:28.710 --> 01:15:30.408
fixed all of the minor problems.

01:15:30.408 --> 01:15:32.700
And frankly, what they did
organically at the beginning

01:15:32.700 --> 01:15:36.167
was probably closer to the
second algorithm than the first,

01:15:36.167 --> 01:15:39.250
even though I'm not sure they would
write down the same pseudocode for it.

01:15:39.250 --> 01:15:43.510
The first algorithm we executed
is actually called selection sort.

01:15:43.510 --> 01:15:46.060
And I deliberately used
that vernacular of selecting

01:15:46.060 --> 01:15:50.590
the smallest element again and again
to evoke this name of the algorithm.

01:15:50.590 --> 01:15:54.460
So for instance, when we had these
numbers here initially unsorted,

01:15:54.460 --> 01:15:58.900
I kept looking again and again and
again for the smallest element.

01:15:58.900 --> 01:16:02.380
And I don't know a priori what
the smallest number is until I

01:16:02.380 --> 01:16:04.510
go through the list at least once.

01:16:04.510 --> 01:16:06.190
Then I can pluck out the 0.

01:16:06.190 --> 01:16:10.360
I then go through the list a second
time to pluck out the 1, a third time

01:16:10.360 --> 01:16:11.560
to pluck out the 2.

01:16:11.560 --> 01:16:14.380
Now, this assumes that I don't
have an infinite amount of memory

01:16:14.380 --> 01:16:17.920
because even though I kept repeating
myself looking for the next smallest

01:16:17.920 --> 01:16:20.800
element, next smallest
element, I propose that I only

01:16:20.800 --> 01:16:25.060
keep track of one number at a time,
one variable in my mind, which

01:16:25.060 --> 01:16:28.330
was the smallest element
I have seen thus far.

01:16:28.330 --> 01:16:32.080
If I used more memory, I could
probably remember from the first pass

01:16:32.080 --> 01:16:33.580
where the 2 is, where the 3 is.

01:16:33.580 --> 01:16:34.955
But that's a different algorithm.

01:16:34.955 --> 01:16:37.180
And it would take more
space, more memory.

01:16:37.180 --> 01:16:41.390
I was confining myself to just
one variable in my approach.

01:16:41.390 --> 01:16:44.800
So here might be the pseudocode
for what we'd call selection

01:16:44.800 --> 01:16:48.490
sort, the very first algorithm that
we all did together, not organically,

01:16:48.490 --> 01:16:50.350
but more methodically as a group.

01:16:50.350 --> 01:16:54.220
I would propose that we use some syntax
from C when we talk about the loop

01:16:54.220 --> 01:16:57.790
and say for i from 0 to n minus 1.

01:16:57.790 --> 01:16:58.600
Now, why is that?

01:16:58.600 --> 01:17:02.980
Well, if there were n
people or 8, 0 through 7

01:17:02.980 --> 01:17:07.040
are the indexes or indices of those
humans on stage from left to right.

01:17:07.040 --> 01:17:08.620
So what did I have myself do?

01:17:08.620 --> 01:17:15.190
Find the smallest number between
numbers bracket i and numbers bracket

01:17:15.190 --> 01:17:16.390
n minus 1.

01:17:16.390 --> 01:17:20.710
So when the loop starts
at 0, this is literally

01:17:20.710 --> 01:17:25.750
saying between numbers bracket
0 and numbers bracket n minus 1.

01:17:25.750 --> 01:17:30.490
So from the far left to the far
right, find me the smallest number.

01:17:30.490 --> 01:17:34.300
Then swap that number
with numbers bracket i.

01:17:34.300 --> 01:17:36.880
So that's why we evicted
the person all the way

01:17:36.880 --> 01:17:39.970
over to the right or your
left the very first time,

01:17:39.970 --> 01:17:42.650
and then the next person, and
the next person, and so forth.

01:17:42.650 --> 01:17:46.990
So this is an algorithm that has us
go back and forth, back and forth,

01:17:46.990 --> 01:17:49.250
iteratively selecting
the smallest element.

01:17:49.250 --> 01:17:53.980
So if we generalize this now, not away
from eight people to, like, n people,

01:17:53.980 --> 01:17:56.800
you can think of them as
representing an array, a.k.a.

01:17:56.800 --> 01:18:01.180
doors like this where the leftmost one
is 0, the rightmost one is n minus 1,

01:18:01.180 --> 01:18:04.540
second to last is n minus 2, and
so forth if we don't know or care

01:18:04.540 --> 01:18:06.250
what n specifically is.

01:18:06.250 --> 01:18:11.980
So how many total steps does
selection sort perhaps take?

01:18:11.980 --> 01:18:13.930
And let's make this a
little more real here.

01:18:13.930 --> 01:18:17.990
Let me actually open up, for
instance, a quick visualization here.

01:18:17.990 --> 01:18:19.930
And on the screen here,
you'll just see now

01:18:19.930 --> 01:18:24.070
an artist's rendition of an array
of values whereby tall purple bars

01:18:24.070 --> 01:18:25.390
represent big integers.

01:18:25.390 --> 01:18:28.310
And short purple bars
represent small integers.

01:18:28.310 --> 01:18:31.690
So the idea of any sorting
algorithm here as visualized

01:18:31.690 --> 01:18:34.600
is to get the small bars on the
left and the big bars on the right.

01:18:34.600 --> 01:18:36.550
And this is a nice handy tool
that lets you play around

01:18:36.550 --> 01:18:37.700
with different algorithms.

01:18:37.700 --> 01:18:39.970
So here is selection sort
that I've just clicked.

01:18:39.970 --> 01:18:42.760
In pink again and
again is the equivalent

01:18:42.760 --> 01:18:46.210
of me walking through the volunteers
looking for the next smallest element.

01:18:46.210 --> 01:18:50.260
And as soon as I found them, I swapped
them into their leftmost location,

01:18:50.260 --> 01:18:54.250
evicting whoever's there
in order to gradually sort

01:18:54.250 --> 01:18:56.480
this list from left to right.

01:18:56.480 --> 01:19:00.400
And so as you can see here,
it holds very briefly in pink

01:19:00.400 --> 01:19:03.250
whatever the currently smallest
element it has found is.

01:19:03.250 --> 01:19:05.830
That's the analog of, like,
me pointing to my head

01:19:05.830 --> 01:19:09.940
whereby it's constantly
comparing, comparing, comparing,

01:19:09.940 --> 01:19:12.423
looking for the next smallest element.

01:19:12.423 --> 01:19:15.340
Now, I'm kind of stalling because
I'm running out of intelligent words

01:19:15.340 --> 01:19:15.840
to say.

01:19:15.840 --> 01:19:18.190
But that is to say, this
algorithm feels kind of slow.

01:19:18.190 --> 01:19:20.590
Like, it seems to be
doing a lot of work.

01:19:20.590 --> 01:19:23.170
Where is the work coming in?

01:19:23.170 --> 01:19:26.060
Like, what is it doing
a lot of specifically?

01:19:26.060 --> 01:19:27.118
Yeah?

01:19:27.118 --> 01:19:29.410
Yeah, it keeps going back
and forth and back and forth.

01:19:29.410 --> 01:19:31.090
And even though it's shaving
a little bit of time,

01:19:31.090 --> 01:19:33.670
right, because it doesn't have
to stupidly go all the way back

01:19:33.670 --> 01:19:35.920
to the beginning, and I was
saving myself a few steps,

01:19:35.920 --> 01:19:38.680
like, it's a lot of cyclicity
again and again and again.

01:19:38.680 --> 01:19:41.500
And put another way, it's a lot
of comparisons again and again.

01:19:41.500 --> 01:19:44.560
And some of those comparisons
you're doing multiple times

01:19:44.560 --> 01:19:47.890
because I only remembered one
element at a time in my head.

01:19:47.890 --> 01:19:50.950
So I have to kind of remind
myself on every pass which

01:19:50.950 --> 01:19:52.640
is smallest, which is smallest.

01:19:52.640 --> 01:19:57.760
So this invites the question how
fast or how slow or equivalently

01:19:57.760 --> 01:20:02.350
how efficient or inefficient
is something like bubble--

01:20:02.350 --> 01:20:03.400
selection sort.

01:20:03.400 --> 01:20:07.790
Well, let's actually consider how
we could analyze this as follows.

01:20:07.790 --> 01:20:12.790
So if we have n numbers
that we want to sort,

01:20:12.790 --> 01:20:15.730
how many comparisons do
we do the first time?

01:20:15.730 --> 01:20:19.280
Well, if there's n numbers, you can
only make n minus 1 comparisons.

01:20:19.280 --> 01:20:19.780
Why?

01:20:19.780 --> 01:20:21.550
Because if we have
eight people here and we

01:20:21.550 --> 01:20:23.440
started with whoever's
all the way over here,

01:20:23.440 --> 01:20:27.890
we compare this person against
seven others-- n minus 1 if n is 8.

01:20:27.890 --> 01:20:32.000
So the first pass through the
list, I made n minus 1 comparisons.

01:20:32.000 --> 01:20:34.950
But that put the smallest
number 0 in place.

01:20:34.950 --> 01:20:37.547
The second time I walked
across our eight volunteers,

01:20:37.547 --> 01:20:39.380
I didn't need to walk
in front of all eight.

01:20:39.380 --> 01:20:41.630
I could shave off a little
bit and do seven of them,

01:20:41.630 --> 01:20:43.460
then six, then five, then four.

01:20:43.460 --> 01:20:45.710
So if I were to write this
out roughly mathematically,

01:20:45.710 --> 01:20:50.630
I could do this-- n minus 1 plus
n minus 2 plus n minus 3 plus

01:20:50.630 --> 01:20:54.020
dot, dot, dot, all the way
down to my very last comparison

01:20:54.020 --> 01:20:55.260
at the end of the list.

01:20:55.260 --> 01:20:57.320
Now, this is the kind of thing
that typically in high school

01:20:57.320 --> 01:20:59.612
you'd look at the back of
the math book or physics book

01:20:59.612 --> 01:21:02.960
that's got a little cheat sheet for
all of the formulas that add up.

01:21:02.960 --> 01:21:07.160
This series here, let me just
stipulate, adds up to this-- n times

01:21:07.160 --> 01:21:09.260
n minus 1 divided by 2.

01:21:09.260 --> 01:21:12.770
So no matter what n is, this
formula captures that series,

01:21:12.770 --> 01:21:14.760
that summation of all of those values.

01:21:14.760 --> 01:21:18.830
So that is how many steps
I took again and again

01:21:18.830 --> 01:21:22.070
and again when implementing
selection sort for eight people.

01:21:22.070 --> 01:21:24.000
So of course, let's multiply this out.

01:21:24.000 --> 01:21:27.013
So this is like n squared
minus n all divided by 2.

01:21:27.013 --> 01:21:28.430
Let's do it out a little bit more.

01:21:28.430 --> 01:21:31.610
That's n squared divided
by 2 minus n over 2.

01:21:31.610 --> 01:21:34.913
And now we're back into the
territory of running times.

01:21:34.913 --> 01:21:36.830
Like, how many steps
does this algorithm take?

01:21:36.830 --> 01:21:39.080
How many comparisons are we making?

01:21:39.080 --> 01:21:42.950
Now, n squared seems
like the biggest term.

01:21:42.950 --> 01:21:44.250
It's the dominant term.

01:21:44.250 --> 01:21:48.110
In other words, as n gets
large-- not eight but 80 or 800

01:21:48.110 --> 01:21:53.000
or 8,000 or 8 million-- squaring that
is going to make a way bigger difference

01:21:53.000 --> 01:21:55.880
than just doing, like, n divided
by 2 and subtracting that off.

01:21:55.880 --> 01:21:59.300
Similarly, just dividing even
this quadratic formula by 2,

01:21:59.300 --> 01:22:01.460
like, yes, it's going
to halve it literally.

01:22:01.460 --> 01:22:03.410
But that's kind of a drop in the bucket.

01:22:03.410 --> 01:22:05.930
As n gets larger and
larger and larger, it's

01:22:05.930 --> 01:22:07.650
still going to be a crazy big number.

01:22:07.650 --> 01:22:11.000
So computer scientists would typically
wrap this in some big O notation

01:22:11.000 --> 01:22:16.070
and say, OK, OK, selection sort
is on the order of n squared.

01:22:16.070 --> 01:22:17.502
That's not a precise measurement.

01:22:17.502 --> 01:22:19.460
But it's on the order of
n squared because it's

01:22:19.460 --> 01:22:23.480
making so many darn comparisons, not
unlike everyone shaking everyone else's

01:22:23.480 --> 01:22:25.470
hand like I proposed verbally earlier.

01:22:25.470 --> 01:22:27.300
It's a lot of work to get that done.

01:22:27.300 --> 01:22:30.170
So selection sort is on the
order of n squared steps.

01:22:30.170 --> 01:22:31.967
And that's kind of a slow one.

01:22:31.967 --> 01:22:33.800
That's what was at the
top of my cheat sheet

01:22:33.800 --> 01:22:37.232
earlier when I proposed a ranking
of some common running times.

01:22:37.232 --> 01:22:38.690
There's an infinite number of them.

01:22:38.690 --> 01:22:40.380
But those are some of the common ones.

01:22:40.380 --> 01:22:42.920
So can we do actually better?

01:22:42.920 --> 01:22:47.330
Well, if bubble sort then is in big O--

01:22:47.330 --> 01:22:47.840
sorry.

01:22:47.840 --> 01:22:49.010
Sorry-- spoiler.

01:22:49.010 --> 01:22:52.610
If selection sort is in
the order of n squared,

01:22:52.610 --> 01:22:55.420
could we maybe get lucky sometimes?

01:22:55.420 --> 01:22:58.285
Like, with selection sort, what
would the best case scenario be?

01:22:58.285 --> 01:23:00.910
Well, the best case would be,
like, everyone is already sorted,

01:23:00.910 --> 01:23:02.020
0 through 7.

01:23:02.020 --> 01:23:03.250
And we just get lucky.

01:23:03.250 --> 01:23:05.820
But does bubble sort--

01:23:05.820 --> 01:23:10.580
[SIGHS]---- does selection
sort appreciate that?

01:23:10.580 --> 01:23:14.590
Does selection sort take into account
whether the list is already sorted?

01:23:14.590 --> 01:23:18.280
Not necessarily, because if you
look at the pseudocode even,

01:23:18.280 --> 01:23:22.480
there's no special conditional in here
that says, if it's already sorted,

01:23:22.480 --> 01:23:23.620
exit early.

01:23:23.620 --> 01:23:27.010
It's just going to blindly
do this this many times.

01:23:27.010 --> 01:23:30.380
And you can actually see
pseudocode-wise the n squared.

01:23:30.380 --> 01:23:33.850
Notice that this line of pseudocode here
is essentially telling me to do what?

01:23:33.850 --> 01:23:37.450
Do something n times
from 0 to n minus 1,

01:23:37.450 --> 01:23:40.720
or equivalently, if you prefer
the real world, from 1 to n.

01:23:40.720 --> 01:23:42.710
That's n times total.

01:23:42.710 --> 01:23:44.797
But what are you doing
inside of this loop?

01:23:44.797 --> 01:23:46.630
Well, every time you're
inside of this loop,

01:23:46.630 --> 01:23:49.720
you're looking for the smallest element,
looking for the smallest element.

01:23:49.720 --> 01:23:51.790
And that might take
you as many as n steps.

01:23:51.790 --> 01:23:55.000
So another way to think about the
running time of this algorithm

01:23:55.000 --> 01:23:58.540
selection sort is this loop is
telling you to do something n times.

01:23:58.540 --> 01:24:02.230
This line is telling you to
do something n times as well.

01:24:02.230 --> 01:24:04.630
And that's roughly n
times n or n squared.

01:24:04.630 --> 01:24:05.630
It's not precise.

01:24:05.630 --> 01:24:07.390
But it's on the order of n squared.

01:24:07.390 --> 01:24:11.090
Unfortunately, if you're just
blindly doing that much work always,

01:24:11.090 --> 01:24:14.240
even if you have any
number of doors, this

01:24:14.240 --> 01:24:19.800
is going to end up being in
omega of n squared as well,

01:24:19.800 --> 01:24:23.130
because even in the best case where
all of the numbers are already sorted,

01:24:23.130 --> 01:24:26.580
there is nothing about the algorithm
called selection sort or even

01:24:26.580 --> 01:24:29.850
my implementation thereof that would
have said, wait a minute-- like, I'm

01:24:29.850 --> 01:24:31.980
done, and exit prematurely.

01:24:31.980 --> 01:24:35.760
So selection sort is
in big O of and omega

01:24:35.760 --> 01:24:39.420
of n squared as we've designed it.

01:24:39.420 --> 01:24:42.988
And by coincidence, because
those boundaries are the same,

01:24:42.988 --> 01:24:45.030
you can also say that it's
in theta of n squared.

01:24:45.030 --> 01:24:49.170
No matter what, you're going to spend n
squared steps or n squared comparisons.

01:24:49.170 --> 01:24:53.100
But that second algorithm that I
keep teasing called bubble sort,

01:24:53.100 --> 01:24:55.808
and I deliberately used the word
bubble in my description of what

01:24:55.808 --> 01:24:58.433
was happening because Eli, I
think was our first volunteer, who

01:24:58.433 --> 01:25:01.140
kind of bubbled his way up all
the way to the end of the list.

01:25:01.140 --> 01:25:05.302
Then number 6 did, then 5,
then 4, then 3, then 2, then 1.

01:25:05.302 --> 01:25:07.260
All of the numbers kind
of bubbled their way up

01:25:07.260 --> 01:25:09.000
being the bigger values to the right.

01:25:09.000 --> 01:25:11.790
So bubble sort just does
something again and again

01:25:11.790 --> 01:25:15.060
by comparing adjacencies,
comparing, comparing, comparing.

01:25:15.060 --> 01:25:17.810
And then it does it again
and again and again.

01:25:17.810 --> 01:25:19.180
So let's analyze bubble sort.

01:25:19.180 --> 01:25:22.750
If, for instance, this was the original
array that we tried sorting before--

01:25:22.750 --> 01:25:24.100
same exact numbers--

01:25:24.100 --> 01:25:28.330
I was doing things like flipping the
7 and the 2, and then the 7 and the 5,

01:25:28.330 --> 01:25:29.770
and then the 7 and the 4.

01:25:29.770 --> 01:25:34.360
But that would only fix
minimally one number.

01:25:34.360 --> 01:25:37.120
I then had to repeat
again and again and again.

01:25:37.120 --> 01:25:39.290
So that one too felt like
it was taking some time.

01:25:39.290 --> 01:25:41.560
So here's one way of
thinking about bubble sort.

01:25:41.560 --> 01:25:43.660
Bubble sort pseudocode could say this.

01:25:43.660 --> 01:25:45.790
Repeat the following n times.

01:25:45.790 --> 01:25:48.700
And that's why I kept going
through the list again and again--

01:25:48.700 --> 01:25:53.473
for i from 0 to n minus 2.

01:25:53.473 --> 01:25:56.140
Now, this is a little weird, but
we'll get back to this-- from i

01:25:56.140 --> 01:25:57.820
from 0 to n minus 2.

01:25:57.820 --> 01:26:02.380
If the number at location i and
the number at location i plus 1

01:26:02.380 --> 01:26:05.140
are out of order, swap them.

01:26:05.140 --> 01:26:08.140
And then just repeat, repeat, repeat.

01:26:08.140 --> 01:26:11.470
But we've never seen n minus
2 in pseudocode before.

01:26:11.470 --> 01:26:14.830
Why is it n minus 2 and
not the usual n minus 1?

01:26:14.830 --> 01:26:16.086
Yeah?

01:26:16.086 --> 01:26:20.370
AUDIENCE: [INAUDIBLE]

01:26:20.370 --> 01:26:22.770
DAVID MALAN: Exactly,
because we're taking a number

01:26:22.770 --> 01:26:24.250
and comparing it to the next one.

01:26:24.250 --> 01:26:26.370
So you better not go
all the way to the end

01:26:26.370 --> 01:26:29.245
and expect there to be a next
one at the end of the list.

01:26:29.245 --> 01:26:31.620
So if we use these same numbers
here, even though they're

01:26:31.620 --> 01:26:34.350
in a different order,
if this is location 0,

01:26:34.350 --> 01:26:38.280
and this is location n
minus 1 always, if you think

01:26:38.280 --> 01:26:42.480
of i as being my left hand, it's
pointing from 0 to 1 to 2 to 3

01:26:42.480 --> 01:26:44.910
to 4 to 5 to 6 to 7.

01:26:44.910 --> 01:26:47.430
If you get to the end, you
don't want to look at i plus 1

01:26:47.430 --> 01:26:49.555
because that's pointing to
no man's land over here.

01:26:49.555 --> 01:26:51.960
And you can't go beyond the
boundaries of your array.

01:26:51.960 --> 01:26:55.012
But n minus 2 would be the second
to last element, as you know.

01:26:55.012 --> 01:26:56.970
And that makes sure that
my left hand and right

01:26:56.970 --> 01:26:59.070
hand stay within the
boundaries of the array

01:26:59.070 --> 01:27:02.777
if left hand represents i and
right hand represents i plus 1.

01:27:02.777 --> 01:27:04.860
So it's just to make sure
we don't screw up and go

01:27:04.860 --> 01:27:06.930
too far past the boundary of the array.

01:27:06.930 --> 01:27:10.920
But as implemented here, this too does
not take into account whether or not

01:27:10.920 --> 01:27:13.620
the array is already sorted.

01:27:13.620 --> 01:27:16.087
So technically, we can actually
do this n minus 1 times

01:27:16.087 --> 01:27:17.670
because the last one you get for free.

01:27:17.670 --> 01:27:19.020
It ends up being in place.

01:27:19.020 --> 01:27:22.060
But even still, let's do
a quick analysis here.

01:27:22.060 --> 01:27:27.100
If we've got n doors in total
from 0 on up to n minus 1,

01:27:27.100 --> 01:27:30.520
bubble sort's analysis looks
a little bit like this.

01:27:30.520 --> 01:27:35.228
Do the following-- n minus
1 times n minus 1 times.

01:27:35.228 --> 01:27:36.520
Well, how did we get from that?

01:27:36.520 --> 01:27:38.020
Let me back up to the pseudocode.

01:27:38.020 --> 01:27:40.750
If this is the pseudocode
as I originally put it,

01:27:40.750 --> 01:27:42.580
I then propose verbally a refinement.

01:27:42.580 --> 01:27:45.580
You don't need to repeat yourself n
times again and again because again,

01:27:45.580 --> 01:27:46.915
the last one bubbles up.

01:27:46.915 --> 01:27:49.660
But by process of elimination,
it's in the right place.

01:27:49.660 --> 01:27:52.390
So you can think of it
as just n minus 1 times.

01:27:52.390 --> 01:27:57.910
How many times can you
compare i and i plus 1?

01:27:57.910 --> 01:28:01.270
Well, you're iterating
from 0 to n minus 2.

01:28:01.270 --> 01:28:03.370
And this is where the math
is going to get weird.

01:28:03.370 --> 01:28:06.500
But that is n minus 1 steps also.

01:28:06.500 --> 01:28:07.000
Why?

01:28:07.000 --> 01:28:09.940
Because if you do it in the
real world starting at one,

01:28:09.940 --> 01:28:13.300
this is saying from i
from 1 to n minus 1.

01:28:13.300 --> 01:28:15.460
And then maybe it pops
a little more obviously.

01:28:15.460 --> 01:28:18.400
This inner loop is
repeating n minus 1 times.

01:28:18.400 --> 01:28:20.830
So outer loop says do the
following n minus 1 times.

01:28:20.830 --> 01:28:23.620
Inner loop says do this
now n minus 1 times.

01:28:23.620 --> 01:28:27.010
Mathematically, then, that's
n minus 1 times n minus 1.

01:28:27.010 --> 01:28:28.550
Now we've got our old FOIL method.

01:28:28.550 --> 01:28:31.780
So n squared minus n minus n plus 1.

01:28:31.780 --> 01:28:35.440
Combine like terms gives us
n squared minus 2n plus 1.

01:28:35.440 --> 01:28:37.600
But now let's think like
a computer scientist.

01:28:37.600 --> 01:28:41.710
When n gets really large, we
definitely don't care about the plus 1.

01:28:41.710 --> 01:28:45.640
When n gets really large, we probably
don't even care about the minus 2n.

01:28:45.640 --> 01:28:48.580
It will make a mathematical
difference but a drop in the bucket

01:28:48.580 --> 01:28:50.860
once n gets to be in the
millions or billions.

01:28:50.860 --> 01:28:54.770
So this would be on the
order of what, similarly?

01:28:54.770 --> 01:28:56.730
On the order of n squared as well.

01:28:56.730 --> 01:28:59.840
So algorithmically, and actually,
we'll use a different term

01:28:59.840 --> 01:29:03.770
now-- asymptotically-- asymptotic
notation is the fancy term

01:29:03.770 --> 01:29:08.960
to describe big O and omega and theta
notation-- asymptotically bubble

01:29:08.960 --> 01:29:12.530
sort is, quote, unquote,
"the same" as selection sort

01:29:12.530 --> 01:29:13.760
in terms of its upper bound.

01:29:13.760 --> 01:29:15.812
It's not 100% exactly the same.

01:29:15.812 --> 01:29:19.020
If we get into the weeds of the math,
there are obviously different formulas.

01:29:19.020 --> 01:29:22.462
But when n gets really large and
you plot the pictures on a chart,

01:29:22.462 --> 01:29:24.920
they're going to look almost
the same because they're going

01:29:24.920 --> 01:29:27.060
to be fundamentally the same shape.

01:29:27.060 --> 01:29:33.380
So in our cheat sheet here, bubble
sort now is in big O of n squared.

01:29:33.380 --> 01:29:36.600
But let me propose that
we make an improvement.

01:29:36.600 --> 01:29:38.720
Here's our pseudocode earlier.

01:29:38.720 --> 01:29:43.430
When might it make sense
to abort bubble sort early?

01:29:43.430 --> 01:29:47.780
Like, when could you logically
conclude that I do not need

01:29:47.780 --> 01:29:51.975
to make another pass again and again?

01:29:51.975 --> 01:29:54.600
And remember what I did from left
to right over our volunteers.

01:29:54.600 --> 01:29:57.510
I was comparing, comparing, comparing,
comparing and maybe swapping

01:29:57.510 --> 01:29:59.200
people who were out of order.

01:29:59.200 --> 01:30:03.980
So what could I do to
short circuit this?

01:30:03.980 --> 01:30:04.855
AUDIENCE: [INAUDIBLE]

01:30:04.855 --> 01:30:05.950
DAVID MALAN: Perfect.

01:30:05.950 --> 01:30:10.060
If I compare through the whole list
left to right and I make no swaps,

01:30:10.060 --> 01:30:12.808
it stands to reason that
they're already ordered.

01:30:12.808 --> 01:30:14.350
Otherwise, I would have swapped them.

01:30:14.350 --> 01:30:17.923
And therefore, I would be crazy to
do that again and again and again.

01:30:17.923 --> 01:30:20.090
Because if I didn't swap
them the first time around,

01:30:20.090 --> 01:30:22.757
why would I swap them the second
time around if no one's moving.

01:30:22.757 --> 01:30:25.280
So I can terminate the algorithm early.

01:30:25.280 --> 01:30:27.550
So in pseudocode, I could
say something like this.

01:30:27.550 --> 01:30:29.920
If no swaps, quit.

01:30:29.920 --> 01:30:31.940
And I can do that inside
of the inner loop.

01:30:31.940 --> 01:30:35.560
So once I make a pass through the people
and I say, hm, did I make any swaps?

01:30:35.560 --> 01:30:38.320
If no, just quit because
they're not going

01:30:38.320 --> 01:30:41.260
to move any further if they
didn't just move already.

01:30:41.260 --> 01:30:45.040
So bubble sort then might
arguably be an omega of what?

01:30:45.040 --> 01:30:51.130
In the best case, how few steps could
we get away with with bubble sort?

01:30:51.130 --> 01:30:52.330
OK, it's not one.

01:30:52.330 --> 01:30:53.490
But it is n.

01:30:53.490 --> 01:30:54.160
Why?

01:30:54.160 --> 01:30:56.980
Because I minimally need to go
through the whole list to decide,

01:30:56.980 --> 01:30:58.240
did I make any swaps.

01:30:58.240 --> 01:31:03.010
And logically-- and this will come up
a lot in the analysis of algorithms--

01:31:03.010 --> 01:31:07.780
any question like, is this list sorted,
cannot possibly take only one step

01:31:07.780 --> 01:31:09.040
if you've got n elements.

01:31:09.040 --> 01:31:09.310
Why?

01:31:09.310 --> 01:31:10.727
Because then you're just guessing.

01:31:10.727 --> 01:31:14.560
Like, if you don't at least take
the time to look at every element,

01:31:14.560 --> 01:31:17.710
how in the world can you conclude
logically that they're sorted or not?

01:31:17.710 --> 01:31:21.290
Like, you minimally have to meet
us halfway and check every element.

01:31:21.290 --> 01:31:24.338
So it's minimally in omega of n.

01:31:24.338 --> 01:31:27.130
Constant time wouldn't even give
you time to look at the whole list

01:31:27.130 --> 01:31:28.190
if there's n elements.

01:31:28.190 --> 01:31:29.500
So that's the intuition there.

01:31:29.500 --> 01:31:32.042
So we can't say anything about
theta notation for bubble sort

01:31:32.042 --> 01:31:34.000
because it's different,
upper and lower bounds.

01:31:34.000 --> 01:31:39.850
But we seem to have done better
asymptotically than selection sort.

01:31:39.850 --> 01:31:41.530
In the worst case, they're just as bad.

01:31:41.530 --> 01:31:42.530
They're super slow.

01:31:42.530 --> 01:31:45.310
But in the best case, bubble
sort with that tweak--

01:31:45.310 --> 01:31:48.550
that conditional about swapping--
might actually be faster for us.

01:31:48.550 --> 01:31:52.940
And you'd want Google, Microsoft, your
phone, to use that kind of algorithm.

01:31:52.940 --> 01:31:55.280
Let me go back to the sorting
demonstration earlier.

01:31:55.280 --> 01:31:57.380
Let me re-randomize the array.

01:31:57.380 --> 01:31:58.820
Small bars is small number.

01:31:58.820 --> 01:31:59.990
Big bars is big number.

01:31:59.990 --> 01:32:02.030
And let me click bubble sort this time.

01:32:02.030 --> 01:32:04.970
And you'll see again, the
pink represents comparisons.

01:32:04.970 --> 01:32:08.400
But now the comparisons are always
adjacent from left to right.

01:32:08.400 --> 01:32:12.360
So this is pretty much doing with more
bars what I did with eight people here.

01:32:12.360 --> 01:32:15.950
And you'll see that the
biggest elements, Eli

01:32:15.950 --> 01:32:18.260
being the first one of
the humans, bubbled up

01:32:18.260 --> 01:32:21.470
all the way to the right, followed
by the next largest, next largest,

01:32:21.470 --> 01:32:22.490
next largest.

01:32:22.490 --> 01:32:25.700
But here again you can visually see.

01:32:25.700 --> 01:32:28.490
And with the number of words
I'm using to kind of stall here,

01:32:28.490 --> 01:32:31.100
you can [? hear ?] that
this is kind of slow

01:32:31.100 --> 01:32:34.618
because you're doing a lot of
comparisons again and again and again.

01:32:34.618 --> 01:32:35.910
And you're making improvements.

01:32:35.910 --> 01:32:37.618
So you're taking bites
out of the problem

01:32:37.618 --> 01:32:40.897
but really only one bite at a time.

01:32:40.897 --> 01:32:43.480
So I kind of feel this is going
to be like holding in a sneeze

01:32:43.480 --> 01:32:44.560
if we don't finish this.

01:32:44.560 --> 01:32:47.410
So I feel like we should
give you emotional closure

01:32:47.410 --> 01:32:49.930
with getting this to finish.

01:32:49.930 --> 01:32:53.020
Any questions in the meantime because
there's going to be no surprise.

01:32:53.020 --> 01:32:54.830
It's going to be sorted.

01:32:54.830 --> 01:32:57.580
This is bubble sort.

01:32:57.580 --> 01:32:58.570
No?

01:32:58.570 --> 01:33:00.040
All right, we're almost there.

01:33:00.040 --> 01:33:02.800
It's going faster because the list
is effectively getting shorter.

01:33:02.800 --> 01:33:08.350
And-- oh, OK, so that
then was bubble sort.

01:33:08.350 --> 01:33:12.920
But both of these, I claim now,
are actually pretty inefficient.

01:33:12.920 --> 01:33:13.780
They're pretty slow.

01:33:13.780 --> 01:33:16.330
And my god, that was only
eight humans on the stage.

01:33:16.330 --> 01:33:19.540
That was only, like, what,
40 or 50 bars on the screen.

01:33:19.540 --> 01:33:23.320
What if we start talking about hundreds
or thousands or millions of inputs?

01:33:23.320 --> 01:33:25.690
Like, those algorithms are
probably going to take crazy

01:33:25.690 --> 01:33:28.220
long because n squared gets really big.

01:33:28.220 --> 01:33:30.440
So can we do better than n squared?

01:33:30.440 --> 01:33:31.330
Well, we can.

01:33:31.330 --> 01:33:35.170
But let me propose that we introduce
a new problem-solving technique

01:33:35.170 --> 01:33:38.320
that we've actually been using
already, just not by name.

01:33:38.320 --> 01:33:42.700
In programming and in mathematics,
there's this idea of recursion.

01:33:42.700 --> 01:33:48.280
And recursion is a description
for a function that calls itself.

01:33:48.280 --> 01:33:51.460
A function that calls
itself is recursive.

01:33:51.460 --> 01:33:52.895
So what do we mean by this?

01:33:52.895 --> 01:33:54.520
Well, we've actually seen this already.

01:33:54.520 --> 01:33:58.810
Here's the same pseudocode
for binary search earlier.

01:33:58.810 --> 01:34:01.180
And binary search--
bi implying two-- went

01:34:01.180 --> 01:34:05.070
either left or right or left
or right, halving each time.

01:34:05.070 --> 01:34:06.950
So that was the division and conquering.

01:34:06.950 --> 01:34:08.570
So this is the same pseudocode.

01:34:08.570 --> 01:34:12.080
But notice that in my
pseudocode earlier,

01:34:12.080 --> 01:34:16.710
I literally used the keyword search
inside of my definition of search.

01:34:16.710 --> 01:34:19.550
So this is sort of one of those,
like, circular definitions.

01:34:19.550 --> 01:34:21.590
Like, if this is a search
algorithm, how am I

01:34:21.590 --> 01:34:24.860
getting away with using the word
search in the definition of search?

01:34:24.860 --> 01:34:27.110
It's kind of when you get
reprimanded for using a word

01:34:27.110 --> 01:34:28.850
to define a vocabulary word.

01:34:28.850 --> 01:34:32.960
This is kind of like that because
this algorithm is calling itself.

01:34:32.960 --> 01:34:35.700
This search function is calling itself.

01:34:35.700 --> 01:34:38.300
Now, normally, if you were to
have a function call itself,

01:34:38.300 --> 01:34:41.750
call itself, call itself, call itself,
it would just do that infinitely.

01:34:41.750 --> 01:34:43.550
And presumably, it would go on forever.

01:34:43.550 --> 01:34:45.350
Or the program or
computer would probably

01:34:45.350 --> 01:34:48.517
crash because out of memory or something
like that-- more on that next week.

01:34:48.517 --> 01:34:50.780
But there's a key detail,
a key characteristic

01:34:50.780 --> 01:34:54.770
about this algorithm that makes
sure that doing the same thing again

01:34:54.770 --> 01:34:56.690
and again is not crazy.

01:34:56.690 --> 01:34:59.630
It's actually going to
lead us to a solution.

01:34:59.630 --> 01:35:02.720
Even though I'm using
search here or search here,

01:35:02.720 --> 01:35:08.742
what's happening to the size of the
problem for each of these lines?

01:35:08.742 --> 01:35:09.450
What's happening?

01:35:09.450 --> 01:35:10.290
Yeah?

01:35:10.290 --> 01:35:11.610
It's being cut in half.

01:35:11.610 --> 01:35:15.420
So even though I'm doing the exact same
thing algorithmically again and again,

01:35:15.420 --> 01:35:18.930
I'm doing it on a smaller,
smaller, smaller input--

01:35:18.930 --> 01:35:21.820
fewer, fewer, fewer doors--
fewer, fewer, fewer people.

01:35:21.820 --> 01:35:23.820
And so even though I'm
doing it again and again,

01:35:23.820 --> 01:35:26.760
so long as I have this so-called
base case, as we'll call it,

01:35:26.760 --> 01:35:29.700
that just makes sure if there's
no doors-- you're done--

01:35:29.700 --> 01:35:34.710
I can break out of what would
otherwise be an infinite loop of sorts.

01:35:34.710 --> 01:35:37.525
All right, so those two lines
we've already kind of seen.

01:35:37.525 --> 01:35:39.150
And actually, we saw this in week zero.

01:35:39.150 --> 01:35:42.510
So this is the pseudocode for the
phone book example from week zero.

01:35:42.510 --> 01:35:45.300
And you might recall that we
had these lines of code here.

01:35:45.300 --> 01:35:48.420
And this is very procedural
or iterative where I literally

01:35:48.420 --> 01:35:49.872
told you go back to line 3.

01:35:49.872 --> 01:35:52.330
And we did that today when we
took attendance, so to speak.

01:35:52.330 --> 01:35:55.860
With that third algorithm, you went
back to step 2 again and again.

01:35:55.860 --> 01:35:58.000
That's a very iterative
loop-based approach.

01:35:58.000 --> 01:35:58.750
But you know what?

01:35:58.750 --> 01:36:00.840
I could be a little more
clever here too in week zero.

01:36:00.840 --> 01:36:03.210
We just didn't want to get
too far ahead of ourselves.

01:36:03.210 --> 01:36:07.410
Let me actually change these highlighted
lines to be more, succinctly,

01:36:07.410 --> 01:36:10.150
search left half book,
search right half of book.

01:36:10.150 --> 01:36:11.620
I can now tighten the code up.

01:36:11.620 --> 01:36:13.270
So it's kind of a shorter algorithm.

01:36:13.270 --> 01:36:14.860
But it's the exact same idea.

01:36:14.860 --> 01:36:20.570
But on week zero, I very methodically
told you what I want you to do next.

01:36:20.570 --> 01:36:22.900
But here, I'm sort of
recognizing that, wait a minute,

01:36:22.900 --> 01:36:25.660
I just spent the past six lines
telling you how to search.

01:36:25.660 --> 01:36:27.220
So go do more of that.

01:36:27.220 --> 01:36:29.180
You have all of the logic you need.

01:36:29.180 --> 01:36:33.280
So this too is recursive in the sense
that if this is a search algorithm,

01:36:33.280 --> 01:36:36.610
the search algorithm is using a
search algorithm inside of it.

01:36:36.610 --> 01:36:39.430
But here too, the phone book
was getting divided and divided

01:36:39.430 --> 01:36:42.160
and divided in half again and again.

01:36:42.160 --> 01:36:45.370
So recursion doesn't just
describe mathematical formulas.

01:36:45.370 --> 01:36:49.077
It doesn't just describe pseudocode
or code-- even physical structures.

01:36:49.077 --> 01:36:51.410
So we deliberately brought
back these bricks here today,

01:36:51.410 --> 01:36:54.040
which are reminiscent, of course,
from Mario and Super Mario Brothers.

01:36:54.040 --> 01:36:56.695
That pyramid, especially if
we get rid of the distractions

01:36:56.695 --> 01:36:58.570
of the ground and the
mountains and so forth,

01:36:58.570 --> 01:37:01.810
is itself recursive as a structure.

01:37:01.810 --> 01:37:02.920
Now, why is that?

01:37:02.920 --> 01:37:05.440
Well, here you can kind
of play verbal games.

01:37:05.440 --> 01:37:07.930
If this is a pyramid of height 4--

01:37:07.930 --> 01:37:11.092
and ask the question, well, what is
that-- what is a pyramid of height 4--

01:37:11.092 --> 01:37:13.300
I could kind of be difficult
and say, well, it's just

01:37:13.300 --> 01:37:16.270
a pyramid of height 3 plus 1 other row.

01:37:16.270 --> 01:37:18.310
All right, well, what's
a pyramid of height 3?

01:37:18.310 --> 01:37:21.185
You could be difficult and say,
well, it's just a pyramid of height 2

01:37:21.185 --> 01:37:22.060
plus 1 more row.

01:37:22.060 --> 01:37:23.290
What's a pyramid of height 2?

01:37:23.290 --> 01:37:27.363
Well, it's this plus 1 more row.

01:37:27.363 --> 01:37:28.780
I might have skipped a step there.

01:37:28.780 --> 01:37:30.010
What's a pyramid of height 2?

01:37:30.010 --> 01:37:33.130
Well, it's a pyramid of
height 1 plus 1 more row.

01:37:33.130 --> 01:37:34.900
What's a pyramid of height 1?

01:37:34.900 --> 01:37:37.240
Well, it's nothing plus 1 more row.

01:37:37.240 --> 01:37:39.638
Or at this point, you can
just say it's a single brick.

01:37:39.638 --> 01:37:42.430
You can have a base case that just
says, like, the buck stops here.

01:37:42.430 --> 01:37:43.722
Stop asking me these questions.

01:37:43.722 --> 01:37:48.140
I can special case or hard code my
answer to the very tiniest of problems.

01:37:48.140 --> 01:37:50.740
So even those bricks then are sort of--

01:37:50.740 --> 01:37:52.622
the pyramid is recursively defined.

01:37:52.622 --> 01:37:54.580
And we can actually
implement this even in code

01:37:54.580 --> 01:37:57.080
if we want in a couple of
different ways to make this clear.

01:37:57.080 --> 01:37:59.690
So let me go back over to VS Code here.

01:37:59.690 --> 01:38:03.760
And let me propose that we do one
old-school iterative example with

01:38:03.760 --> 01:38:04.510
this--

01:38:04.510 --> 01:38:06.190
code iteration dot C.

01:38:06.190 --> 01:38:09.040
And iteration just means
to loop again and again.

01:38:09.040 --> 01:38:11.000
An iteration is a pass through the code.

01:38:11.000 --> 01:38:13.430
And let me go ahead and
whip this up as follows.

01:38:13.430 --> 01:38:15.730
Let's include the CS50 library.

01:38:15.730 --> 01:38:18.850
So we have our get int function.

01:38:18.850 --> 01:38:21.790
Let's include standard io
so that we have printf.

01:38:21.790 --> 01:38:27.650
Let's go ahead and create main with
no command line arguments today.

01:38:27.650 --> 01:38:30.430
Let's go ahead and create
a variable of type int set

01:38:30.430 --> 01:38:33.070
equal to the return
value of get int asking

01:38:33.070 --> 01:38:35.230
the user for the height of the pyramid.

01:38:35.230 --> 01:38:37.480
And then let me assume
for the sake of discussion

01:38:37.480 --> 01:38:41.290
that there is a function now that
will draw a pyramid of that height.

01:38:41.290 --> 01:38:44.810
All right, now let's actually implement
that function as a helper function,

01:38:44.810 --> 01:38:45.500
so to speak.

01:38:45.500 --> 01:38:49.270
So if I have a function called draw,
it's going to clearly take an integer.

01:38:49.270 --> 01:38:51.670
And I'll call it, maybe, n as input.

01:38:51.670 --> 01:38:53.600
And it doesn't need to return anything.

01:38:53.600 --> 01:38:55.120
So I'm going to say that
this is a void function.

01:38:55.120 --> 01:38:56.203
It doesn't return a value.

01:38:56.203 --> 01:38:59.030
It just has a side effect of
printing bricks on the screen.

01:38:59.030 --> 01:39:02.050
How can I go about doing
a pyramid of height n?

01:39:02.050 --> 01:39:07.120
Well, I'm going to do this one, which
looks like this-- one at a time,

01:39:07.120 --> 01:39:09.430
then two, then three, then four.

01:39:09.430 --> 01:39:11.210
And this is a little
easier, for instance,

01:39:11.210 --> 01:39:14.380
than problem set one, which had the
pyramid flipped around the other way.

01:39:14.380 --> 01:39:16.480
So this one's a little
easier to do deliberately.

01:39:16.480 --> 01:39:21.250
So if I go back to my code, I could say
something like this-- for int i gets 0.

01:39:21.250 --> 01:39:24.610
i is less than, let's say,
n, which is the height--

01:39:24.610 --> 01:39:25.810
i plus, plus.

01:39:25.810 --> 01:39:30.280
That's going to essentially iterate over
every row of the pyramid top to bottom.

01:39:30.280 --> 01:39:31.480
This might feel similar--

01:39:31.480 --> 01:39:33.700
reminiscent-- to problem set one.

01:39:33.700 --> 01:39:36.280
Then in my inner loop,
I'm going to do four.

01:39:36.280 --> 01:39:38.200
int j gets 0.

01:39:38.200 --> 01:39:42.682
j is less than i plus 1 j plus, plus.

01:39:42.682 --> 01:39:44.390
And we'll see why this
works in a moment.

01:39:44.390 --> 01:39:47.800
And then let's go ahead and print
out a single hash, no new line.

01:39:47.800 --> 01:39:52.660
But at the very bottom of this row,
let's print out just a new line

01:39:52.660 --> 01:39:54.850
to move the cursor down a line.

01:39:54.850 --> 01:39:56.050
Now, I'm not done yet.

01:39:56.050 --> 01:39:57.760
Let me hide my terminal for a second.

01:39:57.760 --> 01:39:58.940
I need the prototype.

01:39:58.940 --> 01:40:02.720
So I'm going to copy this, paste
it up here with a semicolon.

01:40:02.720 --> 01:40:05.300
And I think now the code is correct.

01:40:05.300 --> 01:40:06.430
Now, why is this correct?

01:40:06.430 --> 01:40:11.160
Because on my outer loop, I'm
iterating n times starting at 0.

01:40:11.160 --> 01:40:15.900
My inner loop-- realize I want
to have at least one hash,

01:40:15.900 --> 01:40:18.480
then two, then three, then four.

01:40:18.480 --> 01:40:20.970
So I can't start my
number of hashes at 0.

01:40:20.970 --> 01:40:26.910
And that's why I'm doing j all the way
up to i plus 1 so that when i is 0,

01:40:26.910 --> 01:40:30.390
I'm actually still printing one hash,
and then two hashes, and then three.

01:40:30.390 --> 01:40:32.670
Otherwise, I would start
with 0, which is not my goal.

01:40:32.670 --> 01:40:35.700
Let me go in and do make
iteration to compile this--

01:40:35.700 --> 01:40:37.350
dot slash iteration.

01:40:37.350 --> 01:40:39.600
And let me go ahead and
make my terminal bigger--

01:40:39.600 --> 01:40:40.650
enter.

01:40:40.650 --> 01:40:42.660
The height will be, for instance, 4.

01:40:42.660 --> 01:40:45.660
And indeed, it's not quite to scale
because these are a little more

01:40:45.660 --> 01:40:47.550
vertical than they are horizontal.

01:40:47.550 --> 01:40:49.440
But it essentially
looks like that pyramid.

01:40:49.440 --> 01:40:50.910
And I can do this even larger.

01:40:50.910 --> 01:40:52.930
Let's do this again
after clearing my screen.

01:40:52.930 --> 01:40:54.210
Let's do like 10 of these.

01:40:54.210 --> 01:40:55.300
And it gets bigger.

01:40:55.300 --> 01:40:56.750
Let's do, like, 50 of these.

01:40:56.750 --> 01:40:57.750
And it gets even bigger.

01:40:57.750 --> 01:40:59.167
It doesn't even fit on the screen.

01:40:59.167 --> 01:40:59.730
But it works.

01:40:59.730 --> 01:41:02.648
And that's an iterative
approach, 100% correct,

01:41:02.648 --> 01:41:05.190
and similar to what you might
have done already for something

01:41:05.190 --> 01:41:08.130
like week zero or week one.

01:41:08.130 --> 01:41:10.390
But it turns out if
we leverage recursion,

01:41:10.390 --> 01:41:12.650
we can be a little more clever, in fact.

01:41:12.650 --> 01:41:13.880
Let me go ahead and do this.

01:41:13.880 --> 01:41:15.600
Let me minimize my terminal window.

01:41:15.600 --> 01:41:17.990
And let me go back into my code here.

01:41:17.990 --> 01:41:23.800
And let me propose to implement the
exact same program recursively instead.

01:41:23.800 --> 01:41:27.710
And let me go ahead and do this.

01:41:27.710 --> 01:41:31.030
I'm going to leave main--

01:41:31.030 --> 01:41:34.060
let's see-- exactly as is.

01:41:34.060 --> 01:41:39.410
And I'm going to go ahead and change
my draw function to work as follows.

01:41:39.410 --> 01:41:41.860
Let me delete all of these
loops because loops are sort

01:41:41.860 --> 01:41:43.750
of an indication of using iteration.

01:41:43.750 --> 01:41:45.550
And I'm instead going to do this.

01:41:45.550 --> 01:41:49.030
What is a pyramid of height 4?

01:41:49.030 --> 01:41:52.210
I said it's a pyramid of
height 3 plus 1 more row.

01:41:52.210 --> 01:41:53.740
So let's take that literally.

01:41:53.740 --> 01:41:56.950
If a pyramid of height
n needs to be drawn,

01:41:56.950 --> 01:42:01.270
let's first draw a pyramid of n minus 1.

01:42:01.270 --> 01:42:04.150
And then how do I go about
drawing one more row?

01:42:04.150 --> 01:42:05.870
Well, this, I can use
a bit of iteration.

01:42:05.870 --> 01:42:08.710
But I don't need a
doubly-nested loop anymore.

01:42:08.710 --> 01:42:13.750
I can set i equal to 0, i
less than n, i plus, plus.

01:42:13.750 --> 01:42:16.300
And this block of code
is simply going to print

01:42:16.300 --> 01:42:23.590
one simple row of hashes again and
again followed by a new line just

01:42:23.590 --> 01:42:25.070
to move the cursor down.

01:42:25.070 --> 01:42:27.040
So notice this line here.

01:42:27.040 --> 01:42:33.880
And I'll add some comments--
print pyramid of height n minus 1,

01:42:33.880 --> 01:42:35.990
print one more row.

01:42:35.990 --> 01:42:39.680
So I'm kind of taking a bite out
of the problem by printing one row.

01:42:39.680 --> 01:42:44.230
But I'm deferring to, weirdly, myself,
to print the rest of the pyramid.

01:42:44.230 --> 01:42:47.800
Now I'm going to go ahead and
try compiling this-- make--

01:42:47.800 --> 01:42:48.340
oh.

01:42:48.340 --> 01:42:49.940
And this is no longer iteration.

01:42:49.940 --> 01:42:53.920
So I'm actually going to do
this-- code recursion dot c.

01:42:53.920 --> 01:42:56.437
I'm going to paste that same
code into this new version,

01:42:56.437 --> 01:42:59.020
just so we have a different file
without breaking the old one.

01:42:59.020 --> 01:43:01.960
And now I'm going to do make recursion.

01:43:01.960 --> 01:43:03.250
All right, interesting.

01:43:03.250 --> 01:43:05.710
So Clang is yelling
at me with this error.

01:43:05.710 --> 01:43:08.810
All paths through this
function will call itself.

01:43:08.810 --> 01:43:11.590
So Clang is actually smart
enough to notice in my code

01:43:11.590 --> 01:43:14.920
that no matter what, the draw function
is going to call the draw function.

01:43:14.920 --> 01:43:16.840
And the draw function is going
to call the draw function.

01:43:16.840 --> 01:43:19.173
And the draw function's going
to call the draw function.

01:43:19.173 --> 01:43:23.230
Now, to be fair, n, the input to
draw, is getting smaller and smaller.

01:43:23.230 --> 01:43:25.990
But what's going to happen
eventually to the input of draw

01:43:25.990 --> 01:43:28.870
as I've written this code right now?

01:43:28.870 --> 01:43:30.040
Yeah, in the back?

01:43:30.040 --> 01:43:32.722
AUDIENCE: [INAUDIBLE]

01:43:32.722 --> 01:43:33.800
DAVID MALAN: Exactly.

01:43:33.800 --> 01:43:35.900
Remember that integers
are signed by default.

01:43:35.900 --> 01:43:38.550
They can be both positive
or 0 or negative.

01:43:38.550 --> 01:43:41.330
And so here, if I just
keep blindly subtracting 1,

01:43:41.330 --> 01:43:44.930
it's going to do that seemingly
forever until I technically underflow.

01:43:44.930 --> 01:43:47.970
But that's going to be like 2
billion rows of pyramids later.

01:43:47.970 --> 01:43:51.660
So I think what I actually need
to do is something like this.

01:43:51.660 --> 01:43:56.900
I need to ask the
question, if n equals 0--

01:43:56.900 --> 01:44:00.710
or heck, just to be super safe, let's
say if n is less than or equal to 0,

01:44:00.710 --> 01:44:04.010
just to make sure it never goes
negative, let's just go ahead

01:44:04.010 --> 01:44:06.540
and return without doing anything.

01:44:06.540 --> 01:44:10.130
So I'm going to comment this as,
like, if nothing to draw, well, then

01:44:10.130 --> 01:44:11.780
don't blindly call draw again.

01:44:11.780 --> 01:44:13.380
So this is that so-called base case.

01:44:13.380 --> 01:44:16.490
This is analogous to saying, like,
if John Harvard not in phone book

01:44:16.490 --> 01:44:22.410
or if no lockers or doors left, then
just exit or return in this case.

01:44:22.410 --> 01:44:27.050
So now I'll never call draw
a negative number of times

01:44:27.050 --> 01:44:28.370
or zero number of times.

01:44:28.370 --> 01:44:30.990
I will only do it so
long as n is positive.

01:44:30.990 --> 01:44:36.420
So now if I make my terminal bigger,
make recursion again does compile fine.

01:44:36.420 --> 01:44:39.630
Dot slash recursion-- let's
do the same input for.

01:44:39.630 --> 01:44:42.840
And I get the exact
same number of bricks.

01:44:42.840 --> 01:44:45.540
Let's go ahead and do
it again with maybe 10.

01:44:45.540 --> 01:44:46.890
That seems to work too.

01:44:46.890 --> 01:44:48.510
Let's do it again with 50.

01:44:48.510 --> 01:44:49.503
That seems to work too.

01:44:49.503 --> 01:44:50.670
And I can go a little crazy.

01:44:50.670 --> 01:44:52.320
I can do, like, 5,000.

01:44:52.320 --> 01:44:53.500
This is still going to work.

01:44:53.500 --> 01:44:55.210
It's just not going to fit on my screen.

01:44:55.210 --> 01:44:59.073
But it is doing a valiant attempt
to print all of those out.

01:44:59.073 --> 01:45:00.990
Now, it turns out that
could get us in trouble

01:45:00.990 --> 01:45:05.790
if we start poking around in the--
now, maybe not 5,000 but 500,000 or 5

01:45:05.790 --> 01:45:07.020
million or beyond.

01:45:07.020 --> 01:45:11.050
But for now, we'll just assume that
this is just a slow process at that.

01:45:11.050 --> 01:45:14.340
But if we go back to the goal,
which was to print this thing here,

01:45:14.340 --> 01:45:17.760
this too is a recursive structure
that we're just now translating

01:45:17.760 --> 01:45:19.623
the ideas of recursive code to.

01:45:19.623 --> 01:45:21.540
And actually, if you've
never discovered this,

01:45:21.540 --> 01:45:23.790
we would be remiss in
not doing this for you.

01:45:23.790 --> 01:45:27.840
If I go into a browser here.

01:45:27.840 --> 01:45:30.540
And suppose that I'm not
using any AI fancy technology.

01:45:30.540 --> 01:45:32.070
I'm just going to Google.com.

01:45:32.070 --> 01:45:35.730
And I search for recursion because
I'm curious to see what it means.

01:45:35.730 --> 01:45:39.650
Google for years has had this little
Easter egg for computer scientists.

01:45:43.940 --> 01:45:45.680
It's not a typo.

01:45:45.680 --> 01:45:46.180
Get it?

01:45:46.180 --> 01:45:47.690
Ha, ha, kind of funny.

01:45:47.690 --> 01:45:48.190
Yeah?

01:45:48.190 --> 01:45:49.540
OK, like, see?

01:45:49.540 --> 01:45:52.070
Recursion-- so if you click
recursion, OK, now we're

01:45:52.070 --> 01:45:53.320
in night mode for some reason.

01:45:53.320 --> 01:45:57.620
But OK, like, so it just
does this endlessly.

01:45:57.620 --> 01:46:00.410
OK, so programmers with too
much free time or too much

01:46:00.410 --> 01:46:02.560
control over Google's
own servers to do that.

01:46:02.560 --> 01:46:05.060
So it turns out, yes, if you
search for recursion on Google,

01:46:05.060 --> 01:46:07.560
this is a long-standing
Easter egg in this case.

01:46:07.560 --> 01:46:09.410
But the point of
introducing recursion is

01:46:09.410 --> 01:46:12.830
that, one, it's actually going to be a
very powerful problem-solving technique

01:46:12.830 --> 01:46:15.770
because honestly, it, one, we've
seen in pseudocode already,

01:46:15.770 --> 01:46:18.920
it kind of tightens up the amount
of code or the amount of lines

01:46:18.920 --> 01:46:20.900
that you need to write
to convey an algorithm.

01:46:20.900 --> 01:46:22.912
And two, it will actually
allow us to solve

01:46:22.912 --> 01:46:24.620
problems in a
fundamentally different way

01:46:24.620 --> 01:46:27.540
by using computer's memory
in an interesting way.

01:46:27.540 --> 01:46:31.220
And so toward that end, we wanted to
introduce one final sort today, which

01:46:31.220 --> 01:46:34.520
we won't use humans to demonstrate
but just numbers instead, namely

01:46:34.520 --> 01:46:35.990
something called merge sort.

01:46:35.990 --> 01:46:40.790
And merge sort is an algorithm
for sorting n numbers that I claim

01:46:40.790 --> 01:46:43.430
is going to be better than
selection sort and bubble sort.

01:46:43.430 --> 01:46:46.010
It's got to be better because
that n squared was killing us.

01:46:46.010 --> 01:46:48.540
We want to get something
lower than n squared.

01:46:48.540 --> 01:46:51.960
So merge sort's pseudocode
essentially looks like this.

01:46:51.960 --> 01:46:53.540
And this is it.

01:46:53.540 --> 01:46:58.460
Merge sort says, if you've got n
numbers or an array of numbers,

01:46:58.460 --> 01:47:01.430
sort the left half of them,
sort the right half of them.

01:47:01.430 --> 01:47:03.412
And then merge the sorted halves.

01:47:03.412 --> 01:47:05.370
And we'll see what this
means in just a moment.

01:47:05.370 --> 01:47:07.280
But if there's only
one number, just quit.

01:47:07.280 --> 01:47:09.920
So this is my base case
because this is recursive.

01:47:09.920 --> 01:47:12.440
Because if this is a sorting
algorithm called merge sort,

01:47:12.440 --> 01:47:15.685
I'm kind of cheating by using the
verb sort in the sorting algorithm.

01:47:15.685 --> 01:47:17.060
But that just makes it recursive.

01:47:17.060 --> 01:47:18.330
It doesn't make it wrong.

01:47:18.330 --> 01:47:19.730
So what do I mean by merging?

01:47:19.730 --> 01:47:25.522
Just to make this clear, here among
these numbers are two lists of size 4.

01:47:25.522 --> 01:47:27.980
And I'm going to scooch them
over to the left and the right

01:47:27.980 --> 01:47:32.540
just to make clear that on the
left is one half that is sorted--

01:47:32.540 --> 01:47:35.270
1346 from smallest to largest.

01:47:35.270 --> 01:47:40.460
On the right is a second
half that's also sorted 0257.

01:47:40.460 --> 01:47:42.980
So for the sake of
discussion, suppose that I'm

01:47:42.980 --> 01:47:46.070
partway through this
algorithm called merge sort.

01:47:46.070 --> 01:47:48.380
And I've sorted the left
half already, clearly.

01:47:48.380 --> 01:47:50.570
I've sorted the right
half already, clearly.

01:47:50.570 --> 01:47:52.790
Now I need to merge the sorted halves.

01:47:52.790 --> 01:47:54.270
What do we mean by that?

01:47:54.270 --> 01:47:57.710
Well, that means to
essentially, conceptually,

01:47:57.710 --> 01:48:00.350
point your left hand at
the first at the left half.

01:48:00.350 --> 01:48:02.540
Point your right hand at the right half.

01:48:02.540 --> 01:48:05.450
And then decide which of these
numbers should come first

01:48:05.450 --> 01:48:08.990
in order to merge or kind of
stitch these two lists together

01:48:08.990 --> 01:48:09.980
in sorted order.

01:48:09.980 --> 01:48:11.180
Well, which is smaller?

01:48:11.180 --> 01:48:12.350
Obviously, 0.

01:48:12.350 --> 01:48:16.460
So that last step, merge sorted
halves, would have me take this number

01:48:16.460 --> 01:48:19.490
and put it in an extra
empty array up here.

01:48:19.490 --> 01:48:22.250
Now I move my right hand to the
next number in the right half.

01:48:22.250 --> 01:48:23.870
And I compare the 1 and the 2.

01:48:23.870 --> 01:48:25.440
Obviously, 1 comes next.

01:48:25.440 --> 01:48:27.140
So I put this now up here.

01:48:27.140 --> 01:48:28.880
Now I compare the 3 and the 2.

01:48:28.880 --> 01:48:30.450
Obviously, the 2 comes next.

01:48:30.450 --> 01:48:32.180
And so I put this up here--

01:48:32.180 --> 01:48:34.670
3 and 5-- obviously, 3--

01:48:34.670 --> 01:48:40.350
4 and 5-- obviously, 4--

01:48:40.350 --> 01:48:43.620
6 and 5-- obviously, 5--

01:48:43.620 --> 01:48:46.800
and 6 and 7-- obviously, 6.

01:48:46.800 --> 01:48:50.550
And I didn't leave quite enough
room for 7, perhaps, and lastly, 7.

01:48:50.550 --> 01:48:53.838
So that's all we mean by
merging two lists together.

01:48:53.838 --> 01:48:56.130
If they're already sorted,
you just kind of stitch them

01:48:56.130 --> 01:48:59.280
together by plucking from one
or the other the next number

01:48:59.280 --> 01:49:00.280
that you actually want.

01:49:00.280 --> 01:49:00.872
And that's it.

01:49:00.872 --> 01:49:03.330
And even though I picked up
partway through this algorithm,

01:49:03.330 --> 01:49:05.370
those three steps alone
would seem to work.

01:49:05.370 --> 01:49:08.310
So long as you can sort the left
half and sort the right half,

01:49:08.310 --> 01:49:10.543
you can surely then
merge the sorted halves.

01:49:10.543 --> 01:49:13.710
Now I'll go ahead and do this digitally
rather than use the physical numbers

01:49:13.710 --> 01:49:16.377
because clearly, it's a little
involved moving them up and down.

01:49:16.377 --> 01:49:19.650
But this rack here, this
shelving, essentially

01:49:19.650 --> 01:49:22.753
represents one array with
maybe a second array here.

01:49:22.753 --> 01:49:25.170
And heck, if I really want it,
a third and a fourth array.

01:49:25.170 --> 01:49:27.660
It turns out that with
selection sort and bubble sort,

01:49:27.660 --> 01:49:30.120
we've really been tying
our hands because I only

01:49:30.120 --> 01:49:33.450
allowed myself a constant amount of
memory, just one variable in my head,

01:49:33.450 --> 01:49:35.640
for instance, with selection
sort that let me keep

01:49:35.640 --> 01:49:37.662
track of who was the smallest element.

01:49:37.662 --> 01:49:40.120
And when we did bubble sort,
the only number I kept in mind

01:49:40.120 --> 01:49:42.460
was i, like i and i plus 1.

01:49:42.460 --> 01:49:44.650
I didn't allow myself
any additional memory.

01:49:44.650 --> 01:49:47.740
But it turns out in
programming and in CS, you

01:49:47.740 --> 01:49:50.480
can trade off one resource for another.

01:49:50.480 --> 01:49:53.410
So if you want to spend
less time solving a problem,

01:49:53.410 --> 01:49:55.150
you've got to throw space at it.

01:49:55.150 --> 01:49:57.460
You've got to spend more
space, spend more money,

01:49:57.460 --> 01:50:00.650
in order to give yourself more
space to reduce your time.

01:50:00.650 --> 01:50:04.120
Conversely, if you're fine with
things being slow in terms of time,

01:50:04.120 --> 01:50:06.400
then you can get away
with very little space.

01:50:06.400 --> 01:50:08.950
So it's kind of like
this balance whereby

01:50:08.950 --> 01:50:11.710
you have to decide which is
more important to you, which

01:50:11.710 --> 01:50:15.350
is more expensive for you, or the like.

01:50:15.350 --> 01:50:20.780
So let's go ahead then and consider
exactly this algorithm as follows.

01:50:20.780 --> 01:50:25.580
So suppose that these are
the numbers in question.

01:50:25.580 --> 01:50:26.770
So here is our array.

01:50:26.770 --> 01:50:27.820
It's clearly unsorted.

01:50:27.820 --> 01:50:29.140
I'd like to sort this.

01:50:29.140 --> 01:50:30.340
I could use selection sort.

01:50:30.340 --> 01:50:33.130
But selection sort was not great
because it's big O of n squared.

01:50:33.130 --> 01:50:36.227
And it's omega of n squared,
so sort of damned if you do.

01:50:36.227 --> 01:50:37.060
Damned if you don't.

01:50:37.060 --> 01:50:38.630
Bubble sort was a little better.

01:50:38.630 --> 01:50:39.963
It was still big O of n squared.

01:50:39.963 --> 01:50:42.380
But sometimes, we could get
lucky and shave some time off.

01:50:42.380 --> 01:50:44.680
So it was omega of only n.

01:50:44.680 --> 01:50:47.170
Let's see if merge sort
is fundamentally better.

01:50:47.170 --> 01:50:51.385
And let's do so by trying to reduce the
number of comparisons-- no more looping

01:50:51.385 --> 01:50:54.010
back and forth and back and forth
and back and forth endlessly.

01:50:54.010 --> 01:50:58.210
So here's how we can draw inspiration
from the idea of recursion and divide

01:50:58.210 --> 01:51:00.160
and conquer as per week zero.

01:51:00.160 --> 01:51:02.980
Let's first think of these
numbers as indeed in an array

01:51:02.980 --> 01:51:04.810
contiguously from left to right.

01:51:04.810 --> 01:51:08.590
Let's then go ahead
and sort the left half.

01:51:08.590 --> 01:51:10.750
Because again, the
pseudocode that you have

01:51:10.750 --> 01:51:14.470
to remember throughout this whole
algorithm has just three real steps.

01:51:14.470 --> 01:51:15.670
Sort the left half.

01:51:15.670 --> 01:51:16.930
Sort the right half.

01:51:16.930 --> 01:51:18.160
Merge the sorted halves.

01:51:18.160 --> 01:51:19.540
And then this is sort
of a one-off thing.

01:51:19.540 --> 01:51:20.350
But it's important.

01:51:20.350 --> 01:51:21.555
But this is the juicy part.

01:51:21.555 --> 01:51:22.180
Sort left half.

01:51:22.180 --> 01:51:22.847
Sort right half.

01:51:22.847 --> 01:51:24.070
Merge the sorted halves.

01:51:24.070 --> 01:51:27.310
So if we go back to this array on the
top shelf, here's the original array.

01:51:27.310 --> 01:51:29.320
Let's go ahead and sort the left half.

01:51:29.320 --> 01:51:32.600
And I'm going to do so by sort
of stealing some more memory,

01:51:32.600 --> 01:51:36.160
so I can sort of work on a second
shelf with just these numbers.

01:51:36.160 --> 01:51:39.730
So 6341 is now an array
of size 4, essentially.

01:51:39.730 --> 01:51:42.070
How do I sort an array of size 4?

01:51:42.070 --> 01:51:44.395
Well, I've got an algorithm
for that-- selection sort.

01:51:44.395 --> 01:51:45.520
But we decided that's slow.

01:51:45.520 --> 01:51:46.780
Bubble sort-- that's slow.

01:51:46.780 --> 01:51:47.500
Wait a minute.

01:51:47.500 --> 01:51:49.300
I'm in the middle of
defining merge sort.

01:51:49.300 --> 01:51:52.570
Let's recursively use the
same algorithm by just

01:51:52.570 --> 01:51:55.130
sorting the left half of the left half.

01:51:55.130 --> 01:51:58.335
So if you've got an array of size 4,
it's only three steps to solve it.

01:51:58.335 --> 01:51:58.960
Sort left half.

01:51:58.960 --> 01:51:59.710
Sort right half.

01:51:59.710 --> 01:52:00.400
Merge.

01:52:00.400 --> 01:52:01.540
So let's do that.

01:52:01.540 --> 01:52:03.490
Let's sort the left half.

01:52:03.490 --> 01:52:04.300
OK, wait a minute.

01:52:04.300 --> 01:52:06.728
How do you sort an array of size 2?

01:52:06.728 --> 01:52:08.020
I've got an algorithm for that.

01:52:08.020 --> 01:52:08.710
Sort the left half.

01:52:08.710 --> 01:52:09.543
Sort the right half.

01:52:09.543 --> 01:52:10.870
Merge the two halves.

01:52:10.870 --> 01:52:14.110
All right, let's sort the left half.

01:52:14.110 --> 01:52:15.070
What now?

01:52:15.070 --> 01:52:17.020
So 6 is a list of size 1.

01:52:17.020 --> 01:52:19.580
What was that special base case then?

01:52:19.580 --> 01:52:21.500
So it was quit or just return.

01:52:21.500 --> 01:52:22.530
Like, I'm already done.

01:52:22.530 --> 01:52:26.510
So this list of size 1 is sort
of weirdly already sorted.

01:52:26.510 --> 01:52:27.920
So I'm making progress.

01:52:27.920 --> 01:52:31.070
Meanwhile, what's the next step
after sorting this left half?

01:52:31.070 --> 01:52:32.720
Sort the right half.

01:52:32.720 --> 01:52:33.710
Done.

01:52:33.710 --> 01:52:35.060
This is already sorted.

01:52:35.060 --> 01:52:36.170
But here's the magic.

01:52:36.170 --> 01:52:38.810
What's the third step for these numbers?

01:52:38.810 --> 01:52:39.310
Merge them.

01:52:39.310 --> 01:52:41.310
So this is like the left
hand, right hand thing.

01:52:41.310 --> 01:52:42.260
Which one comes first?

01:52:42.260 --> 01:52:45.380
Obviously, 3 and then 6.

01:52:45.380 --> 01:52:50.638
And now we are making progress because
now the list of size 2 is sorted.

01:52:50.638 --> 01:52:51.680
So what have I just done?

01:52:51.680 --> 01:52:56.400
I've just finished sorting the
left half of the left half.

01:52:56.400 --> 01:53:01.890
So after I've sorted the left half,
what comes next if you rewind in time?

01:53:01.890 --> 01:53:04.870
Sort the right half of that left half.

01:53:04.870 --> 01:53:06.900
So now I take the right half.

01:53:06.900 --> 01:53:09.330
And how do I sort this
right half of size 2?

01:53:09.330 --> 01:53:10.920
Well, sort its left half.

01:53:10.920 --> 01:53:11.610
Done.

01:53:11.610 --> 01:53:12.870
Sort its right half.

01:53:12.870 --> 01:53:13.560
Done.

01:53:13.560 --> 01:53:15.180
Now merge them together.

01:53:15.180 --> 01:53:17.880
And obviously, the 1
comes first, then the 4.

01:53:17.880 --> 01:53:19.110
Now where are we?

01:53:19.110 --> 01:53:22.080
We're at the point in the story
where we have a left left half

01:53:22.080 --> 01:53:26.040
sorted and a right left half sorted.

01:53:26.040 --> 01:53:28.890
So what comes next now narratively?

01:53:28.890 --> 01:53:35.333
Merge the two together, so
left hand, right hand, so 1346.

01:53:35.333 --> 01:53:36.750
And where are we now in the story?

01:53:36.750 --> 01:53:41.220
We're sort of at the beginning
because I've now sorted the left half.

01:53:41.220 --> 01:53:43.320
And you recall the demo
I did physically, this

01:53:43.320 --> 01:53:46.830
is how I had the left half was
already sorted on the second shelf.

01:53:46.830 --> 01:53:50.550
But now that I've sorted the left
half of the original array, what's

01:53:50.550 --> 01:53:53.580
the original second step?

01:53:53.580 --> 01:53:54.820
Sort the right half.

01:53:54.820 --> 01:53:55.900
So it's the same idea.

01:53:55.900 --> 01:53:57.780
So I'm borrowing some extra memory here.

01:53:57.780 --> 01:53:59.370
And now I've got an array of size 4.

01:53:59.370 --> 01:54:00.870
How do I sort an array of size 4?

01:54:00.870 --> 01:54:02.010
Sort the left half.

01:54:02.010 --> 01:54:03.900
All right, how do I
sort an array of size 2?

01:54:03.900 --> 01:54:05.190
Sort the left half.

01:54:05.190 --> 01:54:06.000
Done.

01:54:06.000 --> 01:54:07.230
Sort the right half.

01:54:07.230 --> 01:54:08.040
Done.

01:54:08.040 --> 01:54:08.670
Now what?

01:54:08.670 --> 01:54:09.990
Merge the two together.

01:54:09.990 --> 01:54:12.090
And of course, it's 2 and 5.

01:54:12.090 --> 01:54:13.080
Now what do I do?

01:54:13.080 --> 01:54:16.300
I've sorted the left
half of the right half.

01:54:16.300 --> 01:54:19.830
So now I sort the right
half of the right half.

01:54:19.830 --> 01:54:21.600
How do I sort a list of size 2?

01:54:21.600 --> 01:54:22.680
Well, sort the left half.

01:54:22.680 --> 01:54:23.100
Done.

01:54:23.100 --> 01:54:23.940
Sort the right half.

01:54:23.940 --> 01:54:24.600
Done.

01:54:24.600 --> 01:54:27.540
Merge them together-- 0 and 7.

01:54:27.540 --> 01:54:29.100
Where am I in the story?

01:54:29.100 --> 01:54:33.610
I've sorted the left half and the
right half of the original right half.

01:54:33.610 --> 01:54:34.980
So I merge these two together--

01:54:34.980 --> 01:54:38.730
0 and then 2 and then 5 and then 7.

01:54:38.730 --> 01:54:39.360
Whew.

01:54:39.360 --> 01:54:40.230
Where are we?

01:54:40.230 --> 01:54:43.080
We're at exactly the point in the
story where we had the numbers

01:54:43.080 --> 01:54:44.670
originally on the second shelf.

01:54:44.670 --> 01:54:48.790
We had a list that's sorted on the
left, a list that's sorted on the right.

01:54:48.790 --> 01:54:52.840
And what I demoed physically
was merging those two together.

01:54:52.840 --> 01:54:54.100
So what happens now?

01:54:54.100 --> 01:55:00.130
01234567.

01:55:00.130 --> 01:55:04.030
And it seems kind of magical and kind
of weird in that I kind of cheated.

01:55:04.030 --> 01:55:05.427
And when I had these leaves--

01:55:05.427 --> 01:55:07.510
these leaf nodes, so to
speak-- these singletons--

01:55:07.510 --> 01:55:08.648
I was, like, sorted.

01:55:08.648 --> 01:55:09.940
I wasn't really doing anything.

01:55:09.940 --> 01:55:12.700
But it's that merging that
seems to really be doing

01:55:12.700 --> 01:55:15.350
the magic of sorting things for us.

01:55:15.350 --> 01:55:17.380
So that felt like a mouthful.

01:55:17.380 --> 01:55:19.300
And recursion in general
is the kind of thing

01:55:19.300 --> 01:55:21.440
that will, like, bend
your brain a little bit.

01:55:21.440 --> 01:55:23.565
And if that went over your
head, like, that's fine.

01:55:23.565 --> 01:55:25.600
It takes time and time
and time and practice.

01:55:25.600 --> 01:55:30.550
But what there was not a lot of with
this algorithm was again and again

01:55:30.550 --> 01:55:31.960
and again and again.

01:55:31.960 --> 01:55:35.080
I was kind of only doing things once.

01:55:35.080 --> 01:55:38.150
And then once I fixed a
number, I moved on to the next.

01:55:38.150 --> 01:55:40.630
And that's sort of the essence
of this algorithm here.

01:55:40.630 --> 01:55:43.480
If it helps you to see
it in another visual way,

01:55:43.480 --> 01:55:46.300
let me go back to our
previous visualization here.

01:55:46.300 --> 01:55:48.790
Let me re-randomize the
array and click merge sort.

01:55:48.790 --> 01:55:51.670
And this time, notice that
merge sort is using more space.

01:55:51.670 --> 01:55:54.970
Technically, I was
using 1, 2, 3 shelves.

01:55:54.970 --> 01:55:57.400
But you can actually be slightly
more intelligent about it

01:55:57.400 --> 01:56:00.220
and actually just go back and forth
and back and forth between two shelves

01:56:00.220 --> 01:56:01.390
just to save a little space.

01:56:01.390 --> 01:56:03.320
So you're still using
twice as much space.

01:56:03.320 --> 01:56:06.190
But you don't need four times
as much space as the diagrams

01:56:06.190 --> 01:56:07.670
or as the shelves might imply.

01:56:07.670 --> 01:56:09.040
So here is merge sort.

01:56:09.040 --> 01:56:13.000
And you'll notice that we're sort
of working in halves-- sometimes

01:56:13.000 --> 01:56:15.070
big halves, sometimes smaller halves.

01:56:15.070 --> 01:56:18.370
But you can see as the
two halves are merged,

01:56:18.370 --> 01:56:20.660
things seem to happen very quickly.

01:56:20.660 --> 01:56:23.860
And so notice that this is the
same number of bars as before.

01:56:23.860 --> 01:56:25.630
But that was way faster, right?

01:56:25.630 --> 01:56:29.050
I don't need to stall nearly
as much as I have in the past.

01:56:29.050 --> 01:56:30.260
So why is that?

01:56:30.260 --> 01:56:33.340
Well, if we go back to
the diagram in question,

01:56:33.340 --> 01:56:35.210
here's the array that's already sorted.

01:56:35.210 --> 01:56:38.440
And if we consider now
exactly how much work is done,

01:56:38.440 --> 01:56:41.980
I'll stipulate already it's not
n squared. n squared was slow.

01:56:41.980 --> 01:56:44.480
And merge sort is actually
much better than that.

01:56:44.480 --> 01:56:46.340
Let's see how much better it is.

01:56:46.340 --> 01:56:48.070
So here is the original list--

01:56:48.070 --> 01:56:50.740
63415270.

01:56:50.740 --> 01:56:54.160
And here are all of the remnants of
that algorithm, all of the states

01:56:54.160 --> 01:56:57.520
that we were in at some point-- sort of
leaving a whole bunch of breadcrumbs.

01:56:57.520 --> 01:57:01.300
How many pieces of work did I do?

01:57:01.300 --> 01:57:08.810
Well, like, 1, 2, 3, 4, 5, 6, 7, 8, 9,
10, 11, 12, 13, 14, 15, 16, 17, 18, 19,

01:57:08.810 --> 01:57:11.470
20, 21, 22, 23, 24.

01:57:11.470 --> 01:57:15.370
So I moved things around,
like, 24 times, it seems.

01:57:15.370 --> 01:57:17.830
And how do I actually reason about that?

01:57:17.830 --> 01:57:19.040
Well, these are the numbers.

01:57:19.040 --> 01:57:21.070
This is like my
temporary workspace here.

01:57:21.070 --> 01:57:22.447
And 24 is the literal number.

01:57:22.447 --> 01:57:25.030
But let's see if we can't do
things a little more generically.

01:57:25.030 --> 01:57:27.790
Log base 2 of n, recall,
is what refers to anything

01:57:27.790 --> 01:57:30.492
that we're doing in half--
dividing, dividing, dividing.

01:57:30.492 --> 01:57:32.200
And that's kind of
what I was doing here.

01:57:32.200 --> 01:57:36.850
I took a list of size 8-- divide it
into two of size 4, then four of size 2,

01:57:36.850 --> 01:57:38.320
then eight of size 1.

01:57:38.320 --> 01:57:41.950
Well, how many times can you do this
if you start with eight numbers?

01:57:41.950 --> 01:57:43.600
Well, that's log base 2 of 8.

01:57:43.600 --> 01:57:46.990
And if we do some fancy math, that's
just log base 2 of 2 to the third.

01:57:46.990 --> 01:57:49.520
And remember, the base and the
number here can cancel out.

01:57:49.520 --> 01:57:50.710
So that's actually 3.

01:57:50.710 --> 01:57:55.450
So log base 2 of 8 is 3, which
means that's how many times you

01:57:55.450 --> 01:57:59.800
can divide a problem of size
8 in half, in half, in half.

01:57:59.800 --> 01:58:03.760
But every time we did that per this
chart, I had to take the numbers

01:58:03.760 --> 01:58:06.620
and merge them together, merge
them together, merge them together.

01:58:06.620 --> 01:58:12.010
And so on every row of this
postmortem of the algorithm,

01:58:12.010 --> 01:58:16.270
there are n steps, n steps, n steps.

01:58:16.270 --> 01:58:19.660
So laterally, there's n steps because
I had to merge all of those things

01:58:19.660 --> 01:58:20.570
back together.

01:58:20.570 --> 01:58:23.140
But what is the height
of these yellow remnants?

01:58:23.140 --> 01:58:26.650
Well, it's 3, which is log
base 2 of 8, which is 3.

01:58:26.650 --> 01:58:32.110
So this is technically three
times 8, ergo 24 steps.

01:58:32.110 --> 01:58:39.350
But more generally, this is log n
height and n width, so to speak.

01:58:39.350 --> 01:58:41.770
So the total running
time I claim is actually

01:58:41.770 --> 01:58:46.332
n log n, which it's OK if that doesn't
quite gel immediately in your mind,

01:58:46.332 --> 01:58:48.040
especially if you're
rusty on algorithms.

01:58:48.040 --> 01:58:51.230
But and we can throw away the base
because that's just a constant factor

01:58:51.230 --> 01:58:54.770
and with a wave of our hand when we
talk about big O notation, merge sort,

01:58:54.770 --> 01:59:01.070
I claim is in big O of n log
n-- that is, n times log n.

01:59:01.070 --> 01:59:05.360
Unfortunately, it is
also in omega of n log

01:59:05.360 --> 01:59:09.680
n, which means that, frankly, bubble
sort might sometimes outperform it,

01:59:09.680 --> 01:59:14.600
at least when the inputs are already
sorted or certainly relatively small.

01:59:14.600 --> 01:59:16.790
But that's probably
OK because in general,

01:59:16.790 --> 01:59:18.920
data that we're sorting
probably isn't very sorted.

01:59:18.920 --> 01:59:20.810
And honestly, we could
even half merge sort

01:59:20.810 --> 01:59:24.500
to just do one pass to check
initially, is the whole thing sorted,

01:59:24.500 --> 01:59:26.212
and then maybe terminate early.

01:59:26.212 --> 01:59:29.420
So we can maybe massage the algorithm
a little better to be a little smarter.

01:59:29.420 --> 01:59:34.240
But fundamentally, merge
sort is in theta of n log n,

01:59:34.240 --> 01:59:35.240
is how you would say it.

01:59:35.240 --> 01:59:38.210
It's on the order of
n times log n steps.

01:59:38.210 --> 01:59:42.950
Now, in terms of that chart,
it's strictly higher than linear.

01:59:42.950 --> 01:59:47.550
But it's strictly lower than quadratic--
n and n squared, respectively.

01:59:47.550 --> 01:59:49.530
So it clearly seems to be faster.

01:59:49.530 --> 01:59:51.170
So it's not as good as linear search.

01:59:51.170 --> 01:59:53.212
And it's definitely not
as good as binary search.

01:59:53.212 --> 01:59:58.140
But it's way better than selection
sort or bubble sort actually were.

01:59:58.140 --> 02:00:00.680
So with that said, we
thought we'd conclude

02:00:00.680 --> 02:00:03.140
by showing you a final
film that's just about

02:00:03.140 --> 02:00:08.000
a minute long that compares these actual
algorithms and shows them as follows.

02:00:08.000 --> 02:00:12.230
You'll see on the top, the middle, and
the bottom, three different algorithms.

02:00:12.230 --> 02:00:14.570
On the top, you will see selection sort.

02:00:14.570 --> 02:00:16.850
On the bottom, you will see bubble sort.

02:00:16.850 --> 02:00:23.450
And in the middle, you will see and hear
an appreciation of what n log n is--

02:00:23.450 --> 02:00:25.680
a.k.a., merge sort today.

02:00:25.680 --> 02:00:28.200
So if we could dim the
lights dramatically,

02:00:28.200 --> 02:00:31.471
this is n log n vis a vis n squared.

02:00:31.471 --> 02:00:32.138
[VIDEO PLAYBACK]

02:00:32.138 --> 02:00:35.126
[MUSIC PLAYING]

02:01:37.837 --> 02:01:38.420
[END PLAYBACK]

02:01:38.420 --> 02:01:40.090
All right, that's it for CS50.

02:01:40.090 --> 02:01:41.620
We'll see you next time.

02:01:41.620 --> 02:01:42.520
[APPLAUSE]

02:01:42.520 --> 02:01:45.570
[MUSIC PLAYING]