WEBVTT
X-TIMESTAMP-MAP=LOCAL:00:00:00.000,MPEGTS:900000

00:00:00.000 --> 00:00:04.960
[MUSIC PLAYING]

00:00:49.182 --> 00:00:50.140
DAVID MALAN: All right.

00:00:50.140 --> 00:00:53.260
This is CS50, and this is week 3.

00:00:53.260 --> 00:00:56.560
And you'll recall that last week, we
equipped you with a lot more tools

00:00:56.560 --> 00:00:59.830
by which to solve problems-- not
only problems that we had proposed,

00:00:59.830 --> 00:01:02.110
but problems in your own
code, that is to say bugs.

00:01:02.110 --> 00:01:05.980
And recall that those tools involve
command line tools like help50

00:01:05.980 --> 00:01:09.580
for help with cryptic error messages
that the compiler might spit out;

00:01:09.580 --> 00:01:11.500
style50 which, gives
you a bit of feedback

00:01:11.500 --> 00:01:14.230
on the stylization of your
code, the aesthetics thereof;

00:01:14.230 --> 00:01:18.550
check50, which checks the correctness
of your code against the specifications

00:01:18.550 --> 00:01:21.400
in a given problem set
or lab; printf, which

00:01:21.400 --> 00:01:24.640
is a function that it exists in
some form in almost any programming

00:01:24.640 --> 00:01:27.185
language that you might
ultimately learn, and this

00:01:27.185 --> 00:01:30.310
is simply a way of printing out anything
you might want from the computer's

00:01:30.310 --> 00:01:31.345
memory onto the screen.

00:01:31.345 --> 00:01:33.220
Then perhaps the most
powerful of these tools

00:01:33.220 --> 00:01:36.190
was debug50, which was
this interactive debugger.

00:01:36.190 --> 00:01:40.060
And even though this command debug50
is a little specific to CS50,

00:01:40.060 --> 00:01:42.970
what it triggers to happen,
that little side window

00:01:42.970 --> 00:01:46.150
where you can see the
stack of functions that you

00:01:46.150 --> 00:01:48.550
might have called
during some break point,

00:01:48.550 --> 00:01:51.942
and you can see the local variables that
you might have defined at some point

00:01:51.942 --> 00:01:53.650
during the execution
of your code, that's

00:01:53.650 --> 00:01:58.900
a very common conventional feature of
any debugger with most any language.

00:01:58.900 --> 00:02:02.500
And then lastly, recall there was
this ddb, duck debugger, which

00:02:02.500 --> 00:02:06.490
of course, takes this physical form, if
you happen to have a rubber duck lying

00:02:06.490 --> 00:02:07.870
around with whom you can talk.

00:02:07.870 --> 00:02:10.270
But I'm so pleased to
say that if you lack

00:02:10.270 --> 00:02:16.420
that currently while at home, CS50's
own Kareem and Brenda and Sophie

00:02:16.420 --> 00:02:19.150
have wonderfully added, if
you haven't noticed already,

00:02:19.150 --> 00:02:22.090
that same virtual duck to CS50 IDE.

00:02:22.090 --> 00:02:24.400
So if you click in the top
corner, you can actually

00:02:24.400 --> 00:02:27.328
begin to have a chat of
sorts with the rubber duck.

00:02:27.328 --> 00:02:30.370
And while this is a certainly more
playful incarnation of that same idea,

00:02:30.370 --> 00:02:32.560
we really can't emphasize
enough the value

00:02:32.560 --> 00:02:36.700
of talking through problems when you're
experiencing them in code with someone

00:02:36.700 --> 00:02:38.530
else or with something else.

00:02:38.530 --> 00:02:42.250
This particular duck, not all
that large of a vocabulary,

00:02:42.250 --> 00:02:45.100
but it's not so much what the
other person says but what you say

00:02:45.100 --> 00:02:47.952
and what you hear yourself
saying that is undoubtedly

00:02:47.952 --> 00:02:49.535
the most valuable part of the process.

00:02:49.535 --> 00:02:53.560
So our thanks to Kareem and
Brenda and Sophie on that.

00:02:53.560 --> 00:02:58.000
Recall last week, 2, that we
took a look underneath the hood,

00:02:58.000 --> 00:03:02.020
literally in some sense, at the
computer's memory in your laptop

00:03:02.020 --> 00:03:03.040
or desktop or phone.

00:03:03.040 --> 00:03:06.160
And then we decided to
think about this more

00:03:06.160 --> 00:03:09.100
artistically as just a grid of bytes.

00:03:09.100 --> 00:03:11.470
So within that chip, there's
a whole bunch of bits.

00:03:11.470 --> 00:03:14.620
And if you look at eight of them at a
time, there's a whole bunch of bytes.

00:03:14.620 --> 00:03:16.150
And it stands to reason
that we could think

00:03:16.150 --> 00:03:18.790
of this as the first byte, the
second byte, the third byte,

00:03:18.790 --> 00:03:21.360
and so forth, and sort of
chop this up pictorially

00:03:21.360 --> 00:03:25.790
into just a whole sequence of
bytes in the computer's memory.

00:03:25.790 --> 00:03:30.160
And recall that if we zoom in on that
and focus on just one continuous block

00:03:30.160 --> 00:03:33.500
of memory, otherwise
known as an "array,"

00:03:33.500 --> 00:03:37.640
we can do things within this array like
storing a bunch of different values.

00:03:37.640 --> 00:03:41.110
So recall last week, we
started by defining a little--

00:03:41.110 --> 00:03:44.710
goofily, multiple variables that
were almost identically names,

00:03:44.710 --> 00:03:47.470
like scores1, scores2, scores3.

00:03:47.470 --> 00:03:51.190
And then we began to clean up the design
of our code by introducing an array,

00:03:51.190 --> 00:03:55.540
so we can have just one variable
called scores, that is of size 3

00:03:55.540 --> 00:03:57.860
and has room for multiple values.

00:03:57.860 --> 00:04:02.320
So today, we'll continue to leverage
this feature of many programming

00:04:02.320 --> 00:04:04.570
languages-- being able to
store things continuously,

00:04:04.570 --> 00:04:07.000
back to back to back to
back, in a computer's memory,

00:04:07.000 --> 00:04:11.050
because this very simple layout, this
very simple feature of the language,

00:04:11.050 --> 00:04:13.270
is going to open up all
sorts of powerful features.

00:04:13.270 --> 00:04:17.410
And in fact, we can even
revisit some of the problems

00:04:17.410 --> 00:04:19.990
we tried to solve way back in week 0.

00:04:19.990 --> 00:04:21.700
But there is a catch with arrays.

00:04:21.700 --> 00:04:25.405
And we didn't really
emphasize this much last week.

00:04:25.405 --> 00:04:27.280
And that's because, even
though you and I can

00:04:27.280 --> 00:04:30.363
glance at this picture on the screen
and see immediately that, oh, there's

00:04:30.363 --> 00:04:33.640
seven boxes on the screen, there
are seven locations in which you

00:04:33.640 --> 00:04:37.870
can store values, you and I can sort of
have this bird's eye view of everything

00:04:37.870 --> 00:04:41.380
and just see what's inside
that entire array all at once.

00:04:41.380 --> 00:04:45.370
But computers, recall, are much
more methodical, more algorithmic,

00:04:45.370 --> 00:04:46.250
if you will.

00:04:46.250 --> 00:04:48.790
And so a computer, as
powerful as they are,

00:04:48.790 --> 00:04:51.945
can technically only look at one
location in an array at a time.

00:04:51.945 --> 00:04:54.070
So whereas you and I can
glance at this and sort of

00:04:54.070 --> 00:04:57.370
take it all in at once a computer
just can't glance at its memory

00:04:57.370 --> 00:05:00.040
and take in all at once
all of the values therein,

00:05:00.040 --> 00:05:04.270
it has to do so more methodically, for
instance, from left to right, maybe

00:05:04.270 --> 00:05:06.950
right to left, maybe middle onward.

00:05:06.950 --> 00:05:08.990
But it has to be an algorithm.

00:05:08.990 --> 00:05:12.850
And so today we'll formalize that notion
and really kind of hide the fact that

00:05:12.850 --> 00:05:16.840
this array cannot be seen all at once,
you can only look at one location

00:05:16.840 --> 00:05:19.090
in an array at a given time.

00:05:19.090 --> 00:05:21.160
And this is going to have
very real implications.

00:05:21.160 --> 00:05:23.820
For instance, if we consider
that very first problem

00:05:23.820 --> 00:05:27.430
in the very first week where we tried
to find my phone number in a phone book,

00:05:27.430 --> 00:05:30.640
the very naive approach was to start
at the beginning and search from left

00:05:30.640 --> 00:05:31.180
to right.

00:05:31.180 --> 00:05:33.055
And we tried a couple
of variants thereafter.

00:05:33.055 --> 00:05:35.590
But the problem, quite
simply, is that of searching.

00:05:35.590 --> 00:05:37.690
And this is a term of
art in computer science,

00:05:37.690 --> 00:05:40.540
super common, certainly for
you and I as users on Google

00:05:40.540 --> 00:05:42.830
and the like to search
for things all day long.

00:05:42.830 --> 00:05:46.690
And so certainly searching well,
designing a search algorithm well,

00:05:46.690 --> 00:05:49.710
is certainly a compelling feature
of so many of today's tools

00:05:49.710 --> 00:05:50.723
that you and I use.

00:05:50.723 --> 00:05:52.890
So if we think of this
really as a problem to solve,

00:05:52.890 --> 00:05:56.430
we've got some input, which, for
instance, might be an array of numbers,

00:05:56.430 --> 00:05:59.640
or maybe an array of web
pages in the case of Google.

00:05:59.640 --> 00:06:01.270
And the goal is to get some output.

00:06:01.270 --> 00:06:05.340
So if the input to the problem is
an array of values, the output,

00:06:05.340 --> 00:06:09.630
hopefully, is going to be something
as simple, really, as a bool--

00:06:09.630 --> 00:06:10.650
yes or no.

00:06:10.650 --> 00:06:15.180
Is the value you're
looking for discoverable?

00:06:15.180 --> 00:06:20.130
Can you search for and find that
value, yes or no, true or false?

00:06:20.130 --> 00:06:23.190
Now, within this black box, recall,
is going to be some algorithm.

00:06:23.190 --> 00:06:25.530
And that's where today we'll
spend most of our time.

00:06:25.530 --> 00:06:28.680
Indeed, we won't really introduce
that many more features of C.

00:06:28.680 --> 00:06:30.480
We won't introduce that much more code.

00:06:30.480 --> 00:06:33.930
We'll focus again on ideas, just
taking for granted now that you

00:06:33.930 --> 00:06:35.670
have some more tools in your toolkit.

00:06:35.670 --> 00:06:38.100
Beyond loops and conditions
and Boolean expressions,

00:06:38.100 --> 00:06:40.830
we now have this other
tool known as arrays.

00:06:40.830 --> 00:06:45.060
But let's first introduce some other
terms of art, some jargon if you will,

00:06:45.060 --> 00:06:46.990
related to what we'll call running time.

00:06:46.990 --> 00:06:48.780
So we've alluded to this a few times.

00:06:48.780 --> 00:06:51.510
When we're thinking about just
how good or bad an algorithm is,

00:06:51.510 --> 00:06:53.490
we describe how long it takes to run.

00:06:53.490 --> 00:06:54.540
That is its running time.

00:06:54.540 --> 00:06:57.330
The running time of an
algorithm is how long it takes--

00:06:57.330 --> 00:07:00.820
how many steps it takes, how many
seconds it takes, how many iterations

00:07:00.820 --> 00:07:01.320
it takes.

00:07:01.320 --> 00:07:04.140
It doesn't really matter
what your unit of measure is.

00:07:04.140 --> 00:07:07.350
Maybe it's time, maybe it's
iterations or something else.

00:07:07.350 --> 00:07:10.422
But running time just refers to
how long does an algorithm take.

00:07:10.422 --> 00:07:13.380
And there are ways that we can think
about this a little more formally.

00:07:13.380 --> 00:07:16.020
And we kind of did this
already in the first week,

00:07:16.020 --> 00:07:19.320
but we didn't give it this
name, this italicized O,

00:07:19.320 --> 00:07:23.220
this capital O on the screen, is
otherwise known as Big O notation.

00:07:23.220 --> 00:07:25.470
And computer scientists
and some mathematicians

00:07:25.470 --> 00:07:28.410
will very frequently use,
literally, this symbol

00:07:28.410 --> 00:07:31.410
to describe the running
times of algorithms,

00:07:31.410 --> 00:07:33.310
or mathematically like a function.

00:07:33.310 --> 00:07:34.720
So recall this picture, in fact.

00:07:34.720 --> 00:07:38.190
When we were searching that phone book,
we did it sort of good, better, best.

00:07:38.190 --> 00:07:41.800
We did it linearly-- that is,
searching one page at a time,

00:07:41.800 --> 00:07:44.940
we did it twice as fast by
doing two pages at a time--

00:07:44.940 --> 00:07:48.690
and then we did it logarithmically
by dividing and conquering,

00:07:48.690 --> 00:07:50.060
in half and half and half.

00:07:50.060 --> 00:07:52.560
And at the time, I proposed
that if we think of a phone book

00:07:52.560 --> 00:07:57.630
as having n pages, where n is just a
number in computer science vernacular,

00:07:57.630 --> 00:08:00.300
we might describe the running
time, or the number of steps

00:08:00.300 --> 00:08:04.350
involved for that first algorithm, as
being maybe in the worst case n steps.

00:08:04.350 --> 00:08:06.780
If the person you're looking
for in a phone book maybe

00:08:06.780 --> 00:08:10.297
alphabetically has the last
name starting with Z in English,

00:08:10.297 --> 00:08:12.880
well, the Z might be all the way
at the end of the phone book.

00:08:12.880 --> 00:08:15.450
So at the worst case, you
might be taking n steps

00:08:15.450 --> 00:08:18.090
to find someone like
myself in that phone book.

00:08:18.090 --> 00:08:20.400
The second algorithm,
though, was twice as fast,

00:08:20.400 --> 00:08:22.510
because we went two pages at a time.

00:08:22.510 --> 00:08:24.985
So we might describe its
running time as n divided by 2.

00:08:24.985 --> 00:08:28.110
And then the third algorithm, where we
divided the problem in half and half

00:08:28.110 --> 00:08:31.680
and half, literally throwing half
of the problem away again and again,

00:08:31.680 --> 00:08:35.760
was logarithmic-- technically log
base 2 of n, which, again, is just

00:08:35.760 --> 00:08:39.390
a mathematical formula that
refers to halving something again

00:08:39.390 --> 00:08:40.289
and again and again.

00:08:40.289 --> 00:08:43.690
And you start with, of course,
n pages in that scenario.

00:08:43.690 --> 00:08:46.560
Well, it turns out that a
computer scientist would actually

00:08:46.560 --> 00:08:49.027
wave their hands at some of
these mathematical details.

00:08:49.027 --> 00:08:51.360
Indeed, we're not going to
get into the habit of writing

00:08:51.360 --> 00:08:53.517
very precise mathematical formulas.

00:08:53.517 --> 00:08:55.350
What we're instead going
to do is try to get

00:08:55.350 --> 00:08:59.850
a sense of the order on which
the running time of an algorithm

00:08:59.850 --> 00:09:03.870
is, just roughly how fast
or how slow it is, but still

00:09:03.870 --> 00:09:06.210
using some symbology
like n as a placeholder.

00:09:06.210 --> 00:09:09.480
And so a computer scientist would
describe the running time of all three

00:09:09.480 --> 00:09:14.820
of those algorithms from week 0 as
being big O of n, or big O of n/2,

00:09:14.820 --> 00:09:17.040
or big O of log base 2 of n.

00:09:17.040 --> 00:09:18.900
So "big O" just means "on the order of."

00:09:18.900 --> 00:09:20.520
It's sort of a wave of the hand.

00:09:20.520 --> 00:09:24.900
Maybe it's n minus 1, maybe it's
n plus 1, maybe it's even 2n.

00:09:24.900 --> 00:09:28.240
But it's on the order of
n or these other values.

00:09:28.240 --> 00:09:30.468
But in fact, too,
notice that this chart,

00:09:30.468 --> 00:09:33.510
there's something kind of curious. ,
Like these first two algorithms from

00:09:33.510 --> 00:09:36.870
week 0 kind of pictorially
look pretty much the same.

00:09:36.870 --> 00:09:39.137
Like undoubtedly, the yellow
line is a little lower

00:09:39.137 --> 00:09:41.970
and therefore a little better and
a little faster than the red line.

00:09:41.970 --> 00:09:43.200
But they have the same shape.

00:09:43.200 --> 00:09:47.370
And in fact, I bet if we zoomed
way out, these two straight lines

00:09:47.370 --> 00:09:49.380
would pretty much look identical.

00:09:49.380 --> 00:09:53.412
If you change your axis to be
big enough and tall enough,

00:09:53.412 --> 00:09:54.870
these would start to blur together.

00:09:54.870 --> 00:09:57.880
But clearly, the green line
is fundamentally different.

00:09:57.880 --> 00:10:01.020
And so this speaks to a
computer scientist's tendency

00:10:01.020 --> 00:10:03.870
to not really quibble
over these details.

00:10:03.870 --> 00:10:06.977
Like, yes, the second
algorithm in week 0 was better.

00:10:06.977 --> 00:10:08.310
Yes, this yellow line is better.

00:10:08.310 --> 00:10:12.540
But, eh, let's just call both of
those algorithms running times

00:10:12.540 --> 00:10:13.800
on the order of n.

00:10:13.800 --> 00:10:18.390
That is to say, a computer scientist
tends to throw away constant factors,

00:10:18.390 --> 00:10:20.840
like the 1/2 or the divided by 2.

00:10:20.840 --> 00:10:23.250
And they tend to focus only
on the dominant factor,

00:10:23.250 --> 00:10:27.810
like which value in that mathematical
expression is going to grow the most,

00:10:27.810 --> 00:10:28.890
grow the fastest.

00:10:28.890 --> 00:10:32.280
And n divided by 2n it's
going to dominate over time.

00:10:32.280 --> 00:10:34.830
The bigger the phone book
gets, the more pages you have.

00:10:34.830 --> 00:10:38.238
It's really n that's going to matter
less so than that divided by 2.

00:10:38.238 --> 00:10:39.280
And same thing over here.

00:10:39.280 --> 00:10:41.700
If you're familiar with and
remember your logarithms,

00:10:41.700 --> 00:10:45.120
we don't really have to even care
about the base of that logarithm.

00:10:45.120 --> 00:10:48.940
Yes, it's base 2, but eh, we
can just multiply that logarithm

00:10:48.940 --> 00:10:51.490
by some other number to convert
it to any base we want--

00:10:51.490 --> 00:10:54.040
base 10, base 3, base 7, anything.

00:10:54.040 --> 00:10:55.960
So let's just say it's
on the order of log n.

00:10:55.960 --> 00:10:57.760
So this is good, because
it means we're not really

00:10:57.760 --> 00:10:59.815
going to waste time getting
really into the weeds

00:10:59.815 --> 00:11:02.440
mathematically when we talk about
the efficiency of algorithms.

00:11:02.440 --> 00:11:07.580
It suffices to describe things really in
terms of the variable, n in this case,

00:11:07.580 --> 00:11:09.370
if you will, that dominates over time.

00:11:09.370 --> 00:11:10.510
And indeed, let's zoom out.

00:11:10.510 --> 00:11:12.910
If I zoom out on this
picture, boom, you begin

00:11:12.910 --> 00:11:16.405
to see that, yeah, these are really
starting to look almost identical.

00:11:16.405 --> 00:11:20.140
And if we kept zooming out, you
would see that they're essentially

00:11:20.140 --> 00:11:21.040
one in the same.

00:11:21.040 --> 00:11:25.420
But the green one stands out, so
that's indeed on the order of log of n

00:11:25.420 --> 00:11:26.768
as opposed to n itself.

00:11:26.768 --> 00:11:28.060
So here's a little cheat sheet.

00:11:28.060 --> 00:11:30.280
It turns out that
within computer science,

00:11:30.280 --> 00:11:33.250
and within the analysis
of algorithms, we're

00:11:33.250 --> 00:11:36.410
going to tend to see some
common formulas like this.

00:11:36.410 --> 00:11:38.290
So we've just seen on the order of n.

00:11:38.290 --> 00:11:39.880
We've seen on the order of log n.

00:11:39.880 --> 00:11:42.850
It turns out that the very
common two is going to be n times

00:11:42.850 --> 00:11:46.677
log n, maybe even n squared,
and then even big O of 1.

00:11:46.677 --> 00:11:48.760
And the last of those just
means that an algorithm

00:11:48.760 --> 00:11:50.800
takes, wonderfully, one step--

00:11:50.800 --> 00:11:55.420
or maybe two steps, maybe even 10
steps, but a constant number of steps.

00:11:55.420 --> 00:11:58.780
So that's sort of the best case
scenario, at least among these options.

00:11:58.780 --> 00:12:01.820
Whereas, n squared is going
to start to take a long time.

00:12:01.820 --> 00:12:03.820
It's going to start to
feel slow, because if you

00:12:03.820 --> 00:12:06.070
take any value of n
and square it, that's

00:12:06.070 --> 00:12:08.510
going to imply more and more steps.

00:12:08.510 --> 00:12:10.998
So just a bit of jargon,
then, to start off today,

00:12:10.998 --> 00:12:12.790
whereby we now have
this sort of vocabulary

00:12:12.790 --> 00:12:16.750
with which to describe the running times
of an algorithm in terms of this Big O

00:12:16.750 --> 00:12:17.560
notation.

00:12:17.560 --> 00:12:19.240
But there's one other notation.

00:12:19.240 --> 00:12:22.960
And just as big O
refers to an upper bound

00:12:22.960 --> 00:12:25.930
on running times, like
how many steps maximally,

00:12:25.930 --> 00:12:27.970
how much time maximally
might an algorithm take,

00:12:27.970 --> 00:12:31.730
this omega notation
refers to the opposite.

00:12:31.730 --> 00:12:34.293
What's a lower bound on the
running time of an algorithm?

00:12:34.293 --> 00:12:36.460
And we don't need another
picture or other formulas.

00:12:36.460 --> 00:12:37.740
We can reuse the same one.

00:12:37.740 --> 00:12:40.150
So this cheat sheet
here just proposes that,

00:12:40.150 --> 00:12:43.850
when describing the efficiency
or inefficiency of an algorithm

00:12:43.850 --> 00:12:45.850
and you want to come
up with a lower bound--

00:12:45.850 --> 00:12:48.630
like minimally, how many
steps does my algorithm take--

00:12:48.630 --> 00:12:50.380
we can use the same
mathematical formulas,

00:12:50.380 --> 00:12:54.340
but we can note that with
omega instead of big O.

00:12:54.340 --> 00:12:56.620
So again, looks fancy,
but it really just refers

00:12:56.620 --> 00:13:01.090
to a wave of the hand trying to sort of
ballpark exactly what the running time

00:13:01.090 --> 00:13:02.600
is of an algorithm.

00:13:02.600 --> 00:13:05.080
And thankfully, we've seen
a few algorithms already,

00:13:05.080 --> 00:13:08.650
including in that week 0, and now we're
going to give it a more formal name.

00:13:08.650 --> 00:13:11.750
Linear search is what we
did with that phone book

00:13:11.750 --> 00:13:15.010
first off by searching
it page by page by page,

00:13:15.010 --> 00:13:18.710
looking for my phone number
in that particular example.

00:13:18.710 --> 00:13:23.412
And so the difference today
is that, unlike us humans, who

00:13:23.412 --> 00:13:26.620
can look down at a phone book page and
see a whole bunch of names and numbers

00:13:26.620 --> 00:13:30.490
at once, unlike a human who can look
at an array on the board a moment ago

00:13:30.490 --> 00:13:32.530
and sort of see everything
at once, we need

00:13:32.530 --> 00:13:34.630
to be more methodical,
more deliberate today so

00:13:34.630 --> 00:13:39.160
that we can translate week 0's
ideas now, not into even pseudocode,

00:13:39.160 --> 00:13:40.510
but actual C code.

00:13:40.510 --> 00:13:43.810
And so wonderfully, here at
the American Repertory Theater

00:13:43.810 --> 00:13:46.280
as we are on Harvard's
campus this semester,

00:13:46.280 --> 00:13:48.610
we've been collaborating
with the whole team

00:13:48.610 --> 00:13:51.670
here who are much more artistically
inclined than certainly I

00:13:51.670 --> 00:13:52.900
could be on my own here.

00:13:52.900 --> 00:13:55.150
And we have these seven
wonderful doors that

00:13:55.150 --> 00:13:58.690
were previously used in various
theatrical shows that took place here

00:13:58.690 --> 00:13:59.680
in this theater.

00:13:59.680 --> 00:14:03.130
And we've even collaborated
with the theater's prop shop,

00:14:03.130 --> 00:14:07.030
who in back have wonderfully
manufactured some delightful numbers

00:14:07.030 --> 00:14:08.300
and brought them to life.

00:14:08.300 --> 00:14:12.437
Which is to say that, behind each
of these seven doors is a number.

00:14:12.437 --> 00:14:15.520
And this is going to be an opportunity
now to really hammer home the point

00:14:15.520 --> 00:14:18.430
that when we want to search
for some number in an array,

00:14:18.430 --> 00:14:22.900
it's pretty equivalent to having to
search for a number, in this case,

00:14:22.900 --> 00:14:25.210
behind an otherwise closed door.

00:14:25.210 --> 00:14:27.190
You and I can't just look
at all of these doors

00:14:27.190 --> 00:14:29.020
now and figure out where a number is.

00:14:29.020 --> 00:14:30.430
We have to be more methodical.

00:14:30.430 --> 00:14:32.440
We have to start searching
these doors, maybe

00:14:32.440 --> 00:14:35.688
from left to right, maybe from right
to left, maybe from the middle on out.

00:14:35.688 --> 00:14:38.980
But we need to come up with an algorithm
and ultimately translate that to code.

00:14:38.980 --> 00:14:43.630
So for instance, suppose I were
to search for the number 0.

00:14:43.630 --> 00:14:48.700
How could we go about searching,
methodically, these seven wooden doors

00:14:48.700 --> 00:14:50.200
for the number 0?

00:14:50.200 --> 00:14:54.310
Let me take a suggestion
from the audience.

00:14:54.310 --> 00:14:56.620
What approach might you take?

00:14:56.620 --> 00:15:02.540
What first step would you propose I
take here on my own with these doors?

00:15:02.540 --> 00:15:04.540
Any recommendations?

00:15:04.540 --> 00:15:07.030
How do I begin to find
myself the number 0?

00:15:07.030 --> 00:15:09.280
Florence, what do you propose?

00:15:09.280 --> 00:15:13.523
AUDIENCE: I would propose starting form
the left, since 0 is a smaller number.

00:15:13.523 --> 00:15:14.440
DAVID MALAN: OK, good.

00:15:14.440 --> 00:15:15.880
And hang in there for
with me for just a moment.

00:15:15.880 --> 00:15:18.630
Let me go ahead and started on the
left edge as Florence proposes.

00:15:18.630 --> 00:15:21.040
Go ahead and open the door,
and hopefully, voila--

00:15:21.040 --> 00:15:21.850
no.

00:15:21.850 --> 00:15:23.020
It's a number 4.

00:15:23.020 --> 00:15:23.920
So it's not a 0.

00:15:23.920 --> 00:15:27.150
So Florence, what would
you propose I do next?

00:15:27.150 --> 00:15:31.790
AUDIENCE: I'd probably start
in the middle somewhere,

00:15:31.790 --> 00:15:35.533
if, like, in case, I don't
know, it's going down by 1.

00:15:35.533 --> 00:15:36.200
DAVID MALAN: OK.

00:15:36.200 --> 00:15:37.367
So maybe it's going down.

00:15:37.367 --> 00:15:38.700
So let me go ahead and try that.

00:15:38.700 --> 00:15:41.690
So you propose middle, I could
go over here, and voila--

00:15:41.690 --> 00:15:42.620
nope.

00:15:42.620 --> 00:15:44.360
That's the number 2.

00:15:44.360 --> 00:15:47.820
And I wonder, where else should I look.

00:15:47.820 --> 00:15:49.070
Let me-- I'm a little curious.

00:15:49.070 --> 00:15:50.570
I'm a little nervous that
I ignored these doors.

00:15:50.570 --> 00:15:53.320
So Florence, if you don't mind,
let's go ahead and look here and--

00:15:53.320 --> 00:15:56.240
no, that's the number 6, it seems.

00:15:56.240 --> 00:15:59.690
Let's go ahead and check
in here, the number 8.

00:15:59.690 --> 00:16:02.670
So they're kind of going up and down.

00:16:02.670 --> 00:16:05.150
So Florence, how might I finish
searching for this number?

00:16:05.150 --> 00:16:08.420
What remains to be done, would you say?

00:16:08.420 --> 00:16:10.490
AUDIENCE: Probably start
from the right now.

00:16:10.490 --> 00:16:11.157
DAVID MALAN: OK.

00:16:11.157 --> 00:16:14.420
So I could start from the right
now, and maybe just go over here.

00:16:14.420 --> 00:16:16.550
And voila-- and there it is.

00:16:16.550 --> 00:16:17.690
So we found the number 0.

00:16:17.690 --> 00:16:19.820
So let me ask Florence,
what was your algorithm?

00:16:19.820 --> 00:16:23.060
How did you go about so successfully
finding the number 0 for us?

00:16:26.130 --> 00:16:30.990
AUDIENCE: I guess I initially tried
starting, like, by going down by 1.

00:16:30.990 --> 00:16:36.840
So like, if the number
was not at the left,

00:16:36.840 --> 00:16:40.490
then going to the center, which
is, like, halfway in between

00:16:40.490 --> 00:16:41.780
and then going to [INAUDIBLE].

00:16:41.780 --> 00:16:42.410
I don't know.

00:16:42.410 --> 00:16:45.780
DAVID MALAN: And playfully, how did that
work out for you, going to the middle?

00:16:45.780 --> 00:16:48.420
Better or worse, no different?

00:16:48.420 --> 00:16:52.290
AUDIENCE: I mean, I guess
maybe it helped a little bit

00:16:52.290 --> 00:16:54.083
to then go all the way to the right.

00:16:54.083 --> 00:16:54.750
DAVID MALAN: OK.

00:16:54.750 --> 00:16:56.370
Yeah, we might have
gleaned some information.

00:16:56.370 --> 00:16:59.203
But let's go ahead and take a look
at all of the doors for a moment.

00:16:59.203 --> 00:17:00.720
There's that 4 and the 6 again.

00:17:00.720 --> 00:17:02.400
Here is that 8 again.

00:17:02.400 --> 00:17:04.920
Over in the middle we have the 2 again.

00:17:04.920 --> 00:17:08.490
Over here we have a
7 for the first time.

00:17:08.490 --> 00:17:10.109
Over here we have a 5.

00:17:10.109 --> 00:17:11.996
And then of course, we have a 0.

00:17:11.996 --> 00:17:14.579
And if you took all of that in,
honestly, Florence, you and I,

00:17:14.579 --> 00:17:16.079
we couldn't really have done any better.

00:17:16.079 --> 00:17:18.130
Because these door-- these
numbers, it turns out,

00:17:18.130 --> 00:17:20.310
are just randomly arranged
behind these doors.

00:17:20.310 --> 00:17:23.460
So it wasn't bad at all that
you kind of hopped around.

00:17:23.460 --> 00:17:26.060
Although, the downside
is if you hop around,

00:17:26.060 --> 00:17:29.642
you and I as humans can pretty easily
remember where we've been before.

00:17:29.642 --> 00:17:32.100
But if you think about how we
would translate that to code,

00:17:32.100 --> 00:17:34.500
I feel like we're starting to
accumulate a bunch of variables

00:17:34.500 --> 00:17:36.417
maybe, because you have
to keep track of that.

00:17:36.417 --> 00:17:40.140
So frankly, maybe the
simplest solution-- whoops--

00:17:40.140 --> 00:17:44.460
maybe the simplest solution would
have been where we started in week 0,

00:17:44.460 --> 00:17:48.540
where we just take a very simple
if naive approach of starting

00:17:48.540 --> 00:17:52.920
with our array, this time of size
7, behind which are some numbers.

00:17:52.920 --> 00:17:55.530
And if you don't know
anything about those numbers,

00:17:55.530 --> 00:18:01.260
honestly the best you can do is just
that same linear search from week 0,

00:18:01.260 --> 00:18:05.190
and just check, one at a time, the
values behind each of these doors

00:18:05.190 --> 00:18:08.940
and just hope that
eventually you will find it.

00:18:08.940 --> 00:18:12.360
So this is already sort of
taking a lot of time, right?

00:18:12.360 --> 00:18:16.090
If I do this linear search
approach like I did in week 0,

00:18:16.090 --> 00:18:18.840
I'm potentially going to have to
search behind all of those doors.

00:18:18.840 --> 00:18:21.090
I'm going to have to search
behind all of those doors.

00:18:21.090 --> 00:18:24.485
So let's consider a little more formally
exactly how I could at least implement

00:18:24.485 --> 00:18:25.110
that algorithm.

00:18:25.110 --> 00:18:26.730
Because I could take the
approach that Florence

00:18:26.730 --> 00:18:29.105
proposed, and just kind of
jumping around and maybe using

00:18:29.105 --> 00:18:30.030
a bit of intuition.

00:18:30.030 --> 00:18:31.530
But again, that's not
really an algorithm.

00:18:31.530 --> 00:18:33.572
We really need to do
something more step by step.

00:18:33.572 --> 00:18:36.447
And in the meantime, let's go ahead,
Joe, and let's close the curtain

00:18:36.447 --> 00:18:39.300
and see if we can't clean those up
with another problem in a moment,

00:18:39.300 --> 00:18:43.180
while we consider now linear
search and the analysis thereof.

00:18:43.180 --> 00:18:46.650
So with linear search,
I would propose that we

00:18:46.650 --> 00:18:50.160
could implement it in pseudocode
first, if you will, like this.

00:18:50.160 --> 00:18:53.280
For i from 0 to n minus 1--

00:18:53.280 --> 00:18:57.090
we'll see where we're going with this--
if the number is behind the i-th door,

00:18:57.090 --> 00:19:02.110
return true, otherwise at
the very end return false.

00:19:02.110 --> 00:19:05.280
So it's a relatively simple
translation into pseudocode,

00:19:05.280 --> 00:19:08.130
much like we did with the
phone book some time ago.

00:19:08.130 --> 00:19:10.260
And why, though, these values?

00:19:10.260 --> 00:19:13.200
Because I'm now starting to express
myself a little more like C,

00:19:13.200 --> 00:19:14.770
even though it's still pseudocode.

00:19:14.770 --> 00:19:16.800
So for i from 0 to n minus 1.

00:19:16.800 --> 00:19:19.320
So computer scientists tend
to start counting from 0.

00:19:19.320 --> 00:19:25.230
If there's n doors, or 7 doors in this
case, you want to go from 0 on up to 6,

00:19:25.230 --> 00:19:27.930
or from 0 on up to n minus 1.

00:19:27.930 --> 00:19:31.980
So this is just a very common way of
setting yourself up with a for loop,

00:19:31.980 --> 00:19:34.320
maybe in C, maybe in
pseudocode in this case, that

00:19:34.320 --> 00:19:38.140
just gets you from left to right,
algorithmically step by step.

00:19:38.140 --> 00:19:41.530
If a condition, number
is behind the i-th door--

00:19:41.530 --> 00:19:43.710
i-th just being a
colloquial way of saying,

00:19:43.710 --> 00:19:45.960
what is behind the door at location i--

00:19:45.960 --> 00:19:47.160
go ahead and return true.

00:19:47.160 --> 00:19:50.640
I have found myself the number I
want, for instance, the number 0.

00:19:50.640 --> 00:19:54.640
And then notice that this return
false is not part of an else,

00:19:54.640 --> 00:19:58.710
because I don't want to abort this
algorithm prematurely and abort simply

00:19:58.710 --> 00:20:01.950
because a number is not
behind the current door.

00:20:01.950 --> 00:20:05.820
I essentially want to wait all the
way to the end of the algorithm,

00:20:05.820 --> 00:20:10.080
after I've checked all n doors,
and if I have still not found

00:20:10.080 --> 00:20:14.150
the number I care about, then and
only then am I going to return false.

00:20:14.150 --> 00:20:15.900
So a very common
programming mistake might

00:20:15.900 --> 00:20:19.920
be to nest this internally and think
about things in terms of ifs and elses.

00:20:19.920 --> 00:20:21.600
But you don't need to have an else.

00:20:21.600 --> 00:20:24.900
This is kind of a catchall
here at the very end.

00:20:24.900 --> 00:20:29.460
But now let's consider, if this is the
pseudocode for linear search, just what

00:20:29.460 --> 00:20:32.730
is the efficiency of linear search?

00:20:32.730 --> 00:20:35.650
What is the efficiency of
linear search, which is to say,

00:20:35.650 --> 00:20:37.680
how well-designed is this algorithm?

00:20:37.680 --> 00:20:40.260
We put or gave ourselves
a framework a moment ago,

00:20:40.260 --> 00:20:43.500
Big O notation, which is an upper
bound, which we can think of for now

00:20:43.500 --> 00:20:45.060
as meaning like a worst case.

00:20:45.060 --> 00:20:49.380
In the worst case, how many steps
might it take me to find the number 0--

00:20:49.380 --> 00:20:51.030
or any number for that matter--

00:20:51.030 --> 00:20:52.800
among n doors?

00:20:52.800 --> 00:20:57.060
Is it big O of n squared, big
O of n times log n, big O of n,

00:20:57.060 --> 00:21:00.360
big O of log n, or big O of
one, which, again, just means

00:21:00.360 --> 00:21:03.290
a constant fixed number of steps?

00:21:03.290 --> 00:21:06.670
Brian, could we go ahead
and pull up this question?

00:21:06.670 --> 00:21:09.180
Let me go ahead and pull
it up on my screen as well.

00:21:09.180 --> 00:21:14.250
If you go to our usual URL to
propose what you think an upper bound

00:21:14.250 --> 00:21:18.210
is on the running time of linear search.

00:21:18.210 --> 00:21:18.710
OK.

00:21:18.710 --> 00:21:20.940
Indeed, if we consider now the
running time of linear search,

00:21:20.940 --> 00:21:22.107
it's going to be big O of n.

00:21:22.107 --> 00:21:22.678
Why is that?

00:21:22.678 --> 00:21:24.720
So in the worst case, the
number I'm looking for,

00:21:24.720 --> 00:21:27.480
0, might very well be at
the end of that list, which

00:21:27.480 --> 00:21:31.087
is going to be on the order of n steps,
or in this case precisely n steps.

00:21:31.087 --> 00:21:32.670
So that's one way to think about this.

00:21:32.670 --> 00:21:34.590
Well, now let me ask
a follow-up question.

00:21:34.590 --> 00:21:39.090
Proposing instead that we consider
omega notation, which is a lower bound

00:21:39.090 --> 00:21:40.800
on the running time of an algorithm--

00:21:40.800 --> 00:21:43.320
Brian, could we go ahead
and ask this question next?

00:21:43.320 --> 00:21:47.470
At that same URL, we'll
see a question asking now

00:21:47.470 --> 00:21:55.210
for the possible answers
for the running time--

00:21:55.210 --> 00:21:58.280
for a lower bound on the
running time of linear search.

00:21:58.280 --> 00:22:00.800
So let's go ahead and take
a look at this one here.

00:22:00.800 --> 00:22:03.220
And in just a moment, we'll
see as the responses come in,

00:22:03.220 --> 00:22:05.830
about 75-plus percent
of you are proposing

00:22:05.830 --> 00:22:07.630
that it's actually omega of 1.

00:22:07.630 --> 00:22:09.220
So omega is a lower bound.

00:22:09.220 --> 00:22:10.923
1 refers to constant time.

00:22:10.923 --> 00:22:11.590
And why is that?

00:22:11.590 --> 00:22:13.507
Let me just take a quick
answer on this point.

00:22:13.507 --> 00:22:17.890
Among the 75% of you who said one
step, or a constant number of steps,

00:22:17.890 --> 00:22:19.060
why is that?

00:22:19.060 --> 00:22:23.260
How do you think about this
lower bound on running time?

00:22:23.260 --> 00:22:25.210
How about from Keith?

00:22:25.210 --> 00:22:27.600
Why omega of 1?

00:22:27.600 --> 00:22:31.120
AUDIENCE: Yeah, you can just open it and
be lucky and find it in the first door.

00:22:31.120 --> 00:22:31.410
DAVID MALAN: Yeah.

00:22:31.410 --> 00:22:32.830
So it really speaks to just that.

00:22:32.830 --> 00:22:34.900
You might just get lucky, and
the number you're looking for

00:22:34.900 --> 00:22:36.290
might be at the very first door.

00:22:36.290 --> 00:22:39.760
So the lower bound, in the best
case, if you will, of this algorithm,

00:22:39.760 --> 00:22:44.530
linear search might very well be
omega of 1 for exactly that reason--

00:22:44.530 --> 00:22:46.540
that you have-- get
lucky and the element

00:22:46.540 --> 00:22:48.020
might be there at the beginning.

00:22:48.020 --> 00:22:48.670
So that's pretty good.

00:22:48.670 --> 00:22:50.378
You really can't do
any better than that.

00:22:50.378 --> 00:22:54.880
So we this range now of a lower bound
from omega of 1 on up to big O of n

00:22:54.880 --> 00:22:57.940
being an upper bound on the
running time of linear search.

00:22:57.940 --> 00:23:00.760
But of course, we have this
other algorithm in our toolkit.

00:23:00.760 --> 00:23:04.240
And recall from week 0 that we
looked at binary search-- although,

00:23:04.240 --> 00:23:05.530
not necessarily by name.

00:23:05.530 --> 00:23:09.250
It was that divide-and-conquer third
algorithm, where we took the phone book

00:23:09.250 --> 00:23:11.950
and split it in half
and half and half again.

00:23:11.950 --> 00:23:17.140
Now, while I fumbled there, Joe kindly
has given us a new set of doors.

00:23:17.140 --> 00:23:19.990
If Joe, you could go ahead and
reveal our seven doors again,

00:23:19.990 --> 00:23:22.510
behind which we still have some numbers.

00:23:22.510 --> 00:23:26.440
But I think this time, I'm
going to be a little better off.

00:23:26.440 --> 00:23:30.010
Cue Joe and the doors behind.

00:23:30.010 --> 00:23:30.620
There we go.

00:23:30.620 --> 00:23:32.140
So we have our same seven doors.

00:23:32.140 --> 00:23:34.930
But behind those doors now is a
different arrangement of numbers.

00:23:34.930 --> 00:23:39.532
And suppose this time, I want
to find myself the number 6.

00:23:39.532 --> 00:23:41.740
So the number 6-- we'll
change the problem slightly--

00:23:41.740 --> 00:23:44.630
but I'm going to give you one
other ingredient this time,

00:23:44.630 --> 00:23:46.990
which is going to be
key to this working.

00:23:46.990 --> 00:23:51.580
Why were Florence and I able to do
no better than linear search before?

00:23:51.580 --> 00:23:53.530
Why were Florence and
I able to do no better

00:23:53.530 --> 00:23:57.610
than randomly searching even last time?

00:23:57.610 --> 00:24:00.760
What was it about the array
of numbers, or the array

00:24:00.760 --> 00:24:06.390
of doors, that did not allow me
previously to use binary search?

00:24:06.390 --> 00:24:08.755
Iris, what do you think?

00:24:08.755 --> 00:24:11.380
AUDIENCE: Because we didn't know
the numbers are sorted or not.

00:24:11.380 --> 00:24:11.860
DAVID MALAN: Yeah.

00:24:11.860 --> 00:24:14.060
We didn't know if the
numbers were sorted or not.

00:24:14.060 --> 00:24:16.510
And indeed, barring that
detail, Florence and I

00:24:16.510 --> 00:24:19.550
really couldn't have done any
better than, say, linear search.

00:24:19.550 --> 00:24:22.900
So this time, though, Joe has
kindly sorted some numbers

00:24:22.900 --> 00:24:24.440
behind these doors for us.

00:24:24.440 --> 00:24:26.590
And so if I want to
search for the number 6,

00:24:26.590 --> 00:24:29.290
now I can begin to use a
bit of that information.

00:24:29.290 --> 00:24:31.480
So you know what, I'm going to start
just like we did with the phone book

00:24:31.480 --> 00:24:32.813
and start roughly in the middle.

00:24:32.813 --> 00:24:34.310
And voila, number 5.

00:24:34.310 --> 00:24:34.810
All right.

00:24:34.810 --> 00:24:36.850
So we're pretty close,
we're pretty close.

00:24:36.850 --> 00:24:38.740
But the thing about
binary search, recall,

00:24:38.740 --> 00:24:41.080
is that this is now useful information.

00:24:41.080 --> 00:24:44.920
If the numbers are sorted behind these
doors all, of the doors to the left

00:24:44.920 --> 00:24:48.790
should presumably be lower than 5,
and all of the doors to the right

00:24:48.790 --> 00:24:51.340
should presumably be larger than 5.

00:24:51.340 --> 00:24:54.430
Now, I might kind of cut a corner
here and be like, well, if this is 5,

00:24:54.430 --> 00:24:57.220
6 is probably right
next door, literally.

00:24:57.220 --> 00:24:59.530
But again, algorithmically,
how might we do this?

00:24:59.530 --> 00:25:02.420
We don't want to necessarily
consider these special cases.

00:25:02.420 --> 00:25:06.580
So more generally, it looks like
I now have an array of size 3.

00:25:06.580 --> 00:25:10.400
So let me go ahead and apply that
same algorithm, voila, to the middle.

00:25:10.400 --> 00:25:11.560
Now I have the number 7.

00:25:11.560 --> 00:25:14.888
And now it's becoming pretty clear
that if the number 6 is present,

00:25:14.888 --> 00:25:16.180
it's probably behind this door.

00:25:16.180 --> 00:25:21.400
And indeed, if I now look at my
remaining array of size 1, and voila,

00:25:21.400 --> 00:25:23.470
in the middle there is that number 6.

00:25:23.470 --> 00:25:27.700
So this time, I only had to open up
three doors instead of all seven,

00:25:27.700 --> 00:25:31.390
potentially, or maybe all six doors
to find my way to that number,

00:25:31.390 --> 00:25:34.060
because I was given this
additional ingredient of all

00:25:34.060 --> 00:25:35.860
of those numbers being sorted.

00:25:35.860 --> 00:25:38.230
So it would seem, then,
that you can apply

00:25:38.230 --> 00:25:40.960
the better, more efficient, better
designed algorithm, now known

00:25:40.960 --> 00:25:46.060
as binary search, if only someone like
Joe would sort the numbers for you

00:25:46.060 --> 00:25:46.820
in advance.

00:25:46.820 --> 00:25:50.110
So let's consider now a
little more algorithmically

00:25:50.110 --> 00:25:51.440
how we might implement this.

00:25:51.440 --> 00:25:54.130
So with binary search, let
me propose this pseudocode.

00:25:54.130 --> 00:25:57.980
If the number is behind the middle
door, return true-- we found it.

00:25:57.980 --> 00:26:00.562
So if we got lucky,
then we might very well

00:26:00.562 --> 00:26:02.520
have found the number 6
behind the middle door,

00:26:02.520 --> 00:26:03.687
and we would have been done.

00:26:03.687 --> 00:26:04.660
But that didn't happen.

00:26:04.660 --> 00:26:06.785
And in the general case
that probably won't happen.

00:26:06.785 --> 00:26:10.452
So if the number is less than
that behind the middle door, then

00:26:10.452 --> 00:26:12.910
just like with the phone book,
I'm going to go to the left,

00:26:12.910 --> 00:26:17.020
and I'm going to search the left half
of the remaining doors in the array.

00:26:17.020 --> 00:26:20.045
Else if the number is greater
than that behind the middle door,

00:26:20.045 --> 00:26:22.420
then like the phone book I'm
going to go ahead and search

00:26:22.420 --> 00:26:24.370
the right half of the phone book.

00:26:24.370 --> 00:26:28.000
But there might still be
one final case potentially,

00:26:28.000 --> 00:26:32.170
whereby if there's no doors left at
all, or no doors in the first place,

00:26:32.170 --> 00:26:36.430
I should at least have this one special
case where I do say return false.

00:26:36.430 --> 00:26:40.272
For instance, if 6, for whatever
reason, weren't be among those doors

00:26:40.272 --> 00:26:41.980
and I were searching
for it, I still need

00:26:41.980 --> 00:26:44.860
to be able to handle that situation
where I can say definitively

00:26:44.860 --> 00:26:48.280
return false if I'm left with
no further doors to search.

00:26:48.280 --> 00:26:51.750
So here, then, might be the pseudocode
for this algorithm a bit more formally.

00:26:51.750 --> 00:26:53.820
Now let's consider the analysis thereof.

00:26:53.820 --> 00:26:58.620
Before, where we left off,
linear search was big O of n.

00:26:58.620 --> 00:27:00.630
Linear search was big O of n.

00:27:00.630 --> 00:27:05.310
This time let's consider where binary
search actually falls into place

00:27:05.310 --> 00:27:06.908
by asking a different question.

00:27:06.908 --> 00:27:09.450
I'm going to go ahead and go
back and ask this question now--

00:27:09.450 --> 00:27:13.920
what's an upper bound on the
running time of binary search?

00:27:13.920 --> 00:27:16.920
An upper bound on the running
time of binary search--

00:27:16.920 --> 00:27:21.210
and go ahead and buzz in, if
you'd like, similarly to before.

00:27:21.210 --> 00:27:25.510
What's an upper bound on the
running time of binary search?

00:27:25.510 --> 00:27:30.960
And you can see here answers are
getting pretty dominant around log n.

00:27:30.960 --> 00:27:33.780
And indeed, that jives with
exactly what we did in week 0.

00:27:33.780 --> 00:27:36.240
The correct answer is indeed
log of n, because that's

00:27:36.240 --> 00:27:38.240
going to be the maximum
number of times that you

00:27:38.240 --> 00:27:40.438
can take a list or an
array of a given size

00:27:40.438 --> 00:27:42.480
and split it in half and
half and half, until you

00:27:42.480 --> 00:27:45.360
find the number you're
looking for, or ultimately you

00:27:45.360 --> 00:27:47.320
don't find that number at all.

00:27:47.320 --> 00:27:54.460
Meanwhile, if we consider now not just
the upper bound on this algorithm--

00:27:54.460 --> 00:27:57.930
so in the worst case, binary
search takes big O of log n--

00:27:57.930 --> 00:28:00.210
now let's consider a
related question which

00:28:00.210 --> 00:28:03.420
is, what's a lower bound on the
running time of this same algorithm?

00:28:03.420 --> 00:28:05.530
What's a lower bound
on the running time?

00:28:05.530 --> 00:28:07.470
I'll go ahead and pluck
this one off myself

00:28:07.470 --> 00:28:10.680
and go back to some of
the suggestions thus far.

00:28:10.680 --> 00:28:13.573
In the best case, maybe,
too, you do get lucky,

00:28:13.573 --> 00:28:15.990
and the number you're looking
for, 6 or some other number,

00:28:15.990 --> 00:28:18.250
is smack dab in the middle of the array.

00:28:18.250 --> 00:28:20.940
And so maybe indeed you can
get away with just one step.

00:28:20.940 --> 00:28:25.110
And indeed, a lower bound on binary
search now might very well just

00:28:25.110 --> 00:28:29.250
be an omega of 1, because in that
best case you just get lucky,

00:28:29.250 --> 00:28:33.250
and it's right where you happen to
start, in this case in the middle.

00:28:33.250 --> 00:28:34.690
So we seem to have a range there.

00:28:34.690 --> 00:28:37.950
But strictly speaking, it would
seem that binary search is better.

00:28:37.950 --> 00:28:40.710
Binary search is better
than linear search,

00:28:40.710 --> 00:28:44.880
because as n gets big, big, big,
you can really feel that difference.

00:28:44.880 --> 00:28:47.970
In fact, recall from week 0 we played
a little bit with these light bulbs.

00:28:47.970 --> 00:28:50.698
And right now, all 64 of
these light bulbs are on.

00:28:50.698 --> 00:28:53.490
And let's consider for a moment,
just to put this into perspective,

00:28:53.490 --> 00:28:56.400
how long it would take to
use linear search to find

00:28:56.400 --> 00:28:58.410
one light bulb among these 64.

00:28:58.410 --> 00:29:01.920
And recall that in the worst case,
maybe the light bulb, or the number

00:29:01.920 --> 00:29:04.480
that we're looking for, is
way down there at the end,

00:29:04.480 --> 00:29:06.100
but we don't know in advance.

00:29:06.100 --> 00:29:09.900
And so Sumner, if you wouldn't
mind executing linear search

00:29:09.900 --> 00:29:13.410
on these light bulbs, let's just
get a feel for the efficiency

00:29:13.410 --> 00:29:15.870
or inefficiency of this algorithm.

00:29:15.870 --> 00:29:18.360
Linear search in light bulb form.

00:29:18.360 --> 00:29:21.060
So you'll notice that
one light bulb at a time

00:29:21.060 --> 00:29:24.570
is going out, implying that I've
searched that door, searched that door,

00:29:24.570 --> 00:29:25.890
searched that door.

00:29:25.890 --> 00:29:27.990
But we've only gone
through 10 or so bulbs,

00:29:27.990 --> 00:29:30.400
and we've got another 50-plus to go.

00:29:30.400 --> 00:29:34.920
And you can see that if we look
inside of these doors one per second,

00:29:34.920 --> 00:29:39.010
or turn off these light bulbs one per
second, it's going to take a long time.

00:29:39.010 --> 00:29:42.310
In fact, it doesn't seem worthwhile
to even wait until the very end.

00:29:42.310 --> 00:29:45.143
So Sumner, if you wouldn't mind,
let's bring all the lights back up,

00:29:45.143 --> 00:29:48.300
and let's try once more another
algorithm, this one binary search, just

00:29:48.300 --> 00:29:52.500
to get, again, a feel of what the
running time is of an algorithm,

00:29:52.500 --> 00:29:54.750
like binary search that
runs in logarithmic time.

00:29:54.750 --> 00:29:58.260
So in just a moment, we'll go
ahead and execute binary search

00:29:58.260 --> 00:30:01.510
on these light bulbs, the idea being
that there's one bulb we care about.

00:30:01.510 --> 00:30:04.620
Let's see how fast we can get
down to just one bulb out of 64.

00:30:04.620 --> 00:30:07.440
So Sumner, on your marks, get set, go.

00:30:10.850 --> 00:30:14.102
And we're done just a few steps later.

00:30:14.102 --> 00:30:15.560
And then have this sole light bulb.

00:30:15.560 --> 00:30:16.880
That was so much faster.

00:30:16.880 --> 00:30:20.330
And in fact, we did this
deliberately one iteration at a time.

00:30:20.330 --> 00:30:24.920
The algorithm that we just executed
with Sumner's and Matt's help,

00:30:24.920 --> 00:30:28.400
algorithmically was operating at
what's called 1 hertz, 1 hertz.

00:30:28.400 --> 00:30:31.400
And if you're unfamiliar with hertz,
it's just one something per second.

00:30:31.400 --> 00:30:34.855
It's very often used in physics or
just in discussions of electricity

00:30:34.855 --> 00:30:35.480
more generally.

00:30:35.480 --> 00:30:38.090
And indeed, in this case if
you're doing one thing per second,

00:30:38.090 --> 00:30:41.930
that first algorithm, linear search,
might have taken us like 64 seconds

00:30:41.930 --> 00:30:44.480
to get all the way to
that final light bulb.

00:30:44.480 --> 00:30:47.390
But that second algorithm
was logarithmic.

00:30:47.390 --> 00:30:54.350
And so by going from 64 to
32 to 16 to 8 to 4 to 2 to 1,

00:30:54.350 --> 00:30:58.850
we get to the final result much
faster, even going at the same pace.

00:30:58.850 --> 00:31:01.500
So in fact, if you think
of your computer's CPU,

00:31:01.500 --> 00:31:03.560
CPUs are also measured in hertz--

00:31:03.560 --> 00:31:06.750
H-E-R-T-Z. Probably
measured in gigahertz,

00:31:06.750 --> 00:31:08.960
which is billions of hertz per second.

00:31:08.960 --> 00:31:11.960
So your CPU, the brain of your
computer, If it's 1 gigahertz,

00:31:11.960 --> 00:31:15.840
that means it can literally
do 1 billion things at a time.

00:31:15.840 --> 00:31:18.770
And here we have this sort of simpler
setup of just light bulbs doing

00:31:18.770 --> 00:31:20.510
one thing per second.

00:31:20.510 --> 00:31:24.290
Your computer can do 1 billion of
these kinds of operations at once.

00:31:24.290 --> 00:31:27.260
So just imagine, therefore,
how much these savings tend

00:31:27.260 --> 00:31:30.743
to add up over time if you can take big
bites out of these problems at once,

00:31:30.743 --> 00:31:32.660
as opposed to doing
things like we did in week

00:31:32.660 --> 00:31:36.620
0, just one single step at a time.

00:31:36.620 --> 00:31:37.380
All right.

00:31:37.380 --> 00:31:39.922
Well, let's now go ahead and
start to translate this to code.

00:31:39.922 --> 00:31:42.677
We have enough tools in our
toolkit in C that I think,

00:31:42.677 --> 00:31:44.510
based on our discussion
of arrays last week,

00:31:44.510 --> 00:31:48.330
we can now actually start to build
something in code on our own.

00:31:48.330 --> 00:31:51.600
So I'm going to go ahead and create
a file here in just a moment,

00:31:51.600 --> 00:31:55.430
in CS50 IDE, called,
for instance, numbers.c.

00:31:55.430 --> 00:32:00.320
Let me go ahead and translate this
to a file in C code called numbers.c.

00:32:00.320 --> 00:32:03.650
The goal at hand is just to
implement linear search in code,

00:32:03.650 --> 00:32:06.380
just so that we're no longer
waving our hands at the pseudocode

00:32:06.380 --> 00:32:08.520
but doing things a
little more concretely.

00:32:08.520 --> 00:32:10.880
So I'm going to go ahead
and include cs50.h.

00:32:10.880 --> 00:32:13.160
I'm going to go ahead
and include stdio.h.

00:32:13.160 --> 00:32:15.530
And I'm going to start with
no command line arguments,

00:32:15.530 --> 00:32:18.647
like we left off last week,
but just with main void.

00:32:18.647 --> 00:32:21.230
And I'm going to go ahead and
give myself an array of numbers,

00:32:21.230 --> 00:32:22.820
seven numbers, just like the doors.

00:32:22.820 --> 00:32:25.670
And I'm going to go ahead
and say int numbers.

00:32:25.670 --> 00:32:28.250
And then this is a little trick
that we didn't see last week,

00:32:28.250 --> 00:32:30.740
but it's handy for
creating an array when

00:32:30.740 --> 00:32:33.110
you know in advance what
numbers you want, which I do,

00:32:33.110 --> 00:32:36.440
because I'm going to mimic the doors
that Joe kindly set up for us here, I'm

00:32:36.440 --> 00:32:44.120
going to go ahead and say give me an
array that is equal to 4, 6, 8, 2, 7,

00:32:44.120 --> 00:32:45.480
5, 0.

00:32:45.480 --> 00:32:47.480
And this is the feature
we didn't see last week.

00:32:47.480 --> 00:32:50.690
If you know in advance the numbers
that you want to assign to an array,

00:32:50.690 --> 00:32:54.560
you actually don't have to bother
specifying the size of the array

00:32:54.560 --> 00:32:55.790
explicitly.

00:32:55.790 --> 00:32:58.640
The compiler can figure that
out intelligently for you.

00:32:58.640 --> 00:33:01.960
But you can use these
curly braces with commas

00:33:01.960 --> 00:33:05.420
inside to enumerate from
left to right the values

00:33:05.420 --> 00:33:07.400
that you want to put into that array.

00:33:07.400 --> 00:33:09.727
So after this line 6 has
executed in my computer,

00:33:09.727 --> 00:33:11.810
I'm going to be left with
an array called numbers,

00:33:11.810 --> 00:33:14.480
inside of which are
seven integers listed

00:33:14.480 --> 00:33:17.895
from left to right in the computer's
memory, so to speak, in this way.

00:33:17.895 --> 00:33:19.770
Now, what do I want to
do with these numbers?

00:33:19.770 --> 00:33:21.560
Well, let's implement linear search.

00:33:21.560 --> 00:33:24.170
Linear search, as we
latched on to earlier,

00:33:24.170 --> 00:33:27.110
is a searching from left to right
or equivalently right to left--

00:33:27.110 --> 00:33:29.460
but convention tends
to go left to right.

00:33:29.460 --> 00:33:31.700
So I'm going to do a standard for loop.

00:33:31.700 --> 00:33:34.730
For int i gets 0, i is less than--

00:33:34.730 --> 00:33:36.980
I'm going to keep it simple
for now and hardcode this,

00:33:36.980 --> 00:33:40.400
but we could clean this up if we
want, and I'm going to do i++ on each

00:33:40.400 --> 00:33:41.130
iteration.

00:33:41.130 --> 00:33:45.330
So I'm pretty sure that my line
8 will induce a for loop that

00:33:45.330 --> 00:33:47.090
iterates eight total times.

00:33:47.090 --> 00:33:49.460
And what question do I want
to ask on each iteration?

00:33:49.460 --> 00:33:56.270
Well, if the numbers array
at location i equals equals--

00:33:56.270 --> 00:34:00.170
for instance, the number I
was searching for initially,

00:34:00.170 --> 00:34:02.060
let's go ahead and search for 0--

00:34:02.060 --> 00:34:03.620
then what do I want to do?

00:34:03.620 --> 00:34:06.860
Let me go ahead and print out
something arbitrary but useful,

00:34:06.860 --> 00:34:09.787
like "Found," quote,
unquote, so the human knows.

00:34:09.787 --> 00:34:12.620
And then let me go ahead, and just
for good measure, let me go ahead

00:34:12.620 --> 00:34:14.000
and return 0.

00:34:14.000 --> 00:34:16.110
And we'll come back to
that in just a moment.

00:34:16.110 --> 00:34:20.389
But at the end of this program, I'm also
going to do this-- printf "Not found"

00:34:20.389 --> 00:34:21.610
with a backslash n.

00:34:21.610 --> 00:34:23.719
And then I'm going to
go ahead and return 1.

00:34:23.719 --> 00:34:26.000
But before we tease
apart those returns, just

00:34:26.000 --> 00:34:27.949
consider the code in the aggregate.

00:34:27.949 --> 00:34:30.020
Here's my entire main function.

00:34:30.020 --> 00:34:32.750
And on line 6, to recap,
I initialized the array,

00:34:32.750 --> 00:34:37.909
just as we did at the very beginning,
with a seemingly random list of numbers

00:34:37.909 --> 00:34:39.020
behind the doors.

00:34:39.020 --> 00:34:41.570
Then on line 8, I'm going to
iterate with this for loop

00:34:41.570 --> 00:34:44.600
seven total times,
incrementing i in each turn.

00:34:44.600 --> 00:34:48.469
And then line 10, just like I was
opening the doors one at a time,

00:34:48.469 --> 00:34:51.469
I'm going to check if the
i-th number in this array

00:34:51.469 --> 00:34:54.949
equals equals the number I care
about, 0, with that first demo.

00:34:54.949 --> 00:34:56.570
I'm going to print "Found."

00:34:56.570 --> 00:35:00.110
Otherwise-- not else,
per se-- but otherwise,

00:35:00.110 --> 00:35:04.550
if I go through this entire loop,
checking if, if, if, if, if, and I

00:35:04.550 --> 00:35:08.870
never actually find 0, I'm going to
have this catchall at the end that just

00:35:08.870 --> 00:35:13.790
says no matter what, if you
reach line 16, print "Not found,"

00:35:13.790 --> 00:35:15.505
and then return 1.

00:35:15.505 --> 00:35:16.880
Now, this is a bit of a subtlety.

00:35:16.880 --> 00:35:20.800
But could someone remind us
what's going on with the return 0

00:35:20.800 --> 00:35:26.566
on line 13 and the return 1 on line 17?

00:35:26.566 --> 00:35:31.930
Why 0 in 1, and why
am I returning at all?

00:35:31.930 --> 00:35:35.630
What problem is this solving for me?

00:35:35.630 --> 00:35:37.380
Even though most of
our programs thus far,

00:35:37.380 --> 00:35:39.150
we haven't bothered too much with this.

00:35:39.150 --> 00:35:40.710
Demi, is it?

00:35:40.710 --> 00:35:42.340
What do you think?

00:35:42.340 --> 00:35:45.600
AUDIENCE: It's Demi,
but basically, return 0

00:35:45.600 --> 00:35:49.490
is like it was executed
correctly, or it found it,

00:35:49.490 --> 00:35:52.170
and it kind of exits that
loop saying that it was found.

00:35:52.170 --> 00:35:58.375
And then return 1 is like the
return false, and it exits as well.

00:35:58.375 --> 00:35:59.250
DAVID MALAN: Exactly.

00:35:59.250 --> 00:36:00.917
And "exit" really is the operative word.

00:36:00.917 --> 00:36:02.580
In main, when you are done--

00:36:02.580 --> 00:36:05.340
ready to quit the program,
as we've done with the word

00:36:05.340 --> 00:36:09.000
"quit" in some of our pseudocode in the
past, you can literally return a value.

00:36:09.000 --> 00:36:11.970
And recall at the end of last
week, we introduced the fact

00:36:11.970 --> 00:36:14.250
that main always returns an int.

00:36:14.250 --> 00:36:17.130
You and I have ignored that
for at least a week or two,

00:36:17.130 --> 00:36:20.130
but sometimes it's useful
to return an explicit value,

00:36:20.130 --> 00:36:22.410
whether it's for autograding
purposes, whether it's

00:36:22.410 --> 00:36:24.735
for automated testing of
your code in the real world,

00:36:24.735 --> 00:36:27.360
or just so it's a signal to the
user that something indeed went

00:36:27.360 --> 00:36:28.030
Wrong.

00:36:28.030 --> 00:36:30.180
So you can return a value from main.

00:36:30.180 --> 00:36:33.507
And as Demi proposed,
0 means "all is well."

00:36:33.507 --> 00:36:35.340
And it's a little
counter-intuitive, because

00:36:35.340 --> 00:36:37.330
thus far true tends to be a good thing.

00:36:37.330 --> 00:36:39.420
But in this case, 0 is a good thing.

00:36:39.420 --> 00:36:40.140
All is well.

00:36:40.140 --> 00:36:41.220
It's success.

00:36:41.220 --> 00:36:44.700
And if you return any other
value, for instance 1,

00:36:44.700 --> 00:36:46.660
that indicates that
something went wrong.

00:36:46.660 --> 00:36:51.390
So the reason I'm printing out, after
the word "Found" I'm returning 0,

00:36:51.390 --> 00:36:54.498
is so that effectively the
program exits at that point.

00:36:54.498 --> 00:36:56.790
I don't want to keep going
again and again if I already

00:36:56.790 --> 00:36:58.320
found the number I care about.

00:36:58.320 --> 00:37:02.160
And down here, this one admittedly
isn't strictly necessary,

00:37:02.160 --> 00:37:06.120
because if I hit line 16
and maybe deleted line 17,

00:37:06.120 --> 00:37:07.740
the program's going to end anyway.

00:37:07.740 --> 00:37:10.800
But there wouldn't be
that so-called exit status

00:37:10.800 --> 00:37:12.960
that we discussed last
week briefly, whereby

00:37:12.960 --> 00:37:15.293
you can kind of signal to the
computer whether something

00:37:15.293 --> 00:37:17.190
was successful or unsuccessful.

00:37:17.190 --> 00:37:21.690
And the reason that 0 is a good thing
and 1 or any other number is not,

00:37:21.690 --> 00:37:24.780
consider how many things can go
wrong in programs that you write

00:37:24.780 --> 00:37:26.607
or that companies in
the real world write

00:37:26.607 --> 00:37:28.440
when you get those error
messages, sometimes

00:37:28.440 --> 00:37:30.060
with those cryptic error codes.

00:37:30.060 --> 00:37:32.430
There are hundreds,
thousands of problems

00:37:32.430 --> 00:37:35.790
that might happen in a computer
program that could be that many error

00:37:35.790 --> 00:37:37.680
codes that you see on
the screen, reasons

00:37:37.680 --> 00:37:40.630
explaining why the program
crashed or froze or the like.

00:37:40.630 --> 00:37:46.020
But 0 is sort of special in that it's
just one value that the world has

00:37:46.020 --> 00:37:47.760
decided means "success."

00:37:47.760 --> 00:37:50.460
So there's only one way to get
your program right, in a sense,

00:37:50.460 --> 00:37:54.000
but there's so many millions of
ways in which things can go wrong.

00:37:54.000 --> 00:37:57.810
And that's why humans have adopted
that particular convention.

00:37:57.810 --> 00:37:58.310
All right.

00:37:58.310 --> 00:38:02.378
But let's consider now not just numbers,
but let's make things more interesting.

00:38:02.378 --> 00:38:04.170
Besides the doors,
suppose that we actually

00:38:04.170 --> 00:38:06.300
had people's names behind them.

00:38:06.300 --> 00:38:08.790
Well, let's go ahead and
write a program this time

00:38:08.790 --> 00:38:13.130
that not only searches for numbers,
but instead searches for names.

00:38:13.130 --> 00:38:16.850
So I'm going to go ahead and create
a different file here called names.c.

00:38:16.850 --> 00:38:18.600
And I'm going to start
a little similarly.

00:38:18.600 --> 00:38:23.460
I'm going to include cs50.h at the top,
I'm going to include stdio at the top.

00:38:23.460 --> 00:38:26.332
But I'm also this time
going to include string.h,

00:38:26.332 --> 00:38:28.040
which we introduced
briefly last week, so

00:38:28.040 --> 00:38:31.110
that we have access to strlen for
getting the length of a string,

00:38:31.110 --> 00:38:33.100
and, it turns out, some other functions.

00:38:33.100 --> 00:38:35.790
Let me go ahead and declare
int main void as usual.

00:38:35.790 --> 00:38:38.190
And then inside here, I
need some arbitrary names.

00:38:38.190 --> 00:38:40.260
So let's come up with seven names here.

00:38:40.260 --> 00:38:43.825
And here, too, I can declare
an array just as I did before.

00:38:43.825 --> 00:38:45.450
But it doesn't have to store only ints.

00:38:45.450 --> 00:38:47.400
It can store strings instead.

00:38:47.400 --> 00:38:49.560
So I've changed the data
type from int to string,

00:38:49.560 --> 00:38:52.480
and I've changed the variable
name from numbers to names.

00:38:52.480 --> 00:38:55.350
And I can still use this
new curly brace notation,

00:38:55.350 --> 00:39:01.470
and I can give myself a name like Bill,
and maybe Charlie, and maybe Fred,

00:39:01.470 --> 00:39:07.480
and maybe George, and maybe Ginny,
and maybe Percy, and lastly,

00:39:07.480 --> 00:39:09.510
maybe a name like Ron.

00:39:09.510 --> 00:39:12.700
And it just barely fits on my screen.

00:39:12.700 --> 00:39:16.110
So with that said, I now
have this array of names.

00:39:16.110 --> 00:39:19.620
And beyond there being a
perhaps obvious pattern to them,

00:39:19.620 --> 00:39:23.310
there's a second less obvious, or
maybe obvious, pattern to them.

00:39:23.310 --> 00:39:28.590
How would you describe the list of
names I arbitrarily just came up with?

00:39:28.590 --> 00:39:31.350
What's a useful characteristic of them?

00:39:31.350 --> 00:39:34.410
What do you notice about these names?

00:39:34.410 --> 00:39:37.530
And there's at least two right
answers to this question, I think.

00:39:37.530 --> 00:39:40.370
What do you notice about these names?

00:39:40.370 --> 00:39:41.533
Jack?

00:39:41.533 --> 00:39:43.200
AUDIENCE: They're in alphabetical order.

00:39:43.200 --> 00:39:43.500
DAVID MALAN: Yes.

00:39:43.500 --> 00:39:46.332
So beyond being the names of the
Weasley children in Harry Potter,

00:39:46.332 --> 00:39:47.790
they're also in alphabetical order.

00:39:47.790 --> 00:39:49.080
And that's the more salient detail.

00:39:49.080 --> 00:39:50.970
For our purposes, I've
had the forethought

00:39:50.970 --> 00:39:53.340
this time to sort
these names in advance.

00:39:53.340 --> 00:39:56.580
And if I've sorted these
names, that means implicitly

00:39:56.580 --> 00:39:59.670
I can use a better algorithm
than linear search.

00:39:59.670 --> 00:40:02.160
I can use, for instance,
our old binary search.

00:40:02.160 --> 00:40:05.615
But let's go ahead first and
just search them naively for now.

00:40:05.615 --> 00:40:07.740
Let's still apply linear
search, because, you know,

00:40:07.740 --> 00:40:12.925
what we haven't yet done is necessarily
compare strings against one another.

00:40:12.925 --> 00:40:15.300
We've done a lot of comparisons
of numbers like integers.

00:40:15.300 --> 00:40:16.178
But what about names?

00:40:16.178 --> 00:40:17.470
So let me go ahead and do this.

00:40:17.470 --> 00:40:22.200
So for int i gets 0, just like
before, i less than 7, i++--

00:40:22.200 --> 00:40:25.200
and I'm doing this only because I
know in advance there are seven names.

00:40:25.200 --> 00:40:27.630
I think we could probably
improve the design of this code,

00:40:27.630 --> 00:40:31.200
too, by having a variable or
a constant storing that value.

00:40:31.200 --> 00:40:34.470
But I'm going to keep it simple and
focus only on the new details for now.

00:40:34.470 --> 00:40:38.280
And it turns out, for reasons we'll
explore in more detail next week,

00:40:38.280 --> 00:40:42.560
it is not sufficient to do what we
did before and do something like this

00:40:42.560 --> 00:40:44.330
if I'm searching for "Ron."

00:40:44.330 --> 00:40:49.340
It turns out that in C, you can't use
equals equals to compare two strings.

00:40:49.340 --> 00:40:52.080
You can for an int, you can for a char.

00:40:52.080 --> 00:40:53.870
And we've done both
of those in the past.

00:40:53.870 --> 00:40:57.920
But there's a subtlety that we'll dive
into in more detail next week that

00:40:57.920 --> 00:41:00.003
means you can't actually do this.

00:41:00.003 --> 00:41:02.420
And this is curious, because
if you have prior programming

00:41:02.420 --> 00:41:06.270
experience in languages like Python
or the like, you can do this.

00:41:06.270 --> 00:41:09.770
So in C you can't, but
we'll see next time why.

00:41:09.770 --> 00:41:13.040
But for now, it turns out
that C can solve this problem,

00:41:13.040 --> 00:41:15.600
and historically the way you
do this is with a function.

00:41:15.600 --> 00:41:18.560
So inside of the string.h
header file, there

00:41:18.560 --> 00:41:22.790
is not only a declaration for strlen,
the length of a string like last week.

00:41:22.790 --> 00:41:25.400
There's another function called strcmp.

00:41:25.400 --> 00:41:29.000
And "stir compare,"
for short, S-T-R-C-M-P,

00:41:29.000 --> 00:41:33.590
allows me to pass in two strings, one
string that I want to compare against

00:41:33.590 --> 00:41:34.880
another string.

00:41:34.880 --> 00:41:37.250
So it's not quite the same syntax.

00:41:37.250 --> 00:41:38.830
Indeed, it's a little harder to read.

00:41:38.830 --> 00:41:40.580
It's not quite as
simple as equals equals.

00:41:40.580 --> 00:41:43.760
But strcmp, if we read
the documentation for it,

00:41:43.760 --> 00:41:46.430
will tell us that this
compares two strings.

00:41:46.430 --> 00:41:49.160
And it returns one of
three possible values.

00:41:49.160 --> 00:41:54.410
If those two strings are equal, that is,
identically the same letter for letter,

00:41:54.410 --> 00:41:58.550
then this function is going
to return 0, it turns out.

00:41:58.550 --> 00:42:03.470
If the first string is supposed
to come before the second string

00:42:03.470 --> 00:42:06.760
alphabetically, in some
sense, then this function

00:42:06.760 --> 00:42:09.140
is going to return a negative value.

00:42:09.140 --> 00:42:13.550
If the first string is supposed to come
after the second string alphabetically,

00:42:13.550 --> 00:42:16.320
if you will, then it's going
to return a positive value.

00:42:16.320 --> 00:42:20.690
So there's three possible outcomes--
either equal to 0, or less than 0,

00:42:20.690 --> 00:42:22.460
or greater than 0.

00:42:22.460 --> 00:42:26.090
But you'll notice, and in fact, if you
look at the documentation some time,

00:42:26.090 --> 00:42:30.890
it doesn't specify what value less
than 0 or what value greater than 0.

00:42:30.890 --> 00:42:34.790
You have to just check for any
negative value or any positive value.

00:42:34.790 --> 00:42:37.070
And I also told a bit of
a white lie a moment ago.

00:42:37.070 --> 00:42:39.620
This does not check
things alphabetically,

00:42:39.620 --> 00:42:42.530
even though it coincidentally
does sometimes.

00:42:42.530 --> 00:42:45.590
Actually compares strings in
what's called ASCII order,

00:42:45.590 --> 00:42:49.670
or ASCIIbetically which is kind
of a goofy way of describing

00:42:49.670 --> 00:42:54.830
this function looks at every character
in the two strings, from left to right,

00:42:54.830 --> 00:42:59.570
it checks the ASCII values of them,
and then it compares those ASCII values

00:42:59.570 --> 00:43:01.440
character by character.

00:43:01.440 --> 00:43:03.680
And if the ASCII value
is less than the other,

00:43:03.680 --> 00:43:06.780
then it returns a negative
value or vice versa.

00:43:06.780 --> 00:43:10.790
So if you have, for instance, the
letter A, capital A in the string,

00:43:10.790 --> 00:43:12.963
that gets converted first to 65.

00:43:12.963 --> 00:43:15.380
And then if you have an A in
the other string capitalized,

00:43:15.380 --> 00:43:18.455
it, too, gets compared to
65, and those would be equal.

00:43:18.455 --> 00:43:21.080
But of course, all of these names
have more than one character,

00:43:21.080 --> 00:43:25.620
so this ASCII order, or
ASCIIbetical, precedes left to right

00:43:25.620 --> 00:43:29.750
so that strcmp checks every
character in the names for you.

00:43:29.750 --> 00:43:33.290
And it stops when it hits that
terminating null character.

00:43:33.290 --> 00:43:35.480
Recall that strings,
underneath the hood,

00:43:35.480 --> 00:43:39.780
always end in C with this
backslash 0, or eight 0 bits.

00:43:39.780 --> 00:43:43.470
So that's how strcmp knows
when to stop comparing values.

00:43:43.470 --> 00:43:46.257
But if I go ahead and find
someone like Ron, let me go ahead

00:43:46.257 --> 00:43:47.840
and print out quote, unquote, "Found."

00:43:47.840 --> 00:43:50.840
And like before, I'll go ahead
and return, like Demi proposed,

00:43:50.840 --> 00:43:54.710
0, just to imply that all is successful.

00:43:54.710 --> 00:43:57.360
Otherwise, if we get all the
way to the bottom of my code,

00:43:57.360 --> 00:44:00.620
I'm going to print out "Not found"
to tell the story that we did not

00:44:00.620 --> 00:44:04.310
find Ron in this array, even
though he does happen to be there,

00:44:04.310 --> 00:44:06.570
and I'm going to go ahead and return 1.

00:44:06.570 --> 00:44:08.480
So even though I've
hardcoded everything--

00:44:08.480 --> 00:44:11.960
to hardcode something in a program
means to type it out explicitly--

00:44:11.960 --> 00:44:15.200
you could imagine using a command
line argument like last week

00:44:15.200 --> 00:44:16.310
to get user's input.

00:44:16.310 --> 00:44:17.720
Who would you like to search for?

00:44:17.720 --> 00:44:20.900
You could imagine using get_string
to get user's input and ask them,

00:44:20.900 --> 00:44:22.460
who would you like to search for?

00:44:22.460 --> 00:44:26.150
But for now, just for demonstration
sake, I've used only Ron's name.

00:44:26.150 --> 00:44:28.310
And if I haven't made any typos--

00:44:28.310 --> 00:44:35.030
let me go ahead and type in make
names, Enter, so far so good, ./names.

00:44:35.030 --> 00:44:37.400
And hopefully, we'll
see, indeed, "Found,"

00:44:37.400 --> 00:44:42.140
because "Ron" is very much in
this array of seven siblings.

00:44:42.140 --> 00:44:44.120
But the building blocks
that are new here

00:44:44.120 --> 00:44:47.540
are, again, the fact that when we
declare an array of some fixed size

00:44:47.540 --> 00:44:49.580
we don't strictly need
to put a number here,

00:44:49.580 --> 00:44:51.890
and we have this curly
brace notation when we

00:44:51.890 --> 00:44:54.110
know the array's contents in advance.

00:44:54.110 --> 00:44:56.150
But perhaps lastly and
most powerfully, we

00:44:56.150 --> 00:45:01.730
do have this function in C called
strcmp that will allow us to actually

00:45:01.730 --> 00:45:05.370
store and compare strings in this way.

00:45:05.370 --> 00:45:07.550
So let me pause here
and just ask if there's

00:45:07.550 --> 00:45:12.320
any questions about how we translated
these ideas to code for numbers,

00:45:12.320 --> 00:45:15.650
and how we translated
these ideas to code for now

00:45:15.650 --> 00:45:19.280
names, each time using
linear search, not, binary.

00:45:19.280 --> 00:45:21.670
Caleb, question?

00:45:21.670 --> 00:45:22.400
AUDIENCE: Yeah.

00:45:22.400 --> 00:45:29.585
So would that program still work if
"Ron," for example, was like all caps,

00:45:29.585 --> 00:45:30.960
like if you're trying to search--

00:45:30.960 --> 00:45:35.643
like, if the cases are different in
terms of uppercase and lowercase?

00:45:35.643 --> 00:45:37.060
DAVID MALAN: Really good question.

00:45:37.060 --> 00:45:40.510
And let me propose an instinct
that's useful to acquire in general--

00:45:40.510 --> 00:45:41.457
when in doubt, try it.

00:45:41.457 --> 00:45:42.790
So I'm going to do exactly that.

00:45:42.790 --> 00:45:44.998
I do happen to know the
answer, but suppose I didn't.

00:45:44.998 --> 00:45:47.560
Let me go ahead and change
"Ron" to all caps, just because.

00:45:47.560 --> 00:45:49.420
Maybe the human, the
Caps Lock key was on,

00:45:49.420 --> 00:45:51.220
and they typed it in a little sloppily.

00:45:51.220 --> 00:45:53.300
Let me go ahead and
make no other changes.

00:45:53.300 --> 00:45:57.820
Notice that I'm leaving the original
array alone with only a capital R.

00:45:57.820 --> 00:46:01.900
Let me remake this program,
make name, ./names.

00:46:01.900 --> 00:46:06.950
And voila, he's still, in fact, found.

00:46:10.860 --> 00:46:11.570
Stand by.

00:46:14.690 --> 00:46:15.960
Oh, OK.

00:46:19.910 --> 00:46:22.280
Caleb, you have just helped
me unearth, a bug that

00:46:22.280 --> 00:46:23.810
was latent in the previous example.

00:46:23.810 --> 00:46:26.600
None of you should
have accepted the fact

00:46:26.600 --> 00:46:29.317
that the previous program worked
with "RON," because I didn't

00:46:29.317 --> 00:46:30.900
practice literally what I'm preaching.

00:46:30.900 --> 00:46:34.070
So Caleb, hold that thought for just
a moment so I can rewind a little bit

00:46:34.070 --> 00:46:35.540
and fix my apparent bug.

00:46:35.540 --> 00:46:37.340
So "RON" was indeed found.

00:46:37.340 --> 00:46:39.650
But he wasn't found
because "RON" was found.

00:46:39.650 --> 00:46:41.240
I did something stupid here.

00:46:41.240 --> 00:46:45.440
And it's perhaps all the more
pedagogically appropriate

00:46:45.440 --> 00:46:47.420
now to highlight that.

00:46:47.420 --> 00:46:53.870
So how did this program say "Ron" was
found, even though this time it also

00:46:53.870 --> 00:46:57.050
says "RON" was found in all caps?

00:46:57.050 --> 00:47:00.480
And you know what, let me
get a little curious here.

00:47:00.480 --> 00:47:03.920
Let me go ahead and search
for, not even "Ron."

00:47:03.920 --> 00:47:07.310
How about we search
for Ron's mom, "Molly"?

00:47:07.310 --> 00:47:08.600
Make names.

00:47:08.600 --> 00:47:09.680
All right.

00:47:09.680 --> 00:47:14.330
And now, just to reveal that I really
did do something stupid, ./names.

00:47:14.330 --> 00:47:16.910
OK, now something's
clearly wrong, right?

00:47:16.910 --> 00:47:22.473
I can even search for the father
"Arthur", make names, ./name.

00:47:22.473 --> 00:47:25.640
It seems that I wrote you a program
that just literally always says "Found."

00:47:25.640 --> 00:47:27.830
So we shouldn't have
accepted this as correct.

00:47:27.830 --> 00:47:32.870
Can anyone spot the bug based
on my definition thus far?

00:47:32.870 --> 00:47:34.880
Can anyone spot the bug?

00:47:34.880 --> 00:47:38.900
In the meantime, this isn't really
a bad time to open up the duck

00:47:38.900 --> 00:47:41.130
and say, "Hello, duck.

00:47:41.130 --> 00:47:50.240
I am having a problem whereby my
program is always printing Found

00:47:50.240 --> 00:47:54.320
even when someone is not in the array.

00:47:54.320 --> 00:47:56.630
And I could proceed to
explain my logic to the duck,

00:47:56.630 --> 00:48:00.640
but hopefully Sophia can point me at
the solution even faster than the duck.

00:48:00.640 --> 00:48:03.350
AUDIENCE: You need to
compare the value that we

00:48:03.350 --> 00:48:05.910
received from strcmp with something.

00:48:05.910 --> 00:48:08.030
So we need to compare it
with like 0 and make sure

00:48:08.030 --> 00:48:11.265
that we receive the
value that they're equal.

00:48:11.265 --> 00:48:12.140
DAVID MALAN: Perfect.

00:48:12.140 --> 00:48:16.400
So I said the right thing, but I
literally did not do the right thing.

00:48:16.400 --> 00:48:19.070
If I want to check for
equality, I literally

00:48:19.070 --> 00:48:22.580
need to check the return value
when comparing names bracket i

00:48:22.580 --> 00:48:24.380
against "Ron" to equal 0.

00:48:24.380 --> 00:48:28.130
Because only in the case when
the return value of strcmp is 0

00:48:28.130 --> 00:48:31.280
do I actually have a match.

00:48:31.280 --> 00:48:36.590
By contrast, if the function returns
a negative value or the function

00:48:36.590 --> 00:48:39.470
returns a positive value,
that means it's not a match.

00:48:39.470 --> 00:48:42.230
That means that one name is
supposed to come before the other

00:48:42.230 --> 00:48:44.240
or after the other.

00:48:44.240 --> 00:48:47.870
But the catch with my shorthand
syntax here, which is not always

00:48:47.870 --> 00:48:52.880
an incorrect syntax to use, whenever
you have a Boolean expression inside

00:48:52.880 --> 00:48:56.340
of which is a function call like this--

00:48:56.340 --> 00:48:58.550
notice that the entirety
of my Boolean expression

00:48:58.550 --> 00:49:01.830
is just a call, so to speak, to strcmp.

00:49:01.830 --> 00:49:06.350
I'm passing in two inputs, names
bracket i and quote, unquote "Ron."

00:49:06.350 --> 00:49:10.070
And therefore, I'm expecting strcmp
to return output, a so-called return

00:49:10.070 --> 00:49:10.730
value.

00:49:10.730 --> 00:49:15.200
That return value is going to
be negative or positive or 0.

00:49:15.200 --> 00:49:18.530
And in fact, to be clear, if
the first name being searched

00:49:18.530 --> 00:49:22.760
for is "Bill" and names
bracket i or names bracket 0

00:49:22.760 --> 00:49:26.540
is "Bill," "Bill" comma
"Ron" is effectively what

00:49:26.540 --> 00:49:28.730
my input is on the first iteration.

00:49:28.730 --> 00:49:31.820
"Bill," alphabetically
and ASCIIbetically,

00:49:31.820 --> 00:49:34.520
comes before "Ron,"
which means it should

00:49:34.520 --> 00:49:36.980
be returning a negative value to me.

00:49:36.980 --> 00:49:41.450
And the problem with Boolean expressions
is, as implemented in this context,

00:49:41.450 --> 00:49:44.870
is that only 0 is false.

00:49:44.870 --> 00:49:50.240
Any other return value is by
definition true or a Yes answer,

00:49:50.240 --> 00:49:53.810
whether it's negative 1 or
positive 1, negative 1 million

00:49:53.810 --> 00:49:55.340
or positive 1 million--

00:49:55.340 --> 00:49:58.850
any non-zero value in a
computer language like C

00:49:58.850 --> 00:50:02.660
is considered true,
also known as truthy.

00:50:02.660 --> 00:50:06.260
Any value that is 0 is
considered false, but only

00:50:06.260 --> 00:50:07.880
that value is considered false.

00:50:07.880 --> 00:50:12.560
So really, I was getting lucky at first,
because my program was finding "Bill,"

00:50:12.560 --> 00:50:14.450
but I was confusing "Bill" for "Ron."

00:50:14.450 --> 00:50:17.960
Then when I did it again for
Caleb and I capitalized "Ron,"

00:50:17.960 --> 00:50:20.690
I was getting unlucky,
because suddenly I

00:50:20.690 --> 00:50:23.030
knew "RON" capitalized
wasn't in the array,

00:50:23.030 --> 00:50:24.590
and yet I'm still saying he's found.

00:50:24.590 --> 00:50:28.580
But that's because I didn't practice
what I preach per Sophia's find.

00:50:28.580 --> 00:50:31.490
And so if I actually compare
this against 0-- and now, Caleb,

00:50:31.490 --> 00:50:33.260
we come full circle to your question--

00:50:33.260 --> 00:50:37.970
I rebuild this program with make names,
I now do ./names and search for all

00:50:37.970 --> 00:50:43.280
caps "RON," I should now
see, thankfully, "Not found."

00:50:43.280 --> 00:50:46.550
So I wish I could say that
was deliberate, but thus

00:50:46.550 --> 00:50:48.410
is the common case of bugs.

00:50:48.410 --> 00:50:50.820
So here I am 20 years later
making bugs in my code.

00:50:50.820 --> 00:50:53.360
So if you run up to a
similar problem this week,

00:50:53.360 --> 00:50:58.910
rest assured that it never ends.

00:50:58.910 --> 00:51:01.430
But hopefully you won't have
several people watching you

00:51:01.430 --> 00:51:03.120
while you do your problem set this week.

00:51:03.120 --> 00:51:03.620
All right.

00:51:03.620 --> 00:51:05.660
Any questions, then, beyond Caleb's?

00:51:05.660 --> 00:51:07.640
So great question, Caleb,
and the answer is no.

00:51:07.640 --> 00:51:09.540
It is case sensitive.

00:51:09.540 --> 00:51:11.780
So it does not find "Rob"--

00:51:11.780 --> 00:51:13.270
"RON."

00:51:13.270 --> 00:51:16.530
Any questions here?

00:51:16.530 --> 00:51:20.640
Any questions on linear
search using strings?

00:51:20.640 --> 00:51:21.140
No?

00:51:21.140 --> 00:51:24.230
All right, well, let's go ahead
and do one final example, I think,

00:51:24.230 --> 00:51:25.280
with searching.

00:51:25.280 --> 00:51:27.260
But let's introduce
just one other feature.

00:51:27.260 --> 00:51:29.540
And this one's actually
pretty cool and powerful.

00:51:29.540 --> 00:51:33.350
Up until now, we've been using
data types that just come with C

00:51:33.350 --> 00:51:37.770
or come from CS50, like int, and
char, and float, and the like.

00:51:37.770 --> 00:51:41.330
And you'll see now that
there's actually sometimes

00:51:41.330 --> 00:51:44.570
reasons where you or I might want
to create our own custom data types,

00:51:44.570 --> 00:51:48.500
our own types that didn't exist
when C itself was invented.

00:51:48.500 --> 00:51:51.890
So for instance, suppose that
I wanted to represent not just

00:51:51.890 --> 00:51:54.440
a whole bunch of numbers and
not just a whole bunch of names,

00:51:54.440 --> 00:51:57.065
but suppose I want to implement
like a full-fledged phone book.

00:51:57.065 --> 00:52:00.110
A phone book, of course,
contains both names and numbers.

00:52:00.110 --> 00:52:02.870
And suppose I want to combine
these two ideas together.

00:52:02.870 --> 00:52:06.950
Wouldn't it be nice if I could
have a data structure that

00:52:06.950 --> 00:52:10.550
is a data type that has some
structure to it that can actually

00:52:10.550 --> 00:52:11.850
store both at once?

00:52:11.850 --> 00:52:16.670
And in fact, wouldn't it be nice
if C had a data type called person,

00:52:16.670 --> 00:52:19.550
so that if I want to represent a
person, like in a phone book, who

00:52:19.550 --> 00:52:22.490
had both a name and a
number, I can actually

00:52:22.490 --> 00:52:27.320
implement that and code by calling
that variable of type person?

00:52:27.320 --> 00:52:30.170
Now, of course, the designers
of C did not have the foresight

00:52:30.170 --> 00:52:32.308
to create a data type called person.

00:52:32.308 --> 00:52:34.100
And, indeed, that would
be a slippery slope

00:52:34.100 --> 00:52:37.820
if they had a data type for every
real-world entity you can think of.

00:52:37.820 --> 00:52:40.830
But they did give us the
capabilities to do this.

00:52:40.830 --> 00:52:44.810
So if a person, in our limited
world here of phone books,

00:52:44.810 --> 00:52:49.610
has both a name and a number, we
might think of it as follows--

00:52:49.610 --> 00:52:51.770
a name and a number,
both of type string.

00:52:51.770 --> 00:52:53.060
But a quick check here.

00:52:53.060 --> 00:52:56.330
Why have I now decided,
somewhat presumptuously,

00:52:56.330 --> 00:53:00.110
to call phone numbers strings as well?

00:53:00.110 --> 00:53:02.360
We've been talking about
ints behind these doors.

00:53:02.360 --> 00:53:04.580
We've been searching for ints in code.

00:53:04.580 --> 00:53:08.150
But why did I just presume
to propose that we instead

00:53:08.150 --> 00:53:12.560
implement a phone book using
strings for names and numbers?

00:53:12.560 --> 00:53:14.650
Any thoughts here, Kurt?

00:53:14.650 --> 00:53:16.070
AUDIENCE: Yeah.

00:53:16.070 --> 00:53:17.570
Because we're not doing math on it.

00:53:17.570 --> 00:53:21.065
It's like-- a phone number could
be, like, letters for all we care.

00:53:21.065 --> 00:53:22.940
And in fact, I mean,
sometimes you see, like,

00:53:22.940 --> 00:53:26.160
1-800 Contacts or something like
that, and maybe we want to allow that.

00:53:26.160 --> 00:53:27.410
DAVID MALAN: Yeah, absolutely.

00:53:27.410 --> 00:53:30.780
A phone number, despite its name,
isn't necessarily just a number.

00:53:30.780 --> 00:53:33.710
It might be 1-800 Contacts,
which is an English word.

00:53:33.710 --> 00:53:36.020
It might have hyphens in it or dashes.

00:53:36.020 --> 00:53:37.640
It might have parentheses in it.

00:53:37.640 --> 00:53:39.800
It might have a plus
sign for country codes.

00:53:39.800 --> 00:53:41.720
There's a lot of characters
that we absolutely

00:53:41.720 --> 00:53:46.257
can represent in C using strings that
we couldn't represent in C using int.

00:53:46.257 --> 00:53:48.090
And so indeed, even
though in the real world

00:53:48.090 --> 00:53:51.710
there are these "numbers" that you and
I talk about once in a while like phone

00:53:51.710 --> 00:53:55.580
numbers, maybe in the US Social
Security numbers, credit card numbers,

00:53:55.580 --> 00:54:00.170
those aren't necessarily values that
you want to treat as actual integers.

00:54:00.170 --> 00:54:02.570
And in fact, those of you
who did the credit problem

00:54:02.570 --> 00:54:04.760
and tried to validate
credit card numbers

00:54:04.760 --> 00:54:08.720
may very well have run into challenges
by using a long to represent a credit

00:54:08.720 --> 00:54:09.650
card number.

00:54:09.650 --> 00:54:11.900
It probably in retrospect
might very well

00:54:11.900 --> 00:54:15.560
have been easier for you to treat
credit card numbers as strings.

00:54:15.560 --> 00:54:17.810
The catch, of course, by
design is that you didn't yet

00:54:17.810 --> 00:54:21.410
have strings in your
vocabulary, at least in C yet.

00:54:21.410 --> 00:54:23.390
So suppose I want to
create my own custom data

00:54:23.390 --> 00:54:27.800
type that encapsulates, if you
will, two different types of values.

00:54:27.800 --> 00:54:31.430
A person shall be henceforth
a name and a number.

00:54:31.430 --> 00:54:34.850
It turns out that C gives
us this syntax here.

00:54:34.850 --> 00:54:39.770
This is the only juicy piece of new
syntax besides those curly braces

00:54:39.770 --> 00:54:42.880
a moment ago that we'll
see today in C, typedef.

00:54:42.880 --> 00:54:44.810
And as the name rather
succinctly suggests,

00:54:44.810 --> 00:54:47.240
this allows you to define a type.

00:54:47.240 --> 00:54:50.120
And the type will be a
structure of some sort.

00:54:50.120 --> 00:54:52.610
So a data structure in
a programming language

00:54:52.610 --> 00:54:56.600
is typically a data type that
has some structure to it.

00:54:56.600 --> 00:54:57.980
What do we mean by "structure"?

00:54:57.980 --> 00:55:01.610
It typically has one or
more values inside of it.

00:55:01.610 --> 00:55:04.940
So using typedef, and in turn
using the struct keyword,

00:55:04.940 --> 00:55:06.770
we can create our own
custom types that's

00:55:06.770 --> 00:55:10.320
a structure, a composition
of multiple other data types.

00:55:10.320 --> 00:55:14.360
So if we want to keep persons together
as their own custom data type,

00:55:14.360 --> 00:55:15.890
the syntax is a little cryptic here.

00:55:15.890 --> 00:55:19.310
You literally do typedef
struct open curly brace,

00:55:19.310 --> 00:55:22.137
then one per line you specify
the data types that you want

00:55:22.137 --> 00:55:24.470
and the names that you want
to give to those data types,

00:55:24.470 --> 00:55:26.270
for instance name and number.

00:55:26.270 --> 00:55:28.850
And then outside of the
closing curly brace,

00:55:28.850 --> 00:55:33.290
you literally put the word "person,"
if that's indeed the data type

00:55:33.290 --> 00:55:35.280
that you want to invent.

00:55:35.280 --> 00:55:37.790
So how can we use this more powerfully?

00:55:37.790 --> 00:55:42.020
Well, let's go ahead and do things the
wrong way without this feature first,

00:55:42.020 --> 00:55:44.060
so as to motivate its existence.

00:55:44.060 --> 00:55:47.016
Let me go ahead and save
this file as phonebook.c.

00:55:47.016 --> 00:55:50.420
And let me start, as always,
with includes cs50.h.

00:55:50.420 --> 00:55:53.030
And then let me go ahead
and include stdio.h.

00:55:53.030 --> 00:55:56.180
And then lastly, let me
also include string.h,

00:55:56.180 --> 00:55:59.100
because I know I'm going to be
manipulating some strings in a moment.

00:55:59.100 --> 00:56:03.200
Let me go ahead now, and within
my main function, let me go ahead

00:56:03.200 --> 00:56:06.050
and give myself initially, for
the first version of this program,

00:56:06.050 --> 00:56:07.370
a whole bunch of names.

00:56:07.370 --> 00:56:10.940
Specifically, how about
"Brian" comma "David"?

00:56:10.940 --> 00:56:15.200
We'll keep it short, just so as to focus
on the ideas and not the actual data

00:56:15.200 --> 00:56:15.862
they're in.

00:56:15.862 --> 00:56:17.570
Then Brian and I each
have phone numbers.

00:56:17.570 --> 00:56:19.890
So let's go ahead and
store them in an array--

00:56:19.890 --> 00:56:28.890
numbers equals, again the curly braces
as before, and +1-617-495-1000--

00:56:28.890 --> 00:56:32.670
and indeed, there's already motivation,
per Kurt's comment, to use strings,

00:56:32.670 --> 00:56:35.250
because we've got a plus and
a couple of dashes in there--

00:56:35.250 --> 00:56:36.460
and then my number here.

00:56:36.460 --> 00:56:43.020
So we'll do +1-949-468-2750
close curly brace, semicolon.

00:56:43.020 --> 00:56:45.900
So I've gone ahead and declared
two arrays, one called names,

00:56:45.900 --> 00:56:47.040
one called numbers.

00:56:47.040 --> 00:56:52.410
And I'm just going to have
sort of a handshake agreement

00:56:52.410 --> 00:56:56.380
that the first name in names corresponds
to the first number in numbers,

00:56:56.380 --> 00:56:59.702
the second name in names corresponds
to the second number in numbers.

00:56:59.702 --> 00:57:02.910
And you can imagine that working well
so long as you don't make any mistakes,

00:57:02.910 --> 00:57:04.950
and you have just the
right number in each.

00:57:04.950 --> 00:57:08.520
Now let me go ahead and do int
i equals 0, i less than 2--

00:57:08.520 --> 00:57:11.580
I'm going to keep that hardcoded for
now just to do the demonstration.

00:57:11.580 --> 00:57:14.910
And then inside of this loop, let me go
ahead and search for my phone number,

00:57:14.910 --> 00:57:17.350
for instance, even though
I happen to be at the end.

00:57:17.350 --> 00:57:22.590
So if strcmp of names bracket i equals--

00:57:22.590 --> 00:57:27.360
rather, comma "David" equals equals 0--

00:57:27.360 --> 00:57:29.200
so I'm not going to
make that mistake again.

00:57:29.200 --> 00:57:37.290
Let me go ahead inside of this
loop, inside of this condition here.

00:57:37.290 --> 00:57:41.650
And I'm going to go ahead and do the
following-- print out that I found,

00:57:41.650 --> 00:57:43.910
for instance my number.

00:57:43.910 --> 00:57:45.160
And I'm going to plug that in.

00:57:45.160 --> 00:57:47.730
So numbers bracket i.

00:57:47.730 --> 00:57:50.430
And then as before, I'm going
to go ahead and return 0.

00:57:50.430 --> 00:57:54.090
And if none of this works out, and
I happen not to be in this array,

00:57:54.090 --> 00:57:59.200
I'll go ahead and print out as
before "Not found" with a semicolon.

00:57:59.200 --> 00:58:00.870
And then I'll return 1 arbitrarily.

00:58:00.870 --> 00:58:03.750
I can return negative 1, I could
return a million, negative million.

00:58:03.750 --> 00:58:07.530
But human convention would typically
have you go from 0 to 1 to 2 to 3

00:58:07.530 --> 00:58:11.050
on up, if you have that many
possible error conditions.

00:58:11.050 --> 00:58:11.760
All right.

00:58:11.760 --> 00:58:14.730
So I essentially have implemented
in C a phone book of sorts.

00:58:14.730 --> 00:58:16.710
We did this verbally in week 0.

00:58:16.710 --> 00:58:17.987
Now I'm doing it in code.

00:58:17.987 --> 00:58:19.070
It's a limited phone book.

00:58:19.070 --> 00:58:21.030
It's only got two names and two numbers.

00:58:21.030 --> 00:58:22.988
But I could certainly
implement this phone book

00:58:22.988 --> 00:58:26.230
by just using two arrays, two
parallel arrays, if you will,

00:58:26.230 --> 00:58:29.490
by just using the honor system that
the first element and names lines

00:58:29.490 --> 00:58:32.072
up with the first elements
and numbers and so forth.

00:58:32.072 --> 00:58:33.780
Now hopefully, if I
don't make any typos,

00:58:33.780 --> 00:58:36.030
let me go ahead and make phonebook.

00:58:36.030 --> 00:58:36.530
All right.

00:58:36.530 --> 00:58:41.440
It compiled OK. ./phonebook, and it
found what seems to be my number there.

00:58:41.440 --> 00:58:45.040
So it seems to work correctly, though
I've tried to pull that one over you

00:58:45.040 --> 00:58:45.540
before.

00:58:45.540 --> 00:58:48.100
But I'm pretty sure this one
actually works correctly.

00:58:48.100 --> 00:58:51.990
And so we found my name
and in turn number.

00:58:51.990 --> 00:58:57.030
But why is the design of this
code not necessarily the best?

00:58:57.030 --> 00:59:00.090
This is starting to get
more subtle, admittedly.

00:59:00.090 --> 00:59:02.980
And we've seen that we
can do this differently.

00:59:02.980 --> 00:59:05.372
But what rubs you the
wrong way about here?

00:59:05.372 --> 00:59:07.830
This is another example of what
we might call "code smell."

00:59:07.830 --> 00:59:10.710
Like, something's a little
funky here, like, ah, this

00:59:10.710 --> 00:59:13.500
might not be the best solution
long term Nick, what do you think?

00:59:13.500 --> 00:59:14.125
AUDIENCE: Yeah.

00:59:14.125 --> 00:59:15.378
So what I'm guessing is that--

00:59:15.378 --> 00:59:18.420
like, you know how you made the data
frame before the new data structure,

00:59:18.420 --> 00:59:20.920
where the two things
were linked together?

00:59:20.920 --> 00:59:23.010
In this case, we're
just banking on the fact

00:59:23.010 --> 00:59:27.270
that we don't screw something up
and unintentionally unlink them

00:59:27.270 --> 00:59:29.390
from the same index.

00:59:29.390 --> 00:59:32.812
So they're not intrinsically
linked, which might not be--

00:59:32.812 --> 00:59:34.770
DAVID MALAN: That's
exactly the right instinct.

00:59:34.770 --> 00:59:38.790
In general, as great as a programmer
as you're maybe aspiring to be,

00:59:38.790 --> 00:59:39.690
you're not all that.

00:59:39.690 --> 00:59:41.357
And like, you're going to make mistakes.

00:59:41.357 --> 00:59:44.700
And the more you can write
code that's self-defensive,

00:59:44.700 --> 00:59:47.880
that protects you from yourself,
the better off you're going to be,

00:59:47.880 --> 00:59:49.890
the more correct your
code is going to be,

00:59:49.890 --> 00:59:53.460
and the more easily you're going to
be able to collaborate successfully,

00:59:53.460 --> 00:59:56.710
if you so choose in the real world,
on real-world programming projects,

00:59:56.710 --> 00:59:59.520
whether for a research project, a
full-time job, a personal project

00:59:59.520 --> 01:00:00.210
or the like.

01:00:00.210 --> 01:00:04.290
Generally speaking, you should
not trust yourself or other people

01:00:04.290 --> 01:00:06.060
that-- with whom you're
writing code, you

01:00:06.060 --> 01:00:11.320
should have as many defense mechanisms
in place exactly along these lines.

01:00:11.320 --> 01:00:15.030
So yes, there's nothing wrong
with what I have done in the sense

01:00:15.030 --> 01:00:16.620
that this is correct.

01:00:16.620 --> 01:00:21.480
But as noted, if you screw up, and
maybe you get an off by one error--

01:00:21.480 --> 01:00:24.210
maybe you transpose two
names or two numbers.

01:00:24.210 --> 01:00:26.910
I mean, imagine if you've got
dozens of names and numbers,

01:00:26.910 --> 01:00:29.220
hundreds of names and
numbers, thousands of them

01:00:29.220 --> 01:00:32.820
the odds that you or someone
messes the order up at some point

01:00:32.820 --> 01:00:35.370
is just probably going
to be too, too high.

01:00:35.370 --> 01:00:39.250
So it would be nice, then, if we could
sort of keep related data together.

01:00:39.250 --> 01:00:42.805
This is kind of a hack, to just on the
honor system say, my arrays line up,

01:00:42.805 --> 01:00:45.180
I'm just going to make sure
to keep them the same length.

01:00:45.180 --> 01:00:46.110
We can do better.

01:00:46.110 --> 01:00:51.030
Let's keep related data together and
design this a little more cleanly.

01:00:51.030 --> 01:00:53.570
And I can do this by
defining my own type

01:00:53.570 --> 01:00:55.650
that I'll call for instance, a person.

01:00:55.650 --> 01:00:58.790
So at the top of this
file, before main, I'm

01:00:58.790 --> 01:01:01.830
going to go ahead and
typedef a structure, inside

01:01:01.830 --> 01:01:04.380
of which are the two types
of data that I care about,

01:01:04.380 --> 01:01:08.520
string name and string
number, just as before.

01:01:08.520 --> 01:01:15.030
Notice, though, here that what I have
done here is not give myself an array.

01:01:15.030 --> 01:01:17.738
I've given myself one
name and one number.

01:01:17.738 --> 01:01:20.280
Outside of this curly brace,
I'm going to give this data type

01:01:20.280 --> 01:01:21.690
a name, which I could call "person."

01:01:21.690 --> 01:01:23.482
I could call it anything
I want, but person

01:01:23.482 --> 01:01:25.410
seems pretty reasonable in this case.

01:01:25.410 --> 01:01:29.890
And now down here, I'm going to go
ahead and change this code a little bit.

01:01:29.890 --> 01:01:32.370
I'm going to go ahead and
give myself an array still,

01:01:32.370 --> 01:01:36.240
but this time I'm going to give
myself an array of persons.

01:01:36.240 --> 01:01:39.360
And I'm going to call that array,
somewhat playfully, "people,"

01:01:39.360 --> 01:01:44.940
because I want to have two persons, two
people, in this program, me and Brian.

01:01:44.940 --> 01:01:47.215
Now I want to go ahead
and populate this array.

01:01:47.215 --> 01:01:48.840
That is, I want to fill it with values.

01:01:48.840 --> 01:01:50.940
And this syntax is a
little new, but it's just

01:01:50.940 --> 01:01:55.230
to enable us to actually store
values inside of a structure.

01:01:55.230 --> 01:01:58.050
If I want to index into
this array, there's

01:01:58.050 --> 01:01:59.460
nothing different from last week.

01:01:59.460 --> 01:02:01.380
I do people bracket 0.

01:02:01.380 --> 01:02:06.270
That's going to give me the first person
variable inside, so probably where

01:02:06.270 --> 01:02:07.890
"Brian" is supposed to go.

01:02:07.890 --> 01:02:10.350
The one last piece of
syntax I need is how do I

01:02:10.350 --> 01:02:14.730
go inside of that structure,
that person data structure,

01:02:14.730 --> 01:02:16.450
and access the person's name?

01:02:16.450 --> 01:02:17.710
I literally just do a dot.

01:02:17.710 --> 01:02:22.620
So people bracket 0 gives me the
first person in the people array.

01:02:22.620 --> 01:02:26.580
And then the dot means, go inside
of it and grab the person variable.

01:02:26.580 --> 01:02:29.880
I'm going to go ahead and set that
name equal to quote, unquote "Brian."

01:02:29.880 --> 01:02:31.990
The syntax now for his
name is almost identical--

01:02:31.990 --> 01:02:39.630
people bracket 0 dot number equals
quote, unquote "+1-617-495-1000"

01:02:39.630 --> 01:02:40.740
semicolon.

01:02:40.740 --> 01:02:43.092
Meanwhile, if I want to
access a location for myself,

01:02:43.092 --> 01:02:45.300
I'm going to go ahead and
put it at location 1, which

01:02:45.300 --> 01:02:46.620
is the second location.

01:02:46.620 --> 01:02:48.630
Name will be, quote, unquote "David."

01:02:48.630 --> 01:02:52.020
And then over here, I'm going to do
people bracket 1 dot number equals

01:02:52.020 --> 01:02:59.250
quote, unquote "+1-949-468-2750"
close quote, semicolon.

01:02:59.250 --> 01:03:01.320
So it's a bit verbose, admittedly.

01:03:01.320 --> 01:03:05.430
But you could imagine, if we just let
our thoughts run ahead of ourselves

01:03:05.430 --> 01:03:08.370
here, if you used get_string, could
sort of automatically do this.

01:03:08.370 --> 01:03:11.453
If you used command line arguments,
maybe you could populate some of this.

01:03:11.453 --> 01:03:14.190
We don't just have to hardcode,
that is, write my name and number

01:03:14.190 --> 01:03:15.668
and Brian's into this program.

01:03:15.668 --> 01:03:17.460
You can imagine doing
this more dynamically

01:03:17.460 --> 01:03:20.700
using some of our techniques, using
get_string and so forth, from week 1.

01:03:20.700 --> 01:03:22.920
But for now, it's just
for demonstration's sake.

01:03:22.920 --> 01:03:29.040
So now if I want to search this new
array, this new single array of people,

01:03:29.040 --> 01:03:31.410
I think my for loop can stay the same.

01:03:31.410 --> 01:03:33.990
And I think I can still use strcmp.

01:03:33.990 --> 01:03:38.970
But now I need to go inside
of not names but people,

01:03:38.970 --> 01:03:41.740
and look for the dot name field.

01:03:41.740 --> 01:03:45.008
So data structures have fields
or variables inside of them.

01:03:45.008 --> 01:03:46.800
So I'm going to use
the dot notation there,

01:03:46.800 --> 01:03:50.670
too, go into the i-th
person in the people array,

01:03:50.670 --> 01:03:53.610
and compare that name against, for
instance, quote, unquote "David."

01:03:53.610 --> 01:03:57.300
And then if I have found
"David," in this case myself,

01:03:57.300 --> 01:04:01.920
go ahead and access the people array
again, but print out using printf

01:04:01.920 --> 01:04:02.830
the number.

01:04:02.830 --> 01:04:05.820
So again, the dot operator is
the only new piece of syntax

01:04:05.820 --> 01:04:10.120
that's letting us go inside of this
new feature known as a data structure.

01:04:10.120 --> 01:04:12.930
If I go ahead and make phonebook
again after making those changes,

01:04:12.930 --> 01:04:13.610
all is well.

01:04:13.610 --> 01:04:14.850
It compiled OK.

01:04:14.850 --> 01:04:20.410
And if I run ./phonebook, I now have
hopefully found my number again.

01:04:20.410 --> 01:04:24.420
So here is a seemingly
useless exercise, in that all

01:04:24.420 --> 01:04:28.050
I really did was re-implement the
same program using more lines of code

01:04:28.050 --> 01:04:29.630
and making it more complicated.

01:04:29.630 --> 01:04:31.460
But it's now better designed.

01:04:31.460 --> 01:04:34.130
Or it's a step toward being
better designed, because now I've

01:04:34.130 --> 01:04:38.330
encapsulated all inside of one variable,
for instance, people bracket 0,

01:04:38.330 --> 01:04:42.590
people bracket 1, all of the information
we care about with respect to Brian,

01:04:42.590 --> 01:04:46.800
or me, or anyone else we
might put into this program.

01:04:46.800 --> 01:04:49.880
And indeed, this is how programs,
this is how Googles, of the world,

01:04:49.880 --> 01:04:52.520
Facebooks of the world store
lots of information together.

01:04:52.520 --> 01:04:55.220
Consider any of your social
media accounts like Instagram,

01:04:55.220 --> 01:04:57.230
or Facebook, or Snapchat and the like.

01:04:57.230 --> 01:04:59.810
You have multiple pieces
of data associated

01:04:59.810 --> 01:05:02.810
with you on all of those
platforms-- not just your username

01:05:02.810 --> 01:05:06.050
but also your password,
also your history of posts,

01:05:06.050 --> 01:05:08.265
also your friends and
followers and the like.

01:05:08.265 --> 01:05:11.390
So there's a lot of information that
these companies, for better for worse,

01:05:11.390 --> 01:05:12.950
are collecting on all of us.

01:05:12.950 --> 01:05:17.660
And can you imagine if they just had
one big array with all of our usernames,

01:05:17.660 --> 01:05:21.892
one big array with all of our passwords,
one big array with all of our friends?

01:05:21.892 --> 01:05:23.600
Like, you can imagine
certainly at scale,

01:05:23.600 --> 01:05:26.418
that's got to be a bad
design, to just trust

01:05:26.418 --> 01:05:29.210
that you're going to get the ordering
of all of these things right.

01:05:29.210 --> 01:05:30.110
They don't do that.

01:05:30.110 --> 01:05:34.040
They instead write code in some
language that somehow encapsulates

01:05:34.040 --> 01:05:36.590
all the information
related to me and Brian

01:05:36.590 --> 01:05:39.620
and you inside of some
kind of data structure.

01:05:39.620 --> 01:05:43.370
And that's what they put in their
database or some other server

01:05:43.370 --> 01:05:44.360
on their back end.

01:05:44.360 --> 01:05:47.720
So this encapsulation is a
feature we now have in terms of C.

01:05:47.720 --> 01:05:51.740
And it allows us to create our own
data structures that we can then use

01:05:51.740 --> 01:05:56.090
in order to keep related data together.

01:05:56.090 --> 01:05:59.990
All right, any questions, then,
on data structures, or more

01:05:59.990 --> 01:06:04.130
specifically typedef and
struct, the C keywords

01:06:04.130 --> 01:06:08.360
with which you can create
your own custom types

01:06:08.360 --> 01:06:10.760
that themselves are data structures?

01:06:10.760 --> 01:06:11.720
Besley?

01:06:11.720 --> 01:06:12.590
AUDIENCE: Hi.

01:06:12.590 --> 01:06:17.450
So is it typical to define the new
data structure outside of main, like

01:06:17.450 --> 01:06:17.950
in a header?

01:06:17.950 --> 01:06:19.367
DAVID MALAN: Really good question.

01:06:19.367 --> 01:06:22.070
Is it typical to define a new
data structure outside of main?

01:06:22.070 --> 01:06:23.270
Quite often yes.

01:06:23.270 --> 01:06:26.270
In this case, it's immaterial,
because I only have

01:06:26.270 --> 01:06:28.190
one function in this program, main.

01:06:28.190 --> 01:06:30.530
But as we'll see this week
and next week and onward,

01:06:30.530 --> 01:06:34.370
our programs are going to start to get
a little more complicated by nature

01:06:34.370 --> 01:06:35.930
of just having more features.

01:06:35.930 --> 01:06:38.987
And once you have more features,
you probably have more functions.

01:06:38.987 --> 01:06:41.570
And when you have more functions,
you want your data structure

01:06:41.570 --> 01:06:44.930
to be available to all
of those functions.

01:06:44.930 --> 01:06:49.100
And so we'll begin to see definition
of some of these structures being,

01:06:49.100 --> 01:06:51.620
indeed, outside of our own functions.

01:06:51.620 --> 01:06:53.000
Peter, over to you.

01:06:53.000 --> 01:06:54.020
AUDIENCE: Oh, yeah.

01:06:54.020 --> 01:06:58.160
Will we define new classes
in header files later,

01:06:58.160 --> 01:07:00.350
or will we keep defining
them outside of main?

01:07:00.350 --> 01:07:01.767
DAVID MALAN: Really good question.

01:07:01.767 --> 01:07:05.430
Might we define our own types and our
own data structures in header files?

01:07:05.430 --> 01:07:05.930
Yes.

01:07:05.930 --> 01:07:07.220
Eventually we'll do that, too.

01:07:07.220 --> 01:07:11.210
Thus far, you and I have only been using
header files that other people wrote.

01:07:11.210 --> 01:07:15.770
We've been using stdio.h, string.h,
that the authors of C created.

01:07:15.770 --> 01:07:19.070
You've been using cs50.h
which we the staff wrote.

01:07:19.070 --> 01:07:23.210
It turns out, you can also create your
own header files, your own .h files,

01:07:23.210 --> 01:07:27.870
inside of which are pieces of code that
you want to share across multiple files

01:07:27.870 --> 01:07:28.370
of your own.

01:07:28.370 --> 01:07:29.453
We're not quite there yet.

01:07:29.453 --> 01:07:32.990
But yes, Peter, that would
be a solution to this problem

01:07:32.990 --> 01:07:36.130
by putting it in one central place.

01:07:36.130 --> 01:07:37.580
Thiago, over to you.

01:07:41.120 --> 01:07:43.370
AUDIENCE: I was--

01:07:43.370 --> 01:07:54.265
I was thinking, this course really takes
enough information to solve the sets,

01:07:54.265 --> 01:08:00.590
because I feel there
is missing information.

01:08:00.590 --> 01:08:07.490
I am a freshman, and I was taking--

01:08:07.490 --> 01:08:14.780
I was so concentrating, and I
can't go on, go ahead on the sets.

01:08:14.780 --> 01:08:17.302
Is there anything that I'm missing?

01:08:17.302 --> 01:08:19.010
DAVID MALAN: It's a
really good question.

01:08:19.010 --> 01:08:19.700
And quite fair.

01:08:19.700 --> 01:08:21.920
We do move quite quickly, admittedly.

01:08:21.920 --> 01:08:26.720
So indeed, recall from week
0 the fire hose metaphor

01:08:26.720 --> 01:08:29.270
that I borrowed from
MIT's water fountain.

01:08:29.270 --> 01:08:31.069
Indeed, that's very much the case.

01:08:31.069 --> 01:08:33.569
There's a lot of new syntax, a
lot of new ideas all at once.

01:08:33.569 --> 01:08:37.310
But when it comes to individual
problems in the problem sets,

01:08:37.310 --> 01:08:39.920
do realize that you should
take those step by step.

01:08:39.920 --> 01:08:45.460
And invariably, they tend to work from
less complicated to more complicated.

01:08:45.460 --> 01:08:47.960
And throughout each of the
lectures and each of the examples

01:08:47.960 --> 01:08:50.390
that we do, either live
or via the examples that

01:08:50.390 --> 01:08:53.060
are premade on the course's
website for your review,

01:08:53.060 --> 01:08:57.529
there's always little clues or hints
or examples that you can then do.

01:08:57.529 --> 01:09:01.910
And certainly, by way of other
resources like labs and the like,

01:09:01.910 --> 01:09:04.200
will you see additional
building blocks as well.

01:09:04.200 --> 01:09:06.080
So feel free to reach out
more individually afterword.

01:09:06.080 --> 01:09:07.997
Happy to point you at
some of those resources.

01:09:07.997 --> 01:09:10.910
In fact, most recently, too, will
you notice on the course's website

01:09:10.910 --> 01:09:13.399
what we call "shorts," which
are shorter videos made

01:09:13.399 --> 01:09:17.120
by another colleague of mine, CS50's
own Doug Lloyd, which are literally

01:09:17.120 --> 01:09:19.680
short videos on very specific topics.

01:09:19.680 --> 01:09:22.490
So after today, you'll
see short videos by Doug

01:09:22.490 --> 01:09:25.580
with a different perspective on
linear search, on binary search,

01:09:25.580 --> 01:09:28.260
and on a number of other
algorithms as well.

01:09:28.260 --> 01:09:28.859
Good question.

01:09:28.859 --> 01:09:30.567
Sophia, back to you.

01:09:30.567 --> 01:09:32.609
AUDIENCE: I was wondering,
with the return values

01:09:32.609 --> 01:09:37.380
that we have for different
error cases, would that be--

01:09:37.380 --> 01:09:39.630
like, what's an example of
what we would use that for?

01:09:39.630 --> 01:09:42.540
Is that for later if there are
like several different cases

01:09:42.540 --> 01:09:44.356
and we want to somehow
keep track of them?

01:09:44.356 --> 01:09:45.689
DAVID MALAN: Exactly the latter.

01:09:45.689 --> 01:09:47.790
So right now, honestly,
it's kind of stupid

01:09:47.790 --> 01:09:52.027
that we're even bothering to spend
time returning 0 or returning 1.

01:09:52.027 --> 01:09:54.360
Like, we don't really need
to do that, because we're not

01:09:54.360 --> 01:09:55.410
using the information.

01:09:55.410 --> 01:09:58.200
But what we're trying to
do is lay the foundation

01:09:58.200 --> 01:09:59.782
for more complicated programs.

01:09:59.782 --> 01:10:01.740
And indeed, this week
and next week and beyond,

01:10:01.740 --> 01:10:05.520
as your own programs get a little
longer, and as we, the course,

01:10:05.520 --> 01:10:08.520
start providing you with
starter code or distribution

01:10:08.520 --> 01:10:12.210
code, that is, lines of code that
the staff and I write that you then

01:10:12.210 --> 01:10:15.630
have to build upon, it's going
to be a very useful mechanism

01:10:15.630 --> 01:10:19.360
to be able to signal that this went
wrong or this other thing went wrong.

01:10:19.360 --> 01:10:22.110
So all we're doing is preparing
for that inevitability,

01:10:22.110 --> 01:10:25.870
even if right now it doesn't really
seem to be scratching an itch.

01:10:25.870 --> 01:10:26.630
Anthony?

01:10:26.630 --> 01:10:28.672
AUDIENCE: I was just going
to ask really quickly,

01:10:28.672 --> 01:10:32.200
obviously in this code we have
"Brian" and your name, "David."

01:10:32.200 --> 01:10:33.310
And that's two people.

01:10:33.310 --> 01:10:35.500
So let's say we had 10
or 20 or even 30 people.

01:10:35.500 --> 01:10:37.500
I know it was a question
in the chat, but I just

01:10:37.500 --> 01:10:39.460
wanted to clarify for myself, too.

01:10:39.460 --> 01:10:42.000
DAVID MALAN: And the "what
if" being what would change?

01:10:42.000 --> 01:10:44.210
Or, what was the end of that question?

01:10:44.210 --> 01:10:44.850
AUDIENCE: Yeah.

01:10:44.850 --> 01:10:46.100
What would change in the code?

01:10:46.100 --> 01:10:48.017
Or what we do exactly
to address that problem?

01:10:48.017 --> 01:10:48.850
DAVID MALAN: Ah, OK.

01:10:48.850 --> 01:10:49.450
Good question.

01:10:49.450 --> 01:10:53.305
So if we were to have more names, like a
third name or a tenth name or the like,

01:10:53.305 --> 01:10:56.430
the only things that we would have to
change in this version of the program

01:10:56.430 --> 01:10:58.947
is first, on line 14,
the size of the array.

01:10:58.947 --> 01:11:00.780
So if we're going to
have 10 people, we need

01:11:00.780 --> 01:11:03.180
to decide in advance that
we're going to have 10 people.

01:11:03.180 --> 01:11:08.048
Better still, I could, for instance,
allocate myself a constant up here.

01:11:08.048 --> 01:11:09.840
So let me actually go
up here, just like we

01:11:09.840 --> 01:11:17.670
did in a previous class, where we did
something like this-- const int NUMBER.

01:11:17.670 --> 01:11:19.380
And I'll just initialize this to 10.

01:11:19.380 --> 01:11:21.450
And recall that const means constant.

01:11:21.450 --> 01:11:23.140
That means this variable can't change.

01:11:23.140 --> 01:11:24.930
Int, of course means it's an integer.

01:11:24.930 --> 01:11:27.450
The fact that I've capitalized
it is just a human convention

01:11:27.450 --> 01:11:29.640
to make a little visually
clear that this is

01:11:29.640 --> 01:11:32.200
a constant, just so you don't forget.

01:11:32.200 --> 01:11:33.660
But it has no functional role.

01:11:33.660 --> 01:11:36.360
And then this, of course, is just
the value to assign to NUMBER.

01:11:36.360 --> 01:11:40.187
Then I could go down here on
line 16 and plug in that variable

01:11:40.187 --> 01:11:42.270
so that I don't have to
hardcode what people would

01:11:42.270 --> 01:11:44.730
call a "magic number,"
which is just a number that

01:11:44.730 --> 01:11:46.410
appears seemingly out of nowhere.

01:11:46.410 --> 01:11:50.520
Now I've put all of my special
numbers at the top of my file,

01:11:50.520 --> 01:11:54.190
or toward the top of my file, and
now I'm using this variable here.

01:11:54.190 --> 01:11:57.390
And then what I could do-- and I
alluded to this only verbally before--

01:11:57.390 --> 01:12:00.120
I could absolutely start
hardcoding in, for instance,

01:12:00.120 --> 01:12:03.420
Montague's name and number, and
Rithvik's and Benedict's, and Cody's

01:12:03.420 --> 01:12:04.110
and others.

01:12:04.110 --> 01:12:07.048
But honestly, this seems kind of
stupid if you're just hardcoding

01:12:07.048 --> 01:12:08.340
all of these names and numbers.

01:12:08.340 --> 01:12:10.423
And in a few weeks, we'll
see how you can actually

01:12:10.423 --> 01:12:13.200
store all of the same information
in like a spreadsheet,

01:12:13.200 --> 01:12:14.910
or what's called a CSV file--

01:12:14.910 --> 01:12:19.560
Comma Separated Values-- or even in a
proper database, which the Facebooks

01:12:19.560 --> 01:12:21.065
and Googles of the world would use.

01:12:21.065 --> 01:12:23.190
But what I could do for
now is something like this.

01:12:23.190 --> 01:12:28.940
For int i gets 0, i less than
the number of people, i++.

01:12:28.940 --> 01:12:30.720
And maybe I could do
something like this--

01:12:30.720 --> 01:12:38.250
people bracket i dot name equals
get_string, "What's the name"

01:12:38.250 --> 01:12:39.150
question mark.

01:12:39.150 --> 01:12:43.080
And then here I could do people
bracket i dot number equals

01:12:43.080 --> 01:12:46.380
get_string, "What's their number?"

01:12:46.380 --> 01:12:48.220
And I could ask that question, too.

01:12:48.220 --> 01:12:51.320
So now the program's getting
to be a little better designed.

01:12:51.320 --> 01:12:53.700
I'm not arbitrarily
hardcoding just me and Brian.

01:12:53.700 --> 01:12:54.780
Now it's dynamic.

01:12:54.780 --> 01:12:58.330
And technically, the phone book only
supports 10 people at the moment,

01:12:58.330 --> 01:12:59.850
but I could make that dynamic, too.

01:12:59.850 --> 01:13:01.630
I could also call get_int.

01:13:01.630 --> 01:13:04.860
Or, like you did this past week,
use a command line argument

01:13:04.860 --> 01:13:07.770
and parameterize the code
so that it can actually

01:13:07.770 --> 01:13:09.990
be for 2 people, 10
people-- whatever you want,

01:13:09.990 --> 01:13:15.270
the program can dynamically
adapt to it for you.

01:13:15.270 --> 01:13:16.890
Other questions?

01:13:16.890 --> 01:13:22.420
On structs, on types, or the like?

01:13:22.420 --> 01:13:23.170
No?

01:13:23.170 --> 01:13:23.670
All right.

01:13:23.670 --> 01:13:25.110
So how did we get here?

01:13:25.110 --> 01:13:27.810
Recall that we started with
this problem of searching,

01:13:27.810 --> 01:13:30.420
whereby we just want to
find someone in the doors.

01:13:30.420 --> 01:13:32.625
We just want to find
someone in the array.

01:13:32.625 --> 01:13:34.500
We've sort of escalated
things pretty quickly

01:13:34.500 --> 01:13:36.720
to finding not just
numbers or names but now

01:13:36.720 --> 01:13:40.380
names with numbers in the
form of these data structures.

01:13:40.380 --> 01:13:43.770
But to do this efficiently really
requires a smarter algorithm

01:13:43.770 --> 01:13:44.850
like binary search.

01:13:44.850 --> 01:13:50.100
Up until now, we've only used in C code
linear search, even though, recall,

01:13:50.100 --> 01:13:54.240
that we did have at our disposal
the pseudocode for binary search.

01:13:54.240 --> 01:13:58.020
But with binary search, we're going
to need the data to be sorted.

01:13:58.020 --> 01:14:01.080
And so if you want to get the
speed benefits of searching

01:14:01.080 --> 01:14:04.290
more quickly by having sorted
numbers, somehow someone

01:14:04.290 --> 01:14:06.120
is going to have to do that for us.

01:14:06.120 --> 01:14:09.748
Joe, for instance, sorted behind the
curtain all of these numbers for us.

01:14:09.748 --> 01:14:11.790
But what algorithm did he
use is going to open up

01:14:11.790 --> 01:14:15.005
a whole can of worms as to how
we can sort numbers efficiently.

01:14:15.005 --> 01:14:17.130
And indeed, if you're the
Googles and the Facebooks

01:14:17.130 --> 01:14:19.240
and the Instagrams of
the world, with millions,

01:14:19.240 --> 01:14:22.740
billions of pieces of
data and users, you surely

01:14:22.740 --> 01:14:24.850
want to keep that data
sorted, presumably,

01:14:24.850 --> 01:14:26.940
so that you can use
algorithms like binary search

01:14:26.940 --> 01:14:30.520
to find information quickly when you're
searching for friends or for content.

01:14:30.520 --> 01:14:32.730
But let's go ahead and here
take a five-minute break.

01:14:32.730 --> 01:14:35.160
And when we come back, we'll
consider a few algorithms

01:14:35.160 --> 01:14:39.600
for sorting that's going to enable us to
do everything we've just now discussed.

01:14:39.600 --> 01:14:40.910
See you in five.

01:14:40.910 --> 01:14:41.790
All right.

01:14:41.790 --> 01:14:43.050
We are back.

01:14:43.050 --> 01:14:47.760
So to recap, we have a couple different
algorithms for searching, linear search

01:14:47.760 --> 01:14:48.870
and binary search.

01:14:48.870 --> 01:14:52.560
Binary search is clearly the winner
from all measures we've seen thus far.

01:14:52.560 --> 01:14:56.910
The catch is that the data needs
to be sorted in advanced in order

01:14:56.910 --> 01:14:58.107
to apply that algorithm.

01:14:58.107 --> 01:14:59.940
So let's just give
ourselves a working model

01:14:59.940 --> 01:15:01.350
for what it means to sort something.

01:15:01.350 --> 01:15:04.560
Well, as always, if you think of this
as just another problem to be solved,

01:15:04.560 --> 01:15:07.970
it's got input and output, and
the goal is to take that input

01:15:07.970 --> 01:15:08.970
and produce that output.

01:15:08.970 --> 01:15:09.990
Well, what's the input?

01:15:09.990 --> 01:15:12.360
It's going to be a whole
bunch of unsorted values.

01:15:12.360 --> 01:15:14.462
And the goal, of course,
is to get sorted values.

01:15:14.462 --> 01:15:16.420
So the interesting part
of the process is going

01:15:16.420 --> 01:15:18.270
to be whatever there is in the middle.

01:15:18.270 --> 01:15:22.710
But just to be even more concrete, if
we think now in terms of this unsorted

01:15:22.710 --> 01:15:25.650
input as being an array of
input-- because after all,

01:15:25.650 --> 01:15:28.470
that's perhaps the most useful
mechanism we've seen thus far,

01:15:28.470 --> 01:15:32.880
to pass around a bunch of values at
once using just one variable name--

01:15:32.880 --> 01:15:37.710
we might have an array like this, 6 3
8 5 2 7 4 1, which seems to be, indeed,

01:15:37.710 --> 01:15:40.050
randomly ordered, that is unsorted.

01:15:40.050 --> 01:15:43.594
And we want to turn that into an
equivalent array that's just 1 2

01:15:43.594 --> 01:15:45.120
3 4 5 6 7 8.

01:15:45.120 --> 01:15:47.490
So eight numbers this
time instead of seven.

01:15:47.490 --> 01:15:51.190
But the goal this time is not to
search them, per se, but to sort them.

01:15:51.190 --> 01:15:53.730
But before I get ahead
of myself, could someone

01:15:53.730 --> 01:15:56.640
push back on this whole
intellectual exercise

01:15:56.640 --> 01:15:59.400
we're about to do with
sorting in the first place?

01:15:59.400 --> 01:16:02.280
Could someone make an argument
as to why we might not

01:16:02.280 --> 01:16:06.990
want to bother using a sorted
array, why we might not

01:16:06.990 --> 01:16:12.000
want to bother sorting the elements,
and heck, let's just use linear search

01:16:12.000 --> 01:16:13.570
to find some element--

01:16:13.570 --> 01:16:17.460
whether it's a number behind
a door, a name in an array.

01:16:17.460 --> 01:16:23.850
Like, when might we want to just use
linear search and not bother sorting?

01:16:23.850 --> 01:16:25.940
Sophia, what do you think?

01:16:25.940 --> 01:16:28.370
AUDIENCE: We could
encounter errors in sorting,

01:16:28.370 --> 01:16:33.010
and that might cause errors, like,
unpredictability in terms of,

01:16:33.010 --> 01:16:34.620
like, if we can find something.

01:16:34.620 --> 01:16:37.143
Versus linear search,
we know we can find it.

01:16:37.143 --> 01:16:38.310
DAVID MALAN: OK, quite fair.

01:16:38.310 --> 01:16:41.940
I will concede that implementing binary
search, not in pseudocode, which we've

01:16:41.940 --> 01:16:44.850
already done, but in code
is actually more difficult,

01:16:44.850 --> 01:16:47.040
because you have to deal
with rounding, especially

01:16:47.040 --> 01:16:49.790
if you've got a weird number of
doors, like an odd number of doors

01:16:49.790 --> 01:16:52.410
versus an even number of doors
or an array of those lengths.

01:16:52.410 --> 01:16:54.618
Honestly, you've got to deal
with these corner cases,

01:16:54.618 --> 01:16:57.180
like rounding down or rounding
up, because anything time

01:16:57.180 --> 01:16:59.857
you divide something by 2, you
might get a fractional value

01:16:59.857 --> 01:17:01.190
or you might get a whole number.

01:17:01.190 --> 01:17:02.350
So we've got to make some decisions.

01:17:02.350 --> 01:17:03.630
So it's totally solvable.

01:17:03.630 --> 01:17:06.870
And humans for decades have been writing
code that implements binary search.

01:17:06.870 --> 01:17:08.160
It's totally possible.

01:17:08.160 --> 01:17:09.450
There's libraries you can use.

01:17:09.450 --> 01:17:12.492
But it's definitely more challenging,
and you open yourselves up to risk.

01:17:12.492 --> 01:17:14.385
But let me stipulate that that's OK.

01:17:14.385 --> 01:17:17.763
I am good enough at this point in
my progression where I'm pretty

01:17:17.763 --> 01:17:19.180
sure I can implement it correctly.

01:17:19.180 --> 01:17:21.480
So correctness is not my concern.

01:17:21.480 --> 01:17:26.880
What else might demotivate me
from sorting an array of elements?

01:17:26.880 --> 01:17:30.420
And what might motivate me to,
ah, just use linear search.

01:17:30.420 --> 01:17:33.150
It's so simple.

01:17:33.150 --> 01:17:34.930
Can anyone propose why?

01:17:34.930 --> 01:17:36.480
Olivia, what do you think?

01:17:36.480 --> 01:17:39.420
AUDIENCE: If the name of
the game is efficiency,

01:17:39.420 --> 01:17:42.180
and you have a small
enough data set, then

01:17:42.180 --> 01:17:46.040
you might as well just
search it versus sort

01:17:46.040 --> 01:17:47.920
it, which would be an extra expense.

01:17:47.920 --> 01:17:49.420
DAVID MALAN: Yeah, really well said.

01:17:49.420 --> 01:17:51.450
If you've got a
relatively small data set,

01:17:51.450 --> 01:17:55.050
and your computer operates at a
billion operations per second,

01:17:55.050 --> 01:17:59.130
for instance, my God, who cares if your
code sucks and it's a little bit slow?

01:17:59.130 --> 01:18:00.580
Just do it the inefficient way.

01:18:00.580 --> 01:18:01.080
Why?

01:18:01.080 --> 01:18:04.205
Because it's going to take you maybe
a few minutes to implement the simpler

01:18:04.205 --> 01:18:07.200
algorithm like linear search, even
though it's going to take longer

01:18:07.200 --> 01:18:10.350
to run, whereas it might take you
tens of minutes, maybe an hour

01:18:10.350 --> 01:18:12.810
or so, to not only write
but debug something

01:18:12.810 --> 01:18:16.200
like a fancier algorithm, like
binary search, at which point

01:18:16.200 --> 01:18:19.710
you might have spent more time writing
the code, the faster code, than you

01:18:19.710 --> 01:18:22.342
would have just running the slower code.

01:18:22.342 --> 01:18:23.800
And I can speak to this personally.

01:18:23.800 --> 01:18:26.008
Back in grad school, some
of the research I was doing

01:18:26.008 --> 01:18:28.170
involved analysis of
very large data sets.

01:18:28.170 --> 01:18:30.960
And I had to write code in
order to analyze this data.

01:18:30.960 --> 01:18:33.330
And I could have spent
hours, days, even,

01:18:33.330 --> 01:18:37.320
writing the best designed
algorithm I could to analyze

01:18:37.320 --> 01:18:39.480
the data as efficiently as possible.

01:18:39.480 --> 01:18:42.240
Or, frankly, I could write the
crappy version of the code,

01:18:42.240 --> 01:18:44.550
go to sleep for eight
hours, and my code will just

01:18:44.550 --> 01:18:46.770
produce the output I want by morning.

01:18:46.770 --> 01:18:49.513
And that is a very real-world,
reasonable trade-off to make.

01:18:49.513 --> 01:18:51.930
And indeed, this is going to
be thematic in the weeks that

01:18:51.930 --> 01:18:54.615
proceed in the course, where
there's going to be this trade-off.

01:18:54.615 --> 01:18:56.490
And quite often, the
trade-off is going to be

01:18:56.490 --> 01:19:01.110
time, or complexity, or the amount
of space or memory that you're using.

01:19:01.110 --> 01:19:04.740
And part of the art of being
a good computer scientist,

01:19:04.740 --> 01:19:08.140
and in turn programmer, is trying
to decide where the line is.

01:19:08.140 --> 01:19:11.430
Do you exert more effort upfront to
make a better, faster, more efficient

01:19:11.430 --> 01:19:13.800
algorithm, or do you
maybe cut some corners

01:19:13.800 --> 01:19:17.580
there so that you can focus your
most precious resource, human time,

01:19:17.580 --> 01:19:20.430
on other, more fundamentally
challenging problems?

01:19:20.430 --> 01:19:22.620
So we for the course's
problem sets and labs

01:19:22.620 --> 01:19:24.880
will always prescribe
what's most important.

01:19:24.880 --> 01:19:27.113
But in a few weeks' time,
with one of our problem

01:19:27.113 --> 01:19:29.280
sets will you implement
your very own spell checker.

01:19:29.280 --> 01:19:30.988
And among the goals
of that spell checker

01:19:30.988 --> 01:19:33.690
are going to be to
minimize the amount of time

01:19:33.690 --> 01:19:37.980
your code is taking to run, and also to
minimize the amount of space or memory

01:19:37.980 --> 01:19:41.070
that your program is
taking while running.

01:19:41.070 --> 01:19:44.010
And so we'll begin to appreciate
those trade-offs ever more.

01:19:44.010 --> 01:19:47.370
But indeed, it's the case-- and I
really like Olivia's formulation of it--

01:19:47.370 --> 01:19:50.340
if your data set is pretty
small, it's probably not worth

01:19:50.340 --> 01:19:54.330
writing the fastest, best
designed algorithm as possible.

01:19:54.330 --> 01:19:56.400
Just write it the simple
way, the correct way,

01:19:56.400 --> 01:19:58.440
and get the answer quickly, and move on.

01:19:58.440 --> 01:20:01.530
But that's not going to be the case
for a lot of problems, dare I say,

01:20:01.530 --> 01:20:03.420
most problems in life.

01:20:03.420 --> 01:20:06.150
If you're building Facebook
or Instagram or Whatsapp,

01:20:06.150 --> 01:20:10.980
or any of today's most popular services
that are getting thousands, millions

01:20:10.980 --> 01:20:13.500
of new pieces of data at
a time, you can't just

01:20:13.500 --> 01:20:17.310
linearly search all of your friends or
connections on LinkedIn efficiently.

01:20:17.310 --> 01:20:20.430
You can't just linearly search
the billions of web pages

01:20:20.430 --> 01:20:23.820
that Google and Microsoft
index in their search engines.

01:20:23.820 --> 01:20:25.450
You've got to be smarter about it.

01:20:25.450 --> 01:20:28.200
And undoubtedly, the more
successful your programs are

01:20:28.200 --> 01:20:31.350
and your code are, or websites,
your apps, whatever the case may be,

01:20:31.350 --> 01:20:33.810
the more important design
does come into play.

01:20:33.810 --> 01:20:38.850
So indeed, let's stipulate now that the
goal is not to search these doors once;

01:20:38.850 --> 01:20:41.040
the goal is not to search
these light bulbs once;

01:20:41.040 --> 01:20:44.430
the goal is not to search the
phone book once, but rather again

01:20:44.430 --> 01:20:45.970
and again and again.

01:20:45.970 --> 01:20:48.600
And if that's going to be
the case, then we probably

01:20:48.600 --> 01:20:52.890
should spend a little more time and
a little more complexity upfront

01:20:52.890 --> 01:20:56.250
getting our code, not only
right but also efficient,

01:20:56.250 --> 01:20:59.670
so that we can benefit from
that efficiency again and again

01:20:59.670 --> 01:21:01.480
and again, over time.

01:21:01.480 --> 01:21:04.050
So how might we go about
sorting some numbers.

01:21:04.050 --> 01:21:06.390
So in fact, let me see, to
do this, if we can maybe

01:21:06.390 --> 01:21:10.470
get a hand from Brian in back.

01:21:10.470 --> 01:21:12.180
Brian, do you mind helping with sorting?

01:21:12.180 --> 01:21:13.210
BRIAN: Yeah, absolutely.

01:21:13.210 --> 01:21:16.950
So I've got eight numbers here right now
that all seem to be in unsorted order.

01:21:16.950 --> 01:21:17.700
DAVID MALAN: Yeah.

01:21:17.700 --> 01:21:21.400
And Brian, could you go ahead, and could
you sort these eight numbers for us?

01:21:21.400 --> 01:21:22.900
BRIAN: Yeah, I'll put them in order.

01:21:22.900 --> 01:21:27.640
So we'll take these and--

01:21:27.640 --> 01:21:35.450
um-- and all right.

01:21:35.450 --> 01:21:37.033
I think these are now in sorted order.

01:21:37.033 --> 01:21:38.117
DAVID MALAN: Yeah, indeed.

01:21:38.117 --> 01:21:38.630
I agree.

01:21:38.630 --> 01:21:42.420
And now let's take some critique
from the audience, some observations.

01:21:42.420 --> 01:21:49.140
Would someone mind explaining how
Brian just sorted those eight numbers?

01:21:49.140 --> 01:21:54.650
What did Brian just do, step by step,
in order to get to that end result?

01:21:54.650 --> 01:21:57.780
The input was unsorted,
the output now is sorted.

01:21:57.780 --> 01:21:58.550
So what did he do?

01:21:58.550 --> 01:22:01.190
Peter, what did you see happen?

01:22:01.190 --> 01:22:03.200
AUDIENCE: He went through
them step by step.

01:22:03.200 --> 01:22:07.200
And if they were in increasing
order, he flipped them,

01:22:07.200 --> 01:22:10.730
and kept doing it until they were
all in the correct [INAUDIBLE]..

01:22:10.730 --> 01:22:11.480
DAVID MALAN: Yeah.

01:22:11.480 --> 01:22:13.820
He kept step by step kind
of looking for small values

01:22:13.820 --> 01:22:16.130
and moving them to the left,
and looking for big values

01:22:16.130 --> 01:22:18.380
and moving them to the right,
so effectively selecting

01:22:18.380 --> 01:22:21.330
numbers one at a time and
putting it into its right place.

01:22:21.330 --> 01:22:24.163
So let's see this is, maybe in more
slow motion, if you will, Brian.

01:22:24.163 --> 01:22:25.913
And if you could be a
little more pedantic

01:22:25.913 --> 01:22:27.900
and explain exactly what you're doing.

01:22:27.900 --> 01:22:32.390
I see you've already reset the numbers
to their original, unsorted order.

01:22:32.390 --> 01:22:35.240
Why don't we go ahead and start
a little more methodically?

01:22:35.240 --> 01:22:37.910
And could you go ahead and
for us, more slowly this time,

01:22:37.910 --> 01:22:40.100
select the smallest value.

01:22:40.100 --> 01:22:41.860
Because I do think,
per Peter, it's going

01:22:41.860 --> 01:22:44.100
to need to end up at the far left.

01:22:44.100 --> 01:22:44.850
BRIAN: Yeah, sure.

01:22:44.850 --> 01:22:47.370
So I'm looking at the numbers,
and the 1 is the smallest.

01:22:47.370 --> 01:22:49.340
So I now have the smallest value.

01:22:49.340 --> 01:22:49.580
DAVID MALAN: All right.

01:22:49.580 --> 01:22:50.872
So you did that really quickly.

01:22:50.872 --> 01:22:52.880
But I feel like you took
the liberty of being

01:22:52.880 --> 01:22:56.120
a human who can kind of have this bird's
eye view of everything all at once.

01:22:56.120 --> 01:22:58.162
But be a little more
computer-like, if you could.

01:22:58.162 --> 01:23:00.650
And if these eight numbers
are technically in an array,

01:23:00.650 --> 01:23:02.690
kind of like my seven
doors out here, such

01:23:02.690 --> 01:23:06.710
that you can only look at one number at
a time, can you be even more methodical

01:23:06.710 --> 01:23:10.010
and deliberate this time in telling
us how you found the smallest

01:23:10.010 --> 01:23:11.760
number to put into place?

01:23:11.760 --> 01:23:12.260
BRIAN: Sure.

01:23:12.260 --> 01:23:15.110
I guess, since the computer can
only look at one number at a time,

01:23:15.110 --> 01:23:17.220
I would start at the
left side of this array

01:23:17.220 --> 01:23:20.490
and work my way through the right,
looking at each number one at a time.

01:23:20.490 --> 01:23:22.910
So I might start with the 6
and say, OK, this right now

01:23:22.910 --> 01:23:25.180
is the smallest number
I've looked at so far.

01:23:25.180 --> 01:23:28.430
But then I look at the next number, and
it's a 3, and that's smaller than a 6.

01:23:28.430 --> 01:23:31.700
So now the 3, that's the
smallest number I found so far.

01:23:31.700 --> 01:23:33.440
So I'll remember that and keep looking.

01:23:33.440 --> 01:23:36.107
The 8 is bigger than the 3, so I
don't need to worry about that.

01:23:36.107 --> 01:23:37.520
The 5 is bigger than the 3.

01:23:37.520 --> 01:23:39.680
The 2 is smaller than
the 3, so that now is

01:23:39.680 --> 01:23:41.905
the smallest number I've found so far.

01:23:41.905 --> 01:23:42.780
But I'm not done yet.

01:23:42.780 --> 01:23:43.655
So I'll keep looking.

01:23:43.655 --> 01:23:46.490
The 7 is bigger than the 2,
the 4 is bigger than the 2.

01:23:46.490 --> 01:23:47.990
But the 1 is smaller than the 2.

01:23:47.990 --> 01:23:50.750
So now I've made my way all the
way to the end of the array.

01:23:50.750 --> 01:23:53.023
And 1, I can say, is the
smallest number that I found.

01:23:53.023 --> 01:23:53.690
DAVID MALAN: OK.

01:23:53.690 --> 01:23:56.232
So what I'm hearing is you're
doing all of these comparisons,

01:23:56.232 --> 01:23:58.867
also similar to what Peter
implied, and you keep checking,

01:23:58.867 --> 01:24:00.950
is this smaller, is this
smaller, is this smaller,

01:24:00.950 --> 01:24:04.100
and you're keeping track of the
currently smallest number you've seen?

01:24:04.100 --> 01:24:05.180
BRIAN: Yeah, that sounds about right.

01:24:05.180 --> 01:24:05.420
DAVID MALAN: All right.

01:24:05.420 --> 01:24:06.260
So you found it.

01:24:06.260 --> 01:24:07.930
And I think it belongs at the beginning.

01:24:07.930 --> 01:24:09.760
So how do we put this into place now?

01:24:09.760 --> 01:24:11.250
BRIAN: Yeah, so I want to
put it at the beginning.

01:24:11.250 --> 01:24:12.583
There's not really space for it.

01:24:12.583 --> 01:24:15.383
So I could make space for it, just
by shifting these numbers over.

01:24:15.383 --> 01:24:16.050
DAVID MALAN: OK.

01:24:16.050 --> 01:24:16.500
Wait, wait.

01:24:16.500 --> 01:24:19.375
But I feel like you're just-- now
you're doubling the amount of work.

01:24:19.375 --> 01:24:20.708
I feel like-- don't do all that.

01:24:20.708 --> 01:24:23.167
That feels like you're going
to do more steps than we need.

01:24:23.167 --> 01:24:24.530
What else could we do here?

01:24:24.530 --> 01:24:25.030
BRIAN: OK.

01:24:25.030 --> 01:24:27.343
So the other option is, it
needs to go in this spot,

01:24:27.343 --> 01:24:28.760
like this first spot in the array.

01:24:28.760 --> 01:24:30.267
So I could just put it there.

01:24:30.267 --> 01:24:33.350
But if I do that, I'm going to have
to take the 6 which is there right now

01:24:33.350 --> 01:24:34.790
and pull the 6 out.

01:24:34.790 --> 01:24:35.060
DAVID MALAN: All right,
but I think that's--

01:24:35.060 --> 01:24:37.170
BRIAN: So the 1 is in the
right place, but the 6 isn't.

01:24:37.170 --> 01:24:37.670
DAVID MALAN: Yeah, I agree.

01:24:37.670 --> 01:24:38.878
But I think that's OK, right?

01:24:38.878 --> 01:24:42.660
Because these numbers started randomly,
and so the 6 is in the wrong place

01:24:42.660 --> 01:24:43.160
anyway.

01:24:43.160 --> 01:24:46.310
I don't think we're making the problem
any worse by just moving it elsewhere.

01:24:46.310 --> 01:24:49.280
And indeed, it's a lot faster, I
would think, to just swap two numbers,

01:24:49.280 --> 01:24:51.530
move one to the other
and vice versa, then

01:24:51.530 --> 01:24:54.020
shift all of those numbers in between.

01:24:54.020 --> 01:24:54.520
BRIAN: Yeah.

01:24:54.520 --> 01:24:56.728
So I took the 1 out of the
position at the very end

01:24:56.728 --> 01:24:58.770
of the array, all the way
on the right-hand side.

01:24:58.770 --> 01:25:01.320
So I guess I could take the
6 and just put it there,

01:25:01.320 --> 01:25:04.100
because that's where there's an
open space to put the number.

01:25:04.100 --> 01:25:04.370
DAVID MALAN: Yeah.

01:25:04.370 --> 01:25:07.110
And it's not exactly in the right
space, but again, it's no worse off.

01:25:07.110 --> 01:25:07.610
So I like that.

01:25:07.610 --> 01:25:07.820
All right.

01:25:07.820 --> 01:25:10.487
But now, the fact that the 1 is
in the right place-- and indeed,

01:25:10.487 --> 01:25:12.380
you've illuminated it
to indicate as much--

01:25:12.380 --> 01:25:14.630
I feel like we can pretty
much ignore the 1 henceforth

01:25:14.630 --> 01:25:16.680
and now just select the
next smallest element.

01:25:16.680 --> 01:25:18.210
So can you walk us through that?

01:25:18.210 --> 01:25:19.790
BRIAN: Yeah, so I guess I'd
repeat the same process.

01:25:19.790 --> 01:25:20.870
I'd start with the 3.

01:25:20.870 --> 01:25:22.928
That's the smallest
number I've found so far.

01:25:22.928 --> 01:25:23.720
And I keep looking.

01:25:23.720 --> 01:25:26.750
The 8 is bigger than the 3,
the 5 is bigger than the 3.

01:25:26.750 --> 01:25:27.990
The 2 is smaller than the 3.

01:25:27.990 --> 01:25:29.060
So I'll remember that 2.

01:25:29.060 --> 01:25:31.023
That's the smallest
thing I've seen so far.

01:25:31.023 --> 01:25:34.190
And then I just need to check to see
if there's anything smaller than the 2.

01:25:34.190 --> 01:25:36.558
And I look at the 7, the 4, and the 6.

01:25:36.558 --> 01:25:38.100
None of those are smaller than the 2.

01:25:38.100 --> 01:25:41.623
So the 2, I can say is the next
smallest number for the array.

01:25:41.623 --> 01:25:42.290
DAVID MALAN: OK.

01:25:42.290 --> 01:25:43.808
And where would you put that then?

01:25:43.808 --> 01:25:45.600
BRIAN: That needs to
go in the second spot.

01:25:45.600 --> 01:25:47.060
So I need to pull the 3 out.

01:25:47.060 --> 01:25:50.450
And I guess I can take the 3 and just
put it into this open spot, where

01:25:50.450 --> 01:25:51.770
there's available space.

01:25:51.770 --> 01:25:52.520
DAVID MALAN: Yeah.

01:25:52.520 --> 01:25:54.650
And I feel like it's
starting to become clear

01:25:54.650 --> 01:25:56.870
that we're inside some kind of
loop, because you pretty much told

01:25:56.870 --> 01:25:58.670
the same story again but
with a different number.

01:25:58.670 --> 01:26:00.710
Do you mind just continuing
the algorithm to the end

01:26:00.710 --> 01:26:03.168
and select the next smallest,
next smallest, next smallest,

01:26:03.168 --> 01:26:04.001
and get that sorted?

01:26:04.001 --> 01:26:04.501
BRIAN: Sure.

01:26:04.501 --> 01:26:05.225
So we got the 8.

01:26:05.225 --> 01:26:07.768
5 is smaller than that,
3 is smaller than that.

01:26:07.768 --> 01:26:09.560
And then the rest of
the number is 7, 4, 6.

01:26:09.560 --> 01:26:10.760
Those are all bigger.

01:26:10.760 --> 01:26:13.850
So the 3, that's going to go
into sorted position here.

01:26:13.850 --> 01:26:15.900
And I'll take the 8 and swap it.

01:26:15.900 --> 01:26:17.380
Now I'm going to look at the 5.

01:26:17.380 --> 01:26:18.830
8 and 7 are both bigger.

01:26:18.830 --> 01:26:21.680
The 4 is smaller than the
5, but the 6 is bigger.

01:26:21.680 --> 01:26:24.540
So the 4, that's the smallest
number I've seen so far.

01:26:24.540 --> 01:26:28.160
So the 4, that's going to go into
place, and I'll swap it with the 5.

01:26:28.160 --> 01:26:29.485
And now I've got the 8.

01:26:29.485 --> 01:26:31.610
The 7 is smaller than the
8, so I'll remember that.

01:26:31.610 --> 01:26:33.040
5 is smaller than that.

01:26:33.040 --> 01:26:34.550
The 6 is bigger.

01:26:34.550 --> 01:26:37.190
So the 5, that's going
to be the next number.

01:26:37.190 --> 01:26:39.877
And now I'm left with 7.

01:26:39.877 --> 01:26:41.960
8 is bigger, so 7 is still
the smallest I've seen.

01:26:41.960 --> 01:26:45.350
But 6 is smaller, so 6 goes next.

01:26:45.350 --> 01:26:47.570
And now I'm down to the last two.

01:26:47.570 --> 01:26:50.900
And between the last two, the
8 and the 7, the 7 is smaller.

01:26:50.900 --> 01:26:53.450
So the 7 is going to go in this spot.

01:26:53.450 --> 01:26:55.730
And at this point, I've
only got one number left.

01:26:55.730 --> 01:26:58.460
So that number must
be in sorted position.

01:26:58.460 --> 01:27:01.310
And now I would say that this
is a sorted array of numbers.

01:27:01.310 --> 01:27:02.060
DAVID MALAN: Nice.

01:27:02.060 --> 01:27:04.250
So it definitely seems to be correct.

01:27:04.250 --> 01:27:05.600
It felt a little slow.

01:27:05.600 --> 01:27:07.808
But of course, the computer
could do this much faster

01:27:07.808 --> 01:27:09.132
than we, using an actual array.

01:27:09.132 --> 01:27:11.090
And if you don't mind my
making an observation,

01:27:11.090 --> 01:27:15.870
it looks like if we have eight numbers
to begin with, or n more generally,

01:27:15.870 --> 01:27:20.120
it looks like you essentially
did n minus 1 comparisons,

01:27:20.120 --> 01:27:23.897
because you kept comparing numbers
again-- actually, did n comparisons.

01:27:23.897 --> 01:27:25.730
You looked at the first
number, and then you

01:27:25.730 --> 01:27:29.000
compared it again and again and again
at all of the other possible values

01:27:29.000 --> 01:27:31.440
in order to find the smallest element.

01:27:31.440 --> 01:27:31.940
BRIAN: Yeah.

01:27:31.940 --> 01:27:35.190
Because for each of the numbers in the
array, I had to do a comparison to see,

01:27:35.190 --> 01:27:38.250
is it smaller than the smallest
thing that I've seen so far?

01:27:38.250 --> 01:27:40.540
And if it is smaller, than
I needed to remember that.

01:27:40.540 --> 01:27:41.290
DAVID MALAN: Yeah.

01:27:41.290 --> 01:27:44.510
So in each pass you considered every
number, so a total of n numbers first.

01:27:44.510 --> 01:27:46.760
And so you found the number
1 you put it in its place,

01:27:46.760 --> 01:27:50.300
and that left you to be clear
with n minus 1 numbers thereafter.

01:27:50.300 --> 01:27:53.630
And then after that, n minus 2
numbers, n minus 3 numbers, dot,

01:27:53.630 --> 01:27:56.130
dot, dot, all the way
down to one final number.

01:27:56.130 --> 01:27:57.440
So I think this is correct.

01:27:57.440 --> 01:28:00.050
And I think that's a
pretty deliberate way

01:28:00.050 --> 01:28:03.428
of sorting these elements, a little more
deliberately than your first approach,

01:28:03.428 --> 01:28:05.720
Brian, which I might describe
as a little more organic.

01:28:05.720 --> 01:28:06.560
You kind of did it like--

01:28:06.560 --> 01:28:09.352
more like a human, just kind of
eyeballing things and moving things

01:28:09.352 --> 01:28:09.930
around.

01:28:09.930 --> 01:28:11.722
But if we were to
translate this into code,

01:28:11.722 --> 01:28:13.790
recall that we have
to be ever so precise.

01:28:13.790 --> 01:28:18.470
And so let me consider altogether how
exactly we might translate what Brian

01:28:18.470 --> 01:28:20.820
did ultimately to, again, pseudocode.

01:28:20.820 --> 01:28:23.280
So what he did is actually
an algorithm that has a name.

01:28:23.280 --> 01:28:24.990
It's called selection sort.

01:28:24.990 --> 01:28:25.530
Why?

01:28:25.530 --> 01:28:27.380
Well, it's sorting the
elements ultimately.

01:28:27.380 --> 01:28:30.530
And it's doing so by having
Brian, or really the computer,

01:28:30.530 --> 01:28:33.830
select the smallest elements
again and again and again.

01:28:33.830 --> 01:28:35.660
And once you found each
such small element,

01:28:35.660 --> 01:28:37.820
you get the added benefit
of just ignoring it.

01:28:37.820 --> 01:28:39.680
Indeed, every time
Brian lit up a number,

01:28:39.680 --> 01:28:43.610
he didn't need to keep comparing it,
so the amount of work he was doing

01:28:43.610 --> 01:28:47.300
was decreasing each iteration--
n numbers, then n minus 1,

01:28:47.300 --> 01:28:49.950
then n minus 2, n minus 3, and so forth.

01:28:49.950 --> 01:28:53.450
And so we can think about the
running time of this algorithm

01:28:53.450 --> 01:28:56.820
as being manifest in
its actual pseudocode.

01:28:56.820 --> 01:28:58.483
So how might we define the pseudocode?

01:28:58.483 --> 01:29:00.650
Well, let me propose that
we think of it like this--

01:29:00.650 --> 01:29:03.300
for i from 0 to n minus 1.

01:29:03.300 --> 01:29:05.390
Now, undoubtedly this is
probably the most cryptic

01:29:05.390 --> 01:29:07.972
looking line of the three lines
of pseudocode on the screen.

01:29:07.972 --> 01:29:09.680
But again, this is
the kind of thing that

01:29:09.680 --> 01:29:13.470
should become rote memory over
time, or just instincts with code.

01:29:13.470 --> 01:29:15.470
We've seen in C how you
can write a for loop.

01:29:15.470 --> 01:29:18.680
For loops typically, by
convention, start counting at 0.

01:29:18.680 --> 01:29:22.400
But if you have n elements, you
don't want to count up through n.

01:29:22.400 --> 01:29:27.560
You want to count up to n or
equivalently up through n minus 1,

01:29:27.560 --> 01:29:29.430
so from 0 to n minus 1.

01:29:29.430 --> 01:29:29.930
All right.

01:29:29.930 --> 01:29:31.580
Now what do I want to do on the next--

01:29:31.580 --> 01:29:32.900
on the first iteration?

01:29:32.900 --> 01:29:37.580
Find the smallest item between
the i-th item and the last item.

01:29:37.580 --> 01:29:40.580
So this is not quite obvious,
I think, at first glance.

01:29:40.580 --> 01:29:43.700
But I do think it's a fair
characterization of what Brian did.

01:29:43.700 --> 01:29:47.540
Because if i is initialized to
0, that was like Brian pointing

01:29:47.540 --> 01:29:51.830
his left hand at the first number
on the very left of the shelf.

01:29:51.830 --> 01:29:56.250
And what he then did was he found the
smallest element between the i-th item,

01:29:56.250 --> 01:29:58.655
the first item 0, and the last item.

01:29:58.655 --> 01:30:00.530
So that's kind of a very
fancy way of saying,

01:30:00.530 --> 01:30:04.790
Brian, find the smallest
elements among all n elements.

01:30:04.790 --> 01:30:09.680
Then what he did was swapped the
smallest item with the i-th item.

01:30:09.680 --> 01:30:12.140
So he just did that
switcheroo, so as to not have

01:30:12.140 --> 01:30:15.080
to waste time shifting everything over.

01:30:15.080 --> 01:30:17.330
He instead just made room
for it by swapping it

01:30:17.330 --> 01:30:20.217
with the value that
was in its wrong place.

01:30:20.217 --> 01:30:23.300
But now in the next iteration of this
loop, consider how a for loop works.

01:30:23.300 --> 01:30:25.582
You do an i++ implicitly in pseudocode.

01:30:25.582 --> 01:30:26.790
That's what's happening here.

01:30:26.790 --> 01:30:28.370
So now i equals 1.

01:30:28.370 --> 01:30:33.800
Find the smallest item between
the i-th item, item 1 0 indexed,

01:30:33.800 --> 01:30:34.880
and the last item.

01:30:34.880 --> 01:30:39.140
So this is a fancy way of saying,
Brian, check all of the n elements

01:30:39.140 --> 01:30:42.200
again, except for the
first, because now you're

01:30:42.200 --> 01:30:45.680
starting at location 1
instead of location 0.

01:30:45.680 --> 01:30:47.300
And now the algorithm proceeds.

01:30:47.300 --> 01:30:49.490
So you could write this
code in different ways

01:30:49.490 --> 01:30:51.410
in English like
pseudocode, but this seems

01:30:51.410 --> 01:30:54.650
to be a reasonable formulation
of exactly that algorithm.

01:30:54.650 --> 01:30:57.470
But let's see it a
little more visually now,

01:30:57.470 --> 01:31:00.808
without all of the switching around of
the humans moving around the numbers.

01:31:00.808 --> 01:31:02.600
Let me go ahead and
use this visualization.

01:31:02.600 --> 01:31:04.040
And we'll put a link
on the course's website

01:31:04.040 --> 01:31:05.707
if you'd like to play with this as well.

01:31:05.707 --> 01:31:10.520
This is just someone's visualization
of an array of numbers.

01:31:10.520 --> 01:31:14.540
But this time, rather than represent
the numbers as symbols, decimal digits,

01:31:14.540 --> 01:31:17.510
now this person is using
vertical bars, like a bar chart.

01:31:17.510 --> 01:31:20.720
And what this means is that a
small bar is like a small number,

01:31:20.720 --> 01:31:22.430
and a big bar is a big number.

01:31:22.430 --> 01:31:26.810
So the goal here is to these bars, which
equivalently might as well be numbers,

01:31:26.810 --> 01:31:29.777
from short bars over to
tall bars, left to right.

01:31:29.777 --> 01:31:30.860
And I'm going to go ahead.

01:31:30.860 --> 01:31:34.100
And along the top of the here, I
can choose my sorting algorithm.

01:31:34.100 --> 01:31:36.830
And the one we just described,
recall, was selection sort.

01:31:36.830 --> 01:31:39.170
So let me go ahead and do this.

01:31:39.170 --> 01:31:41.900
And notice, it takes a moment,
I think, to wrap your mind

01:31:41.900 --> 01:31:43.740
around what's happening here.

01:31:43.740 --> 01:31:48.200
But notice that this pink line is going
from left to right, because that's

01:31:48.200 --> 01:31:49.730
essentially what Brian was doing.

01:31:49.730 --> 01:31:52.730
He was walking back and forth,
back and forth, back and forth

01:31:52.730 --> 01:31:56.180
through that shelf of numbers,
looking for the next smallest number,

01:31:56.180 --> 01:32:00.260
and he kept putting the smallest number
over on the left where it belongs.

01:32:00.260 --> 01:32:02.660
And indeed, that's why
in this visualization

01:32:02.660 --> 01:32:07.730
you see the small numbers beginning
to be put into place on the left

01:32:07.730 --> 01:32:09.500
as we keep swooping through.

01:32:09.500 --> 01:32:13.760
But notice, the colored bar
keeps starting later and later,

01:32:13.760 --> 01:32:18.222
more rightward and more rightward, just
like Brian was not retracing his steps.

01:32:18.222 --> 01:32:20.430
As soon as he lit up the
numbers, he left them alone.

01:32:20.430 --> 01:32:23.257
And voila, all of these
numbers are now sorted.

01:32:23.257 --> 01:32:26.090
So that's just a graphical way of
thinking about the same algorithm.

01:32:26.090 --> 01:32:28.970
But how efficient or
inefficient was that?

01:32:28.970 --> 01:32:31.100
Well, let's see if we can
apply some numbers here.

01:32:31.100 --> 01:32:33.517
But there's also ways to do
this a little more intuitively

01:32:33.517 --> 01:32:35.160
over time, which we'll do, too.

01:32:35.160 --> 01:32:38.450
So if the first time through
the shelf of numbers, he

01:32:38.450 --> 01:32:41.420
had eight numbers at his disposal--
he had to look at all eight numbers

01:32:41.420 --> 01:32:43.560
in order to decide which
of these is the smallest.

01:32:43.560 --> 01:32:45.380
So that's n steps initially.

01:32:45.380 --> 01:32:48.090
The next time he did a
pass through the shelf,

01:32:48.090 --> 01:32:51.140
he ignored the brightly lit
number 1, because it was already

01:32:51.140 --> 01:32:53.660
in place by definition of
what he had already done.

01:32:53.660 --> 01:32:56.480
So now he had n minus 1 steps to go.

01:32:56.480 --> 01:33:01.860
Then he did another n minus 2 steps,
then n minus 3, n minus 4, n minus 5,

01:33:01.860 --> 01:33:05.270
dot, dot, dot, all the way down
to the final step, where he just

01:33:05.270 --> 01:33:08.210
had to find and leave
alone the number 8,

01:33:08.210 --> 01:33:11.280
because that was the biggest
number, so one single step.

01:33:11.280 --> 01:33:13.988
So this is some kind of
series here, mathematically.

01:33:13.988 --> 01:33:17.030
You might recall something like this
in, like, the back of your math book

01:33:17.030 --> 01:33:19.697
or in high school, or back of
your physics textbook or the like.

01:33:19.697 --> 01:33:23.330
It turns out that this actually
sums up to this formula here--

01:33:23.330 --> 01:33:25.692
n times n plus 1 divided by 2.

01:33:25.692 --> 01:33:28.400
And if that's not familiar, you
don't remember that, no big deal.

01:33:28.400 --> 01:33:31.280
Just let me stipulate that the
mathematical formula with which we

01:33:31.280 --> 01:33:35.000
began, where we had the series of
n, plus n minus 1, plus n minus 2,

01:33:35.000 --> 01:33:38.150
plus n minus 3, dot, dot,
dot, simply sums up ultimately

01:33:38.150 --> 01:33:42.650
to the more succinct n
times n plus 1 divided by 2.

01:33:42.650 --> 01:33:47.000
This, of course, if we multiply it out,
gives us n squared plus n divided by 2.

01:33:47.000 --> 01:33:50.770
And this now, I will
propose, gives us just this--

01:33:50.770 --> 01:33:54.440
n squared divided by 2 plus n/2.

01:33:54.440 --> 01:33:56.900
So if we really wanted
to be nit-picky, this

01:33:56.900 --> 01:34:01.400
is the total number of steps,
or operations, or seconds,

01:34:01.400 --> 01:34:04.040
however we want to measure
Brian's running time.

01:34:04.040 --> 01:34:07.970
This seems to be the precise
mathematical formula therefore.

01:34:07.970 --> 01:34:11.700
But at the beginning of this
week, we considered again,

01:34:11.700 --> 01:34:13.160
the sort of Big O notation.

01:34:13.160 --> 01:34:16.760
With a wave of the hand, we care
more about the order of magnitude

01:34:16.760 --> 01:34:18.080
on which an algorithm operates.

01:34:18.080 --> 01:34:22.850
I really don't care about
these divided by 2 and n/2.

01:34:22.850 --> 01:34:26.120
Because which of these factors
is going to matter as n gets big?

01:34:26.120 --> 01:34:29.210
The bigger the phone book
gets, the more doors we have,

01:34:29.210 --> 01:34:33.370
the more light bulbs we have, the
more numbers we have on the shelf,

01:34:33.370 --> 01:34:35.810
n is going to keep getting
bigger and bigger and bigger.

01:34:35.810 --> 01:34:38.430
And given that, which
is the dominant factor?

01:34:38.430 --> 01:34:42.080
Rongxin, if we could call on someone
here, which of these factors,

01:34:42.080 --> 01:34:48.320
n squared divided by 2, or n divided
by 2, really matters in the long run

01:34:48.320 --> 01:34:54.090
as our problems get bigger and
bigger, as n gets bigger and bigger?

01:34:54.090 --> 01:34:57.630
Which of those factors
mathematically dominates?

01:34:57.630 --> 01:34:58.130
Anika?

01:34:58.130 --> 01:35:00.218
AUDIENCE: Oh, it's Anika, but--

01:35:00.218 --> 01:35:01.010
DAVID MALAN: Anika.

01:35:01.010 --> 01:35:02.635
AUDIENCE: It would be the-- no problem.

01:35:02.635 --> 01:35:04.272
It would be the n squared.

01:35:04.272 --> 01:35:05.480
DAVID MALAN: Yeah, n squared.

01:35:05.480 --> 01:35:05.980
Right.

01:35:05.980 --> 01:35:08.390
If you take any number
for n and you square it,

01:35:08.390 --> 01:35:11.150
that's going to be bigger,
certainly in the long run,

01:35:11.150 --> 01:35:12.720
than just doing n divided by 2.

01:35:12.720 --> 01:35:15.470
And so with our Big O notation,
we could describe the running time

01:35:15.470 --> 01:35:20.930
of Brian's selection sort implementation
as, ah, it's on the order of n squared.

01:35:20.930 --> 01:35:23.150
Yes, I'm ignoring some
numbers, and yes, if we really

01:35:23.150 --> 01:35:25.760
wanted to be nit-picky and
count up every single step

01:35:25.760 --> 01:35:29.450
that Brian took, yes, it's n
squared divided by 2 plus n/2.

01:35:29.450 --> 01:35:33.080
But again, if you think about
the problem over time and n

01:35:33.080 --> 01:35:36.170
getting really large, sort of
Facebook-sized, Twitter-sized,

01:35:36.170 --> 01:35:39.890
Google-sized, what's really
going to dominate mathematically

01:35:39.890 --> 01:35:41.690
is this bigger factor here.

01:35:41.690 --> 01:35:44.420
That's what's going to make
the total number of steps way

01:35:44.420 --> 01:35:47.370
bigger than just those
smaller order terms.

01:35:47.370 --> 01:35:49.250
So in Big O notation,
selection sort would

01:35:49.250 --> 01:35:51.540
seem to be on the order of n squared.

01:35:51.540 --> 01:35:54.140
So if we consider our
chart from before where

01:35:54.140 --> 01:35:57.920
we had the upper bounds on
our searching algorithms,

01:35:57.920 --> 01:36:01.410
both linear and binary,
this one, unfortunately,

01:36:01.410 --> 01:36:05.765
is at really the tip top of this
particular list of running times.

01:36:05.765 --> 01:36:07.140
And there's infinitely many more.

01:36:07.140 --> 01:36:09.470
These are just a subset of
the more common formulas

01:36:09.470 --> 01:36:11.807
that a computer scientist
might use and think about.

01:36:11.807 --> 01:36:13.640
Selection sort is kind
of a top of the list.

01:36:13.640 --> 01:36:15.800
And being number one
on this list is bad.

01:36:15.800 --> 01:36:18.980
n squared is certainly
much slower than, say,

01:36:18.980 --> 01:36:22.610
big O of 1, which, of course,
was constant time or one step.

01:36:22.610 --> 01:36:24.580
So I wonder if we could be--

01:36:24.580 --> 01:36:25.880
if we could do a little better.

01:36:25.880 --> 01:36:27.770
I wonder if we could do a little better.

01:36:27.770 --> 01:36:29.970
And Peter actually did say
something else earlier,

01:36:29.970 --> 01:36:33.890
which was about like sharing
two numbers and fixing problems.

01:36:33.890 --> 01:36:35.720
And if I can kind of
run with that, let me

01:36:35.720 --> 01:36:39.620
propose that we, Brian, return to
you for a look at an algorithm that

01:36:39.620 --> 01:36:43.390
might be called instead
bubble sort, bubble sort

01:36:43.390 --> 01:36:45.740
being a different
algorithm, this one that

01:36:45.740 --> 01:36:47.287
tries to fix problems more locally.

01:36:47.287 --> 01:36:49.370
So in fact, Brian, if you
look at the numbers that

01:36:49.370 --> 01:36:51.440
are in front of you,
which you've kindly reset

01:36:51.440 --> 01:36:55.100
to their original, unsorted
location, I feel like this really,

01:36:55.100 --> 01:36:58.578
if we focus on just pairs of numbers,
it's just a lot of small numbers.

01:36:58.578 --> 01:37:00.620
Like last time, we tried
to solve the big problem

01:37:00.620 --> 01:37:02.030
and sorting the whole thing.

01:37:02.030 --> 01:37:06.050
What if we just look at pairs of numbers
that are adjacent to one another?

01:37:06.050 --> 01:37:09.860
Can we maybe make some little tweaks
and change our algorithm fundamentally?

01:37:09.860 --> 01:37:14.450
So for instance, Brian, 6 and 3, what
observation can you make there for us?

01:37:14.450 --> 01:37:15.200
BRIAN: Yeah, sure.

01:37:15.200 --> 01:37:18.233
So 6 and 3 that's, the first
pair of numbers in the array.

01:37:18.233 --> 01:37:20.900
And if I want the array to be
sorted, I want the smaller numbers

01:37:20.900 --> 01:37:23.670
to be on the left and the bigger
numbers to be on the right.

01:37:23.670 --> 01:37:27.227
So just looking at this pair, I can tell
you that the 6 and 3 or out of order.

01:37:27.227 --> 01:37:29.810
The 3 should be on the left, and
the 6 should be on the right.

01:37:29.810 --> 01:37:30.140
DAVID MALAN: All right.

01:37:30.140 --> 01:37:31.932
So let's go ahead and
do that, and go ahead

01:37:31.932 --> 01:37:34.070
and fix that by swapping those two.

01:37:34.070 --> 01:37:35.632
And just fix a small little problem.

01:37:35.632 --> 01:37:37.340
And now let's repeat
this process, right?

01:37:37.340 --> 01:37:39.690
Loops seem to be omnipresent
in a lot of our algorithms.

01:37:39.690 --> 01:37:41.213
So 6 and 8 is the next such pair.

01:37:41.213 --> 01:37:43.130
What you want-- what do
you think about those?

01:37:43.130 --> 01:37:46.362
BRIAN: That particular pair seems OK,
because the 6 is smaller and already

01:37:46.362 --> 01:37:47.070
on the left side.

01:37:47.070 --> 01:37:48.560
So I think I can leave this pair alone.

01:37:48.560 --> 01:37:49.518
DAVID MALAN: All right.

01:37:49.518 --> 01:37:50.380
How about 8 and 5?

01:37:50.380 --> 01:37:51.920
BRIAN: The 8 is bigger than the 5.

01:37:51.920 --> 01:37:53.300
So I'm going to swap these two.

01:37:53.300 --> 01:37:55.190
The 5 should be on the left of the 8.

01:37:55.190 --> 01:37:56.148
DAVID MALAN: All right.

01:37:56.148 --> 01:37:56.870
And 8 and 2?

01:37:56.870 --> 01:37:58.537
BRIAN: Same thing here, the 8 is bigger.

01:37:58.537 --> 01:38:00.337
So the 8 is going to
be swapped with the 2.

01:38:00.337 --> 01:38:01.670
DAVID MALAN: All right, 8 and 7.

01:38:01.670 --> 01:38:06.133
BRIAN: The 8 is bigger than the 7,
so the 8 I should switch with the 7.

01:38:06.133 --> 01:38:07.425
DAVID MALAN: All right 8 and 4?

01:38:07.425 --> 01:38:09.968
BRIAN: 8 and 4, same thing,
it's bigger than the 4.

01:38:09.968 --> 01:38:11.010
DAVID MALAN: And 8 and 1.

01:38:11.010 --> 01:38:12.385
BRIAN: I can do it one last time.

01:38:12.385 --> 01:38:14.912
The 8 is bigger than the
1, and I think that's all.

01:38:14.912 --> 01:38:16.870
DAVID MALAN: And with a
nice dramatic flourish,

01:38:16.870 --> 01:38:18.900
if you step off to the side, voila--

01:38:18.900 --> 01:38:20.340
not sorted.

01:38:20.340 --> 01:38:23.070
In fact, it doesn't really
look all that much better.

01:38:23.070 --> 01:38:26.070
But I do think Brian's
done something smart here.

01:38:26.070 --> 01:38:29.550
Brian, can you speak to at least
some of the marginal improvements

01:38:29.550 --> 01:38:30.300
that you've made?

01:38:30.300 --> 01:38:30.590
BRIAN: Yeah.

01:38:30.590 --> 01:38:32.298
So there are some
improvements, at least.

01:38:32.298 --> 01:38:36.890
The 1 originally was all the way at the
very end, and it moved back one spot.

01:38:36.890 --> 01:38:39.390
And the other improvement, I
think, is that the 8 originally

01:38:39.390 --> 01:38:42.150
was way over here on the left
side of the array somewhere.

01:38:42.150 --> 01:38:44.130
But because the 8 is
the biggest number, I

01:38:44.130 --> 01:38:46.380
kept switching it over and
over again until it made it

01:38:46.380 --> 01:38:47.640
all the way to the end.

01:38:47.640 --> 01:38:51.270
And so now actually, I think
this 8 is in the correct place.

01:38:51.270 --> 01:38:54.660
It's the biggest number, and it
ended up moving its way all the way

01:38:54.660 --> 01:38:56.190
to the right side of the array.

01:38:56.190 --> 01:38:56.550
DAVID MALAN: Yeah.

01:38:56.550 --> 01:38:59.130
And this is where this algorithm that
we'll see the rest of in just a moment

01:38:59.130 --> 01:39:00.720
gets its name, bubble sort--

01:39:00.720 --> 01:39:04.470
alludes to the fact that the biggest
numbers start bubbling their way up

01:39:04.470 --> 01:39:08.220
to the top of, or the end of,
the list, at the right-hand side

01:39:08.220 --> 01:39:09.570
of the shelf as Brian notes.

01:39:09.570 --> 01:39:14.350
But notice, as Brian does, too, the
number 1 only moved over one position.

01:39:14.350 --> 01:39:16.080
So there's clearly more work to be done.

01:39:16.080 --> 01:39:18.690
And that's obvious from the other
numbers being misordered as well.

01:39:18.690 --> 01:39:19.950
But we have improved things.

01:39:19.950 --> 01:39:23.760
The 8 is in place, and the 1
is closer to being in place.

01:39:23.760 --> 01:39:25.377
So how might we proceed next?

01:39:25.377 --> 01:39:28.210
Well, Brian, let's continue to solve
some small bite-sized problems.

01:39:28.210 --> 01:39:29.668
Let's start at the beginning again.

01:39:29.668 --> 01:39:30.400
3 and 6?

01:39:30.400 --> 01:39:30.900
BRIAN: Sure.

01:39:30.900 --> 01:39:33.540
The 3 and the 6, those seem to be
in order, so I'll leave those alone.

01:39:33.540 --> 01:39:34.590
DAVID MALAN: 6 and 5.

01:39:34.590 --> 01:39:37.420
BRIAN: 6 and 5 or out of the order,
so I'll go ahead and take the 6

01:39:37.420 --> 01:39:38.420
and put it to the right.

01:39:38.420 --> 01:39:39.300
DAVID MALAN: 6 and 2.

01:39:39.300 --> 01:39:42.045
BRIAN: Those are out of order as
well, so I'll swap the 2 and the 6.

01:39:42.045 --> 01:39:42.930
DAVID MALAN: 6 and 7.

01:39:42.930 --> 01:39:44.032
BRIAN: 6 and 7 are OK.

01:39:44.032 --> 01:39:44.740
They're in order.

01:39:44.740 --> 01:39:45.690
DAVID MALAN: 7 and 4.

01:39:45.690 --> 01:39:48.535
BRIAN: Those are out of order,
so I'll switch the 4 and the 7.

01:39:48.535 --> 01:39:49.410
DAVID MALAN: 7 and 1.

01:39:49.410 --> 01:39:52.300
BRIAN: And those two are out of
order as well, so I'll swap those.

01:39:52.300 --> 01:39:55.800
And now I think the 7 has made its
way to the sorted position as well.

01:39:55.800 --> 01:39:56.783
DAVID MALAN: Indeed.

01:39:56.783 --> 01:39:58.200
So now we're making some progress.

01:39:58.200 --> 01:40:02.100
7 has bubbled its way up to the top of
the list, stopping just before the 8,

01:40:02.100 --> 01:40:05.550
whereas the 1 has continued its
advance to its correct location.

01:40:05.550 --> 01:40:08.490
So I bet, Brian, if we keep
doing this again and again

01:40:08.490 --> 01:40:11.610
and again, so long as the
list remains in part unsorted,

01:40:11.610 --> 01:40:13.530
I think we'll probably
get to the finish line.

01:40:13.530 --> 01:40:15.690
Do you want to take it from
here and sort the rest?

01:40:15.690 --> 01:40:16.270
BRIAN: Yeah, sure.

01:40:16.270 --> 01:40:17.830
So I just repeat the process again.

01:40:17.830 --> 01:40:19.410
The 3 and the 5 are OK.

01:40:19.410 --> 01:40:22.070
The 2 and the 5 are out of
order, so I'll swap them.

01:40:22.070 --> 01:40:24.330
The 5 and the 6, those
are fine as a pair.

01:40:24.330 --> 01:40:28.200
The 6 and the 4, out of order relative
to each other, so I'll switch those.

01:40:28.200 --> 01:40:31.510
And the 6 and the 1, those are out
of order as well, so I'll swap those.

01:40:31.510 --> 01:40:34.680
And now the 6, that I can say
is in its correct position.

01:40:34.680 --> 01:40:36.010
And I'll repeat it again.

01:40:36.010 --> 01:40:38.820
The 3 and the 2 are out of
order, so those get switched.

01:40:38.820 --> 01:40:40.380
The 3 and the 5 are OK.

01:40:40.380 --> 01:40:43.080
The 5 and the 4 are out of
order, so those get switched.

01:40:43.080 --> 01:40:47.520
And then the 5 and the 1
need to be switched as well.

01:40:47.520 --> 01:40:49.325
So there's the 5 in sorted position.

01:40:49.325 --> 01:40:50.700
And now I'm left with these four.

01:40:50.700 --> 01:40:53.070
The 2 and the 3 are
OK, the 3 and the 4 OK.

01:40:53.070 --> 01:40:55.110
But the 4 and the 1 are out of order.

01:40:55.110 --> 01:40:58.790
So those get switched, and now
the four, that's in its place.

01:40:58.790 --> 01:41:02.720
The 2 and the 3 are OK, but the 3 and
the 1 are not, so I'll swap those.

01:41:02.720 --> 01:41:05.700
And now the 3 goes
into its sorted place.

01:41:05.700 --> 01:41:08.900
And then finally, the last pair to
consider is just the 2 and the 1.

01:41:08.900 --> 01:41:12.690
Those are out of order, so I'll swap
those, and now the 2 is in place.

01:41:12.690 --> 01:41:16.280
And 1 is the only remaining number, so
I can say that that one's in place, too.

01:41:16.280 --> 01:41:18.440
And now I think we have a sorted array.

01:41:18.440 --> 01:41:19.190
DAVID MALAN: Nice.

01:41:19.190 --> 01:41:21.860
So it felt like this was a
fundamentally different approach,

01:41:21.860 --> 01:41:23.762
but we still got to the same end point.

01:41:23.762 --> 01:41:26.720
So that really now invites the question
as to whether bubbles or it was

01:41:26.720 --> 01:41:29.370
better or worse or maybe no different.

01:41:29.370 --> 01:41:33.270
But notice, too, that we've solved the
same problem fundamentally differently.

01:41:33.270 --> 01:41:36.427
The first time, we took the more
human natural intuition of just,

01:41:36.427 --> 01:41:37.510
find the smallest element.

01:41:37.510 --> 01:41:39.480
All right, do it again,
do it again, do it again.

01:41:39.480 --> 01:41:42.230
This time, we sort of viewed the
problem through a different lens.

01:41:42.230 --> 01:41:44.360
And we thought about, it
would seem, what does it

01:41:44.360 --> 01:41:46.130
mean for the list to be unsorted?

01:41:46.130 --> 01:41:48.260
As Peter noted, it's when
things are out of order.

01:41:48.260 --> 01:41:51.380
Like that very basic primitive
where something is out of order

01:41:51.380 --> 01:41:54.380
suggests an opportunity to
solve the problem that way.

01:41:54.380 --> 01:41:57.170
Just fix all of the tiny
bite-sized problems.

01:41:57.170 --> 01:42:00.470
And it would seem that using a
loop, if we repeat that intuition,

01:42:00.470 --> 01:42:03.506
is going to pay off eventually
by fixing, fixing, fixing,

01:42:03.506 --> 01:42:06.620
fixing all of the little
problems until the big one itself

01:42:06.620 --> 01:42:08.000
would seem to go away.

01:42:08.000 --> 01:42:12.150
Well, let me return to the visualization
from before, re-randomize the bars--

01:42:12.150 --> 01:42:15.480
short bar is small number,
big bar is big number.

01:42:15.480 --> 01:42:17.900
And let me go ahead and run
the bubble sort algorithm,

01:42:17.900 --> 01:42:20.390
this time with this visualization.

01:42:20.390 --> 01:42:24.680
And you'll notice now sweeping from
left to right are two colored bars that

01:42:24.680 --> 01:42:30.320
represent the comparison of two adjacent
numbers again and again and again.

01:42:30.320 --> 01:42:33.770
And you'll see this time that the
bars are being a little smart,

01:42:33.770 --> 01:42:36.470
and they're not going all the
way to the end every time,

01:42:36.470 --> 01:42:39.440
just like Brian illuminated
the numbers and stopped

01:42:39.440 --> 01:42:43.080
looking at the 8 and the 7 and
the 6 once they were in place.

01:42:43.080 --> 01:42:46.910
But he and this visualization do
indeed keep returning to the beginning,

01:42:46.910 --> 01:42:50.330
doing another pass, another
pass, and another pass.

01:42:50.330 --> 01:42:53.270
So if we think ahead to the
analysis of this algorithm,

01:42:53.270 --> 01:42:57.560
it sort of invites us to consider, well,
how many total comparisons are there

01:42:57.560 --> 01:42:58.610
this time?

01:42:58.610 --> 01:43:01.280
It would seem that the very
first time through the bars,

01:43:01.280 --> 01:43:04.250
or equivalently the very first
time through the shelf, Brian

01:43:04.250 --> 01:43:07.830
and this visualization did
like n minus 1 comparisons.

01:43:07.830 --> 01:43:10.090
So n minus 1 comparisons
from left to right, out

01:43:10.090 --> 01:43:13.880
of n elements you can compare
n minus 1 adjacencies.

01:43:13.880 --> 01:43:17.780
After that it was n minus
2, n minus 3, n minus 4,

01:43:17.780 --> 01:43:22.050
n minus 5, until just two or one
remain, and at that point you're done.

01:43:22.050 --> 01:43:25.070
So even though this algorithm
fundamentally took a different approach

01:43:25.070 --> 01:43:29.160
and achieved the same goal, it
sorted the elements successfully.

01:43:29.160 --> 01:43:31.400
Let's consider how it
was implemented in code

01:43:31.400 --> 01:43:35.270
and whether it's actually a
little faster or a little slower.

01:43:35.270 --> 01:43:37.910
And let's set one final
bar, in fact, too.

01:43:37.910 --> 01:43:42.218
Earlier, we considered only the
upper bound on selection sort,

01:43:42.218 --> 01:43:44.510
just so that we have something
to compare this against.

01:43:44.510 --> 01:43:48.590
Let's also consider for a
moment what the running time is

01:43:48.590 --> 01:43:52.940
of selection sort in terms of a
lower bound-- best case scenario.

01:43:52.940 --> 01:43:56.690
With selection sort,
if you have n elements,

01:43:56.690 --> 01:43:59.810
and you keep looking for the next
smallest element, again and again

01:43:59.810 --> 01:44:04.390
and again, it turns out that selection
sort is not really our friend.

01:44:04.390 --> 01:44:07.790
Here's, for instance, the chart of where
we left off in terms of omega notation

01:44:07.790 --> 01:44:08.450
before.

01:44:08.450 --> 01:44:10.850
Linear search and binary
search could very well

01:44:10.850 --> 01:44:13.910
get lucky and take just one step
if you happen to open a door

01:44:13.910 --> 01:44:17.340
and, voila, the number you're
looking for is already there.

01:44:17.340 --> 01:44:20.360
But with selection sort,
as we've implemented it,

01:44:20.360 --> 01:44:23.150
both with Brian and
with the visualization,

01:44:23.150 --> 01:44:26.990
unfortunately it's none so
good with the lower bound.

01:44:26.990 --> 01:44:27.920
Why?

01:44:27.920 --> 01:44:32.720
Well, Brian pretty naively, every
time he searched for a number,

01:44:32.720 --> 01:44:37.230
started at the left and went all the
way to the right, started at the left,

01:44:37.230 --> 01:44:38.480
went all the way to the right.

01:44:38.480 --> 01:44:41.808
To be fair, he did ignore the
numbers that were already in place.

01:44:41.808 --> 01:44:44.600
So he didn't keep looking at the
1, he didn't keep looking at the 2

01:44:44.600 --> 01:44:46.260
once they were in place.

01:44:46.260 --> 01:44:50.390
But he did keep repeating
himself again and again, touching

01:44:50.390 --> 01:44:52.400
those numbers multiple times each.

01:44:52.400 --> 01:44:55.400
So again, even though you and I, the
humans, could look at those numbers

01:44:55.400 --> 01:44:57.692
and be like, obviously there's
the 1, obviously there's

01:44:57.692 --> 01:44:59.540
the 2, the obviously
there's the 3, Brian

01:44:59.540 --> 01:45:01.430
had to do it much more methodically.

01:45:01.430 --> 01:45:07.220
And in fact, even if that list
of numbers were perfectly sorted,

01:45:07.220 --> 01:45:09.480
he would have wasted just as much time.

01:45:09.480 --> 01:45:11.840
In fact, Brian, if you don't
mind, could you quickly

01:45:11.840 --> 01:45:14.510
sort all eight numbers again?

01:45:14.510 --> 01:45:17.210
And Brian, if we start
with a sorted list,

01:45:17.210 --> 01:45:21.190
this is kind of a nice perversion to
consider, if you will, algorithmically.

01:45:21.190 --> 01:45:22.940
When analyzing an
algorithm, sometimes you

01:45:22.940 --> 01:45:25.358
want to consider best
cases and worst cases.

01:45:25.358 --> 01:45:28.400
And there would seem to be nothing
better than, heck, the list is already

01:45:28.400 --> 01:45:31.530
sorted, you got lucky, there's
really no work to be done.

01:45:31.530 --> 01:45:34.040
The worst case is the list is
maybe completely backwards,

01:45:34.040 --> 01:45:36.020
and that's a huge amount
of work to be done.

01:45:36.020 --> 01:45:40.262
Unfortunately, selection sort doesn't
really optimize for that lucky case

01:45:40.262 --> 01:45:41.470
where they're already sorted.

01:45:41.470 --> 01:45:44.780
So Brian, I see you've resorted the
numbers for us from left to right.

01:45:44.780 --> 01:45:48.380
If we were to re-execute
selection sort as before,

01:45:48.380 --> 01:45:51.005
how would you go about
finding the smallest number?

01:45:51.005 --> 01:45:53.630
BRIAN: So we decided earlier
that, to find the smallest number,

01:45:53.630 --> 01:45:55.422
I need to look at all
the numbers from left

01:45:55.422 --> 01:45:58.580
to right in the array and each time
check to see if I found something.

01:45:58.580 --> 01:45:59.390
smaller.

01:45:59.390 --> 01:46:00.590
So I would start with the 1.

01:46:00.590 --> 01:46:02.530
That's the smallest
thing I've seen so far.

01:46:02.530 --> 01:46:04.070
But I would have to keep
looking, because maybe there's

01:46:04.070 --> 01:46:05.780
a 0 or a negative number later on.

01:46:05.780 --> 01:46:08.020
I need to check to see if
there's anything smaller.

01:46:08.020 --> 01:46:11.240
So I would check, the 2 is
bigger, the 3, 4, 5, 6, 7, 8.

01:46:11.240 --> 01:46:12.117
They're all bigger.

01:46:12.117 --> 01:46:13.700
So it turns out I was right all along.

01:46:13.700 --> 01:46:16.730
The 1 was the smallest number,
and it's already in place.

01:46:16.730 --> 01:46:18.590
So now that number is in place.

01:46:18.590 --> 01:46:20.090
DAVID MALAN: And then to find
the next smallest number,

01:46:20.090 --> 01:46:21.260
what would you have done?

01:46:21.260 --> 01:46:22.635
BRIAN: I would do the same thing.

01:46:22.635 --> 01:46:24.320
2 is the smallest number I found so far.

01:46:24.320 --> 01:46:25.580
And then I would look
through all the rest

01:46:25.580 --> 01:46:27.497
to see if there's anything
smaller than the 2.

01:46:27.497 --> 01:46:30.170
And I would look at 3, 4, 5, 6, 7, 8.

01:46:30.170 --> 01:46:31.880
Nothing's smaller than the 2.

01:46:31.880 --> 01:46:34.910
So I go back to the two and
say, OK, that number must now

01:46:34.910 --> 01:46:36.327
be in its sorted position.

01:46:36.327 --> 01:46:37.160
DAVID MALAN: Indeed.

01:46:37.160 --> 01:46:40.035
And that story would be the same
for the 3, for the 4, and for the 5.

01:46:40.035 --> 01:46:43.610
Like, nowhere in selection
sort pseudocode or actual code

01:46:43.610 --> 01:46:46.850
is there any sort of intelligence of,
eh, if the numbers are already sorted,

01:46:46.850 --> 01:46:47.630
quit.

01:46:47.630 --> 01:46:50.540
Like, there was no opportunity
to short circuit and abort

01:46:50.540 --> 01:46:51.645
that algorithm earlier.

01:46:51.645 --> 01:46:53.520
Brian would literally
be doing the same work,

01:46:53.520 --> 01:46:55.340
whether they're all
sorted from the get-go

01:46:55.340 --> 01:46:57.620
or completely unsorted,
and even backwards.

01:46:57.620 --> 01:47:00.660
And so selection sort doesn't
really perform very highly.

01:47:00.660 --> 01:47:03.048
So now we're hoping
bubble sort, indeed, does.

01:47:03.048 --> 01:47:05.590
So toward that end, let's take
a look at some proposed pseudo

01:47:05.590 --> 01:47:09.040
code for bubble sort, assuming
that the input is anything.

01:47:09.040 --> 01:47:10.790
Whether sorted or
unsorted, the pseudocode

01:47:10.790 --> 01:47:13.070
is always going to look like this.

01:47:13.070 --> 01:47:14.510
Repeat until sorted.

01:47:14.510 --> 01:47:17.100
For i from 0 to n minus 2--

01:47:17.100 --> 01:47:18.190
now, what does this mean?

01:47:18.190 --> 01:47:22.010
0 to n minus 1 goes from the
first element to the last.

01:47:22.010 --> 01:47:27.050
So 0 to n minus 2 goes from the
first element to the second to last.

01:47:27.050 --> 01:47:27.950
Why am I doing that?

01:47:27.950 --> 01:47:29.210
We'll see in just a moment.

01:47:29.210 --> 01:47:34.010
The condition inside of this loop is,
if the i-th and the i plus 1th elements

01:47:34.010 --> 01:47:36.720
are out of order, swap them.

01:47:36.720 --> 01:47:38.590
So this is me being a little clever.

01:47:38.590 --> 01:47:41.090
If you think about all of these
numbers as being in an array

01:47:41.090 --> 01:47:46.100
or behind doors, if you
iterate from 0 to n minus 2,

01:47:46.100 --> 01:47:49.010
that's like going from the first
door to the second to last door.

01:47:49.010 --> 01:47:52.830
But that's good, because my condition
is checking door i and i plus 1.

01:47:52.830 --> 01:47:57.530
So if I start at the beginning here,
and I only iterate up to this door,

01:47:57.530 --> 01:47:58.520
that's a good thing.

01:47:58.520 --> 01:48:02.070
Because when I compared door i
and i plus 1, at the very end

01:48:02.070 --> 01:48:05.100
I'm going to compare
door i and i plus 1.

01:48:05.100 --> 01:48:09.020
What I don't want to do is
compare this door i against door i

01:48:09.020 --> 01:48:10.763
plus 1, which doesn't even exist.

01:48:10.763 --> 01:48:13.430
And indeed, that's going to be
an error that probably all of you

01:48:13.430 --> 01:48:14.510
make at some point--

01:48:14.510 --> 01:48:19.040
going beyond the boundary
of an array, touching memory

01:48:19.040 --> 01:48:22.233
that is going one or more
spaces too far in the array,

01:48:22.233 --> 01:48:24.150
even though you didn't
allocate memory for it.

01:48:24.150 --> 01:48:26.940
So this hedges against that possibility.

01:48:26.940 --> 01:48:29.630
So this would seem to be
a pretty smart algorithm.

01:48:29.630 --> 01:48:35.030
But as written, it's not actually
as performant as might be ideal.

01:48:35.030 --> 01:48:39.830
With bubble sort, suppose the
list were entirely sorted.

01:48:39.830 --> 01:48:43.910
Brian, not to make you sort and
resort numbers too many times.

01:48:43.910 --> 01:48:47.300
Do you mind giving us a sorted
list one more time real quick?

01:48:47.300 --> 01:48:50.660
In a moment, I want to see, if we
consider that same sorted list as

01:48:50.660 --> 01:48:54.920
before, this time with bubble sort,
can we do fundamentally better?

01:48:54.920 --> 01:48:58.160
I have this code saying,
repeat until sorted.

01:48:58.160 --> 01:48:59.330
So how might this change?

01:48:59.330 --> 01:49:01.310
So Brian, you've got the
sorted numbers again.

01:49:01.310 --> 01:49:02.660
This should be a good case.

01:49:02.660 --> 01:49:05.990
But selection sort did not
benefit from this input,

01:49:05.990 --> 01:49:07.700
even though we could have gotten lucky.

01:49:07.700 --> 01:49:10.085
Bubble sort, what would your
thought process be here?

01:49:10.085 --> 01:49:11.960
BRIAN: So the thought
process for bubble sort

01:49:11.960 --> 01:49:14.030
was to go through each of
the pairs one at a time

01:49:14.030 --> 01:49:17.000
and see if I need to make a
swap for that particular pair.

01:49:17.000 --> 01:49:18.620
So I'd look at the 1 and the 2.

01:49:18.620 --> 01:49:20.600
Those two are OK, I
don't need to swap them.

01:49:20.600 --> 01:49:21.763
The 2 and the 3 are OK.

01:49:21.763 --> 01:49:23.180
I don't need to make a swap there.

01:49:23.180 --> 01:49:24.270
The 3 and the 4 are OK.

01:49:24.270 --> 01:49:25.820
The 4 and the 5 are OK.

01:49:25.820 --> 01:49:29.750
Same with the 5 and the 6, and the
6 and the 7, and the 7 and the 8.

01:49:29.750 --> 01:49:32.480
So I made my way through
all the entire array,

01:49:32.480 --> 01:49:36.020
and I never needed to make any swap,
because every pair that I looked at,

01:49:36.020 --> 01:49:38.840
they were already in the correct
order relative to each other.

01:49:38.840 --> 01:49:39.673
DAVID MALAN: Indeed.

01:49:39.673 --> 01:49:42.440
And so it would be foolish
and so obvious this time

01:49:42.440 --> 01:49:45.140
if Brian literally retraced
those steps and did it

01:49:45.140 --> 01:49:49.130
again with n minus 1 elements, and then
did it again with n minus 2 elements.

01:49:49.130 --> 01:49:52.490
I mean, if he didn't do any
work, any swaps the first pass,

01:49:52.490 --> 01:49:54.830
he's literally wasting
his own time by even

01:49:54.830 --> 01:49:57.050
doing another pass or another pass.

01:49:57.050 --> 01:50:00.650
And so that's kind of implicit in the
pseudocode, this repeat until sorted.

01:50:00.650 --> 01:50:03.110
Even though it doesn't translate
perfectly into a for loop

01:50:03.110 --> 01:50:07.250
or a while loop in C, it kind of
says intuitively what he should do--

01:50:07.250 --> 01:50:08.390
repeat until sorted.

01:50:08.390 --> 01:50:11.150
Brian has already identified
the fact, by nature of him

01:50:11.150 --> 01:50:13.670
not having made any swaps,
that this list is sorted.

01:50:13.670 --> 01:50:16.400
Therefore, he can just
stop, and this loop does not

01:50:16.400 --> 01:50:18.020
have to continue again and again.

01:50:18.020 --> 01:50:21.620
We can map this to C-like
code a little more explicitly.

01:50:21.620 --> 01:50:24.770
We can by default say, do the
following n minus 1 times.

01:50:24.770 --> 01:50:29.570
Because among n elements, you can look
at n minus 1 total pairs from left

01:50:29.570 --> 01:50:31.620
to right without going too far.

01:50:31.620 --> 01:50:35.100
But notice, I can add an
additional line of code here

01:50:35.100 --> 01:50:40.070
which might say, if no swaps, quit
from the algorithm altogether.

01:50:40.070 --> 01:50:42.890
So, so long as Brian is
keeping track of how many swaps

01:50:42.890 --> 01:50:47.160
he made or didn't make through one
pass, as with a variable called counter

01:50:47.160 --> 01:50:50.940
or whatever, he can simply abort
this algorithm early and certainly

01:50:50.940 --> 01:50:52.687
then save us some time.

01:50:52.687 --> 01:50:55.020
So with that said, let's
consider for just a moment what

01:50:55.020 --> 01:50:59.280
the running time of bubble sort
might be in terms of an upper bound,

01:50:59.280 --> 01:51:01.120
in the worst case, if you will.

01:51:01.120 --> 01:51:05.070
Well, in the case of bubble
sort, notice with the pseudocode

01:51:05.070 --> 01:51:08.160
where we're doing
something n minus 1 times,

01:51:08.160 --> 01:51:11.820
and inside of that we're doing
something n minus 1 times.

01:51:11.820 --> 01:51:14.130
So again, repeat n
minus 1 times literally

01:51:14.130 --> 01:51:17.040
says, do the following n minus 1 times.

01:51:17.040 --> 01:51:19.830
The for loop here, which is just
a different way in pseudocode

01:51:19.830 --> 01:51:24.540
of expressing a similar idea but giving
us a variable this time, for i from 0

01:51:24.540 --> 01:51:25.950
to n minus 1--

01:51:25.950 --> 01:51:32.890
n minus 2, is a total number
of n minus 1 comparisons.

01:51:32.890 --> 01:51:37.490
So this is an n minus 1
thing inside the repeat,

01:51:37.490 --> 01:51:39.680
and an n minus 1 outside the repeat.

01:51:39.680 --> 01:51:44.720
So I think what that gives me is n
minus 1 things times n minus 1 times.

01:51:44.720 --> 01:51:47.180
So now if I just kind of FOIL
this, sort of in high school

01:51:47.180 --> 01:51:51.080
or middle school math, n squared
minus 1n minus 1n plus 1.

01:51:51.080 --> 01:51:54.115
We can combine like terms,
n squared minus 2n plus 1.

01:51:54.115 --> 01:51:57.240
But per our discussion earlier, ugh,
this is really getting into the weeds.

01:51:57.240 --> 01:51:59.870
Who cares about the 2n or the 1?

01:51:59.870 --> 01:52:04.320
The dominant factor as n gets large is
definitely going to be the n squared.

01:52:04.320 --> 01:52:06.440
So it would seem that
bubble sort, if you actually

01:52:06.440 --> 01:52:10.130
do out the math and the formulas,
is going to have an upper bound of n

01:52:10.130 --> 01:52:13.200
squared, or rather, on the
order of n squared steps.

01:52:13.200 --> 01:52:17.420
So in that sense, it is
equivalent to selection sort.

01:52:17.420 --> 01:52:19.490
It is no better fundamentally.

01:52:19.490 --> 01:52:22.430
It's what we would say ask
asymptotically equivalent.

01:52:22.430 --> 01:52:25.220
That is, as n gets really
large, this formula

01:52:25.220 --> 01:52:28.670
is, for all intents and purposes,
equivalent to the selection sort

01:52:28.670 --> 01:52:30.750
formula, even though
they differed slightly

01:52:30.750 --> 01:52:32.930
in terms of their lower order terms.

01:52:32.930 --> 01:52:37.080
For all intents and purposes, ah,
they're on the order of n squared both.

01:52:37.080 --> 01:52:40.460
But if we consider a
lower bound, perhaps,

01:52:40.460 --> 01:52:43.520
even though bubble sort has the
same upper bound running time,

01:52:43.520 --> 01:52:47.420
if we consider a lower bound, as with
this smarter code, where Brian might

01:52:47.420 --> 01:52:50.840
actually have the wherewithal to notice,
wait a minute, I didn't do any swaps,

01:52:50.840 --> 01:52:54.630
I'm just going to exit out of
this looping pretty much early--

01:52:54.630 --> 01:52:56.540
not even prematurely
but early, because it

01:52:56.540 --> 01:52:59.120
would be fruitless to keep
doing more and more work--

01:52:59.120 --> 01:53:01.760
we can then whittle
down this running time.

01:53:01.760 --> 01:53:07.610
I think-- not quite as good as omega
of 1, which was constant time--

01:53:07.610 --> 01:53:12.200
like, you cannot conclude definitively
that an array is sorted unless you

01:53:12.200 --> 01:53:14.360
minimally look at all
of the elements once.

01:53:14.360 --> 01:53:17.540
So constant time is completely
naive and unrealistic.

01:53:17.540 --> 01:53:21.230
You can't look at one element, or two
or three, and say, yes, this is sorted.

01:53:21.230 --> 01:53:24.650
You've got to obviously look at
all of the elements at least once.

01:53:24.650 --> 01:53:28.400
So this would seem to suggest that
the omega notation for it, that

01:53:28.400 --> 01:53:31.520
is, the lower bound on
bubble sort's running time,

01:53:31.520 --> 01:53:37.850
if we're clever and don't retrace our
steps unnecessarily, is in omega of n.

01:53:37.850 --> 01:53:39.950
Or technically, it's n
minus 1 steps, right?

01:53:39.950 --> 01:53:41.950
Because if you've got n
elements and you compare

01:53:41.950 --> 01:53:44.030
these two, these two,
these two, these two,

01:53:44.030 --> 01:53:45.650
that's n minus 1 total comparisons.

01:53:45.650 --> 01:53:47.450
But who cares about the minus 1?

01:53:47.450 --> 01:53:53.380
It's on the order of n, or
omega of n notation here.

01:53:53.380 --> 01:53:57.240
So to recap, selection sort selects the
next smallest element again and again

01:53:57.240 --> 01:53:57.870
and again.

01:53:57.870 --> 01:54:01.020
Unfortunately, based on how it's
implemented in pseudocode and actual

01:54:01.020 --> 01:54:03.330
code, it's in Big O of n squared.

01:54:03.330 --> 01:54:05.370
But it's also an omega
of n squared, which

01:54:05.370 --> 01:54:10.170
means it's always going to take the same
amount of time asymptotically, that is,

01:54:10.170 --> 01:54:11.490
as n gets large.

01:54:11.490 --> 01:54:16.083
Unfortunately, too, bubble sort
is no better, it would seem,

01:54:16.083 --> 01:54:17.250
in terms of the upper bound.

01:54:17.250 --> 01:54:19.500
It's going to take as many
as n squared steps, too.

01:54:19.500 --> 01:54:23.610
But it's at least marginally better
when it comes to using something

01:54:23.610 --> 01:54:26.280
like an input that's already sorted.

01:54:26.280 --> 01:54:30.880
It can short circuit and not waste time.

01:54:30.880 --> 01:54:32.850
But honestly, n squared is bad.

01:54:32.850 --> 01:54:34.950
Like, n squared is really
going to add up quickly.

01:54:34.950 --> 01:54:39.180
If you've got n squared and n is a
million or n is a billion, I mean,

01:54:39.180 --> 01:54:40.860
my God, that's a lot of 0's.

01:54:40.860 --> 01:54:44.820
That's a lot of steps in the total
running time of your algorithm.

01:54:44.820 --> 01:54:46.180
Can we do better?

01:54:46.180 --> 01:54:47.500
Can we do better?

01:54:47.500 --> 01:54:48.720
And it turns out we can.

01:54:48.720 --> 01:54:52.560
And we'll consider one final algorithm
today that does fundamentally better.

01:54:52.560 --> 01:54:57.000
Just like in week 0, we sort of latched
onto binary search and again today--

01:54:57.000 --> 01:55:01.390
it's just fundamentally better than
linear search by an order of magnitude,

01:55:01.390 --> 01:55:01.890
so to speak.

01:55:01.890 --> 01:55:05.250
Its picture representation
was fundamentally different.

01:55:05.250 --> 01:55:09.048
I think we can do fundamentally better
than bubble sort and selection sort.

01:55:09.048 --> 01:55:10.840
And so while both bubble
sort and selection

01:55:10.840 --> 01:55:13.465
sort might be the sort of thing
that I was using in grad school

01:55:13.465 --> 01:55:15.840
just to rip up the code
quickly and then go to sleep,

01:55:15.840 --> 01:55:18.272
it's not going to work well
for very large data sets.

01:55:18.272 --> 01:55:19.980
And frankly, it wouldn't
have worked well

01:55:19.980 --> 01:55:22.140
if I didn't want to just
sleep through the problem.

01:55:22.140 --> 01:55:26.190
Rather, we want to do things as
efficiently as we can from the get go.

01:55:26.190 --> 01:55:29.560
And let me propose that
we leverage a technique--

01:55:29.560 --> 01:55:32.310
and this is a technique that you
can use in almost any programming

01:55:32.310 --> 01:55:33.960
language, C among them--

01:55:33.960 --> 01:55:35.430
known as recursion.

01:55:35.430 --> 01:55:41.940
And recursion, quite simply, is the
ability for a function to call itself.

01:55:41.940 --> 01:55:44.760
Up until now, we have not
seen any examples of this.

01:55:44.760 --> 01:55:47.070
We've seen functions
calling other functions.

01:55:47.070 --> 01:55:49.170
Main keeps calling printf.

01:55:49.170 --> 01:55:51.000
Main has started to call strlen.

01:55:51.000 --> 01:55:54.930
Main called strcmp,
compare, earlier today.

01:55:54.930 --> 01:55:56.700
But we've never seen main call main.

01:55:56.700 --> 01:55:59.770
And people don't do that, so that's
not going to solve the problem.

01:55:59.770 --> 01:56:02.640
But we can implement our
own functions and have

01:56:02.640 --> 01:56:05.100
our own functions call themselves.

01:56:05.100 --> 01:56:07.290
Now, this would seem to be
a bad idea in principle.

01:56:07.290 --> 01:56:10.020
If a function calls itself,
my God, where does it end?

01:56:10.020 --> 01:56:11.820
It would seem to just
do something forever,

01:56:11.820 --> 01:56:13.487
and then something bad probably happens.

01:56:13.487 --> 01:56:14.132
And it could.

01:56:14.132 --> 01:56:15.840
And that's the danger
of using recursion.

01:56:15.840 --> 01:56:17.495
You can screw it up easily.

01:56:17.495 --> 01:56:19.620
But it's also a very powerful
technique, because it

01:56:19.620 --> 01:56:21.870
allows us to think about
potential solutions

01:56:21.870 --> 01:56:25.770
to problems in a very interesting,
and daresay elegant, way.

01:56:25.770 --> 01:56:29.160
So we're not only going to be able to
achieve correctness but also better

01:56:29.160 --> 01:56:32.610
design, because of better
efficiency, it would seem, here.

01:56:32.610 --> 01:56:33.960
So let me propose this.

01:56:33.960 --> 01:56:37.830
Recall this code from week 0, which
was the pseudocode for finding someone

01:56:37.830 --> 01:56:38.790
in a phone book.

01:56:38.790 --> 01:56:42.180
And recall that, among the
features of this pseudocode,

01:56:42.180 --> 01:56:44.520
were these lines here,
"Go back to line 3."

01:56:44.520 --> 01:56:48.950
And we describe those in week 0
as being representative of loops,

01:56:48.950 --> 01:56:52.950
a programming construct that has
something happen again and again.

01:56:52.950 --> 01:56:56.640
But you know what, there's a missed
opportunity here in this pseudocode

01:56:56.640 --> 01:56:59.590
to use a technique known as recursion.

01:56:59.590 --> 01:57:02.280
This implementation is what
we would call iterative.

01:57:02.280 --> 01:57:04.290
It is purely loop based.

01:57:04.290 --> 01:57:07.230
It tells me literally, go back to
this line, go back to this line,

01:57:07.230 --> 01:57:08.400
go back to this line.

01:57:08.400 --> 01:57:10.230
There's no calling yourself.

01:57:10.230 --> 01:57:13.620
But what if I changed week 0's
pseudocode to be a little more

01:57:13.620 --> 01:57:14.550
like this?

01:57:14.550 --> 01:57:19.110
Let me go ahead and get rid of, not
just that one line but two lines

01:57:19.110 --> 01:57:21.000
in both of those conditions.

01:57:21.000 --> 01:57:23.310
And let me quite simply
say, instead of open

01:57:23.310 --> 01:57:26.280
to the middle of the left half of
the book and then go back to line 3,

01:57:26.280 --> 01:57:29.910
or open to the middle of the right half
of the book and then go back to line 3,

01:57:29.910 --> 01:57:34.950
why don't I just more elegantly
say, search left half of book,

01:57:34.950 --> 01:57:36.660
search right half of book?

01:57:36.660 --> 01:57:39.780
Now, immediately I can
shorten the code a little bit.

01:57:39.780 --> 01:57:44.880
But I claim that by just saying search
left half of book and search right

01:57:44.880 --> 01:57:49.140
half of book, I claim that
this is enough information

01:57:49.140 --> 01:57:50.760
to implement the very same algorithm.

01:57:50.760 --> 01:57:53.100
But it's not using a loop per se.

01:57:53.100 --> 01:57:56.010
It's going to induce me the
human or me the computer

01:57:56.010 --> 01:57:57.840
to do something again and again.

01:57:57.840 --> 01:58:00.270
But there's other ways to
do things again and again--

01:58:00.270 --> 01:58:03.690
not by way of a for loop, or a
while loop, or a do while loop,

01:58:03.690 --> 01:58:06.360
or a repeat block, or a forever block--

01:58:06.360 --> 01:58:08.610
you can actually use recursion.

01:58:08.610 --> 01:58:12.690
And recursion, again, is this technique
where a function can call itself.

01:58:12.690 --> 01:58:15.660
And if we consider, after all,
the pseudocode we are looking at

01:58:15.660 --> 01:58:18.210
is the pseudocode for searching.

01:58:18.210 --> 01:58:24.210
And on line 7 and 9 now, I am literally
saying, "Search left half of book,"

01:58:24.210 --> 01:58:28.800
and "Search right half of book," this
is already, even in pseudocode form,

01:58:28.800 --> 01:58:30.270
an example of recursion.

01:58:30.270 --> 01:58:34.530
Here I have in 11 lines of
code an algorithm or a function

01:58:34.530 --> 01:58:36.440
that searches a phone book.

01:58:36.440 --> 01:58:40.830
In lines 7 and 9, I have lines of
code that literally say, search

01:58:40.830 --> 01:58:44.460
a phone book, but more specifically,
search half of the phone book.

01:58:44.460 --> 01:58:47.520
And that's where recursion
really works its magic.

01:58:47.520 --> 01:58:50.760
It would be foolish and incorrect
and completely counterproductive

01:58:50.760 --> 01:58:53.148
to just have a function call
itself with the same input,

01:58:53.148 --> 01:58:55.440
with the same input, with
the same input, because you'd

01:58:55.440 --> 01:58:57.960
have to be kind of crazy
to expect different output

01:58:57.960 --> 01:59:00.120
if the input is constantly the same.

01:59:00.120 --> 01:59:03.540
But that's not what we did in week 0,
and that's not what we're doing now.

01:59:03.540 --> 01:59:07.110
If you use the same function,
or equivalently algorithm,

01:59:07.110 --> 01:59:11.580
but change the input to be
smaller and smaller and smaller,

01:59:11.580 --> 01:59:14.770
it's probably OK that a
function is calling itself,

01:59:14.770 --> 01:59:18.120
so long as you have at least
one line of code in there

01:59:18.120 --> 01:59:20.820
that very intelligently
says, if you're out of doors,

01:59:20.820 --> 01:59:23.310
if you're out of phone book pages, quit.

01:59:23.310 --> 01:59:25.500
You need to have a so-called base case.

01:59:25.500 --> 01:59:28.650
You need some line of code that's
going to notice, wait a minute, there's

01:59:28.650 --> 01:59:31.950
no more problem to be solved, quit now.

01:59:31.950 --> 01:59:35.190
And so how can we map
this to actual code?

01:59:35.190 --> 01:59:38.130
Well, let's consider something
very familiar from week 1.

01:59:38.130 --> 01:59:40.440
Recall when you reconstructed
one of Mario's pyramids.

01:59:40.440 --> 01:59:43.330
It looked a little something like this.

01:59:43.330 --> 01:59:46.050
And let's consider that
this is a pyramid of blocks,

01:59:46.050 --> 01:59:47.880
of bricks, that's of height 4.

01:59:47.880 --> 01:59:48.420
Why 4?

01:59:48.420 --> 01:59:52.450
Well, there's 1, then 2, then 3,
then 4 bricks from top to bottom.

01:59:52.450 --> 01:59:54.210
So the total height here is 4.

01:59:54.210 --> 01:59:58.740
But let me ask the question, a little
naively, how do you go about creating,

01:59:58.740 --> 02:00:02.730
or how do you go about
printing a pyramid of height 4?

02:00:02.730 --> 02:00:06.000
Well, it turns out that this
simple Mario pyramid, that's

02:00:06.000 --> 02:00:09.210
ever more clear if we get rid
of the unnecessary background,

02:00:09.210 --> 02:00:12.700
is a recursive structure of some sort.

02:00:12.700 --> 02:00:14.250
It's a recursive physical structure.

02:00:14.250 --> 02:00:15.020
Why?

02:00:15.020 --> 02:00:18.780
Well, notice that this structure,
this brick, this pyramid,

02:00:18.780 --> 02:00:20.970
is kind of defined in terms of itself.

02:00:20.970 --> 02:00:21.690
Why?

02:00:21.690 --> 02:00:24.660
Well, how do you make
a pyramid of height 4?

02:00:24.660 --> 02:00:27.870
I would argue, a little obnoxiously,
a little circularly, well,

02:00:27.870 --> 02:00:30.240
you create a pyramid
of height 3, and then

02:00:30.240 --> 02:00:32.670
you add an additional row of bricks.

02:00:32.670 --> 02:00:33.400
All right.

02:00:33.400 --> 02:00:34.440
Well, let's continue that logic.

02:00:34.440 --> 02:00:35.107
All right, fine.

02:00:35.107 --> 02:00:38.160
How do you build a pyramid of height 3?

02:00:38.160 --> 02:00:41.670
Well, you sort of smile and say,
well, you build a pyramid of height 2,

02:00:41.670 --> 02:00:43.110
and then you add one more layer.

02:00:43.110 --> 02:00:43.590
All right, fine.

02:00:43.590 --> 02:00:45.215
How do you build a pyramid of height 2?

02:00:45.215 --> 02:00:49.165
Well, you build a pyramid of height
1, and then you add one more layer.

02:00:49.165 --> 02:00:51.040
Well, how do you build
a pyramid of height 1?

02:00:51.040 --> 02:00:53.610
Well, you just put
the stupid brick down.

02:00:53.610 --> 02:00:56.280
You have a base case, where
you sort of state the obvious

02:00:56.280 --> 02:00:57.990
and just do something once.

02:00:57.990 --> 02:00:59.550
You hardcode the logic.

02:00:59.550 --> 02:01:02.610
But notice what's kind
of mind bending, or kind

02:01:02.610 --> 02:01:05.550
of obnoxious in a human
interaction, like,

02:01:05.550 --> 02:01:08.640
you're just defining the
answer in terms of itself.

02:01:08.640 --> 02:01:10.470
I keep saying the same thing.

02:01:10.470 --> 02:01:14.790
But that's OK, because the pyramid keeps
getting smaller and smaller and smaller

02:01:14.790 --> 02:01:16.895
until I can handle
that one special case.

02:01:16.895 --> 02:01:19.770
And so we can do this just for fun
with these little cardboard bricks

02:01:19.770 --> 02:01:20.920
here, for instance.

02:01:20.920 --> 02:01:24.010
If I want to build a pyramid
of height 4, how do I do it?

02:01:24.010 --> 02:01:26.220
Well, I can build a pyramid of height 3.

02:01:26.220 --> 02:01:29.280
All right, let me go ahead and
build a pyramid of height 3.

02:01:29.280 --> 02:01:31.095
How do I build a pyramid of height 3?

02:01:31.095 --> 02:01:33.970
All right, well, I build a pyramid
of height 2, and then I add to it.

02:01:33.970 --> 02:01:37.090
OK, how do I build a
pyramid of height 2?

02:01:37.090 --> 02:01:38.790
Well, you build a pyramid of height 1.

02:01:38.790 --> 02:01:39.773
How do I do that?

02:01:39.773 --> 02:01:41.190
Well, you just put the brick down.

02:01:41.190 --> 02:01:43.140
And so here's where
things kind of bottom out,

02:01:43.140 --> 02:01:45.390
and it's no longer a cyclical argument.

02:01:45.390 --> 02:01:47.640
You eventually just do some actual work.

02:01:47.640 --> 02:01:51.780
But in my mind, I have to remember all
of the instructions you just gave me,

02:01:51.780 --> 02:01:53.040
or I gave myself.

02:01:53.040 --> 02:01:56.850
I had to build a pyramid of height
4; nope, 3; nope, 2; nope, 1.

02:01:56.850 --> 02:01:58.270
Now I'm actually doing that.

02:01:58.270 --> 02:02:00.090
So here's a pyramid of height 1.

02:02:00.090 --> 02:02:02.610
How do I now build a
pyramid of height 2?

02:02:02.610 --> 02:02:04.350
Well, rewind in the story.

02:02:04.350 --> 02:02:07.950
To build a pyramid of height 2,
you build a pyramid of height 1,

02:02:07.950 --> 02:02:09.940
and then you add one more layer.

02:02:09.940 --> 02:02:14.440
So I think to add one more layer,
I essentially need to do this.

02:02:14.440 --> 02:02:14.940
All right.

02:02:14.940 --> 02:02:16.740
Now I have a pyramid of height 2.

02:02:16.740 --> 02:02:17.490
But wait a minute.

02:02:17.490 --> 02:02:20.010
The story began with, how do
I build a pyramid of height 3?

02:02:20.010 --> 02:02:22.740
Well, you take a pyramid of
height 2, which I have here,

02:02:22.740 --> 02:02:24.330
and you add an additional layer.

02:02:24.330 --> 02:02:26.370
So I've got to build
this additional layer.

02:02:26.370 --> 02:02:30.900
I'm going to go ahead and give myself
the layer, the layer, the layer.

02:02:30.900 --> 02:02:34.950
And then I'm going to put the original
pyramid of height to on top of it.

02:02:34.950 --> 02:02:37.598
And voila, it's a
pyramid of height 3 now.

02:02:37.598 --> 02:02:38.640
Well, how did I get here?

02:02:38.640 --> 02:02:40.230
Well, let me keep
rewinding in the story.

02:02:40.230 --> 02:02:42.120
The very first question
I asked myself was,

02:02:42.120 --> 02:02:43.828
how do you build a pyramid of height 4?

02:02:43.828 --> 02:02:45.870
Well, the answer was build
a pyramid of height 3.

02:02:45.870 --> 02:02:47.110
Great, that's done.

02:02:47.110 --> 02:02:49.222
Then add one additional layer.

02:02:49.222 --> 02:02:51.930
And if I had more hands, I could
do this a little more elegantly,

02:02:51.930 --> 02:02:54.190
but let me go ahead
and just lay this out.

02:02:54.190 --> 02:02:57.280
Here's the new level of height 3.

02:02:57.280 --> 02:02:58.920
And now I'm going to go--

02:02:58.920 --> 02:03:00.840
of width 4.

02:03:00.840 --> 02:03:05.940
Now I'm going to go and put the pyramid
of height 3 on top of it, until voila,

02:03:05.940 --> 02:03:09.840
I have this form here
of Mario's pyramid.

02:03:09.840 --> 02:03:13.020
So it's a bit cyclical
in that, every time I

02:03:13.020 --> 02:03:15.627
asked myself to build a
pyramid of a certain height,

02:03:15.627 --> 02:03:18.210
I kind of punted and said, no,
build a pyramid of this height.

02:03:18.210 --> 02:03:19.290
No, build a pyramid of this height.

02:03:19.290 --> 02:03:20.940
No, build a pyramid of this height.

02:03:20.940 --> 02:03:25.890
But the magic of that algorithm
was that there was constantly

02:03:25.890 --> 02:03:29.520
this, do a little more work, build
a layer, do a little more work,

02:03:29.520 --> 02:03:30.690
build a layer.

02:03:30.690 --> 02:03:35.160
And it's in that implicit building
of layer after layer after layer

02:03:35.160 --> 02:03:38.483
that the pyramid itself, the
end goal, actually emerges.

02:03:38.483 --> 02:03:41.400
So you could implement the same thing
with a for loop or a while loop.

02:03:41.400 --> 02:03:42.330
And frankly, you did.

02:03:42.330 --> 02:03:45.210
It was a slightly different
shape for problem set 1,

02:03:45.210 --> 02:03:47.340
but you did the same thing using a loop.

02:03:47.340 --> 02:03:51.060
And you kind of had to do it that
way, at least as we prescribed it.

02:03:51.060 --> 02:03:54.327
Because with printf, you have to
print from the top of the screen

02:03:54.327 --> 02:03:54.910
to the bottom.

02:03:54.910 --> 02:03:57.910
Like, we haven't shown you a
technique yet to print a layer

02:03:57.910 --> 02:03:59.253
and then go back on top.

02:03:59.253 --> 02:04:01.420
So I'm kind of taking some
real-world liberties here

02:04:01.420 --> 02:04:03.503
by lifting these things
up and moving them around.

02:04:03.503 --> 02:04:06.010
You'd have to be a little
more clever in code.

02:04:06.010 --> 02:04:07.300
But the idea is the same.

02:04:07.300 --> 02:04:09.280
And so even physical
objects like this can

02:04:09.280 --> 02:04:12.770
have some recursive definition to them.

02:04:12.770 --> 02:04:14.800
And so we present this
sort of goofy example,

02:04:14.800 --> 02:04:18.708
because this notion of recursion is
a fundamental programming technique

02:04:18.708 --> 02:04:20.500
that you can leverage
now to solve problems

02:04:20.500 --> 02:04:22.480
in a fundamentally different way.

02:04:22.480 --> 02:04:26.500
And I think for this, we need one
final visualization of merge sort,

02:04:26.500 --> 02:04:28.450
with both Brian's help
and the computer's.

02:04:28.450 --> 02:04:32.170
And merge sort is going to be an
algorithm whose pseudocode is, daresay,

02:04:32.170 --> 02:04:35.440
the simplest we've seen thus
far, but deceptively simple.

02:04:35.440 --> 02:04:38.740
The pseudocode for merge
sort, quite simply, is this--

02:04:38.740 --> 02:04:42.690
sort the left half of numbers,
sort the right half of numbers,

02:04:42.690 --> 02:04:44.950
merge the sorted halves.

02:04:44.950 --> 02:04:48.640
And notice, even at first glance
this feels kind of unfair.

02:04:48.640 --> 02:04:51.310
Like, here's an algorithm
for sorting, and yet I'm

02:04:51.310 --> 02:04:54.820
literally using the word "sort"
in my algorithm for sorting.

02:04:54.820 --> 02:04:57.070
It's like in English if
you're asked to define a word,

02:04:57.070 --> 02:04:59.290
and you literally use the
word in the definition.

02:04:59.290 --> 02:05:03.950
Like, that rarely flies, because
you're just making a circular argument.

02:05:03.950 --> 02:05:08.740
But in code, it's OK, so long as there's
one special step that's doing something

02:05:08.740 --> 02:05:10.990
a little differently, and
so long as the problem keeps

02:05:10.990 --> 02:05:12.157
getting smaller and smaller.

02:05:12.157 --> 02:05:13.180
And indeed it is.

02:05:13.180 --> 02:05:16.593
This pseudocode is not saying,
sort the numbers, sort the numbers,

02:05:16.593 --> 02:05:17.260
sort of numbers.

02:05:17.260 --> 02:05:22.060
No, it's dividing the problem in
half and then solving the other half

02:05:22.060 --> 02:05:22.700
as well.

02:05:22.700 --> 02:05:25.090
So it's shrinking the
problem on each iteration.

02:05:25.090 --> 02:05:28.667
Now, I will disclaim we're going to
need that so-called base case again.

02:05:28.667 --> 02:05:31.000
I'm going to have to do
something stupid, but necessary,

02:05:31.000 --> 02:05:33.670
and say, if there's
only one number, quit.

02:05:33.670 --> 02:05:34.540
It's sorted.

02:05:34.540 --> 02:05:36.370
That's the so-called base case.

02:05:36.370 --> 02:05:39.970
The recursive case is where
the function calls itself.

02:05:39.970 --> 02:05:44.920
But this is, indeed, our third and final
sorting algorithm called merge sort.

02:05:44.920 --> 02:05:48.290
And we'll focus here really
on the juiciest pieces,

02:05:48.290 --> 02:05:49.870
one, this notion of merging.

02:05:49.870 --> 02:05:52.270
So in fact, Brian, can
we come over to you

02:05:52.270 --> 02:05:56.300
just so we can define, before we look
at the merge sort algorithm itself,

02:05:56.300 --> 02:06:00.010
what do we even mean when
we say merge sorted halves?

02:06:00.010 --> 02:06:04.510
So for instance, Brian has on his
shelf here two arrays of size 4.

02:06:04.510 --> 02:06:08.680
In the first array on the left
are four integers, 3, 5, 6, 8.

02:06:08.680 --> 02:06:12.460
And in the right side, in
another array of size 4,

02:06:12.460 --> 02:06:15.160
are four numbers, too, 1, 2, 4, 7.

02:06:15.160 --> 02:06:18.370
Both the left is sorted
and the right is sorted.

02:06:18.370 --> 02:06:21.880
But now, Brian, I would like you
to merge these sorted halves.

02:06:21.880 --> 02:06:23.480
Tell us what that means.

02:06:23.480 --> 02:06:23.980
BRIAN: Sure.

02:06:23.980 --> 02:06:26.680
So if I have a left half
that sorted from smallest

02:06:26.680 --> 02:06:30.490
to largest and a right half that's
also sorted from smallest to largest,

02:06:30.490 --> 02:06:34.120
I want to merge them into a new list
that has all of the same numbers

02:06:34.120 --> 02:06:36.090
also from smallest to largest.

02:06:36.090 --> 02:06:38.890
And I guess where I could
start here is that the smallest

02:06:38.890 --> 02:06:43.570
number of the combined array needs
to begin with either the smallest

02:06:43.570 --> 02:06:46.540
number of the left half or the
smallest number of the right half.

02:06:46.540 --> 02:06:49.690
So on the left the smallest
number is the 3, and on the right

02:06:49.690 --> 02:06:51.580
the smallest number is the 1.

02:06:51.580 --> 02:06:55.030
Of those two has got to be the
smallest number for the entire array.

02:06:55.030 --> 02:06:58.030
And between the 3 and
the 1, the 1 is smaller.

02:06:58.030 --> 02:07:03.100
So I would take that 1, and that's going
to be the first number, the smallest

02:07:03.100 --> 02:07:06.297
number, of the merged two halves.

02:07:06.297 --> 02:07:08.380
And then I guess I would
repeat the process again.

02:07:08.380 --> 02:07:10.960
On the left side the
smallest number is the 3.

02:07:10.960 --> 02:07:13.060
On the right side the
smallest number is the 2.

02:07:13.060 --> 02:07:16.120
And between the 3 and
the 2, 2 is smaller.

02:07:16.120 --> 02:07:19.270
So I would take the 2 [INAUDIBLE] and
that's going to be the next number.

02:07:19.270 --> 02:07:22.090
So I'm slowly building
up this sorted array that

02:07:22.090 --> 02:07:23.770
is the result of combining these two.

02:07:23.770 --> 02:07:27.040
Now I'm comparing the 3 on the
left to the 4 on the right.

02:07:27.040 --> 02:07:29.170
Between the 3 and the
4, the 3 is smaller.

02:07:29.170 --> 02:07:33.020
So I'll take the 3, and we'll
put that one into position.

02:07:33.020 --> 02:07:36.010
Now I'm comparing the 5 on the
left with the 4 on the right.

02:07:36.010 --> 02:07:38.380
Between the 5 and the
4, the 4 is smaller.

02:07:38.380 --> 02:07:41.130
So that one goes into position.

02:07:41.130 --> 02:07:44.970
And then now I'm comparing the 5 on
the left with the 7 on the right.

02:07:44.970 --> 02:07:47.811
5 is smaller, so the 5 goes next.

02:07:47.811 --> 02:07:50.880
Next I'm comparing the 6 on the
left with the 7 on the right.

02:07:50.880 --> 02:07:55.200
The 6 is still smaller, so
that one is going to go next.

02:07:55.200 --> 02:07:58.290
Now I'm comparing the 8 and the
7, the only two numbers left.

02:07:58.290 --> 02:08:00.650
The 7 is the smaller between the two.

02:08:00.650 --> 02:08:03.530
So I'll take the 7 and
put that into place.

02:08:03.530 --> 02:08:05.280
And now I'm only left
with one number that

02:08:05.280 --> 02:08:09.900
hasn't been put into the merging of the
two halves, and that's the number 8.

02:08:09.900 --> 02:08:12.510
So that number is going to
take up the final position.

02:08:12.510 --> 02:08:16.440
And now I've taken these to halves,
each of which was originally sorted,

02:08:16.440 --> 02:08:19.887
and made one complete array that has
all of those numbers in sorted order.

02:08:19.887 --> 02:08:20.720
DAVID MALAN: Indeed.

02:08:20.720 --> 02:08:21.930
And consider what we've done.

02:08:21.930 --> 02:08:24.120
We've essentially verbally
and physically kind of

02:08:24.120 --> 02:08:27.360
defined a helper function, our
own custom function if you will,

02:08:27.360 --> 02:08:32.430
whereby Brian has defined what
does it mean to merge two arrays--

02:08:32.430 --> 02:08:35.080
specifically merge two sorted arrays.

02:08:35.080 --> 02:08:35.640
Because why?

02:08:35.640 --> 02:08:37.050
Well, that's a building
block that I think

02:08:37.050 --> 02:08:38.920
we're going to want in
this merge sort algorithm.

02:08:38.920 --> 02:08:40.795
So just like in actual
C code, you might have

02:08:40.795 --> 02:08:43.230
defined a function that
does some small task,

02:08:43.230 --> 02:08:46.440
so have we now verbally and physically
defined the notion of merging.

02:08:46.440 --> 02:08:49.620
The mind bending part
here is that "Sort left

02:08:49.620 --> 02:08:52.770
half of numbers" and "Sort
right half of numbers"

02:08:52.770 --> 02:08:54.690
is kind of already implemented.

02:08:54.690 --> 02:08:58.170
There's nothing more for
Brian or me to define.

02:08:58.170 --> 02:09:01.890
All that remains is for us to execute
this algorithm, focusing especially

02:09:01.890 --> 02:09:04.300
on these three
highlighted lines of code.

02:09:04.300 --> 02:09:08.220
And let me disclaim that of the
algorithms we've looked at thus far,

02:09:08.220 --> 02:09:10.290
odds are this will be the
one that doesn't really

02:09:10.290 --> 02:09:11.887
sink in as quickly as the others.

02:09:11.887 --> 02:09:14.220
Even if the others might have
taken you a moment, a day,

02:09:14.220 --> 02:09:16.887
a week to settle in-- or maybe
you're still not quite there yet,

02:09:16.887 --> 02:09:17.520
that's fine--

02:09:17.520 --> 02:09:20.610
merge sort is a bit
of a mind bending one,

02:09:20.610 --> 02:09:23.130
because it seems to work magically.

02:09:23.130 --> 02:09:25.387
But it really just works
more intelligently.

02:09:25.387 --> 02:09:27.720
And you'll begin to get more
comfortable with harnessing

02:09:27.720 --> 02:09:31.190
these kinds of primitives so that we
can ultimately, indeed, solve problems

02:09:31.190 --> 02:09:31.960
more efficiently.

02:09:31.960 --> 02:09:35.130
So Brian has kindly put the
numbers again on the top shelf.

02:09:35.130 --> 02:09:37.740
And he has put them into their
original, unsorted order,

02:09:37.740 --> 02:09:40.080
just like for selection
sort and bubble sort.

02:09:40.080 --> 02:09:44.050
And Brian, I'd like to propose now that
we execute this merge sort algorithm.

02:09:44.050 --> 02:09:47.500
And if you don't mind, I'll
recite aloud first the few steps.

02:09:47.500 --> 02:09:51.690
So here is one array of size
8 with unsorted numbers.

02:09:51.690 --> 02:09:54.150
The goal is to these
numbers using merge sort.

02:09:54.150 --> 02:09:57.520
And recall that merge sort
essentially is just three steps--

02:09:57.520 --> 02:10:00.330
sort left half, sort right
half, merge sorted halves.

02:10:00.330 --> 02:10:02.280
So Brian, looking at
those numbers there,

02:10:02.280 --> 02:10:04.860
could you go ahead and sort
the left half of numbers?

02:10:04.860 --> 02:10:05.100
BRIAN: All right.

02:10:05.100 --> 02:10:06.225
So there are eight numbers.

02:10:06.225 --> 02:10:09.810
The left half would be these four
numbers, so I will sort those.

02:10:09.810 --> 02:10:13.020
Except I'm not really sure how
do I now sort these four numbers.

02:10:13.020 --> 02:10:13.770
DAVID MALAN: Yeah.

02:10:13.770 --> 02:10:16.353
So granted, we've seen selection
sort, we've seen bubble sort.

02:10:16.353 --> 02:10:19.530
But we don't want to regress to
those older, slower algorithms.

02:10:19.530 --> 02:10:22.110
Brian, I can kind of be
a little clever here.

02:10:22.110 --> 02:10:24.220
Well, I'm giving you
a sorting algorithm.

02:10:24.220 --> 02:10:28.140
So now you effectively have a
smaller problem, an array of size 4,

02:10:28.140 --> 02:10:30.870
and I'm pretty sure we can use
the same algorithm, merge sort,

02:10:30.870 --> 02:10:32.670
by sorting left half,
sorting right half,

02:10:32.670 --> 02:10:34.520
and then merging the sorted halves.

02:10:34.520 --> 02:10:38.232
So could you go ahead and sort the
left half of these four numbers?

02:10:38.232 --> 02:10:38.940
BRIAN: All right.

02:10:38.940 --> 02:10:40.020
So I have these four numbers.

02:10:40.020 --> 02:10:41.228
I want to sort the left half.

02:10:41.228 --> 02:10:42.893
That's these two numbers.

02:10:42.893 --> 02:10:45.060
So now I need to figure out
how to sort two numbers.

02:10:45.060 --> 02:10:45.390
DAVID MALAN: All right.

02:10:45.390 --> 02:10:48.490
Now, us with human intuition might
obviously know what we have to do here.

02:10:48.490 --> 02:10:51.360
But again, let's apply the algorithm--
sort left half, sort right half,

02:10:51.360 --> 02:10:52.170
merge sorted half.

02:10:52.170 --> 02:10:55.257
Brian, could you sort the right
half of this array of size 2?

02:10:55.257 --> 02:10:57.340
BRIAN: So I've got the
array of two, so I'll first

02:10:57.340 --> 02:11:00.150
sort the left half of the
array of two, which is the 6.

02:11:00.150 --> 02:11:03.120
DAVID MALAN: And this is where the
base case in white on the slide

02:11:03.120 --> 02:11:04.260
comes into play--

02:11:04.260 --> 02:11:06.180
if only one number, quit.

02:11:06.180 --> 02:11:07.830
So Brian, I can let you off the hook.

02:11:07.830 --> 02:11:12.220
That list of size one with
the number 6 is sorted.

02:11:12.220 --> 02:11:13.650
So that's step one of three done.

02:11:13.650 --> 02:11:16.950
Brian, could you sort the right
half of that array of size two?

02:11:16.950 --> 02:11:18.620
BRIAN: The right half is the number 3.

02:11:18.620 --> 02:11:20.770
It's also just one number,
so that one is done.

02:11:20.770 --> 02:11:21.520
DAVID MALAN: Good.

02:11:21.520 --> 02:11:22.890
So think about where
we are on the story.

02:11:22.890 --> 02:11:25.710
We've sorted the left half, and
we've started the right half,

02:11:25.710 --> 02:11:29.030
even though it looks like neither Brian
nor I have done any useful work yet.

02:11:29.030 --> 02:11:30.270
But now the magic happens.

02:11:30.270 --> 02:11:34.290
Brian, you now have
two arrays of size 1.

02:11:34.290 --> 02:11:36.222
Could you merge them together?

02:11:36.222 --> 02:11:36.930
BRIAN: All right.

02:11:36.930 --> 02:11:38.670
So I'm going to merge
these two together.

02:11:38.670 --> 02:11:40.530
Between the 6 and the
3, the 3 is smaller.

02:11:40.530 --> 02:11:42.300
So that one I'll put there first.

02:11:42.300 --> 02:11:44.700
And then I'll take the 6,
and that one goes next.

02:11:44.700 --> 02:11:47.892
And now I have a sorted array
of size 2 that is now done.

02:11:47.892 --> 02:11:48.850
DAVID MALAN: All right.

02:11:48.850 --> 02:11:51.725
And this is where you now need to
start remembering step by step sort

02:11:51.725 --> 02:11:53.850
of in your brain as the things pile up.

02:11:53.850 --> 02:11:55.180
How did we get to this point?

02:11:55.180 --> 02:11:57.120
We started with a list of size 8.

02:11:57.120 --> 02:12:00.360
We then looked at the left half,
which was an array of size 4.

02:12:00.360 --> 02:12:03.630
We then looked at the left half of
that, which was an array of size 2,

02:12:03.630 --> 02:12:06.990
then two arrays of size 1, then
we merged those two sorted halves.

02:12:06.990 --> 02:12:09.600
So I think now if I rewind
in that story, Brian,

02:12:09.600 --> 02:12:14.262
you need to sort the right half of
the left half of the original numbers.

02:12:14.262 --> 02:12:14.970
BRIAN: All right.

02:12:14.970 --> 02:12:16.770
So the left half is these four.

02:12:16.770 --> 02:12:20.430
The right half of the left half
is going to be these two numbers.

02:12:20.430 --> 02:12:23.790
And so now to those two, I guess
I would repeat the process again--

02:12:23.790 --> 02:12:25.240
look at the numbers individually.

02:12:25.240 --> 02:12:27.825
I would look at the left half
of these two, which is the 8.

02:12:27.825 --> 02:12:29.250
That one is done.

02:12:29.250 --> 02:12:31.180
And the 5, that one is done as well.

02:12:31.180 --> 02:12:32.138
DAVID MALAN: All right.

02:12:32.138 --> 02:12:35.262
So step three of three, then, is
merge those two sorted halves.

02:12:35.262 --> 02:12:35.970
BRIAN: All right.

02:12:35.970 --> 02:12:39.830
So between the 8 and the 5, the 5 is
smaller, so that one will go in first.

02:12:39.830 --> 02:12:41.460
And the 8 will go after that.

02:12:41.460 --> 02:12:45.157
And now I have a second array of
size 2 that is also now sorted.

02:12:45.157 --> 02:12:45.990
DAVID MALAN: Indeed.

02:12:45.990 --> 02:12:49.750
So here's where, again, you have
to rewind in your mind's eye.

02:12:49.750 --> 02:12:53.940
We've just now sorted
the left half, and we've

02:12:53.940 --> 02:12:58.240
sorted the left half and the
right half of the left half.

02:12:58.240 --> 02:13:02.700
So I think the third and final step
at this part of the story is, Brian,

02:13:02.700 --> 02:13:07.692
to merge those sorted halves,
each of which now is of size 2.

02:13:07.692 --> 02:13:08.400
BRIAN: All right.

02:13:08.400 --> 02:13:12.008
I have two arrays of size 2, each of
which is sorted, that I need to merge.

02:13:12.008 --> 02:13:14.300
So I'm going to compare the
smallest numbers from each.

02:13:14.300 --> 02:13:16.110
I'm going to compare the 3 and the 5.

02:13:16.110 --> 02:13:18.570
The 3 is smaller, so that
one will go in first.

02:13:18.570 --> 02:13:21.960
Now between these two arrays,
I have a 6 and a 5 to compare.

02:13:21.960 --> 02:13:24.210
The 5 is smaller, so
that one will go next.

02:13:24.210 --> 02:13:26.830
Between the 6 and the
8, the 6 is smaller.

02:13:26.830 --> 02:13:28.530
And I'm left with just the 8.

02:13:28.530 --> 02:13:32.250
So if we go back to the original story
of eight numbers that I was sorting,

02:13:32.250 --> 02:13:35.970
I think I have now sorted the
left half of the left four numbers

02:13:35.970 --> 02:13:37.230
from that original array.

02:13:37.230 --> 02:13:37.560
DAVID MALAN: Indeed.

02:13:37.560 --> 02:13:39.210
So if you're playing
along at home, think

02:13:39.210 --> 02:13:41.335
about-- you've got all
these thoughts probably kind

02:13:41.335 --> 02:13:42.430
of piling up in your mind.

02:13:42.430 --> 02:13:43.860
That's indeed supposed to be the case.

02:13:43.860 --> 02:13:46.152
And admittedly, it's hard to
keep track of all of that.

02:13:46.152 --> 02:13:48.520
So we'll let Brian now
execute this altogether

02:13:48.520 --> 02:13:52.213
together doing the same thing
now, by sorting the right half all

02:13:52.213 --> 02:13:53.130
the way to completion.

02:13:53.130 --> 02:13:53.963
Brian, if you could.

02:13:53.963 --> 02:13:54.672
BRIAN: All right.

02:13:54.672 --> 02:13:56.340
So the right half, you got four numbers.

02:13:56.340 --> 02:13:59.670
I'm going to start by sorting the
left half of the right half, which

02:13:59.670 --> 02:14:01.180
is these two numbers here.

02:14:01.180 --> 02:14:02.930
To do that, I'll repeat
the same process--

02:14:02.930 --> 02:14:06.240
sort the left half of these two
numbers, which is just the 2.

02:14:06.240 --> 02:14:08.080
That one's done, it's only one number.

02:14:08.080 --> 02:14:09.372
Same thing with the right half.

02:14:09.372 --> 02:14:11.100
The 7 is only one number, so it's done.

02:14:11.100 --> 02:14:13.410
And now I'll merge the
sorted halves together.

02:14:13.410 --> 02:14:17.160
Between the 2 and the 7, the
2 is smaller and then the 7.

02:14:17.160 --> 02:14:21.540
So here now is the left half of
the right half, an array of size 2,

02:14:21.540 --> 02:14:22.500
that is sorted.

02:14:22.500 --> 02:14:25.260
And I'll do the same thing with
the right half of the right half,

02:14:25.260 --> 02:14:27.120
starting with the left half, which is 4.

02:14:27.120 --> 02:14:28.090
That's done.

02:14:28.090 --> 02:14:29.200
The 1 is done.

02:14:29.200 --> 02:14:30.950
And now to merge these
two together, I'll

02:14:30.950 --> 02:14:33.190
compare them and say the 1 is smaller.

02:14:33.190 --> 02:14:36.330
So I'll put the 1 down and then the 4.

02:14:36.330 --> 02:14:39.870
So now I have two sorted
arrays, each of size 2,

02:14:39.870 --> 02:14:42.360
that I now need to backtrack
and now merge together

02:14:42.360 --> 02:14:44.460
to form an array of size 4.

02:14:44.460 --> 02:14:45.930
So I'll compare the 2 and the 1.

02:14:45.930 --> 02:14:48.000
Between those two, the 1 is smaller.

02:14:48.000 --> 02:14:49.980
Then I'll compare the 2 with the 4.

02:14:49.980 --> 02:14:51.570
The 2 is smaller.

02:14:51.570 --> 02:14:53.190
Then I'll compare the 7 with the 4.

02:14:53.190 --> 02:14:54.480
The 4 is smaller.

02:14:54.480 --> 02:14:57.210
And then finally, I'll just
take the 7, the last number,

02:14:57.210 --> 02:14:59.010
and put that in the final spot.

02:14:59.010 --> 02:15:01.740
And so now from the original
array of eight numbers,

02:15:01.740 --> 02:15:05.520
I've now sorted the left half,
and I've sorted the right half.

02:15:05.520 --> 02:15:09.030
DAVID MALAN: And now that brings us
to our third and very final step.

02:15:09.030 --> 02:15:11.950
Could you, Brian, merge
the sorted halves?

02:15:11.950 --> 02:15:12.450
BRIAN: Yeah.

02:15:12.450 --> 02:15:14.520
And I think this is actually
an example we've seen already.

02:15:14.520 --> 02:15:16.728
And what I'm going to do in
order to these two halves

02:15:16.728 --> 02:15:19.080
is just take the smaller
number from each half

02:15:19.080 --> 02:15:20.640
and compare them again and again.

02:15:20.640 --> 02:15:24.240
So between the 3 and the 1, the
1, that's the smallest number.

02:15:24.240 --> 02:15:25.800
So that goes into place.

02:15:25.800 --> 02:15:29.070
Then between the 3 and
the 2, the 2 is smaller,

02:15:29.070 --> 02:15:31.350
so we'll take that and
put that into place.

02:15:31.350 --> 02:15:33.930
Now I'm comparing the 3 with the 4.

02:15:33.930 --> 02:15:36.420
The 3, that goes next.

02:15:36.420 --> 02:15:38.640
Next I'm comparing the 5 with the 4.

02:15:38.640 --> 02:15:42.450
4 is smaller, so the 4
goes into place next.

02:15:42.450 --> 02:15:44.670
Now I'm comparing the 5 with the 7.

02:15:44.670 --> 02:15:47.930
5 is smaller, so that
one goes into place.

02:15:47.930 --> 02:15:51.630
And next, comparing the 6 with
the 7, so the 6 is smaller.

02:15:51.630 --> 02:15:52.870
That goes next.

02:15:52.870 --> 02:15:55.320
And now I'm left with two
numbers, the 8 and the 7.

02:15:55.320 --> 02:15:59.100
The 7 is the smaller of the
2, so that one goes next.

02:15:59.100 --> 02:16:02.200
And at this point, I only have
one number left, which is the 8.

02:16:02.200 --> 02:16:05.100
And so that one's going to
go into its sorted position

02:16:05.100 --> 02:16:06.607
at the end of the array.

02:16:06.607 --> 02:16:07.440
DAVID MALAN: Indeed.

02:16:07.440 --> 02:16:10.110
So even though it felt like we
weren't really doing anything

02:16:10.110 --> 02:16:12.300
at several points in that
story, it all sort of

02:16:12.300 --> 02:16:16.110
came together when we started merging
and merging and merging these lists.

02:16:16.110 --> 02:16:19.650
And it's not an accident that
Brian was using multiple shelves,

02:16:19.650 --> 02:16:22.320
moving the numbers from top
to bottom, to make clear

02:16:22.320 --> 02:16:26.285
just how many times he was
effectively dividing that list up.

02:16:26.285 --> 02:16:28.410
We started with a list of
eight, and we essentially

02:16:28.410 --> 02:16:33.510
took it to two lists of size 4, four
lists of size 2, eight lists of size 1.

02:16:33.510 --> 02:16:35.740
And while it wasn't
exactly in that order,

02:16:35.740 --> 02:16:38.850
if you rewind and analyze all of the
steps, that's indeed what he did.

02:16:38.850 --> 02:16:42.809
He went from 8 to two 4's
to four 2's to eight 1's.

02:16:42.809 --> 02:16:47.040
And that's why he moved those numbers
from the top shelf down three times--

02:16:47.040 --> 02:16:50.920
from 8's, to 4's, to 2's, to 1'r.

02:16:50.920 --> 02:16:53.100
So how many times did
he move the numbers?

02:16:53.100 --> 02:16:55.590
He moved them three times total.

02:16:55.590 --> 02:17:00.150
And on each of those shelves, how many
numbers did he have to merge together?

02:17:00.150 --> 02:17:04.020
On each of those shelves, he
ultimately touched all eight numbers.

02:17:04.020 --> 02:17:08.087
He first inserted the smallest
number, then the second smallest, then

02:17:08.087 --> 02:17:08.879
the third smallest.

02:17:08.879 --> 02:17:13.709
But unlike selection sort, he had
smartly already sorted those halves,

02:17:13.709 --> 02:17:16.072
so he was just plucking
them off one at a time.

02:17:16.072 --> 02:17:18.030
He wasn't going back and
forth, back and forth.

02:17:18.030 --> 02:17:22.469
He was constantly taking from the
beginning of each of those half lists.

02:17:22.469 --> 02:17:26.309
So on every shelf, he was
doing, let's say, n steps,

02:17:26.309 --> 02:17:29.790
because he was merging in
all n elements of that shelf.

02:17:29.790 --> 02:17:34.379
But how many times did he
merge n elements together?

02:17:34.379 --> 02:17:36.633
Well, he did that three total times.

02:17:36.633 --> 02:17:39.550
But if you think about binary search,
and really the process of divide

02:17:39.550 --> 02:17:43.070
and conquer more generally, anytime
you divide something in half and half

02:17:43.070 --> 02:17:47.150
and half, as he was doing
from 8's to 4's to 2's to 1's.

02:17:47.150 --> 02:17:48.170
That's a logarithm.

02:17:48.170 --> 02:17:49.639
That's log base 2.

02:17:49.639 --> 02:17:52.730
And indeed, that is wonderfully
the height of this shelf.

02:17:52.730 --> 02:17:56.480
If you have eight elements on the
shelf, the number of additional shelves

02:17:56.480 --> 02:18:03.180
Brian used, 3, is exactly what you
get by doing the math log base 2 of 8.

02:18:03.180 --> 02:18:08.059
Which is to say, Brian
did n things log n times.

02:18:08.059 --> 02:18:10.520
And again with a wave of the
hand, computer scientists

02:18:10.520 --> 02:18:13.129
don't bother mentioning the
base with Big O notation.

02:18:13.129 --> 02:18:14.840
It suffices just to say log n--

02:18:14.840 --> 02:18:18.410
Brian did n things log n times.

02:18:18.410 --> 02:18:22.490
And so if we consider, then,
the asymptotic complexity

02:18:22.490 --> 02:18:25.430
of this algorithm, that is to say
the running time of this algorithm,

02:18:25.430 --> 02:18:30.230
in terms of big O notation, notice
that it performs strictly better then

02:18:30.230 --> 02:18:32.660
selection sort and bubble sort--

02:18:32.660 --> 02:18:34.780
n times log n.

02:18:34.780 --> 02:18:37.760
And even, again, if you're a
little rusty on logarithms, log n,

02:18:37.760 --> 02:18:40.040
we have seen as of week
0 in binary search,

02:18:40.040 --> 02:18:43.040
is definitely faster than n steps.

02:18:43.040 --> 02:18:45.530
So n squared is n times n.

02:18:45.530 --> 02:18:49.459
n log n is n times log n,
which is indeed mathematically

02:18:49.459 --> 02:18:51.980
better then n squared.

02:18:51.980 --> 02:18:55.400
As with merge sort, though, if
we consider the lower bound,

02:18:55.400 --> 02:19:00.049
notice that bubble sort, yes,
got us as low as omega of n.

02:19:00.049 --> 02:19:03.750
Turns out merge sort is a
little bit like selection sort

02:19:03.750 --> 02:19:07.940
in that it doesn't optimize itself and
get you out of the algorithm early.

02:19:07.940 --> 02:19:13.375
It's always n log n, so it's
lower bound omega of n log n.

02:19:13.375 --> 02:19:14.750
And that might not be acceptable.

02:19:14.750 --> 02:19:16.730
Sometimes you might
have certain data inputs

02:19:16.730 --> 02:19:19.700
where maybe it tends to be sorted
and you don't want to waste time.

02:19:19.700 --> 02:19:21.799
So maybe you'd be OK with bubble sort.

02:19:21.799 --> 02:19:24.920
But honestly, as n gets
large, the probability

02:19:24.920 --> 02:19:29.540
that the input to your sorting algorithm
is just by chance going to be sorted

02:19:29.540 --> 02:19:33.200
is probably so, so low
that you're just better

02:19:33.200 --> 02:19:36.940
off in the general case using an
algorithm like merge sort that's

02:19:36.940 --> 02:19:38.660
n log n always.

02:19:38.660 --> 02:19:41.299
We can see this visually
using our bars, too.

02:19:41.299 --> 02:19:43.850
And notice, just as Brian
was dividing and conquering

02:19:43.850 --> 02:19:47.150
the problem in half and half and
half, and then reconstituting

02:19:47.150 --> 02:19:51.620
the array by merging those halves, you
can kind of see that visually here.

02:19:51.620 --> 02:19:53.618
There's a lot more going on.

02:19:53.618 --> 02:19:56.660
And it's going to seem in a moment
that everything just kind of magically

02:19:56.660 --> 02:19:57.160
worked.

02:19:57.160 --> 02:20:00.680
But you can see in the faded
purple bars that, indeed, this

02:20:00.680 --> 02:20:04.400
is sorting things in halves and
then merging those halves together.

02:20:04.400 --> 02:20:06.590
And this visualization
was a little different.

02:20:06.590 --> 02:20:08.510
It did not have the
luxury of three shelves.

02:20:08.510 --> 02:20:10.760
It just moved top to
bottom, top to bottom.

02:20:10.760 --> 02:20:13.460
And honestly, Brian could have
been a little more optimal there.

02:20:13.460 --> 02:20:16.250
We wanted to make clear how
many total shelves there were.

02:20:16.250 --> 02:20:18.470
But honestly, there's no
reason he couldn't have just

02:20:18.470 --> 02:20:21.950
moved the numbers down then back
up, then back down then back up.

02:20:21.950 --> 02:20:24.440
And, indeed that's the price
you pay with merge sort.

02:20:24.440 --> 02:20:27.920
Even though n log n is better
than n squared, and ergo

02:20:27.920 --> 02:20:31.790
merge sort is arguably better than
selection sort and bubble sort,

02:20:31.790 --> 02:20:32.960
you pay a price.

02:20:32.960 --> 02:20:35.750
And this speaks to the
trade-off I mentioned earlier.

02:20:35.750 --> 02:20:39.560
Almost always, when you do
something better in code

02:20:39.560 --> 02:20:42.897
or solve a problem more
intelligently, you have paid a price.

02:20:42.897 --> 02:20:45.230
Maybe you spent more time as
the human writing the code,

02:20:45.230 --> 02:20:47.420
because it was harder and
took more sophistication.

02:20:47.420 --> 02:20:48.620
That is a cost.

02:20:48.620 --> 02:20:51.680
Maybe you had to use
actually more space.

02:20:51.680 --> 02:20:56.950
Brian had to have at least one extra
shelf in order to implement merge sort.

02:20:56.950 --> 02:20:59.360
If implementing merge
sort in code and C,

02:20:59.360 --> 02:21:05.180
you will need at least a second array to
temporarily put the numbers into as you

02:21:05.180 --> 02:21:06.800
merge things back and forth.

02:21:06.800 --> 02:21:09.740
If you want to be extravagant,
you can have three separate arrays

02:21:09.740 --> 02:21:11.090
or four separate arrays.

02:21:11.090 --> 02:21:14.240
But it's suffices, per the graphical
representation of merge sort,

02:21:14.240 --> 02:21:16.103
to just use a second array.

02:21:16.103 --> 02:21:18.020
Now, that might not seem
like such a big deal.

02:21:18.020 --> 02:21:21.710
But implicitly, you need
twice as much space.

02:21:21.710 --> 02:21:23.270
And that might be a big deal.

02:21:23.270 --> 02:21:27.590
If you've got a million things to
sort, and you now need two arrays,

02:21:27.590 --> 02:21:30.530
that's 2 million chunks
of memory that you need.

02:21:30.530 --> 02:21:32.052
And maybe that's not tenable.

02:21:32.052 --> 02:21:34.010
So there, too, there's
going to be a trade-off.

02:21:34.010 --> 02:21:36.320
And maybe while slower, selection
sort of bubble sort, maybe

02:21:36.320 --> 02:21:38.820
it's better because it's a
little more efficient with space.

02:21:38.820 --> 02:21:41.120
It's going to depend
on what you care about

02:21:41.120 --> 02:21:42.870
and what you want to optimize for.

02:21:42.870 --> 02:21:44.750
And honestly, money
is sometimes a factor.

02:21:44.750 --> 02:21:48.500
In the real world, maybe it's
better to write slightly slower code

02:21:48.500 --> 02:21:52.130
so that you don't have to buy twice as
many servers or twice as much memory

02:21:52.130 --> 02:21:53.210
for your computer.

02:21:53.210 --> 02:21:55.730
It depends there on what
resource is more important--

02:21:55.730 --> 02:22:00.770
your time, the computer's time,
your wallet, or some other resource

02:22:00.770 --> 02:22:01.340
altogether.

02:22:01.340 --> 02:22:03.590
So we'll continue to see
these kinds of trade-offs.

02:22:03.590 --> 02:22:07.500
But perhaps the most mind blowing
thing we can do as we wrap up here

02:22:07.500 --> 02:22:12.920
is share a few visualizations of how
these algorithms actually compare.

02:22:12.920 --> 02:22:17.900
And one last piece of jargon is
this one final Greek symbol, theta.

02:22:17.900 --> 02:22:21.920
It turns out that, thanks to
selection sort and merge sort,

02:22:21.920 --> 02:22:26.570
we can actually apply one more term
of art here, this theta notation.

02:22:26.570 --> 02:22:30.140
Anytime an algorithm has
both the same upper bound

02:22:30.140 --> 02:22:33.050
as its lower bound running
time, you can actually

02:22:33.050 --> 02:22:37.980
describe it in just one sentence instead
of two in terms of theta notation.

02:22:37.980 --> 02:22:41.375
So because selection sort was in
both big O of n squared and omega

02:22:41.375 --> 02:22:44.870
of n squared, you can actually just
say, ah, it's in theta of n squared.

02:22:44.870 --> 02:22:49.010
It's always n squared either in the
upper bound or in the lower bound.

02:22:49.010 --> 02:22:50.210
Same thing for merge sort.

02:22:50.210 --> 02:22:52.310
It's in theta of n log n.

02:22:52.310 --> 02:22:57.860
We cannot use theta for bubble sort or
for binary search or for linear search,

02:22:57.860 --> 02:23:01.150
because they had different
upper and lower bounds.

02:23:01.150 --> 02:23:05.600
Well, let me go ahead now and
prepare a final demonstration,

02:23:05.600 --> 02:23:08.090
this time using some random inputs.

02:23:08.090 --> 02:23:11.750
So you'll see here a video comparing
selection sort, bubble sort,

02:23:11.750 --> 02:23:13.760
and merge sort all together.

02:23:13.760 --> 02:23:16.560
All three of them
start with random data.

02:23:16.560 --> 02:23:18.830
But let's just see what
it means for an algorithm

02:23:18.830 --> 02:23:26.295
to be an n squared in the worst case
or in n log n in this case instead.

02:23:26.295 --> 02:23:26.920
[MUSIC PLAYING]

02:23:26.920 --> 02:23:29.295
Selection sort's on the top,
bubble sort's on the bottom,

02:23:29.295 --> 02:23:30.490
merge sort's in the middle.

02:23:30.490 --> 02:23:35.352
And would you believe it,
merge sort is already done.

02:23:35.352 --> 02:23:37.280
[MUSIC INTENSIFIES]

02:23:37.280 --> 02:23:40.902
And meanwhile, we have some very
trendy music we can listen to,

02:23:40.902 --> 02:23:42.610
which is really just
there to distract us

02:23:42.610 --> 02:23:47.060
from the fact at how slow n
squared actually is in practice.

02:23:47.060 --> 02:23:49.420
And notice, there's not
that many bars here.

02:23:49.420 --> 02:23:51.310
There's maybe like a hundred or so bars.

02:23:51.310 --> 02:23:52.690
Like, n is 100.

02:23:52.690 --> 02:23:53.882
That's not even a big value.

02:23:53.882 --> 02:23:56.590
When we're talking about the
Twitters, the Facebooks, the Googles

02:23:56.590 --> 02:23:59.080
of the world, these are trivial sizes.

02:23:59.080 --> 02:24:03.220
And yet, my God, we're still waiting
for selection sort and bubble sort

02:24:03.220 --> 02:24:04.300
to finish.

02:24:04.300 --> 02:24:08.110
And so you can see here that it really
matters when you exercise a little bit

02:24:08.110 --> 02:24:10.980
more cleverness, and you leverage
a more efficient algorithm--

02:24:10.980 --> 02:24:14.495
and finally, selection sort is
done, bubble sort still taking

02:24:14.495 --> 02:24:15.370
a little longer here.

02:24:15.370 --> 02:24:16.810
And this is going to
depend on the input.

02:24:16.810 --> 02:24:18.610
Sometimes you can get lucky or unlucky.

02:24:18.610 --> 02:24:22.390
But I think it's convincing that
merge sort has won in this case.

02:24:22.390 --> 02:24:26.940
[MUSIC PLAYING]