WEBVTT
X-TIMESTAMP-MAP=LOCAL:00:00:00.000,MPEGTS:900000

00:00:00.000 --> 00:00:04.482
[MUSIC PLAYING]

00:00:49.370 --> 00:00:53.270
DAVID MALAN: All right, this
is CS50, and this is week four.

00:00:53.270 --> 00:00:55.190
And for the past
several weeks, we've had

00:00:55.190 --> 00:00:58.217
training wheels of sorts on, while
using this language known as C.

00:00:58.217 --> 00:01:01.050
And those training wheels have been
in the form of the CS50 library.

00:01:01.050 --> 00:01:05.580
And you use this library, of course,
by including cs50.h

00:01:05.580 --> 00:01:06.650
atop your code.

00:01:06.650 --> 00:01:08.733
And then if you think about
how clang works,

00:01:08.733 --> 00:01:12.080
you've been linking your
code via -lcs50.

00:01:12.080 --> 00:01:15.290
But all of that has been automated
for you up until now, using make.

00:01:15.290 --> 00:01:17.900
Today, we'll transition
from last week's focus

00:01:17.900 --> 00:01:21.290
on algorithms to a little
more focus on machines

00:01:21.290 --> 00:01:24.980
and on the machines we now use to
implement these algorithms all the more

00:01:24.980 --> 00:01:27.410
powerfully, as we begin to
take off these training wheels

00:01:27.410 --> 00:01:30.840
and look at what's really going on
underneath the hood of your computer.

00:01:30.840 --> 00:01:33.740
And as complicated as some
aspects of C have been,

00:01:33.740 --> 00:01:36.320
as new as programming
may very well be to you,

00:01:36.320 --> 00:01:39.710
realize that there's not all that
much going on underneath the hood

00:01:39.710 --> 00:01:42.350
that we need to understand
to now move onward

00:01:42.350 --> 00:01:45.920
and start solving far more interesting
and more sophisticated and more

00:01:45.920 --> 00:01:46.820
fun problems.

00:01:46.820 --> 00:01:49.170
We just need a few
additional building blocks.

00:01:49.170 --> 00:01:52.340
And so today, we'll do this,
first, by relearning how to count.

00:01:52.340 --> 00:01:55.080
Here, for instance, is what
we'll call the computer's memory.

00:01:55.080 --> 00:01:56.420
And we've seen this grid before.

00:01:56.420 --> 00:01:59.420
And recall that we can number all of the
bytes in your computer's memory.

00:01:59.420 --> 00:02:04.550
We might call this byte number 0, 1,
2, 3, 4, all the way up to byte 15,

00:02:04.550 --> 00:02:05.610
and so forth.

00:02:05.610 --> 00:02:08.240
But it turns out, when talking
about computers' memories,

00:02:08.240 --> 00:02:10.610
computers and computer
scientists and programmers

00:02:10.610 --> 00:02:13.070
actually don't tend to use decimal.

00:02:13.070 --> 00:02:15.830
They definitely don't tend to
use binary at that low level.

00:02:15.830 --> 00:02:19.010
Instead, they tend to use,
just for convention's sake,

00:02:19.010 --> 00:02:21.020
something called hexadecimal.

00:02:21.020 --> 00:02:23.210
Hexadecimal is a different
base system that,

00:02:23.210 --> 00:02:27.120
instead of using 10 digits
or 2 digits, uses 16 instead.

00:02:27.120 --> 00:02:29.360
And so a computer scientist,
when numbering things

00:02:29.360 --> 00:02:33.980
like bytes in a computer's memory, would
still do 0, 1, 2, 3, 4, 5, 6, 7, 8, 9.

00:02:33.980 --> 00:02:37.350
But after that, instead of going
onward with decimal to, say, 10,

00:02:37.350 --> 00:02:40.970
11, 12, 13, 14, 15, they
instead, conventionally,

00:02:40.970 --> 00:02:43.260
would start using a few
letters of the alphabet.

00:02:43.260 --> 00:02:47.270
And so, in hexadecimal, this
different base system, base-16,

00:02:47.270 --> 00:02:48.980
you start counting at 0 still.

00:02:48.980 --> 00:02:51.130
You count up to and through 9.

00:02:51.130 --> 00:02:52.880
But when you want to
keep counting higher,

00:02:52.880 --> 00:02:57.440
you then go to A, B, C, D, E, and F.

00:02:57.440 --> 00:03:02.630
And the upside of this is that, within
hexadecimal-- and that name implies 16--

00:03:02.630 --> 00:03:08.630
you have 16 total individual digits, 0
through 9, and also now, A through F.

00:03:08.630 --> 00:03:12.300
So we don't have to introduce a second
digit just to count up as high as 15.

00:03:12.300 --> 00:03:14.480
We can use individual
digits 0 through F.

00:03:14.480 --> 00:03:18.650
And we can keep counting up further
by using multiple hexadecimal digits.

00:03:18.650 --> 00:03:21.150
But to get there, let's
introduce this vocabulary.

00:03:21.150 --> 00:03:23.540
So in binary, of course,
we use 0's and 1's.

00:03:23.540 --> 00:03:25.690
In decimal, of course,
we use 0 through 9's.

00:03:25.690 --> 00:03:29.360
And in hexadecimal, to be clear, we're
going to use 0 through F's, otherwise

00:03:29.360 --> 00:03:30.860
known as base-16.

00:03:30.860 --> 00:03:33.320
And it's just a convention
that we use A through F. We

00:03:33.320 --> 00:03:35.450
could have used any other six symbols.

00:03:35.450 --> 00:03:37.560
But these are what humans have chosen.

00:03:37.560 --> 00:03:41.090
So hexadecimal works quite similarly
to our familiar decimal system.

00:03:41.090 --> 00:03:45.110
And it's even similar to, now, what you
know as the binary system, as follows.

00:03:45.110 --> 00:03:49.370
Let's consider a two-digit value
using hexadecimal instead of decimal

00:03:49.370 --> 00:03:50.600
and instead of binary.

00:03:50.600 --> 00:03:54.680
Well, just like in the world
of decimal, we used base-10,

00:03:54.680 --> 00:03:57.080
or in the world of
binary, we used base-2.

00:03:57.080 --> 00:04:01.170
We're just going to use, now,
base-16, ergo, hexadecimal.

00:04:01.170 --> 00:04:02.360
So this is 16 to the first.

00:04:02.360 --> 00:04:03.590
This is 16 to the--

00:04:03.590 --> 00:04:05.090
sorry, 16 to the 0.

00:04:05.090 --> 00:04:06.590
This is 16 to the first.

00:04:06.590 --> 00:04:09.570
And of course, if we multiply that
out, it's just the ones column

00:04:09.570 --> 00:04:11.280
and now the 16's column.

00:04:11.280 --> 00:04:13.550
And so if you want to
count up in hexadecimal,

00:04:13.550 --> 00:04:21.290
you still start with 0 as usual, then
01, 02, 03, 04, 05, 06, 07, 08, 09.

00:04:21.290 --> 00:04:22.910
And then things get interesting.

00:04:22.910 --> 00:04:26.660
Now, you don't go to 10,
because that would be incorrect.

00:04:26.660 --> 00:04:31.880
10, in this base system, would be
like 16 times 1 plus 1 times 0.

00:04:31.880 --> 00:04:32.960
That's not what we want.

00:04:32.960 --> 00:04:38.930
After the number we know is 9, we
now count up to A, B, C, D, E, F.

00:04:38.930 --> 00:04:40.670
And now, things get interesting again.

00:04:40.670 --> 00:04:43.580
But just like in the decimal system,
when you count up to, like, 99,

00:04:43.580 --> 00:04:46.550
you have to start carrying
the 1, same thing here.

00:04:46.550 --> 00:04:49.820
If you want to count
past F, you carry the 1.

00:04:49.820 --> 00:04:55.340
And so now, to represent one
value greater than F, we use 10,

00:04:55.340 --> 00:04:57.350
which looks like ten, but is not ten.

00:04:57.350 --> 00:04:59.675
In hexadecimal, it is one-zero.

00:04:59.675 --> 00:05:01.880
16 times 1 gives us 16.

00:05:01.880 --> 00:05:03.680
1 times 0 gives us 0.

00:05:03.680 --> 00:05:07.050
And of course, that gives us the
decimal number we now know is 16.

00:05:07.050 --> 00:05:09.980
So we will no longer introduce
more and more base systems.

00:05:09.980 --> 00:05:12.607
But let me stipulate that
just by using these columns

00:05:12.607 --> 00:05:14.690
that you learned back in
grade school, presumably,

00:05:14.690 --> 00:05:16.940
you can implement any base system now.

00:05:16.940 --> 00:05:19.310
It just so happens that
in the world of computers,

00:05:19.310 --> 00:05:22.295
and today in the world of
memory, and soon, also files,

00:05:22.295 --> 00:05:24.170
it's just going to be
very conventional to be

00:05:24.170 --> 00:05:26.990
able to recognize and use hexadecimal.

00:05:26.990 --> 00:05:29.530
And in fact, there's a reason
humans like hexadecimal,

00:05:29.530 --> 00:05:30.530
or at least some humans, computer scientists.

00:05:30.530 --> 00:05:36.827
Recall that if we
count up as high as FF, in this case,

00:05:36.827 --> 00:05:38.160
we would still do the same math.

00:05:38.160 --> 00:05:44.060
So 16 times 15 plus 1 times 15 is
going to give us, really, this,

00:05:44.060 --> 00:05:49.210
or of course, 240 plus 15, or 255.

00:05:49.210 --> 00:05:50.460
And I did that pretty quickly.

00:05:50.460 --> 00:05:53.000
But that's just the sort of
grade school math of multiplying

00:05:53.000 --> 00:05:55.730
the column by the value
that's in it, where again,

00:05:55.730 --> 00:06:00.140
each of these F's is how we now
express 15 using a single digit.

00:06:00.140 --> 00:06:02.480
But recall that we've seen 255 before.

00:06:02.480 --> 00:06:04.610
Back when we talked about
binary a few weeks ago,

00:06:04.610 --> 00:06:12.450
255 also happened to be the pattern that
we see here, eight 1 bits using binary.

00:06:12.450 --> 00:06:15.278
And so the reason that computer
scientists tend to use hexadecimal

00:06:15.278 --> 00:06:17.570
is that, you know what, in
eight bits, there's actually

00:06:17.570 --> 00:06:20.000
two halves here, four on
the left, four on the right.

00:06:20.000 --> 00:06:22.340
If we sort of scooch
these things over, it

00:06:22.340 --> 00:06:25.520
turns out that because
hexadecimal allows

00:06:25.520 --> 00:06:28.730
you to represent 16
possible values, it's

00:06:28.730 --> 00:06:32.750
a perfect system for
representing four bits at a time.

00:06:32.750 --> 00:06:36.980
After all, if you've got four bits
here, each of which can be a 0 or 1,

00:06:36.980 --> 00:06:42.020
that's 2 times 2 times 2 times 2
possible values for each of those,

00:06:42.020 --> 00:06:45.740
or 16 total values, which is to
say that in the world of computers,

00:06:45.740 --> 00:06:48.560
if you ever want to talk
in units of four bits,

00:06:48.560 --> 00:06:51.590
it's wonderfully convenient
to use hexadecimal instead,

00:06:51.590 --> 00:06:56.270
only because, conveniently, one
hexadecimal digit happens to be

00:06:56.270 --> 00:07:00.590
equivalent to four binary
digits, 0's and 1's.

00:07:00.590 --> 00:07:05.160
So 0, 0, 0, 0, all the
way up through 1, 1, 1, 1.

00:07:05.160 --> 00:07:06.320
So why do humans do this?

00:07:06.320 --> 00:07:09.240
It's just now the human convention
because of that convenience.

00:07:09.240 --> 00:07:11.760
Now, some of you may very well
have seen hexadecimal before.

00:07:11.760 --> 00:07:14.660
In fact, recall our
discussion in week 0 of RGB,

00:07:14.660 --> 00:07:17.660
where we discussed the
representation of colors using

00:07:17.660 --> 00:07:19.860
some amount of red, green, and blue.

00:07:19.860 --> 00:07:21.720
And at the time, we used this example.

00:07:21.720 --> 00:07:24.080
We took our example out of context.

00:07:24.080 --> 00:07:27.560
And instead of using
HI! as a string of text,

00:07:27.560 --> 00:07:33.410
we reinterpreted 72, 73, and
33 as a sequence of colors.

00:07:33.410 --> 00:07:34.550
How much red do you want?

00:07:34.550 --> 00:07:35.720
How much green do you want?

00:07:35.720 --> 00:07:36.860
How much blue do you want?

00:07:36.860 --> 00:07:37.820
And that's fine.

00:07:37.820 --> 00:07:41.060
It's perfectly fine to think and
express yourself in terms of decimal.

00:07:41.060 --> 00:07:44.270
But computer scientists tend not to do
it that way in the context of colors

00:07:44.270 --> 00:07:45.790
and in the context of memory.

00:07:45.790 --> 00:07:49.160
Instead, they tend to use
something called hexadecimal.

00:07:49.160 --> 00:07:51.590
And hexadecimal, here,
would actually just

00:07:51.590 --> 00:07:57.860
have you change these values from 72,
73, 33, to the equivalent hexadecimal

00:07:57.860 --> 00:07:58.533
representation.

00:07:58.533 --> 00:08:00.200
And we won't bother doing the math here.

00:08:00.200 --> 00:08:04.340
But let me just stipulate
that 72, 73, 33 in decimal

00:08:04.340 --> 00:08:10.262
is the same thing as 48,
49, 21 in hexadecimal.

00:08:10.262 --> 00:08:12.470
Now, obviously, if you glance
at these three numbers,

00:08:12.470 --> 00:08:15.980
it's not at all obvious if you're
looking at hexadecimal digits

00:08:15.980 --> 00:08:21.080
or decimal digits, because they do
use the same subset, 0's through 9's.

00:08:21.080 --> 00:08:23.240
And so a convention, too,
in the computing world,

00:08:23.240 --> 00:08:25.850
is any time you represent
hexadecimal digits,

00:08:25.850 --> 00:08:29.300
you tend to prefix them,
by convention, with 0x.

00:08:29.300 --> 00:08:32.179
And there's no mathematical
meaning to the 0 or the x.

00:08:32.179 --> 00:08:35.419
It's just a prefix you put there
to make clear to the viewer

00:08:35.419 --> 00:08:38.299
that these are hexadecimal digits,
even if they might otherwise

00:08:38.299 --> 00:08:40.490
look like decimal digits.

00:08:40.490 --> 00:08:41.940
So where are we going with this?

00:08:41.940 --> 00:08:43.857
Well, those of you who
might have experimented

00:08:43.857 --> 00:08:46.850
in the past with making your own
web pages and making them colorful,

00:08:46.850 --> 00:08:50.450
or those of you who are artists and
have used programs like Photoshop, odds

00:08:50.450 --> 00:08:53.190
are, you've seen these codes before.

00:08:53.190 --> 00:08:55.940
In fact, here are a few
screenshots of Photoshop itself.

00:08:55.940 --> 00:08:59.190
If you click on a color in Photoshop
and you pull up this window,

00:08:59.190 --> 00:09:02.300
you can change the color that
you're drawing on the screen

00:09:02.300 --> 00:09:04.970
to be any of the colors of the rainbow.

00:09:04.970 --> 00:09:07.470
But more arcanely, if
you look down here,

00:09:07.470 --> 00:09:09.620
you can actually see
these hexadecimal codes,

00:09:09.620 --> 00:09:11.990
because it's become human
convention over the years

00:09:11.990 --> 00:09:15.630
to use hexadecimal to represent
different amounts of red, green,

00:09:15.630 --> 00:09:16.320
and blue.

00:09:16.320 --> 00:09:23.435
So if you have no red, no green, no
blue, otherwise represented as 000000,

00:09:23.435 --> 00:09:26.060
well, that's going to give you
the color we know here as black.

00:09:26.060 --> 00:09:29.510
It's sort of the absence of
any wavelengths of light there.

00:09:29.510 --> 00:09:33.470
If by contrast, though, you
change all of those six digits

00:09:33.470 --> 00:09:38.810
to the highest possible value, which,
again, is F, the range in hexadecimal being 0

00:09:38.810 --> 00:09:42.890
through F, or in decimal,
0 through 15, well,

00:09:42.890 --> 00:09:46.800
with FFFFFF, that's a lot of red,
a lot of green, a lot of blue.

00:09:46.800 --> 00:09:48.800
And when you combine those
wavelengths of light,

00:09:48.800 --> 00:09:51.200
you get the color we see here as white.

00:09:51.200 --> 00:09:53.480
And you can imagine, now,
combining different amounts

00:09:53.480 --> 00:09:54.930
of red or green or blue.

00:09:54.930 --> 00:10:00.740
So for instance, in hexadecimal,
FF0000, is the color we know as red.

00:10:00.740 --> 00:10:05.270
00FF00 is the color we know as green.

00:10:05.270 --> 00:10:09.630
And finally, 0000FF is the color
we know as blue, because again,

00:10:09.630 --> 00:10:14.240
the system that programmers and artists
often but don't always use, is indeed,

00:10:14.240 --> 00:10:17.710
this system of RGB for
red, green, and blue.

00:10:17.710 --> 00:10:19.460
So we introduced this
here not because you

00:10:19.460 --> 00:10:21.810
have to start thinking any
differently, because again,

00:10:21.810 --> 00:10:24.560
the mathematical mechanism
is the same as week 0.

00:10:24.560 --> 00:10:28.970
But you're going to start seeing
numbers in examples, in programs,

00:10:28.970 --> 00:10:32.900
as just appearing in hexadecimal by
convention, as opposed to actually

00:10:32.900 --> 00:10:35.550
being interpreted as decimal.

00:10:35.550 --> 00:10:37.880
So if we consider, now,
our computer's memory,

00:10:37.880 --> 00:10:40.610
we'll now start thinking of
this whole canvas of memory,

00:10:40.610 --> 00:10:43.010
all of these bytes inside
of our computer's memory,

00:10:43.010 --> 00:10:46.700
as being numbered 0,
1, 2, all the way through F.

00:10:46.700 --> 00:10:53.750
And then if we keep counting, we can go
to 10, 11, 12, 13, 14, 15, 16, 17, 18,

00:10:53.750 --> 00:10:58.850
19, 1A, 1B, 1C, 1D, and so forth.

00:10:58.850 --> 00:11:00.790
And it's fine if it's
not nearly that obvious,

00:11:00.790 --> 00:11:03.670
as you look at these things,
what the decimal equivalents are.

00:11:03.670 --> 00:11:04.690
That's not a problem.

00:11:04.690 --> 00:11:09.130
It's just a different way of thinking
about the locations, in this case,

00:11:09.130 --> 00:11:13.480
of a computer's memory, or the
representation of one color or another.

00:11:13.480 --> 00:11:19.480
All right, well, let's now use this
as an example of an opportunity,

00:11:19.480 --> 00:11:22.690
rather, to consider what's actually
being stored in our computer's memory.

00:11:22.690 --> 00:11:26.320
And to be clear, I'll start prefixing
all of these memory addresses,

00:11:26.320 --> 00:11:29.890
so to speak, with 0x, just to make
clear that we're now talking, indeed,

00:11:29.890 --> 00:11:31.480
in terms of hexadecimal.

00:11:31.480 --> 00:11:32.980
So here's a simple line of code.

00:11:32.980 --> 00:11:35.147
Out of context, we would
need to, actually, put this

00:11:35.147 --> 00:11:37.910
in main or some other program
to actually do anything with it.

00:11:37.910 --> 00:11:39.702
But we've seen this
before many times, now,

00:11:39.702 --> 00:11:42.760
where you declare a variable,
for instance, n for number.

00:11:42.760 --> 00:11:44.830
Declare it as an int for its type.

00:11:44.830 --> 00:11:47.170
And then, perhaps,
even assign it a value.

00:11:47.170 --> 00:11:51.520
Well, what's actually going on when we
use this kind of code in our computer?

00:11:51.520 --> 00:11:54.760
Well, let's go ahead and whip
this thing up in an actual program.

00:11:54.760 --> 00:11:57.970
Let me create a file
called address.c because I

00:11:57.970 --> 00:12:01.300
want to start experimenting with some
addresses in the computer's memory.

00:12:01.300 --> 00:12:04.180
I'm going to go ahead and
include stdio.h.

00:12:04.180 --> 00:12:06.460
I'm going to give myself int main void.

00:12:06.460 --> 00:12:08.890
And down here, I'm going to
go ahead and declare exactly

00:12:08.890 --> 00:12:10.915
that variable, int n equals 50.

00:12:10.915 --> 00:12:15.820
And then I'm going to go ahead and print
out, with percent i and a backslash n,

00:12:15.820 --> 00:12:17.230
the value of n.

00:12:17.230 --> 00:12:19.930
So nothing interesting there,
nothing too complicated.

00:12:19.930 --> 00:12:21.790
I'm going to go ahead and make address.

00:12:21.790 --> 00:12:24.123
And then I'm going to go ahead
and do dot slash address.

00:12:24.123 --> 00:12:26.380
And of course, as per week
one, we should hopefully

00:12:26.380 --> 00:12:27.930
see just the number 50.

00:12:27.930 --> 00:12:31.570
But today, we're going to give you some
more tools with which you can actually

00:12:31.570 --> 00:12:33.880
start poking around
your computer's memory.

00:12:33.880 --> 00:12:35.950
But let's first consider
this line of code

00:12:35.950 --> 00:12:38.240
in the context of your
computer's hardware.

00:12:38.240 --> 00:12:41.200
So if you're writing a program
with a line of code like this,

00:12:41.200 --> 00:12:44.500
that n needs to be somewhere
in your computer's memory.

00:12:44.500 --> 00:12:47.870
That 50 needs to be put somewhere
in your computer's memory.

00:12:47.870 --> 00:12:51.010
So if we, again, consider this
to be just part of our computer's

00:12:51.010 --> 00:12:55.000
memory, a few dozen bytes, well,
suppose that that variable, n,

00:12:55.000 --> 00:12:57.130
happens to end up down here.

00:12:57.130 --> 00:13:01.570
I've deliberately drawn n as taking up
four bytes, four squares, because what we

00:13:01.570 --> 00:13:05.830
call an integer, typically, at
least on CS50 IDE and modern systems,

00:13:05.830 --> 00:13:07.370
tends to be four bytes.

00:13:07.370 --> 00:13:10.630
So I made sure to have it
fill four complete boxes.

00:13:10.630 --> 00:13:13.940
And then the value that's actually
stored there might be 50.

00:13:13.940 --> 00:13:17.890
Well, it turns out that within
your computer's memory, again,

00:13:17.890 --> 00:13:20.660
there are these addresses
that are implicitly there.

00:13:20.660 --> 00:13:23.530
So even though, yes, we can
refer to this variable, n,

00:13:23.530 --> 00:13:26.620
based on the variable
name I gave it in my code,

00:13:26.620 --> 00:13:30.940
surely this variable exists at
a specific location in memory.

00:13:30.940 --> 00:13:32.530
I don't know offhand where it is.

00:13:32.530 --> 00:13:38.410
But let me just propose that maybe
it's at location 0x12345678, just

00:13:38.410 --> 00:13:39.550
an arbitrary address.

00:13:39.550 --> 00:13:41.690
I have no idea, in
actuality, where it is.

00:13:41.690 --> 00:13:44.860
But it certainly does have an address,
because every one of these squares

00:13:44.860 --> 00:13:49.540
inside of your computer's memory has an
address, a unique identifier like 0, 1,

00:13:49.540 --> 00:13:50.750
2, and so forth.

00:13:50.750 --> 00:13:56.710
Maybe the 50 ended up at
memory address 0x12345678.

00:13:56.710 --> 00:14:01.750
Well, what's kind of cool about C is
that we can actually begin to see this,

00:14:01.750 --> 00:14:03.020
no pun intended.

00:14:03.020 --> 00:14:05.080
So let me go ahead and
modify this program

00:14:05.080 --> 00:14:07.480
and introduce a little
bit of new syntax that

00:14:07.480 --> 00:14:11.510
will allow us to start poking around
the inside of your computer's memory

00:14:11.510 --> 00:14:14.830
so we can actually see
what's going on underneath.

00:14:14.830 --> 00:14:17.710
So I'm going to go ahead and change
this program to do this instead.

00:14:17.710 --> 00:14:19.585
I'm going to go ahead
and say, you know what?

00:14:19.585 --> 00:14:23.590
Don't just print out the value,
n, which, of course, is 50.

00:14:23.590 --> 00:14:28.060
Let me see, just out of curiosity,
what is the actual address of n.

00:14:28.060 --> 00:14:31.300
And to do that today, we're going to
introduce one new piece of syntax,

00:14:31.300 --> 00:14:33.070
which happens to be this here.

00:14:33.070 --> 00:14:37.360
There are two new operators today in
C. The first is an ampersand, which

00:14:37.360 --> 00:14:39.580
does not represent a logical and.

00:14:39.580 --> 00:14:42.100
Recall a couple of weeks
ago, we did see that if you

00:14:42.100 --> 00:14:46.840
want to combine Boolean expressions,
this and that, you use two ampersands.

00:14:46.840 --> 00:14:51.040
It's an unfortunate coincidence
that an ampersand, solo like this,

00:14:51.040 --> 00:14:52.630
will mean something different today.

00:14:52.630 --> 00:14:56.830
Specifically, this ampersand is
going to be our address-of operator.

00:14:56.830 --> 00:15:02.590
By simply prefixing any variable name
with an ampersand, we can tell C,

00:15:02.590 --> 00:15:06.520
please tell me what address
this variable is stored in.

00:15:06.520 --> 00:15:10.180
And this star, not to be
confused with multiplication,

00:15:10.180 --> 00:15:12.880
also has another meaning
in today's context.

00:15:12.880 --> 00:15:15.310
When you use this
asterisk, you can actually

00:15:15.310 --> 00:15:19.910
tell your program to look inside
of a particular memory address.

00:15:19.910 --> 00:15:23.500
So the ampersand tells you
what address a variable is at.

00:15:23.500 --> 00:15:27.310
The star operator, otherwise
known as the dereference operator,

00:15:27.310 --> 00:15:30.190
means, go to the following address.

00:15:30.190 --> 00:15:32.050
So they're sort of inverse operations.

00:15:32.050 --> 00:15:33.400
One figures out the address.

00:15:33.400 --> 00:15:35.240
One goes to the address.

00:15:35.240 --> 00:15:37.850
And so let's see this for real here.

00:15:37.850 --> 00:15:43.070
Let me go ahead and change my n
in my program here to ampersand n.

00:15:43.070 --> 00:15:48.980
So I want to print out, not the
number in n, but the address of n.

00:15:48.980 --> 00:15:50.870
And now, how do I print out an address?

00:15:50.870 --> 00:15:52.170
Well, it is just a number.

00:15:52.170 --> 00:15:56.690
But actually, printf supports a
different format code for addresses.

00:15:56.690 --> 00:15:59.840
You can do percent p, for
reasons we'll soon see,

00:15:59.840 --> 00:16:02.510
that says to print out the
address of this variable

00:16:02.510 --> 00:16:05.375
and interpret it as hexadecimal,
again, by convention.

00:16:05.375 --> 00:16:07.250
So I'm going to go ahead
and make address now

00:16:07.250 --> 00:16:10.530
after only making two
changes to this file.

00:16:10.530 --> 00:16:12.350
Everything seems to compile OK.

00:16:12.350 --> 00:16:14.150
Now, I'm going to go
ahead and run address.

00:16:14.150 --> 00:16:17.210
And we will see that, in
this particular program,

00:16:17.210 --> 00:16:21.620
address.c, for whatever
reason, that variable, n,

00:16:21.620 --> 00:16:30.110
ended up at crazy
location 0x7ffd80792f7c.

00:16:30.110 --> 00:16:31.160
Now, is that useful?

00:16:31.160 --> 00:16:32.870
Not in practice, necessarily.

00:16:32.870 --> 00:16:36.530
We're going to make this become
useful by leveraging these addresses.

00:16:36.530 --> 00:16:38.900
But the specific address
is not interesting.

00:16:38.900 --> 00:16:40.070
I'm glancing at this number.

00:16:40.070 --> 00:16:41.993
I have no idea what that
number is in decimal.

00:16:41.993 --> 00:16:44.660
I would have to do the math, or
frankly, just Google a converter

00:16:44.660 --> 00:16:45.660
and have it do it for me.

00:16:45.660 --> 00:16:47.420
So again, that's not
the interesting part.

00:16:47.420 --> 00:16:50.420
The fact that this is in hexadecimal
is just an implementation detail.

00:16:50.420 --> 00:16:54.450
It happens to represent the
location of this variable.

00:16:54.450 --> 00:16:58.230
And again, we won't want
to do this, necessarily.

00:16:58.230 --> 00:17:00.830
But just to be clear, of
these two operators,

00:17:00.830 --> 00:17:02.330
the ampersand gets the address.

00:17:02.330 --> 00:17:05.089
And the star operator
goes to an address.

00:17:05.089 --> 00:17:07.160
We can actually undo the
effects of these things.

00:17:07.160 --> 00:17:13.010
For instance, if I print out now, not
ampersand n, but just out of curiosity,

00:17:13.010 --> 00:17:18.170
star ampersand n, I can kind of
undo the effects of this operator.

00:17:18.170 --> 00:17:21.170
Ampersand n is going to say,
what is the address of n?

00:17:21.170 --> 00:17:25.349
Star ampersand n is going
to say, go to that address.

00:17:25.349 --> 00:17:29.360
So this is kind of a pointless exercise,
because if I just want what's in n,

00:17:29.360 --> 00:17:32.120
I can just, obviously,
print n like we began.

00:17:32.120 --> 00:17:34.560
But again, just as an
intellectual exercise,

00:17:34.560 --> 00:17:38.750
if I prefix n with the address-of
operator, and then use the asterisk

00:17:38.750 --> 00:17:42.830
and say, go to that address,
it's the same exact thing

00:17:42.830 --> 00:17:44.280
as just printing n itself.

00:17:44.280 --> 00:17:46.640
So let me change the format
code back to an integer.

00:17:46.640 --> 00:17:50.060
Instead of percent p, let me go
ahead and make address now,

00:17:50.060 --> 00:17:52.100
seems to compile OK, and run address.

00:17:52.100 --> 00:17:53.885
And voila, we're back at the 50.

00:17:53.885 --> 00:17:57.050
So as weird as the syntax
today might start to feel,

00:17:57.050 --> 00:17:59.330
realize that these operators,
at the end of the day,

00:17:59.330 --> 00:18:01.833
are relatively simple in what they do.

00:18:01.833 --> 00:18:05.000
And if you understand that one just
kind of undoes the effects of the other,

00:18:05.000 --> 00:18:08.360
we can start to build up some pretty
interesting programs with them.

00:18:08.360 --> 00:18:11.870
And we're going to do so by
leveraging a special type of variable,

00:18:11.870 --> 00:18:13.910
a variable called a pointer.

00:18:13.910 --> 00:18:16.670
And there is that p in percent p.

00:18:16.670 --> 00:18:22.240
A pointer is a variable that contains
the address of some other value.

00:18:22.240 --> 00:18:23.790
So we've seen integers before.

00:18:23.790 --> 00:18:27.770
We've seen floats and chars and
strings and other types as well.

00:18:27.770 --> 00:18:31.430
Pointers, now, are just a
different type of variable

00:18:31.430 --> 00:18:34.640
that store the address of some value.

00:18:34.640 --> 00:18:40.250
And you can have pointers to integers,
pointers to chars, pointers to bools,

00:18:40.250 --> 00:18:41.870
or any other data type.

00:18:41.870 --> 00:18:45.980
A pointer's type reflects the
specific type of the value

00:18:45.980 --> 00:18:48.223
that it's actually referring to.

00:18:48.223 --> 00:18:49.640
So let's see this more concretely.

00:18:49.640 --> 00:18:51.620
Let me go back, now, to my program here.

00:18:51.620 --> 00:18:53.840
And let me introduce
another variable here.

00:18:53.840 --> 00:18:58.430
Instead of immediately printing out
something like n, let me go ahead

00:18:58.430 --> 00:19:02.870
and introduce a second variable
that is of type int star.

00:19:02.870 --> 00:19:06.860
And this, I will admit, is probably
the most confusing piece of C syntax

00:19:06.860 --> 00:19:09.860
that we'll, in general, see,
just because, my god, star is now

00:19:09.860 --> 00:19:13.220
used for multiplication, for going
to an address, and also, now,

00:19:13.220 --> 00:19:14.610
declaring a variable.

00:19:14.610 --> 00:19:17.120
This is, arguably, not
the best design decision.

00:19:17.120 --> 00:19:18.350
But it was made decades ago.

00:19:18.350 --> 00:19:19.730
So this is what we have.

00:19:19.730 --> 00:19:26.240
But if I do int star p equals ampersand
n, now, what I can do down here,

00:19:26.240 --> 00:19:31.770
is print out the address of n by
temporarily storing it in a variable.

00:19:31.770 --> 00:19:33.830
So I'm not doing anything new just yet.

00:19:33.830 --> 00:19:36.020
I'm still declaring
on line 5, an integer

00:19:36.020 --> 00:19:37.910
called n, assigning it the value 50.

00:19:37.910 --> 00:19:42.260
What's new now on line 6, is that I'm
introducing a new type of variable.

00:19:42.260 --> 00:19:44.210
This type of variable
is known as a pointer.

00:19:44.210 --> 00:19:48.410
A pointer, again, is just a variable
that stores the address of some value.

00:19:48.410 --> 00:19:53.240
And the syntax, admittedly weird, for
declaring a pointer to an integer,

00:19:53.240 --> 00:19:57.560
is literally to say int, because
that's the type you're pointing to,

00:19:57.560 --> 00:20:00.350
star, and then the name of the
variable you want to create.

00:20:00.350 --> 00:20:03.320
And I could call this anything, but
I'll call it p to keep it succinct.

00:20:03.320 --> 00:20:05.120
And again, on the right
hand side of the equals sign

00:20:05.120 --> 00:20:06.620
is the same operator as before.

00:20:06.620 --> 00:20:10.040
If you want to figure out what is the
address of n, it's just ampersand n.

00:20:10.040 --> 00:20:14.450
And so we can store that address,
now, somewhere longer-term.

00:20:14.450 --> 00:20:18.110
Before, I just passed in ampersand
n and printf did its thing.

00:20:18.110 --> 00:20:23.120
Now, I'm temporarily, on line 6,
storing that address in a new variable

00:20:23.120 --> 00:20:24.470
called p.

00:20:24.470 --> 00:20:28.910
And its type is technically int
star, is what a programmer might say.

00:20:28.910 --> 00:20:33.680
So it would be incorrect to
say int p equals ampersand n.

00:20:33.680 --> 00:20:35.780
And indeed, our compiler,
Clang, won't like that.

00:20:35.780 --> 00:20:38.370
It won't let you compile
the code, most likely.

00:20:38.370 --> 00:20:43.160
And so, instead, I do int star p to
make clear that I know what I'm doing.

00:20:43.160 --> 00:20:48.450
I am storing the address of an
int, not an integer, per se.

00:20:48.450 --> 00:20:53.040
So if I go ahead, now, and save
this, recompile with make address.

00:20:53.040 --> 00:20:55.530
And notice, I changed one
line of code too, earlier.

00:20:55.530 --> 00:20:59.400
I went back to percent p to print
a pointer that is an address.

00:20:59.400 --> 00:21:02.490
And I'm printing out the value
of p, no longer the value of n.

00:21:02.490 --> 00:21:07.050
If I now run dot slash address,
voila, there's that cryptic address.

00:21:07.050 --> 00:21:09.300
And these addresses may
very well change over time.

00:21:09.300 --> 00:21:11.640
Depending on what's going
on inside of your program

00:21:11.640 --> 00:21:15.390
or other things on the system, these
addresses might be different each time.

00:21:15.390 --> 00:21:18.060
And that's to be expected and
not something to be relied on.

00:21:18.060 --> 00:21:20.250
But it's clearly some
random cryptic address,

00:21:20.250 --> 00:21:24.400
similar to my arbitrary
0x12345678 before.

00:21:24.400 --> 00:21:26.310
But now, let's just undo this operation.

00:21:26.310 --> 00:21:30.120
Just so we can come full
circle here, let me now propose

00:21:30.120 --> 00:21:33.495
how I can print out the value of n.

00:21:33.495 --> 00:21:35.370
And let me call on
someone for this if I can.

00:21:35.370 --> 00:21:41.640
If my goal, now, on line 7, is no longer
to print the address of n, but to print

00:21:41.640 --> 00:21:43.972
n itself using p.

00:21:43.972 --> 00:21:45.930
I'm going to go ahead
and change, preemptively,

00:21:45.930 --> 00:21:47.820
the format code to percent i.

00:21:47.820 --> 00:21:51.660
And a shorthand notation would,
obviously, be just print n.

00:21:51.660 --> 00:21:53.610
But suppose I don't
want to print n for this

00:21:53.610 --> 00:22:02.880
exercise, how can I now print the value
in n by referring to it by way of p?

00:22:02.880 --> 00:22:05.910
What should I literally type
as printf's second argument

00:22:05.910 --> 00:22:12.530
to print out the value of n by using
this new variable, p, in some way.

00:22:12.530 --> 00:22:16.290
Yeah, let's call on Joshua.

00:22:16.290 --> 00:22:19.860
AUDIENCE: I believe, if you
use the ampersand before the p,

00:22:19.860 --> 00:22:21.642
it will probably do it.

00:22:21.642 --> 00:22:24.100
DAVID MALAN: OK, ampersand p,
let me go ahead and try that.

00:22:24.100 --> 00:22:27.700
Let's try ampersand p
to print out this value.

00:22:27.700 --> 00:22:30.370
So ampersand p, I'm
going to save the file.

00:22:30.370 --> 00:22:32.610
I'm going to do make address and enter.

00:22:32.610 --> 00:22:34.415
And it doesn't seem to be the case.

00:22:34.415 --> 00:22:35.790
Notice that I'm getting an error.

00:22:35.790 --> 00:22:36.720
It's a little cryptic.

00:22:36.720 --> 00:22:40.920
Format specifies type int, but the
argument has type int star star,

00:22:40.920 --> 00:22:42.090
more on that another time.

00:22:42.090 --> 00:22:43.570
So it turns out this was incorrect.

00:22:43.570 --> 00:22:47.430
Let's take one other suggestion,
because the ampersand, recall,

00:22:47.430 --> 00:22:49.170
gets the address of something.

00:22:49.170 --> 00:22:50.880
But p is already an address.

00:22:50.880 --> 00:22:52.590
So Joshua, what you
technically proposed,

00:22:52.590 --> 00:22:54.300
was get me the address of the address.

00:22:54.300 --> 00:22:56.190
And that's not the
direction we want to go.

00:22:56.190 --> 00:22:58.170
We want to go to what
is at that address.

00:22:58.170 --> 00:23:00.740
Sophia, what do you think?

00:23:00.740 --> 00:23:02.640
AUDIENCE: We want to add a percent--

00:23:02.640 --> 00:23:06.820
or a star p when we print it.

00:23:06.820 --> 00:23:07.570
DAVID MALAN: Yeah.

00:23:07.570 --> 00:23:09.380
So I had a little trouble hearing you.

00:23:09.380 --> 00:23:12.370
But I think if we instead use
not the ampersand operator,

00:23:12.370 --> 00:23:14.710
but the star operator,
that's going to be,

00:23:14.710 --> 00:23:17.170
indeed, the dereference operator,
which essentially means,

00:23:17.170 --> 00:23:19.120
go to the value in p.

00:23:19.120 --> 00:23:23.530
And if the value in p is an address,
I think, let's try this, make address.

00:23:23.530 --> 00:23:25.490
Yep, that compiled OK this time.

00:23:25.490 --> 00:23:27.550
Now, if I do dot slash
address, hopefully, I

00:23:27.550 --> 00:23:30.400
will now see, indeed, the number 50.

00:23:30.400 --> 00:23:33.010
So again, we don't seem to have
made any fundamental progress.

00:23:33.010 --> 00:23:36.070
At the end of the day, I'm still
just printing out the value of n.

00:23:36.070 --> 00:23:39.100
But we've introduced this new
primitive, this new puzzle piece,

00:23:39.100 --> 00:23:41.440
if you will, that allows
you, programmatically,

00:23:41.440 --> 00:23:44.390
to figure out the address of
something in the computer's memory

00:23:44.390 --> 00:23:46.540
and to actually go to that address.

00:23:46.540 --> 00:23:52.070
And we'll soon see how to exercise more
sophisticated control over it as well.

00:23:52.070 --> 00:23:56.050
But let's come back to a
pictorial representation of this

00:23:56.050 --> 00:23:59.290
and consider what it is we just did
in the context, now, of this code.

00:23:59.290 --> 00:24:02.080
So inside of my main, the two
interesting lines of code,

00:24:02.080 --> 00:24:05.320
really, were these two lines first
before we made Sophia's addition

00:24:05.320 --> 00:24:07.990
and actually dereferenced p
and printed it out with printf.

00:24:07.990 --> 00:24:10.810
But let's consider, for a
moment, what these values now

00:24:10.810 --> 00:24:12.280
look like in a computer's memory.

00:24:12.280 --> 00:24:14.440
And again, the syntax
is a little cryptic

00:24:14.440 --> 00:24:16.475
because we now have a
star and an ampersand.

00:24:16.475 --> 00:24:18.850
But again, that just means,
now, we get to start thinking

00:24:18.850 --> 00:24:20.405
in terms of the computer's memory.

00:24:20.405 --> 00:24:23.030
So for instance, here's a grid
of memory inside of my computer.

00:24:23.030 --> 00:24:26.980
And maybe, for instance, the
50 and the n end up down there.

00:24:26.980 --> 00:24:29.980
They could end up anywhere, not
even pictured on the screen here.

00:24:29.980 --> 00:24:34.090
They end up somewhere in the computer's
memory, for our purposes thus far.

00:24:34.090 --> 00:24:36.100
But it technically lives at an address.

00:24:36.100 --> 00:24:38.950
And let me simplify the address
just so it's quicker to say.

00:24:38.950 --> 00:24:42.310
This 50, now, stored in the
variable n, maybe it actually

00:24:42.310 --> 00:24:44.590
lives at address 0x123.

00:24:44.590 --> 00:24:46.480
I have no idea where it
is, but we've clearly

00:24:46.480 --> 00:24:50.200
seen that it can live in a
seemingly random address like that.

00:24:50.200 --> 00:24:51.640
Now, what about p?

00:24:51.640 --> 00:24:54.520
p is technically a variable itself.

00:24:54.520 --> 00:24:57.190
It's a variable that stores
the address of something else.

00:24:57.190 --> 00:25:00.190
But it's still a variable,
which means, when you declare p

00:25:00.190 --> 00:25:04.660
with the code earlier, it actually
does take up some bytes of memory

00:25:04.660 --> 00:25:05.660
on the screen.

00:25:05.660 --> 00:25:10.420
And so let me go ahead and propose that
p happens to end up in memory here.

00:25:10.420 --> 00:25:13.450
Now, p is deliberately
drawn to be longer here.

00:25:13.450 --> 00:25:15.700
I'm consuming eight
total bytes this time,

00:25:15.700 --> 00:25:20.470
because it turns out, on modern
computer systems, including CS50 IDE,

00:25:20.470 --> 00:25:23.500
pointers tend to take up eight bytes.

00:25:23.500 --> 00:25:27.190
So not one, not four, but eight bytes,
so I've simply drawn it to be bigger.

00:25:27.190 --> 00:25:31.240
So what is actually
stored in the variable p?

00:25:31.240 --> 00:25:35.600
Well, it turns out that, again, it's
just storing the address of some value.

00:25:35.600 --> 00:25:42.460
So if the integer n, which itself
is storing 50, is at location 0x123,

00:25:42.460 --> 00:25:47.080
and pointer p is being assigned
that address, it's just like saying,

00:25:47.080 --> 00:25:50.620
well, stored in this variable
p, is literally just a number

00:25:50.620 --> 00:25:54.190
represented here in
hexadecimal notation, 0x123.

00:25:54.190 --> 00:25:56.650
So that's all that's going on
inside the computer's memory

00:25:56.650 --> 00:25:57.858
with those two lines of code.

00:25:57.858 --> 00:26:00.040
There's nothing fundamentally
new, except the fact

00:26:00.040 --> 00:26:04.430
that we have new syntax with which to
refer to these addresses explicitly.

00:26:04.430 --> 00:26:06.100
This is n down here.

00:26:06.100 --> 00:26:07.720
This is p up here.

00:26:07.720 --> 00:26:12.160
And the value of p just
happens to be an address.

00:26:12.160 --> 00:26:15.205
Now, I keep saying that these
addresses are a little cryptic.

00:26:15.205 --> 00:26:16.330
They're a little arbitrary.

00:26:16.330 --> 00:26:16.872
And they are.

00:26:16.872 --> 00:26:20.530
And honestly, it is rarely, if ever,
going to be enlightening to know,

00:26:20.530 --> 00:26:25.030
as a human, what address this
integer n is actually at.

00:26:25.030 --> 00:26:28.550
Who cares if it's at 0x123 or 0x456?

00:26:28.550 --> 00:26:29.800
Generally, we don't.

00:26:29.800 --> 00:26:33.070
And so computer scientists, when
talking about computers' memory,

00:26:33.070 --> 00:26:38.010
tend not to talk about these low-level
details, in terms of actual numbers.

00:26:38.010 --> 00:26:40.600
Instead, they tend to
simplify the picture,

00:26:40.600 --> 00:26:44.230
sort of abstract away all of the
other memory, which frankly, is not

00:26:44.230 --> 00:26:46.690
relevant to the discussion
thus far, and just

00:26:46.690 --> 00:26:50.290
say, you know what, I know
that p is storing an address.

00:26:50.290 --> 00:26:53.740
And that address happens
to be that of 50 down here.

00:26:53.740 --> 00:26:56.830
But I really don't care, in
my everyday programming life,

00:26:56.830 --> 00:26:58.360
what these specific addresses are.

00:26:58.360 --> 00:26:59.230
So you know what?

00:26:59.230 --> 00:27:01.730
Let's just abstract it away as an arrow.

00:27:01.730 --> 00:27:06.250
And again, abstraction is all about
simplifying lower level details

00:27:06.250 --> 00:27:09.250
that you may very well need to
understand but you don't necessarily

00:27:09.250 --> 00:27:10.520
need to keep thinking about.

00:27:10.520 --> 00:27:11.950
You don't need to keep
thinking at this level.

00:27:11.950 --> 00:27:13.730
It suffices to think at this level.

00:27:13.730 --> 00:27:16.600
So we might as well draw
a pointer, pictorially,

00:27:16.600 --> 00:27:20.710
as pointing at some value,
irrespective of what

00:27:20.710 --> 00:27:22.330
the actual address is.

00:27:22.330 --> 00:27:25.150
And so this is very much
the case in our human world.

00:27:25.150 --> 00:27:29.200
We have very similar
conventions, whether or not

00:27:29.200 --> 00:27:31.750
it might be obvious
at first glance, and

00:27:31.750 --> 00:27:37.310
we may very well be using these
same mechanisms in our everyday lives.

00:27:37.310 --> 00:27:40.690
So for instance, if you happen to have
a mailbox out in the street on your home

00:27:40.690 --> 00:27:43.768
or down in the basement of the Harvard
Science Center when on campus, it

00:27:43.768 --> 00:27:46.810
may very well look like something like
this, at least more residentially.

00:27:46.810 --> 00:27:51.100
And suppose that this mailbox here
is representing, in this case, p,

00:27:51.100 --> 00:27:51.790
in the story.

00:27:51.790 --> 00:27:55.490
It's storing a pointer, that is,
the address of something else.

00:27:55.490 --> 00:27:58.360
Well, if there's a whole bunch
of other mailboxes on the street,

00:27:58.360 --> 00:28:01.510
well, we can put anything
we want in these mailboxes.

00:28:01.510 --> 00:28:04.840
We can put postcards,
letters, packages even.

00:28:04.840 --> 00:28:08.250
And just as in the real world,
so can we in the virtual one.

00:28:08.250 --> 00:28:12.890
I can store chars or integers or
other things, including addresses.

00:28:12.890 --> 00:28:17.100
So for instance, Brian, I think you
have your own mailbox somewhere else.

00:28:17.100 --> 00:28:20.660
And Brian, of course, has a mailbox
that itself has a unique address.

00:28:20.660 --> 00:28:23.600
So Brian, for instance, what
happens to be the unique address

00:28:23.600 --> 00:28:26.030
of the mailbox on your street there?

00:28:26.030 --> 00:28:27.600
BRIAN: Yeah, so here is my mailbox.

00:28:27.600 --> 00:28:28.370
It's labeled n.

00:28:28.370 --> 00:28:29.750
And its address is over here.

00:28:29.750 --> 00:28:33.200
The address of my mailbox
appears to be 0x123.

00:28:33.200 --> 00:28:35.450
DAVID MALAN: Yeah, so my
mailbox, too, has an address.

00:28:35.450 --> 00:28:37.200
Frankly, again, I don't
really care about it.

00:28:37.200 --> 00:28:39.033
So I've not even put
it on the mailbox here.

00:28:39.033 --> 00:28:43.070
But if my mailbox represents p,
a pointer, and Brian's mailbox

00:28:43.070 --> 00:28:45.920
represents n, an
integer, well, it should

00:28:45.920 --> 00:28:49.260
mean that if I look inside
the contents of my pointer

00:28:49.260 --> 00:28:53.690
and I see the value 0x123,
that is now my clue,

00:28:53.690 --> 00:28:57.560
a breadcrumb of sorts, that can now let
me go look inside of Brian's mailbox.

00:28:57.560 --> 00:29:00.320
And Brian, if you wouldn't
mind doing that for us,

00:29:00.320 --> 00:29:02.430
what do you have at that address?

00:29:02.430 --> 00:29:05.540
BRIAN: And if I look in my
mailbox at address 0x123,

00:29:05.540 --> 00:29:07.727
I have the number 50
inside of this mailbox.

00:29:07.727 --> 00:29:08.810
DAVID MALAN: Yeah, indeed.

00:29:08.810 --> 00:29:10.400
So in this case, he happens
to be storing an int.

00:29:10.400 --> 00:29:11.650
But it could be anything else.

00:29:11.650 --> 00:29:14.480
And again, we don't typically care
about these specific addresses.

00:29:14.480 --> 00:29:17.450
Once you understand the metaphor,
really, we can do something silly

00:29:17.450 --> 00:29:20.630
and really just think of this
mailbox as storing a value that's

00:29:20.630 --> 00:29:23.180
pointing at Brian's mailbox.

00:29:23.180 --> 00:29:26.510
It's some kind of direction drawn
there, pictorially as an arrow,

00:29:26.510 --> 00:29:29.000
here as a silly foam finger.

00:29:29.000 --> 00:29:34.750
Or if you prefer, a foam Yale finger
pointing, instead, at Brian's mailbox,

00:29:34.750 --> 00:29:38.720
just as a sort of breadcrumb leading
us to some other value on the screen.

00:29:38.720 --> 00:29:41.408
So when we talk today and
beyond about addresses,

00:29:41.408 --> 00:29:42.700
that's all we're talking about.

00:29:42.700 --> 00:29:45.790
We humans in the real world have
been using addresses for eons, now,

00:29:45.790 --> 00:29:49.030
to uniquely identify our homes
or businesses or the like.

00:29:49.030 --> 00:29:51.520
Computers do the exact
same thing at a lower level

00:29:51.520 --> 00:29:53.440
using their own memory.

00:29:53.440 --> 00:29:58.330
So let me pause here to see if there
are any questions on pointers, variables

00:29:58.330 --> 00:30:00.760
that store addresses, or
on these new operators,

00:30:00.760 --> 00:30:02.890
like the ampersand or
the asterisk, which

00:30:02.890 --> 00:30:06.310
now has a new meaning today onward.

00:30:06.310 --> 00:30:06.968
Nothing yet.

00:30:06.968 --> 00:30:09.010
All right, seeing none,
well, let's consider now,

00:30:09.010 --> 00:30:12.250
the same story in the context of
a completely different data type.

00:30:12.250 --> 00:30:15.310
Thus far, we've played only with ints.

00:30:15.310 --> 00:30:16.630
But consider strings.

00:30:16.630 --> 00:30:20.950
We've spent a lot of time on
strings, using them for encryption

00:30:20.950 --> 00:30:25.880
and implementing electoral
algorithms using users' input.

00:30:25.880 --> 00:30:27.940
So let's consider a
fundamentally different data

00:30:27.940 --> 00:30:31.940
type that stores, not individual
integers, but strings of text instead.

00:30:31.940 --> 00:30:34.150
So for instance, in any
program involving a string,

00:30:34.150 --> 00:30:38.245
you might have a line of code that
looks like this. string s equals, quote

00:30:38.245 --> 00:30:40.090
unquote, "HI!"

00:30:40.090 --> 00:30:41.852
in all caps with an exclamation point.

00:30:41.852 --> 00:30:44.560
So that may very well be a line
of code that we've seen thus far.

00:30:44.560 --> 00:30:46.935
What's actually going on inside
of the computer's memory?

00:30:46.935 --> 00:30:51.340
Well, let me propose that when you type
in quote unquote, "HI!" in a computer,

00:30:51.340 --> 00:30:53.780
it ends up somewhere in
your computer's memory.

00:30:53.780 --> 00:30:58.840
So HI exclamation point, plus, per last
week, a backslash 0-- or two weeks ago,

00:30:58.840 --> 00:31:04.040
a backslash 0, which is how a computer
represents the end of that string.

00:31:04.040 --> 00:31:06.100
But let's look a little
more carefully at

00:31:06.100 --> 00:31:08.350
what is going on
underneath this hood here.

00:31:08.350 --> 00:31:12.190
Technically speaking, I could
address those individual characters

00:31:12.190 --> 00:31:16.280
we have seen as of week two, by using
bracket notation like s bracket 0,

00:31:16.280 --> 00:31:18.910
s bracket 1, s bracket
2, and s bracket 3.

00:31:18.910 --> 00:31:22.427
We use the square bracket
notation to treat a string

00:31:22.427 --> 00:31:24.010
as though it's an array of characters.

00:31:24.010 --> 00:31:26.900
And it is, it was, and it still is.

00:31:26.900 --> 00:31:32.230
But it turns out, strings can also be
manipulated by way of their addresses

00:31:32.230 --> 00:31:32.960
as well.

00:31:32.960 --> 00:31:36.640
And so for instance, maybe
this same exact string, HI,

00:31:36.640 --> 00:31:43.480
is stored at memory address 0x123
and then 0x124, 0x125, and 0x126.

00:31:43.480 --> 00:31:46.150
Notice that they're
deliberately contiguous

00:31:46.150 --> 00:31:47.560
addresses, back to back to back.

00:31:47.560 --> 00:31:50.870
And they're only one byte apart,
because each of these chars, of course,

00:31:50.870 --> 00:31:53.140
is just one byte in C.

00:31:53.140 --> 00:31:56.920
So those numbers are not
important, specifically.

00:31:56.920 --> 00:31:59.530
But the fact that they're one
byte apart from each other

00:31:59.530 --> 00:32:02.350
is important, because that's
the definition of a string,

00:32:02.350 --> 00:32:05.470
and indeed, an array, to have
memory back to back to back.

00:32:05.470 --> 00:32:08.140
Now, what exactly, though, is s?

00:32:08.140 --> 00:32:11.530
s was the name I gave the variable a
moment ago in that line of code,

00:32:11.530 --> 00:32:13.840
string s equals quote unquote, "HI!"

00:32:13.840 --> 00:32:14.710
Well, what is s?

00:32:14.710 --> 00:32:18.950
s is a variable that has to go
somewhere in the computer's memory.

00:32:18.950 --> 00:32:24.880
And suppose that s is, indeed,
HI with an exclamation point.

00:32:24.880 --> 00:32:28.600
And the HI happens to live
at this location here.

00:32:28.600 --> 00:32:31.390
You know what you can
think of s as being now?

00:32:31.390 --> 00:32:34.840
At a high level, it's a
string, but at a lower level,

00:32:34.840 --> 00:32:37.300
it's just the address of a string.

00:32:37.300 --> 00:32:40.780
More specifically, let's
start thinking about a string

00:32:40.780 --> 00:32:46.297
as technically being just the address
of the first character in the string.

00:32:46.297 --> 00:32:48.130
Now, that might give
you pause for a moment,

00:32:48.130 --> 00:32:49.810
because why the first character?

00:32:49.810 --> 00:32:53.710
How are you going to remember that, wait
a minute, this string isn't at and only

00:32:53.710 --> 00:32:54.940
at 0x123.

00:32:54.940 --> 00:33:00.110
It also continues at
0x124, 0x125, and so forth.

00:33:00.110 --> 00:33:02.950
But let me pause and
ask the group here, why

00:33:02.950 --> 00:33:06.110
might it very well be
sufficient for a computer

00:33:06.110 --> 00:33:12.550
and us programmers to just think
of strings in terms of being

00:33:12.550 --> 00:33:15.460
the address of the very first byte.

00:33:15.460 --> 00:33:18.220
Like, why is it sufficient,
no matter how long

00:33:18.220 --> 00:33:20.830
the string is, even if it's
a whole paragraph of text,

00:33:20.830 --> 00:33:25.360
why is it very cleverly sufficient
to think of a string like s

00:33:25.360 --> 00:33:31.420
as just being identical to
the address of the first byte?

00:33:31.420 --> 00:33:33.718
Ginni, is it?

00:33:33.718 --> 00:33:37.480
AUDIENCE: Possibly because it happens
that strings, whenever we are defining

00:33:37.480 --> 00:33:39.490
a new string, that is altogether.

00:33:39.490 --> 00:33:44.410
Suppose, if I'm writing my name, Ginni,
so it will be G-I-N-N-I altogether.

00:33:44.410 --> 00:33:46.810
So it will be sufficient
if something is pointed

00:33:46.810 --> 00:33:50.560
towards just first character
of my name, so that I can just

00:33:50.560 --> 00:33:55.895
follow up for the first character and
then get all the characters afterwards.

00:33:55.895 --> 00:33:56.770
DAVID MALAN: Perfect.

00:33:56.770 --> 00:33:59.800
So all of these basic definitions
we had over the past couple of weeks

00:33:59.800 --> 00:34:00.790
now come together.

00:34:00.790 --> 00:34:02.812
If a string is just an
array of characters--

00:34:02.812 --> 00:34:05.020
and by definition of array,
those characters are back

00:34:05.020 --> 00:34:09.280
to back to back, and per
two weeks ago, every string

00:34:09.280 --> 00:34:13.300
ends with this conventional
backslash zero or nul character.

00:34:13.300 --> 00:34:15.550
All you need to do when
thinking about a string

00:34:15.550 --> 00:34:17.530
is just to know where
does the string begin,

00:34:17.530 --> 00:34:19.719
because you can use a
for loop or a while loop

00:34:19.719 --> 00:34:22.540
or some other heuristic with a
condition and a Boolean expression

00:34:22.540 --> 00:34:25.929
to figure out where the string
ends without even knowing,

00:34:25.929 --> 00:34:27.710
in advance, its length.

00:34:27.710 --> 00:34:30.159
So that is to say, let's
start, for the moment,

00:34:30.159 --> 00:34:32.679
thinking about strings
as being quite simply

00:34:32.679 --> 00:34:37.969
that, just the address of the
first character in the string.

00:34:37.969 --> 00:34:40.989
And if we then take that as
fact, let's go ahead, now,

00:34:40.989 --> 00:34:43.989
and start playing with a program that
doesn't use integers, but instead,

00:34:43.989 --> 00:34:46.570
uses strings, using
this basic primitive.

00:34:46.570 --> 00:34:49.929
So let me go ahead and delete the
code I'd written before in address.c.

00:34:49.929 --> 00:34:54.580
Let me just change it up to be string
s equals quote unquote, "HI!" semicolon.

00:34:54.580 --> 00:34:57.700
And notice, I'm not manually
typing any backslash 0's.

00:34:57.700 --> 00:34:59.560
C does that for us automatically.

00:34:59.560 --> 00:35:02.260
When you close the quote,
the compiler takes care

00:35:02.260 --> 00:35:04.158
of adding that backslash 0 for you.

00:35:04.158 --> 00:35:05.950
Now, I'm going to go
ahead on the next line

00:35:05.950 --> 00:35:10.042
and go ahead and print out
percent s backslash n comma s,

00:35:10.042 --> 00:35:11.500
if I want to print out that string.

00:35:11.500 --> 00:35:13.968
Now, this program is not
at all interesting anymore.

00:35:13.968 --> 00:35:15.760
Back in week one, we
wrote something like--

00:35:15.760 --> 00:35:18.730
OK, yes it is interesting
because I screwed up.

00:35:18.730 --> 00:35:19.780
So five errors.

00:35:19.780 --> 00:35:22.450
I've written seven lines
of code and five errors.

00:35:22.450 --> 00:35:24.070
And let's see what's going on.

00:35:24.070 --> 00:35:27.430
As always, always go to
the top, because odds are,

00:35:27.430 --> 00:35:29.650
there's just some
confusing cascading effect.

00:35:29.650 --> 00:35:34.090
The very first error I see is use
of undeclared identifier string.

00:35:34.090 --> 00:35:35.230
Did I mean standard in?

00:35:35.230 --> 00:35:37.900
I didn't mean standard in,
string, string, string.

00:35:37.900 --> 00:35:40.780
So I could run help50 as
my first step, but honestly, I

00:35:40.780 --> 00:35:43.150
make this mistake often
enough that I kind of know now

00:35:43.150 --> 00:35:46.690
that I forgot to include cs50.h.

00:35:46.690 --> 00:35:49.960
And indeed, if I now do this
and recompile make address--

00:35:49.960 --> 00:35:53.080
OK, all five errors are gone
just by that one simple change.

00:35:53.080 --> 00:35:56.200
And if I run address now, it's just
going to, quite simply, say HI.

00:35:56.200 --> 00:35:59.020
But let's now start to
consider what's going

00:35:59.020 --> 00:36:00.650
on underneath the hood of this program.

00:36:00.650 --> 00:36:06.040
Suppose I am curious and want
to print out what is actually

00:36:06.040 --> 00:36:08.170
the address at which this string lives.

00:36:08.170 --> 00:36:09.520
Well, it turns out--

00:36:09.520 --> 00:36:10.690
let me be clever here.

00:36:10.690 --> 00:36:14.830
Let me print out, not a format
code of percent s, but percent p.

00:36:14.830 --> 00:36:18.290
Show me this same string as an address.

00:36:18.290 --> 00:36:22.060
Let me go ahead and recompile,
make address, seems to compile OK.

00:36:22.060 --> 00:36:23.560
Let me run dot slash address.

00:36:23.560 --> 00:36:26.350
And again, I'm still printing
s, but I'm asking printf

00:36:26.350 --> 00:36:30.260
to present it as though it's a pointer.

00:36:30.260 --> 00:36:32.430
And interesting, it's
not the same as before.

00:36:32.430 --> 00:36:35.060
But again, that's reasonable
because the memory addresses

00:36:35.060 --> 00:36:36.540
aren't going to always be the same.

00:36:36.540 --> 00:36:37.940
But it doesn't matter what it is.

00:36:37.940 --> 00:36:39.232
But that's kind of interesting.

00:36:39.232 --> 00:36:41.750
All this time, any time
you've been using strings,

00:36:41.750 --> 00:36:44.300
had you just changed your
percent s to a percent p,

00:36:44.300 --> 00:36:48.290
you could have seen where, in
memory, that string actually starts.

00:36:48.290 --> 00:36:50.780
It's not functionally
useful to us just yet.

00:36:50.780 --> 00:36:52.700
But it's been there this whole time.

00:36:52.700 --> 00:36:54.800
And let me go ahead and
do the following now.

00:36:54.800 --> 00:36:58.950
Suppose I get a little curious
further, and I do printf.

00:36:58.950 --> 00:37:02.390
Let me go ahead and print out another
address followed by a new line.

00:37:02.390 --> 00:37:07.035
And let me go ahead and print out
the address of the first character.

00:37:07.035 --> 00:37:08.660
So again, this is a little weird to do.

00:37:08.660 --> 00:37:10.220
And we wouldn't typically
do this that often.

00:37:10.220 --> 00:37:13.430
But again, just to make the point that
these operators give us very simple

00:37:13.430 --> 00:37:16.850
answers to questions like, what
is the address of this thing?

00:37:16.850 --> 00:37:23.960
If s bracket 1, as of week two in CS50,
represented the second character in s,

00:37:23.960 --> 00:37:28.190
because 0 index means s bracket 0 is
the first, s bracket 1 is the second.

00:37:28.190 --> 00:37:30.410
If I play around with
today's new operator,

00:37:30.410 --> 00:37:36.020
this ampersand, I bet I can see the
address of that second character.

00:37:36.020 --> 00:37:38.390
And in fact, let me go
ahead and be more explicit.

00:37:38.390 --> 00:37:43.160
Let me change this first s to be s
bracket 0 and put an ampersand here.

00:37:43.160 --> 00:37:46.430
And let me go ahead, now, and
make this program, make address.

00:37:46.430 --> 00:37:48.170
OK, a little funky--

00:37:48.170 --> 00:37:49.680
I just missed a semicolon.

00:37:49.680 --> 00:37:51.060
So easy fix there.

00:37:51.060 --> 00:37:53.600
Let me go ahead and
recompile with make address.

00:37:53.600 --> 00:37:55.880
Let me go ahead and
run dot slash address.

00:37:55.880 --> 00:37:58.970
And interesting, well, maybe--

00:37:58.970 --> 00:38:00.320
interesting to me.

00:38:00.320 --> 00:38:02.780
So you see, now, two
addresses, the first of which

00:38:02.780 --> 00:38:08.900
is 0x4006a4, which apparently, is the
address of the first character in s.

00:38:08.900 --> 00:38:10.880
But notice what's curious
about the next one.

00:38:10.880 --> 00:38:15.720
It's almost the same except
the byte is one further away.

00:38:15.720 --> 00:38:18.380
And I bet if I do this, not
just for the h and the i,

00:38:18.380 --> 00:38:20.330
but also the exclamation
point-- let me do

00:38:20.330 --> 00:38:23.210
one more line of almost
identical code, just

00:38:23.210 --> 00:38:26.240
to make the point that all
this time it's, indeed,

00:38:26.240 --> 00:38:30.560
been the case that all characters in
a string are back to back to back.

00:38:30.560 --> 00:38:32.540
And you can now see it in code.

00:38:32.540 --> 00:38:37.610
b4, b5, b6, are just one byte apart.

00:38:37.610 --> 00:38:40.940
So we see some visual confirmation,
now, that strings are indeed

00:38:40.940 --> 00:38:42.990
laid out in memory just like this.

00:38:42.990 --> 00:38:46.130
Now, again, this is not a very
useful programmatic exercise

00:38:46.130 --> 00:38:48.500
to look at the address
of individual characters.

00:38:48.500 --> 00:38:51.350
But again, this is just to
emphasize that underneath the hood,

00:38:51.350 --> 00:38:53.960
some relatively simple
operations are being

00:38:53.960 --> 00:38:58.562
enabled by way of this new ampersand,
and in turn, star operator.

00:38:58.562 --> 00:39:00.770
So let's consider for a
moment what this really looks

00:39:00.770 --> 00:39:02.390
like inside the computer's memory.

00:39:02.390 --> 00:39:05.660
At a low level, yes, s is
technically an address.

00:39:05.660 --> 00:39:08.540
And yes, it's technically the
address of the first byte,

00:39:08.540 --> 00:39:10.880
which in the actual
computer, looked different.

00:39:10.880 --> 00:39:13.100
But in my slide here, I
just arbitrarily proposed

00:39:13.100 --> 00:39:17.210
that it's at 0x123, 0x124, 0x125.

00:39:17.210 --> 00:39:20.300
But again, let's not care
about that level of detail.

00:39:20.300 --> 00:39:23.210
Let's just kind of wave our hands
and abstract away these addresses

00:39:23.210 --> 00:39:30.950
and just now start thinking of s, that
is a string, as technically just being

00:39:30.950 --> 00:39:32.450
a pointer.

00:39:32.450 --> 00:39:33.260
A pointer.

00:39:33.260 --> 00:39:36.463
So it turns out that even though
it's very useful and very common

00:39:36.463 --> 00:39:39.380
to think of strings as, obviously,
just being sequences of characters.

00:39:39.380 --> 00:39:41.240
And that's been true since week one.

00:39:41.240 --> 00:39:43.130
And you can also think
of them as arrays,

00:39:43.130 --> 00:39:44.990
back to back sequences of characters.

00:39:44.990 --> 00:39:47.330
You can also, it turns
out, starting today,

00:39:47.330 --> 00:39:51.290
think of them as just
being pointers, that is,

00:39:51.290 --> 00:39:54.900
the address of a character
somewhere in the computer's memory.

00:39:54.900 --> 00:39:58.550
And as Ginni notes, because all
of the characters in a string

00:39:58.550 --> 00:40:00.770
are, by definition,
back to back to back,

00:40:00.770 --> 00:40:05.720
and because, by definition, all
strings end with a backslash 0, that

00:40:05.720 --> 00:40:08.750
is literally the smallest and
only amount of information

00:40:08.750 --> 00:40:12.920
you need to keep around in a computer
to know where all of your strings are.

00:40:12.920 --> 00:40:16.340
Just remember the address
of the very first character

00:40:16.340 --> 00:40:19.430
therein, because you can
find your way to the end

00:40:19.430 --> 00:40:24.320
by remembering that this backslash
0 is, really, just eight 0

00:40:24.320 --> 00:40:27.080
bits, otherwise
represented as backslash 0.

00:40:27.080 --> 00:40:29.617
And so we could certainly
have an if condition,

00:40:29.617 --> 00:40:31.700
much like we did two weeks
ago when playing around

00:40:31.700 --> 00:40:36.230
with the lengths of strings, that
allows us to check for precisely that.

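NOTE A minimal C sketch of that idea: given only the address of a string's first character, a loop can step forward one byte at a time and stop at the backslash 0, the byte whose eight bits are all 0. The name string_length is hypothetical; the standard strlen in string.h does the same job.
```c
#include <stddef.h>
// Hypothetical helper: given only the address of a string's first
// character, walk forward until the '\0' byte (eight 0 bits) and
// report how many characters came before it.
size_t string_length(const char *s)
{
    const char *p = s;   // start at the first character
    while (*p != '\0')   // stop at the terminating nul character
    {
        p++;             // move to the next byte
    }
    return (size_t) (p - s);
}
```
For HI!, this should return 3: three characters before the nul.
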
00:40:36.230 --> 00:40:41.030
And so when I say we're taking off
some training wheels, here they go.

00:40:41.030 --> 00:40:44.330
So up until now, we've been
using, again, the CS50 library,

00:40:44.330 --> 00:40:47.470
which gives us, conveniently,
functions like get string and get int

00:40:47.470 --> 00:40:49.650
and get float and so forth.

00:40:49.650 --> 00:40:54.650
But all this time, the CS50 library,
specifically the file, cs50.h,

00:40:54.650 --> 00:40:58.070
had a little bit of a
pedagogical simplification in it.

00:40:58.070 --> 00:41:02.510
Recall last week, that you can
define your own custom data types.

00:41:02.510 --> 00:41:06.955
Well, it turns out that all this time,
we've been claiming that strings exist

00:41:06.955 --> 00:41:09.080
and they're something you
can use in your programs.

00:41:09.080 --> 00:41:14.420
And strings do exist in C. They do
exist in Python, in JavaScript, in Java,

00:41:14.420 --> 00:41:16.980
and C++, in many, many,
many other languages.

00:41:16.980 --> 00:41:18.860
This is not a CS50 term.

00:41:18.860 --> 00:41:25.190
But string, technically, does not
exist as a data type in C. It is instead

00:41:25.190 --> 00:41:31.180
known, more cryptically and at a
lower level, as char star.

00:41:31.180 --> 00:41:33.080
Char star, now what does that mean?

00:41:33.080 --> 00:41:37.180
Well, char star, much like our
int star a few minutes ago,

00:41:37.180 --> 00:41:40.840
just represents the address of
a character, much like int star

00:41:40.840 --> 00:41:43.210
represents the address of an int.

00:41:43.210 --> 00:41:46.210
And if, again, you kind
of agree with me now,

00:41:46.210 --> 00:41:49.450
that you can think of strings
as sequences of characters,

00:41:49.450 --> 00:41:52.660
or more specifically, arrays of
characters, or more specifically,

00:41:52.660 --> 00:41:56.920
as of today, the address of
just the first character,

00:41:56.920 --> 00:41:59.680
then it's, indeed, the
case that we now can

00:41:59.680 --> 00:42:02.800
apply this new terminology,
today, of pointer,

00:42:02.800 --> 00:42:06.040
to our old familiar friends, strings.

00:42:06.040 --> 00:42:10.690
String is just a synonym,
if you will, for char star.

00:42:10.690 --> 00:42:14.200
And it's in the CS50 library that
we, essentially, have a line of code

00:42:14.200 --> 00:42:18.348
that simplifies or abstracts away char
star, which honestly, no one wants

00:42:18.348 --> 00:42:20.890
to think about or struggle with
in the first week of a class,

00:42:20.890 --> 00:42:23.260
let alone the first two
or three weeks of a class.

00:42:23.260 --> 00:42:28.475
It's a simplification, a custom
data type, that we name string,

00:42:28.475 --> 00:42:30.850
just so you don't have to
think about, what is this star?

00:42:30.850 --> 00:42:32.017
What is it to the character?

00:42:32.017 --> 00:42:33.100
What is it an address of?

00:42:33.100 --> 00:42:37.450
But today, we can remove those training
wheels and reveal that, all this time,

00:42:37.450 --> 00:42:40.720
you've just been manipulating
characters at specific addresses.

00:42:40.720 --> 00:42:43.180
And we've used this kind
of technique before,

00:42:43.180 --> 00:42:45.550
abstracting away these
lower level details.

00:42:45.550 --> 00:42:48.310
For instance, recall last
week, that we introduced

00:42:48.310 --> 00:42:52.630
this notion of a struct, a data type
that you can customize to be your own.

00:42:52.630 --> 00:42:56.200
We implemented a better phone
book by wrapping together

00:42:56.200 --> 00:42:58.630
a name and a number inside
of a custom data type,

00:42:58.630 --> 00:43:01.960
encapsulating them, if you will,
inside of something we called person.

00:43:01.960 --> 00:43:05.650
And every person, we
claimed, had a structure

00:43:05.650 --> 00:43:07.580
that contains a name and a number.

00:43:07.580 --> 00:43:11.410
And by way of this feature of C,
typedef, we can define a new type.

00:43:11.410 --> 00:43:15.200
And the name of that type,
last week, was just person.

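NOTE For reference, a minimal sketch of last week's person type, assuming both the name and the number are stored as strings (written here as char *). The field types and the helper print_person are illustrative assumptions, not necessarily the exact code from last week.
```c
#include <stdio.h>
// A custom type that wraps together a name and a number.
typedef struct
{
    char *name;
    char *number;
} person;
// Hypothetical helper: print one person's name and number.
void print_person(person p)
{
    printf("%s: %s\n", p.name, p.number);
}
```
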
00:43:15.200 --> 00:43:18.100
So we're using, already, and
we have been sort of secretly

00:43:18.100 --> 00:43:22.750
using since the first week of C
in the class, a line of code that

00:43:22.750 --> 00:43:24.020
actually looks like this.

00:43:24.020 --> 00:43:28.090
And this is, indeed, one of the
lines of code inside of cs50.h.

00:43:28.090 --> 00:43:31.000
It says typedef, which
means give me a custom type.

00:43:31.000 --> 00:43:35.770
And it creates a synonym
for char star called string.

00:43:35.770 --> 00:43:39.700
And it's just a way where we
can hide the funky char star.

00:43:39.700 --> 00:43:42.070
We can hide the asterisk, in
particular, which would not

00:43:42.070 --> 00:43:43.990
be fun to play with
in the first few days,

00:43:43.990 --> 00:43:47.200
but without changing the
definition of what a string is.

00:43:47.200 --> 00:43:51.850
So strings exist in C. But there's
no data type called string in C

00:43:51.850 --> 00:43:56.020
until you use a library like
CS50's, which makes it exist

00:43:56.020 --> 00:43:58.930
by way of that kind of definition.

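NOTE A sketch of that kind of typedef, offered as an illustration rather than a copy of the real cs50.h, whose surrounding details may differ: once the synonym exists, string and char * name the same type, so values pass between them with no conversion or cast. as_char_star is a hypothetical helper.
```c
// The kind of line described above: a new name, string, for the
// existing type char *.
typedef char *string;
// Hypothetical helper: accepts a string and returns it as a char *.
// No cast is needed, because the two names denote the same type.
char *as_char_star(string s)
{
    return s;
}
```
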
00:43:58.930 --> 00:44:01.450
All right, let me pause
here to see if there's

00:44:01.450 --> 00:44:03.760
any questions, then,
about what strings are

00:44:03.760 --> 00:44:09.360
or these new ways of
thinking about them.

00:44:09.360 --> 00:44:13.390
Any questions about
strings or char stars?

00:44:13.390 --> 00:44:15.140
All right, well, if
no questions here, why

00:44:15.140 --> 00:44:17.515
don't we go ahead and take
our 5-minute break here first.

00:44:17.515 --> 00:44:19.790
And we'll be back in 5
and take another look

00:44:19.790 --> 00:44:22.040
at what we can now do
with these new primitives.

00:44:22.040 --> 00:44:23.480
All right, we're back.

00:44:23.480 --> 00:44:27.680
And we have, now, this ability in code
to get the address of some variable

00:44:27.680 --> 00:44:30.140
and also to go to an
address using ampersand

00:44:30.140 --> 00:44:31.850
and the asterisk, respectively.

00:44:31.850 --> 00:44:36.530
We've thought about strings as
being not only contiguous sequences

00:44:36.530 --> 00:44:38.150
of characters, but also arrays.

00:44:38.150 --> 00:44:42.477
And then of course, as of
today now, actual addresses,

00:44:42.477 --> 00:44:44.810
the address of the first
character and then, from there,

00:44:44.810 --> 00:44:46.940
we can find our way,
programmatically, to the end,

00:44:46.940 --> 00:44:48.380
thanks to that nul character.

00:44:48.380 --> 00:44:52.220
But it turns out there's one other
thing we can do with these addresses

00:44:52.220 --> 00:44:53.840
or with pointers more generally.

00:44:53.840 --> 00:44:55.550
And that's known as pointer arithmetic.

00:44:55.550 --> 00:44:58.577
So anything that's a number,
of course, we can do math on.

00:44:58.577 --> 00:45:00.410
And the math is not
going to be complicated,

00:45:00.410 --> 00:45:03.390
but it is going to be
powerful for us here.

00:45:03.390 --> 00:45:07.040
So I'm going to go back to my
most recent state of address.c.

00:45:07.040 --> 00:45:11.480
And let me go ahead, now, and
reiterate that we can print out

00:45:11.480 --> 00:45:15.800
the individual characters in a string,
just like we did back in week two,

00:45:15.800 --> 00:45:18.270
as by using our square bracket notation.

00:45:18.270 --> 00:45:21.170
So I'm getting rid of all evidence
of those addresses for now.

00:45:21.170 --> 00:45:23.420
I'm recompiling this
program with make address.

00:45:23.420 --> 00:45:25.650
And then I'm going to run
dot slash address now.

00:45:25.650 --> 00:45:29.690
And I see HI exclamation
point, one character per line.

00:45:29.690 --> 00:45:34.290
But now, consider that there doesn't
need to be a string data type.

00:45:34.290 --> 00:45:36.320
In fact, we can take
this training wheel off.

00:45:36.320 --> 00:45:38.690
And while it might feel a
little uncomfortable at first,

00:45:38.690 --> 00:45:42.620
if I delete this first line altogether,
as I've accidentally omitted anyway

00:45:42.620 --> 00:45:45.660
sometimes, I don't need to
keep calling things strings.

00:45:45.660 --> 00:45:47.570
I can describe them as strings verbally.

00:45:47.570 --> 00:45:49.790
I can think of them as
strings, because string

00:45:49.790 --> 00:45:53.150
is a thing in many different
programming languages.

00:45:53.150 --> 00:45:56.070
But by default, in C, it
just doesn't exist as a type.

00:45:56.070 --> 00:45:59.750
Instead, the type is somewhat
cryptically named, char star.

00:45:59.750 --> 00:46:02.840
But again, all that means is
that the star means here's

00:46:02.840 --> 00:46:04.010
the address of something.

00:46:04.010 --> 00:46:06.140
Char means it's the address of a char.

00:46:06.140 --> 00:46:09.950
So char star gives
you a pointer variable

00:46:09.950 --> 00:46:12.720
that's going to point to a character.

00:46:12.720 --> 00:46:16.080
So now, if s is that, I can
actually treat it the same.

00:46:16.080 --> 00:46:20.960
There's no reason I can't keep using
s like a string was back in week two,

00:46:20.960 --> 00:46:22.400
using our square bracket notation.

00:46:22.400 --> 00:46:24.770
And I can keep printing
out HI exclamation point

00:46:24.770 --> 00:46:27.320
using that same square bracket syntax.

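NOTE A minimal sketch of the same idea, assuming a hard-coded string in place of get_string so it compiles without the CS50 library: declared as char *, s still works with week two's square-bracket notation. print_chars is a hypothetical name.
```c
#include <stdio.h>
// Print each character of s on its own line with square brackets,
// just as in week two, even though s is declared as char *.
// Returns the number of characters printed.
int print_chars(char *s)
{
    int i = 0;
    while (s[i] != '\0')
    {
        printf("%c\n", s[i]);
        i++;
    }
    return i;
}
```
Called as print_chars("HI!"), it should print H, I, and ! on separate lines.
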
00:46:27.320 --> 00:46:30.170
But there's one other way I can do this.

00:46:30.170 --> 00:46:35.150
If I now know that s is
really just an address,

00:46:35.150 --> 00:46:37.760
I can get rid of this
square bracket notation.

00:46:37.760 --> 00:46:42.860
And I can actually just do star s,
because recall that star, in addition

00:46:42.860 --> 00:46:47.270
to being the new symbol that we use
when declaring a pointer up here,

00:46:47.270 --> 00:46:50.990
is also, admittedly confusingly,
the same symbol

00:46:50.990 --> 00:46:53.310
that we use to go to an address.

00:46:53.310 --> 00:46:57.650
So if s is storing an address, which
it is by definition of being a pointer,

00:46:57.650 --> 00:46:59.900
star s means go to that address.

00:46:59.900 --> 00:47:02.000
And per my picture
earlier, it would seem

00:47:02.000 --> 00:47:08.060
to be the case that s most likely
stores an address like 0x123.

00:47:08.060 --> 00:47:10.250
It's not going to be the
same in my actual IDE here.

00:47:10.250 --> 00:47:12.167
It will be whatever the
computer has ordained.

00:47:12.167 --> 00:47:14.610
But it's going to be
the same exact idea.

00:47:14.610 --> 00:47:17.150
So let me go ahead and go to star s.

00:47:17.150 --> 00:47:20.130
And just for kicks, let me
leave it as just that one line.

00:47:20.130 --> 00:47:23.870
So let me go ahead and
rerun this as make address.

00:47:23.870 --> 00:47:25.470
All right, and now dot slash address.

00:47:25.470 --> 00:47:30.710
I should see, hopefully, a capital
H and only an H. But watch this.

00:47:30.710 --> 00:47:34.400
If I know that s, a string, is
technically just an address,

00:47:34.400 --> 00:47:35.960
I can actually now do math on it.

00:47:35.960 --> 00:47:39.470
And I can go ahead and print out another
character, followed by a new line.

00:47:39.470 --> 00:47:44.090
And I can go to, not s,
but how about s plus 1.

00:47:44.090 --> 00:47:47.600
So I can do some very simple arithmetic,
if you will, on that pointer.

00:47:47.600 --> 00:47:49.920
And let me go ahead
and now recompile this.

00:47:49.920 --> 00:47:54.800
So make address, compiles
OK, dot slash address.

00:47:54.800 --> 00:47:56.570
And I should see HI.

00:47:56.570 --> 00:48:01.790
And if I do one more line of code like
this, printf, percent c, backslash n,

00:48:01.790 --> 00:48:07.130
star s plus 2, I can
now go to the character

00:48:07.130 --> 00:48:10.770
that is two bytes away
from whatever s is,

00:48:10.770 --> 00:48:12.480
which again, is the start of the string.

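NOTE A sketch of the demo's arithmetic, assuming s points at HI! (print_first_three is a hypothetical name). The parentheses matter: *(s + 1) first moves one byte past s and then goes to that address, whereas *s + 1 would take the character H and add 1 to its numeric value.
```c
#include <stdio.h>
// Print the first three characters of s without square brackets:
// s + 1 is the address one char past s, and * goes to that address.
// Assumes s holds at least three characters, like "HI!".
void print_first_three(const char *s)
{
    printf("%c\n", *s);
    printf("%c\n", *(s + 1));
    printf("%c\n", *(s + 2));
}
```
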
00:48:12.480 --> 00:48:15.890
So now, I've reprinted HI with
the exclamation point character

00:48:15.890 --> 00:48:19.280
by character, but not by using
this fancy square bracket

00:48:19.280 --> 00:48:24.710
notation, fancy only in the sense that
it was sort of an abstraction for us,

00:48:24.710 --> 00:48:25.670
if you will.

00:48:25.670 --> 00:48:28.885
I'm instead manipulating s for what
it really is, which is just an address.

00:48:28.885 --> 00:48:31.010
And so here, too, and I've
used this phrase before,

00:48:31.010 --> 00:48:33.710
that square bracket notation
that we introduced in week two,

00:48:33.710 --> 00:48:36.410
is technically just syntactic sugar.

00:48:36.410 --> 00:48:39.500
It's not doing anything
fundamentally different

00:48:39.500 --> 00:48:42.770
from these asterisks
and these addresses.

00:48:42.770 --> 00:48:45.440
It's just doing it, honestly, in
a much more user-friendly way.

00:48:45.440 --> 00:48:49.160
I still prefer, personally, the
square bracket notation from week two.

00:48:49.160 --> 00:48:54.680
But it's the same thing as using the
star and doing this math yourself.

00:48:54.680 --> 00:48:57.020
So C is just providing us
with this handy feature

00:48:57.020 --> 00:49:00.200
of using square brackets that
does all of this so-called pointer

00:49:00.200 --> 00:49:02.360
arithmetic for you.

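NOTE One way to test the syntactic-sugar claim, as a sketch: the C standard defines s[i] as *(s + i), so the two forms should read the same byte at every index, up to and including the terminating nul. same_either_way is a hypothetical helper.
```c
#include <stdbool.h>
#include <string.h>
// Returns true if bracket notation and pointer arithmetic read the
// same byte at every index of s, including the final '\0'.
bool same_either_way(const char *s)
{
    size_t n = strlen(s);
    for (size_t i = 0; i <= n; i++)
    {
        if (s[i] != *(s + i))
        {
            return false;
        }
    }
    return true;
}
```
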
00:49:02.360 --> 00:49:04.290
But again, we're going
to this low level just

00:49:04.290 --> 00:49:10.310
to emphasize what it is that's going
on ultimately underneath the hood here.

00:49:10.310 --> 00:49:13.070
All right, let me pause
here for any questions.

00:49:13.070 --> 00:49:17.290
And Brian, please do feel free
to verbalize any on your end.

00:49:17.290 --> 00:49:19.790
BRIAN: I see a question that
came in about what would happen

00:49:19.790 --> 00:49:22.233
if you tried to print star s plus 3.

00:49:22.233 --> 00:49:25.400
DAVID MALAN: So I'm pretty sure that's
going to print out the nul character.

00:49:25.400 --> 00:49:27.233
But let's go ahead and
confirm as much here,

00:49:27.233 --> 00:49:31.760
percent c backslash n star s plus 3.

00:49:31.760 --> 00:49:35.120
All right, I'm getting a
little adventurous here

00:49:35.120 --> 00:49:38.060
by looking at things I maybe shouldn't
be looking at, because that's

00:49:38.060 --> 00:49:39.545
a low level implementation detail.

00:49:39.545 --> 00:49:40.670
But let's see what happens.

00:49:40.670 --> 00:49:43.130
It compiles OK, dot slash address.

00:49:43.130 --> 00:49:44.780
And it seems to be blank.

00:49:44.780 --> 00:49:46.730
Now, maybe that's the nul character.

00:49:46.730 --> 00:49:48.980
Honestly, it's not meant to
be a printable character.

00:49:48.980 --> 00:49:52.770
It's this special sentinel value
that indicates the end of the string.

00:49:52.770 --> 00:49:54.020
But I could do this.

00:49:54.020 --> 00:49:57.170
I know from week two
that chars are integers

00:49:57.170 --> 00:49:59.670
and integers are chars if I
want to think of them that way.

00:49:59.670 --> 00:50:01.880
So let me change only
the very last character

00:50:01.880 --> 00:50:03.950
to use the format code percent i.

00:50:03.950 --> 00:50:05.690
Let me recompile my code.

00:50:05.690 --> 00:50:07.940
Let me go ahead and run address.

00:50:07.940 --> 00:50:11.540
And voila, HI exclamation 0.

00:50:11.540 --> 00:50:16.400
And there are the all-0 bits, represented
here as a single decimal digit, thanks

00:50:16.400 --> 00:50:17.570
to percent i.

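NOTE A sketch generalizing the percent i trick, assuming a hard-coded HI!: printing every byte as an integer, terminator included, makes the otherwise invisible nul character show up as 0. print_codes is a hypothetical name; for HI! it should print 72, 73, 33, and 0 on separate lines, the ASCII codes for H, I, and the exclamation point, followed by the nul.
```c
#include <stdio.h>
#include <string.h>
// Print every byte of s as a decimal integer, including the
// terminating '\0', which appears as 0. Returns how many were printed.
size_t print_codes(const char *s)
{
    size_t n = strlen(s);
    for (size_t i = 0; i <= n; i++)
    {
        printf("%i\n", *(s + i));  // the char is promoted to int for %i
    }
    return n + 1;
}
```
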
00:50:17.570 --> 00:50:19.970
Now, I can get really crazy here.

00:50:19.970 --> 00:50:23.420
And why don't we go ahead and
print out not just what characters

00:50:23.420 --> 00:50:28.580
are right after this sequence, HI
exclamation point nul character,

00:50:28.580 --> 00:50:33.770
why don't we go to-- oh heck, how
about address 1,000 bytes away,

00:50:33.770 --> 00:50:35.990
and really get nosy
inside of my computer?

00:50:35.990 --> 00:50:38.450
Let me recompile that and run dot slash address.

00:50:38.450 --> 00:50:40.460
OK, nothing really going on over there.

00:50:40.460 --> 00:50:42.620
How about 10,000 bytes away?

00:50:42.620 --> 00:50:44.270
Let me go ahead and make address.

00:50:44.270 --> 00:50:47.990
Let me go ahead and run this.
Segmentation fault. All right,

00:50:47.990 --> 00:50:49.010
that's bad.

00:50:49.010 --> 00:50:53.030
And you might be among the fortunate
few who have seen this error before

00:50:53.030 --> 00:50:54.440
by touching memory you shouldn't.

00:50:54.440 --> 00:50:56.607
And we're going to deliberately
consider this today.

00:50:56.607 --> 00:50:59.540
But a segmentation fault, indeed,
means that you have done something

00:50:59.540 --> 00:51:01.430
wrong somewhere in your code.

00:51:01.430 --> 00:51:04.000
And it tends to mean that you
touched a segment of memory

00:51:04.000 --> 00:51:05.000
that you shouldn't have.

00:51:05.000 --> 00:51:08.750
And I have no business, honestly,
looking 10,000 bytes away

00:51:08.750 --> 00:51:11.420
from the memory that I
know belongs to the string.

00:51:11.420 --> 00:51:14.670
That's like arbitrarily looking
anywhere in your computer's memory,

00:51:14.670 --> 00:51:16.890
which probably, it seems,
is not a good idea.

00:51:16.890 --> 00:51:19.000
But more on that in just a bit.

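NOTE A sketch of a safer habit, assuming the goal is only to inspect a string's own bytes: the memory that belongs to s runs from s through its nul terminator, so any offset larger than strlen(s) is off limits. Reading beyond that, as with s plus 10,000, is undefined behavior in C; a segmentation fault is one common outcome, not a guaranteed one, and, as the 1,000-byte experiment suggests, out-of-bounds reads do not always crash. in_bounds is a hypothetical helper.
```c
#include <stdbool.h>
#include <string.h>
// Hypothetical guard: true only if offset i falls within the bytes
// that belong to s, its characters plus the terminating '\0'.
bool in_bounds(const char *s, size_t i)
{
    return i <= strlen(s);
}
```
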
00:51:19.000 --> 00:51:21.470
So let's consider, now,
some of the implications

00:51:21.470 --> 00:51:25.130
of these underlying
implementation details

00:51:25.130 --> 00:51:28.580
and consider, now, from last week,
why we did a few things the way

00:51:28.580 --> 00:51:30.590
we did in the past few weeks, in fact.

00:51:30.590 --> 00:51:32.360
So string is just a char star.

00:51:32.360 --> 00:51:33.860
And let's, now, consider an example.

00:51:33.860 --> 00:51:37.260
Let me zoom out on my memory, just
so I can cram more in at once.

00:51:37.260 --> 00:51:39.620
Let's consider an example
where I might want to write

00:51:39.620 --> 00:51:42.570
a program that compares two strings.

00:51:42.570 --> 00:51:45.830
Let me go ahead and write some new
code here in a new file this time,

00:51:45.830 --> 00:51:48.350
called, for instance, compare.c.

00:51:48.350 --> 00:51:50.480
My goal with this
program, quite simply, is

00:51:50.480 --> 00:51:55.580
going to be to print out the
contents of-- or rather to compare

00:51:55.580 --> 00:51:57.590
two strings that the user might input.

00:51:57.590 --> 00:52:00.040
I'm going to go ahead
and include cs50.h,

00:52:00.040 --> 00:52:02.810
not because I want
string, per se, anymore,

00:52:02.810 --> 00:52:05.750
but because I want to use get
string just for convenience.

00:52:05.750 --> 00:52:08.180
But we'll take that training
wheel off in a bit, too.

00:52:08.180 --> 00:52:10.520
And in this program, I'm
going to go ahead and first

00:52:10.520 --> 00:52:11.690
use, not get string, at least not yet.

00:52:11.690 --> 00:52:14.450
Let me go ahead and keep it
simple and start with get int.

00:52:14.450 --> 00:52:16.910
And I'll ask the user for a variable i.

00:52:16.910 --> 00:52:19.340
And let me do another one
of these with get int and ask

00:52:19.340 --> 00:52:21.270
the user for a value for j.

00:52:21.270 --> 00:52:24.665
And then let me go ahead and quite
simply say, if i equals equals j,

00:52:24.665 --> 00:52:28.790
then go ahead and print out same. Else,

00:52:28.790 --> 00:52:31.770
let me go ahead and print out different.

00:52:31.770 --> 00:52:35.930
So this is week one stuff, where
I'm using a couple of variables.

00:52:35.930 --> 00:52:38.300
I'm using a condition
with two branches, and I'm

00:52:38.300 --> 00:52:42.990
using printf to print out whether those
two variables, i and j, are the same.

00:52:42.990 --> 00:52:44.930
So let's go ahead and compile this.

00:52:44.930 --> 00:52:45.950
All is well.

00:52:45.950 --> 00:52:49.310
Run compare, and let me
give it digits 1 and 2.

00:52:49.310 --> 00:52:50.630
And indeed, they're different.

00:52:50.630 --> 00:52:53.400
And let me go ahead and give it
1 and 1, and they're the same.

00:52:53.400 --> 00:52:56.270
So I think, logically, proof
by example, if you will,

00:52:56.270 --> 00:52:57.860
this program looks correct.

00:52:57.860 --> 00:53:02.630
But let me quickly make it seemingly
incorrect, not by using integers,

00:53:02.630 --> 00:53:05.840
but by using strings instead.

00:53:05.840 --> 00:53:07.988
Let me go ahead and
give myself a string.

00:53:07.988 --> 00:53:10.280
Although, no, I don't need
that training wheel anymore.

00:53:10.280 --> 00:53:15.300
Let's just do char star
s equals get string of s.

00:53:15.300 --> 00:53:17.300
But again, even though
I'm calling it char star,

00:53:17.300 --> 00:53:19.580
it's still a string
like it was weeks ago.

00:53:19.580 --> 00:53:23.510
Let me give myself another string
called t, just to keep the name short.

00:53:23.510 --> 00:53:25.100
And s will get--

00:53:25.100 --> 00:53:26.730
t will get that value there.

00:53:26.730 --> 00:53:30.140
And let me just, very naively
but kind of reasonably,

00:53:30.140 --> 00:53:34.310
say if s equals equals t, let's
go ahead and print out same.

00:53:34.310 --> 00:53:38.000
And otherwise, let's go ahead
and print out different.

00:53:38.000 --> 00:53:41.240
So same exact code, just
different data types, and using

00:53:41.240 --> 00:53:42.830
get string instead of get int.

00:53:42.830 --> 00:53:47.360
Let me go ahead and make compare,
seems to compile OK, dot slash compare.

00:53:47.360 --> 00:53:51.770
Let me go ahead and type in HI!--

00:53:51.770 --> 00:53:53.570
whoops, HI!

00:53:53.570 --> 00:53:55.220
Let me go ahead and type in HI! again.

00:53:55.220 --> 00:53:57.500
And voila, different.

00:53:57.500 --> 00:54:01.010
And I forgot my backslash n's, but that
seems to be the least of my problems.

00:54:01.010 --> 00:54:05.240
Let me recompile this, make compare,
and now, let me run it again.

00:54:05.240 --> 00:54:07.130
How about, let's do a quick test.

00:54:07.130 --> 00:54:09.010
David, Brian, these are
definitely different.

00:54:09.010 --> 00:54:09.580
OK, good.

00:54:09.580 --> 00:54:11.240
So the program seems to work.

00:54:11.240 --> 00:54:13.150
How about David, David?

00:54:13.150 --> 00:54:14.140
Also different.

00:54:14.140 --> 00:54:15.370
Huh, let me try again.

00:54:15.370 --> 00:54:18.600
Brian, Brian, also different.

00:54:18.600 --> 00:54:21.570
But I'm pretty sure those
strings are the same.

00:54:21.570 --> 00:54:24.180
Why might this program be flawed?

00:54:24.180 --> 00:54:28.582
What is wrong with
this program right now?

00:54:28.582 --> 00:54:30.290
BRIAN: A couple of
people in the chat are

00:54:30.290 --> 00:54:32.750
saying that we're not actually
comparing the characters,

00:54:32.750 --> 00:54:34.370
we're comparing the addresses.

00:54:34.370 --> 00:54:37.377
DAVID MALAN: Yeah, so that's sort of
the logical conclusion from today's

00:54:37.377 --> 00:54:38.960
definition of what a string really is.

00:54:38.960 --> 00:54:41.750
If a string is just the
address of its first character,

00:54:41.750 --> 00:54:44.450
then if you're literally
doing s equals equals t,

00:54:44.450 --> 00:54:46.697
you're comparing those two addresses.

00:54:46.697 --> 00:54:48.530
And they are probably
going to be different,

00:54:48.530 --> 00:54:50.990
even if I type in the same
thing, because every time we've

00:54:50.990 --> 00:54:55.010
called get int or get string, it's
kind of plopped the user's input

00:54:55.010 --> 00:54:56.750
somewhere in my computer's memory.

00:54:56.750 --> 00:55:00.560
But we now have the tools, honestly,
to answer this or vet this answer

00:55:00.560 --> 00:55:01.130
ourselves.

00:55:01.130 --> 00:55:03.230
Let me go ahead and
simplify this program.

00:55:03.230 --> 00:55:06.050
And let's, just as a quick
sanity check, print out s.

00:55:06.050 --> 00:55:10.610
And let's go ahead and print out
t using a new line after each,

00:55:10.610 --> 00:55:12.350
just so we can see what the strings are.

00:55:12.350 --> 00:55:16.830
So let me go ahead and do this again,
make compare, compiles OK, dot slash

00:55:16.830 --> 00:55:17.330
compare.

00:55:17.330 --> 00:55:19.310
Let me type in HI, HI.

00:55:19.310 --> 00:55:21.710
And they seem to be visually the same.

00:55:21.710 --> 00:55:24.770
But recall that, now, I
have this other format code,

00:55:24.770 --> 00:55:27.080
such that I can now
start treating strings

00:55:27.080 --> 00:55:29.330
as the addresses they technically are.

00:55:29.330 --> 00:55:33.140
So let me change percent s
to percent p in both places.

00:55:33.140 --> 00:55:37.610
Let me then recompile the program, and
now, rerun compare with both HI and HI

00:55:37.610 --> 00:55:38.690
identically typed.

00:55:38.690 --> 00:55:43.100
But notice, they've ended up at
slightly different memory locations.

00:55:43.100 --> 00:55:46.820
Even though I have coincidentally
typed the same thing, C and my computer

00:55:46.820 --> 00:55:52.097
are not going to be so presumptuous as
to use the same bytes for both strings.

00:55:52.097 --> 00:55:53.930
That's not going to
give me much flexibility

00:55:53.930 --> 00:55:55.490
if I want to change one or the other.

00:55:55.490 --> 00:55:58.490
It's going to very simplistically
put one in this chunk of memory

00:55:58.490 --> 00:56:00.240
and the other in this chunk of memory.

00:56:00.240 --> 00:56:03.680
And indeed, those addresses are
respectively, but arbitrarily,

00:56:03.680 --> 00:56:07.220
0x22fe670 and 0x22fe6b0.

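NOTE A sketch reproducing the bug without the CS50 library, assuming two separately stored arrays with identical text stand in for two calls to get_string: distinct arrays always occupy different addresses, so == on them is false even though every character matches, and %p shows those addresses. Arrays are used deliberately, since two identical string literals may or may not share storage depending on the compiler. same_address is a hypothetical name.
```c
#include <stdbool.h>
#include <stdio.h>
// Compares what s and t actually hold: two addresses, not characters.
// Also prints both addresses; %p expects a void * argument.
bool same_address(const char *s, const char *t)
{
    printf("%p\n%p\n", (void *) s, (void *) t);
    return s == t;
}
```
With char s[] = "HI!" and char t[] = "HI!", it should print two different addresses and return false.
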
00:56:09.770 --> 00:56:12.500
So they are spread apart some distance.

00:56:12.500 --> 00:56:15.810
But again, it's up to the computer to
decide where to actually put those.

00:56:15.810 --> 00:56:18.310
So what's actually going on
inside of the computer's memory?

00:56:18.310 --> 00:56:22.010
Well, let's consider if, for instance,
this is s, my pointer, or really,

00:56:22.010 --> 00:56:22.640
my string.

00:56:22.640 --> 00:56:23.810
But it's just a pointer now.

00:56:23.810 --> 00:56:25.060
It's the address of something.

00:56:25.060 --> 00:56:28.250
Notice that I've drawn it
as taking up eight squares,

00:56:28.250 --> 00:56:31.680
because again, a pointer on
modern 64-bit systems is eight bytes.

00:56:31.680 --> 00:56:33.320
So that's why this thing is so big.

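NOTE A quick way to check the pointer-size claim on a given machine, as a sketch: sizeof reports how many bytes a char * occupies. Eight is typical on 64-bit systems, but the C standard does not fix the number, so treat it as typical rather than guaranteed. pointer_size is a hypothetical name.
```c
#include <stdio.h>
// Report how many bytes a char * occupies, next to a char, which is
// always exactly 1 byte by definition of sizeof.
size_t pointer_size(void)
{
    printf("char * uses %zu bytes; char uses %zu\n", sizeof(char *), sizeof(char));
    return sizeof(char *);
}
```
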
00:56:33.320 --> 00:56:37.100
Meanwhile, when I type in something
like HI with the exclamation point,

00:56:37.100 --> 00:56:38.720
then it ends up somewhere in memory.

00:56:38.720 --> 00:56:40.440
We don't really know
or care where it is.

00:56:40.440 --> 00:56:42.773
So let's just arbitrarily say
it happens to end up there

00:56:42.773 --> 00:56:43.850
in my computer's memory.

00:56:43.850 --> 00:56:46.730
Now, each of those bytes,
of course, has an address.

00:56:46.730 --> 00:56:48.950
I don't necessarily know
or care what they are.

00:56:48.950 --> 00:56:52.040
But for explanation's sake, let's
just number them again like before,

00:56:52.040 --> 00:56:56.810
0x123, 0x124, 0x125, 0x126.

00:56:56.810 --> 00:57:02.960
When I then assign s on the left the
value from get string on the right,

00:57:02.960 --> 00:57:04.670
get string, what is it going to do?

00:57:04.670 --> 00:57:07.640
Well, all of this time since week
one, since you've been using it,

00:57:07.640 --> 00:57:11.970
it is, yes, getting a string and handing
it back to you as a return value.

00:57:11.970 --> 00:57:13.680
But what does that really mean?

00:57:13.680 --> 00:57:18.200
Well, if a string is just an address,
what a function

00:57:18.200 --> 00:57:23.030
like get string returns is not
the string per se, because that's

00:57:23.030 --> 00:57:24.740
kind of a high level concept.

00:57:24.740 --> 00:57:27.050
What get string has
always been doing for us

00:57:27.050 --> 00:57:29.810
is returning the address
of the string, or more

00:57:29.810 --> 00:57:33.410
specifically, the address of the
first character in the string.

00:57:33.410 --> 00:57:39.740
And so what is technically stored in
s, to be clear, is that address, 0x123.

00:57:39.740 --> 00:57:43.400
It's not returning the whole string,
the H, the I, the exclamation point.

00:57:43.400 --> 00:57:46.040
Rather, it's returning
just one value to you.

00:57:46.040 --> 00:57:50.990
It's returning to you only the address
of the first character of that string.

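NOTE A sketch of what returning a string amounts to, using a hypothetical copy_of function as a stand-in for get_string, whose real implementation is not shown here: it places the characters somewhere in memory with malloc and hands back a single value, the address of the first character. Calling it twice on the same text yields two different addresses, much like s and t in this demo. The caller must eventually free the memory.
```c
#include <stdlib.h>
#include <string.h>
// Hypothetical stand-in for get_string: copies text into fresh memory
// and returns only the address of the copy's first character.
char *copy_of(const char *text)
{
    size_t n = strlen(text);
    char *s = malloc(n + 1);  // + 1 for the terminating '\0'
    if (s == NULL)
    {
        return NULL;          // allocation failed
    }
    memcpy(s, text, n + 1);   // copy the characters and the '\0'
    return s;
}
```
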
00:57:50.990 --> 00:57:54.500
But again, this is all
very good for just s.

00:57:54.500 --> 00:57:55.880
What's going on with t?

00:57:55.880 --> 00:57:58.910
t is kind of the same story, because
I'm calling get string again.

00:57:58.910 --> 00:58:02.390
t is going to get assigned the
address of the first character

00:58:02.390 --> 00:58:03.500
of this version of HI.

00:58:03.500 --> 00:58:13.160
And let's just arbitrarily say it's
at 0x456, 0x457, 0x458, and 0x459.

00:58:13.160 --> 00:58:16.873
And at this point, t is going
to take on the value of 0x456.

00:58:16.873 --> 00:58:19.790
And now, at this point, honestly,
we're really getting into the weeds.

00:58:19.790 --> 00:58:21.665
Let's just start
abstracting all of this away

00:58:21.665 --> 00:58:23.870
and use arrows to point at the values.

00:58:23.870 --> 00:58:26.720
And indeed, these arrows
just represent pointers

00:58:26.720 --> 00:58:29.190
when we stop caring about
the particular addresses.

00:58:29.190 --> 00:58:32.300
So s is really just a
pointer, a variable pointing

00:58:32.300 --> 00:58:34.070
at the first character of HI here.

00:58:34.070 --> 00:58:38.490
t is just a variable pointing at
the first character of HI there.

00:58:38.490 --> 00:58:41.540
And so when you are
comparing two strings

00:58:41.540 --> 00:58:45.440
as I was before in the
earlier version of my program,

00:58:45.440 --> 00:58:53.540
where I was checking if s equals equals
t, I was, indeed, comparing s and t.

00:58:53.540 --> 00:58:55.130
What are s and t?

00:58:55.130 --> 00:59:01.640
s and t, respectively,
are 0x123 and 0x456,

00:59:01.640 --> 00:59:03.770
or whatever the actual
values happen to be,

00:59:03.770 --> 00:59:06.320
which are not going to be
the same because they happen

00:59:06.320 --> 00:59:09.920
to point to different chunks of memory.

00:59:09.920 --> 00:59:12.110
All right, well who cares?

00:59:12.110 --> 00:59:14.630
This is all kind of a nice
intellectual exercise.

00:59:14.630 --> 00:59:15.512
But who cares?

00:59:15.512 --> 00:59:16.970
Well, how do we solve this problem?

00:59:16.970 --> 00:59:20.480
Let's consider what I actually
did in a previous demo.

00:59:20.480 --> 00:59:23.955
I sort of preemptively mentioned that
there's this function, string compare,

00:59:23.955 --> 00:59:25.580
that allows you to compare two strings.

00:59:25.580 --> 00:59:28.040
And I promised that we
would eventually explain

00:59:28.040 --> 00:59:31.573
why we use str compare as opposed
to just using the equal equal sign.

00:59:31.573 --> 00:59:33.740
Well, to use this function,
I'm going to need to add

00:59:33.740 --> 00:59:37.910
in string.h up here per last time.

00:59:37.910 --> 00:59:40.790
But if I use string compare on s and t, let
me go ahead and recompile this,

00:59:40.790 --> 00:59:43.160
make compare, dot slash compare.

00:59:43.160 --> 00:59:45.710
Now, let me type HI!
and HI! identically.

00:59:45.710 --> 00:59:47.870
Now, they still seem to be different.

00:59:47.870 --> 00:59:51.680
And dammit, I made the same
stupid mistake as I did last time.

00:59:51.680 --> 00:59:57.170
Does anyone know what mistake I
made when comparing two strings?

00:59:57.170 --> 01:00:00.590
Somehow I seem to be very
good at making this mistake.

01:00:00.590 --> 01:00:03.440
BRIAN: Ibrahim is suggesting
that you add an equal equal zero.

01:00:03.440 --> 01:00:04.398
DAVID MALAN: Thank you.

01:00:04.398 --> 01:00:05.390
Ibrahim is quite right.

01:00:05.390 --> 01:00:08.000
The return value,
recall, of str compare,

01:00:08.000 --> 01:00:13.040
is to return 0 if they're the same,
a negative number if one comes

01:00:13.040 --> 01:00:16.430
before the other, and a positive
number if one comes after the other,

01:00:16.430 --> 01:00:18.600
as in ASCIIbetical order.

01:00:18.600 --> 01:00:21.440
So what I should have done,
both last time and this time,

01:00:21.440 --> 01:00:23.600
is check for equality with 0.

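A small sketch of checking strcmp's return value the way Ibrahim suggested: only the sign matters, 0 meaning the same, negative meaning the first string comes first in ASCII order, positive meaning it comes after. The helper name `order` and its return labels are hypothetical.

```c
#include <string.h>

// Reports how two strings compare in ASCII order, using only the sign
// of strcmp's return value (the exact magnitude is not specified by C).
const char *order(const char *s, const char *t)
{
    int result = strcmp(s, t);
    if (result < 0)
    {
        return "before";
    }
    else if (result > 0)
    {
        return "after";
    }
    return "same";
}
```

For instance, `order("Brian", "David")` yields "before", since 'B' comes before 'D' in ASCII.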
01:00:23.600 --> 01:00:26.220
Let me go ahead and
recompile this program.

01:00:26.220 --> 01:00:27.050
OK, good.

01:00:27.050 --> 01:00:29.090
Now, let me rerun this program with HI!

01:00:29.090 --> 01:00:30.230
twice.

01:00:30.230 --> 01:00:31.940
Voila, they're the same.

01:00:31.940 --> 01:00:34.580
And just to make sure,
let me do one other check.

01:00:34.580 --> 01:00:38.810
Let me do David and Brian, which
should be, indeed, different.

01:00:38.810 --> 01:00:42.050
So now, again, I haven't really done
anything different from that last time.

01:00:42.050 --> 01:00:47.420
But I'm now thinking about these
strings as being fundamentally just

01:00:47.420 --> 01:00:48.173
their addresses.

01:00:48.173 --> 01:00:50.090
And so, now, let's make
this actually germane.

01:00:50.090 --> 01:00:52.160
Let me go ahead and create
a new file altogether.

01:00:52.160 --> 01:00:56.590
And let's, pretty reasonably, try to
copy one string and make changes to it.

01:00:56.590 --> 01:00:57.840
So I'm going to go ahead here.

01:00:57.840 --> 01:01:00.230
And just for convenience, I'm going
to still use the CS50 library,

01:01:00.230 --> 01:01:02.300
not for the string data
type, but just for the

01:01:02.300 --> 01:01:06.200
get string function, which we'll see
is more handy than other things--

01:01:06.200 --> 01:01:07.790
than other ways of doing things.

01:01:07.790 --> 01:01:11.630
And I'm going to go ahead and
include standard io dot h.

01:01:11.630 --> 01:01:17.450
And I'm going to go ahead and
include, how about, string.h.

01:01:17.450 --> 01:01:20.000
Let me go ahead and do int main void.

01:01:20.000 --> 01:01:22.790
And let me go ahead, in this
program, and get myself a string.

01:01:22.790 --> 01:01:24.540
But note, we won't
call it string anymore.

01:01:24.540 --> 01:01:26.030
We'll just call it char star.

01:01:26.030 --> 01:01:28.380
So again, start taking
off that training wheel.

01:01:28.380 --> 01:01:31.312
And I'm going to go ahead
and get a string called s.

01:01:31.312 --> 01:01:33.020
And then I'm going to
get another string.

01:01:33.020 --> 01:01:34.062
But I won't call it that.

01:01:34.062 --> 01:01:36.230
I'll call it char star t.

01:01:36.230 --> 01:01:37.400
And I want to copy s.

01:01:37.400 --> 01:01:40.790
And so you might think, based on week
one, week two, and since, that OK,

01:01:40.790 --> 01:01:42.890
if you want to copy a
variable, just do it.

01:01:42.890 --> 01:01:44.690
I mean, we've used the
assignment operator

01:01:44.690 --> 01:01:48.530
to copy a variable from right
to left for integers, for chars,

01:01:48.530 --> 01:01:50.600
and for other data types, perhaps, too.

01:01:50.600 --> 01:01:54.690
I'm going to go ahead, now, and make
a change to the copied string.

01:01:54.690 --> 01:01:56.270
So let me go ahead and do this.

01:01:56.270 --> 01:02:01.280
Let me go ahead and say, let's
change the first character of t

01:02:01.280 --> 01:02:02.780
to be uppercase.

01:02:02.780 --> 01:02:04.940
Recall that there's
this function, to upper,

01:02:04.940 --> 01:02:09.170
which takes, as input, a character,
like the first character in t,

01:02:09.170 --> 01:02:11.120
and returns the uppercase version.

01:02:11.120 --> 01:02:14.240
Now, to use to upper, I
need another header file,

01:02:14.240 --> 01:02:17.990
which, as I recall from a couple of
weeks ago now, is ctype.h.

01:02:17.990 --> 01:02:20.750
So let me preemptively go
back and put that there.

01:02:20.750 --> 01:02:23.280
And now, let me go ahead
and print these two strings.

01:02:23.280 --> 01:02:27.500
Let me go ahead and print out
s as being this percent s.

01:02:27.500 --> 01:02:33.990
And let me go ahead and print out the
value of t with percent s as follows.

01:02:33.990 --> 01:02:36.680
So again, what I'm doing is I'm
getting a string from the user.

01:02:36.680 --> 01:02:40.490
And the only new thing here is char star
today, which is synonymous with string.

01:02:40.490 --> 01:02:44.270
On line 10 here, I'm copying
the string from right to left.

01:02:44.270 --> 01:02:47.330
And then I'm capitalizing
only the first letter

01:02:47.330 --> 01:02:49.640
in the copy, otherwise known as t.

01:02:49.640 --> 01:02:51.140
And then I'm just printing both out.

01:02:51.140 --> 01:02:54.290
So let me go ahead and
make copy, compiles OK.

01:02:54.290 --> 01:02:56.510
Make cop-- dot slash copy.

01:02:56.510 --> 01:03:00.020
Let me go ahead and type in hi!
in lowercase, all lowercase,

01:03:00.020 --> 01:03:00.920
and then enter.

01:03:00.920 --> 01:03:03.830
And voila, huh.

01:03:03.830 --> 01:03:10.760
It would seem that I somehow capitalized
both s and t, even though I only

01:03:10.760 --> 01:03:17.080
called to upper on t.
Brian, any thoughts

01:03:17.080 --> 01:03:24.820
from the group on why I've accidentally
and erroneously capitalized

01:03:24.820 --> 01:03:26.260
both somehow?

01:03:26.260 --> 01:03:29.735
BRIAN: A couple of people are
saying that t is just an alias of s.

01:03:29.735 --> 01:03:32.860
DAVID MALAN: Just an alias of s, that's
a reasonable way of thinking of it,

01:03:32.860 --> 01:03:33.360
sure.

01:03:33.360 --> 01:03:38.320
And more precisely, any other thoughts
on why this is incorrect somehow?

01:03:38.320 --> 01:03:41.540
BRIAN: Peter is now suggesting
that they have the same address.

01:03:41.540 --> 01:03:45.880
DAVID MALAN: So yeah, more specifically,
all I've done is copy s into t.

01:03:45.880 --> 01:03:48.040
But again, what is s as of today?

01:03:48.040 --> 01:03:49.390
It's just an address.

01:03:49.390 --> 01:03:51.040
So yes, I have copied s.

01:03:51.040 --> 01:03:54.820
But I've copied it literally, which
means copying its address, 0x123,

01:03:54.820 --> 01:03:55.820
or whatever it is.

01:03:55.820 --> 01:04:01.180
And then on line 12, notice that
I'm changing t by uppercasing it.

01:04:01.180 --> 01:04:04.130
But t is at the same address as s.

01:04:04.130 --> 01:04:08.130
So really, I'm changing
one and the same string.

01:04:08.130 --> 01:04:10.630
So if we think about this in
terms of the computer's memory,

01:04:10.630 --> 01:04:12.088
let's consider what I've just done.

01:04:12.088 --> 01:04:13.570
Let me clear the computer's memory.

01:04:13.570 --> 01:04:15.290
Let me put s down as before.

01:04:15.290 --> 01:04:18.250
Let me put hi! down as before,
but all lowercase this time.

01:04:18.250 --> 01:04:23.320
And recall that it might be at
addresses 0x123, 0x124, 0x125, and 0x126.

01:04:23.320 --> 01:04:26.350
And now, if we consider
that s technically

01:04:26.350 --> 01:04:29.740
contains the address of
that first character, 0x123,

01:04:29.740 --> 01:04:34.960
and I proceed to create a new variable,
t, and assign t the value of s,

01:04:34.960 --> 01:04:36.970
I've got to take that statement literally.

01:04:36.970 --> 01:04:39.670
I'm literally just putting 0x123 here.

01:04:39.670 --> 01:04:41.770
And if we now abstract
away these details just

01:04:41.770 --> 01:04:44.020
to make it more clear
visually what's going on,

01:04:44.020 --> 01:04:48.070
that's pretty much like
saying that both s and t point

01:04:48.070 --> 01:04:49.750
to the same location in memory.

01:04:49.750 --> 01:04:52.297
So yes, in that sense, t
is just an alias for s,

01:04:52.297 --> 01:04:54.130
which is a reasonable
way of thinking of it.

01:04:54.130 --> 01:04:56.920
But really, t is just identical to s.

01:04:56.920 --> 01:04:59.110
So when you use the
square bracket notation

01:04:59.110 --> 01:05:02.290
to go to the first character
of t, you are equivalently

01:05:02.290 --> 01:05:04.750
going to the first character in s.

01:05:04.750 --> 01:05:06.200
They are one and the same.

01:05:06.200 --> 01:05:10.390
So when I call to upper, I'm calling it
on this character, which of course, is

01:05:10.390 --> 01:05:12.970
the one and only h in the story.

01:05:12.970 --> 01:05:16.240
And when I print s and
I print t, printf is

01:05:16.240 --> 01:05:18.610
following those same
breadcrumbs, if you will,

01:05:18.610 --> 01:05:24.070
and ultimately displaying the
same value as having changed.

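To make the aliasing concrete, here is a minimal sketch, assuming a writable array rather than the lecture's get_string input (modifying a string literal is undefined behavior in C): assigning one `char *` to another copies only the address, so a change made through t is visible through s. The function name `capitalize_via_alias` is hypothetical.

```c
#include <ctype.h>

// Mirrors `char *t = s;` from the lecture: only the address is copied,
// so t and s refer to the very same characters in memory.
void capitalize_via_alias(char *s)
{
    char *t = s;                            // copy the address, not the chars
    t[0] = toupper((unsigned char) t[0]);   // this also changes s[0]
}
```

Calling it on `char s[] = "hi!";` leaves s holding "Hi!", even though the code only ever uppercased through t.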
01:05:24.070 --> 01:05:27.220
So we would seem to need
to fundamentally rethink

01:05:27.220 --> 01:05:28.990
how we are copying strings.

01:05:28.990 --> 01:05:34.300
And let me ask, if this is the wrong way
to copy one string into the other, what

01:05:34.300 --> 01:05:35.350
is the right way?

01:05:35.350 --> 01:05:39.340
Even if you don't have the functions
in mind or the right vocabulary,

01:05:39.340 --> 01:05:43.750
just intuitively, if we want to copy
a string in the way that a human would

01:05:43.750 --> 01:05:50.020
think of copying one into the other,
like a photograph or a photocopy,

01:05:50.020 --> 01:05:52.610
how do we want to do this?

01:05:52.610 --> 01:05:54.460
Any thoughts, Brian?

01:05:54.460 --> 01:05:57.430
BRIAN: Yeah, Sophia suggested we
would want to somehow loop over

01:05:57.430 --> 01:05:59.948
the elements in s and put them into t.

01:05:59.948 --> 01:06:01.240
DAVID MALAN: Yeah, I like that.

01:06:01.240 --> 01:06:04.120
So loop over the elements
of s and put them into t.

01:06:04.120 --> 01:06:05.800
So it sounds like more work.

01:06:05.800 --> 01:06:07.660
But that's, again, what
we're going to have

01:06:07.660 --> 01:06:09.582
to do if we want to think of these--

01:06:09.582 --> 01:06:12.790
if we want to accept the fact that these
things, s and t, are just addresses,

01:06:12.790 --> 01:06:15.550
we're going to now have to go
and follow those breadcrumbs.

01:06:15.550 --> 01:06:18.790
So let's go ahead and consider
a variant of this program.

01:06:18.790 --> 01:06:24.520
Let me go ahead, here, and change this
such that I'm still getting a string s.

01:06:24.520 --> 01:06:28.390
But now, let me go ahead
and propose exactly that,

01:06:28.390 --> 01:06:30.340
that we copy the individual characters.

01:06:30.340 --> 01:06:32.320
But I need to copy them somewhere.

01:06:32.320 --> 01:06:35.200
So I feel like another step in
this process of copying a string

01:06:35.200 --> 01:06:37.750
has to be to give myself
some additional memory.

01:06:37.750 --> 01:06:40.840
If I have H i exclamation
point and a nul character,

01:06:40.840 --> 01:06:43.150
I need to, now, somehow take
control of this situation

01:06:43.150 --> 01:06:48.320
and tell the computer somehow, in
code, give me four more bytes of memory

01:06:48.320 --> 01:06:53.390
so that I have a location for t in
which to copy those characters.

01:06:53.390 --> 01:06:55.360
So here's a new function today.

01:06:55.360 --> 01:06:59.470
If I want to create a string t,
otherwise known today as a char star,

01:06:59.470 --> 01:07:02.680
there is a new function we
can use called malloc, which

01:07:02.680 --> 01:07:04.720
represents memory allocation.

01:07:04.720 --> 01:07:08.200
This is a pretty fancy function that,
fortunately, is pretty simple to use.

01:07:08.200 --> 01:07:10.390
It takes, as input, just a number.

01:07:10.390 --> 01:07:14.480
How many bytes of memory do you
want to ask the computer for?

01:07:14.480 --> 01:07:16.000
So how do I do this?

01:07:16.000 --> 01:07:20.110
Well, H i exclamation point backslash
0, I could literally just say four.

01:07:20.110 --> 01:07:21.850
But this doesn't feel very dynamic.

01:07:21.850 --> 01:07:26.410
I think I can programmatically
implement this a little more elegantly.

01:07:26.410 --> 01:07:30.370
Let me go ahead and say,
give me as many bytes

01:07:30.370 --> 01:07:35.200
as there are characters in s plus 1.

01:07:35.200 --> 01:07:37.090
Plus 1, why am I doing this?

01:07:37.090 --> 01:07:40.773
Well, H i exclamation point nul
character, that's technically

01:07:40.773 --> 01:07:42.190
what's stored underneath the hood.

01:07:42.190 --> 01:07:45.250
But what do you and I think
of the length of Hi! as being?

01:07:45.250 --> 01:07:48.070
Well, odds are, in the human
world, it's H i exclamation point.

01:07:48.070 --> 01:07:50.710
And who cares about this low
level detail, this nul terminator.

01:07:50.710 --> 01:07:53.800
You don't include that in the length
of an English word or any word.

01:07:53.800 --> 01:07:56.290
You only think of the actual
characters you can see.

01:07:56.290 --> 01:08:00.580
So the length of H, i,
exclamation point is 3.

01:08:00.580 --> 01:08:08.110
But I do need to cleverly add one more
byte, a fourth, for the nul character,

01:08:08.110 --> 01:08:10.580
because I'm going to have
to copy that over as well.

01:08:10.580 --> 01:08:13.270
Otherwise, if I don't have
an identical nul character,

01:08:13.270 --> 01:08:15.830
t is not going to have
an obvious ending.

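The size calculation just described, as a sketch: ask malloc for one byte per visible character plus one for the '\0'. The helper names `bytes_needed` and `allocate_copy_space` are hypothetical, and the caller is assumed to check for NULL and eventually call free.

```c
#include <stdlib.h>
#include <string.h>

// One byte per character strlen counts, plus one for the '\0' terminator.
size_t bytes_needed(const char *s)
{
    return strlen(s) + 1;
}

// Asks malloc for exactly enough room to hold a copy of s.
// Returns NULL if no memory is available.
char *allocate_copy_space(const char *s)
{
    return malloc(bytes_needed(s));
}
```

So `bytes_needed("hi!")` is 4, and even the empty string needs 1 byte.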
01:08:15.830 --> 01:08:17.872
So how do I copy, now,
one string into the other?

01:08:17.872 --> 01:08:20.538
Well, let me go ahead and take
out our old friend, the for loop,

01:08:20.538 --> 01:08:21.380
from week one.

01:08:21.380 --> 01:08:24.050
And say, for i equals 0--

01:08:24.050 --> 01:08:26.810
how about, actually, n
equals string length of s.

01:08:26.810 --> 01:08:28.279
We've done this trick before.

01:08:28.279 --> 01:08:33.080
i is less than n, i++.

01:08:33.080 --> 01:08:38.689
Let me go ahead and, quite simply,
say t bracket i gets s bracket i.

01:08:38.689 --> 01:08:43.939
So this will literally copy, from s,
each of the characters one at a time

01:08:43.939 --> 01:08:45.020
into t.

01:08:45.020 --> 01:08:46.640
But I need to be a little smarter now.

01:08:46.640 --> 01:08:49.130
Even though we almost
always do i less than n,

01:08:49.130 --> 01:08:55.660
I'm actually going to very aggressively
say i less than or equal to n.

01:08:55.660 --> 01:08:56.830
Why?

01:08:56.830 --> 01:09:00.250
Why am I going one step
further than I feel we normally

01:09:00.250 --> 01:09:03.310
do when iterating over strings,
and one step further than you

01:09:03.310 --> 01:09:07.149
probably did when iterating
over a Caesar cipher or a string

01:09:07.149 --> 01:09:09.130
in that context?

01:09:09.130 --> 01:09:10.939
Brian, any thoughts here?

01:09:10.939 --> 01:09:16.569
Why am I going from i less than or equal
to n kind of for the first time here?

01:09:16.569 --> 01:09:19.779
BRIAN: Celina is suggesting that we
need to include the nul character.

01:09:19.779 --> 01:09:22.843
DAVID MALAN: Yeah, so if I-- and
now I understand how strings work.

01:09:22.843 --> 01:09:25.510
So it's not sufficient to just
copy the H, i, exclamation point.

01:09:25.510 --> 01:09:29.020
I need to go one step further, one
more than the length of the string.

01:09:29.020 --> 01:09:32.290
And the easiest way to do that
would be less than or equal to n.

01:09:32.290 --> 01:09:34.450
Or I could just do a plus 1 there.

01:09:34.450 --> 01:09:35.950
Or I can do this any number of ways.

01:09:35.950 --> 01:09:37.399
Doesn't matter how you do it.

01:09:37.399 --> 01:09:40.899
But I think a less than or equal
to is one reasonable way to do it.

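A sketch of Sophia's loop with the less-than-or-equal bound explained above, assuming the destination already has enough room (for instance, from the malloc call just described). It uses `size_t`, the type `strlen` returns, instead of the lecture's `int`; the name `copy_chars` is hypothetical.

```c
#include <string.h>

// Copies s into t one character at a time. Looping while i <= n
// (rather than i < n) also copies s[n], the terminating '\0'.
// Assumes t points to at least strlen(s) + 1 bytes of writable memory.
void copy_chars(char *t, const char *s)
{
    for (size_t i = 0, n = strlen(s); i <= n; i++)
    {
        t[i] = s[i];
    }
}
```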
01:09:40.899 --> 01:09:43.540
And now, let's go down to the
bottom here and now actually

01:09:43.540 --> 01:09:44.590
do this capitalization.

01:09:44.590 --> 01:09:47.710
Let's now change the
first character in t

01:09:47.710 --> 01:09:52.750
to be the result of calling to
upper on the first character of t.

01:09:52.750 --> 01:09:56.770
And then, as before, let's go
ahead and print out whatever s is.

01:09:56.770 --> 01:09:59.080
And like before, let's
go ahead and print out

01:09:59.080 --> 01:10:05.110
whatever t is and hope now that
only t has been capitalized.

01:10:05.110 --> 01:10:07.330
But I do need to make one change now.

01:10:07.330 --> 01:10:10.690
It turns out that this
function, malloc, comes

01:10:10.690 --> 01:10:12.897
in a file called standard lib dot h.

01:10:12.897 --> 01:10:15.730
And again, this is the kind of thing
that you can jot down in notes.

01:10:15.730 --> 01:10:17.563
You can always Google
these kinds of things.

01:10:17.563 --> 01:10:20.740
Even I forget what header files these
functions are sometimes declared in.

01:10:20.740 --> 01:10:24.310
But it happens to be a new one
called standard lib for library

01:10:24.310 --> 01:10:26.110
that gives you access to malloc.

01:10:26.110 --> 01:10:29.800
So let me go ahead,
now, and make compare.

01:10:29.800 --> 01:10:31.210
All right, so far so good.

01:10:31.210 --> 01:10:34.360
Dot slash compare-- sorry,
this is not compare.

01:10:34.360 --> 01:10:35.680
The old program works fine.

01:10:35.680 --> 01:10:38.630
Make copy-- oh my god, seven mistakes.

01:10:38.630 --> 01:10:40.460
What'd I do wrong here?

01:10:40.460 --> 01:10:44.560
Oh, it looks like I forgot
the type of i and n.

01:10:44.560 --> 01:10:47.440
So let me go into my for
loop and add the int.

01:10:47.440 --> 01:10:49.870
That was my fault. Let
me make copy again.

01:10:49.870 --> 01:10:51.910
OK, all seven errors,
thankfully, went away.

01:10:51.910 --> 01:10:56.710
Dot slash copy, let's go ahead and type
in hi! in lowercase and hit Enter.

01:10:56.710 --> 01:11:02.860
And voila, now I have capitalized
only the copy of s, a.k.a.

01:11:02.860 --> 01:11:03.580
t.

01:11:03.580 --> 01:11:06.010
And just to be clear, I've
kind of regressed back

01:11:06.010 --> 01:11:09.140
to my square bracket notation, honestly,
because it's perfectly acceptable.

01:11:09.140 --> 01:11:10.360
It's very readable.

01:11:10.360 --> 01:11:12.640
But notice, if I really
want to show off,

01:11:12.640 --> 01:11:19.190
I could say something like,
well, go to the location t plus i.

01:11:19.190 --> 01:11:23.078
And then do this, which again, I don't
necessarily recommend for readability.

01:11:23.078 --> 01:11:24.620
But again, there is this equivalence.

01:11:24.620 --> 01:11:28.640
The square bracket notation is the
same thing as pointer arithmetic.

01:11:28.640 --> 01:11:34.160
So if you want to go to the address at
t plus whatever i is to offset yourself

01:11:34.160 --> 01:11:36.570
one or more bytes, you
can totally do that.

01:11:36.570 --> 01:11:39.920
And if I want to be fancy,
I can go down here and say,

01:11:39.920 --> 01:11:45.350
go to the first character
in t and capitalize it.

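The pointer-arithmetic spelling of the same copy, sketched under the same assumptions: `*(t + i)` and `t[i]` mean the same thing, and `*t` is the first character, `t[0]`. As the lecture notes, the bracket form is usually more readable; the name `copy_and_capitalize` is hypothetical.

```c
#include <ctype.h>
#include <string.h>

// The same copy loop, written with pointer arithmetic instead of brackets:
// *(t + i) means "go to address t, move i chars over", exactly like t[i].
// Assumes t has room for strlen(s) + 1 bytes.
void copy_and_capitalize(char *t, const char *s)
{
    for (size_t i = 0, n = strlen(s); i <= n; i++)
    {
        *(t + i) = *(s + i);
    }
    *t = toupper((unsigned char) *t);  // same as t[0] = toupper(t[0])
}
```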
01:11:45.350 --> 01:11:48.170
But again, I would argue that even
though, yes, you're very clever

01:11:48.170 --> 01:11:50.420
and you understand pointers
and addresses at this point

01:11:50.420 --> 01:11:51.795
if you're writing code like this.

01:11:51.795 --> 01:11:53.990
Honestly, it's not
necessarily as readable.

01:11:53.990 --> 01:11:57.800
So sticking with week two syntax of
the square bracket notation, totally

01:11:57.800 --> 01:12:03.110
reasonable, totally correct, totally
well-designed, and perhaps preferable,

01:12:03.110 --> 01:12:04.890
though I should be careful here.

01:12:04.890 --> 01:12:07.550
This line of code is a
little bit risky for me

01:12:07.550 --> 01:12:10.310
because what if the user just
hits Enter and they don't type hi

01:12:10.310 --> 01:12:11.540
or David or Brian.

01:12:11.540 --> 01:12:13.580
What if they type nothing except Enter?

01:12:13.580 --> 01:12:16.130
In that case, the length
of the string might be 0.

01:12:16.130 --> 01:12:19.220
And then I probably shouldn't be
capitalizing the first character

01:12:19.220 --> 01:12:22.230
in a string that doesn't
really even exist.

01:12:22.230 --> 01:12:25.250
So I should probably
have some error checking,

01:12:25.250 --> 01:12:32.450
like if, for instance, the string
length of t is at least greater than 0,

01:12:32.450 --> 01:12:34.960
then go ahead and safely do that.

01:12:34.960 --> 01:12:37.550
But again, this is just one
example of some additional error

01:12:37.550 --> 01:12:39.200
checking I can add to the program.

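A sketch of the empty-string check described above; the name `capitalize_first` is hypothetical. For an empty string, `t[0]` is the '\0' terminator, which `toupper` leaves unchanged, so the check mostly makes the intent explicit.

```c
#include <ctype.h>
#include <string.h>

// Uppercases the first character only if the string is non-empty,
// mirroring the "string length of t is greater than 0" check.
void capitalize_first(char *t)
{
    if (strlen(t) > 0)
    {
        t[0] = toupper((unsigned char) t[0]);
    }
}
```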
01:12:39.200 --> 01:12:41.300
There's actually one more
piece of error checking

01:12:41.300 --> 01:12:43.520
I should really do in a
fully correct program,

01:12:43.520 --> 01:12:45.170
as you should do in problem sets.

01:12:45.170 --> 01:12:47.010
Sometimes things can go wrong.

01:12:47.010 --> 01:12:50.270
And if your program is so big,
so fancy, and so memory-hungry

01:12:50.270 --> 01:12:52.187
that you're mallocing
lots and lots of memory,

01:12:52.187 --> 01:12:54.062
which you won't do in
a program this small,

01:12:54.062 --> 01:12:56.270
but over time you might
need more and more memory,

01:12:56.270 --> 01:13:01.490
we should also make sure that
t actually has a valid address.

01:13:01.490 --> 01:13:04.670
It turns out that
malloc, most of the time,

01:13:04.670 --> 01:13:08.090
is going to return to you the
address of a chunk of memory

01:13:08.090 --> 01:13:09.470
it has allocated for you.

01:13:09.470 --> 01:13:11.300
Just like get string,
it will return to you

01:13:11.300 --> 01:13:14.900
the address of the first
byte of the chunk of memory

01:13:14.900 --> 01:13:16.820
that it has found space for.

01:13:16.820 --> 01:13:18.740
However, sometimes things can go wrong.

01:13:18.740 --> 01:13:20.630
Sometimes your computer
can be out of memory.

01:13:20.630 --> 01:13:24.320
You've probably seen your Mac or
PC freeze or hang or reboot itself.

01:13:24.320 --> 01:13:26.910
That is very often the
result of memory errors.

01:13:26.910 --> 01:13:29.000
So we should actually
check something like this.

01:13:29.000 --> 01:13:32.570
If t equals equals
this special value NULL,

01:13:32.570 --> 01:13:35.360
then I'm going to go ahead and
just bail out and return one,

01:13:35.360 --> 01:13:37.280
quit, let's get out of the program.

01:13:37.280 --> 01:13:38.760
It's not going to work.

01:13:38.760 --> 01:13:41.610
This might only happen one
out of a million times.

01:13:41.610 --> 01:13:44.220
But it's more correct to check for NULL.

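A minimal sketch of the NULL check: if malloc cannot find memory it returns NULL, and the code bails out instead of writing through that pointer. Here a hypothetical helper, `copy_or_null`, returns NULL to its caller rather than returning 1 from main as in the lecture; the caller must eventually free a non-NULL result.

```c
#include <stdlib.h>
#include <string.h>

// Returns a newly malloc'd copy of s, or NULL if malloc fails.
// The caller is responsible for calling free on a non-NULL result.
char *copy_or_null(const char *s)
{
    size_t n = strlen(s);
    char *t = malloc(n + 1);
    if (t == NULL)
    {
        // Out of memory: report failure instead of writing through NULL.
        return NULL;
    }
    for (size_t i = 0; i <= n; i++)
    {
        t[i] = s[i];
    }
    return t;
}
```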
01:13:44.220 --> 01:13:48.350
Now, unfortunately, the designers
of C kind of used-- or programmers

01:13:48.350 --> 01:13:53.210
more generally, use this word,
which is almost the same as N-U-L,

01:13:53.210 --> 01:13:54.980
otherwise known as backslash 0.

01:13:54.980 --> 01:13:57.290
Unfortunately, this
is a different value.

01:13:57.290 --> 01:14:01.370
N-U-L-L represents a null pointer.

01:14:01.370 --> 01:14:02.870
It is a bogus address.

01:14:02.870 --> 01:14:04.580
It is the absence of an address.

01:14:04.580 --> 01:14:06.950
Technically, it's address 0.

01:14:06.950 --> 01:14:09.230
It is different from backslash 0.

01:14:09.230 --> 01:14:14.000
You use N-U-L-L in the context of
pointers, as we are doing today.

01:14:14.000 --> 01:14:17.390
You use backslash 0,
otherwise known verbally,

01:14:17.390 --> 01:14:21.210
as an N-U-L, or nul, in
the context of characters.

01:14:21.210 --> 01:14:23.810
So backslash 0 is for characters.

01:14:23.810 --> 01:14:26.750
N-U-L-L in all caps is for pointers.

01:14:26.750 --> 01:14:29.750
And it's just a new symbol
we're introducing today

01:14:29.750 --> 01:14:34.520
that comes with this
standard lib dot h file.

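To keep the two similarly named values apart, a sketch in which both appear: '\0' (NUL) is a char that ends a string, while NULL is a pointer value meaning no address at all. NULL is also defined in `<stddef.h>`, which is what this block includes; the function name `count_until_nul` is hypothetical.

```c
#include <stddef.h>

// '\0' (NUL) is a char: the byte that ends every string.
// NULL is a pointer: the absence of an address.
size_t count_until_nul(const char *s)
{
    if (s == NULL)  // no string at all, so nothing to count
    {
        return 0;
    }
    size_t n = 0;
    while (s[n] != '\0')  // stop at the terminating NUL byte
    {
        n++;
    }
    return n;
}
```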
01:14:34.520 --> 01:14:38.190
All right, so it turns out, honestly,
I don't need to do some of this work.

01:14:38.190 --> 01:14:41.610
It turns out that if I want
to copy one string to another,

01:14:41.610 --> 01:14:43.170
there is a function for that.

01:14:43.170 --> 01:14:45.920
And increasingly, you will not
have to write as many lines of code

01:14:45.920 --> 01:14:49.520
as you previously did, because if
you look up in the manual pages

01:14:49.520 --> 01:14:52.730
or you've heard about or find online
that there's another function, like one

01:14:52.730 --> 01:14:56.790
called strcpy, you can actually,
more simply, do something like this.

01:14:56.790 --> 01:15:00.410
So even though I really liked the idea,
and it was correct to use a for loop

01:15:00.410 --> 01:15:04.950
to copy all of the characters from s
into t, there's a function for that.

01:15:04.950 --> 01:15:06.200
It's called strcpy.

01:15:06.200 --> 01:15:09.830
It takes two arguments, the
destination followed by the source.

01:15:09.830 --> 01:15:12.200
And it will just handle
all of the looping

01:15:12.200 --> 01:15:15.890
for us, all of the copying for
us, including the backslash 0,

01:15:15.890 --> 01:15:18.830
so that I can focus on what I
want to do, which in this case,

01:15:18.830 --> 01:15:21.300
is actually capitalize things.

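With strcpy doing the looping (destination first, then source, including the '\0'), the copy shrinks to a few lines. A sketch, with the hypothetical name `capitalized_copy`; it still allocates with malloc, checks for NULL, and leaves freeing to the caller.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

// Allocates space, lets strcpy copy the characters (including '\0'),
// then capitalizes only the copy. Returns NULL if malloc fails.
char *capitalized_copy(const char *s)
{
    char *t = malloc(strlen(s) + 1);
    if (t == NULL)
    {
        return NULL;
    }
    strcpy(t, s);  // destination first, then source
    t[0] = toupper((unsigned char) t[0]);
    return t;
}
```

The original s stays "hi!" while the returned copy reads "Hi!".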
01:15:21.300 --> 01:15:26.497
So if we consider, now, this example,
in the context of my computer's memory,

01:15:26.497 --> 01:15:28.580
we'll see that it's laid
out a little differently.

01:15:28.580 --> 01:15:31.050
But there's one more bug
I do want to fix first.

01:15:31.050 --> 01:15:33.230
And this is something
we've not had to do yet.

01:15:33.230 --> 01:15:37.850
It turns out that any time you
allocate memory with malloc,

01:15:37.850 --> 01:15:41.330
you ask the computer for memory,
the onus is on you, the programmer,

01:15:41.330 --> 01:15:43.160
to eventually give it back.

01:15:43.160 --> 01:15:46.070
And by that, I mean if
you allocate four bytes,

01:15:46.070 --> 01:15:49.430
or who knows, four million bytes of
memory for an even bigger program,

01:15:49.430 --> 01:15:52.160
you'd better give it back to
the computer, more specifically,

01:15:52.160 --> 01:15:55.252
the operating system, be it
Linux or Mac OS or Windows,

01:15:55.252 --> 01:15:57.710
so that your computer eventually
doesn't run out of memory.

01:15:57.710 --> 01:16:00.418
If all you ever do is ask for
more memory, ask for more memory,

01:16:00.418 --> 01:16:03.710
it stands to reason that eventually your
computer will run out, because it only

01:16:03.710 --> 01:16:05.370
has a finite amount of memory.

01:16:05.370 --> 01:16:07.910
It's got a finite amount
of hardware, recall.

01:16:07.910 --> 01:16:11.780
So when you're done with memory,
it should be your best practice

01:16:11.780 --> 01:16:14.970
to free it afterward as well.

01:16:14.970 --> 01:16:18.950
And the opposite of malloc is just
a function called free, which takes,

01:16:18.950 --> 01:16:22.040
as its input, whatever
the output of malloc was.

01:16:22.040 --> 01:16:25.070
And recall that the output of
malloc, the return value of malloc,

01:16:25.070 --> 01:16:30.210
is just the address of the first byte
of memory that it has allocated for you.

01:16:30.210 --> 01:16:34.010
So if you ask it for four bytes, like
I did a few lines ago with malloc,

01:16:34.010 --> 01:16:37.100
you're going to get back the
address of the first of those bytes.

01:16:37.100 --> 01:16:41.150
And it's up to you to remember
how many bytes you asked for.

01:16:41.150 --> 01:16:43.760
In the case of free,
all you have to do is

01:16:43.760 --> 01:16:49.820
tell free via its input what the
address was that malloc gave you.

01:16:49.820 --> 01:16:53.210
So if you stored that address as
I did, in this variable called t,

01:16:53.210 --> 01:16:58.190
it suffices when you're done with
that memory to just call free on t.

01:16:58.190 --> 01:17:02.360
And the computer will go about
freeing up that memory for you.

01:17:02.360 --> 01:17:04.880
And you might very well
get it back later on.

01:17:04.880 --> 01:17:07.400
But at least your computer
won't run out of memory

01:17:07.400 --> 01:17:13.490
as quickly, because it can now
reuse that space for something else.

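A sketch of pairing malloc with free: the argument to free is exactly the address malloc returned, and the memory must not be used after it has been freed. The name `print_copy` and its return convention (characters printed, or -1 on failure) are hypothetical.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

// Makes a temporary heap copy of s, prints it, then hands the memory
// back with free. Returns printf's character count, or -1 if malloc failed.
int print_copy(const char *s)
{
    char *t = malloc(strlen(s) + 1);
    if (t == NULL)
    {
        return -1;
    }
    strcpy(t, s);
    int printed = printf("%s\n", t);
    free(t);  // same address malloc returned; t must not be used after this
    return printed;
}
```

Here `print_copy("hi!")` prints hi! and returns 4, counting the newline.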
01:17:13.490 --> 01:17:15.410
All right, let me go
ahead, then, and propose

01:17:15.410 --> 01:17:17.870
that we draw a picture of this--

01:17:17.870 --> 01:17:20.942
this new program's memory,
where we copy things.

01:17:20.942 --> 01:17:23.900
So recall, this is where we left off
before when comparing two strings.

01:17:23.900 --> 01:17:29.010
If this was s and s was pointing to
h, i, exclamation point in lowercase,

01:17:29.010 --> 01:17:32.510
this new version of my
code in copy.c, notice,

01:17:32.510 --> 01:17:34.550
still gives me another pointer called t.

01:17:34.550 --> 01:17:36.530
So that part of the
story hasn't changed.

01:17:36.530 --> 01:17:37.970
But I call malloc now.

01:17:37.970 --> 01:17:40.790
And malloc is going to return
to me some new chunk of memory.

01:17:40.790 --> 01:17:42.440
I don't know in advance where it is.

01:17:42.440 --> 01:17:45.740
But malloc's return value
is going to be the address

01:17:45.740 --> 01:17:47.920
of the first byte of that memory.

01:17:47.920 --> 01:17:51.050
So for instance, 0x456
or whatever it is.

01:17:51.050 --> 01:17:54.230
And the subsequent bytes are
going to be increasing by one

01:17:54.230 --> 01:17:59.630
byte at a time, 0x457, 0x458, 0x459.

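To see the consecutive addresses from the picture, a sketch that prints the address of each byte of a malloc'd buffer with `%p`. The actual values differ from run to run and machine to machine, so 0x456 and its neighbors are only illustrative; the name `show_addresses` is hypothetical.

```c
#include <stdio.h>
#include <stdlib.h>

// Allocates n bytes and prints the address of each one with %p.
// Consecutive chars sit one byte apart, so &t[i] is t + i.
// Returns 0 on success, 1 if malloc failed.
int show_addresses(size_t n)
{
    char *t = malloc(n);
    if (t == NULL)
    {
        return 1;
    }
    for (size_t i = 0; i < n; i++)
    {
        printf("%p\n", (void *) &t[i]);
    }
    free(t);
    return 0;
}
```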
01:17:59.630 --> 01:18:03.800
So what is, ultimately, stored in t when
I assign it the return value of malloc?

01:18:03.800 --> 01:18:05.570
It's whatever that address is.

01:18:05.570 --> 01:18:07.980
Again, I could technically
write 0x456 up here.

01:18:07.980 --> 01:18:09.800
But again, we're kind of past that.

01:18:09.800 --> 01:18:10.970
That's very 30 minutes ago.

01:18:10.970 --> 01:18:14.300
Let's now focus on just the
abstraction that is a pointer.

01:18:14.300 --> 01:18:17.690
A pointer is just an arrow
pointing from the variable

01:18:17.690 --> 01:18:19.980
to the actual location in memory.

01:18:19.980 --> 01:18:26.720
So now, if I go about copying s into
t using strcpy, or more manually,

01:18:26.720 --> 01:18:28.670
using my for loop, what happens?

01:18:28.670 --> 01:18:31.610
Well, I'm copying the
h over from s into t.

01:18:31.610 --> 01:18:36.110
I'm copying the i over from s into t,
the exclamation point from s into t.

01:18:36.110 --> 01:18:40.530
And then lastly, the terminating
nul character from s into t.

01:18:40.530 --> 01:18:42.740
So the picture is now
fundamentally different.

01:18:42.740 --> 01:18:45.020
t is not pointing at the same thing.

01:18:45.020 --> 01:18:50.570
It's pointing at its own chunk of
memory that has now, one step at a time,

01:18:50.570 --> 01:18:56.210
been duplicating whatever
was at the address s.

01:18:56.210 --> 01:18:59.600
And so this is what you and I as
humans would consider, presumably,

01:18:59.600 --> 01:19:04.080
to be a proper copy of the string.

01:19:04.080 --> 01:19:09.660
Any questions, then, on what we've just
done by introducing malloc and free?

01:19:09.660 --> 01:19:11.910
The first of which allocates
memory and gives you

01:19:11.910 --> 01:19:15.750
the address of the first byte
of memory that you can now use,

01:19:15.750 --> 01:19:19.260
the latter of which hands it back
to your operating system and says,

01:19:19.260 --> 01:19:20.700
I'm done with this.

01:19:20.700 --> 01:19:24.360
It can now be reused for something
else, some other variable,

01:19:24.360 --> 01:19:27.090
maybe, down the road, if
our program were longer.

01:19:27.090 --> 01:19:31.530
Brian, any questions or
confusion I can help with?

01:19:31.530 --> 01:19:33.870
BRIAN: Someone asked, even
if you're using strcpy

01:19:33.870 --> 01:19:37.470
to copy the string instead of copying
the characters one at a time yourself,

01:19:37.470 --> 01:19:39.420
do you still need to free the memory?

01:19:39.420 --> 01:19:40.545
DAVID MALAN: Good question.

01:19:40.545 --> 01:19:43.320
Even if you're using strcpy,
you do need to still use free.

01:19:43.320 --> 01:19:48.120
Yes, anytime you use malloc
henceforth, you must use free.

01:19:48.120 --> 01:19:52.470
Anytime you use malloc, you must use
free in order to free up that memory.

01:19:52.470 --> 01:19:56.370
strcpy is copying the contents of
one chunk of memory to the other.

01:19:56.370 --> 01:19:59.220
It is not allocating or
managing that memory for you.

01:19:59.220 --> 01:20:02.520
It is just implementing,
essentially, that for loop.

01:20:02.520 --> 01:20:05.520
And it's, perhaps, time too, where I
can take off another training wheel

01:20:05.520 --> 01:20:06.020
verbally.

01:20:06.020 --> 01:20:10.410
It turns out that get string, all
this time, is kind of magical.

01:20:10.410 --> 01:20:13.470
One of the things that get
string does from the CS50 library

01:20:13.470 --> 01:20:16.080
is it itself uses malloc.

01:20:16.080 --> 01:20:19.800
Consider, after all, when we, the
staff, wrote get string years ago,

01:20:19.800 --> 01:20:22.830
we have no idea how long your
names are going to be this year.

01:20:22.830 --> 01:20:24.690
We have no idea what
sentences you're going

01:20:24.690 --> 01:20:28.350
to type, what paragraphs you're going to
type, what text you're going to analyze

01:20:28.350 --> 01:20:30.240
for a program like readability.

01:20:30.240 --> 01:20:32.610
So we had to implement
get string in such a way

01:20:32.610 --> 01:20:35.730
that you can type as few or as
many characters at your keyboard

01:20:35.730 --> 01:20:36.420
as you want.

01:20:36.420 --> 01:20:40.150
And we will make sure there's
enough memory for that string.

01:20:40.150 --> 01:20:43.530
So get string, underneath the hood,
if you look at the code we, the staff,

01:20:43.530 --> 01:20:46.530
wrote, you'll
see that we use malloc.

01:20:46.530 --> 01:20:51.390
And we call malloc in order to get
enough memory to fit that string.

01:20:51.390 --> 01:20:54.600
And then, what the CS50
library is also secretly doing,

01:20:54.600 --> 01:20:57.060
is it is also calling free for you.

01:20:57.060 --> 01:20:59.130
There's, essentially,
a fancy way where you

01:20:59.130 --> 01:21:03.690
can write a program that, as soon
as main is about to quit or return

01:21:03.690 --> 01:21:06.480
to your blinking prompt,
some special code

01:21:06.480 --> 01:21:10.860
we wrote swoops in at that final
moment, frees any of the memory

01:21:10.860 --> 01:21:14.130
that we, the library,
allocated so that you

01:21:14.130 --> 01:21:17.190
don't run out of memory because of us.

01:21:17.190 --> 01:21:19.590
But you all, when
using malloc, will have

01:21:19.590 --> 01:21:23.700
to call free, because the library
is not going to do that for you.

01:21:23.700 --> 01:21:26.400
And indeed, the goal of today
and next week and beyond

01:21:26.400 --> 01:21:30.833
is to stop using the CS50
library, ultimately, altogether.

01:21:30.833 --> 01:21:33.000
All right, well let's-- it
would be unfair, I think,

01:21:33.000 --> 01:21:36.000
if we introduced all of these fancy
new techniques but didn't necessarily

01:21:36.000 --> 01:21:40.620
provide you with any sort of tools with
which to chase down bugs

01:21:40.620 --> 01:21:43.245
in your new fancy code
or solve problems, now,

01:21:43.245 --> 01:21:44.370
that are related to memory.

01:21:44.370 --> 01:21:46.860
And thankfully, there
are programs via which

01:21:46.860 --> 01:21:49.560
you can chase down memory-related bugs.

01:21:49.560 --> 01:21:52.080
This is in addition to
printf, that function,

01:21:52.080 --> 01:21:56.550
and help50 and check50 and debug50
and debuggers more generally.

01:21:56.550 --> 01:21:59.940
This program-- and it's really the last
of the new tools we'll introduce you

01:21:59.940 --> 01:22:01.920
to in C-- is called valgrind.

01:22:01.920 --> 01:22:04.830
And this is a program
that exists in CS50 IDE.

01:22:04.830 --> 01:22:07.620
But it also exists beyond CS50 IDE,
on Linux and some other systems,

01:22:07.620 --> 01:22:10.050
where you can
run it on your own code

01:22:10.050 --> 01:22:12.870
to detect if you're doing
anything wrong with memory.

01:22:12.870 --> 01:22:14.370
What might you do wrong with memory?

01:22:14.370 --> 01:22:17.037
Well, previously, remember, I
triggered that segmentation fault.

01:22:17.037 --> 01:22:19.320
I touched memory that I should not.

01:22:19.320 --> 01:22:22.080
Valgrind is a tool that
can help you figure out,

01:22:22.080 --> 01:22:25.000
where did you touch memory
that you shouldn't have,

01:22:25.000 --> 01:22:27.960
so as to focus your own human
attention on whatever lines of code

01:22:27.960 --> 01:22:28.830
might be buggy.

01:22:28.830 --> 01:22:32.610
Valgrind can also detect
if you forget to call free.

01:22:32.610 --> 01:22:36.240
If you call malloc one or more
times, but don't call free

01:22:36.240 --> 01:22:38.280
a corresponding number
of times, valgrind

01:22:38.280 --> 01:22:40.890
is a program that can notice
that and tell you that you have

01:22:40.890 --> 01:22:42.580
what's called a memory leak.

01:22:42.580 --> 01:22:44.760
And indeed, this is germane
to our own Macs and PCs.

01:22:44.760 --> 01:22:47.100
Again, if you've been using
your Mac or PC or sometimes

01:22:47.100 --> 01:22:50.070
even your phone for a
long, long time, and maybe

01:22:50.070 --> 01:22:53.340
running lots of different programs
at once, lots of browser tabs

01:22:53.340 --> 01:22:55.680
open, lots of different
programs open at once,

01:22:55.680 --> 01:22:59.370
your Mac or PC might very well
have begun to slow to a crawl.

01:22:59.370 --> 01:23:01.920
It might be annoying, if
not impossible to use,

01:23:01.920 --> 01:23:03.960
because everything is so darn slow.

01:23:03.960 --> 01:23:07.920
That may very well be because one
or more of the programs you're using

01:23:07.920 --> 01:23:12.480
has some bug in it whereby a
programmer kept allocating memory

01:23:12.480 --> 01:23:15.210
and never got around to calling free.

01:23:15.210 --> 01:23:17.273
Maybe it's a bug, maybe
it was deliberate,

01:23:17.273 --> 01:23:19.440
they didn't expect you to
have so many windows open.

01:23:19.440 --> 01:23:21.360
But valgrind can detect
errors like that.

01:23:21.360 --> 01:23:23.730
And honestly, some of
you, if you're like me,

01:23:23.730 --> 01:23:29.370
you might very well have 10, 20, 50
different browser tabs open at once,

01:23:29.370 --> 01:23:32.910
thinking oh, I'm going to come back to
that someday, even though we never do.

01:23:32.910 --> 01:23:34.950
Each of those tabs takes up memory.

01:23:34.950 --> 01:23:37.320
Literally, any time you open
a browser tab, think of it,

01:23:37.320 --> 01:23:41.580
really, as Chrome or Edge
or Firefox or whatever

01:23:41.580 --> 01:23:43.920
you're using, underneath
the hood,

01:23:43.920 --> 01:23:46.320
probably calling a function
on macOS or Windows

01:23:46.320 --> 01:23:50.670
like malloc to give you more memory to
contain the contents of that web page

01:23:50.670 --> 01:23:51.480
temporarily.

01:23:51.480 --> 01:23:54.310
And if you keep opening
more and more browser tabs,

01:23:54.310 --> 01:23:56.190
it's like calling
malloc, malloc, malloc.

01:23:56.190 --> 01:23:57.840
Eventually, you're going to run out.

01:23:57.840 --> 01:23:59.700
And computers can be smart these days.

01:23:59.700 --> 01:24:03.060
They can kind of temporarily remove
things from memory to free up space.

01:24:03.060 --> 01:24:04.477
This is called virtual memory.

01:24:04.477 --> 01:24:06.310
But eventually, something
is going to break.

01:24:06.310 --> 01:24:08.520
And it might very well
be your user experience

01:24:08.520 --> 01:24:11.700
when things get so slow that you
literally have to quit the program

01:24:11.700 --> 01:24:14.140
or maybe even reboot your computer.

01:24:14.140 --> 01:24:15.240
So how do we use valgrind?

01:24:15.240 --> 01:24:17.430
Well, let me go ahead
and write a short program

01:24:17.430 --> 01:24:20.040
that doesn't do anything
useful, but demonstrates

01:24:20.040 --> 01:24:22.080
multiple memory-related mistakes.

01:24:22.080 --> 01:24:24.060
I'll call this file memory.c.

01:24:24.060 --> 01:24:27.550
I'm going to go ahead and
open up the file memory.c

01:24:27.550 --> 01:24:30.842
and include at the
top standard io dot h.

01:24:30.842 --> 01:24:32.550
And then I'm going to
also, preemptively,

01:24:32.550 --> 01:24:37.290
include standard lib dot h, which,
recall, is where malloc is declared. Then int main void.

01:24:37.290 --> 01:24:39.070
And I'm going to keep this one simple.

01:24:39.070 --> 01:24:42.370
I'm going to go ahead and just give
myself a whole bunch of integers.

01:24:42.370 --> 01:24:43.810
So this is actually kind of cool.

01:24:43.810 --> 01:24:46.480
It turns out that--

01:24:46.480 --> 01:24:47.880
well, let's go ahead.

01:24:47.880 --> 01:24:48.910
Yeah, I can do this.

01:24:48.910 --> 01:24:50.035
Let's go ahead and do this.

01:24:50.035 --> 01:24:52.650
Char star s gets malloc.

01:24:52.650 --> 01:24:57.630
And let me go ahead and give
myself, how about three of these.

01:24:57.630 --> 01:25:01.050
Let me go ahead and allocate
space for three chars.

01:25:01.050 --> 01:25:03.640
Or actually, let's give
me four, just like before.

01:25:03.640 --> 01:25:08.340
Now, I'm going to go ahead
and say s bracket 0 equals 72.

01:25:08.340 --> 01:25:12.220
s bracket 1-- actually,
I'll just do this manually.

01:25:12.220 --> 01:25:14.160
Let's do h.

01:25:14.160 --> 01:25:16.350
Let's do i.

01:25:16.350 --> 01:25:19.960
Let's do our usual exclamation point.

01:25:19.960 --> 01:25:22.170
And then just for good
measure, s bracket 3 gets

01:25:22.170 --> 01:25:24.120
quote unquote, backslash 0.

01:25:24.120 --> 01:25:29.340
This is the very manual
way of actually--

01:25:29.340 --> 01:25:32.430
this is the very manual way of
actually building up a string.

01:25:32.430 --> 01:25:34.060
But let me introduce a mistake.

01:25:34.060 --> 01:25:37.320
Let me accidentally
allocate only three bytes,

01:25:37.320 --> 01:25:40.440
even though I clearly need a fourth
for that terminating nul character.

01:25:40.440 --> 01:25:42.510
And notice too, the absence of free.

01:25:42.510 --> 01:25:45.720
I'm going to, very sloppily,
not bother calling free.

01:25:45.720 --> 01:25:49.590
Now, I'm going to go ahead and
compile this program, make memory.

01:25:49.590 --> 01:25:53.430
OK, it compiles OK, so that's
good, dot slash memory.

01:25:53.430 --> 01:25:55.413
OK, nothing happens,
but that kind of makes

01:25:55.413 --> 01:25:57.330
sense because I didn't
tell it to do anything.

01:25:57.330 --> 01:26:01.360
Just for kicks, let's print out
that string just like we always do.

01:26:01.360 --> 01:26:04.500
Let me now recompile
memory, still compiles.

01:26:04.500 --> 01:26:06.360
Let me run dot slash memory.

01:26:06.360 --> 01:26:07.570
OK, it seems to work.

01:26:07.570 --> 01:26:10.000
So at first glance, you might
be really proud of yourself.

01:26:10.000 --> 01:26:12.910
You've written another correct
program, seems to pass check50.

01:26:12.910 --> 01:26:13.410
You submit.

01:26:13.410 --> 01:26:14.327
You go about your day.

01:26:14.327 --> 01:26:16.380
And you're very
disappointed some days later

01:26:16.380 --> 01:26:19.920
when you realize, dammit, I did not
get full credit on this because there's

01:26:19.920 --> 01:26:21.780
actually a latent bug.

01:26:21.780 --> 01:26:24.780
So sometimes, indeed,
there are bugs in your code

01:26:24.780 --> 01:26:27.420
that you don't necessarily see
visually, you don't necessarily

01:26:27.420 --> 01:26:30.990
experience when running it
yourself, but eventually, there

01:26:30.990 --> 01:26:33.443
might be an error when
running it enough times.

01:26:33.443 --> 01:26:36.360
Eventually, a computer might notice
that you're doing something wrong.

01:26:36.360 --> 01:26:38.460
And thankfully, tools
exist like valgrind,

01:26:38.460 --> 01:26:40.098
that can allow you to detect that.

01:26:40.098 --> 01:26:43.140
So let me go ahead and just increase
the size of my terminal window here.

01:26:43.140 --> 01:26:48.090
And let me go ahead and run
valgrind on dot slash memory.

01:26:48.090 --> 01:26:49.290
So it's just like debug50.

01:26:49.290 --> 01:26:53.040
Instead of running debug50 and then
dot slash whatever the program is,

01:26:53.040 --> 01:26:55.813
you run valgrind dot slash memory.

01:26:55.813 --> 01:26:58.230
This one, unfortunately, is
only a command line interface.

01:26:58.230 --> 01:27:00.480
There's no graphical user
interface like debug50.

01:27:00.480 --> 01:27:04.530
And honestly, it's a
hideous sequence of output.

01:27:04.530 --> 01:27:06.630
This should overwhelm
you at first glance.

01:27:06.630 --> 01:27:08.190
There's crazy cryptic-ness here.

01:27:08.190 --> 01:27:09.690
It's not the best-designed program.

01:27:09.690 --> 01:27:12.520
It really was meant for the
most comfortable people.

01:27:12.520 --> 01:27:15.180
But there are some useful
tidbits we can take away from it.

01:27:15.180 --> 01:27:17.490
As always, let me scroll
all the way to the top

01:27:17.490 --> 01:27:19.260
to the very first line of output.

01:27:19.260 --> 01:27:21.600
And I'll draw your attention
to a couple of things

01:27:21.600 --> 01:27:23.070
that will start to jump out to you.

01:27:23.070 --> 01:27:24.960
And help50 can help you with this.

01:27:24.960 --> 01:27:28.020
If you're confused by
valgrind's output, rerun it.

01:27:28.020 --> 01:27:29.520
But put help50 at the beginning.

01:27:29.520 --> 01:27:32.120
And just like I will do
now verbally, so can help50

01:27:32.120 --> 01:27:36.510
help you notice the important
things in this crazy mess of output.

01:27:36.510 --> 01:27:37.770
This is worrisome.

01:27:37.770 --> 01:27:41.880
Valgrind is noting on this line
here, invalid write of size 1.

01:27:41.880 --> 01:27:44.370
And that's on line 10 of memory.c.

01:27:44.370 --> 01:27:46.510
So we'll look at that in a moment.

01:27:46.510 --> 01:27:50.530
If I scroll down further,
invalid read of size 1.

01:27:50.530 --> 01:27:55.810
And that also seems to be on here, it
looks like, on line 11 of memory.c.

01:27:55.810 --> 01:27:59.070
And then if I keep scrolling,
keep scrolling, keep scrolling,

01:27:59.070 --> 01:28:00.990
I'm not liking this.

01:28:00.990 --> 01:28:05.910
3 bytes in 1 blocks are definitely
lost in loss record, whatever that is.

01:28:05.910 --> 01:28:10.170
But three bytes in 1
blocks are definitely lost.

01:28:10.170 --> 01:28:15.240
And then down here, leak summary,
definitely lost, 3 bytes in 1 blocks.

01:28:15.240 --> 01:28:17.703
Incidentally, 1 blocks,
obviously not correct grammar.

01:28:17.703 --> 01:28:19.620
This is what happens
when your program doesn't

01:28:19.620 --> 01:28:24.210
have an if condition that checks if
the number is 1 or not.

01:28:24.210 --> 01:28:27.300
You could fix this, grammatically,
honestly, with a simple if condition.

01:28:27.300 --> 01:28:29.770
They did not when writing
this program years ago.

01:28:29.770 --> 01:28:32.110
So there's two or three mistakes here.

01:28:32.110 --> 01:28:34.620
One is some kind of
invalid read or write.

01:28:34.620 --> 01:28:35.953
And another is this leak.

01:28:35.953 --> 01:28:36.870
Well, what is a write?

01:28:36.870 --> 01:28:38.940
A write just refers to changing a value.

01:28:38.940 --> 01:28:43.150
A read just refers to reading
or using or printing a value.

01:28:43.150 --> 01:28:44.730
So let's focus on line 10.

01:28:44.730 --> 01:28:48.060
If I scroll back down to my
code and look on line 10,

01:28:48.060 --> 01:28:51.760
this was an invalid
write, invalid write.

01:28:51.760 --> 01:28:52.950
Well, why is it invalid?

01:28:52.950 --> 01:28:57.180
Well, per today's definition,
if you are allocating 3 bytes,

01:28:57.180 --> 01:29:01.710
you are welcome to touch the first byte,
the second byte, and the third byte.

01:29:01.710 --> 01:29:04.500
But you have no business
touching the fourth byte

01:29:04.500 --> 01:29:06.420
if you've only asked for three.

01:29:06.420 --> 01:29:11.070
This is like a small scale version of
the very adventurous and inappropriate

01:29:11.070 --> 01:29:14.100
poking around I did when I
looked at 10,000 bytes away.

01:29:14.100 --> 01:29:16.680
Even looking one byte
away is a potential bug

01:29:16.680 --> 01:29:18.780
and can cause a program to crash.

01:29:18.780 --> 01:29:21.720
Meanwhile, line 11 is
also problematic, which

01:29:21.720 --> 01:29:25.470
is an invalid read, because now,
you're saying go print out this string.

01:29:25.470 --> 01:29:28.043
But printing that string reads
a byte at a memory address

01:29:28.043 --> 01:29:30.210
that you should not have
touched in the first place.

01:29:30.210 --> 01:29:34.080
And the memory leak, the third
problem, stems from the fact

01:29:34.080 --> 01:29:36.520
that I didn't free that memory.

01:29:36.520 --> 01:29:40.380
So again, it'll take some practice and
experience, some mistakes of your own,

01:29:40.380 --> 01:29:42.480
to notice and understand these bugs.

01:29:42.480 --> 01:29:44.670
But let me fix the first two like this.

01:29:44.670 --> 01:29:46.530
Let me just give myself four bytes.

01:29:46.530 --> 01:29:48.990
And let me fix the second
one or the third one,

01:29:48.990 --> 01:29:53.820
really, by freeing s at the very end,
because again, any time you use malloc

01:29:53.820 --> 01:29:55.590
you must use free.

01:29:55.590 --> 01:29:59.310
Let me go ahead and recompile
memory, seems to compile.

01:29:59.310 --> 01:30:02.130
Let me rerun it, still
works the same, visually.

01:30:02.130 --> 01:30:05.670
But now, let's rerun valgrind on it
and see if there are any errors now,

01:30:05.670 --> 01:30:08.710
so valgrind dot slash memory, Enter.

01:30:08.710 --> 01:30:10.710
The output's still going
to look pretty cryptic.

01:30:10.710 --> 01:30:15.300
But notice all heap blocks were
freed, whatever that means.

01:30:15.300 --> 01:30:16.217
No leaks are possible.

01:30:16.217 --> 01:30:18.133
It doesn't really get
more explicit than that.

01:30:18.133 --> 01:30:19.090
That's a good thing.

01:30:19.090 --> 01:30:23.100
And if I scroll up, I see no mention
of those invalid reads or writes.

01:30:23.100 --> 01:30:26.168
So starting with this week's
problems and next week's in C,

01:30:26.168 --> 01:30:27.960
not only are you going
to want to use tools

01:30:27.960 --> 01:30:31.590
like help50 and printf
and debug50 and check50,

01:30:31.590 --> 01:30:35.710
but even if you think your code's
right, the output looks right,

01:30:35.710 --> 01:30:37.050
you might have a latent bug.

01:30:37.050 --> 01:30:40.200
And even when your programs are small,
they might not crash the computer.

01:30:40.200 --> 01:30:43.500
They might not cause that segmentation
fault. Eventually, they might.

01:30:43.500 --> 01:30:47.850
And you do want to use tools like
this to chase down any such mistakes.

01:30:47.850 --> 01:30:50.460
Otherwise, bad things can happen.

01:30:50.460 --> 01:30:51.600
And what might happen?

01:30:51.600 --> 01:30:54.900
Well, let me go ahead and
reveal an example here

01:30:54.900 --> 01:30:57.840
that presents some code
that's a little dangerous.

01:30:57.840 --> 01:31:00.600
So here, for instance,
is an example where

01:31:00.600 --> 01:31:05.202
I'm declaring at the top of the
function, int star x and int star y.

01:31:05.202 --> 01:31:06.160
So what does that mean?

01:31:06.160 --> 01:31:08.700
Well, per today's parlance,
this just means give me

01:31:08.700 --> 01:31:11.550
a pointer to an integer called x.

01:31:11.550 --> 01:31:13.800
Give me a pointer to
an integer called y.

01:31:13.800 --> 01:31:16.650
Put another way, give me
a variable called x that I

01:31:16.650 --> 01:31:18.900
can store the address of an int in.

01:31:18.900 --> 01:31:23.640
Give me a variable called y that I can
store the address of another int in.

01:31:23.640 --> 01:31:27.880
But notice what I am not doing
on these first two lines.

01:31:27.880 --> 01:31:31.950
I'm not actually assigning
them a value until line 3.

01:31:31.950 --> 01:31:36.000
On line 3, even though this is weird--
this is not how we've allocated space

01:31:36.000 --> 01:31:37.530
for integers before--

01:31:37.530 --> 01:31:41.130
there's no reason that
you can't use malloc

01:31:41.130 --> 01:31:45.550
and say, give me enough space
for the size of an integer.

01:31:45.550 --> 01:31:46.370
sizeof is new.

01:31:46.370 --> 01:31:50.150
It's just an operator in C that
tells you the size of a data type,

01:31:50.150 --> 01:31:51.500
like the size of an int.

01:31:51.500 --> 01:31:53.480
So maybe you forgot that an int is 4 bytes.

01:31:53.480 --> 01:31:56.450
And indeed, an int is usually 4,
but not always 4 in all systems.

01:31:56.450 --> 01:32:00.020
So sizeof(int) just makes sure that it
will always give you the right answer,

01:32:00.020 --> 01:32:02.630
whether you're using a modern
computer or an old one.

01:32:02.630 --> 01:32:07.190
So this just means, really, allocate
4 bytes to me on a modern system.

01:32:07.190 --> 01:32:11.370
And it stores the address
of the first byte in x.

01:32:11.370 --> 01:32:15.360
Would someone mind translating
to layman's terms, what

01:32:15.360 --> 01:32:18.480
is star x equals 42 doing?

01:32:18.480 --> 01:32:20.880
Star, again, is the
dereference operator.

01:32:20.880 --> 01:32:23.430
It means go to the address.

01:32:23.430 --> 01:32:24.375
And do what?

01:32:24.375 --> 01:32:27.510
How would you describe,
with a verbal comment,

01:32:27.510 --> 01:32:30.450
what star x equals 42 is doing?

01:32:30.450 --> 01:32:33.630
Brian, would you mind
verbalizing any thoughts?

01:32:33.630 --> 01:32:37.555
BRIAN: Yeah, so Sophia suggested that at
that address, we are going to place 42.

01:32:37.555 --> 01:32:38.430
DAVID MALAN: Perfect.

01:32:38.430 --> 01:32:40.080
At that address put 42.

01:32:40.080 --> 01:32:44.640
Equivalently, go to that address
in x and put the number 42 there.

01:32:44.640 --> 01:32:48.870
It's like going to Brian's mailbox
and putting the 42 in his mailbox,

01:32:48.870 --> 01:32:52.035
instead of what we previously had
there, which was the number 50.

01:32:52.035 --> 01:32:57.180
How about this next line,
the fifth, star y equals 13?

01:32:57.180 --> 01:32:59.670
Brian, could you verbalize someone else's thoughts?

01:32:59.670 --> 01:33:03.500
What does star y equals 13 do for us?

01:33:03.500 --> 01:33:07.850
And it's not an accident
that 13 tends to be unlucky.

01:33:07.850 --> 01:33:10.530
BRIAN: Peter says, put
13 at the address y.

01:33:10.530 --> 01:33:12.710
DAVID MALAN: Good, put
13 at the address in y.

01:33:12.710 --> 01:33:16.860
Or put another way, go to the
address in y and put 13 there.

01:33:16.860 --> 01:33:19.070
But there's a logical problem here.

01:33:19.070 --> 01:33:20.870
What is in y?

01:33:20.870 --> 01:33:24.860
If I rewind, I never
actually assign y a value.

01:33:24.860 --> 01:33:27.050
I don't initially, and
I don't eventually.

01:33:27.050 --> 01:33:30.500
At least with x, even though I didn't
give it a value in declaring it up here

01:33:30.500 --> 01:33:34.850
as a variable, I eventually got around
to storing in it the actual address.

01:33:34.850 --> 01:33:38.060
Now, just to be really nit picky, I
should probably even, in this program,

01:33:38.060 --> 01:33:40.495
check for NULL just in
case anything went wrong.

01:33:40.495 --> 01:33:41.870
But that's a whole other problem.

01:33:41.870 --> 01:33:46.470
It is a more damning problem that
I haven't even given y a value.

01:33:46.470 --> 01:33:49.610
And here's where we can reveal
one other detail about a computer.

01:33:49.610 --> 01:33:53.750
Thus far, we've been taking for granted
that you and I almost always initialize

01:33:53.750 --> 01:33:54.360
our memory.

01:33:54.360 --> 01:33:56.900
If we want to give ourselves
a char, an int, a string,

01:33:56.900 --> 01:33:59.900
we literally type it
out into the program

01:33:59.900 --> 01:34:02.150
itself so that it's
there when we want it.

01:34:02.150 --> 01:34:04.070
But if we consider this
picture here, which

01:34:04.070 --> 01:34:07.370
is now just a physical incarnation of
some of the contents of your computer's

01:34:07.370 --> 01:34:11.750
memory, playfully labeled with
a lot of Oscar the Grouches,

01:34:11.750 --> 01:34:16.250
this is because you should never trust
the contents of your computer's memory

01:34:16.250 --> 01:34:18.500
if you yourself have
not put something there.

01:34:18.500 --> 01:34:21.560
There's a term of art in
programming called garbage values.

01:34:21.560 --> 01:34:26.180
If you yourself have not put
a value somewhere in memory,

01:34:26.180 --> 01:34:30.210
you should assume, to be safe, that it
is a quote unquote, "garbage value."

01:34:30.210 --> 01:34:31.440
It's not a weird value.

01:34:31.440 --> 01:34:34.580
It's just a 1, a 2, an
A, a B, a C, you just

01:34:34.580 --> 01:34:38.510
don't know what it is, because if
your program is running over time

01:34:38.510 --> 01:34:40.890
and you're calling functions
and functions are returning.

01:34:40.890 --> 01:34:43.348
You're calling other functions
and functions are returning.

01:34:43.348 --> 01:34:46.970
These values in your computer's
memory are constantly changing,

01:34:46.970 --> 01:34:48.740
and your memory gets reused.

01:34:48.740 --> 01:34:53.180
When you free memory, that doesn't erase
it or set it all back to 0's or set it

01:34:53.180 --> 01:34:53.990
all back to 1's.

01:34:53.990 --> 01:34:56.600
It just leaves it alone
so that you can reuse

01:34:56.600 --> 01:34:59.810
it, which means over time,
your computer contains remnants

01:34:59.810 --> 01:35:03.960
of all of the variables you've ever used
in your program over here, over here,

01:35:03.960 --> 01:35:04.730
over there.

01:35:04.730 --> 01:35:10.850
And so in a program like this, where
you have not explicitly initialized y

01:35:10.850 --> 01:35:14.000
to anything, you should assume
that Oscar the Grouch, so to speak,

01:35:14.000 --> 01:35:15.020
is at that location.

01:35:15.020 --> 01:35:20.570
It is a garbage value that looks like
an address but is not a valid address.

01:35:20.570 --> 01:35:25.040
And so when you say star y equals
13, that means go to that address.

01:35:25.040 --> 01:35:28.910
But really, go to that bogus
address and put something there.

01:35:28.910 --> 01:35:31.850
And odds are, your
program is going to crash.

01:35:31.850 --> 01:35:33.650
You are going to get
a segmentation fault,

01:35:33.650 --> 01:35:37.562
because by going to some
arbitrary garbage value address,

01:35:37.562 --> 01:35:40.520
it would be like picking up a random
piece of paper with a number on it

01:35:40.520 --> 01:35:42.030
and then going to that mailbox.

01:35:42.030 --> 01:35:42.530
Why?

01:35:42.530 --> 01:35:44.300
It doesn't belong to you.

01:35:44.300 --> 01:35:47.930
If you try to dereference
an uninitialized variable,

01:35:47.930 --> 01:35:49.850
your program may very well crash.

01:35:49.850 --> 01:35:51.890
And this is, perhaps,
no better-presented

01:35:51.890 --> 01:35:55.970
than by some of our friends, Nick
Parlante, a professor at Stanford

01:35:55.970 --> 01:36:02.510
University who has breathed life into a
character in claymation known as Binky.

01:36:02.510 --> 01:36:06.140
We have just a 2-minute clip from this
that paints the picture of bad things

01:36:06.140 --> 01:36:09.020
indeed happening when you touch
memory that you shouldn't.

01:36:09.020 --> 01:36:13.340
So hopefully, a helpful reminder as to
what to do and not to do with pointers.

01:36:13.340 --> 01:36:14.790
Here we go.

01:36:14.790 --> 01:36:16.610
[VIDEO PLAYBACK]

01:36:16.610 --> 01:36:17.540
- Hey, Binky.

01:36:17.540 --> 01:36:20.890
Wake up, it's time for pointer fun.

01:36:20.890 --> 01:36:22.060
- What's that?

01:36:22.060 --> 01:36:23.620
Learn about pointers?

01:36:23.620 --> 01:36:25.390
Oh, goody!

01:36:25.390 --> 01:36:28.430
- Well, to get started, I guess we're
going to need a couple pointers.

01:36:28.430 --> 01:36:32.940
- OK, this code allocates two
pointers which can point to integers.

01:36:32.940 --> 01:36:35.042
- OK, well I see the two pointers.

01:36:35.042 --> 01:36:37.000
But they don't seem to
be pointing to anything.

01:36:37.000 --> 01:36:37.780
- That's right.

01:36:37.780 --> 01:36:39.970
Initially, pointers
don't point to anything.

01:36:39.970 --> 01:36:42.190
The things they point
to are called pointees.

01:36:42.190 --> 01:36:44.110
And setting them up's a separate step.

01:36:44.110 --> 01:36:45.100
- Oh, right, right.

01:36:45.100 --> 01:36:45.790
I knew that.

01:36:45.790 --> 01:36:47.750
The pointees are separate.

01:36:47.750 --> 01:36:50.050
So how do you allocate a pointee?

01:36:50.050 --> 01:36:53.800
- OK, well, this code allocates
a new integer pointee.

01:36:53.800 --> 01:36:56.880
And this part sets x to point to it.

01:36:56.880 --> 01:36:58.180
- Hey, that looks better.

01:36:58.180 --> 01:36:59.700
So make it do something.

01:36:59.700 --> 01:37:05.460
- OK, I'll dereference the pointer x to
store the number 42 into its pointee.

01:37:05.460 --> 01:37:08.970
For this trick, I'll need my
magic wand of dereferencing.

01:37:08.970 --> 01:37:12.660
- Your magic wand of dereferencing?

01:37:12.660 --> 01:37:14.170
That's great.

01:37:14.170 --> 01:37:15.910
- This is what the code looks like.

01:37:15.910 --> 01:37:17.800
I'll just set up the number and--

01:37:17.800 --> 01:37:18.900
[POP]

01:37:18.900 --> 01:37:21.000
- Hey, look, there it goes.

01:37:21.000 --> 01:37:25.830
So doing a dereference on x follows
the arrow to access its pointee.

01:37:25.830 --> 01:37:28.020
In this case, to store 42 in there.

01:37:28.020 --> 01:37:32.450
Hey, try using it to store the number
13 through the other pointer, y.

01:37:32.450 --> 01:37:33.570
- OK.

01:37:33.570 --> 01:37:38.100
I'll just go over here to y
and get the number 13 set up

01:37:38.100 --> 01:37:41.970
and then take the wand of
dereferencing and just--

01:37:41.970 --> 01:37:43.580
[HORN] whoa!

01:37:43.580 --> 01:37:45.930
- Oh, hey, that didn't work.

01:37:45.930 --> 01:37:51.370
Say, Binky, I don't think dereferencing
y is a good idea, because setting up

01:37:51.370 --> 01:37:52.840
the pointee is a separate step.

01:37:52.840 --> 01:37:54.815
And I don't think we ever did it.

01:37:54.815 --> 01:37:56.430
- Hmm, good point.

01:37:56.430 --> 01:37:58.800
- Yeah, we allocated the pointer y.

01:37:58.800 --> 01:38:01.570
But we never set it
to point to a pointee.

01:38:01.570 --> 01:38:03.480
- Hmm, very observant.

01:38:03.480 --> 01:38:05.310
- Hey, you're looking good there, Binky.

01:38:05.310 --> 01:38:08.250
Can you fix it so that y points
to the same pointee as x?

01:38:08.250 --> 01:38:11.620
- Sure, I'll use my magic
wand of pointer assignment.

01:38:11.620 --> 01:38:13.800
- Is that going to be
a problem like before?

01:38:13.800 --> 01:38:15.630
- No, this doesn't touch the pointees.

01:38:15.630 --> 01:38:19.170
It just changes one pointer to
point to the same thing as another.

01:38:19.170 --> 01:38:20.310
- Oh, I see.

01:38:20.310 --> 01:38:23.040
Now, y points to the same place as x.

01:38:23.040 --> 01:38:24.840
So wait, now y is fixed.

01:38:24.840 --> 01:38:25.950
It has a pointee.

01:38:25.950 --> 01:38:29.760
So you can try the wand of
dereferencing again to send the 13 over.

01:38:29.760 --> 01:38:31.093
- Oh, OK.

01:38:31.093 --> 01:38:31.635
Here it goes.

01:38:31.635 --> 01:38:32.900
[POP]

01:38:32.900 --> 01:38:34.160
- Hey, look at that.

01:38:34.160 --> 01:38:35.870
Now, dereferencing works on y.

01:38:35.870 --> 01:38:39.980
And because the pointers are sharing
that one pointee, they both see the 13.

01:38:39.980 --> 01:38:41.720
- Yeah, sharing, whatever.

01:38:41.720 --> 01:38:43.610
So are we going to switch places now?

01:38:43.610 --> 01:38:45.270
- Oh, look, we're out of time.

01:38:45.270 --> 01:38:45.770
- But--

01:38:45.770 --> 01:38:46.040
[END PLAYBACK]

01:38:46.040 --> 01:38:47.570
DAVID MALAN: All right, so
we are not quite out of time.

01:38:47.570 --> 01:38:50.028
But let's go ahead and take
our second 5-minute break here.

01:38:50.028 --> 01:38:52.910
And when we return, we'll take
a closer look at Oscar and more.

01:38:52.910 --> 01:38:54.260
Back in 5.

01:38:54.260 --> 01:38:57.380
All right, so I claim that
there's all these garbage

01:38:57.380 --> 01:38:58.950
values in your computer's memory.

01:38:58.950 --> 01:39:00.860
But how can you see them?

01:39:00.860 --> 01:39:04.400
What Binky did was, of course,
try to dereference a garbage value

01:39:04.400 --> 01:39:05.817
and that's when bad things happened.

01:39:05.817 --> 01:39:07.900
But we can actually see
this with code of our own.

01:39:07.900 --> 01:39:10.970
So let me go ahead, quickly, and
whip up a little program here,

01:39:10.970 --> 01:39:15.290
just like something we did
in week one or week two,

01:39:15.290 --> 01:39:17.090
but without doing it very well.

01:39:17.090 --> 01:39:21.410
Let me go ahead and include standard
io dot h as usual, int main void.

01:39:21.410 --> 01:39:24.290
And then let me go ahead and
give myself an array of scores.

01:39:24.290 --> 01:39:26.000
How about an array of three scores?

01:39:26.000 --> 01:39:28.715
And we've done this before where
we collected scores from a user.

01:39:28.715 --> 01:39:30.590
But this time, I'm going
to deliberately make

01:39:30.590 --> 01:39:33.170
the mistake of not actually
initializing those scores

01:39:33.170 --> 01:39:35.450
or even asking the
human for those scores.

01:39:35.450 --> 01:39:41.060
I'm just going to blindly go about
iterating from i equals 0 on up to 3.

01:39:41.060 --> 01:39:46.070
And on each iteration, I'm just going
to presumptuously print whatever is

01:39:46.070 --> 01:39:49.220
at that location in scores bracket i.

01:39:49.220 --> 01:39:52.430
So logically, my code is correct
in what it's trying to do,

01:39:52.430 --> 01:39:54.230
print out the values in scores.

01:39:54.230 --> 01:39:57.170
But notice that I have
deliberately not initialized any

01:39:57.170 --> 01:40:00.147
of the 1, 2, 3 scores in that array.

01:40:00.147 --> 01:40:01.730
So who knows what's going to be there?

01:40:01.730 --> 01:40:04.650
Indeed, it should be
garbage values of some sort

01:40:04.650 --> 01:40:06.650
that we couldn't necessarily
predict in advance.

01:40:06.650 --> 01:40:10.050
So let me go ahead and make
garbage, since this program

01:40:10.050 --> 01:40:11.300
is in a file called garbage.c.

01:40:11.300 --> 01:40:15.140
Compiles OK, but when
I now run garbage, we

01:40:15.140 --> 01:40:21.230
should see three scores, the first
of which is, cryptically, negative 833060864.

01:40:21.230 --> 01:40:23.780
Another one is 32765.

01:40:23.780 --> 01:40:25.760
And the third just happens to be 0.

01:40:25.760 --> 01:40:28.490
So there are those garbage values,
because again, the computer

01:40:28.490 --> 01:40:31.800
is not going to initialize
any of those values for you.

01:40:31.800 --> 01:40:33.570
Now, there are exceptions.

01:40:33.570 --> 01:40:36.320
We have, on occasion,
used a global variable,

01:40:36.320 --> 01:40:40.490
such as a constant declared outside the context
of main and all of my other functions.

01:40:40.490 --> 01:40:42.860
Global variables, if
you do not set them,

01:40:42.860 --> 01:40:47.210
are automatically initialized
to 0 or NULL for you.

01:40:47.210 --> 01:40:50.000
But you should generally not
rely on that kind of behavior.

01:40:50.000 --> 01:40:53.120
Your instinct should be to
always initialize values

01:40:53.120 --> 01:40:56.630
before thinking of
touching or reading them

01:40:56.630 --> 01:40:59.030
via printf or some other mechanism.

01:40:59.030 --> 01:41:02.720
All right, well, let's see how
this understanding, now, of memory,

01:41:02.720 --> 01:41:06.350
can lead us to solve problems, but
also encounter new types of problems,

01:41:06.350 --> 01:41:08.960
but problems that we can
now hopefully understand.

01:41:08.960 --> 01:41:11.250
I'm going to go ahead and
create a new program here.

01:41:11.250 --> 01:41:14.390
And recall from last week
that it was very common

01:41:14.390 --> 01:41:15.890
for us to want to swap values.

01:41:15.890 --> 01:41:19.010
When Brian was doing our sorts for
us, whether it was selection or bubble

01:41:19.010 --> 01:41:21.710
sort, there was a lot
of swapping going on.

01:41:21.710 --> 01:41:24.440
And yet, we didn't really write
any code for those algorithms.

01:41:24.440 --> 01:41:25.232
And that's fine.

01:41:25.232 --> 01:41:27.440
But let's consider that very
simple primitive of just

01:41:27.440 --> 01:41:30.440
swapping two values, for
instance, swapping two integers.

01:41:30.440 --> 01:41:34.160
Let me go ahead and give myself the
start of a program in swap.c here.

01:41:34.160 --> 01:41:38.630
I'm going to include standard
io dot h, int main void.

01:41:38.630 --> 01:41:41.370
And inside of main, I'm going
to give myself two integers.

01:41:41.370 --> 01:41:44.960
Let's just give myself an int called
x and assign it 1, an int called y

01:41:44.960 --> 01:41:46.140
and assign it 2.

01:41:46.140 --> 01:41:48.890
And then let me go ahead and just
print out what those values are.

01:41:48.890 --> 01:41:55.520
I'll just say, literally, x is percent
i comma y is percent i backslash n.

01:41:55.520 --> 01:41:59.490
And then I'm going to go ahead and
print out x comma y, respectively.

01:41:59.490 --> 01:42:02.930
And then I'm eventually going
to write a function called

01:42:02.930 --> 01:42:04.613
swap that swaps x and y.

01:42:04.613 --> 01:42:06.530
But let's assume, for
the moment, that it exists.

01:42:06.530 --> 01:42:08.870
It doesn't yet, but what
I then want to do right

01:42:08.870 --> 01:42:13.340
after that is just reprint the
same thing, x is now percent i,

01:42:13.340 --> 01:42:17.690
y is percent i, my presumption
being that the values of x and y

01:42:17.690 --> 01:42:18.870
will be swapped.

01:42:18.870 --> 01:42:20.480
So how might I swap these two values?

01:42:20.480 --> 01:42:23.120
Well, let me go ahead and
implement my own function.

01:42:23.120 --> 01:42:25.110
I don't think it needs
to return anything,

01:42:25.110 --> 01:42:27.110
so I'm going to say
void is the return type.

01:42:27.110 --> 01:42:28.340
I'll call it swap.

01:42:28.340 --> 01:42:30.830
It's going to take two
arguments as input.

01:42:30.830 --> 01:42:33.320
We'll call them a and b, both integers.

01:42:33.320 --> 01:42:34.820
But I could call them anything I want.

01:42:34.820 --> 01:42:36.800
But a and b seems reasonable.

01:42:36.800 --> 01:42:39.350
And now, I want to go
ahead and swap two values.

01:42:39.350 --> 01:42:42.140
Now, Brian was kind of doing this
with his two hands last week.

01:42:42.140 --> 01:42:45.830
And that's fine, but we should probably
consider this a little more closely.

01:42:45.830 --> 01:42:48.050
In fact, Brian, instead
of numbers, let's

01:42:48.050 --> 01:42:49.920
do something a little more real world.

01:42:49.920 --> 01:42:53.080
I think you have a couple of
beverages in front of you.

01:42:53.080 --> 01:42:53.580
BRIAN: Yeah.

01:42:53.580 --> 01:42:56.220
So right here, I have a
red glass and a blue glass,

01:42:56.220 --> 01:42:58.970
which I guess we can use to represent
two variables, for instance.

01:42:58.970 --> 01:42:59.180
DAVID MALAN: Yeah.

01:42:59.180 --> 01:43:00.198
Now, let me suppose--

01:43:00.198 --> 01:43:01.490
I wish I'd told you in advance.

01:43:01.490 --> 01:43:03.920
I'd actually prefer
that the red liquid be

01:43:03.920 --> 01:43:07.050
in the blue glass and the blue
liquid be in the red glass.

01:43:07.050 --> 01:43:08.780
So do you mind swapping
those two values,

01:43:08.780 --> 01:43:11.310
just like you swapped numbers last week?

01:43:11.310 --> 01:43:12.060
BRIAN: Yeah, sure.

01:43:12.060 --> 01:43:14.810
So I can just take the two glasses,
and I can switch their places.

01:43:14.810 --> 01:43:17.717
DAVID MALAN: OK, wait,
OK, that's not exactly--

01:43:17.717 --> 01:43:18.800
you took me too literally.

01:43:18.800 --> 01:43:22.760
I think here, if we think of the
glasses, now, as specific locations

01:43:22.760 --> 01:43:24.980
in memory, you can't
just physically move

01:43:24.980 --> 01:43:27.540
the chips of memory inside of
your computer to swap things.

01:43:27.540 --> 01:43:30.410
So I think I literally need
you to move the blue liquid

01:43:30.410 --> 01:43:33.350
into the red glass and the
red liquid into the blue glass

01:43:33.350 --> 01:43:36.100
so that it's more like
a computer's memory.

01:43:36.100 --> 01:43:37.657
BRIAN: OK, I can try to do that.

01:43:37.657 --> 01:43:40.240
I'm a little nervous, though,
because I feel like I can't just

01:43:40.240 --> 01:43:43.270
pour the blue liquid into the red
glass, because the red liquid's already

01:43:43.270 --> 01:43:43.640
in there.

01:43:43.640 --> 01:43:45.730
DAVID MALAN: Yeah, so this
probably doesn't end well,

01:43:45.730 --> 01:43:48.220
if he's got to do some kind of
switcheroo between the two glasses.

01:43:48.220 --> 01:43:49.240
So any thoughts here?

01:43:49.240 --> 01:43:54.100
Like what is the real world solution
to this weird but real problem, where

01:43:54.100 --> 01:43:57.490
we want to swap the contents
of these two locations,

01:43:57.490 --> 01:44:01.180
just like Brian was swapping the
contents of two memory locations

01:44:01.180 --> 01:44:02.290
last week?

01:44:02.290 --> 01:44:04.900
Brian, if you have your eye
on the chat in parallel,

01:44:04.900 --> 01:44:08.480
might anyone have ideas on how
we could swap these two liquids?

01:44:08.480 --> 01:44:11.620
BRIAN: Yeah, a couple of people are
saying that I need a third glass.

01:44:11.620 --> 01:44:13.370
DAVID MALAN: All right,
well Brian, do you

01:44:13.370 --> 01:44:16.370
happen to have a third glass with
you back there, backstage?

01:44:16.370 --> 01:44:18.040
BRIAN: In fact, I think I do.

01:44:18.040 --> 01:44:21.190
So I have a third glass here
that just so happens to be empty.

01:44:21.190 --> 01:44:22.100
DAVID MALAN: OK.

01:44:22.100 --> 01:44:25.610
And how would you, now, go
about swapping these two things?

01:44:25.610 --> 01:44:28.870
BRIAN: All right, so I want to put
the blue liquid inside the red glass.

01:44:28.870 --> 01:44:30.578
So the first thing I
need to do, I think,

01:44:30.578 --> 01:44:34.040
is just to empty out the red glass
to make space for the blue liquid.

01:44:34.040 --> 01:44:36.310
So I'm going to take the
red liquid, and I'm just

01:44:36.310 --> 01:44:38.470
going to pour it into this extra glass.

01:44:38.470 --> 01:44:39.520
DAVID MALAN: Temporarily though, right?

01:44:39.520 --> 01:44:39.870
BRIAN: Temporarily, yeah.

01:44:39.870 --> 01:44:40.570
DAVID MALAN: OK.

01:44:40.570 --> 01:44:42.620
BRIAN: Just to
store it there.

01:44:42.620 --> 01:44:45.100
And now, I think I can
just pour the blue liquid

01:44:45.100 --> 01:44:48.942
into the original red glass,
because now I'm free to do so.

01:44:48.942 --> 01:44:50.400
So I'll pour the blue liquid there.

01:44:53.230 --> 01:44:56.220
And I think the last thing I need
to do now is, now this blue--

01:44:56.220 --> 01:44:59.680
this glass that originally held
the blue liquid is now empty.

01:44:59.680 --> 01:45:03.130
So the red liquid, which was inside
of this temporary glass over here,

01:45:03.130 --> 01:45:07.350
I can take the red liquid and
just pour it into this glass here.

01:45:07.350 --> 01:45:10.290
And now, I didn't swap the
positions of the glasses.

01:45:10.290 --> 01:45:12.390
But the liquids have
actually switched places.

01:45:12.390 --> 01:45:15.355
Now, the blue liquid is on the left
and the red liquid is on the right.

01:45:15.355 --> 01:45:16.230
DAVID MALAN: Awesome.

01:45:16.230 --> 01:45:18.660
Yeah, I think that is a
more literal implementation

01:45:18.660 --> 01:45:21.150
of what you were doing and
taking for granted last week,

01:45:21.150 --> 01:45:24.182
swapping the two values
in two separate locations.

01:45:24.182 --> 01:45:25.640
So it seems pretty straightforward.

01:45:25.640 --> 01:45:27.210
I just need a little more space.

01:45:27.210 --> 01:45:29.670
I need a temporary variable
in code, if you will.

01:45:29.670 --> 01:45:31.545
And it seems I need three steps.

01:45:31.545 --> 01:45:34.670
I need to pour one out, pour the other
one into its place, pour the first one back in.

01:45:34.670 --> 01:45:37.122
So I think I can translate
that into code here.

01:45:37.122 --> 01:45:39.330
Let me go ahead and give
myself a temporary variable,

01:45:39.330 --> 01:45:40.840
like a glass, like Brian did.

01:45:40.840 --> 01:45:43.650
And I'll call it tmp, T-M-P,
which is pretty conventional when

01:45:43.650 --> 01:45:45.180
you want to swap two things in code.

01:45:45.180 --> 01:45:47.850
And I'm going to assign it,
temporarily, the value of a.

01:45:47.850 --> 01:45:51.550
I'm going to then change the contents
of a to equal whatever the contents of b

01:45:51.550 --> 01:45:52.050
are.

01:45:52.050 --> 01:45:56.010
And then I'm going to change b to be
whatever the contents of tmp were.

01:45:56.010 --> 01:45:58.650
So this feels pretty
reasonable and pretty correct,

01:45:58.650 --> 01:46:01.230
because it's just a literal
translation into code,

01:46:01.230 --> 01:46:03.700
now, of what Brian
did in the real world.

01:46:03.700 --> 01:46:05.610
And I think this will compile.

01:46:05.610 --> 01:46:08.040
So let's start there, make swap.

01:46:08.040 --> 01:46:09.690
It does-- oh, doesn't compile.

01:46:09.690 --> 01:46:13.410
OK, previous implicit declaration,
oh, so many errors, my god.

01:46:13.410 --> 01:46:15.687
Implicit declaration of function swap--

01:46:15.687 --> 01:46:16.270
wait a minute.

01:46:16.270 --> 01:46:17.230
I've seen that before.

01:46:17.230 --> 01:46:18.480
I've made this mistake before.

01:46:18.480 --> 01:46:20.050
You might have as well.

01:46:20.050 --> 01:46:23.293
Anytime you see this, recall it's just
that you're missing your prototype.

01:46:23.293 --> 01:46:25.710
Remember that the compiler is
going to take you literally.

01:46:25.710 --> 01:46:28.500
And if it doesn't know the word
swap exists when it sees it,

01:46:28.500 --> 01:46:30.310
it's not going to compile successfully.

01:46:30.310 --> 01:46:33.030
So we need to include my
prototype at the top of my file.

01:46:33.030 --> 01:46:35.460
Now, let me try this again, make swap.

01:46:35.460 --> 01:46:36.780
OK, that compiles.

01:46:36.780 --> 01:46:40.950
Let me go ahead now and run swap
and recall that, in main, what I did

01:46:40.950 --> 01:46:43.380
was initialize x to 1, y to 2.

01:46:43.380 --> 01:46:45.900
I then print out what
x is and what y is.

01:46:45.900 --> 01:46:50.040
I call swap, and then I print
out what x is and y is again.

01:46:50.040 --> 01:46:52.770
So I should see 1, 2, and then 2, 1.

01:46:52.770 --> 01:46:55.430
So let's hit Enter.

01:46:55.430 --> 01:46:58.800
Huh, it does not seem to be working.

01:46:58.800 --> 01:47:01.740
Well, let's try it again, just in case--

01:47:01.740 --> 01:47:04.020
no, not working.

01:47:04.020 --> 01:47:05.530
Well, let me try this.

01:47:05.530 --> 01:47:07.590
Let me add some-- printf is my friend.

01:47:07.590 --> 01:47:10.971
Let me go ahead and say a is percent i.

01:47:10.971 --> 01:47:14.460
b is percent i backslash n, a, b.

01:47:14.460 --> 01:47:15.510
So let's print that out.

01:47:15.510 --> 01:47:16.650
And let's print that out twice.

01:47:16.650 --> 01:47:18.480
So this would be a reasonable
debugging technique.

01:47:18.480 --> 01:47:21.605
If you want to know what's going on
underneath the hood, add some printfs.

01:47:21.605 --> 01:47:23.760
Let me go ahead and make swap.

01:47:23.760 --> 01:47:26.520
That compiles, dot slash swap.

01:47:26.520 --> 01:47:32.880
And let's see, a is 1,
b is 2, a is 2, b is 1.

01:47:32.880 --> 01:47:35.470
But then x and y are unchanged.

01:47:35.470 --> 01:47:37.170
So I feel like my logic is right.

01:47:37.170 --> 01:47:38.550
It's switching a and b.

01:47:38.550 --> 01:47:41.490
But it's not actually switching x and y.

01:47:41.490 --> 01:47:43.340
And I could confirm as much, right?

01:47:43.340 --> 01:47:45.510
The more powerful way
to debug this would

01:47:45.510 --> 01:47:49.890
be to run debug50, set a
breakpoint, for instance, at line 17,

01:47:49.890 --> 01:47:54.270
step through my code, step by step,
stepping into the swap function.

01:47:54.270 --> 01:47:57.030
But for now, it seems
clear that swap works.

01:47:57.030 --> 01:48:00.250
But main isn't really
seeing those results.

01:48:00.250 --> 01:48:01.450
So what's actually going on?

01:48:01.450 --> 01:48:04.170
Well, let's consider this real world
incarnation of what my memory is

01:48:04.170 --> 01:48:05.712
so I can actually move things around.

01:48:05.712 --> 01:48:08.820
And this is all thanks to our friends
in the theater's prop shop in back.

01:48:08.820 --> 01:48:10.830
If we think of this as
my computer's memory,

01:48:10.830 --> 01:48:12.540
initially, it's all garbage values.

01:48:12.540 --> 01:48:16.080
But I can use this as a canvas to
start laying things out in memory.

01:48:16.080 --> 01:48:19.020
But calling functions is something
we've taken for granted thus far.

01:48:19.020 --> 01:48:22.200
And it turns out, when you call
functions, the computer, by default,

01:48:22.200 --> 01:48:25.500
uses this memory in
kind of a standard way.

01:48:25.500 --> 01:48:29.850
In fact, let me go ahead and
draw a more pictorial picture.

01:48:29.850 --> 01:48:33.440
Let me draw a more literal picture here,
if you will, of the computer's memory

01:48:33.440 --> 01:48:33.940
again.

01:48:33.940 --> 01:48:36.660
So if this is the computer's memory
and we zoom in on one of the chips,

01:48:36.660 --> 01:48:39.120
and we think of the chip as having
a whole bunch of bytes like this.

01:48:39.120 --> 01:48:42.390
Let's abstract away the actual hardware
and think of it as we have been.

01:48:42.390 --> 01:48:45.720
It's just this big rectangular region
of memory, not unlike all of those Oscar

01:48:45.720 --> 01:48:47.520
the Grouches a moment ago.

01:48:47.520 --> 01:48:51.150
But by convention, your computer
does not just plop things

01:48:51.150 --> 01:48:52.710
in random locations in memory.

01:48:52.710 --> 01:48:55.710
It has certain rules of
thumb that it adheres to.

01:48:55.710 --> 01:48:59.460
In particular, it treats different
portions of your computer's memory

01:48:59.460 --> 01:49:00.330
in different ways.

01:49:00.330 --> 01:49:03.570
It uses it in a standard way so
that it's not completely random.

01:49:03.570 --> 01:49:08.910
For instance, when you run a program by
doing dot slash something on CS50 IDE

01:49:08.910 --> 01:49:12.270
or on Linux more generally, or
you double click an icon on Mac OS

01:49:12.270 --> 01:49:16.590
or Windows, that
triggers the computer's--

01:49:16.590 --> 01:49:21.030
the program's 0's and 1's stored on
your hard drive to be loaded up here,

01:49:21.030 --> 01:49:23.742
into what we'll call machine code,
which again, is the 0's and 1's.

01:49:23.742 --> 01:49:25.950
So if you think again,
metaphorically, as your memory

01:49:25.950 --> 01:49:29.730
is this rectangular region,
then the machine code,

01:49:29.730 --> 01:49:35.732
the 0's and 1's composing your program
are loaded into the top part of memory.

01:49:35.732 --> 01:49:38.940
And again, top, bottom, left, right, it
has no fundamental technical meaning.

01:49:38.940 --> 01:49:40.470
It's just an artist's rendition.

01:49:40.470 --> 01:49:42.960
But it does go into a standard location.

01:49:42.960 --> 01:49:45.700
Below that are all of
your global variables.

01:49:45.700 --> 01:49:48.250
So are your constants that you
put outside of your functions.

01:49:48.250 --> 01:49:50.500
Those are going to end up
just below the machine code,

01:49:50.500 --> 01:49:53.340
so again, at the top of
your computer's memory.

01:49:53.340 --> 01:49:55.200
Below that is what's called the heap.

01:49:55.200 --> 01:49:56.940
And this is a technical term.

01:49:56.940 --> 01:50:00.780
And it refers to a big
chunk of memory that malloc

01:50:00.780 --> 01:50:03.640
uses to get you some spare memory.

01:50:03.640 --> 01:50:09.270
Any time you call malloc, you are given
the address of some chunk of memory

01:50:09.270 --> 01:50:13.200
up in this region, below the machine
code, below your global variables.

01:50:13.200 --> 01:50:15.270
And it's kind of a big zone.

01:50:15.270 --> 01:50:19.120
But the catch is that other parts
of your memory are used differently.

01:50:19.120 --> 01:50:24.570
In fact, whereas the heap is
considered to be here on down, somewhat

01:50:24.570 --> 01:50:28.830
worrisomely, the stack is
considered to be here on up.

01:50:28.830 --> 01:50:32.070
This is to say, when you call
malloc and ask for memory,

01:50:32.070 --> 01:50:35.670
that gets allocated up here.

01:50:35.670 --> 01:50:39.540
When you call a function,
though, those functions

01:50:39.540 --> 01:50:42.900
use what's called stack
space instead of heap space.

01:50:42.900 --> 01:50:48.450
So any time you call a function, main
or swap or strlen or string compare

01:50:48.450 --> 01:50:51.330
or any of the functions
you've used thus far,

01:50:51.330 --> 01:50:54.150
your computer will
automatically store any

01:50:54.150 --> 01:50:58.860
of the local variables or parameters
from those functions down here.

01:50:58.860 --> 01:51:00.840
Now, this is not
necessarily the best design,

01:51:00.840 --> 01:51:02.550
because you can see the
two arrows pointing at one

01:51:02.550 --> 01:51:05.383
another is like two trains barreling
down the tracks at one another.

01:51:05.383 --> 01:51:07.268
Bad things can eventually happen.

01:51:07.268 --> 01:51:09.060
Thankfully, we typically
have enough memory

01:51:09.060 --> 01:51:12.370
that these two things don't collide,
but more on that in just a bit.

01:51:12.370 --> 01:51:15.570
So again, when you call functions,
memory down here is used.

01:51:15.570 --> 01:51:17.710
When you use malloc,
memory up here is used.

01:51:17.710 --> 01:51:19.710
Now, for my swap function,
I'm not using malloc.

01:51:19.710 --> 01:51:21.690
So I don't think I have
to worry about the heap.

01:51:21.690 --> 01:51:23.283
And I don't have any global variables.

01:51:23.283 --> 01:51:25.200
And I don't really care
about my machine code.

01:51:25.200 --> 01:51:27.240
I just need to know that
it's stored somewhere.

01:51:27.240 --> 01:51:30.210
But let's consider, then,
what the stack is all about.

01:51:30.210 --> 01:51:32.670
The stack, indeed, is
this sort of dynamic place

01:51:32.670 --> 01:51:34.860
where memory keeps
getting used and reused.

01:51:34.860 --> 01:51:40.440
So for instance, when you call main,
as you might when this swap program is

01:51:40.440 --> 01:51:45.010
run, main uses a sliver of memory at
the bottom of this picture, if you will.

01:51:45.010 --> 01:51:47.910
So the local variables
in main, like x and y,

01:51:47.910 --> 01:51:49.920
end up at this bottom portion of memory.

01:51:49.920 --> 01:51:53.790
When you call swap, swap uses a
chunk of memory just above main,

01:51:53.790 --> 01:51:58.350
pictorially, in this diagram, such
as variables a and b and tmp,

01:51:58.350 --> 01:51:59.410
for that matter.

01:51:59.410 --> 01:52:04.680
And then, once swap returns and is
done executing, that sliver of memory

01:52:04.680 --> 01:52:06.010
essentially goes away.

01:52:06.010 --> 01:52:07.230
Now, it doesn't disappear.

01:52:07.230 --> 01:52:09.610
Obviously, there's still
physical memory there.

01:52:09.610 --> 01:52:12.810
But that's when we get into the
discussion of garbage values again.

01:52:12.810 --> 01:52:15.540
They're still like Oscar the
Grouches all over the place.

01:52:15.540 --> 01:52:18.600
You just don't know, or at this
point care, what the values are.

01:52:18.600 --> 01:52:20.010
But there are values there.

01:52:20.010 --> 01:52:23.640
And that's why, a moment ago, when I
printed out that uninitialized scores

01:52:23.640 --> 01:52:26.970
array, I did see some bogus
values, because there's still

01:52:26.970 --> 01:52:30.510
going to be 0's and 1's there
that are left over from before.

01:52:30.510 --> 01:52:31.750
The problem, though, is this.

01:52:31.750 --> 01:52:35.070
Let me go over to this physical
incarnation of our memory

01:52:35.070 --> 01:52:38.010
and consider this as being our
stack, so it's growing on up.

01:52:38.010 --> 01:52:42.060
And in fact, if I want to have two
local variables like I do, x and y,

01:52:42.060 --> 01:52:47.400
let's go ahead and think of this
row of memory here as being main,

01:52:47.400 --> 01:52:48.870
for instance, here.

01:52:48.870 --> 01:52:51.630
And I'm going to go ahead and
replace all these garbage values

01:52:51.630 --> 01:52:53.790
with an actual value that I care about.

01:52:53.790 --> 01:52:57.660
And the actual values that I care about,
we're going to call x and y, just as

01:52:57.660 --> 01:52:58.480
before.

01:52:58.480 --> 01:53:01.020
So each of these Oscars
happens to be one byte.

01:53:01.020 --> 01:53:02.068
But an int is 4 bytes.

01:53:02.068 --> 01:53:04.110
So thankfully, from our
friends in the prop shop,

01:53:04.110 --> 01:53:06.178
we have these bigger
integer-sized blocks.

01:53:06.178 --> 01:53:08.220
And I'm going to go ahead
and slide this in here.

01:53:08.220 --> 01:53:10.740
And we're going to think
of this, in a moment, as x.

01:53:10.740 --> 01:53:14.340
And indeed, I'm going to go ahead
and call this x with a marker.

01:53:14.340 --> 01:53:17.760
And then I'm going to go ahead and
give myself another integer, of size 4,

01:53:17.760 --> 01:53:19.300
and put it down here.

01:53:19.300 --> 01:53:21.300
And we're going to think of this as y.

01:53:21.300 --> 01:53:23.940
And recall, what did I
initialize these values to?

01:53:23.940 --> 01:53:27.690
Well, the value 1,
initially, and the value 2.

01:53:27.690 --> 01:53:29.370
But then I called the swap function.

01:53:29.370 --> 01:53:32.160
And the swap function has
two arguments, a and b.

01:53:32.160 --> 01:53:38.400
And those, by design, become copies of
x and y, because I passed in x comma y.

01:53:38.400 --> 01:53:41.280
And I defined swap as taking a comma b.

01:53:41.280 --> 01:53:44.970
So I think what I need to
do, physically here, is now

01:53:44.970 --> 01:53:50.170
think of this second row of memory as
now belonging to the swap function,

01:53:50.170 --> 01:53:51.210
not to main.

01:53:51.210 --> 01:53:54.090
And inside of this second
row of memory, I'll

01:53:54.090 --> 01:53:57.540
think of this as belonging to swap.

01:53:57.540 --> 01:54:02.100
And within the swap row, I'm going
to have another integer of size 4.

01:54:02.100 --> 01:54:07.500
And we're going to call this
one a, labeled down there as a.

01:54:07.500 --> 01:54:10.350
And then I'm going to have
another chunk of size 4.

01:54:10.350 --> 01:54:12.600
And we're going to call this b.

01:54:12.600 --> 01:54:16.050
And again, because those are just
the arguments, x comma y, otherwise

01:54:16.050 --> 01:54:20.760
now known as a comma b, I copy
1 and 2 into those values.

01:54:20.760 --> 01:54:22.770
But swap has a third variable.

01:54:22.770 --> 01:54:24.730
Brian proposed a temporary variable.

01:54:24.730 --> 01:54:27.480
So I'm going to go ahead and
give myself four more bytes,

01:54:27.480 --> 01:54:30.210
thereby getting rid of whatever
garbage values are there

01:54:30.210 --> 01:54:34.260
and actually setting it
to an integer called tmp.

01:54:34.260 --> 01:54:39.030
So I'm going to go ahead and
call this thing tmp, T-M-P.

01:54:39.030 --> 01:54:40.440
And what did I do first?

01:54:40.440 --> 01:54:43.845
I set tmp equal to a.

01:54:43.845 --> 01:54:45.120
So tmp equals a.

01:54:45.120 --> 01:54:47.520
So if a is 1, tmp is 1.

01:54:47.520 --> 01:54:48.750
Then what did I do?

01:54:48.750 --> 01:54:51.780
I then did a equals b.

01:54:51.780 --> 01:54:55.150
So b is 2.

01:54:55.150 --> 01:54:57.800
a is 2 as well.

01:54:57.800 --> 01:55:00.030
And then lastly, what did I do?

01:55:00.030 --> 01:55:02.145
I did b gets tmp.

01:55:02.145 --> 01:55:05.020
So I have to go ahead and change
this to be whatever the value of tmp

01:55:05.020 --> 01:55:07.630
is, which is now the number 1.

01:55:07.630 --> 01:55:10.150
So you can see that
swap is correct insofar

01:55:10.150 --> 01:55:12.655
as it is swapping the values of a and b.

01:55:12.655 --> 01:55:16.690
But the moment swap
returns, these return

01:55:16.690 --> 01:55:19.000
to being thought of as garbage values.

01:55:19.000 --> 01:55:20.860
Main is still in the middle of running.

01:55:20.860 --> 01:55:22.300
Swap is no longer running.

01:55:22.300 --> 01:55:23.743
But these values stay there.

01:55:23.743 --> 01:55:24.910
So those are garbage values.

01:55:24.910 --> 01:55:27.850
We happen to know what they are,
but they're no longer valid,

01:55:27.850 --> 01:55:32.560
because when I go to print out x and y
for the second time, what are x and y?

01:55:32.560 --> 01:55:33.820
They're still the same.

01:55:33.820 --> 01:55:37.870
And so this is to say, when you
actually write code that takes arguments

01:55:37.870 --> 01:55:40.750
and you pass arguments from
one function to another,

01:55:40.750 --> 01:55:43.930
those arguments are copied
from one function to another.

01:55:43.930 --> 01:55:47.140
And indeed, x and y are
copied into a and b.

01:55:47.140 --> 01:55:51.670
So your code may very well look correct
in that it's swapping correctly.

01:55:51.670 --> 01:55:55.750
But it's only swapping correctly
in the context of swap,

01:55:55.750 --> 01:55:58.370
not touching the original values.

01:55:58.370 --> 01:56:00.730
So what I think we need
to do, fundamentally,

01:56:00.730 --> 01:56:06.130
is reimplement swap in
such a way that we actually

01:56:06.130 --> 01:56:10.450
change the values of x and y.

01:56:10.450 --> 01:56:11.500
But how can we do this?

01:56:11.500 --> 01:56:13.810
Brian, if we could call in someone here.

01:56:13.810 --> 01:56:18.340
How could I conceptually change
my implementation of swap

01:56:18.340 --> 01:56:26.110
so that it somehow empowers me to change
x and y, not change copies of x and y?

01:56:26.110 --> 01:56:28.570
What could I pass into swap, Brian?

01:56:28.570 --> 01:56:31.150
BRIAN: Igor is suggesting
that we use pointers instead.

01:56:31.150 --> 01:56:33.733
DAVID MALAN: Yeah, so perhaps
the leading question here today.

01:56:33.733 --> 01:56:36.010
But pointers would seem
to give us a solution.

01:56:36.010 --> 01:56:38.170
If pointers are
essentially like a treasure

01:56:38.170 --> 01:56:41.500
map to a specific address in your
computer's memory, what I should really

01:56:41.500 --> 01:56:45.940
do from main to swap is pass
in not x and y literally,

01:56:45.940 --> 01:56:49.630
but why don't I pass in the
address of x and the address of y,

01:56:49.630 --> 01:56:53.230
so that swap can now
go to those addresses

01:56:53.230 --> 01:56:57.460
and actually do the sort of swap
that Brian enacted in person.

01:56:57.460 --> 01:57:02.050
So give the function a sort of map to
those values, pointers to those values,

01:57:02.050 --> 01:57:03.560
and then go to those values.

01:57:03.560 --> 01:57:04.580
So how might I do this?

01:57:04.580 --> 01:57:06.580
Well, the code has to be
a little different now.

01:57:06.580 --> 01:57:09.640
When I call swap this time,
what I really need to do

01:57:09.640 --> 01:57:12.710
is pass in the addresses
of these two variables.

01:57:12.710 --> 01:57:14.950
So I don't necessarily know
what those addresses are.

01:57:14.950 --> 01:57:16.900
But for the sake of
the story, we can just

01:57:16.900 --> 01:57:21.340
assume that this address,
for instance, is like, 0x123.

01:57:21.340 --> 01:57:25.142
And then four bytes away from
that might be 0x127, for instance.

01:57:25.142 --> 01:57:27.100
But again, it doesn't
really matter what it is.

01:57:27.100 --> 01:57:29.440
But they do have addresses, x and y.

01:57:29.440 --> 01:57:31.562
So a pointer, recall,
tends to be pretty big.

01:57:31.562 --> 01:57:33.520
So we needed to get out
a bigger piece of wood,

01:57:33.520 --> 01:57:35.590
eight bytes that represent a pointer.

01:57:35.590 --> 01:57:38.830
And I actually need to use a
bit more memory in swap now.

01:57:38.830 --> 01:57:42.490
If I now declare a to
be, not an integer,

01:57:42.490 --> 01:57:47.020
but a pointer to an int,
that is, an int star variable,

01:57:47.020 --> 01:57:49.330
I could call this thing a now.

01:57:49.330 --> 01:57:54.340
And I could store, in it,
the address of x, like 0x123.

01:57:54.340 --> 01:57:57.640
If I then change the
definition of b to be

01:57:57.640 --> 01:58:01.390
not an integer, but a
pointer to an integer,

01:58:01.390 --> 01:58:04.810
that is another int star, which
happens to be eight bytes.

01:58:04.810 --> 01:58:07.780
I'm going to use a little more
memory for this thing, but that's OK.

01:58:07.780 --> 01:58:10.030
And its name is going to be b now.

01:58:10.030 --> 01:58:13.600
And it's going to contain 0x127.

01:58:13.600 --> 01:58:15.820
I still need a temporary variable.

01:58:15.820 --> 01:58:18.650
I still need a temporary
variable, but that's fine.

01:58:18.650 --> 01:58:20.980
I just need four bytes for
that, because the variable

01:58:20.980 --> 01:58:25.990
itself just needs to store an int, like
Brian temporarily stored it in a glass.

01:58:25.990 --> 01:58:29.260
So I just need an additional four
bytes, like before, for that.

01:58:29.260 --> 01:58:31.720
And now, let's just consider the logic.

01:58:31.720 --> 01:58:32.710
Here's main.

01:58:32.710 --> 01:58:34.990
And swap is now using these 3--

01:58:34.990 --> 01:58:36.550
2 and 1/2 rows of memory.

01:58:36.550 --> 01:58:37.240
And that's fine.

01:58:37.240 --> 01:58:39.640
It's growing upward as I proposed.

01:58:39.640 --> 01:58:41.860
x is at address 0x123.

01:58:41.860 --> 01:58:44.560
y is at address 0x127.

01:58:44.560 --> 01:58:48.370
Therefore, a and b, I propose
conceptually, like Igor proposed,

01:58:48.370 --> 01:58:52.280
store the addresses of
x and y, respectively.

01:58:52.280 --> 01:58:55.060
And now my code, I
think, needs to say this.

01:58:55.060 --> 01:59:00.025
Go and store, in the variable
tmp, whatever is at the address in a.

01:59:00.025 --> 01:59:02.650
So you can kind of think of this
as being an arrow down here.

01:59:02.650 --> 01:59:03.910
Follow the arrow, OK.

01:59:03.910 --> 01:59:06.010
What is at address 0x123?

01:59:06.010 --> 01:59:06.910
The number 1.

01:59:06.910 --> 01:59:09.250
So we put 1 in tmp, just like before.

01:59:09.250 --> 01:59:10.310
Then what do we do?

01:59:10.310 --> 01:59:13.540
Well, now, I'm going to go ahead
and change, not the value of a,

01:59:13.540 --> 01:59:18.010
but I'm going to change what
is at the location in a to be

01:59:18.010 --> 01:59:24.790
whatever is at the location in b, which
is an arrow pointing down here, 0x127.

01:59:24.790 --> 01:59:27.850
So I'm going to change
this 1, now, to be a 2.

01:59:27.850 --> 01:59:30.910
And the third and final
step, recall, is for me, now,

01:59:30.910 --> 01:59:37.150
to go, not to b, but to go where b
points to, which happens to be y,

01:59:37.150 --> 01:59:42.440
and change that to be the value of
tmp, which of course, is up here.

01:59:42.440 --> 01:59:45.430
And at this point in the story,
it's still just three lines of code.

01:59:45.430 --> 01:59:47.380
They're different
types of lines of code.

01:59:47.380 --> 01:59:48.950
It's three lines of code.

01:59:48.950 --> 01:59:52.180
But when swap is done executing,
notice what we've done.

01:59:52.180 --> 01:59:55.190
We have successfully
swapped x and y by letting

01:59:55.190 --> 01:59:59.270
swap go to those addresses as
opposed to just naively getting

01:59:59.270 --> 02:00:02.180
copies of the values therein.

02:00:02.180 --> 02:00:05.150
Now, even though this code is
going to look a little cryptic,

02:00:05.150 --> 02:00:10.820
it's, frankly, just an application
of the logic we've seen thus far.

02:00:10.820 --> 02:00:13.860
I'm going to go ahead and go
back to my old buggy version.

02:00:13.860 --> 02:00:15.860
And I'm going to change
the definition of swap

02:00:15.860 --> 02:00:19.190
to say that it doesn't take
two integers, a and b, but two

02:00:19.190 --> 02:00:20.810
pointers to integers a and b.

02:00:20.810 --> 02:00:24.080
And the way you declare a pointer,
recall, is the type of variable

02:00:24.080 --> 02:00:26.767
you point at followed by a
star and then the name of it.

02:00:26.767 --> 02:00:28.850
And we haven't seen it,
admittedly, in the context

02:00:28.850 --> 02:00:31.550
of a function taking parameters yet.

02:00:31.550 --> 02:00:33.170
But it's quite simply that.

02:00:33.170 --> 02:00:34.610
I added the stars.

02:00:34.610 --> 02:00:40.040
Down here, I need to say,
store in tmp, whatever is at a.

02:00:40.040 --> 02:00:41.870
How do I express go to a?

02:00:41.870 --> 02:00:43.520
Just add a star here.

02:00:43.520 --> 02:00:46.880
How do I express go to a
and put whatever is at b?

02:00:46.880 --> 02:00:48.500
I add stars there.

02:00:48.500 --> 02:00:51.560
How do I say, go to b and
store whatever is at tmp?

02:00:51.560 --> 02:00:53.190
I add one star there.

02:00:53.190 --> 02:00:55.520
So tmp is just a simple integer.

02:00:55.520 --> 02:00:57.380
It's just an empty glass like Brian had.

02:00:57.380 --> 02:00:58.620
There's nothing fancy there.

02:00:58.620 --> 02:01:00.650
So we don't need stars around tmp.

02:01:00.650 --> 02:01:04.970
But I do, now, need to
change how I'm using a and b,

02:01:04.970 --> 02:01:08.330
because now they are addresses
that I actually want to go to.

02:01:08.330 --> 02:01:12.140
There's no need for the address
of operator in this context.

02:01:12.140 --> 02:01:14.330
But up here, I'm going
to need to make a change.

02:01:14.330 --> 02:01:16.380
I do need to change
the prototype to match.

02:01:16.380 --> 02:01:18.200
So that's just a copy paste.

02:01:18.200 --> 02:01:23.120
But I bet you can imagine
what, lastly, needs to change.

02:01:23.120 --> 02:01:26.750
When calling swap, I don't want to
pass in naively x and y, because again,

02:01:26.750 --> 02:01:27.980
they're going to get copied.

02:01:27.980 --> 02:01:32.000
I want to pass in the address
of x and the address of y,

02:01:32.000 --> 02:01:35.690
so that swap now has
sort of special access

02:01:35.690 --> 02:01:38.750
to the contents of those
locations in memory

02:01:38.750 --> 02:01:42.740
so that it actually can
make some changes therein.

02:01:42.740 --> 02:01:47.780
And that, indeed, if I now recompile
this program, make swap, and I do

02:01:47.780 --> 02:01:50.390
dot slash swap and cross my fingers, voila.

02:01:50.390 --> 02:01:53.855
Now, I have successfully
swapped x and y.

02:01:53.855 --> 02:01:55.730
So last week, if you
were wondering, perhaps,

02:01:55.730 --> 02:01:58.250
why we didn't show you how
to do swap, we could have.

02:01:58.250 --> 02:01:59.900
And we didn't need a special function.

02:01:59.900 --> 02:02:03.200
You don't necessarily need pointers
if we did all of this in main.

02:02:03.200 --> 02:02:06.470
But I'm trying to introduce an
abstraction, this function that

02:02:06.470 --> 02:02:09.740
does swap just like Brian
swapped those glasses for us.

02:02:09.740 --> 02:02:12.650
And to pass values from
one function to another,

02:02:12.650 --> 02:02:15.990
you do need to understand what's
going on in your computer's memory

02:02:15.990 --> 02:02:18.830
so that you can actually pass
in little breadcrumbs again,

02:02:18.830 --> 02:02:23.330
treasure maps to those locations in
memory, again, thanks to these things

02:02:23.330 --> 02:02:25.100
called pointers.

02:02:25.100 --> 02:02:27.770
All right, well let me
propose and emphasize,

02:02:27.770 --> 02:02:30.770
then, that this design of
the heap being up at the top,

02:02:30.770 --> 02:02:33.200
where malloc uses memory
and the stack being

02:02:33.200 --> 02:02:35.540
at the bottom where your
own functions use memory,

02:02:35.540 --> 02:02:37.730
this is a problem clearly
waiting to happen.

02:02:37.730 --> 02:02:39.460
And those problems actually have names.

02:02:39.460 --> 02:02:41.210
And some of you who
have programmed before

02:02:41.210 --> 02:02:45.230
might know some of these terms, either
heap overflow or stack overflow.

02:02:45.230 --> 02:02:48.650
And in fact, many of you might know
stackoverflow.com as just a website.

02:02:48.650 --> 02:02:50.840
Well, there is an origin
story to its name.

02:02:50.840 --> 02:02:56.240
A stack overflow refers to the process
of calling a function so many times

02:02:56.240 --> 02:02:58.550
that the stack overflows into the heap.

02:02:58.550 --> 02:03:00.320
That is, every time
you call the function,

02:03:00.320 --> 02:03:04.950
like I did here, you use more and
more rows, so to speak, of memory.

02:03:04.950 --> 02:03:07.730
And if you call so many
functions again and again,

02:03:07.730 --> 02:03:11.690
eventually, you may very well run
over the area of memory called the heap.

02:03:11.690 --> 02:03:14.090
And at that point, your
program will crash.

02:03:14.090 --> 02:03:18.950
There is no fundamental solution to
that problem other than don't do that.

02:03:18.950 --> 02:03:20.420
Don't use too much memory.

02:03:20.420 --> 02:03:21.680
But that can be hard to do.

02:03:21.680 --> 02:03:24.138
And indeed, that's one of the
dangers of programming today.

02:03:24.138 --> 02:03:27.800
And we can actually induce this a
little bit deliberately ourselves.

02:03:27.800 --> 02:03:30.620
And in fact, I thought we
could revisit, for instance,

02:03:30.620 --> 02:03:34.220
where we left off with Mario last
time, which was this picture here.

02:03:34.220 --> 02:03:37.580
Recall that this was
a pyramid, of course,

02:03:37.580 --> 02:03:40.400
simpler than the one you might
have played with for problem set 0.

02:03:40.400 --> 02:03:44.360
But it's a recursive pyramid in that
you can define a pyramid of height 4,

02:03:44.360 --> 02:03:47.690
in terms of a pyramid of height 3,
in terms of a pyramid of height 2

02:03:47.690 --> 02:03:48.380
and a height 1.

02:03:48.380 --> 02:03:52.580
And indeed, I built that last
week using these very blocks.

02:03:52.580 --> 02:03:56.180
Well, you can implement
Mario's pyramid like this

02:03:56.180 --> 02:03:57.660
in a couple of different ways.

02:03:57.660 --> 02:04:01.160
One is just using week one
style iteration, using a loop.

02:04:01.160 --> 02:04:03.890
And in fact, let me go ahead and
whip up a quick solution that

02:04:03.890 --> 02:04:05.340
does exactly that.

02:04:05.340 --> 02:04:07.730
Let me go ahead and call this mario.c.

02:04:07.730 --> 02:04:10.610
And I'm going to go
ahead and include cs50.h.

02:04:10.610 --> 02:04:12.290
So we can use one of our get functions.

02:04:12.290 --> 02:04:14.300
I'm going to use standard io dot h.

02:04:14.300 --> 02:04:16.160
And I'm going to do int main void.

02:04:16.160 --> 02:04:18.590
And all I want to do is
print out this pyramid.

02:04:18.590 --> 02:04:20.340
But I want to ask the
user for the height.

02:04:20.340 --> 02:04:23.090
So I'm going to say int
height equals get int.

02:04:23.090 --> 02:04:26.870
And we'll ask the user for the height,
just like you did for problem set 1.

02:04:26.870 --> 02:04:30.000
And then I'm going to go ahead
and draw a pyramid of that height.

02:04:30.000 --> 02:04:31.340
Now, draw doesn't exist.

02:04:31.340 --> 02:04:32.030
But that's fine.

02:04:32.030 --> 02:04:34.735
I'm going to go ahead and draw
this now, implement draw myself.

02:04:34.735 --> 02:04:36.860
It doesn't need to return
a value, because I'm just

02:04:36.860 --> 02:04:38.273
printing stuff on the screen.

02:04:38.273 --> 02:04:40.190
Function's called draw,
and it's going to take

02:04:40.190 --> 02:04:42.710
an input called h, for
instance. h for height,

02:04:42.710 --> 02:04:45.080
but I could call its
argument anything I want.

02:04:45.080 --> 02:04:48.650
And then I'm just going to
do this, for int i gets 1,

02:04:48.650 --> 02:04:52.850
i less than or equal to h, i++.

02:04:52.850 --> 02:04:56.170
And then inside of this, this is where,
as you might recall from problem set 1,

02:04:56.170 --> 02:04:58.700
you might have found a nested loop to be useful.

02:04:58.700 --> 02:05:04.150
Let me do int j gets 1, j
less than or equal to i, j++.

02:05:04.150 --> 02:05:08.178
This will be similar but not identical
to either the less comfortable or more

02:05:08.178 --> 02:05:09.970
comfortable version of
Mario from the past,

02:05:09.970 --> 02:05:13.240
because this pyramid is shaped
in a different direction.

02:05:13.240 --> 02:05:15.610
Now, you print a hash there.

02:05:15.610 --> 02:05:17.830
And then let me go ahead
and print a new line here.

02:05:17.830 --> 02:05:19.570
So I did this super quickly.

02:05:19.570 --> 02:05:21.880
But logically, what
I'm doing is iterating

02:05:21.880 --> 02:05:29.710
over every row, so from 1 through
h, so row 1, 2, 3, 4, for instance.

02:05:29.710 --> 02:05:34.210
And then on each row, I'm deliberately
iterating from 1 through i.

02:05:34.210 --> 02:05:37.870
So I print 1, then 2, then 3, then 4.

02:05:37.870 --> 02:05:39.640
And again, I could zero index if I want.

02:05:39.640 --> 02:05:44.170
I find that, in this context, it's more
user-friendly, more intelligible to me,

02:05:44.170 --> 02:05:46.660
to index from 1, which is totally
reasonable if you think

02:05:46.660 --> 02:05:48.310
there's a compelling design argument.

02:05:48.310 --> 02:05:50.030
So let me go ahead and make Mario.

02:05:50.030 --> 02:05:51.520
Ah, darn it.

02:05:51.520 --> 02:05:53.980
Oh, I missed my prototype.

02:05:53.980 --> 02:05:55.870
So notice, it's not understanding draw.

02:05:55.870 --> 02:05:58.900
So the fix for that is to
either move the whole function

02:05:58.900 --> 02:06:02.980
or, as we've preached instead, to
just put your prototype up top.

02:06:02.980 --> 02:06:05.050
Let me recompile Mario.

02:06:05.050 --> 02:06:06.430
OK, now successful.

02:06:06.430 --> 02:06:08.710
Mario, let's do a
height of 4, and voila.

02:06:08.710 --> 02:06:11.350
Now, I have a relatively
simple-- though I certainly

02:06:11.350 --> 02:06:13.600
did it faster than you might
without some practice--

02:06:13.600 --> 02:06:15.760
implementation of Mario's pyramid.

02:06:15.760 --> 02:06:17.980
But here's where things
get kind of cool.

02:06:17.980 --> 02:06:20.800
Let me stipulate that that is a
correct iterative solution, even

02:06:20.800 --> 02:06:24.970
if it might take you some number
of steps or trial and error

02:06:24.970 --> 02:06:28.180
to get that iterative
loop-based code correct.

02:06:28.180 --> 02:06:30.580
Let me change this,
now, to be recursive.

02:06:30.580 --> 02:06:34.510
And recall, a recursive function
is one that calls itself.

02:06:34.510 --> 02:06:37.660
How do you print a pyramid of height h?

02:06:37.660 --> 02:06:41.980
Well, recall that you print a
pyramid of height h minus 1,

02:06:41.980 --> 02:06:45.340
and then you proceed to
print one more row of blocks.

02:06:45.340 --> 02:06:48.970
So let me take that literally.
For int i gets 0.

02:06:48.970 --> 02:06:51.550
i is less than h, i++.

02:06:51.550 --> 02:06:54.550
Let me go ahead and just
print that extra row of bricks

02:06:54.550 --> 02:06:58.480
like this, followed by a new line.

02:06:58.480 --> 02:07:00.260
So now, I did this kind of fast.

02:07:00.260 --> 02:07:01.340
But what am I doing here?

02:07:01.340 --> 02:07:06.520
Well, if the height equals 1, I
want this loop to iterate one time.

02:07:06.520 --> 02:07:10.760
If the height equals 2, I want it to
iterate two times, 3, and so forth.

02:07:10.760 --> 02:07:14.260
So I think, using my zero-indexing
technique here, this will work too.

02:07:14.260 --> 02:07:17.080
But if you prefer, I could
certainly just change this to a 1

02:07:17.080 --> 02:07:18.638
and change this too.

02:07:18.638 --> 02:07:19.930
But I'm going to go ahead and--

02:07:19.930 --> 02:07:20.500
actually, no.

02:07:20.500 --> 02:07:23.350
In this case, I want to
leave it as such, zero index,

02:07:23.350 --> 02:07:25.450
just like we typically do.

02:07:25.450 --> 02:07:29.200
All right, let me go ahead
and compile this, make Mario.

02:07:29.200 --> 02:07:31.870
OK, oops, interesting.

02:07:31.870 --> 02:07:34.940
All paths through this
function will call itself.

02:07:34.940 --> 02:07:37.780
So clang is being kind
of smart here, whereby,

02:07:37.780 --> 02:07:42.260
it's noticing that in my draw
function, I'm calling my draw function.

02:07:42.260 --> 02:07:44.358
And that's a process that never changes.

02:07:44.358 --> 02:07:46.150
In fact, let me see if
I can override that.

02:07:46.150 --> 02:07:51.310
Let me use clang manually and compile
a program called mario using mario.c.

02:07:51.310 --> 02:07:53.140
And let me go ahead and link in cs50.

02:07:53.140 --> 02:07:55.960
So I'm using our old school
syntax from week two.

02:07:55.960 --> 02:07:56.980
OK, that compiled.

02:07:56.980 --> 02:07:58.270
And why did that compile?

02:07:58.270 --> 02:08:01.872
Well, make is, again, a program
that uses your compiler clang.

02:08:01.872 --> 02:08:05.080
And we've configured make to be a little
more user-friendly and a little more

02:08:05.080 --> 02:08:07.450
protective of you by
turning on special features

02:08:07.450 --> 02:08:09.250
where we detect problems like that.

02:08:09.250 --> 02:08:12.730
By using clang directly now, I'm
disabling those special checks.

02:08:12.730 --> 02:08:16.840
And watch what happens when I run Mario
now for height of 4, for instance.

02:08:16.840 --> 02:08:18.730
Boom, it crashed.

02:08:18.730 --> 02:08:20.500
It didn't even print anything.

02:08:20.500 --> 02:08:21.953
It crashed pretty quickly.

02:08:21.953 --> 02:08:25.120
And again, a segmentation fault means
you touched memory that you shouldn't.

02:08:25.120 --> 02:08:26.200
So what's going on?

02:08:26.200 --> 02:08:30.302
Well, if you think of this memory as
representing main still, but then draw,

02:08:30.302 --> 02:08:33.610
draw, draw, draw, draw, draw.

02:08:33.610 --> 02:08:37.540
If every one of your calls to
draw just calls draw again,

02:08:37.540 --> 02:08:39.070
why would it ever stop?

02:08:39.070 --> 02:08:41.590
It wouldn't seem to
stop here, necessarily.

02:08:41.590 --> 02:08:45.070
So it seems that I'm missing a key
detail in my recursive version.

02:08:45.070 --> 02:08:45.670
You know what?

02:08:45.670 --> 02:08:51.130
If there's nothing to draw, if height
equals equals 0, let me go ahead, then,

02:08:51.130 --> 02:08:54.260
and just return immediately.

02:08:54.260 --> 02:08:57.250
Otherwise, I'll go ahead
and draw part of the pyramid

02:08:57.250 --> 02:08:59.260
and then add the new row.

02:08:59.260 --> 02:09:02.110
So you need this so-called
base case, which you literally

02:09:02.110 --> 02:09:05.410
choose to equal some simple value,
like height of 0, height of 1,

02:09:05.410 --> 02:09:10.880
any hardcoded value, so that
eventually, draw does not call itself.

02:09:10.880 --> 02:09:15.040
So let me go ahead and recompile
this with clang or make.

02:09:15.040 --> 02:09:18.430
Let me rerun it, height of 4, and voila.

02:09:18.430 --> 02:09:20.680
It's still working just
like the iterative version,

02:09:20.680 --> 02:09:22.340
but it's now using recursion.

02:09:22.340 --> 02:09:24.250
So here's a sort of design question.

02:09:24.250 --> 02:09:26.020
Is iteration better than recursion?

02:09:26.020 --> 02:09:26.680
It depends.

02:09:26.680 --> 02:09:28.270
Iteration will always work.

02:09:28.270 --> 02:09:32.290
When using the iterative version,
I will never overflow the stack

02:09:32.290 --> 02:09:33.140
and hit the heap.

02:09:33.140 --> 02:09:33.640
Why?

02:09:33.640 --> 02:09:35.723
Because I'm not calling
functions again and again.

02:09:35.723 --> 02:09:38.410
There's only main and
one invocation of draw.

02:09:38.410 --> 02:09:42.550
But with the recursive version,
it's kind of a cool, powerful way

02:09:42.550 --> 02:09:43.270
to do things.

02:09:43.270 --> 02:09:45.610
Like, oh, I can draw you
a pyramid of height h.

02:09:45.610 --> 02:09:48.370
Let me just have you draw me
a pyramid of height h minus 1,

02:09:48.370 --> 02:09:49.750
and then I'll add a row.

02:09:49.750 --> 02:09:54.950
It's kind of this clever, cyclical
argument that does work very elegantly.

02:09:54.950 --> 02:09:56.150
But there's a danger.

02:09:56.150 --> 02:10:00.830
And in fact, even though this base case
ensures that it doesn't go forever,

02:10:00.830 --> 02:10:05.180
it could go on so long-- maybe
let's try 10,000 invocations.

02:10:05.180 --> 02:10:06.290
So that worked OK.

02:10:06.290 --> 02:10:07.820
It's a little slow.

02:10:07.820 --> 02:10:09.320
I'm losing control over my keyboard.

02:10:09.320 --> 02:10:10.730
So Control C is your friend.

02:10:10.730 --> 02:10:12.050
Let me try this once more.

02:10:12.050 --> 02:10:16.700
Let me go ahead and do something
like 2 billion and see if that works.

02:10:16.700 --> 02:10:17.540
Boom.

02:10:17.540 --> 02:10:19.110
So even that doesn't work.

02:10:19.110 --> 02:10:21.710
So there's this inherent
danger with recursion, whereby,

02:10:21.710 --> 02:10:25.010
even though it empowered us last week
to solve a problem even more efficiently

02:10:25.010 --> 02:10:29.810
with merge sort, we kind of got lucky,
in that we weren't trying to sort crazy big

02:10:29.810 --> 02:10:33.080
things on Brian's shelf, because
it would seem if you use recursion

02:10:33.080 --> 02:10:35.330
and call yourself again and
again and again and again,

02:10:35.330 --> 02:10:40.340
even finitely many times, you might
eventually touch memory you shouldn't.

02:10:40.340 --> 02:10:42.290
And what's the solution here?

02:10:42.290 --> 02:10:44.510
Unfortunately, it's don't do that.

02:10:44.510 --> 02:10:48.020
Design your algorithms, choose
your inputs in such a way

02:10:48.020 --> 02:10:49.560
that there just isn't that risk.

02:10:49.560 --> 02:10:51.800
And we'll use recursion
again in a few weeks'

02:10:51.800 --> 02:10:54.800
time when we look at more
sophisticated data structures.

02:10:54.800 --> 02:10:56.600
But again, there's
always this trade off.

02:10:56.600 --> 02:10:58.725
Just because you can design
something a little more

02:10:58.725 --> 02:11:03.120
elegantly doesn't necessarily mean
that it's always going to work for you.

02:11:03.120 --> 02:11:06.560
But more commonly, you're likely
to run into other problems as well.

02:11:06.560 --> 02:11:08.540
There's something called
a buffer overflow.

02:11:08.540 --> 02:11:10.880
And this you will surely trip
over in the coming weeks.

02:11:10.880 --> 02:11:13.610
A buffer overflow is when
you allocate an array

02:11:13.610 --> 02:11:15.590
and go too far past the end of it.

02:11:15.590 --> 02:11:18.650
Or you use malloc and you,
nonetheless, go farther

02:11:18.650 --> 02:11:21.020
than the end of the chunk of
memory that you allocated.

02:11:21.020 --> 02:11:25.010
A buffer is just a chunk of memory,
so to speak, that you can use as you see

02:11:25.010 --> 02:11:25.550
fit.

02:11:25.550 --> 02:11:30.230
Buffer overflow means going beyond
the boundaries of that array.

02:11:30.230 --> 02:11:32.930
You might use-- you're
using, right now, video.

02:11:32.930 --> 02:11:35.125
You might know the phrase
buffering from videos,

02:11:35.125 --> 02:11:37.250
like sort of buffering and
annoying you on Netflix,

02:11:37.250 --> 02:11:39.050
because there's a
spinning icon or whatnot.

02:11:39.050 --> 02:11:40.700
Well, that means exactly this.

02:11:40.700 --> 02:11:44.090
A buffer, in the context of
YouTube or Zoom or Netflix,

02:11:44.090 --> 02:11:46.910
means some chunk of
memory that was retrieved

02:11:46.910 --> 02:11:49.880
via malloc or some similar
tool that gets filled

02:11:49.880 --> 02:11:52.580
with bytes comprising your video.

02:11:52.580 --> 02:11:56.210
And it's finite, which is why you
can only buffer so many seconds

02:11:56.210 --> 02:11:59.520
or minutes of video before,
eventually, if you're offline,

02:11:59.520 --> 02:12:01.220
you run out of video content to watch.

02:12:01.220 --> 02:12:02.930
And the stupid icon
comes up, and you can

02:12:02.930 --> 02:12:07.680
watch no more, because a buffer is just
a chunk of memory, an array of memory.

02:12:07.680 --> 02:12:12.830
And if Netflix or Google or others
were to implement their code unsafely,

02:12:12.830 --> 02:12:16.740
they might very well go too
far past that boundary as well.

02:12:16.740 --> 02:12:22.070
So with all this said, let's
consider, in some of our final minutes

02:12:22.070 --> 02:12:26.000
here today, just what else we've been
getting from these training wheels,

02:12:26.000 --> 02:12:28.830
because we do want to take
them mostly off for you.

02:12:28.830 --> 02:12:30.890
So the CS50 library
not only provides you

02:12:30.890 --> 02:12:33.855
with this abstraction of a
string type, which again,

02:12:33.855 --> 02:12:35.480
doesn't give you any new functionality.

02:12:35.480 --> 02:12:38.600
Strings in C exist,
just not by that name.

02:12:38.600 --> 02:12:40.850
They're known more
properly as char stars.

02:12:40.850 --> 02:12:43.730
But all of these functions
in the CS50 library

02:12:43.730 --> 02:12:49.490
can be implemented with other actual
C functions that weren't from CS50,

02:12:49.490 --> 02:12:51.740
namely using one called scanf.

02:12:51.740 --> 02:12:54.260
But you're going to see,
immediately, some of the dangers

02:12:54.260 --> 02:12:57.980
of using something like scanf,
which is an old school function.

02:12:57.980 --> 02:13:01.280
It was not designed to be
self-defensive like CS50's library.

02:13:01.280 --> 02:13:03.510
And so it's very easy to make mistakes.

02:13:03.510 --> 02:13:06.650
Let me go ahead, for
instance, and create a file

02:13:06.650 --> 02:13:09.860
called scanf.c, just to
demonstrate this function.

02:13:09.860 --> 02:13:13.200
I'm not going to use the CS50
library, just standard io dot h.

02:13:13.200 --> 02:13:15.470
And I'm going to give
myself int main void.

02:13:15.470 --> 02:13:18.110
And I'm going to go ahead
and give myself a variable x.

02:13:18.110 --> 02:13:21.260
And I'm going to go ahead and
print out quote unquote, "x:"

02:13:21.260 --> 02:13:24.060
just like CS50's get int function does.

02:13:24.060 --> 02:13:25.940
And then I'm going to call scanf.

02:13:25.940 --> 02:13:30.170
And I'm going to go ahead and say, scan
from the user's keyboard, an integer,

02:13:30.170 --> 02:13:33.708
and store it in the location of x.

02:13:33.708 --> 02:13:35.750
Then, I'm going to go
ahead and print out, again,

02:13:35.750 --> 02:13:40.340
x, and a colon and a
percent i backslash n.

02:13:40.340 --> 02:13:41.420
And I'm going to print x.

02:13:41.420 --> 02:13:42.830
So what's going on here?

02:13:42.830 --> 02:13:46.580
In line 5, I'm declaring a variable
called x, just like in week one.

02:13:46.580 --> 02:13:49.220
Line 6, just using
printf, like in week one.

02:13:49.220 --> 02:13:52.460
The interesting stuff
seems to be in line 7.

02:13:52.460 --> 02:13:56.870
Scanf is a function that takes input
from the user, just like get int, get

02:13:56.870 --> 02:13:58.500
string, get float, and so forth.

02:13:58.500 --> 02:14:02.630
But it does it only by you
having to understand pointers,

02:14:02.630 --> 02:14:07.790
because recall from our swap example,
if you want to have a function

02:14:07.790 --> 02:14:12.110
change the contents of a
variable, as we did with a and b

02:14:12.110 --> 02:14:15.920
and x and y, you have to pass in
the address of the variable, whose

02:14:15.920 --> 02:14:17.060
value you want to change.

02:14:17.060 --> 02:14:19.200
You can't just pass in x itself.

02:14:19.200 --> 02:14:22.263
So if we didn't use the
CS50 library in week one,

02:14:22.263 --> 02:14:25.430
you would have been writing code like
this just to get an int from the user.

02:14:25.430 --> 02:14:27.347
And you would have had
to understand pointers.

02:14:27.347 --> 02:14:30.170
And you would have to understand
ampersand and stars and so forth.

02:14:30.170 --> 02:14:32.712
It's just too much, when all we
care about in the first weeks

02:14:32.712 --> 02:14:35.990
are loops and variables and conditions
and sort of the fundamentals.

02:14:35.990 --> 02:14:39.230
But here, we now have the
ability to call scanf, tell it

02:14:39.230 --> 02:14:41.150
to scan from the user's
keyboard, so to speak,

02:14:41.150 --> 02:14:45.380
an integer, or percent f would give
us a float or other such codes,

02:14:45.380 --> 02:14:49.040
and pass in the address of x so
that scanf can go to that address

02:14:49.040 --> 02:14:51.440
and put the integer from
the user's keyboard there.

02:14:51.440 --> 02:14:53.030
Line 8 is like week one stuff.

02:14:53.030 --> 02:14:54.680
I'm just printing out the value.

02:14:54.680 --> 02:14:55.950
And this is pretty safe.

02:14:55.950 --> 02:14:57.800
I'm going to go ahead and make scanf.

02:14:57.800 --> 02:14:58.495
It compiles OK.

02:14:58.495 --> 02:14:59.870
I'm going to go ahead and run it.

02:14:59.870 --> 02:15:00.980
I'm going to type in 50.

02:15:00.980 --> 02:15:03.180
And voila, it prints out a 50.

02:15:03.180 --> 02:15:06.920
But there's some weirdness,
because if you run this program too

02:15:06.920 --> 02:15:09.410
and type in cat, well then x is 0.

02:15:09.410 --> 02:15:10.940
And there's no error checking.

02:15:10.940 --> 02:15:12.767
So immediately, you
should glimpse that one

02:15:12.767 --> 02:15:14.600
of the features of the
CS50 library, recall,

02:15:14.600 --> 02:15:17.630
is that we keep prompting the user
again and again if they're not

02:15:17.630 --> 02:15:19.310
cooperating and giving you an int.

02:15:19.310 --> 02:15:21.740
So that's one feature
you get from the library.

02:15:21.740 --> 02:15:26.120
But it turns out that get
string is even more powerful,

02:15:26.120 --> 02:15:29.000
because if I go and change this
program now, not to get an int,

02:15:29.000 --> 02:15:30.710
but something fancier like a string--

02:15:30.710 --> 02:15:33.223
or wait, we're calling it char star now.

02:15:33.223 --> 02:15:35.390
I'm going to go ahead and
do something very similar.

02:15:35.390 --> 02:15:37.640
I'm going to prompt
the user for string s.

02:15:37.640 --> 02:15:39.020
And I'm going to use scanf.

02:15:39.020 --> 02:15:42.320
And I'm going to use percent s,
just like printf uses percent s.

02:15:42.320 --> 02:15:44.510
And I'm going to pass in s.

02:15:44.510 --> 02:15:48.890
Now, to be clear, I don't
need to do ampersand s here,

02:15:48.890 --> 02:15:53.010
because now, we all know that
s is fundamentally an address.

02:15:53.010 --> 02:15:56.270
So it suffices just to pass in
the address that you already have.

02:15:56.270 --> 02:16:01.280
Now, I'm going to go ahead and print
out s colon, percent s backslash n,

02:16:01.280 --> 02:16:02.930
and print out s.

02:16:02.930 --> 02:16:07.730
But when I compile this, make
scanf, it doesn't like it

02:16:07.730 --> 02:16:10.970
when I compile: variable 's' is
uninitialized when used here.

02:16:10.970 --> 02:16:14.390
All right, well if I really
want to be sort of adventurous,

02:16:14.390 --> 02:16:16.350
I can override make's protections.

02:16:16.350 --> 02:16:19.880
And I can just compile this
manually myself using scanf--

02:16:19.880 --> 02:16:21.260
using clang directly.

02:16:21.260 --> 02:16:23.600
That worked, dot slash scanf.

02:16:23.600 --> 02:16:26.870
Let me go ahead and type
in, for instance, "HI!"

02:16:26.870 --> 02:16:29.000
and you see weirdness: (null).

02:16:29.000 --> 02:16:31.190
Well, fortunately,
make, and in turn clang,

02:16:31.190 --> 02:16:33.830
were kind of helping us
help ourselves there.

02:16:33.830 --> 02:16:35.840
It was pointing out that you declared s.

02:16:35.840 --> 02:16:38.660
So you were given
8 bytes for a pointer.

02:16:38.660 --> 02:16:39.860
But there's nothing there.

02:16:39.860 --> 02:16:41.459
It's a garbage value.

02:16:41.459 --> 02:16:43.170
And so there's nowhere to put this.

02:16:43.170 --> 02:16:45.889
And thankfully, printf and
scanf are being smart enough

02:16:45.889 --> 02:16:48.870
by not just blindly going
there and plopping H, I,

02:16:48.870 --> 02:16:50.760
exclamation point, and a NUL character.

02:16:50.760 --> 02:16:52.010
They're just leaving it alone.

02:16:52.010 --> 02:16:55.910
And this parenthetical (null) is just a
printf feature saying, you screwed up.

02:16:55.910 --> 02:16:58.100
If you see (null), you've
done something wrong.

02:16:58.100 --> 02:17:00.830
It's just being generous
and not crashing on you.

02:17:00.830 --> 02:17:04.879
If I actually want to get user's
input, I need to be smarter than this.

02:17:04.879 --> 02:17:10.040
And I need to either allocate 4 bytes
for myself with malloc, as we've done earlier today.

02:17:10.040 --> 02:17:14.209
Or I could go back to week two stuff
and say something like char s[4]: give me 4 bytes.

02:17:14.209 --> 02:17:18.830
This, though, gives me 4
bytes on the stack somewhere

02:17:18.830 --> 02:17:21.410
down here in main's frame, so to speak.

02:17:21.410 --> 02:17:23.270
These rows are called frames.

02:17:23.270 --> 02:17:27.260
If I use malloc instead, it
comes from the so-called heap,

02:17:27.260 --> 02:17:29.780
which, not pictured, is sort of up here.

02:17:29.780 --> 02:17:34.309
And the only difference is that if
I'm using malloc, I have to use free.

02:17:34.309 --> 02:17:38.930
If I'm using the stack, as I did in
week two, I don't have to use free.

02:17:38.930 --> 02:17:40.730
It's automatically managed for me.

02:17:40.730 --> 02:17:42.590
So frankly, there's so
much new stuff today.

02:17:42.590 --> 02:17:46.280
I like the idea of sticking
with the old school arrays.

02:17:46.280 --> 02:17:51.379
So now, though, if I go ahead and
make scanf, now it compiles with make.

02:17:51.379 --> 02:17:55.610
If I then run scanf and type in,
HI!, voila, it seems to work.

02:17:55.610 --> 02:17:58.549
But that's because I was smart
and anticipated that H, I, !,

02:17:58.549 --> 02:17:59.660
plus NUL: OK, four characters.

02:17:59.660 --> 02:18:00.980
I gave myself 4 bytes.

02:18:00.980 --> 02:18:06.110
But what if the user types in,
HI THERE, DAVID, HOW ARE YOU?

02:18:06.110 --> 02:18:08.059
Clearly, more than four bytes.

02:18:08.059 --> 02:18:11.959
And when I hit Enter now, something
weird happens.

02:18:11.959 --> 02:18:13.790
The rest is just lost.

02:18:13.790 --> 02:18:16.670
And this would really be
annoying and very frustrating

02:18:16.670 --> 02:18:19.520
if you were trying to get user input
in the first week of the class.

02:18:19.520 --> 02:18:21.500
Get string avoids this for you.

02:18:21.500 --> 02:18:23.719
Get string calls malloc for you.

02:18:23.719 --> 02:18:27.200
And it calls it for as big a
chunk of memory as the string

02:18:27.200 --> 02:18:28.070
the human types in.

02:18:28.070 --> 02:18:30.980
Long story short, we sort of watch
what they're typing character

02:18:30.980 --> 02:18:32.209
by character by character.

02:18:32.209 --> 02:18:34.340
And we make sure to
allocate or reallocate

02:18:34.340 --> 02:18:38.879
just enough memory to fit whatever
it is the human has typed in.

02:18:38.879 --> 02:18:42.107
So scanf is, essentially, how
functions like those in the CS50 library

02:18:42.107 --> 02:18:43.190
work underneath the hood.

02:18:43.190 --> 02:18:46.650
But it is doing all of this for you.

02:18:46.650 --> 02:18:49.549
And as soon as you take away training
wheels like that, or frankly,

02:18:49.549 --> 02:18:52.469
libraries like that, which it
really is at the end of the day.

02:18:52.469 --> 02:18:53.719
It's not just a teaching tool.

02:18:53.719 --> 02:18:55.070
It's a useful library.

02:18:55.070 --> 02:18:58.469
You have to start implementing more
of this low-level stuff yourself.

02:18:58.469 --> 02:18:59.810
So again, there is a trade-off.

02:18:59.810 --> 02:19:02.727
If you don't want to use something
like the CS50 library, that's fine.

02:19:02.727 --> 02:19:08.400
Now, the onus is on you to avoid all
of these possible error conditions.

02:19:08.400 --> 02:19:11.209
All right, with that said,
we have one final feature

02:19:11.209 --> 02:19:14.270
to give you in order to motivate
this week's problems, wherein

02:19:14.270 --> 02:19:18.230
you'll actually explore and manipulate
and write code to change files.

02:19:18.230 --> 02:19:22.790
And for that, we need one final
topic of file I/O. File I/O

02:19:22.790 --> 02:19:27.350
is the term of art that describes
taking input and output from files.

02:19:27.350 --> 02:19:30.980
Pretty much every program we've written
thus far just uses memory, like this

02:19:30.980 --> 02:19:32.924
here, whereby, you can
put stuff in memory.

02:19:32.924 --> 02:19:34.549
But as soon as your program ends, boom.

02:19:34.549 --> 02:19:35.330
It's gone.

02:19:35.330 --> 02:19:37.070
The contents of memory are gone.

02:19:37.070 --> 02:19:39.770
Files, of course, are where you
and I in the computing world

02:19:39.770 --> 02:19:42.020
save our essays and
documents and resumes

02:19:42.020 --> 02:19:44.629
and all of that permanently
on your computer.

02:19:44.629 --> 02:19:48.590
In C, you have the ability,
certainly, to write code yourself that

02:19:48.590 --> 02:19:50.730
saves files long term.

02:19:50.730 --> 02:19:53.450
So for instance, let me go ahead
and write my own program here,

02:19:53.450 --> 02:19:59.260
a phonebook program that stores
names and numbers in a file.

02:19:59.260 --> 02:20:02.380
I'm going to go ahead and include,
just for convenience, the CS50 library

02:20:02.380 --> 02:20:04.480
again, because I don't
want to deal with scanf.

02:20:04.480 --> 02:20:08.200
I'm going to go ahead and save
this, incidentally, as phonebook.c.

02:20:08.200 --> 02:20:12.370
I'm going to go ahead and include, not
just the CS50 library, but stdio.h.

02:20:12.370 --> 02:20:18.373
And preemptively, I'm going to go
ahead and include string.h as well.

02:20:18.373 --> 02:20:20.290
And I'm going to go ahead
and start my main function.

02:20:20.290 --> 02:20:23.990
And I'm going to use a few new functions
that we'll see only briefly here.

02:20:23.990 --> 02:20:27.260
But in the next problem set,
you'll explore these in more detail.

02:20:27.260 --> 02:20:29.980
I'm going to give myself
a pointer to a file.

02:20:29.980 --> 02:20:33.820
It turns out, weirdly,
that in all caps, FILE,

02:20:33.820 --> 02:20:38.540
this is a new data type that does
come with C that represents a file.

02:20:38.540 --> 02:20:42.383
So I'm going to go ahead and
give myself a pointer to a file,

02:20:42.383 --> 02:20:43.300
the address of a file.

02:20:43.300 --> 02:20:44.800
And I'm going to call the variable file.

02:20:44.800 --> 02:20:46.300
I could call it f. I could call it x.

02:20:46.300 --> 02:20:49.130
I'm going to call it lowercase
file, just to be clear.

02:20:49.130 --> 02:20:52.180
And I'm going to use a new function
called fopen, which means file open.

02:20:52.180 --> 02:20:54.077
And fopen takes two arguments.

02:20:54.077 --> 02:20:57.160
It takes the first argument, which is
the name of a file you want to open.

02:20:57.160 --> 02:20:59.638
I'm going to open a file
called phonebook.csv.

02:20:59.638 --> 02:21:02.680
And then I'm going to go ahead and
open it, specifically, in append mode.

02:21:02.680 --> 02:21:05.050
Long story short, you can
open files in different ways,

02:21:05.050 --> 02:21:08.450
to read them, r, that is, just look
at their contents, to write them, w,

02:21:08.450 --> 02:21:10.780
which is to change
their contents entirely,

02:21:10.780 --> 02:21:15.730
or to append to them, a, which
means to add row by row to them,

02:21:15.730 --> 02:21:18.370
so to keep tacking on
more information to them.

02:21:18.370 --> 02:21:20.210
I'm going to go ahead
and, just to be safe,

02:21:20.210 --> 02:21:23.650
I'm going to say if
file equals equals NULL,

02:21:23.650 --> 02:21:26.180
because recall that NULL
signifies something went wrong,

02:21:26.180 --> 02:21:27.280
let's just return now.

02:21:27.280 --> 02:21:28.960
Maybe I mistyped the name of the file.

02:21:28.960 --> 02:21:29.950
Maybe it doesn't exist.

02:21:29.950 --> 02:21:31.420
Something went wrong, potentially.

02:21:31.420 --> 02:21:34.660
I'm going to check for that by saying,
if file equals equals NULL, just

02:21:34.660 --> 02:21:36.178
quit out of the program now.

02:21:36.178 --> 02:21:38.470
But after that, I'm going to
go ahead and get a string.

02:21:38.470 --> 02:21:41.920
But we can call that char
star now, called name.

02:21:41.920 --> 02:21:44.440
And I'm going to ask
the user for a name.

02:21:44.440 --> 02:21:45.820
And we've done this before.

02:21:45.820 --> 02:21:48.610
I'm going to go ahead and ask
them for a number, phone number.

02:21:48.610 --> 02:21:49.970
And we've done this before.

02:21:49.970 --> 02:21:52.690
The only difference, now, is
I'm calling string char star.

02:21:52.690 --> 02:21:54.400
And now, here's the cool part.

02:21:54.400 --> 02:21:56.830
It turns out, if I want to
save this name and number

02:21:56.830 --> 02:21:58.990
to that file permanently in a CSV--

02:21:58.990 --> 02:22:02.170
if unfamiliar, popular in the
consulting world, the analytics world.

02:22:02.170 --> 02:22:04.900
It's just a spreadsheet,
a comma-separated value

02:22:04.900 --> 02:22:08.470
file that you can open in Excel
or Numbers or Google Sheets.

02:22:08.470 --> 02:22:13.660
I'm going to go ahead and, not
printf, but fprintf to that file,

02:22:13.660 --> 02:22:18.580
a string followed by a comma, followed
by a string, followed by a new line,

02:22:18.580 --> 02:22:21.070
plugging in the name and the number.

02:22:21.070 --> 02:22:25.280
And then down here, I'm
going to close the file.

02:22:25.280 --> 02:22:28.570
So this is new. fprintf is not
printf, which prints to your screen.

02:22:28.570 --> 02:22:30.307
fprintf prints to a file.

02:22:30.307 --> 02:22:32.890
So you have to pass in one more
argument, the first one, which

02:22:32.890 --> 02:22:37.150
is the pointer to the file that you
want to send these new strings to.

02:22:37.150 --> 02:22:40.180
Then you still provide a format
string, which says, hey fprintf,

02:22:40.180 --> 02:22:43.060
this is the kind of data I
want to print to the file.

02:22:43.060 --> 02:22:46.930
And then you plug in the variables,
just like we've always done with printf.

02:22:46.930 --> 02:22:49.610
And then lastly, we close the file.

02:22:49.610 --> 02:22:53.200
So in short, this program would seem to
prompt a human for a name and number.

02:22:53.200 --> 02:22:55.420
And then it's going to go
ahead and write those names

02:22:55.420 --> 02:22:56.990
and numbers to the file.

02:22:56.990 --> 02:22:59.035
So let me go ahead and make phonebook.

02:22:59.035 --> 02:23:07.810
OK, no mistake so far, dot slash
phonebook, David, 949-468-2750.

02:23:07.810 --> 02:23:11.140
OK, let me run it once more, even
though nothing seems to have happened.

02:23:11.140 --> 02:23:15.730
Brian, how about 617-495-1000, Enter.

02:23:15.730 --> 02:23:17.950
Let me check my file browser here.

02:23:17.950 --> 02:23:22.240
Notice, all of the files we've created
today, including, if I zoom in,

02:23:22.240 --> 02:23:25.390
not just phonebook.c, but phonebook.csv.

02:23:25.390 --> 02:23:29.290
And if I double click that,
notice what's inside of this.

02:23:29.290 --> 02:23:33.700
Voila, David's name, Brian's
name, and each of our numbers.

02:23:33.700 --> 02:23:36.280
And even cooler than that, let
me go ahead and close this.

02:23:36.280 --> 02:23:40.213
Let me go ahead and download
this file using the IDE.

02:23:40.213 --> 02:23:42.380
And that's going to put it
into my Downloads folder.

02:23:42.380 --> 02:23:43.420
Let me go ahead and click on it.

02:23:43.420 --> 02:23:45.545
And it's going to open
Excel or Numbers or whatever

02:23:45.545 --> 02:23:47.470
you happen to have on your Mac or PC.

02:23:47.470 --> 02:23:50.740
I'm going to go ahead and just proceed.

02:23:50.740 --> 02:23:54.400
And voila, looks a little
stupid in this formatting here.

02:23:54.400 --> 02:23:57.160
But I've opened up a spreadsheet
that I, myself, generated

02:23:57.160 --> 02:24:01.390
using fopen, fprintf, and fclose.

02:24:01.390 --> 02:24:04.180
So already, now that we have
pointers at our disposal,

02:24:04.180 --> 02:24:08.292
we can actually manipulate things
like files, which is quite cool.

02:24:08.292 --> 02:24:10.000
But we're going to do
that this week, not

02:24:10.000 --> 02:24:12.940
with text, but with actual
specific types of files.

02:24:12.940 --> 02:24:16.840
And indeed, recall this
kind of thinking here.

02:24:16.840 --> 02:24:19.150
If you glance at this, it's
probably pretty cryptic.

02:24:19.150 --> 02:24:21.400
It looks like machine
code, but it's not.

02:24:21.400 --> 02:24:24.070
This is, perhaps, the
simplest representation

02:24:24.070 --> 02:24:26.410
of a smiley face inside of a file.

02:24:26.410 --> 02:24:31.000
If you have a bitmap file, a map of
bits, a grid of bits, those bits,

02:24:31.000 --> 02:24:33.130
quite simply, could
literally be 0's and 1's.

02:24:33.130 --> 02:24:37.240
And if you assign the color black
to 0 and the color white to 1,

02:24:37.240 --> 02:24:40.660
you could actually think of this same
grid of 0's and 1's as representing,

02:24:40.660 --> 02:24:41.930
indeed, a smiley face.

02:24:41.930 --> 02:24:43.690
In other words, here are some pixels.

02:24:43.690 --> 02:24:45.520
We talked about pixels in week zero.

02:24:45.520 --> 02:24:49.567
Pixels are just the dots that compose
a graphic file on your computer.

02:24:49.567 --> 02:24:50.650
And pixels are everywhere.

02:24:50.650 --> 02:24:53.320
All of us, now, tuning in live
via Zoom or YouTube or the like,

02:24:53.320 --> 02:24:56.800
we're watching streams of pixels, which
compose multiple images and multiple

02:24:56.800 --> 02:25:02.290
images compose video that appears to
be moving at, like, 20 something or 30

02:25:02.290 --> 02:25:04.670
frames per second, images per second.

02:25:04.670 --> 02:25:08.530
Now, of course, there's only so much
fidelity in these kinds of images.

02:25:08.530 --> 02:25:11.097
And it's quite common
on TV and in movies,

02:25:11.097 --> 02:25:13.930
if there's some bad guy that's been
picked up with some surveillance

02:25:13.930 --> 02:25:17.050
footage or the like, invariably, the
folks on Law & Order and the like

02:25:17.050 --> 02:25:19.930
can just kind of enhance the
video and zoom in and see

02:25:19.930 --> 02:25:24.710
exactly the glint in the person's eye
that reveals who committed some crime.

02:25:24.710 --> 02:25:26.140
Well, that's all kind of nonsense.

02:25:26.140 --> 02:25:29.367
And it derives from some of the
primitives we introduced in week zero.

02:25:29.367 --> 02:25:31.450
In fact, just to poke fun
at this, let me go ahead

02:25:31.450 --> 02:25:34.990
and play a few seconds of
this TV show here in the US

02:25:34.990 --> 02:25:39.670
called CSI, just to give you a sense of
just how commonplace this kind of logic

02:25:39.670 --> 02:25:40.180
is.

02:25:40.180 --> 02:25:41.140
[VIDEO PLAYBACK]

02:25:41.140 --> 02:25:43.330
- We know.

02:25:43.330 --> 02:25:46.930
- That at 9:15, Ray
Santoya was at the ATM.

02:25:46.930 --> 02:25:50.380
- So the question is,
what was he doing at 9:16?

02:25:50.380 --> 02:25:53.180
- Shooting the 9
millimeter at something.

02:25:53.180 --> 02:25:54.820
Maybe he saw the sniper.

02:25:54.820 --> 02:25:56.920
- Or was working with him.

02:25:56.920 --> 02:25:59.490
- Wait, go back one.

02:25:59.490 --> 02:26:00.481
- What do you see?

02:26:00.481 --> 02:26:05.291
[CLICKING]

02:26:07.700 --> 02:26:11.420
- Bring his face up, full screen.

02:26:11.420 --> 02:26:12.530
- His glasses.

02:26:12.530 --> 02:26:13.982
- There's a reflection.

02:26:13.982 --> 02:26:17.426
[TYPING]

02:26:23.840 --> 02:26:25.620
- That's Nuevitas baseball team.

02:26:25.620 --> 02:26:26.630
That's their logo.

02:26:26.630 --> 02:26:29.075
- And he's talking to
whoever's wearing that jacket.

02:26:29.075 --> 02:26:31.160
- We may have a witness.

02:26:31.160 --> 02:26:32.700
- To both shootings.

02:26:32.700 --> 02:26:33.283
[END PLAYBACK]

02:26:33.283 --> 02:26:36.408
DAVID MALAN: So unfortunately, today
will rather ruin a lot of TV and movies

02:26:36.408 --> 02:26:38.650
for you, because you can't
just zoom in infinitely

02:26:38.650 --> 02:26:41.250
and see more information if
that information is not there.

02:26:41.250 --> 02:26:43.750
At the end of the day, there's
only a finite number of bits.

02:26:43.750 --> 02:26:46.120
And case in point, here's
a photograph of Brian.

02:26:46.120 --> 02:26:48.580
And you might see that, oh,
there's a glint in his eye.

02:26:48.580 --> 02:26:50.930
Let's see what was being
reflected in his eye there.

02:26:50.930 --> 02:26:53.410
And so if we zoom in on
this image here of Brian,

02:26:53.410 --> 02:26:57.730
and maybe we zoom in a little further,
that's all that's actually there.

02:26:57.730 --> 02:27:00.160
You can't just click the
enhance button and see more,

02:27:00.160 --> 02:27:02.368
because at the end of the
day, these are just pixels.

02:27:02.368 --> 02:27:06.310
And pixels, per week zero, are just
0's and 1's, and finitely many of them.

02:27:06.310 --> 02:27:08.470
So what you see is what you get.

02:27:08.470 --> 02:27:12.190
Now, with that said-- and actually,
we can poke fun at this, too, here.

02:27:12.190 --> 02:27:14.830
Let me just play one other
short clip from Futurama,

02:27:14.830 --> 02:27:18.423
which kind of hammers home this
point as well, but more playfully so.

02:27:18.423 --> 02:27:19.090
[VIDEO PLAYBACK]

02:27:19.090 --> 02:27:23.250
- Magnify that death sphere.

02:27:23.250 --> 02:27:24.770
Why is it still blurry?

02:27:24.770 --> 02:27:26.710
- That's all the resolution we have.

02:27:26.710 --> 02:27:29.050
Making it bigger
doesn't make it clearer.

02:27:29.050 --> 02:27:31.220
- It does on CSI: Miami.

02:27:31.220 --> 02:27:32.020
- [SIGH]

02:27:32.020 --> 02:27:32.170
[END PLAYBACK]

02:27:32.170 --> 02:27:35.212
DAVID MALAN: So there, we have two
clips talking, rather, to one another.

02:27:35.212 --> 02:27:37.330
But I have to update things for 2020.

02:27:37.330 --> 02:27:41.972
You can't really pick up the internet
these days, or a magazine these days,

02:27:41.972 --> 02:27:43.930
if you even would, that
doesn't somehow mention

02:27:43.930 --> 02:27:45.850
machine learning and
artificial intelligence

02:27:45.850 --> 02:27:48.005
and fancy algorithms via
which you can do things

02:27:48.005 --> 02:27:49.630
that previously weren't quite possible.

02:27:49.630 --> 02:27:51.460
And that's actually
kinda sorta the case.

02:27:51.460 --> 02:27:56.290
You might recall from week zero, that
we found this beautiful watercolor

02:27:56.290 --> 02:28:00.250
painting in the Harvard archives
that's only about 11 inches tall total.

02:28:00.250 --> 02:28:03.700
And yet somehow, it's 13
feet tall here behind me.

02:28:03.700 --> 02:28:06.533
Now, normally, if you were to just
enhance this watercolor painting,

02:28:06.533 --> 02:28:08.658
it would start to look
pretty stupid pretty quickly

02:28:08.658 --> 02:28:10.570
with lots and lots of
pixelation, even if you

02:28:10.570 --> 02:28:12.940
used a very fancy camera,
as the archives do,

02:28:12.940 --> 02:28:14.440
to capture the original image.

02:28:14.440 --> 02:28:16.810
But we wanted to blow
it up to 13 feet tall

02:28:16.810 --> 02:28:21.110
so that it would stand at high
quality behind us this whole time.

02:28:21.110 --> 02:28:24.790
And there, we actually did
use enhance, in some sense.

02:28:24.790 --> 02:28:28.640
So using, long story short, fancier
algorithms than those last week,

02:28:28.640 --> 02:28:31.690
you can use artificial
intelligence, machine learning,

02:28:31.690 --> 02:28:36.130
to actually analyze data and find
patterns where there weren't--

02:28:36.130 --> 02:28:38.280
that aren't necessarily
visible to the human eye.

02:28:38.280 --> 02:28:41.590
So for instance, if we take the
original here and start to zoom in,

02:28:41.590 --> 02:28:43.600
it looks pretty good at this resolution.

02:28:43.600 --> 02:28:44.720
But it's pretty smooth.

02:28:44.720 --> 02:28:48.730
You don't really see the fact that
this was paint on an actual canvas.

02:28:48.730 --> 02:28:50.707
So this was just
zooming in with Photoshop.

02:28:50.707 --> 02:28:52.540
But when you actually
run an image like this

02:28:52.540 --> 02:28:55.990
through fancy machine learning-based
software, artificial intelligence,

02:28:55.990 --> 02:28:58.570
you can begin to improve
it and actually see,

02:28:58.570 --> 02:29:01.390
not just this window from the top
of one of the buildings, which

02:29:01.390 --> 02:29:03.520
is pretty glossed over
here in Photoshop,

02:29:03.520 --> 02:29:05.480
you can start to see more detail.

02:29:05.480 --> 02:29:08.750
So this is literally the before,
just zooming in Photoshop.

02:29:08.750 --> 02:29:12.572
This is after, actually applying fancy
artificial intelligence algorithms

02:29:12.572 --> 02:29:15.280
that notice, wait a minute, there's
a little discoloration there.

02:29:15.280 --> 02:29:17.072
Wait, there's a little
discoloration there.

02:29:17.072 --> 02:29:20.830
And nowadays, enhance is
increasingly becoming a thing.

02:29:20.830 --> 02:29:22.450
It's still inferring.

02:29:22.450 --> 02:29:25.270
It's not resurrecting information
that was necessarily there.

02:29:25.270 --> 02:29:28.240
It's doing its best guess,
really, algorithmically,

02:29:28.240 --> 02:29:30.487
to reconstruct what
the image actually was.

02:29:30.487 --> 02:29:32.320
And if we zoom in
further, you can, perhaps,

02:29:32.320 --> 02:29:35.440
see that this is really starting to
get blurry if you just use Photoshop

02:29:35.440 --> 02:29:36.578
and keep zooming in.

02:29:36.578 --> 02:29:38.620
But if you run it through
fancy enough algorithms

02:29:38.620 --> 02:29:40.780
and start to notice
slight discolorations that

02:29:40.780 --> 02:29:44.920
aren't super visible to the human
eye, we can enhance that even further.

02:29:44.920 --> 02:29:46.540
And you can't do it infinitely so.

02:29:46.540 --> 02:29:48.550
And in some sense, we're
creating information

02:29:48.550 --> 02:29:51.282
where there isn't necessarily
that information there.

02:29:51.282 --> 02:29:54.490
So whether or not these kinds of things
hold up in court is another question.

02:29:54.490 --> 02:29:56.920
But it can improve the
fidelity of images like this.

02:29:56.920 --> 02:30:02.570
And indeed, it allowed us to zoom in
from 11 inches to 13 feet instead.

02:30:02.570 --> 02:30:05.920
So when it comes to manipulating
images, ultimately, we

02:30:05.920 --> 02:30:10.030
do have some programmatic capabilities,
including this file pointer,

02:30:10.030 --> 02:30:13.280
like we just saw, and also, a
few other functions as well.

02:30:13.280 --> 02:30:15.550
And our final examples,
here, will lay the foundation

02:30:15.550 --> 02:30:17.380
for what you'll do
this coming week, which

02:30:17.380 --> 02:30:21.250
is manipulate your very own graphical
files with a newfound understanding

02:30:21.250 --> 02:30:25.270
of pointers and addresses and
now files and input and output.

02:30:25.270 --> 02:30:30.010
For instance, I'm going to go ahead
and open up a program here called--

02:30:30.010 --> 02:30:32.110
give me just one second.

02:30:32.110 --> 02:30:37.660
I'm going to open up a
program here called jpeg.c.

02:30:37.660 --> 02:30:40.610
And this program, jpeg.c,
which I wrote in advance,

02:30:40.610 --> 02:30:43.400
which is on the course's
website, does the following.

02:30:43.400 --> 02:30:46.510
It first declares a type called byte.

02:30:46.510 --> 02:30:49.990
It turns out, in C, there's no
built-in type called byte.

02:30:49.990 --> 02:30:51.610
A byte, as we know it, is 8 bits.

02:30:51.610 --> 02:30:53.680
And it turns out, the
simplest way to create

02:30:53.680 --> 02:30:57.250
a byte is to define our own,
just like we've defined a string,

02:30:57.250 --> 02:31:01.840
just like we've defined other types
too, like a student, in order--

02:31:01.840 --> 02:31:04.640
a person, rather, in
order to give us a byte.

02:31:04.640 --> 02:31:07.210
So this first line of code
just declares a data type

02:31:07.210 --> 02:31:11.830
called byte, using another, more arcane
data type called uint8_t.

02:31:11.830 --> 02:31:13.330
But more on that in the problem set.

02:31:13.330 --> 02:31:15.820
Just know that this invents
something called byte.

02:31:15.820 --> 02:31:17.928
Notice, in this program,
I'm resurrecting the idea

02:31:17.928 --> 02:31:21.220
from week two of command line arguments,
where we can take input from the user.

02:31:21.220 --> 02:31:23.860
Notice that I'm checking if the
user typed in two arguments.

02:31:23.860 --> 02:31:27.520
And if not, I'm returning one
immediately to signify error.

02:31:27.520 --> 02:31:30.490
In line 17, I'm using my new technique.

02:31:30.490 --> 02:31:34.210
I'm opening a file using
the name of the file

02:31:34.210 --> 02:31:36.050
that the human typed
at the command line.

02:31:36.050 --> 02:31:40.270
And this time, I'm opening it to read
it with quote unquote, r instead of a.

02:31:40.270 --> 02:31:41.660
But if there's not a file--

02:31:41.660 --> 02:31:44.920
so if bang file, that is,
if exclamation point file,

02:31:44.920 --> 02:31:47.990
or if file equals equals NULL,
those mean the same thing.

02:31:47.990 --> 02:31:51.040
I can go ahead and return
one, signifying an error.

02:31:51.040 --> 02:31:53.710
Down here, I'm doing
something a little clever.

02:31:53.710 --> 02:31:56.890
It turns out that with
very high probability,

02:31:56.890 --> 02:32:01.640
you can determine if any file is a
jpeg by looking only at its first three

02:32:01.640 --> 02:32:02.140
bytes.

02:32:02.140 --> 02:32:04.720
A lot of file formats have
what are called magic numbers

02:32:04.720 --> 02:32:06.350
at the beginning of their files.

02:32:06.350 --> 02:32:10.990
And these are industry standard
numbers, 1 or 2 or 3 or more of them,

02:32:10.990 --> 02:32:13.910
that are just commonly expected
to be at the beginning of a file,

02:32:13.910 --> 02:32:16.240
so that a program can quickly
check, is this a jpeg?

02:32:16.240 --> 02:32:16.960
Is this a gif?

02:32:16.960 --> 02:32:18.070
Is this a Word document?

02:32:18.070 --> 02:32:19.300
Is this an Excel file?

02:32:19.300 --> 02:32:21.910
They tend to have these numbers
at the beginning of them.

02:32:21.910 --> 02:32:26.020
And jpegs have a sequence of
bytes that we're about to see.

02:32:26.020 --> 02:32:29.770
This line of code, line 24 here, as
you'll see in the next problem set,

02:32:29.770 --> 02:32:33.070
is how you might give yourself
a buffer of bytes, specifically

02:32:33.070 --> 02:32:35.320
an array of three bytes.

02:32:35.320 --> 02:32:38.380
This next line of code, as you'll see
this coming week, is called fread.

02:32:38.380 --> 02:32:40.720
fread, as the name
suggests, reads from a file.

02:32:40.720 --> 02:32:42.940
That is, it grabs bytes from a file.

02:32:42.940 --> 02:32:45.790
And it's a little fancy to use,
but you'll get more comfortable

02:32:45.790 --> 02:32:47.140
with this over time.

02:32:47.140 --> 02:32:52.060
It reads into this buffer, its first
argument, the size of this data type,

02:32:52.060 --> 02:32:53.050
the size of a byte.

02:32:53.050 --> 02:32:58.250
And it reads in this many of
those data types from this file.

02:32:58.250 --> 02:33:01.480
So again, it takes four arguments, which
is kind of a lot compared to what we've seen.

02:33:01.480 --> 02:33:08.230
But it reads from this file,
three bytes into this array,

02:33:08.230 --> 02:33:09.770
a.k.a. buffer, called bytes.

02:33:09.770 --> 02:33:13.460
So this is just how you write code
that doesn't put data in a file,

02:33:13.460 --> 02:33:14.650
but reads it from one.

02:33:14.650 --> 02:33:16.700
And then here, notice our hexadecimal.

02:33:16.700 --> 02:33:18.190
So we've come full circle.

02:33:18.190 --> 02:33:23.110
If bytes bracket 0 equals
equals 0xff and bytes

02:33:23.110 --> 02:33:27.080
bracket 1 equals 0xd8 and
bytes bracket 2 equals 0xff,

02:33:27.080 --> 02:33:28.960
this definitely looks cryptic to you.

02:33:28.960 --> 02:33:31.570
But that's just because I looked
up in the manual for jpegs,

02:33:31.570 --> 02:33:34.900
and it turns out that
almost any jpeg, rather,

02:33:34.900 --> 02:33:39.430
must start with 0xff, 0xd8, 0xff.

02:33:39.430 --> 02:33:43.450
Those are the first three bytes
of any jpeg on your Mac, your PC,

02:33:43.450 --> 02:33:44.350
on the internet.

02:33:44.350 --> 02:33:46.300
There are always those three bytes.

02:33:46.300 --> 02:33:50.500
It turns out, the fourth byte
further decides whether or not

02:33:50.500 --> 02:33:51.730
a file is actually a jpeg.

02:33:51.730 --> 02:33:54.640
But the algorithm for that's a
little fancier, so I kept it simple.

02:33:54.640 --> 02:33:59.020
If the first three bytes of a file
are those, maybe you have a jpeg.

02:33:59.020 --> 02:34:01.150
But if you don't have
exactly those three bytes,

02:34:01.150 --> 02:34:02.920
you definitely don't have a jpeg.

02:34:02.920 --> 02:34:05.270
And so what I can do,
here, is as follows.

02:34:05.270 --> 02:34:09.700
In today's code-- let me go
ahead and grab two other files

02:34:09.700 --> 02:34:11.620
that I brought with me.

02:34:11.620 --> 02:34:16.210
And one happens to be
a photograph again.

02:34:16.210 --> 02:34:18.160
Give me one second.

02:34:18.160 --> 02:34:24.010
I brought with me a few files,
one of which is called brian.jpeg,

02:34:24.010 --> 02:34:25.870
which is the same photo of Brian.

02:34:25.870 --> 02:34:28.030
And then I have a gif,
which of course, is not

02:34:28.030 --> 02:34:31.210
a jpeg, that is this cat typing here.

02:34:31.210 --> 02:34:33.250
And what I, effectively,
have in front of me now

02:34:33.250 --> 02:34:37.870
is a program that if I do make
jpeg, because this file is jpeg.c,

02:34:37.870 --> 02:34:43.360
and I run dot slash jpeg, I can
type in something like cat.gif

02:34:43.360 --> 02:34:46.990
at the command line as an argument,
hit Enter, and I should see no.

02:34:46.990 --> 02:34:51.550
By contrast, if I pass in Brian's jpeg
at the command line as an argument,

02:34:51.550 --> 02:34:52.630
I see maybe.

02:34:52.630 --> 02:34:54.430
And again, maybe only
because the algorithm

02:34:54.430 --> 02:34:56.638
for actually adjudicating
whether something is a jpeg

02:34:56.638 --> 02:34:58.550
is a little more complicated than that.
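
NOTE
A sketch of the whole check against a file on disk, assuming a hypothetical check_jpeg function that returns 1 for "maybe", 0 for "no", and -1 if the file cannot be opened. It also applies one common heuristic for the fourth byte (a value from 0xe0 through 0xef, i.e., high four bits 1110); that rule is an assumption not shown in the lecture, and it can reject some valid JPEGs whose fourth byte falls outside that range.
```c
#include <stdint.h>
#include <stdio.h>
// Hypothetical helper: 1 = maybe a JPEG, 0 = no, -1 = cannot open
int check_jpeg(const char *path)
{
    // "rb" is binary mode; on Linux it behaves the same as "r"
    FILE *file = fopen(path, "rb");
    if (file == NULL)
    {
        return -1;
    }
    uint8_t bytes[4];
    size_t n = fread(bytes, sizeof(uint8_t), 4, file);
    fclose(file);
    // Too short to hold the signature plus the fourth byte
    if (n < 4)
    {
        return 0;
    }
    // First three bytes: the fixed 0xff 0xd8 0xff signature
    if (bytes[0] != 0xff || bytes[1] != 0xd8 || bytes[2] != 0xff)
    {
        return 0;
    }
    // Assumed extra heuristic: fourth byte in 0xe0..0xef
    return (bytes[3] & 0xf0) == 0xe0;
}
```
A main that loops over argv and prints "maybe" or "no" accordingly would mirror the demo, just slightly stricter.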

02:34:58.550 --> 02:35:02.590
But indeed, I can now access the
individual bytes, and therefore pixels,

02:35:02.590 --> 02:35:06.310
it would seem, of an image file.

02:35:06.310 --> 02:35:08.575
And in fact, we can even do this.

02:35:08.575 --> 02:35:10.450
Let me go ahead and show
you one last program

02:35:10.450 --> 02:35:13.960
that we wrote deliberately in advance,
just to give you a taste of what's

02:35:13.960 --> 02:35:15.790
coming with the next problem set.

02:35:15.790 --> 02:35:19.480
This program is a reimplementation
of the program you've probably

02:35:19.480 --> 02:35:21.820
used one or more times, called cp.

02:35:21.820 --> 02:35:25.570
Recall that cp is a program
in the IDE and in Linux,

02:35:25.570 --> 02:35:27.730
more generally, that
allows you to copy a file.

02:35:27.730 --> 02:35:31.660
You type cp, space, the filename,
space, the new filename.

02:35:31.660 --> 02:35:32.650
How does this work?

02:35:32.650 --> 02:35:37.090
I now have all of the building blocks
with which to copy files myself.

02:35:37.090 --> 02:35:39.100
So again, I'm defining a byte up here.

02:35:39.100 --> 02:35:41.930
I'm defining main as taking
command line arguments here.

02:35:41.930 --> 02:35:43.000
And notice one change.

02:35:43.000 --> 02:35:44.800
I'm not using the CS50 library.

02:35:44.800 --> 02:35:52.090
So even what was previously string
in week two is now char star.

02:35:52.090 --> 02:35:55.450
Even argv here is char star. And I'm
making sure that the human types

02:35:55.450 --> 02:36:00.580
in three words: the program's name,
the source file, and the destination

02:36:00.580 --> 02:36:01.180
file.

02:36:01.180 --> 02:36:02.410
I'm using fopen again.

02:36:02.410 --> 02:36:06.100
I'm opening the source
file here from argv bracket 1.

02:36:06.100 --> 02:36:07.358
I'm making sure it's not NULL.

02:36:07.358 --> 02:36:08.650
And then I'm quitting if it is.

02:36:08.650 --> 02:36:13.030
I'm then-- and here's something new--
opening the destination file here, also

02:36:13.030 --> 02:36:13.870
with fopen.

02:36:13.870 --> 02:36:15.700
But I'm using quote unquote "w".

02:36:15.700 --> 02:36:19.630
I'm opening one file with "r" and one file
with "w" because I want to read from one

02:36:19.630 --> 02:36:21.160
and write to the other.
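
NOTE
A sketch of the setup steps just described: check for exactly three words, open the source for reading and the destination for writing, and bail out on NULL. The function open_for_copy is hypothetical; it uses "rb" and "wb", the binary-mode spellings of "r" and "w", which behave identically on Linux but matter on some other systems.
```c
#include <stdio.h>
// Hypothetical helper: returns 0 on success, 1 on any error
int open_for_copy(int argc, char *argv[], FILE **src, FILE **dst)
{
    // Program name, source file, destination file
    if (argc != 3)
    {
        fprintf(stderr, "Usage: ./cp SOURCE DESTINATION\n");
        return 1;
    }
    // Read from the source...
    *src = fopen(argv[1], "rb");
    if (*src == NULL)
    {
        return 1;
    }
    // ...and write to the destination
    *dst = fopen(argv[2], "wb");
    if (*dst == NULL)
    {
        // Don't leak the source file if the destination fails
        fclose(*src);
        return 1;
    }
    return 0;
}
```
Closing the source when the destination fails to open keeps the error path from leaking a file handle.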

02:36:21.160 --> 02:36:25.360
And then down here, this
loop is a clever way

02:36:25.360 --> 02:36:27.370
of copying one file to another.

02:36:27.370 --> 02:36:30.790
I'm giving myself a buffer of one byte,
so just a temporary variable, just

02:36:30.790 --> 02:36:33.090
like Brian's temp or empty glass.

02:36:33.090 --> 02:36:35.160
And I'm using this function, fread.

02:36:35.160 --> 02:36:39.750
I'm reading into that buffer via
its address, the size of a byte,

02:36:39.750 --> 02:36:42.870
specifically one byte
from the source file.

02:36:42.870 --> 02:36:47.940
And then, in that same loop, I'm writing
from that buffer, the size of a byte,

02:36:47.940 --> 02:36:50.950
specifically one byte,
to the destination.

02:36:50.950 --> 02:36:53.760
So the cp program that
you might have seen me use,

02:36:53.760 --> 02:36:57.090
or that you yourself have used to copy
files, is doing essentially this.

02:36:57.090 --> 02:36:59.790
It's opening one file,
iterating over all of its bytes,

02:36:59.790 --> 02:37:02.010
and copying them from
source to destination.
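
NOTE
A sketch of that one-byte loop, assuming both FILE pointers are already open. The lecture defines a byte type; defining BYTE as uint8_t here is an assumption, and the function name copy_bytes and its byte-count return value are hypothetical additions. fread returns the number of items it read, so the loop ends once it returns 0 at the end of the file (or on an error).
```c
#include <stdint.h>
#include <stdio.h>
typedef uint8_t BYTE;
// Copies src to dst one byte at a time and returns how many bytes
// were copied; closing both streams is left to the caller
long copy_bytes(FILE *src, FILE *dst)
{
    BYTE buffer;  // the one-byte "empty glass"
    long count = 0;
    // fread yields 1 while bytes remain and 0 at end of file
    while (fread(&buffer, sizeof(BYTE), 1, src) == 1)
    {
        fwrite(&buffer, sizeof(BYTE), 1, dst);
        count++;
    }
    return count;
}
```
Copying one byte per call keeps the logic simple; reading larger chunks would mean fewer calls, at the cost of tracking how many bytes each fread actually returned.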

02:37:02.010 --> 02:37:04.260
And then lastly, it's closing the files.

02:37:04.260 --> 02:37:06.360
And I went through these last two
examples deliberately fast,

02:37:06.360 --> 02:37:11.130
because this whole week will be
spent diving into file I/O, and image

02:37:11.130 --> 02:37:11.890
files in particular.

02:37:11.890 --> 02:37:16.560
But all that we've done is use these
functions, fopen, fread, fwrite, and fclose,

02:37:16.560 --> 02:37:18.610
to manipulate those very files.

02:37:18.610 --> 02:37:21.975
So for instance, let me
now do make cp.

02:37:21.975 --> 02:37:25.800
OK, seems to compile,
dot slash cp, brian.jpeg.

02:37:25.800 --> 02:37:27.750
How about brian2.jpeg?

02:37:27.750 --> 02:37:28.680
And hit Enter.

02:37:28.680 --> 02:37:29.880
Nothing seems to happen.

02:37:29.880 --> 02:37:33.240
But if I go in here and
double-click on brian2,

02:37:33.240 --> 02:37:37.420
we see that we have a second
copy of Brian's actual file.

02:37:37.420 --> 02:37:41.560
So this coming week, you'll experiment
with multiple file formats for images.

02:37:41.560 --> 02:37:42.580
The first is jpegs.

02:37:42.580 --> 02:37:45.000
And we will give you a
so-called forensic image

02:37:45.000 --> 02:37:47.938
of a whole bunch of photographs
from a digital memory card.

02:37:47.938 --> 02:37:50.730
In fact, it's very common these
days, certainly in law enforcement,

02:37:50.730 --> 02:37:53.580
to take forensic copies of
hard drives, of media sticks,

02:37:53.580 --> 02:37:55.920
of phones and other devices,
and then analyze them

02:37:55.920 --> 02:37:58.650
for data that's been lost
or corrupted or deleted.

02:37:58.650 --> 02:38:01.980
We'll do exactly that:
you'll write a program that recovers

02:38:01.980 --> 02:38:05.850
jpegs that have been accidentally
deleted from a digital memory card.

02:38:05.850 --> 02:38:08.100
And we'll give you all
copies of that memory card

02:38:08.100 --> 02:38:11.220
by making a forensic image of it,
that is, copying all of the 0's and 1's

02:38:11.220 --> 02:38:13.710
from a camera and giving
them to you in a file

02:38:13.710 --> 02:38:16.710
that you can fread from and then fwrite the recovered jpegs.

02:38:16.710 --> 02:38:18.930
We'll also introduce
you to bitmap files,

02:38:18.930 --> 02:38:22.290
BMPs, popularized by
the Windows operating

02:38:22.290 --> 02:38:24.160
system for wallpapers and the like.

02:38:24.160 --> 02:38:28.470
But we'll use them to implement,
using pointers and file I/O,

02:38:28.470 --> 02:38:30.550
your very own Instagram-like filters.

02:38:30.550 --> 02:38:33.540
So we'll take this picture
of the Weeks footbridge

02:38:33.540 --> 02:38:35.578
here in Cambridge,
Massachusetts, by Harvard.

02:38:35.578 --> 02:38:37.620
And we'll have you implement
a number of filters,

02:38:37.620 --> 02:38:39.328
taking this original
image, for instance,

02:38:39.328 --> 02:38:41.910
and desaturating it,
making it black and white,

02:38:41.910 --> 02:38:45.210
by iterating over all of the pixels
top to bottom, left to right,

02:38:45.210 --> 02:38:49.350
and recognizing any colors, like red or
green or blue or anything in between,

02:38:49.350 --> 02:38:53.467
and changing them to some shade
of gray; doing a sepia filter,

02:38:53.467 --> 02:38:55.800
making things look old school,
like this photo was taken

02:38:55.800 --> 02:39:00.810
many years ago, by similarly applying a
heuristic that alters the colors of all

02:39:00.810 --> 02:39:02.345
of the pixels in this picture.
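
NOTE
A sketch of the desaturation idea, assuming a hypothetical PIXEL struct whose fields follow the blue, green, red order in which 24-bit BMPs store each pixel. Replacing all three channels with their rounded average is one common way to get "some shade of gray"; the lecture doesn't specify a formula, and a sepia filter would instead apply a different formula to each channel.
```c
#include <stdint.h>
// Hypothetical pixel type, in 24-bit BMP channel order
typedef struct
{
    uint8_t blue;
    uint8_t green;
    uint8_t red;
} PIXEL;
// Sets each pixel's channels to the rounded average of the three
void grayscale(int height, int width, PIXEL image[height][width])
{
    // Top to bottom...
    for (int i = 0; i < height; i++)
    {
        // ...and left to right
        for (int j = 0; j < width; j++)
        {
            PIXEL *p = &image[i][j];
            int sum = p->red + p->green + p->blue;
            // Adding 1 before dividing by 3 rounds to the nearest integer
            uint8_t gray = (uint8_t) ((sum + 1) / 3);
            p->red = p->green = p->blue = gray;
        }
    }
}
```
Because all three channels end up equal, every pixel becomes some shade of gray, from black (0) to white (255).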

02:39:02.345 --> 02:39:05.220
We'll have you flip it around so
you have to put this pixel over here

02:39:05.220 --> 02:39:06.630
and this pixel over there.
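
NOTE
A sketch of the flip, assuming a hypothetical PIXEL struct with blue, green, and red fields: each row is mirrored by swapping pixels from the outside in, with a temporary variable playing the role of Brian's empty glass.
```c
#include <stdint.h>
// Hypothetical pixel type, in 24-bit BMP channel order
typedef struct
{
    uint8_t blue;
    uint8_t green;
    uint8_t red;
} PIXEL;
// Mirrors the image horizontally: in each row, column j trades
// places with column width - 1 - j
void reflect(int height, int width, PIXEL image[height][width])
{
    for (int i = 0; i < height; i++)
    {
        // Stop at the middle; going further would swap pairs back
        for (int j = 0; j < width / 2; j++)
        {
            PIXEL temp = image[i][j];
            image[i][j] = image[i][width - 1 - j];
            image[i][width - 1 - j] = temp;
        }
    }
}
```
With an odd width, the middle pixel stays put, which is exactly where a mirror would leave it.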

02:39:06.630 --> 02:39:09.690
And you'll appreciate exactly
how image files are represented

02:39:09.690 --> 02:39:12.180
on your own hard drive and phone.

02:39:12.180 --> 02:39:17.580
And you'll even implement, for instance,
a blur filter, which, no accident,

02:39:17.580 --> 02:39:20.010
makes it harder to see
what's going on here,

02:39:20.010 --> 02:39:23.700
because you're now starting to
average together pixels that are nearby

02:39:23.700 --> 02:39:27.090
each other to kind of gloss
things over and deliberately

02:39:27.090 --> 02:39:28.990
make it harder to see here.
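
NOTE
A sketch of one simple blur, a "box blur", assuming a hypothetical PIXEL struct with blue, green, and red fields: each pixel becomes the rounded average of itself and its neighbors up to one row and column away. Averages are computed from an unmodified copy, so pixels that were already blurred don't skew their neighbors. The copy is a variable-length array on the stack, which is fine for small images but could overflow for very large ones.
```c
#include <stdint.h>
#include <string.h>
// Hypothetical pixel type, in 24-bit BMP channel order
typedef struct
{
    uint8_t blue;
    uint8_t green;
    uint8_t red;
} PIXEL;
// Replaces each pixel with the average of its (up to 3x3) neighborhood
void blur(int height, int width, PIXEL image[height][width])
{
    PIXEL copy[height][width];
    memcpy(copy, image, sizeof(copy));
    for (int i = 0; i < height; i++)
    {
        for (int j = 0; j < width; j++)
        {
            int red = 0, green = 0, blue = 0, count = 0;
            for (int di = -1; di <= 1; di++)
            {
                for (int dj = -1; dj <= 1; dj++)
                {
                    int r = i + di, c = j + dj;
                    // Skip neighbors that fall outside the image
                    if (r < 0 || r >= height || c < 0 || c >= width)
                    {
                        continue;
                    }
                    red += copy[r][c].red;
                    green += copy[r][c].green;
                    blue += copy[r][c].blue;
                    count++;
                }
            }
            // Round to nearest by adding half the divisor first
            image[i][j].red = (uint8_t) ((red + count / 2) / count);
            image[i][j].green = (uint8_t) ((green + count / 2) / count);
            image[i][j].blue = (uint8_t) ((blue + count / 2) / count);
        }
    }
}
```
Edge and corner pixels simply average over fewer neighbors (as few as four), rather than reaching outside the image.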

02:39:28.990 --> 02:39:30.733
And so we'll even, if
you so choose, have

02:39:30.733 --> 02:39:33.150
you implement edge detection,
if you're feeling more comfortable,

02:39:33.150 --> 02:39:37.020
where you find the edges of all of the
physical objects in these pictures,

02:39:37.020 --> 02:39:43.350
in order to actually detect them in
code and create visual art like this.

02:39:43.350 --> 02:39:44.220
Now, this was a lot.

02:39:44.220 --> 02:39:45.960
And I know pointers are
generally considered

02:39:45.960 --> 02:39:47.820
to be among the more
challenging features of C,

02:39:47.820 --> 02:39:49.403
and certainly of programming in general.

02:39:49.403 --> 02:39:52.140
So if you're feeling like
it's been quite a bit, it was.

02:39:52.140 --> 02:39:55.290
But you do now have the
ability, either today

02:39:55.290 --> 02:39:59.040
or in the very near term, to understand
even xkcd comics like this one that almost

02:39:59.040 --> 02:40:00.990
any computer scientist
out there has seen.

02:40:00.990 --> 02:40:05.130
So our final look for you
today is this joke here.

02:40:05.130 --> 02:40:10.050
And even though I can't
necessarily hear you from afar,

02:40:10.050 --> 02:40:12.690
I'll just assume, in
our final moments today,

02:40:12.690 --> 02:40:16.650
that everyone is breaking out
into a very geeky laughter.

02:40:16.650 --> 02:40:19.530
And I see some smiles, at
least, which is reassuring.

02:40:19.530 --> 02:40:21.480
This was, then, CS50.

02:40:21.480 --> 02:40:23.010
We'll see you next time.

02:40:23.010 --> 02:40:26.360
[MUSIC PLAYING]