WEBVTT
X-TIMESTAMP-MAP=LOCAL:00:00:00.000,MPEGTS:900000

00:00:02.982 --> 00:00:06.461
[MUSIC PLAYING]

00:01:12.600 --> 00:01:13.590
DAVID MALAN: All right.

00:01:13.590 --> 00:01:17.130
This is CS50, and this
is week 2 wherein we're

00:01:17.130 --> 00:01:20.610
going to take a look at a
lower level at how things work,

00:01:20.610 --> 00:01:24.120
and indeed, among the goals of the
course is this bottom-up understanding

00:01:24.120 --> 00:01:26.670
so that in a couple of weeks'
time, even a few years' time,

00:01:26.670 --> 00:01:29.920
when you encounter some new technology,
you'll be able to think back hopefully

00:01:29.920 --> 00:01:33.180
on some of this week's and this class's
basic building blocks and primitives

00:01:33.180 --> 00:01:36.060
and really just deduce how
tomorrow's technologies work.

00:01:36.060 --> 00:01:37.685
But along the way, it's going to seem--

00:01:37.685 --> 00:01:40.727
it's going to be a little hard, perhaps,
to see the forest for the trees,

00:01:40.727 --> 00:01:41.380
so to speak.

00:01:41.380 --> 00:01:44.783
And so the goal at the end of the day
still is going to be problem-solving.

00:01:44.783 --> 00:01:47.700
And so we thought we'd begin today
with a look at some of the problems

00:01:47.700 --> 00:01:50.405
we'll talk about or
solve this coming week,

00:01:50.405 --> 00:01:53.280
and for that, we have some brave
volunteers who have already come up.

00:01:53.280 --> 00:01:58.320
If we could turn on some dramatic
lighting and meet today's volunteers.

00:01:58.320 --> 00:02:00.430
So on my left here, we have--

00:02:00.430 --> 00:02:00.930
ALEX: Hi.

00:02:00.930 --> 00:02:01.960
My name is Alex.

00:02:01.960 --> 00:02:05.340
I'm a first-year at the college and
I'm from Chapel Hill, North Carolina.

00:02:05.340 --> 00:02:07.080
DAVID MALAN: Welcome to Alex.

00:02:07.080 --> 00:02:09.180
And to Alex's right.

00:02:09.180 --> 00:02:10.050
SARAH: I'm Sarah.

00:02:10.050 --> 00:02:13.230
I'm from Toronto, Canada, and I'm also
a first-year student at the college.

00:02:13.230 --> 00:02:14.188
DAVID MALAN: Wonderful.

00:02:14.188 --> 00:02:15.869
Well, welcome to both Alex and Sarah.

00:02:15.869 --> 00:02:18.577
So one of the problems you'll
perhaps solve this week for problem

00:02:18.577 --> 00:02:22.442
set 2 is to analyze the reading
level of a body of text,

00:02:22.442 --> 00:02:25.650
whether someone reads at a first grade
level, second grade level, third grade

00:02:25.650 --> 00:02:28.570
level, all the way up
to 12 or 13 or beyond.

00:02:28.570 --> 00:02:32.250
Well, you perhaps never quite thought
about, certainly in terms of code,

00:02:32.250 --> 00:02:35.310
like how you would analyze
some text, some book and figure

00:02:35.310 --> 00:02:36.750
out what reading level it is at.

00:02:36.750 --> 00:02:40.330
And yet, surely our teachers growing up
knew or had an intuitive sense of this.

00:02:40.330 --> 00:02:42.450
So let's consider some sample text.

00:02:42.450 --> 00:02:45.960
For instance, Alex, what
have you been reading lately?

00:02:45.960 --> 00:02:52.502
ALEX: One fish, two fish,
red fish, blue fish.

00:02:52.502 --> 00:02:53.460
DAVID MALAN: Wonderful.

00:02:53.460 --> 00:02:58.890
So given that, what grade level would
you say Alex is currently reading at?

00:02:58.890 --> 00:03:01.500
Feel free to just shout it out.

00:03:01.500 --> 00:03:02.730
First, first?

00:03:02.730 --> 00:03:07.200
So indeed, you'll see this week, if
you run your code on Alex's text,

00:03:07.200 --> 00:03:10.410
it actually turns out he reads
below a first grade reading level.

00:03:10.410 --> 00:03:12.400
But why might that be?

00:03:12.400 --> 00:03:16.410
What might your intuition
be for why we've

00:03:16.410 --> 00:03:19.020
accused Alex of reading at this level?

00:03:19.020 --> 00:03:20.990
Feel free to shout out.

00:03:20.990 --> 00:03:21.490
Yeah.

00:03:21.490 --> 00:03:24.520
So very few syllables, short
words, short sentences.

00:03:24.520 --> 00:03:27.828
And so there's some heuristics, perhaps,
we can infer from that short text,

00:03:27.828 --> 00:03:30.370
that that probably means that
it's best for younger children.

00:03:30.370 --> 00:03:33.370
Now Sarah, by contrast,
what have you been reading?

00:03:33.370 --> 00:03:35.470
SARAH: Mr. and Mrs. Dursley of number

00:03:35.470 --> 00:03:38.890
four, Privet Drive, were
proud to say that they were

00:03:38.890 --> 00:03:41.050
perfectly normal, thank you very much.

00:03:41.050 --> 00:03:43.480
They were the last people
you'd expect to be involved

00:03:43.480 --> 00:03:46.390
in anything strange or
mysterious because they just

00:03:46.390 --> 00:03:47.952
didn't hold with much nonsense.

00:03:47.952 --> 00:03:48.910
DAVID MALAN: All right.

00:03:48.910 --> 00:03:50.950
Now irrespective of what
grade you were in when

00:03:50.950 --> 00:03:53.283
you might have read that text,
what grade level does Sarah

00:03:53.283 --> 00:03:55.230
seem to be reading at?

00:03:55.230 --> 00:03:57.570
So eighth grade, second grade.

00:03:57.570 --> 00:03:58.080
OK.

00:03:58.080 --> 00:04:01.125
So hearing a bit of everything, so
with that, at least according to code,

00:04:01.125 --> 00:04:03.240
it would actually be seventh grade.

00:04:03.240 --> 00:04:05.130
And what might the intuition there be?

00:04:05.130 --> 00:04:07.620
Why is that a higher grade
level even though we might

00:04:07.620 --> 00:04:09.917
disagree exactly which grade it is?

00:04:09.917 --> 00:04:11.250
AUDIENCE: Complicated sentences.

00:04:11.250 --> 00:04:12.000
DAVID MALAN: Yeah.

00:04:12.000 --> 00:04:14.218
So complicated sentences,
longer sentences.

00:04:14.218 --> 00:04:17.010
So indeed a lot more words were
being spoken by Sarah because there

00:04:17.010 --> 00:04:18.519
was so much more there on the page.

00:04:18.519 --> 00:04:22.079
So we'll translate these ideas
this coming week in problem set 2,

00:04:22.079 --> 00:04:25.170
if you tackle this one, through
code so that you can ultimately

00:04:25.170 --> 00:04:26.910
infer things like these quantitatively.

00:04:26.910 --> 00:04:29.190
But to do so, we're going
to have to understand text.

00:04:29.190 --> 00:04:32.610
So let's first thank our volunteers and
then we'll dive in to that lower level.

00:04:32.610 --> 00:04:35.337
[APPLAUSE]

00:04:39.910 --> 00:04:40.600
Sorry.

00:04:40.600 --> 00:04:41.490
You can keep those.

00:04:41.490 --> 00:04:42.222
SARAH: Oh, OK.

00:04:42.222 --> 00:04:43.180
DAVID MALAN: All right.

00:04:43.180 --> 00:04:45.970
So besides that, let's
consider one other body of text

00:04:45.970 --> 00:04:48.010
perhaps that you might
see this week, which

00:04:48.010 --> 00:04:50.210
is namely a little something like this.

00:04:50.210 --> 00:04:53.860
What I have here on the screen is what
we'll start calling today ciphertext.

00:04:53.860 --> 00:04:56.530
It's the result of encrypting
some piece of information.

00:04:56.530 --> 00:05:00.190
And encryption, or more generally,
the art and science of cryptography

00:05:00.190 --> 00:05:00.908
is all around us.

00:05:00.908 --> 00:05:03.700
It's what you're using on the web,
on your phones, with your banks.

00:05:03.700 --> 00:05:07.000
And anything that tries to keep
data secure is using encryption.

00:05:07.000 --> 00:05:10.390
But there's going to be different levels
of encryption-- strong encryption,

00:05:10.390 --> 00:05:11.140
weak encryption.

00:05:11.140 --> 00:05:14.590
And what you see here on the
screen isn't all that strong,

00:05:14.590 --> 00:05:18.190
but we'll see later today how we
might decrypt this and actually reveal

00:05:18.190 --> 00:05:22.030
what the plaintext is that
corresponds to that ciphertext.

00:05:22.030 --> 00:05:25.670
But in order to do so, we have to
start taking off some training wheels,

00:05:25.670 --> 00:05:26.197
so to speak.

00:05:26.197 --> 00:05:28.030
And believe it or not,
even though your time

00:05:28.030 --> 00:05:30.100
with C this past
week for the first time,

00:05:30.100 --> 00:05:32.230
probably, might have
been rather in the weeds

00:05:32.230 --> 00:05:36.072
and much more complicated seemingly
than Scratch, it turns out that along the way,

00:05:36.072 --> 00:05:37.780
we have been providing
and we'll continue

00:05:37.780 --> 00:05:39.760
to provide certain training wheels.

00:05:39.760 --> 00:05:42.190
For instance, the CS50
Library is one of them,

00:05:42.190 --> 00:05:46.240
and even some of the explanations
we give of topics for now

00:05:46.240 --> 00:05:49.120
in these early weeks will be somewhat
simplified-- abstracted away,

00:05:49.120 --> 00:05:49.730
if you will.

00:05:49.730 --> 00:05:51.730
But the goal ultimately
is for you to understand

00:05:51.730 --> 00:05:55.060
each and every one of those details
so that after CS50, you really

00:05:55.060 --> 00:05:58.210
can stand on your own and
understand and wrap your mind

00:05:58.210 --> 00:06:01.040
around any future technologies as well.

00:06:01.040 --> 00:06:05.318
So let's consider first the very first
program with which we began last week,

00:06:05.318 --> 00:06:06.110
which was this one.

00:06:06.110 --> 00:06:09.215
So "hello, world" in C. At the end
of the day, it was really the printf

00:06:09.215 --> 00:06:11.590
function that was doing the
interesting part of the work,

00:06:11.590 --> 00:06:14.890
but there was a lot of technical
stuff above and below it.

00:06:14.890 --> 00:06:19.900
The curly braces, the parentheses,
words like void and include, and then

00:06:19.900 --> 00:06:21.730
of course, the angled brackets and more.

00:06:21.730 --> 00:06:25.870
But at the end of the day, we needed
to convert that source code in C

00:06:25.870 --> 00:06:30.190
to machine code, the 0's and 1's in
binary that the computer understood.

00:06:30.190 --> 00:06:32.500
And to do that, of course, we ran--

00:06:32.500 --> 00:06:33.700
we compiled the code.

00:06:33.700 --> 00:06:37.400
We ran make and then we were able
to actually run that code there.

00:06:37.400 --> 00:06:39.370
So let me actually go
over here to VS Code

00:06:39.370 --> 00:06:44.510
and really quickly recreate that hello.c
pretty much by transcribing the same.

00:06:44.510 --> 00:06:51.970
So I might have here include
stdio.h, int main void.

00:06:51.970 --> 00:06:54.460
And then in here, I had
quite simply, hello,

00:06:54.460 --> 00:06:57.430
comma, world with my
backslash n, end quotes, and more.

00:06:57.430 --> 00:07:01.693
Now last time, to compile this, I indeed
ran make hello, followed by Enter.

00:07:01.693 --> 00:07:03.860
Hopefully you see no errors
and that's a good thing.

00:07:03.860 --> 00:07:05.980
And if you do dot,
slash, hello, you see,

00:07:05.980 --> 00:07:07.840
in fact, the results of that program.

00:07:07.840 --> 00:07:11.470
But it turns out that make
is not actually a compiler

00:07:11.470 --> 00:07:12.950
as I alluded to last week.

00:07:12.950 --> 00:07:15.520
It's a program that
clearly makes your program,

00:07:15.520 --> 00:07:19.030
but it itself just automates the
process of using an actual compiler.

00:07:19.030 --> 00:07:21.290
And there's lots of different
compilers out there,

00:07:21.290 --> 00:07:24.190
and the one that it's actually
using underneath the hood

00:07:24.190 --> 00:07:27.640
is a little something
called Clang for C Language.

00:07:27.640 --> 00:07:30.190
And Clang is a pretty
popular compiler nowadays.

00:07:30.190 --> 00:07:33.520
There's another one that's been
around for ages called GCC,

00:07:33.520 --> 00:07:36.330
but these are just specific
names for types of compilers

00:07:36.330 --> 00:07:38.830
that different people, different
companies, different groups

00:07:38.830 --> 00:07:40.310
have actually created.

00:07:40.310 --> 00:07:44.800
But if you use in week 1 a
compiler yourself manually,

00:07:44.800 --> 00:07:47.170
you have to understand a
little more about what's

00:07:47.170 --> 00:07:50.703
going on because it's even more
cryptic than just make alone.

00:07:50.703 --> 00:07:53.620
So in fact, let me go back to my
terminal window here, let me go ahead

00:07:53.620 --> 00:07:58.690
and clear the screen a little bit
and just run really the raw compiler

00:07:58.690 --> 00:07:59.360
command.

00:07:59.360 --> 00:08:01.450
So what make is
automating for me, let me

00:08:01.450 --> 00:08:03.620
actually do this manually
for just a moment.

00:08:03.620 --> 00:08:10.450
So if I want to compile hello.c into
an executable program I can run,

00:08:10.450 --> 00:08:12.220
I can do this.

00:08:12.220 --> 00:08:17.110
clang, space, hello.c, and then Enter.

00:08:17.110 --> 00:08:20.980
And now there's no output, which is
a good thing in this case, no errors,

00:08:20.980 --> 00:08:22.010
but notice this.

00:08:22.010 --> 00:08:25.450
If I go ahead and type
ls, it turns out there's

00:08:25.450 --> 00:08:32.140
a file that's been created suddenly in
my current folder weirdly called a.out.

00:08:32.140 --> 00:08:33.580
That stands for Assembler Output.

00:08:33.580 --> 00:08:35.980
And long story short, that's
actually the default name

00:08:35.980 --> 00:08:39.440
of a program that's created when
you just run Clang by itself.

00:08:39.440 --> 00:08:41.830
Now that's a pretty
bad name for a program

00:08:41.830 --> 00:08:44.000
because it doesn't
describe what it does.

00:08:44.000 --> 00:08:49.870
So better would be here to perhaps do,
well, instead of a.out, which, yes,

00:08:49.870 --> 00:08:53.950
still prints hello, world, but isn't
really a clearly-named program,

00:08:53.950 --> 00:08:55.420
it'd be nice to name this hello.

00:08:55.420 --> 00:08:56.240
So what could I do?

00:08:56.240 --> 00:08:59.740
I could do like we learned last week--
well, I could rename a.out to hello

00:08:59.740 --> 00:09:01.820
by using Linux's mv command.

00:09:01.820 --> 00:09:04.480
So I'm going to move
a.out to become hello.

00:09:04.480 --> 00:09:06.370
But that, too, seems kind of tedious.

00:09:06.370 --> 00:09:07.720
Now I have three steps.

00:09:07.720 --> 00:09:10.750
Like write my code, compile
my code, and then rename it

00:09:10.750 --> 00:09:12.190
before I can even run it.

00:09:12.190 --> 00:09:13.580
We can do better than that.

00:09:13.580 --> 00:09:15.580
And so it turns out
that certain commands

00:09:15.580 --> 00:09:18.220
like clang support what
we're going to start today

00:09:18.220 --> 00:09:20.380
calling command line arguments.

00:09:20.380 --> 00:09:24.010
A command line argument, unlike
an argument to a function,

00:09:24.010 --> 00:09:27.040
is just an additional word
or key phrase that you

00:09:27.040 --> 00:09:30.400
type after a command at
your prompt in your terminal

00:09:30.400 --> 00:09:33.440
window that just modifies
the behavior of that command.

00:09:33.440 --> 00:09:35.600
It configures it a
little more specifically.

00:09:35.600 --> 00:09:39.220
So what you're seeing here on the screen
is somewhat of a better command with which

00:09:39.220 --> 00:09:45.220
to run clang so that now I can specify
the output of this command per this -o.

00:09:45.220 --> 00:09:46.610
So what do I mean by that?

00:09:46.610 --> 00:09:48.943
Well, let me go ahead and
clear my terminal window again

00:09:48.943 --> 00:09:54.955
and more explicitly type clang
-o hello hello.c and then Enter.

00:09:54.955 --> 00:09:57.580
Nothing, again, appears to happen,
but that's a good thing when

00:09:57.580 --> 00:10:02.860
you see no errors and now the program
I just created is indeed called hello.

00:10:02.860 --> 00:10:07.280
So it achieves really the same
exact effect as make did, but what

00:10:07.280 --> 00:10:09.820
I don't have to do with make
is type and remember something

00:10:09.820 --> 00:10:11.075
as long as this command.

00:10:11.075 --> 00:10:12.700
And this, too, is a bit of a white lie.

00:10:12.700 --> 00:10:16.420
It turns out, we have preconfigured
VS Code in the cloud for you

00:10:16.420 --> 00:10:21.310
to also use some other features
of Clang that would be even more

00:10:21.310 --> 00:10:22.840
tedious for you to write yourselves.

00:10:22.840 --> 00:10:28.130
And so really, this is why we distill
this as ultimately just running make.

00:10:28.130 --> 00:10:31.900
So let me pause here to see first if
there's any questions on what I've

00:10:31.900 --> 00:10:34.540
done by taking my very
first program in C

00:10:34.540 --> 00:10:37.720
and just now compiling it first
with make, but then starting over

00:10:37.720 --> 00:10:40.780
and now manually compiling
it with clang with what

00:10:40.780 --> 00:10:44.500
we'll call command line
arguments. -o, space, hello,

00:10:44.500 --> 00:10:46.820
and then the name of the file.

00:10:46.820 --> 00:10:47.320
Yeah?

00:10:47.320 --> 00:10:48.780
AUDIENCE: What is a.out?

00:10:48.780 --> 00:10:49.530
DAVID MALAN: Yeah.

00:10:49.530 --> 00:10:51.870
So a.out is a historical name.

00:10:51.870 --> 00:10:55.240
It refers to assembler
output-- more on that soon.

00:10:55.240 --> 00:10:58.080
And it's just the default file
name that you get automatically

00:10:58.080 --> 00:11:01.350
if you just run the compiler
on any file so that you

00:11:01.350 --> 00:11:02.970
have just a standard name for it.

00:11:02.970 --> 00:11:05.213
But it's not a very well-named program.

00:11:05.213 --> 00:11:07.380
Instead of running Microsoft
Word on your Mac or PC,

00:11:07.380 --> 00:11:09.880
it would be like
double-clicking on a.out.

00:11:09.880 --> 00:11:11.880
So instead with these
command line arguments,

00:11:11.880 --> 00:11:17.370
you can customize the output of Clang
and call it hello or anything you want.

00:11:17.370 --> 00:11:23.020
Other questions on what I've done
here with Clang itself, the compiler?

00:11:23.020 --> 00:11:23.520
Yeah?

00:11:23.520 --> 00:11:25.510
AUDIENCE: What is -o?

00:11:25.510 --> 00:11:26.565
DAVID MALAN: So -o--

00:11:26.565 --> 00:11:29.440
and you would only know this from
reading the manual, taking a class,

00:11:29.440 --> 00:11:30.500
means output.

00:11:30.500 --> 00:11:35.890
So -o means change Clang's
output to be a file called hello

00:11:35.890 --> 00:11:38.680
instead of the default, which is a.out.

00:11:38.680 --> 00:11:42.400
And this, too, is, again, a detail you
would have to look up on a web page,

00:11:42.400 --> 00:11:44.810
read the manual, hear someone
like me tell you about it.

00:11:44.810 --> 00:11:46.893
And in fact, there's even
more than these options,

00:11:46.893 --> 00:11:48.890
but we'll just scratch the surface here.

00:11:48.890 --> 00:11:49.390
All right.

00:11:49.390 --> 00:11:53.530
So if we now know this, what more is
actually happening underneath the hood?

00:11:53.530 --> 00:11:57.250
Well, let's take a closer look at
not just this version of my code,

00:11:57.250 --> 00:12:01.190
but my slightly more
complicated version last week,

00:12:01.190 --> 00:12:03.430
which looked a little
something like this, wherein

00:12:03.430 --> 00:12:07.330
I added in some dynamic input from the
user so I could say not hello, world

00:12:07.330 --> 00:12:11.810
to everyone, but hello, David or hello
to whoever actually runs this program.

00:12:11.810 --> 00:12:15.880
So in fact, let me go ahead and
change my code here in VS Code just

00:12:15.880 --> 00:12:17.770
to match that same code from last week.

00:12:17.770 --> 00:12:19.190
So no new code yet.

00:12:19.190 --> 00:12:22.820
I'm just going to, in a moment,
compile it in a slightly different way.

00:12:22.820 --> 00:12:29.020
So I did last week's string, I think,
answer equals get_string, quote-unquote,

00:12:29.020 --> 00:12:30.100
"What's your name?"

00:12:30.100 --> 00:12:31.540
Just like in Scratch.

00:12:31.540 --> 00:12:35.920
And then down here, instead of doing
world, I initially wrote answer,

00:12:35.920 --> 00:12:37.450
but that didn't go well.

00:12:37.450 --> 00:12:41.530
What did I ultimately do instead
to print out hello, David or hello,

00:12:41.530 --> 00:12:42.940
so-and-so?

00:12:42.940 --> 00:12:44.722
Yeah?

00:12:44.722 --> 00:12:45.680
Sorry, a little louder?

00:12:45.680 --> 00:12:46.430
AUDIENCE: %s?

00:12:46.430 --> 00:12:50.478
DAVID MALAN: Yeah, so %s, the so-called
format code that printf just knows how

00:12:50.478 --> 00:12:51.020
to deal with.

00:12:51.020 --> 00:12:52.470
And I had to add one other thing.

00:12:52.470 --> 00:12:54.350
Someone else besides %s--

00:12:54.350 --> 00:12:54.850
yeah?

00:12:54.850 --> 00:12:56.050
AUDIENCE: The name of the variable.

00:12:56.050 --> 00:12:58.870
DAVID MALAN: The name of the variable
that I want to plug into that

00:12:58.870 --> 00:13:00.190
placeholder %s.

00:13:00.190 --> 00:13:01.630
And in this case, it's answer.

00:13:01.630 --> 00:13:04.363
Now let me make one refinement
only because now we're in week 2

00:13:04.363 --> 00:13:06.530
and we're going to start
writing more lines of code,

00:13:06.530 --> 00:13:10.360
even though Scratch called the
return value of the ask puzzle piece,

00:13:10.360 --> 00:13:11.560
answer always.

00:13:11.560 --> 00:13:14.480
In C, we have full control over
what our variables are called.

00:13:14.480 --> 00:13:17.410
And now it's probably good not
to just generically always call

00:13:17.410 --> 00:13:19.870
my variable answer if
I'm using get_string.

00:13:19.870 --> 00:13:21.050
Let's call it what it is.

00:13:21.050 --> 00:13:23.680
So this is now just a matter
of style, if you will.

00:13:23.680 --> 00:13:26.620
Let me change the variable
to be name just so

00:13:26.620 --> 00:13:29.980
that it's a little clearer
to me, to you, to a TF or TA

00:13:29.980 --> 00:13:34.000
exactly what that variable represents
instead of more generically answer.

00:13:34.000 --> 00:13:37.030
All right, so that said, let me
go down to my terminal window,

00:13:37.030 --> 00:13:41.050
and last week again, I ran make to
compile this exact same program.

00:13:41.050 --> 00:13:43.270
Now, though, let me go
ahead and just use clang.

00:13:43.270 --> 00:13:45.490
So clang -o--

00:13:45.490 --> 00:13:47.500
I'll still call this version hello--

00:13:47.500 --> 00:13:49.330
space, hello.c.

00:13:49.330 --> 00:13:51.080
So exact same command as before.

00:13:51.080 --> 00:13:54.640
The only thing that's different is I've
added a couple of more lines of code

00:13:54.640 --> 00:13:56.330
to get the user's input.

00:13:56.330 --> 00:13:59.960
Let me hit Enter, and now,
darn it, our first error.

00:13:59.960 --> 00:14:02.750
So output from clang and
make is not a good thing,

00:14:02.750 --> 00:14:05.420
and here, we're seeing
something particularly cryptic.

00:14:05.420 --> 00:14:09.010
So something in function
'main': undefined reference

00:14:09.010 --> 00:14:13.480
to 'get_string', and then
linker command failed with exit code 1.

00:14:13.480 --> 00:14:16.540
So there's actually a lot of jargon
in there that we'll tease apart today,

00:14:16.540 --> 00:14:20.338
but my hint is that clearly my problem's
in main, although that's not surprising

00:14:20.338 --> 00:14:22.130
because there's nothing
else going on here.

00:14:22.130 --> 00:14:26.830
get_string is an issue, and the issue
is that it's an undefined reference.

00:14:26.830 --> 00:14:28.990
And yet, notice, I was pretty good.

00:14:28.990 --> 00:14:32.920
I added the CS50 header file
and I said last week that that's

00:14:32.920 --> 00:14:35.920
enough to teach the compiler
that functions exist,

00:14:35.920 --> 00:14:39.070
but the problem is that even
though this does, in fact,

00:14:39.070 --> 00:14:43.090
teach Clang that get_string
exists, it is not

00:14:43.090 --> 00:14:47.530
sufficient information for Clang to go
find on the hard drive of the computer

00:14:47.530 --> 00:14:51.860
the 0's and 1's that actually
implement get_string itself.

00:14:51.860 --> 00:14:54.250
So in other words, this
include line, per last week,

00:14:54.250 --> 00:14:55.333
is a little bit of a hint.

00:14:55.333 --> 00:14:59.560
It's a teaser to Clang that you're about
to see and use this function somewhere.

00:14:59.560 --> 00:15:05.710
But if you actually want to use the 0's
and 1's that CS50 wrote some time ago

00:15:05.710 --> 00:15:08.740
and bake those into your
program so your program actually

00:15:08.740 --> 00:15:11.470
knows how to get input
from the user, well then,

00:15:11.470 --> 00:15:15.440
I'm going to have to go ahead and
run a slightly different command.

00:15:15.440 --> 00:15:16.250
So let me do this.

00:15:16.250 --> 00:15:18.917
Let me clear my terminal window
just to get rid of that distraction

00:15:18.917 --> 00:15:23.020
and let me propose now that
we run this command instead.

00:15:23.020 --> 00:15:28.510
Almost the same as before, clang
-o, space, hello, then hello.c,

00:15:28.510 --> 00:15:34.210
but with one additional command line
argument at the end, and this is a -l--

00:15:34.210 --> 00:15:35.050
not a number 1.

00:15:35.050 --> 00:15:39.370
So -lcs50 with no space
in between those two.

00:15:39.370 --> 00:15:43.540
Now the l is going to result in all
of those 0's and 1's that actually

00:15:43.540 --> 00:15:48.350
were written by CS50 being linked into your
code, your few lines of code or mine

00:15:48.350 --> 00:15:48.850
here.

00:15:48.850 --> 00:15:53.530
But that's the second step that the
compiler requires in order to know how

00:15:53.530 --> 00:15:58.537
to actually execute and rather
compile your code and CS50's.

00:15:58.537 --> 00:16:00.370
And CS50 is not the
only one that does this.

00:16:00.370 --> 00:16:04.750
If you use any third party library in
C that doesn't come with the language,

00:16:04.750 --> 00:16:08.333
you would do -l such
and such where whoever--

00:16:08.333 --> 00:16:10.000
however they've named their own library.

00:16:10.000 --> 00:16:14.298
But you don't have to do it for built-in
things like we've been using thus far.

00:16:14.298 --> 00:16:16.090
All right, so let me
go ahead and try this.

00:16:16.090 --> 00:16:19.000
I'll go back to VS Code
here, and let me go ahead now

00:16:19.000 --> 00:16:23.620
and run clang -o hello, then hello.c.

00:16:23.620 --> 00:16:26.560
And now instead of just
hitting Enter, -lcs50

00:16:26.560 --> 00:16:29.590
with no space between the
l and the cs50, Enter.

00:16:29.590 --> 00:16:33.310
Now nothing bad happens,
and now I can do ./hello.

00:16:33.310 --> 00:16:34.180
What's your name?

00:16:34.180 --> 00:16:37.633
I'll type in David, Enter,
and now we see hello, David.

00:16:37.633 --> 00:16:40.300
Now honestly, this is where we're
really getting into the weeds,

00:16:40.300 --> 00:16:42.130
and now this is taking--

00:16:42.130 --> 00:16:45.730
this is really just adding nuisance to
the process of compiling and running

00:16:45.730 --> 00:16:46.460
your code.

00:16:46.460 --> 00:16:49.960
And so the reality is, even though
this is indeed what is happening,

00:16:49.960 --> 00:16:51.880
this is why we used last
week and we're going

00:16:51.880 --> 00:16:55.240
to continue using this week
onward make because it just

00:16:55.240 --> 00:16:57.130
automates that whole process for you.

00:16:57.130 --> 00:17:00.130
But it's ideal to understand what's
going wrong because any of the error

00:17:00.130 --> 00:17:02.770
messages you saw for problem
set 1, any of the error messages

00:17:02.770 --> 00:17:05.859
you see for the next few weeks
probably aren't coming from make,

00:17:05.859 --> 00:17:08.560
they're coming from
Clang underneath the hood

00:17:08.560 --> 00:17:10.780
because make is just
automating the process.

00:17:10.780 --> 00:17:14.060
But with make, you literally just write
make and then the name of the program,

00:17:14.060 --> 00:17:17.560
you don't have to worry about any
of those command line arguments.

00:17:17.560 --> 00:17:22.240
Questions, then, on compiling
with -lcs50 or anything else?

00:17:22.240 --> 00:17:23.043
Yeah?

00:17:23.043 --> 00:17:24.960
AUDIENCE: What is the
benefit of [INAUDIBLE]?

00:17:24.960 --> 00:17:26.220
DAVID MALAN: Sorry,
what is the benefit of--

00:17:26.220 --> 00:17:27.512
AUDIENCE: Using Clang manually.

00:17:27.512 --> 00:17:30.000
DAVID MALAN: What is the
benefit of using Clang manually?

00:17:30.000 --> 00:17:30.870
None, really.

00:17:30.870 --> 00:17:33.450
In fact, all main is doing
is just say-- make is doing

00:17:33.450 --> 00:17:35.055
is saving us some keystrokes.

00:17:35.055 --> 00:17:37.680
If you prefer, though, and you
just like to be more in control,

00:17:37.680 --> 00:17:41.130
you can totally run Clang manually if
you remember the various command line

00:17:41.130 --> 00:17:42.090
arguments.

00:17:42.090 --> 00:17:42.660
Yeah?

00:17:42.660 --> 00:17:47.335
AUDIENCE: So why did you
have to explain [INAUDIBLE]

00:17:47.335 --> 00:17:48.210
DAVID MALAN: Exactly.

00:17:48.210 --> 00:17:49.560
Why did I have to explain--

00:17:49.560 --> 00:17:53.220
that is, provide a hint to CS50
with the cs50.h header file,

00:17:53.220 --> 00:17:55.470
but I didn't have to do
that with stdio.h?

00:17:55.470 --> 00:17:56.400
Just because.

00:17:56.400 --> 00:18:00.990
stdio.h comes with C, just
like a few other libraries come

00:18:00.990 --> 00:18:03.060
with C that we'll start seeing today.

00:18:03.060 --> 00:18:05.410
CS50, though, is not
built into C everywhere,

00:18:05.410 --> 00:18:07.890
and so you do have to
explicitly add that one there.

00:18:07.890 --> 00:18:08.767
Yeah?

00:18:08.767 --> 00:18:11.970
AUDIENCE: Can you define what
command line argument [INAUDIBLE]?

00:18:11.970 --> 00:18:15.210
DAVID MALAN: A command line
argument is a word or phrase

00:18:15.210 --> 00:18:17.740
that you type at the command line--

00:18:17.740 --> 00:18:22.200
a.k.a., your terminal-- in order to
influence the behavior of a program.

00:18:22.200 --> 00:18:22.742
AUDIENCE: OK.

00:18:22.742 --> 00:18:24.430
So it's a term for
whatever you're giving it.

00:18:24.430 --> 00:18:24.565
DAVID MALAN: Yeah.

00:18:24.565 --> 00:18:25.660
It changes the defaults.

00:18:25.660 --> 00:18:27.790
In our GUI world,
Graphical User Interface,

00:18:27.790 --> 00:18:29.680
you and I would probably
click some boxes,

00:18:29.680 --> 00:18:32.350
we would select some menu
options to configure a program

00:18:32.350 --> 00:18:33.460
to behave in the same way.

00:18:33.460 --> 00:18:36.850
At a command line interface, you have
to just say everything all at once,

00:18:36.850 --> 00:18:39.600
and that's why we have
command line arguments.

00:18:39.600 --> 00:18:40.605
Yeah?

00:18:40.605 --> 00:18:43.243
AUDIENCE: Is make [INAUDIBLE]

00:18:43.243 --> 00:18:43.910
DAVID MALAN: No.

00:18:43.910 --> 00:18:45.470
Make is not just for CS50.

00:18:45.470 --> 00:18:50.480
It's used globally in any project
really nowadays using C, C++,

00:18:50.480 --> 00:18:52.020
even other languages as well.

00:18:52.020 --> 00:18:54.140
In fact, most every command
you see in this class,

00:18:54.140 --> 00:18:57.530
unless it has 5-0 at the
end of it, is globally used.

00:18:57.530 --> 00:19:00.758
Only those suffixed with 50
are, indeed, course-specific.

00:19:00.758 --> 00:19:03.050
And even those we'll gradually
take training wheels off

00:19:03.050 --> 00:19:06.890
of so that you understand exactly what those
commands are doing as well.

00:19:06.890 --> 00:19:09.053
All right, so what is
it that we've just done?

00:19:09.053 --> 00:19:11.720
Everything we've just done, of
course, I keep calling compiling,

00:19:11.720 --> 00:19:13.580
but let's just go down
one rabbit hole so

00:19:13.580 --> 00:19:15.967
that you understand that
when you compile code,

00:19:15.967 --> 00:19:18.050
there's actually a whole
bunch of steps happening,

00:19:18.050 --> 00:19:21.800
and this is going to enable a lot
of features, like companies can

00:19:21.800 --> 00:19:26.060
write code and then convert it
to run it on Macs and PCs alike

00:19:26.060 --> 00:19:27.240
or phones or the like.

00:19:27.240 --> 00:19:30.320
So it's not just a matter of
converting source code to machine code,

00:19:30.320 --> 00:19:34.610
there's actually four steps involved
in what you and I, as of last week,

00:19:34.610 --> 00:19:35.840
know as compiling.

00:19:35.840 --> 00:19:39.033
And these aren't terms that you'll
have to keep in mind constantly

00:19:39.033 --> 00:19:41.450
because again, we're going to
abstract a lot of this away.

00:19:41.450 --> 00:19:43.492
But just so we've gone
down the rabbit hole once,

00:19:43.492 --> 00:19:45.890
let's consider each of
these four steps that

00:19:45.890 --> 00:19:49.850
have been happening for you for a
week automatically, the first of which

00:19:49.850 --> 00:19:51.080
is called preprocessing.

00:19:51.080 --> 00:19:52.260
So what does this mean?

00:19:52.260 --> 00:19:54.450
Well, let's consider that
same program as before.

00:19:54.450 --> 00:19:57.830
So notice that two of the lines
of code start with a hash mark.

00:19:57.830 --> 00:20:02.338
That is a special symbol in C, and it's
a so-called preprocessor directive.

00:20:02.338 --> 00:20:04.130
You don't need to
memorize terms like that,

00:20:04.130 --> 00:20:07.005
but it just means that it's a little
different from every other line.

00:20:07.005 --> 00:20:08.960
And anything with a
hash symbol here should

00:20:08.960 --> 00:20:13.315
be preprocessed-- that is, analyzed
initially before anything else happens.

00:20:13.315 --> 00:20:17.100
So let's consider these two lines
up top, what exactly is happening.

00:20:17.100 --> 00:20:19.220
Well, it turns out with
these two lines, you

00:20:19.220 --> 00:20:23.390
have two header files, of
course, cs50.h and stdio.h.

00:20:23.390 --> 00:20:27.980
Where are those files, because
they've never been in VS Code for you,

00:20:27.980 --> 00:20:28.550
seemingly.

00:20:28.550 --> 00:20:31.940
If you type ls-- if you open up
the File Explorer in the GUI,

00:20:31.940 --> 00:20:35.900
you have never seen,
probably, cs50.h or stdio.h.

00:20:35.900 --> 00:20:39.620
They just work, but that's
because there's a folder somewhere

00:20:39.620 --> 00:20:43.340
on the hard drive that you're
using on your Mac or PC

00:20:43.340 --> 00:20:45.690
or somewhere in the
cloud, as in our case.

00:20:45.690 --> 00:20:50.210
And that folder is
traditionally called /usr/include.

00:20:50.210 --> 00:20:51.857
And user is deliberately misspelled.

00:20:51.857 --> 00:20:54.440
It's just slightly more succinct,
although it's a little weird

00:20:54.440 --> 00:20:55.760
why we drop that one letter.

00:20:55.760 --> 00:21:01.760
But /usr/include is just a folder on the
server that contains cs50.h, stdio.h,

00:21:01.760 --> 00:21:03.990
and a bunch of other things as well.

00:21:03.990 --> 00:21:08.030
So in fact, if you type in VS
Code, in your terminal window,

00:21:08.030 --> 00:21:13.310
when you're using Codespaces in the
cloud and type ls space /usr/include,

00:21:13.310 --> 00:21:15.470
you can see all of the
files in that folder.

00:21:15.470 --> 00:21:17.580
But we've preinstalled
all of that stuff for you.

00:21:17.580 --> 00:21:20.390
So let's consider what's
actually in those files here.

00:21:20.390 --> 00:21:25.370
If I highlight these two lines up top
that start with hash include, well,

00:21:25.370 --> 00:21:30.530
I kind of hinted last week that what's
in that first file is a hint as to what

00:21:30.530 --> 00:21:32.660
functions CS50 wrote for you.

00:21:32.660 --> 00:21:35.540
So you can kind of think
of these include lines

00:21:35.540 --> 00:21:38.300
as being temporary
placeholders for what's

00:21:38.300 --> 00:21:41.000
going to become like a
global find and replace.

00:21:41.000 --> 00:21:44.270
That is, the first thing clang is going
to do is preprocess this file.

00:21:44.270 --> 00:21:47.300
It's going to look for any line
that starts with hash include.

00:21:47.300 --> 00:21:50.960
And if it sees that, it's going
to essentially go into that file,

00:21:50.960 --> 00:21:55.190
like cs50.h, and then just copy
and paste the contents of that file

00:21:55.190 --> 00:21:56.443
magically there for you.

00:21:56.443 --> 00:21:58.110
You don't see it visually on the screen.

00:21:58.110 --> 00:22:00.060
But it's happening behind the scenes.

00:22:00.060 --> 00:22:03.230
And so really, what's
happening with this first line

00:22:03.230 --> 00:22:09.380
is that somewhere in cs50.h is
the declaration of get_string

00:22:09.380 --> 00:22:11.690
like we talked about last
week, and it probably

00:22:11.690 --> 00:22:13.215
looks a little something like this.

00:22:13.215 --> 00:22:15.590
And we didn't spend much time
on this yet this past week,

00:22:15.590 --> 00:22:17.030
but we will more in time.

00:22:17.030 --> 00:22:21.470
Notice that this is how
a function is declared.

00:22:21.470 --> 00:22:23.677
That is, it is decreed to exist.

00:22:23.677 --> 00:22:25.760
The name of the function,
of course, is get_string.

00:22:25.760 --> 00:22:28.310
Inside of the parentheses
are its arguments.

00:22:28.310 --> 00:22:31.580
In this case, there's one argument
to get_string, I claim today,

00:22:31.580 --> 00:22:33.080
but you've known this implicitly.

00:22:33.080 --> 00:22:34.160
And it's a prompt.

00:22:34.160 --> 00:22:36.860
It's the prompt that the human
sees when you use get_string.

00:22:36.860 --> 00:22:37.790
What is that prompt?

00:22:37.790 --> 00:22:41.060
Well, it's a string of text, like
quote unquote, "what's your name?"

00:22:41.060 --> 00:22:43.080
or anything else that I asked last week.

00:22:43.080 --> 00:22:46.610
Meanwhile, get_string, as we know
from last week, has a return value.

00:22:46.610 --> 00:22:48.140
It returns something to you.

00:22:48.140 --> 00:22:49.610
And that, too, is a string.

00:22:49.610 --> 00:22:52.120
So again, this is also
called a function's prototype.

00:22:52.120 --> 00:22:53.870
It's the thing toward
the end of last week

00:22:53.870 --> 00:22:57.560
that I just copied and pasted from
the bottom of my file to the top,

00:22:57.560 --> 00:23:02.030
just so that it was like this teaser
for clang as to what would exist later.
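
NOTE
A minimal sketch of a prototype placed above the code that uses it, with the definition further down. The names square and sum_of_squares are hypothetical, chosen only for illustration.
```c
// Prototype: return type, name, and parameter list, ending in a semicolon.
// It promises the compiler that square exists before its body appears.
int square(int n);
// This function can call square because the prototype above declared it.
int sum_of_squares(int a, int b)
{
    return square(a) + square(b);
}
// Definition: the body that the prototype promised.
int square(int n)
{
    return n * n;
}
```
A header file such as cs50.h plays the role of that prototype line for the functions in its library.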

00:23:02.030 --> 00:23:07.670
So you can think, then, of these include
lines as just kind of combining all

00:23:07.670 --> 00:23:11.360
of those function declarations in
some separate file called cs50.h,

00:23:11.360 --> 00:23:14.780
so that you yourself don't have to type
them every time you use the library--

00:23:14.780 --> 00:23:18.470
or worse, so that you, yourself, don't
have to copy and paste those lines.

00:23:18.470 --> 00:23:22.520
This is what clang is doing for you
in its first step of preprocessing.

00:23:22.520 --> 00:23:27.470
Second, and last in this example,
what happens when clang preprocesses

00:23:27.470 --> 00:23:29.175
this second include line?

00:23:29.175 --> 00:23:31.550
Well, the only other function
we care about in this story

00:23:31.550 --> 00:23:33.650
is printf, of course,
which comes with C.

00:23:33.650 --> 00:23:39.440
So essentially, you can think of
printf's prototype or declaration

00:23:39.440 --> 00:23:40.820
as just being this.

00:23:40.820 --> 00:23:42.870
printf is the name of the function.

00:23:42.870 --> 00:23:47.370
It takes a string that you want
to format like, Hello comma world,

00:23:47.370 --> 00:23:49.110
or Hello comma %s.

00:23:49.110 --> 00:23:52.120
And then with dot, dot, dot, this
actually has technical meaning.

00:23:52.120 --> 00:23:55.770
It means, of course, that you can
plug in 0 variables, 1 variable, 2

00:23:55.770 --> 00:23:56.340
or 10.

00:23:56.340 --> 00:23:58.530
So dot, dot, dot means
some number of variables.

00:23:58.530 --> 00:24:00.072
Now we haven't talked about this yet.

00:24:00.072 --> 00:24:01.410
And we won't really, in general.

00:24:01.410 --> 00:24:05.490
printf actually returns a value,
a number, that is an integer.

00:24:05.490 --> 00:24:07.420
But more on that perhaps another time.

00:24:07.420 --> 00:24:10.920
It's generally not something
the programmer tends to look at.
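
NOTE
A rough sketch of what including stdio.h provides, assuming a simplified form of printf's declaration (real headers add qualifiers and much more). Declaring printf by hand stands in for the copy and paste that preprocessing does. The name print_greeting is hypothetical.
```c
// Hand-written stand-in for the line that #include <stdio.h> would supply.
// The ... means any number of further arguments may follow the format string.
int printf(const char *format, ...);
// Prints a greeting and passes along printf's return value, which is the
// number of characters written (or a negative number on error).
int print_greeting(const char *name)
{
    return printf("hello, %s\n", name);
}
```
In practice you would keep the #include line rather than declare printf yourself. The sketch only shows that a declaration is all the compiler needs at this stage.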

00:24:10.920 --> 00:24:14.250
But that's all we mean by preprocessing,
so that at the end of this process,

00:24:14.250 --> 00:24:18.030
even though there's more lines
of code in cs50.h and stdio.h,

00:24:18.030 --> 00:24:21.330
what's really just happening
is that clang, in preprocessing

00:24:21.330 --> 00:24:25.380
the file, copies and pastes the
contents of those files into your code

00:24:25.380 --> 00:24:29.160
so that now your code knows about
everything-- get_string, printf,

00:24:29.160 --> 00:24:31.060
and anything else.

00:24:31.060 --> 00:24:35.230
Any questions, then, on that
first step, preprocessing?

00:24:35.230 --> 00:24:35.920
Yes?

00:24:35.920 --> 00:24:49.195
AUDIENCE: [INAUDIBLE]

00:24:49.195 --> 00:24:50.320
DAVID MALAN: Good question.

00:24:50.320 --> 00:24:52.720
When you include a file,
does it only include what

00:24:52.720 --> 00:24:54.880
you need or does it include everything?

00:24:54.880 --> 00:24:56.420
Think of it as including everything.

00:24:56.420 --> 00:24:59.020
So if it's a big file, that's
a lot of code at the very top.

00:24:59.020 --> 00:25:01.880
And that's why, if you think
back to all of the zeros and ones

00:25:01.880 --> 00:25:03.880
I showed a little bit
ago, as well as last week,

00:25:03.880 --> 00:25:06.130
there's a lot of zeros
and ones that end up

00:25:06.130 --> 00:25:08.892
on the screen as a result of
just writing, Hello, world.

00:25:08.892 --> 00:25:10.600
A lot of those zeros
and ones are perhaps

00:25:10.600 --> 00:25:13.390
coming from code that you didn't
actually, necessarily need.

00:25:13.390 --> 00:25:15.340
But some of it is
perhaps there, but there

00:25:15.340 --> 00:25:17.740
are ways to optimize that as well.

00:25:17.740 --> 00:25:22.395
All right, so step two of compiling
is, confusingly, called compiling.

00:25:22.395 --> 00:25:24.520
It's just, this is the term
that most everyone uses

00:25:24.520 --> 00:25:27.940
to describe the whole process,
instead of just this one step.

00:25:27.940 --> 00:25:32.140
But once a program has been
preprocessed behind the scenes

00:25:32.140 --> 00:25:35.865
by the compiler for you, it looks
now a little something like this.

00:25:35.865 --> 00:25:38.740
And I've put dot, dot, dot just to
imply that, yes, to your question,

00:25:38.740 --> 00:25:39.820
there's more stuff above it.

00:25:39.820 --> 00:25:40.987
There's more stuff below it.

00:25:40.987 --> 00:25:43.070
It's just not interesting
right now for us.

00:25:43.070 --> 00:25:44.860
So now we have just C code.

00:25:44.860 --> 00:25:46.960
There's no more preprocessor directives.

00:25:46.960 --> 00:25:49.840
At this point, all of the hash
symbols and those lines of code

00:25:49.840 --> 00:25:52.670
have been preprocessed and
converted to something else.

00:25:52.670 --> 00:25:56.380
And so now-- and this is where
things get a little spooky looking.

00:25:56.380 --> 00:26:00.370
Here now is what happens
when clang, or any compiler,

00:26:00.370 --> 00:26:03.310
literally compiles code like this.

00:26:03.310 --> 00:26:08.720
It converts it from this in
C to this in assembly code.

00:26:08.720 --> 00:26:10.720
So this is among the scarier languages.

00:26:10.720 --> 00:26:12.580
I, myself, don't really
have fond memories.

00:26:12.580 --> 00:26:14.805
This is not a language that
many people program in.

00:26:14.805 --> 00:26:16.930
If you take a subsequent
class in computer science,

00:26:16.930 --> 00:26:19.600
in systems, a higher level
class, you might actually

00:26:19.600 --> 00:26:21.430
learn this or some variant thereof.

00:26:21.430 --> 00:26:23.232
But there's at least
a few people out there

00:26:23.232 --> 00:26:24.940
that need to know this
stuff because this

00:26:24.940 --> 00:26:29.320
is closer to what the computers
themselves, nowadays, understand.

00:26:29.320 --> 00:26:34.600
The Intel CPUs or the AMD CPUs, the
brains of today's computers and phones

00:26:34.600 --> 00:26:37.960
understand stuff that looks
more like this and less like C.

00:26:37.960 --> 00:26:42.430
Now it's completely esoteric, but
let me just highlight a few phrases.

00:26:42.430 --> 00:26:44.630
There's some stuff
that's a little familiar.

00:26:44.630 --> 00:26:47.620
There is mention of main
at the top there in yellow.

00:26:47.620 --> 00:26:49.750
There is mention of
get_string toward the bottom.

00:26:49.750 --> 00:26:52.070
There is mention of printf down below.

00:26:52.070 --> 00:26:55.600
So this is just another programming
language called assembly language,

00:26:55.600 --> 00:26:57.010
that decades ago, humans--

00:26:57.010 --> 00:26:58.450
myself included in school--

00:26:58.450 --> 00:27:00.130
did write code in.

00:27:00.130 --> 00:27:02.630
And absolutely, some people
still write this code,

00:27:02.630 --> 00:27:06.070
especially since you can write
very, very efficient code.

00:27:06.070 --> 00:27:08.590
But it's a lot more arcane.

00:27:08.590 --> 00:27:11.380
It's a lot less user friendly.

00:27:11.380 --> 00:27:14.650
So you'll see in yellow now, these
are the so-called instructions

00:27:14.650 --> 00:27:18.460
that a computer's brain or CPU
understands, pushing values

00:27:18.460 --> 00:27:23.630
around, moving them, subtracting values,
calling functions, and move, move,

00:27:23.630 --> 00:27:24.130
move.

00:27:24.130 --> 00:27:27.400
So really, the low-level operations
that computers understand

00:27:27.400 --> 00:27:31.030
tend to be arithmetic operations--
subtraction, addition,

00:27:31.030 --> 00:27:34.120
and the like-- moving
things in and out of memory.

00:27:34.120 --> 00:27:37.510
It's just a lot more tedious for
folks like us to write code like this.

00:27:37.510 --> 00:27:40.450
This is why you and I tend
to write stuff like this.

00:27:40.450 --> 00:27:44.080
And ideally, still, people like you and
I tend to drag and drop puzzle pieces

00:27:44.080 --> 00:27:46.520
that sort of abstract
all of that away further.

00:27:46.520 --> 00:27:49.420
But for now, this is, again,
called assembly language.

00:27:49.420 --> 00:27:54.310
It is what happens when the compiler
literally compiles your code.

00:27:54.310 --> 00:27:57.010
But of course, this is
still not zeros and ones.

00:27:57.010 --> 00:27:58.580
So we got two steps to go.

00:27:58.580 --> 00:28:02.270
So when a compiler
proceeds to step three,

00:28:02.270 --> 00:28:05.530
this is where things get
converted to machine code.

00:28:05.530 --> 00:28:08.500
And when a compiler
assembles your code for you,

00:28:08.500 --> 00:28:14.260
it converts what we just saw on the
screen here to actual zeros and ones--

00:28:14.260 --> 00:28:18.550
the so-called machine code that your
phone or your computer understands.

00:28:18.550 --> 00:28:22.120
But it's worth noting that
these are not necessarily all

00:28:22.120 --> 00:28:24.280
of the zeros and ones of your program.

00:28:24.280 --> 00:28:29.980
Yes, they are the zeros and ones
that correspond to your Hello program

00:28:29.980 --> 00:28:33.250
or printf and getString
and the like, but notice

00:28:33.250 --> 00:28:36.940
that here, we need one final step.

00:28:36.940 --> 00:28:40.100
In those zeros and ones are
only your lines of code.

00:28:40.100 --> 00:28:43.540
But what about CS50's lines of code
that we wrote to implement get_string?

00:28:43.540 --> 00:28:46.990
What about the lines of code that humans
wrote decades ago to implement printf?

00:28:46.990 --> 00:28:50.020
Those are somewhere on this hard
drive, like on my Mac, my PC,

00:28:50.020 --> 00:28:54.460
or somewhere in the cloud, but we need
to combine all of those zeros and ones

00:28:54.460 --> 00:29:01.390
together and link my code with
CS50's code with standard I/O's code,

00:29:01.390 --> 00:29:02.420
all together.

00:29:02.420 --> 00:29:05.110
And so what happens in
the last step, ultimately,

00:29:05.110 --> 00:29:07.960
is that if we have my
code here in yellow,

00:29:07.960 --> 00:29:11.440
and then the code that CS50 wrote,
and the code that the authors of C

00:29:11.440 --> 00:29:15.940
itself wrote, what really is happening
is that somewhere, we have not only

00:29:15.940 --> 00:29:19.960
hello.c, which, obviously, I
wrote, and wrote with us live here,

00:29:19.960 --> 00:29:24.550
there's also, let's assume, somewhere
on the computer, a cs50.c file

00:29:24.550 --> 00:29:28.210
that, coincidentally, I and
CS50 staff wrote years ago.

00:29:28.210 --> 00:29:30.790
And also, somewhere on the
computer, there's another file.

00:29:30.790 --> 00:29:34.120
Let me oversimplify by
just calling it stdio.c.

00:29:34.120 --> 00:29:36.850
In practice, it's probably
specifically called printf.c.

00:29:36.850 --> 00:29:39.460
But they're somewhere,
these two other files.

00:29:39.460 --> 00:29:44.110
And so this last step called
linking takes my zeros and ones

00:29:44.110 --> 00:29:48.100
from the code I just wrote, namely
this code on the screen here.

00:29:48.100 --> 00:29:50.810
It then grabs the zeros
and ones that CS50 wrote.

00:29:50.810 --> 00:29:53.480
And it grabs the zeros and ones
that the authors of C wrote,

00:29:53.480 --> 00:29:56.240
in order to implement
the standard I/O library.

00:29:56.240 --> 00:30:00.750
And lastly, voila,
links them all together.

00:30:00.750 --> 00:30:03.980
And this is the same blob of zeros
and ones that we saw earlier.

00:30:03.980 --> 00:30:08.090
It's just now the result
of preprocessing your code,

00:30:08.090 --> 00:30:12.620
compiling your code, assembling your
code, linking your code, and my God,

00:30:12.620 --> 00:30:15.830
at this point, like if there were
any fun in programming for you yet,

00:30:15.830 --> 00:30:19.620
we've just taken it all away, we just
call this whole process compiling.

00:30:19.620 --> 00:30:20.120
Why?

00:30:20.120 --> 00:30:22.490
Because now that we
know those steps exist--

00:30:22.490 --> 00:30:25.370
and smart people solve
that problem for us--

00:30:25.370 --> 00:30:27.890
you and I can kind of operate
at this level of abstraction

00:30:27.890 --> 00:30:32.420
and just assume that compiling
converts source code to machine code.

00:30:32.420 --> 00:30:36.350
Questions, though, on any
of these intermediate steps?

00:30:36.350 --> 00:30:37.360
Yeah?

00:30:37.360 --> 00:30:41.958
AUDIENCE: For linking, are
different parts, like [INAUDIBLE]?

00:30:50.072 --> 00:30:51.280
DAVID MALAN: A good question.

00:30:51.280 --> 00:30:53.238
So where are all of these
zeros and ones stored?

00:30:53.238 --> 00:30:56.400
Because you and I, we've been using
a browser, right? code.cs50.io,

00:30:56.400 --> 00:30:58.330
of course, is this
web-based user interface.

00:30:58.330 --> 00:31:00.497
But again, recall from last
week, even though you're

00:31:00.497 --> 00:31:05.640
using a web browser to access VS Code,
that web-based version of VS Code

00:31:05.640 --> 00:31:09.000
is connected to an actual
server somewhere in the cloud.

00:31:09.000 --> 00:31:13.170
And on that server, you have your own
account and your own files, and really,

00:31:13.170 --> 00:31:15.360
your own hard drive,
virtually in the cloud.

00:31:15.360 --> 00:31:18.872
Think of it a little like Dropbox
or Box or Google Drive or OneDrive

00:31:18.872 --> 00:31:19.830
or something like that.

00:31:19.830 --> 00:31:23.310
So you have a hard drive somewhere out
there that we've provisioned for you.

00:31:23.310 --> 00:31:27.930
And it's on that hard drive that you
have your code that you just wrote,

00:31:27.930 --> 00:31:32.700
or I just wrote, cs50.c, stdio.c,
and all of the other code

00:31:32.700 --> 00:31:36.967
that implements the math functions
and everything else that C supports.

00:31:36.967 --> 00:31:37.550
Good question.

00:31:37.550 --> 00:31:38.964
Yeah?

00:31:38.964 --> 00:31:45.425
AUDIENCE: So, say in the CS50
library, the line [INAUDIBLE]

00:31:45.425 --> 00:31:49.401
do we do the same
exact thing [INAUDIBLE]

00:31:49.401 --> 00:31:51.935
copy paste them all the way over?

00:31:51.935 --> 00:31:53.060
DAVID MALAN: Good question.

00:31:53.060 --> 00:31:57.110
That hash include cs50.h
line at the top of my code.

00:31:57.110 --> 00:32:01.310
If I just replace that with the
contents of cs50.c, would that work?

00:32:01.310 --> 00:32:03.590
Short answer, yes, that would work.

00:32:03.590 --> 00:32:05.400
You could copy all of the code there.

00:32:05.400 --> 00:32:08.577
However, there's some order of
operations that might come into play.

00:32:08.577 --> 00:32:10.910
And so it's probably not quite
as simple as copy, paste.

00:32:10.910 --> 00:32:13.190
But conceptually, yes,
that's what's happening.

00:32:13.190 --> 00:32:19.370
Now with that said, in cs50.h, are
only the prototypes of the functions,

00:32:19.370 --> 00:32:23.628
the hints as to how the functions
look, what their return type is,

00:32:23.628 --> 00:32:25.670
what their name is, and
what their arguments are.

00:32:25.670 --> 00:32:29.867
It's in the dot c file that
actual code tends to be written.

00:32:29.867 --> 00:32:32.450
And this is a little confusing
now because you and I have only

00:32:32.450 --> 00:32:33.920
written code in dot c files.

00:32:33.920 --> 00:32:35.690
But in the next few
weeks, you'll actually

00:32:35.690 --> 00:32:37.940
start writing some of
your own dot h files

00:32:37.940 --> 00:32:40.460
as well, just like CS50,
just like standard I/O.

00:32:40.460 --> 00:32:44.150
But in essence, that line of code
just makes it easier to use and reuse

00:32:44.150 --> 00:32:46.020
code that's already been written.

00:32:46.020 --> 00:32:47.750
And that's the whole point of a library.

00:32:47.750 --> 00:32:50.327
AUDIENCE: Does linking them [INAUDIBLE]?

00:32:50.327 --> 00:32:51.910
DAVID MALAN: Say that a little louder.

00:32:51.910 --> 00:32:54.472
AUDIENCE: Does linking happen
when you use the compiler?

00:32:54.472 --> 00:32:55.180
DAVID MALAN: Yes.

00:32:55.180 --> 00:32:56.980
Does linking happen when
you compile your code?

00:32:56.980 --> 00:32:57.480
Yes.

00:32:57.480 --> 00:33:02.320
When you run make, as we have
been doing the past week now,

00:33:02.320 --> 00:33:04.570
all four of these steps are happening.

00:33:04.570 --> 00:33:07.780
Preprocessing converts the hash
include lines to something else.

00:33:07.780 --> 00:33:10.600
Compiling technically
converts it to assembly

00:33:10.600 --> 00:33:14.290
code, which the Mac, the PC, the
server more closely understands.

00:33:14.290 --> 00:33:18.850
Assembling converts that language to
binary machine code that this computer

00:33:18.850 --> 00:33:20.080
actually understands.

00:33:20.080 --> 00:33:22.540
And then linking combines
everything together.

00:33:22.540 --> 00:33:27.550
And in fact, if you think back a few
minutes ago to when I did this -lcs50,

00:33:27.550 --> 00:33:30.070
the reason I had to add
that, and the reason

00:33:30.070 --> 00:33:32.860
my code did not compile
at first, was because I

00:33:32.860 --> 00:33:38.650
forgot to tell clang to link in CS50's
zeros and ones per that last step.

00:33:38.650 --> 00:33:42.147
I don't need to do -lstdio
because it comes with C,

00:33:42.147 --> 00:33:44.480
so that would just be tedious
for everyone in the world.

00:33:44.480 --> 00:33:47.140
But CS50 does not come
with C, so we link that in.

00:33:47.140 --> 00:33:49.780
And to be clear, too, we won't
always use CS50's library.

00:33:49.780 --> 00:33:53.072
That'll be yet another pair of training
wheels we take off in the coming weeks.

00:33:53.072 --> 00:33:55.000
But for now, it makes
a few things simpler.

00:33:55.000 --> 00:33:57.284
Yeah?

00:33:57.284 --> 00:33:59.750
AUDIENCE: What is the [INAUDIBLE]?

00:34:08.878 --> 00:34:10.170
DAVID MALAN: Short answer, yes.

00:34:10.170 --> 00:34:12.870
So what do the zeros and ones,
the machine code, translate to?

00:34:12.870 --> 00:34:15.690
Yes, there is a one-to-one
relationship between the machine

00:34:15.690 --> 00:34:17.340
code and the assembly code.

00:34:17.340 --> 00:34:21.510
Assembly code, it's not really English,
but at least it's symbols I recognize.

00:34:21.510 --> 00:34:22.800
It's not zeros and ones.

00:34:22.800 --> 00:34:24.810
Machine code, of course,
is just zeros and ones.

00:34:24.810 --> 00:34:27.960
So back in the day,
before C existed, people

00:34:27.960 --> 00:34:30.630
were programming only in assembly code.

00:34:30.630 --> 00:34:34.469
Before assembly code existed, people
were coding in zeros and ones.

00:34:34.469 --> 00:34:36.719
And you can imagine just
how painful that was,

00:34:36.719 --> 00:34:39.027
and so each of these
languages makes life, for us,

00:34:39.027 --> 00:34:40.110
sort of easier and easier.

00:34:40.110 --> 00:34:42.330
In a few weeks, we'll
transition to Python, which

00:34:42.330 --> 00:34:45.300
will, in turn, make C even simpler--

00:34:45.300 --> 00:34:48.090
or coding, in general,
simpler to do too.

00:34:48.090 --> 00:34:53.346
All right, so with that
said, what now can we--

00:34:53.346 --> 00:34:55.060
what could go wrong with this?

00:34:55.060 --> 00:34:58.140
Well, it turns out that besides
compiling, technically speaking,

00:34:58.140 --> 00:34:59.233
there's decompiling.

00:34:59.233 --> 00:35:01.150
And we've not done this,
and we won't do this.

00:35:01.150 --> 00:35:04.080
But it's worth considering
for just a moment.

00:35:04.080 --> 00:35:07.560
If you were to not compile
your code, but decompile it--

00:35:07.560 --> 00:35:11.340
as the word suggests, this just means
reversing the process, converting it,

00:35:11.340 --> 00:35:14.580
ideally, from machine
code-- zeros and ones--

00:35:14.580 --> 00:35:19.870
maybe back to C. Now this would be cool,
perhaps, if all you have is a program,

00:35:19.870 --> 00:35:22.080
you can convert it and see
the actual source code.

00:35:22.080 --> 00:35:25.320
What might a downside be,
if anyone on the internet

00:35:25.320 --> 00:35:28.650
is able to decompile
code on their machine?

00:35:28.650 --> 00:35:29.160
Yeah?

00:35:29.160 --> 00:35:30.270
AUDIENCE: [INAUDIBLE]

00:35:30.270 --> 00:35:34.130
DAVID MALAN: OK, so it's easier
to find bugs in the code that--

00:35:34.130 --> 00:35:35.430
oh, to exploit.

00:35:35.430 --> 00:35:38.417
So it might be easier to
hack into the software

00:35:38.417 --> 00:35:41.000
by finding mistakes you and I
made because, literally, they're

00:35:41.000 --> 00:35:43.370
staring at you in code,
whereas the zeros and ones make

00:35:43.370 --> 00:35:45.080
it way less obvious.

00:35:45.080 --> 00:35:48.140
Other downsides of what
I called decompiling?

00:35:48.140 --> 00:35:49.970
Yeah?

00:35:49.970 --> 00:35:53.690
AUDIENCE: If stuff is copyrighted or
you don't even know how to get it--

00:35:53.690 --> 00:35:54.440
DAVID MALAN: Yeah.

00:35:54.440 --> 00:35:55.948
AUDIENCE: [INAUDIBLE]

00:35:55.948 --> 00:35:57.740
DAVID MALAN: Yeah, if
your code, your work,

00:35:57.740 --> 00:36:00.950
is your intellectual property,
copyrighted or otherwise, that's

00:36:00.950 --> 00:36:03.660
kind of obnoxious that someone
can just run a command, and boom,

00:36:03.660 --> 00:36:05.577
they can see the original
code that you wrote.

00:36:05.577 --> 00:36:08.490
Now, it turns out it's not
quite as simple as that.

00:36:08.490 --> 00:36:11.720
And so even though, yes, you
could take a program like Hello,

00:36:11.720 --> 00:36:15.080
or even Microsoft Word, and
convert it from zeros and ones

00:36:15.080 --> 00:36:19.400
back to some form of source
code-- be it in C or Java

00:36:19.400 --> 00:36:22.820
or Python or something else, whatever
it was originally written in-- odds

00:36:22.820 --> 00:36:25.800
are it's going to be an
utter mess to look at.

00:36:25.800 --> 00:36:26.300
Why?

00:36:26.300 --> 00:36:30.390
Because things like variable names are
not retained in the zeros and ones,

00:36:30.390 --> 00:36:30.890
typically.

00:36:30.890 --> 00:36:33.980
Function names might not be
retained in the zeros and ones.

00:36:33.980 --> 00:36:36.350
The code is, the logic
is, but the computer

00:36:36.350 --> 00:36:38.510
doesn't care what pretty
variables you chose

00:36:38.510 --> 00:36:41.060
and how nicely named your
functions were, it just

00:36:41.060 --> 00:36:42.890
needs to know them as zeros and ones.

00:36:42.890 --> 00:36:46.370
Moreover, if you think about last week,
we introduced things like loops in C.

00:36:46.370 --> 00:36:49.745
And besides for loops, there's what
other kind of loop, for instance?

00:36:49.745 --> 00:36:50.620
AUDIENCE: [INAUDIBLE]

00:36:50.620 --> 00:36:53.412
DAVID MALAN: So, a while loop--
and even though they look different

00:36:53.412 --> 00:36:55.920
and you have to write different
code, they achieve exactly

00:36:55.920 --> 00:36:59.910
the same functionality, which is
to say, when you compile a for loop

00:36:59.910 --> 00:37:04.140
or you compile a while loop, if
they logically do the same thing,

00:37:04.140 --> 00:37:07.420
they might end up looking
identical as zeros and ones.
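
NOTE
A minimal sketch of two loops that do the same work, one written with for and one with while. Whether a compiler emits identical machine code for them depends on the compiler and its settings. The function names are hypothetical.
```c
// Sums the integers 1 through n with a for loop.
int sum_for(int n)
{
    int total = 0;
    for (int i = 1; i <= n; i++)
    {
        total += i;
    }
    return total;
}
// Sums the same integers with a while loop.
int sum_while(int n)
{
    int total = 0;
    int i = 1;
    while (i <= n)
    {
        total += i;
        i++;
    }
    return total;
}
```
Both return the same result for any n, so a decompiler looking only at the zeros and ones may have no way to tell which form the author wrote.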

00:37:07.420 --> 00:37:09.780
And so, therefore, it's
not necessarily predictable

00:37:09.780 --> 00:37:11.820
that you'll get back
the original code, why?

00:37:11.820 --> 00:37:15.110
Because the zeros and ones
might not know, so to speak,

00:37:15.110 --> 00:37:16.860
whether it was a for
loop or a while loop,

00:37:16.860 --> 00:37:19.350
so maybe decompiling will
show you one or the other.

00:37:19.350 --> 00:37:21.870
And honestly, decompiling,
while possible-- and it's

00:37:21.870 --> 00:37:24.570
one way of reverse
engineering someone's product.

00:37:24.570 --> 00:37:28.662
Odds are, if you're good enough to start
reading code that's been decompiled

00:37:28.662 --> 00:37:30.870
and reading through the
messiness of it, odds are you

00:37:30.870 --> 00:37:34.020
have the talent probably to just
write that same program from scratch

00:37:34.020 --> 00:37:34.650
yourself.

00:37:34.650 --> 00:37:36.870
Now, that's an overstatement,
perhaps, but it's not

00:37:36.870 --> 00:37:40.410
quite as easy or threatening
as you might first think.

00:37:40.410 --> 00:37:43.290
So in general, once
code is compiled, it's

00:37:43.290 --> 00:37:48.290
pretty challenging, time consuming,
costly to reverse engineer it, much

00:37:48.290 --> 00:37:50.040
like it would be in
the real world, right?

00:37:50.040 --> 00:37:52.860
Like all of us have some kind of phone,
probably, nowadays in our pocket.

00:37:52.860 --> 00:37:55.193
There's nothing stopping you
from opening it up somehow,

00:37:55.193 --> 00:37:57.060
poking around, recreating what's there.

00:37:57.060 --> 00:37:59.130
That's a huge amount
of effort, most likely.

00:37:59.130 --> 00:38:01.880
And at that point, maybe you should
just invent the phone, instead

00:38:01.880 --> 00:38:03.310
of trying to reverse engineer it.

00:38:03.310 --> 00:38:06.330
So same kind of idea
in the physical world.

00:38:06.330 --> 00:38:13.050
Any questions, then, on compiling,
or even decompiling in these forms?

00:38:13.050 --> 00:38:17.160
All right, so odds are, at this point,
not only I, but you have made mistakes.

00:38:17.160 --> 00:38:19.050
And you've written buggy code--

00:38:19.050 --> 00:38:22.350
a bug in code is just a
mistake, a logical error

00:38:22.350 --> 00:38:26.490
or otherwise, where the code just does
not behave correctly as you intend.

00:38:26.490 --> 00:38:29.880
And up until now, odds are,
your debugging techniques

00:38:29.880 --> 00:38:32.910
have been to maybe look back
at what I did in class, maybe

00:38:32.910 --> 00:38:35.320
ask a question online or in-person.

00:38:35.320 --> 00:38:38.190
But ultimately, it'd be nice if
you had some tools of your own

00:38:38.190 --> 00:38:39.570
with which to debug code.

00:38:39.570 --> 00:38:41.587
And this, honestly, is a lifelong skill.

00:38:41.587 --> 00:38:43.170
You're not going to emerge from CS50--

00:38:43.170 --> 00:38:44.490
and even 20 years from
now, you're not going

00:38:44.490 --> 00:38:47.910
to be writing-- if you're writing code
at all-- correct code all of the time.

00:38:47.910 --> 00:38:50.820
Like, all of us on the staff
continue to write bugs.

00:38:50.820 --> 00:38:54.120
Hopefully, they get a little more
sophisticated, and not sort of like,

00:38:54.120 --> 00:38:55.540
oops, I missed a semicolon.

00:38:55.540 --> 00:38:57.660
But even those kinds of
mistakes, we make too.

00:38:57.660 --> 00:39:00.150
But there's tools out
there and techniques

00:39:00.150 --> 00:39:03.550
that can make your life easier when
it comes to solving those problems.

00:39:03.550 --> 00:39:06.360
Now, the term bug has actually
been around for decades.

00:39:06.360 --> 00:39:11.790
But a fun story to tell is that
the first documented actual bug was

00:39:11.790 --> 00:39:13.650
actually somehow connected to Harvard.

00:39:13.650 --> 00:39:18.870
In fact, this is the logbook relating
to the Harvard Mark II computer

00:39:18.870 --> 00:39:22.890
from 1947, whereby if you read the
notes here-- and I'll zoom in-- this

00:39:22.890 --> 00:39:27.630
was an actual moth discovered inside
of this big mainframe computer that

00:39:27.630 --> 00:39:29.160
was causing some kind of problems.

00:39:29.160 --> 00:39:30.450
And the engineers there
at the time actually

00:39:30.450 --> 00:39:33.610
thought it was funny that, wow, physical
bug actually explains the issue.

00:39:33.610 --> 00:39:36.450
And it's been forever taped to this
sheet of paper, which I believe

00:39:36.450 --> 00:39:39.090
now is on display in the Smithsonian.

00:39:39.090 --> 00:39:43.260
With that said, this is just
representative, too, of a logical bug.

00:39:43.260 --> 00:39:45.390
And that story is actually--

00:39:45.390 --> 00:39:49.170
that story was often retold by a famous
mathematician, then computer scientist

00:39:49.170 --> 00:39:53.640
really, Dr. Grace Hopper, who actually
worked not only on the Harvard Mark II

00:39:53.640 --> 00:39:57.210
computer, but its predecessor,
the Harvard Mark I.

00:39:57.210 --> 00:40:01.020
And if you ever spent time, yet, in the
engineering building across the river

00:40:01.020 --> 00:40:04.103
here, you can actually see
much of this computer, which

00:40:04.103 --> 00:40:07.020
is along the wall when you first
walk into the Science and Engineering

00:40:07.020 --> 00:40:07.530
Complex.

00:40:07.530 --> 00:40:09.530
And indeed, as you've
probably heard growing up,

00:40:09.530 --> 00:40:11.070
this is a mainframe computer.

00:40:11.070 --> 00:40:15.210
This is what Macs and PCs, so to
speak, looked like back in the day,

00:40:15.210 --> 00:40:18.240
with very physical things that
essentially implemented the zeros

00:40:18.240 --> 00:40:21.900
and ones that you and I take for granted
now being miniaturized in our laptops

00:40:21.900 --> 00:40:22.410
and phones.

00:40:22.410 --> 00:40:23.910
So there's a piece of history there.

00:40:23.910 --> 00:40:27.390
If you visit that side of
campus sometime, do take a look.

00:40:27.390 --> 00:40:30.480
But let's consider, then, how we
solve not, of course, physical bugs,

00:40:30.480 --> 00:40:31.350
but logical bugs.

00:40:31.350 --> 00:40:33.600
And let's consider something
like this from last week,

00:40:33.600 --> 00:40:38.820
whereby, we were trying very simply to
print like this column of three bricks

00:40:38.820 --> 00:40:40.320
using hashtags of sorts.

00:40:40.320 --> 00:40:44.400
So let me go over here in
just a moment to VS Code.

00:40:44.400 --> 00:40:47.080
And I'm going to go ahead and
open a program I wrote in advance.

00:40:47.080 --> 00:40:49.455
And I'm bringing it to class
because there's a bug in it,

00:40:49.455 --> 00:40:51.510
and I'd like to figure
out how to solve this bug.

00:40:51.510 --> 00:40:56.160
So let me open up a buggy0.c,
which is version 0 of my code.

00:40:56.160 --> 00:40:58.200
And let's just take a
quick peek at what's here.

00:40:58.200 --> 00:40:58.950
It's pretty short.

00:40:58.950 --> 00:41:03.750
It includes only stdio.h, it
uses printf, it uses a for loop,

00:41:03.750 --> 00:41:07.797
and the goal, quite simply, is to
print out that column of three bricks.

00:41:07.797 --> 00:41:11.130
Now, it's short enough that some of you,
if you're getting comfy already with C,

00:41:11.130 --> 00:41:13.360
you might already see the logical bug.

00:41:13.360 --> 00:41:16.200
It's not a syntax error,
like it will compile and run.

00:41:16.200 --> 00:41:17.280
But there's a bug there.

00:41:17.280 --> 00:41:22.320
And suppose that I'm very new to C, I'm
very uncomfortable with C, it's 2:00 AM

00:41:22.320 --> 00:41:26.130
and I just can't see the bug, what
are my recourses here for actually

00:41:26.130 --> 00:41:27.745
finding a mistake like this?

00:41:27.745 --> 00:41:29.370
Well, first, let's look at the symptom.

00:41:29.370 --> 00:41:31.740
Let me go down to my terminal window.

00:41:31.740 --> 00:41:36.120
I'm going to use make buggy0 because,
again, the file is called buggy0.c.

00:41:36.120 --> 00:41:37.260
I'm not going to use clang.

00:41:37.260 --> 00:41:39.880
In fact, I'm never really going
to use clang manually here on out.

00:41:39.880 --> 00:41:42.430
I'm just going to use make
because it makes our lives easier.

00:41:42.430 --> 00:41:43.560
It does compile.

00:41:43.560 --> 00:41:45.390
No errors, so it's not syntax.

00:41:45.390 --> 00:41:47.670
It's not something silly
like a missing semicolon.

00:41:47.670 --> 00:41:53.190
But when I run ./buggy0, I, of
course, see one, two, three, four--

00:41:53.190 --> 00:41:57.990
and this, of course, does not match the
one, two, three bricks that I actually

00:41:57.990 --> 00:41:59.610
intended for that column.

00:41:59.610 --> 00:42:02.970
And yet, I'm starting counting
at 0, as I usually do.

00:42:02.970 --> 00:42:03.930
I've got three.

00:42:03.930 --> 00:42:05.280
I'm going up to three.

00:42:05.280 --> 00:42:06.780
So where is my logical error?

00:42:06.780 --> 00:42:10.150
If it hasn't obviously jumped out at
you already, well, how can I solve this?

00:42:10.150 --> 00:42:13.080
Well, first and foremost,
perhaps the best technique

00:42:13.080 --> 00:42:16.080
for solving bugs, at least
early on, is just use printf.

00:42:16.080 --> 00:42:20.020
Like thus far, we've used printf to say,
Hello, and other things on the screen.

00:42:20.020 --> 00:42:22.530
But printf is just a function
for printing anything.

00:42:22.530 --> 00:42:24.570
And there's no reason
you can't temporarily

00:42:24.570 --> 00:42:27.900
use printf to print out
the contents of variables,

00:42:27.900 --> 00:42:29.850
what's going on inside
of your program, just

00:42:29.850 --> 00:42:31.350
to figure out where your mistake is.

00:42:31.350 --> 00:42:32.940
And then you can delete
that line of code later.

00:42:32.940 --> 00:42:34.600
It doesn't have to stay there forever.

00:42:34.600 --> 00:42:35.740
So let me do this.

00:42:35.740 --> 00:42:39.450
Instead of just printing out
in VS Code the hash symbol,

00:42:39.450 --> 00:42:45.690
let me do a little safety check
here and print out the value of i.

00:42:45.690 --> 00:42:49.170
So let me go ahead and
say something like, i is--

00:42:49.170 --> 00:42:51.610
now I want to say i is this.

00:42:51.610 --> 00:42:54.540
But, of course, this is not
how I print out the value of i.

00:42:54.540 --> 00:42:58.930
If I want to print out the value
of i, what should I put here?

00:42:58.930 --> 00:43:02.160
So %i for integer,
instead of %s for string.

00:43:02.160 --> 00:43:03.410
So they're still placeholders.

00:43:03.410 --> 00:43:04.930
But we use %i for integers.

00:43:04.930 --> 00:43:08.450
And now if I want to print out i, I just
need the comma as the second argument,

00:43:08.450 --> 00:43:09.250
and then i.

00:43:09.250 --> 00:43:13.000
All right, let me go ahead and
go back to my terminal window.

00:43:13.000 --> 00:43:15.760
Let me recompile the program
because I've changed it.

00:43:15.760 --> 00:43:18.880
That still works fine, ./buggy0.

00:43:18.880 --> 00:43:22.540
And now, let me increase the
size of my terminal window here.

00:43:22.540 --> 00:43:25.510
You just see some diagnostic
information, if you will.

00:43:25.510 --> 00:43:26.560
This is not the goal.

00:43:26.560 --> 00:43:29.393
This is not what you should be
submitting for this homework problem,

00:43:29.393 --> 00:43:30.070
were it one.

00:43:30.070 --> 00:43:33.730
But it is helping us diagnostically
know that, OK, when i is zero,

00:43:33.730 --> 00:43:34.450
here's a hash.

00:43:34.450 --> 00:43:36.182
When i is 1, here's a hash.

00:43:36.182 --> 00:43:37.390
When i is two, here's a hash.

00:43:37.390 --> 00:43:39.017
When i is 3, here's a hash.

00:43:39.017 --> 00:43:39.850
Well, wait a minute.

00:43:39.850 --> 00:43:41.530
That's one, two, three, four.

00:43:41.530 --> 00:43:44.360
So clearly, I'm printing
it one too many times.

00:43:44.360 --> 00:43:48.130
So let me look back at the code here
by shrinking my terminal window.

00:43:48.130 --> 00:43:53.080
And let me just ask the group,
where is, in fact, the mistake?

00:43:53.080 --> 00:43:56.080
Or what, equivalently,
would be the solution?

00:43:56.080 --> 00:43:57.561
Yeah, in the middle.

00:43:57.561 --> 00:44:00.020
AUDIENCE: [INAUDIBLE]

00:44:00.020 --> 00:44:03.550
DAVID MALAN: Yeah, instead of less
than or equal to, use just less than.

00:44:03.550 --> 00:44:05.300
So you've got to kind
of pick a lane here.

00:44:05.300 --> 00:44:08.630
If you're going to start counting
from 0, you generally use less than,

00:44:08.630 --> 00:44:10.880
and go up to, but not through the value.

00:44:10.880 --> 00:44:13.970
Or if you prefer, like in the
human world, counting from 1 on up,

00:44:13.970 --> 00:44:17.300
you can use less than or equal
to, but you have to be consistent.

00:44:17.300 --> 00:44:19.790
And in general, as a
programmer, just always start

00:44:19.790 --> 00:44:22.610
counting from 0 if you're doing
something canonical like this.

00:44:22.610 --> 00:44:25.160
But the solution is,
indeed, just to change this

00:44:25.160 --> 00:44:27.860
by changing the less than or equal to
to just less than.

00:44:27.860 --> 00:44:34.340
If I recompile this program with make
buggy0, and then do ./buggy0 again--

00:44:34.340 --> 00:44:36.500
and let me increase the
size of my terminal window.

00:44:36.500 --> 00:44:39.050
Now, you see, OK,
almost the same output.

00:44:39.050 --> 00:44:44.330
But indeed, i starts at 0 and goes
up to, but not through, three.

00:44:44.330 --> 00:44:48.920
All right, so printf, in short,
can be your first diagnostic tool.

00:44:48.920 --> 00:44:51.500
Instead of just staring at the
screen or raising your hand--

00:44:51.500 --> 00:44:55.490
I mean, use printf to see, literally,
what's going on inside of your program

00:44:55.490 --> 00:44:57.287
by just printing out things of interest.

00:44:57.287 --> 00:44:59.120
And then once you've
solved the problem, you

00:44:59.120 --> 00:45:02.840
can go back into your code, as I'll do
here, by shrinking my terminal window.

00:45:02.840 --> 00:45:04.610
I'll delete the printf line.

00:45:04.610 --> 00:45:07.100
And now I'm ready to share
this program with the world

00:45:07.100 --> 00:45:08.870
or submit it as homework or the like.

00:45:08.870 --> 00:45:11.390
It's just meant there to be temporary.

00:45:11.390 --> 00:45:15.440
Any questions on printf
as a debugging tool?

00:45:18.010 --> 00:45:18.510
No?

00:45:18.510 --> 00:45:20.970
All right, well, that
only gets us so far.

00:45:20.970 --> 00:45:23.430
And honestly, as your programs
grow and grow and grow,

00:45:23.430 --> 00:45:25.180
it's going to actually
get really annoying

00:45:25.180 --> 00:45:28.860
to start going in and adding printf's,
then removing them, and figuring out,

00:45:28.860 --> 00:45:31.860
if you've got multiple printf's,
well, which one printed what?

00:45:31.860 --> 00:45:34.560
It just gets messy, eventually,
to rely on printf alone.

00:45:34.560 --> 00:45:37.740
So being a computer
scientist, computer scientists

00:45:37.740 --> 00:45:41.040
have written software to
make it easier to debug code.

00:45:41.040 --> 00:45:44.040
That software is what we would
generally call a debugger, which

00:45:44.040 --> 00:45:47.040
would be the second tool of the trade
that you can use to actually solve

00:45:47.040 --> 00:45:48.610
problems in your code.

00:45:48.610 --> 00:45:52.690
Now, in the world of VS Code,
there's actually a debugger built in.

00:45:52.690 --> 00:45:54.840
So the graphical user
interface you're about to see

00:45:54.840 --> 00:45:58.260
in VS Code isn't specific to CS50,
it actually comes with VS Code.

00:45:58.260 --> 00:46:01.230
And it supports C, and
C++, and Java, and Python,

00:46:01.230 --> 00:46:03.030
and lots of other languages too.

00:46:03.030 --> 00:46:05.640
But it's, admittedly,
a little complicated

00:46:05.640 --> 00:46:07.650
to just start using the debugger.

00:46:07.650 --> 00:46:10.200
You have to create a
configuration file and do

00:46:10.200 --> 00:46:13.480
some annoying steps that just get
in the way of solving real problems.

00:46:13.480 --> 00:46:17.070
So we have automated the process for
you of just starting the debugger.

00:46:17.070 --> 00:46:19.680
And thereafter, it's sort of
industry standard how you use it.

00:46:19.680 --> 00:46:23.380
But we save you the headache of having
to create those configuration files.

00:46:23.380 --> 00:46:25.330
So, suppose I want to do this.

00:46:25.330 --> 00:46:27.600
Suppose I want to try
to debug this program

00:46:27.600 --> 00:46:30.330
step by step using special software.

00:46:30.330 --> 00:46:31.810
Well, how can I do that?

00:46:31.810 --> 00:46:36.240
Well, let me propose that if I revert
this back to the original version

00:46:36.240 --> 00:46:40.530
where i was less than or equal
to 3, I'm pretty sure that I

00:46:40.530 --> 00:46:41.790
was printing too many hashes.

00:46:41.790 --> 00:46:43.350
So I'm going to do this--
and you might have done this

00:46:43.350 --> 00:46:45.160
accidentally or never at all.

00:46:45.160 --> 00:46:49.500
But notice if you hover over the gutter,
so to speak, in VS Code, the part of it

00:46:49.500 --> 00:46:52.590
all the way to the left of the
editor, you see this sort of grayed

00:46:52.590 --> 00:46:54.390
out red dot.

00:46:54.390 --> 00:46:57.240
If you click there, it
becomes a brighter red dot.

00:46:57.240 --> 00:46:59.670
And this represents what we're
going to call a breakpoint.

00:46:59.670 --> 00:47:03.090
And this is just a visual indicator that
you've put like a stop sign equivalent

00:47:03.090 --> 00:47:06.270
there, and you're telling the
debugger in a moment, stop

00:47:06.270 --> 00:47:07.350
running my code there.

00:47:07.350 --> 00:47:07.920
Why?

00:47:07.920 --> 00:47:11.610
Because I prefer to step through
my code at sort of a human speed,

00:47:11.610 --> 00:47:14.380
and not at computer speed
where it runs all at once.

00:47:14.380 --> 00:47:16.750
So I've set my breakpoint,
which is step one.

00:47:16.750 --> 00:47:18.580
And then step two is quite simply this.

00:47:18.580 --> 00:47:23.190
Instead of running the program itself,
run the command called debug50,

00:47:23.190 --> 00:47:26.010
and then ./buggy0.

00:47:26.010 --> 00:47:29.220
And now this will start
your program, but inside

00:47:29.220 --> 00:47:31.200
of the debugger, which
is a special program

00:47:31.200 --> 00:47:33.060
that smart people
wrote that will empower

00:47:33.060 --> 00:47:38.190
you to now step through your code line
by line, and again, at your own comfortable

00:47:38.190 --> 00:47:38.970
pace.

00:47:38.970 --> 00:47:43.080
I'm going to hit Enter, some stuff's
going to happen on the screen-- whoops.

00:47:43.080 --> 00:47:45.767
Notice, this is a common mistake
that I made accidentally here.

00:47:45.767 --> 00:47:47.100
Looks like I've changed my code.

00:47:47.100 --> 00:47:49.892
I did because I went in and changed
the less than or equal to sign.

00:47:49.892 --> 00:47:52.860
So let me go ahead and
rerun make buggy0--

00:47:52.860 --> 00:47:53.520
Enter.

00:47:53.520 --> 00:47:55.590
Good, now let me rerun debug50--

00:47:55.590 --> 00:47:57.810
Enter.

00:47:57.810 --> 00:47:59.760
And now some stuff just
happened on the screen

00:47:59.760 --> 00:48:03.270
and it takes a moment to get
started. But once it's started, you'll

00:48:03.270 --> 00:48:06.010
see this-- you'll still see your code.

00:48:06.010 --> 00:48:09.410
But you'll see this yellow highlight,
which you've probably not seen before.

00:48:09.410 --> 00:48:11.910
And notice that it's specifically
highlighting the same line

00:48:11.910 --> 00:48:13.440
that I set a breakpoint on.

00:48:13.440 --> 00:48:13.950
Why?

00:48:13.950 --> 00:48:18.870
That just means the debugger
has executed all of these lines,

00:48:18.870 --> 00:48:20.670
except for line 7.

00:48:20.670 --> 00:48:23.340
It has broken at-- not in a bad way.

00:48:23.340 --> 00:48:27.580
But it has paused execution on line 7,
so it hasn't yet printed any hashes.

00:48:27.580 --> 00:48:30.450
And you can see that-- no hashes
in the terminal window yet.

00:48:30.450 --> 00:48:31.980
It's paused execution.

00:48:31.980 --> 00:48:35.190
But what's interesting with
the debugger is the stuff

00:48:35.190 --> 00:48:37.410
over here on the left-hand side.

00:48:37.410 --> 00:48:39.960
In the debugger here,
you'll see, under variables,

00:48:39.960 --> 00:48:41.910
all of your so-called local variables.

00:48:41.910 --> 00:48:44.160
And we haven't really made
a distinction between local

00:48:44.160 --> 00:48:45.327
and something called global.

00:48:45.327 --> 00:48:48.000
But for now, local variables
just means all of the variables

00:48:48.000 --> 00:48:49.390
that exist in your function.

00:48:49.390 --> 00:48:52.110
So i currently has a value of 0.

00:48:52.110 --> 00:48:53.410
OK, and that makes sense.

00:48:53.410 --> 00:48:57.360
So now, how do I step through
my code and see what it's doing?

00:48:57.360 --> 00:48:59.610
Well, at the top of
the screen here, you'll

00:48:59.610 --> 00:49:02.250
see some playback icons,
kind of like a video player,

00:49:02.250 --> 00:49:03.630
but they have special meaning.

00:49:03.630 --> 00:49:07.892
This first one will just play the rest
of your program all the way to the end.

00:49:07.892 --> 00:49:10.350
So you only click that if you've
sort of solved the problem

00:49:10.350 --> 00:49:13.110
and you just want to run it
to completion like before.

00:49:13.110 --> 00:49:14.370
But the next three--

00:49:14.370 --> 00:49:16.920
or next two, really,
are really the juiciest.

00:49:16.920 --> 00:49:19.710
The second one here, if you
hover over it, eventually,

00:49:19.710 --> 00:49:21.930
you'll see that it's called Step Over.

00:49:21.930 --> 00:49:25.170
Step Over means that
the debugger will run

00:49:25.170 --> 00:49:28.630
this currently highlighted line of code,
but it's not going to dive into it.

00:49:28.630 --> 00:49:30.660
So if it's a function
like printf, it's not

00:49:30.660 --> 00:49:32.827
going to start stepping
through printf line by line.

00:49:32.827 --> 00:49:33.327
Why?

00:49:33.327 --> 00:49:36.420
Because I can pretty much assume
printf, written decades ago, is correct.

00:49:36.420 --> 00:49:38.050
Problem's probably with me.

00:49:38.050 --> 00:49:42.690
But this next line, if I did really
want to step into the printf code

00:49:42.690 --> 00:49:46.110
to figure out how it works or find some
problem in it all these years later,

00:49:46.110 --> 00:49:48.810
you can step into printf, and
then the screen would change,

00:49:48.810 --> 00:49:50.910
and you'd see each of
the lines for printf,

00:49:50.910 --> 00:49:54.250
line by line-- at least if you have
the source code for printf installed.

00:49:54.250 --> 00:49:56.490
All right, I'm going to use
the first one, Step Over.

00:49:56.490 --> 00:49:59.130
And watch as the yellow highlight moves.

00:49:59.130 --> 00:50:03.060
And watch as, in the terminal
window, there's a hash symbol.

00:50:03.060 --> 00:50:03.780
Here we go.

00:50:03.780 --> 00:50:05.130
There's one hash.

00:50:05.130 --> 00:50:07.230
Now, notice line 5 is highlighted.

00:50:07.230 --> 00:50:09.480
That means it has paused on line 5.

00:50:09.480 --> 00:50:11.350
Line 5 has not yet been executed.

00:50:11.350 --> 00:50:12.600
So what does that mean?

00:50:12.600 --> 00:50:16.320
The value of i, per the top
left-hand corner, is still 0.

00:50:16.320 --> 00:50:18.920
But as soon as I click
Step Over again, watch

00:50:18.920 --> 00:50:24.470
what happens at the top left, where
i is a variable on the screen.

00:50:24.470 --> 00:50:26.420
Now i-- and it flashed briefly--

00:50:26.420 --> 00:50:27.920
has a value of 1.

00:50:27.920 --> 00:50:30.650
And now if I step over again,
watch the terminal window.

00:50:30.650 --> 00:50:32.120
There's my second hash.

00:50:32.120 --> 00:50:36.380
Now, let me click Step Over on the for
loop, watch the variable at top left.

00:50:36.380 --> 00:50:38.567
Now 1 goes to 2.

00:50:38.567 --> 00:50:39.650
Now let me click it again.

00:50:39.650 --> 00:50:43.220
Third hash-- and here's where the
logical error is perhaps revealed.

00:50:43.220 --> 00:50:45.210
Let me go ahead and step over the loop.

00:50:45.210 --> 00:50:46.520
Now i is 3.

00:50:46.520 --> 00:50:49.280
Wait a minute, I'm still
going to print out a hash.

00:50:49.280 --> 00:50:49.810
There it is.

00:50:49.810 --> 00:50:50.810
There's the fourth hash.

00:50:50.810 --> 00:50:53.852
And at this point, hopefully, the
light bulb, proverbially, has gone off.

00:50:53.852 --> 00:50:55.020
I realize, oh, I screwed up.

00:50:55.020 --> 00:50:58.580
I can either stop the program
altogether with the red square,

00:50:58.580 --> 00:51:01.100
or I can just let it run all
the way to the end, which

00:51:01.100 --> 00:51:02.493
just terminates everything.

00:51:02.493 --> 00:51:05.660
At this point, I just want to get back
into my code and start fixing things.

00:51:05.660 --> 00:51:07.700
And you can close, for
instance, as I will here,

00:51:07.700 --> 00:51:10.670
the File Explorer, just to
hide the panel that opened.

00:51:10.670 --> 00:51:12.320
So that's debug50.

00:51:12.320 --> 00:51:15.920
But it's not a CS50 thing-- that just
starts the debugger for you, which

00:51:15.920 --> 00:51:19.520
is something you'd find in most any
programming environment nowadays.

00:51:19.520 --> 00:51:23.670
Questions on debugging?

00:51:23.670 --> 00:51:24.170
Questions?

00:51:24.170 --> 00:51:24.670
Yeah?

00:51:24.670 --> 00:51:27.295
AUDIENCE: Where does it tell
you where it went wrong?

00:51:27.295 --> 00:51:28.420
DAVID MALAN: Good question.

00:51:28.420 --> 00:51:30.310
Where does it tell you
where it went wrong?

00:51:30.310 --> 00:51:33.190
So, sadly, it does not
tell you any of that.

00:51:33.190 --> 00:51:37.570
The onus is still on you, the human,
to use this tool productively to walk

00:51:37.570 --> 00:51:39.580
through your code at a saner pace.

00:51:39.580 --> 00:51:42.070
But your brain is the one
that still needs to solve it.

00:51:42.070 --> 00:51:45.190
And I don't doubt, down the line,
with artificial intelligence and more,

00:51:45.190 --> 00:51:47.350
programs like this will
get all the more helpful,

00:51:47.350 --> 00:51:49.160
and start answering
questions like that for us.

00:51:49.160 --> 00:51:51.340
And there are other tools we'll
introduce you to this semester

00:51:51.340 --> 00:51:52.990
that are even more powerful than this.

00:51:52.990 --> 00:51:56.770
But for now, it's just a tool,
really, to slow things down and not

00:51:56.770 --> 00:51:57.820
have to change your code.

00:51:57.820 --> 00:52:01.420
The fact that I had that panel on the
left that just showed me i's changing

00:52:01.420 --> 00:52:04.150
value is just an alternative
to printf, and I can

00:52:04.150 --> 00:52:06.820
step through it a little more slowly.

00:52:06.820 --> 00:52:10.580
Other questions on debugging?

00:52:10.580 --> 00:52:11.080
No?

00:52:11.080 --> 00:52:14.950
Let me show you one final
example with this debugger here.

00:52:14.950 --> 00:52:16.750
And this one, too, I wrote in advance.

00:52:16.750 --> 00:52:18.730
Let me close buggy0.c.

00:52:18.730 --> 00:52:22.327
And let me open up buggy1.c,
my second version thereof.

00:52:22.327 --> 00:52:24.160
Let me close my terminal
window for a second

00:52:24.160 --> 00:52:26.350
and give you a quick tour
of this program, which

00:52:26.350 --> 00:52:28.030
similarly, has a mistake.

00:52:28.030 --> 00:52:32.830
Now, at the top of this program, some
familiar includes, cs50.h and stdio.h.

00:52:32.830 --> 00:52:34.730
This is not something we've seen before.

00:52:34.730 --> 00:52:36.190
It's specific to this example--

00:52:36.190 --> 00:52:38.830
a function called getNegativeInt.

00:52:38.830 --> 00:52:41.043
Takes no arguments, and
it returns an integer.

00:52:41.043 --> 00:52:41.710
What does it do?

00:52:41.710 --> 00:52:45.040
It literally gets a negative
integer, ideally, from the user.

00:52:45.040 --> 00:52:47.200
Fun fact, though, it doesn't correctly.

00:52:47.200 --> 00:52:50.090
That's the bug. getNegativeInt
is broken at the moment.

00:52:50.090 --> 00:52:51.470
So what does main do?

00:52:51.470 --> 00:52:54.130
Well, main just calls this
function, passing in nothing

00:52:54.130 --> 00:52:55.690
in parentheses, no inputs.

00:52:55.690 --> 00:52:58.240
And it stores the return value in i.

00:52:58.240 --> 00:53:00.260
And then it just prints
out i on the screen.

00:53:00.260 --> 00:53:03.910
So honestly, just by eyeballing
this, I feel comfortable enough

00:53:03.910 --> 00:53:06.365
with programming in C,
I think main is correct.

00:53:06.365 --> 00:53:07.990
Let me just stipulate, main is correct.

00:53:07.990 --> 00:53:09.698
But there is going to
be a bug down here.

00:53:09.698 --> 00:53:11.210
Now, what's the bug down here?

00:53:11.210 --> 00:53:14.830
Well, let me look at
getNegativeInt's implementation.

00:53:14.830 --> 00:53:18.970
Notice, this first line, 12, is
identical to the prototype up here.

00:53:18.970 --> 00:53:22.690
The prototype is sort of
stupidly required up here

00:53:22.690 --> 00:53:25.300
because C reads things top
to bottom, left to right--

00:53:25.300 --> 00:53:26.690
the compiler technically does.

00:53:26.690 --> 00:53:29.680
So if you reference
getNegativeInt here, but you

00:53:29.680 --> 00:53:33.490
don't implement it until down here,
and you haven't told C in advance

00:53:33.490 --> 00:53:36.820
that it will exist, again, you
get the error we saw last week.

00:53:36.820 --> 00:53:39.010
All right, so how does
getNegativeInt work?

00:53:39.010 --> 00:53:40.960
We declare a variable called n.

00:53:40.960 --> 00:53:43.540
We've got a do-while
loop that does what?

00:53:43.540 --> 00:53:47.110
It uses get_int, which comes with
the cs50 library, per last week.

00:53:47.110 --> 00:53:49.480
It prompts the user for
negative integer, quote unquote,

00:53:49.480 --> 00:53:51.670
and stores the value in n.

00:53:51.670 --> 00:53:56.800
I then do all of this while
n is less than 0, right?

00:53:56.800 --> 00:54:00.400
Remember, we used the do-while loop last
week to make sure the human cooperates

00:54:00.400 --> 00:54:03.970
and doesn't give us the wrong type
of value, be it positive or negative

00:54:03.970 --> 00:54:04.970
or something else.

00:54:04.970 --> 00:54:06.400
And then we return n.

00:54:06.400 --> 00:54:07.570
And there's some subtleties.

00:54:07.570 --> 00:54:12.970
Anyone recall-- or have an intuition
for why I've declared n on line 14,

00:54:12.970 --> 00:54:15.790
instead of line 17?

00:54:15.790 --> 00:54:17.620
This is a C specific thing.

00:54:17.620 --> 00:54:23.465
AUDIENCE: [INAUDIBLE]

00:54:23.465 --> 00:54:24.340
DAVID MALAN: Exactly.

00:54:24.340 --> 00:54:27.610
There's this notion of scope in C. And
we'll continue to see this over time,

00:54:27.610 --> 00:54:32.590
whereby, a variable only exists
inside of the most recent curly braces

00:54:32.590 --> 00:54:33.560
that you've opened.

00:54:33.560 --> 00:54:36.910
So if I've declared n here
on line 14, I can use it

00:54:36.910 --> 00:54:40.900
anywhere between lines 13 and 21 because
those are the nearest curly braces.

00:54:40.900 --> 00:54:43.540
If by contrast, as you note,
if I instead said this,

00:54:43.540 --> 00:54:49.180
int n equals get_int and so forth,
and didn't have the current line 14,

00:54:49.180 --> 00:54:53.470
well, n would exist inside of these
curly braces, but not here, which

00:54:53.470 --> 00:54:55.340
is too late, and definitely not here.

00:54:55.340 --> 00:54:59.480
So you just have to declare it first,
and then use and reuse it as such.

00:54:59.480 --> 00:55:01.545
Now, let me just show
you how I can debug this.

00:55:01.545 --> 00:55:03.170
But let me show you the symptoms first.

00:55:03.170 --> 00:55:04.930
Let me open my terminal window.

00:55:04.930 --> 00:55:06.970
Let me run make buggy1.

00:55:06.970 --> 00:55:11.710
Compiles OK, so it's not something
silly like a semicolon. ./buggy1,

00:55:11.710 --> 00:55:13.660
and I'm asked for a negative integer.

00:55:13.660 --> 00:55:15.280
All right, let me give it negative 1--

00:55:15.280 --> 00:55:16.710
Enter.

00:55:16.710 --> 00:55:19.920
Well, the main function is
supposed to print out what I typed,

00:55:19.920 --> 00:55:20.880
but it clearly didn't.

00:55:20.880 --> 00:55:21.880
It's prompting me again.

00:55:21.880 --> 00:55:23.830
All right, so maybe
it'll like negative 2.

00:55:23.830 --> 00:55:24.330
No?

00:55:24.330 --> 00:55:26.380
Maybe negative 3.

00:55:26.380 --> 00:55:27.570
50?

00:55:27.570 --> 00:55:29.160
OK, so it's definitely broken, right?

00:55:29.160 --> 00:55:31.528
It kind of seems logically
to be doing the opposite.

00:55:31.528 --> 00:55:33.820
Now, you can perhaps see why
this is happening already.

00:55:33.820 --> 00:55:37.170
These are deliberately simple
programs for demonstration's sake.

00:55:37.170 --> 00:55:38.470
But let's do this.

00:55:38.470 --> 00:55:41.037
Let me go ahead and set
a breakpoint in main,

00:55:41.037 --> 00:55:42.870
even though I'm pretty
sure main is correct.

00:55:42.870 --> 00:55:45.810
But it just helps me start my
thought process-- start with main,

00:55:45.810 --> 00:55:47.010
and then take it from there.

00:55:47.010 --> 00:55:51.840
Let me run now, debug50 ./buggy1--

00:55:51.840 --> 00:55:52.920
Enter.

00:55:52.920 --> 00:55:53.700
And let's see.

00:55:53.700 --> 00:55:56.880
With that breakpoint now, the GUI
is going to reconfigure itself.

00:55:56.880 --> 00:56:00.360
It's going to pause on line 8 because
that's the first interesting line

00:56:00.360 --> 00:56:01.260
inside of main.

00:56:01.260 --> 00:56:03.780
So I could have just put the
breakpoint on line 8 too.

00:56:03.780 --> 00:56:06.480
It's smart enough to know
that if I set it on 6,

00:56:06.480 --> 00:56:09.570
you really mean line 8 because
that's the first actual line of code.

00:56:09.570 --> 00:56:11.280
And watch, now, what happens.

00:56:11.280 --> 00:56:15.780
If I step over this line, notice
that i, which at the moment

00:56:15.780 --> 00:56:18.090
seems to have a default value of 0--

00:56:18.090 --> 00:56:19.470
more on that another time.

00:56:19.470 --> 00:56:24.750
But if I click Step Over like before,
I'm prompted for a negative integer.

00:56:24.750 --> 00:56:25.750
Let me type negative 1--

00:56:25.750 --> 00:56:27.300
Enter.

00:56:27.300 --> 00:56:32.470
And now, notice, there's no
additional yellow highlight.

00:56:32.470 --> 00:56:32.970
Why?

00:56:32.970 --> 00:56:35.160
Where am I currently stuck, logically?

00:56:35.160 --> 00:56:37.937
AUDIENCE: [INAUDIBLE]

00:56:37.937 --> 00:56:40.770
DAVID MALAN: Yeah, just logically,
I must be in that do while loop.

00:56:40.770 --> 00:56:43.560
And even if you don't understand it,
like that's the only explanation.

00:56:43.560 --> 00:56:46.143
If you keep getting prompted,
surely, there's a loop going on.

00:56:46.143 --> 00:56:49.270
There's only one loop in my code,
so there's probably a problem there.

00:56:49.270 --> 00:56:52.900
So I can't just set a breakpoint in
main, and then wait for this to work.

00:56:52.900 --> 00:56:53.610
So let me just--

00:56:53.610 --> 00:56:56.280
let me stop this with the red square.

00:56:56.280 --> 00:56:58.860
And let me think, all
right, instead of--

00:56:58.860 --> 00:57:02.770
I can still set my breakpoint in main,
but let me rerun the debugger instead.

00:57:02.770 --> 00:57:05.470
And this time, not step
over that line of code,

00:57:05.470 --> 00:57:07.930
let me step into that line of code.

00:57:07.930 --> 00:57:09.270
So watch what happens now.

00:57:09.270 --> 00:57:11.430
Instead of clicking
the second icon here,

00:57:11.430 --> 00:57:14.610
let me click the third, whose
name is, indeed, Step Into.

00:57:14.610 --> 00:57:17.880
And watch as the yellow highlight
does not move to line 9.

00:57:17.880 --> 00:57:21.930
It dives into line 8--
the function on line 8,

00:57:21.930 --> 00:57:25.170
thereby, bringing me down to line 17.

00:57:25.170 --> 00:57:28.270
It's kind of going down
into that next function.

00:57:28.270 --> 00:57:31.422
Now, it didn't bother pausing
on line 12 or 13 or 14

00:57:31.422 --> 00:57:34.380
because there's nothing intellectually
interesting there happening yet.

00:57:34.380 --> 00:57:37.080
The juicy part really starts,
it would seem, in line 17.

00:57:37.080 --> 00:57:40.980
So, now notice, n is my
variable at the top left.

00:57:40.980 --> 00:57:42.270
If I click--

00:57:42.270 --> 00:57:45.420
I don't want to click
Step Into now, though.

00:57:45.420 --> 00:57:48.090
What would go wrong if
I click on Step Into--

00:57:48.090 --> 00:57:52.480
or what would it do that I
don't think I want to do?

00:57:52.480 --> 00:57:52.990
Yeah?

00:57:52.990 --> 00:57:54.755
AUDIENCE: [INAUDIBLE]

00:57:54.755 --> 00:57:56.630
DAVID MALAN: Yeah, it
would step into get_int.

00:57:56.630 --> 00:57:59.620
But I'd like to think that the
staff's version of get_int is correct,

00:57:59.620 --> 00:58:02.120
and that's not our problem
today, so I want to step over it.

00:58:02.120 --> 00:58:06.710
And watch now at top left that
nothing happens yet to the value of n

00:58:06.710 --> 00:58:09.530
until I go to the terminal window
now, and I type in something

00:58:09.530 --> 00:58:10.670
like negative 1.

00:58:10.670 --> 00:58:14.600
Now notice, it jumps to line 19,
which is the next interesting line.

00:58:14.600 --> 00:58:17.240
Top left, n, indeed, is negative 1.

00:58:17.240 --> 00:58:19.160
And here's where I can
now pause as a human

00:58:19.160 --> 00:58:22.760
and think, all right, so
while n is less than 0.

00:58:22.760 --> 00:58:25.280
All right, n, per the top
left corner, is negative 1.

00:58:25.280 --> 00:58:27.830
So all right, while
negative 1 is less than 0,

00:58:27.830 --> 00:58:29.780
well, obviously that's
true mathematically.

00:58:29.780 --> 00:58:30.930
So what's going to happen?

00:58:30.930 --> 00:58:32.130
It's a do while loop.

00:58:32.130 --> 00:58:37.285
So when I click on Step Over again,
it's going to go to this line

00:58:37.285 --> 00:58:39.410
because it's at the end of
the inside of that loop.

00:58:39.410 --> 00:58:42.710
And now here, it's looping
through again and again.

00:58:42.710 --> 00:58:44.240
All right, let me do this once more.

00:58:44.240 --> 00:58:45.980
I'm going to step over, all right?

00:58:45.980 --> 00:58:48.777
I'm going to type in negative 2,
and it's the exact same thing.

00:58:48.777 --> 00:58:50.360
Now is my chance, on the yellow line--

00:58:50.360 --> 00:58:51.260
OK, wait a minute.

00:58:51.260 --> 00:58:53.450
Negative 2 is obviously less than 0.

00:58:53.450 --> 00:58:56.080
Let me try this one more time.

00:58:56.080 --> 00:58:57.570
Click it once here.

00:58:57.570 --> 00:58:59.040
All right, let me give it 50.

00:58:59.040 --> 00:59:05.020
And now, OK, while 50 is
less than 0, that's not true,

00:59:05.020 --> 00:59:08.970
so the loop is over because it's not
going to do it while 50 is less than 0.

00:59:08.970 --> 00:59:09.730
That's not true.

00:59:09.730 --> 00:59:12.240
So now watch, when I
click Step Over once more,

00:59:12.240 --> 00:59:15.810
it then finishes the loop, even
though there's nothing more to do.

00:59:15.810 --> 00:59:17.610
It's now about to return n.

00:59:17.610 --> 00:59:21.360
It jumps back up to main,
where I left off on line 9.

00:59:21.360 --> 00:59:23.778
It now prints, in my terminal
window, the number 50.

00:59:23.778 --> 00:59:26.070
And hopefully, at this point,
to your question earlier,

00:59:26.070 --> 00:59:30.700
my human brain has realized, oh, I'm
an idiot, like I flipped my sign there.

00:59:30.700 --> 00:59:32.460
So I probably-- let me stop this.

00:59:32.460 --> 00:59:34.780
I probably want to do
something like this.

00:59:34.780 --> 00:59:38.860
If the goal is to get a negative
integer, I probably want to say,

00:59:38.860 --> 00:59:45.070
while n is, for instance, greater
than or equal to 0 would work.

00:59:45.070 --> 00:59:48.630
So while n is greater than or
equal to 0, keep doing this.

00:59:48.630 --> 00:59:50.430
And that's the logic
I wanted to express.
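
NOTE
Editorial sketch, not code shown in the lecture. It contrasts the flipped
condition with the fix described above. The function names are hypothetical,
and the inputs array is an assumed stand-in for successive get_int prompts,
so the example needs only standard C.
```c
// Buggy version: repeats while n < 0, so a negative input is rejected.
// inputs[] stands in for what the human types; count must be at least 1.
int accept_buggy(const int inputs[], int count)
{
    int i = 0;
    int n;
    do
    {
        n = inputs[i];
        i++;
    }
    while (i < count && n < 0);
    return n;
}
// Fixed version: repeats while n >= 0, so it stops at the first negative.
int accept_fixed(const int inputs[], int count)
{
    int i = 0;
    int n;
    do
    {
        n = inputs[i];
        i++;
    }
    while (i < count && n >= 0);
    return n;
}
```
With the inputs from the demo (negative 1, negative 2, then 50), the buggy
loop only stops at 50, while the fixed loop accepts negative 1 right away.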

00:59:50.430 --> 00:59:53.733
So the debugger just saves me from
staring at the screen, raising a hand,

00:59:53.733 --> 00:59:54.900
sort of asking someone else.

00:59:54.900 --> 00:59:58.650
At least in this case, it allows me
to go through it at a healthier pace.

00:59:58.650 --> 01:00:03.000
Questions now on debug50, which should
be your new friend, even if it's not

01:00:03.000 --> 01:00:04.940
your first instinct after printf?

01:00:07.690 --> 01:00:09.190
Any questions on debug50?

01:00:09.190 --> 01:00:09.730
No?

01:00:09.730 --> 01:00:13.960
All right, well, there's one last
technique we can equip you with here.

01:00:13.960 --> 01:00:17.470
And that is, in addition to
printf and a debugger, no joke,

01:00:17.470 --> 01:00:21.400
a rubber duck is actually a
reasonably recommended solution

01:00:21.400 --> 01:00:22.720
to finding bugs in your code.

01:00:22.720 --> 01:00:24.640
To your question earlier,
the duck, too, is not

01:00:24.640 --> 01:00:26.390
going to solve the problem for you.

01:00:26.390 --> 01:00:29.710
But if you've wondered why this
little guy has been here for so long,

01:00:29.710 --> 01:00:32.080
there's this technique, which has
its own Wikipedia article,

01:01:32.080 --> 01:01:33.760
called rubber duck debugging.

01:00:33.760 --> 01:00:37.390
The idea of which is that if
you're home in your dorm room,

01:00:37.390 --> 01:00:39.520
wrestling with some bug
in your code, printf

01:00:39.520 --> 01:00:42.820
didn't quite reveal the source to
you, debugger isn't really helping,

01:00:42.820 --> 01:00:46.960
honestly, maybe it would help to just
sound out what problem you're having.

01:00:46.960 --> 01:00:50.260
Similar to going to office hours,
talking to a TA or a professor,

01:00:50.260 --> 01:00:52.030
just walking through
your problems because

01:00:52.030 --> 01:00:54.730
in sort of talking to
the duck about the fact

01:00:54.730 --> 01:01:00.550
that you're doing this while n is
less than 0, and then if it is--

01:01:00.550 --> 01:01:01.180
wait a minute.

01:01:01.180 --> 01:01:03.820
I'm an idiot, not just for
talking to the rubber duck.

01:01:03.820 --> 01:01:05.980
You realize, hopefully,
in expressing yourself,

01:01:05.980 --> 01:01:09.910
literally verbally, you probably
will hear with non-zero probability,

01:01:09.910 --> 01:01:11.860
like some illogic in your statement.

01:01:11.860 --> 01:01:16.430
And just by sounding things out, you'll
realize like, oh, that's my problem.

01:01:16.430 --> 01:01:19.720
And so, frankly, if you have roommates,
you can also use a roommate for this.

01:01:19.720 --> 01:01:21.700
But the rubber duck is
just sort of a go-to

01:01:21.700 --> 01:01:24.700
when your roommates have no
interest in your C problem set,

01:01:24.700 --> 01:01:28.150
talking something through as such.

01:01:28.150 --> 01:01:29.933
And this is an invaluable technique.

01:01:29.933 --> 01:01:32.350
I admittedly tend not to do
it so much with a rubber duck,

01:01:32.350 --> 01:01:34.510
but ideally with colleagues,
human colleagues.

01:01:34.510 --> 01:01:38.260
But just talking through things
often will help you just realize,

01:01:38.260 --> 01:01:40.360
oh, I said something illogical.

01:01:40.360 --> 01:01:41.860
Now I can go back to the code.

01:01:41.860 --> 01:01:44.650
So don't solve problems
by staring at your screen

01:01:44.650 --> 01:01:46.240
endlessly for minutes, for hours.

01:01:46.240 --> 01:01:48.100
At that point, it's
time for a break, time

01:01:48.100 --> 01:01:50.475
to walk away, time to talk to
the duck, if you've already

01:01:50.475 --> 01:01:52.900
exhausted some of those other tools.

01:01:52.900 --> 01:01:55.330
As an aside, on your way out
today at the end of class,

01:01:55.330 --> 01:01:59.020
we have, clearly, plenty
of rubber ducks for you.

01:01:59.020 --> 01:02:01.600
And it's become a thing
over the years, at least

01:02:01.600 --> 01:02:05.770
among some, to bring the duck with them
when they travel and send us photos.

01:02:05.770 --> 01:02:10.480
Here, for instance, is CS50's
rubber duck debugger, A.K.A. DDB,

01:02:10.480 --> 01:02:15.940
for Duck Debugger, which is a pun on
a geekier program called GDB, the GNU

01:02:15.940 --> 01:02:18.740
Debugger, which is an actual
piece of software for debugging.

01:02:18.740 --> 01:02:25.270
This is CS50's debugger in the hills
of Puerto Rico, also, here on the sea.

01:02:25.270 --> 01:02:28.310
It made its way to San Francisco here.

01:02:28.310 --> 01:02:30.640
Also, down by Fisherman's
Wharf by the sea lions.

01:02:30.640 --> 01:02:31.660
Familiar?

01:02:31.660 --> 01:02:34.570
Here at Stanford, where there's
a William Gates Computer Science

01:02:34.570 --> 01:02:38.950
building for computer science,
down the road in SF at Google.

01:02:38.950 --> 01:02:41.650
And this is the Trevi Fountain in Rome.

01:02:41.650 --> 01:02:43.810
And lastly, the Colosseum.

01:02:43.810 --> 01:02:46.990
So we'll be curious to see in the coming
years where your duck, too, travels.

01:02:46.990 --> 01:02:49.120
So that, then, was quite a bit.

01:02:49.120 --> 01:02:51.850
Why don't we go ahead here and
take a short 5-minute break?

01:02:51.850 --> 01:02:52.760
No snacks yet.

01:02:52.760 --> 01:02:54.400
You're welcome to get up or sit down.

01:02:54.400 --> 01:02:56.620
We'll return in about five.

01:02:56.620 --> 01:03:00.020
All right, so we are back.

01:03:00.020 --> 01:03:04.000
And if the goal, ultimately, today is
to have a better understanding of things

01:03:04.000 --> 01:03:06.940
like strings so that we can
solve problems with text,

01:03:06.940 --> 01:03:09.190
let's consider some
simpler types of data

01:03:09.190 --> 01:03:11.290
first, how we might
represent those, and then

01:03:11.290 --> 01:03:14.290
see if that doesn't lead us to
a discovery as to how strings,

01:03:14.290 --> 01:03:17.330
and just today's modern software
is using things like that.

01:03:17.330 --> 01:03:21.850
So when we talked on week zero
about representation of data,

01:03:21.850 --> 01:03:25.930
we had different ways of doing it,
in terms of binary and decimal,

01:03:25.930 --> 01:03:27.640
and unary even.

01:03:27.640 --> 01:03:30.520
When we started talking about
the same last week in code,

01:03:30.520 --> 01:03:33.980
we started talking about
data types instead.

01:03:33.980 --> 01:03:36.820
And these data types
were a way of telling

01:03:36.820 --> 01:03:40.000
the computer, like do you want an
integer, do you want a character,

01:03:40.000 --> 01:03:44.260
do you want a floating point value,
like a real number, or even a string,

01:03:44.260 --> 01:03:45.070
as we've seen?

01:03:45.070 --> 01:03:47.350
But it turns out that
computers, of course,

01:03:47.350 --> 01:03:49.930
only have finite amounts of resources.

01:03:49.930 --> 01:03:53.740
Your computer only has a
fixed amount of memory or RAM.

01:03:53.740 --> 01:03:55.910
And that actually has very
real world implications.

01:03:55.910 --> 01:03:59.630
So for instance, here are some of
the data types we've seen thus far.

01:03:59.630 --> 01:04:04.090
And it turns out that each of
these in C has a specific number

01:04:04.090 --> 01:04:05.650
of bits allocated to it.

01:04:05.650 --> 01:04:08.350
Now, admittedly, this
can vary by system.

01:04:08.350 --> 01:04:10.850
It's not so much the case
nowadays, but for many years,

01:04:10.850 --> 01:04:13.100
for decades, computers were
getting better and better.

01:04:13.100 --> 01:04:15.392
The earliest computers
might have used fewer bits

01:04:15.392 --> 01:04:16.600
for some of these data types.

01:04:16.600 --> 01:04:18.663
More modern computers
might use more bits.

01:04:18.663 --> 01:04:21.830
So the numbers you're about to see are
pretty much where we are present day.

01:04:21.830 --> 01:04:25.030
So when it comes to
these data types, a bool,

01:04:25.030 --> 01:04:29.020
which is true or false, somewhat
curiously, uses a whole byte,

01:04:29.020 --> 01:04:32.380
even though that's way overkill
because for a bool, true or false,

01:04:32.380 --> 01:04:33.940
you, of course, only need one bit.

01:04:33.940 --> 01:04:36.520
But it turns out, even
though it's wasteful to use

01:04:36.520 --> 01:04:39.938
eight bits, or one byte, just
to represent true or false,

01:04:39.938 --> 01:04:41.230
it's just easier for computers.

01:04:41.230 --> 01:04:42.820
So a bool tends to be one byte.

01:04:42.820 --> 01:04:47.590
An int, which we've been using a lot,
uses 4 bytes, typically, or 32 bits.

01:04:47.590 --> 01:04:50.590
And if I do some quick math
from week zero, with 32 bits,

01:04:50.590 --> 01:04:54.040
you have 4 billion
possible values, roughly.

01:04:54.040 --> 01:04:56.290
But if you want to represent
positive and negative,

01:04:56.290 --> 01:04:59.710
that means you can represent roughly
negative 2 billion, all the way up

01:04:59.710 --> 01:05:01.020
to positive 2 billion.

01:05:01.020 --> 01:05:02.770
So that's the range,
typically, with ints.
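
NOTE
Editorial sketch, not code from the lecture. It works through the arithmetic
behind "4 billion values" and "roughly negative 2 billion to positive
2 billion". The function names are hypothetical, and the signed split assumes
the usual two's complement layout, which the lecture does not spell out.
```c
// Number of distinct bit patterns in the given number of bits (1 to 63).
unsigned long long values_for_bits(int bits)
{
    return 1ULL << bits;
}
// Largest value when half of the patterns are spent on negative numbers.
long long max_signed_for_bits(int bits)
{
    return (long long) (values_for_bits(bits) / 2) - 1;
}
// Smallest (most negative) value under the same split.
long long min_signed_for_bits(int bits)
{
    return -(long long) (values_for_bits(bits) / 2);
}
```
For 32 bits this gives 4,294,967,296 patterns, with a signed range from
negative 2,147,483,648 up to positive 2,147,483,647.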

01:05:02.770 --> 01:05:06.820
If that's too few numbers for you,
turns out there's things called longs.

01:05:06.820 --> 01:05:10.120
And longs use 64 bits,
which allow you to have

01:05:10.120 --> 01:05:13.220
like a quintillion
number of possibilities,

01:05:13.220 --> 01:05:15.730
which is a lot, certainly,
a lot more than 4 billion.

01:05:15.730 --> 01:05:17.410
So sometimes you might use a long.

01:05:17.410 --> 01:05:18.670
But even that's finite.

01:05:18.670 --> 01:05:21.640
And so as we discussed
at the end of last week,

01:05:21.640 --> 01:05:23.980
bad things can happen if
you make certain assumptions

01:05:23.980 --> 01:05:27.220
as to the data because of things
like integer overflow or the like,

01:05:27.220 --> 01:05:28.330
where things wrap around.

01:05:28.330 --> 01:05:31.538
Then there's a float, which is a real
number, something with a decimal point.

01:05:31.538 --> 01:05:36.040
By convention, it's 4 bytes or 32
bits, which gives you, in short,

01:05:36.040 --> 01:05:37.810
only a specific amount of precision.

01:05:37.810 --> 01:05:41.620
It doesn't necessarily dictate how many
numbers to the left or to the right.

01:05:41.620 --> 01:05:45.250
In the aggregate,
ultimately, you have though,

01:05:45.250 --> 01:05:47.650
4 billion possible permutations still.

01:05:47.650 --> 01:05:50.110
If you need more precision
for scientific, for medical,

01:05:50.110 --> 01:05:54.790
for financial applications, you
might use 8 bytes, A.K.A. a double,

01:05:54.790 --> 01:05:57.700
which just gives you
more digits of precision.

01:05:57.700 --> 01:06:01.360
They eventually get imprecise per
the example we looked at last week,

01:06:01.360 --> 01:06:03.610
but it at least gets you
further down the line.
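
NOTE
Editorial sketch, not code from the lecture. It illustrates that a 4-byte
float and an 8-byte double hold different approximations of the same
fraction. The function names are hypothetical.
```c
// One third stored in a float, then widened to double for comparison.
double third_as_float(void)
{
    float third = 1.0f / 3.0f;
    return third;
}
// One third stored directly in a double.
double third_as_double(void)
{
    double third = 1.0 / 3.0;
    return third;
}
```
Neither result is exactly one third. Printing both with something like
%.10f would typically show them agreeing for about seven digits and then
diverging, with the double staying accurate for longer.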

01:06:03.610 --> 01:06:07.930
As an aside, in really, really
important applications, in finance,

01:06:07.930 --> 01:06:10.030
in medicine, in military
operations, and the

01:06:10.030 --> 01:06:12.640
like where you really can't
have rounding errors--

01:06:12.640 --> 01:06:17.470
long story short, humans have developed
libraries in C and other languages

01:06:17.470 --> 01:06:19.317
that use more, even, than 8 bytes.

01:06:19.317 --> 01:06:22.150
So there are solutions to these
problems, but they're always finite.

01:06:22.150 --> 01:06:24.070
You have to pick an upper bound.

01:06:24.070 --> 01:06:27.070
Then there's char, which we saw
briefly last week when I asked

01:06:27.070 --> 01:06:29.470
the user for y or n, for yes or no.

01:06:29.470 --> 01:06:32.470
And then there's a string, which I'm
going to propose as a question mark

01:06:32.470 --> 01:06:34.360
because a string totally depends.

01:06:34.360 --> 01:06:35.380
Like, Hi!

01:06:35.380 --> 01:06:38.890
H-I, exclamation point,
would seem to be three bytes.

01:06:38.890 --> 01:06:41.140
D-A-V-I-D, would seem to be five.

01:06:41.140 --> 01:06:45.400
So the strings, clearly, are variable
based on what you or the human type in.

01:06:45.400 --> 01:06:48.140
So we'll see what this
means, though, in just a bit.
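
NOTE
Editorial sketch, not code from the lecture. One way to check the sizes
discussed above on your own machine is the sizeof operator. The function
name is hypothetical, and as noted in the lecture the numbers can vary by
system, so no particular output is promised beyond char being 1 byte.
```c
#include <stdbool.h>
#include <stdio.h>
// Prints how many bytes each type occupies on the machine running this.
void print_type_sizes(void)
{
    printf("bool:   %zu\n", sizeof(bool));
    printf("char:   %zu\n", sizeof(char));
    printf("int:    %zu\n", sizeof(int));
    printf("long:   %zu\n", sizeof(long));
    printf("float:  %zu\n", sizeof(float));
    printf("double: %zu\n", sizeof(double));
}
```
On many present-day systems this reports 1, 1, 4, 8, 4, and 8, matching the
figures in the lecture, though long in particular is 4 bytes on some
platforms.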

01:06:48.140 --> 01:06:51.580
This though, is the thing inside
of your Mac, your PC, your phone.

01:06:51.580 --> 01:06:53.680
It might not look exactly
like this, but this is

01:06:53.680 --> 01:06:56.187
a memory module for a modern computer.

01:06:56.187 --> 01:06:57.520
And let's go ahead and use this.

01:06:57.520 --> 01:06:59.920
Really, it's just representative
of the finite amount of memory

01:06:59.920 --> 01:07:01.360
that any computer, indeed, has.

01:07:01.360 --> 01:07:06.160
Let's zoom in on one of these little
black chips on the circuit board here.

01:07:06.160 --> 01:07:10.180
Zoom in, and let me propose that
this rectangle really represents

01:07:10.180 --> 01:07:14.380
some number of bytes, like tucked
inside of this little black circuit

01:07:14.380 --> 01:07:16.750
on the board is maybe, I
don't know, a gigabyte,

01:07:16.750 --> 01:07:19.300
a billion bytes, maybe it's 100
bytes-- some number of bytes.

01:07:19.300 --> 01:07:21.258
It totally depends on
the computer and how much

01:07:21.258 --> 01:07:22.700
you paid for the stick of memory.

01:07:22.700 --> 01:07:27.850
But if there's a finite number of
bytes physically implemented somehow

01:07:27.850 --> 01:07:30.327
digitally inside of this
hardware, well, then it

01:07:30.327 --> 01:07:32.410
stands to reason that we
could number those bytes.

01:07:32.410 --> 01:07:36.940
We can just arbitrarily decide that
the top left corner is byte number

01:07:36.940 --> 01:07:38.800
one, or really byte number zero.

01:07:38.800 --> 01:07:41.170
The one next to it is
number one, then number two,

01:07:41.170 --> 01:07:43.450
number 3, dot, dot,
dot, number 2 billion

01:07:43.450 --> 01:07:46.090
or whatever it is, however
big this memory is.

01:07:46.090 --> 01:07:50.530
So if you use a variable in a C
program that's only one byte,

01:07:50.530 --> 01:07:54.190
like a char, it might literally be
stored in that top left-hand corner

01:07:54.190 --> 01:07:55.120
of the memory.

01:07:55.120 --> 01:07:57.760
In practice, you don't care
where, physically, it is.

01:07:57.760 --> 01:07:59.830
But really, the artist's
rendition would be

01:07:59.830 --> 01:08:02.872
this-- a char might use
one of those single bytes

01:08:02.872 --> 01:08:04.330
somewhere in the computer's memory.

01:08:04.330 --> 01:08:07.450
If you use an int, which is
4 bytes, it would give you

01:08:07.450 --> 01:08:10.840
4 bytes, contiguous-- that is
left to right, top to bottom.

01:08:10.840 --> 01:08:13.274
But all 32 bits would
be next to each other

01:08:13.274 --> 01:08:16.149
so the computer knows that those,
indeed, all belong to the same int.

01:08:16.149 --> 01:08:18.680
If you need a long, or a
double for that matter,

01:08:18.680 --> 01:08:21.140
then you might use a full
8 bytes in this case.

01:08:21.140 --> 01:08:23.439
And you just keep using
and using this memory,

01:08:23.439 --> 01:08:26.170
kind of like a canvas,
almost in Photoshop

01:08:26.170 --> 01:08:29.845
or a spreadsheet where you can just
move pixels or you can move data around,

01:08:29.845 --> 01:08:31.720
that's really what your
computer's memory is,

01:08:31.720 --> 01:08:36.702
a canvas for storing information
in units of bytes or 8 bits.

01:08:36.702 --> 01:08:39.160
Now, we don't need to keep
looking at these circuit boards.

01:08:39.160 --> 01:08:41.287
We can abstract it away, as we often do.

01:08:41.287 --> 01:08:43.120
And let's go ahead and
zoom in on this grid,

01:08:43.120 --> 01:08:45.740
just to consider some
very specific variables.

01:08:45.740 --> 01:08:49.180
So let me zoom in, and now I
see fewer, but larger boxes

01:08:49.180 --> 01:08:51.580
on the screen, each of which,
again, represents a byte.

01:08:51.580 --> 01:08:55.130
And now let me propose that
we play with some actual code.

01:08:55.130 --> 01:08:58.029
So here in C, albeit
without a full program,

01:08:58.029 --> 01:09:01.060
are three ints-- score1, score2, score3.

01:09:01.060 --> 01:09:07.359
I have, coincidentally, given
myself two scores around 72 and 73,

01:09:07.359 --> 01:09:09.040
and then a pretty low score at 33.

01:09:09.040 --> 01:09:12.048
Of course, last week or two weeks
ago, this would have been high.

01:09:12.048 --> 01:09:13.840
But now we're dealing
with actual integers.

01:09:13.840 --> 01:09:17.750
So these are three so-so scores on
my quizzes or tests or the like.

01:09:17.750 --> 01:09:19.250
So let me go to VS Code here.

01:09:19.250 --> 01:09:22.210
And let's make a
program called scores.c.

01:09:22.210 --> 01:09:24.399
So I'm going to write, code scores.c.

01:09:24.399 --> 01:09:26.149
That's going to give me my new file.

01:09:26.149 --> 01:09:28.420
And let me go ahead and
implement something like this.

01:09:28.420 --> 01:09:34.149
Include stdio.h, int main(void),
and then inside of here,

01:09:34.149 --> 01:09:37.689
let me do int score1 will be 72.

01:09:37.689 --> 01:09:40.029
Int score2 will be 73.

01:09:40.029 --> 01:09:43.149
And int score3 will be 33.

01:09:43.149 --> 01:09:45.460
And then let me just do
something like write a program

01:09:45.460 --> 01:09:48.043
to average my three test scores
together, something like that.

01:09:48.043 --> 01:09:52.240
So let me do printf, quote
unquote, my average is--

01:09:52.240 --> 01:09:56.470
and I'm going to go ahead
and do, say, %i, \n.

01:09:56.470 --> 01:09:58.290
And now, let me plug in the results.

01:09:58.290 --> 01:10:00.040
And this is kind of
grade school math now.

01:10:00.040 --> 01:10:02.210
How do I compute the
average of three values?

01:10:02.210 --> 01:10:09.110
Well, just like on paper, I can
do score1 plus score2 plus score3

01:10:09.110 --> 01:10:12.830
in parentheses, because of order
of operations, divided by 3,

01:10:12.830 --> 01:10:14.457
since there's three total scores.

01:10:14.457 --> 01:10:16.040
All right, so I think this checks out.

01:10:16.040 --> 01:10:19.040
And indeed, you can use parentheses
and operators like plus in your code

01:10:19.040 --> 01:10:23.180
like this in C. Let me go
ahead now and do make scores.

01:10:23.180 --> 01:10:24.327
No syntax error.

01:10:24.327 --> 01:10:25.910
So that's good, nothing missing there.

01:10:25.910 --> 01:10:28.850
And now let me do ./scores and
see what my test average is.

01:10:28.850 --> 01:10:32.270
All right, it's not great,
but I think I still passed.

01:10:32.270 --> 01:10:36.050
And indeed, my average here is 59.

01:10:36.050 --> 01:10:38.360
Is it precisely 59 though?

01:10:38.360 --> 01:10:39.140
Well, let's see.

01:10:39.140 --> 01:10:42.110
Let's actually, instead of using
an int, how about we go ahead

01:10:42.110 --> 01:10:44.870
and use something like a
floating point value here?

01:10:44.870 --> 01:10:46.250
And let me go ahead and do this.

01:10:46.250 --> 01:10:48.710
So let me recompile
my code, make scores.

01:10:48.710 --> 01:10:50.600
Huh, all right, I've got an issue.

01:10:50.600 --> 01:10:52.340
Let me zoom in on my terminal window.

01:10:52.340 --> 01:10:54.710
We've not seen this one,
necessarily, before.

01:10:54.710 --> 01:10:56.510
But error on line 9.

01:10:56.510 --> 01:11:00.410
Format specifies type double,
which is a lot of precision,

01:11:00.410 --> 01:11:02.180
but the argument has type int.

01:11:02.180 --> 01:11:03.300
So what does this mean?

01:11:03.300 --> 01:11:06.508
Well, it's showing me with these green
squiggles that something's bad between

01:11:06.508 --> 01:11:09.060
the %f and this thing over here.

01:11:09.060 --> 01:11:13.020
Well, on the left, I'm implying a
float, or a double for that matter.

01:11:13.020 --> 01:11:16.835
On the right, though, what data
type are score1, score2, score3?

01:11:16.835 --> 01:11:17.960
All right, so they're ints.

01:11:17.960 --> 01:11:19.583
So clang does not like this.

01:11:19.583 --> 01:11:22.250
The compiler just doesn't like
that I'm using ints on the right,

01:11:22.250 --> 01:11:24.170
but I want floats on the left.

01:11:24.170 --> 01:11:26.670
So there's going to be
different ways of solving this.

01:11:26.670 --> 01:11:29.870
One way would be to just ignore
the problem like I originally did,

01:11:29.870 --> 01:11:32.450
and just go back to %i.

01:11:32.450 --> 01:11:38.330
Or as an aside, %d is often an
alternative to %i for a decimal number.

01:11:38.330 --> 01:11:42.358
But we use %i because it sounds
like int, so %i is fine here too.

01:11:42.358 --> 01:11:44.150
But I don't want to
just avoid the problem.

01:11:44.150 --> 01:11:46.500
I want to actually display
a floating point value.

01:11:46.500 --> 01:11:47.730
So how can I fix this?

01:11:47.730 --> 01:11:50.272
Well, it turns out, I can solve
this in a few different ways.

01:11:50.272 --> 01:11:53.990
The simplest is just to make sure
that at least one number on the right

01:11:53.990 --> 01:11:59.330
is a floating point value,
like 3.0 instead of just 3.

01:11:59.330 --> 01:12:01.700
Now I think clang will be happier.

01:12:01.700 --> 01:12:03.320
Let me do make scores--

01:12:03.320 --> 01:12:04.400
Enter.

01:12:04.400 --> 01:12:05.330
And indeed, it's OK.

01:12:05.330 --> 01:12:05.930
Why?

01:12:05.930 --> 01:12:10.050
As soon as you have at least one
more precise data type on the right,

01:12:10.050 --> 01:12:13.170
it just treats everything, at that
point, as a floating point value

01:12:13.170 --> 01:12:14.330
so that the math works out.

01:12:14.330 --> 01:12:17.720
So ./scores, Enter-- and
now, there we go, right?

01:12:17.720 --> 01:12:20.390
Some of us might really
want that 1/3 of a point.

01:12:20.390 --> 01:12:21.980
Our average was not 59.

01:12:21.980 --> 01:12:25.010
It's 59 1/3, as in this case here.

01:12:25.010 --> 01:12:26.750
All right, so we've solved that there.
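
NOTE
Editorial sketch, not the exact scores.c from the lecture. It isolates the
difference between dividing by 3 and dividing by 3.0. The function names
are hypothetical.
```c
// Integer division: the fraction is discarded, so 178 / 3 becomes 59.
int average_truncated(int a, int b, int c)
{
    return (a + b + c) / 3;
}
// Dividing by 3.0 makes the whole division floating point.
double average_exact(int a, int b, int c)
{
    return (a + b + c) / 3.0;
}
```
Called with 72, 73, and 33, the first function returns 59 and the second
returns a value just above 59.33, which is the third of a point that the
int version drops.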

01:12:26.750 --> 01:12:30.890
As an aside, though, there's one
other technique to show here.

01:12:30.890 --> 01:12:33.320
If you didn't want to change
it to 3.0 because that's

01:12:33.320 --> 01:12:36.410
a little weird, because there
were literally three scores,

01:12:36.410 --> 01:12:38.760
it's not like that needs
to have a decimal point,

01:12:38.760 --> 01:12:43.970
you could also explicitly
convert the 3 to a float

01:12:43.970 --> 01:12:46.230
by saying, in parentheses, float.

01:12:46.230 --> 01:12:48.050
This is what's called typecasting.

01:12:48.050 --> 01:12:51.840
And this will just convert the thing
right after it to that data type,

01:12:51.840 --> 01:12:52.560
if it's possible.

01:12:52.560 --> 01:12:56.970
So if I do this again, make scores,
no errors now. ./scores, and I get,

01:12:56.970 --> 01:12:59.960
in fact, the same result. There's
a bit of a rounding issue here,

01:12:59.960 --> 01:13:03.650
but we know the rounding relates
to the imprecision from last week.

01:13:03.650 --> 01:13:06.980
For now, let me just be
happy with my 59.3 something.

01:13:06.980 --> 01:13:08.360
I'll take that for now.

01:13:08.360 --> 01:13:14.660
But this is as close to a good
enough correct answer for me now.
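
NOTE
Editorial sketch, not code from the lecture. It shows the typecast
described above, plus a common pitfall the lecture does not cover: casting
after the integer division has already happened. The function names are
hypothetical.
```c
// Cast the divisor: the division itself happens in floating point.
float average_cast_divisor(int a, int b, int c)
{
    return (a + b + c) / (float) 3;
}
// Cast applied too late: the int division has already dropped the fraction.
float average_cast_too_late(int a, int b, int c)
{
    return (float) ((a + b + c) / 3);
}
```
With 72, 73, and 33, the first version yields roughly 59.33, while the
second yields exactly 59.0, because the cast only converts a result that
was already truncated.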

01:13:14.660 --> 01:13:15.942
But how do I--

01:13:15.942 --> 01:13:18.650
think about now, what's going on
inside of the computer's memory?

01:13:18.650 --> 01:13:19.310
Well, let's consider.

01:13:19.310 --> 01:13:20.643
Here's that same grid of memory.

01:13:20.643 --> 01:13:22.490
Each box represents a byte.

01:13:22.490 --> 01:13:25.790
Where are score1, score2,
and score3 in my memory?

01:13:25.790 --> 01:13:28.790
Well, score1, let me just
propose, is at the top left.

01:13:28.790 --> 01:13:32.060
But it's taking up
four boxes for 4 bytes.

01:13:32.060 --> 01:13:34.842
Score2 probably ends up
right next to it in memory,

01:13:34.842 --> 01:13:36.800
though, this isn't always
going to be the case,

01:13:36.800 --> 01:13:38.180
but I've chosen simple examples.

01:13:38.180 --> 01:13:40.910
73 is next to it, also
taking up 4 bytes.

01:13:40.910 --> 01:13:45.320
And then lastly, 33 is in
score3, down there underneath.

01:13:45.320 --> 01:13:48.343
Now, if we really look
at the computer's memory,

01:13:48.343 --> 01:13:50.510
look at it with some kind
of microscope or the like,

01:13:50.510 --> 01:13:54.110
there's actually 32
bits, 32 bits, 32 bits

01:13:54.110 --> 01:13:59.308
in each of those three groups of four
bytes representing those values.

01:13:59.308 --> 01:14:01.100
But again, for today's
purposes onwards, we

01:14:01.100 --> 01:14:03.308
don't really need to think
again and again in binary.

01:14:03.308 --> 01:14:05.940
It's just, indeed, these decimal
numbers being stored there.

01:14:05.940 --> 01:14:08.240
But I claim now, this
isn't the best design.

01:14:08.240 --> 01:14:11.300
Even if you have never
programmed before CS50,

01:14:11.300 --> 01:14:13.220
what you're looking
at here on the screen,

01:14:13.220 --> 01:14:16.970
as an excerpt, in what sense is this
perhaps bad design, even though it's

01:14:16.970 --> 01:14:19.960
a correct way of storing
three test scores?

01:14:19.960 --> 01:14:20.960
What's kind of bad here?

01:14:20.960 --> 01:14:21.882
Yeah?

01:14:21.882 --> 01:14:26.220
AUDIENCE: The more scores you
have, the more you [INAUDIBLE].

01:14:26.220 --> 01:14:28.950
DAVID MALAN: Yeah, always do
exactly what you did-- extrapolate

01:14:28.950 --> 01:14:31.740
to 4 scores, 5 scores, 50 scores.

01:14:31.740 --> 01:14:34.020
This can't be that
well-designed because now you're

01:14:34.020 --> 01:14:36.300
going to have 4 lines of
code, 5 lines of code,

01:14:36.300 --> 01:14:38.550
50 lines of code that
are almost identical,

01:14:38.550 --> 01:14:40.770
except for this like
arbitrary number that we're

01:14:40.770 --> 01:14:42.430
updating at the end of the variable.

01:14:42.430 --> 01:14:44.940
So indeed, there's probably
going to be a better

01:14:44.940 --> 01:14:48.690
way, even though, at least in C,
we haven't yet seen that technique.

01:14:48.690 --> 01:14:52.440
But the solution, today onward, is
going to be something called an array.

01:14:52.440 --> 01:14:57.180
An array is a way of
storing your data back

01:14:57.180 --> 01:15:00.630
to back to back in the
computer's memory in such a way

01:15:00.630 --> 01:15:03.960
that you can access each
individual member easily.

01:15:03.960 --> 01:15:08.530
Put another way, with an array, you
can instead do something like this.

01:15:08.530 --> 01:15:12.300
Instead of saying int score1,
int score2, int score3,

01:15:12.300 --> 01:15:15.790
giving each a value, you
can first tell the computer,

01:15:15.790 --> 01:15:18.330
please give me a
variable called scores--

01:15:18.330 --> 01:15:20.700
plural, though you can
call it anything you want--

01:15:20.700 --> 01:15:24.090
of size three, each of
which will be an integer.

01:15:24.090 --> 01:15:28.680
That is to say, this is how you
declare an array in C that will have

01:15:28.680 --> 01:15:30.930
enough room to store three integers.

01:15:30.930 --> 01:15:34.540
Put another way, this is the
technical way of telling the computer,

01:15:34.540 --> 01:15:38.880
please give me 12 bytes in total--

01:15:38.880 --> 01:15:42.660
3 times 4 each for an int,
so give me 12 bytes in total.

01:15:42.660 --> 01:15:44.640
And what the computer
will do is guarantee

01:15:44.640 --> 01:15:47.350
that they're back to back to
back in the computer's memory.

01:15:47.350 --> 01:15:49.360
And that'll be useful in just a moment.

01:15:49.360 --> 01:15:51.820
So let me go ahead and do
something useful with this.

01:15:51.820 --> 01:15:53.640
Let me store three actual scores.

01:15:53.640 --> 01:15:58.500
Here's how I could now store those
same numeric scores in this array.

01:15:58.500 --> 01:16:03.040
Syntax is a little different, but
there's one variable called scores.

01:16:03.040 --> 01:16:05.010
But if you want to go
to its first location,

01:16:05.010 --> 01:16:08.520
starting today, you use square
brackets and go to location 0

01:16:08.520 --> 01:16:13.080
first, because things in
C are 0 indexed, so to speak,

01:16:13.080 --> 01:16:14.280
you start counting at 0.

01:16:14.280 --> 01:16:16.410
The first int is at [0].

01:16:16.410 --> 01:16:18.030
Second int is at [1].

01:16:18.030 --> 01:16:19.530
Third int is at [2].

01:16:19.530 --> 01:16:20.730
So it's not one, two, three.

01:16:20.730 --> 01:16:22.090
It's literally 0, 1, 2.

01:16:22.090 --> 01:16:24.090
And this is not something
you have control over.

01:16:24.090 --> 01:16:26.250
You must start at 0.

01:16:26.250 --> 01:16:29.940
So these lines now create
an array of size three,

01:16:29.940 --> 01:16:33.510
and then insert one, two,
three values into that array.

01:16:33.510 --> 01:16:37.770
But the upside now is that you only have
one name of the variable to remember.

01:16:37.770 --> 01:16:39.240
It's just called scores.

01:16:39.240 --> 01:16:43.380
Yes, you need to go into the
array to get individual values.

01:16:43.380 --> 01:16:46.618
You need to index into it
using those square brackets.

01:16:46.618 --> 01:16:48.660
But at least you don't
have this hackish approach

01:16:48.660 --> 01:16:53.050
of declaring a separate variable for
each and every one of these values.

01:16:53.050 --> 01:16:56.070
So let me go back to scores.c here.

01:16:56.070 --> 01:16:57.580
And let me propose that I do this.

01:16:57.580 --> 01:17:00.580
Let me just use that same
idea to do the following.

01:17:00.580 --> 01:17:02.580
Let me get rid of these
three separate integers.

01:17:02.580 --> 01:17:06.210
Let me give myself an int
scores array of size 3.

01:17:06.210 --> 01:17:10.470
And then scores[0]
will, as before, be 72.

01:17:10.470 --> 01:17:14.070
Scores[1] will be 73.

01:17:14.070 --> 01:17:16.830
And scores[2] will be 33.

01:17:16.830 --> 01:17:18.780
And let me get rid of
the little dot there.

01:17:18.780 --> 01:17:23.490
All right, so now, if I go ahead and
run this again with make scores--

01:17:23.490 --> 01:17:24.642
Enter.

01:17:24.642 --> 01:17:29.060
Huh, what did I do wrong here?

01:17:29.060 --> 01:17:31.680
I think I got a little
too ahead of myself.

01:17:31.680 --> 01:17:36.100
Let me increase my terminal window.

01:17:36.100 --> 01:17:38.830
Let's focus on line 10 here, first.

01:17:38.830 --> 01:17:42.310
Error, use of undeclared
identifier, score1.

01:17:42.310 --> 01:17:44.170
What did I do here that was dumb?

01:17:44.170 --> 01:17:45.430
Yeah?

01:17:45.430 --> 01:17:47.440
AUDIENCE: You didn't
declare it a variable.

01:17:47.440 --> 01:17:49.420
DAVID MALAN: Right, so
I didn't declare score1.

01:17:49.420 --> 01:17:50.530
I've got old code.

01:17:50.530 --> 01:17:53.798
So I just kind of, honestly, got ahead
of myself here, not even intentionally.

01:17:53.798 --> 01:17:56.090
So let me go ahead and shrink
my terminal window again.

01:17:56.090 --> 01:17:57.740
I need to finish my thought here.

01:17:57.740 --> 01:17:58.960
So let me clear my terminal.

01:17:58.960 --> 01:18:04.960
And let me change this now to
be scores[0] plus scores[1] plus

01:18:04.960 --> 01:18:05.610
scores[2].

01:18:05.610 --> 01:18:07.360
So it's a little more
verbose because I've

01:18:07.360 --> 01:18:10.040
got these square brackets, so to speak.

01:18:10.040 --> 01:18:12.220
But I think now my code is consistent.

01:18:12.220 --> 01:18:13.870
So let me make scores now.

01:18:13.870 --> 01:18:14.950
It now compiles.

01:18:14.950 --> 01:18:19.870
./scores gives me, indeed, the same
rough average with those same values.

01:18:19.870 --> 01:18:24.280
All right, so let me go ahead and
maybe enhance this a little bit.

01:18:24.280 --> 01:18:26.920
It's a little silly to have to
write a special program just

01:18:26.920 --> 01:18:31.610
to check your average of three
test scores like 72, 73, 33.

01:18:31.610 --> 01:18:33.550
Why don't I actually
make the program dynamic

01:18:33.550 --> 01:18:37.250
and ask the human for those scores?

01:18:37.250 --> 01:18:39.140
So instead, let me do this.

01:18:39.140 --> 01:18:43.480
How about we get rid of the
72, and change this to get_int.

01:18:43.480 --> 01:18:46.300
And I'll just prompt
the user for a score.

01:18:46.300 --> 01:18:52.510
Let me get rid of the 73 and get this
to be get_int score, quote unquote.

01:18:52.510 --> 01:18:56.560
And then lastly, get rid of the 33, and
replace it with get_int, quote unquote,

01:18:56.560 --> 01:18:57.670
score.

01:18:57.670 --> 01:19:03.680
get_int is a CS50 thing for now, so
I need to include cs50.h, as always.

01:19:03.680 --> 01:19:05.650
But I think now, it's
sort of a better program

01:19:05.650 --> 01:19:08.680
because now I can compile it once,
I can even share it with my friends.

01:19:08.680 --> 01:19:12.490
And now any of us can average
three scores on some class's test.

01:19:12.490 --> 01:19:15.190
They don't need to know the
code or rewrite the code just

01:19:15.190 --> 01:19:16.910
to type in their scores.

01:19:16.910 --> 01:19:19.150
So make scores worked.

01:19:19.150 --> 01:19:25.120
./scores, now I can type anything
I want-- maybe it's a 72, 73, 33,

01:19:25.120 --> 01:19:26.320
still get the same answer.

01:19:26.320 --> 01:19:31.210
Or maybe I'm having a better
semester, 100, 100, maybe 99,

01:19:31.210 --> 01:19:33.520
and now we get still a
pretty high score there.

01:19:33.520 --> 01:19:34.600
But now it's dynamic.

01:19:34.600 --> 01:19:36.080
Now you don't need the source code.

01:19:36.080 --> 01:19:37.747
You don't need to recompile the program.

01:19:37.747 --> 01:19:39.670
It's just going to work again and again.

01:19:39.670 --> 01:19:41.090
But this, too.

01:19:41.090 --> 01:19:43.660
Let me propose that this
code is correct if I

01:19:43.660 --> 01:19:45.910
want to get three scores from the user.

01:19:45.910 --> 01:19:50.950
But these highlighted lines now, 6
through 9, are they well-designed,

01:19:50.950 --> 01:19:53.170
would you say?

01:19:53.170 --> 01:19:53.680
Yeah?

01:19:53.680 --> 01:19:54.898
AUDIENCE: Can you loop?

01:19:54.898 --> 01:19:55.940
DAVID MALAN: Yeah, right?

01:19:55.940 --> 01:19:58.220
This is-- we can use a
loop, is the spoiler here.

01:19:58.220 --> 01:19:58.820
Why?

01:19:58.820 --> 01:20:01.590
I mean, my God, it's like the same
code again and again and again.

01:20:01.590 --> 01:20:03.465
The only thing that's
changing is the number.

01:20:03.465 --> 01:20:06.170
And this should have kind of
had some code smell again,

01:20:06.170 --> 01:20:09.080
because if I keep typing the
same thing again and again,

01:20:09.080 --> 01:20:11.810
that's clearly an opportunity
to better design something.

01:20:11.810 --> 01:20:13.650
So let me do this.

01:20:13.650 --> 01:20:18.590
Let me go ahead and still
create my array of size three.

01:20:18.590 --> 01:20:23.270
But let me use our old friend,
the for loop, for int i equals 0,

01:20:23.270 --> 01:20:26.610
i less than 3, i++.

01:20:26.610 --> 01:20:29.510
And then in here, let
me do scores bracket--

01:20:29.510 --> 01:20:32.920
we haven't seen this
before, but any intuition?

01:20:32.920 --> 01:20:34.220
Scores bracket--

01:20:34.220 --> 01:20:34.720
AUDIENCE: i.

01:20:34.720 --> 01:20:39.730
DAVID MALAN: i, because that will
use whatever i is, be it 0 or 1 or 2

01:20:39.730 --> 01:20:40.720
on each iteration.

01:20:40.720 --> 01:20:43.780
And then I can get an int,
asking the user for score,

01:20:43.780 --> 01:20:47.000
without having to repeat
myself again and again.

01:20:47.000 --> 01:20:50.560
So hopefully, if I didn't make
any typos, make scores, all good.

01:20:50.560 --> 01:20:54.665
./scores, 72, 73, 33, and
we're back in business.

01:20:54.665 --> 01:20:56.540
But the code is arguably
now better designed,

01:20:56.540 --> 01:21:01.240
because now, I haven't
actually hardcoded the scores,

01:21:01.240 --> 01:21:04.940
and I haven't actually copied
and pasted any of that code.

01:21:04.940 --> 01:21:08.230
Well, if we consider now what's going
on inside of the computer's memory,

01:21:08.230 --> 01:21:10.510
it's pretty much the same
in terms of the values.

01:21:10.510 --> 01:21:15.490
But instead of the variables being,
literally, score1, score2, score3,

01:21:15.490 --> 01:21:17.210
there's just one variable.

01:21:17.210 --> 01:21:19.030
It's an array called scores.

01:21:19.030 --> 01:21:24.550
But you can index into its three
locations by using scores[0] to get

01:21:24.550 --> 01:21:28.810
the first, scores[1] to get the
second, scores[2] to get the third.

01:21:28.810 --> 01:21:29.990
But this is key.

01:21:29.990 --> 01:21:33.040
The memory is contiguous.

01:21:33.040 --> 01:21:35.380
The screen is only so
large, so it wraps around.

01:21:35.380 --> 01:21:38.950
But physically, digitally,
the memory is contiguous-- top

01:21:38.950 --> 01:21:40.270
to bottom, left to right.

01:21:40.270 --> 01:21:41.530
And that's important, why?

01:21:41.530 --> 01:21:46.060
Because the brackets indicate 0,
1, 2, that each of these integers

01:21:46.060 --> 01:21:48.790
is just one integer away from the next.

01:21:48.790 --> 01:21:51.220
It can't be randomly down
here all of a sudden.

01:21:51.220 --> 01:21:54.070
It's got to be back to back to back.

01:21:54.070 --> 01:21:57.130
All right, now equipped
with that paradigm,

01:21:57.130 --> 01:22:00.710
what more could we actually do here?

01:22:00.710 --> 01:22:04.270
Well, it turns out, it's worth
knowing that it's possible in code

01:22:04.270 --> 01:22:06.850
to even pass arrays around as arguments.

01:22:06.850 --> 01:22:09.100
And let me just whip this
program up somewhat quickly,

01:22:09.100 --> 01:22:11.320
just so you've seen it before long.

01:22:11.320 --> 01:22:13.190
But let me go ahead and do this.

01:22:13.190 --> 01:22:18.130
Let me propose that I create a function
that does this averaging for me.

01:22:18.130 --> 01:22:22.510
So I'm going to create a function
called average that returns a float.

01:22:22.510 --> 01:22:26.860
And the arguments this
thing is going to take--

01:22:26.860 --> 01:22:28.640
let's see, it's going to be the array.

01:22:28.640 --> 01:22:31.480
So it turns out, if you want to
take in an array of numbers--

01:22:31.480 --> 01:22:33.050
you can call it anything you want.

01:22:33.050 --> 01:22:36.970
This is how you tell C
that a function takes, not

01:22:36.970 --> 01:22:39.790
an integer, but an array of integers.

01:22:39.790 --> 01:22:41.290
And you don't have to call it array.

01:22:41.290 --> 01:22:42.790
I'm doing that just for
the sake of discussion.

01:22:42.790 --> 01:22:43.660
It can be called x.

01:22:43.660 --> 01:22:44.490
It can be numbers.

01:22:44.490 --> 01:22:45.490
It can be anything else.

01:22:45.490 --> 01:22:49.060
I'm just calling it array to be super
explicit as to what it is there.

01:22:49.060 --> 01:22:51.730
Now, how do I change my code down here?

01:22:51.730 --> 01:22:55.130
What I think I'm going to do
for the moment is just this.

01:22:55.130 --> 01:22:59.110
I'm going to get rid of this code here,
where I manually computed the average.

01:22:59.110 --> 01:23:01.480
And let me just call the
average function here

01:23:01.480 --> 01:23:05.000
by passing in the whole array of scores.

01:23:05.000 --> 01:23:07.030
So this is just an
example of abstraction,

01:23:07.030 --> 01:23:08.890
like now I have a
function called average.

01:23:08.890 --> 01:23:09.670
I don't care.

01:23:09.670 --> 01:23:12.490
I don't have to remember how
it works once I implement it.

01:23:12.490 --> 01:23:15.010
It just kind of tightens up
my main code a little bit.

01:23:15.010 --> 01:23:17.030
But I do still have to implement this.

01:23:17.030 --> 01:23:19.360
So later in my file-- let
me repeat myself as before,

01:23:19.360 --> 01:23:22.270
the only time it's OK in C to
repeat yourself again and again,

01:23:22.270 --> 01:23:27.010
by typing out again, average,
and then int array open bracket--

01:23:27.010 --> 01:23:28.580
but now not a semicolon.

01:23:28.580 --> 01:23:30.250
Now I have to implement this thing.

01:23:30.250 --> 01:23:33.400
And I can implement this in
a bunch of different ways,

01:23:33.400 --> 01:23:37.630
but I don't know in advance--

01:23:37.630 --> 01:23:39.040
I can't just do this.

01:23:39.040 --> 01:23:48.400
I can't just do array[0]
plus array[1] plus array[2],

01:23:48.400 --> 01:23:52.130
unless this program's only ever
going to work on three numbers.

01:23:52.130 --> 01:23:55.460
So let me go ahead and do this.

01:23:55.460 --> 01:23:58.570
Let me first propose that
there's a poor design here.

01:23:58.570 --> 01:24:01.930
In my main function, what
value have I repeated twice?

01:24:05.050 --> 01:24:07.550
Among the highlighted lines,
what jumps out at you as twice?

01:24:07.550 --> 01:24:09.020
AUDIENCE: The length of the array?

01:24:09.020 --> 01:24:11.520
DAVID MALAN: Yeah, the length
of the array, it's just three.

01:24:11.520 --> 01:24:14.720
Now it's not a huge deal that I typed
the number three on line 8 and line 9,

01:24:14.720 --> 01:24:17.120
but this is exactly the
kind of like shortcut

01:24:17.120 --> 01:24:18.440
that's going to get you
in trouble eventually.

01:24:18.440 --> 01:24:18.860
Why?

01:24:18.860 --> 01:24:20.240
Because, eventually,
you or someone else is

01:24:20.240 --> 01:24:22.407
going to go in and make the
array bigger or smaller,

01:24:22.407 --> 01:24:24.410
and you're not going to
realize that magically,

01:24:24.410 --> 01:24:26.270
that same number is in two places.

01:24:26.270 --> 01:24:29.270
And indeed, this is what a programmer
would often call a magic number.

01:24:29.270 --> 01:24:31.940
A magic number is one that
just kind of appears magically.

01:24:31.940 --> 01:24:35.210
And you're on the honor system to
change it here, if you change it here,

01:24:35.210 --> 01:24:36.688
and then you change it over here.

01:24:36.688 --> 01:24:39.230
That's not going to end well if
the onus is on the programmer

01:24:39.230 --> 01:24:43.190
to remember where they hardcoded--
that is, wrote out three explicitly.

01:24:43.190 --> 01:24:46.250
So any time you reuse a value
like this, you know what?

01:24:46.250 --> 01:24:50.690
We should probably do what we did last
week, which was to declare a variable,

01:24:50.690 --> 01:24:53.510
perhaps at the very top of my
program, so it's super obvious

01:24:53.510 --> 01:24:56.990
what it is, called, maybe
N, and set that equal to 3.

01:24:56.990 --> 01:24:59.030
Better yet, what did I
do last week to make sure

01:24:59.030 --> 01:25:02.390
that I can't screw up and
accidentally change that value?

01:25:02.390 --> 01:25:03.440
Yeah, constant.

01:25:03.440 --> 01:25:05.810
And the keyword there
was just const for short.

01:25:05.810 --> 01:25:09.110
And now I have a global variable--
global in the sense that I can

01:25:09.110 --> 01:25:11.870
access it anywhere-- that is called N.

01:25:11.870 --> 01:25:12.680
It's an int.

01:25:12.680 --> 01:25:14.450
And it's always going to be 3.

01:25:14.450 --> 01:25:18.500
And now I can improve my main
function a little bit by just changing

01:25:18.500 --> 01:25:22.662
the 3's to N, so now if I, or a
colleague, realized, oh, wait a minute,

01:25:22.662 --> 01:25:23.870
there's four tests this year.

01:25:23.870 --> 01:25:25.610
You change N to four,
recompile the code,

01:25:25.610 --> 01:25:31.190
and it just works everywhere else,
except in my average function.

01:25:31.190 --> 01:25:33.830
Let me change it back to
3, just for consistency.

01:25:33.830 --> 01:25:39.770
This is not going to fly now, to just
sum up things like this, for instance,

01:25:39.770 --> 01:25:43.610
and then return this divided by 3.

01:25:43.610 --> 01:25:51.130
Why will this not work
now as I've defined it?

01:25:51.130 --> 01:25:52.159
Yeah?

01:25:52.159 --> 01:25:58.030
AUDIENCE: [INAUDIBLE]

01:25:58.030 --> 01:26:00.980
DAVID MALAN: OK, I might be
returning an integer value when

01:26:00.980 --> 01:26:02.870
I intend to return a float per this.

01:26:02.870 --> 01:26:05.870
But I think I'm OK because I used
that little trick where I made sure

01:26:05.870 --> 01:26:08.810
that at least one of the numbers
in my arithmetic expression

01:26:08.810 --> 01:26:11.010
is, in fact, a floating point value.

01:26:11.010 --> 01:26:14.180
And just by adding the point
0, I make sure that everything

01:26:14.180 --> 01:26:15.650
gets treated as a float.

01:26:15.650 --> 01:26:17.864
So I think that's OK.

01:26:17.864 --> 01:26:19.034
AUDIENCE: [INAUDIBLE]

01:26:19.034 --> 01:26:20.701
DAVID MALAN: I'm sorry, a little louder.

01:26:20.701 --> 01:26:24.385
AUDIENCE: It just seems
like you're [INAUDIBLE].

01:26:24.385 --> 01:26:25.260
DAVID MALAN: Exactly.

01:26:25.260 --> 01:26:27.093
So left hand's not
talking to the right hand

01:26:27.093 --> 01:26:30.210
here, in that my current
implementation of average

01:26:30.210 --> 01:26:33.510
is still assuming that there's only
going to be three tests or whatever.

01:26:33.510 --> 01:26:35.670
But wait a minute, I just
went through the trouble

01:26:35.670 --> 01:26:39.480
of modifying this to be N, generically.

01:26:39.480 --> 01:26:43.205
And if I change this to 4, I'm
not going to be happy, perhaps,

01:26:43.205 --> 01:26:46.080
with my average because now I'm
going to ignore one of my test scores

01:26:46.080 --> 01:26:46.690
altogether.

01:26:46.690 --> 01:26:48.450
So let me change this back to 3.

01:26:48.450 --> 01:26:51.180
And unfortunately, if
it's a variable now,

01:26:51.180 --> 01:26:55.500
N, and therefore, I have literally
a variable number of scores,

01:26:55.500 --> 01:27:00.920
how do I take the average of
a variable number of things?

01:27:00.920 --> 01:27:02.630
I mean, what's my building block there?

01:27:02.630 --> 01:27:03.170
Yeah?

01:27:03.170 --> 01:27:10.100
AUDIENCE: [INAUDIBLE]

01:27:10.100 --> 01:27:10.850
DAVID MALAN: Yeah.

01:27:10.850 --> 01:27:14.880
Why don't I use a loop that goes through
the array and adds things up as you go?

01:27:14.880 --> 01:27:17.360
I mean, kind of like grade school, as
you take the average on your calculator

01:27:17.360 --> 01:27:19.730
or paper and pencil, you just
keep adding the numbers together,

01:27:19.730 --> 01:27:22.380
and then you divide at the end
by the total number of things.

01:27:22.380 --> 01:27:23.520
So how can I do this?

01:27:23.520 --> 01:27:25.730
Well, let me change my
implementation of average

01:27:25.730 --> 01:27:30.515
to first declare a variable called
sum, or whatever, set it equal to 0.

01:27:30.515 --> 01:27:33.140
So this is like me on my piece
of paper getting ready to count,

01:27:33.140 --> 01:27:36.590
or my calculator, of course, when you
turn it on, typically defaults to zero.

01:27:36.590 --> 01:27:41.570
And now, let me do for, int i
equals 0. i is less than a--

01:27:41.570 --> 01:27:43.700
well, no, I didn't do that.

01:27:43.700 --> 01:27:46.730
i is less than N, i++.

01:27:46.730 --> 01:27:52.640
And now in here, let me go ahead
and add to the current sum, whatever

01:27:52.640 --> 01:27:55.910
is in the array's location, i.

01:27:55.910 --> 01:28:00.740
And then down here, I think I can
just return sum divided by 3.0--

01:28:00.740 --> 01:28:04.560
not 3.0, N, perhaps here.

01:28:04.560 --> 01:28:08.492
And actually, I think I'm going to
get-- let's make sure it's a float.

01:28:08.492 --> 01:28:11.450
Let's use the type casting trick just
to make sure I don't accidentally

01:28:11.450 --> 01:28:15.540
shortchange someone and throw away
everything after the decimal point.

01:28:15.540 --> 01:28:17.300
So it just escalated quickly, right?

01:28:17.300 --> 01:28:18.990
Average just got a lot more involved.

01:28:18.990 --> 01:28:22.130
It's not just a single one line
of code, but now it's dynamic.

01:28:22.130 --> 01:28:25.070
I initialize a variable called sum to 0.

01:28:25.070 --> 01:28:30.920
In this loop, I go through and just keep
adding to sum, which is initially 0,

01:28:30.920 --> 01:28:33.200
whatever's in array[i]--

01:28:33.200 --> 01:28:36.740
or specifically array[0],
array[1], array[2].

01:28:36.740 --> 01:28:40.970
That gives me a total sum that I return,
divided by the total number of things.

01:28:40.970 --> 01:28:42.560
Now, this I can tighten slightly.

01:28:42.560 --> 01:28:45.650
Recall that there is syntactic
sugar for just adding things.

01:28:45.650 --> 01:28:48.620
I can't use plus plus because
that only literally adds one.

01:28:48.620 --> 01:28:52.630
But I can use here, plus equals.

01:28:52.630 --> 01:28:54.880
Questions on this implementation here?

01:28:54.880 --> 01:28:58.000
Really the only takeaway-- or
the most important takeaway

01:28:58.000 --> 01:29:00.730
is that this is the
syntax for how you tell

01:29:00.730 --> 01:29:04.210
a function that it
expects a whole array, not

01:29:04.210 --> 01:29:06.450
a single variable like
an int or the like.

01:29:06.450 --> 01:29:08.200
You literally use
square brackets, but you

01:29:08.200 --> 01:29:11.530
don't specify the length inside there.

01:29:11.530 --> 01:29:12.748
Yeah?

01:29:12.748 --> 01:29:16.410
AUDIENCE: What variable
[INAUDIBLE] at the top?

01:29:16.410 --> 01:29:18.410
DAVID MALAN: What about
the variable at the top?

01:29:18.410 --> 01:29:22.205
AUDIENCE: [INAUDIBLE]

01:29:22.205 --> 01:29:23.330
DAVID MALAN: Good question.

01:29:23.330 --> 01:29:25.220
What do I have it defined as at the top?

01:29:25.220 --> 01:29:31.280
This variable, N, it must be an integer
if you're going to use it inside

01:29:31.280 --> 01:29:33.840
of an array's square brackets here.

01:29:33.840 --> 01:29:38.360
So this line 10, notice, no
longer says 3, it says N.

01:29:38.360 --> 01:29:42.350
And so whatever N is, 3 or 4 or
something else, that's how many

01:29:42.350 --> 01:29:43.970
integers I will get in that array.

01:29:43.970 --> 01:29:47.070
And it must be, by definition
of an array, an integer that

01:29:47.070 --> 01:29:48.320
goes in those square brackets.

01:29:48.320 --> 01:29:50.000
And here's a common source of confusion.

01:29:50.000 --> 01:29:52.350
When you create the
array, that is declare it,

01:29:52.350 --> 01:29:54.350
you use square brackets
like this, where you put

01:29:54.350 --> 01:29:56.210
the total number of elements you want.

01:29:56.210 --> 01:29:59.820
When you subsequently use the
array, like I'm doing here,

01:29:59.820 --> 01:30:02.690
you don't mention int again--
just like you don't mention int

01:30:02.690 --> 01:30:04.610
again and again once a variable exists.

01:30:04.610 --> 01:30:10.220
You use the square brackets still, but
you don't use N. You use 0 or 1 or 2

01:30:10.220 --> 01:30:11.990
or, generically here, i.

01:30:11.990 --> 01:30:14.810
So when C was designed, they
sometimes used the same syntax

01:30:14.810 --> 01:30:17.060
for two different ideas or contexts.

01:30:17.060 --> 01:30:17.984
Yeah?

01:30:17.984 --> 01:30:22.645
AUDIENCE: Do you have to
include line 6 [INAUDIBLE]?

01:30:22.645 --> 01:30:23.770
DAVID MALAN: Good question.

01:30:23.770 --> 01:30:25.900
Do I have to include line 6?

01:30:25.900 --> 01:30:29.290
Short answer, yes, because of
the reason we ran into last week.

01:30:29.290 --> 01:30:32.750
C, or clang really, reads your
code top to bottom, left to right.

01:30:32.750 --> 01:30:38.890
And so if the compiler sees some mention
of this function average on line 16,

01:30:38.890 --> 01:30:41.800
but you haven't told the
compiler that average exists,

01:30:41.800 --> 01:30:43.610
you're going to get an
error on the screen.

01:30:43.610 --> 01:30:45.490
So the conventional
way to do that is you

01:30:45.490 --> 01:30:48.670
just copy paste the first line
of code from the function,

01:30:48.670 --> 01:30:51.260
its so-called prototype or declaration.

01:30:51.260 --> 01:30:51.760
Yeah?

01:30:51.760 --> 01:30:55.662
AUDIENCE: Is there a library if you
don't know the size of the array?

01:30:55.662 --> 01:30:58.120
DAVID MALAN: Really good
question, and a perfect segue.

01:30:58.120 --> 01:31:01.078
Is there a library you can use if
you don't know the size of the array?

01:31:01.078 --> 01:31:01.720
No.

01:31:01.720 --> 01:31:07.660
And so if any of you have programmed
in Java or Python or other languages,

01:31:07.660 --> 01:31:11.020
you can actually just ask
the array, how big is it?

01:31:11.020 --> 01:31:13.778
In C, you and I, the
programmers, have to remember it.

01:31:13.778 --> 01:31:15.820
And so short answer, no,
there's no function that

01:31:15.820 --> 01:31:17.445
will just automatically do this for us.

01:31:17.445 --> 01:31:20.230
And in fact, let me
make a more subtle claim

01:31:20.230 --> 01:31:23.950
that it's fine to use global
variables like this if they're really

01:31:23.950 --> 01:31:25.160
for configuration options.

01:31:25.160 --> 01:31:25.660
Why?

01:31:25.660 --> 01:31:28.160
It's just convenient to put
them at the very top of the file

01:31:28.160 --> 01:31:30.565
because everyone, you,
your colleagues, your TAs

01:31:30.565 --> 01:31:32.440
are going to see them
at the top of the code.

01:31:32.440 --> 01:31:36.130
But you really shouldn't be using
them everywhere throughout your code.

01:31:36.130 --> 01:31:38.380
It'd be better if the average
function, itself, were

01:31:38.380 --> 01:31:40.610
independent of that special variable.

01:31:40.610 --> 01:31:42.025
So by that, I mean this.

01:31:42.025 --> 01:31:46.240
You know what I should really do, if
I really want to be well-designed?

01:31:46.240 --> 01:31:51.400
I should pass in the length of
the array to the average function.

01:31:51.400 --> 01:31:54.310
I should give the average
function a second argument--

01:31:54.310 --> 01:31:57.800
I'll call it length, for instance,
but I could call it anything I want.

01:31:57.800 --> 01:32:02.500
And so rather than putting N all the
way down here at the bottom of my file,

01:32:02.500 --> 01:32:05.745
let me just dynamically
say length instead.

01:32:05.745 --> 01:32:08.620
And this is a subtlety-- and no need
to get too tripped up over this.

01:32:08.620 --> 01:32:11.830
But this, now, is just an example
of how the same function can

01:32:11.830 --> 01:32:13.690
take not one, but two arguments.

01:32:13.690 --> 01:32:19.400
But indeed, in C, you must remember,
yourself, what the length of an array

01:32:19.400 --> 01:32:19.900
is.

01:32:19.900 --> 01:32:22.810
You can't just ask the
array via some syntax

01:32:22.810 --> 01:32:26.560
like you can, those of you who've
programmed before in Java or Python.

01:32:26.560 --> 01:32:27.070
Yeah?

01:32:27.070 --> 01:32:35.115
AUDIENCE: [INAUDIBLE]

01:32:35.115 --> 01:32:36.240
DAVID MALAN: Good question.

01:32:36.240 --> 01:32:39.198
Would it be better designed to write
a function that computes the size?

01:32:39.198 --> 01:32:42.570
Short answer, can't do that in
C. As soon as you pass an array

01:32:42.570 --> 01:32:47.263
into a function in C, you cannot figure
out its size if it's a generic array

01:32:47.263 --> 01:32:48.180
like that of integers.

01:32:48.180 --> 01:32:51.040
There are special cases
where you can do that.

01:32:51.040 --> 01:32:53.283
But in general, no, it's
just not possible in C.

01:32:53.283 --> 01:32:55.200
And if that's some
frustration, honestly, this

01:32:55.200 --> 01:32:57.180
is why more modern
languages add that feature.

01:32:57.180 --> 01:32:57.680
Why?

01:32:57.680 --> 01:32:59.910
Because it was really
annoying, as I'm alluding to here,

01:32:59.910 --> 01:33:01.560
not having that information.

01:33:01.560 --> 01:33:03.643
Now, just to make sure I
didn't screw up anywhere,

01:33:03.643 --> 01:33:07.540
let me compile this
final version of scores.

01:33:07.540 --> 01:33:08.620
Suspense.

01:33:08.620 --> 01:33:14.030
All good. ./scores, 72, 73, 33,
and we're still back in business.

01:33:14.030 --> 01:33:15.530
So this version is more complicated.

01:33:15.530 --> 01:33:18.738
And as always, we'll have this version
on the course's website for reference.

01:33:18.738 --> 01:33:20.740
But the point, really,
is that arrays, not only

01:33:20.740 --> 01:33:23.290
can be used as containers
to store multiple values--

01:33:23.290 --> 01:33:25.490
three or more in this case--

01:33:25.490 --> 01:33:30.440
you can also even pass them
around as arguments, as such.

01:33:30.440 --> 01:33:34.300
All right, now besides that,
let's simplify for just a moment,

01:33:34.300 --> 01:33:36.100
and consider now the world of chars.

01:33:36.100 --> 01:33:39.200
If we've just got single
bytes, where does this lead us?

01:33:39.200 --> 01:33:41.200
And how does this get us,
ultimately, to strings

01:33:41.200 --> 01:33:44.170
to solve problems like readability
and cryptography and the like?

01:33:44.170 --> 01:33:46.390
Well here, for instance,
are three lines of code,

01:33:46.390 --> 01:33:48.967
out of context, that
simply store three chars.

01:33:48.967 --> 01:33:50.800
And you can already see
where this is going.

01:33:50.800 --> 01:33:53.920
Having three variables
called c1, c2, c3 is clearly

01:33:53.920 --> 01:33:57.470
going to end up being bad design because
of all the silly redundancy here.

01:33:57.470 --> 01:33:59.650
But notice, I'm using
single quotes like last week

01:33:59.650 --> 01:34:01.330
because these are single chars.

01:34:01.330 --> 01:34:03.647
What does this look like
in the computer's memory?

01:34:03.647 --> 01:34:05.480
Well, it looks a little
something like this.

01:34:05.480 --> 01:34:09.730
If we clear out the old
memory, c1, c2, c3 probably

01:34:09.730 --> 01:34:12.562
will end up here, maybe not literally
in the top left-hand corner.

01:34:12.562 --> 01:34:14.020
This is just an artist's rendition.

01:34:14.020 --> 01:34:18.440
But c1, c2, c3 will
probably end up like that.

01:34:18.440 --> 01:34:20.020
Now, what's really there?

01:34:20.020 --> 01:34:21.730
It's really those same three numbers--

01:34:21.730 --> 01:34:23.350
72, 73, 33.

01:34:23.350 --> 01:34:27.920
But how many bits does a byte have?

01:34:27.920 --> 01:34:28.880
Just eight.

01:34:28.880 --> 01:34:33.830
So if we were to look at the binary
representation of these characters,

01:34:33.830 --> 01:34:35.330
it would only be eight bits each.

01:34:35.330 --> 01:34:39.140
That's enough to store small
numbers like 72, 73, 33.

01:34:39.140 --> 01:34:41.580
We're not dealing with Unicode
and emoji and the like.

01:34:41.580 --> 01:34:42.837
But the point is the same.

01:34:42.837 --> 01:34:45.170
You don't have to use four
bytes to store these numbers.

01:34:45.170 --> 01:34:48.087
You can use a different data type
like chars, and underneath the hood,

01:34:48.087 --> 01:34:51.420
it's, indeed, going to use
just single bytes for each.

01:34:51.420 --> 01:34:55.850
But this is sort of like a-- this isn't
really how we implement strings, right?

01:34:55.850 --> 01:34:59.270
When you wanted to say, hi, last
week, or this, we used double quotes.

01:34:59.270 --> 01:35:02.400
And we wrote all of the things together
and used one variable, not three,

01:35:02.400 --> 01:35:02.900
right?

01:35:02.900 --> 01:35:06.260
When I typed in David, I didn't
have a variable for D-A-V-I-D.

01:35:06.260 --> 01:35:09.750
I had one variable called name
that stored the whole thing.

01:35:09.750 --> 01:35:13.310
So in C, we keep talking about
these things called strings.

01:35:13.310 --> 01:35:17.427
We'll see, eventually, that strings are
not necessarily what they seem to be.

01:35:17.427 --> 01:35:19.760
But for now, the key thing
about strings is that they're

01:35:19.760 --> 01:35:22.070
variable length, so to speak, right?

01:35:22.070 --> 01:35:25.250
They might be three characters,
Hi, or five characters, David,

01:35:25.250 --> 01:35:28.250
or anything smaller or larger.

01:35:28.250 --> 01:35:30.980
So how do we go about
implementing strings,

01:35:30.980 --> 01:35:33.110
if all we have at the end
of the day is my memory?

01:35:33.110 --> 01:35:36.290
Well, here is an example of
just creating, declaring,

01:35:36.290 --> 01:35:39.650
and defining a string called s. s
because it's just a simple string,

01:35:39.650 --> 01:35:41.900
and quote unquote,
HI!, in double quotes.

01:35:41.900 --> 01:35:44.090
What does this look like
in the computer's memory?

01:35:44.090 --> 01:35:45.230
Well, let's clear it again.

01:35:45.230 --> 01:35:48.110
And here, now, because it's
technically stored in one variable,

01:35:48.110 --> 01:35:50.960
s, here is how I might
draw it as an artist.

01:35:50.960 --> 01:35:52.520
It's three bytes in total--

01:35:52.520 --> 01:35:53.990
H-I exclamation point.

01:35:53.990 --> 01:35:59.630
But there's no c1, c2, c3, it's
just, the whole thing is s.

01:35:59.630 --> 01:36:03.800
But it turns out that
a string, fun fact,

01:36:03.800 --> 01:36:06.990
is really just what underneath the hood?

01:36:06.990 --> 01:36:09.610
Kind of leading up to this--

01:36:09.610 --> 01:36:12.090
what is a string, if this is
how it's laid out in memory?

01:36:12.090 --> 01:36:13.190
AUDIENCE: An array.

01:36:13.190 --> 01:36:15.830
DAVID MALAN: Literally, it's
just an array of characters.

01:36:15.830 --> 01:36:18.590
And we didn't have to know about
arrays last week to use strings.

01:36:18.590 --> 01:36:21.382
This is where, again, the training
wheels are starting to come off.

01:36:21.382 --> 01:36:23.730
But a string is just
an array of characters.

01:36:23.730 --> 01:36:26.040
H-I exclamation point, for instance.

01:36:26.040 --> 01:36:28.370
So technically, an array--

01:36:28.370 --> 01:36:33.890
or a string called s is really a
variable called s that allows you

01:36:33.890 --> 01:36:38.150
to get at the first character with
s[0], if you want-- s[1], s[2].

01:36:38.150 --> 01:36:40.340
You can literally get
individual characters

01:36:40.340 --> 01:36:43.820
just by treating s as though
it's an array, which it really

01:36:43.820 --> 01:36:47.000
is underneath the hood, in this case.

01:36:47.000 --> 01:36:48.560
But there's a catch.

01:36:48.560 --> 01:36:51.500
How do you know where strings end?

01:36:51.500 --> 01:36:54.560
In the past, when I drew
some integers on the screen,

01:36:54.560 --> 01:36:57.080
I know, I claim they
always take up 4 bytes.

01:36:57.080 --> 01:37:00.200
If I had drawn a long, it
always takes up 8 bytes.

01:37:00.200 --> 01:37:03.530
If I had drawn a character,
it always takes up 1 byte.

01:37:03.530 --> 01:37:06.533
But how many bytes
does a string take up?

01:37:06.533 --> 01:37:08.450
Yeah, I mean, that's
kind of the right answer.

01:37:08.450 --> 01:37:10.490
In this case, three, it would seem.

01:37:10.490 --> 01:37:13.490
But if it's David, that's
a good five characters.

01:37:13.490 --> 01:37:16.173
But where do we put the number three?

01:37:16.173 --> 01:37:17.840
Where do you put the number five, right?

01:37:17.840 --> 01:37:20.190
This is literally all
that's inside your computer.

01:37:20.190 --> 01:37:23.430
This is all our building
blocks in front of us.

01:37:23.430 --> 01:37:25.490
So how can we-- where does the three go?

01:37:25.490 --> 01:37:26.540
Where does the five go?

01:37:26.540 --> 01:37:29.420
Well, it turns out you can solve
this in a couple of different ways.

01:37:29.420 --> 01:37:34.160
But the way humans decided to implement
strings years ago is, indeed, an array,

01:37:34.160 --> 01:37:38.960
but they added one extra byte at
the end of every such string array,

01:37:38.960 --> 01:37:41.840
just to make clear, with a
so-called sentinel value,

01:37:41.840 --> 01:37:44.480
that the string ends here.

01:37:44.480 --> 01:37:45.050
Why?

01:37:45.050 --> 01:37:47.930
So that if you have two strings
in the computer's memory like, HI!

01:37:47.930 --> 01:37:52.760
and bye, you know where the barrier is
between the exclamation point of one

01:37:52.760 --> 01:37:54.590
and the letter B in the next, right?

01:37:54.590 --> 01:37:56.000
You need some kind of delimiter.

01:37:56.000 --> 01:38:00.110
And so what really is
underneath the hood is this.

01:38:00.110 --> 01:38:04.460
When you store a string in memory, when
you type in a string-- as the user,

01:38:04.460 --> 01:38:07.040
if you type in 3 characters,
it's going to use

01:38:07.040 --> 01:38:10.280
3 plus 1 equals 4 bytes in total.

01:38:10.280 --> 01:38:14.130
If you type in David, it's going to
use 5 plus 1 equals 6 bytes in total.

01:38:14.130 --> 01:38:14.630
Why?

01:38:14.630 --> 01:38:20.210
Because C automatically adds this
special 0 at the end of the string.

01:38:20.210 --> 01:38:24.710
I've drawn it with backslash 0 because
this is how you represent 0 as a char,

01:38:24.710 --> 01:38:25.710
as a character.

01:38:25.710 --> 01:38:28.230
But this is literally
just 0, as we'll soon see.

01:38:28.230 --> 01:38:31.100
So any time there's a string
in memory, it always takes up

01:38:31.100 --> 01:38:36.197
one more byte than you, yourself,
as the programmer or human typed in.

01:38:36.197 --> 01:38:38.780
In fact, if we convert this
again, just for discussion's sake,

01:38:38.780 --> 01:38:41.572
to those integers, what's literally
stored in the computer's memory

01:38:41.572 --> 01:38:45.170
is going to be 72, 73, 33, and now a 0.

01:38:45.170 --> 01:38:48.240
And the computer, because of
C and how it was invented,

01:38:48.240 --> 01:38:51.350
it's just smart enough to know
that when you print out a string,

01:38:51.350 --> 01:38:54.530
it prints out every
character until it sees a 0,

01:38:54.530 --> 01:38:56.150
and then it just stops printing.

01:38:56.150 --> 01:38:58.470
In particular, printf
knows how this works.

01:38:58.470 --> 01:39:02.050
And this is why printf
knows when to stop printing.

01:39:02.050 --> 01:39:03.800
Decimal numbers are
not that enlightening.

01:39:03.800 --> 01:39:05.940
We'll generally write
the characters like this.

01:39:05.940 --> 01:39:09.350
And again, backslash 0 is
just special symbology.

01:39:09.350 --> 01:39:13.190
It's what the programmer types to make
clear that you're not saying, HI!, 0.

01:39:13.190 --> 01:39:15.980
You're saying HI!, and
then it's a special 0.

01:39:15.980 --> 01:39:20.887
Specifically, it is eight
0 bits that indicate

01:39:20.887 --> 01:39:22.220
that it's the end of the string.

01:39:22.220 --> 01:39:26.330
Technically, that backslash zero, if
you want to be fancy, it's called NUL,

01:39:26.330 --> 01:39:27.320
N-U-L.

01:39:27.320 --> 01:39:30.320
And it turns out, you've seen this
before, though we didn't call it out.

01:39:30.320 --> 01:39:33.230
Here's that same ASCII chart
from the past couple of weeks.

01:39:33.230 --> 01:39:39.080
If I highlight this, what is
decimal number 0 mapping to?

01:39:39.080 --> 01:39:42.830
NUL, which is just programmer speak
for the special null character.

01:39:42.830 --> 01:39:46.550
All 0 bits that means
the string ends here.

01:39:46.550 --> 01:39:48.510
This all happens automatically for you.

01:39:48.510 --> 01:39:53.420
You do not need to create these
null characters or these zeros.

01:39:53.420 --> 01:40:00.030
Any questions then, on this
implementation thus far?

01:40:00.030 --> 01:40:01.820
Any questions here?

01:40:01.820 --> 01:40:02.320
No?

01:40:02.320 --> 01:40:03.195
Well, let me do this.

01:40:03.195 --> 01:40:05.310
Let me go back to VS Code in a second.

01:40:05.310 --> 01:40:07.770
And let's actually corroborate
this with some code.

01:40:07.770 --> 01:40:10.830
Let me go ahead and create
a small program called hi.c.

01:40:10.830 --> 01:40:12.070
And how about we do this?

01:40:12.070 --> 01:40:14.550
Let me include stdio.h.

01:40:14.550 --> 01:40:18.670
Let me include-- let me type
out int main void, as always.

01:40:18.670 --> 01:40:20.910
And now let me do something
simple and kind of bad,

01:40:20.910 --> 01:40:24.960
but char c1 equals quote
unquote, H, in single quotes.

01:40:24.960 --> 01:40:28.590
Char c2 equals quote
unquote, I, in single quotes.

01:40:28.590 --> 01:40:32.830
And lastly, char c3 equals
exclamation point, in single quotes.

01:40:32.830 --> 01:40:34.500
And now, let me just print this out.

01:40:34.500 --> 01:40:36.960
I can't use %s because
that is not a string.

01:40:36.960 --> 01:40:40.290
That's literally three chars, because
that's the design decision I made.

01:40:40.290 --> 01:40:41.430
But I could do this--

01:40:41.430 --> 01:40:48.600
%c, %c, %c, which we haven't seen
before, but %s is string, %i is int,

01:40:48.600 --> 01:40:51.060
%c is, indeed, char.

01:40:51.060 --> 01:40:54.150
So let me put a backslash n
at the end for cleanliness,

01:40:54.150 --> 01:40:56.280
and now do, c1, c2, c3.

01:40:56.280 --> 01:41:00.430
So this is like a char-based
version of printing string.

01:41:00.430 --> 01:41:01.650
So let me make hi.

01:41:01.650 --> 01:41:05.880
And then let me do ./hi, and it
looks like I used printf with %s.

01:41:05.880 --> 01:41:09.750
But I did things very manually by
printing out each individual character.

01:41:09.750 --> 01:41:11.700
What's cool now,
though, is that once you

01:41:11.700 --> 01:41:15.270
know that characters are just numbers
and strings are just characters,

01:41:15.270 --> 01:41:16.560
you can kind of poke around.

01:41:16.560 --> 01:41:21.970
Let me change all three
placeholders to %i instead.

01:41:21.970 --> 01:41:23.860
And this is totally fine, too.

01:41:23.860 --> 01:41:26.310
Let me rerun this, make hi.

01:41:26.310 --> 01:41:31.570
Actually, let me make one
change, just so we can see this.

01:41:31.570 --> 01:41:37.710
Let me add spaces, just for aesthetics'
sake, let me do make hi, ./hi, Enter,

01:41:37.710 --> 01:41:40.350
and voila, like now, you can
actually see the numbers,

01:41:40.350 --> 01:41:44.085
that I claimed back in week zero, were
in fact happening underneath the hood.

01:41:44.085 --> 01:41:45.960
Well, this is not how
you would make strings.

01:41:45.960 --> 01:41:49.457
It'd be incredibly tedious to have three
variables for three letter words, five

01:41:49.457 --> 01:41:50.790
variables for five letter words.

01:41:50.790 --> 01:41:52.998
We've been using, of course,
strings since last week,

01:41:52.998 --> 01:41:54.450
so let's do that instead.

01:41:54.450 --> 01:41:59.370
String s equals quote
unquote, double quotes "HI!"

01:41:59.370 --> 01:42:02.520
For this, though, because of
these training wheels,

01:42:02.520 --> 01:42:04.560
I need to include the CS50 library.

01:42:04.560 --> 01:42:06.580
But we'll come back to
that in the coming weeks.

01:42:06.580 --> 01:42:10.530
But for now, I'm going to go ahead and
create a string s called quote unquote,

01:42:10.530 --> 01:42:11.580
"HI!"

01:42:11.580 --> 01:42:14.760
And now I'm going to change
this to be my familiar %s,

01:42:14.760 --> 01:42:17.610
and now just print out s itself.

01:42:17.610 --> 01:42:20.430
This, of course, is the same
thing as last week, ./hi,

01:42:20.430 --> 01:42:24.750
gives me the exact same thing, but now,
we're dealing, of course, with strings.

01:42:24.750 --> 01:42:27.610
But how can we see a little beyond that?

01:42:27.610 --> 01:42:28.810
Well, how about this?

01:42:28.810 --> 01:42:31.530
Let's poke around further
with today's primitives.

01:42:31.530 --> 01:42:35.580
Even though s is a string, I could
technically print out its first

01:42:35.580 --> 01:42:39.000
character with %c by doing s[0].

01:42:39.000 --> 01:42:43.110
I could technically print out its
second character with %c by doing s[1].

01:42:43.110 --> 01:42:47.820
I could print out its third character
with %c and printing out s[2].

01:42:47.820 --> 01:42:50.430
So again, this just derives
logically from my understanding

01:42:50.430 --> 01:42:52.770
now that strings are
arrays, as you note.

01:42:52.770 --> 01:42:54.540
Let me do make--

01:42:54.540 --> 01:42:57.300
let me do make hi, ./hi.

01:42:57.300 --> 01:43:00.760
And no visual change, but I'm
just kind of now tinkering around.

01:43:00.760 --> 01:43:03.400
And in fact, if you're really
curious, let me do this.

01:43:03.400 --> 01:43:06.870
Let me change these
back to i, back to i--

01:43:06.870 --> 01:43:08.250
oops, back to i.

01:43:08.250 --> 01:43:11.310
And let me add a fourth one
because if I'm really curious now,

01:43:11.310 --> 01:43:14.490
let's see what's in s[3].

01:43:14.490 --> 01:43:16.020
This is the fourth byte.

01:43:16.020 --> 01:43:18.990
And even though the
string itself is H-I,

01:43:18.990 --> 01:43:21.840
I think we can corroborate
this whole null thing.

01:43:21.840 --> 01:43:26.248
Make hi, ./hi, Enter, and there it is.

01:43:26.248 --> 01:43:28.290
You could have done this
last week, if you really

01:43:28.290 --> 01:43:29.580
wanted to geek out on strings.

01:43:29.580 --> 01:43:33.060
But for now, it's just revealing
what's going on underneath the hood.

01:43:33.060 --> 01:43:36.480
Questions then, on
what these strings are?

01:43:36.480 --> 01:43:37.498
Yeah?

01:43:37.498 --> 01:43:41.293
AUDIENCE: [INAUDIBLE]

01:43:41.293 --> 01:43:42.960
DAVID MALAN: Why do we need the bracket?

01:43:42.960 --> 01:43:45.430
AUDIENCE: [INAUDIBLE]

01:43:45.430 --> 01:43:47.180
DAVID MALAN: Why do
you not need brackets?

01:43:47.180 --> 01:43:47.780
Good question.

01:43:47.780 --> 01:43:51.620
Why do I not need brackets on line 6?

01:43:51.620 --> 01:43:53.300
Because s is a string.

01:43:53.300 --> 01:43:56.930
We'll see in a couple of
weeks that s is, essentially,

01:43:56.930 --> 01:44:00.200
implemented underneath the
hood, indeed, as an array,

01:44:00.200 --> 01:44:02.240
but that happens automatically for you.

01:44:02.240 --> 01:44:06.800
You can treat s as just a variable
name without square brackets.

01:44:06.800 --> 01:44:09.500
You will use square brackets
when you have arrays of ints

01:44:09.500 --> 01:44:13.730
or you manually create arrays of chars
or doubles or floats or anything else.

01:44:13.730 --> 01:44:14.900
But strings are special.

01:44:14.900 --> 01:44:15.440
Why?

01:44:15.440 --> 01:44:19.190
I mean, every program you write seems
to use strings, text in some form.

01:44:19.190 --> 01:44:21.930
We're humans; we like text,
not just numbers and such.

01:44:21.930 --> 01:44:25.910
So this is just treated a little
specially in C and many other languages

01:44:25.910 --> 01:44:28.580
as well.

01:44:28.580 --> 01:44:31.170
Other questions on this here?

01:44:31.170 --> 01:44:31.670
No?

01:44:31.670 --> 01:44:33.530
Let's add then, one
other string to the mix.

01:44:33.530 --> 01:44:36.290
So instead of just saying, HI!,
why don't we consider a version

01:44:36.290 --> 01:44:38.660
of the program that
says both, HI! and BYE!.

01:44:38.660 --> 01:44:41.420
And I claim now that
that backslash zero,

01:44:41.420 --> 01:44:44.270
that null character is going
to be ever more important now

01:44:44.270 --> 01:44:46.820
if we've got two strings
in memory, so that C knows

01:44:46.820 --> 01:44:48.570
how to distinguish one from the other.

01:44:48.570 --> 01:44:51.487
So let me go ahead and just get rid
of these two lines for the moment.

01:44:51.487 --> 01:44:55.430
Let me recreate string s equals,
quote unquote double quotes, "HI!"

01:44:55.430 --> 01:44:56.780
Let me give myself another one.

01:44:56.780 --> 01:44:59.905
And because I'm just playing around,
I'll choose very short variable names.

01:44:59.905 --> 01:45:04.410
String t equals quote unquote, "BYE!"

01:45:04.410 --> 01:45:06.470
And then let me just
print them both out.

01:45:06.470 --> 01:45:11.300
Let me go ahead and print
out %s, backslash n, comma s,

01:45:11.300 --> 01:45:16.910
and then printf %s
backslash n, and then t.

01:45:16.910 --> 01:45:19.970
So very simple demonstration
of just these two variables.

01:45:19.970 --> 01:45:26.090
Make hi, ./hi, and of course, it prints
out two lines, one after the other.

01:45:26.090 --> 01:45:27.980
What's actually going
on underneath the hood?

01:45:27.980 --> 01:45:29.510
Well, let's go back to
the computer's memory.

01:45:29.510 --> 01:45:32.160
HI!, I think, is going to be,
I claim, pretty much the same.

01:45:32.160 --> 01:45:36.170
So s, I'll claim, is in the top
left, followed by the backslash zero.

01:45:36.170 --> 01:45:40.035
And that's important now because BYE!
probably is going to end up there.

01:45:40.035 --> 01:45:43.160
And visually, it wraps just by nature
of how I've drawn this grid of bytes,

01:45:43.160 --> 01:45:44.330
but it's contiguous.

01:45:44.330 --> 01:45:46.340
B-Y-E-!

01:45:46.340 --> 01:45:51.470
null, A.K.A. backslash zero,
this is now helpful to printf

01:45:51.470 --> 01:45:55.550
because now printf knows
where one begins and ends

01:45:55.550 --> 01:45:58.580
by way of that special null character.

01:45:58.580 --> 01:46:00.230
But we can poke around now, too.

01:46:00.230 --> 01:46:01.620
What else can I do here?

01:46:01.620 --> 01:46:02.840
How about this?

01:46:02.840 --> 01:46:08.870
How about I go into my code here,
back to VS Code, and let me go ahead

01:46:08.870 --> 01:46:13.790
and say something like, well, if
I've got two of these strings,

01:46:13.790 --> 01:46:15.410
you know, let's put them in an array.

01:46:15.410 --> 01:46:20.520
Let's kind of do this sort of arrays in
arrays, sort of inception-style here.

01:46:20.520 --> 01:46:23.060
So string words[2].

01:46:23.060 --> 01:46:25.100
So give me an array
of two strings is what

01:46:25.100 --> 01:46:28.100
I'm saying here in code, even though
we've not done it with strings yet.

01:46:28.100 --> 01:46:29.270
We only did it with ints.

01:46:29.270 --> 01:46:30.770
And now let me do this.

01:46:30.770 --> 01:46:35.480
The first word A.K.A. words[0]
will equal, as before, HI!

01:46:35.480 --> 01:46:40.940
And now words[1] will
equal quote unquote, "BYE!"

01:46:40.940 --> 01:46:43.760
And now I've done the exact
same thing, but again, I'm

01:46:43.760 --> 01:46:48.650
just avoiding having s, t, q, r, and all
these different variables in my code.

01:46:48.650 --> 01:46:52.790
I just now am treating them as
one single array of strings.

01:46:52.790 --> 01:46:54.750
How do I change my code down here?

01:46:54.750 --> 01:46:57.380
Well, if I want to print the
first word, I do words[0].

01:46:57.380 --> 01:46:59.900
And if I want to print the
second word, I do words[1].

01:46:59.900 --> 01:47:02.088
This is not a useful
exercise at the moment

01:47:02.088 --> 01:47:04.130
because I'm just making
my code more complicated.

01:47:04.130 --> 01:47:06.830
But again, it allows us to
poke around and see what's

01:47:06.830 --> 01:47:08.690
going on because there is that HI!

01:47:08.690 --> 01:47:09.530
and BYE!.

01:47:09.530 --> 01:47:10.700
But watch this.

01:47:10.700 --> 01:47:14.670
If I really want to be
cool, I can do this.

01:47:14.670 --> 01:47:24.380
Let's print out %c, %c, %c, backslash
n, and then here, %c, %c, %c, %c,

01:47:24.380 --> 01:47:25.700
so four of those.

01:47:25.700 --> 01:47:28.430
And now here's where
things get interesting.

01:47:28.430 --> 01:47:30.620
Words is an array of strings.

01:47:30.620 --> 01:47:33.400
Again, if I may, what's a string?

01:47:33.400 --> 01:47:35.060
An array of characters.

01:47:35.060 --> 01:47:36.790
So just use the same logic.

01:47:36.790 --> 01:47:41.110
If words is an array of strings, you
get at the first string with words[0].

01:47:41.110 --> 01:47:44.530
How do you get at the first
character in the first string?

01:47:44.530 --> 01:47:52.150
words[0][0], then words[0][1],
and lastly, words[0][2].

01:47:52.150 --> 01:47:57.460
And now down here, words[1][0],
the first character is there.

01:47:57.460 --> 01:48:00.400
words[1][1], the second character is here.

01:48:00.400 --> 01:48:03.190
words[1][2], the third character is here--

01:48:03.190 --> 01:48:04.720
whoops-- third character's here.

01:48:04.720 --> 01:48:07.898
And words[1][3], the fourth
character is here.

01:48:07.898 --> 01:48:09.190
This is not how people program.

01:48:09.190 --> 01:48:10.840
This is only for demonstration's sake.

01:48:10.840 --> 01:48:13.060
My God, it's so tedious
and verbose already.

01:48:13.060 --> 01:48:20.410
But if I make hi now, ./hi, now,
I'm manually reinventing %s,

01:48:20.410 --> 01:48:22.990
if I forgot it existed, using %c alone.

01:48:22.990 --> 01:48:25.900
But you can indeed manipulate
arrays in this way.

01:48:25.900 --> 01:48:28.300
But because strings are
arrays of characters,

01:48:28.300 --> 01:48:32.200
you can manipulate
strings in this way too.

01:48:32.200 --> 01:48:34.675
Any question now on this syntax?

01:48:37.210 --> 01:48:38.800
Any questions here?

01:48:38.800 --> 01:48:39.460
No?

01:48:39.460 --> 01:48:39.970
No?

01:48:39.970 --> 01:48:42.070
All right, well, let's
go ahead and propose

01:48:42.070 --> 01:48:45.830
that we solve a couple of other
problems we might not have as before.

01:48:45.830 --> 01:48:49.150
But first, a quick visual of what's
been going on underneath the hood here.

01:48:49.150 --> 01:48:52.420
If here, again, is where we left
off on the screen, HI! and BYE!

01:48:52.420 --> 01:48:56.470
back to back, here is really
how I just treated these things.

01:48:56.470 --> 01:49:00.880
s bracket 0, 1, 2, 3 and
then t 0, 1, 2, 3, 4.

01:49:00.880 --> 01:49:04.840
But really, once I put them in an
array, the picture becomes this.

01:49:04.840 --> 01:49:07.030
Words[0] is the whole HI!.

01:49:07.030 --> 01:49:08.680
Words[1] is the whole BYE!.

01:49:08.680 --> 01:49:11.470
But if I really get into
the weeds and start indexing

01:49:11.470 --> 01:49:14.980
into individual characters in
those strings, all I'm using

01:49:14.980 --> 01:49:20.710
is new syntax in order to
represent these same values here.

01:49:20.710 --> 01:49:28.710
Questions then, on these
representations before we forge ahead?

01:49:28.710 --> 01:49:29.430
No?

01:49:29.430 --> 01:49:30.030
Yeah?

01:49:30.030 --> 01:49:33.390
AUDIENCE: Does the new line
character not [INAUDIBLE]?

01:49:33.390 --> 01:49:36.030
DAVID MALAN: Does the new line
character-- say that once more?

01:49:36.030 --> 01:49:38.597
AUDIENCE: Does the new line
character take up any space?

01:49:38.597 --> 01:49:40.180
DAVID MALAN: Ah, really good question.

01:49:40.180 --> 01:49:42.730
Does the new line character
take up any space?

01:49:42.730 --> 01:49:45.340
It does, so far as printf is concerned.

01:49:45.340 --> 01:49:48.790
But I'm not storing the
backslash n in my strings,

01:49:48.790 --> 01:49:53.460
printf is being manually
handed that thing instead.

01:49:53.460 --> 01:49:55.520
All right, so let's go
ahead then and consider

01:49:55.520 --> 01:49:58.970
how we might solve some problems that
have arisen now with these strings,

01:49:58.970 --> 01:50:00.680
as follows here.

01:50:00.680 --> 01:50:02.760
Suppose I-- let's do this.

01:50:02.760 --> 01:50:04.400
Let me go back to VS Code here.

01:50:04.400 --> 01:50:09.980
And let me go ahead and open up a
new file called, how about, length.c.

01:50:09.980 --> 01:50:12.680
And let's consider for a moment
how I might actually figure out

01:50:12.680 --> 01:50:16.130
what the length of a string is, which
is distinct from the length of an array.

01:50:16.130 --> 01:50:19.680
I claimed earlier, you cannot figure out
dynamically what the length of an array

01:50:19.680 --> 01:50:20.180
is.

01:50:20.180 --> 01:50:24.020
But I can figure out the length
of a string, specifically, because

01:50:24.020 --> 01:50:26.960
of this implementation detail
of that null character.

01:50:26.960 --> 01:50:28.500
So let me go ahead and do this.

01:50:28.500 --> 01:50:31.940
Let me include cs50.h in
this second program here.

01:50:31.940 --> 01:50:35.090
Let me include stdio.h, as before.

01:50:35.090 --> 01:50:38.120
And let me do this, int main void--

01:50:38.120 --> 01:50:40.970
and the first thing I'll do is
just get a string from the user.

01:50:40.970 --> 01:50:43.250
I'll ask the user, as
always, for their name.

01:50:43.250 --> 01:50:48.170
So I'll call get_string, and say, what's
your name, question mark, as always.

01:50:48.170 --> 01:50:51.620
And then down here, if I want to
figure out the length of this string

01:50:51.620 --> 01:50:56.210
and print the length out
on the screen, well, I

01:50:56.210 --> 01:50:58.465
can kind of do this similar
in spirit to the average,

01:50:58.465 --> 01:50:59.840
where I'm accumulating something.

01:50:59.840 --> 01:51:02.600
Let me go ahead and initialize n to 0.

01:51:02.600 --> 01:51:05.120
Let me give myself--

01:51:05.120 --> 01:51:07.035
it's not a for loop
because I don't have a--

01:51:07.035 --> 01:51:08.660
I don't know in advance how long it is.

01:51:08.660 --> 01:51:09.980
But what if I do this?

01:51:09.980 --> 01:51:20.600
While the value at name[n]
does not equal '\0'--

01:51:20.600 --> 01:51:23.390
crazy syntax at the moment,
but it's just the culmination

01:51:23.390 --> 01:51:25.590
of these various building blocks.

01:51:25.590 --> 01:51:28.970
Let me just finish
the thought here, n++.

01:51:28.970 --> 01:51:33.656
And then down here, let's just
print out, with printf and %i,

01:51:33.656 --> 01:51:38.930
that value of n. So I claim this is
going to show me the length of any

01:51:38.930 --> 01:51:43.220
string I type in, whether it's hi
or bye or David or anything else.

01:51:43.220 --> 01:51:45.410
I initialize a variable
to zero, and that's good

01:51:45.410 --> 01:51:47.535
because that's where you
start counting in general.

01:51:47.535 --> 01:51:50.990
While name[n] does not
equal backslash zero.

01:51:50.990 --> 01:51:51.930
What is this saying?

01:51:51.930 --> 01:51:55.580
Well, if name is the string the user
typed in-- and name is just an array,

01:51:55.580 --> 01:51:56.460
as you noted--

01:51:56.460 --> 01:51:59.390
then name[0] is going to
be the first character.

01:51:59.390 --> 01:52:02.750
And I'm asking the question, well,
does the first character not equal

01:52:02.750 --> 01:52:03.680
backslash zero?

01:52:03.680 --> 01:52:08.750
And if I type in David, D, it's not,
so I keep going and I add 1 to n.

01:52:08.750 --> 01:52:10.750
Then I'm going to check name[1].

01:52:10.750 --> 01:52:13.895
Well, if I typed in David,
name[1] is going to be A.

01:52:13.895 --> 01:52:18.020
A does not equal backslash zero, and
so it's going to go again and again

01:52:18.020 --> 01:52:18.740
and again.

01:52:18.740 --> 01:52:23.090
But five steps in total later,
it's going to get to the byte after

01:52:23.090 --> 01:52:26.480
D-A-V-I-D, realize, wait a
minute, that is a backslash zero.

01:52:26.480 --> 01:52:29.750
The loop finishes, and I
print out the total length.

01:52:29.750 --> 01:52:33.050
Arrays, in general, do not
have this null character.

01:52:33.050 --> 01:52:34.910
However, strings do.

01:52:34.910 --> 01:52:38.150
Again, strings are special versus
all of the other data types

01:52:38.150 --> 01:52:39.590
we've talked about thus far.

01:52:39.590 --> 01:52:43.220
But how could I, for
instance, do this differently?

01:52:43.220 --> 01:52:47.220
Well, let's actually factor this out
as a function, as I've commonly done.

01:52:47.220 --> 01:52:50.540
But rather than implement
it myself, you know what?

01:52:50.540 --> 01:52:54.140
It turns out what's nice
about strings being so common,

01:52:54.140 --> 01:52:57.260
there are many other people who
have solved these problems before.

01:52:57.260 --> 01:53:00.290
And in fact, there's a
whole string library in C.

01:53:00.290 --> 01:53:04.190
It is used by way of a
header file called string.h.

01:53:04.190 --> 01:53:08.400
And what string.h is, is a library
of string-related functions.

01:53:08.400 --> 01:53:10.760
In fact, you can see
in CS50's manual pages

01:53:10.760 --> 01:53:16.217
for C, the string.h functions, at least
those that we recommend as most useful,

01:53:16.217 --> 01:53:18.050
and in particular, if
you poke around there,

01:53:18.050 --> 01:53:20.290
you'll see that there's
a function called strlen.

01:53:20.290 --> 01:53:22.055
It means string length.

01:53:22.055 --> 01:53:24.680
It was named very succinctly,
just because it's a little easier

01:53:24.680 --> 01:53:25.850
to type than string length.

01:53:25.850 --> 01:53:28.800
But strlen tells you
the length of a string.

01:53:28.800 --> 01:53:30.990
So how might I use this in my code here?

01:53:30.990 --> 01:53:34.020
Well, it turns out, I can
simplify this quite a bit.

01:53:34.020 --> 01:53:37.700
Let me get rid of my loop,
get rid of my counting

01:53:37.700 --> 01:53:40.880
manually, and do something
like this-- int n

01:53:40.880 --> 01:53:45.630
equals strlen of the human's name, name.

01:53:45.630 --> 01:53:49.430
And now I'll just use printf,
as before, with %i backslash n,

01:53:49.430 --> 01:53:51.290
and output the value of n.

01:53:51.290 --> 01:53:54.380
But there's a bug at the moment.

01:53:54.380 --> 01:53:58.480
What have I forgotten to do?

01:53:58.480 --> 01:54:01.670
Yeah, I have to include the header
file at the top of the screen,

01:54:01.670 --> 01:54:03.260
so let me-- at the top of the code.

01:54:03.260 --> 01:54:07.640
So let me also include
string.h at the top of my file,

01:54:07.640 --> 01:54:10.970
so that C knows that,
in fact, strlen exists.

01:54:10.970 --> 01:54:14.170
Let me go ahead and
make length, as before.

01:54:14.170 --> 01:54:18.670
./length-- or actually, really for
the first time, what's your name?

01:54:18.670 --> 01:54:22.360
D-A-V-I-D. And hopefully,
I'm going to see, in fact, 5.

01:54:22.360 --> 01:54:26.950
By contrast, if I run it again
and type in HI!, now I see three.

01:54:26.950 --> 01:54:29.785
So strlen is just one of the
functions in that library.

01:54:29.785 --> 01:54:30.910
And there are so many more.

01:54:30.910 --> 01:54:33.700
In fact, yet another library that
might be useful moving forward

01:54:33.700 --> 01:54:37.570
is this one, ctype,
which relates to character

01:54:37.570 --> 01:54:40.580
types and lots of functions
therein that can be useful.

01:54:40.580 --> 01:54:43.690
For instance, if you review its
documentation in the manual pages

01:54:43.690 --> 01:54:46.930
online, you'll see that
there are functions via which

01:54:46.930 --> 01:54:49.460
we can solve problems like this.

01:54:49.460 --> 01:54:52.480
Let me go ahead and propose here--

01:54:52.480 --> 01:54:53.680
let me see.

01:54:53.680 --> 01:54:59.080
Let's do an example here involving--

01:54:59.080 --> 01:55:03.250
how about checking if something
is uppercase or lowercase,

01:55:03.250 --> 01:55:06.700
and converting it to uppercase only.

01:55:06.700 --> 01:55:10.810
Let me go back to VS Code, and
code a program called uppercase.c.

01:55:10.810 --> 01:55:15.220
In this file, I'm going to start by
including now, as always, cs50.h.

01:55:15.220 --> 01:55:17.710
I'm going to include stdio.h.

01:55:17.710 --> 01:55:21.670
And I'm going to add one
other to the mix, which

01:55:21.670 --> 01:55:26.230
is string.h now too, so I can access
the length of things as needed.

01:55:26.230 --> 01:55:28.570
Int main void comes next.

01:55:28.570 --> 01:55:30.460
And then within my main
function, I'm going

01:55:30.460 --> 01:55:32.230
to go ahead and declare
a string called s.

01:55:32.230 --> 01:55:34.240
I'm going to call get_string, as before.

01:55:34.240 --> 01:55:38.170
And I'm going to go ahead and just ask
the user for a string called before.

01:55:38.170 --> 01:55:39.670
I want to do a before and after.

01:55:39.670 --> 01:55:41.350
Whatever the user types in is before.

01:55:41.350 --> 01:55:44.770
But I want to force everything
to uppercase, thereafter.

01:55:44.770 --> 01:55:48.740
Let me now, in this loop here, do this.

01:55:48.740 --> 01:55:53.800
Let me printf quote unquote, "After,"
just so we can see this on the screen.

01:55:53.800 --> 01:56:02.440
And let me do for int i gets 0,
i is less than strlen of s, i++.

01:56:02.440 --> 01:56:03.610
What am I about to do?

01:56:03.610 --> 01:56:06.190
I'm about to iterate over
every character in the string

01:56:06.190 --> 01:56:11.230
from left to right, from 0 on up to,
but not through, the length of s.

01:56:11.230 --> 01:56:13.990
And how do I check if
something is lowercase,

01:56:13.990 --> 01:56:16.990
so that I can actually
force it to uppercase?

01:56:16.990 --> 01:56:19.630
Well, it turns out, I
could do this literally.

01:56:19.630 --> 01:56:27.436
If the character in s at location i
is greater than or equal to little a,

01:56:27.436 --> 01:56:31.780
ampersand, ampersand, which means
and instead of or, which we saw

01:56:31.780 --> 01:56:37.930
in the past, s[i] is less than
or equal to little z, that means,

01:56:37.930 --> 01:56:41.800
logically in English, that
this is indeed lowercase.

01:56:41.800 --> 01:56:44.830
How do I now convert it to
uppercase, this character?

01:56:44.830 --> 01:56:48.160
Well, I could just literally
print out the same character.

01:56:48.160 --> 01:56:52.280
But that would not be the answer here
because that's not changing the value.

01:56:52.280 --> 01:56:54.470
But what could I do instead?

01:56:54.470 --> 01:56:59.890
Well, let me actually pull up here
real fast the ASCII chart as before,

01:56:59.890 --> 01:57:03.220
and let's see if we
can't glean some insight.

01:57:03.220 --> 01:57:05.710
If I pull up the same
ASCII chart, and suppose

01:57:05.710 --> 01:57:09.790
the human has typed in a
lowercase a, that's 97.

01:57:09.790 --> 01:57:13.240
What letter-- I want to
convert it to uppercase

01:57:13.240 --> 01:57:18.660
A, so what number do I want to
convert the 97 to, per week zero?

01:57:18.660 --> 01:57:21.000
So 65, we keep coming back to that one.

01:57:21.000 --> 01:57:23.010
What if the user types in lowercase b?

01:57:23.010 --> 01:57:27.550
I want to change the 98
value to 66, and so forth.

01:57:27.550 --> 01:57:30.130
And any quick math, how
far apart are those?

01:57:30.130 --> 01:57:33.120
So it's always 32, like
uppercase to lowercase

01:57:33.120 --> 01:57:37.990
is always, wonderfully, good
design, 32 away, one from the other.

01:57:37.990 --> 01:57:39.100
So what does this mean?

01:57:39.100 --> 01:57:41.350
Well, I think we saw earlier
that underneath the hood,

01:57:41.350 --> 01:57:42.600
a char is just a number.

01:57:42.600 --> 01:57:44.340
You can certainly do arithmetic on it.

01:57:44.340 --> 01:57:46.507
And here, again, if you
understand these lower level

01:57:46.507 --> 01:57:48.180
primitives, what if I do this?

01:57:48.180 --> 01:57:53.940
Whatever s[i] is, if I know on
line 13 that it's lowercase,

01:57:53.940 --> 01:57:57.048
do I want to add or subtract 32?

01:57:57.048 --> 01:57:57.840
AUDIENCE: Subtract.

01:57:57.840 --> 01:58:01.910
DAVID MALAN: So I want to subtract
because I want to go from like 97 to 65

01:58:01.910 --> 01:58:06.560
or 98 to 66, so indeed, if you do
some quick math, that gives you 32.

01:58:06.560 --> 01:58:10.970
So it suffices to just treat
chars as numbers, subtract the 32,

01:58:10.970 --> 01:58:16.370
and printing it with %c, I think, will
just convert lowercase to uppercase.

01:58:16.370 --> 01:58:19.795
If you now fast forward to the real
world, Microsoft Word or Google Docs,

01:58:19.795 --> 01:58:22.670
if you've ever chosen the menu option
that forces things to uppercase

01:58:22.670 --> 01:58:24.980
or lowercase on occasion,
literally, that's

01:58:24.980 --> 01:58:26.480
what Microsoft and Google have done.

01:58:26.480 --> 01:58:29.605
They iterate over every character in
the document, check if it's lowercase,

01:58:29.605 --> 01:58:33.810
and if so, they subtract 32 from
it and show you the new value.

01:58:33.810 --> 01:58:36.650
What if, though, it is
not a lowercase letter?

01:58:36.650 --> 01:58:40.520
I think I can keep it easy and just
print out the current letter unchanged,

01:58:40.520 --> 01:58:44.850
if my goal is to simply force things
to all uppercase, and that letter,

01:58:44.850 --> 01:58:46.490
then would be s[i].

01:58:46.490 --> 01:58:50.750
So let me go ahead now and make
uppercase, hopefully, no errors.

01:58:50.750 --> 01:58:55.670
./uppercase, and I'll now type
in David with an uppercase D,

01:58:55.670 --> 01:58:57.120
but lowercase everything else.

01:58:57.120 --> 01:59:00.020
But now the after version is DAVID--

01:59:00.020 --> 01:59:01.190
an aesthetic bug.

01:59:01.190 --> 01:59:04.400
Notice here, I forgot to include,
just for prettiness sake,

01:59:04.400 --> 01:59:05.930
a backslash n at the end.

01:59:05.930 --> 01:59:07.640
No problem, I'll add that.

01:59:07.640 --> 01:59:08.870
Let me fix my mistake.

01:59:08.870 --> 01:59:12.050
Make uppercase, ./uppercase, Enter.

01:59:12.050 --> 01:59:14.240
D-A-V-I-D, Enter, and voila.

01:59:14.240 --> 01:59:16.820
And I deliberately added
another space after,

01:59:16.820 --> 01:59:19.130
just so they would line up
pretty, even though before

01:59:19.130 --> 01:59:22.070
and after have different
numbers of letters.

01:59:22.070 --> 01:59:25.630
Questions then, on this
implementation of forcing something

01:59:25.630 --> 01:59:28.380
to uppercase, which in and of
itself is not all that enlightening,

01:59:28.380 --> 01:59:33.990
but is representative now of how you
can leverage these low level primitives.

01:59:33.990 --> 01:59:35.880
Question?

01:59:35.880 --> 01:59:36.380
No?

01:59:36.380 --> 01:59:38.633
All right, well, this
honestly is tedious.

01:59:38.633 --> 01:59:40.550
My God, like does
Microsoft, Google, everyone,

01:59:40.550 --> 01:59:43.550
have to literally write out this
code just to do something simple?

01:59:43.550 --> 01:59:46.310
Well, no, that's, again, why
we have things like libraries.

01:59:46.310 --> 01:59:49.220
And increasingly now, for problem
sets, projects, and beyond,

01:59:49.220 --> 01:59:52.040
well, you just use libraries
more often off-the-shelf

01:59:52.040 --> 01:59:55.940
so as to solve problems that, surely,
other people have had before you.

01:59:55.940 --> 01:59:59.570
So how can I now use
this library, ctype.h?

01:59:59.570 --> 02:00:01.320
Well, let me go back into my code.

02:00:01.320 --> 02:00:05.090
Let me include this among
my header files here.

02:00:05.090 --> 02:00:08.030
Just so I can skim things easily,
I tend to alphabetize my headers.

02:00:08.030 --> 02:00:11.238
But that's not strictly necessary, but
it allows me, at a glance, to realize,

02:00:11.238 --> 02:00:13.400
did I or did I not
include something I need?

02:00:13.400 --> 02:00:15.570
Now, let me go ahead and do this.

02:00:15.570 --> 02:00:20.390
It turns out if you read the
documentation for the ctype library,

02:00:20.390 --> 02:00:24.710
there's a function,
wonderfully called, if islower,

02:00:24.710 --> 02:00:28.910
that takes in a character as its
argument, essentially, so s[i].

02:00:28.910 --> 02:00:32.182
And if that returns true, a
Boolean value, if you will,

02:00:32.182 --> 02:00:33.890
well, I'm going to
force it to uppercase.

02:00:33.890 --> 02:00:36.560
But I don't have to
do this math anymore.

02:00:36.560 --> 02:00:40.610
Turns out, in the ctype library,
there's also a function called toupper

02:00:40.610 --> 02:00:43.130
that takes a character
as input, like s[i],

02:00:43.130 --> 02:00:45.060
and it just does the math for you.

02:00:45.060 --> 02:00:47.270
So that you can abstract
away the 32 thing,

02:00:47.270 --> 02:00:50.400
and just know that someone else
has solved that problem for you.

02:00:50.400 --> 02:00:53.030
Otherwise, I can leave my
code unchanged down below

02:00:53.030 --> 02:00:55.200
because I'm not changing anything else.

02:00:55.200 --> 02:01:00.410
So if I do make uppercase now,
and then ./uppercase, D-a-v-i-d,

02:01:00.410 --> 02:01:03.710
with just a capital D,
and now it still works.

02:01:03.710 --> 02:01:06.890
But if you read the documentation
further, it turns out that toupper

02:01:06.890 --> 02:01:07.520
is smart.

02:01:07.520 --> 02:01:10.220
If you pass in a character to
toupper that's lowercase,

02:01:10.220 --> 02:01:13.040
it obviously converts it to
uppercase by doing that math.

02:01:13.040 --> 02:01:17.240
But if you pass in a character to
toupper that's already uppercase,

02:01:17.240 --> 02:01:21.540
the documentation you would see tells
you that it leaves it unchanged.

02:01:21.540 --> 02:01:23.910
So I can tighten all of this up.

02:01:23.910 --> 02:01:25.880
I can get rid of the whole else.

02:01:25.880 --> 02:01:29.150
I can get rid of the whole
if, and arguably now,

02:01:29.150 --> 02:01:33.620
implement a program that's just
as correct, but better designed.

02:01:33.620 --> 02:01:34.250
Why?

02:01:34.250 --> 02:01:38.000
Fewer lines of code, easier to read,
lower probability of mistakes,

02:01:38.000 --> 02:01:39.740
assuming the library is correct.

02:01:39.740 --> 02:01:43.160
It just makes it easier and
faster for me, now, to write code.

02:01:43.160 --> 02:01:47.960
So if I now do, one last time,
make uppercase, Enter, ./uppercase,

02:01:47.960 --> 02:01:50.190
and type in my name, still working.

02:01:50.190 --> 02:01:53.810
But now notice, we've whittled this
down to far fewer lines of code,

02:01:53.810 --> 02:01:57.740
albeit, using now this
additional library.

02:01:57.740 --> 02:02:00.140
Questions then on how we did this?

02:02:03.930 --> 02:02:06.230
Well, even though this
code, I daresay, is correct,

02:02:06.230 --> 02:02:09.120
it's not necessarily
well-designed just yet.

02:02:09.120 --> 02:02:12.590
In fact, there's one line
of code, one function

02:02:12.590 --> 02:02:14.690
call in this current
implementation that's

02:02:14.690 --> 02:02:17.900
more inefficient than it needs to be.

02:02:17.900 --> 02:02:20.630
And allow me to draw your
attention to this here,

02:02:20.630 --> 02:02:24.320
line 10, wherein we're calling strlen.

02:02:24.320 --> 02:02:27.350
But we're calling it inside of
this for loop, specifically,

02:02:27.350 --> 02:02:29.000
inside of the condition.

02:02:29.000 --> 02:02:33.720
And why might that not
necessarily be the best idea?

02:02:33.720 --> 02:02:36.810
Well, is the length of the
string changing, ever?

02:02:36.810 --> 02:02:38.950
I mean, certainly not within
the span of this loop.

02:02:38.950 --> 02:02:42.840
And so here we are within our for
loop on line 10, 11, 12, and 13,

02:02:42.840 --> 02:02:45.242
asking on every iteration
that same question.

02:02:45.242 --> 02:02:46.200
What's the length of s?

02:02:46.200 --> 02:02:47.190
What's the length of s?

02:02:47.190 --> 02:02:48.330
What's the length of s?

02:02:48.330 --> 02:02:50.702
And in turn, we're
calling strlen every time,

02:02:50.702 --> 02:02:52.660
even though we're getting
back the same answer.

02:02:52.660 --> 02:02:54.960
So I daresay a better
solution here would

02:02:54.960 --> 02:02:58.230
be to maybe figure out the length
of s earlier on in my code,

02:02:58.230 --> 02:02:59.490
and maybe declare a variable.

02:02:59.490 --> 02:03:02.580
Or perhaps do something that's
syntactically a little more elegant,

02:03:02.580 --> 02:03:05.070
and in fact, a very common
design in a loop like this,

02:03:05.070 --> 02:03:07.860
would be to declare not
just one variable like i,

02:03:07.860 --> 02:03:12.060
but to actually declare a second
variable called n, for instance, where

02:03:12.060 --> 02:03:16.530
n is just some number, set
n equal to the length of s.

02:03:16.530 --> 02:03:18.900
But thereafter, inside
of this condition,

02:03:18.900 --> 02:03:24.540
instead of calling strlen of s again and
again and again, what might I now do?

02:03:24.540 --> 02:03:28.110
I could instead just
compare i against n itself,

02:03:28.110 --> 02:03:31.080
because n now will only be calculated
once when it's initialized,

02:03:31.080 --> 02:03:32.730
just as i is initialized to zero.

02:03:32.730 --> 02:03:36.000
And thereafter, we're going to be
comparing i, which is changing,

02:03:36.000 --> 02:03:37.350
against n, which will not be.

02:03:37.350 --> 02:03:40.330
So it's going to be marginally
more efficient by design.

02:03:40.330 --> 02:03:42.900
Now with that said, a
good compiler could also

02:03:42.900 --> 02:03:46.080
recognize that there is this
optimization possibility,

02:03:46.080 --> 02:03:47.100
and maybe do it for us.

02:03:47.100 --> 02:03:49.080
But for now, best to
get into the habit, best

02:03:49.080 --> 02:03:52.260
to develop the muscle memory for
making those better design decisions

02:03:52.260 --> 02:03:54.010
yourselves.

02:03:54.010 --> 02:03:56.380
Questions, then, on how we did this?

02:03:58.900 --> 02:03:59.650
No?

02:03:59.650 --> 02:04:03.050
All right, a few final
building blocks for the day.

02:04:03.050 --> 02:04:07.870
So we started by talking about those
command line arguments that clang uses,

02:04:07.870 --> 02:04:13.090
whereby, anything after the command
that you type at a prompt, be it make

02:04:13.090 --> 02:04:18.160
or clang or even cd in Linux,
any word thereafter, or something

02:04:18.160 --> 02:04:21.350
cryptic like -o is a
command line argument.

02:04:21.350 --> 02:04:22.840
It's an input to the command.

02:04:22.840 --> 02:04:26.132
It's different from a function argument
because a function argument, of course,

02:04:26.132 --> 02:04:27.280
is an input to a function.

02:04:27.280 --> 02:04:28.345
But it's the same idea.

02:04:28.345 --> 02:04:30.970
It's just different syntax after
the dollar sign at the prompt.

02:04:30.970 --> 02:04:33.880
Well, it turns out that
command line arguments

02:04:33.880 --> 02:04:37.660
are something you can now
use in your own programs

02:04:37.660 --> 02:04:41.800
by accessing words after the prompt.

02:04:41.800 --> 02:04:45.410
And let me propose that
we invent this as follows.

02:04:45.410 --> 02:04:49.540
Let me propose that we
switch back to VS Code here,

02:04:49.540 --> 02:04:53.560
and I'll open a new file
here called greet.c.

02:04:53.560 --> 02:04:56.410
So in greet.c, it's going to be
a program that very simply greets

02:04:56.410 --> 02:04:57.070
the user.

02:04:57.070 --> 02:04:59.440
Had we written this last
week, we would have done this.

02:04:59.440 --> 02:05:08.200
Include cs50.h, and then include
stdio.h, and then int main void,

02:05:08.200 --> 02:05:13.060
and then we might do something simple
like string name equals get_string,

02:05:13.060 --> 02:05:15.980
quote unquote, "What's your name?"

02:05:15.980 --> 02:05:20.020
And then we would have printed
out, as always, Hello, %s,

02:05:20.020 --> 02:05:21.490
and then plugging in that name.

02:05:21.490 --> 02:05:25.300
So this is the same program we've
implemented many times, just

02:05:25.300 --> 02:05:26.590
to make sure it works--

02:05:26.590 --> 02:05:29.140
although, nope, that's not
quite the same program.

02:05:29.140 --> 02:05:30.940
Semicolon's in the wrong place.

02:05:30.940 --> 02:05:32.960
This now is the same program.

02:05:32.960 --> 02:05:37.610
So make greet, ./greet, and I'll
type in my own name: hello, David.

02:05:37.610 --> 02:05:38.770
So we're back there.

02:05:38.770 --> 02:05:41.770
Now, what's arguably a little
annoying about this program,

02:05:41.770 --> 02:05:44.110
if I type in something
else like, Carter,

02:05:44.110 --> 02:05:48.130
Enter, I have to run the program,
wait for the prompt, type in my name,

02:05:48.130 --> 02:05:48.910
hit Enter.

02:05:48.910 --> 02:05:52.360
And that's fine, but imagine if
every program worked like this.

02:05:52.360 --> 02:05:55.415
Like make, suppose you could only
type make, then you wait for a prompt,

02:05:55.415 --> 02:05:58.540
then you type the name of the program
you want to make, then you hit Enter.

02:05:58.540 --> 02:06:01.720
Or worse, in Linux when you
have to change directories,

02:06:01.720 --> 02:06:05.263
as you might have for problem set one,
what if you had to type cd, Enter,

02:06:05.263 --> 02:06:07.930
now type the name of the folder
you want to change into, Enter--

02:06:07.930 --> 02:06:09.710
I mean, it just slows life down.

02:06:09.710 --> 02:06:11.470
And so it just gets annoying quickly.

02:06:11.470 --> 02:06:16.070
So command line arguments just let you
express your whole thought all at once.

02:06:16.070 --> 02:06:18.200
So how can I do this?

02:06:18.200 --> 02:06:22.450
Well, if I want to express the notion
of command line arguments in my code,

02:06:22.450 --> 02:06:25.640
I could do something like this.

02:06:25.640 --> 02:06:28.750
I could, for the very
first time, go up and get

02:06:28.750 --> 02:06:33.730
rid of this void, which as of today
means, this program takes no command

02:06:33.730 --> 02:06:34.780
line arguments.

02:06:34.780 --> 02:06:37.540
And I can change it to exactly this.

02:06:37.540 --> 02:06:43.490
Int argc, string argv, with brackets.

02:06:43.490 --> 02:06:44.950
Now it's cryptic, admittedly.

02:06:44.950 --> 02:06:46.150
And let me zoom in.

02:06:46.150 --> 02:06:49.300
But I think we can perhaps
infer now, what's going on.

02:06:49.300 --> 02:06:52.750
If main now does not have
void as its input, which

02:06:52.750 --> 02:06:55.600
means it takes no arguments,
surely, the spoiler

02:06:55.600 --> 02:06:59.230
here is that now main will take
command line arguments somehow.

02:06:59.230 --> 02:07:05.180
Any guesses as to what
argv is or will be?

02:07:05.180 --> 02:07:08.330
What might this represent?

02:07:08.330 --> 02:07:11.390
It's an array of strings,
right, by way of the syntax.

02:07:11.390 --> 02:07:13.223
Yeah?

02:07:13.223 --> 02:07:15.480
AUDIENCE: All the characters
will be typed out.

02:07:15.480 --> 02:07:16.050
DAVID MALAN: Exactly.

02:07:16.050 --> 02:07:18.550
It will be all of the characters,
or really all of the words

02:07:18.550 --> 02:07:19.830
that you type at the prompt.

02:07:19.830 --> 02:07:21.765
Argc, as an int, any guess?

02:07:24.360 --> 02:07:28.700
Argument count is what it generally
stands for, though technically,

02:07:28.700 --> 02:07:30.290
you could call these things anything.

02:07:30.290 --> 02:07:31.520
But this is the convention.

02:07:31.520 --> 02:07:35.780
Because I claimed earlier that arrays
don't keep track of their own length,

02:07:35.780 --> 02:07:38.930
if you want to know how many words
the human typed at the prompt

02:07:38.930 --> 02:07:41.420
after your program's
name, you have to be told,

02:07:41.420 --> 02:07:45.650
not just the array of the words,
but the length of that array.

02:07:45.650 --> 02:07:48.530
The strings, you can figure
out the length of using strlen,

02:07:48.530 --> 02:07:53.360
but you can't figure out the length of
the array of strings, the collection

02:07:53.360 --> 02:07:55.020
of words that the human typed in.

02:07:55.020 --> 02:07:56.760
So how can I now use this?

02:07:56.760 --> 02:07:59.190
Well, let me go ahead and do this.

02:07:59.190 --> 02:08:04.190
Let me go ahead and change this program
now just to be printf, quote unquote,

02:08:04.190 --> 02:08:11.630
"hello, %s\n", then argv[1].

02:08:11.630 --> 02:08:14.780
So this is not the best version
of my code yet, but it's my first.

02:08:14.780 --> 02:08:21.020
Make greet, and now let me do
./greet, David all at once.

02:08:21.020 --> 02:08:23.210
Enter, hello, David.

02:08:23.210 --> 02:08:25.820
Now let me run it
again, ./greet, Carter.

02:08:25.820 --> 02:08:27.620
Enter, hello, Carter.

02:08:27.620 --> 02:08:29.840
It's a marginal improvement,
but I don't have

02:08:29.840 --> 02:08:32.330
to wait for get_string to
prompt me to hit Enter.

02:08:32.330 --> 02:08:34.370
It's just speeding
things up, twice as fast.

02:08:34.370 --> 02:08:36.890
One less command to type in.

02:08:36.890 --> 02:08:41.390
But I deliberately did [1], but
what's the beginning of argv?

02:08:41.390 --> 02:08:42.170
It would be [0].

02:08:44.730 --> 02:08:45.780
Well, what's that?

02:08:45.780 --> 02:08:48.840
This is sometimes useful,
though for now, it's not.

02:08:48.840 --> 02:08:54.110
Suppose I recompile my code and
run this program now, greet David.

02:08:54.110 --> 02:08:58.598
Anyone want to guess what's in argv[0]?

02:08:58.598 --> 02:08:59.530
AUDIENCE: [INAUDIBLE]

02:08:59.530 --> 02:09:00.220
DAVID MALAN: Say again?

02:09:00.220 --> 02:09:01.230
AUDIENCE: Greet, hello.

02:09:01.230 --> 02:09:04.530
DAVID MALAN: Greet,
Enter, hello, ./greet.

02:09:04.530 --> 02:09:08.280
So if you want to sort of inception
style your program to figure out what

02:09:08.280 --> 02:09:11.910
its own name is, or at least how it
was executed at the command line,

02:09:11.910 --> 02:09:14.460
at the terminal, you
can look at argv[0].

02:09:14.460 --> 02:09:17.160
In general, probably not
that useful, probably better

02:09:17.160 --> 02:09:21.900
to start looking at [1], which was
the first word after the program name.

02:09:21.900 --> 02:09:25.320
And if there were more, I could
do this. How about argv[2],

02:09:25.320 --> 02:09:27.690
let me add in a second %s.

02:09:27.690 --> 02:09:29.550
Let me recompile greet.

02:09:29.550 --> 02:09:35.490
Let me do ./greet David Malan,
Enter, and that, too, now works,

02:09:35.490 --> 02:09:37.112
taking in two words at the prompt.

02:09:37.112 --> 02:09:38.820
If I really want to
be smart at this now,

02:09:38.820 --> 02:09:40.445
I could do something like this, though.

02:09:40.445 --> 02:09:44.700
How about if the count of
arguments, A.K.A. argc,

02:09:44.700 --> 02:09:49.890
equals equals 2, then assume that the
human typed in only their first name,

02:09:49.890 --> 02:09:58.440
and do printf hello comma
%s backslash n, and then argv[1].

02:09:58.440 --> 02:10:01.470
Else, if the human did
not provide exactly two

02:10:01.470 --> 02:10:04.920
arguments, the name of the
program and their own name,

02:10:04.920 --> 02:10:07.890
let's just print out a default
value, lest they forgot their name

02:10:07.890 --> 02:10:09.990
or they typed in two
names or three names.

02:10:09.990 --> 02:10:13.110
Let's just do, hello
comma world as a default.

02:10:13.110 --> 02:10:15.270
And we'll just ignore
what the human typed in.

02:10:15.270 --> 02:10:20.850
If I recompile this, make greet, I
can do ./greet and David again, Enter.

02:10:20.850 --> 02:10:24.840
Oops-- sorry, what am I missing?

02:10:24.840 --> 02:10:26.640
Yeah, so newbie mistake.

02:10:26.640 --> 02:10:30.090
Else, all right, make greet again.

02:10:30.090 --> 02:10:34.050
./greet, David, Enter,
there's my hello, David.

02:10:34.050 --> 02:10:37.870
But if I omit my name, I just get
the generic, like a default value.

02:10:37.870 --> 02:10:41.590
And if I get a little curious and I type
in both names, then I get ignored too.

02:10:41.590 --> 02:10:42.090
Why?

02:10:42.090 --> 02:10:44.880
Because I just haven't built
in support for argc of three.

02:10:44.880 --> 02:10:47.610
I could do anything I want,
but now we have access

02:10:47.610 --> 02:10:50.730
to these kinds of building blocks.

02:10:50.730 --> 02:10:52.780
All right, what else might I do here?

02:10:52.780 --> 02:10:57.660
Well, it turns out there might be some
final features for us to now execute.

02:10:57.660 --> 02:11:00.090
Notice, though, that
in C, despite what you

02:11:00.090 --> 02:11:02.820
might see in books or
online tutorials, nowadays,

02:11:02.820 --> 02:11:06.180
the two official formats
for defining a main function

02:11:06.180 --> 02:11:11.130
are either this, which we've been using
now for two plus weeks or now this,

02:11:11.130 --> 02:11:14.250
whereby, you change
the void to int argc,

02:11:14.250 --> 02:11:17.880
and then for now, string
argv, and then empty brackets.

02:11:17.880 --> 02:11:20.608
And we'll see that this, too, is
a simplification, some training

02:11:20.608 --> 02:11:21.400
wheels if you will.

02:11:21.400 --> 02:11:23.550
But for now, those are
the two forms, even

02:11:23.550 --> 02:11:26.550
though you will see in online
tutorials and even books, some people

02:11:26.550 --> 02:11:27.840
use main in different ways.

02:11:27.840 --> 02:11:30.142
These are the two now to keep in mind.

02:11:30.142 --> 02:11:32.100
And I'll note that these
command line arguments

02:11:32.100 --> 02:11:33.360
are kind of all over the place.

02:11:33.360 --> 02:11:35.590
Didn't probably expect to see
this word on the screen here.

02:11:35.590 --> 02:11:36.490
And what does it mean?

02:11:36.490 --> 02:11:37.920
Well, it turns out that
for decades-- there's

02:11:37.920 --> 02:11:40.080
actually this program that
comes with Linux systems

02:11:40.080 --> 02:11:41.880
in particular called cowsay.

02:11:41.880 --> 02:11:42.510
Why?

02:11:42.510 --> 02:11:45.300
Probably because someone had too
much free time once and decided

02:11:45.300 --> 02:11:49.920
to write a program that creates ASCII
art out of a cow saying something

02:11:49.920 --> 02:11:51.520
textually on the screen.

02:11:51.520 --> 02:11:55.780
But you use cowsay, just for fun,
by way of command line arguments.

02:11:55.780 --> 02:12:00.660
So for instance, let me propose
that I go back to VS Code

02:12:00.660 --> 02:12:03.020
here, not because I
want to write any code,

02:12:03.020 --> 02:12:04.770
but I just want to use
my terminal window.

02:12:04.770 --> 02:12:07.320
And let me maximize my
terminal window here.

02:12:07.320 --> 02:12:11.880
And let me go ahead and type in
something like, how about cowsay,

02:12:11.880 --> 02:12:13.170
space moo?

02:12:13.170 --> 02:12:14.822
So cowsay is not a program I wrote.

02:12:14.822 --> 02:12:16.030
It's been around for decades.

02:12:16.030 --> 02:12:18.870
But we installed it in VS
Code for you in the cloud.

02:12:18.870 --> 02:12:21.330
It takes at least one
command line argument.

02:12:21.330 --> 02:12:23.070
What do you want the cow to say?

02:12:23.070 --> 02:12:26.190
I can say, cowsay moo, and
hit Enter, and voila, there

02:12:26.190 --> 02:12:29.490
is my ASCII art of a cow
saying moo on the screen.

02:12:29.490 --> 02:12:31.090
It can say multiple words.

02:12:31.090 --> 02:12:33.960
So I can say, Hello, world, Enter.

02:12:33.960 --> 02:12:35.800
And now it says, Hello, world.

02:12:35.800 --> 02:12:38.730
So this is just an example of a
silly program that uses command line

02:12:38.730 --> 02:12:40.470
arguments, but it takes others too.

02:12:40.470 --> 02:12:43.650
Just like clang, it uses this
convention of hyphens

02:12:43.650 --> 02:12:45.750
to change the output of the program.

02:12:45.750 --> 02:12:49.350
Dash something is just a super common
convention with command line arguments

02:12:49.350 --> 02:12:53.520
when you want a very terse notation
for some option like output.

02:12:53.520 --> 02:12:56.460
In cowsay, I read the
documentation, and it turns out

02:12:56.460 --> 02:12:59.040
there's a dash f command
line argument that

02:12:59.040 --> 02:13:03.460
allows you to change the
appearance of the cow, if you will.

02:13:03.460 --> 02:13:10.170
So if I do cowsay dash f, duck, and
then some other word like quack,

02:13:10.170 --> 02:13:11.640
it's no longer a cow.

02:13:11.640 --> 02:13:15.850
That command line argument turns it
into a tiny, adorable duck instead.

02:13:15.850 --> 02:13:19.020
And then lastly, just for fun,
because I spent way too much time

02:13:19.020 --> 02:13:20.790
playing with command line arguments.

02:13:20.790 --> 02:13:25.260
Cowsay dash f, dragon, and
then how about, rawr, Enter,

02:13:25.260 --> 02:13:27.910
you can even get this
on the screen here.

02:13:27.910 --> 02:13:30.150
So this, too, is just
an example of what you

02:13:30.150 --> 02:13:34.230
can do with these command line arguments
now that we have this building block.

02:13:34.230 --> 02:13:36.960
And there's one final thing
we can now do with code.

02:13:36.960 --> 02:13:39.150
There's one last
feature today that we'll

02:13:39.150 --> 02:13:41.610
introduce before we now
connect all of these dots

02:13:41.610 --> 02:13:47.520
to readability and encryption by
talking, lastly, about something called

02:13:47.520 --> 02:13:48.450
exit status.

02:13:48.450 --> 02:13:52.380
It turns out that whenever
your main function exits,

02:13:52.380 --> 02:13:55.590
it returns a secret integer
that you can figure out,

02:13:55.590 --> 02:13:58.260
as the programmer or an
advanced user, what it was.

02:13:58.260 --> 02:14:02.398
And these exit codes, exit statuses,
are typically used to indicate errors.

02:14:02.398 --> 02:14:05.190
So for instance, over the past
couple of years, if you've used Zoom

02:14:05.190 --> 02:14:08.560
and you ever got some kind of error,
you might have seen a screen like this.

02:14:08.560 --> 02:14:11.040
It's usually not that helpful,
maybe tells you to click

02:14:11.040 --> 02:14:13.050
Report Problem or Contact Support.

02:14:13.050 --> 02:14:16.980
But very often in our human
world on Macs, PCs, and phones,

02:14:16.980 --> 02:14:20.010
you see cryptic error codes,
like literally numbers

02:14:20.010 --> 02:14:23.640
that probably only Zoom knows, or
Microsoft or Google or whatever company

02:14:23.640 --> 02:14:25.050
wrote the software you're using.

02:14:25.050 --> 02:14:28.260
But that number corresponds
to a specific error

02:14:28.260 --> 02:14:32.070
that some human somewhere
knows might very well happen.

02:14:32.070 --> 02:14:34.950
These are used similarly,
although under a different name

02:14:34.950 --> 02:14:38.260
that we'll talk about later in
the term, on the web as well.

02:14:38.260 --> 02:14:41.350
Have you ever seen this-- maybe
not character, but number?

02:14:41.350 --> 02:14:43.485
So, 404 means what?

02:14:43.485 --> 02:14:44.880
AUDIENCE: Error.

02:14:44.880 --> 02:14:47.790
DAVID MALAN: So error,
yes, but really, not found.

02:14:47.790 --> 02:14:48.410
So, why?

02:14:48.410 --> 02:14:49.993
I mean, this is the most arcane thing.

02:14:49.993 --> 02:14:53.000
And we'll talk in a few weeks about
what this and other numbers mean,

02:14:53.000 --> 02:14:54.917
but numbers are all
around us in technology,

02:14:54.917 --> 02:14:57.500
and they very often mean something
to the technical people who

02:14:57.500 --> 02:15:00.270
wrote the software, less so
to humans like you and me.

02:15:00.270 --> 02:15:03.230
Why so many of us recognize
404 is kind of weird,

02:15:03.230 --> 02:15:05.900
that like that's been around
long enough that we all know it.

02:15:05.900 --> 02:15:10.250
But it really is just a special number
that represents an error of some sort.

02:15:10.250 --> 02:15:13.100
So it turns out, the last
thing we'll reveal today

02:15:13.100 --> 02:15:15.530
about what we've been taking
for granted for two weeks,

02:15:15.530 --> 02:15:18.200
is what the int is in main.

02:15:18.200 --> 02:15:21.650
We've seen, just a moment ago, that
the thing in the parentheses, which

02:15:21.650 --> 02:15:24.680
up until now has been void, which
means no command line arguments.

02:15:24.680 --> 02:15:29.690
Now int argc, string argv[] just
means, yes, command line arguments.

02:15:29.690 --> 02:15:31.290
And we've seen how to access them.

02:15:31.290 --> 02:15:33.620
So the last piece of
the puzzle, honestly,

02:15:33.620 --> 02:15:37.460
of all the cryptic syntax the past
two weeks, is just what int means.

02:15:37.460 --> 02:15:40.610
Int is always there for
main, and it indicates

02:15:40.610 --> 02:15:44.300
that main will always return an integer,
even though you and I have never

02:15:44.300 --> 02:15:46.010
done so explicitly.

02:15:46.010 --> 02:15:50.450
Usually, main returns
0, by default. But it

02:15:50.450 --> 02:15:53.928
would be weird if you saw an error
message saying 0, so 0 is just hidden.

02:15:53.928 --> 02:15:55.470
You would never see it on the screen.

02:15:55.470 --> 02:15:58.670
But it's happening automatically
by way of how C is designed.

02:15:58.670 --> 02:16:01.550
So let me write one final program here.

02:16:01.550 --> 02:16:05.750
I'll call it, for instance, status.c
to show you these exit statuses.

02:16:05.750 --> 02:16:10.790
code status.c, and then up here,
let me do something simple like include

02:16:10.790 --> 02:16:18.020
cs50.h, then include
stdio.h, and then int main--

02:16:18.020 --> 02:16:21.350
actually, let's use a command line
argument. int argc, string argv[],

02:16:21.350 --> 02:16:23.180
so that's copy, paste.

02:16:23.180 --> 02:16:26.000
But now let's do this.

02:16:26.000 --> 02:16:29.280
If argc does not equal 2--

02:16:29.280 --> 02:16:30.780
why don't we do something like this?

02:16:30.780 --> 02:16:33.740
Let's not just default to
hello, world like last time.

02:16:33.740 --> 02:16:34.770
Let's yell at the user.

02:16:34.770 --> 02:16:38.802
So let's say something like printf
missing command line argument,

02:16:38.802 --> 02:16:40.760
so that they know they
screwed up and they need

02:16:40.760 --> 02:16:43.160
to run the program again correctly.

02:16:43.160 --> 02:16:51.320
Else, let's go ahead and say, print
out, as before, Hello, comma %s,

02:16:51.320 --> 02:16:56.730
and then plug in argv[1], so the
human's name from the prompt.

02:16:56.730 --> 02:17:01.910
Now at this point, let me go
ahead and run status, ./status,

02:17:01.910 --> 02:17:03.590
and I'll type nothing first.

02:17:03.590 --> 02:17:04.700
I get yelled at.

02:17:04.700 --> 02:17:10.170
This time, I'll type it again.
./status David, and it works properly.

02:17:10.170 --> 02:17:14.090
But now let me show you a
somewhat secret, cryptic command.

02:17:14.090 --> 02:17:17.330
You can type this at your prompt,
and it's just a coincidence

02:17:17.330 --> 02:17:18.740
that there's another dollar sign.

02:17:18.740 --> 02:17:22.400
Echo $?, totally arcane,
but it allows you

02:17:22.400 --> 02:17:25.490
to see what exit status
your program has ended with.

02:17:25.490 --> 02:17:27.559
So let me run this again the wrong way.

02:17:27.559 --> 02:17:31.040
./status, I get the error message.

02:17:31.040 --> 02:17:32.780
What was secretly returned?

02:17:32.780 --> 02:17:33.440
I can't see it.

02:17:33.440 --> 02:17:37.280
There's obviously no error
screen, but by typing echo $?,

02:17:37.280 --> 02:17:41.420
I can see that, oh, my program
automatically, by default, returns

02:17:41.420 --> 02:17:42.170
zero.

02:17:42.170 --> 02:17:46.879
However, if I run it again
correctly, ./status David, Enter,

02:17:46.879 --> 02:17:48.690
this is the correct version.

02:17:48.690 --> 02:17:50.629
But if I run echo $?

02:17:50.629 --> 02:17:52.879
again, it still exited with 0.

02:17:52.879 --> 02:17:55.879
And long story short, this
is just a missed opportunity.

02:17:55.879 --> 02:17:59.570
When something goes wrong, why
don't I return a value other than 0?

02:17:59.570 --> 02:18:01.070
0, by default, means success.

02:18:01.070 --> 02:18:02.690
And it's always there automatically.

02:18:02.690 --> 02:18:04.940
But you can control this.

02:18:04.940 --> 02:18:11.160
I can go into my code here and return
1, else, if something works fine,

02:18:11.160 --> 02:18:14.870
I can return 0, by default. And
honestly, if I omit the return zero,

02:18:14.870 --> 02:18:17.129
again, zero automatically is returned.

02:18:17.129 --> 02:18:20.719
So let me go ahead and go be explicit,
just so I know what's going on.

02:18:20.719 --> 02:18:26.360
Make status again, ./status, and
let's do this correctly with David.

02:18:26.360 --> 02:18:28.520
Enter, hello, David.

02:18:28.520 --> 02:18:32.059
Echo $?, zero.

02:18:32.059 --> 02:18:33.270
So all is well.

02:18:33.270 --> 02:18:38.240
But now if I do ./status and nothing,
or multiple things, but not just David,

02:18:38.240 --> 02:18:40.530
Enter, I get the error message.

02:18:40.530 --> 02:18:45.230
But now if I do echo $?,
voila, there now is the one.

02:18:45.230 --> 02:18:47.330
So what does this now mean?

02:18:47.330 --> 02:18:49.490
This is, in the graphical
world, we would just

02:18:49.490 --> 02:18:51.020
show something like this
on the screen, which is

02:18:51.020 --> 02:18:52.459
a little more informative to the user.

02:18:52.459 --> 02:18:54.469
But even in the Linux world
where you don't have a GUI,

02:18:54.469 --> 02:18:56.690
necessarily, even for the
programs we've written,

02:18:56.690 --> 02:18:58.549
you can check these exit statuses.

02:18:58.549 --> 02:19:01.070
And in fact, more comfortable,
more advanced programmers,

02:19:01.070 --> 02:19:03.889
when they write code
that calls programs,

02:19:03.889 --> 02:19:07.340
be it cowsay or anything
else, you can, in code,

02:19:07.340 --> 02:19:11.030
check what the exit status is
of a program, and then decide,

02:19:11.030 --> 02:19:13.170
did my program work or did it not?
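
One way code can check another program's exit status, sketched under the assumption of a POSIX system such as Linux or macOS. The function name exit_status_of is hypothetical. system comes from stdlib.h, and the WIFEXITED and WEXITSTATUS macros come from sys/wait.h.

```c
#include <stdlib.h>
#include <sys/wait.h>

// Runs a shell command and returns its exit status, or -1 if the
// command could not be started or did not exit normally.
int exit_status_of(const char *command)
{
    int result = system(command);
    if (result == -1 || !WIFEXITED(result))
    {
        return -1;
    }
    return WEXITSTATUS(result);
}
```

If status was built as in the lecture, exit_status_of("./status David") would be expected to return 0 and exit_status_of("./status") to return 1, so the calling code can decide whether the program worked.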

02:19:13.170 --> 02:19:16.219
And now let's connect
the final dots before we

02:19:16.219 --> 02:19:19.070
adjourn for some fruit snacks.

02:19:19.070 --> 02:19:22.100
Cryptography, namely one of
the applications this week

02:19:22.100 --> 02:19:24.770
via which you'll be able
to send, if you will,

02:19:24.770 --> 02:19:27.650
secret messages, and better
yet, decrypt secret messages.

02:19:27.650 --> 02:19:29.780
This will be in addition
to perhaps analyzing

02:19:29.780 --> 02:19:32.120
the readability of text
using heuristics, like we

02:19:32.120 --> 02:19:34.040
identified at the start of class, too.

02:19:34.040 --> 02:19:38.299
So cryptography is just the art, the
science of encrypting information,

02:19:38.299 --> 02:19:41.330
scrambling information so that
if you have a secret message

02:19:41.330 --> 02:19:45.980
to send in so-called plaintext, you
can run it through some algorithm

02:19:45.980 --> 02:19:49.910
and turn it into what's called
ciphertext, thereby, encrypting it.

02:19:49.910 --> 02:19:53.150
And only someone who knows
what algorithm you've used

02:19:53.150 --> 02:19:55.880
and what input you've used to
the algorithm, theoretically,

02:19:55.880 --> 02:19:59.880
can decrypt that process and convert
it back to the original message.

02:19:59.880 --> 02:20:03.030
So if we use our mental model
from last week, here is a problem.

02:20:03.030 --> 02:20:04.910
Here is an input and output.

02:20:04.910 --> 02:20:08.120
The goal I claim here is to take
some plaintext, like the message

02:20:08.120 --> 02:20:10.250
you want to send, think
back to grade school

02:20:10.250 --> 02:20:13.640
if you ever passed a note to a friend
or to your crush saying, I love you,

02:20:13.640 --> 02:20:16.910
it's a little awkward if the teacher
or someone else intercepts the paper.

02:20:16.910 --> 02:20:19.490
And in English, it just says,
I love you, or whatever it is.

02:20:19.490 --> 02:20:22.350
It'd be nice if you had at
least encrypted it in some way.

02:20:22.350 --> 02:20:25.220
But the other person needs to
know what algorithm you used

02:20:25.220 --> 02:20:27.230
and what inputs you
use to that algorithm

02:20:27.230 --> 02:20:31.100
so that, ultimately, they can decode
the so-called ciphertext, which

02:20:31.100 --> 02:20:32.040
is the output.

02:20:32.040 --> 02:20:34.190
So what goes inside of the box today?

02:20:34.190 --> 02:20:37.970
Well, an algorithm, as it relates
to cryptography, is called a cipher.

02:20:37.970 --> 02:20:41.390
And a cipher is a fancy name for
an algorithm that encrypts text

02:20:41.390 --> 02:20:43.250
from plaintext to ciphertext.

02:20:43.250 --> 02:20:46.760
The catch is, there needs to
be not just the algorithm,

02:20:46.760 --> 02:20:48.750
there needs to be an input to it.

02:20:48.750 --> 02:20:52.590
And so, for instance, you might draw
the picture like this for the first time

02:20:52.590 --> 02:20:53.090
today.

02:20:53.090 --> 02:20:54.257
And we've seen this in code.

02:20:54.257 --> 02:20:57.180
You can give multiple inputs
or arguments to functions.

02:20:57.180 --> 02:20:59.960
So in this black box, you can
imagine passing in the message

02:20:59.960 --> 02:21:02.510
you want to send, and then some secret.

02:21:02.510 --> 02:21:05.300
So for instance, suppose
that, the simplest

02:21:05.300 --> 02:21:08.750
thing I could think of as a kid was
instead of sending the letter A,

02:21:08.750 --> 02:21:10.310
why don't I write the letter B?

02:21:10.310 --> 02:21:13.070
Instead of the letter B, why
don't I write the letter C?

02:21:13.070 --> 02:21:16.280
So I can kind of shift the
English alphabet by one space.

02:21:16.280 --> 02:21:18.740
So A becomes B, B
becomes C, dot, dot, dot,

02:21:18.740 --> 02:21:21.690
Z becomes A. You can
wrap around at the end.

02:21:21.690 --> 02:21:24.120
And let's assume no punctuation
in this part of the story.

02:21:24.120 --> 02:21:29.420
So that's a very simple algorithm--
add a value to each letter

02:21:29.420 --> 02:21:32.090
and send the value as the ciphertext.
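
A minimal sketch of that shift in C, not necessarily how the problem set expects it to be written. The function name caesar_encrypt is hypothetical, the key is assumed to be between 0 and 25, and leaving non-letters untouched is an assumption as well.

```c
#include <ctype.h>

// Encrypts text in place by shifting each letter forward by key
// positions (key assumed to be 0 to 25), wrapping around from Z to A.
// Characters that are not letters are left unchanged.
void caesar_encrypt(char *text, int key)
{
    for (int i = 0; text[i] != '\0'; i++)
    {
        unsigned char c = (unsigned char) text[i];
        if (isupper(c))
        {
            // Map A..Z to 0..25, add the key, wrap with % 26, map back
            text[i] = (char) ('A' + (c - 'A' + key) % 26);
        }
        else if (islower(c))
        {
            text[i] = (char) ('a' + (c - 'a' + key) % 26);
        }
    }
}
```

The % 26 is what makes Z wrap around to A: Z is position 25, adding 1 gives 26, and 26 % 26 is 0, which is A again.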

02:21:32.090 --> 02:21:35.540
And now the teacher, the classmate,
they have to know that you use,

02:21:35.540 --> 02:21:39.410
not only this rotational algorithm,
also known as a Caesar cipher,

02:21:39.410 --> 02:21:41.300
they also need to know
what number you use.

02:21:41.300 --> 02:21:45.200
Did you add 1 to every letter, 2 to
every letter, 25 to every letter?

02:21:45.200 --> 02:21:49.310
Now if they're super smart and probably
not the young age in this story,

02:21:49.310 --> 02:21:51.165
they could also just
try all possibilities.

02:21:51.165 --> 02:21:53.040
And that would be an
attack on the algorithm.
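
To make that attack concrete, here is a small sketch that tries every possible key. The names shift_back and try_all_keys are hypothetical, and only the letters A through Z, uppercase or lowercase, are shifted.

```c
#include <ctype.h>
#include <stdio.h>

// Shifts one letter back by key positions (1 to 25), wrapping from A to Z.
// Non-letters are returned unchanged.
char shift_back(char c, int key)
{
    if (isupper((unsigned char) c))
    {
        return (char) ('A' + (c - 'A' - key + 26) % 26);
    }
    if (islower((unsigned char) c))
    {
        return (char) ('a' + (c - 'a' - key + 26) % 26);
    }
    return c;
}

// Prints the ciphertext decrypted under every possible key, one per line.
void try_all_keys(const char *ciphertext)
{
    for (int key = 1; key <= 25; key++)
    {
        printf("%2d: ", key);
        for (int i = 0; ciphertext[i] != '\0'; i++)
        {
            putchar(shift_back(ciphertext[i], key));
        }
        putchar('\n');
    }
}
```

With only 25 keys to try, a person can skim the printed lines and spot the one that reads as English, which is why this cipher is not considered secure.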

02:21:53.040 --> 02:21:55.310
This is not a sophisticated
algorithm, but it's

02:21:55.310 --> 02:21:56.970
enough to send a message in class.

02:21:56.970 --> 02:21:58.940
So if the two inputs now are HI!

02:21:58.940 --> 02:22:04.280
as the plaintext message, and 1 as
the so-called key, the secret number

02:22:04.280 --> 02:22:06.950
that only you and the
other person know, you

02:22:06.950 --> 02:22:11.040
might be able to encrypt a
message from one way to the other.

02:22:11.040 --> 02:22:13.400
And so in this case, for instance, HI!

02:22:13.400 --> 02:22:16.198
would become I-J-!.

02:22:16.198 --> 02:22:17.990
In this version of the
algorithm, we're not

02:22:17.990 --> 02:22:19.823
going to bother with
numbers or punctuation.

02:22:19.823 --> 02:22:23.090
We'll only operate on A through
Z, be it uppercase or lowercase.

02:22:23.090 --> 02:22:28.250
So now if you were to receive a slip
of paper in class with I-J on it,

02:22:28.250 --> 02:22:31.290
you, the recipient,
would know what it is

02:22:31.290 --> 02:22:33.440
so long as you know that
the sender used one,

02:22:33.440 --> 02:22:36.500
because you just reverse the algorithm
and you subtract one instead.

02:22:36.500 --> 02:22:39.110
The teacher, they probably
don't know what this means,

02:22:39.110 --> 02:22:41.443
and they're not going to spend
time hacking the message,

02:22:41.443 --> 02:22:42.975
so it just looks scrambled to them.

02:22:42.975 --> 02:22:44.600
And that's what we get from encryption.

02:22:44.600 --> 02:22:47.430
Someone who intercepts it, be it
in class or in the real world,

02:22:47.430 --> 02:22:51.080
on the internet or anywhere else,
can't actually figure out, ideally,

02:22:51.080 --> 02:22:52.700
what it is you have sent.

02:22:52.700 --> 02:22:55.130
The opposite, of course, is
indeed called decryption,

02:22:55.130 --> 02:22:56.300
but the process is the same.

02:22:56.300 --> 02:22:58.370
We now pass in negative 1.

02:22:58.370 --> 02:23:00.300
And so how about this?

02:23:00.300 --> 02:23:02.840
Why don't we end with
a demonstration here?

02:23:02.840 --> 02:23:08.360
UIJT XBT DT50-- there's
a bit of a tell there.

02:23:08.360 --> 02:23:11.060
If we pass that in and
do negative 1, well,

02:23:11.060 --> 02:23:14.180
how do we get out the
plaintext originally?

02:23:14.180 --> 02:23:18.200
Well, if this is the ciphertext,
and we subtract 1 from each letter,

02:23:18.200 --> 02:23:28.010
I think U becomes T, I becomes H, J
becomes I, T becomes S, X becomes W,

02:23:28.010 --> 02:23:37.580
B becomes A, T becomes S, D becomes C,
T becomes S, and this was, indeed, CS50.
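
The same idea as a sketch in code, where decrypting is just rotating with a negative key. The function name rotate is hypothetical, and digits, spaces, and punctuation are left as-is, matching the rule that only A through Z are shifted.

```c
#include <ctype.h>

// Rotates text in place by key positions. The key may be negative:
// it is first normalized into the range 0 to 25, so -1 acts like 25.
void rotate(char *text, int key)
{
    int shift = ((key % 26) + 26) % 26;
    for (int i = 0; text[i] != '\0'; i++)
    {
        unsigned char c = (unsigned char) text[i];
        if (isupper(c))
        {
            text[i] = (char) ('A' + (c - 'A' + shift) % 26);
        }
        else if (islower(c))
        {
            text[i] = (char) ('a' + (c - 'a' + shift) % 26);
        }
    }
}
```

Calling rotate on UIJT XBT DT50 with a key of -1 turns it back into THIS WAS CS50, and calling it again with a key of 1 re-encrypts it.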

02:23:37.580 --> 02:23:40.250
Have a duck on your way out,
and some snacks in the lobby.

02:23:40.250 --> 02:23:42.350
[APPLAUSE]

02:23:42.350 --> 02:23:43.850
[FILM ROLLING]

02:23:43.850 --> 02:23:47.500
[MUSIC PLAYING]