WEBVTT
X-TIMESTAMP-MAP=LOCAL:00:00:00.000,MPEGTS:900000

00:00:00.000 --> 00:00:03.493
[MUSIC PLAYING]

00:00:49.420 --> 00:00:51.760
DAVID MALAN: All right, so this is CS50.

00:00:51.760 --> 00:00:55.450
And this is week 2, wherein we're
going to dive in a little more deeply

00:00:55.450 --> 00:00:56.720
to see this new language.

00:00:56.720 --> 00:00:58.720
And we're also going to
take a look back at some

00:00:58.720 --> 00:01:02.350
of the concepts we looked at last week
so that you can better understand some

00:01:02.350 --> 00:01:04.750
of the features of C
and some of the steps

00:01:04.750 --> 00:01:06.830
you've been taking to
make your code work.

00:01:06.830 --> 00:01:09.880
So we'll peel back some of the
layers of abstraction from last week

00:01:09.880 --> 00:01:11.950
so that you better understand
really what's going

00:01:11.950 --> 00:01:14.540
on underneath the hood of the computer.

00:01:14.540 --> 00:01:18.907
So, of course, last week, we began with
perhaps the most canonical of programs

00:01:18.907 --> 00:01:20.740
in C, the most canonical
of programs you can

00:01:20.740 --> 00:01:23.140
write pretty much in any
language, which is that which

00:01:23.140 --> 00:01:25.030
says, quite simply, "hello, world."

00:01:25.030 --> 00:01:28.600
But recall that before
actually running this program,

00:01:28.600 --> 00:01:31.600
we have to convert it into the language
that computers themselves speak,

00:01:31.600 --> 00:01:35.080
which we defined last week as
binary, 0's and 1's, otherwise known

00:01:35.080 --> 00:01:37.640
as machine language in this context.

00:01:37.640 --> 00:01:40.120
So we have to go somehow from
this source code to something

00:01:40.120 --> 00:01:44.380
more like this machine code, the 0's
and 1's that the computer actually

00:01:44.380 --> 00:01:45.250
understands.

00:01:45.250 --> 00:01:48.160
Now, you may recall too that we
introduced a command for this.

00:01:48.160 --> 00:01:49.840
And that command was called make.

00:01:49.840 --> 00:01:53.770
And literally via this command,
"make hello," could we make a program

00:01:53.770 --> 00:01:54.490
called hello.

00:01:54.490 --> 00:01:55.840
And make was a little fancy.

00:01:55.840 --> 00:01:59.140
It assumed that if you want to
make a program called hello,

00:01:59.140 --> 00:02:01.720
it would look for a file called hello.c.

00:02:01.720 --> 00:02:03.850
That just happens automatically for you.

00:02:03.850 --> 00:02:07.240
And the end result, of course, was
an additional file called hello

00:02:07.240 --> 00:02:11.390
that would end up getting put
into your current directory.

00:02:11.390 --> 00:02:14.360
So you could then do
./hello and be on your way.

00:02:14.360 --> 00:02:17.620
But it turns out that make
is actually automating

00:02:17.620 --> 00:02:20.650
a more specific set of
steps for us that we'll

00:02:20.650 --> 00:02:22.510
see a little more closely now instead.

00:02:22.510 --> 00:02:24.640
So on the screen here
is exactly the same code

00:02:24.640 --> 00:02:27.370
that we wrote last week to say,
quite simply, "hello, world."

00:02:27.370 --> 00:02:31.810
And recall that any time you
run "make hello" or "make mario"

00:02:31.810 --> 00:02:34.120
or "make cash" or "make
credit," any of the problems

00:02:34.120 --> 00:02:35.890
that you might have
tackled more recently,

00:02:35.890 --> 00:02:38.260
you see some cryptic
output on the screen.

00:02:38.260 --> 00:02:42.040
Hopefully, no red or yellow error
messages, but even when all is well,

00:02:42.040 --> 00:02:45.770
you see this white text which is
indicative of all having been well.

00:02:45.770 --> 00:02:49.240
And last week, we just kind of ignored
this and moved on and immediately did

00:02:49.240 --> 00:02:51.400
something like ./hello.

00:02:51.400 --> 00:02:53.528
But today, let's actually
better understand

00:02:53.528 --> 00:02:55.570
what it is that we've been
turning a blind eye to

00:02:55.570 --> 00:03:00.820
so that each week, as it passes, there's
less and less that you don't understand

00:03:00.820 --> 00:03:04.220
the entirety of with respect
to what's going on on your screen.

00:03:04.220 --> 00:03:08.020
So again, if I do ls here, we'll
see not only hello.c, but also

00:03:08.020 --> 00:03:12.910
the executable program called hello
that I actually created via make.

00:03:12.910 --> 00:03:14.450
But look at this output.

00:03:14.450 --> 00:03:16.930
There's some mention of
something called Clang here.

00:03:16.930 --> 00:03:21.730
And then there's a lot of other words or
cryptic phrases, something in computer

00:03:21.730 --> 00:03:24.500
speak here that has all of
these hyphens in front of them.

00:03:24.500 --> 00:03:26.770
And it turns out that
what make is doing for us

00:03:26.770 --> 00:03:31.870
is it's automating execution of a
command more specifically called clang.

00:03:31.870 --> 00:03:35.860
Clang is actually the compiler that
we alluded to last week, a compiler

00:03:35.860 --> 00:03:39.010
being a program that converts
source code to machine code.

00:03:39.010 --> 00:03:41.590
We've actually been using
Clang this whole time.

00:03:41.590 --> 00:03:44.860
But notice that Clang requires
a bit more sophistication.

00:03:44.860 --> 00:03:48.200
You have to understand a bit more about
what's going on in order to use it.

00:03:48.200 --> 00:03:51.190
So let me go ahead and remove
the program called hello.

00:03:51.190 --> 00:03:54.160
I'm going to use the rm command
that we saw briefly last time.

00:03:54.160 --> 00:03:55.900
I'm going to confirm by hitting y.

00:03:55.900 --> 00:04:00.010
And if I type ls again now, hello.c
is the only file that remains.

00:04:00.010 --> 00:04:04.600
Well, temporarily, let me take
away the ability to use make.

00:04:04.600 --> 00:04:07.000
And let's now use Clang directly.

00:04:07.000 --> 00:04:10.210
Clang is another program
installed in CS50 IDE.

00:04:10.210 --> 00:04:13.540
It's a very popular compiler that you
can download onto your own Macs and PCs

00:04:13.540 --> 00:04:14.320
as well.

00:04:14.320 --> 00:04:16.550
But to run it is a little different.

00:04:16.550 --> 00:04:19.930
I'm going to go ahead and say
clang and then the name of the file

00:04:19.930 --> 00:04:23.380
that I want to compile,
hello.c being this one.

00:04:23.380 --> 00:04:24.970
I'm going to go ahead and hit Enter.

00:04:24.970 --> 00:04:27.340
And now nothing happens, seemingly.

00:04:27.340 --> 00:04:29.410
But frankly, as you've
probably gleaned already,

00:04:29.410 --> 00:04:31.698
when nothing bad seems to
happen, that implicitly

00:04:31.698 --> 00:04:33.490
tends to mean that
something good happened.

00:04:33.490 --> 00:04:35.710
Your program compiled successfully.

00:04:35.710 --> 00:04:39.790
But curiously, if I type ls now,
you don't see the program, hello.

00:04:39.790 --> 00:04:42.700
You see this weird
file name called a.out.

00:04:42.700 --> 00:04:44.620
And this is actually
a historical remnant.

00:04:44.620 --> 00:04:48.220
Years ago, when humans would use
a compiler to compile their code,

00:04:48.220 --> 00:04:51.520
the default file name that
every program was given

00:04:51.520 --> 00:04:54.460
was a.out for assembler output.

00:04:54.460 --> 00:04:55.670
More on that in a moment.

00:04:55.670 --> 00:04:57.670
But this is kind of a
stupid name for a program.

00:04:57.670 --> 00:04:59.590
It's not at all descriptive
of what it does.

00:04:59.590 --> 00:05:05.140
So it turns out that programs like Clang
can be configured at the command line.

00:05:05.140 --> 00:05:08.140
The command line, again, refers to
the blinking prompt where you can

00:05:08.140 --> 00:05:09.280
type commands.

00:05:09.280 --> 00:05:14.200
So indeed, I'm going to go ahead and
remove this file now-- rm space a.out,

00:05:14.200 --> 00:05:15.550
and then confirm with y.

00:05:15.550 --> 00:05:18.520
And now I'm back to where
I began with just hello.c.

00:05:18.520 --> 00:05:21.140
And let me go ahead now and do
something a little different.

00:05:21.140 --> 00:05:27.700
I'm going to do "clang -o hello"
and then the word "hello.c."

00:05:27.700 --> 00:05:29.950
And what I'm doing here
is actually providing

00:05:29.950 --> 00:05:33.080
what we're going to start
calling a command-line argument.

00:05:33.080 --> 00:05:37.330
So these commands, like
make and rm, sometimes

00:05:37.330 --> 00:05:39.460
can just be run all by themselves.

00:05:39.460 --> 00:05:41.500
You just type a single
word and hit Enter.

00:05:41.500 --> 00:05:44.980
But very often, we've seen that
they take inputs in some sense.

00:05:44.980 --> 00:05:46.660
You type, "make hello."

00:05:46.660 --> 00:05:48.870
You type, "rm hello."

00:05:48.870 --> 00:05:51.030
And the second word,
"hello," in those cases,

00:05:51.030 --> 00:05:53.910
is kind of an input to
the command, otherwise

00:05:53.910 --> 00:05:56.980
now known as a command-line argument.

00:05:56.980 --> 00:05:58.480
It's an input to the command.

00:05:58.480 --> 00:06:01.710
So here, we have more
command-line arguments.

00:06:01.710 --> 00:06:06.300
We've got the word "clang," which is
the compiler we're about to run, "-o,"

00:06:06.300 --> 00:06:09.230
which it turns out is shorthand
notation for "output,"

00:06:09.230 --> 00:06:10.875
so please output the following.

00:06:10.875 --> 00:06:12.000
What do you want to output?

00:06:12.000 --> 00:06:13.830
Well, the next word is "hello."

00:06:13.830 --> 00:06:16.210
And then the final word is "hello.c."

00:06:16.210 --> 00:06:19.530
So long story short, this
command now, more verbose

00:06:19.530 --> 00:06:24.210
though it is, is saying, run
Clang, output a file called hello,

00:06:24.210 --> 00:06:27.020
and take as input a file called hello.c.

00:06:27.020 --> 00:06:30.270
So when I run this command after hitting
Enter, nothing again seems to happen.

00:06:30.270 --> 00:06:34.560
But if I type ls, I don't see that
stupid default file name of a.out.

00:06:34.560 --> 00:06:37.590
Now I see the file name, hello.

00:06:37.590 --> 00:06:41.310
So this is how ultimately Clang
is helping me compile my code.

00:06:41.310 --> 00:06:43.770
It's kind of automating
all of those processes.

00:06:43.770 --> 00:06:48.210
But recall that that's not the only
type of program we ran last week

00:06:48.210 --> 00:06:49.290
or wrote last week.

00:06:49.290 --> 00:06:52.425
We rather took code like
this and began to enhance it

00:06:52.425 --> 00:06:53.550
with some additional lines.

00:06:53.550 --> 00:06:56.040
So version 2 of Hello,
World actually involved

00:06:56.040 --> 00:06:59.730
prompting the user for input
using CS50's get_string function,

00:06:59.730 --> 00:07:02.940
storing the output in
a variable called name.

00:07:02.940 --> 00:07:07.028
But recall that we also had to
add cs50.h at the top of the file.

00:07:07.028 --> 00:07:08.320
So let me go ahead and do that.

00:07:08.320 --> 00:07:12.060
Let me go ahead and remove hello
because that's now the old version.

00:07:12.060 --> 00:07:18.510
Let me go in now and start updating my
code here and go into my hello.c file,

00:07:18.510 --> 00:07:23.010
include cs50.h, now get
myself a string called name,

00:07:23.010 --> 00:07:25.890
but we could call it anything,
call the function get_string,

00:07:25.890 --> 00:07:30.990
and ask, "What's your name," question
mark with a space at the very end

00:07:30.990 --> 00:07:32.340
just to create a gap.

00:07:32.340 --> 00:07:36.090
And then down here, instead of
printing out "hello, world" always,

00:07:36.090 --> 00:07:40.110
let me print out "Hello, %s,"
which is a placeholder recall,

00:07:40.110 --> 00:07:41.910
and output the person's name.

00:07:41.910 --> 00:07:44.760
So last week, the way we
compiled this program was just

00:07:44.760 --> 00:07:47.140
"make hello," no different from now.

00:07:47.140 --> 00:07:52.200
But this week, suppose I were
to instead get rid of make, only

00:07:52.200 --> 00:07:54.570
because it's sort of automating
steps for me that I now

00:07:54.570 --> 00:07:56.250
want to understand in more detail.

00:07:56.250 --> 00:08:01.380
I could compile this program again
with clang -o hello hello.c, so just

00:08:01.380 --> 00:08:06.600
a reapplication of that same idea of
passing in three arguments, -o, hello,

00:08:06.600 --> 00:08:08.130
and hello.c.

00:08:08.130 --> 00:08:11.070
But the catch now is
that I'm actually going

00:08:11.070 --> 00:08:12.880
to see one of these red error messages.

00:08:12.880 --> 00:08:14.880
And let's consider what
this is actually saying.

00:08:14.880 --> 00:08:17.440
There's still going to be a
bunch of cryptic stuff here.

00:08:17.440 --> 00:08:19.992
But notice, as always, we're
going to see, hopefully,

00:08:19.992 --> 00:08:21.450
something that's a little familiar.

00:08:21.450 --> 00:08:23.520
So "undefined reference to get_string."

00:08:23.520 --> 00:08:26.790
I don't yet know what an undefined
reference is, necessarily.

00:08:26.790 --> 00:08:28.440
I don't know what a linker command is.

00:08:28.440 --> 00:08:31.680
But I at least recognize there's
something going on with get_string.

00:08:31.680 --> 00:08:33.100
And there's a reason for this.

00:08:33.100 --> 00:08:37.140
It turns out that when using a library,
whether it's CS50's library or others'

00:08:37.140 --> 00:08:41.159
as well, it's sometimes not
sufficient only to include the header

00:08:41.159 --> 00:08:43.650
file at the top of your own code.

00:08:43.650 --> 00:08:46.230
Sometimes, you additionally
have to tell the computer

00:08:46.230 --> 00:08:52.350
where to find the 0's and 1's that
someone has written to implement

00:08:52.350 --> 00:08:54.290
a function like get_string.

00:08:54.290 --> 00:08:58.860
So the header file, like
cs50.h, just tells the compiler

00:08:58.860 --> 00:09:00.540
that the function exists.

00:09:00.540 --> 00:09:02.910
But there's a second
mechanism that, up until now,

00:09:02.910 --> 00:09:05.730
has been automated for us,
that tells the computer where

00:09:05.730 --> 00:09:10.560
to find the actual 0's and 1's that
implement the functions in that header

00:09:10.560 --> 00:09:11.470
file.
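
NOTE
Editor's sketch, not part of the lecture: one way to see the split between a prototype and the machine code behind it.
The prototype for atoi is typed by hand here instead of coming from stdlib.h, and parse_and_double is a hypothetical helper name.
The C standard library is linked in by default, which is why this case needs no extra -l argument, unlike -lcs50.
```c
// A hand-written prototype: a promise to the compiler that atoi exists.
// Its machine code lives in the C standard library, which clang links by default.
int atoi(const char *text);
// Hypothetical helper that uses atoi without including stdlib.h.
int parse_and_double(const char *text)
{
    return 2 * atoi(text);
}
```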

00:09:11.470 --> 00:09:15.180
So with that said, I'm going to need
to actually add another command line

00:09:15.180 --> 00:09:16.710
argument to this command.

00:09:16.710 --> 00:09:22.140
And instead of doing clang -o hello
hello.c, I'm going to additionally,

00:09:22.140 --> 00:09:26.250
and admittedly, cryptically,
do -lcs50 at the end

00:09:26.250 --> 00:09:31.440
of this command, which quite simply
refers to link in the CS50 library.

00:09:31.440 --> 00:09:33.750
So "link" is a term of
art that we'll see what it

00:09:33.750 --> 00:09:35.560
means in more detail in just a moment.

00:09:35.560 --> 00:09:39.330
But this additional final
command-line argument tells Clang,

00:09:39.330 --> 00:09:42.660
you already know that a
function like get_string exists.

00:09:42.660 --> 00:09:47.100
-lcs50 means when
compiling hello.c, make

00:09:47.100 --> 00:09:51.570
sure to incorporate all of the
machine code from CS50's library

00:09:51.570 --> 00:09:53.190
into your program as well.

00:09:53.190 --> 00:09:56.880
In short, it's something you have to
do when you use certain libraries.

00:09:56.880 --> 00:10:00.610
So now when I hit Enter, all seems to
be well because nothing bad got printed.

00:10:00.610 --> 00:10:02.520
If I type ls, I see hello.

00:10:02.520 --> 00:10:06.300
And voila, I can do ./hello,
type in my name, David.

00:10:06.300 --> 00:10:08.520
And voila, "hello, David."

00:10:08.520 --> 00:10:10.740
So why didn't we do
all of this last week?

00:10:10.740 --> 00:10:13.020
And frankly, we've made
no fundamental progress.

00:10:13.020 --> 00:10:15.960
All we've done is reveal what's
going on underneath the hood.

00:10:15.960 --> 00:10:19.650
But I'll claim that, frankly,
compiling your code by typing out

00:10:19.650 --> 00:10:24.450
all of these verbose command-line
arguments just gets tedious quickly.

00:10:24.450 --> 00:10:27.330
And so computer scientists and
programmers, more specifically,

00:10:27.330 --> 00:10:29.370
tend to automate monotonous steps.

00:10:29.370 --> 00:10:33.360
So what's happening ultimately
with make is that all of this

00:10:33.360 --> 00:10:34.800
is being automated for us.

00:10:34.800 --> 00:10:37.590
So when you typed "make hello"
last week-- and henceforth,

00:10:37.590 --> 00:10:40.030
you're welcome to continue
using make as well--

00:10:40.030 --> 00:10:43.710
notice that it generates this
extra long command, some of which

00:10:43.710 --> 00:10:45.000
we haven't even talked about.

00:10:45.000 --> 00:10:47.400
But I do recognize
clang at the beginning.

00:10:47.400 --> 00:10:51.170
I recognize hello.c here.

00:10:51.170 --> 00:10:54.330
I recognize -lcs50 here.

00:10:54.330 --> 00:10:56.490
But notice there's a bunch
of other stuff as well,

00:10:56.490 --> 00:11:00.470
not only the -o hello,
but also -lm, which

00:11:00.470 --> 00:11:03.500
refers to a math library,
-lcrypt, which refers

00:11:03.500 --> 00:11:05.900
to a cryptography or
an encryption library.

00:11:05.900 --> 00:11:08.810
In short, we the staff
have preconfigured

00:11:08.810 --> 00:11:11.450
make to just make sure that
when you compile your code,

00:11:11.450 --> 00:11:15.410
all of the requisite dependencies,
libraries, and so forth,

00:11:15.410 --> 00:11:18.650
are available to you without
having to worry about all

00:11:18.650 --> 00:11:20.100
of these command-line arguments.

00:11:20.100 --> 00:11:22.700
So henceforth, you can
certainly compile your code

00:11:22.700 --> 00:11:24.650
in this way using Clang directly.

00:11:24.650 --> 00:11:27.740
Or you can come back full circle
to where we were last week

00:11:27.740 --> 00:11:29.240
and just run "make hello."

00:11:29.240 --> 00:11:33.170
But there's a reason we run make hello,
because executing all of those steps

00:11:33.170 --> 00:11:35.900
manually tends to just
get tedious quickly.

00:11:35.900 --> 00:11:38.992
And so indeed, what we've
done here is compile our code.

00:11:38.992 --> 00:11:41.450
And compiling means going from
source code to machine code.

00:11:41.450 --> 00:11:44.660
But today, we revealed that
there's a little more, indeed,

00:11:44.660 --> 00:11:46.610
going on underneath the
hood, this "linking"

00:11:46.610 --> 00:11:49.800
that I referred to and a
couple of other steps as well.

00:11:49.800 --> 00:11:53.900
So it turns out when you compile your
code from source code to machine code,

00:11:53.900 --> 00:11:56.900
there's a few more steps
that are ultimately involved.

00:11:56.900 --> 00:11:59.960
And when we say "compiling," we
actually mean these four steps.

00:11:59.960 --> 00:12:03.350
And we're not going to dwell on
these kinds of low-level details.

00:12:03.350 --> 00:12:05.690
But it's perhaps
enlightening just to see

00:12:05.690 --> 00:12:09.830
a brief tour of what's going on
when you start with your source code

00:12:09.830 --> 00:12:11.680
and end up trying to
produce machine code.

00:12:11.680 --> 00:12:12.638
So let's consider this.

00:12:12.638 --> 00:12:14.660
This is step 1 that
the computer is doing

00:12:14.660 --> 00:12:17.150
for you when you compile your code.

00:12:17.150 --> 00:12:19.640
So step 1 takes your
own source code that

00:12:19.640 --> 00:12:21.260
looks a little something like this.

00:12:21.260 --> 00:12:24.650
And it preprocesses your code,
top to bottom, left to right.

00:12:24.650 --> 00:12:27.170
And to preprocess your
code essentially means

00:12:27.170 --> 00:12:30.500
that it looks for any lines that
start with a hash symbol, so

00:12:30.500 --> 00:12:35.300
#include cs50.h, #include stdio.h.

00:12:35.300 --> 00:12:39.230
And what the preprocessing step does is
it's kind of like a find and replace.

00:12:39.230 --> 00:12:42.330
It notices, oh, here's a #include line.

00:12:42.330 --> 00:12:49.790
Let me go ahead and copy the contents of
that file, cs50.h, into your own code.

00:12:49.790 --> 00:12:54.290
Similarly, when I encounter
#include stdio.h, let me,

00:12:54.290 --> 00:12:58.760
the so-called preprocessor, open
that file, stdio.h, and copy/paste

00:12:58.760 --> 00:13:04.650
the contents of that file so that what's
in the file now looks more like this.
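
NOTE
Editor's sketch, not part of the lecture: roughly what a file looks like once the preprocessor has replaced an #include line.
It assumes that only the printf prototype matters; the real stdio.h contains far more. The function name greet is hypothetical.
```c
// Before preprocessing, this line would read: #include <stdio.h>
// After preprocessing, the prototype itself sits in the file.
int printf(const char *format, ...);
// Hypothetical function standing in for the body of main in hello.c.
int greet(const char *name)
{
    // printf returns the number of characters it printed.
    return printf("hello, %s\n", name);
}
```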

00:13:04.650 --> 00:13:06.290
So this is happening automatically.

00:13:06.290 --> 00:13:08.240
You never have to do this manually.

00:13:08.240 --> 00:13:12.290
But why is there this
preprocessing step?

00:13:12.290 --> 00:13:16.880
If you recall our discussion last
week of these lines of code that

00:13:16.880 --> 00:13:19.910
tend to go at the top of
your file, does anyone

00:13:19.910 --> 00:13:24.580
perceive what the preprocessor
is doing for me and why?

00:13:24.580 --> 00:13:29.720
Why do I write code that has these
hash symbols, like #include cs50.h

00:13:29.720 --> 00:13:33.350
and #include stdio.h, but
this preprocessor apparently

00:13:33.350 --> 00:13:37.415
is automatically replacing those
lines with the actual contents

00:13:37.415 --> 00:13:38.040
of those files?

00:13:38.040 --> 00:13:42.740
What are these things
here in yellow now?

00:13:42.740 --> 00:13:44.168
Yeah, Jack, what do you think?

00:13:44.168 --> 00:13:46.960
JACK: Is it defining all the
functions for you to use in your code,

00:13:46.960 --> 00:13:48.740
otherwise the computer
wouldn't know what to do?

00:13:48.740 --> 00:13:49.340
DAVID MALAN: Exactly.

00:13:49.340 --> 00:13:51.065
It's defining all of
the functions in my code

00:13:51.065 --> 00:13:52.648
so that the computer knows what to do.

00:13:52.648 --> 00:13:56.113
Because remember that we ran into
that sort of annoying bug last week,

00:13:56.113 --> 00:13:58.280
whereby I was trying to
implement a function called,

00:13:58.280 --> 00:13:59.960
I think, get_positive_int.

00:13:59.960 --> 00:14:04.350
And recall that when I implemented
that function at the bottom of my file,

00:14:04.350 --> 00:14:07.610
the compiler was kind of dumb
in that it didn't realize

00:14:07.610 --> 00:14:09.440
that it existed because
it was implemented

00:14:09.440 --> 00:14:11.220
all the way at the bottom of my file.

00:14:11.220 --> 00:14:16.040
So to Jack's point, by putting a mention
of this function, a hint, if you will,

00:14:16.040 --> 00:14:18.950
at the very top, it's
like training the compiler

00:14:18.950 --> 00:14:22.160
to know in advance that I don't
know how it's implemented yet,

00:14:22.160 --> 00:14:23.850
but I know get_string is going to exist.

00:14:23.850 --> 00:14:27.540
I don't know how it's implemented yet,
but I know printf is going to exist.

00:14:27.540 --> 00:14:31.400
So these header files that we've been
including for the past week essentially

00:14:31.400 --> 00:14:34.190
contain all of the prototypes--

00:14:34.190 --> 00:14:38.240
that is, all of the hints for all the
functions that exist in the library--

00:14:38.240 --> 00:14:42.710
so that your code, when
compiled, knows from the top down

00:14:42.710 --> 00:14:45.690
that those functions will indeed exist.

00:14:45.690 --> 00:14:47.690
So the preprocessor just
saves us the trouble

00:14:47.690 --> 00:14:50.480
of having to copy and paste all
of these prototypes, if you will,

00:14:50.480 --> 00:14:52.830
all of these hints, ourselves.
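
NOTE
Editor's sketch, not part of the lecture: a prototype at the top lets code call a function whose implementation appears further down.
The names double_it and quadruple are hypothetical and stand in for the lecture's get_positive_int example.
```c
// Prototype at the top: tells the compiler that double_it will exist.
int double_it(int n);
// This function can call double_it even though its body appears further down.
int quadruple(int n)
{
    return double_it(double_it(n));
}
// The implementation, at the bottom of the file.
int double_it(int n)
{
    return 2 * n;
}
```
Without the prototype line, the compiler reads top to bottom and reaches the call before it has seen any mention of double_it.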

00:14:52.830 --> 00:14:54.950
So what happens after that step there?

00:14:54.950 --> 00:14:55.777
What comes next?

00:14:55.777 --> 00:14:57.860
Well, there might very
well be other header files.

00:14:57.860 --> 00:15:00.152
There might very well be
other contents in those files.

00:15:00.152 --> 00:15:03.800
But for now, let's just assume that
all that's in there is the prototype.

00:15:03.800 --> 00:15:06.770
So now compiling actually
has a more precise meaning

00:15:06.770 --> 00:15:08.000
that we'll define today.

00:15:08.000 --> 00:15:11.690
To compile your code now
means to take this C code

00:15:11.690 --> 00:15:17.215
and to convert it from source code here
to another type of source code here.

00:15:17.215 --> 00:15:20.090
Now, this is probably going to be
the most cryptic stuff we ever see.

00:15:20.090 --> 00:15:22.190
And this is not code
you need to understand.

00:15:22.190 --> 00:15:25.460
But what's on the screen here
is what's called assembly code.

00:15:25.460 --> 00:15:28.550
So long story short, there's a lot
of different computers in the world.

00:15:28.550 --> 00:15:30.650
And specifically, there's
a lot of different types

00:15:30.650 --> 00:15:35.730
of CPUs in the world, Central Processing
Units, the brains of a computer.

00:15:35.730 --> 00:15:39.680
And a CPU understands certain commands.

00:15:39.680 --> 00:15:43.880
And those commands tend to be expressed
in this language called assembly code.

00:15:43.880 --> 00:15:46.597
Now, I honestly don't really
understand most of this myself.

00:15:46.597 --> 00:15:49.680
It's certainly been a while even since
I thought hard about assembly code.

00:15:49.680 --> 00:15:53.460
But if I highlight a few
operative characters here,

00:15:53.460 --> 00:15:56.570
notice that there's mention
of main, get_string, printf.

00:15:56.570 --> 00:16:00.170
So this is sort of like a lower-level
implementation of main,

00:16:00.170 --> 00:16:03.420
of get_string and printf, in a
different language called assembly.

00:16:03.420 --> 00:16:04.820
So you write the C code.

00:16:04.820 --> 00:16:08.630
The computer, though, converts it
to a more computer-friendly language

00:16:08.630 --> 00:16:09.960
called assembly code.

00:16:09.960 --> 00:16:12.320
And decades ago, humans
wrote this stuff.

00:16:12.320 --> 00:16:14.210
Humans wrote assembly code.

00:16:14.210 --> 00:16:17.585
But nowadays, we have C. And nowadays,
we have languages like Python--

00:16:17.585 --> 00:16:20.210
more on that in a few weeks--
that are just more user friendly,

00:16:20.210 --> 00:16:22.310
even if it didn't feel
like that this past week.

00:16:22.310 --> 00:16:26.180
Assembly code is a little closer to
what the computer itself understands.

00:16:26.180 --> 00:16:27.740
But there's still another step.

00:16:27.740 --> 00:16:29.240
There's this step called assembling.

00:16:29.240 --> 00:16:31.910
And again, all of this is
happening when you simply run

00:16:31.910 --> 00:16:34.580
make and, in turn, this command, clang.

00:16:34.580 --> 00:16:39.350
To assemble your code means to take this
assembly code and finally convert it

00:16:39.350 --> 00:16:41.720
to machine code, 0's and 1's.

00:16:41.720 --> 00:16:43.460
So you write the source code.

00:16:43.460 --> 00:16:46.700
The compiler preprocesses
it, expanding its #include lines.

00:16:46.700 --> 00:16:49.550
Then it compiles it into assembly code.

00:16:49.550 --> 00:16:54.650
Then it assembles it into machine code
until we have the actual 0's and 1's.

00:16:54.650 --> 00:16:56.610
But there's actually one final step.

00:16:56.610 --> 00:17:00.380
Just because your code that you wrote
has been converted into 0's and 1's, it

00:17:00.380 --> 00:17:04.369
still needs to be linked in with
the 0's and 1's that CS50 wrote

00:17:04.369 --> 00:17:07.280
and that the designers of the
C language wrote years ago

00:17:07.280 --> 00:17:09.680
when implementing the
CS50 library in our case,

00:17:09.680 --> 00:17:12.470
and the printf function in their case.

00:17:12.470 --> 00:17:15.950
So this is to say that when you
have code like this that's not only

00:17:15.950 --> 00:17:20.270
including the prototypes for functions
like get_string and printf at the very

00:17:20.270 --> 00:17:24.440
top, these lines here in yellow
are what are ultimately converted

00:17:24.440 --> 00:17:27.440
into 0's and 1's.

00:17:27.440 --> 00:17:32.270
We now have to combine those 0's and
1's with the 0's and 1's from cs50.c,

00:17:32.270 --> 00:17:35.030
which the staff wrote some
time ago, and even a file

00:17:35.030 --> 00:17:38.588
called stdio.c, which the
designers of C wrote years ago.

00:17:38.588 --> 00:17:40.880
And technically, it might be
called something different

00:17:40.880 --> 00:17:41.802
underneath the hood.

00:17:41.802 --> 00:17:43.760
But there's really three
files that are getting

00:17:43.760 --> 00:17:45.530
combined when you write your program.

00:17:45.530 --> 00:17:51.920
The first, I just claimed, once
it's preprocessed and compiled

00:17:51.920 --> 00:17:55.760
and assembled, it's then in
this form of all 0's and 1's.

00:17:55.760 --> 00:17:58.130
Somewhere on the CS50
IDE, there's a whole bunch

00:17:58.130 --> 00:18:00.800
of 0's and 1's representing cs50.c.

00:18:00.800 --> 00:18:03.410
Somewhere in CS50 IDE,
there's another file

00:18:03.410 --> 00:18:08.840
representing the 0's and 1's for stdio.c.
So this final fourth step, a.k.a.

00:18:08.840 --> 00:18:13.280
linking, just takes all of my 0's
and 1's, all of CS50's 0's and 1's, all

00:18:13.280 --> 00:18:18.800
of printf's 0's and 1's, and links
them all together into one big blob,

00:18:18.800 --> 00:18:23.870
if you will, that collectively
represent your program, hello.

00:18:23.870 --> 00:18:26.960
So, my god, like, that's quite
a mouthful and so many steps.

00:18:26.960 --> 00:18:31.250
And none of the steps I have described
are really germane to you implementing

00:18:31.250 --> 00:18:35.090
Mario's pyramid or cash or
credit, because what we've really

00:18:35.090 --> 00:18:37.340
been doing over the past
week is taking all four

00:18:37.340 --> 00:18:40.880
of these fairly low-level,
sophisticated concepts and, if you will,

00:18:40.880 --> 00:18:44.720
abstracting them away so that we
just refer to this whole process

00:18:44.720 --> 00:18:46.310
as compiling.

00:18:46.310 --> 00:18:48.380
So even though, yes,
technically, compiling

00:18:48.380 --> 00:18:51.320
is just one of the four steps,
what a programmer typically

00:18:51.320 --> 00:18:54.470
does when saying compiling is
they're, just with a wave of the hand,

00:18:54.470 --> 00:18:58.400
referring to all of those
lower-level details.

00:18:58.400 --> 00:19:01.700
But it is the case that there's multiple
steps happening underneath the hood.

00:19:01.700 --> 00:19:04.610
And this is what make and, in
turn, Clang are doing for you,

00:19:04.610 --> 00:19:08.810
automating this process of going
from source code to assembly code

00:19:08.810 --> 00:19:13.153
to machine code and then linking it
all together with any libraries you

00:19:13.153 --> 00:19:13.820
might have used.
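
NOTE
Editor's sketch, not part of the lecture: clang can be asked to stop after each stage, which makes the steps visible one at a time.
The file name stages.c and the function triple are hypothetical; -E, -S, and -c are standard clang options.
```c
// stages.c (hypothetical file name). Each command below stops clang after one stage:
//   clang -E stages.c    preprocess only; prints the expanded source
//   clang -S stages.c    preprocess and compile; writes assembly code to stages.s
//   clang -c stages.c    preprocess, compile, and assemble; writes machine code to stages.o
// Linking stages.o into a runnable program additionally needs a file that defines main.
int triple(int n)
{
    return 3 * n;
}
```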

00:19:13.820 --> 00:19:15.800
So no longer take for
granted what's happening.

00:19:15.800 --> 00:19:17.990
Hopefully, that offers
you a glimpse a bit more

00:19:17.990 --> 00:19:21.860
of what's actually happening
when you compile your own code.

00:19:21.860 --> 00:19:24.800
Well, let me pause there,
because that's quite a mouthful,

00:19:24.800 --> 00:19:29.660
and see if there's any questions
on preprocessing, compiling,

00:19:29.660 --> 00:19:33.050
or assembling, or linking, a.k.a.

00:19:33.050 --> 00:19:35.120
compiling.

00:19:35.120 --> 00:19:37.550
And again, we won't
dwell at this low level.

00:19:37.550 --> 00:19:40.640
We'll tend to now just abstract this
all away if we can sort of agree

00:19:40.640 --> 00:19:42.540
that, OK, yes, there's those steps.

00:19:42.540 --> 00:19:45.290
But what's really important is the
whole process, not the minutiae.

00:19:45.290 --> 00:19:46.260
Sophia?

00:19:46.260 --> 00:19:50.060
SOPHIA: I had a question about
the first step, when

00:19:50.060 --> 00:19:53.720
we're replacing all the
information at the top,

00:19:53.720 --> 00:19:56.790
is that information
contained within the IDE?

00:19:56.790 --> 00:19:58.010
Or where do we--

00:19:58.010 --> 00:20:00.375
are there files saved somewhere
in that IDE, like, where

00:20:00.375 --> 00:20:02.000
it's getting all this information from?

00:20:02.000 --> 00:20:03.020
DAVID MALAN: Yeah, really good question.

00:20:03.020 --> 00:20:04.603
Where are all these files coming from?

00:20:04.603 --> 00:20:07.320
So yes, when you are using
CS50 IDE, or frankly,

00:20:07.320 --> 00:20:09.830
if you're using your
own Mac or your own PC,

00:20:09.830 --> 00:20:13.810
and you have preinstalled a compiler
into your Mac or PC just like we have

00:20:13.810 --> 00:20:18.500
to CS50 IDE, what you get is a
whole bunch of .h files somewhere

00:20:18.500 --> 00:20:19.700
on the computer system.

00:20:19.700 --> 00:20:23.950
You might also have a whole bunch of
.c files, or compiled versions thereof,

00:20:23.950 --> 00:20:24.950
somewhere on the system.

00:20:24.950 --> 00:20:28.370
So yes, when you download
and install a compiler,

00:20:28.370 --> 00:20:31.280
you are getting all of these
libraries added for you.

00:20:31.280 --> 00:20:35.720
And we preinstalled an additional
library called CS50's library that

00:20:35.720 --> 00:20:40.180
additionally comes with its own .h
file and its own machine code as well.

00:20:40.180 --> 00:20:43.250
So all of those files are
somewhere in CS50 IDE,

00:20:43.250 --> 00:20:46.460
or equivalently, in your own Mac
or PC if you're working locally.

00:20:46.460 --> 00:20:48.620
And the compiler, Clang,
in this case, just

00:20:48.620 --> 00:20:52.370
knows how to find that because one
of the steps involved in installing

00:20:52.370 --> 00:20:55.130
your own compiler is making
sure it's configured to know,

00:20:55.130 --> 00:20:58.010
per Sophia's question,
where all those files are.

00:21:00.770 --> 00:21:02.990
[? Basili? ?] I'm sorry
if I'm mispronouncing it.

00:21:02.990 --> 00:21:04.010
[? Basili? ?]

00:21:04.010 --> 00:21:06.800
[? BASILI: ?] So whenever
we're compiling hello,

00:21:06.800 --> 00:21:11.960
for example, is the compiler also
compiling, for example, CS50?

00:21:11.960 --> 00:21:16.387
Or does CS50 already exist in
machine code somewhere beneath?

00:21:16.387 --> 00:21:18.220
DAVID MALAN: Yeah,
really good question too.

00:21:18.220 --> 00:21:20.570
So I was kind of skirting
this part of Sophia's question

00:21:20.570 --> 00:21:25.640
because technically speaking, probably
cs50.c is not installed on the system.

00:21:25.640 --> 00:21:29.550
And technically, stdio.c is probably
not installed in the system.

00:21:29.550 --> 00:21:30.050
Why?

00:21:30.050 --> 00:21:31.160
It just doesn't need to be.

00:21:31.160 --> 00:21:32.868
It would be kind of
inefficient, that is,

00:21:32.868 --> 00:21:35.600
slow, if every time you
compiled your own program,

00:21:35.600 --> 00:21:39.050
you had to additionally compile
CS50's program, and stdio's program,

00:21:39.050 --> 00:21:40.020
and so forth.

00:21:40.020 --> 00:21:42.740
So it actually stands to reason
that what computers typically do

00:21:42.740 --> 00:21:46.490
is they precompile all of
those library files for you

00:21:46.490 --> 00:21:48.823
so that more efficiently
they can just be linked in.

00:21:48.823 --> 00:21:50.990
And you don't have to keep
preprocessing, compiling,

00:21:50.990 --> 00:21:53.330
and assembling third-party code.

00:21:53.330 --> 00:21:57.560
You only perform those steps on your own
code and then link everything together.

00:21:57.560 --> 00:21:59.270
And indeed, that's the case.

00:21:59.270 --> 00:22:01.490
It's all done in advance.

00:22:01.490 --> 00:22:03.800
Iris, question from you.

00:22:03.800 --> 00:22:07.070
IRIS: When we replace the
header files with prototypes,

00:22:07.070 --> 00:22:10.440
are we only replacing it with
the prototypes that get used?

00:22:10.440 --> 00:22:12.777
Or are all the prototypes
technically substituted?

00:22:12.777 --> 00:22:15.110
DAVID MALAN: Yeah, so I was
kind of sweeping that detail

00:22:15.110 --> 00:22:16.535
under the rug with my dot, dot, dot.

00:22:16.535 --> 00:22:18.618
There's a whole lot of
other stuff in those files.

00:22:18.618 --> 00:22:21.110
You're getting the entire
contents of those files,

00:22:21.110 --> 00:22:24.710
even if the only thing
you need is the prototype.

00:22:24.710 --> 00:22:27.710
But, and this is why I alluded
to the fact too that technically,

00:22:27.710 --> 00:22:30.860
there probably isn't a
stdio.c file, because there

00:22:30.860 --> 00:22:32.630
would be so much stuff in it.

00:22:32.630 --> 00:22:36.140
There's probably not just one
stdio.h file with everything in it.

00:22:36.140 --> 00:22:40.070
There's probably some smaller files
that get magically included as well.

00:22:40.070 --> 00:22:44.300
But yes, there are many more
lines of code in those files.

00:22:44.300 --> 00:22:47.330
But that's OK.

00:22:47.330 --> 00:22:51.920
Your compiler is only going to use the
lines that it actually cares about.

00:22:51.920 --> 00:22:53.120
Good question.

00:22:53.120 --> 00:22:56.450
All right, so with that
said, this past week

00:22:56.450 --> 00:22:58.850
undoubtedly was a bit
frustrating in some ways

00:22:58.850 --> 00:23:00.980
because you probably ran into problems.

00:23:00.980 --> 00:23:03.560
You ran into bugs,
mistakes in your own code.

00:23:03.560 --> 00:23:06.165
You probably saw one or more
yellow or red error messages.

00:23:06.165 --> 00:23:09.290
And you might have struggled a little
bit just to get your code to compile.

00:23:09.290 --> 00:23:10.670
And again, that's normal.

00:23:10.670 --> 00:23:12.390
That will go away over time.

00:23:12.390 --> 00:23:16.320
But honestly, whenever I write
C, let's say 20% of the time,

00:23:16.320 --> 00:23:20.400
I still have a compilation error, let
alone logical errors, in my own code.

00:23:20.400 --> 00:23:23.240
So this is just part of the
experience of writing code.

00:23:23.240 --> 00:23:25.370
Humans make mistakes
in all forms of life.

00:23:25.370 --> 00:23:28.130
And that's ever more true in the
context of code, where again,

00:23:28.130 --> 00:23:32.180
per our first two weeks, precision
is important, as is correctness.

00:23:32.180 --> 00:23:35.520
And it's hard sometimes to
achieve both of those goals.

00:23:35.520 --> 00:23:38.060
So let's consider now
how you might be more

00:23:38.060 --> 00:23:42.590
empowered to debug your own code-- that
is, find problems in your own code.

00:23:42.590 --> 00:23:44.750
And this word actually
has some etymology.

00:23:44.750 --> 00:23:46.670
This isn't necessarily the first bug.

00:23:46.670 --> 00:23:49.130
But perhaps the most famous
bug is this one pictured

00:23:49.130 --> 00:23:53.060
here from the research
notebook of Grace Hopper,

00:23:53.060 --> 00:23:56.090
a famous computer scientist,
who had discovered

00:23:56.090 --> 00:23:59.810
that there were some problems with
the Harvard Mark II computer, a very

00:23:59.810 --> 00:24:03.440
famous computer nowadays
that actually lives over soon

00:24:03.440 --> 00:24:05.240
in the new engineering
school on campus--

00:24:05.240 --> 00:24:06.830
used to live in the Science Center.

00:24:06.830 --> 00:24:08.330
The computer was having problems.

00:24:08.330 --> 00:24:12.770
And sure enough, when the engineers
took a look inside of this big mainframe

00:24:12.770 --> 00:24:15.770
computer, there was actually
a bug, pictured here

00:24:15.770 --> 00:24:17.900
and taped to Grace Hopper's notebook.

00:24:17.900 --> 00:24:20.840
So this wasn't necessarily the
first use of the term "bug,"

00:24:20.840 --> 00:24:25.110
but it is a very well-known example of
an actual bug in an actual computer.

00:24:25.110 --> 00:24:27.860
Nowadays, we speak a little more
metaphorically that a bug is just

00:24:27.860 --> 00:24:29.760
a mistake in one's program.

00:24:29.760 --> 00:24:33.020
And we did give you a few tools
last week for troubleshooting bugs.

00:24:33.020 --> 00:24:37.135
Help50 allows you to better understand
some of the cryptic error messages.

00:24:37.135 --> 00:24:39.510
And that's just because the
staff wrote this program that

00:24:39.510 --> 00:24:41.610
analyzed the problem
you're having, and we try

00:24:41.610 --> 00:24:44.250
to translate it to just
more human-friendly speak.

00:24:44.250 --> 00:24:47.400
We saw a tool called style50, which
helps you not with your correctness,

00:24:47.400 --> 00:24:49.470
but just with the
aesthetics of your code,

00:24:49.470 --> 00:24:52.020
helping you better indent things
and add white space-- that

00:24:52.020 --> 00:24:55.050
is, blank lines or space characters--
so it's a little more user

00:24:55.050 --> 00:24:56.760
friendly to the human to read.

00:24:56.760 --> 00:24:59.130
And then check50, which,
of course, the staff

00:24:59.130 --> 00:25:01.560
write so that we can give
you immediate feedback on

00:25:01.560 --> 00:25:05.230
whether or not your code is correct
per the problem sets or the lab

00:25:05.230 --> 00:25:06.450
specification.

00:25:06.450 --> 00:25:09.323
But there's some other tools that
you should have in your toolkit.

00:25:09.323 --> 00:25:10.740
And we'll give those to you today.

00:25:10.740 --> 00:25:14.790
And one, frankly, is this universal
debugging tool just called,

00:25:14.790 --> 00:25:16.928
in the context of C, printf.

00:25:16.928 --> 00:25:18.720
So printf, of course,
is just this function

00:25:18.720 --> 00:25:20.470
that prints stuff out onto the screen.

00:25:20.470 --> 00:25:24.270
But that in and of itself is
a wonderfully powerful tool

00:25:24.270 --> 00:25:26.820
via which you can chase
down problems in your code.

00:25:26.820 --> 00:25:29.940
And even after we leave C
in a few weeks and introduce

00:25:29.940 --> 00:25:33.690
Python and other languages, almost
every programming language out there

00:25:33.690 --> 00:25:35.460
has some form of printf.

00:25:35.460 --> 00:25:36.480
Maybe it's called print.

00:25:36.480 --> 00:25:38.940
Maybe it's called say,
as it was in Scratch,

00:25:38.940 --> 00:25:43.780
but some ability to display information
or present information to a human.

00:25:43.780 --> 00:25:47.700
So let's try to use this
primitive, this notion of printf,

00:25:47.700 --> 00:25:49.760
to chase down a bug in one's code.

00:25:49.760 --> 00:25:52.950
So let me go ahead and
deliberately write a buggy program.

00:25:52.950 --> 00:25:56.570
I'm going to even call
the file buggy0.c.

00:25:56.570 --> 00:26:01.230
And at the top of this file, I'm going
to go ahead and #include stdio.h.

00:26:01.230 --> 00:26:03.810
No need for the CS50
library for this one.

00:26:03.810 --> 00:26:06.960
And then I'm going to do int
main(void), which we saw last week,

00:26:06.960 --> 00:26:08.700
and we'll explain in more detail today.

00:26:08.700 --> 00:26:10.260
And then I'm going to
give myself a quick loop.

00:26:10.260 --> 00:26:12.990
I just want to go ahead and print
out, oh, I don't know, like,

00:26:12.990 --> 00:26:14.580
10 hashes on the screen.

00:26:14.580 --> 00:26:17.430
So I want to print a vertical
column, kind of like one

00:26:17.430 --> 00:26:20.190
of those screenshots from Super
Mario Bros., not a pyramid,

00:26:20.190 --> 00:26:23.020
just a single column of
hashes, and 10 of them.

00:26:23.020 --> 00:26:25.830
So I'm going to do
something like, int i = 0,

00:26:25.830 --> 00:26:28.140
because I feel like I learned
in class that I generally

00:26:28.140 --> 00:26:29.562
should start counting from 0.

00:26:29.562 --> 00:26:31.770
Then I'm going to have my
condition in this for loop.

00:26:31.770 --> 00:26:33.190
And I want to do this 10 times.

00:26:33.190 --> 00:26:35.200
I'm going to do it less
than or equal to 10.

00:26:35.200 --> 00:26:37.242
Then I'm going to go ahead
and have my increment,

00:26:37.242 --> 00:26:39.478
which quite simply can
be expressed as i++.

00:26:39.478 --> 00:26:42.270
And then inside this loop, I'm just
going to go ahead and print out

00:26:42.270 --> 00:26:44.970
a single hash followed by a new line.

00:26:44.970 --> 00:26:46.680
I'm going to save the program.

00:26:46.680 --> 00:26:51.870
I'm going to compile it with
clang -o buggy0 buggy0--

00:26:51.870 --> 00:26:52.768
I mean, no.

00:26:52.768 --> 00:26:54.810
You don't have to use
Clang manually in this way.

00:26:54.810 --> 00:26:58.453
It's a lot simpler to
just abstract that away--

00:26:58.453 --> 00:26:59.370
that's not a command--

00:26:59.370 --> 00:27:03.330
to abstract that away
and run make buggy0.

00:27:03.330 --> 00:27:07.475
And make will take care of the
process of invoking Clang for you.

00:27:07.475 --> 00:27:08.850
I'm going to go ahead and run it.

00:27:08.850 --> 00:27:12.290
Seems to be compiling successfully,
so no need for help50.

00:27:12.290 --> 00:27:13.890
It's already pretty well styled.

00:27:13.890 --> 00:27:18.420
In fact, if I run style50 on this
buggy0, I don't have any comments yet.

00:27:18.420 --> 00:27:20.490
But at least it looks
very nicely indented.

00:27:20.490 --> 00:27:22.000
So I think I'm OK with that.

00:27:22.000 --> 00:27:25.530
But let me add that comment
and do "Print 10 hashes" just

00:27:25.530 --> 00:27:27.120
to remind myself of my goal.

00:27:27.120 --> 00:27:31.290
And now let me go ahead and
run this, ./buggy0, Enter.

00:27:31.290 --> 00:27:32.670
And I see, OK, good.

00:27:32.670 --> 00:27:38.147
1, 2, 3, 4, 5, 6, 7,
8, 9, 10, 11, I think.

00:27:38.147 --> 00:27:39.480
All right, so it's a stupid bug.

00:27:39.480 --> 00:27:41.970
And maybe it's jumped out
obviously to some of you.

00:27:41.970 --> 00:27:44.220
But maybe it's a little more
subtle to others of you.

00:27:44.220 --> 00:27:45.250
But where do you begin?

00:27:45.250 --> 00:27:46.830
Suppose I were to run check50.

00:27:46.830 --> 00:27:51.570
And check50 were to say, nope, you
printed out 11 hashes instead of 10.

00:27:51.570 --> 00:27:54.210
But my code looks right to
me, at least at first glance.

00:27:54.210 --> 00:27:57.220
Well, how can I go about
debugging this or solving this?

00:27:57.220 --> 00:27:58.780
Well, again, printf is your friend.

00:27:58.780 --> 00:28:01.560
If you want to understand
more about your own program,

00:28:01.560 --> 00:28:05.460
use printf to temporarily print
more information to the screen,

00:28:05.460 --> 00:28:08.790
not that we want in the final version,
not that your TF wants to see,

00:28:08.790 --> 00:28:11.340
but that you, the programmer,
can temporarily see.

00:28:11.340 --> 00:28:14.040
So before I print this
hash, let me print something

00:28:14.040 --> 00:28:15.900
a little more pedantic like this--

00:28:15.900 --> 00:28:19.950
"i is now %i backslash n."

00:28:19.950 --> 00:28:25.440
So I literally want to know, just for my
own mental math, what is the value of i

00:28:25.440 --> 00:28:28.230
at this point before I print that hash?

00:28:28.230 --> 00:28:30.480
Now I'm going to go ahead
and paste in the value of i.

00:28:30.480 --> 00:28:32.640
So I'm using %i as a placeholder.

00:28:32.640 --> 00:28:35.100
I'm plugging in the
value of the variable i.

00:28:35.100 --> 00:28:36.870
I'm going to save my code now.

00:28:36.870 --> 00:28:39.750
I'm going to recompile
it with make buggy0.

00:28:39.750 --> 00:28:41.380
And I'm going to rerun it now.

00:28:41.380 --> 00:28:43.680
And let me go ahead and
increase the size of my window

00:28:43.680 --> 00:28:46.390
just so we can focus now on the output.

00:28:46.390 --> 00:28:50.440
And I'm going to go ahead
and ./buggy0, Enter.

00:28:50.440 --> 00:28:53.670
OK, so now I see not
only my output, but also

00:28:53.670 --> 00:28:56.670
commingled with that output, some
diagnostic output, if you will,

00:28:56.670 --> 00:28:58.080
some debugging output.

00:28:58.080 --> 00:29:02.430
And it's just more pedantically
telling me, "i is now 0," "i is now 1,"

00:29:02.430 --> 00:29:08.490
"i is now 2," dot, dot, dot,
"i is now 9," "i is now 10."

00:29:08.490 --> 00:29:11.040
OK, I don't hate the fact that i is 10.

00:29:11.040 --> 00:29:15.630
But I'm not loving the fact that if
I started at 0 and printed a hash,

00:29:15.630 --> 00:29:19.140
and I'm hitting 10 and printing
another hash, well, obviously,

00:29:19.140 --> 00:29:20.180
there's my problem.

00:29:20.180 --> 00:29:22.620
So it might not have been
all that much more obvious

00:29:22.620 --> 00:29:24.030
than looking at the code itself.

00:29:24.030 --> 00:29:27.090
But by using printf, you
can just be a lot more clear

00:29:27.090 --> 00:29:28.900
to yourself what's going on.

00:29:28.900 --> 00:29:32.490
So if now I see, OK, well, if I
start at 0, I have to go up to 10.

00:29:32.490 --> 00:29:35.100
I could change my code to
do this to be less than 10.

00:29:35.100 --> 00:29:38.040
I could leave that alone
and go from 1 through 10.

00:29:38.040 --> 00:29:41.920
But again, programmer convention
would be to go from 0 up to 10.

00:29:41.920 --> 00:29:43.140
So I think I'm good now.

00:29:43.140 --> 00:29:46.662
And in fact, now I'll go ahead
and recompile this, make buggy0.

00:29:46.662 --> 00:29:49.620
Let me go ahead and increase the size
of the window again just so I can

00:29:49.620 --> 00:29:53.980
temporarily see this and ./buggy0.

00:29:53.980 --> 00:29:57.460
OK, I start now at 0,
1, 2, dot, dot, dot.

00:29:57.460 --> 00:29:59.160
Now I stop at 9.

00:29:59.160 --> 00:30:01.080
And that, of course, gives me 10 hashes.

00:30:01.080 --> 00:30:03.343
So again, I don't need
this in the final output.

00:30:03.343 --> 00:30:05.010
And I'm going to go ahead and delete this now.

00:30:05.010 --> 00:30:06.510
It's temporary output.

00:30:06.510 --> 00:30:08.760
But again, having those
instincts-- if you don't quite

00:30:08.760 --> 00:30:12.120
understand why your code is
compiling but not running properly,

00:30:12.120 --> 00:30:15.360
and you want to better see what
the computer is clearly seeing,

00:30:15.360 --> 00:30:18.930
its mind's eye, use printf
to just tell yourself

00:30:18.930 --> 00:30:23.790
what the value of some variable or
variables are anywhere in your code

00:30:23.790 --> 00:30:26.732
that you want to see
a little more detail.

00:30:26.732 --> 00:30:28.440
All right, let me
pause for just a moment

00:30:28.440 --> 00:30:32.220
to see if there's any questions on
this technique of just using printf

00:30:32.220 --> 00:30:37.830
to begin to debug your code and
to see the values of variables

00:30:37.830 --> 00:30:40.560
in a way that's a little more explicit.

00:30:43.980 --> 00:30:44.580
No?

00:30:44.580 --> 00:30:45.670
All right.

00:30:45.670 --> 00:30:50.130
Well, let me propose an even more
powerful tool that admittedly

00:30:50.130 --> 00:30:51.480
takes a little getting used to.

00:30:51.480 --> 00:30:54.000
But this is kind of
one of those lessons,

00:30:54.000 --> 00:30:58.350
trust me, if you will, that if you
spend a few more minutes, maybe even

00:30:58.350 --> 00:31:01.320
an hour or so this week,
learning the following tool,

00:31:01.320 --> 00:31:04.500
you will save yourself
hours, plural, maybe even

00:31:04.500 --> 00:31:07.440
tens of hours over the
course of the next many weeks

00:31:07.440 --> 00:31:12.520
because this tool can help you truly
see what's going on inside of your code.

00:31:12.520 --> 00:31:15.870
So this tool we're going to add to
the list today is called debug50.

00:31:15.870 --> 00:31:20.130
And while this one does end with
50, implying that it's a CS50 tool,

00:31:20.130 --> 00:31:24.450
it's built on top of an industry
standard tool known as GDB, the GNU

00:31:24.450 --> 00:31:27.960
Debugger, that's a standard tool that
a lot of different computer systems

00:31:27.960 --> 00:31:32.520
use to provide you with the ability to
debug your code in a more sophisticated

00:31:32.520 --> 00:31:35.530
way than just using printf alone.

00:31:35.530 --> 00:31:36.780
So let's go ahead and do this.

00:31:36.780 --> 00:31:39.360
Let me go back to the buggy
version of this program

00:31:39.360 --> 00:31:43.620
which, recall, had me going from 0
through 10, which was too many steps.

00:31:43.620 --> 00:31:47.850
A moment ago, I proposed that we just
use printf to see the value of i.

00:31:47.850 --> 00:31:50.640
But frankly, the bigger our
programs get, the more complicated

00:31:50.640 --> 00:31:53.730
they get, the more output they
need to have on the screen.

00:31:53.730 --> 00:31:56.250
It's just going to
get very messy quickly

00:31:56.250 --> 00:31:58.800
if you're printing out stuff
that shouldn't be there, right?

00:31:58.800 --> 00:31:59.910
Think back to Mario.

00:31:59.910 --> 00:32:03.060
Mario's pyramid is this
sort of graphical output.

00:32:03.060 --> 00:32:07.860
And it would very quickly get ugly and
kind of hard to understand your pyramid

00:32:07.860 --> 00:32:11.520
if you're commingling that pyramid
with actual textual output from printf

00:32:11.520 --> 00:32:12.430
as well.

00:32:12.430 --> 00:32:16.560
So debug50, and in turn a
debugger in any language,

00:32:16.560 --> 00:32:20.580
is a tool that allows you to
run your code step by step

00:32:20.580 --> 00:32:26.550
and look inside of variables and other
pieces of memory inside of the computer

00:32:26.550 --> 00:32:28.080
while your program is running.

00:32:28.080 --> 00:32:31.800
Right now, pretty much every program
we run takes a split second to run.

00:32:31.800 --> 00:32:34.170
That's way too fast for me,
the human, to wrap my mind

00:32:34.170 --> 00:32:36.330
around what's going on step by step.

00:32:36.330 --> 00:32:38.550
A debugger allows you
to run your program,

00:32:38.550 --> 00:32:42.970
but much more slowly, step by step,
so you can see what's going on.

00:32:42.970 --> 00:32:48.030
So I'm going to go ahead
now and run debug50 ./hello.

00:32:48.030 --> 00:32:52.380
No, sorry, debug50 ./buggy0.

00:32:52.380 --> 00:32:54.900
So I write debug50
first, a space, and then

00:32:54.900 --> 00:32:56.910
dot slash and the name
of the program that's

00:32:56.910 --> 00:32:59.785
already compiled that I want to debug.

00:32:59.785 --> 00:33:01.410
So I'm going to go ahead and hit Enter.

00:33:01.410 --> 00:33:03.240
And notice that, oh, it was smart.

00:33:03.240 --> 00:33:05.100
It noticed that I changed my code.

00:33:05.100 --> 00:33:06.060
And I did a moment ago.

00:33:06.060 --> 00:33:07.740
I reverted it back to the buggy version.

00:33:07.740 --> 00:33:10.380
So let me fix this-- make buggy0.

00:33:10.380 --> 00:33:11.620
All right, no errors.

00:33:11.620 --> 00:33:13.500
Now let me go ahead
and run debug50 again.

00:33:13.500 --> 00:33:17.280
And if you haven't noticed this already,
sometimes I seem to type crazy fast.

00:33:17.280 --> 00:33:19.180
I'm not necessarily typing that fast.

00:33:19.180 --> 00:33:21.960
I'm going through my
history in CS50 IDE.

00:33:21.960 --> 00:33:25.470
Using your arrow keys, Up
and Down, you can scroll back

00:33:25.470 --> 00:33:29.070
in time for all of the commands
you've typed over the past few minutes

00:33:29.070 --> 00:33:30.430
or hours or even days.

00:33:30.430 --> 00:33:32.430
And this will just start
to save you keystrokes.

00:33:32.430 --> 00:33:33.870
So I'm going to go ahead and hit Up.

00:33:33.870 --> 00:33:36.495
And now I don't have to bother
typing this whole command again.

00:33:36.495 --> 00:33:38.320
It's a helpful way to just save time.

00:33:38.320 --> 00:33:40.800
I'm going to go ahead
now and hit Enter.

00:33:40.800 --> 00:33:43.650
And now notice this error message--

00:33:43.650 --> 00:33:45.050
I haven't set any breakpoints.

00:33:45.050 --> 00:33:48.300
"Set at least one breakpoint by clicking
to the left of a line number and then

00:33:48.300 --> 00:33:49.500
re-run debug50!"

00:33:49.500 --> 00:33:51.420
Well, what's going on here?

00:33:51.420 --> 00:33:55.620
Well, debug50 needs me to tell the
computer in advance at what line

00:33:55.620 --> 00:33:59.910
I want to break into and
step through step by step.

00:33:59.910 --> 00:34:01.020
So, I can do that.

00:34:01.020 --> 00:34:03.780
I'm going to go over to the side
of the file here, as it says.

00:34:03.780 --> 00:34:04.530
And you know what?

00:34:04.530 --> 00:34:08.460
The first interesting line
is this one here, line 6.

00:34:08.460 --> 00:34:12.060
So I clicked in the so-called gutter,
the left-hand side of the screen,

00:34:12.060 --> 00:34:13.170
on line 6.

00:34:13.170 --> 00:34:16.139
And that automatically put a
red dot there, like a stop sign.

00:34:16.139 --> 00:34:21.420
Now, one last time, I'm going to go
ahead and run debug50 ./buggy0 and hit

00:34:21.420 --> 00:34:21.960
Enter.

00:34:21.960 --> 00:34:25.887
And now notice this fancy new panel
opens up on the right-hand side.

00:34:25.887 --> 00:34:27.929
And it's going to look a
little cryptic at first.

00:34:27.929 --> 00:34:30.219
But let's consider what
has changed on the screen.

00:34:30.219 --> 00:34:34.440
Notice now that highlighted in this
sort of off-yellow color is line 6.

00:34:34.440 --> 00:34:37.949
And that's because what debug50 is
doing is it's running my program,

00:34:37.949 --> 00:34:41.610
but it has paused execution on line 6.

00:34:41.610 --> 00:34:44.100
So it's done everything
from line 1 through 5,

00:34:44.100 --> 00:34:46.860
but now it's waiting for me on line 6.

00:34:46.860 --> 00:34:49.620
And what's interesting
over here is this-- let

00:34:49.620 --> 00:34:51.929
me zoom in on this window over here.

00:34:51.929 --> 00:34:54.150
And there's a lot going
on here, admittedly.

00:34:54.150 --> 00:34:59.190
But let's focus for just a moment not
on Watch Expressions, not on Call Stack,

00:34:59.190 --> 00:35:00.850
but only on Local Variables.

00:35:00.850 --> 00:35:04.380
And notice, I have a variable
called i whose initial value is 0,

00:35:04.380 --> 00:35:05.820
and it's of type int.

00:35:05.820 --> 00:35:09.150
Now, this is kind of interesting because
watch what I can do via these icons

00:35:09.150 --> 00:35:09.930
up here.

00:35:09.930 --> 00:35:15.360
I can click on this Step Over line
and start to step through my code line

00:35:15.360 --> 00:35:16.007
by line.

00:35:16.007 --> 00:35:17.340
So let me go ahead and zoom out.

00:35:17.340 --> 00:35:18.870
Let me go ahead and click Step Over.

00:35:18.870 --> 00:35:21.180
And watch what happens to
the yellow highlighting.

00:35:21.180 --> 00:35:23.140
It moves down to the next line.

00:35:23.140 --> 00:35:27.090
But notice, if I zoom in again up
here, the value of i has not changed.

00:35:27.090 --> 00:35:29.460
Now let me go ahead and step over again.

00:35:29.460 --> 00:35:31.740
And notice the yellow
highlighting doubles back.

00:35:31.740 --> 00:35:33.790
That makes sense because I'm in a loop.

00:35:33.790 --> 00:35:36.760
So it should be going back
and forth, back and forth.

00:35:36.760 --> 00:35:38.123
But what next happens in a loop?

00:35:38.123 --> 00:35:40.290
Every time you go back to
the beginning of the loop,

00:35:40.290 --> 00:35:43.770
remember that your incrementation
happens, like the i++.

00:35:43.770 --> 00:35:46.530
So watch now closely in
the top right-hand corner,

00:35:46.530 --> 00:35:52.110
when I Step Over now, notice that
the value of i in my debugger

00:35:52.110 --> 00:35:54.058
has just been changed to 1.

00:35:54.058 --> 00:35:55.350
So I didn't have to use printf.

00:35:55.350 --> 00:35:57.400
I didn't have to mess up
the output of my screen.

00:35:57.400 --> 00:35:59.850
I can literally see in this
GUI, this Graphical User

00:35:59.850 --> 00:36:02.790
Interface on the right-hand
side, what the value of i is.

00:36:02.790 --> 00:36:05.310
Now if I just start clicking
a little more quickly,

00:36:05.310 --> 00:36:09.900
notice that as the loop is executing,
again and again, the value of i

00:36:09.900 --> 00:36:11.070
keeps getting updated.

00:36:11.070 --> 00:36:11.820
And you know what?

00:36:11.820 --> 00:36:15.930
I bet, even though we started
at 0, if I do this enough times,

00:36:15.930 --> 00:36:18.990
I will see that the
value is 10 now, thereby

00:36:18.990 --> 00:36:25.110
giving me another printf at the bottom,
thereby explaining the 11 total hashes

00:36:25.110 --> 00:36:25.950
that I saw.

00:36:25.950 --> 00:36:28.450
So I haven't gotten any
new information here.

00:36:28.450 --> 00:36:30.960
But notice I've gotten
unperturbed information.

00:36:30.960 --> 00:36:35.370
I've not messily and sloppily printed
out all of these printf statements

00:36:35.370 --> 00:36:36.100
on the screen.

00:36:36.100 --> 00:36:38.430
I'm just kind of watching
a little more methodically

00:36:38.430 --> 00:36:43.230
what's happening to the state of my
variable over on the top right there.

00:36:43.230 --> 00:36:47.700
All right, let me pause here too to
see if there's any questions on what

00:36:47.700 --> 00:36:49.230
this debugger does.

00:36:49.230 --> 00:36:51.150
Again, you compile your code.

00:36:51.150 --> 00:36:56.340
You run debug50 on your code, but only
after setting a so-called breakpoint,

00:36:56.340 --> 00:37:00.575
where you decide in advance where do you
want to pause execution of your code.

00:37:00.575 --> 00:37:03.450
Even though here I did it pretty
much at the beginning of my program,

00:37:03.450 --> 00:37:05.242
for bigger programs,
it's going to be super

00:37:05.242 --> 00:37:07.718
convenient to be able to pause
halfway through your code

00:37:07.718 --> 00:37:09.510
and not have to go
through the whole thing.

00:37:09.510 --> 00:37:11.430
Peter, question.

00:37:11.430 --> 00:37:16.350
PETER: About the debugger, what's
the difference between Step Over

00:37:16.350 --> 00:37:18.813
and Step Into and Step Out and--

00:37:18.813 --> 00:37:20.230
DAVID MALAN: Really good question.

00:37:20.230 --> 00:37:21.980
Let me come back to
that in just a moment,

00:37:21.980 --> 00:37:25.800
because we'll do one other example
where Step Into and Step Out actually

00:37:25.800 --> 00:37:27.520
are germane.

00:37:27.520 --> 00:37:28.490
But before we do that,

00:37:28.490 --> 00:37:33.520
any other questions about debug50
before we reveal what Step Into and Step

00:37:33.520 --> 00:37:35.335
Over do for us as well?

00:37:38.940 --> 00:37:39.910
Oh, all right.

00:37:39.910 --> 00:37:42.310
Well, let's take Peter's
question right there.

00:37:42.310 --> 00:37:44.705
Let me go ahead now and
get out of the debugger.

00:37:44.705 --> 00:37:46.830
And honestly, I don't see
an obvious way to get out

00:37:46.830 --> 00:37:48.490
of the debugger at the moment.

00:37:48.490 --> 00:37:51.240
But Control-C is your
new friend today too.

00:37:51.240 --> 00:37:53.700
Pretty much any time you
lose control of a program

00:37:53.700 --> 00:37:56.880
because the debugger's running,
and you've lost interest in it.

00:37:56.880 --> 00:37:58.770
Or maybe last week, you
wrote a program that

00:37:58.770 --> 00:38:01.800
has an infinite loop that just
keeps going and going and going,

00:38:01.800 --> 00:38:04.110
Control-C will break
out of that program.

00:38:04.110 --> 00:38:07.290
But let's now write quickly
another program that, this time,

00:38:07.290 --> 00:38:08.430
has a second function.

00:38:08.430 --> 00:38:10.800
And we'll see one other
feature of the debugger today.

00:38:10.800 --> 00:38:14.520
I'm going to go ahead and create
a new file now called buggy1.c.

00:38:14.520 --> 00:38:16.470
Again, it's going to
be deliberately flawed.

00:38:16.470 --> 00:38:20.280
But I'm first going to go
ahead and #include cs50.h this time.

00:38:20.280 --> 00:38:22.830
And I'm going to #include stdio.h.

00:38:22.830 --> 00:38:24.590
I'm going to do int main(void).

00:38:24.590 --> 00:38:27.090
And I'm going to go ahead and
do the following-- give myself

00:38:27.090 --> 00:38:28.380
a variable called i.

00:38:28.380 --> 00:38:31.290
And I'm going to try to get
a negative int by calling

00:38:31.290 --> 00:38:33.180
a function called get_negative_int.

00:38:33.180 --> 00:38:37.740
And then quite simply, I'm going to
print out this value, "%i backslash n",

00:38:37.740 --> 00:38:39.210
i, semicolon.

00:38:39.210 --> 00:38:40.860
Now, there's only one problem--

00:38:40.860 --> 00:38:43.210
get_negative_int does not exist.

00:38:43.210 --> 00:38:45.870
So like last week, where we
implemented get_positive_int,

00:38:45.870 --> 00:38:47.790
this week, I'll implement
get_negative_int.

00:38:47.790 --> 00:38:49.890
But I'm going to do it
incorrectly at first.

00:38:49.890 --> 00:38:54.300
Now, get_negative_int, as the name
implies, needs to return an integer.

00:38:54.300 --> 00:38:57.330
And even though we only spent
brief time on this last week,

00:38:57.330 --> 00:39:00.210
recall that you can specify
the output of a function,

00:39:00.210 --> 00:39:03.720
a custom function that you wrote,
by putting its so-called return

00:39:03.720 --> 00:39:05.555
value first on this line.

00:39:05.555 --> 00:39:08.430
And then you can put the name of
the function, like get_negative_int,

00:39:08.430 --> 00:39:11.940
and then in parentheses, you can
put the input to the function.

00:39:11.940 --> 00:39:15.030
But if it takes no input, you can
literally write the word "void,"

00:39:15.030 --> 00:39:17.965
which is a term of art that
just means, nothing goes here.

00:39:17.965 --> 00:39:20.340
I'm going to go ahead now and
implement get_negative_int.

00:39:20.340 --> 00:39:22.920
And frankly, I think it's going
to be pretty similar to last week.

00:39:22.920 --> 00:39:24.212
But my memory is a little hazy.

00:39:24.212 --> 00:39:26.310
So again, it will be
deliberately flawed.

00:39:26.310 --> 00:39:29.130
But I'm going to go ahead and
declare a variable called n.

00:39:29.130 --> 00:39:31.420
Then I'm going to do the following--

00:39:31.420 --> 00:39:34.170
I'm going to set n equal to get_int.

00:39:34.170 --> 00:39:39.000
And I'm just going to explicitly ask
the user for "Negative integer" followed

00:39:39.000 --> 00:39:39.900
by a space.

00:39:39.900 --> 00:39:44.220
And then I'm going to keep doing
this while n is less than 0.

00:39:44.220 --> 00:39:48.540
And then at the very last
line, I'm going to return n.

00:39:48.540 --> 00:39:51.120
So again, I claim that
this function will

00:39:51.120 --> 00:39:53.340
get me a negative int from the user.

00:39:53.340 --> 00:39:57.810
And it's going to keep doing it again
and again until the user cooperates.

00:39:57.810 --> 00:40:00.720
However, there is a bug.

00:40:00.720 --> 00:40:02.730
And there's a couple of bugs, in fact.

00:40:02.730 --> 00:40:06.720
Right now, let me go ahead and make
a deliberate mistake-- make buggy1,

00:40:06.720 --> 00:40:07.740
Enter.

00:40:07.740 --> 00:40:10.020
And I see a whole bunch of errors here.

00:40:10.020 --> 00:40:12.300
I could use help50 on this.

00:40:12.300 --> 00:40:16.290
But based on last week, does anyone
recall what the error here might be?

00:40:16.290 --> 00:40:20.100
"Error-- implicit declaration
of function 'get_negative_int'

00:40:20.100 --> 00:40:21.930
is invalid in C99."

00:40:21.930 --> 00:40:24.992
So I don't know all of that, but
implicit declaration of function

00:40:24.992 --> 00:40:26.700
is something you're
going to start to see

00:40:26.700 --> 00:40:28.770
more often if you make this mistake.

00:40:28.770 --> 00:40:35.030
Anyone recall what this means and what
the fix is without resorting to help50?

00:40:35.030 --> 00:40:37.760
Yeah, Jasmine, what do you think?

00:40:37.760 --> 00:40:40.370
JASMINE: So basically,
since you declared it

00:40:40.370 --> 00:40:42.830
after you already used
it in your code, it

00:40:42.830 --> 00:40:46.050
doesn't know what to read that
as when it's processing it.

00:40:46.050 --> 00:40:49.825
So you have to move the first line
above when you actually start the code.

00:40:49.825 --> 00:40:50.700
DAVID MALAN: Perfect.

00:40:50.700 --> 00:40:53.690
And this is the only time I
will claim that copy/paste

00:40:53.690 --> 00:40:55.730
is acceptable and encouraged.

00:40:55.730 --> 00:40:59.180
I'm going to copy the very first
line only of that function.

00:40:59.180 --> 00:41:02.840
And as Jasmine proposed, I'm going to
paste it at the very top of the file,

00:41:02.840 --> 00:41:05.990
thereby giving myself a hint
otherwise known as a prototype.

00:41:05.990 --> 00:41:09.290
So I'll even label it as such to
remind myself why it's there--

00:41:09.290 --> 00:41:11.720
prototype of that function.

00:41:11.720 --> 00:41:16.790
And here, I'm going to go ahead and
"Get negative integer from user."

00:41:16.790 --> 00:41:20.720
And then this function
is left as written.

00:41:20.720 --> 00:41:23.340
So I now have this
prototype at the very top

00:41:23.340 --> 00:41:25.840
of my file, which I think will
indeed get rid of this error.

00:41:25.840 --> 00:41:27.950
Let me go to make buggy1 again.

00:41:27.950 --> 00:41:29.960
Now I see that indeed compiled OK.

00:41:29.960 --> 00:41:33.110
But when I run it now, ./buggy1--

00:41:33.110 --> 00:41:36.270
let me go ahead and input a
negative integer, negative 1.

00:41:36.270 --> 00:41:36.770
Hm.

00:41:36.770 --> 00:41:38.685
Negative 2, negative 3--

00:41:38.685 --> 00:41:41.810
I feel like the function should be
happy with this, and it's obviously not.

00:41:41.810 --> 00:41:42.650
So there's a bug.

00:41:42.650 --> 00:41:45.470
I'm going to go ahead and hit
Control-C to get out of my program

00:41:45.470 --> 00:41:47.810
because otherwise, it would
run potentially forever.

00:41:47.810 --> 00:41:49.610
And now I'm going to use debug50.

00:41:49.610 --> 00:41:53.090
But debug50 just got really
interesting, to Peter's question

00:41:53.090 --> 00:41:56.180
earlier, because now I have
things I can step into.

00:41:56.180 --> 00:41:58.070
I'm not writing all of my code in main.

00:41:58.070 --> 00:42:00.570
There's this other function
now called get_negative_int.

00:42:00.570 --> 00:42:02.390
So let's see what happens now.

00:42:02.390 --> 00:42:05.930
Let me go ahead and set a breakpoint
on the first interesting line of code,

00:42:05.930 --> 00:42:06.532
line 10.

00:42:06.532 --> 00:42:08.990
And it's interesting only in
the sense that everything else

00:42:08.990 --> 00:42:10.670
is kind of boilerplate at this point.

00:42:10.670 --> 00:42:13.460
You just have to do it to
get your program started.

00:42:13.460 --> 00:42:15.020
I'm going to now go down here.

00:42:15.020 --> 00:42:18.770
And I'm going to do debug50 ./buggy1.

00:42:18.770 --> 00:42:22.220
And in a moment, it's going
to open up that sidebar.

00:42:22.220 --> 00:42:25.640
And I'm going to focus now
not only on local variables--

00:42:25.640 --> 00:42:29.810
like I did before, notice that i is
again equal to 0 here by default.

00:42:29.810 --> 00:42:33.560
But I'm also going to reveal
this option here, Call Stack.

00:42:33.560 --> 00:42:38.120
So Call Stack is a fancy way of
referring to all of the functions

00:42:38.120 --> 00:42:43.560
that your program at this point in time
has executed and not yet returned from.

00:42:43.560 --> 00:42:45.890
So right now, there's only
one thing on the call stack

00:42:45.890 --> 00:42:49.700
because the only function that is
currently executing is, of course,

00:42:49.700 --> 00:42:50.930
main, because why?

00:42:50.930 --> 00:42:55.040
I set a breakpoint at line 10, which
is, by definition, inside of main.

00:42:55.040 --> 00:42:59.720
But to Peter's question earlier,
I feel like lines 10 and 11--

00:42:59.720 --> 00:43:01.550
frankly, they look
pretty correct, right?

00:43:01.550 --> 00:43:03.470
It's hard at this point
to have screwed up

00:43:03.470 --> 00:43:07.400
lines 10 and 11 except syntactically,
because I'm getting a negative int.

00:43:07.400 --> 00:43:10.730
I'm storing it in i, and then
I'm printing out the value of i

00:43:10.730 --> 00:43:12.230
on those two lines.

00:43:12.230 --> 00:43:16.370
But what if instead, I'm
curious about get_negative_int?

00:43:16.370 --> 00:43:18.350
I feel like the bug--
logically, it's got

00:43:18.350 --> 00:43:21.170
to be in there because that's
the harder code that I wrote.

00:43:21.170 --> 00:43:24.530
Notice this time, instead
of clicking Step Over,

00:43:24.530 --> 00:43:28.640
let me go ahead and click on Step
Into, which is one of the buttons Peter

00:43:28.640 --> 00:43:29.240
alluded to.

00:43:29.240 --> 00:43:33.440
And when I click Step Into, notice that
you sort of go down the rabbit hole.

00:43:33.440 --> 00:43:38.460
And debug50 jumps into the
function get_negative_int,

00:43:38.460 --> 00:43:41.460
and it focuses on the first
interesting line of code.

00:43:41.460 --> 00:43:44.070
So do, in and of itself,
really isn't that interesting.

00:43:44.070 --> 00:43:46.160
Int n isn't that
interesting because it's not

00:43:46.160 --> 00:43:48.020
assigning a value to it even yet.

00:43:48.020 --> 00:43:50.930
The first juicy line of
code seems to be line 19.

00:43:50.930 --> 00:43:53.150
And that's why the debugger
has jumped to that line.

00:43:53.150 --> 00:43:57.350
Now, n = get_int feels pretty correct.

00:43:57.350 --> 00:43:59.300
It's hard to misuse get_int.

00:43:59.300 --> 00:44:02.420
But notice now on the right-hand
side what has happened.

00:44:02.420 --> 00:44:06.500
Under Call Stack, you now see
two things, not only main,

00:44:06.500 --> 00:44:08.930
but also get_negative_int in a stack.

00:44:08.930 --> 00:44:11.030
It's like a stack of
trays in a cafeteria.

00:44:11.030 --> 00:44:13.250
The first tray at the
bottom is like main.

00:44:13.250 --> 00:44:17.750
The second tray on the stack in the
cafeteria is now get_negative_int.

00:44:17.750 --> 00:44:21.680
And what's cool about this is
that notice that right now, I

00:44:21.680 --> 00:44:23.630
can see my local variables, n.

00:44:23.630 --> 00:44:25.380
And that's indeed the variable I used.

00:44:25.380 --> 00:44:26.750
So I no longer see i.

00:44:26.750 --> 00:44:30.780
I see n because I'm now in the
get_negative_int function.

00:44:30.780 --> 00:44:35.030
And now if I keep clicking
Step Over again and again

00:44:35.030 --> 00:44:36.140
after typing in a number.

00:44:36.140 --> 00:44:38.210
Let me type in negative 1 here.

00:44:38.210 --> 00:44:41.540
Now notice on the top right of the
screen, you can see in the debugger

00:44:41.540 --> 00:44:43.280
that n equals negative 1.

00:44:43.280 --> 00:44:45.830
I'm going to now go ahead
and click Step Over.

00:44:45.830 --> 00:44:48.680
And I think I'm going
to end up in line 22.

00:44:48.680 --> 00:44:51.920
If the human has typed in a
negative integer like negative 1,

00:44:51.920 --> 00:44:53.480
obviously, that's negative.

00:44:53.480 --> 00:44:55.160
Let's proceed to line 22.

00:44:55.160 --> 00:44:58.310
But watch what happens
when I click Step Over.

00:44:58.310 --> 00:45:03.740
It actually seems to be going back
to the do loop again and again

00:45:03.740 --> 00:45:06.750
and again, as it will if I keep
providing negative integers.

00:45:06.750 --> 00:45:10.670
So my logic then should be,
well, OK, if n is negative 1,

00:45:10.670 --> 00:45:17.030
but my loop is still running, what
should your logical takeaway here be?

00:45:17.030 --> 00:45:20.710
If n is negative 1, and that is
by definition a negative integer,

00:45:20.710 --> 00:45:25.720
but my loop is still running, what
could be your diagnostic conclusion

00:45:25.720 --> 00:45:29.860
if the debugger is essentially revealing
this hint to you? n is negative 1,

00:45:29.860 --> 00:45:31.420
but the loop is still going.

00:45:31.420 --> 00:45:33.730
Omar, what would you conclude?

00:45:33.730 --> 00:45:36.850
OMAR: Either the condition is wrong,
or maybe some sort of Boolean logic

00:45:36.850 --> 00:45:37.755
could be flawed.

00:45:37.755 --> 00:45:38.630
DAVID MALAN: Perfect.

00:45:38.630 --> 00:45:40.463
So obviously, either
the condition is wrong,

00:45:40.463 --> 00:45:42.505
or there's something wrong
with my Boolean logic.

00:45:42.505 --> 00:45:44.540
And Boolean logic just
refers to true or false.

00:45:44.540 --> 00:45:46.930
So somewhere, I'm saying
true instead of false,

00:45:46.930 --> 00:45:48.850
or I'm saying false instead of true.

00:45:48.850 --> 00:45:52.060
And frankly, the only
place where I have code

00:45:52.060 --> 00:45:56.050
that's going to make this loop go again
and again must logically be on line 21.

00:45:56.050 --> 00:45:59.350
So even if you're not quite sure how
to fix it yet, just by deduction,

00:45:59.350 --> 00:46:02.215
you should realize that, OK,
negative 1 is what's in the variable.

00:46:02.215 --> 00:46:03.340
But that's not good enough.

00:46:03.340 --> 00:46:04.340
The loop is still going.

00:46:04.340 --> 00:46:05.680
I must have screwed up the loop.

00:46:05.680 --> 00:46:08.080
And indeed, let me just now call it out.

00:46:08.080 --> 00:46:11.290
Line 21 is indeed the source of the bug.

00:46:11.290 --> 00:46:12.520
So we've isolated it.

00:46:12.520 --> 00:46:15.160
Out of 23 lines, we've at
least found the one line

00:46:15.160 --> 00:46:18.520
where I know the solution has to be.

00:46:18.520 --> 00:46:19.610
What's the solution?

00:46:19.610 --> 00:46:26.020
How do I fix the logic now thanks to the
debugger having led me down this road?

00:46:26.020 --> 00:46:29.230
How do I fix line 21 here?

00:46:29.230 --> 00:46:31.350
What's the fix?

00:46:31.350 --> 00:46:33.960
What do you propose?

00:46:33.960 --> 00:46:35.220
Yeah, Jacob?

00:46:35.220 --> 00:46:38.700
JACOB: You would have to change
it from while n is less than 0

00:46:38.700 --> 00:46:40.345
to while n is greater than 0.

00:46:40.345 --> 00:46:41.220
DAVID MALAN: Exactly.

00:46:41.220 --> 00:46:44.640
So instead of n less than 0, I
want to say n greater than 0.

00:46:44.640 --> 00:46:46.860
And I think-- slight
clarification, I think

00:46:46.860 --> 00:46:50.328
I want to include 0 here
because 0 is not negative.

00:46:50.328 --> 00:46:52.620
And if I want a negative int,
I think what I'm probably

00:46:52.620 --> 00:46:56.070
going to want to say is while n
is greater than or equal to 0,

00:46:56.070 --> 00:46:57.120
keep doing the loop.

00:46:57.120 --> 00:46:59.970
So I very understandably sort
of just inverted the logic.

00:46:59.970 --> 00:47:00.490
No big deal.

00:47:00.490 --> 00:47:02.323
I'm thinking negatives,
and I did less than.

00:47:02.323 --> 00:47:03.670
But the fix is easy.

00:47:03.670 --> 00:47:06.300
The point is the debugger
led you to this point.

00:47:06.300 --> 00:47:08.730
Now, those of you who have
programmed before probably

00:47:08.730 --> 00:47:10.290
saw the bug jumping out at you.

00:47:10.290 --> 00:47:13.123
Those of you who haven't programmed
before, probably with some time,

00:47:13.123 --> 00:47:15.940
would have figured out what the
bug was, because out of 23 lines,

00:47:15.940 --> 00:47:17.580
it's got to be one of those.

00:47:17.580 --> 00:47:19.830
But as our programs
get more sophisticated,

00:47:19.830 --> 00:47:25.020
and we start writing more lines of
code, debug50 and debuggers in general

00:47:25.020 --> 00:47:26.020
will be your friend.

00:47:26.020 --> 00:47:29.865
And I realize that this is easier
said than done because at first,

00:47:29.865 --> 00:47:32.240
when using a debugger, you're
going to feel like, ah, I'm

00:47:32.240 --> 00:47:33.282
just going to use printf.

00:47:33.282 --> 00:47:35.700
Ah, I'm just going to
fight through this.

00:47:35.700 --> 00:47:37.500
Because there's a bit
of a learning curve,

00:47:37.500 --> 00:47:41.640
you will gain back that
time and more by just

00:47:41.640 --> 00:47:47.490
using a debugger as your first instinct
when chasing down problems like this.

00:47:47.490 --> 00:47:51.660
All right, so that's it for debug50,
a new tool in your toolkit in addition

00:47:51.660 --> 00:47:52.800
to printf.

00:47:52.800 --> 00:47:55.730
But debug50 is hands down
the more powerful of the two.

00:47:55.730 --> 00:47:58.230
Now, some of you have wondered
over the past couple of weeks

00:47:58.230 --> 00:48:00.207
why there's this little
rubber duck here.

00:48:00.207 --> 00:48:02.040
And there actually is
a reason for this too.

00:48:02.040 --> 00:48:05.280
And there's one final debugging
technique that, in all seriousness,

00:48:05.280 --> 00:48:08.390
we'll introduce you to today,
known as rubber duck debugging.

00:48:08.390 --> 00:48:09.390
And you can google this.

00:48:09.390 --> 00:48:11.288
There's a whole Wikipedia
article about it.

00:48:11.288 --> 00:48:14.580
And this is kind of a thing in computer
science circles for computer scientists

00:48:14.580 --> 00:48:16.860
or programmers to have
rubber ducks on their desk.

00:48:16.860 --> 00:48:19.290
And the point here is
that sometimes, when

00:48:19.290 --> 00:48:22.710
trying to understand what
is wrong in your code,

00:48:22.710 --> 00:48:24.420
it helps to just talk it through.

00:48:24.420 --> 00:48:28.620
And in an ideal world, we would just
talk to our colleague or our partner

00:48:28.620 --> 00:48:29.610
on some project.

00:48:29.610 --> 00:48:33.060
And just in hearing yourself
vocalize what it is your code

00:48:33.060 --> 00:48:36.810
is supposed to do, very often, that
proverbial light bulb goes off.

00:48:36.810 --> 00:48:39.330
And you're like, oh, wait a
minute, never mind, I got it,

00:48:39.330 --> 00:48:42.600
just because you heard yourself
speaking illogically when

00:48:42.600 --> 00:48:44.910
you intended something actually logical.

00:48:44.910 --> 00:48:49.410
Now, we don't often all have colleagues
or partners or friends with whom we're

00:48:49.410 --> 00:48:50.698
working on a project.

00:48:50.698 --> 00:48:52.740
And we don't often have
family members or friends

00:48:52.740 --> 00:48:55.270
who want to hear about
our code of all things.

00:48:55.270 --> 00:48:58.590
And so a wonderful proxy
for that conversant partner

00:48:58.590 --> 00:49:00.300
would be literally a rubber duck.

00:49:00.300 --> 00:49:03.900
And so here in healthier times, we
would be giving all of you rubber ducks.

00:49:03.900 --> 00:49:07.080
Here on stage, we brought a
larger one for us all to share.

00:49:07.080 --> 00:49:09.900
If you've noticed in some
of the wide shots on camera,

00:49:09.900 --> 00:49:12.100
there's a duck who's been
watching this whole time.

00:49:12.100 --> 00:49:13.975
So that any time I screw
up, I literally have

00:49:13.975 --> 00:49:17.850
someone I can sort of talk
to nonverbally, in this case.

00:49:17.850 --> 00:49:20.880
But we can't emphasize enough that
in addition to printf, in addition to

00:49:20.880 --> 00:49:25.230
the more sophisticated debug50,
talking through your problems with code

00:49:25.230 --> 00:49:26.940
is a wonderfully valuable thing.

00:49:26.940 --> 00:49:29.010
And if your friends or
family are willing to hear

00:49:29.010 --> 00:49:31.650
about some low-level code you're
writing and some bug you're

00:49:31.650 --> 00:49:33.000
trying to solve, great.

00:49:33.000 --> 00:49:36.210
But in the absence of that, talk
to a stuffed animal in your room.

00:49:36.210 --> 00:49:38.400
Talk to an actual rubber
duck if you have one.

00:49:38.400 --> 00:49:39.960
Talk even aloud or think aloud.

00:49:39.960 --> 00:49:42.120
It's just a wonderful
compelling habit to get

00:49:42.120 --> 00:49:46.440
into because just in hearing yourself
vocalize what you think is logical

00:49:46.440 --> 00:49:51.750
the illogical will very often
jump out at you instead.

00:49:51.750 --> 00:49:55.620
All right, so with that
said, that's been a lot.

00:49:55.620 --> 00:49:57.690
Let's go ahead here and
take a five-minute break,

00:49:57.690 --> 00:49:59.130
give everyone a bit of a breather.

00:49:59.130 --> 00:50:01.050
And when we come back,
we'll take a look now

00:50:01.050 --> 00:50:02.880
at some of the more
powerful features of C

00:50:02.880 --> 00:50:05.850
now that we can trust that we
can solve any problems with all

00:50:05.850 --> 00:50:06.730
of these new tools.

00:50:06.730 --> 00:50:08.700
So we'll be back in five.

00:50:08.700 --> 00:50:10.440
All right, we are back.

00:50:10.440 --> 00:50:13.320
So let's take a look underneath
the hood, so to speak,

00:50:13.320 --> 00:50:15.480
of a computer, because
as fancy as these devices

00:50:15.480 --> 00:50:17.460
are and as powerful
as they seem, they're

00:50:17.460 --> 00:50:21.630
relatively simple in their capabilities
and what they can actually do.

00:50:21.630 --> 00:50:24.570
And let's reveal as much by way
of last week's discussion of types.

00:50:24.570 --> 00:50:27.700
So recall that C supports
different data types.

00:50:27.700 --> 00:50:31.060
So we saw char, and string,
and int, and so forth.

00:50:31.060 --> 00:50:32.730
So to recap, we had all of these.

00:50:32.730 --> 00:50:35.310
Well, it turns out that
each of these data types

00:50:35.310 --> 00:50:40.800
is defined on a typical computer system
as taking up a fixed amount of space.

00:50:40.800 --> 00:50:44.280
And it depends on the computer,
whether it's Mac or PC, or old or new,

00:50:44.280 --> 00:50:47.400
just how much space is used
typically by these data types.

00:50:47.400 --> 00:50:51.400
But on CS50 IDE, the sizes of all
of these types are as follows--

00:50:51.400 --> 00:50:54.510
a bool, true or false, uses just 1 byte.

00:50:54.510 --> 00:50:58.320
Now, that's actually a little wasteful
because 1 byte is 8 bits, and gosh,

00:50:58.320 --> 00:51:00.090
for a bool, you should only need 1 bit.

00:51:00.090 --> 00:51:04.200
You can't work at the
single-bit level easily in C.

00:51:04.200 --> 00:51:07.440
And so we just typically
spend 1 whole byte on a bool.

00:51:07.440 --> 00:51:09.640
Char is going to be 1 byte as well.

00:51:09.640 --> 00:51:13.110
And that might sound familiar, because
last week when we talked about ASCII,

00:51:13.110 --> 00:51:15.450
we proposed that the total
number of possible characters

00:51:15.450 --> 00:51:20.190
you can represent with a char
was 256 because of 8 bits and 2

00:51:20.190 --> 00:51:21.340
to the eighth power.

00:51:21.340 --> 00:51:23.620
So one char is 1 byte.

00:51:23.620 --> 00:51:25.383
And that's fixed in C, no matter what.

00:51:25.383 --> 00:51:27.300
Then there were all of
these other data types.

00:51:27.300 --> 00:51:29.910
There was float, which is a real
number with a decimal point.

00:51:29.910 --> 00:51:31.410
That happens to use 4 bytes.

00:51:31.410 --> 00:51:34.000
A double is also a real
number with a decimal point,

00:51:34.000 --> 00:51:36.820
but it uses 8 bytes, which
gives you even more precision.

00:51:36.820 --> 00:51:40.450
You can have more significant digits
after the decimal point, for instance.

00:51:40.450 --> 00:51:41.830
Ints, we've used a bunch.

00:51:41.830 --> 00:51:43.540
Those are 4 bytes, typically.

00:51:43.540 --> 00:51:45.490
A long is twice as big,
and that just allows

00:51:45.490 --> 00:51:47.115
you to represent an even bigger number.

00:51:47.115 --> 00:51:49.780
And some of you might have done
that exactly on credit when

00:51:49.780 --> 00:51:51.400
storing a whole credit card number.

00:51:51.400 --> 00:51:54.310
Strings, for now, are a
variable number of bytes.

00:51:54.310 --> 00:51:57.780
It could be a short string of text, a
long string of text, a whole paragraph.

00:51:57.780 --> 00:51:58.780
So that's going to vary.

00:51:58.780 --> 00:52:01.870
So we'll come back to this
notion of string next time.

00:52:01.870 --> 00:52:05.380
But today, focus on just these
primitive types, if you will.

00:52:05.380 --> 00:52:08.960
And here is a picture of what
is inside of your computer.

00:52:08.960 --> 00:52:12.285
So this is a piece of memory
or RAM, Random Access Memory.

00:52:12.285 --> 00:52:13.660
And it might be a little smaller.

00:52:13.660 --> 00:52:15.820
It might be a little bigger
depending on whether it's a laptop,

00:52:15.820 --> 00:52:17.450
or desktop, or phone, or the like.

00:52:17.450 --> 00:52:20.710
But it's in memory,
or RAM, that programs

00:52:20.710 --> 00:52:22.780
are stored while they're running.

00:52:22.780 --> 00:52:25.840
And it's where files are
stored when they are open.

00:52:25.840 --> 00:52:29.440
So typically, if you
install programs or save files,

00:52:29.440 --> 00:52:32.740
those are saved on what's generally
called your hard drive, or hard disk,

00:52:32.740 --> 00:52:37.090
or solid-state disk, or CD,
or some other physical medium.

00:52:37.090 --> 00:52:40.450
And that, the [INAUDIBLE] of which is
that they don't require electricity

00:52:40.450 --> 00:52:42.160
to store your data long term.

00:52:42.160 --> 00:52:43.180
RAM is different.

00:52:43.180 --> 00:52:44.830
It's volatile, so to speak.

00:52:44.830 --> 00:52:48.500
But it's much faster than a hard
disk or a solid-state disk, even.

00:52:48.500 --> 00:52:50.500
It's much faster because
it's purely electronic.

00:52:50.500 --> 00:52:52.300
And indeed, there are no moving parts.

00:52:52.300 --> 00:52:54.640
It's purely electronic,
as pictured here.

00:52:54.640 --> 00:52:59.410
And so with RAM, you have the ability
to open files and run programs

00:52:59.410 --> 00:53:02.290
more quickly because when you
double-click a program to run it,

00:53:02.290 --> 00:53:04.960
or you open a file in
order to view or edit it,

00:53:04.960 --> 00:53:06.820
it's stored temporarily in RAM.

00:53:06.820 --> 00:53:11.290
And long story short, if your
laptop battery has ever died,

00:53:11.290 --> 00:53:13.990
or your computer's gotten
unplugged, or your phone dies,

00:53:13.990 --> 00:53:17.770
the reason that you and I tend to
lose data, the paragraph that you just

00:53:17.770 --> 00:53:19.960
wrote in the essay that
you hadn't yet saved,

00:53:19.960 --> 00:53:23.020
is because RAM, memory, is volatile.

00:53:23.020 --> 00:53:26.360
That is, it requires electricity
to continue powering it.

00:53:26.360 --> 00:53:30.100
But for our purposes, we're
only going to focus on RAM,

00:53:30.100 --> 00:53:33.340
not so much long-term disk
space yet, because when

00:53:33.340 --> 00:53:36.820
you're running a program in C,
it is indeed, by definition,

00:53:36.820 --> 00:53:38.680
running in your computer's memory.

00:53:38.680 --> 00:53:41.530
But the funny thing about
something as simple as this picture

00:53:41.530 --> 00:53:44.290
is that each of these black
rectangles is kind of a chip.

00:53:44.290 --> 00:53:47.710
And in those chips are stored all
of the 0's and 1's, the little

00:53:47.710 --> 00:53:49.690
switches that we alluded to in week 0.

00:53:49.690 --> 00:53:53.410
So let's focus on and just zoom
in on just one of these chips.

00:53:53.410 --> 00:53:57.130
Now, it stands to reason that I don't
know how big this stick of RAM is.

00:53:57.130 --> 00:53:59.440
Maybe it's 1 gigabyte, a billion bytes.

00:53:59.440 --> 00:54:01.000
Maybe it's 4 gigabytes.

00:54:01.000 --> 00:54:02.800
Maybe it's even smaller or bigger.

00:54:02.800 --> 00:54:07.250
There's some number of bytes
represented physically by this hardware.

00:54:07.250 --> 00:54:10.228
So if we zoom in further, let
me propose that, all right,

00:54:10.228 --> 00:54:11.770
I don't know how many bytes are here.

00:54:11.770 --> 00:54:15.130
But if there's some number of bytes,
whether it's a billion or 2 billion,

00:54:15.130 --> 00:54:17.440
or fewer or more, it
stands to reason that we

00:54:17.440 --> 00:54:19.240
could just number all of these bytes.

00:54:19.240 --> 00:54:22.670
We could sort of think of this
physical device, this memory,

00:54:22.670 --> 00:54:24.970
as just being a grid, top
to bottom, left to right.

00:54:24.970 --> 00:54:28.270
And each of the squares I've just
overlaid on this physical device

00:54:28.270 --> 00:54:30.057
might represent an individual byte.

00:54:30.057 --> 00:54:32.140
And again, in reality,
maybe there's more of them.

00:54:32.140 --> 00:54:33.410
Maybe there's fewer of them.

00:54:33.410 --> 00:54:35.920
But it stands to reason, no
matter how many there are,

00:54:35.920 --> 00:54:38.650
we can think of each of
these as having a location.

00:54:38.650 --> 00:54:42.200
Like, this is the first byte, second
byte, third byte, and so forth.

00:54:42.200 --> 00:54:45.940
Well, what does it mean, then,
for a char to take up 1 byte?

00:54:45.940 --> 00:54:49.750
That means that if your computer's
memory is running a program maybe

00:54:49.750 --> 00:54:53.890
that you wrote or I wrote that's
using a char variable somewhere in it,

00:54:53.890 --> 00:54:56.500
the char you're storing in
that variable may very well

00:54:56.500 --> 00:55:00.940
be stored in the top left-hand corner
physically of this piece of RAM.

00:55:00.940 --> 00:55:01.660
Maybe it's there.

00:55:01.660 --> 00:55:02.535
Maybe it's elsewhere.

00:55:02.535 --> 00:55:04.840
But it's just one physical square.

00:55:04.840 --> 00:55:08.110
If you're storing something like
an int, which takes up 4 bytes,

00:55:08.110 --> 00:55:11.830
well, that frankly might take up
all four squares along the top there

00:55:11.830 --> 00:55:12.670
or somewhere else.

00:55:12.670 --> 00:55:15.610
If you're using a long, that's going
to take up twice as much space.

00:55:15.610 --> 00:55:18.250
So representing an even bigger
number in your computer's memory

00:55:18.250 --> 00:55:21.340
is going to require that you
use all of the 0's and 1's

00:55:21.340 --> 00:55:25.030
comprising these 8 bytes instead.

00:55:25.030 --> 00:55:27.170
But let's now move away
from physical hardware.

00:55:27.170 --> 00:55:30.530
Let's abstract it away, if you will, and
just now start to think of our memory

00:55:30.530 --> 00:55:31.390
as just this grid.

00:55:31.390 --> 00:55:33.280
And technically, it's not a
two-dimensional structure.

00:55:33.280 --> 00:55:35.860
I could just as easily draw all
of these bytes from left to right.

00:55:35.860 --> 00:55:37.790
I could just fit fewer
of them on the screen.

00:55:37.790 --> 00:55:39.832
So we'll take the physical
metaphor a bit further

00:55:39.832 --> 00:55:44.020
and just think of our computer's memory
as this grid, this grid of bytes.

00:55:44.020 --> 00:55:46.510
And those bytes are each 8 bits.

00:55:46.510 --> 00:55:48.340
Those bits are just 0's and 1's.

00:55:48.340 --> 00:55:52.840
So what we've really done is zoom in
metaphorically on our computer's memory

00:55:52.840 --> 00:55:57.460
to start thinking about where things
are going to end up in memory when you

00:55:57.460 --> 00:56:01.030
double-click on a program on
your Mac or PC or, in CS50 IDE,

00:56:01.030 --> 00:56:04.720
when you do ./hello or
./buggy0 or ./buggy1,

00:56:04.720 --> 00:56:07.990
it's these bytes in your computer's
memory that are filled with all

00:56:07.990 --> 00:56:09.440
of your variables' values.

00:56:09.440 --> 00:56:10.940
So let's consider an example here.

00:56:10.940 --> 00:56:14.800
Suppose I had written some code that
involved declaring three scores.

00:56:14.800 --> 00:56:17.680
Maybe it's a class that's
got, like, three tests.

00:56:17.680 --> 00:56:23.343
And you want to average the student's
grade across all three of those tests.

00:56:23.343 --> 00:56:26.260
Well, let's go ahead and write a
quick program that does exactly this.

00:56:26.260 --> 00:56:30.640
In CS50 IDE, I'm going to create
a program called scores.c.

00:56:30.640 --> 00:56:35.920
And in scores.c, I'm going to
go ahead and #include stdio.h.

00:56:35.920 --> 00:56:38.703
I'm going to then do my
int main(void) as usual.

00:56:38.703 --> 00:56:41.120
And then inside of here, I'm
going to keep it very simple.

00:56:41.120 --> 00:56:43.750
I'm going to give myself
one int called score1.

00:56:43.750 --> 00:56:46.180
And just to be a little
playful, I'm going

00:56:46.180 --> 00:56:48.242
to set it equal to 72, like last week.

00:56:48.242 --> 00:56:50.200
I'm going to give myself
a second score and set

00:56:50.200 --> 00:56:55.120
it equal to 73, and then a third
score whose value is going to be 33.

00:56:55.120 --> 00:56:59.770
And then let me go ahead and print
out the average of those three values

00:56:59.770 --> 00:57:03.100
by plugging in a placeholder
for floating point value, right?

00:57:03.100 --> 00:57:07.330
If you add three integers
together and divide them by 3,

00:57:07.330 --> 00:57:10.730
I may very well get a fraction or
a real number with a decimal point.

00:57:10.730 --> 00:57:14.110
So I'm going to use %f instead of
%i because I don't want to truncate

00:57:14.110 --> 00:57:15.010
someone's grade.

00:57:15.010 --> 00:57:19.210
Otherwise, if they have, like, a 99.9%,
they're not being rounded up to 100%.

00:57:19.210 --> 00:57:22.880
They're going to get the 99% because of
truncation, as we discussed last week.

00:57:22.880 --> 00:57:25.030
So how do I do now the
math of an average?

00:57:25.030 --> 00:57:27.220
Well, it's pretty
straightforward-- score1

00:57:27.220 --> 00:57:30.160
plus score2 plus score3
in parentheses, just

00:57:30.160 --> 00:57:33.610
like in math, divided by 3, semicolon.

00:57:33.610 --> 00:57:35.000
Let me save that file.

00:57:35.000 --> 00:57:36.655
Let me do make scores at the bottom.

00:57:36.655 --> 00:57:38.530
Again, we're not going
to use Clang manually.

00:57:38.530 --> 00:57:41.230
No need to, because it's
a lot easier to run make.

00:57:41.230 --> 00:57:42.670
But I did mess up here.

00:57:42.670 --> 00:57:46.420
"Format specifies type 'double',
but the argument has type 'int'."

00:57:46.420 --> 00:57:48.500
So I don't quite understand that.

00:57:48.500 --> 00:57:52.540
But it's drawing my attention to the
%f and the fact that my math looks like

00:57:52.540 --> 00:57:53.930
this.

00:57:53.930 --> 00:57:56.110
So any thoughts here?

00:57:56.110 --> 00:57:59.650
I don't think printf is going to
help me here because the bug is

00:57:59.650 --> 00:58:01.810
within the printf line.

00:58:01.810 --> 00:58:06.160
I don't think that debug50 is going to
really help me here because I already

00:58:06.160 --> 00:58:09.130
know what line of code the bug is in.

00:58:09.130 --> 00:58:13.570
This feels like an opportunity
to talk to the physical duck

00:58:13.570 --> 00:58:15.280
or some other inanimate object.

00:58:15.280 --> 00:58:21.070
Or we can perhaps think about what
errors we ran into even last week.

00:58:21.070 --> 00:58:23.410
[? Arpan, ?] what do you think?

00:58:23.410 --> 00:58:27.520
[? ARPAN: ?] I think it's
telling you this because it's

00:58:27.520 --> 00:58:31.390
receiving in all the
values are integer type,

00:58:31.390 --> 00:58:33.557
but you are telling it to be in float.

00:58:33.557 --> 00:58:34.390
DAVID MALAN: Indeed.

00:58:34.390 --> 00:58:37.450
So score1, score2,
score3 are all integers,

00:58:37.450 --> 00:58:40.570
and the number 3 is
literally an integer.

00:58:40.570 --> 00:58:43.720
And so this time, the compiler is
smart enough to realize, wait a minute,

00:58:43.720 --> 00:58:48.850
you're trying to coerce an integer
result into a floating point value,

00:58:48.850 --> 00:58:51.592
but you haven't done any floating
point arithmetic, if you will.

00:58:51.592 --> 00:58:52.300
So you know what?

00:58:52.300 --> 00:58:53.592
There's a few ways to fix this.

00:58:53.592 --> 00:58:56.680
Last week, recall we proposed
that you could use a cast,

00:58:56.680 --> 00:59:00.880
and you could explicitly cast one
or more of those values to a float.

00:59:00.880 --> 00:59:02.830
So I could do this, for instance.

00:59:02.830 --> 00:59:06.728
Or I could cast all of these to
floats or one of these to floats.

00:59:06.728 --> 00:59:08.270
There's many different possibilities.

00:59:08.270 --> 00:59:12.250
But frankly, the simplest fix is
just to divide, for instance, by 3.0.

00:59:12.250 --> 00:59:16.263
I can avoid some of the headaches of
casting from one to another by just

00:59:16.263 --> 00:59:18.430
making sure that there's
at least one floating point

00:59:18.430 --> 00:59:20.680
value involved in this arithmetic.

00:59:20.680 --> 00:59:23.020
So now let me recompile scores.

00:59:23.020 --> 00:59:24.460
This time, it compiles OK.

00:59:24.460 --> 00:59:31.810
Let me do ./scores, and voila, my
average isn't so high, 59.333333.

00:59:31.810 --> 00:59:34.750
All right, so what is
actually going on inside

00:59:34.750 --> 00:59:38.980
of the computer irrespective of the
floating point arithmetic, which was,

00:59:38.980 --> 00:59:40.760
again, a topic of last week?

00:59:40.760 --> 00:59:44.470
Well, let's consider these three
variables, score1, score2, score3--

00:59:44.470 --> 00:59:47.500
where are they actually being
stored in the computer's memory?

00:59:47.500 --> 00:59:49.120
Well, let's consider that grid again.

00:59:49.120 --> 00:59:51.550
And again, I'm going to start
at top left for convenience.

00:59:51.550 --> 00:59:53.925
But technically speaking--
we'll see this down the road--

00:59:53.925 --> 00:59:56.410
your computer's memory is
just like this big canvas.

00:59:56.410 --> 00:59:59.150
And values can end up
in all different places.

00:59:59.150 --> 01:00:00.700
But for today, we'll keep it clean.

01:00:00.700 --> 01:00:03.830
The first variable, score1,
I claim is going to be here,

01:00:03.830 --> 01:00:05.420
top left, for simplicity.

01:00:05.420 --> 01:00:08.290
But what's important
about where score1--

01:00:08.290 --> 01:00:13.120
that is, 72-- is being stored, is
it's taking up four of these boxes.

01:00:13.120 --> 01:00:15.670
Each of these boxes,
recall, represents 1 byte.

01:00:15.670 --> 01:00:19.600
And an integer, recall,
in CS50 IDE is 4 bytes.

01:00:19.600 --> 01:00:24.250
Therefore, I have used 4 bytes of
space to represent the number 72.

01:00:24.250 --> 01:00:27.430
The number 73 in score2
similarly is going

01:00:27.430 --> 01:00:32.150
to take up four boxes, as is score3
going to take up four boxes as well.

01:00:32.150 --> 01:00:34.750
But what's really going on
underneath the hood here?

01:00:34.750 --> 01:00:37.450
Well, if each of these
squares represents a byte,

01:00:37.450 --> 01:00:42.070
and each of those bytes is 8
bits, and a bit is just a 0 or 1,

01:00:42.070 --> 01:00:45.130
what's really going on underneath
the hood is something like this.

01:00:45.130 --> 01:00:47.980
Somehow, this electronic
memory is storing

01:00:47.980 --> 01:00:50.920
electricity in just the right
way so that it's storing

01:00:50.920 --> 01:00:53.650
this pattern of 0's and 1's, a.k.a.

01:00:53.650 --> 01:00:57.370
72 in decimal, this pattern
of 0's and 1's, a.k.a.

01:00:57.370 --> 01:01:01.180
73 in decimal, this pattern
of 0's and 1's, a.k.a.

01:01:01.180 --> 01:01:02.682
33 in decimal.

01:01:02.682 --> 01:01:05.140
But again, we don't have to
keep thinking about or dwelling

01:01:05.140 --> 01:01:06.190
on the binary level.

01:01:06.190 --> 01:01:09.100
But this is only to say that
everything we've discussed thus far

01:01:09.100 --> 01:01:11.560
is coming together now
in this one picture

01:01:11.560 --> 01:01:14.320
because a computer is just
storing these patterns for us,

01:01:14.320 --> 01:01:16.930
and we are allocating space
now thanks to our programming

01:01:16.930 --> 01:01:20.050
language via code like this.

01:01:20.050 --> 01:01:26.770
But this code, correct though it may
be, indeed 59.333333 and so forth

01:01:26.770 --> 01:01:31.240
was my average if my test
scores were 72, 73, and 33.

01:01:31.240 --> 01:01:34.330
But I feel like there's an
opportunity for better design here.

01:01:34.330 --> 01:01:37.570
So not just correctness, not
just style, recall that design

01:01:37.570 --> 01:01:40.000
is this other metric of code quality.

01:01:40.000 --> 01:01:42.100
And it's a little more
subjective, and it's

01:01:42.100 --> 01:01:45.490
a little more subject to
debate among reasonable people.

01:01:45.490 --> 01:01:50.258
But I don't really love what I
was doing with this naming scheme.

01:01:50.258 --> 01:01:52.300
And in fact, if we look
at the code, there really

01:01:52.300 --> 01:01:55.240
wasn't much more to my program
than these three lines.

01:01:55.240 --> 01:01:58.840
I worry this program isn't
particularly well designed.

01:01:58.840 --> 01:02:04.360
What rubs you the wrong way, perhaps,
about those three lines of code?

01:02:04.360 --> 01:02:06.255
What could be better?

01:02:06.255 --> 01:02:08.380
And even if you don't know
the solution, especially

01:02:08.380 --> 01:02:13.688
if you've never programmed before, what
kind of smells about those three lines?

01:02:13.688 --> 01:02:14.980
This is actually a term of art.

01:02:14.980 --> 01:02:18.820
"Code smell" is like something--
not loving that for some reason.

01:02:18.820 --> 01:02:22.270
If you can't put your finger on
it, it's not the best design.

01:02:22.270 --> 01:02:23.380
The code smells.

01:02:23.380 --> 01:02:26.950
What's smelly, if you will,
about score1, score2, score3?

01:02:26.950 --> 01:02:28.405
Ryan, what do you think?

01:02:28.405 --> 01:02:30.280
RYAN: If you're doing
an average calculation,

01:02:30.280 --> 01:02:33.170
you don't need to add them
up all together in the code.

01:02:33.170 --> 01:02:35.970
You can add them up beforehand
and store it as one variable.

01:02:35.970 --> 01:02:36.970
DAVID MALAN: Absolutely.

01:02:36.970 --> 01:02:39.803
If I'm computing the average, I
don't need to keep all three around.

01:02:39.803 --> 01:02:42.940
I can just keep a sum and then divide
the whole sum by the total number.

01:02:42.940 --> 01:02:45.070
I like that, that instinct.

01:02:45.070 --> 01:02:49.480
What else might you not like
about the design of this code now?

01:02:49.480 --> 01:02:51.340
Score1, score2, score3.

01:02:54.110 --> 01:02:56.150
Score1, score2, score3.

01:02:56.150 --> 01:02:59.030
Might there be opportunity
still for improvement?

01:02:59.030 --> 01:03:02.090
I feel like any time you start
to see this repetition, maybe.

01:03:02.090 --> 01:03:03.600
Andrew, your thoughts?

01:03:03.600 --> 01:03:06.715
ANDREW: Not hard code the
three scores together?

01:03:06.715 --> 01:03:08.840
DAVID MALAN: OK, so not
hard code the three scores.

01:03:08.840 --> 01:03:10.500
And what would you do instead?

01:03:10.500 --> 01:03:14.060
ANDREW: Maybe take an
input, or I would--

01:03:14.060 --> 01:03:16.562
yeah, I wouldn't write
out the scores themselves.

01:03:16.562 --> 01:03:18.270
DAVID MALAN: Yeah,
another good instinct.

01:03:18.270 --> 01:03:21.470
It's kind of stupid that I've written
a program, compiled a program,

01:03:21.470 --> 01:03:25.847
that only computes the average for some
student who literally got those three

01:03:25.847 --> 01:03:26.930
test scores and no others.

01:03:26.930 --> 01:03:28.490
Like, there's no dynamism here.

01:03:28.490 --> 01:03:31.550
Moreover, it's a little
lazy too that I called

01:03:31.550 --> 01:03:33.890
my variables score1, score2, score3.

01:03:33.890 --> 01:03:35.450
I mean, where does it end after that?

01:03:35.450 --> 01:03:37.790
If I want to have a fourth
test next semester, now

01:03:37.790 --> 01:03:39.140
I have to go and have score4.

01:03:39.140 --> 01:03:40.760
If I've got a fifth, score5.

01:03:40.760 --> 01:03:44.270
That starts to be reminiscent
of last week's copy/paste, which

01:03:44.270 --> 01:03:45.990
really wasn't the best practice.

01:03:45.990 --> 01:03:48.590
And so let me propose
that we clean this up.

01:03:48.590 --> 01:03:50.900
And it turns out we can
clean this up by way

01:03:50.900 --> 01:03:53.210
of another topic, another
feature of C that's

01:03:53.210 --> 01:03:56.100
also present in other
languages, known as arrays.

01:03:56.100 --> 01:03:58.880
And if you happened to use
something called a list in Scratch,

01:03:58.880 --> 01:04:01.760
very similar in spirit
to Scratch's lists.

01:04:01.760 --> 01:04:05.060
But we didn't see those in
lecture that first week.

01:04:05.060 --> 01:04:11.030
An array in C, as in other
languages, is a sequence

01:04:11.030 --> 01:04:14.870
of values stored in memory
back to back to back,

01:04:14.870 --> 01:04:19.020
a sequence of contiguous values,
so to speak, back to back to back.

01:04:19.020 --> 01:04:22.185
So in that sense, it's like a
list of values from left to right

01:04:22.185 --> 01:04:24.560
if we use the metaphor of the
picture we've been drawing.

01:04:24.560 --> 01:04:27.480
So how might this be germane here?

01:04:27.480 --> 01:04:30.710
Well, it turns out that if you want
to store a whole bunch of values,

01:04:30.710 --> 01:04:33.320
but they're all kind of interrelated,
like they're all scores,

01:04:33.320 --> 01:04:37.340
you don't have to resort to this sort
of lazy, score1, score2, score3, score4,

01:04:37.340 --> 01:04:40.850
score5, up to score99, depending
on how many scores there are.

01:04:40.850 --> 01:04:44.630
Why don't you just call all
of those numbers scores,

01:04:44.630 --> 01:04:46.520
but use a slightly different syntax?

01:04:46.520 --> 01:04:49.550
And that syntax gives you access
to what are called arrays.

01:04:49.550 --> 01:04:52.730
So the syntax here on
the screen is an example

01:04:52.730 --> 01:04:56.930
of declaring space for
three integers all at once

01:04:56.930 --> 01:05:00.980
and collectively referring to
all of them as the word "scores."

01:05:00.980 --> 01:05:03.320
So there's no more scores 1, 2, and 3.

01:05:03.320 --> 01:05:06.260
All three of those scores are
in a variable called "scores."

01:05:06.260 --> 01:05:09.830
And what's new here is the
square brackets, inside of which

01:05:09.830 --> 01:05:14.180
is a number that literally
connotes how many integers do you

01:05:14.180 --> 01:05:18.150
want to store under the name "scores."

01:05:18.150 --> 01:05:19.940
So what does this allow me to do?

01:05:19.940 --> 01:05:24.240
It allows me still to define
three integers in that array.

01:05:24.240 --> 01:05:26.330
So this array is going
to be a chunk of memory

01:05:26.330 --> 01:05:29.030
back to back to back
that I can put values in.

01:05:29.030 --> 01:05:32.240
And the way I put those values is
going to look syntactically like this.

01:05:32.240 --> 01:05:36.050
I still use numbers, but now
I'm using a new notation.

01:05:36.050 --> 01:05:39.170
And it's similar to what
I resorted to before,

01:05:39.170 --> 01:05:41.390
but it's a little more
generalized now and dynamic.

01:05:41.390 --> 01:05:44.960
Now if I want to update the
very first score in that array,

01:05:44.960 --> 01:05:47.540
I literally write the name
of the variable scores,

01:05:47.540 --> 01:05:51.020
[0], and then assign it the value.

01:05:51.020 --> 01:05:54.200
If I want to get at the
second score, I do scores[1].

01:05:54.200 --> 01:05:56.570
If I want the third
score, it's scores[2].

01:05:56.570 --> 01:06:00.050
And the only thing that's a little
weird and takes some getting used to is

01:06:00.050 --> 01:06:04.100
the fact that we are
"zero-indexing" our arrays.

01:06:04.100 --> 01:06:06.980
So in past examples, like
for loops and while loops,

01:06:06.980 --> 01:06:09.680
I've sort of said, eh, it's
a convention in programming

01:06:09.680 --> 01:06:11.180
to start counting from 0.

01:06:11.180 --> 01:06:15.140
When it comes to arrays,
which are contiguous

01:06:15.140 --> 01:06:20.292
sequences of values in a computer's
memory, they have to start at 0.

01:06:20.292 --> 01:06:22.250
So otherwise, if you
don't start counting at 0,

01:06:22.250 --> 01:06:26.280
you're literally going to be wasting
space by overlooking one value.

01:06:26.280 --> 01:06:29.100
So now if we were to rename
things on the screen,

01:06:29.100 --> 01:06:34.340
instead of calling these three
rectangles score1, score2, score3,

01:06:34.340 --> 01:06:35.810
they're all called scores.

01:06:35.810 --> 01:06:38.270
But if you want to refer
specifically to the first one,

01:06:38.270 --> 01:06:42.140
you use this fancy bracket notation, and
the second one, this bracket notation,

01:06:42.140 --> 01:06:44.150
and the third one,
this bracket notation.

01:06:44.150 --> 01:06:45.740
But notice the dichotomy.

01:06:45.740 --> 01:06:51.770
When declaring the array, when creating
the array, saying, give me three ints,

01:06:51.770 --> 01:06:56.270
you use [3] where [3] is
the total number of values.

01:06:56.270 --> 01:06:59.300
When you index into the array--

01:06:59.300 --> 01:07:03.260
that is, when you go to a specific
location in that chunk of memory--

01:07:03.260 --> 01:07:05.130
you similarly use numbers.

01:07:05.130 --> 01:07:08.510
But now those are referring to their
relative positions, position 0,

01:07:08.510 --> 01:07:10.460
position 1, position 2.

01:07:10.460 --> 01:07:13.160
This is the total number of spaces.

01:07:13.160 --> 01:07:17.480
This is the specific space
first, second, and third.

01:07:17.480 --> 01:07:20.000
All right, so pictorially,
nothing has changed,

01:07:20.000 --> 01:07:21.872
just our nomenclature really has.

01:07:21.872 --> 01:07:24.080
So let me go ahead and start
to improve this program,

01:07:24.080 --> 01:07:28.220
taking in the advice that was offered
too on how we can improve the design

01:07:28.220 --> 01:07:30.290
and get rid of the smelliness of it.

01:07:30.290 --> 01:07:32.060
Let me take the first--

01:07:32.060 --> 01:07:34.650
let me take the easiest
of these approaches

01:07:34.650 --> 01:07:37.340
first by just getting rid of
these three separate variables

01:07:37.340 --> 01:07:42.590
and instead giving me one variable
called scores, an array of size 3.

01:07:42.590 --> 01:07:45.710
And then I don't need to
declare score1, score2.

01:07:45.710 --> 01:07:47.420
Again, that's all going away.

01:07:47.420 --> 01:07:48.453
That's all going away.

01:07:48.453 --> 01:07:49.370
That's all going away.

01:07:49.370 --> 01:07:52.520
Now if I want to initialize that
array with these three values,

01:07:52.520 --> 01:07:54.440
I say scores[0].

01:07:54.440 --> 01:07:56.510
And down here, I say scores[1].

01:07:56.510 --> 01:07:59.060
And down here, I say scores[2].

01:07:59.060 --> 01:08:01.200
So I've added one line of code.

01:08:01.200 --> 01:08:02.810
But notice the dynamism now.

01:08:02.810 --> 01:08:05.720
If I want to have a fourth one,
I can just allocate here and then

01:08:05.720 --> 01:08:09.742
put in the value with another line
of code, or 5, or 6, or 7, or 8.

01:08:09.742 --> 01:08:11.450
I don't have to start
copying and pasting

01:08:11.450 --> 01:08:13.930
all of these different
variable names by convention.

01:08:13.930 --> 01:08:16.930
But I think if we take some of the
advice that was offered a moment ago,

01:08:16.930 --> 01:08:20.370
we can also clean this up by
way of a loop or such as well.

01:08:20.370 --> 01:08:21.380
So let's do that.

01:08:21.380 --> 01:08:26.140
Let me go ahead and give myself,
actually, first the CS50 library so

01:08:26.140 --> 01:08:27.609
that I can use get_int.

01:08:27.609 --> 01:08:30.100
And let's take this first
piece of advice, which is,

01:08:30.100 --> 01:08:33.370
let's start asking for
a score using get_int.

01:08:33.370 --> 01:08:35.950
And I'm going to do this three times.

01:08:35.950 --> 01:08:37.750
And yeah, I'm getting a little lazy.

01:08:37.750 --> 01:08:38.859
I'm getting a little bored already.

01:08:38.859 --> 01:08:40.029
So I'm going to copy/paste.

01:08:40.029 --> 01:08:41.946
And again, that does not
bode well in general.

01:08:41.946 --> 01:08:44.649
When copying and pasting, we
can probably do better still.

01:08:44.649 --> 01:08:47.770
But now I think I need to
change just one more thing here.

01:08:47.770 --> 01:08:54.010
When doing the math, I want scores[0]
plus scores[1] plus scores[2].

01:08:54.010 --> 01:08:57.580
But before I solve this problem
here-- the logic is still the same,

01:08:57.580 --> 01:09:00.310
but I'm now taking in
dynamically three integers--

01:09:00.310 --> 01:09:02.740
there's still a smell to it as well.

01:09:02.740 --> 01:09:04.550
It's still not as well designed.

01:09:04.550 --> 01:09:10.640
And so just to make clear, what
could I be doing better now?

01:09:10.640 --> 01:09:14.359
How could I clean up this code and
make it not just correct, not just

01:09:14.359 --> 01:09:17.310
well styled, but better designed?

01:09:17.310 --> 01:09:18.240
What remains here?

01:09:18.240 --> 01:09:18.740
Nina?

01:09:18.740 --> 01:09:20.450
What do you think?

01:09:20.450 --> 01:09:24.170
NINA: The code is specific
for only three scores.

01:09:24.170 --> 01:09:27.859
So you could, as an input,
[INAUDIBLE] how many scores

01:09:27.859 --> 01:09:30.109
it wants at the very beginning.

01:09:30.109 --> 01:09:33.735
And then instead of having
scores[0], scores[1],

01:09:33.735 --> 01:09:40.560
you could use a for loop that goes
through from 0 to n minus 1 or less

01:09:40.560 --> 01:09:45.078
than n that will ask, and it
should be one line of code instead.

01:09:45.078 --> 01:09:46.370
DAVID MALAN: Yeah, really good.

01:09:46.370 --> 01:09:48.649
It's the fact that we have
get_int, get_int, get_int.

01:09:48.649 --> 01:09:51.649
That's the first sign that you're
probably doing something suboptimally.

01:09:51.649 --> 01:09:53.510
It might be correct, but it's
probably not well designed

01:09:53.510 --> 01:09:55.682
because I did literally
resort to copy/paste.

01:09:55.682 --> 01:09:57.890
There's sort of a pattern
here that I could certainly

01:09:57.890 --> 01:09:59.580
integrate into something like a loop.

01:09:59.580 --> 01:10:00.470
So let me do that.

01:10:00.470 --> 01:10:03.830
Let me actually get rid of
two of these lines of code.

01:10:03.830 --> 01:10:08.870
Let me go up here and do something like
for int i gets 0, i less than 3 for now,

01:10:08.870 --> 01:10:10.100
i++.

01:10:10.100 --> 01:10:11.600
Let me open up this for loop.

01:10:11.600 --> 01:10:14.120
Let me indent that
remaining line of code.

01:10:14.120 --> 01:10:16.520
And instead of scores[0]--

01:10:16.520 --> 01:10:18.930
this is where arrays
get really powerful--

01:10:18.930 --> 01:10:22.760
you can use a variable
to index into an array--

01:10:22.760 --> 01:10:24.740
that is, to go to a specific location.

01:10:24.740 --> 01:10:26.780
What do I want to use for my variable?

01:10:26.780 --> 01:10:29.400
Well, I would think i here.

01:10:29.400 --> 01:10:33.980
So now I've whittled my lines of code
down from triplicate, three

01:10:33.980 --> 01:10:36.140
nearly identical lines,
into just one really

01:10:36.140 --> 01:10:39.290
inside of a loop that's going to do
the same thing for me again and again.

01:10:39.290 --> 01:10:42.230
And as Nina proposed too,
I don't have to hard code

01:10:42.230 --> 01:10:43.860
these 3's all over the place.

01:10:43.860 --> 01:10:46.050
Maybe I could do something like this.

01:10:46.050 --> 01:10:50.240
I could say something like,
int total gets get_int.

01:10:50.240 --> 01:10:53.690
And I might ask, "Total
number of scores."

01:10:53.690 --> 01:10:56.660
And I could literally ask
the human from the get-go

01:10:56.660 --> 01:10:58.490
how many total scores are there.

01:10:58.490 --> 01:11:04.680
Then I can even more powerfully use
this variable, total, in multiple places

01:11:04.680 --> 01:11:07.820
so that now, I'm doing my
math much more dynamically.

01:11:07.820 --> 01:11:11.300
This, though-- I'm afraid,
Nina, this broke a bit.

01:11:11.300 --> 01:11:13.280
I'm going to be a little more--

01:11:13.280 --> 01:11:16.280
I need to exert a little more
effort here on line 14 because now I

01:11:16.280 --> 01:11:21.500
can't hard code scores[0], [1], and [2]
because if the total number of scores

01:11:21.500 --> 01:11:23.820
is more than that, I
need to do more addition.

01:11:23.820 --> 01:11:26.010
If it's fewer than that, I
need to do less addition.

01:11:26.010 --> 01:11:28.698
So I think we've introduced
a bug, but we can fix that.

01:11:28.698 --> 01:11:30.240
But let me propose for just a moment.

01:11:30.240 --> 01:11:33.470
Let's not make it dynamic because I
worry that's just made my life harder.

01:11:33.470 --> 01:11:36.620
Let's at least introduce one
other feature here first.

01:11:36.620 --> 01:11:41.090
I'm going to go ahead up here and
define a new feature of C today, which

01:11:41.090 --> 01:11:42.290
is known as a constant.

01:11:42.290 --> 01:11:46.070
If I know in advance that I want
to declare a number that I want

01:11:46.070 --> 01:11:49.670
to use again and again and again
without copying and pasting

01:11:49.670 --> 01:11:53.180
literally that number 3, I
can give myself a constant int

01:11:53.180 --> 01:11:58.160
via const int total = 3.

01:11:58.160 --> 01:12:01.320
This declares what's called
a constant in programming,

01:12:01.320 --> 01:12:05.600
which is a feature of many languages
whereby you declare a variable of sorts

01:12:05.600 --> 01:12:07.370
whose value can never change.

01:12:07.370 --> 01:12:09.870
Once you set it, you cannot change it.

01:12:09.870 --> 01:12:12.020
And that's a good thing
because, one, it shouldn't

01:12:12.020 --> 01:12:13.603
change in the context of this program.

01:12:13.603 --> 01:12:16.460
And two, just in case you,
the human, are fallible,

01:12:16.460 --> 01:12:19.103
you don't want to accidentally
change it when you don't intend.

01:12:19.103 --> 01:12:21.020
So this is a feature of
a programming language

01:12:21.020 --> 01:12:23.850
that sort of protects you from yourself.

01:12:23.850 --> 01:12:28.280
So now I can sort of take an amalgam
of my instincts and Nina's and use

01:12:28.280 --> 01:12:29.780
this variable, total.

01:12:29.780 --> 01:12:32.570
And actually, another convention
when declaring constants

01:12:32.570 --> 01:12:35.960
is to capitalize them just to
make visually clear that there's

01:12:35.960 --> 01:12:38.820
something different or
special about this variable.

01:12:38.820 --> 01:12:42.320
So I'm going to change this to TOTAL,
and I'm going to use that value here

01:12:42.320 --> 01:12:45.260
and here and also down here.

01:12:45.260 --> 01:12:49.550
But I'm afraid both Nina and I have
a little bit of cleanup here to do

01:12:49.550 --> 01:12:54.680
in that I still have hard coded
scores[0], scores[1], and scores[2].

01:12:54.680 --> 01:12:58.552
And I want to add a changing
number of values together.

01:12:58.552 --> 01:12:59.260
So you know what?

01:12:59.260 --> 01:13:00.270
I've got an idea.

01:13:00.270 --> 01:13:03.260
Let me go ahead and
create a function that's

01:13:03.260 --> 01:13:05.070
going to compute an average for me.

01:13:05.070 --> 01:13:08.780
So if I want to create my own
function that computes an average,

01:13:08.780 --> 01:13:10.650
I want it to return a
floating point value,

01:13:10.650 --> 01:13:13.400
just so that we don't truncate any math.

01:13:13.400 --> 01:13:15.020
I'm going to call this average.

01:13:15.020 --> 01:13:18.560
And the input to this function
is going to be the length

01:13:18.560 --> 01:13:21.710
of an array and the actual array.

01:13:21.710 --> 01:13:24.320
And this is the last piece
of funky syntax for now.

01:13:24.320 --> 01:13:31.700
It turns out that when you want to pass
an array as input to a custom function,

01:13:31.700 --> 01:13:35.360
you literally use those square brackets
again, but you don't specify the size.

01:13:35.360 --> 01:13:37.940
And the upside of this is
that your function then

01:13:37.940 --> 01:13:42.050
can support an array that's got
one space in it, two spaces, three,

01:13:42.050 --> 01:13:42.800
a hundred.

01:13:42.800 --> 01:13:45.030
It's more dynamic this way.

01:13:45.030 --> 01:13:47.090
So how do I compute an average here?

01:13:47.090 --> 01:13:48.590
I can do this a few different ways.

01:13:48.590 --> 01:13:50.975
But I think what was
suggested earlier makes

01:13:50.975 --> 01:13:52.850
sense, where I can do
some kind of summation.

01:13:52.850 --> 01:13:54.898
So let me do int sum = 0.

01:13:54.898 --> 01:13:57.440
Because how do you compute the
average of a bunch of numbers?

01:13:57.440 --> 01:14:00.170
Well, you add them all together,
and you divide by the total.

01:14:00.170 --> 01:14:01.670
Well, let's see how I might do that.

01:14:01.670 --> 01:14:06.050
Let me do for int i
gets 0, i less than--

01:14:06.050 --> 01:14:06.920
what should this be?

01:14:06.920 --> 01:14:11.990
Well, if I'm being passed as this
custom function the length of the array

01:14:11.990 --> 01:14:16.280
and the actual array, I think I
can iterate from i up to length,

01:14:16.280 --> 01:14:18.870
and then i++ on each iteration.

01:14:18.870 --> 01:14:20.720
And then on each
iteration, I think I want

01:14:20.720 --> 01:14:27.450
to do sum plus equals whatever is in the
array's i-th location, so to speak.

01:14:27.450 --> 01:14:31.160
So again, this is shorthand
notation per last week for this.

01:14:31.160 --> 01:14:38.330
Sum equals whatever sum is plus
whatever is in location i of the array.

01:14:38.330 --> 01:14:40.670
And once I've done all
of that, I think what

01:14:40.670 --> 01:14:47.400
I can do is return the total sum
divided by the length of the array.

01:14:47.400 --> 01:14:50.510
And what I like about this whole
approach-- assuming my code's correct,

01:14:50.510 --> 01:14:54.440
and I don't think it is just yet--
notice what I can do back up in main.

01:14:54.440 --> 01:14:58.400
Now I can abstract away the
notion of calculating an average

01:14:58.400 --> 01:15:04.863
and just do something like this
with this line of code here.

01:15:04.863 --> 01:15:05.780
So what did I just do?

01:15:05.780 --> 01:15:09.080
A lot's going on, but let's focus
for a moment on line 14 here.

01:15:09.080 --> 01:15:12.230
On line 14, I'm still just printing
the average of some floating point

01:15:12.230 --> 01:15:13.230
placeholder.

01:15:13.230 --> 01:15:17.570
But what I'm passing as input
now is this function, average,

01:15:17.570 --> 01:15:20.210
whose inputs are going to be
TOTAL, which again is just

01:15:20.210 --> 01:15:23.210
this constant at the very
top-- oh, sorry, I goofed.

01:15:23.210 --> 01:15:26.420
I should have capitalized it, which
is just that constant at the very top.

01:15:26.420 --> 01:15:29.840
And I'm passing in scores,
which again, is just

01:15:29.840 --> 01:15:32.750
this array of all of those scores.

01:15:32.750 --> 01:15:36.180
Meanwhile, in the function, in
the context of the function,

01:15:36.180 --> 01:15:39.140
notice that the names of
the inputs to a function

01:15:39.140 --> 01:15:41.870
do not need to match the
names of the variables being

01:15:41.870 --> 01:15:43.410
passed into that function.

01:15:43.410 --> 01:15:46.820
So even though in main, they're
called TOTAL and scores,

01:15:46.820 --> 01:15:48.890
in the context of my
function, average, I

01:15:48.890 --> 01:15:54.140
can call them x and y, a and b, or
more generically, length and array.

01:15:54.140 --> 01:15:56.660
I don't know what the array
is, but it's an array of ints.

01:15:56.660 --> 01:16:01.280
And I don't know how long it is, but
that answer is going to be in length.

01:16:01.280 --> 01:16:03.560
But there's still a bug here.

01:16:03.560 --> 01:16:04.640
There's still a bug.

01:16:04.640 --> 01:16:07.940
And if we ignore main for a
moment, this is a subtle one.

01:16:07.940 --> 01:16:11.840
Does anyone see a mistake that I've
made probably for the third time

01:16:11.840 --> 01:16:14.330
now over the past two weeks?

01:16:14.330 --> 01:16:18.490
What subtle mistake have
I made here with my code

01:16:18.490 --> 01:16:21.850
only in this average function?

01:16:21.850 --> 01:16:23.740
This one's a little more subtle.

01:16:23.740 --> 01:16:27.250
But the goal is to compute the
average of a whole bunch of integers

01:16:27.250 --> 01:16:28.460
and return the answer.

01:16:28.460 --> 01:16:29.770
Nicholas?

01:16:29.770 --> 01:16:33.940
NICHOLAS: You've declared the
variable within the function.

01:16:33.940 --> 01:16:37.420
DAVID MALAN: I've declared the
variable within the function.

01:16:37.420 --> 01:16:42.460
That's OK because I've declared my
variable sum here, I think you mean.

01:16:42.460 --> 01:16:45.430
But that's inside of
the average function.

01:16:45.430 --> 01:16:49.870
And I'm using sum inside of
the outermost curly braces

01:16:49.870 --> 01:16:50.890
that was defined.

01:16:50.890 --> 01:16:52.300
And so that's OK.

01:16:52.300 --> 01:16:53.590
That's OK.

01:16:53.590 --> 01:16:56.650
Let's take another thought here.

01:16:56.650 --> 01:17:00.085
Olivia, where might the bug still be?

01:17:00.085 --> 01:17:01.960
OLIVIA: The return type's
a float, but you're

01:17:01.960 --> 01:17:03.425
returning an int divided by an int.

01:17:03.425 --> 01:17:04.300
DAVID MALAN: Perfect.

01:17:04.300 --> 01:17:06.400
So I again made that same
stupid mistake that's

01:17:06.400 --> 01:17:08.710
just going to get more
obvious as time goes on

01:17:08.710 --> 01:17:12.760
that if I want to do floating point
arithmetic, just like the Ariane rocket

01:17:12.760 --> 01:17:15.550
discussion, the Patriot missile--
like, these kinds of details

01:17:15.550 --> 01:17:16.870
matter in a program.

01:17:16.870 --> 01:17:18.790
Now it's correct because
I'm actually going

01:17:18.790 --> 01:17:22.000
to ensure that even
though the context here

01:17:22.000 --> 01:17:24.740
is much less important than
those real-world contexts,

01:17:24.740 --> 01:17:28.960
just computing some average of scores,
I'm not going to accidentally truncate

01:17:28.960 --> 01:17:30.212
any of my values.

01:17:30.212 --> 01:17:32.170
So again, in the context
here of this function,

01:17:32.170 --> 01:17:34.540
average is just applying some
of last week's principles.

01:17:34.540 --> 01:17:35.500
I've got a variable.

01:17:35.500 --> 01:17:36.310
I've got a loop.

01:17:36.310 --> 01:17:39.070
And I'm doing some floating
point arithmetic, ultimately.

01:17:39.070 --> 01:17:42.790
And I'm now creating a
function that takes two inputs.

01:17:42.790 --> 01:17:44.890
One is length, and one is the length--

01:17:44.890 --> 01:17:48.100
one is the array itself, and the
return type, as Olivia notes,

01:17:48.100 --> 01:17:51.790
is a float so that my
output is also well defined.
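
NOTE
A minimal sketch of the average function as described in this stretch of the lecture.
The on-screen code is not part of the transcript, so the body below is a reconstruction
from the spoken description: parameters named length and array, a float return type,
and a float cast so the division is not truncated.
```c
// Average of the first `length` ints in `array`, returned as a float.
float average(int length, int array[])
{
    int sum = 0;
    for (int i = 0; i < length; i++)
    {
        sum += array[i];
    }
    // Cast one operand to float so the division is not done in int arithmetic.
    return sum / (float) length;
}
```
A caller would pass its own count and array, for example average(TOTAL, scores),
and could hand the result straight to printf with a %f placeholder.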

01:17:51.790 --> 01:17:53.590
But what's nice about
this is, again, you

01:17:53.590 --> 01:17:55.660
can think of these
functions as abstractions.

01:17:55.660 --> 01:18:00.760
Now I don't need to worry about how
I calculate an average because I now

01:18:00.760 --> 01:18:03.400
have this helper function,
a custom function

01:18:03.400 --> 01:18:05.930
I wrote that can help
me answer that question.

01:18:05.930 --> 01:18:09.010
And here, notice that the
output of this average function

01:18:09.010 --> 01:18:12.842
will become an input into printf.

01:18:12.842 --> 01:18:15.050
And the only other feature
I've added to the mix here

01:18:15.050 --> 01:18:18.380
now are not only arrays,
which allow us to create

01:18:18.380 --> 01:18:21.650
multiple variables, a variable
number of variables, if you will,

01:18:21.650 --> 01:18:23.420
but also this notion of a constant.

01:18:23.420 --> 01:18:26.960
If I find myself using the same
number again and again and again,

01:18:26.960 --> 01:18:29.570
this constant can help
me keep my code clean.

01:18:29.570 --> 01:18:30.440
And notice this.

01:18:30.440 --> 01:18:33.710
If next year, maybe another semester,
there's four scores or four tests,

01:18:33.710 --> 01:18:34.940
I change it in one place.

01:18:34.940 --> 01:18:35.870
I recompile.

01:18:35.870 --> 01:18:37.610
Boom, I'm done.

01:18:37.610 --> 01:18:39.980
A well-designed program
does not require that you

01:18:39.980 --> 01:18:43.130
go reading through the entirety of it,
fixing numbers here, numbers there.

01:18:43.130 --> 01:18:46.010
Changing it in one place can
allow me to improve this program,

01:18:46.010 --> 01:18:49.520
make it support four tests next
year instead of just the three.

01:18:49.520 --> 01:18:52.760
But better still would
be to take, I think,

01:18:52.760 --> 01:18:56.900
Nina's advice before, which was to
maybe just use get_int and ask the human

01:18:56.900 --> 01:18:58.910
for how many tests they actually have.

01:18:58.910 --> 01:19:00.562
That too would work.

01:19:00.562 --> 01:19:02.270
Well, let me pause
here to see if there's

01:19:02.270 --> 01:19:07.460
any questions then about
arrays or about constants

01:19:07.460 --> 01:19:13.770
or passing them around as
inputs and outputs in this way.

01:19:13.770 --> 01:19:16.740
Yeah, over to Sophia.

01:19:16.740 --> 01:19:21.570
SOPHIA: I had a question about the use
of float and why the use of one float

01:19:21.570 --> 01:19:23.790
causes the whole output to be a float.

01:19:23.790 --> 01:19:24.870
Why does that occur?

01:19:24.870 --> 01:19:25.920
DAVID MALAN: Yeah, really good question.

01:19:25.920 --> 01:19:27.340
That's just how C behaves.

01:19:27.340 --> 01:19:30.840
So long as there is one or more
floating point values involved

01:19:30.840 --> 01:19:35.820
in a mathematical formula, it is
going to use that data type, which

01:19:35.820 --> 01:19:39.610
is the more powerful one, if you will,
rather than risk truncating anything.

01:19:39.610 --> 01:19:41.970
So you just need one
float to be participating

01:19:41.970 --> 01:19:44.490
in the formula in question.
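
NOTE
A small sketch of the behavior described in this answer. The two function names
are hypothetical and exist only for illustration. With two int operands the division
happens in int arithmetic and truncates. With one float operand the other is converted
to float first.
```c
// Both operands are ints, so 7 / 2 is computed as 3 before it becomes a float.
float divide_as_ints(int a, int b)
{
    return a / b;
}
// One operand is a float, so the other is converted and 7 / 2 yields 3.5.
float divide_as_floats(int a, int b)
{
    return a / (float) b;
}
```
Casting either operand would do. The cast has to apply to an operand, though,
not to the already-truncated result.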

01:19:44.490 --> 01:19:45.900
Good question.

01:19:45.900 --> 01:19:53.550
Other questions on arrays or constants
or this passing around of them?

01:19:53.550 --> 01:19:57.150
Yeah, over to Alexandra.

01:19:57.150 --> 01:20:03.240
ALEXANDRA: I have a question about
the declaring of the array, scores.

01:20:03.240 --> 01:20:08.370
When you declared it in
main, you said int scores.

01:20:08.370 --> 01:20:11.890
And in the brackets, you have TOTAL.

01:20:11.890 --> 01:20:16.313
Can you declare it without the TOTAL--

01:20:16.313 --> 01:20:17.730
DAVID MALAN: Really good question.

01:20:17.730 --> 01:20:18.600
ALEXANDRA: --only the brackets?

01:20:18.600 --> 01:20:19.530
DAVID MALAN: Short answer, no.

01:20:19.530 --> 01:20:21.940
So the way I did it is the
way you do have to do it.

01:20:21.940 --> 01:20:25.810
And in fact, if I highlight what I
did here, now it currently says TOTAL.

01:20:25.810 --> 01:20:29.400
If I get rid of that, and I go back
to our first version where I said

01:20:29.400 --> 01:20:36.360
something like 3 and 3 and 3 over here,
you cannot do this, which I think,

01:20:36.360 --> 01:20:38.010
Alexandra, is what you were proposing.

01:20:38.010 --> 01:20:41.640
The computer needs to know how big
the array is when you are creating it.

01:20:41.640 --> 01:20:44.160
The exception to that
is that when you're

01:20:44.160 --> 01:20:47.070
passing an array from
one function to another,

01:20:47.070 --> 01:20:49.350
you do not need to tell
that custom function

01:20:49.350 --> 01:20:51.990
how big the array is because,
again, you don't know in advance.

01:20:51.990 --> 01:20:55.410
You're writing a fairly generic,
dynamic function whose purpose in life

01:20:55.410 --> 01:21:00.750
is to take any array as input
of integers and any length

01:21:00.750 --> 01:21:05.640
and respond accordingly with an average
that matches the size of that thing.

01:21:05.640 --> 01:21:09.870
And those of you, as an aside, who have
programmed before, especially in Java,

01:21:09.870 --> 01:21:13.890
unlike in Java and certain other
languages, the length of an array

01:21:13.890 --> 01:21:16.320
is not built into the array itself.

01:21:16.320 --> 01:21:20.590
If you do not pass in the length
of an array to another function,

01:21:20.590 --> 01:21:24.280
there is no way to determine
how big the array is.

01:21:24.280 --> 01:21:26.850
This is different from
Java and other languages,

01:21:26.850 --> 01:21:29.880
where you can ask the array, in
some sense, what is its length.

01:21:29.880 --> 01:21:32.490
In C, you have to pass
both the array itself

01:21:32.490 --> 01:21:35.610
and its length around
separately. [? Sina? ?]
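
NOTE
A sketch of passing an array and its length separately, as described above.
The names sum_of and sum_of_sample_scores are hypothetical. The sizeof arithmetic
is one possible way to get the count, and it only works in the scope where the
array was declared, not inside the function that receives it.
```c
// The function cannot discover the array's size, so the caller supplies it.
int sum_of(int length, int array[])
{
    int sum = 0;
    for (int i = 0; i < length; i++)
    {
        sum += array[i];
    }
    return sum;
}
// Hypothetical caller: the size is known here because the array is declared here.
int sum_of_sample_scores(void)
{
    int scores[3] = {72, 73, 33};
    int length = sizeof(scores) / sizeof(scores[0]);
    return sum_of(length, scores);
}
```
Passing a length larger than the real size would make sum_of read past the end
of the array, so keeping the two values in step is the caller's job.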

01:21:35.610 --> 01:21:38.880
[? SINA: ?] I just-- I'm still
a little bit confused about how,

01:21:38.880 --> 01:21:44.740
when we write that second command,
when is it void in the parentheses?

01:21:44.740 --> 01:21:47.520
And when do we define the int?

01:21:47.520 --> 01:21:50.918
Because as I remember when we did the--

01:21:50.918 --> 01:21:53.460
get a negative number, we get
a positive number, it was void,

01:21:53.460 --> 01:21:55.500
but we still kind of gave it an input.

01:21:55.500 --> 01:21:57.705
I'm just not completely sold on that.

01:21:57.705 --> 01:21:59.080
DAVID MALAN: Sure, good question.

01:21:59.080 --> 01:22:01.570
Let me go ahead and open
up that previous example,

01:22:01.570 --> 01:22:04.960
which was a little buggy, but
it has the right syntax here.

01:22:04.960 --> 01:22:07.620
So here was the get_negative_int
function from before.

01:22:07.620 --> 01:22:10.620
And, [? Sina, ?] you know
it was void as input.

01:22:10.620 --> 01:22:13.170
So there was one comment you
made where it still took input.

01:22:13.170 --> 01:22:14.010
That was not so.

01:22:14.010 --> 01:22:17.070
So get_negative_int
did not take any input.

01:22:17.070 --> 01:22:19.620
And case in point, if
we scroll up to main,

01:22:19.620 --> 01:22:22.530
notice that when I
called it on line 10, I

01:22:22.530 --> 01:22:25.920
said get_negative_int, open
parenthesis, close parenthesis,

01:22:25.920 --> 01:22:29.040
with no inputs inside
of those parentheses.

01:22:29.040 --> 01:22:32.220
This keyword "void," which we've
seen a few times now last week

01:22:32.220 --> 01:22:35.880
and this week, is just an
explicit keyword in C that says,

01:22:35.880 --> 01:22:41.340
do not put anything here, which is to
say, it would be incorrect for me up

01:22:41.340 --> 01:22:44.970
here to do something like
this, like to pass in a number,

01:22:44.970 --> 01:22:48.990
or to pass in a prompt, or anything
inside of those parentheses.

01:22:48.990 --> 01:22:51.630
The fact that this
function, get_negative_int

01:22:51.630 --> 01:22:56.340
takes void as its input means it
does not take any inputs whatsoever.

01:22:56.340 --> 01:22:56.942
That's fine.

01:22:56.942 --> 01:22:59.400
For get_negative_int, the name
of the function says it all.

01:22:59.400 --> 01:23:02.367
Like, there's no need to
parameterize or customize

01:23:02.367 --> 01:23:04.200
the behavior of getting
negative int itself.

01:23:04.200 --> 01:23:06.180
You just want to get a negative int.

01:23:06.180 --> 01:23:09.300
By contrast, though, with
the function we just wrote,

01:23:09.300 --> 01:23:14.940
average, this function does make
conceptual sense to take inputs,

01:23:14.940 --> 01:23:17.490
because you can't just
say, give me the average.

01:23:17.490 --> 01:23:18.930
Like, average of what?

01:23:18.930 --> 01:23:22.110
Like, it needs to take input so as
to answer that question for you.

01:23:22.110 --> 01:23:24.840
And the input, in this case,
is the array itself of numbers

01:23:24.840 --> 01:23:28.425
and the length of that array
so you can do the arithmetic.

01:23:28.425 --> 01:23:31.050
And so, [? Sina, ?] hopefully,
that helps make the distinction.

01:23:31.050 --> 01:23:33.930
You use void when you
don't want to take input.

01:23:33.930 --> 01:23:38.340
And you actually specify a
comma-separated list of arguments

01:23:38.340 --> 01:23:42.000
when you do want to take input.
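
NOTE
A sketch contrasting the two kinds of parameter list. Both functions are
hypothetical stand-ins, not the lecture's get_negative_int, which relies on
CS50's get_int and is not reproduced here.
```c
// Takes no input: `void` in the parentheses means it is called as always_negative().
int always_negative(void)
{
    return -1;
}
// Takes two inputs, declared as a comma-separated parameter list.
int larger_of(int a, int b)
{
    return (a > b) ? a : b;
}
```
Calling always_negative(5) would be rejected by the compiler, while larger_of
cannot be called without both of its arguments.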

01:23:42.000 --> 01:23:46.170
All right, so we focused up
until now on integers, really.

01:23:46.170 --> 01:23:49.020
But let's simplify a little
bit because it turns out

01:23:49.020 --> 01:23:52.020
that arrays and memory
actually intersect

01:23:52.020 --> 01:23:55.740
to create some very familiar features
of most any computer program, namely

01:23:55.740 --> 01:23:57.970
text or strings more generally.

01:23:57.970 --> 01:24:03.010
So suppose we simplify further, no more
integers, no more arrays of integers.

01:24:03.010 --> 01:24:05.490
Let's just start for a moment
with a single character

01:24:05.490 --> 01:24:09.840
and write a program that just creates
a single brick from that Mario game.

01:24:09.840 --> 01:24:13.540
Let me go ahead and create a
program here called brick.c.

01:24:13.540 --> 01:24:15.900
And in brick.c, I'm
just going to #include

01:24:15.900 --> 01:24:21.570
stdio.h, int main(void). And more
on this void a little later today.

01:24:21.570 --> 01:24:25.170
Char c gets, quote unquote, '#'.

01:24:25.170 --> 01:24:29.730
And then down here, let me just go ahead
and print very simply a placeholder,

01:24:29.730 --> 01:24:32.800
%c, backslash n, and then output c.

01:24:32.800 --> 01:24:34.380
So this is a pretty stupid program.

01:24:34.380 --> 01:24:37.530
Its sole purpose in life
is to print a single hash

01:24:37.530 --> 01:24:41.940
as you might have in a Mario
pyramid of height 1, so very simple.

01:24:41.940 --> 01:24:44.040
Let me go ahead and make brick.

01:24:44.040 --> 01:24:45.480
It seems to compile OK.

01:24:45.480 --> 01:24:47.040
Let me run it with ./brick.

01:24:47.040 --> 01:24:48.750
And voila, we get a single brick.

01:24:48.750 --> 01:24:54.150
But let's consider for just a moment
exactly what just happened here

01:24:54.150 --> 01:24:58.237
and what actually was going
on underneath the hood.

01:24:58.237 --> 01:24:59.070
Well, you know what?

01:24:59.070 --> 01:25:00.030
I'm kind of curious.

01:25:00.030 --> 01:25:03.990
I remember from last week, we could
cast values from one thing to another.

01:25:03.990 --> 01:25:07.290
What if I got a little curious,
and I didn't print out c,

01:25:07.290 --> 01:25:12.480
which is this hash character, as %c,
which is a placeholder for a character?

01:25:12.480 --> 01:25:15.250
What if I got a little
crazy and said %i?

01:25:15.250 --> 01:25:21.370
I think I could probably coerce
this char by casting it to an int

01:25:21.370 --> 01:25:23.830
so I can see its decimal equivalent.

01:25:23.830 --> 01:25:25.960
I could see its actual ASCII code.

01:25:25.960 --> 01:25:28.350
So let me rebuild this with make brick.

01:25:28.350 --> 01:25:30.330
Now let me do ./brick.

01:25:30.330 --> 01:25:32.430
And what number might we see?

01:25:32.430 --> 01:25:36.840
Last week, we saw 72 a
lot, 73, and 33 for "HI!"

01:25:36.840 --> 01:25:39.000
This week, you can see 35.

01:25:39.000 --> 01:25:43.140
It turns out it's the code
for an ASCII hash.

01:25:43.140 --> 01:25:47.730
And you can see this, for instance,
if I go to a website like--

01:25:47.730 --> 01:25:52.020
let's go to asciichart.com.

01:25:52.020 --> 01:25:55.170
And sure enough, if I go to
the same chart from last week,

01:25:55.170 --> 01:25:58.560
and I look for the hash symbol
here, its ASCII code is 35.

01:25:58.560 --> 01:26:02.340
And it turns out, in C, if it's
pretty straightforward to the computer

01:26:02.340 --> 01:26:05.390
that, yes, if this is a character,
I know I can convert it to an int,

01:26:05.390 --> 01:26:07.440
you don't have to explicitly cast it.

01:26:07.440 --> 01:26:12.990
You can instead implicitly cast one data
type to another just from context here.

01:26:12.990 --> 01:26:16.950
So printf and C are smart enough
here to know, OK, you're giving me

01:26:16.950 --> 01:26:19.050
a character in the form of variable c.

01:26:19.050 --> 01:26:23.083
But you want to display
it as a %i, an integer.

01:26:23.083 --> 01:26:24.000
That's going to be OK.

01:26:24.000 --> 01:26:25.990
And indeed, I still see the number 35.

01:26:25.990 --> 01:26:27.392
So that's just simple casting.
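
NOTE
A sketch of the char-to-int conversion just demonstrated. The helper names are
hypothetical, and the values assume an ASCII system, as the lecture does.
```c
#include <stdio.h>
// A char is stored as a small integer, so returning it as an int needs no explicit cast.
int ascii_code(char c)
{
    return c;
}
// Print a character next to its numeric code, using %c and %i on the same value.
void print_with_code(char c)
{
    printf("%c %i\n", c, c);
}
```
Here ascii_code('#') gives 35, matching the value read off the ASCII chart.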

01:26:27.392 --> 01:26:29.850
But let's now put this into
the context of today's picture.

01:26:29.850 --> 01:26:31.372
How is that character laid out?

01:26:31.372 --> 01:26:33.330
Well, quite simply, if
this is my memory again,

01:26:33.330 --> 01:26:36.150
and we've gotten rid of
all of the numbers, c,

01:26:36.150 --> 01:26:41.250
otherwise storing this hash, is just
being stored in one of these bytes.

01:26:41.250 --> 01:26:47.370
It only requires one square because,
again, a char is a single byte.

01:26:47.370 --> 01:26:52.240
But equivalently, 35 is the number
that's actually being stored there.

01:26:52.240 --> 01:26:53.790
But I wonder, I wonder.

01:26:53.790 --> 01:26:55.890
Last week, we spent quite
a bit of time storing

01:26:55.890 --> 01:27:01.060
not just single characters, but actual
words like "hi" and other expressions.

01:27:01.060 --> 01:27:03.490
And so what if I were to
do something like this?

01:27:03.490 --> 01:27:04.960
Let me go back to my code.

01:27:04.960 --> 01:27:07.530
And let me not quite yet
practice what I just preached.

01:27:07.530 --> 01:27:11.910
And let me give myself three
variables this time-- c1, c2, and c3.

01:27:11.910 --> 01:27:16.980
And let me deliberately store in those
three variables H, I, in all caps,

01:27:16.980 --> 01:27:18.720
followed by an exclamation point.

01:27:18.720 --> 01:27:22.170
And per last week, when you're
dealing with individual characters,

01:27:22.170 --> 01:27:24.630
you must, in C, use single quotes.

01:27:24.630 --> 01:27:26.520
When you're dealing with
multiple characters,

01:27:26.520 --> 01:27:29.080
otherwise known last week as
strings, use double quotes.

01:27:29.080 --> 01:27:31.830
But that's why I'm using single
quotes, because we're only playing

01:27:31.830 --> 01:27:34.060
at the moment with single characters.

01:27:34.060 --> 01:27:37.080
Now let me go ahead and
print these values out.

01:27:37.080 --> 01:27:43.320
Let me print out %c, %c,
%c, and output c1, c2, c3.

01:27:43.320 --> 01:27:49.590
So this is perhaps the stupidest way you
could print out a full word like "HI!"

01:27:49.590 --> 01:27:54.360
in C by storing every single character
in its own variable, but so be it.

01:27:54.360 --> 01:27:57.090
I'm just using these
first principles here.

01:27:57.090 --> 01:27:58.493
I'm using %c as my placeholder.

01:27:58.493 --> 01:27:59.910
I'm printing out these characters.

01:27:59.910 --> 01:28:01.950
So let me do make brick now.

01:28:01.950 --> 01:28:03.000
Compiles OK.

01:28:03.000 --> 01:28:04.678
And if I do a dot slash--

01:28:04.678 --> 01:28:06.720
you know, I really should
have renamed this file,

01:28:06.720 --> 01:28:08.095
but we'll rename it in a moment--

01:28:08.095 --> 01:28:09.630
./brick, "HI!"

01:28:09.630 --> 01:28:11.190
And let me go ahead and do this.

01:28:11.190 --> 01:28:14.490
Let me go ahead now and
actually close the file.

01:28:14.490 --> 01:28:18.820
And recall from last week, if I
want to rename my file from brick.c,

01:28:18.820 --> 01:28:22.620
let's say, to hi.c, I can
use the move command, mv.

01:28:22.620 --> 01:28:26.730
And now if I open up this file,
sure enough, there's hi.c.

01:28:26.730 --> 01:28:29.850
And I've fixed my renaming mistake.

01:28:29.850 --> 01:28:35.040
All right, so again, if I now do
make hi, and I do ./hi, voila,

01:28:35.040 --> 01:28:36.000
I see the "HI!"

01:28:36.000 --> 01:28:40.052
But again, this is kind of a stupid
way of implementing a string.

01:28:40.052 --> 01:28:41.760
But let's still look
underneath the hood.

01:28:41.760 --> 01:28:43.093
Let me go ahead and get curious.

01:28:43.093 --> 01:28:46.311
Let me print out %i, %i, and %i.

01:28:46.311 --> 01:28:48.480
And let me include spaces
this time just so I

01:28:48.480 --> 01:28:51.700
can see separation between the numbers.

01:28:51.700 --> 01:28:54.750
Let me make hi again, ./hi.

01:28:54.750 --> 01:28:56.760
OK, there's that 72.

01:28:56.760 --> 01:28:57.900
There's that 73.

01:28:57.900 --> 01:29:00.340
And there's that 33 from last week.

01:29:00.340 --> 01:29:01.653
So that's interesting too.

01:29:01.653 --> 01:29:04.320
So what's going on underneath the
hood in the computer's memory?

01:29:04.320 --> 01:29:06.237
Well, when I'm storing
these three characters,

01:29:06.237 --> 01:29:11.040
now I'm just storing them in three
different boxes, so c1, c2, c3.

01:29:11.040 --> 01:29:14.970
And when you look at it collectively,
it kind of looks like a whole word

01:29:14.970 --> 01:29:17.610
even though it's, of course,
just these individual characters.

01:29:17.610 --> 01:29:20.850
So what's underneath the hood,
of course, though, is 72, 73, 33.

01:29:20.850 --> 01:29:23.160
Or equivalently, in binary, just this.

01:29:23.160 --> 01:29:25.410
So the story is the same
even though we're now talking

01:29:25.410 --> 01:29:28.540
about chars instead of integers.

01:29:28.540 --> 01:29:31.110
But what happens when I do this?

01:29:31.110 --> 01:29:35.040
What happens when I do string
s gets, quote unquote, "HI!"

01:29:35.040 --> 01:29:36.450
using double quotes?

01:29:36.450 --> 01:29:38.850
Well, let's change this
program accordingly.

01:29:38.850 --> 01:29:42.390
Let me go ahead and do what we
would have done last week, string--

01:29:42.390 --> 01:29:44.760
I'll call it s just for s for string--

01:29:44.760 --> 01:29:45.300
"HI!"

01:29:45.300 --> 01:29:46.410
in all caps.

01:29:46.410 --> 01:29:47.925
I can simplify this next line.

01:29:47.925 --> 01:29:52.170
I'm going to use %s as a
placeholder for string s.

01:29:52.170 --> 01:29:54.300
But let's, for now, reveal
what a string really

01:29:54.300 --> 01:29:55.800
is, because string is a term of art.

01:29:55.800 --> 01:29:59.370
Every programming language has
"strings" even if it doesn't technically

01:29:59.370 --> 01:30:01.260
have a data type called string.

01:30:01.260 --> 01:30:04.560
C does not technically have
a data type called string.

01:30:04.560 --> 01:30:08.850
We have added this type to
C by way of CS50's library.

01:30:08.850 --> 01:30:12.720
But now if I do make hi, notice
that my code compiles OK.

01:30:12.720 --> 01:30:17.230
And if I do ./hi Enter,
voila, I still see "HI!",

01:30:17.230 --> 01:30:19.570
which is what I would have
seen last week as well.

01:30:19.570 --> 01:30:23.700
And if we depict this in the computer's
memory, because "HI!" is three letters,

01:30:23.700 --> 01:30:26.040
it's kind of like saying,
well, give me three boxes,

01:30:26.040 --> 01:30:27.930
and let me call this string s.

01:30:27.930 --> 01:30:30.510
So this feels like a
reasonable artist's rendition

01:30:30.510 --> 01:30:35.070
of what s is if it's storing
a three-letter word like "HI!"

01:30:35.070 --> 01:30:39.840
But any time we have sequences
of characters like this,

01:30:39.840 --> 01:30:44.190
I feel like we're now seeing the
capability of a proper programming

01:30:44.190 --> 01:30:44.760
language.

01:30:44.760 --> 01:30:48.250
We introduced a little bit
ago the notion of a string.

01:30:48.250 --> 01:30:52.190
So maybe could someone
redefine string as we've

01:30:52.190 --> 01:30:56.360
been using it in terms of
some of today's nomenclature?

01:30:56.360 --> 01:30:57.860
Like, what is a string?

01:30:57.860 --> 01:31:02.730
There's an example of one,
"HI!", taking up three boxes.

01:31:02.730 --> 01:31:06.720
But how did we, CS50, maybe implement
string underneath the hood,

01:31:06.720 --> 01:31:09.140
would you say?

01:31:09.140 --> 01:31:09.650
What is it?

01:31:09.650 --> 01:31:11.270
Tucker?

01:31:11.270 --> 01:31:14.848
TUCKER: Well, it's an array
of characters and integers.

01:31:14.848 --> 01:31:16.640
Well, it's integers
are used in the string,

01:31:16.640 --> 01:31:19.575
but it's an array of
basically single characters.

01:31:19.575 --> 01:31:20.450
DAVID MALAN: Perfect.

01:31:20.450 --> 01:31:22.640
If we now have the ability to express--

01:31:22.640 --> 01:31:23.810
very nicely done, Tucker.

01:31:23.810 --> 01:31:27.560
If we now have the ability to represent
sequences of things, integers,

01:31:27.560 --> 01:31:29.360
for instance, like
scores, well, it stands

01:31:29.360 --> 01:31:33.410
to reason that we can take another
primitive, a very basic data type

01:31:33.410 --> 01:31:34.340
like a char.

01:31:34.340 --> 01:31:38.030
And if we want to spell things with
those chars, like English words,

01:31:38.030 --> 01:31:41.180
well, let's just think of
a string really as an array

01:31:41.180 --> 01:31:43.820
of characters, an array of chars.

01:31:43.820 --> 01:31:47.850
And indeed, that's exactly
what string actually is.

01:31:47.850 --> 01:31:54.180
So this thing here, "HI!", technically
speaking is an array called s.

01:31:54.180 --> 01:31:57.080
And this is s[0]. This is s[1].

01:31:57.080 --> 01:31:58.310
This is s[2].

01:31:58.310 --> 01:31:59.878
It's just an array called s.

01:31:59.878 --> 01:32:01.670
Now, we didn't use the
word array last week

01:32:01.670 --> 01:32:04.610
because it's not as familiar as
the notion of a "string of text,"

01:32:04.610 --> 01:32:05.570
for instance.

01:32:05.570 --> 01:32:08.720
But a string is
apparently just an array.

01:32:08.720 --> 01:32:12.380
And if it's an array, that means
we can access, if we want to,

01:32:12.380 --> 01:32:16.610
the individual characters of that
array by way of the square bracket

01:32:16.610 --> 01:32:18.170
notation from today.
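
NOTE
A sketch of indexing into a string with square brackets. It assumes that CS50's
string type can be treated as a plain C pointer to char, so const char * is used
here in place of string. The function name char_at is hypothetical.
```c
// Return the character at position i. The caller must keep i inside the string.
char char_at(const char *s, int i)
{
    return s[i];
}
```
For the string "HI!", positions 0, 1 and 2 hold 'H', 'I' and '!'.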

01:32:18.170 --> 01:32:23.180
But it turns out there's something
a little special about strings

01:32:23.180 --> 01:32:24.440
as they're implemented.

01:32:24.440 --> 01:32:28.190
Recall in our example
involving scores, the only way

01:32:28.190 --> 01:32:32.930
we knew how long that
array was was because I

01:32:32.930 --> 01:32:36.740
had a second variable
called length or TOTAL

01:32:36.740 --> 01:32:41.900
that stored the total number
of integers in that array.

01:32:41.900 --> 01:32:44.480
That is to say in our scores
example, not only did we

01:32:44.480 --> 01:32:45.860
allocate the array itself.

01:32:45.860 --> 01:32:51.390
We also kept track of how many things
were in that array with two variables.

01:32:51.390 --> 01:32:56.810
However, up until now, every time you
and I have used the printf function,

01:32:56.810 --> 01:33:01.040
and we have passed to that
printf function a string like s,

01:33:01.040 --> 01:33:05.420
we have only provided printf
with the string itself.

01:33:05.420 --> 01:33:08.030
Or logically, we have
only provided printf

01:33:08.030 --> 01:33:11.670
with the array of characters itself.

01:33:11.670 --> 01:33:17.870
And yet somehow, printf is magically
figuring out how long the string is.

01:33:17.870 --> 01:33:20.660
After all, when printf
prints the value of s,

01:33:20.660 --> 01:33:23.780
it is printing H, I, exclamation
point, and that's it.

01:33:23.780 --> 01:33:27.643
It's not going and printing 4
characters or 5 or 20, right?

01:33:27.643 --> 01:33:30.560
It stands to reason that there's
other stuff in your computer's memory

01:33:30.560 --> 01:33:32.960
if you've got other variables
or other programs running.

01:33:32.960 --> 01:33:35.480
Yet printf seems to be
smart enough to know,

01:33:35.480 --> 01:33:39.320
given an array, how long the
array is because, quite simply, it

01:33:39.320 --> 01:33:42.480
only prints out that single word.

01:33:42.480 --> 01:33:48.440
So how then does a computer know where
a string ends in memory if all a string

01:33:48.440 --> 01:33:49.910
is is a sequence of characters?

01:33:49.910 --> 01:33:54.500
Well, it turns out that if your string
is length 3, as is this one, H, I,

01:33:54.500 --> 01:34:00.680
exclamation point, technically a
string, implemented underneath the hood,

01:34:00.680 --> 01:34:02.390
uses 4 bytes.

01:34:02.390 --> 01:34:04.280
It uses 4 bytes.

01:34:04.280 --> 01:34:07.760
It uses a fourth byte to
be initialized to what

01:34:07.760 --> 01:34:11.850
we would describe as backslash 0,
which is a weird way of describing it.

01:34:11.850 --> 01:34:14.870
But this just represents a
special character, otherwise known

01:34:14.870 --> 01:34:18.890
as the null character, which
is just a special value that

01:34:18.890 --> 01:34:20.880
represents the end of a string.

01:34:20.880 --> 01:34:23.960
So that is to say when
you create a string, quote

01:34:23.960 --> 01:34:26.750
unquote with double quotes, "HI!"--

01:34:26.750 --> 01:34:28.400
yes, the string is length 3.

01:34:28.400 --> 01:34:31.580
But you're wasting or
spending 4 total bytes on it.

01:34:31.580 --> 01:34:32.240
Why?

01:34:32.240 --> 01:34:36.380
Because this is a clue to the
computer as to where "HI!"

01:34:36.380 --> 01:34:39.800
ends and where the next
string maybe begins.

01:34:39.800 --> 01:34:43.010
It is not sufficient to just
start printing characters inside

01:34:43.010 --> 01:34:45.117
of printf one at a time, left to right.

01:34:45.117 --> 01:34:47.450
There needs to be this sort
of equivalent of a stop sign

01:34:47.450 --> 01:34:50.150
at the end of the string, saying,
that's it for this string.
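
NOTE
A sketch of how code can find the end of a string by looking for the terminating
backslash 0. The function name string_length is hypothetical. The standard library
offers strlen for the same job, and this version only illustrates the idea.
```c
// Count characters until the terminating '\0', the byte whose bits are all 0.
int string_length(const char *s)
{
    int n = 0;
    while (s[n] != '\0')
    {
        n++;
    }
    return n;
}
```
For "HI!" this counts 3 characters, even though the string occupies 4 bytes
once the terminator is included.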

01:34:50.150 --> 01:34:51.540
Well, what are these values?

01:34:51.540 --> 01:34:53.290
Well, let's convert
them back to decimal--

01:34:53.290 --> 01:34:54.800
72, 73, 33.

01:34:54.800 --> 01:35:00.560
That fancy backslash 0 was just a way
of saying, in character form, it's 0.

01:35:00.560 --> 01:35:06.740
More specifically, it is eight
0 bits inside of that square.

01:35:06.740 --> 01:35:09.470
So to store a string, the
computer, unbeknownst to you,

01:35:09.470 --> 01:35:15.260
has been using one extra byte all, 0
bits, otherwise written as backslash 0,

01:35:15.260 --> 01:35:19.340
but otherwise known as
literally the value 0.

01:35:19.340 --> 01:35:23.180
So this thing, otherwise
colloquially known as null,

01:35:23.180 --> 01:35:24.685
is just a special character.

01:35:24.685 --> 01:35:26.060
And we can actually see it again.

01:35:26.060 --> 01:35:30.260
If I go back to my
asciichart.com from before,

01:35:30.260 --> 01:35:35.480
notice number 0 is known
as NUL, N-U-L in all caps.

01:35:35.480 --> 01:35:40.580
All right, so with that said, what
is powerful then about strings

01:35:40.580 --> 01:35:42.060
once we have this capability?

01:35:42.060 --> 01:35:43.640
Well, let me go ahead and do this.

01:35:43.640 --> 01:35:46.130
Let me go back into my
code from a moment ago.

01:35:46.130 --> 01:35:48.830
And let me go ahead and enhance
this program a little bit

01:35:48.830 --> 01:35:51.965
just to get a little curious
as to what's going on.

01:35:51.965 --> 01:35:53.250
You know what I can do?

01:35:53.250 --> 01:35:57.200
I bet what I can do here in
this version here is this.

01:35:57.200 --> 01:35:57.800
You know what?

01:35:57.800 --> 01:36:00.440
If I want to print out all
of these characters of s,

01:36:00.440 --> 01:36:06.590
I can get a little curious
again and print out %c, %c, %c.

01:36:06.590 --> 01:36:11.340
And if s is an array, per today's
syntax, I can technically do s[0],

01:36:11.340 --> 01:36:14.940
s[1], s[2].

01:36:14.940 --> 01:36:21.720
And then if I save this, recompile
my code with make hi, OK, ./hi,

01:36:21.720 --> 01:36:23.070
I still see "HI!"

01:36:23.070 --> 01:36:23.820
But you know what?

01:36:23.820 --> 01:36:25.195
Let me get a little more curious.

01:36:25.195 --> 01:36:28.740
Let me use %i so I can
actually see those ASCII codes.

01:36:28.740 --> 01:36:31.950
Let me go ahead and
recompile with make hi, ./hi.

01:36:31.950 --> 01:36:35.190
There's the 72, 73, 33.

01:36:35.190 --> 01:36:37.090
Now let me get even more curious.

01:36:37.090 --> 01:36:42.270
Let me print a fourth
value like this here, s[3],

01:36:42.270 --> 01:36:44.430
which is the fourth location, mind you.

01:36:44.430 --> 01:36:50.850
So if I now do make hi and
./hi, voila, now you see 0.

01:36:50.850 --> 01:36:55.110
And what this hints at is actually a
very dangerous feature of C. You know,

01:36:55.110 --> 01:36:57.750
suppose I'm curious at
seeing what's beyond that.

01:36:57.750 --> 01:37:01.290
I could technically do
s[4], the fifth location,

01:37:01.290 --> 01:37:04.830
even though according to my picture,
there really shouldn't be anything

01:37:04.830 --> 01:37:08.010
at the fifth location, at least
not that I know about just yet.

01:37:08.010 --> 01:37:10.980
But I can do it in C.
Nothing's stopping me.

01:37:10.980 --> 01:37:13.710
So let me do make hi, ./hi.

01:37:13.710 --> 01:37:15.490
And that's interesting.

01:37:15.490 --> 01:37:17.560
Apparently there's the number 37.

01:37:17.560 --> 01:37:19.110
What is the number 37?

01:37:19.110 --> 01:37:21.150
Well, let me go back to my ASCII chart.

01:37:21.150 --> 01:37:25.102
And let me conclude that
number 37 is a percent sign.

01:37:25.102 --> 01:37:28.060
So that's kind of weird because I
didn't print out an explicit percent.

01:37:28.060 --> 01:37:31.290
Now I'm kind of poking around
the computer's memory in places

01:37:31.290 --> 01:37:33.370
I shouldn't be looking, in some sense.

01:37:33.370 --> 01:37:36.510
In fact, if I get really curious,
let's look not at location 4.

01:37:36.510 --> 01:37:40.140
How about location 40, like
way off into that picture?

01:37:40.140 --> 01:37:44.400
Make hi, ./hi, 24, whatever that is.

01:37:44.400 --> 01:37:52.470
I can look at location 400,
recompile my code, make hi, ./hi.

01:37:52.470 --> 01:37:54.090
And now it's 0 again.

01:37:54.090 --> 01:37:57.060
So this is what's both powerful
and also dangerous about C.

01:37:57.060 --> 01:38:01.088
You can touch, look at,
change any memory you want.

01:38:01.088 --> 01:38:02.880
You're essentially just
on the honor system

01:38:02.880 --> 01:38:04.838
not to touch memory that
doesn't belong to you.

01:38:04.838 --> 01:38:06.960
And invariably,
especially next week, we

01:38:06.960 --> 01:38:10.290
are going to start accidentally touching
memory that doesn't belong to us.

01:38:10.290 --> 01:38:13.380
And you'll see that it actually can
cause computer programs to crash,

01:38:13.380 --> 01:38:18.330
including programs on your own Mac and
PC, yet another source of common bugs.

01:38:18.330 --> 01:38:22.350
But now that we have this ability
to store different strings

01:38:22.350 --> 01:38:24.362
or to think about
strings as arrays, well,

01:38:24.362 --> 01:38:26.070
let's go ahead and
consider how you might

01:38:26.070 --> 01:38:27.670
have multiple strings in a program.

01:38:27.670 --> 01:38:30.900
So for instance, if you were to store
two strings in a program-- let's call

01:38:30.900 --> 01:38:32.677
them s and t respectively.

01:38:32.677 --> 01:38:35.010
Another programmer convention--
if you need two strings,

01:38:35.010 --> 01:38:37.110
call the first one s
then the second one t.

01:38:37.110 --> 01:38:38.400
Maybe I'm storing "HI!"

01:38:38.400 --> 01:38:39.280
then "BYE!"

01:38:39.280 --> 01:38:41.530
Well, what's the computer's
memory going to look like?

01:38:41.530 --> 01:38:43.950
Well, let's do some digging.

01:38:43.950 --> 01:38:46.202
"HI!", as before, is
going to be stored here.

01:38:46.202 --> 01:38:47.910
So this whole thing
refers to s, and it's

01:38:47.910 --> 01:38:52.080
taking 4 bytes because the last one
is that special null character that

01:38:52.080 --> 01:38:55.440
just is the stop sign that
demarcates the end of the string.

01:38:55.440 --> 01:38:59.760
"BYE!", meanwhile, is going to take
up another B, Y, E, exclamation point,

01:38:59.760 --> 01:39:04.650
five bytes because I need a fifth byte
to represent another null character.

01:39:04.650 --> 01:39:06.600
And this one deliberately wraps around.

01:39:06.600 --> 01:39:08.820
Though again, this is just
an artist's rendition.

01:39:08.820 --> 01:39:11.580
There's not necessarily
a grid in reality.

01:39:11.580 --> 01:39:16.770
B, Y, E, exclamation point,
backslash 0 now represents t.
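
NOTE
The byte counts in this picture can be checked with a small sketch. The helper name bytes_needed is hypothetical, and const char * stands in for CS50's string type. Whether t really lands immediately after s in memory is up to the compiler; C itself makes no such promise, so the sketch only checks sizes.
```c
#include <stddef.h>
#include <string.h>
// Bytes a string occupies in memory: its visible characters plus one
// extra byte for the terminating '\0'.
size_t bytes_needed(const char *s)
{
    return strlen(s) + 1;
}
```
Here bytes_needed("HI!") is 4 and bytes_needed("BYE!") is 5. For a string literal, sizeof reports the same totals, for example sizeof("HI!") is 4.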

01:39:16.770 --> 01:39:21.690
So this is to say, if I had a
program like this, where I had "HI!"

01:39:21.690 --> 01:39:25.200
and then "BYE!", and I started
poking around the computer's memory

01:39:25.200 --> 01:39:27.360
just using the square
bracket notation, I

01:39:27.360 --> 01:39:31.020
bet I could start accessing
the value of B or Y

01:39:31.020 --> 01:39:34.710
or E just by looking a
little past the string s.

01:39:34.710 --> 01:39:37.380
So again, as complicated
as our programs get,

01:39:37.380 --> 01:39:40.320
all that's going on underneath the
hood is you just plop things down

01:39:40.320 --> 01:39:44.070
in memory in locations like these.

01:39:44.070 --> 01:39:47.310
And so now that we have this
ability or maybe this mental model

01:39:47.310 --> 01:39:49.710
for what's going on
inside of a computer,

01:39:49.710 --> 01:39:53.490
we can consider some of the
features that you might want

01:39:53.490 --> 01:39:55.740
to now use in programs that you write.

01:39:55.740 --> 01:39:59.190
So let me go ahead here and
whip up a quick program,

01:39:59.190 --> 01:40:05.400
for instance, that goes
ahead and, let's say,

01:40:05.400 --> 01:40:09.310
prints out the total length of a string.

01:40:09.310 --> 01:40:10.540
Let me go ahead and do this.

01:40:10.540 --> 01:40:14.730
I'm going to go ahead and create
a new program here in CS50's IDE.

01:40:14.730 --> 01:40:17.870
And I'm going to call this one string.c.

01:40:17.870 --> 01:40:22.080
And I'm going to very quickly at
the top include as usual cs50.h.

01:40:22.080 --> 01:40:24.735
And I'm going to go ahead
and #include stdio.h.

01:40:24.735 --> 01:40:27.185
And I'm going to give
myself int main(void).

01:40:27.185 --> 01:40:29.310
And then in here, I'm going
to get myself a string.

01:40:29.310 --> 01:40:32.280
So string s equals get_string.

01:40:32.280 --> 01:40:35.220
Let me just ask the human for
some input, whatever it is.

01:40:35.220 --> 01:40:39.270
Then let me go ahead and print
out literally the word "Output"

01:40:39.270 --> 01:40:41.730
just so that I can
actually see the result.

01:40:41.730 --> 01:40:47.250
And then down here, let me go ahead and
print out that string, for int i gets 0,

01:40:47.250 --> 01:40:49.792
i is less than--

01:40:49.792 --> 01:40:52.240
huh, I don't know what the
length of the string is yet.

01:40:52.240 --> 01:40:54.990
So let me just put a question mark
there, which is not valid code,

01:40:54.990 --> 01:40:57.068
but we'll come back to this-- i++.

01:40:57.068 --> 01:40:59.610
And then inside of the loop, I
want to go ahead and print out

01:40:59.610 --> 01:41:03.432
every character one at a time
by using my new array notation.

01:41:03.432 --> 01:41:05.140
And then at the very
end of this program,

01:41:05.140 --> 01:41:06.890
I'm going to print a
new line just to make

01:41:06.890 --> 01:41:08.460
sure the cursor is on its own line.

01:41:08.460 --> 01:41:11.000
So this is a complete
program that is now,

01:41:11.000 --> 01:41:15.950
as of this week, going to treat a string
as an array, ergo, my syntax in line 10

01:41:15.950 --> 01:41:18.830
that's using my new fancy
square bracket notation.

01:41:18.830 --> 01:41:21.920
But the only question I
haven't answered yet is this--

01:41:21.920 --> 01:41:25.100
how do I know when to
stop printing the string?

01:41:25.100 --> 01:41:26.390
How do I know when to stop?

01:41:26.390 --> 01:41:28.850
Well, it turns out, thus far,
when we're using for loops,

01:41:28.850 --> 01:41:34.040
we've typically done something like
just count from 0 on up to some number.

01:41:34.040 --> 01:41:36.620
This condition, though,
is any Boolean expression.

01:41:36.620 --> 01:41:39.300
I just need to have a yes/no
or a true/false answer.

01:41:39.300 --> 01:41:40.850
So you know what I could do?

01:41:40.850 --> 01:41:45.620
Keep looping so long as
character at location i

01:41:45.620 --> 01:41:50.030
in s does not equal backslash 0.

01:41:50.030 --> 01:41:52.170
So this is now definitely
some new syntax.

01:41:52.170 --> 01:41:53.510
Let me zoom in here.

01:41:53.510 --> 01:41:58.700
But s[i] just means the i-th
character in s, or more specifically,

01:41:58.700 --> 01:42:01.820
the character at position i in s.

01:42:01.820 --> 01:42:05.000
Bang equals-- so bang is
how a programmer pronounces

01:42:05.000 --> 01:42:08.150
exclamation point because it's
a little faster-- bang equals

01:42:08.150 --> 01:42:09.597
means does not equal.

01:42:09.597 --> 01:42:12.680
So this is how you would do an equal
sign with a slash through it in math.

01:42:12.680 --> 01:42:15.920
It's, in code, exclamation
point, equals sign.

01:42:15.920 --> 01:42:18.230
And then notice this
funkiness-- backslash

01:42:18.230 --> 01:42:22.100
0 is again, the "null character,"
but it's in single quotes

01:42:22.100 --> 01:42:24.500
because, again, it is by
definition a character.

01:42:24.500 --> 01:42:26.480
And for reasons we'll
get into another time,

01:42:26.480 --> 01:42:28.760
backslash 0 is how you express it.

01:42:28.760 --> 01:42:32.600
Just like backslash n is kind of a
weird escape character for the new line,

01:42:32.600 --> 01:42:36.710
backslash 0 is the
character that is all 0's.

01:42:36.710 --> 01:42:38.570
So this is kind of a different for loop.

01:42:38.570 --> 01:42:41.870
I'm still starting at 0 for i.

01:42:41.870 --> 01:42:43.880
I'm still incrementing i as always.

01:42:43.880 --> 01:42:46.400
But I'm now not checking
for some preordained length

01:42:46.400 --> 01:42:50.990
because just like a computer, I do not
know a priori where these strings end.

01:42:50.990 --> 01:42:55.580
I only know that they end
once I see backslash 0.

01:42:55.580 --> 01:42:59.150
So when I now go down
here and do make string--

01:42:59.150 --> 01:43:05.570
it compiles OK-- ./string, let me type
in something like "HELLO" in all caps.

01:43:05.570 --> 01:43:07.460
Voila, the output is "HELLO" again.

01:43:07.460 --> 01:43:08.450
Let me do it again--

01:43:08.450 --> 01:43:11.030
"BYE" in all caps, and
the output is "BYE."

01:43:11.030 --> 01:43:13.580
So it's kind of a useless program
in that it's just printing

01:43:13.580 --> 01:43:15.350
the same thing that I typed in.

01:43:15.350 --> 01:43:19.490
But I'm conditionally using
this Boolean expression

01:43:19.490 --> 01:43:22.170
to decide whether or not to
keep printing characters.
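
NOTE
A minimal sketch of the loop just described, assuming a hypothetical helper named print_chars and using const char * in place of CS50's string so it compiles without cs50.h. The loop condition is the Boolean expression from the lecture: keep going until the character at position i is '\0'.
```c
#include <stdio.h>
// Prints s one character at a time, stopping at the terminating '\0'.
// Returns how many characters were printed.
int print_chars(const char *s)
{
    int i;
    for (i = 0; s[i] != '\0'; i++)
    {
        printf("%c", s[i]);
    }
    printf("\n");
    return i;
}
```
Calling print_chars("HELLO") prints HELLO on its own line and returns 5.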

01:43:22.170 --> 01:43:25.280
Now thankfully, C comes with a
function that can answer this for me.

01:43:25.280 --> 01:43:29.210
It turns out there is a
function called strlen

01:43:29.210 --> 01:43:31.850
so I can literally just
say, well, figure out

01:43:31.850 --> 01:43:33.500
what the length of the string is.

01:43:33.500 --> 01:43:36.110
The function is called
strlen for string length.

01:43:36.110 --> 01:43:40.730
And it exists in a file called,
not surprisingly, perhaps,

01:43:40.730 --> 01:43:43.610
string.h, string.h.

01:43:43.610 --> 01:43:47.660
So now let me go ahead down
here and do make string--

01:43:47.660 --> 01:43:50.300
compiles OK-- ./string.

01:43:50.300 --> 01:43:52.950
Type in "HELLO," and it still works.

01:43:52.950 --> 01:43:58.400
So this function strlen that does
exist in a library via the header file

01:43:58.400 --> 01:43:59.523
string.h already exists.

01:43:59.523 --> 01:44:00.440
Someone else wrote it.

01:44:00.440 --> 01:44:01.710
But how did they write it?

01:44:01.710 --> 01:44:04.040
Odds are they wrote the
first version that I

01:44:04.040 --> 01:44:06.980
did by checking for that backslash 0.
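
NOTE
One plausible way a string-length function could be written, following the guess made in the lecture. This is a sketch under that assumption, not the actual source of any C library's strlen; the name string_length is hypothetical.
```c
#include <stddef.h>
// Counts characters from the start of s until the terminating '\0'.
// The terminator itself is not counted.
size_t string_length(const char *s)
{
    size_t n = 0;
    while (s[n] != '\0')
    {
        n++;
    }
    return n;
}
```
With this sketch, string_length("HI!") is 3 and string_length("BYE!") is 4.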

01:44:06.980 --> 01:44:09.235
But let me ask a subtle question here.

01:44:09.235 --> 01:44:10.235
This program is correct.

01:44:10.235 --> 01:44:12.235
It iterates over the whole
length of the string,

01:44:12.235 --> 01:44:14.870
and it prints out every
character therein.

01:44:14.870 --> 01:44:20.510
Can anyone observe a poor design
decision in this function?

01:44:20.510 --> 01:44:24.200
This one's subtle, but
there's something I don't

01:44:24.200 --> 01:44:26.660
like about my for loop in particular.

01:44:26.660 --> 01:44:28.640
And I'll isolate it to line 9.

01:44:28.640 --> 01:44:31.230
I've not done something
optimally on line 9.

01:44:31.230 --> 01:44:34.700
There's an opportunity
for better design.

01:44:34.700 --> 01:44:40.830
Any thoughts here on
what I might do better?

01:44:40.830 --> 01:44:42.426
Yeah, Jonathan?

01:44:42.426 --> 01:44:46.770
JONATHAN: Yeah, to create basically
another variable for the string length

01:44:46.770 --> 01:44:48.455
and to remember it.

01:44:48.455 --> 01:44:50.580
DAVID MALAN: Yeah, and why
are you suggesting that?

01:44:50.580 --> 01:44:53.670
JONATHAN: If you want to use a
different value for the string length,

01:44:53.670 --> 01:44:55.710
or if it might fluctuate
or change, you want

01:44:55.710 --> 01:44:59.370
to just have a different variable as
a sort of placeholder value for it.

01:44:59.370 --> 01:45:00.670
DAVID MALAN: OK, potentially.

01:45:00.670 --> 01:45:03.210
But I will claim in this case
that because the human has

01:45:03.210 --> 01:45:07.090
typed in the word, once you type in
the word, it's not going to change.

01:45:07.090 --> 01:45:11.520
But I think you're going down
the right direction because

01:45:11.520 --> 01:45:15.570
in this Boolean expression here, i
less than the string length of s,

01:45:15.570 --> 01:45:19.350
recall that this expression gets
evaluated again and again and again.

01:45:19.350 --> 01:45:22.050
Every time through a for loop,
recall that you're constantly

01:45:22.050 --> 01:45:23.290
checking the condition.

01:45:23.290 --> 01:45:26.460
The condition in this case is
i less than the length of s.

01:45:26.460 --> 01:45:30.382
The problem is that strlen in
this case is a function, which

01:45:30.382 --> 01:45:32.340
means there's some piece
of code someone wrote,

01:45:32.340 --> 01:45:35.593
probably similar to what I wrote a few
minutes ago, that you're constantly

01:45:35.593 --> 01:45:37.260
asking, what's the length of the string?

01:45:37.260 --> 01:45:38.593
What's the length of the string?

01:45:38.593 --> 01:45:41.880
And recall from our picture, the way
you figure out the length of a string

01:45:41.880 --> 01:45:44.070
is you start at the beginning of
the string, and you keep checking,

01:45:44.070 --> 01:45:45.300
am I at backslash 0?

01:45:45.300 --> 01:45:46.020
OK.

01:45:46.020 --> 01:45:47.700
Am I at backslash 0?

01:45:47.700 --> 01:45:48.540
OK.

01:45:48.540 --> 01:45:52.600
So to figure out the length of "HI!",
it's going to take me 1, 2, 3, 4 steps,

01:45:52.600 --> 01:45:54.600
right, because I have to
start at the beginning.

01:45:54.600 --> 01:45:57.267
And I iterate from
location 0 on to the end.

01:45:57.267 --> 01:45:59.100
To find out the length
of "BYE!", it's going

01:45:59.100 --> 01:46:01.350
to take me five steps
because that's how long it's

01:46:01.350 --> 01:46:04.740
going to take me from left to
right to find that backslash 0.

01:46:04.740 --> 01:46:07.080
So what I don't like about
this line of code is,

01:46:07.080 --> 01:46:10.680
why are you asking for the string
length of s again and again

01:46:10.680 --> 01:46:11.790
and again and again?

01:46:11.790 --> 01:46:14.230
It's not going to
change in this context.
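
NOTE
To make the cost concrete, the sketch below counts how many characters get inspected when the length is recomputed in every loop condition versus computed once and remembered. All three function names are hypothetical, and scan_length is a hand-written stand-in for strlen so that the steps can be counted.
```c
// Walks s from the start until it finds '\0', adding one to *steps for
// every character it inspects. Returns the length, as strlen would.
static int scan_length(const char *s, int *steps)
{
    int n = 0;
    while (1)
    {
        (*steps)++;
        if (s[n] == '\0')
        {
            return n;
        }
        n++;
    }
}
// The length is recomputed every time the loop condition is checked.
int steps_when_rechecking(const char *s)
{
    int steps = 0;
    for (int i = 0; i < scan_length(s, &steps); i++)
    {
        // a real program would print s[i] here
    }
    return steps;
}
// The length is computed once and remembered in n.
int steps_when_remembered(const char *s)
{
    int steps = 0;
    int n = scan_length(s, &steps);
    for (int i = 0; i < n; i++)
    {
        // a real program would print s[i] here
    }
    return steps;
}
```
For "HI!" the rechecking version inspects 16 characters (4 scans of 4 steps each) while the remembering version inspects 4. For "BYE!" the counts are 25 and 5.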

01:46:14.230 --> 01:46:17.887
So Jonathan's point is taken if we
keep asking the user for more input.

01:46:17.887 --> 01:46:19.970
But in this case, we've
only asked the human once.

01:46:19.970 --> 01:46:20.920
So you know what?

01:46:20.920 --> 01:46:26.700
Let's take Jonathan's advice and do
int n equals the string length of s.

01:46:26.700 --> 01:46:28.950
And then maybe you
know what we could do?

01:46:28.950 --> 01:46:32.170
Put n in this condition instead.

01:46:32.170 --> 01:46:35.520
So now I'm asking the same
question, but I'm not foolishly,

01:46:35.520 --> 01:46:39.030
inefficiently asking the same
question again and again,

01:46:39.030 --> 01:46:42.720
whereby the same question
requires a good amount of work

01:46:42.720 --> 01:46:45.940
to find the backslash 0
again and again and again.

01:46:45.940 --> 01:46:48.470
Now, there's some cleaning
up we can do here too.

01:46:48.470 --> 01:46:50.970
It turns out there's this other
subtle feature of for loops.

01:46:50.970 --> 01:46:54.660
If you want to initialize
another variable to a value,

01:46:54.660 --> 01:46:56.370
you can actually do this all at once.

01:46:56.370 --> 01:46:59.130
And you can do so before the semicolon.

01:46:59.130 --> 01:47:04.530
You can do comma n equals strlen of s.

01:47:04.530 --> 01:47:07.150
And then you can use
n, just as I have here.

01:47:07.150 --> 01:47:09.210
So it's not all that
much better, but it's

01:47:09.210 --> 01:47:11.790
a little cleaner in that now
I've taken two lines of code

01:47:11.790 --> 01:47:13.710
and collapsed them into one.

01:47:13.710 --> 01:47:15.750
They both have to be
of the same data type,

01:47:15.750 --> 01:47:19.380
but that's OK here
because both i and n are ints.

01:47:19.380 --> 01:47:21.750
So again, the inefficiency
here is that it was foolish

01:47:21.750 --> 01:47:26.100
before that I kept asking the same
question again and again and again.

01:47:26.100 --> 01:47:30.810
But now I'm asking the question once,
remembering it in a variable called n,

01:47:30.810 --> 01:47:36.720
and only comparing i against that
integer which does not actually change.
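
NOTE
A sketch of the improved loop, with i and n declared together in the for loop's initializer so that strlen is called only once. The function name print_chars_once is hypothetical, and const char * stands in for CS50's string type.
```c
#include <stdio.h>
#include <string.h>
// Prints s character by character and returns how many were printed.
int print_chars_once(const char *s)
{
    int printed = 0;
    // i and n are both ints; strlen runs once, before the first iteration.
    for (int i = 0, n = strlen(s); i < n; i++)
    {
        printf("%c", s[i]);
        printed++;
    }
    printf("\n");
    return printed;
}
```
Calling print_chars_once("HELLO") prints HELLO and returns 5. Note that strlen returns a size_t, which is converted to int here as in the lecture; that is fine for short input.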

01:47:36.720 --> 01:47:38.370
All right, I know that too was a lot.

01:47:38.370 --> 01:47:41.910
Let's go ahead here and take a 3-minute
break just to stretch legs and whatnot.

01:47:41.910 --> 01:47:44.880
In 3 minutes, we'll come back
and start to see applications

01:47:44.880 --> 01:47:48.030
now of all of these features
ultimately to some problems that

01:47:48.030 --> 01:47:51.030
are going to lie ahead this week
on the readability of language

01:47:51.030 --> 01:47:52.510
and also on cryptography.

01:47:52.510 --> 01:47:54.750
So we'll see you in 3 minutes.

01:47:54.750 --> 01:47:57.240
All right, so we're back.

01:47:57.240 --> 01:48:00.885
And this has been a whole bunch
of low-level details, admittedly.

01:48:00.885 --> 01:48:03.510
And where we're going with this
ultimately this week and beyond

01:48:03.510 --> 01:48:05.562
is applications of some
of these building blocks.

01:48:05.562 --> 01:48:08.520
And one of those applications this
coming week and the next problem set

01:48:08.520 --> 01:48:11.580
is going to be that of cryptography,
the art of scrambling or encrypting

01:48:11.580 --> 01:48:12.597
information.

01:48:12.597 --> 01:48:14.430
And if you're trying
to encrypt information,

01:48:14.430 --> 01:48:16.830
like messages, well, those
messages might very well

01:48:16.830 --> 01:48:19.260
be written in English or
in ASCII, if you will.

01:48:19.260 --> 01:48:23.250
And you might want to convert some of
those ASCII characters from one thing

01:48:23.250 --> 01:48:27.480
to another so that if your message
is intercepted by some third party,

01:48:27.480 --> 01:48:30.990
they can't actually decipher or figure
out what it is that you've sent.

01:48:30.990 --> 01:48:33.360
So I feel like we're almost toward--

01:48:33.360 --> 01:48:35.550
we're almost at the
ability where, in code, we

01:48:35.550 --> 01:48:39.270
can start to convert one word to
another or to scramble our text.

01:48:39.270 --> 01:48:41.490
But we do need a couple
more building blocks.

01:48:41.490 --> 01:48:44.040
So recall that we left
off with this picture

01:48:44.040 --> 01:48:47.160
here, where we had two words in the
computer's memory, "HI!" and "BYE!",

01:48:47.160 --> 01:48:50.610
both with exclamation points, but
also both with these backslash 0's

01:48:50.610 --> 01:48:52.800
that you and I do not
put there explicitly.

01:48:52.800 --> 01:48:56.370
They just happen for you any time you
use the double quotes and any time

01:48:56.370 --> 01:48:58.990
you use the get_string function.

01:48:58.990 --> 01:49:03.720
So once we have those in memory, you can
think of them as s and t respectively.

01:49:03.720 --> 01:49:06.480
But a string, s or t, is just an array.

01:49:06.480 --> 01:49:11.040
So again, you can also refer to all of
these individual characters or chars

01:49:11.040 --> 01:49:15.420
via the new square bracket notation
of today, s[0], s[1], s[2], s[3],

01:49:15.420 --> 01:49:21.210
and then t[0], t[1], t[2], t[3],
and t[4], and then whatever else is

01:49:21.210 --> 01:49:22.470
in the computer's memory.

01:49:22.470 --> 01:49:26.880
But you know what you can even do
is this-- suppose that instead we

01:49:26.880 --> 01:49:28.980
wanted to have an array of words.

01:49:28.980 --> 01:49:32.650
So before, we had an array of
scores, an array of integers.

01:49:32.650 --> 01:49:35.370
But now suppose we wanted in the
context of some other program

01:49:35.370 --> 01:49:36.780
to have an array of words.

01:49:36.780 --> 01:49:37.800
You can totally do that.

01:49:37.800 --> 01:49:40.560
There's nothing stopping you
from having an array of words.

01:49:40.560 --> 01:49:42.240
And the syntax is going to be identical.

01:49:42.240 --> 01:49:48.150
Notice, if I want an array called
words that has room for two strings,

01:49:48.150 --> 01:49:51.180
I literally just say, string words[2].

01:49:51.180 --> 01:49:56.540
This means, hey, computer, give me an
array of size 2, each of whose members

01:49:56.540 --> 01:49:57.540
is going to be a string.

01:49:57.540 --> 01:49:58.920
How do I populate that array?

01:49:58.920 --> 01:50:00.510
Same as before with the scores--

01:50:00.510 --> 01:50:02.790
words[0] gets, quote unquote, "HI!"

01:50:02.790 --> 01:50:05.280
Words[1] gets, quote unquote, "BYE!"
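
NOTE
The two assignments just described might look like this in code. This is a sketch with hypothetical helper names fill_words and print_words, and const char * is used where the lecture writes string, so the array is declared as const char *words[2] rather than string words[2].
```c
#include <stdio.h>
// Fills a two-element array of strings, one assignment per slot.
void fill_words(const char *words[2])
{
    words[0] = "HI!";
    words[1] = "BYE!";
}
// Declares the array, fills it, and prints each word on its own line.
// Returns how many words were printed.
int print_words(void)
{
    const char *words[2];
    fill_words(words);
    for (int i = 0; i < 2; i++)
    {
        printf("%s\n", words[i]);
    }
    return 2;
}
```
Calling print_words() prints HI! and then BYE!, one per line.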

01:50:05.280 --> 01:50:09.540
So that is to say, with this code, we could
create a picture similar to the one

01:50:09.540 --> 01:50:10.380
previously.

01:50:10.380 --> 01:50:12.540
But I'm not calling
these strings s and t.

01:50:12.540 --> 01:50:16.890
Now I'm calling them both "words"
at two different locations, 0 and 1

01:50:16.890 --> 01:50:17.830
respectively.

01:50:17.830 --> 01:50:20.040
So we could redraw that
same picture like this.

01:50:20.040 --> 01:50:23.790
Now this word is
technically named words[0].

01:50:23.790 --> 01:50:26.640
And this one is referred to by words[1].

01:50:26.640 --> 01:50:29.310
But again, what is a string?

01:50:29.310 --> 01:50:30.990
A string is an array.

01:50:30.990 --> 01:50:34.360
And yet, here we have
an array of strings.

01:50:34.360 --> 01:50:37.510
So we kind of sort of
have an array of arrays.

01:50:37.510 --> 01:50:40.440
So we've got an array of words,
but a word is just a string.

01:50:40.440 --> 01:50:42.850
And a string is an array of characters.

01:50:42.850 --> 01:50:47.430
So what I really have on the
board is an array of arrays.

01:50:47.430 --> 01:50:51.190
And so here-- and this will be
the last weird syntax for today--

01:50:51.190 --> 01:50:55.050
you can actually have multiple
square brackets back to back.

01:50:55.050 --> 01:50:58.650
So if your variable's called words,
and that variable's an array,

01:50:58.650 --> 01:51:03.240
if you want to get the first word
in the array, you do words[0].

01:51:03.240 --> 01:51:06.090
Once you're at that
word, "HI!", and you want

01:51:06.090 --> 01:51:10.860
to get the first character in that
word, you can similarly do [0].

01:51:10.860 --> 01:51:14.230
So the first bracket refers to
what word do you want in the array.

01:51:14.230 --> 01:51:18.060
The second bracket refers to what
character do you want in that word.

01:51:18.060 --> 01:51:22.320
So now the I is at words[0][1].

01:51:22.320 --> 01:51:25.500
The exclamation point
is at words[0][2].

01:51:25.500 --> 01:51:28.810
And the null character's at words[0][3].

01:51:28.810 --> 01:51:37.508
Meanwhile, the B is at words[1][0],
[1][1], [1][2], [1][3], [1][4].
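
NOTE
The coordinate-style indexing can be sketched as follows. The helper name letter_at is hypothetical, and the array is declared with const char * in place of CS50's string type. The first index picks the word and the second picks the character within that word.
```c
// Two words stored in an array; each word is itself an array of chars.
static const char *words[2] = {"HI!", "BYE!"};
// Returns the character at the given position of the given word.
// Both indexes must stay in range: word is 0 or 1, and position may go
// no further than that word's terminating '\0'.
char letter_at(int word, int position)
{
    return words[word][position];
}
```
Here letter_at(0, 0) is 'H', letter_at(0, 3) is '\0', letter_at(1, 0) is 'B', and letter_at(1, 4) is '\0'.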

01:51:37.508 --> 01:51:40.050
So it's almost kind of like a
coordinate system, if you will.

01:51:40.050 --> 01:51:43.200
It's a two-dimensional
array, or an array of arrays.

01:51:43.200 --> 01:51:49.080
So this is only to say that if we
wanted to think of arrays of strings

01:51:49.080 --> 01:51:53.280
as individual characters, we can.

01:51:53.280 --> 01:51:56.680
We have that expressiveness
now in code.

01:51:56.680 --> 01:52:00.460
So what more can I do now that I
can manipulate things at this level?

01:52:00.460 --> 01:52:03.263
Let me do a program that'll
be pretty applicable,

01:52:03.263 --> 01:52:05.430
I think, with some of our
upcoming programs as well.

01:52:05.430 --> 01:52:06.960
Let me call this one uppercase.

01:52:06.960 --> 01:52:09.240
Let me quickly write a
program whose purpose in life

01:52:09.240 --> 01:52:12.120
is just to convert an
input word to uppercase.

01:52:12.120 --> 01:52:13.540
And let's see how we can do this.

01:52:13.540 --> 01:52:16.380
So let me go ahead and #include cs50.h.

01:52:16.380 --> 01:52:20.050
Let me go ahead and #include stdio.h.

01:52:20.050 --> 01:52:23.160
Let me also include this
time string.h, which is

01:52:23.160 --> 01:52:24.990
going to give us functions like strlen.

01:52:24.990 --> 01:52:27.670
And then let me do int main(void).

01:52:27.670 --> 01:52:31.280
And then let me go ahead here and get
a string from the user like before.

01:52:31.280 --> 01:52:34.030
So I'm just going to ask
the user for a string.

01:52:34.030 --> 01:52:36.370
And I want them to give me
whatever the string should

01:52:36.370 --> 01:52:38.950
be before I uppercase everything.

01:52:38.950 --> 01:52:41.890
Then I'm just going to go ahead
and print out literally "After,"

01:52:41.890 --> 01:52:46.330
just so I can see what happens after
I capitalize everything in the string.

01:52:46.330 --> 01:52:49.690
And now let me go ahead and
do this-- for int i gets 0,

01:52:49.690 --> 01:52:53.110
i less than string length of s, i++.

01:52:53.110 --> 01:52:55.180
Wait a minute, I made
that mistake before.

01:52:55.180 --> 01:52:57.200
Let's not repeat this question.

01:52:57.200 --> 01:53:02.740
Let's give myself a second variable-- n
gets string length of s, i less than n,

01:53:02.740 --> 01:53:04.090
i++.

01:53:04.090 --> 01:53:06.340
So again, this is now
becoming boilerplate.

01:53:06.340 --> 01:53:09.400
Any time you want to iterate over
all of the characters in the string,

01:53:09.400 --> 01:53:11.913
this probably is a
reasonable place to start.

01:53:11.913 --> 01:53:13.330
And then let me ask the question--

01:53:13.330 --> 01:53:15.673
I want to iterate over every
character in the string

01:53:15.673 --> 01:53:16.840
that the human has typed in.

01:53:16.840 --> 01:53:20.470
And I want to ask myself a question,
just as we've done with any algorithm.

01:53:20.470 --> 01:53:23.980
Specifically, I want to ask if
the current letter is lowercase,

01:53:23.980 --> 01:53:26.080
let me somehow convert it to uppercase.

01:53:26.080 --> 01:53:28.260
Else, let me just
print it out unchanged.

01:53:28.260 --> 01:53:31.540
So how can I express that using last
week and this week's building blocks?

01:53:31.540 --> 01:53:33.280
Well, let me say something like this--

01:53:33.280 --> 01:53:39.670
if the character at location i in
s, or if the i-th character in s

01:53:39.670 --> 01:53:47.710
is greater than or equal to a lowercase
a, and the i-th character in s

01:53:47.710 --> 01:53:52.270
is less than or equal to a lowercase
z, what do I want to do?

01:53:52.270 --> 01:53:55.750
Let me go ahead and
print out a character.

01:53:55.750 --> 01:53:59.320
But that character should
be what? s bracket i,

01:53:59.320 --> 01:54:01.960
but I'm not sure what to do here yet.

01:54:01.960 --> 01:54:03.460
But let me come back to that.

01:54:03.460 --> 01:54:09.250
Else, let me go ahead and just print
out that character unchanged, s[i].

01:54:09.250 --> 01:54:14.270
So minus the placeholder, the question
marks I've put, I'm kind of all the way

01:54:14.270 --> 01:54:14.770
there.

01:54:14.770 --> 01:54:16.872
Line 10 initializes i to 0.

01:54:16.872 --> 01:54:20.080
It's going to count all the way up to
n, where n is the length of the string.

01:54:20.080 --> 01:54:21.310
And it's going to keep incrementing i.

01:54:21.310 --> 01:54:22.393
So we've seen that before.

01:54:22.393 --> 01:54:25.330
And again, that's going to
become muscle memory before long.

01:54:25.330 --> 01:54:28.480
Line 12 is a little new,
but it uses building blocks

01:54:28.480 --> 01:54:29.532
from last week and this.

01:54:29.532 --> 01:54:31.240
This week, we have
the new square bracket

01:54:31.240 --> 01:54:34.810
notation to get the i-th
character in the string s.

01:54:34.810 --> 01:54:37.870
Greater than or equal to, less than
or equal to-- we saw at least one

01:54:37.870 --> 01:54:38.770
of those last week.

01:54:38.770 --> 01:54:41.860
That just means greater than or
equal to, less than or equal to.

01:54:41.860 --> 01:54:46.370
I mentioned && last week, which
is the logical AND operator,

01:54:46.370 --> 01:54:49.150
which means you can check
one condition and another.

01:54:49.150 --> 01:54:52.540
And the whole thing is true
if both of those are true.

01:54:52.540 --> 01:54:54.440
This is a bit weird today.

01:54:54.440 --> 01:54:57.100
But if you want to express,
is the current character

01:54:57.100 --> 01:55:01.930
between lowercase a and
lowercase z, totally fine

01:55:01.930 --> 01:55:07.750
to implicitly treat a and z as
numbers, which they really are.

01:55:07.750 --> 01:55:11.180
Because again, if we come back
to our favorite ASCII chart,

01:55:11.180 --> 01:55:16.600
you'll see again that lowercase a
has a number associated with it, 97.

01:55:16.600 --> 01:55:20.410
Lowercase z has a number
associated with it, 122.

01:55:20.410 --> 01:55:25.000
So if I really wanted to be pedantic,
I could go back into my code

01:55:25.000 --> 01:55:28.540
and do something like, well, if
this is greater than or equal to 97,

01:55:28.540 --> 01:55:32.320
and it's less than or equal
to 122, but bad design.

01:55:32.320 --> 01:55:35.272
Like, I'm never going to
remember that lowercase z is 122.

01:55:35.272 --> 01:55:36.730
Like, no one is going to know that.

01:55:36.730 --> 01:55:38.320
It makes the code less obvious.

01:55:38.320 --> 01:55:41.080
Go ahead and write it in
a way that's a little more

01:55:41.080 --> 01:55:43.730
friendly to humans like this.
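
NOTE
The range check can be sketched as a small helper. The name is_lowercase_ascii is hypothetical, and the check assumes ASCII, where the lowercase letters occupy the contiguous codes 97 through 122.
```c
// Returns 1 if c is a lowercase English letter in ASCII, otherwise 0.
// Writing 'a' and 'z' instead of 97 and 122 means exactly the same thing
// to the compiler but is far easier for a human to read.
int is_lowercase_ascii(char c)
{
    return c >= 'a' && c <= 'z';
}
```
For example, is_lowercase_ascii('d') is 1, while is_lowercase_ascii('D') and is_lowercase_ascii('!') are both 0.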

01:55:43.730 --> 01:55:45.070
But notice this question mark.

01:55:45.070 --> 01:55:46.720
How do I fill in this blank?

01:55:46.720 --> 01:55:48.970
Well, let me go back to the ASCII chart.

01:55:48.970 --> 01:55:51.520
This is subtle, but
this is kind of cool.

01:55:51.520 --> 01:55:53.560
And humans were
definitely thinking ahead.

01:55:53.560 --> 01:55:56.590
Notice that lowercase a is 97.

01:55:56.590 --> 01:55:58.900
Capital A is 65.

01:55:58.900 --> 01:56:01.000
Lowercase b is 98.

01:56:01.000 --> 01:56:03.430
Capital B is 66.

01:56:03.430 --> 01:56:05.965
And notice these two numbers--

01:56:05.965 --> 01:56:13.330
65 to 97, 66 to 98, 67 to 99.

01:56:13.330 --> 01:56:17.320
It would seem that no matter what
letters we compare, lowercase

01:56:17.320 --> 01:56:20.543
and uppercase, they're always 32 apart.

01:56:20.543 --> 01:56:21.460
And that's consistent.

01:56:21.460 --> 01:56:24.290
We could do it for all
26 English letters.

01:56:24.290 --> 01:56:27.520
So if they're always 32 apart,
you know what I could do--

01:56:27.520 --> 01:56:30.730
if I want to take a
lowercase letter, which

01:56:30.730 --> 01:56:33.790
is what I'm thinking about
in line 14, I could just

01:56:33.790 --> 01:56:36.102
subtract off 32 in this case.

01:56:36.102 --> 01:56:37.810
It's not the cleanest,
because again, I'm

01:56:37.810 --> 01:56:39.640
probably going to forget
that math at some point.

01:56:39.640 --> 01:56:41.598
But at least mathematically,
I think that'll do

01:56:41.598 --> 01:56:44.050
the trick because 97 will become 65.

01:56:44.050 --> 01:56:47.922
98 will become 66, which is forcing
those characters to uppercase.

01:56:47.922 --> 01:56:49.630
But they're not being
printed as numbers.

01:56:49.630 --> 01:56:52.870
I'm still using %c to
coerce it to be a char.

01:56:52.870 --> 01:56:56.780
So if I didn't mess any syntax
up here, let me make uppercase.

01:56:56.780 --> 01:56:59.260
OK, ./uppercase.

01:56:59.260 --> 01:57:03.580
And let me go ahead and type in, for
instance, my name in all lowercase.

01:57:03.580 --> 01:57:05.490
And voila, uppercase.

01:57:05.490 --> 01:57:06.490
Now, it's a little ugly.

01:57:06.490 --> 01:57:08.530
I forgot my backslash
n, so let me go ahead

01:57:08.530 --> 01:57:11.620
and add one of those real
quick just to fix the cursor.

01:57:11.620 --> 01:57:14.590
Let me recompile the
code with make uppercase.

01:57:14.590 --> 01:57:17.650
Let me rerun the program with
./uppercase and now type in my name,

01:57:17.650 --> 01:57:18.400
David.

01:57:18.400 --> 01:57:20.050
Let me do it again with Brian.

01:57:20.050 --> 01:57:23.770
And notice that it's capitalizing
everything character by character

01:57:23.770 --> 01:57:26.470
using only today's building blocks.

01:57:26.470 --> 01:57:27.530
This is correct.

01:57:27.530 --> 01:57:30.350
It's pretty well styled because
everything's nicely indented.

01:57:30.350 --> 01:57:33.890
It's very readable even though it might
look a little cryptic at first glance.

01:57:33.890 --> 01:57:35.430
But I think I can do better.

01:57:35.430 --> 01:57:37.940
And I can do better by
using yet another library.

01:57:37.940 --> 01:57:41.270
And here's where C, and really
programming in general, gets powerful.

01:57:41.270 --> 01:57:43.340
The whole point of
using popular languages

01:57:43.340 --> 01:57:46.742
is because so many other people
before you have solved problems

01:57:46.742 --> 01:57:48.200
that you don't need to solve again.

01:57:48.200 --> 01:57:51.230
And I'm sure over the past, like,
50 years, someone has probably

01:57:51.230 --> 01:57:54.770
written a function that
capitalizes letters for me.

01:57:54.770 --> 01:57:56.690
I don't have to do this myself.

01:57:56.690 --> 01:58:00.770
And indeed, there is another
library that I'm going

01:58:00.770 --> 01:58:02.540
to include by way of its header file.

01:58:02.540 --> 01:58:07.050
It's ctype.h, named for the language
C and a bunch of type-related things.

01:58:07.050 --> 01:58:11.270
And in ctype.h, it turns out
there's a function call--

01:58:11.270 --> 01:58:12.650
there's a couple of functions.

01:58:12.650 --> 01:58:15.990
Specifically, let me get
rid of all of this code.

01:58:15.990 --> 01:58:21.980
And let me call a function called
islower and pass to islower s[i].

01:58:21.980 --> 01:58:24.560
And islower, as you might
guess, its purpose in life

01:58:24.560 --> 01:58:27.230
is to return essentially a
Boolean value, true or false,

01:58:27.230 --> 01:58:28.770
if that character is lowercase.

01:58:28.770 --> 01:58:31.610
And if so, well, let me go ahead
and print out a placeholder

01:58:31.610 --> 01:58:34.280
followed by the
capitalization of that letter.

01:58:34.280 --> 01:58:37.670
Now, before I had to do that annoying
math with minus 32 and figure it out,

01:58:37.670 --> 01:58:44.120
uh-uh, toupper of parentheses s[i].

01:58:44.120 --> 01:58:48.110
And now I can otherwise just print
out that character unchanged,

01:58:48.110 --> 01:58:50.990
just as before, s[i].

01:58:50.990 --> 01:58:52.400
But now notice my program--

01:58:52.400 --> 01:58:54.540
honestly, it's definitely
a little shorter.

01:58:54.540 --> 01:58:56.900
It's a little simpler in
that there's just less code.

01:58:56.900 --> 01:59:00.800
And hopefully, if the person that wrote
islower and toupper did a good job,

01:59:00.800 --> 01:59:01.898
I know it's correct.

01:59:01.898 --> 01:59:03.440
I'm just standing on their shoulders.

01:59:03.440 --> 01:59:07.010
And frankly, my code's more readable
because I understand what islower

01:59:07.010 --> 01:59:11.450
means, whereas that crazy && syntax
and all of the additional code--

01:59:11.450 --> 01:59:14.360
that was just a lot harder to
wrap your mind around, arguably.

01:59:14.360 --> 01:59:19.510
So now if I go ahead and
compile this-- make uppercase.

01:59:19.510 --> 01:59:21.460
OK, that seemed to work well.

01:59:21.460 --> 01:59:24.870
And now I'm going to go ahead and do
./uppercase and type in my name in all

01:59:24.870 --> 01:59:25.800
lowercase again.

01:59:25.800 --> 01:59:26.810
David seems to work.

01:59:26.810 --> 01:59:27.690
Brian seems to work.

01:59:27.690 --> 01:59:29.065
And I could do this all day long.

01:59:29.065 --> 01:59:30.390
It seems to still work.

01:59:30.390 --> 01:59:31.350
But you know what?

01:59:31.350 --> 01:59:33.407
I don't think I have to
be even this explicit.

01:59:33.407 --> 01:59:33.990
You know what?

01:59:33.990 --> 01:59:36.660
I bet if the human who
wrote toupper was smart,

01:59:36.660 --> 01:59:41.700
I bet I can just blindly pass
in any character to toupper,

01:59:41.700 --> 01:59:46.948
and it's only going to uppercase it
if it can be converted to uppercase.

01:59:46.948 --> 01:59:48.740
Otherwise, it'll pass
it through unchanged.

01:59:48.740 --> 01:59:49.448
So you know what?

01:59:49.448 --> 01:59:53.340
Let me get rid of all of this stuff
and really tighten this program up

01:59:53.340 --> 01:59:59.760
and print out a placeholder
for c and then toupper of s[i].

01:59:59.760 --> 02:00:02.760
And sure enough, if you read the
documentation for this function,

02:00:02.760 --> 02:00:07.380
it will handle the case where it's
either lowercase or not lowercase.

02:00:07.380 --> 02:00:09.270
And it will do the right thing.

02:00:09.270 --> 02:00:14.070
So now if I recompile my code,
make uppercase, so far so good.

02:00:14.070 --> 02:00:15.780
./uppercase, David again.

02:00:15.780 --> 02:00:17.370
Voila, it still works.

02:00:17.370 --> 02:00:21.300
And notice truly just how much
tighter, how much cleaner,

02:00:21.300 --> 02:00:23.100
how much shorter my code is.

02:00:23.100 --> 02:00:26.790
And it's more readable in the sense
that this function is pretty well named.

02:00:26.790 --> 02:00:29.070
toupper is what it's indeed called.

02:00:29.070 --> 02:00:31.140
But there is an important detail here.

02:00:31.140 --> 02:00:34.140
toupper expects as input a character.

02:00:34.140 --> 02:00:36.090
You cannot pass a whole word to it.

02:00:36.090 --> 02:00:39.480
It is still necessary at this
point for me to be using this loop

02:00:39.480 --> 02:00:41.542
and doing it character by character.

02:00:41.542 --> 02:00:42.750
Now, how would you know this?

02:00:42.750 --> 02:00:45.910
Well, you'll see multiple examples
of this over the weeks to come.

02:00:45.910 --> 02:00:50.000
But if I go to what's called the
manual pages for the language C,

02:00:50.000 --> 02:00:51.750
we have our own web-based
version of them.

02:00:51.750 --> 02:00:54.210
And we'll link this for
you in the course's labs

02:00:54.210 --> 02:00:55.690
and problem sets as needed.

02:00:55.690 --> 02:00:58.680
You can see a list of all of
the available functions in C

02:00:58.680 --> 02:01:00.570
at least that are
frequently used in CS50.

02:01:00.570 --> 02:01:03.510
And if we uncheck a box at the top,
we can see even more functions.

02:01:03.510 --> 02:01:06.660
There's dozens, maybe hundreds
of functions, most of which

02:01:06.660 --> 02:01:08.715
we will not need or use in CS50.

02:01:08.715 --> 02:01:10.590
But this is going to be
true in any language.

02:01:10.590 --> 02:01:13.623
You sort of pick up the building
blocks that you need over time.

02:01:13.623 --> 02:01:15.540
So we'll refer you to
these kinds of resources

02:01:15.540 --> 02:01:18.930
so that you don't rely only on what
we show in section and lecture,

02:01:18.930 --> 02:01:24.010
but you have at your disposal these
other functions and toolkits as well.

02:01:24.010 --> 02:01:28.120
And we'll do the same with Python
and SQL and other languages as well.

02:01:28.120 --> 02:01:32.040
So those are what we
call, again, manual pages.

02:01:32.040 --> 02:01:34.440
All right, a final
feature before we even

02:01:34.440 --> 02:01:38.880
think about cryptography and scrambling
information as for problem set 2.

02:01:38.880 --> 02:01:41.520
So a command-line argument
I mentioned by name before--

02:01:41.520 --> 02:01:44.460
it's like a word you can
type after a program's name

02:01:44.460 --> 02:01:46.960
in order to provide it
input at the command line.

02:01:46.960 --> 02:01:52.140
So make hello-- hello is a command-line
argument to the program, make.

02:01:52.140 --> 02:01:58.470
rm space a.out-- a.out was an argument,
a command-line argument to the program

02:01:58.470 --> 02:02:00.130
rm when I wanted to remove it.

02:02:00.130 --> 02:02:02.790
So we've already seen
command-line arguments in action.

02:02:02.790 --> 02:02:05.520
But we haven't actually
written any programs

02:02:05.520 --> 02:02:11.460
that allow you to accept words or other
inputs from the so-called command line.

02:02:11.460 --> 02:02:14.430
Up until now, all of the input you
and I have gotten in our programs

02:02:14.430 --> 02:02:16.440
comes from get_string,
get_int, and so forth.

02:02:16.440 --> 02:02:20.490
We have never been able to look at words
that the human might very well have

02:02:20.490 --> 02:02:23.610
typed at the prompt when
running your program.

02:02:23.610 --> 02:02:25.350
But that's all about to change now.

02:02:25.350 --> 02:02:28.350
Let me go ahead and create
a program called argv.c,

02:02:28.350 --> 02:02:31.140
and it'll become clear
why in just a moment.

02:02:31.140 --> 02:02:36.270
I'm going to go ahead and
include, shall we say, stdio.h.

02:02:36.270 --> 02:02:39.120
And then I'm going to give
myself int main(void).

02:02:39.120 --> 02:02:43.650
And then I'm just going to very
simply go back and change the void.

02:02:43.650 --> 02:02:47.160
So just as our own custom
functions can take inputs--

02:02:47.160 --> 02:02:49.290
and we saw that with get_negative_int.

02:02:49.290 --> 02:02:52.020
We saw that with average today--

02:02:52.020 --> 02:02:54.600
so does main potentially take inputs.

02:02:54.600 --> 02:02:57.120
Up till now though,
we've been saying void.

02:02:57.120 --> 02:02:58.770
And we told you to say void last week.

02:02:58.770 --> 02:03:01.380
And we told you to say
void in problem set 1.

02:03:01.380 --> 02:03:06.780
But now it turns out that C does allow
you to put other inputs into main.

02:03:06.780 --> 02:03:10.720
You can either say, nope, main does
not take any command-line arguments.

02:03:10.720 --> 02:03:15.270
But if it does, you can
say literally, int argc

02:03:15.270 --> 02:03:19.150
and string argv with square brackets.

02:03:19.150 --> 02:03:20.220
So it's a little cryptic.

02:03:20.220 --> 02:03:22.803
And technically, you don't have
to type it precisely this way.

02:03:22.803 --> 02:03:26.220
But human convention would have you
do it, at least for now, in this way.

02:03:26.220 --> 02:03:29.010
This says that main,
your function, main,

02:03:29.010 --> 02:03:33.360
takes an integer as one
input and not a string

02:03:33.360 --> 02:03:36.570
but an array of strings as input.

02:03:36.570 --> 02:03:40.480
And argc is shorthand
notation for argument count.

02:03:40.480 --> 02:03:43.860
Argument count is an integer that's
going to represent the number of words

02:03:43.860 --> 02:03:45.720
that your users type at the prompt.

02:03:45.720 --> 02:03:48.330
argv is short for argument vector.

02:03:48.330 --> 02:03:50.430
Vector is a fancy way of saying list.

02:03:50.430 --> 02:03:55.470
It is a variable that's going to
store in an array all of the strings

02:03:55.470 --> 02:03:59.940
that a human types at the prompt
after your own program's name.

02:03:59.940 --> 02:04:02.710
So we can use this, for
instance, as follows.

02:04:02.710 --> 02:04:06.330
Suppose that I want to let the user type
their own name at the command prompt.

02:04:06.330 --> 02:04:06.960
I don't want to use get_string.

02:04:06.960 --> 02:04:09.583
I don't want to have to prompt
the human later for their name.

02:04:09.583 --> 02:04:12.750
I want them to be able to run my program
and give me their name all at once,

02:04:12.750 --> 02:04:17.080
just like make, just like rm, and
Clang, and other programs we've seen.

02:04:17.080 --> 02:04:20.850
So I'm going to do this-- if argc == 2--

02:04:20.850 --> 02:04:24.390
so if the number of arguments
to my program is 2--

02:04:24.390 --> 02:04:31.420
go ahead and print out, "hello, %s",
and plug in whatever is that argv[1].

02:04:31.420 --> 02:04:33.450
So more on this in just a moment.

02:04:33.450 --> 02:04:37.770
Else, if argc is not equal to 2, let's
just go with last week's default,

02:04:37.770 --> 02:04:39.190
"hello, world."

02:04:39.190 --> 02:04:41.250
So what is this program's
purpose in life?

02:04:41.250 --> 02:04:43.680
If the human types two
words at the prompt,

02:04:43.680 --> 02:04:47.310
I want to say, "hello, David,"
"hello, Brian," "hello, so-and-so."

02:04:47.310 --> 02:04:50.310
Otherwise, if they don't
type two words at the prompt,

02:04:50.310 --> 02:04:52.630
I'm just going to say the
default "hello, world."

02:04:52.630 --> 02:04:55.780
So let me compile this, make argv.

02:04:55.780 --> 02:05:00.350
And, hm, I didn't get it right here--
unknown type string, unknown type

02:05:00.350 --> 02:05:00.850
string.

02:05:00.850 --> 02:05:01.820
All right, I goofed.

02:05:01.820 --> 02:05:07.150
If I'm using string, recall that now I
need to start using the CS50 library.

02:05:07.150 --> 02:05:09.730
And again, we'll see all the
more why in the coming weeks as

02:05:09.730 --> 02:05:11.440
we take those training wheels off.

02:05:11.440 --> 02:05:13.870
But now I'm going to do
this again, make argv.

02:05:13.870 --> 02:05:14.440
There we go.

02:05:14.440 --> 02:05:18.280
Now it works-- ./argv,
Enter, "hello, world."

02:05:18.280 --> 02:05:20.710
That's pretty much equivalent
to what we did last week.

02:05:20.710 --> 02:05:26.030
But notice if I type in, for instance,
./argv David, Enter, it says, "hello,

02:05:26.030 --> 02:05:26.530
David."

02:05:26.530 --> 02:05:29.500
If I type in ./argv Brian, it says that.

02:05:29.500 --> 02:05:33.710
If I type in ./argv Brian Yu,
it says "hello, world."

02:05:33.710 --> 02:05:35.200
So what's going on?

02:05:35.200 --> 02:05:40.990
Well, the way you write programs in C
that accept zero or more command-line

02:05:40.990 --> 02:05:44.620
arguments-- that is, words at the
prompt after your program's name--

02:05:44.620 --> 02:05:48.910
is you change what we have been
doing all this time from void

02:05:48.910 --> 02:05:52.748
to be this: int argc, string
argv with square brackets.

02:05:52.748 --> 02:05:55.165
And what the computer is going
to do for you automatically

02:05:55.165 --> 02:05:59.170
is it's going to store in argc
the total number of words

02:05:59.170 --> 02:06:01.690
that the human typed in, not
just the arguments, technically

02:06:01.690 --> 02:06:04.420
all of the words, including
your own program's name.

02:06:04.420 --> 02:06:08.650
It's then going to fill this
array of strings, a.k.a. argv,

02:06:08.650 --> 02:06:11.890
with all of the words the human
typed at the prompt, so not just

02:06:11.890 --> 02:06:16.340
the arguments like Brian or David,
but also the name of your program.

02:06:16.340 --> 02:06:20.560
So if the human typed in two total
words, which they did, ./argv Brian,

02:06:20.560 --> 02:06:24.160
./argv David, then I want
to print out, "hello"

02:06:24.160 --> 02:06:27.790
followed by a placeholder and
then whatever value is at argv[1].

02:06:27.790 --> 02:06:29.770
And I'm deliberately not doing 0.

02:06:29.770 --> 02:06:33.340
If I did 0, based on the
verbal definition I just gave,

02:06:33.340 --> 02:06:38.260
if I recompile this program, I don't
want to see this, hello, ./argv.

02:06:38.260 --> 02:06:43.030
So the program's own name is
automatically always stored for you

02:06:43.030 --> 02:06:45.190
at the first location in that array.

02:06:45.190 --> 02:06:48.070
But if you want the first
useful piece of information,

02:06:48.070 --> 02:06:53.860
you actually would, after recompiling
the code here, access it at [1].

02:06:53.860 --> 02:06:58.030
And so in this way do we see
in argv that we can actually

02:06:58.030 --> 02:06:59.350
access individual words.

02:06:59.350 --> 02:07:00.520
But notice this too--

02:07:00.520 --> 02:07:05.410
suppose I want to print out all of
the individual characters in someone's

02:07:05.410 --> 02:07:06.010
input.

02:07:06.010 --> 02:07:06.593
You know what?

02:07:06.593 --> 02:07:08.083
I bet I could even do this.

02:07:08.083 --> 02:07:09.250
Let me go ahead and do this.

02:07:09.250 --> 02:07:13.330
Instead of just printing out
"hello," let me do for int i equals 0,

02:07:13.330 --> 02:07:17.620
n equals the string length of argv[1].

02:07:20.270 --> 02:07:24.800
And then over here, I'm going
to do i is less than n, i++.

02:07:24.800 --> 02:07:27.770
All right, so I'm going to
iterate over all of the characters

02:07:27.770 --> 02:07:30.930
in the first real word in argv.

02:07:30.930 --> 02:07:32.160
And what am I going to do?

02:07:32.160 --> 02:07:37.310
Well, let me go ahead and print
out a character that's at argv[1]

02:07:37.310 --> 02:07:38.900
but at location i.

02:07:38.900 --> 02:07:41.300
So I said a moment ago
with our picture that we

02:07:41.300 --> 02:07:47.090
could think of an array of strings as
really just being an array of arrays.

02:07:47.090 --> 02:07:53.570
And so I can employ that syntax here by
going into argv[1] to get me the word

02:07:53.570 --> 02:07:57.440
like "David" or "Brian" or so forth,
and then further index into it with more

02:07:57.440 --> 02:08:02.100
square brackets that get me the D, the
A, the V, the I, the D, and so forth.

02:08:02.100 --> 02:08:05.300
And just to be super clear, let
me put a new line character there

02:08:05.300 --> 02:08:08.070
just so we can see
explicitly what's going on.

02:08:08.070 --> 02:08:10.580
And let me go ahead now and
just delete this "hello, world"

02:08:10.580 --> 02:08:12.205
because I don't want to see any hellos.

02:08:12.205 --> 02:08:14.240
I just want to see the
word the human typed in.

02:08:14.240 --> 02:08:19.500
Make argv-- whoops, what did I do wrong?

02:08:19.500 --> 02:08:25.260
Oh, I used strlen when I shouldn't have
because I haven't included string.h

02:08:25.260 --> 02:08:26.550
at the top.

02:08:26.550 --> 02:08:31.230
OK, now if I recompile this
code with make argv--

02:08:31.230 --> 02:08:36.010
there we go-- ./argv David,
you'll see one character per line.

02:08:36.010 --> 02:08:38.940
And if I do the same with
Brian's name or anyone's name

02:08:38.940 --> 02:08:42.352
and change it to Brian, I'm
printing one character at a time.

02:08:42.352 --> 02:08:44.560
So again, I'm not sure why
you would want to do that.

02:08:44.560 --> 02:08:47.760
But in this case, my goal simply
was to not only iterate over

02:08:47.760 --> 02:08:51.970
the characters in that first
word, but print them out.

02:08:51.970 --> 02:08:56.520
So again, just by applying twice
over this time this principle,

02:08:56.520 --> 02:09:00.570
can we actually see that
a program has access

02:09:00.570 --> 02:09:03.600
to the individual characters
in each of these strings.

02:09:03.600 --> 02:09:06.090
All right, and one last
explanation before we

02:09:06.090 --> 02:09:08.880
introduce the crypto
and application thereof.

02:09:08.880 --> 02:09:11.790
This thing here, this
thing here-- does anyone

02:09:11.790 --> 02:09:15.660
have any idea as to why main,
last week and this week,

02:09:15.660 --> 02:09:19.320
seems to return an int even though
it's not an average function?

02:09:19.320 --> 02:09:21.000
It's not a get_positive_int function.

02:09:21.000 --> 02:09:22.470
It's not get_negative_int.

02:09:22.470 --> 02:09:26.040
Somehow, for some reason, main keeps
returning an int even though we

02:09:26.040 --> 02:09:29.410
have never seen this int in action.

02:09:29.410 --> 02:09:31.040
What might this mean?

02:09:31.040 --> 02:09:33.340
This is the one last
piece that we promised

02:09:33.340 --> 02:09:37.090
last week we would eventually explain.

02:09:37.090 --> 02:09:38.800
What might this mean?

02:09:38.800 --> 02:09:41.420
And this one's a tough one.

02:09:41.420 --> 02:09:43.870
Brian, who do we have?

02:09:43.870 --> 02:09:47.230
How about [? Gred, ?] is it?

02:09:47.230 --> 02:09:51.810
[? GRED: ?] Usually, the functions
in the end have returned 0.

02:09:51.810 --> 02:09:54.060
And that means that the function stops.

02:09:54.060 --> 02:10:00.270
And the 0 is the integer that
pops out of the main function.

02:10:00.270 --> 02:10:03.810
DAVID MALAN: Yeah, and this one's subtle
in that if you had programmed before,

02:10:03.810 --> 02:10:06.390
odds are-- and I'm guessing you have,
[? Gred-- ?] you've seen this in use

02:10:06.390 --> 02:10:07.020
before.

02:10:07.020 --> 02:10:10.350
We humans, though, in the real
world of using Macs and PCs--

02:10:10.350 --> 02:10:13.320
you've actually seen numbers,
integers in weird places.

02:10:13.320 --> 02:10:17.220
Frankly, almost any time your computer
freezes or you see an error message,

02:10:17.220 --> 02:10:21.280
odds are you see English or some
spoken language in the error message.

02:10:21.280 --> 02:10:23.307
But you very often see a numeric code.

02:10:23.307 --> 02:10:25.140
For instance, if you're
having Zoom trouble,

02:10:25.140 --> 02:10:29.700
you'll often see the number 5 in
the error window in Zoom's program.

02:10:29.700 --> 02:10:31.710
And 5 just means you're
having network issues.

02:10:31.710 --> 02:10:34.710
So programmers often
associate integers with things

02:10:34.710 --> 02:10:36.540
that can go wrong in a program.

02:10:36.540 --> 02:10:42.210
And as [? Gred ?] notes, they use 0 to
connote that nothing has gone wrong,

02:10:42.210 --> 02:10:43.660
that all is well.

02:10:43.660 --> 02:10:48.285
So let me write one final
program here just called exit.c

02:10:48.285 --> 02:10:49.950
that puts this to the test.

02:10:49.950 --> 02:10:54.640
Let me go ahead and write a
program in a file called exit.c

02:10:54.640 --> 02:10:57.870
that's going to introduce what
we're going to call an exit status.

02:10:57.870 --> 02:11:01.230
This is a subtlety that will
be useful as our programs get

02:11:01.230 --> 02:11:02.580
a little more complicated.

02:11:02.580 --> 02:11:06.360
I'm going to go in here
and do #include cs50.h.

02:11:06.360 --> 02:11:09.360
And I'm going to go ahead
and #include stdio.h.

02:11:09.360 --> 02:11:14.970
And I'm going to give myself the longer
version of main, so int argc, string

02:11:14.970 --> 02:11:17.140
argv with the square brackets.

02:11:17.140 --> 02:11:21.690
And in here, I'm going to
say, if argc does not equal 2,

02:11:21.690 --> 02:11:24.290
uh-uh, the human is not
doing what I want them to,

02:11:24.290 --> 02:11:26.040
and I'm going to yell
at them in some way.

02:11:26.040 --> 02:11:28.580
I'm going to say missing
command-line argument.

02:11:28.580 --> 02:11:31.960
So any kind of error message that I
want the human to see on the screen,

02:11:31.960 --> 02:11:33.900
I'm just going to tell
them with that message.

02:11:33.900 --> 02:11:37.650
But I'm going to very
subtly return the number 1.

02:11:37.650 --> 02:11:39.090
I'm going to return an error code.

02:11:39.090 --> 02:11:41.830
And the human is not necessarily
going to see this code.

02:11:41.830 --> 02:11:45.150
But if we were to have a graphical
user interface or some other feature

02:11:45.150 --> 02:11:47.130
to this program, that
would be the number

02:11:47.130 --> 02:11:49.110
they see in the error
window that pops up,

02:11:49.110 --> 02:11:52.320
just like Zoom might show you the
number 5 if something has gone wrong.

02:11:52.320 --> 02:11:54.870
Similarly, if you've ever
visited a page, frankly,

02:11:54.870 --> 02:11:59.130
and the web page doesn't
exist, you see the integer 404.

02:11:59.130 --> 02:12:01.890
That's not technically the
exact same incarnation of this,

02:12:01.890 --> 02:12:05.440
but it is representative of programmers
using numbers to represent errors.

02:12:05.440 --> 02:12:07.230
So that one, you probably have seen.

02:12:07.230 --> 02:12:11.160
Here, I'm going to go ahead, though,
and by default, say, "hello, %s,"

02:12:11.160 --> 02:12:14.250
just like before, passing
in whatever's in argv[1].

02:12:14.250 --> 02:12:17.940
So same program as before, but I'm not
going to do any of this lame, "hello,

02:12:17.940 --> 02:12:21.580
world" if the human doesn't
type in their name as I expect.

02:12:21.580 --> 02:12:25.110
Instead, I am going to
check, did the human

02:12:25.110 --> 02:12:27.180
give me two words at the command line?

02:12:27.180 --> 02:12:30.210
If not, I'm going to print,
"missing command-line argument,"

02:12:30.210 --> 02:12:32.220
and then return this exit code.

02:12:32.220 --> 02:12:36.750
Otherwise, if all is well, I'm going
to go ahead and return explicitly 0.

02:12:36.750 --> 02:12:40.200
This is another number that the human,
you and I, are never going to see,

02:12:40.200 --> 02:12:42.060
but we could have access to it.

02:12:42.060 --> 02:12:46.200
And frankly, for course purposes,
check50 can have access to this.

02:12:46.200 --> 02:12:48.570
And graphical user interfaces,
when we get to those,

02:12:48.570 --> 02:12:50.980
can have access to these values.

02:12:50.980 --> 02:12:54.160
So 0, as [? Gred ?] notes,
is just all is well.

02:12:54.160 --> 02:12:56.235
But 1 would mean that
something goes wrong.

02:12:56.235 --> 02:12:58.860
So let me go ahead and make exit,
which is kind of appropriate,

02:12:58.860 --> 02:13:00.270
as we're wrapping up here.

02:13:00.270 --> 02:13:02.760
And let me go ahead and do ./exit.

02:13:02.760 --> 02:13:05.700
"Missing command-line
argument" is what's displayed.

02:13:05.700 --> 02:13:09.120
If I go ahead and say, ./exit
David, now I see "hello, David."

02:13:09.120 --> 02:13:12.570
Or ./exit Brian, I'll see "hello, Brian."

02:13:12.570 --> 02:13:15.120
Now, this is not a technique
you'll need to use often,

02:13:15.120 --> 02:13:19.110
but you can actually see these
return values if you want.

02:13:19.110 --> 02:13:23.970
If I run exit, and I see this error
message, I can very weirdly say,

02:13:23.970 --> 02:13:28.260
echo $?, which is a very
admittedly cryptic way of saying,

02:13:28.260 --> 02:13:30.120
what was my exit status?

02:13:30.120 --> 02:13:32.640
And if you hit Enter, you'll see 1.

02:13:32.640 --> 02:13:35.370
By contrast, if I run exit
of David, and I actually

02:13:35.370 --> 02:13:42.060
see "hello, David," and I do
echo $?, now I will see 0.

02:13:42.060 --> 02:13:45.030
So again, this is not a technique
you and I will use very frequently.

02:13:45.030 --> 02:13:48.480
But it's a capability of a program,
and it's a capability of C,

02:13:48.480 --> 02:13:49.920
that you do now have access to.

02:13:49.920 --> 02:13:52.140
And so in writing
programs moving forward,

02:13:52.140 --> 02:13:55.380
what we will often do in labs
and in problem sets and the like

02:13:55.380 --> 02:14:02.430
is ask you to return from main
either 0 or 1 or maybe 2 or 3 or 4

02:14:02.430 --> 02:14:06.060
based on the problems that might
have gone wrong in your program

02:14:06.060 --> 02:14:09.420
that you have detected and
responded to appropriately.

02:14:09.420 --> 02:14:13.530
So it's a very effective way of
handling errors in a standard way

02:14:13.530 --> 02:14:18.180
so that you know that you are being
proactive about detecting mistakes.

02:14:18.180 --> 02:14:20.540
So what kinds of mistakes
might we handle this week?

02:14:20.540 --> 02:14:22.290
And what kinds of
problems might we solve?

02:14:22.290 --> 02:14:26.100
Well, today was entirely about
deconstructing what a string is.

02:14:26.100 --> 02:14:29.220
Last week, it was just a sequence
of text, a chunk of text.

02:14:29.220 --> 02:14:31.740
Today, it's now an array of characters.

02:14:31.740 --> 02:14:34.950
And we have new syntax in C
for accessing those characters.

02:14:34.950 --> 02:14:38.370
We also today have access to more
libraries, more header files,

02:14:38.370 --> 02:14:41.460
the documentation, therefore, so
that we can actually solve problems

02:14:41.460 --> 02:14:43.290
without writing as much code ourselves.

02:14:43.290 --> 02:14:46.630
We can use other people's code
in the form of these libraries.

02:14:46.630 --> 02:14:49.890
So one problem we will solve this
coming week by way of problem set 2

02:14:49.890 --> 02:14:51.120
is that of readability.

02:14:51.120 --> 02:14:54.150
Like, when you're reading a book
or an essay or a paper or anything,

02:14:54.150 --> 02:14:56.370
what is it that makes it
like a 3rd-grade reading

02:14:56.370 --> 02:14:59.917
level or a 12th-grade reading
level or university reading level?

02:14:59.917 --> 02:15:02.250
Well, all of us probably have
an intuitive sense, right?

02:15:02.250 --> 02:15:05.940
Like, if it's big font and short
words, it's probably for younger kids.

02:15:05.940 --> 02:15:09.000
And if it's really complicated
words with big vocabulary and things

02:15:09.000 --> 02:15:12.460
we don't know, maybe it's
meant for university audiences.

02:15:12.460 --> 02:15:16.440
But we can quantify this a
little more formulaically,

02:15:16.440 --> 02:15:19.328
not necessarily the only way, but
we'll give you a few definitions.

02:15:19.328 --> 02:15:21.120
So for instance, here's
a famous sentence--

02:15:21.120 --> 02:15:23.370
"Mr. and Mrs. Dursley, of
number four, Privet Drive,

02:15:23.370 --> 02:15:26.412
were proud to say that they were
perfectly normal, thank you very much,"

02:15:26.412 --> 02:15:27.420
and so forth.

02:15:27.420 --> 02:15:32.070
Well, what is it about this text
that puts Harry Potter at grade seven

02:15:32.070 --> 02:15:32.940
reading level?

02:15:32.940 --> 02:15:35.520
Well, it probably has to do
with the vocabulary words.

02:15:35.520 --> 02:15:38.760
But it probably has to do with the
lengths of the sentences, the amount

02:15:38.760 --> 02:15:44.550
of punctuation perhaps, the total number
of characters that you might count up.

02:15:44.550 --> 02:15:48.518
You can imagine quantifying it
just based generically on the look

02:15:48.518 --> 02:15:49.810
and the aesthetics of the text.

02:15:49.810 --> 02:15:50.670
What about this?

02:15:50.670 --> 02:15:53.010
"In computational linguistics,
authorship attribution

02:15:53.010 --> 02:15:55.590
is the task of predicting the author
of document of unknown authorship.

02:15:55.590 --> 02:15:58.673
This task is generally performed by
the analysis of stylometric features--

02:15:58.673 --> 02:16:00.750
particular"-- this is
Brian's senior thesis.

02:16:00.750 --> 02:16:02.650
So this is not a
seventh-grade reading level.

02:16:02.650 --> 02:16:04.860
This was actually rated at grade 16.

02:16:04.860 --> 02:16:08.130
So Brian's pretty sophisticated
when it comes to writing theses.

02:16:08.130 --> 02:16:11.160
But there too, you could perhaps
glean from the sophistication

02:16:11.160 --> 02:16:14.010
of the sentences, the length
thereof, and the words therein--

02:16:14.010 --> 02:16:17.010
there's something we could perhaps
quantify so as to apply numbers.

02:16:17.010 --> 02:16:21.720
And indeed, that's one way you could
assess the readability of a text

02:16:21.720 --> 02:16:24.480
even if you don't have access
to a dictionary with which

02:16:24.480 --> 02:16:27.360
to figure out which are the
actual big or small words.

02:16:27.360 --> 02:16:28.820
And what about cryptography?

02:16:28.820 --> 02:16:32.160
So it's incredibly common
these days and so important

02:16:32.160 --> 02:16:37.020
these days for you and I to use
cryptography, not necessarily using

02:16:37.020 --> 02:16:39.389
algorithms we ourselves come
up with, but rather using

02:16:39.389 --> 02:16:43.469
software, like WhatsApp and Signal
and Telegram and Messenger and others,

02:16:43.469 --> 02:16:48.340
that support encryption between you and
the third party or friend or family,

02:16:48.340 --> 02:16:51.090
or at least minimally the website
with which you're interacting.

02:16:51.090 --> 02:16:55.590
So cryptography is the art of scrambling
information, or hiding information.

02:16:55.590 --> 02:16:59.430
And if that information is text, well,
frankly, as of this third week of CS50,

02:16:59.430 --> 02:17:03.059
we already have the requisite building
blocks for not only representing text,

02:17:03.059 --> 02:17:05.040
but we saw today manipulating it.

02:17:05.040 --> 02:17:09.330
Even just uppercasing characters
allows us to start mutating text.

02:17:09.330 --> 02:17:11.459
Well, what does it mean
to encrypt information?

02:17:11.459 --> 02:17:13.650
Well, it's like our
black box from last week.

02:17:13.650 --> 02:17:14.520
You have some input.

02:17:14.520 --> 02:17:15.395
You want some output.

02:17:15.395 --> 02:17:18.040
The input, we're going to
start calling plaintext.

02:17:18.040 --> 02:17:20.969
The message you want to send
from yourself to someone else.

02:17:20.969 --> 02:17:22.976
Ciphertext is the output that you want.

02:17:22.976 --> 02:17:24.809
And so in between there,
there's going to be

02:17:24.809 --> 02:17:26.226
what we're going to call a cipher.

02:17:26.226 --> 02:17:30.270
A cipher is an algorithm
that encrypts or scrambles

02:17:30.270 --> 02:17:34.177
its input so as to produce output
that a third party can't understand.

02:17:34.177 --> 02:17:35.969
And hopefully, that
cipher, that algorithm,

02:17:35.969 --> 02:17:40.020
is a reversible process so that when
you receive the scrambled ciphertext,

02:17:40.020 --> 02:17:44.830
you can figure out what it was
that the person sent to you.

02:17:44.830 --> 02:17:48.030
But the key to using
cryptography-- pun intended--

02:17:48.030 --> 02:17:49.282
is to also have a secret key.

02:17:49.282 --> 02:17:51.240
So if you think back to
grade school, maybe you

02:17:51.240 --> 02:17:53.549
were flirting with someone
in class, and you sent them

02:17:53.549 --> 02:17:55.082
a note on a piece of paper.

02:17:55.082 --> 02:17:58.290
Well, hopefully, you didn't just say,
like, I love you, on the piece of paper

02:17:58.290 --> 02:18:00.165
and then pass it through
all of your friends,

02:18:00.165 --> 02:18:02.910
or let alone the teacher,
to the ultimate recipient.

02:18:02.910 --> 02:18:05.340
Maybe you did something
like, an A becomes

02:18:05.340 --> 02:18:08.459
a B. A B becomes a C.
A C becomes a D. Like,

02:18:08.459 --> 02:18:11.740
you kind of apply an algorithm
to add 1 to all of the letters

02:18:11.740 --> 02:18:14.219
so that if the teacher does
intercept it and look at it,

02:18:14.219 --> 02:18:17.070
they probably don't have enough care in
the world to figure out what this is.

02:18:17.070 --> 02:18:18.690
It's just going to look like nonsense.

02:18:18.690 --> 02:18:21.840
But if your friend knows
that you changed A to B, B

02:18:21.840 --> 02:18:26.010
to C by adding 1 to every letter,
they could reverse that process

02:18:26.010 --> 02:18:27.610
and decrypt it.

02:18:27.610 --> 02:18:30.270
So the key, for instance, might
be literally the number 1.

02:18:30.270 --> 02:18:32.610
The message literally
might be, "I LOVE YOU."

02:18:32.610 --> 02:18:35.080
But what would the
ciphertext be, or the output?

02:18:35.080 --> 02:18:38.610
Well, let's consider "I LOVE YOU"
is a string which, as of today,

02:18:38.610 --> 02:18:40.240
is an array of characters.

02:18:40.240 --> 02:18:42.000
So what use is that?

02:18:42.000 --> 02:18:45.123
Well, let's consider exactly that
phrase as though it's an array.

02:18:45.123 --> 02:18:46.290
It's an array of characters.

02:18:46.290 --> 02:18:50.969
We know from last week, characters
are just integers, decimal integers,

02:18:50.969 --> 02:18:53.190
thanks to ASCII, and in turn, Unicode.

02:18:53.190 --> 02:18:55.770
So it turns out I, we
already know, is 73.

02:18:55.770 --> 02:19:04.920
And if we looked up all the others on a
chart, L is 76, 79, 86, 69, 89, 79, 85.

02:19:04.920 --> 02:19:08.400
So we could relatively easily in C--
you might have to check your notes

02:19:08.400 --> 02:19:10.240
and check my sample code and so forth--

02:19:10.240 --> 02:19:15.750
but relatively easily in C convert "I
LOVE YOU" to the corresponding integers

02:19:15.750 --> 02:19:19.290
by just casting, so to
speak, chars to integers.

02:19:19.290 --> 02:19:23.340
I could very easily mathematically,
using the plus operator in C,

02:19:23.340 --> 02:19:26.910
start to add 1 to every
one of these characters,

02:19:26.910 --> 02:19:29.309
thereby encrypting my message.

02:19:29.309 --> 02:19:31.398
But I could send my
friend these numbers.

02:19:31.398 --> 02:19:33.690
But I might as well make it
a little more user friendly

02:19:33.690 --> 02:19:36.209
and cast it back from integers to chars.

02:19:36.209 --> 02:19:42.930
So now it would seem that the ciphertext
for "I LOVE YOU," if using a key of 1--

02:19:42.930 --> 02:19:47.910
and 1 just means change A to B, not
A to C, just move it by one place--

02:19:47.910 --> 02:19:52.740
this is the ciphertext for an
encrypted message of, "I LOVE YOU."

02:19:52.740 --> 02:19:55.840
And so the whole process becomes
1 is the input as the key.

02:19:55.840 --> 02:19:57.810
"I LOVE YOU" is the
input as the plaintext.

02:19:57.810 --> 02:20:00.990
And the output ultimately is
this unpronounceable phrase

02:20:00.990 --> 02:20:03.630
that, again, if the teacher
or some friend intercepts,

02:20:03.630 --> 02:20:06.060
they probably don't
know what's going on.

02:20:06.060 --> 02:20:08.520
And indeed, this is the
essence of cryptography.

02:20:08.520 --> 02:20:12.027
The algorithms that protect our emails
and texts and financial information

02:20:12.027 --> 02:20:13.860
and health information
is hopefully way more

02:20:13.860 --> 02:20:17.160
sophisticated than that
particular algorithm as it is.

02:20:17.160 --> 02:20:19.350
But it reduces to the same process--

02:20:19.350 --> 02:20:23.640
an input key and an input
text followed by some output,

02:20:23.640 --> 02:20:25.050
the so-called ciphertext.

02:20:25.050 --> 02:20:28.500
And this has been with us for decades
now in some form, sometimes even

02:20:28.500 --> 02:20:29.400
mechanical form.

02:20:29.400 --> 02:20:32.760
Back in the day, you could actually
get these little circular devices

02:20:32.760 --> 02:20:35.343
that have letters of the alphabet
on one side, other letters

02:20:35.343 --> 02:20:36.760
of the alphabet on the other side.

02:20:36.760 --> 02:20:39.720
And if you rotate one or
the other, A might line up

02:20:39.720 --> 02:20:41.310
with B, B might line up with C.

02:20:41.310 --> 02:20:44.760
So you can have even a physical
incarnation of cryptography,

02:20:44.760 --> 02:20:49.920
just as was popular in a movie
that seems to play endlessly on TV,

02:20:49.920 --> 02:20:52.930
at least here in the US
around Christmas time.

02:20:52.930 --> 02:20:56.980
And you might recognize if you've
seen A Christmas Story one such look.

02:20:56.980 --> 02:20:59.460
So we'll use just a couple of
minutes of our final moments

02:20:59.460 --> 02:21:02.910
together to take a look at this
real-world incarnation of cryptography

02:21:02.910 --> 02:21:06.533
that undoubtedly you can
probably see on TV this fall.

02:21:06.533 --> 02:21:07.200
[VIDEO PLAYBACK]

02:21:07.200 --> 02:21:09.810
- "Be it known to all and sundry
that Ralph Parker is hereby

02:21:09.810 --> 02:21:12.720
appointed a member of the Little
Orphan Annie secret circle

02:21:12.720 --> 02:21:16.300
and is entitled to all the honors
and benefits occurring thereto."

02:21:16.300 --> 02:21:18.810
- "Signed, Little Orphan Annie."

02:21:18.810 --> 02:21:22.920
"Countersigned, Pierre Andre," in ink.

02:21:22.920 --> 02:21:25.620
Honors and benefits
already at the age of nine.

02:21:25.620 --> 02:21:27.942
[RADIO CHATTER]

02:21:27.942 --> 02:21:28.900
- (ON RADIO) Attention!

02:21:28.900 --> 02:21:29.710
[INAUDIBLE] overboard!

02:21:29.710 --> 02:21:30.165
[CLANGING]

02:21:30.165 --> 02:21:31.530
- (ON RADIO) Come
[INAUDIBLE] Gone overboard!

02:21:31.530 --> 02:21:32.440
- (ON RADIO) [INAUDIBLE]

02:21:32.440 --> 02:21:33.808
- Come on, let's get on with it.

02:21:33.808 --> 02:21:36.100
I don't need all that jazz
about smugglers and pirates.

02:21:36.100 --> 02:21:36.976
[BARKING]

02:21:37.645 --> 02:21:40.270
- (ON RADIO) Listen tomorrow
night for the concluding adventure

02:21:40.270 --> 02:21:42.450
of the Black Pirate Ship.

02:21:42.450 --> 02:21:48.430
Now it's time for Annie's secret message
for you members of the secret circle.

02:21:48.430 --> 02:21:52.150
Remember kids, only members
of Annie's secret circle

02:21:52.150 --> 02:21:54.730
can decode Annie's secret message.

02:21:54.730 --> 02:21:58.900
Remember, Annie is depending on you.

02:21:58.900 --> 02:22:01.480
Set your pins to B-2.

02:22:01.480 --> 02:22:03.760
Here is the message.

02:22:03.760 --> 02:22:05.710
12, 11, 2, 8--

02:22:05.710 --> 02:22:07.540
- I am in my first secret meeting.

02:22:07.540 --> 02:22:12.250
- (ON RADIO) --25, 14, 11, 18, 16, 23--

02:22:12.250 --> 02:22:14.110
- Old Pierre was in great voice tonight.

02:22:14.110 --> 02:22:14.350
- (ON RADIO) --12, 23--

02:22:14.350 --> 02:22:16.767
- I could tell that tonight's
message was really important.

02:22:16.767 --> 02:22:19.660
- (ON RADIO) --21, 3, 25.

02:22:19.660 --> 02:22:21.400
That's a message from Annie herself.

02:22:21.400 --> 02:22:22.620
Remember, don't tell anyone.

02:22:22.620 --> 02:22:25.584
[FOOTSTEPS AND PANTING]

02:22:27.560 --> 02:22:31.550
- 90 seconds later, I'm in the only
room in the house where a boy of nine

02:22:31.550 --> 02:22:33.635
could sit in privacy and decode.

02:22:33.635 --> 02:22:43.070
[CHUCKLES] Aha, B. [CHUCKLES] I went
to the next, E. The first word is "be."

02:22:43.070 --> 02:22:45.680
S, it was coming easier now.

02:22:45.680 --> 02:22:47.747
U. [CHUCKLES] 25, that's R.

02:22:47.747 --> 02:22:50.192
- Aw, come on, Ralphie, I gotta go.

02:22:50.192 --> 02:22:51.170
- Come on.

02:22:51.170 --> 02:22:53.126
- I'll be right down, Ma!

02:22:53.126 --> 02:22:54.104
- Gee whiz.

02:22:57.040 --> 02:23:01.120
- T, O. "Be sure to."

02:23:01.120 --> 02:23:02.380
Be sure to what?

02:23:02.380 --> 02:23:04.513
What was Little Orphan
Annie trying to say?

02:23:04.513 --> 02:23:05.180
Be sure to what?

02:23:05.180 --> 02:23:06.970
- Ralphie, Randy has got to go.

02:23:06.970 --> 02:23:08.350
Will you please come out?

02:23:08.350 --> 02:23:09.580
- All right, Ma!

02:23:09.580 --> 02:23:11.470
I'll be right out!

02:23:11.470 --> 02:23:13.370
- I was getting closer now.

02:23:13.370 --> 02:23:15.300
The tension was terrible.

02:23:15.300 --> 02:23:16.310
What was it?

02:23:16.310 --> 02:23:18.776
The fate of the planet
may hang in the balance.

02:23:18.776 --> 02:23:19.276
[KNOCKING]

02:23:19.276 --> 02:23:19.776
- Ralphie!

02:23:19.776 --> 02:23:21.666
Randy's got to go!

02:23:21.666 --> 02:23:25.012
- I'll be right out,
for crying out loud!

02:23:25.012 --> 02:23:26.860
- [CHUCKLES] Almost there.

02:23:26.860 --> 02:23:27.930
My fingers flew.

02:23:27.930 --> 02:23:31.560
My mind was a steel trap,
every pore vibrated.

02:23:31.560 --> 02:23:33.524
It was almost clear.

02:23:33.524 --> 02:23:35.894
Yes, yes, yes, yes.

02:23:35.894 --> 02:23:41.700
- "Be sure to drink your Ovaltine."

02:23:41.700 --> 02:23:42.630
Ovaltine?

02:23:46.510 --> 02:23:47.750
A crummy commercial?

02:23:47.750 --> 02:23:50.890
[MUSIC PLAYING]

02:23:50.890 --> 02:23:52.317
Son of a bitch.

02:23:52.317 --> 02:23:52.900
[END PLAYBACK]

02:23:52.900 --> 02:23:55.030
DAVID MALAN: All right,
that's it for CS50.

02:23:55.030 --> 02:23:57.310
We will see you next time.

02:23:57.310 --> 02:24:00.660
[MUSIC PLAYING]