WEBVTT
X-TIMESTAMP-MAP=LOCAL:00:00:00.000,MPEGTS:900000

00:00:00.000 --> 00:00:00.000
[MUSIC PLAYING]

00:01:18.000 --> 00:01:20.825
DAVID MALAN: This is
CS50 and this is week 2.

00:01:20.825 --> 00:01:23.450
Now that you have some programming
experience under your belts,

00:01:23.450 --> 00:01:25.910
in this more arcane language called C,

00:01:25.910 --> 00:01:28.790
among our goals today is to help
you understand exactly what you have

00:01:28.790 --> 00:01:30.650
been doing these past several days,

00:01:30.650 --> 00:01:33.955
wrestling with your first programs in
C, so that you have more of a

00:01:33.955 --> 00:01:36.080
bottom-up understanding of what
some of these commands do.

00:01:36.080 --> 00:01:38.580
And, ultimately, what more
we can do with this language.

00:01:38.580 --> 00:01:41.750
So this, recall, was the very
first program you wrote,

00:01:41.750 --> 00:01:44.870
I wrote in this language
called C, much more textual,

00:01:44.870 --> 00:01:46.970
certainly, than the Scratch equivalent.

00:01:46.970 --> 00:01:51.200
But at the end of the day,
computers, your Mac, your PC,

00:01:51.200 --> 00:01:54.555
VS Code, don't understand
this actual code.

00:01:54.555 --> 00:01:57.680
What's the format into which we need
to get any program that we write, just

00:01:57.680 --> 00:01:58.180
to recap?

00:01:58.180 --> 00:01:59.202
AUDIENCE: [INAUDIBLE]

00:01:59.202 --> 00:02:01.790
DAVID MALAN: So binary,
otherwise known as machine code.

00:02:01.790 --> 00:02:02.290
Right?

00:02:02.290 --> 00:02:05.870
The 0s and 1s that your computer
actually does understand.

00:02:05.870 --> 00:02:08.030
So somehow we need to
get to this format.

00:02:08.030 --> 00:02:10.730
And up until now, we've been
using this command called make,

00:02:10.730 --> 00:02:13.670
which is aptly named, because
it lets you make programs.

00:02:13.670 --> 00:02:16.430
And the invocation of that
has been pretty simple.

00:02:16.430 --> 00:02:20.450
Make hello looks in your current
directory or folder for a file called

00:02:20.450 --> 00:02:25.100
hello.c, implicitly, and then it
compiles that into a file called hello,

00:02:25.100 --> 00:02:27.650
which itself is executable,
which just means runnable,

00:02:27.650 --> 00:02:29.900
so that you can then do ./hello.

00:02:29.900 --> 00:02:34.190
But it turns out that make is
actually not a compiler itself.

00:02:34.190 --> 00:02:35.840
It does help you make programs.

00:02:35.840 --> 00:02:40.520
But make is this utility that comes on
a lot of systems that makes it easier

00:02:40.520 --> 00:02:44.060
to actually compile code by
using an actual compiler,

00:02:44.060 --> 00:02:48.290
the program that converts source code
to machine code, on your own Mac, or PC,

00:02:48.290 --> 00:02:50.660
or whatever cloud environment
you might be using.

00:02:50.660 --> 00:02:53.330
In fact, what make is
doing for us, is actually,

00:02:53.330 --> 00:02:57.230
running a command automatically
known as clang, for C language.

00:02:57.230 --> 00:03:01.590
And, so here, for instance, in VS
Code, is that very first program again,

00:03:01.590 --> 00:03:03.470
this time in the context
of a text editor,

00:03:03.470 --> 00:03:06.680
and I could compile
this with make hello.

00:03:06.680 --> 00:03:09.567
Let me go ahead and use the
compiler itself manually.

00:03:09.567 --> 00:03:12.650
And we'll see in a moment why we've
been automating the process with make.

00:03:12.650 --> 00:03:15.060
I'm going to run clang instead.

00:03:15.060 --> 00:03:17.340
And then I'm going to type hello.c.

00:03:17.340 --> 00:03:19.490
So it's a little different
how the compiler's used.

00:03:19.490 --> 00:03:22.160
It needs to know, explicitly,
what the file is called.

00:03:22.160 --> 00:03:25.280
I'll go ahead and run
clang hello.c, Enter.

00:03:25.280 --> 00:03:28.415
Nothing seems to happen, which,
generally speaking, is a good thing.

00:03:28.415 --> 00:03:29.790
Because no errors have popped up.

00:03:29.790 --> 00:03:36.140
And if I do ls for list, you'll see
there is not a file called hello.

00:03:36.140 --> 00:03:39.230
But there is a curiously-named
file called a.out.

00:03:39.230 --> 00:03:42.620
This is a historical convention,
stands for assembler output.

00:03:42.620 --> 00:03:45.380
And this is, just, the default
file name for a program

00:03:45.380 --> 00:03:49.400
that you might compile yourself,
manually, using clang itself.

00:03:49.400 --> 00:03:51.830
Let me go ahead now and
point out that that's

00:03:51.830 --> 00:03:53.340
kind of a stupid name for a program.

00:03:53.340 --> 00:03:56.435
Even though it works,
./a.out would work.

00:03:56.435 --> 00:03:59.060
But if you actually want to
customize the name of your program,

00:03:59.060 --> 00:04:02.720
we could just resort to make,
or we could do explicitly

00:04:02.720 --> 00:04:03.920
what make is doing for us.

00:04:03.920 --> 00:04:06.770
It turns out, some
programs, among them make,

00:04:06.770 --> 00:04:08.990
support what are called
command line arguments,

00:04:08.990 --> 00:04:10.310
and more on those later today.

00:04:10.310 --> 00:04:13.670
But these are literally words or
numbers that you type at your prompt

00:04:13.670 --> 00:04:17.330
after the name of a program that just
influence its behavior in some way.

00:04:17.330 --> 00:04:20.040
It modifies its behavior.

00:04:20.040 --> 00:04:22.940
And it turns out, if you read
the documentation for clang,

00:04:22.940 --> 00:04:28.040
you can actually pass a -o, for
output, command line argument, that

00:04:28.040 --> 00:04:30.260
lets you specify,
explicitly, what you want

00:04:30.260 --> 00:04:31.795
your outputted program to be called.

00:04:31.795 --> 00:04:34.670
And then you go ahead and type the
name of the file that you actually

00:04:34.670 --> 00:04:37.110
want to compile, from
source code to machine code.

00:04:37.110 --> 00:04:38.720
Let me hit Enter now.

00:04:38.720 --> 00:04:41.990
Again, nothing seems to happen,
and I type ls and voila.

00:04:41.990 --> 00:04:45.010
Now we still have the old a.out,
because I didn't delete it yet.

00:04:45.010 --> 00:04:46.010
And I do have hello now.

00:04:46.010 --> 00:04:50.420
So ./hello, voila, runs
hello, world again.

00:04:50.420 --> 00:04:52.160
And let me go ahead
and remove this file.

00:04:52.160 --> 00:04:56.593
I could, of course, resort to using
the Explorer, on the left hand side.

00:04:56.593 --> 00:04:59.510
Which, I am in the habit of closing,
just to give us more room to see.

00:04:59.510 --> 00:05:02.240
But I could go ahead and right-click
or control-click on a.out

00:05:02.240 --> 00:05:03.365
if I want to get rid of it.

00:05:03.365 --> 00:05:06.300
Or again, let me focus on
the command line interface.

00:05:06.300 --> 00:05:07.250
And I can use--

00:05:07.250 --> 00:05:08.030
anyone recall?

00:05:08.030 --> 00:05:11.000
We didn't really use it much,
but what command removes a file?

00:05:11.000 --> 00:05:12.665
AUDIENCE: rm.

00:05:12.665 --> 00:05:16.430
DAVID MALAN: So rm for
remove. rm a.out, Enter.

00:05:16.430 --> 00:05:20.060
Remove regular file,
a.out, y for yes, enter.

00:05:20.060 --> 00:05:22.640
And now, if I do ls
again, voila, it's gone.

00:05:22.640 --> 00:05:24.650
All right, so, let's
now enhance this program

00:05:24.650 --> 00:05:30.290
to do the second version we ever did,
which was to also include cs50.h,

00:05:30.290 --> 00:05:33.149
so that we have access to functions
like, get string, and the like.

00:05:33.149 --> 00:05:40.340
Let me do string name gets
get string, what's your name,

00:05:40.340 --> 00:05:41.550
question mark.

00:05:41.550 --> 00:05:46.010
And now, let me go ahead and say hello
to that name with our %s placeholder,

00:05:46.010 --> 00:05:46.920
comma, name.

00:05:46.920 --> 00:05:49.160
So this was version 2 of
our program last time,

00:05:49.160 --> 00:05:53.300
that very easily compiled with make
hello, but notice the difference now.

00:05:53.300 --> 00:05:56.360
If I want to compile this
thing myself with clang, using

00:05:56.360 --> 00:05:58.520
that same lesson learned,
all right, let's do it.

00:05:58.520 --> 00:06:05.300
clang -o hello, just so I get a better
name for the program, hello.c, Enter.

00:06:05.300 --> 00:06:09.750
And a new error pops up that some of
you might have encountered on your own.

00:06:09.750 --> 00:06:13.580
So it's a bit arcane here, and there's
this mention of a cryptic-looking path

00:06:13.580 --> 00:06:15.330
with temp for temporary there.

00:06:15.330 --> 00:06:18.560
But somehow, my issue's in
main, as we can see here.

00:06:18.560 --> 00:06:20.257
It somehow relates to hello.c.

00:06:20.257 --> 00:06:23.090
Even though we might not have seen
this language last time in class,

00:06:23.090 --> 00:06:25.970
there's an undefined
reference to get string.

00:06:25.970 --> 00:06:27.800
As though get string doesn't exist.

00:06:27.800 --> 00:06:31.340
Now, your first instinct might be, well
maybe I forgot cs50.h, but of course,

00:06:31.340 --> 00:06:32.180
I didn't.

00:06:32.180 --> 00:06:34.310
That's the very first
line of my program.

00:06:34.310 --> 00:06:37.910
But it turns out, make is doing
something else for us, all this time.

00:06:37.910 --> 00:06:41.930
Just putting cs50.h, or any header
file at the top of your code,

00:06:41.930 --> 00:06:46.730
for that matter, just teaches the
compiler that a function will exist.

00:06:46.730 --> 00:06:49.310
It, sort of, asks the compiler
to-- it asks the compiler

00:06:49.310 --> 00:06:52.610
to trust that I will, eventually,
get around to implementing functions,

00:06:52.610 --> 00:06:58.130
like get string from cs50.h
and, from stdio.h, printf.

00:06:58.130 --> 00:07:03.830
But this error here, some kind of
linker command, relates to the fact

00:07:03.830 --> 00:07:05.960
that there's a separate
process for actually

00:07:05.960 --> 00:07:10.280
finding the 0s and 1s that
cs50 compiled long ago for you.

00:07:10.280 --> 00:07:13.850
Or that the authors of this operating
system compiled for you, long ago,

00:07:13.850 --> 00:07:14.900
in the form of printf.

00:07:14.900 --> 00:07:17.840
We need to, somehow,
tell the compiler that we

00:07:17.840 --> 00:07:20.450
need to link in code
that someone else wrote,

00:07:20.450 --> 00:07:23.750
the actual machine code that someone
else wrote and then compiled.

00:07:23.750 --> 00:07:27.497
So to do that, you'd have to
type -lcs50, for instance,

00:07:27.497 --> 00:07:28.580
at the end of the command.

00:07:28.580 --> 00:07:31.548
So additionally, telling clang
that, not only do you want to output

00:07:31.548 --> 00:07:34.340
a file called hello, and you want
to compile a file called hello.c,

00:07:34.340 --> 00:07:39.200
you also want to quote-unquote
link in a bunch of 0s and 1s

00:07:39.200 --> 00:07:43.010
that collectively implement
get string and printf.

00:07:43.010 --> 00:07:47.220
So now, if I hit enter,
this time it compiled OK.

00:07:47.220 --> 00:07:53.142
And now if I run ./hello, it works
as it did last week, just like that.

00:07:53.142 --> 00:07:56.100
But honestly, this is just going to
get really tedious, really quickly.

00:07:56.100 --> 00:07:57.930
Notice, already, just
to compile my code,

00:07:57.930 --> 00:08:01.417
I have to run clang -o
hello hello.c -lcs50,

00:08:01.417 --> 00:08:03.500
and you're going to have
to type more things, too.

00:08:03.500 --> 00:08:06.890
If you wanted to use the math library,
like, to use that round function,

00:08:06.890 --> 00:08:09.440
you would also have
to do -lm, typically,

00:08:09.440 --> 00:08:12.890
to specify give me the math
bits that someone else compiled.

00:08:12.890 --> 00:08:14.970
And the commands just
get longer and longer.

00:08:14.970 --> 00:08:19.520
So moving forward, we won't have
to resort to running clang itself,

00:08:19.520 --> 00:08:21.330
but clang is, indeed, the compiler.

00:08:21.330 --> 00:08:24.380
That is the program that converts
from source code to machine code.

00:08:24.380 --> 00:08:28.438
But we'll continue to use make because
it just automates that process.

00:08:28.438 --> 00:08:30.230
And the commands are
only going to get more

00:08:30.230 --> 00:08:34.640
cryptic the more sophisticated and
more featureful your programs get.

00:08:34.640 --> 00:08:39.620
And make, again, is just a tool
that makes all that happen.

00:08:39.620 --> 00:08:44.300
Let me pause there to see if
there's any questions before then we

00:08:44.300 --> 00:08:45.890
take a look further under the hood.

00:08:45.890 --> 00:08:47.185
Yeah, in front.

00:08:47.185 --> 00:08:50.185
AUDIENCE: Can you explain again what
the -lcs50-- just why you put that?

00:08:50.185 --> 00:08:52.518
DAVID MALAN: Sure, let me
come back to that in a moment.

00:08:52.518 --> 00:08:53.750
What does the -lcs50 mean?

00:08:53.750 --> 00:08:55.917
We'll come back to that,
visually, in just a moment.

00:08:55.917 --> 00:08:58.850
But it means to link in the
0s and 1s that collectively

00:08:58.850 --> 00:09:00.435
implement get string and printf.

00:09:00.435 --> 00:09:02.060
But we'll see that, visually, in a sec.

00:09:02.060 --> 00:09:03.341
Yeah, behind you.

00:09:03.341 --> 00:09:07.073
AUDIENCE: [INAUDIBLE].

00:09:07.073 --> 00:09:08.490
DAVID MALAN: Really good question.

00:09:08.490 --> 00:09:10.850
How come I didn't have
to link in standard I/O?

00:09:10.850 --> 00:09:12.950
Because I used printf in version 1.

00:09:12.950 --> 00:09:16.280
Standard I/O is just, literally,
so standard that it's built in,

00:09:16.280 --> 00:09:17.480
it just works for free.

00:09:17.480 --> 00:09:18.800
CS50, of course, is not.

00:09:18.800 --> 00:09:21.080
It did not come with the
language C or the compiler.

00:09:21.080 --> 00:09:22.250
We ourselves wrote it.

00:09:22.250 --> 00:09:26.600
And other libraries, even though
they might come with the language C,

00:09:26.600 --> 00:09:30.600
they might not be enabled by default,
generally for efficiency purposes.

00:09:30.600 --> 00:09:33.470
So you're not loading more 0s
and 1s into the computer's memory

00:09:33.470 --> 00:09:34.280
than you need to.

00:09:34.280 --> 00:09:37.250
So standard I/O is special, if you will.

00:09:37.250 --> 00:09:38.510
Other questions?

00:09:38.510 --> 00:09:39.500
Yeah?

00:09:39.500 --> 00:09:41.420
AUDIENCE: [INAUDIBLE]

00:09:41.420 --> 00:09:43.160
DAVID MALAN: Oh, what does the -o mean?

00:09:43.160 --> 00:09:46.190
So -o is shorthand for
the English word output,

00:09:46.190 --> 00:09:51.260
and so -o is telling clang to
please output a file called hello,

00:09:51.260 --> 00:09:53.850
because hello is the next thing I
wrote after it. The command line,

00:09:53.850 --> 00:09:59.929
recall, was clang -o hello, then
the name of the file, then -lcs50.

00:09:59.929 --> 00:10:03.407
And this is where these commands
do get and stay fairly arcane.

00:10:03.407 --> 00:10:05.240
It's just through muscle
memory and practice

00:10:05.240 --> 00:10:07.610
that you'll start to remember, oh
what are the other commands that you--

00:10:07.610 --> 00:10:10.277
what are the command line arguments
you can provide to programs?

00:10:10.277 --> 00:10:11.570
But we've seen this before.

00:10:11.570 --> 00:10:14.780
Technically, when you run make
hello, the program is called make,

00:10:14.780 --> 00:10:16.980
hello is the command line argument.

00:10:16.980 --> 00:10:19.040
It's an input to the
make function, albeit,

00:10:19.040 --> 00:10:22.250
typed at the prompt, that tells
make what you want to make.

00:10:22.250 --> 00:10:26.180
Even when I used rm a moment
ago, and did rm of a.out,

00:10:26.180 --> 00:10:28.280
the command line argument
there was called a.out

00:10:28.280 --> 00:10:30.740
and it's telling rm what to delete.

00:10:30.740 --> 00:10:35.270
It is entirely dependent on the programs
to decide what their conventions are,

00:10:35.270 --> 00:10:38.090
whether you use dash this
or dash that, but we'll

00:10:38.090 --> 00:10:40.805
see over time, which ones
actually matter in practice.

00:10:40.805 --> 00:10:46.220
So to come back to the first question
about what actually is happening there,

00:10:46.220 --> 00:10:48.562
let's consider the code more closely.

00:10:48.562 --> 00:10:50.270
So here is that first
version of the code

00:10:50.270 --> 00:10:54.590
again, with stdio.h and only
printf, so no cs50 stuff yet.

00:10:54.590 --> 00:10:56.840
Until we added it back in
and had the second version,

00:10:56.840 --> 00:10:59.630
where we actually get the human's name.

00:10:59.630 --> 00:11:02.783
When you run this command,
there's a few things

00:11:02.783 --> 00:11:04.700
that are happening
underneath the hood, and we

00:11:04.700 --> 00:11:06.650
won't dwell on these
kinds of details, indeed,

00:11:06.650 --> 00:11:08.870
we'll abstract it away by using make.

00:11:08.870 --> 00:11:10.940
But it's worth understanding
from the get-go,

00:11:10.940 --> 00:11:13.880
how much automation is going on, so
that when you run these commands,

00:11:13.880 --> 00:11:14.850
it's not magic.

00:11:14.850 --> 00:11:17.940
You have this bottom-up
understanding of what's going on.

00:11:17.940 --> 00:11:21.530
So when we say you've been
compiling your code with make,

00:11:21.530 --> 00:11:23.600
that's a bit of an oversimplification.

00:11:23.600 --> 00:11:26.780
Technically, every time
you compile your code,

00:11:26.780 --> 00:11:29.570
you're having the computer do
four distinct things for you.

00:11:29.570 --> 00:11:33.020
And these are not four distinct things
that you need to memorize and remember

00:11:33.020 --> 00:11:35.180
every time you run your
program, what's happening,

00:11:35.180 --> 00:11:37.820
but it helps to break it
down into building blocks,

00:11:37.820 --> 00:11:42.110
as to how we're getting from source
code, like C, into 0s and 1s.

00:11:42.110 --> 00:11:46.640
It turns out, that when you compile,
quote-unquote, "your code," technically

00:11:46.640 --> 00:11:50.510
speaking, you're doing four things
automatically, and all at once.

00:11:50.510 --> 00:11:53.960
Preprocessing it, compiling it,
assembling it, and linking it.

00:11:53.960 --> 00:11:57.350
Just humans decided, let's just
call the whole process compiling.

00:11:57.350 --> 00:12:00.230
But for a moment, let's
consider what these steps are.

00:12:00.230 --> 00:12:02.690
So preprocessing refers to this.

00:12:02.690 --> 00:12:06.710
If we look at our source code,
version 2 that uses the cs50 library

00:12:06.710 --> 00:12:10.442
and therefore get string, notice that
we have these include lines at top.

00:12:10.442 --> 00:12:12.650
And they're kind of special
versus all the other code

00:12:12.650 --> 00:12:15.710
we've written, because they start
with hash symbols, specifically.

00:12:15.710 --> 00:12:17.660
And that's sort of a
special syntax that means

00:12:17.660 --> 00:12:20.600
that these are, technically,
called preprocessor directives.

00:12:20.600 --> 00:12:25.290
Fancy way of saying they're handled
specially versus the rest of your code.

00:12:25.290 --> 00:12:29.870
In fact, if we focus on
cs50.h, recall from last week

00:12:29.870 --> 00:12:35.870
that I provided a hint as to what's
actually in cs50.h, among other things.

00:12:35.870 --> 00:12:40.580
What was the one salient thing that
I said was in cs50.h and therefore,

00:12:40.580 --> 00:12:43.475
why we were including
it in the first place?

00:12:43.475 --> 00:12:44.350
AUDIENCE: Get string?

00:12:44.350 --> 00:12:46.850
DAVID MALAN: So get
string, specifically,

00:12:46.850 --> 00:12:49.160
the prototype for get string.

00:12:49.160 --> 00:12:51.410
We haven't made many of
our own functions yet,

00:12:51.410 --> 00:12:53.840
but recall that any time
we've made our own functions,

00:12:53.840 --> 00:12:56.330
and we've written them
below main in a file,

00:12:56.330 --> 00:12:58.790
we've also had to, somewhat
stupidly, copy paste

00:12:58.790 --> 00:13:01.370
the prototype of the function
at the top of the file,

00:13:01.370 --> 00:13:05.210
just to teach the compiler that
this function doesn't exist, yet,

00:13:05.210 --> 00:13:07.430
it does down there, but it will exist.

00:13:07.430 --> 00:13:08.300
Just trust me.

00:13:08.300 --> 00:13:10.980
So again, that's what these
prototypes are doing for us.
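
NOTE
A minimal sketch of the prototype idea just described, assuming two
hypothetical helpers, triple and triple_plus_one, that are not part of the
lecture's code. The prototype near the top lets the compiler accept the call
to triple before it has seen the definition further down.
```c
// Prototype: promises the compiler that triple exists somewhere.
int triple(int n);
// This caller compiles because the prototype above already names triple.
int triple_plus_one(int n)
{
    return triple(n) + 1;
}
// The actual definition, placed below its first use.
int triple(int n)
{
    return 3 * n;
}
```
If the prototype line were removed, the compiler would typically warn or
stop with an error about an undeclared function at the call to triple.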

00:13:10.980 --> 00:13:13.340
So therefore, in my
code, If I want to use

00:13:13.340 --> 00:13:16.760
a function like get string,
or printf, for that matter,

00:13:16.760 --> 00:13:19.150
they're clearly not implemented
in the same file,

00:13:19.150 --> 00:13:20.400
they're implemented elsewhere.

00:13:20.400 --> 00:13:22.692
So I need to tell the compiler
to trust me that they're

00:13:22.692 --> 00:13:24.000
implemented somewhere else.

00:13:24.000 --> 00:13:26.810
And so technically,
inside of cs50.h, which

00:13:26.810 --> 00:13:30.410
is installed somewhere in the
cloud's hard drive, so to speak,

00:13:30.410 --> 00:13:34.820
that you all are accessing via VS Code,
there's a line that looks like this.

00:13:34.820 --> 00:13:38.870
A prototype for the get string function
that says the name of the function is

00:13:38.870 --> 00:13:42.830
get string, it takes one input,
or argument, called prompt,

00:13:42.830 --> 00:13:45.710
and the type of that
prompt is a string.

00:13:45.710 --> 00:13:51.150
Get string, not surprisingly, has a
return value and it returns a string.

00:13:51.150 --> 00:13:54.800
So literally, that line and a
bunch of others, are in cs50.h.

00:13:54.800 --> 00:13:58.280
So rather than you all having
to copy paste the prototype,

00:13:58.280 --> 00:14:01.160
you can just trust that
cs50 figured out what it is.

00:14:01.160 --> 00:14:04.970
You can include cs50.h
and the compiler is going

00:14:04.970 --> 00:14:07.420
to go find that prototype for you.

00:14:07.420 --> 00:14:09.480
Same thing in standard
I/O. Someone else-- what

00:14:09.480 --> 00:14:13.620
must clearly be in stdio.h,
among other stuff, that

00:14:13.620 --> 00:14:17.590
motivates our including stdio.h, too?

00:14:17.590 --> 00:14:18.090
Yeah?

00:14:18.090 --> 00:14:18.798
AUDIENCE: Printf.

00:14:18.798 --> 00:14:21.030
DAVID MALAN: Printf, the
prototype for printf,

00:14:21.030 --> 00:14:24.010
and I'll just change it here
in yellow, to be the same.

00:14:24.010 --> 00:14:25.410
And it turns out, the format--

00:14:25.410 --> 00:14:28.590
the prototype for printf
is, actually, pretty fancy,

00:14:28.590 --> 00:14:31.740
because, as you might have noticed,
printf can take one argument, just

00:14:31.740 --> 00:14:35.910
something to print, 2, if you want
to plug a value into it, 3 or more.

00:14:35.910 --> 00:14:38.620
So the dot dot dot just
represents exactly that.

00:14:38.620 --> 00:14:42.330
It's not quite as simple a prototype
as get string, but more on that

00:14:42.330 --> 00:14:43.115
another time.
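
NOTE
A rough sketch of what the dot dot dot in a prototype allows, using a
hypothetical function sum_ints (an assumption for illustration, not part of
stdio.h or cs50.h) together with the standard stdarg.h macros. The prototype
for printf in stdio.h has the same general shape: a fixed first parameter
followed by an ellipsis.
```c
#include <stdarg.h>
// The "..." means any number of further arguments may follow count.
int sum_ints(int count, ...)
{
    va_list args;
    va_start(args, count); // begin reading the arguments after count
    int total = 0;
    for (int i = 0; i < count; i++)
    {
        total += va_arg(args, int); // fetch the next argument as an int
    }
    va_end(args);
    return total;
}
```
The caller has to say how many values follow, here via count; printf gets
the same information from the placeholders in its format string.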

00:14:43.115 --> 00:14:46.050
So what does it mean to
preprocess your code?

00:14:46.050 --> 00:14:49.860
The very first thing the
compiler, clang, in this case,

00:14:49.860 --> 00:14:54.270
is doing for you when it reads your
code top-to-bottom, left-to-right, is it

00:14:54.270 --> 00:14:57.960
notices, oh, here is hash include,
oh, here's another hash include.

00:14:57.960 --> 00:15:03.090
And it, essentially, finds those files
on the hard drive, cs50.h, stdio.h,

00:15:03.090 --> 00:15:06.990
and does the equivalent of copying
and pasting them automatically

00:15:06.990 --> 00:15:09.360
into your code at the very top.

00:15:09.360 --> 00:15:12.450
Thereby teaching the compiler
that get string and printf

00:15:12.450 --> 00:15:14.430
will eventually exist somewhere.

00:15:14.430 --> 00:15:18.480
So that's the preprocessing
step, whereby, again, it's

00:15:18.480 --> 00:15:22.080
just doing a find-and-replace of
anything that starts with hash include.

00:15:22.080 --> 00:15:24.510
It's plugging in the files
there so that you, essentially,

00:15:24.510 --> 00:15:27.780
get all the prototypes
you need automatically.
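
NOTE
A small sketch of the copy-and-paste idea, assuming a hypothetical helper
called name_length. Instead of writing #include <string.h>, the one prototype
this code needs is pasted in by hand, which is roughly what the preprocessor
does with the whole header file. The machine code for strlen still comes from
the standard library when the program is linked.
```c
#include <stddef.h> // for size_t
// Hand-written copy of the prototype that string.h would normally supply.
size_t strlen(const char *s);
// Hypothetical helper: compiles because the prototype above is visible.
size_t name_length(const char *name)
{
    return strlen(name);
}
```
In practice the include line is the better habit, since the header stays in
sync with the library; the hand-written prototype is only for illustration.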

00:15:27.780 --> 00:15:28.830
OK.

00:15:28.830 --> 00:15:31.230
What does it mean, then,
to compile the results?

00:15:31.230 --> 00:15:33.450
Because at this point
in the story, your code

00:15:33.450 --> 00:15:35.678
now looks like this in
the computer's memory.

00:15:35.678 --> 00:15:37.470
It doesn't change your
file, it's doing all

00:15:37.470 --> 00:15:39.990
of this in the computer's
memory, or RAM, for you.

00:15:39.990 --> 00:15:42.070
But it, essentially, looks like this.

00:15:42.070 --> 00:15:45.600
Well the next step is what's,
technically, really compiling.

00:15:45.600 --> 00:15:48.420
Even though again, we use
compile as an umbrella term.

00:15:48.420 --> 00:15:51.510
Compiling code in C
means to take code that

00:15:51.510 --> 00:15:53.740
now looks like this in
the computer's memory

00:15:53.740 --> 00:15:56.890
and turn it into something
that looks like this.

00:15:56.890 --> 00:15:58.350
Which is way more cryptic.

00:15:58.350 --> 00:16:00.990
But it was just a few
decades ago that, if you

00:16:00.990 --> 00:16:03.930
were taking a class like
CS50 in its earlier form,

00:16:03.930 --> 00:16:07.740
we wouldn't be using C, which didn't exist
yet; we would actually be using this,

00:16:07.740 --> 00:16:09.690
something called assembly language.

00:16:09.690 --> 00:16:13.230
And there's different types of,
or flavors of, assembly language.

00:16:13.230 --> 00:16:17.010
But this is about as low level as
you can get to what a computer really

00:16:17.010 --> 00:16:19.410
understands, be it a
Mac, or PC, or a phone,

00:16:19.410 --> 00:16:22.650
before you start getting
into actual 0s and 1s.

00:16:22.650 --> 00:16:24.013
And most of this is cryptic.

00:16:24.013 --> 00:16:27.180
I couldn't tell you what this is doing
unless I thought it through carefully

00:16:27.180 --> 00:16:30.300
and rewound mentally, years
ago, from having studied it,

00:16:30.300 --> 00:16:32.880
but let's highlight a
few key words in yellow.

00:16:32.880 --> 00:16:37.380
Notice that this assembly language
that the computer is outputting

00:16:37.380 --> 00:16:40.530
for you automatically,
still has mention of main

00:16:40.530 --> 00:16:43.290
and it has mention of get string,
and it has mention of printf.

00:16:43.290 --> 00:16:46.358
So there's some relationship to
the C code we saw a moment ago.

00:16:46.358 --> 00:16:48.150
And then if I highlight
these other things,

00:16:48.150 --> 00:16:50.430
these are what are called
computer instructions.

00:16:50.430 --> 00:16:52.740
At the end of the day,
your Mac, your PC,

00:16:52.740 --> 00:16:56.340
your phone actually only
understands very basic instructions,

00:16:56.340 --> 00:17:01.020
like addition, subtraction, division,
multiplication, move into memory,

00:17:01.020 --> 00:17:06.190
load from memory, print something to
the screen, very basic operations.

00:17:06.190 --> 00:17:07.755
And that's what you're seeing here.

00:17:07.755 --> 00:17:12.750
These assembly instructions
are what the computer actually

00:17:12.750 --> 00:17:16.870
feeds into the brains of the computer,
the CPU, the central processing unit.

00:17:16.870 --> 00:17:19.770
And it's that Intel CPU,
or whatever you have,

00:17:19.770 --> 00:17:23.220
that understands this instruction, and
this one, and this one, and this one.

00:17:23.220 --> 00:17:25.860
And collectively, long
story short, all they do

00:17:25.860 --> 00:17:28.620
is print hello, world on
the screen, but in a way

00:17:28.620 --> 00:17:31.910
that the machine understands how to do.

00:17:31.910 --> 00:17:34.500
So let me pause here.

00:17:34.500 --> 00:17:37.010
Are there any questions on
what we mean by preprocessing?

00:17:37.010 --> 00:17:40.850
Which finds and replaces the hash
include lines, among others,

00:17:40.850 --> 00:17:44.450
and compiling, which technically
takes your source code,

00:17:44.450 --> 00:17:48.170
once preprocessed, and converts it to
that stuff called assembly language.

00:17:48.170 --> 00:17:50.342
AUDIENCE: [INAUDIBLE] each CPU has--

00:17:50.342 --> 00:17:51.290
DAVID MALAN: Correct.

00:17:51.290 --> 00:17:54.710
Each type of CPU has
its own instruction set.

00:17:54.710 --> 00:17:55.280
Indeed.

00:17:55.280 --> 00:17:58.970
And as a teaser, this is why,
at least back in the day, when

00:17:58.970 --> 00:18:02.900
we used to install software from
CD-ROMs, or some other type of media,

00:18:02.900 --> 00:18:08.222
this is why you can't take a program
that was sold for a Windows computer

00:18:08.222 --> 00:18:09.680
and run it on a Mac, or vice-versa.

00:18:09.680 --> 00:18:14.420
Because the commands, the instructions
that those two products understand,

00:18:14.420 --> 00:18:15.500
are actually different.

00:18:15.500 --> 00:18:20.150
Now Microsoft, or any company, could
generally write code in one language,

00:18:20.150 --> 00:18:24.109
like C or another, and they can
compile it twice, saving a PC version

00:18:24.109 --> 00:18:25.790
and saving a Mac version.

00:18:25.790 --> 00:18:30.109
It's twice as much work and sometimes
you get into some incompatibilities,

00:18:30.109 --> 00:18:33.140
but that's why these steps
are somewhat distinct.

00:18:33.140 --> 00:18:36.710
You can now use the same code and
support even different platforms,

00:18:36.710 --> 00:18:37.940
or systems, if you'd want.
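
NOTE
A hedged sketch of how one source file can be compiled for different CPUs,
assuming a hypothetical helper called target_arch. The macros __x86_64__ and
__aarch64__ are predefined by compilers such as clang and gcc when targeting
those processors; other compilers may use different names, so treat the
checks as an assumption about the toolchain.
```c
// Hypothetical helper: reports which CPU family this file was compiled for.
// The macros below are predefined by compilers such as clang and gcc.
const char *target_arch(void)
{
#if defined(__x86_64__)
    return "x86-64";
#elif defined(__aarch64__)
    return "ARM64";
#else
    return "another architecture";
#endif
}
```
The C source is identical everywhere; only the compiled output, and the
string this function returns, differs from one target to the next.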

00:18:37.940 --> 00:18:38.440
All right.

00:18:38.440 --> 00:18:39.650
Assembly, assembling.

00:18:39.650 --> 00:18:42.800
Thankfully, this part is fairly
straightforward, at least, in concept.

00:18:42.800 --> 00:18:46.250
To assemble code, which is step
three of four, that is just

00:18:46.250 --> 00:18:50.360
happening for you every time
you run make or, in turn, clang,

00:18:50.360 --> 00:18:53.570
this assembly language, which the
computer generated automatically

00:18:53.570 --> 00:18:57.080
for you from your source code,
is turned into 0s and 1s.

00:18:57.080 --> 00:19:00.783
So that's the step that, last
week, I simplified and said,

00:19:00.783 --> 00:19:03.950
when you compile your code, you convert
it to source code-- from source code

00:19:03.950 --> 00:19:04.970
to machine code.

00:19:04.970 --> 00:19:07.685
Technically, that happens
when you assemble your code.

00:19:07.685 --> 00:19:10.940
But no one in normal
conversations says that, they just

00:19:10.940 --> 00:19:13.280
say compile for all of these terms.

00:19:13.280 --> 00:19:14.310
All right.

00:19:14.310 --> 00:19:17.450
So that's assembling.

00:19:17.450 --> 00:19:19.070
There's one final step.

00:19:19.070 --> 00:19:22.400
Even in this simple program
of getting the user's name

00:19:22.400 --> 00:19:27.120
and then plugging it into printf, I'm
using three different people's code,

00:19:27.120 --> 00:19:27.620
if you will.

00:19:27.620 --> 00:19:30.200
My own, which is in hello.c.

00:19:30.200 --> 00:19:35.600
Some of CS50's, which is
in hello.c, sorry-- which

00:19:35.600 --> 00:19:39.080
is in cs50.c, which is not
a file I've mentioned, yet,

00:19:39.080 --> 00:19:43.220
but it stands to reason, that if
there's a cs50.h that has prototypes,

00:19:43.220 --> 00:19:45.380
turns out, the actual
implementation of get string

00:19:45.380 --> 00:19:47.600
and other things are in cs50.c.

00:19:47.600 --> 00:19:51.290
And there's a third file
somewhere on the hard drive

00:19:51.290 --> 00:19:54.260
that's involved in compiling
even this simple program.

00:19:54.260 --> 00:19:59.971
hello.c, cs50.c, and by that
logic, what might the other be?

00:19:59.971 --> 00:20:00.471
Yeah?

00:20:00.471 --> 00:20:02.275
AUDIENCE: stdio?

00:20:02.275 --> 00:20:03.600
DAVID MALAN: Stdio.c.

00:20:03.600 --> 00:20:06.690
And that's a bit of a white lie,
because that's such a big, fancy library

00:20:06.690 --> 00:20:09.750
that there's actually multiple files
that compose it, but the same idea,

00:20:09.750 --> 00:20:11.380
and we'll take the simplification.

00:20:11.380 --> 00:20:16.200
So when I have this code,
and I compile my code,

00:20:16.200 --> 00:20:21.300
I get those 0s and 1s that end up taking
hello.c and turning it, effectively,

00:20:21.300 --> 00:20:26.830
into 0s and 1s that are combined with
cs50.c, followed by stdio.c as well.

00:20:26.830 --> 00:20:27.840
So let me rewind here.

00:20:27.840 --> 00:20:33.300
Here might be the 0s and 1s for my code,
the two lines of code that I wrote.

00:20:33.300 --> 00:20:37.920
Here might be the 0s and 1s for what
cs50 wrote some years ago in cs50.c.

00:20:37.920 --> 00:20:42.210
Here might be the 0s and 1s that someone
wrote for standard I/O decades ago.

00:20:42.210 --> 00:20:45.720
The last and final step
is that linking command

00:20:45.720 --> 00:20:48.330
that links all of these
0s and 1s together,

00:20:48.330 --> 00:20:53.820
essentially stitches them together
into one single file called hello,

00:20:53.820 --> 00:20:56.385
or called a.out, whatever you name it.

00:20:56.385 --> 00:21:01.650
That last step is what combines all of
these different programmers' 0s and 1s.
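
NOTE
Sketch, not from the lecture: the split between a prototype (as in cs50.h) and an implementation (as in cs50.c) can be imitated inside one file. The names pyramid_bricks and describe_pyramid are made up for illustration. In a real project the prototype would sit in a header and the definition in a separately compiled file, and linking would join the two.
```c
#include <stdio.h>
// Prototype only: the kind of line a header such as cs50.h contributes.
// It promises that the function exists somewhere, without defining it.
int pyramid_bricks(int height);
// This caller can be compiled knowing nothing but the prototype above.
int describe_pyramid(int height)
{
    int bricks = pyramid_bricks(height);
    printf("height %i needs %i bricks\n", height, bricks);
    return bricks;
}
// Definition: the kind of code a separate file such as cs50.c contributes.
// Linking connects the call in describe_pyramid to these instructions.
int pyramid_bricks(int height)
{
    return height * (height + 1) / 2;
}
```
Calling describe_pyramid(3) from main would exercise both halves. If the definition were missing, the call would still compile against the prototype, and the failure would only show up at the linking step.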

00:21:01.650 --> 00:21:04.050
And my God, now we're
really in the weeds.

00:21:04.050 --> 00:21:07.020
Who wants to even think about
running code at this level?

00:21:07.020 --> 00:21:08.160
You shouldn't need to.

00:21:08.160 --> 00:21:09.180
But it's not magic.

00:21:09.180 --> 00:21:11.748
When you're running make,
there's some very concrete steps

00:21:11.748 --> 00:21:14.290
that are happening that humans
have developed over the years,

00:21:14.290 --> 00:21:17.700
over the decades, that break down
this big problem of source code going

00:21:17.700 --> 00:21:22.410
to 0s and 1s, or machine code,
into these very specific steps.

00:21:22.410 --> 00:21:26.100
But henceforth, you can
call all of this compiling.

00:21:26.100 --> 00:21:27.120
Questions?

00:21:27.120 --> 00:21:27.780
Or confusion?

00:21:27.780 --> 00:21:28.596
Yeah?

00:21:28.596 --> 00:21:30.804
AUDIENCE: Can you explain
again what a.out signifies?

00:21:30.804 --> 00:21:31.770
DAVID MALAN: Sure.

00:21:31.770 --> 00:21:33.270
What does a.out signify?

00:21:33.270 --> 00:21:37.890
a.out is just the conventional,
default file name for any program

00:21:37.890 --> 00:21:41.280
that you compile directly
with a compiler, like clang.

00:21:41.280 --> 00:21:43.680
It's a meaningless name, though.

00:21:43.680 --> 00:21:47.250
It stands for assembler output, and
assembler might now sound familiar

00:21:47.250 --> 00:21:48.690
from this assembling process.

00:21:48.690 --> 00:21:51.150
It's a lame name for a
computer program, and we

00:21:51.150 --> 00:21:56.450
can override it by outputting
something like hello, instead.

00:21:56.450 --> 00:21:57.317
Yeah?

00:21:57.317 --> 00:22:03.426
AUDIENCE: [INAUDIBLE]

00:22:03.426 --> 00:22:07.860
DAVID MALAN: To recap, there are
other prototypes in those files,

00:22:07.860 --> 00:22:11.910
cs50.h, stdio.h, technically, they're
all included on top of your file,

00:22:11.910 --> 00:22:14.460
even though you, strictly
speaking, don't need most of them,

00:22:14.460 --> 00:22:18.190
but they are there, just in
case you might want them.

00:22:18.190 --> 00:22:19.660
And finally, any other questions?

00:22:19.660 --> 00:22:20.160
Yeah?

00:22:20.160 --> 00:22:23.878
AUDIENCE: [INAUDIBLE]

00:22:23.878 --> 00:22:26.920
DAVID MALAN: Does it matter what order
we're telling the computer to run?

00:22:26.920 --> 00:22:29.140
Sometimes with libraries,
yes, it matters

00:22:29.140 --> 00:22:31.520
what order they are linked in together.

00:22:31.520 --> 00:22:34.330
But for our purposes, it's
really not going to matter.

00:22:34.330 --> 00:22:38.750
It's going to-- make is going to take
care of automating that process for us.

00:22:38.750 --> 00:22:39.250
All right.

00:22:39.250 --> 00:22:41.795
So with that said, henceforth,
compiling, technically,

00:22:41.795 --> 00:22:42.670
is these four things.

00:22:42.670 --> 00:22:46.690
But we'll focus on it as a higher
level concept, an abstraction,

00:22:46.690 --> 00:22:49.880
known as compiling itself.

00:22:49.880 --> 00:22:52.510
So another process that we'll
now begin to focus on all the

00:22:52.510 --> 00:22:55.690
more this week because, invariably,
this past week you ran against--

00:22:55.690 --> 00:22:57.160
ran up against some challenges.

00:22:57.160 --> 00:23:00.550
You probably created your very first
bugs, or mistakes, in a program

00:23:00.550 --> 00:23:03.940
and so let's focus for a moment on
actual techniques for debugging.

00:23:03.940 --> 00:23:07.060
As you spend more time
this semester, in the years

00:23:07.060 --> 00:23:10.270
to come, if you continue to program,
you're never, frankly, probably,

00:23:10.270 --> 00:23:13.577
going to write bug
free code, ultimately.

00:23:13.577 --> 00:23:16.660
Though your programs are going to get
more featureful, more sophisticated,

00:23:16.660 --> 00:23:20.230
and we're all going to start to
make more sophisticated mistakes.

00:23:20.230 --> 00:23:22.570
And to this day, I write
buggy code all the time.

00:23:22.570 --> 00:23:24.520
And I'm always horrified
when I do it up here.

00:23:24.520 --> 00:23:26.620
But hopefully, that
won't happen too often.

00:23:26.620 --> 00:23:30.100
But when it does, it's a process,
now, of debugging, trying

00:23:30.100 --> 00:23:32.230
to find the mistakes in your program.

00:23:32.230 --> 00:23:35.600
You don't have to stare at your code,
or shake your fist at your code.

00:23:35.600 --> 00:23:38.590
There are actual tools
that real world programmers

00:23:38.590 --> 00:23:41.860
use to help debug their
code and find these faults.

00:23:41.860 --> 00:23:44.455
So what are some of the techniques
and tools that folks use?

00:23:44.455 --> 00:23:49.440
Well as an aside, if you've ever--

00:23:49.440 --> 00:23:52.840
a bug in a program is a mistake,
that's been around for some time.

00:23:52.840 --> 00:23:58.010
If you've ever heard this tale,
some 70 plus years ago, in 1947.

00:23:58.010 --> 00:24:02.770
This is an entry in a log book written
by a famous computer scientist known

00:24:02.770 --> 00:24:05.230
as-- named Grace Hopper,
who happened to be the one

00:24:05.230 --> 00:24:09.345
to record the very first discovery of a
quote-unquote actual bug in a computer.

00:24:09.345 --> 00:24:11.860
This was like a moth
that had flown into,

00:24:11.860 --> 00:24:17.080
at the time, a very sophisticated system
known as the Harvard Mark II computer,

00:24:17.080 --> 00:24:20.050
very large, refrigerator-sized
type systems,

00:24:20.050 --> 00:24:24.160
in which an actual bug caused an issue.

00:24:24.160 --> 00:24:27.190
The etymology of bug though,
predates this particular instance,

00:24:27.190 --> 00:24:30.580
but here you have, as any computer
scientist might know, the example

00:24:30.580 --> 00:24:32.845
of a first physical bug in a computer.

00:24:32.845 --> 00:24:35.322
How, though, do you go
about removing such a thing?

00:24:35.322 --> 00:24:37.780
Well, let's consider a very
simple scenario from last time,

00:24:37.780 --> 00:24:40.780
for instance, when we were trying to
print out various aspects of Mario,

00:24:40.780 --> 00:24:42.970
like this column of 3 bricks.

00:24:42.970 --> 00:24:46.660
Let's consider how I might go about
implementing a program like this.

00:24:46.660 --> 00:24:51.130
Let me switch back over to VS
Code here, and I'm going to run--

00:24:51.130 --> 00:24:52.750
write a program.

00:24:52.750 --> 00:24:54.640
And I'm not going to
trust myself, so I'm

00:24:54.640 --> 00:24:56.507
going to call it
buggy.c from the get-go,

00:24:56.507 --> 00:24:58.340
knowing that I'm going
to mess something up.

00:24:58.340 --> 00:25:01.150
But I'm going to go ahead
and include stdio.h.

00:25:01.150 --> 00:25:03.940
And I'm going to define main, as usual.

00:25:03.940 --> 00:25:05.950
So hopefully, no mistakes just yet.

00:25:05.950 --> 00:25:08.710
And now, I want to print those
3 bricks on the screen using

00:25:08.710 --> 00:25:10.270
just hashes for bricks.

00:25:10.270 --> 00:25:16.420
So how about for, int i gets 0, i less
than or equal to 3, i plus plus.

00:25:16.420 --> 00:25:18.280
Now, inside of my
curly braces, I'm going

00:25:18.280 --> 00:25:23.960
to go ahead and print out a hash
followed by a backslash n, semicolon.

00:25:23.960 --> 00:25:27.975
All right, saving the file, doing
make buggy, Enter, it compiles.

00:25:27.975 --> 00:25:33.340
So there's no syntactical errors,
my code is syntactically correct.

00:25:33.340 --> 00:25:36.640
But some of you have probably
seen the logical error already,

00:25:36.640 --> 00:25:39.370
because when I run this
program I don't get

00:25:39.370 --> 00:25:45.430
this picture, which was 3 bricks
high, I seem to have 4 bricks instead.
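
NOTE
For reference, a sketch of the loop as dictated above. The lecture puts it directly in main of buggy.c. Here it is wrapped in a hypothetical function, print_column_buggy, that also counts the rows it prints so the off-by-one can be checked.
```c
#include <stdio.h>
// Hypothetical wrapper around the lecture's loop; returns the rows printed.
int print_column_buggy(void)
{
    int rows = 0;
    // Bug kept on purpose: i takes the values 0, 1, 2 and 3,
    // so the body runs four times instead of three.
    for (int i = 0; i <= 3; i++)
    {
        printf("#\n");
        rows++;
    }
    return rows;
}
```
The code compiles cleanly, which is the point being made: the mistake is logical, not syntactic.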

00:25:45.430 --> 00:25:47.930
Now, this might be jumping out
at you, why it's happening,

00:25:47.930 --> 00:25:49.930
but I've kept the program
simple just so that we

00:25:49.930 --> 00:25:54.010
don't have to find an actual bug, we can
use a tool to find one that we already

00:25:54.010 --> 00:25:55.970
know about, in this case.

00:25:55.970 --> 00:25:59.050
What might be the first strategy
for finding a bug like this,

00:25:59.050 --> 00:26:03.292
rather than staring at your code,
asking a question, trying to think

00:26:03.292 --> 00:26:04.125
through the problem?

00:26:04.125 --> 00:26:07.690
Well, let's actually try to diagnose
the problem more proactively.

00:26:07.690 --> 00:26:10.420
And the simplest way to do
this now, and years from now,

00:26:10.420 --> 00:26:13.870
is, honestly, going to be to
use a function like printf.

00:26:13.870 --> 00:26:15.790
Printf is a wonderfully
useful function, not

00:26:15.790 --> 00:26:18.550
just for formatting-- printing
formatted strings and all that, but for

00:26:18.550 --> 00:26:21.430
just looking inside
the values of variables

00:26:21.430 --> 00:26:24.352
that you might be curious
about to see what's going on.

00:26:24.352 --> 00:26:25.060
So you know what?

00:26:25.060 --> 00:26:26.320
Let me do this.

00:26:26.320 --> 00:26:29.110
I see that there's 4 coming
out, but I intended 3.

00:26:29.110 --> 00:26:31.740
So clearly, something's
wrong with my i variable.

00:26:31.740 --> 00:26:34.090
So let me be a little more pedantic.

00:26:34.090 --> 00:26:37.300
Let me go inside of this
loop and, temporarily,

00:26:37.300 --> 00:26:40.480
say something explicit, like, i is--

00:26:40.480 --> 00:26:45.200
%i \n, and then just
plug in the value of i.

00:26:45.200 --> 00:26:45.700
Right?

00:26:45.700 --> 00:26:48.970
This is not the program I want to
write, it's the program I'm temporarily

00:26:48.970 --> 00:26:54.400
writing, because now I'm going
to say make buggy, ./buggy.

00:26:54.400 --> 00:26:56.500
And if I look, now,
at the output, I have

00:26:56.500 --> 00:27:01.090
some helpful diagnostic information.
i is 0, and I get a hash, i is 1,

00:27:01.090 --> 00:27:03.610
and I get a hash, 2 and I
get a hash, 3 and I get hash.

00:27:03.610 --> 00:27:04.527
OK, wait a minute.

00:27:04.527 --> 00:27:06.610
I'm clearly going too many
steps because, maybe, I

00:27:06.610 --> 00:27:09.250
forgot that computers are,
essentially, counting from 0,

00:27:09.250 --> 00:27:11.450
and now, oh, it's less than or equal to.

00:27:11.450 --> 00:27:13.030
Now you see it, right?

00:27:13.030 --> 00:27:15.940
Again, trivial example,
but just by using printf,

00:27:15.940 --> 00:27:18.910
you can see inside of
the computer's memory

00:27:18.910 --> 00:27:21.130
by just printing stuff out like this.
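
NOTE
A sketch of the temporary diagnostic version described above, assuming the extra printf sits just before the line that prints the hash. The wrapper name print_column_traced and its return value (the last value i held inside the loop) are additions for illustration.
```c
#include <stdio.h>
// Hypothetical wrapper; returns the last value of i seen inside the loop.
int print_column_traced(void)
{
    int last_i = -1;
    for (int i = 0; i <= 3; i++)
    {
        printf("i is %i\n", i);  // temporary line, deleted once the bug is found
        printf("#\n");
        last_i = i;
    }
    return last_i;
}
```
Seeing "i is 3" followed by one more hash is what exposes the extra iteration.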

00:27:21.130 --> 00:27:25.770
And now, once you've figured it out, oh,
so this should probably be less than 3,

00:27:25.770 --> 00:27:28.140
or I should start
counting from 1, there's

00:27:28.140 --> 00:27:29.640
any number of ways I could fix this.

00:27:29.640 --> 00:27:32.655
But the most conventional is
probably just to say less than 3.

00:27:32.655 --> 00:27:39.180
Now, I can delete my temporary print
statement, rerun make buggy, ./buggy.

00:27:39.180 --> 00:27:41.790
And, voila, problem solved.
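
NOTE
A sketch of the corrected loop, using less than in place of less than or equal to. The lecture hard-codes 3. Taking the height as a parameter, and the name print_column, are assumptions made here so the count can be checked for more than one value.
```c
#include <stdio.h>
// Hypothetical helper: prints `height` bricks, one per line, and returns
// how many rows were printed.
int print_column(int height)
{
    int rows = 0;
    // Counting from 0 up to, but not including, height gives exactly
    // `height` iterations.
    for (int i = 0; i < height; i++)
    {
        printf("#\n");
        rows++;
    }
    return rows;
}
```
Starting the loop at 1 and keeping less than or equal to would print the same column. Counting from 0 with a strict comparison is the more conventional choice mentioned above.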

00:27:41.790 --> 00:27:43.830
All right, and to this day, I do this.

00:27:43.830 --> 00:27:46.860
Whether it's making a command line
application, or a web application,

00:27:46.860 --> 00:27:49.050
or mobile application,
it's very common to use

00:27:49.050 --> 00:27:51.270
printf, or some equivalent
in any language,

00:27:51.270 --> 00:27:55.350
just to poke around and see what's
inside the computer's memory.

00:27:55.350 --> 00:27:58.570
Thankfully, there's more
sophisticated tools than this.

00:27:58.570 --> 00:28:00.930
Let me go ahead and
reintroduce the bug here.

00:28:00.930 --> 00:28:04.620
And let me reopen my
sidebar at left here.

00:28:04.620 --> 00:28:08.550
Let me now recompile the code
to make sure it's current.

00:28:08.550 --> 00:28:11.310
And I'm going to run a
command called debug50.

00:28:11.310 --> 00:28:15.090
Which is a command that's
representative of a type of program

00:28:15.090 --> 00:28:16.740
known as a debugger.

00:28:16.740 --> 00:28:19.680
And this debugger is
actually built into VS Code.

00:28:19.680 --> 00:28:23.700
And all debug50 is doing for us is
automating the process of starting

00:28:23.700 --> 00:28:25.650
VS Code's built-in debugger.

00:28:25.650 --> 00:28:28.260
So this isn't even a
CS50-specific tool, we've

00:28:28.260 --> 00:28:31.170
just given you a debug50
command to make it easier

00:28:31.170 --> 00:28:32.855
to start it up from the get-go.

00:28:32.855 --> 00:28:37.560
And the way you run this debugger
is you say debug50, space, and then

00:28:37.560 --> 00:28:40.120
the name of the program
that you want to debug.

00:28:40.120 --> 00:28:42.210
So, in this case, ./buggy.

00:28:42.210 --> 00:28:44.010
So you don't mention your .c file.

00:28:44.010 --> 00:28:46.650
You mention your already-compiled code.

00:28:46.650 --> 00:28:52.230
And what this debugger is going
to let me do is, most powerfully,

00:28:52.230 --> 00:28:54.930
walk through my code step-by-step.

00:28:54.930 --> 00:28:58.930
Because every program we've written
thus far, runs from start to finish,

00:28:58.930 --> 00:29:02.325
even if I'm not done thinking
through each step at a time.

00:29:02.325 --> 00:29:05.850
With a debugger, I can
actually click on a line number

00:29:05.850 --> 00:29:09.180
and say pause execution
here, and the debugger

00:29:09.180 --> 00:29:14.130
will let me walk through my code one
step at a time, one second at a time,

00:29:14.130 --> 00:29:16.740
one minute at a time,
at my own human pace.

00:29:16.740 --> 00:29:19.470
Which is super compelling when
the programs get more complicated

00:29:19.470 --> 00:29:22.600
and they might, otherwise,
fly by on the screen.

00:29:22.600 --> 00:29:25.860
So I'm going to click
to the left of line 5.

00:29:25.860 --> 00:29:27.970
And notice that these
little red dots appear.

00:29:27.970 --> 00:29:31.290
And if I click on one it
stays, and gets even redder.

00:29:31.290 --> 00:29:34.230
And I'm going to run debug50 on ./buggy.

00:29:34.230 --> 00:29:39.090
And in just a moment, you'll see that a
new panel opens on the left hand side.

00:29:39.090 --> 00:29:41.910
It's doing some
configuration of the screen.

00:29:41.910 --> 00:29:46.690
Let me zoom out a little bit here so
we can see more on the screen at once.

00:29:46.690 --> 00:29:50.440
And sometimes, you'll see in VS
Code that debug console opens up,

00:29:50.440 --> 00:29:54.480
which looks very cryptic, just go back
to terminal window if that happens.

00:29:54.480 --> 00:29:57.875
Because at the terminal window is where
you can still interact with your code.

00:29:57.875 --> 00:30:00.120
And let's now take a
look at what's going on.

00:30:00.120 --> 00:30:04.650
If I zoom in on my
buggy.c code here, you'll

00:30:04.650 --> 00:30:10.890
notice that we have the same program
as before, but highlighted in yellow

00:30:10.890 --> 00:30:11.820
is line 5.

00:30:11.820 --> 00:30:15.660
Not a coincidence, that's the line
I set a so-called breakpoint at.

00:30:15.660 --> 00:30:20.400
The little red dot means break
here, pause execution here.

00:30:20.400 --> 00:30:23.716
And the yellow line has
not yet been executed.

00:30:23.716 --> 00:30:27.600
But if I, now, at the top of my
screen, notice these little arrows.

00:30:27.600 --> 00:30:28.750
There's one for Play.

00:30:28.750 --> 00:30:30.750
There's one for this,
which, if I hover over it,

00:30:30.750 --> 00:30:34.140
says Step Over, there's another
that's going to say Step Into,

00:30:34.140 --> 00:30:35.820
there's a third that says Step Out.

00:30:35.820 --> 00:30:38.520
I'm just going to use the
first of these, Step Over.

00:30:38.520 --> 00:30:41.580
And I'm going to do this, and
you'll see that the yellow highlight

00:30:41.580 --> 00:30:45.660
moved from line 5 to line
7 because now it's ready,

00:30:45.660 --> 00:30:47.955
but hasn't yet printed out that hash.

00:30:47.955 --> 00:30:51.817
But the most powerful thing here,
notice, is that top left here.

00:30:51.817 --> 00:30:54.150
It's a little cryptic, because
there's a bunch of things

00:30:54.150 --> 00:30:56.910
going on that will make more
sense over time, but at the top

00:30:56.910 --> 00:30:58.470
there's a section called variables.

00:30:58.470 --> 00:31:00.750
Below that, something
called locals, which means

00:31:00.750 --> 00:31:02.820
local to my current function, main.

00:31:02.820 --> 00:31:07.410
And notice, there's my variable
called i, and its current value is 0.

00:31:07.410 --> 00:31:12.810
So now, once I click Step Over
again, watch what happens.

00:31:12.810 --> 00:31:15.660
We go from line 7 back to line 5.

00:31:15.660 --> 00:31:19.455
But look in the terminal window,
one of the hashes has printed.

00:31:19.455 --> 00:31:22.050
But now, it's printed at my own pace.

00:31:22.050 --> 00:31:24.030
I can think through this step-by-step.

00:31:24.030 --> 00:31:26.340
Notice that i has not changed, yet.

00:31:26.340 --> 00:31:29.700
It's still 0 because the yellow
highlighted line hasn't yet executed.

00:31:29.700 --> 00:31:34.140
But the moment I click Step Over,
it's going to execute line 5.

00:31:34.140 --> 00:31:41.010
Now, notice at top left, i has become
1, and nothing has printed, yet,

00:31:41.010 --> 00:31:43.290
because now, highlighted is line 7.

00:31:43.290 --> 00:31:48.000
So if I click Step Over
again, we'll see the hash.

00:31:48.000 --> 00:31:51.930
If I repeat this process at my
own human, comfortable pace,

00:31:51.930 --> 00:31:57.040
I can see my variables changing, I
can see output changing on the screen,

00:31:57.040 --> 00:31:59.902
and I can just think about
should that have just happened.

00:31:59.902 --> 00:32:01.860
I can pause and give
thought to what's actually

00:32:01.860 --> 00:32:06.240
going on without trying to race the
computer and figure it all out at once.

00:32:06.240 --> 00:32:08.490
I'm going to go ahead and
stop here because we already

00:32:08.490 --> 00:32:11.430
know what this particular problem
is, and that brings me back

00:32:11.430 --> 00:32:12.720
to my default terminal window.

00:32:12.720 --> 00:32:16.180
But this debugger, let me
disable the breakpoint now

00:32:16.180 --> 00:32:18.570
so it doesn't keep
breaking, this debugger

00:32:18.570 --> 00:32:20.760
will be your friend
moving forward in order

00:32:20.760 --> 00:32:25.290
to step through your code step-by-step,
at your own pace to figure out

00:32:25.290 --> 00:32:26.820
where something has gone wrong.

00:32:26.820 --> 00:32:30.397
Printf is great, but it gets annoying if
you have to constantly add print this,

00:32:30.397 --> 00:32:33.480
print this, print this, print this,
recompile, rerun it, oh wait a minute,

00:32:33.480 --> 00:32:34.980
print this, print this.

00:32:34.980 --> 00:32:39.780
The debugger lets you do the
equivalent, but automatically.

00:32:39.780 --> 00:32:45.960
Questions on this debugger, which you'll
see all the more hands-on over time?

00:32:45.960 --> 00:32:47.430
Questions on debugger?

00:32:47.430 --> 00:32:48.554
Yeah?

00:32:48.554 --> 00:32:50.560
AUDIENCE: You were using
a Step Over feature.

00:32:50.560 --> 00:32:53.303
What do the other
features in the debugger--

00:32:53.303 --> 00:32:54.720
DAVID MALAN: Really good question.

00:32:54.720 --> 00:32:57.720
We'll see this before long, but those
other buttons that I glossed over,

00:32:57.720 --> 00:33:02.460
Step Into and Step Out, actually
let you step into specific functions

00:33:02.460 --> 00:33:04.200
if I had any more than main.

00:33:04.200 --> 00:33:06.960
So if main called a
function called something,

00:33:06.960 --> 00:33:10.380
and something called a function
called something else, instead of just

00:33:10.380 --> 00:33:14.730
stepping over the entire execution of
that function, I could step into it

00:33:14.730 --> 00:33:17.105
and walk through its
lines of code one by one.
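
NOTE
A small sketch of where Step Over and Step Into diverge. Both functions, print_row and print_square, are hypothetical and not from the lecture. print_square plays the role that main would play in a real program.
```c
#include <stdio.h>
// Hypothetical helper: prints one row of `width` bricks, returns the width.
int print_row(int width)
{
    // Step Into on the call in print_square pauses here, inside the helper.
    for (int i = 0; i < width; i++)
    {
        printf("#");
    }
    printf("\n");
    return width;
}
// Hypothetical caller standing in for main; returns the total bricks printed.
int print_square(int size)
{
    int bricks = 0;
    for (int row = 0; row < size; row++)
    {
        // With execution paused on the next line:
        //   Step Over runs all of print_row, then pauses again in print_square.
        //   Step Into pauses on the first line inside print_row instead.
        bricks += print_row(size);
    }
    return bricks;
}
```
A breakpoint could also be set directly inside print_row to skip the caller entirely, which is the second option described above.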

00:33:17.105 --> 00:33:19.020
So any time you have
a problem set you're

00:33:19.020 --> 00:33:22.140
working on that has multiple functions,
you can set a breakpoint in main,

00:33:22.140 --> 00:33:26.250
if you want, or you can set it inside
of one of your additional functions

00:33:26.250 --> 00:33:29.130
to focus your attention only on that.

00:33:29.130 --> 00:33:32.640
And we'll see examples
of that over time.

00:33:32.640 --> 00:33:33.780
All right, so what else?

00:33:33.780 --> 00:33:38.100
And what's the sort of, elephant
in the room, so to speak,

00:33:38.100 --> 00:33:39.750
is actually a duck in this case.

00:33:39.750 --> 00:33:42.160
Why is there this duck and
all of these ducks here?

00:33:42.160 --> 00:33:46.440
Well, it turns out, a third, genuinely
recommended, debugging technique

00:33:46.440 --> 00:33:50.055
is talking through problems, talking
through code with someone else.

00:33:50.055 --> 00:33:52.620
Now, in the absence of having
a family member, or a friend,

00:33:52.620 --> 00:33:56.520
or a roommate who actually wants to
hear you talk about code, of all things,

00:33:56.520 --> 00:34:01.320
generally, programmers turn to a
rubber duck, or other inanimate objects

00:34:01.320 --> 00:34:03.360
if something animate is not available.

00:34:03.360 --> 00:34:06.760
The idea behind rubber duck
debugging, so to speak,

00:34:06.760 --> 00:34:12.750
is that simply by looking at your code
and talking it through, OK, on line 3,

00:34:12.750 --> 00:34:17.040
I'm starting a for loop and
I'm initializing i to 0.

00:34:17.040 --> 00:34:18.990
OK, then, I'm printing out a hash.

00:34:18.990 --> 00:34:24.112
Just by talking through your
code, step-by-step, invariably,

00:34:24.112 --> 00:34:26.820
finds you having the proverbial
light bulb go off over your head,

00:34:26.820 --> 00:34:29.040
because you realize, wait a minute
I just said something stupid,

00:34:29.040 --> 00:34:30.510
or I just said something wrong.

00:34:30.510 --> 00:34:34.500
And this is really just a proxy for any
other human, teaching fellow, teacher

00:34:34.500 --> 00:34:36.060
or friend, colleague.

00:34:36.060 --> 00:34:38.440
But in the absence of any
of those people in the room,

00:34:38.440 --> 00:34:40.357
you're welcome to take,
on your way out today,

00:34:40.357 --> 00:34:44.280
one of these little rubber ducks and
consider using it, for real, any time

00:34:44.280 --> 00:34:47.820
you want to talk through one
of your problems in CS50,

00:34:47.820 --> 00:34:49.140
or maybe life more generally.

00:34:49.140 --> 00:34:51.480
But having it there on
your desk is just a way

00:34:51.480 --> 00:34:55.140
to help you hear illogic
in what you think

00:34:55.140 --> 00:34:57.790
might, otherwise, be logical code.

00:34:57.790 --> 00:35:02.400
So printf, debugging, rubber-duck
debugging are just three of the ways,

00:35:02.400 --> 00:35:05.207
you'll see over time, to
get to the source of code

00:35:05.207 --> 00:35:06.790
that you will write that has mistakes.

00:35:06.790 --> 00:35:08.880
Which is going to happen,
but it will empower you

00:35:08.880 --> 00:35:12.000
all the more to solve those mistakes.

00:35:12.000 --> 00:35:17.440
All right, any questions on debugging,
in general, or these three techniques?

00:35:17.440 --> 00:35:17.940
Yeah?

00:35:17.940 --> 00:35:19.740
AUDIENCE: [INAUDIBLE]

00:35:19.740 --> 00:35:22.650
DAVID MALAN: What's the difference
between Step Over and Step Into?

00:35:22.650 --> 00:35:25.980
At the moment, the only one that's
applicable to the code I just wrote

00:35:25.980 --> 00:35:29.340
is Step Over, because it means
step over each line of code.

00:35:29.340 --> 00:35:34.050
If, though, I had other functions
that I had written in this program,

00:35:34.050 --> 00:35:39.300
maybe lower down in the file, I
could step into those function calls

00:35:39.300 --> 00:35:41.469
and walk through them one at a time.

00:35:41.469 --> 00:35:43.650
So we'll come back to this
with an actual example,

00:35:43.650 --> 00:35:46.230
but Step Into will allow
me to do exactly that.

00:35:46.230 --> 00:35:49.210
In fact, this is a perfect segue to
doing a little something like this.

00:35:49.210 --> 00:35:51.632
Let me go ahead and open
up another file here.

00:35:51.632 --> 00:35:53.340
And, actually, we'll
use the same, buggy.

00:35:53.340 --> 00:35:56.320
And we're going to write one
other thing that's buggy, as well.

00:35:56.320 --> 00:36:00.000
Let me go up here and
include, as before, cs50.h.

00:36:00.000 --> 00:36:03.780
Let me include stdio.h.

00:36:03.780 --> 00:36:05.520
Let me do int main(void).

00:36:05.520 --> 00:36:08.050
So all of this, I think,
is correct, so far.

00:36:08.050 --> 00:36:11.280
And let's do this, let's
give myself an int called i,

00:36:11.280 --> 00:36:14.530
and let's ask the user
for a negative integer.

00:36:14.530 --> 00:36:17.300
This is not a function that
exists, technically, yet.

00:36:17.300 --> 00:36:20.050
But I'm going to assume, for the
sake of discussion, that it does.

00:36:20.050 --> 00:36:23.700
Then, I'm just going to print
out, with %i and a new line,

00:36:23.700 --> 00:36:25.360
whatever the human typed in.

00:36:25.360 --> 00:36:28.320
So at this point in the story,
my program, I think, is correct.

00:36:28.320 --> 00:36:30.930
Except for the fact that
get_negative_int is not

00:36:30.930 --> 00:36:33.690
a function in the CS50
library or anywhere else.

00:36:33.690 --> 00:36:35.460
I'm going to need to invent it myself.

00:36:35.460 --> 00:36:41.310
So suppose, in this case, that I declare
a function called get_negative_int.

00:36:41.310 --> 00:36:45.630
Its return type, so to speak, should
be int, because, as its name suggests,

00:36:45.630 --> 00:36:48.360
I want to hand the user back
an integer, and it's going

00:36:48.360 --> 00:36:50.310
to take no input to keep it simple.

00:36:50.310 --> 00:36:51.810
So I'm just going to say void there.

00:36:51.810 --> 00:36:54.810
No inputs, no special
prompts, nothing like that.

00:36:54.810 --> 00:36:57.600
Let me, now, give myself
some curly braces.

00:36:57.600 --> 00:37:00.510
And let me do something familiar,
perhaps, from problem set 1.

00:37:00.510 --> 00:37:05.550
Let me give myself a variable,
like n, and let me do the following

00:37:05.550 --> 00:37:07.320
within this block of code.

00:37:07.320 --> 00:37:13.590
Assign n the value of get_int, asking
the user for a negative integer using

00:37:13.590 --> 00:37:14.850
get_int's own prompt.

00:37:14.850 --> 00:37:18.750
And I want to do this while
n is less than 0, because I

00:37:18.750 --> 00:37:20.390
want to get a negative from the user.

00:37:20.390 --> 00:37:24.140
And recall, from having
used this block in the past,

00:37:24.140 --> 00:37:27.770
I can now return n as the
very last step to hand back

00:37:27.770 --> 00:37:31.790
whatever the user has typed in, so
long as they cooperated and gave me

00:37:31.790 --> 00:37:33.750
an actual negative integer.
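
NOTE
A sketch of the function as described, bug included. Assumptions: to run without the CS50 library, a scripted array of values stands in for what the user would type at get_int, and the extra "used < count" test exists only so the sketch stops when the script runs out. The name get_negative_int_buggy and the prompt text in the comment are illustrative.
```c
// Hypothetical stand-in for the lecture's get_negative_int. `inputs` holds
// the values a user would type, in order; `count` must be at least 1.
int get_negative_int_buggy(const int *inputs, int count)
{
    int n;
    int used = 0;
    do
    {
        // In the lecture this line is roughly: n = get_int("Negative Integer: ");
        n = inputs[used];
        used++;
    }
    while (n < 0 && used < count);  // bug: keeps asking while n IS negative
    return n;
}
```
With the scripted inputs -50, -5 and 0, the loop rejects both negative values and returns 0, which matches the backwards behavior seen when the program is run.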

00:37:33.750 --> 00:37:36.710
Now, I've deliberately
made a mistake here,

00:37:36.710 --> 00:37:39.080
and it's a subtle,
silly, mathematical one,

00:37:39.080 --> 00:37:43.910
but let me compile this program after
copying the prototype up to the top,

00:37:43.910 --> 00:37:45.380
so I don't make that mistake again.

00:37:45.380 --> 00:37:48.470
Let me do make buggy, Enter.

00:37:48.470 --> 00:37:50.720
And now, let me do ./buggy.

00:37:50.720 --> 00:37:54.020
I'll give it a negative
integer, like negative 50.

00:37:54.020 --> 00:37:55.370
Uh-huh.

00:37:55.370 --> 00:37:59.330
That did not take.

00:37:59.330 --> 00:38:00.860
How about negative 5?

00:38:00.860 --> 00:38:02.060
No.

00:38:02.060 --> 00:38:04.500
How about 0?

00:38:04.500 --> 00:38:05.000
All right.

00:38:05.000 --> 00:38:09.080
So it's, clearly, working backwards,
or incorrectly here, logically.

00:38:09.080 --> 00:38:10.800
So how could I go about debugging this?

00:38:10.800 --> 00:38:12.425
Well, I could do what I've done before.

00:38:12.425 --> 00:38:18.920
I could use my printf technique and
say something explicit like n is %i,

00:38:18.920 --> 00:38:25.310
new line, comma n, just to print
it out, let me recompile buggy,

00:38:25.310 --> 00:38:28.640
let me rerun buggy, let
me type in negative 50.

00:38:28.640 --> 00:38:30.630
OK, n is negative 50.

00:38:30.630 --> 00:38:33.173
So that didn't really
help me at this point,

00:38:33.173 --> 00:38:34.590
because that's the same as before.

00:38:34.590 --> 00:38:38.030
So let me do this, debug50, ./buggy.

00:38:38.030 --> 00:38:39.870
Oh, but I've made a mistake.

00:38:39.870 --> 00:38:41.700
So I didn't set my breakpoint, yet.

00:38:41.700 --> 00:38:44.930
So let me do this, and I'll
set a breakpoint this time.

00:38:44.930 --> 00:38:47.330
I could set it here, on line 8.

00:38:47.330 --> 00:38:49.340
Let's do it in main, as before.

00:38:49.340 --> 00:38:51.530
Let me rerun debug50, now.

00:38:51.530 --> 00:38:52.970
On ./buggy.

00:38:52.970 --> 00:38:55.190
That fancy user interface
is going to pop up.

00:38:55.190 --> 00:38:58.310
It's going to highlight the line
that I set the breakpoint on.

00:38:58.310 --> 00:39:01.250
Notice that, on the left
hand side of the screen,

00:39:01.250 --> 00:39:04.650
i is defaulting, at the moment, to 0,
because I haven't typed anything in,

00:39:04.650 --> 00:39:05.150
yet.

00:39:05.150 --> 00:39:10.815
But let me, now, Step Over this
line that's highlighted in yellow,

00:39:10.815 --> 00:39:12.440
and you'll see that I'm being prompted.

00:39:12.440 --> 00:39:16.220
So let's type in my negative 50, Enter.

00:39:16.220 --> 00:39:21.470
Notice now that I'm
stuck in that function.

00:39:21.470 --> 00:39:22.250
All right.

00:39:22.250 --> 00:39:26.520
So clearly, the issue seems to be
in my get_negative_int function.

00:39:26.520 --> 00:39:30.120
So, OK, let me stop this execution.

00:39:30.120 --> 00:39:33.175
My problem doesn't seem to be in
main, per se, maybe it's down here.

00:39:33.175 --> 00:39:33.800
So that's fine.

00:39:33.800 --> 00:39:35.990
Let me set my same breakpoint at line 8.

00:39:35.990 --> 00:39:38.510
Let me rerun debug50 one more time.

00:39:38.510 --> 00:39:43.110
But this time, instead of just stepping
over that line, let's step into it.

00:39:43.110 --> 00:39:45.410
So notice line 8 is, again,
highlighted in yellow.

00:39:45.410 --> 00:39:47.690
In the past I've been
clicking Step Over.

00:39:47.690 --> 00:39:50.180
Let's click Step Into, now.

00:39:50.180 --> 00:39:53.480
When I click Step Into,
boom, now, the debugger

00:39:53.480 --> 00:39:56.390
jumps into that specific function.

00:39:56.390 --> 00:39:59.330
Now, I can step through these
lines of code, again and again.

00:39:59.330 --> 00:40:01.700
I can see what the value of
n is as I'm typing it in.

00:40:01.700 --> 00:40:03.500
I can think through my logic, and voila.

00:40:03.500 --> 00:40:07.640
Hopefully, once I've solved the issue,
I can exit the debugger, fix my code,

00:40:07.640 --> 00:40:09.180
and move on.

00:40:09.180 --> 00:40:12.050
So Step Over just goes over
the line, but executes it;

00:40:12.050 --> 00:40:17.210
Step Into lets you go into
other functions you've written.

00:40:17.210 --> 00:40:19.400
So let's go ahead and do this.

00:40:19.400 --> 00:40:23.550
We've got a bunch of
possible approaches that we

00:40:23.550 --> 00:40:25.550
can take to solving some
problems. Let's go ahead

00:40:25.550 --> 00:40:26.730
and pace ourselves today, though.

00:40:26.730 --> 00:40:27.900
Let's take a five-minute break, here.

00:40:27.900 --> 00:40:30.688
And when we come back, we'll take
a look at that computer's memory

00:40:30.688 --> 00:40:31.730
we've been talking about.

00:40:31.730 --> 00:40:32.950
See you in five.

00:40:32.950 --> 00:40:36.380
All right.

00:40:36.380 --> 00:40:41.000
So let's dive back in.

00:40:41.000 --> 00:40:46.860
Up until now, both, by way of week 1
and problem set 1, for the most part,

00:40:46.860 --> 00:40:50.660
we've just translated from Scratch into
C all of these basic building blocks,

00:40:50.660 --> 00:40:53.700
like loops and conditionals,
Boolean expressions, variables.

00:40:53.700 --> 00:40:54.950
So sort of, more of the same.

00:40:54.950 --> 00:40:58.430
But there are features in C that
we've stumbled across already,

00:40:58.430 --> 00:41:02.300
like data types, the types of variables
that don't exist in Scratch,

00:41:02.300 --> 00:41:04.450
but that, in fact, do
exist in other languages.

00:41:04.450 --> 00:41:06.200
In fact, a few that
we'll see before long.

00:41:06.200 --> 00:41:10.670
So to summarize the types we saw last
week, recall this little list here.

00:41:10.670 --> 00:41:15.050
We had ints, and floats, and
longs, and doubles, and chars,

00:41:15.050 --> 00:41:18.510
there's also bools and also string,
which we've seen a few times.

00:41:18.510 --> 00:41:21.830
But today, let's actually start to
formalize what these things are,

00:41:21.830 --> 00:41:25.760
and actually what your Mac and PC
are doing when you manipulate bits

00:41:25.760 --> 00:41:29.170
as an int versus a char, versus
a string, versus something else.

00:41:29.170 --> 00:41:31.920
And see if we can't put more tools
into your toolkit, so to speak,

00:41:31.920 --> 00:41:35.630
so we can start quickly writing
more featureful, more sophisticated

00:41:35.630 --> 00:41:36.800
programs in C.

00:41:36.800 --> 00:41:40.640
So it turns out, that on
most systems nowadays,

00:41:40.640 --> 00:41:43.010
though this can vary by
actual computer, this

00:41:43.010 --> 00:41:46.040
is how large each of the
data types, typically,

00:41:46.040 --> 00:41:51.590
is in C. When you store a Boolean value,
a 0 or 1, a true or false,

00:41:51.590 --> 00:41:52.850
it actually uses 1 byte.

00:41:52.850 --> 00:41:55.100
That's a little excessive,
because, strictly speaking,

00:41:55.100 --> 00:41:58.580
you only need 1 bit,
which is 1/8 of this size.

00:41:58.580 --> 00:42:01.190
But for simplicity,
computers use a whole byte

00:42:01.190 --> 00:42:03.740
to represent a bool, true or false.

00:42:03.740 --> 00:42:08.040
A char, we saw last week,
is only 1 byte, or 8 bits.

00:42:08.040 --> 00:42:12.950
And this is why ASCII, which uses 1
byte, or technically, only 7 bits early

00:42:12.950 --> 00:42:17.600
on, was confined to only 256
maximally possible characters.

00:42:17.600 --> 00:42:21.940
Notice that an int is
4 bytes, or 32 bits.

00:42:21.940 --> 00:42:24.580
A float is also 4 bytes or 32 bits.

00:42:24.580 --> 00:42:27.850
But the thing that we call long,
it's, literally, twice as long,

00:42:27.850 --> 00:42:29.710
8 bytes or 64 bits.

00:42:29.710 --> 00:42:30.430
So is a double.

00:42:30.430 --> 00:42:33.900
A double is 64 bits of precision
for floating point values.

00:42:33.900 --> 00:42:37.215
And a string, for today, we're
going to leave as a question mark.

00:42:37.215 --> 00:42:39.340
We'll come back to that,
later today and next week,

00:42:39.340 --> 00:42:42.520
as to how much space a string
takes up, but, suffice it to say,

00:42:42.520 --> 00:42:45.488
it's going to take up a
variable amount of space,

00:42:45.488 --> 00:42:47.530
depending on whether the
string is short or long.

00:42:47.530 --> 00:42:50.470
But we'll see exactly what
that means, before long.

00:42:50.470 --> 00:42:55.030
So here's a photograph of
a typical piece of memory

00:42:55.030 --> 00:42:57.760
inside of your Mac, or PC, or phone.

00:42:57.760 --> 00:43:00.160
Odds are, it might be a little
smaller in some devices.

00:43:00.160 --> 00:43:02.950
This is known as RAM,
or random access memory.

00:43:02.950 --> 00:43:05.410
Each of these little black
chips on this circuit

00:43:05.410 --> 00:43:07.720
board, the green thing,
these little black chips

00:43:07.720 --> 00:43:10.630
are where 0s and 1s are actually stored.

00:43:10.630 --> 00:43:12.670
Each of those stores
some number of bytes.

00:43:12.670 --> 00:43:15.130
Maybe megabytes, maybe
even gigabytes, nowadays.

00:43:15.130 --> 00:43:21.430
So let's focus on one of those chips,
to give us a zoomed in version, thereof.

00:43:21.430 --> 00:43:25.390
Let's consider the fact that, even
though we don't have to care, exactly,

00:43:25.390 --> 00:43:29.470
how this kind of thing is made, if
this is, like, 1 gigabyte of memory,

00:43:29.470 --> 00:43:31.930
for the sake of discussion,
it stands to reason that,

00:43:31.930 --> 00:43:35.830
if this thing is storing 1
billion bytes, 1 gigabyte,

00:43:35.830 --> 00:43:38.110
then we can number them, arbitrarily.

00:43:38.110 --> 00:43:41.590
Maybe this will be byte
0, 1, 2, 3, 4, 5, 6, 7, 8.

00:43:41.590 --> 00:43:45.000
Then, maybe, way down here in the bottom
right corner is byte number 1 billion.

00:43:45.000 --> 00:43:48.760
We can just number these things,
as might be our convention.

00:43:48.760 --> 00:43:50.710
Let's draw that graphically.

00:43:50.710 --> 00:43:53.090
Not with a billion squares,
but fewer than those.

00:43:53.090 --> 00:43:55.410
And let's zoom in further,
and consider that.

00:43:55.410 --> 00:43:57.160
At this point in the
story, let's abstract

00:43:57.160 --> 00:43:59.380
away all the hardware,
and all the little wires,

00:43:59.380 --> 00:44:03.730
and just think of memory as taking
up-- or, rather, just think of data

00:44:03.730 --> 00:44:06.170
as taking up some number of bytes.

00:44:06.170 --> 00:44:09.820
So, for instance, if you were to store
a char in a computer's memory, which

00:44:09.820 --> 00:44:14.230
was 1 byte, it might be stored
at this top left-hand location

00:44:14.230 --> 00:44:16.195
of this black chip of memory.

00:44:16.195 --> 00:44:20.290
If you were to store something like
an integer that uses 4 bytes, well,

00:44:20.290 --> 00:44:23.560
it might use four of those bytes,
but they're going to be contiguous

00:44:23.560 --> 00:44:25.220
back-to-back-to-back, in this case.

00:44:25.220 --> 00:44:29.270
If you were to store a long or a double,
you might, actually, need 8 bytes.

00:44:29.270 --> 00:44:31.390
So I'm filling in these
squares to represent

00:44:31.390 --> 00:44:36.160
how much memory any given variable
of some data type would take up.

00:44:36.160 --> 00:44:39.230
1, or 4, or 8, in this case, here.

00:44:39.230 --> 00:44:42.160
Well, from here, let's abstract
away from all of the hardware

00:44:42.160 --> 00:44:44.320
and really focus on
memory as being a grid.

00:44:44.320 --> 00:44:47.650
Or, really, like a canvas that
we can paint any types of data

00:44:47.650 --> 00:44:48.850
onto that we want.

00:44:48.850 --> 00:44:52.600
At the end of the day, all of this
data is just going to be 0s and 1s.

00:44:52.600 --> 00:44:56.500
But it's up to you and I to build
abstractions on top of that.

00:44:56.500 --> 00:45:00.130
Things like actual numbers,
colors, images, movies, and beyond.

00:45:00.130 --> 00:45:02.440
But we'll start
lower-level, here, first.

00:45:02.440 --> 00:45:05.950
Suppose I had a program
that needs three integers.

00:45:05.950 --> 00:45:08.800
A simple program whose purpose
in life is to average your three

00:45:08.800 --> 00:45:12.400
scores on an exam, or some such thing.

00:45:12.400 --> 00:45:17.020
Suppose that your three scores were
these, 72, 73, not too bad, and 33,

00:45:17.020 --> 00:45:18.145
which is particularly low.

00:45:18.145 --> 00:45:23.030
Let's write a program that does
this kind of averaging for us.

00:45:23.030 --> 00:45:24.860
Let me go back to VS Code, here.

00:45:24.860 --> 00:45:28.270
Let me open up a file called scores.c.

00:45:28.270 --> 00:45:30.830
Let me implement this as follows.

00:45:30.830 --> 00:45:35.860
Let me include stdio.h at the
top, int main(void) as before.

00:45:35.860 --> 00:45:41.320
Then, inside of main, let me
declare score 1, which is 72.

00:45:41.320 --> 00:45:43.990
Give me another score, 73.

00:45:43.990 --> 00:45:47.140
Then, a third score, called
score 3, which is going to be 33.

00:45:47.140 --> 00:45:50.740
Now, I'm going to use printf to print
out the average of those things,

00:45:50.740 --> 00:45:52.520
and I can do this in
a few different ways.

00:45:52.520 --> 00:45:57.850
But I'm going to print out %f, and
I'm going to do score 1, plus score 2,

00:45:57.850 --> 00:46:03.760
plus score 3, divided by 3,
close parentheses semicolon.

00:46:03.760 --> 00:46:07.300
Some relatively simple arithmetic to
compute the average of three scores,

00:46:07.300 --> 00:46:10.570
if I'm curious what my average grade
is in the class with these three

00:46:10.570 --> 00:46:11.620
assessments.

00:46:11.620 --> 00:46:15.616
Let me, now, do make scores.

00:46:15.616 --> 00:46:19.240
All right, so I've somehow
made an error already.

00:46:19.240 --> 00:46:25.150
But this one is, actually, germane
to a problem we, hopefully,

00:46:25.150 --> 00:46:26.860
won't encounter too frequently.

00:46:26.860 --> 00:46:27.860
What's going on here?

00:46:27.860 --> 00:46:31.360
So underlined is score 1, plus
score 2, plus score 3, divided by 3.

00:46:31.360 --> 00:46:36.250
Format specifies type double, but
the argument has type int, well,

00:46:36.250 --> 00:46:38.530
what's going on here?

00:46:38.530 --> 00:46:40.430
Because the arithmetic
seems to check out.

00:46:40.430 --> 00:46:40.930
Yeah?

00:46:40.930 --> 00:46:44.560
AUDIENCE: So the computer is doing the
math, but they basically [INAUDIBLE]

00:46:44.560 --> 00:46:49.260
just gives out a value at the
end because, well [INAUDIBLE]

00:46:49.260 --> 00:46:50.210
DAVID MALAN: Correct.

00:46:50.210 --> 00:46:51.640
And we'll come back to
this in more detail,

00:46:51.640 --> 00:46:54.522
but, indeed, what's happening here
is I'm adding three ints together,

00:46:54.522 --> 00:46:56.480
obviously, because I
define them right up here.

00:46:56.480 --> 00:46:59.470
And I'm dividing by another
int, 3, but the catch

00:46:59.470 --> 00:47:03.890
is, recall, that C, when it performs math,
treats all of these things as integers.

00:47:03.890 --> 00:47:05.810
But integers are not
floating point values.

00:47:05.810 --> 00:47:08.890
So if you actually want to get a
precise average for your score

00:47:08.890 --> 00:47:12.760
without throwing away the remainder,
everything after the decimal point,

00:47:12.760 --> 00:47:15.430
it turns out, we're going to have to--

00:47:15.430 --> 00:47:17.410
we're going to-- aww--

00:47:17.410 --> 00:47:18.430
we're going to have to--

00:47:18.430 --> 00:47:22.720
[LAUGHTER] we're going to have to
convert this whole expression, somehow,

00:47:22.720 --> 00:47:23.350
to a float.

00:47:23.350 --> 00:47:26.230
And there's a few ways to
do this but the easiest way,

00:47:26.230 --> 00:47:28.540
for now, I'm going to go
ahead and do this up here,

00:47:28.540 --> 00:47:31.360
I'm going to change the
divide by 3 to divide by 3.0.

00:47:31.360 --> 00:47:35.440
Because it turns out, long story short,
in C, so long as one of the values

00:47:35.440 --> 00:47:37.300
participating in an
arithmetic expression

00:47:37.300 --> 00:47:39.730
like this is something
like a float, the rest

00:47:39.730 --> 00:47:44.210
will be treated as promoted to
a floating point value as well.

00:47:44.210 --> 00:47:49.495
So let me, now, recompile this
code with make scores, Enter.

00:47:49.495 --> 00:47:53.500
This time it worked OK, because
I'm treating a float as a float.

00:47:53.500 --> 00:47:55.600
Let me do ./scores, Enter.

00:47:55.600 --> 00:48:00.150
All right, my average is
59.33333 and so forth.

00:48:00.150 --> 00:48:00.650
All right.

00:48:00.650 --> 00:48:03.340
So the math, presumably, checks out.

00:48:03.340 --> 00:48:06.220
Floating point imprecision
per last week aside.

00:48:06.220 --> 00:48:09.280
But let's consider the
design of this program.

00:48:09.280 --> 00:48:16.680
What is, kind of, bad about it, or if
we maintain this program longer term,

00:48:16.680 --> 00:48:19.480
are we going to regret the
design of this program?

00:48:19.480 --> 00:48:20.990
What might not be ideal here?

00:48:20.990 --> 00:48:21.490
Yeah?

00:48:21.490 --> 00:48:30.364
AUDIENCE: [INAUDIBLE]

00:48:30.364 --> 00:48:34.220
DAVID MALAN: Yeah, so in this case,
I have hard coded my three scores.

00:48:34.220 --> 00:48:37.140
So, if I'm hearing you
correctly, this program

00:48:37.140 --> 00:48:39.600
is only ever going to tell
me this specific average.

00:48:39.600 --> 00:48:41.730
I'm not even using
something like, get int

00:48:41.730 --> 00:48:44.790
or get float to get three different
scores, so that's not good.

00:48:44.790 --> 00:48:46.942
And suppose that we wait
later in the semester,

00:48:46.942 --> 00:48:48.400
I think other problems could arise.

00:48:48.400 --> 00:48:48.900
Yeah?

00:48:48.900 --> 00:48:51.020
AUDIENCE: Just thinking
also somewhat of an issue

00:48:51.020 --> 00:48:52.900
that you can't reuse that number.

00:48:52.900 --> 00:48:55.450
DAVID MALAN: I can't
reuse the number because I

00:48:55.450 --> 00:48:59.088
haven't stored the average in some
variable, which in this program, not

00:48:59.088 --> 00:49:01.630
a big deal, but certainly, if
I wanted to reuse it elsewhere,

00:49:01.630 --> 00:49:02.650
that's a problem.

00:49:02.650 --> 00:49:05.025
Let's fast-forward again, a
little later in the semester,

00:49:05.025 --> 00:49:07.390
I don't just have three
test scores or exam scores,

00:49:07.390 --> 00:49:09.430
maybe I have 4, or 5, or 6.

00:49:09.430 --> 00:49:10.690
Where might this take us?

00:49:10.690 --> 00:49:12.301
AUDIENCE: Yeah, if you
ever want to have to take

00:49:12.301 --> 00:49:14.900
the average of any number of
scores other than 3, [INAUDIBLE]

00:49:14.900 --> 00:49:18.110
DAVID MALAN: Yeah, I've sort
of, capped this program at 3.

00:49:18.110 --> 00:49:20.942
And honestly, this is, kind
of, bordering on copy paste.

00:49:20.942 --> 00:49:23.900
Even though the variables, yes, have
different names; score 1, score 2,

00:49:23.900 --> 00:49:24.800
score 3.

00:49:24.800 --> 00:49:27.230
Imagine doing this for a
whole grade book for a class.

00:49:27.230 --> 00:49:32.990
Having to do score 4, 5, 6, 10, 11, 12,
20, 30, that's a lot of variables.

00:49:32.990 --> 00:49:35.420
You can imagine just
how ugly the code starts

00:49:35.420 --> 00:49:38.635
to get if you're just defining variable
after variable, after variable.

00:49:38.635 --> 00:49:42.740
So it turns out, there are
better ways, in languages like C,

00:49:42.740 --> 00:49:47.240
if you want to have multiple
values stored in memory that

00:49:47.240 --> 00:49:49.040
happen to be of the same data type.

00:49:49.040 --> 00:49:50.420
Let's take a look back
at this memory, here,

00:49:50.420 --> 00:49:52.545
to see what these things
might look like in memory.

00:49:52.545 --> 00:49:54.170
Here's that grid of memory.

00:49:54.170 --> 00:49:56.450
Each of these, recall, represents a byte.

00:49:56.450 --> 00:49:59.690
To be clear, if I store
score 1 in memory first,

00:49:59.690 --> 00:50:01.130
how many bytes will it take up?

00:50:01.130 --> 00:50:02.520
AUDIENCE: [INAUDIBLE]

00:50:02.520 --> 00:50:03.650
DAVID MALAN: So 4, a.k.a.

00:50:03.650 --> 00:50:04.430
32 bits.

00:50:04.430 --> 00:50:08.578
So I might draw a score 1 as
filling up this part of the memory.

00:50:08.578 --> 00:50:11.870
It's up to the computer as to whether it
goes here, or down there, or wherever.

00:50:11.870 --> 00:50:15.290
I'm just keeping the pictures clean
for today, from the top-left on down.

00:50:15.290 --> 00:50:18.080
If I, then, declare another
variable, called score 2,

00:50:18.080 --> 00:50:20.730
it might end up over there,
also taking up 4 bytes.

00:50:20.730 --> 00:50:23.330
And then score 3 might end up here.

00:50:23.330 --> 00:50:26.880
So that's just representing what's going
on inside of the computer's memory.

00:50:26.880 --> 00:50:30.680
But technically speaking, to
be clear, per week 0, what's

00:50:30.680 --> 00:50:34.580
really being stored in the computer's
memory, are patterns of 0s and 1s.

00:50:34.580 --> 00:50:39.350
32 total, in this case,
because 32 bits is 4 bytes.

00:50:39.350 --> 00:50:43.280
But again, it gets boring
quickly to think in and look

00:50:43.280 --> 00:50:44.760
at binary all the time.

00:50:44.760 --> 00:50:47.120
So we'll, generally, abstract
this away as just using

00:50:47.120 --> 00:50:49.550
decimal numbers, in this case, instead.

00:50:49.550 --> 00:50:54.170
But there might be a better way to
store, not just three of these things,

00:50:54.170 --> 00:50:57.500
but maybe four, maybe,
five, maybe 10, maybe, more,

00:50:57.500 --> 00:51:03.110
by declaring one variable to store
all of them, instead of 3, or 4, or 5,

00:51:03.110 --> 00:51:05.750
or more individual variables.

00:51:05.750 --> 00:51:10.250
The way to do this is by way
of something known as an array.

00:51:10.250 --> 00:51:18.320
An array is another type of data that
allows you to store multiple values

00:51:18.320 --> 00:51:20.980
of the same type back-to-back-to-back.

00:51:20.980 --> 00:51:22.230
That is to say, contiguously.

00:51:22.230 --> 00:51:29.840
So an array can let you create
memory for one int, or two, or three,

00:51:29.840 --> 00:51:32.600
or even more than
that, but describe them

00:51:32.600 --> 00:51:36.390
all using the same variable
name, the same one name.

00:51:36.390 --> 00:51:40.740
So for instance, if, for one
program, I only need three integers,

00:51:40.740 --> 00:51:45.800
but I don't want to messily declare
them as score 1, score 2, score 3,

00:51:45.800 --> 00:51:46.960
I can do this, instead.

00:51:46.960 --> 00:51:49.130
This is today's first
new piece of syntax,

00:51:49.130 --> 00:51:51.290
the square brackets
that we're now seeing.

00:51:51.290 --> 00:51:57.140
This line of code, here, is
similar to int score 1 semicolon,

00:51:57.140 --> 00:52:00.360
or int score 1 equals 72 semicolon.

00:52:00.360 --> 00:52:05.780
This line of code is declaring for
me, so to speak, an array of size 3.

00:52:05.780 --> 00:52:09.260
And that array is going
to store three integers.

00:52:09.260 --> 00:52:09.770
Why?

00:52:09.770 --> 00:52:14.990
Because the type of that
array is an int, here.

00:52:14.990 --> 00:52:18.110
The square brackets tell the
computer how many ints you want.

00:52:18.110 --> 00:52:18.980
In this case, 3.

00:52:18.980 --> 00:52:21.140
And the name is, of course, scores.

00:52:21.140 --> 00:52:23.540
Which, in English, I've
deliberately pluralized

00:52:23.540 --> 00:52:28.100
so that I can describe this array
as storing multiple scores, indeed.

00:52:28.100 --> 00:52:32.970
So if I want to now assign values
to this variable, called scores,

00:52:32.970 --> 00:52:34.760
I can do code like this.

00:52:34.760 --> 00:52:40.160
I can say, scores bracket 0 equals
72, scores bracket 1 equals 73,

00:52:40.160 --> 00:52:42.190
and scores bracket 2 equals 33.

00:52:42.190 --> 00:52:43.940
The only thing weird
there is, admittedly,

00:52:43.940 --> 00:52:45.830
the square brackets which are still new.

00:52:45.830 --> 00:52:49.820
But we're also, notice,
0 indexing things.

00:52:49.820 --> 00:52:52.345
To zero index means to
start counting at 0.

00:52:52.345 --> 00:52:54.470
When we've talked about
that before, our four loops

00:52:54.470 --> 00:52:56.000
have, generally, been zero indexed.

00:52:56.000 --> 00:52:59.870
Arrays in C are zero indexed.

00:52:59.870 --> 00:53:01.430
And you do not have a choice over that.

00:53:01.430 --> 00:53:04.550
You can't start counting at 1
in arrays because you prefer to,

00:53:04.550 --> 00:53:06.830
you'd be sacrificing
one of the elements.

00:53:06.830 --> 00:53:09.620
You have to start in
arrays counting from 0.

00:53:09.620 --> 00:53:13.130
So out of context, this
doesn't solve a problem,

00:53:13.130 --> 00:53:15.200
but it, definitely, is
going to once we have more

00:53:15.200 --> 00:53:16.910
than, even, three scores here.

00:53:16.910 --> 00:53:19.750
In fact, let me change
this program a little bit.

00:53:19.750 --> 00:53:21.450
Let me go back to VS Code.

00:53:21.450 --> 00:53:24.020
And delete these three lines, here.

00:53:24.020 --> 00:53:27.080
And replace it with a
scores variable that's

00:53:27.080 --> 00:53:30.140
ready to store three total integers.

00:53:30.140 --> 00:53:34.130
And then, initialize them as
follows, scores bracket 0 is 72,

00:53:34.130 --> 00:53:38.300
as before, scores bracket 1 is
going to be 73, scores bracket 2

00:53:38.300 --> 00:53:39.740
is going to be 33.

00:53:39.740 --> 00:53:44.068
Notice, I do not need to say
int before any of these lines,

00:53:44.068 --> 00:53:45.860
because that's been
taken care of, already,

00:53:45.860 --> 00:53:50.570
for me on line 5, where I already
specified that everything in this array

00:53:50.570 --> 00:53:53.330
is going to be an int.

00:53:53.330 --> 00:53:57.020
Now, down here, this code needs
to change because I no longer have

00:53:57.020 --> 00:53:59.300
three variables, score 1, 2, and 3.

00:53:59.300 --> 00:54:03.950
I have 1 variable, but
that I can index into.

00:54:03.950 --> 00:54:08.750
I'm going to, here, then, do scores
bracket 0, plus scores bracket 1,

00:54:08.750 --> 00:54:13.370
plus scores bracket 2, which is
equivalent to what I did earlier,

00:54:13.370 --> 00:54:14.900
giving me back those three integers.

00:54:14.900 --> 00:54:17.860
But notice, I'm using the same
variable name, every time.

00:54:17.860 --> 00:54:21.070
And again, I'm using this new square
bracket notation to, quote-unquote,

00:54:21.070 --> 00:54:26.590
index into the array to get at the first
int, the second int, and the third,

00:54:26.590 --> 00:54:28.840
and then, to do it again down here.

00:54:28.840 --> 00:54:31.907
Now, this program is still not really
solving all the problems we described,

00:54:31.907 --> 00:54:34.240
I still can only store three
scores, but we'll come back

00:54:34.240 --> 00:54:35.930
to something like that before long.

00:54:35.930 --> 00:54:38.950
But for now, we're just introducing
a new syntax and a new feature,

00:54:38.950 --> 00:54:44.980
whereby, I can now store multiple
values in the same variable.

00:54:44.980 --> 00:54:47.110
Well, let's enhance this a bit more.

00:54:47.110 --> 00:54:50.660
Instead of hard coding these scores,
as was identified as a problem,

00:54:50.660 --> 00:54:54.790
let's use get int to ask
the user for a score.

00:54:54.790 --> 00:54:58.330
Let's, then, use get int to
ask the user for another score.

00:54:58.330 --> 00:55:01.540
Let's use get int to ask
the user for a third score,

00:55:01.540 --> 00:55:04.400
storing them in those
respective locations.

00:55:04.400 --> 00:55:09.820
And, now, if I go ahead and save
this program, recompile scores, huh.

00:55:09.820 --> 00:55:10.900
I've messed up, here.

00:55:10.900 --> 00:55:13.990
Now these errors should be
getting a little familiar.

00:55:13.990 --> 00:55:16.750
What mistake did I make?

00:55:16.750 --> 00:55:17.875
Let me give folks a moment.

00:55:17.875 --> 00:55:18.970
AUDIENCE: cs50.h

00:55:18.970 --> 00:55:21.100
DAVID MALAN: cs50.h.

00:55:21.100 --> 00:55:24.220
That was not intentional, so still
making mistakes all these years later.

00:55:24.220 --> 00:55:26.320
I need to include cs50.h.

00:55:26.320 --> 00:55:29.570
Now, I'm going to go back to the bottom
in the terminal window, make scores.

00:55:29.570 --> 00:55:30.070
OK.

00:55:30.070 --> 00:55:31.670
We're back in business, ./scores.

00:55:31.670 --> 00:55:33.920
Now, the program is getting
a little more interesting.

00:55:33.920 --> 00:55:38.020
So maybe, this year was better and I got
a 100, and a 99, and a 98, and there,

00:55:38.020 --> 00:55:40.900
my average is 99.0000.

00:55:40.900 --> 00:55:42.370
So now, it's a little more dynamic.

00:55:42.370 --> 00:55:43.270
It's a little more interesting.

00:55:43.270 --> 00:55:45.978
But it's still capping the number
of scores at three, admittedly.

00:55:45.978 --> 00:55:50.740
But now, I've introduced another,
sort of, symptom of bad programming.

00:55:50.740 --> 00:55:54.108
There's this expression in programming,
too, called code smell, where like--

00:55:54.108 --> 00:55:55.900
[SNIFFS AIR] something
smells a little off.

00:55:55.900 --> 00:56:00.550
And there's something off here in
that I could do better with this code.

00:56:00.550 --> 00:56:05.080
Does anyone see an opportunity to
improve the design of this code, here,

00:56:05.080 --> 00:56:08.230
if my goal, still, is to get three
scores from the user but [SNIFF SNIFF]

00:56:08.230 --> 00:56:10.430
without it smelling [SNIFF] kind of bad?

00:56:10.430 --> 00:56:10.930
Yeah?

00:56:10.930 --> 00:56:12.940
AUDIENCE: [INAUDIBLE] use a for loop?

00:56:12.940 --> 00:56:15.958
That way you don't have to copy
and paste all of those scores.

00:56:15.958 --> 00:56:17.160
DAVID MALAN: Yeah, exactly.

00:56:17.160 --> 00:56:19.022
Those lines of code
are almost identical.

00:56:19.022 --> 00:56:21.480
And honestly, the only thing
that's changing is the number,

00:56:21.480 --> 00:56:23.100
and it's just incrementing by 1.

00:56:23.100 --> 00:56:25.330
We have all of the building
blocks to do this better.

00:56:25.330 --> 00:56:27.130
So let me go ahead and improve this.

00:56:27.130 --> 00:56:29.560
Let me delete that code.

00:56:29.560 --> 00:56:31.720
Let me, now, have a for loop.

00:56:31.720 --> 00:56:36.150
So for int i gets 0, i
less than 3, i plus plus.

00:56:36.150 --> 00:56:39.060
Then, inside of this for loop,
I can distill all three

00:56:39.060 --> 00:56:40.860
of those lines into
something more generic,

00:56:40.860 --> 00:56:46.530
like scores bracket i equals get
int, and now, ask the user, just

00:56:46.530 --> 00:56:48.905
once, via get int, for a score.

00:56:48.905 --> 00:56:52.000
So this is where arrays
start to get pretty powerful.

00:56:52.000 --> 00:56:54.000
You don't have to hard
code, that is, literally,

00:56:54.000 --> 00:56:56.462
type in all of these magic
numbers like 0, 1, and 2.

00:56:56.462 --> 00:56:58.170
You can start to do
it, programmatically,

00:56:58.170 --> 00:56:59.770
as you propose with a loop.

00:56:59.770 --> 00:57:01.350
So now, I've tightened things up.

00:57:01.350 --> 00:57:04.230
I'm now, dynamically, getting
three different scores,

00:57:04.230 --> 00:57:06.766
but putting them in three
different locations.

00:57:06.766 --> 00:57:10.470
And so this program, ultimately, is
going to work, pretty much, the same.

00:57:10.470 --> 00:57:17.520
Make scores, ./scores, and 100, 99,
98, and we're back to the same answer.

00:57:17.520 --> 00:57:19.440
But it's a little better designed, too.

00:57:19.440 --> 00:57:21.360
If I really want to
nitpick, there's something

00:57:21.360 --> 00:57:23.100
that still smells, a little bit, here.

00:57:23.100 --> 00:57:27.540
The fact that I have, indeed, this
magic number three, that really

00:57:27.540 --> 00:57:29.890
has to be the same as this number here.

00:57:29.890 --> 00:57:32.170
Otherwise, who knows
what's going to go wrong.

00:57:32.170 --> 00:57:34.380
So what might be a
solution, per last week,

00:57:34.380 --> 00:57:36.960
to cleaning that code up further, too?

00:57:36.960 --> 00:57:39.750
AUDIENCE: [INAUDIBLE]
the user's discretion

00:57:39.750 --> 00:57:41.742
how many input scores [INAUDIBLE].

00:57:41.742 --> 00:57:44.790
DAVID MALAN: OK, so we could leave
it up to the user's discretion.

00:57:44.790 --> 00:57:47.500
And so we could, actually,
do something like this.

00:57:47.500 --> 00:57:49.200
Let me take this a few steps ahead.

00:57:49.200 --> 00:57:56.230
Let me say something like, int n gets
get int, how many scores question mark,

00:57:56.230 --> 00:58:00.600
then I could actually change this
to an n, and then this to an n,

00:58:00.600 --> 00:58:02.970
and, indeed, make the
whole program dynamic.

00:58:02.970 --> 00:58:05.670
Ask the human how many tests
have there been this semester?

00:58:05.670 --> 00:58:07.500
Then, you can type in
each of those scores

00:58:07.500 --> 00:58:09.708
because the loop is going
to iterate that many times.

00:58:09.708 --> 00:58:13.020
And then you'll get the average
of one test, two test, three--

00:58:13.020 --> 00:58:17.520
well, lost another-- or however
many scores that were actually

00:58:17.520 --> 00:58:20.760
specified by the user. Yeah, question?

00:58:20.760 --> 00:58:25.765
AUDIENCE: How many bits or
bytes get used in an array?

00:58:25.765 --> 00:58:28.060
DAVID MALAN: How many
bytes are used in an array?

00:58:28.060 --> 00:58:32.524
AUDIENCE: [INAUDIBLE] point of
doing this is to save [INAUDIBLE]

00:58:32.524 --> 00:58:35.500
DAVID MALAN: So the purpose of
an array is not to save space.

00:58:35.500 --> 00:58:39.010
It's to eliminate having
multiple variable names

00:58:39.010 --> 00:58:40.900
because that gets very messy quickly.

00:58:40.900 --> 00:58:44.980
If you have score 1, score 2,
score 3, dot, dot, dot, score 99,

00:58:44.980 --> 00:58:48.100
that's, like, 99 different
variables, potentially,

00:58:48.100 --> 00:58:54.160
that you could collapse into one
variable that has 99 locations.

00:58:54.160 --> 00:58:56.230
At different indices, or indexes,

00:58:56.230 --> 00:58:58.570
as some would say.
The index for an array

00:58:58.570 --> 00:59:00.756
is whatever is in the square brackets.

00:59:00.756 --> 00:59:11.560
AUDIENCE: [INAUDIBLE]

00:59:11.560 --> 00:59:13.280
DAVID MALAN: So it's a good question.

00:59:13.280 --> 00:59:15.370
So if you-- I'm using
ints for everything--

00:59:15.370 --> 00:59:17.560
and honestly, we don't
really need ints for scores

00:59:17.560 --> 00:59:21.770
because I'm not likely to get a
2 billion on a test anytime soon.

00:59:21.770 --> 00:59:23.620
And so you could use
different data types.

00:59:23.620 --> 00:59:26.287
And that list we had on the screen,
earlier, is not all of them.

00:59:26.287 --> 00:59:29.770
There's a data type called short,
which is shorter than an int,

00:59:29.770 --> 00:59:34.850
you could, technically, use char, in
some form or other data types as well.

00:59:34.850 --> 00:59:36.940
Generally speaking, in
the year 2021, these

00:59:36.940 --> 00:59:40.990
tend to be over optima--
overly optimized decisions.

00:59:40.990 --> 00:59:42.940
Everyone just uses
ints, even though no one

00:59:42.940 --> 00:59:46.300
is going to get a test score that's 2
billion, or more, because int is just,

00:59:46.300 --> 00:59:47.260
kind of, the go-to.

00:59:47.260 --> 00:59:50.252
Years ago, memory was expensive.

00:59:50.252 --> 00:59:52.210
And every one of your
instincts would have been

00:59:52.210 --> 00:59:54.700
spot on because memory was so tight.

00:59:54.700 --> 00:59:56.930
But, nowadays, we don't
worry as much about it.

00:59:56.930 --> 00:59:57.430
Yeah?

00:59:57.430 --> 01:00:02.556
AUDIENCE: I have a question
about the error [INAUDIBLE]..

01:00:02.556 --> 01:00:06.605
Could it-- when you're doing a
cash problem on the problem set--

01:00:06.605 --> 01:00:10.010
DAVID MALAN: So what is the
difference between dividing two ints

01:00:10.010 --> 01:00:12.380
and not getting an error, as
you might have encountered

01:00:12.380 --> 01:00:15.920
in a program like cash,
versus dividing two ints

01:00:15.920 --> 01:00:18.150
and getting an error
like I did a moment ago?

01:00:18.150 --> 01:00:22.280
The problem with the scenario I created
a moment ago was printf was involved.

01:00:22.280 --> 01:00:27.980
And I was telling printf to use a %f,
but I was giving printf the result

01:00:27.980 --> 01:00:30.580
of dividing one integer by another integer.

01:00:30.580 --> 01:00:32.930
So it was printf that was yelling at me.

01:00:32.930 --> 01:00:35.930
I'm guessing in the scenario you're
describing, for something like cash,

01:00:35.930 --> 01:00:39.180
printf was not involved in
that particular line of code.

01:00:39.180 --> 01:00:40.865
So that's the difference, there.

01:00:40.865 --> 01:00:41.660
All right.

01:00:41.660 --> 01:00:45.110
So we, now, have this
ability to create an array.

01:00:45.110 --> 01:00:47.510
And an array can store multiple values.

01:00:47.510 --> 01:00:51.450
What, then, might we do that's more
interesting than just storing numbers

01:00:51.450 --> 01:00:51.950
in memory?

01:00:51.950 --> 01:00:54.230
Well, let's take this one step further.

01:00:54.230 --> 01:01:01.130
As opposed to just storing 72, 73, 33 or
100, 99, 98, at these given locations,

01:01:01.130 --> 01:01:05.930
because again, an array gives you one
variable name, but multiple locations,

01:01:05.930 --> 01:01:08.360
or indices therein,
bracket 0, bracket 1,

01:01:08.360 --> 01:01:11.330
bracket 2 on up, if it
were even bigger than that.

01:01:11.330 --> 01:01:16.100
Let's, now, start to consider something
more modest, like simple chars.

01:01:16.100 --> 01:01:18.830
Chars, being 1 byte each,
so they're even smaller,

01:01:18.830 --> 01:01:20.090
they take up much less space.

01:01:20.090 --> 01:01:22.048
And, indeed, if I wanted
to say a message like,

01:01:22.048 --> 01:01:24.200
hi, I could use three variables.

01:01:24.200 --> 01:01:28.520
If I wanted a program to print,
hi, H-I exclamation point,

01:01:28.520 --> 01:01:33.230
I could, of course, store those in
three variables, like c1, c2, c3.

01:01:33.230 --> 01:01:36.710
And let's, for the sake of discussion,
let's whip this up real quickly.

01:01:36.710 --> 01:01:39.680
Let me create a new
program, now, in VS Code.

01:01:39.680 --> 01:01:42.920
This time, I'm going to call it hi.c.

01:01:42.920 --> 01:01:45.650
And I'm not going to bother
with the CS50 library.

01:01:45.650 --> 01:01:47.660
I just need the standard
I/O one, for now.

01:01:47.660 --> 01:01:49.220
int main(void).

01:01:49.220 --> 01:01:52.400
And then, inside of main, I'm going
to, simply, create three variables.

01:01:52.400 --> 01:01:55.760
And this is already, hopefully,
striking you as a bad idea.

01:01:55.760 --> 01:01:58.310
But we'll go down this
road, temporarily,

01:01:58.310 --> 01:02:02.300
with c1, and c2, and, finally, c3.

01:02:02.300 --> 01:02:05.660
Storing each character in
the phrase I want to print,

01:02:05.660 --> 01:02:09.450
and I'm going to print this
in a different way than usual.

01:02:09.450 --> 01:02:10.880
Now I'm dealing with chars.

01:02:10.880 --> 01:02:14.480
And we've, generally, dealt with
strings, which was easier last week.

01:02:14.480 --> 01:02:21.600
But %c, %c, %c, will let me print out
three chars, and like c1, c2, and c3.

01:02:21.600 --> 01:02:24.420
So, kind of, a stupid way
of printing out a string.

01:02:24.420 --> 01:02:26.940
So we already have a solution
to this problem last week.

01:02:26.940 --> 01:02:30.540
But let's poke around at what's
going on underneath the hood, here.

01:02:30.540 --> 01:02:33.350
So let's make hi, ./hi.

01:02:33.350 --> 01:02:34.475
And, voila, no surprise.

01:02:34.475 --> 01:02:36.350
But we, again, could
have done this last week

01:02:36.350 --> 01:02:39.530
with a string and just one
variable, or even, 0, at that.

01:02:39.530 --> 01:02:43.220
But let's start converting
these characters

01:02:43.220 --> 01:02:47.750
to their apparent numeric equivalents
like we talked about in week 0 too.

01:02:47.750 --> 01:02:52.310
Let me modify these %c's,
just to be fun, to be %i's.

01:02:52.310 --> 01:02:56.180
And let me add some spaces so there
are gaps between each of them.

01:02:56.180 --> 01:03:00.350
Let me, now, recompile
hi, and let me rerun it.

01:03:00.350 --> 01:03:02.900
Just to guess, what should
I see on the screen now?

01:03:05.690 --> 01:03:06.200
Any guesses?

01:03:06.200 --> 01:03:06.700
Yeah?

01:03:06.700 --> 01:03:08.036
AUDIENCE: The ASCII values?

01:03:08.036 --> 01:03:09.760
DAVID MALAN: The ASCII values.

01:03:09.760 --> 01:03:12.220
And it's intentional that
I keep using the same word,

01:03:12.220 --> 01:03:18.250
hi, because it should be, hopefully,
the old friends, 72, 73, and 33.

01:03:18.250 --> 01:03:22.120
Which is to say that C knows about
ASCII, or equivalently, Unicode,

01:03:22.120 --> 01:03:24.320
and can do this conversion
for us automatically.

01:03:24.320 --> 01:03:27.670
And it seems to be doing it
implicitly for us, so to speak.

01:03:27.670 --> 01:03:31.000
Notice that c1, c2 and
c3 are, obviously, chars,

01:03:31.000 --> 01:03:34.420
but printf is able to tolerate
printing them as integers.

01:03:34.420 --> 01:03:38.870
If I really wanted to be pedantic,
I could use this technique, again,

01:03:38.870 --> 01:03:41.320
known as typecasting,
where I can actually

01:03:41.320 --> 01:03:46.610
convert one data type to another,
if it makes logical sense to do so.

01:03:46.610 --> 01:03:49.900
And we saw in week 0,
chars, or characters,

01:03:49.900 --> 01:03:53.500
are just numbers, like 72, 73, and 33.

01:03:53.500 --> 01:03:57.680
So I can use this parenthetical
expression to convert, incorrectly,

01:03:57.680 --> 01:04:02.623
[LAUGHTER] three chars to
three integers, instead.

01:04:02.623 --> 01:04:04.540
So that's what I meant
to type the first time.

01:04:04.540 --> 01:04:05.040
There we go.

01:04:05.040 --> 01:04:05.800
Strike two, today.

01:04:05.800 --> 01:04:09.280
So parenthesis, int,
close parenthesis says

01:04:09.280 --> 01:04:14.840
take whatever variable comes after this,
c1, c2, or c3 and convert it to an int.

01:04:14.840 --> 01:04:18.640
The effect is going to be no different,
make hi, and then rerunning whoops--

01:04:18.640 --> 01:04:24.910
then running ./hi still works the same,
but now I'm explicitly converting chars

01:04:24.910 --> 01:04:25.660
to ints.

01:04:25.660 --> 01:04:29.260
And we can do this all day long,
chars to ints, floats to ints,

01:04:29.260 --> 01:04:30.250
ints to floats.

01:04:30.250 --> 01:04:31.888
Sometimes, it's equivalent.

01:04:31.888 --> 01:04:33.805
Other times, you're going
to lose information.

01:04:33.805 --> 01:04:37.270
Taking a float to an
int, just intuitively,

01:04:37.270 --> 01:04:39.790
is going to throw away everything
after the decimal point,

01:04:39.790 --> 01:04:42.680
because an int has no decimal point.

01:04:42.680 --> 01:04:45.100
But, for now, I'm going to
rewind to the version of this

01:04:45.100 --> 01:04:49.150
that just did implicit-type
conversion, or implicit casting,

01:04:49.150 --> 01:04:53.350
just to demonstrate that we can, indeed,
see the values underneath the hood.

01:04:53.350 --> 01:04:53.950
All right.

01:04:53.950 --> 01:04:56.370
Let me go ahead and do
this, now, the week 1 way.

01:04:56.370 --> 01:04:57.370
This was kind of stupid.

01:04:57.370 --> 01:05:00.205
Let's just do printf, quote-unquote--

01:05:00.205 --> 01:05:04.630
Actually, let's do this, string
s equals quote-unquote hi,

01:05:04.630 --> 01:05:09.680
and then let's do a simple printf
with %s, printing out s there.

01:05:09.680 --> 01:05:12.520
So now I've rewound to last
week, where we began this story,

01:05:12.520 --> 01:05:16.660
but you'll notice that, if we
keep playing around with this--

01:05:16.660 --> 01:05:18.860
whoops, what did I do here?

01:05:18.860 --> 01:05:23.470
Oh, and let me introduce the CS50 library
here, more on that before long.

01:05:23.470 --> 01:05:26.260
Let me go ahead and
recompile, rerun this,

01:05:26.260 --> 01:05:28.268
we seem to be coding in circles, here.

01:05:28.268 --> 01:05:30.810
Like, I've just done the same
thing multiple, different ways.

01:05:30.810 --> 01:05:33.400
But there's clearly
an equivalence, then,

01:05:33.400 --> 01:05:36.978
between sequences of chars and strings.

01:05:36.978 --> 01:05:38.770
And if you do it the
real pedantic way, you

01:05:38.770 --> 01:05:43.390
have three different variables, c1, c2,
c3, representing H-I exclamation point,

01:05:43.390 --> 01:05:47.870
or you can just treat them all together
like this h, i, exclamation point.

01:05:47.870 --> 01:05:52.030
But it turns out that
strings are actually

01:05:52.030 --> 01:05:58.060
implemented by the computer
in a pretty now familiar way.

01:05:58.060 --> 01:06:04.382
What might a string actually be
as of this point in the story?

01:06:04.382 --> 01:06:05.590
Where are we going with this?

01:06:05.590 --> 01:06:06.923
Let me try to look further back.

01:06:06.923 --> 01:06:07.850
Yeah, in way back?

01:06:07.850 --> 01:06:08.350
Yeah?

01:06:08.350 --> 01:06:10.600
AUDIENCE: Can a string like
this be an array of chars?

01:06:10.600 --> 01:06:13.410
DAVID MALAN: Yeah, a string
might be, and indeed is, just

01:06:13.410 --> 01:06:14.800
an array of characters.

01:06:14.800 --> 01:06:17.190
So last week we took for
granted that strings exist.

01:06:17.190 --> 01:06:19.530
Technically, strings exist,
but they're implemented

01:06:19.530 --> 01:06:23.070
as arrays of characters,
which actually opens up

01:06:23.070 --> 01:06:25.770
some interesting possibilities for us.

01:06:25.770 --> 01:06:28.300
Because, let me see, let
me see if I can do this.

01:06:28.300 --> 01:06:31.560
Let me try to print out,
now, three integers again.

01:06:31.560 --> 01:06:37.530
But if string s is but an array, as you
propose, maybe I can do s bracket 0,

01:06:37.530 --> 01:06:39.760
s bracket 1, and s bracket 2.

01:06:39.760 --> 01:06:43.650
So maybe I can start poking
around inside of strings,

01:06:43.650 --> 01:06:45.630
even though we didn't
do this last week, so I

01:06:45.630 --> 01:06:47.260
can get at those individual values.

01:06:47.260 --> 01:06:51.270
So make hi, ./hi and,
voila, there we go again.

01:06:51.270 --> 01:06:56.208
It's the same 72, 73, 33, but
now, I'm sort of, hopefully,

01:06:56.208 --> 01:06:58.500
like, wrapping my mind around
the fact that, all right,

01:06:58.500 --> 01:07:01.650
a string is just an array of
characters, and arrays, you

01:07:01.650 --> 01:07:04.960
can index into them using this
new square bracket notation.

01:07:04.960 --> 01:07:08.040
So I can get at any one of
these individual characters,

01:07:08.040 --> 01:07:14.055
and, heck, convert it to an
integer like we did in week 0.

01:07:14.055 --> 01:07:17.010
Let me get a little curious now.

01:07:17.010 --> 01:07:20.020
What else might be in
the computer's memory?

01:07:20.020 --> 01:07:23.550
Well, let's-- I'll go back to the
depiction of these same things.

01:07:23.550 --> 01:07:25.860
Here might be how we
originally implemented hi

01:07:25.860 --> 01:07:28.800
with three variables, c1, c2, c3.

01:07:28.800 --> 01:07:31.500
Of course, those map to these
decimal digits or, equivalently,

01:07:31.500 --> 01:07:32.880
these binary values.

01:07:32.880 --> 01:07:35.310
But what was this
looking like in memory?

01:07:35.310 --> 01:07:38.250
Literally, when you create a
string in memory, like this,

01:07:38.250 --> 01:07:41.240
string s equals quote-unquote hi,
let's consider what's going on

01:07:41.240 --> 01:07:42.615
underneath the hood, so to speak.

01:07:42.615 --> 01:07:47.490
Well, as an abstraction, a string,
it's H-I exclamation point taking up,

01:07:47.490 --> 01:07:48.917
it would seem, 3 bytes, right?

01:07:48.917 --> 01:07:51.000
I've gotten rid of the
bars, there, because if you

01:07:51.000 --> 01:07:55.650
think of a string as a type, I'm just
going to use one big box of size 3.

01:07:55.650 --> 01:08:00.210
But technically, a string, we've
just revealed, is an array,

01:08:00.210 --> 01:08:01.830
and the array is of size 3.

01:08:01.830 --> 01:08:03.750
So technically, if the
string is called s,

01:08:03.750 --> 01:08:05.970
s bracket 0 will give
you the first character,

01:08:05.970 --> 01:08:09.810
s bracket 1, the second,
and s bracket 2, the third.

01:08:09.810 --> 01:08:13.290
But let me ask this question now,
if this, at the end of the day,

01:08:13.290 --> 01:08:16.560
is the only thing in
your computer memory

01:08:16.560 --> 01:08:20.790
and the ability, like a canvas to draw
0s and 1s, or numbers, or characters,

01:08:20.790 --> 01:08:22.620
or whatever on it, but
that's it, like this

01:08:22.620 --> 01:08:25.770
is what your Mac, and PC, and
phone ultimately reduce to.

01:08:25.770 --> 01:08:29.730
Suppose that I'm running a piece
of software, like a text messenger,

01:08:29.730 --> 01:08:33.000
and now I write down
bye exclamation point.

01:08:33.000 --> 01:08:34.860
Well, where might that go in memory?

01:08:34.860 --> 01:08:35.845
Well, it might go here.

01:08:35.845 --> 01:08:39.333
B-Y-E. And then the next thing I type
might go here, here, here and so forth.

01:08:39.333 --> 01:08:41.250
My memory just might get
filled up, over time,

01:08:41.250 --> 01:08:44.310
with things that you or
someone else are typing.

01:08:44.310 --> 01:08:50.580
But then how does the computer know if,
potentially, B-Y-E exclamation point

01:08:50.580 --> 01:08:56.150
is right after H-I exclamation point
where one string ends and the next one

01:08:56.150 --> 01:08:56.650
begins?

01:08:58.930 --> 01:08:59.430
Right?

01:08:59.430 --> 01:09:03.070
All we have are bytes, or 0s and 1s.

01:09:03.070 --> 01:09:05.730
So if you were designing
this, how would you

01:09:05.730 --> 01:09:08.280
implement some kind of
delimiter between the two?

01:09:08.280 --> 01:09:10.260
Or figure out what the
length of a string is?

01:09:10.260 --> 01:09:11.010
What do you think?

01:09:11.010 --> 01:09:12.148
AUDIENCE: A nul character.

01:09:12.148 --> 01:09:15.107
DAVID MALAN: OK, so the right
answer is use a nul character,

01:09:15.107 --> 01:09:17.190
and for those who don't
know, what does that mean?

01:09:17.190 --> 01:09:19.492
AUDIENCE: It's special.

01:09:19.492 --> 01:09:21.450
DAVID MALAN: Yeah, so
it's a special character.

01:09:21.450 --> 01:09:23.520
Let me describe it as
a sentinel character.

01:09:23.520 --> 01:09:25.575
Humans decided some
time ago that you know

01:09:25.575 --> 01:09:28.560
what, if we want to delineate
where one string ends

01:09:28.560 --> 01:09:32.010
and where the next one begins,
we just need some special symbol.

01:09:32.010 --> 01:09:35.189
And the symbol they'll use is
generally written as backslash 0.

01:09:35.189 --> 01:09:39.555
This is just shorthand notation
for literally eight 0 bits.

01:09:39.555 --> 01:09:42.540
0, 0, 0, 0, 0, 0, 0, 0.

01:09:42.540 --> 01:09:46.140
And the nickname for eight
0 bits, in this context,

01:09:46.140 --> 01:09:48.930
is nul, N-U-L, so to speak.

01:09:48.930 --> 01:09:51.910
And we can actually see this as follows.

01:09:51.910 --> 01:09:53.913
If you look at the
corresponding decimal digits,

01:09:53.913 --> 01:09:56.580
like you could do by doing out
the math or doing the conversion,

01:09:56.580 --> 01:10:01.560
like we've done in code, you would
see for storing hi, 72, 73, 33,

01:10:01.560 --> 01:10:06.600
but then 1 extra byte that's sort of
invisibly there, but that is all 0s.

01:10:06.600 --> 01:10:09.120
And now I've just written
it as the decimal number 0.

01:10:09.120 --> 01:10:12.120
The implication of this is
that the computer is apparently

01:10:12.120 --> 01:10:16.695
using, not 3 bytes to store
a word like hi, but 4 bytes.

01:10:16.695 --> 01:10:22.050
Whatever the length of the string is,
plus 1 for this special sentinel value

01:10:22.050 --> 01:10:24.640
that demarcates the end of the string.

01:10:24.640 --> 01:10:26.680
So we might draw it like this instead.

01:10:26.680 --> 01:10:31.350
And this character is, again,
pronounced nul, or written N-U-L.

01:10:31.350 --> 01:10:32.319
So that's all, right?

01:10:32.319 --> 01:10:35.069
If humans, at the end of the day,
just have this canvas of memory,

01:10:35.069 --> 01:10:36.902
they just needed to
decide, all right, well,

01:10:36.902 --> 01:10:39.990
how do we distinguish
one string from another?

01:10:39.990 --> 01:10:42.660
It's a lot easier with
chars, individually, it's

01:10:42.660 --> 01:10:45.450
a lot easier with ints, it's
even easier with floats, why?

01:10:45.450 --> 01:10:49.620
Because, per that chart earlier,
every character is always 1 byte.

01:10:49.620 --> 01:10:51.810
Every int is always 4 bytes.

01:10:51.810 --> 01:10:54.750
Every long is always 8 bytes.

01:10:54.750 --> 01:10:56.279
How long is a string?

01:10:56.279 --> 01:10:59.760
Well, hi is 1, 2, 3 with
an exclamation point.

01:10:59.760 --> 01:11:03.029
Bye is 1, 2, 3, 4 with
an exclamation point.

01:11:03.029 --> 01:11:06.450
David is D-A-V-I-D, five
without an exclamation point.

01:11:06.450 --> 01:11:10.210
And so a string can be
any number of bytes long,

01:11:10.210 --> 01:11:12.700
so you somehow need to
draw a line in the sand

01:11:12.700 --> 01:11:16.706
to separate in memory
one string from another.

01:11:16.706 --> 01:11:19.412
So what's the implication of this?

01:11:19.412 --> 01:11:20.870
Well, let me go back to code, here.

01:11:20.870 --> 01:11:22.210
Let's actually poke around.

01:11:22.210 --> 01:11:27.130
This is a bit dangerous, but I'm going
to start looking at memory locations

01:11:27.130 --> 01:11:29.210
past my string here.

01:11:29.210 --> 01:11:33.250
So let me go ahead and
recompile, make hi.

01:11:33.250 --> 01:11:35.110
Whoops, what did I do here?

01:11:35.110 --> 01:11:36.680
I forgot a format code.

01:11:36.680 --> 01:11:38.620
Let me add one more %i.

01:11:38.620 --> 01:11:42.550
Now let me go ahead and
rerun make hi, ./hi, Enter.

01:11:42.550 --> 01:11:43.580
There it is.

01:11:43.580 --> 01:11:46.660
So you can actually see in the
computer, unbeknownst to you

01:11:46.660 --> 01:11:49.830
previously, that there's indeed
something else going on there.

01:11:49.830 --> 01:11:52.880
And if I were to make one
other variant of this program--

01:11:52.880 --> 01:11:55.630
let's get rid of just this
one word and let's have two.

01:11:55.630 --> 01:11:57.550
So let me give myself
another string called t,

01:11:57.550 --> 01:12:01.810
for instance, just this common
convention with bye exclamation point.

01:12:01.810 --> 01:12:04.900
Let me, then print out with %s.

01:12:04.900 --> 01:12:10.785
And let me also print out with %s,
whoops, printf, print out t, as well.

01:12:10.785 --> 01:12:14.320
Let me recompile this program,
and obviously the out--

01:12:14.320 --> 01:12:17.470
ugh-- this is what happens
when I go too fast.

01:12:17.470 --> 01:12:20.740
All right, third mistake
today, close quote.

01:12:20.740 --> 01:12:22.030
As I was missing.

01:12:22.030 --> 01:12:23.590
Make hi.

01:12:23.590 --> 01:12:25.000
Fourth mistake today.

01:12:25.000 --> 01:12:26.200
Make hi.

01:12:26.200 --> 01:12:27.490
Dot slash hi.

01:12:27.490 --> 01:12:28.210
OK, voila.

01:12:28.210 --> 01:12:30.610
Now we have a program that's
printing both hi and bye,

01:12:30.610 --> 01:12:34.720
only so that we can consider what's
going on in the computer's memory.

01:12:34.720 --> 01:12:40.210
If s is storing hi and
apparently one bonus byte that

01:12:40.210 --> 01:12:43.240
demarcates the end of that
string, bye is apparently

01:12:43.240 --> 01:12:46.413
going to fit into the
location directly after.

01:12:46.413 --> 01:12:49.330
And it's wrapping around, but that's
just an artist's rendition, here.

01:12:49.330 --> 01:12:52.000
But bye, B-Y-E exclamation
point is taking up

01:12:52.000 --> 01:12:58.948
1, 2, 3, 4, plus a fifth byte, as well.

01:12:58.948 --> 01:13:03.580
All right, any questions on this
underlying representation of strings?

01:13:03.580 --> 01:13:05.560
And we'll contextualize
this, before long,

01:13:05.560 --> 01:13:07.840
so that this isn't just
like, OK, who really cares?

01:13:07.840 --> 01:13:10.730
This is going to be the source
of actually implementing things.

01:13:10.730 --> 01:13:13.510
In fact for problem set 2, like
cryptography, and encryption,

01:13:13.510 --> 01:13:15.468
and scrambling actual human messages.

01:13:15.468 --> 01:13:16.510
But some questions first.

01:13:16.510 --> 01:13:20.650
AUDIENCE: So normally if
you were to not use string,

01:13:20.650 --> 01:13:23.480
you would just make a character
array that would declare,

01:13:23.480 --> 01:13:26.580
how many characters there are so
you know how many characters are

01:13:26.580 --> 01:13:27.330
going to be there.

01:13:27.330 --> 01:13:29.480
DAVID MALAN: A good
question, too and let

01:13:29.480 --> 01:13:32.115
me summarize as, if we were
instead to use chars all the time,

01:13:32.115 --> 01:13:35.240
we would indeed have to know in advance
how many chars you want for a given

01:13:35.240 --> 01:13:38.750
string that you're storing. How, then,
does something like get string work?

01:13:38.750 --> 01:13:41.000
Because when we at CS50 wrote
the get string function,

01:13:41.000 --> 01:13:43.190
we obviously don't know
how long the words are

01:13:43.190 --> 01:13:45.020
going to be that you all are typing in.

01:13:45.020 --> 01:13:48.560
It turns out, two weeks from
now we'll see that get string

01:13:48.560 --> 01:13:51.320
uses a technique known as
dynamic memory allocation.

01:13:51.320 --> 01:13:55.770
And it's going to grow or shrink
the array automatically for you.

01:13:55.770 --> 01:13:57.050
But more on that soon.

01:13:57.050 --> 01:13:57.920
Other questions?

01:13:57.920 --> 01:14:01.450
AUDIENCE: Why are we using a nul value?

01:14:01.450 --> 01:14:02.725
Isn't that wasting a byte?

01:14:02.725 --> 01:14:03.850
DAVID MALAN: Good question.

01:14:03.850 --> 01:14:06.880
Why are we using a nul value,
isn't it wasting a byte?

01:14:06.880 --> 01:14:07.630
Yes.

01:14:07.630 --> 01:14:13.210
But I claim there's really no other way
to distinguish the end of one string

01:14:13.210 --> 01:14:19.748
from the start of another, unless we
make some sort of notation in memory.

01:14:19.748 --> 01:14:22.540
All we have, at the end of the day,
inside of a computer, are bits.

01:14:22.540 --> 01:14:25.900
Therefore, all we can do is spend
those bits in some creative way

01:14:25.900 --> 01:14:27.520
to solve this problem.

01:14:27.520 --> 01:14:30.710
So we're minimally going to spend
1 byte to solve this problem.

01:14:30.710 --> 01:14:31.210
Yeah?

01:14:31.210 --> 01:14:35.897
AUDIENCE: How does our memory device
know to enter a line when you type

01:14:35.897 --> 01:14:39.270
the \n if we don't have
it stored as a char?

01:14:39.270 --> 01:14:40.910
DAVID MALAN: If you don't--

01:14:40.910 --> 01:14:44.690
how does the computer know to move
to a next line when you have a \n?

01:14:44.690 --> 01:14:47.990
So \n, even though it
looks like two characters,

01:14:47.990 --> 01:14:51.890
it's actually stored as just 1
byte in the computer's memory.

01:14:51.890 --> 01:14:54.357
There's a mapping between
it and an actual number.

01:14:54.357 --> 01:14:57.440
And you can see that, for instance,
on the ASCII chart from the other day.

01:14:57.440 --> 01:15:01.224
AUDIENCE: So with that being
stored would be the [INAUDIBLE]..

01:15:01.224 --> 01:15:02.420
DAVID MALAN: It would be.

01:15:02.420 --> 01:15:08.210
If I had put a \n in my code here,
right after the exclamation point here

01:15:08.210 --> 01:15:11.840
and here, that would actually shift
everything in memory because we would

01:15:11.840 --> 01:15:16.740
need to make room for a \n
here and another one over here.

01:15:16.740 --> 01:15:18.913
So it would take two
more bytes, exactly.

01:15:18.913 --> 01:15:19.580
Other questions?

01:15:19.580 --> 01:15:26.050
AUDIENCE: So if hi exclamation
point is written in binary and ASCII

01:15:26.050 --> 01:15:32.630
too as 72, 73, 33, if we are to
write those numbers in the string,

01:15:32.630 --> 01:15:39.090
and convert them into binary how
would the computer know what's 72

01:15:39.090 --> 01:15:40.390
and what's 8?

01:15:40.390 --> 01:15:42.390
DAVID MALAN: And what's
the last thing you said?

01:15:42.390 --> 01:15:43.806
AUDIENCE: 8, for example.

01:15:43.806 --> 01:15:45.700
DAVID MALAN: It's context sensitive.

01:15:45.700 --> 01:15:48.450
So if, at the end of the day, all
we're storing is these numbers,

01:15:48.450 --> 01:15:52.380
like 72, 73, 33, recall
that it's up to the program

01:15:52.380 --> 01:15:55.470
to decide, based on context,
how to interpret them.

01:15:55.470 --> 01:15:59.310
And I simplified this story in week 0
saying that Photoshop interprets them

01:15:59.310 --> 01:16:02.910
as RGB colors, and iMessage
or a text messaging program

01:16:02.910 --> 01:16:07.440
interprets them as letters, and
Excel interprets them as numbers.

01:16:07.440 --> 01:16:12.540
How those programs do it is by way
of variables like string, and int,

01:16:12.540 --> 01:16:13.080
and float.

01:16:13.080 --> 01:16:14.872
And in fact, later this
semester, we'll see

01:16:14.872 --> 01:16:19.500
a data type via which you can represent
a color as a triple of numbers,

01:16:19.500 --> 01:16:22.240
a red value, a green
value, and a blue value.

01:16:22.240 --> 01:16:24.600
So we'll see other data types as well.

01:16:24.600 --> 01:16:25.100
Yeah?

01:16:25.100 --> 01:16:29.320
AUDIENCE: It seems easy enough to just
add a nul thing at the end of the word,

01:16:29.320 --> 01:16:32.190
so why do we have integers
and long integers?

01:16:32.190 --> 01:16:35.192
Why can't we make everything
variable in its data size?

01:16:35.192 --> 01:16:36.900
DAVID MALAN: Really
interesting question.

01:16:36.900 --> 01:16:40.110
Why could we not just make all
data types variable in size?

01:16:40.110 --> 01:16:43.560
And some languages, some
libraries do exactly this.

01:16:43.560 --> 01:16:47.100
C is an older language, and
because memory was expensive

01:16:47.100 --> 01:16:48.300
memory was limited.

01:16:48.300 --> 01:16:50.640
The reality was you
gain benefits from just

01:16:50.640 --> 01:16:53.010
standardizing the size of these things.

01:16:53.010 --> 01:16:55.410
You also get performance
increases in the sense

01:16:55.410 --> 01:16:59.620
that if you know every int is
4 bytes, you can very quickly,

01:16:59.620 --> 01:17:02.220
and we'll see this next week,
jump from one integer to another,

01:17:02.220 --> 01:17:06.600
to another in memory just by adding
1 inside of those square brackets, 4 bytes at a time.

01:17:06.600 --> 01:17:08.430
You can very quickly poke around.

01:17:08.430 --> 01:17:11.522
Whereas, if you had variable
length numbers, you would have to,

01:17:11.522 --> 01:17:13.980
kind of, follow, follow, follow,
looking for the end of it.

01:17:13.980 --> 01:17:16.780
Follow, follow-- you would have to
look at more locations in memory.

01:17:16.780 --> 01:17:18.322
So that's a topic we'll come back to.

01:17:18.322 --> 01:17:20.700
But it was generally for efficiency.

01:17:20.700 --> 01:17:22.170
Another question, yeah?

01:17:22.170 --> 01:17:27.942
AUDIENCE: Why not store the
nul character [INAUDIBLE]

01:17:27.942 --> 01:17:31.520
DAVID MALAN: Good question
why not store the--

01:17:31.520 --> 01:17:35.540
why not store the nul
character at the beginning?

01:17:35.540 --> 01:17:41.890
You could-- let's see, why
not store it at the beginning?

01:17:41.890 --> 01:17:45.080
You could do that.

01:17:45.080 --> 01:17:48.325
You could absolutely--
well, could you do this?

01:17:51.580 --> 01:17:56.380
If you were to do that
at the beginning--

01:17:56.380 --> 01:17:57.400
short answer, no.

01:17:57.400 --> 01:17:58.420
OK, now I retract that.

01:17:58.420 --> 01:18:00.628
No, because I finally thought
of a problem with this.

01:18:00.628 --> 01:18:02.483
If you store it at
the beginning instead,

01:18:02.483 --> 01:18:04.900
we'll see in just a moment how
you can actually write code

01:18:04.900 --> 01:18:07.150
to figure out where
the end of a string is,

01:18:07.150 --> 01:18:09.550
and the problem there
is you wouldn't necessarily

01:18:09.550 --> 01:18:13.000
know if you eventually hit a
0 at the end of the string,

01:18:13.000 --> 01:18:16.810
because it's the number 0 in the
context of Excel using some memory,

01:18:16.810 --> 01:18:20.180
or if it's the context of some
other data type, altogether.

01:18:20.180 --> 01:18:22.600
So the fact that we've standardized--

01:18:22.600 --> 01:18:26.560
the fact that we've standardized
strings as ending with nul

01:18:26.560 --> 01:18:30.655
means that we can reliably distinguish
one variable from another in memory.

01:18:30.655 --> 01:18:32.560
And that's actually a
perfect segue, now,

01:18:32.560 --> 01:18:35.693
to actually using this
primitive to build up

01:18:35.693 --> 01:18:38.360
our own code that manipulates
these things that are lower level.

01:18:38.360 --> 01:18:39.560
So let me do this.

01:18:39.560 --> 01:18:41.650
Let me create a new file called length.

01:18:41.650 --> 01:18:46.000
And let's use this basic idea to
figure out what the length of a string

01:18:46.000 --> 01:18:50.720
is after it's been stored in a variable.

01:18:50.720 --> 01:18:51.860
So let's do this.

01:18:51.860 --> 01:18:56.530
Let me include both the CS50
header and the standard I/O header,

01:18:56.530 --> 01:19:01.250
give myself int main(void) again
here, and inside of main, do this.

01:19:01.250 --> 01:19:04.060
Let me prompt the user for
a string s and I'll ask them

01:19:04.060 --> 01:19:08.170
for a string like their name, here.

01:19:08.170 --> 01:19:13.420
And then let me name it, more
verbosely, name this time.

01:19:13.420 --> 01:19:15.170
Now let me go ahead and do this.

01:19:15.170 --> 01:19:20.260
Let me iterate over every
character in this string

01:19:20.260 --> 01:19:22.180
in order to figure out
what its length is.

01:19:22.180 --> 01:19:25.060
So initially, I'm going
to go ahead and say this,

01:19:25.060 --> 01:19:28.040
int length equals 0, because
I don't know what it is yet.

01:19:28.040 --> 01:19:29.290
So we're going to start at 0.

01:19:29.290 --> 01:19:32.410
And then while the following is true--

01:19:32.410 --> 01:19:37.370
while-- let me-- do I want to do this?

01:19:37.370 --> 01:19:40.060
Let me change this to i,
just for clarity, let me do

01:19:40.060 --> 01:19:45.790
this, while name bracket i does not
equal that special nul character.

01:19:45.790 --> 01:19:49.180
So I typed it on the slide as N-U-L,
but you don't write N-U-L in code,

01:19:49.180 --> 01:19:53.665
you actually use its numeric equivalent,
which is \0 in single quotes.

01:19:53.665 --> 01:19:58.930
While name bracket i does not equal the
nul character, I'm going to go ahead

01:19:58.930 --> 01:20:02.470
and increment i with i plus plus.

01:20:02.470 --> 01:20:05.470
And then down here I'm going
to print out the value of i

01:20:05.470 --> 01:20:09.270
to see what we actually get,
printing out the value of i.

01:20:09.270 --> 01:20:11.020
All right, so what's
going to happen here?

01:20:11.020 --> 01:20:13.420
Let me run make length.

01:20:13.420 --> 01:20:14.740
Fortunately no errors.

01:20:14.740 --> 01:20:19.570
./length and let me type in something
like H-I, exclamation point, Enter.

01:20:19.570 --> 01:20:20.740
And I get 3.

01:20:20.740 --> 01:20:23.950
Let me try bye,
exclamation point, Enter.

01:20:23.950 --> 01:20:25.870
And I get 4.

01:20:25.870 --> 01:20:28.510
Let me try my own name, David, Enter.

01:20:28.510 --> 01:20:29.970
5, and so forth.

01:20:29.970 --> 01:20:31.880
So what's actually going on here?

01:20:31.880 --> 01:20:34.490
Well, it seems that
by way of this while loop,

01:20:34.490 --> 01:20:36.622
we are specifying a
local variable called

01:20:36.622 --> 01:20:39.580
i initialized to 0, because we're
figuring out the length of the string

01:20:39.580 --> 01:20:40.580
as we go.

01:20:40.580 --> 01:20:44.050
I'm then asking the
question, does location 0,

01:20:44.050 --> 01:20:49.300
that is i in the name string,
which we now know is an array,

01:20:49.300 --> 01:20:51.700
does it not equal \0?

01:20:51.700 --> 01:20:55.645
Because if it doesn't, that means it's
an actual character like H, or B, or D.

01:20:55.645 --> 01:20:57.640
So let's increment i.

01:20:57.640 --> 01:21:00.910
Then, let's come back around to line
9 and let's ask the question again.

01:21:00.910 --> 01:21:02.590
Now i equals 1.

01:21:02.590 --> 01:21:06.420
So does name bracket 1 not equal \0?

01:21:06.420 --> 01:21:12.070
Well, if it doesn't, and it won't
if it's an i, or a y, or an a,

01:21:12.070 --> 01:21:15.490
based on what I typed in, we're
going to increment i once more.

01:21:15.490 --> 01:21:18.940
Fast-forward to the end of the story,
once I get to the end of the string,

01:21:18.940 --> 01:21:22.420
technically, one space
past the end of the string,

01:21:22.420 --> 01:21:25.510
name bracket i will equal \0.

01:21:25.510 --> 01:21:29.960
So I don't increment i anymore, I
end up just printing the result.

01:21:29.960 --> 01:21:34.510
So what we seem to have here with some
low level C code, just this while loop,

01:21:34.510 --> 01:21:39.070
is a program that figures out the length
of a given string that's been typed in.
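
NOTE
A minimal sketch of the loop traced above, wrapped in a function so it can be tried without cs50.h. The name count_until_nul and the use of const char * in place of CS50's string type are assumptions for illustration, not the lecture's exact code.
```c
// Hypothetical helper: walks the array until it reaches the '\0' terminator.
int count_until_nul(const char *name)
{
    int i = 0;
    // Keep going while the current character is a real character, not nul.
    while (name[i] != '\0')
    {
        i++;
    }
    // i now holds the number of characters before the terminator.
    return i;
}
```
Calling count_until_nul("HI!") visits H, I, and ! and stops at the terminator, so the count is 3.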

01:21:39.070 --> 01:21:41.860
Let's practice our abstraction
and decompose this into,

01:21:41.860 --> 01:21:43.270
maybe, a helper function here.

01:21:43.270 --> 01:21:47.110
Let me grab all of this
code here, and assume,

01:21:47.110 --> 01:21:51.580
for the sake of discussion for a moment,
that I can call a function now called

01:21:51.580 --> 01:21:53.740
string length.

01:21:53.740 --> 01:21:56.830
And it's the length of the string
name that I want to get,

01:21:56.830 --> 01:22:01.000
and then I'll go ahead and print
out, just as before with %i,

01:22:01.000 --> 01:22:02.398
the length of that string.

01:22:02.398 --> 01:22:04.690
So now I'm abstracting away
this notion of figuring out

01:22:04.690 --> 01:22:05.732
the length of the string.

01:22:05.732 --> 01:22:08.470
That's an opportunity for
me to create my own function.

01:22:08.470 --> 01:22:11.515
If I want to create a
function called string length,

01:22:11.515 --> 01:22:15.610
I'll claim that I want to
take a string as input,

01:22:15.610 --> 01:22:20.860
and what should I have this
function return as its return type?

01:22:20.860 --> 01:22:26.090
What should string length
presumably return?

01:22:26.090 --> 01:22:26.590
Yeah?

01:22:26.590 --> 01:22:27.430
AUDIENCE: Int.

01:22:27.430 --> 01:22:28.270
DAVID MALAN: An int, right?

01:22:28.270 --> 01:22:29.020
An int makes sense.

01:22:29.020 --> 01:22:30.937
Float really wouldn't
make sense because we're

01:22:30.937 --> 01:22:33.377
measuring things that are integers.

01:22:33.377 --> 01:22:34.960
In this case, the length of something.

01:22:34.960 --> 01:22:36.640
So indeed, let's have it return an int.

01:22:36.640 --> 01:22:39.380
I can use the same
code as before, so I'm

01:22:39.380 --> 01:22:42.175
going to paste what I
cut earlier in the file.

01:22:42.175 --> 01:22:46.660
The only thing I have to change
is the name of the variable.

01:22:46.660 --> 01:22:50.240
Because now this function,
I decided arbitrarily

01:22:50.240 --> 01:22:53.130
that I'm going to call it
s, just to be more generic.

01:22:53.130 --> 01:22:55.915
So I'm going to look at s
bracket i at each location.

01:22:55.915 --> 01:22:58.790
And I don't want to print it at the
end, this would be a side effect.

01:22:58.790 --> 01:23:01.250
What's the line of code I should
include here if I actually

01:23:01.250 --> 01:23:04.005
want to hand back the total length?

01:23:04.005 --> 01:23:04.505
Yeah?

01:23:04.505 --> 01:23:05.362
AUDIENCE: Return i.

01:23:05.362 --> 01:23:06.320
DAVID MALAN: Say again?

01:23:06.320 --> 01:23:07.112
AUDIENCE: Return i.

01:23:07.112 --> 01:23:09.270
DAVID MALAN: Return i, in this case.

01:23:09.270 --> 01:23:11.540
So I'm going to return i, not print it.

01:23:11.540 --> 01:23:16.490
Because now, my main function can
use the return value stored in length

01:23:16.490 --> 01:23:18.530
and print it on the next line itself.

01:23:18.530 --> 01:23:22.520
I just need a prototype, so that's
my one forgivable copy paste here.

01:23:22.520 --> 01:23:24.170
I'm going to rerun make length.

01:23:24.170 --> 01:23:25.640
Hopefully I didn't screw up.

01:23:25.640 --> 01:23:29.330
I didn't. ./length,
I'll type in hi-- oops--

01:23:29.330 --> 01:23:31.340
I'll type in hi, again.

01:23:31.340 --> 01:23:31.880
That works.

01:23:31.880 --> 01:23:34.970
I'll type in bye again, and so forth.

01:23:34.970 --> 01:23:38.703
So now we have a function that
determines the length of a string.
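
NOTE
A sketch of the refactoring just described: a prototype up top, a caller that stores and prints the return value, and the helper defined below it. The caller report_length is a hypothetical stand-in for main, and const char * replaces CS50's string type so the block compiles without cs50.h.
```c
#include <stdio.h>
// Prototype first, so code above the definition can call the function.
int string_length(const char *s);
// Hypothetical caller standing in for main: stores the return value, then prints it.
int report_length(const char *name)
{
    int length = string_length(name);
    printf("%i\n", length);
    return length;
}
// Definition: returns the count instead of printing it, so there is no side effect.
int string_length(const char *s)
{
    int i = 0;
    while (s[i] != '\0')
    {
        i++;
    }
    return i;
}
```
Without the prototype line, the compiler would reach the call inside report_length before it had seen string_length.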

01:23:38.703 --> 01:23:41.120
Well, it turns out we didn't
actually need this all along.

01:23:41.120 --> 01:23:46.042
It turns out that we can get rid of my
own custom string length function here.

01:23:46.042 --> 01:23:48.500
I can definitely delete the
whole implementation down here.

01:23:48.500 --> 01:23:52.160
Because it turns out, in
a file called string.h,

01:23:52.160 --> 01:23:55.520
which is a new header file today, we
actually have access to a function

01:23:55.520 --> 01:23:59.690
called, more succinctly,
strlen, S-T-R-L-E-N. Which,

01:23:59.690 --> 01:24:01.130
literally does that.

01:24:01.130 --> 01:24:05.240
This is a function that comes with C,
albeit in the string.h header file,

01:24:05.240 --> 01:24:09.450
and it does what we just
implemented manually.
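
NOTE
For comparison, a sketch of the same job done with the library function. The wrapper name length_via_strlen is hypothetical. strlen itself is declared in string.h and returns a size_t, so the cast to int here is a simplification that suits short strings like the ones typed in lecture.
```c
#include <string.h>
// Hypothetical wrapper: lets strlen do the counting that the loop did by hand.
int length_via_strlen(const char *s)
{
    return (int) strlen(s);
}
```
For "hi!" this gives 3 and for "David" it gives 5, matching the hand-written loop.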

01:24:09.450 --> 01:24:13.340
So here's an example of, admittedly, a
wheel we just reinvented, but no more.

01:24:13.340 --> 01:24:14.480
We don't have to do that.

01:24:14.480 --> 01:24:16.850
And how do you know what kinds
of functions exist?

01:24:16.850 --> 01:24:21.260
Well, let me pop open my
browser here to a website that

01:24:21.260 --> 01:24:24.455
is CS50's incarnation of
what are called manual pages.

01:24:24.455 --> 01:24:28.070
It turns out that in a lot
of systems, Macs, and Unix,

01:24:28.070 --> 01:24:31.100
and Linux systems, including
the Visual Studio Code

01:24:31.100 --> 01:24:33.020
instance that we have
in the cloud, there

01:24:33.020 --> 01:24:36.290
are publicly accessible
manual pages for functions.

01:24:36.290 --> 01:24:39.770
They tend to be written very
expertly, in a way that's

01:24:39.770 --> 01:24:41.160
not very beginner-friendly.

01:24:41.160 --> 01:24:45.650
So what we have here at
manual.cs50.io is CS50's version

01:24:45.650 --> 01:24:48.740
of manual pages that have this
less-comfortable mode that

01:24:48.740 --> 01:24:51.290
gives you a, sort of, cheat
sheet of very frequently used,

01:24:51.290 --> 01:24:55.010
helpful functions in C. And
we've translated the expert

01:24:55.010 --> 01:24:58.075
notation to things that a
beginner can understand.

01:24:58.075 --> 01:25:02.190
So, for instance, let me go ahead and
search for a string up at the top here.

01:25:02.190 --> 01:25:06.200
You'll see that there's documentation
for our own get string function,

01:25:06.200 --> 01:25:08.510
but more interestingly
down here, there's

01:25:08.510 --> 01:25:10.850
a whole bunch of
string-related functions

01:25:10.850 --> 01:25:12.620
that we haven't even seen most of, yet.

01:25:12.620 --> 01:25:14.660
But there's indeed one
here called strlen,

01:25:14.660 --> 01:25:16.620
calculate the length of a string.

01:25:16.620 --> 01:25:22.160
And so if I go to strlen here, I'll
see some less-comfortable documentation

01:25:22.160 --> 01:25:22.970
for this function.

01:25:22.970 --> 01:25:25.400
And the way a manual
page typically works,

01:25:25.400 --> 01:25:28.310
whether in CS50's format
or any other system,

01:25:28.310 --> 01:25:30.950
is you see, typically, a
synopsis of what header

01:25:30.950 --> 01:25:33.330
files you need to use the function.

01:25:33.330 --> 01:25:35.960
So you would copy paste
these couple of lines here.

01:25:35.960 --> 01:25:39.530
You see what the prototype
is of the function so

01:25:39.530 --> 01:25:42.533
that you know what its inputs are,
if any, and its outputs are, if any.

01:25:42.533 --> 01:25:45.200
Then down below you might see a
description, which in this case,

01:25:45.200 --> 01:25:46.320
is pretty straightforward.

01:25:46.320 --> 01:25:48.170
This function calculates
the length of s.

01:25:48.170 --> 01:25:51.110
Then you see what the
return value is, if any,

01:25:51.110 --> 01:25:54.310
and you might even see an example, like
this one that we've whipped up here.

01:25:54.310 --> 01:25:57.012
So these manual pages
which are again, accessible

01:25:57.012 --> 01:25:59.720
here, and we'll link to these in
the problem sets moving forward,

01:25:59.720 --> 01:26:02.510
are pretty much the place to
start when you want to figure out

01:26:02.510 --> 01:26:05.210
has a wheel been invented already?

01:26:05.210 --> 01:26:08.490
Is there a function that might help
me solve some problem set problems

01:26:08.490 --> 01:26:11.900
so that I don't have to really
get into the weeds of doing all

01:26:11.900 --> 01:26:13.712
of those lower-level steps as I've had.

01:26:13.712 --> 01:26:16.670
Sometimes the answer is going to be
yes, sometimes it's going to be no.

01:26:16.670 --> 01:26:19.160
But again the point of our
having just done this together

01:26:19.160 --> 01:26:21.950
is to reveal that even the
functions you start taking for

01:26:21.950 --> 01:26:26.135
granted, they all reduce to some
of these basic building blocks.

01:26:26.135 --> 01:26:29.600
At the end of the day, all
that's inside of your computer

01:26:29.600 --> 01:26:30.950
is 0s and 1s.

01:26:30.950 --> 01:26:33.060
We're just learning,
now, how to harness those

01:26:33.060 --> 01:26:37.220
and how to manipulate them ourselves.

01:26:37.220 --> 01:26:41.510
Any questions here on this?

01:26:41.510 --> 01:26:43.305
Any questions at all?

01:26:43.305 --> 01:26:43.805
Yeah.

01:26:43.805 --> 01:26:51.779
AUDIENCE: We did just see
[INAUDIBLE] Is that so common

01:26:51.779 --> 01:26:54.035
that we would have to
specify it, or is it not?

01:26:54.035 --> 01:26:55.160
DAVID MALAN: Good question.

01:26:55.160 --> 01:26:57.920
Is it so common that you would
have to specify it or not?

01:26:57.920 --> 01:27:00.170
You do need to include its
header files because that's

01:27:00.170 --> 01:27:01.670
where all of those prototypes are.

01:27:01.670 --> 01:27:05.190
You don't need to worry about
linking it in with -l anything.

01:27:05.190 --> 01:27:07.340
And in fact, moving
forward, you do not ever

01:27:07.340 --> 01:27:10.910
need to worry about linking in
libraries when compiling your code.

01:27:10.910 --> 01:27:14.940
We, the staff, have configured make to
do all of that for you automatically.

01:27:14.940 --> 01:27:17.030
We want you to understand
that it is doing it,

01:27:17.030 --> 01:27:19.340
but we'll take care of
all of the -l's for you.

01:27:19.340 --> 01:27:23.360
But the onus is on you for the
prototypes and the header files.

01:27:23.360 --> 01:27:27.150
Other questions on these
representations or techniques?

01:27:27.150 --> 01:27:27.650
Yeah?

01:27:27.650 --> 01:27:35.920
AUDIENCE: [INAUDIBLE] exclamation mark.

01:27:35.920 --> 01:27:40.524
How does it actually define
the spaces [INAUDIBLE]??

01:27:40.524 --> 01:27:41.920
DAVID MALAN: A good question.

01:27:41.920 --> 01:27:45.700
If you were to have a string with actual
spaces in it that is multiple words,

01:27:45.700 --> 01:27:47.530
what would the computer actually do?

01:27:47.530 --> 01:27:49.960
Well, for this, let me
go to asciichart.com,

01:27:49.960 --> 01:27:54.880
which is just a random website that's
my go-to for the first 127 characters

01:27:54.880 --> 01:27:55.930
of ASCII.

01:27:55.930 --> 01:27:58.520
This is, in fact, what we had
a screenshot of the other day.

01:27:58.520 --> 01:28:02.088
And if you look here, it's a little
non-obvious, but S-P is space.

01:28:02.088 --> 01:28:05.380
If a computer were to store a space, it
would actually store the decimal number

01:28:05.380 --> 01:28:10.430
32, or technically, the pattern of 0s
and 1s that represent the number 32.
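
NOTE
A small sketch of the point about spaces: a char is stored as a number, so a space can be matched against 32 directly. Both function names are hypothetical, and the block assumes an ASCII-compatible character set, as in lecture.
```c
// Hypothetical helper: a char is already a small number, so a cast exposes it.
int ascii_code(char c)
{
    return (int) c;
}
// Counts spaces by comparing each character against the number 32.
int count_spaces(const char *s)
{
    int spaces = 0;
    for (int i = 0; s[i] != '\0'; i++)
    {
        if (s[i] == 32)
        {
            spaces++;
        }
    }
    return spaces;
}
```
A string with multiple words is still one array of characters. The spaces are ordinary elements, and only the final \0 marks the end.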

01:28:10.430 --> 01:28:13.240
All of the US English keys that
you might type on a keyboard

01:28:13.240 --> 01:28:16.390
can be represented with a
number, and using Unicode you

01:28:16.390 --> 01:28:18.920
can express even things like
emojis and other languages.

01:28:18.920 --> 01:28:19.420
Yeah?

01:28:19.420 --> 01:28:23.130
AUDIENCE: Are only strings
followed by nul number,

01:28:23.130 --> 01:28:26.516
or let's say we had a series of
numbers, would each one of them

01:28:26.516 --> 01:28:27.845
be accompanied by nuls?

01:28:27.845 --> 01:28:28.970
DAVID MALAN: Good question.

01:28:28.970 --> 01:28:31.790
Only strings are accompanied
by nuls at the end

01:28:31.790 --> 01:28:34.760
because every other data type
we've talked about thus far

01:28:34.760 --> 01:28:37.130
is of well defined finite length.

01:28:37.130 --> 01:28:40.190
1 byte for char, 4 bytes
for ints and so forth.
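
NOTE
A sketch of the sizes mentioned in this answer. The function names are hypothetical. A char is exactly 1 byte in C. An int is 4 bytes on most current systems, but the C standard does not guarantee that, so treat the 4 as an assumption about the machine.
```c
#include <string.h>
// Bytes needed to store s: one per character plus one for the '\0' at the end.
int bytes_for_string(const char *s)
{
    return (int) strlen(s) + 1;
}
// A char always occupies exactly 1 byte in C.
int bytes_for_char(void)
{
    return (int) sizeof(char);
}
// Typically 4 on current systems, though the standard leaves the size open.
int bytes_for_int(void)
{
    return (int) sizeof(int);
}
```
So "HI!" needs 4 bytes, not 3: the extra byte is the terminator that fixed-size types like char and int never need.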

01:28:40.190 --> 01:28:44.240
If we think back to last week, we did
end the week with a couple of problems.

01:28:44.240 --> 01:28:48.080
Integer overflow, because 4 bytes, heck,
even 8 bytes is sometimes not enough.

01:28:48.080 --> 01:28:50.270
We also talked about
floating point imprecision.

01:28:50.270 --> 01:28:53.480
Thankfully in the world of scientific
computing and financial computing,

01:28:53.480 --> 01:28:56.930
there are libraries you can
use that draw inspiration

01:28:56.930 --> 01:28:58.820
from this idea of a
string, and they might

01:28:58.820 --> 01:29:02.640
use 9 bytes for an integer
value or maybe 20 bytes

01:29:02.640 --> 01:29:04.170
so that you can count really high.

01:29:04.170 --> 01:29:06.680
But they will then start to
manage that memory for you

01:29:06.680 --> 01:29:09.960
and what they're really probably doing
is just grabbing a whole bunch of bytes

01:29:09.960 --> 01:29:13.070
and somehow remembering how
long the sequence of bytes is.

01:29:13.070 --> 01:29:16.190
That's how these higher-level
libraries work, too.

01:29:16.190 --> 01:29:17.700
All right, this has been a lot.

01:29:17.700 --> 01:29:19.080
Let's take one more break here.

01:29:19.080 --> 01:29:20.670
We'll do a seven-minute break here.

01:29:20.670 --> 01:29:23.465
And when we come back, we'll
flesh out a few more details.

01:29:23.465 --> 01:29:26.390
All right.

01:29:26.390 --> 01:29:31.400
So we just saw strlen as an
example of a function that

01:29:31.400 --> 01:29:32.898
comes in the string library.

01:29:32.898 --> 01:29:35.690
Let's start to take more of these
library functions out for a spin.

01:29:35.690 --> 01:29:39.530
So we're not relying only on the
built ins that we saw last week.

01:29:39.530 --> 01:29:41.660
Let me switch over to VS Code

01:29:41.660 --> 01:29:46.040
and create a file called, say, string.c,

01:29:46.040 --> 01:29:48.115
to apply this lesson
learned, as follows.

01:29:48.115 --> 01:29:54.770
Let me include cs50.h,
stdio.h, and this new thing,

01:29:54.770 --> 01:29:57.260
string.h as well, at the top.

01:29:57.260 --> 01:29:59.698
I'm going to do the usual
int main(void) here.

01:29:59.698 --> 01:30:02.240
And then in this program suppose,
for the sake of discussion,

01:30:02.240 --> 01:30:05.540
that I didn't know about
%s for printf or, heck,

01:30:05.540 --> 01:30:09.300
maybe early on there
was no %s format code.

01:30:09.300 --> 01:30:12.420
And so there was no easy
way to print strings.

01:30:12.420 --> 01:30:15.830
Well, at least if we know that
strings are just arrays of characters,

01:30:15.830 --> 01:30:19.820
we could use %c as a
workaround, a solution to that,

01:30:19.820 --> 01:30:21.420
sort of, contrived problem.

01:30:21.420 --> 01:30:24.920
So let me ask myself for a
string s by using get string here

01:30:24.920 --> 01:30:27.500
and I'll ask the user for some input.

01:30:27.500 --> 01:30:33.260
And then, let me print out, say, output,
and all I want to do is print back out

01:30:33.260 --> 01:30:34.460
what the user typed.

01:30:34.460 --> 01:30:38.000
Now, the simplest way to do this, of
course, is going to be like last week,

01:30:38.000 --> 01:30:40.960
printf %s, and plug in
the s, and we're done.

01:30:40.960 --> 01:30:43.730
But again, for the sake of
discussion, I forgot about,

01:30:43.730 --> 01:30:47.820
or someone didn't implement %s,
so how else could we do this?

01:30:47.820 --> 01:30:51.800
Well, in pseudo code, or in English
what's the gist of how we could solve

01:30:51.800 --> 01:30:58.910
this problem, printing out the string
s on the screen without using %s?

01:30:58.910 --> 01:31:02.420
How might we go about solving this?

01:31:02.420 --> 01:31:04.147
Just in English, high-level?

01:31:04.147 --> 01:31:05.730
What would your pseudo code look like?

01:31:05.730 --> 01:31:06.230
Yeah?

01:31:06.230 --> 01:31:09.568
AUDIENCE: You could
just print each letter.

01:31:09.568 --> 01:31:11.360
DAVID MALAN: OK, so
just print each letter.

01:31:11.360 --> 01:31:13.490
And maybe, more precisely,
some kind of loop.

01:31:13.490 --> 01:31:17.030
Like, let's iterate over
all of the characters in s

01:31:17.030 --> 01:31:18.150
and print one at a time.

01:31:18.150 --> 01:31:19.290
So how can I do that?

01:31:19.290 --> 01:31:24.050
Well, for int i gets 0 is kind of the
go-to starting point for most loops,

01:31:24.050 --> 01:31:25.580
i is less than--

01:31:25.580 --> 01:31:27.365
OK, how long do I want to iterate?

01:31:27.365 --> 01:31:29.240
Well, it's going to
depend on what I type in,

01:31:29.240 --> 01:31:31.300
but that's why we have strlen now.

01:31:31.300 --> 01:31:36.080
So iterate up to the length of
s, and then increment i with plus

01:31:36.080 --> 01:31:37.075
plus on each iteration.

01:31:37.075 --> 01:31:40.670
And then let's just print
out %c with no new line,

01:31:40.670 --> 01:31:43.010
because I want everything
on the same line,

01:31:43.010 --> 01:31:47.780
whatever the character
is at s bracket i.

01:31:47.780 --> 01:31:49.790
And then at the very
end, I'll give myself

01:31:49.790 --> 01:31:52.350
that new line, just to move the
cursor down to the next line

01:31:52.350 --> 01:31:54.350
so the dollar sign is
not in a weird place.

01:31:54.350 --> 01:31:57.230
All right, so let's see if I
didn't screw up any of the code,

01:31:57.230 --> 01:32:02.690
make string, Enter, so far so good,
./string, and let me type in something

01:32:02.690 --> 01:32:04.520
like, hi, Enter.

01:32:04.520 --> 01:32:06.020
And I see output of hi, too.

01:32:06.020 --> 01:32:09.680
Let me do it once more with
bye, Enter, and that works, too.

01:32:09.680 --> 01:32:12.410
Notice I very deliberately
and quickly gave myself

01:32:12.410 --> 01:32:15.260
two spaces here and one space
here just because I, literally,

01:32:15.260 --> 01:32:18.620
wanted these things to line up properly,
and input is shorter than output.

01:32:18.620 --> 01:32:21.830
But that was just a
deliberate formatting detail.

01:32:21.830 --> 01:32:23.520
So this code is correct.

01:32:23.520 --> 01:32:29.240
Which is a claim I've made before,
but it's not well-designed.

01:32:29.240 --> 01:32:33.170
It is well-designed in that I'm using
someone else's library function,

01:32:33.170 --> 01:32:35.660
like, I've not reinvented
a wheel, there's no line 15

01:32:35.660 --> 01:32:38.270
or below, I didn't implement
string length myself.

01:32:38.270 --> 01:32:43.640
So I'm at least practicing
what I've preached.

01:32:43.640 --> 01:32:48.360
But there's still an
imperfection, a suboptimality.

01:32:48.360 --> 01:32:50.910
This one's really subtle though.

01:32:50.910 --> 01:32:54.330
And you have to think
about how loops work.

01:32:54.330 --> 01:32:58.640
What am I doing that's
not super efficient?

01:32:58.640 --> 01:32:59.870
Yeah, in back?

01:32:59.870 --> 01:33:03.178
AUDIENCE: [INAUDIBLE]
over and over again.

01:33:03.178 --> 01:33:04.970
DAVID MALAN: Yeah, this
is a little subtle.

01:33:04.970 --> 01:33:07.460
But if you think back to the
basic definition of a for loop

01:33:07.460 --> 01:33:10.070
and recall when I highlighted
things last week, what happens?

01:33:10.070 --> 01:33:12.830
Well, the first thing
is that i gets set to 0.

01:33:12.830 --> 01:33:14.310
Then we check the condition.

01:33:14.310 --> 01:33:15.560
How do we check the condition?

01:33:15.560 --> 01:33:18.380
We call strlen on s,
we get back an answer

01:33:18.380 --> 01:33:24.810
like 3 if it's a H-I exclamation point
and 0 is less than 3, so that's fine,

01:33:24.810 --> 01:33:26.570
and then we print out the character.

01:33:26.570 --> 01:33:29.060
Then we increment i from 0 to 1.

01:33:29.060 --> 01:33:30.468
We recheck the condition.

01:33:30.468 --> 01:33:31.760
How do I recheck the condition?

01:33:31.760 --> 01:33:34.100
I call strlen of s.

01:33:34.100 --> 01:33:36.890
Get back the same answer, 3.

01:33:36.890 --> 01:33:38.720
Compare 3 against 1.

01:33:38.720 --> 01:33:39.800
We're still good.

01:33:39.800 --> 01:33:44.690
So we print out another character. i
gets incremented again, i is now 2.

01:33:44.690 --> 01:33:46.035
We check the condition.

01:33:46.035 --> 01:33:46.910
What's the condition?

01:33:46.910 --> 01:33:47.960
Well, what's the string length of s?

01:33:47.960 --> 01:33:48.980
It's still 3.

01:33:48.980 --> 01:33:51.860
2 is still less than 3.

01:33:51.860 --> 01:33:55.430
So I keep asking the same
question sort of stupidly

01:33:55.430 --> 01:33:58.220
because the string is, presumably,
never changing in length.

01:33:58.220 --> 01:34:00.158
And indeed, every time
I check that condition,

01:34:00.158 --> 01:34:01.700
that function is going to get called.

01:34:01.700 --> 01:34:04.380
And every time, the answer
for hi is going to be 3.

01:34:04.380 --> 01:34:04.880
3.

01:34:04.880 --> 01:34:06.095
3.

01:34:06.095 --> 01:34:10.850
So it's a marginal suboptimality,
but I could do better, right?

01:34:10.850 --> 01:34:15.560
Don't ask multiple times questions
that you can remember the answer to.

01:34:15.560 --> 01:34:20.960
So how could I remember the answer to
this question and ask it just once?

01:34:20.960 --> 01:34:24.750
How could I remember the
answer to this question?

01:34:24.750 --> 01:34:25.250
Let me see.

01:34:25.250 --> 01:34:26.030
Yeah, back there?

01:34:26.030 --> 01:34:27.446
AUDIENCE: Store it in a variable.

01:34:27.446 --> 01:34:29.180
DAVID MALAN: So store
it in a variable, right?

01:34:29.180 --> 01:34:32.097
That's been our answer most any time
we want to keep something around.

01:34:32.097 --> 01:34:33.120
So how could I do this?

01:34:33.120 --> 01:34:37.880
Well, I could do something like this,
int, maybe, length equals strlen of s.

01:34:37.880 --> 01:34:41.200
Then I can just change
this function call.

01:34:41.200 --> 01:34:43.160
Let me fix my spelling here.

01:34:43.160 --> 01:34:47.360
Let me fix this to be comparing
against length, and this is now OK.

01:34:47.360 --> 01:34:50.240
Because now strlen is only
called once on line 9.

01:34:50.240 --> 01:34:52.740
And I'm reusing the value
of that variable, a.k.a.

01:34:52.740 --> 01:34:54.240
length, again, and again, and again.

01:34:54.240 --> 01:34:55.282
So that's more efficient.

01:34:55.282 --> 01:34:59.760
Turns out that for loops let you
declare multiple variables at once,

01:34:59.760 --> 01:35:04.020
so we can do this a little
more elegantly all in one line.

01:35:04.020 --> 01:35:06.770
And this is just some
syntactic improvement.

01:35:06.770 --> 01:35:11.930
I could actually do something
like this, n equals strlen of s,

01:35:11.930 --> 01:35:14.750
and then I could just say n
here or I could call it length.

01:35:14.750 --> 01:35:17.667
But heck, while I'm being succinct
I'm just going to use n for number.

01:35:17.667 --> 01:35:22.100
So now it's just a marginal
change but I've now

01:35:22.100 --> 01:35:26.030
declared two variables
inside of my loop, i and n.

01:35:26.030 --> 01:35:29.300
i is set to 0. n is set
to the string length of s.

01:35:29.300 --> 01:35:33.380
But now, hereafter, all of my condition
checks are just, i less than n,

01:35:33.380 --> 01:35:36.170
i less than n, and n is never changing.
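
NOTE
A sketch of the improved loop, with strlen called once in the initializer rather than on every condition check. The function name print_chars and its return value are assumptions added so the behavior can be checked. const char * stands in for CS50's string type.
```c
#include <stdio.h>
#include <string.h>
// Prints s one character at a time with %c and returns how many were printed.
int print_chars(const char *s)
{
    int printed = 0;
    // strlen runs once, in the initializer; the condition only compares i and n.
    for (int i = 0, n = (int) strlen(s); i < n; i++)
    {
        printf("%c", s[i]);
        printed++;
    }
    // Move the cursor to the next line once the whole string is out.
    printf("\n");
    return printed;
}
```
Writing i < strlen(s) as the condition would give the same output, but the length would be recomputed on every pass through the loop.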

01:35:36.170 --> 01:35:38.008
All right, so a marginal
improvement there.

01:35:38.008 --> 01:35:39.800
Now that I've used this
new function, let's

01:35:39.800 --> 01:35:41.925
use some other functions
that might be of interest.

01:35:41.925 --> 01:35:48.680
Let me write a quick program here
that capitalizes the beginning of--

01:35:48.680 --> 01:35:51.810
changes to uppercase some
string that the user types in.

01:35:51.810 --> 01:35:55.490
So let me code a file
called uppercase.c.

01:35:55.490 --> 01:36:01.520
Up here I'll use my new friends,
cs50.h, and standard I/O, and string.h.

01:36:01.520 --> 01:36:07.070
So standard I/O, and string.h. So
just as before, int main(void).

01:36:07.070 --> 01:36:09.620
And then inside of main, what
I'm going to do this time,

01:36:09.620 --> 01:36:14.390
is let's ask the user for a string
s using get string asking them

01:36:14.390 --> 01:36:15.680
for the before value.

01:36:15.680 --> 01:36:20.130
And then let me print
out something like after.

01:36:20.130 --> 01:36:24.410
So that it-- just so I can see what
the uppercase version thereof is.

01:36:24.410 --> 01:36:28.610
And then after this, let me
do the following, for int, i

01:36:28.610 --> 01:36:32.030
equals 0, oh, let's
practice that same lesson,

01:36:32.030 --> 01:36:37.790
so n equals the string length of
s, i is less than n, i plus plus.

01:36:37.790 --> 01:36:41.600
So really, nothing
new, fundamentally yet.

01:36:41.600 --> 01:36:47.270
How do I now convert characters from
lowercase, if they are, to uppercase?

01:36:47.270 --> 01:36:50.000
In other words, if I type
in hi, H-I in lowercase,

01:36:50.000 --> 01:36:55.490
I want my program, now, to uppercase
everything to capital H, capital I.

01:36:55.490 --> 01:36:58.770
Well how can I go about doing this?

01:36:58.770 --> 01:37:01.010
Well you might recall
that there is this--

01:37:01.010 --> 01:37:03.900
you might recall that
there is this ASCII chart.

01:37:03.900 --> 01:37:06.855
So let's just consult this
real quick on asciichart.com.

01:37:06.855 --> 01:37:11.510
We looked at this last week.
Notice that a-- capital A is 65,

01:37:11.510 --> 01:37:15.440
capital B is 66, capital
C is 67, and heck, here's

01:37:15.440 --> 01:37:19.640
lowercase a, lowercase b,
lowercase c, and that's 97, 98, 99.

01:37:19.640 --> 01:37:22.980
And if I actually do some
math, there's a distance of 32.

01:37:22.980 --> 01:37:23.480
Right?

01:37:23.480 --> 01:37:25.640
So if I want to go from
uppercase to lowercase,

01:37:25.640 --> 01:37:30.788
I can do 65 plus 32 will give me
97 and that actually works out

01:37:30.788 --> 01:37:32.330
across the board for everything else.

01:37:32.330 --> 01:37:36.020
66 plus 32 gets me to 98 or lowercase b.

01:37:36.020 --> 01:37:40.640
Or conversely, if you have a
lowercase a, and its value is 97,

01:37:40.640 --> 01:37:46.850
subtract 32 and boom, you have capital
A. So there's some arithmetic involved.
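
NOTE
The arithmetic above, written out as a sketch. The names case_gap and to_lower_ascii are hypothetical, and the block assumes ASCII, where each lowercase letter sits exactly 32 above its uppercase partner.
```c
// The gap between matching lowercase and uppercase letters in ASCII.
int case_gap(void)
{
    return 'a' - 'A';
}
// Hypothetical helper: moves an uppercase letter to lowercase by adding the gap.
char to_lower_ascii(char c)
{
    if (c >= 'A' && c <= 'Z')
    {
        return (char) (c + 32);
    }
    // Anything that is not an uppercase letter is returned unchanged.
    return c;
}
```
So 65 plus 32 is 97, taking A to a, and subtracting 32 goes the other way.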

01:37:46.850 --> 01:37:49.460
But now that we know that
strings are just arrays,

01:37:49.460 --> 01:37:53.330
and we know that characters,
which are in those arrays,

01:37:53.330 --> 01:37:56.450
are just binary
representations of numbers,

01:37:56.450 --> 01:37:59.297
I think we can manipulate a
few of these things as follows.

01:37:59.297 --> 01:38:01.130
Let me go back to my
program here, and first

01:38:01.130 --> 01:38:05.360
ask the question, if the current
character in the array during this loop

01:38:05.360 --> 01:38:08.930
is lowercase, let's
force it to uppercase.

01:38:08.930 --> 01:38:10.250
So how am I going to do that?

01:38:10.250 --> 01:38:16.460
If the character at s bracket i,
the current location in the array,

01:38:16.460 --> 01:38:21.320
is greater than or equal to
lowercase a, and s bracket

01:38:21.320 --> 01:38:26.660
i is less than or equal to
lowercase z, kind of a weird Boolean

01:38:26.660 --> 01:38:31.460
expression but it's completely
legitimate, because in this array

01:38:31.460 --> 01:38:34.230
s is a whole bunch of characters
that the humans typed in,

01:38:34.230 --> 01:38:37.520
because that's what a string is,
greater than or equal to a might

01:38:37.520 --> 01:38:39.680
be a little nonsensical
because when have you ever

01:38:39.680 --> 01:38:41.330
compared numbers to letters?

01:38:41.330 --> 01:38:47.568
But we know from week 0 lowercase a
is 97, lowercase z is, what is it, 1?

01:38:47.568 --> 01:38:48.485
I don't even remember.

01:38:48.485 --> 01:38:49.065
AUDIENCE: 132.

01:38:49.065 --> 01:38:49.850
DAVID MALAN: What's that?

01:38:49.850 --> 01:38:50.590
AUDIENCE: 132?

01:38:50.590 --> 01:38:52.590
DAVID MALAN: 132, We know.

01:38:52.590 --> 01:38:56.390
And so that would allow us to answer
the question is the current letter

01:38:56.390 --> 01:38:57.410
lowercase?

01:38:57.410 --> 01:39:00.530
All right, so let me
answer that question.

01:39:00.530 --> 01:39:03.140
If it is, what do I want to print out?

01:39:03.140 --> 01:39:05.870
I don't want to print
out the letter itself,

01:39:05.870 --> 01:39:09.290
I want to print out the
letter minus 32, right?

01:39:09.290 --> 01:39:13.160
Because if it happens to be a
lowercase a, 97, 97 minus 32

01:39:13.160 --> 01:39:15.530
gives me 65, which is
uppercase A, and I know that

01:39:15.530 --> 01:39:18.860
just from having stared
at that chart in the past.

01:39:18.860 --> 01:39:24.172
Else if the character is not
between little a and little z,

01:39:24.172 --> 01:39:25.880
I'm just going to
print out the character

01:39:25.880 --> 01:39:28.550
itself by printing s bracket i.

01:39:28.550 --> 01:39:31.580
And at the very end of this, I'm
going to print out a new line just

01:39:31.580 --> 01:39:33.480
to move the cursor to the next line.

01:39:33.480 --> 01:39:34.930
So again, it's a little wordy.

01:39:34.930 --> 01:39:39.020
But this loop here, which I
borrowed from our code previously,

01:39:39.020 --> 01:39:41.510
just iterates over the string, a.k.a.

01:39:41.510 --> 01:39:44.630
array, character-by-character,
through its length.

01:39:44.630 --> 01:39:47.360
This line 11 here is
just asking the question

01:39:47.360 --> 01:39:50.870
if that current character,
the i-th character of s,

01:39:50.870 --> 01:39:53.900
is greater than or equal
to little a and less

01:39:53.900 --> 01:39:59.240
than or equal to little z, that
is between 97 and 132, then

01:39:59.240 --> 01:40:04.940
we're going to go ahead and
force it to uppercase instead.

01:40:04.940 --> 01:40:09.290
All right, and let me zoom
out here for just a second.

01:40:09.290 --> 01:40:14.270
And sorry, I misspoke: 122, which
is what you might have said.

01:40:14.270 --> 01:40:15.630
There's only 26 letters.

01:40:15.630 --> 01:40:17.270
So 122 is little z.

01:40:17.270 --> 01:40:20.280
Let me go ahead now and
compile and run this program.

01:40:20.280 --> 01:40:26.210
So make uppercase, ./uppercase, and
let me type in hi in lowercase, Enter.

01:40:26.210 --> 01:40:28.520
And there's the capitalized
version, thereof.

01:40:28.520 --> 01:40:30.920
Let me do it again, with
my own name in lowercase,

01:40:30.920 --> 01:40:33.100
and now it's capitalized as well.

01:40:33.100 --> 01:40:34.860
Well, what could we do to improve this?

01:40:34.860 --> 01:40:35.360
Well.

01:40:35.360 --> 01:40:35.960
You know what?

01:40:35.960 --> 01:40:37.640
Let's stop reinventing wheels.

01:40:37.640 --> 01:40:39.840
Let's go to the manual pages.

01:40:39.840 --> 01:40:43.490
So let me go here and search for
something like, I don't know,

01:40:43.490 --> 01:40:44.540
lowercase.

01:40:44.540 --> 01:40:45.620
And there I go.

01:40:45.620 --> 01:40:48.470
I did some auto complete
here, our little search box

01:40:48.470 --> 01:40:50.720
is saying that, OK there's
an is-lower function,

01:40:50.720 --> 01:40:52.550
check whether a character is lowercase.

01:40:52.550 --> 01:40:53.640
Well how do I use this?

01:40:53.640 --> 01:40:59.150
Well let me check, is lower, now I see
the actual man page for this function.

01:40:59.150 --> 01:41:01.850
Now we see, include ctype.h.

01:41:01.850 --> 01:41:02.902
So that's the protot--

01:41:02.902 --> 01:41:04.610
that's the header file
I need to include.

01:41:04.610 --> 01:41:08.570
This is the prototype for is-lower,
it apparently takes a char as input

01:41:08.570 --> 01:41:10.330
and returns an int.

01:41:10.330 --> 01:41:11.330
Which is a little weird.

01:41:11.330 --> 01:41:14.400
I feel like is-lower should
return true or false.

01:41:14.400 --> 01:41:18.680
So let's scroll down to the
description and return value.

01:41:18.680 --> 01:41:20.810
It returns, oh this is interesting.

01:41:20.810 --> 01:41:25.370
And this is a convention in C. This
function returns a non-zero int

01:41:25.370 --> 01:41:30.820
if c is a lowercase letter and 0
if c is not a lowercase letter.

01:41:30.820 --> 01:41:33.230
So it returns non-zero.

01:41:33.230 --> 01:41:38.330
So like 1, negative 1, something that's
not 0 if c is a lowercase letter,

01:41:38.330 --> 01:41:41.400
and 0 if it is not a lowercase letter.

01:41:41.400 --> 01:41:43.160
So how can we use this building block?

01:41:43.160 --> 01:41:45.230
Let me go back to my code here.

01:41:45.230 --> 01:41:49.610
Let me add this file, include ctype.h.

01:41:49.610 --> 01:41:53.120
And down here, let me get rid of
this cryptic expression, which

01:41:53.120 --> 01:41:59.060
was kind of painful to come up with,
and just ask this, is-lower s bracket i?

01:42:01.970 --> 01:42:05.390
That should actually work but why?

01:42:05.390 --> 01:42:10.520
Well is-lower, again, returns a non-zero
value if the letter is lowercase.

01:42:10.520 --> 01:42:12.150
Well, what does that mean?

01:42:12.150 --> 01:42:13.415
That means it could return 1.

01:42:13.415 --> 01:42:14.540
It could return negative 1.

01:42:14.540 --> 01:42:16.370
It could return 50 or negative 50.

01:42:16.370 --> 01:42:18.650
It's actually not
precisely defined, why?

01:42:18.650 --> 01:42:19.700
Just, because.

01:42:19.700 --> 01:42:23.750
This was a common convention to
use 0 to represent false and use

01:42:23.750 --> 01:42:26.120
any other value to represent true.

01:42:26.120 --> 01:42:30.140
And so it turns out, that
inside of Boolean expressions,

01:42:30.140 --> 01:42:34.755
if you put a value like a function
call like this, that returns 0,

01:42:34.755 --> 01:42:36.380
that's going to be equivalent to false.

01:42:36.380 --> 01:42:38.975
It's like the answer
being no, it is not lower.

01:42:38.975 --> 01:42:41.990
But you can also, in
parentheses, put the name

01:42:41.990 --> 01:42:45.920
of the function and its arguments,
and not compare it against anything.

01:42:45.920 --> 01:42:51.230
Because we could do something like
this, well if it's not equal to 0, then

01:42:51.230 --> 01:42:52.247
it must be lowercase.

01:42:52.247 --> 01:42:54.830
Because that's the definition,
if it returns a non-zero value,

01:42:54.830 --> 01:42:55.760
it's lowercase.

01:42:55.760 --> 01:42:59.210
But a more succinct way to do that
is just a bit more like English.

01:42:59.210 --> 01:43:04.110
If it's is lower, then print
out the character minus 32.

01:43:04.110 --> 01:43:06.590
So this would be the common
way of using one of these

01:43:06.590 --> 01:43:10.025
is- functions to check if
the answer is true or false.

01:43:10.025 --> 01:43:12.810
AUDIENCE: [INAUDIBLE]

01:43:12.810 --> 01:43:14.670
DAVID MALAN: OK, well we might be done.

01:43:14.670 --> 01:43:15.170
OK.

01:43:15.170 --> 01:43:16.922
AUDIENCE: [INAUDIBLE]

01:43:16.922 --> 01:43:17.900
DAVID MALAN: No.

01:43:17.900 --> 01:43:19.520
So it's not necessarily 1.

01:43:19.520 --> 01:43:23.180
It would be incorrect to check for
1, or negative 1, or anything else.

01:43:23.180 --> 01:43:25.550
You want to check for the opposite of 0.

01:43:25.550 --> 01:43:26.870
So not equal 0.

01:43:26.870 --> 01:43:31.820
Or more succinctly, like I did by
just putting it into parentheses.

01:43:31.820 --> 01:43:34.560
Let me see what happens here.

01:43:34.560 --> 01:43:38.690
So this is great, but some of you
might have spotted a better solution

01:43:38.690 --> 01:43:39.680
to this problem.

01:43:39.680 --> 01:43:42.230
A moment ago when we were on
the manual pages searching

01:43:42.230 --> 01:43:45.380
for things related to lowercase,
what might be another building

01:43:45.380 --> 01:43:46.475
block we can employ here?

01:43:49.160 --> 01:43:50.700
Based on what's on the screen here?

01:43:50.700 --> 01:43:51.200
Yeah?

01:43:51.200 --> 01:43:52.888
AUDIENCE: To-upper.

01:43:52.888 --> 01:43:54.140
DAVID MALAN: So to-upper.

01:43:54.140 --> 01:43:57.098
There's a function that would literally
do the uppercasing thing for me

01:43:57.098 --> 01:44:00.032
so I don't have to get into the
weeds of negative 32, plus 32.

01:44:00.032 --> 01:44:01.490
I don't have to consult that chart.

01:44:01.490 --> 01:44:05.120
Someone has solved this
problem for me in the past.

01:44:05.120 --> 01:44:09.680
And let's see if I can
actually get back to it.

01:44:09.680 --> 01:44:10.520
There we go.

01:44:10.520 --> 01:44:12.540
Let me go ahead, now, and use this.

01:44:12.540 --> 01:44:15.230
So instead of doing
s bracket i minus 32,

01:44:15.230 --> 01:44:19.880
let's use a function that someone else
wrote, and just say to-upper, s bracket

01:44:19.880 --> 01:44:20.420
i.

01:44:20.420 --> 01:44:23.250
And now it's going to
do the solution for me.

01:44:23.250 --> 01:44:30.530
So if I rerun make uppercase, and then
do, slowly, ./uppercase, type in hi,

01:44:30.530 --> 01:44:32.120
now it's working as expected.

01:44:32.120 --> 01:44:35.870
And honestly, if I read the
documentation for to-upper

01:44:35.870 --> 01:44:39.170
by going back to its man page,
or manual page, what you'll see

01:44:39.170 --> 01:44:44.420
is that it says if it's lowercase,
it will return the uppercase version

01:44:44.420 --> 01:44:45.050
thereof.

01:44:45.050 --> 01:44:48.913
If it's not lowercase, it's already
uppercase, it's punctuation,

01:44:48.913 --> 01:44:50.705
it will just return
the original character.

01:44:50.705 --> 01:44:53.900
Which means, thanks to this
function, I can actually

01:44:53.900 --> 01:44:57.650
tighten this up significantly,
get rid of all of my conditional

01:44:57.650 --> 01:45:02.030
there, and just print out
the to-upper return value,

01:45:02.030 --> 01:45:05.060
and leave it to whoever wrote
that function to figure out

01:45:05.060 --> 01:45:09.470
if something's uppercase or lowercase.

01:45:09.470 --> 01:45:13.820
All right, questions on
these kinds of tricks?

01:45:13.820 --> 01:45:17.090
Again, it all reduces to
week 0 basics, but we're just

01:45:17.090 --> 01:45:18.750
building these abstractions on top.

01:45:18.750 --> 01:45:19.250
Yeah?

01:45:19.250 --> 01:45:21.208
AUDIENCE: I'm wondering
if there's any way just

01:45:21.208 --> 01:45:25.110
to import all packages under
a certain subdomain instead

01:45:25.110 --> 01:45:27.120
of having to do multiple
[INAUDIBLE] statements,

01:45:27.120 --> 01:45:28.412
kind of like a star [INAUDIBLE]

01:45:28.412 --> 01:45:29.340
DAVID MALAN: Yes.

01:45:29.340 --> 01:45:30.180
Unfortunately, no.

01:45:30.180 --> 01:45:33.120
There is no easy way in C
to say, give me everything.

01:45:33.120 --> 01:45:35.670
That was for, historically,
performance reasons.

01:45:35.670 --> 01:45:38.940
They want you to be explicit
as to what you want to include.

01:45:38.940 --> 01:45:41.730
In other languages like
Python, Java, one of which

01:45:41.730 --> 01:45:44.513
we'll see later this term, you
can say, give me everything.

01:45:44.513 --> 01:45:47.430
But that, actually, tends not to be best
practice because it can slow down

01:45:47.430 --> 01:45:50.000
execution or compilation of your code.

01:45:50.000 --> 01:45:50.500
Yeah?

01:45:50.500 --> 01:45:52.845
AUDIENCE: Does to-upper
accommodate for special characters?

01:45:52.845 --> 01:45:53.340
DAVID MALAN: Ah.

01:45:53.340 --> 01:45:55.980
Does to-upper accommodate special
characters like punctuation?

01:45:55.980 --> 01:45:56.480
Yes.

01:45:56.480 --> 01:45:58.440
If I read the documentation
more pedantically,

01:45:58.440 --> 01:45:59.710
we would see exactly that.

01:45:59.710 --> 01:46:02.940
It will properly hand me
back an exclamation point,

01:46:02.940 --> 01:46:04.600
even if I passed it in.

01:46:04.600 --> 01:46:08.970
So if I do make uppercase here,
and let me do ./upper, sorry--

01:46:08.970 --> 01:46:13.620
./uppercase, hi with an exclamation
point, it's going to handle that, too,

01:46:13.620 --> 01:46:15.810
pass it through unchanged. Yeah?

01:46:15.810 --> 01:46:19.200
AUDIENCE: Do we have access to a
function that would do all of that

01:46:19.200 --> 01:46:21.590
but just to the screen
rather than to [INAUDIBLE]

01:46:21.590 --> 01:46:23.550
DAVID MALAN: Really good question, too.

01:46:23.550 --> 01:46:28.110
No, we do not have access to a function
that at least comes with C or comes

01:46:28.110 --> 01:46:31.740
with CS50's library that will just
force the whole thing to uppercase.

01:46:31.740 --> 01:46:34.170
In C, that's actually
easier said than done.

01:46:34.170 --> 01:46:35.550
In Python, it's trivial.

01:46:35.550 --> 01:46:39.810
So stay tuned for another language
that will let us do exactly that.

01:46:39.810 --> 01:46:42.510
All right, so what does
this leave us with?

01:46:42.510 --> 01:46:44.520
There's just a-- let's
come full circle now,

01:46:44.520 --> 01:46:47.490
to where we began today where we
were talking about those command line

01:46:47.490 --> 01:46:48.090
arguments.

01:46:48.090 --> 01:46:51.810
Recall that we talked about rm
taking a command line argument,

01:46:51.810 --> 01:46:54.470
the file you want to delete;
we talked about clang

01:46:54.470 --> 01:46:56.220
taking command line
arguments, that again,

01:46:56.220 --> 01:46:58.140
modify the behavior of the program.

01:46:58.140 --> 01:47:01.680
How is it that maybe you and I
can start to write programs that

01:47:01.680 --> 01:47:03.840
actually take command line arguments?

01:47:03.840 --> 01:47:07.620
Well here is where I
can finally explain why

01:47:07.620 --> 01:47:10.740
we've been typing int
main(void) for the past week

01:47:10.740 --> 01:47:14.490
and just asking that you take on faith
that it's just the way you do things.

01:47:14.490 --> 01:47:20.820
Well, by default in C, at least
the most recent versions thereof,

01:47:20.820 --> 01:47:24.010
there's only two official
ways to write main functions.

01:47:24.010 --> 01:47:26.460
You might see other formats
online, but they're generally

01:47:26.460 --> 01:47:28.870
not consistent with the
current specification.

01:47:28.870 --> 01:47:32.160
This, again, was sort of a
boilerplate for the simplest

01:47:32.160 --> 01:47:34.770
function we might write last
week, and recall that we've

01:47:34.770 --> 01:47:36.210
been doing this the whole time.

01:47:36.210 --> 01:47:40.990
What that (void) means, for all
of the programs I have written thus far

01:47:40.990 --> 01:47:43.890
and you have written thus far,
is that none of our programs

01:47:43.890 --> 01:47:47.040
that we've written take
command line arguments.

01:47:47.040 --> 01:47:49.110
That's what the void there means.

01:47:49.110 --> 01:47:53.950
It turns out that main is the way you
can specify that your program does,

01:47:53.950 --> 01:47:55.740
in fact, take command
line arguments, that

01:47:55.740 --> 01:47:59.760
is words after the command
in your terminal window.

01:47:59.760 --> 01:48:02.220
If you want to actually not
use get int or get string,

01:48:02.220 --> 01:48:05.970
you want the human to be able to
say something, like hello, David

01:48:05.970 --> 01:48:06.840
and hit Enter.

01:48:06.840 --> 01:48:09.940
And just run-- print
hello, David on the screen.

01:48:09.940 --> 01:48:14.460
You can use command line arguments,
words after the program name

01:48:14.460 --> 01:48:16.750
on your command line.

01:48:16.750 --> 01:48:20.460
So we're going to change this in a
moment to be something more verbose,

01:48:20.460 --> 01:48:23.930
but something that's now a bit
more familiar syntactically.

01:48:23.930 --> 01:48:28.440
If you change that (void) in main
to be this incantation instead,

01:48:28.440 --> 01:48:33.480
int, argc, comma, string, argv,
open bracket, close bracket,

01:48:33.480 --> 01:48:36.630
you are now giving yourself
access to writing programs

01:48:36.630 --> 01:48:38.910
that take command line arguments.

01:48:38.910 --> 01:48:42.120
Argc, which stands for
argument count is going

01:48:42.120 --> 01:48:46.410
to be an integer that stores how many
words the human typed at the prompt.

01:48:46.410 --> 01:48:49.050
C automatically gives that to you.

01:48:49.050 --> 01:48:52.710
String argv stands for
argument vector, that's

01:48:52.710 --> 01:48:57.100
going to be an array of all of the words
that the human typed at the prompt.

01:48:57.100 --> 01:48:59.130
So with today's building
block of an array,

01:48:59.130 --> 01:49:01.980
we have the ability now to let
the humans type as many words,

01:49:01.980 --> 01:49:03.900
or as few words, as
they want at the prompt.

01:49:03.900 --> 01:49:06.900
C is going to automatically put
them in an array called argv,

01:49:06.900 --> 01:49:12.360
and it's going to tell us how many
words there are in an int called argc.

01:49:12.360 --> 01:49:16.060
The int, as the return type here,
we'll come back to in just a moment.

01:49:16.060 --> 01:49:19.350
Let's use this definition
to make, maybe,

01:49:19.350 --> 01:49:20.970
just a couple of simple programs.

01:49:20.970 --> 01:49:23.070
But in problem set 2
we will actually use

01:49:23.070 --> 01:49:26.470
this to control the
behavior of your own code.

01:49:26.470 --> 01:49:33.120
Let me code up a file called
argv.0 just to keep it aptly named.

01:49:33.120 --> 01:49:35.700
Let me include cs50.h.

01:49:35.700 --> 01:49:37.240
Let me go ahead and include--

01:49:37.240 --> 01:49:37.740
oops.

01:49:37.740 --> 01:49:40.950
That is not the right name of a
program, let's start that over.

01:49:40.950 --> 01:49:45.450
Let's go ahead and code up argv.c.

01:49:45.450 --> 01:49:46.800
And here we have--

01:49:46.800 --> 01:49:52.890
include cs50.h, include
stdio.h, int, main, not void,

01:49:52.890 --> 01:50:00.025
let's actually say int, argc, string,
argv, open bracket, close bracket.

01:50:00.025 --> 01:50:02.400
No numbers in between because
you don't know, in advance,

01:50:02.400 --> 01:50:05.310
how many words the human's
going to type at their prompt.

01:50:05.310 --> 01:50:06.760
Now let's go ahead and do this.

01:50:06.760 --> 01:50:10.800
Let's write a very simple program that
just says, hello, David, hello, Carter,

01:50:10.800 --> 01:50:12.660
whoever the name is that gets typed.

01:50:12.660 --> 01:50:16.260
But not using get string, let's
instead have the human just

01:50:16.260 --> 01:50:19.890
type their name at the prompt, just like
rm, just like clang, just like make,

01:50:19.890 --> 01:50:22.170
so it's just one and
done when you hit Enter.

01:50:22.170 --> 01:50:23.610
No additional prompts.

01:50:23.610 --> 01:50:28.380
Let me go ahead then and do this,
printf, quote-unquote, hello,

01:50:28.380 --> 01:50:31.500
comma, and instead of world
today, I want to print out

01:50:31.500 --> 01:50:33.370
whatever the human typed in.

01:50:33.370 --> 01:50:38.850
So let's go ahead and do
this, argv, bracket 0 for now.

01:50:38.850 --> 01:50:43.080
But I don't think this is quite
what I want because, of course,

01:50:43.080 --> 01:50:48.370
that's going to literally print
out argv, bracket, 0, bracket.

01:50:48.370 --> 01:50:52.510
I need a placeholder, so let me
put %s here and then put that here.

01:50:52.510 --> 01:50:56.520
So if argv is an array, but
it's an array of strings,

01:50:56.520 --> 01:51:00.480
then argv bracket 0 is
itself a single string.

01:51:00.480 --> 01:51:03.450
And so it can be plugged
into that %s placeholder.

01:51:03.450 --> 01:51:05.740
Let me go ahead and save my program.

01:51:05.740 --> 01:51:09.340
And compile argv, so far, so good.

01:51:09.340 --> 01:51:13.170
Let me now type in my name
after the name of the program.

01:51:13.170 --> 01:51:13.980
So no get string.

01:51:13.980 --> 01:51:18.280
I'm literally typing an extra word,
my own name at the prompt, Enter.

01:51:18.280 --> 01:51:21.290
OK, it's apparently a little
buggy in a couple of ways.

01:51:21.290 --> 01:51:24.500
I forgot my \n but
that's not a huge deal.

01:51:24.500 --> 01:51:28.960
But apparently, inside of
argv is literally everything

01:51:28.960 --> 01:51:31.270
that the human typed in, including
the name of the program.

01:51:31.270 --> 01:51:36.250
So logically, how do I print out hello,
David, or hello so-and-so and not

01:51:36.250 --> 01:51:37.720
the actual name of the program?

01:51:37.720 --> 01:51:38.960
What needs to change here?

01:51:38.960 --> 01:51:39.460
Yeah?

01:51:39.460 --> 01:51:41.050
AUDIENCE: Change the index to 1.

01:51:41.050 --> 01:51:41.800
DAVID MALAN: Yeah.

01:51:41.800 --> 01:51:45.940
So presumably index to 1, if that's
the second thing I, or whichever human,

01:51:45.940 --> 01:51:46.940
has typed at the prompt.

01:51:46.940 --> 01:51:51.410
So let's do make argv
again, ./argv, Enter.

01:51:51.410 --> 01:51:52.090
Huh.

01:51:52.090 --> 01:51:53.630
Hello, null.

01:51:53.630 --> 01:51:55.690
So this is another form of null.

01:51:55.690 --> 01:51:59.320
But this is user error, now, on my part.

01:51:59.320 --> 01:52:01.070
I didn't do exactly what I said I would.

01:52:01.070 --> 01:52:01.570
Yeah?

01:52:01.570 --> 01:52:02.530
AUDIENCE: You forgot the parameter.

01:52:02.530 --> 01:52:04.430
DAVID MALAN: Yeah, I
forgot the parameter.

01:52:04.430 --> 01:52:05.700
So that's actually, hm.

01:52:05.700 --> 01:52:07.450
I should probably deal
with that, somehow,

01:52:07.450 --> 01:52:09.292
so that people aren't
breaking my program

01:52:09.292 --> 01:52:11.000
and printing out random
things, like null.

01:52:11.000 --> 01:52:14.770
But if I do say ./argv David,
now you see hello, David.

01:52:14.770 --> 01:52:18.070
I can get a little curious,
like what's at location 2?

01:52:18.070 --> 01:52:23.410
Well we can see, make argv,
bracket, ./argv, David, Enter.

01:52:23.410 --> 01:52:24.910
All right, so just nothing is there.

01:52:24.910 --> 01:52:28.202
But it turns out, in a couple of weeks,
we'll start really poking around memory

01:52:28.202 --> 01:52:30.310
and see if we can't crash
programs deliberately

01:52:30.310 --> 01:52:32.800
because nothing is
stopping me from saying,

01:52:32.800 --> 01:52:36.470
oh what's at location 2
million, for instance?

01:52:36.470 --> 01:52:38.350
We could really start to get curious.

01:52:38.350 --> 01:52:40.420
But for now, we'll do the right thing.

01:52:40.420 --> 01:52:44.360
But let's now make sure the human has
typed in the right number of words.

01:52:44.360 --> 01:52:50.920
So let's say this, if argc equals
2, that is the name of the program

01:52:50.920 --> 01:52:54.760
and one more word after that, go
ahead and trust that in argv 1,

01:52:54.760 --> 01:52:56.980
as you proposed, is the person's name.

01:52:56.980 --> 01:53:01.810
Else, let's go ahead and default
here to something simple and basic,

01:53:01.810 --> 01:53:05.860
like, well, if we don't get a name
from the user, just say hello, world,

01:53:05.860 --> 01:53:07.300
like always.

01:53:07.300 --> 01:53:10.045
So now we're programming defensively.

01:53:10.045 --> 01:53:13.090
This time the human, even if they
screw up, they don't give us a name

01:53:13.090 --> 01:53:15.965
or they give us too many names,
we're just going to say hello, world,

01:53:15.965 --> 01:53:17.890
because I now have some
error handling here.

01:53:17.890 --> 01:53:22.030
Because, again, argc is argument
count, the number of words, total,

01:53:22.030 --> 01:53:23.990
typed at the command line.

01:53:23.990 --> 01:53:26.740
So make, argv, ./argv.

01:53:26.740 --> 01:53:28.540
Let me make the same mistake as before.

01:53:28.540 --> 01:53:29.050
OK.

01:53:29.050 --> 01:53:30.910
I don't get this weird null behavior.

01:53:30.910 --> 01:53:32.350
I get something well-defined.

01:53:32.350 --> 01:53:33.610
I could now do David.

01:53:33.610 --> 01:53:36.850
I could do David Malan, but
that's not currently supported.

01:53:36.850 --> 01:53:41.290
I would need to alter my logic to
support more than just two words

01:53:41.290 --> 01:53:42.345
after the prompt.

01:53:42.345 --> 01:53:43.770
So what's the point of this?

01:53:43.770 --> 01:53:45.520
At the moment, it's
just a simple exercise

01:53:45.520 --> 01:53:50.702
to actually give myself a way of taking
user input when they run the program.

01:53:50.702 --> 01:53:52.660
Because, consider, it's
just more convenient in

01:53:52.660 --> 01:53:54.670
this new, command-line-interface world.

01:53:54.670 --> 01:53:58.857
If you had to use get string
every time you compile your code,

01:53:58.857 --> 01:54:00.190
it'd be kind of annoying, right?

01:54:00.190 --> 01:54:03.940
You type make, then you might get a
prompt, what would you like to make?

01:54:03.940 --> 01:54:07.690
Then you type in hello, or cash, or
something else, then you hit Enter,

01:54:07.690 --> 01:54:09.330
it just really slows the process.

01:54:09.330 --> 01:54:11.440
But in this
command-line-interface world,

01:54:11.440 --> 01:54:14.770
if you support command line arguments,
then you can use these little tricks.

01:54:14.770 --> 01:54:18.170
Like, scrolling up and down in
your history with your arrow keys.

01:54:18.170 --> 01:54:22.430
You can just type commands more quickly
because you can do it all at once.

01:54:22.430 --> 01:54:25.000
And you don't have to keep
prompting the user, more

01:54:25.000 --> 01:54:27.760
pedantically, for more and more info.

01:54:27.760 --> 01:54:30.280
So any questions then on
command line arguments?

01:54:30.280 --> 01:54:34.000
Which, finally, reveals why
we had (void) initially,

01:54:34.000 --> 01:54:36.610
but what more we can now put in main.

01:54:36.610 --> 01:54:39.070
That's how you take
command line arguments.

01:54:39.070 --> 01:54:40.500
Yeah?

01:54:40.500 --> 01:54:42.610
AUDIENCE: If you were to put--

01:54:42.610 --> 01:54:47.320
if you were to use argv, and you
were to put integers inside of it,

01:54:47.320 --> 01:54:49.923
would it still give you, like, a string?

01:54:49.923 --> 01:54:51.506
Would that still be considered string?

01:54:51.506 --> 01:54:52.923
Or would you consider [INAUDIBLE]?

01:54:52.923 --> 01:54:53.760
DAVID MALAN: Yes.

01:54:53.760 --> 01:54:56.550
If you were to type at
the command line something

01:54:56.550 --> 01:55:00.660
like, not a word, but
something like the number 42,

01:55:00.660 --> 01:55:03.450
that would actually be
treated as a string.

01:55:03.450 --> 01:55:04.290
Why?

01:55:04.290 --> 01:55:06.220
Because again, context matters.

01:55:06.220 --> 01:55:08.940
So if your program is
currently manipulating memory

01:55:08.940 --> 01:55:12.510
as though it's characters or strings,
whatever those patterns of 0s and 1s

01:55:12.510 --> 01:55:16.800
are, they will be interpreted
as ASCII text, or Unicode text.

01:55:16.800 --> 01:55:20.640
If we therefore go to the chart here,
that might make you wonder, well,

01:55:20.640 --> 01:55:24.510
then how do you distinguish numbers
from letters in the context of something

01:55:24.510 --> 01:55:25.890
like chars and strings?

01:55:25.890 --> 01:55:34.380
Well, notice 65 is A, 97 is a,
but also 49 is 1, and 50 is 2.

01:55:34.380 --> 01:55:37.500
So the designers of ASCII,
and then later Unicode,

01:55:37.500 --> 01:55:40.680
realized well wait a minute,
if we want to support programs

01:55:40.680 --> 01:55:43.440
that let you type things
that look like numbers,

01:55:43.440 --> 01:55:46.350
even though they're not
technically ints or floats,

01:55:46.350 --> 01:55:50.620
we need a way in ASCII and
Unicode to represent numbers, too.

01:55:50.620 --> 01:55:51.870
So here are your numbers.

01:55:51.870 --> 01:55:55.210
And it's a little silly that we have
numbers representing other numbers.

01:55:55.210 --> 01:55:57.863
But again, if you're in the
world of letters and characters,

01:55:57.863 --> 01:56:00.030
you've got to come up with
a mapping for everything.

01:56:00.030 --> 01:56:01.790
And notice here, here's the dot.

01:56:01.790 --> 01:56:06.390
Even if you were to represent 1.23
as a string, or as characters,

01:56:06.390 --> 01:56:10.840
even the dot now is going to be
represented as an ASCII character.

01:56:10.840 --> 01:56:12.930
So again, context here matters.

01:56:12.930 --> 01:56:17.370
All right, one final example
to tease apart what this int is

01:56:17.370 --> 01:56:19.840
and what it's been
doing here for so long.

01:56:19.840 --> 01:56:24.780
So I'm going to add one
bit of logic to a new file

01:56:24.780 --> 01:56:27.750
that I'm going to call exit.c.

01:56:27.750 --> 01:56:29.130
So in exit.c.

01:56:29.130 --> 01:56:32.880
We're going to introduce what are
generally known as exit statuses.

01:56:32.880 --> 01:56:34.980
It turns out this is not
a feature we've used yet,

01:56:34.980 --> 01:56:37.240
but it's just useful to know about.

01:56:37.240 --> 01:56:40.350
Especially when automating
tests of your own code.

01:56:40.350 --> 01:56:44.115
When it comes to figuring out if
a program succeeded or failed.

01:56:44.115 --> 01:56:48.870
It turns out that main has one
more feature we haven't leveraged.

01:56:48.870 --> 01:56:54.330
An ability to signal to the user
whether something was successful or not.

01:56:54.330 --> 01:56:57.760
And that's by way of
main's return value.

01:56:57.760 --> 01:57:02.060
So I'm going to modify this
program as follows, like this.

01:57:02.060 --> 01:57:04.920
Suppose I want to write
a similar program that

01:57:04.920 --> 01:57:07.900
requires that the user
type a word at the prompt.

01:57:07.900 --> 01:57:12.450
So that argc has to be 2
for whatever design purpose.

01:57:12.450 --> 01:57:18.990
If argc does not equal 2, I want to
quit out of my program prematurely.

01:57:18.990 --> 01:57:22.590
I want to insist that the user
operate the program correctly.

01:57:22.590 --> 01:57:28.800
So I might give them an error message
like, missing command line argument \n.

01:57:28.800 --> 01:57:31.180
But now I want to quit
out of the program.

01:57:31.180 --> 01:57:32.310
Now how can I do that?

01:57:32.310 --> 01:57:37.260
The right way, quote-unquote, to do
that is to return a value from main.

01:57:37.260 --> 01:57:40.590
Now it's a little weird
because no one called main yet,

01:57:40.590 --> 01:57:42.990
right, main just gets
called automatically,

01:57:42.990 --> 01:57:45.300
but the convention is
anytime something goes

01:57:45.300 --> 01:57:50.100
wrong in a program you should
return a non-zero value from main.

01:57:50.100 --> 01:57:51.780
1 is fine as a go-to.

01:57:51.780 --> 01:57:55.470
We don't need to get into the weeds of
having many different exit statuses,

01:57:55.470 --> 01:57:56.220
so to speak.

01:57:56.220 --> 01:58:01.770
But if you return 1, that is a clue to
the system, the Mac, the PC, the cloud

01:58:01.770 --> 01:58:03.430
device that something went wrong.

01:58:03.430 --> 01:58:03.930
Why?

01:58:03.930 --> 01:58:05.670
Because 1 is not 0.

01:58:05.670 --> 01:58:11.460
If everything works fine, like, let's go
ahead and print out hello comma %s like

01:58:11.460 --> 01:58:16.620
before, quote-unquote argv bracket 1.

01:58:16.620 --> 01:58:19.080
So this is just a version of
the program without an else.

01:58:19.080 --> 01:58:21.390
So this is the same
as doing, essentially,

01:58:21.390 --> 01:58:23.580
an else here like I did earlier.

01:58:23.580 --> 01:58:26.740
I want to signal to the
computer that all is well.

01:58:26.740 --> 01:58:28.290
And so I return 0.

01:58:28.290 --> 01:58:31.650
But strictly speaking, if
I'm already returning here,

01:58:31.650 --> 01:58:34.560
I don't technically need, if
I really want to be nit picky,

01:58:34.560 --> 01:58:36.870
I don't technically need the
else because the only way

01:58:36.870 --> 01:58:41.486
I'm going to get to line 11
is if I didn't already return.

01:58:41.486 --> 01:58:43.180
So what's going on here?

01:58:43.180 --> 01:58:46.530
The only new thing here logically,
is that for the first time ever,

01:58:46.530 --> 01:58:48.810
I'm returning a value from main.

01:58:48.810 --> 01:58:50.730
That's something I
could always have done

01:58:50.730 --> 01:58:55.290
because main has always been defined by
us as returning an int.

01:58:55.290 --> 01:58:59.880
By default, main automatically,
sort of secretly, returns 0 for you.

01:58:59.880 --> 01:59:02.850
If you've never once used the
return keyword, which you probably

01:59:02.850 --> 01:59:05.370
haven't in main, it just
automatically returns 0

01:59:05.370 --> 01:59:07.295
and the system assumes
that all went well.

01:59:07.295 --> 01:59:09.390
But now that we're starting
to get a little more

01:59:09.390 --> 01:59:11.520
sophisticated with our
code, and you, the programmer,

01:59:11.520 --> 01:59:15.480
know something went
wrong, you can abort programs early.

01:59:15.480 --> 01:59:20.610
You can exit out of them by returning
some other value, besides 0, from main.

01:59:20.610 --> 01:59:23.040
And this is fortuitous
that it's an int, right?

01:59:23.040 --> 01:59:25.110
0 means everything worked.

01:59:25.110 --> 01:59:29.250
Unfortunately, in programming, there are
seemingly, an infinite number of things

01:59:29.250 --> 01:59:30.240
that can go wrong.

01:59:30.240 --> 01:59:33.210
And int gives you 4
billion possible codes

01:59:33.210 --> 01:59:36.455
that you can use, a.k.a. exit
statuses, to signify errors.

01:59:36.455 --> 01:59:39.930
So if you've ever on your Mac
or PC gotten some weird pop up

01:59:39.930 --> 01:59:43.320
that an error happened, sometimes,
there's a cryptic number in it.

01:59:43.320 --> 01:59:45.420
Maybe it's positive,
maybe it's negative.

01:59:45.420 --> 01:59:50.170
It might say error code 123, or
negative 49, or something like that.

01:59:50.170 --> 01:59:54.310
What you're generally seeing, are
these exit statuses, these return

01:59:54.310 --> 01:59:57.610
values from main in a program
that someone at Microsoft,

01:59:57.610 --> 02:00:01.120
or Apple, or somewhere else
wrote. Something went wrong, and

02:00:01.120 --> 02:00:05.980
they are unnecessarily showing you,
the user, what the error code is.

02:00:05.980 --> 02:00:09.100
If only so that, when you call
customer support or submit a ticket,

02:00:09.100 --> 02:00:12.190
you can tell them what exit
status you encountered,

02:00:12.190 --> 02:00:15.070
what error code you encountered.

02:00:15.070 --> 02:00:19.390
All right, any questions
on exit statuses,

02:00:19.390 --> 02:00:24.580
which is the last of our new
building blocks, for now?

02:00:24.580 --> 02:00:25.540
Any questions at all?

02:00:25.540 --> 02:00:26.040
Yeah?

02:00:26.040 --> 02:00:33.540
AUDIENCE: [INAUDIBLE] You know how
if you have get_string or get_int,

02:00:33.540 --> 02:00:35.418
if you want to make [INAUDIBLE]

02:00:35.418 --> 02:00:36.085
DAVID MALAN: No.

02:00:36.085 --> 02:00:39.265
The question is can you
do things again and again

02:00:39.265 --> 02:00:41.890
at the command line like you
could with get_string and get_int.

02:00:41.890 --> 02:00:43.870
Which, by default,
recall are automatically

02:00:43.870 --> 02:00:46.420
designed to keep prompting
the user in their own loop

02:00:46.420 --> 02:00:49.960
until they give you an int, or a
float, or the like. With command-line

02:00:49.960 --> 02:00:50.740
arguments, no.

02:00:50.740 --> 02:00:52.210
You're going to get an
error message but then

02:00:52.210 --> 02:00:54.002
you're going to be
returned to your prompt.

02:00:54.002 --> 02:00:57.387
And it's up to you to type
it correctly the next time.

02:00:57.387 --> 02:00:57.970
Good question.

02:00:57.970 --> 02:00:58.470
Yeah?

02:00:58.470 --> 02:01:03.435
AUDIENCE: [INAUDIBLE]
automatically for you.

02:01:03.435 --> 02:01:05.310
DAVID MALAN: If you
do not return a value

02:01:05.310 --> 02:01:08.730
explicitly, main will
automatically return 0 for you.

02:01:08.730 --> 02:01:12.640
That is simply the way C works,
so it's not strictly necessary.

02:01:12.640 --> 02:01:15.510
But now that we're starting
to return values explicitly,

02:01:15.510 --> 02:01:18.090
if something goes wrong,
it would be good practice

02:01:18.090 --> 02:01:21.480
to also start returning a value
from main when something goes right

02:01:21.480 --> 02:01:23.775
and there are no errors.

02:01:23.775 --> 02:01:27.810
So let's now get out of
the weeds and contextualize

02:01:27.810 --> 02:01:31.200
this for some actual problems that
we'll be solving in the coming days

02:01:31.200 --> 02:01:33.130
by way of problem set 2 and beyond.

02:01:33.130 --> 02:01:35.740
So here for instance--

02:01:35.740 --> 02:01:39.990
So here for instance, is a
problem that you might think back

02:01:39.990 --> 02:01:43.980
to when you were a kid: the
readability of some text or some book,

02:01:43.980 --> 02:01:46.230
the grade level in which
some book is written.

02:01:46.230 --> 02:01:49.740
If you're a young student, you
might read at first-grade level

02:01:49.740 --> 02:01:51.240
or third-grade level in the US.

02:01:51.240 --> 02:01:53.032
Or, if you're in college
presumably, you're

02:01:53.032 --> 02:01:54.945
reading at a university level of text.

02:01:54.945 --> 02:01:58.073
But what does it mean
for text, like in a book,

02:01:58.073 --> 02:02:00.240
or in an essay, or something
like that to correspond

02:02:00.240 --> 02:02:01.590
to some kind of grade level?

02:02:01.590 --> 02:02:04.950
Well, here's a quote-- a
title of a childhood book.

02:02:04.950 --> 02:02:07.590
One Fish, Two Fish, Red Fish, Blue Fish.

02:02:07.590 --> 02:02:10.840
What might the grade level be for
a book that has words like this?

02:02:10.840 --> 02:02:13.590
Maybe, when you were a kid or if
you have a sibling still reading

02:02:13.590 --> 02:02:16.260
these things, what might the
grade level of this thing be?

02:02:18.800 --> 02:02:19.590
Any guesses?

02:02:19.590 --> 02:02:20.090
Yeah?

02:02:20.090 --> 02:02:21.257
AUDIENCE: Before grade 1.

02:02:21.257 --> 02:02:22.340
DAVID MALAN: Sorry, again?

02:02:22.340 --> 02:02:23.382
AUDIENCE: Before grade 1.

02:02:23.382 --> 02:02:25.650
DAVID MALAN: Before grade
1 is, in fact, correct.

02:02:25.650 --> 02:02:27.290
So that's for really young kids?

02:02:27.290 --> 02:02:28.230
Why is that?

02:02:28.230 --> 02:02:29.180
Well, let's consider.

02:02:29.180 --> 02:02:32.210
These are pretty simple phrases, right?

02:02:32.210 --> 02:02:33.500
One fish, two fish, red--

02:02:33.500 --> 02:02:35.960
I mean there's not even
verbs in these sentences,

02:02:35.960 --> 02:02:40.040
they're just nouns and adjectives,
and very short sentences.

02:02:40.040 --> 02:02:42.200
And so that might be a
heuristic we could use.

02:02:42.200 --> 02:02:44.810
When analyzing text, well if
the words are kind of short,

02:02:44.810 --> 02:02:47.240
the sentences are kind of
short, everything's very simple,

02:02:47.240 --> 02:02:50.250
that's probably a very
young, or early, grade level.

02:02:50.250 --> 02:02:53.665
And so by one formulation, it might
indeed be even before grade 1,

02:02:53.665 --> 02:02:54.665
for someone quite young.

02:02:54.665 --> 02:02:55.670
How about this?

02:02:55.670 --> 02:02:58.022
Mr. and Mrs. Dursley, of
number 4, Privet Drive,

02:02:58.022 --> 02:03:00.980
were proud to say that they were
perfectly normal, thank you very much.

02:03:00.980 --> 02:03:02.960
They were the last
people you would expect

02:03:02.960 --> 02:03:05.120
to be involved in anything
strange or mysterious

02:03:05.120 --> 02:03:07.850
because they just didn't
hold with such nonsense.

02:03:07.850 --> 02:03:08.782
And, onward.

02:03:08.782 --> 02:03:10.490
All right, what grade
level is this book?

02:03:10.490 --> 02:03:11.778
AUDIENCE: Third.

02:03:11.778 --> 02:03:13.070
DAVID MALAN: OK, I heard third.

02:03:13.070 --> 02:03:14.585
AUDIENCE: What?

02:03:14.585 --> 02:03:15.980
DAVID MALAN: Seventh, fifth.

02:03:15.980 --> 02:03:17.150
OK, all over the place.

02:03:17.150 --> 02:03:20.540
But grade 7, according to
one particular measure.

02:03:20.540 --> 02:03:24.802
And whether or not we can debate exactly
what age you were when you read this,

02:03:24.802 --> 02:03:27.260
and maybe you're feeling ahead
of your time, or behind now.

02:03:27.260 --> 02:03:31.470
But here, we have a snippet of text.

02:03:31.470 --> 02:03:36.560
What makes this text assume an older
audience, a more mature audience,

02:03:36.560 --> 02:03:39.690
a higher grade level, would you think?

02:03:39.690 --> 02:03:40.190
Yeah?

02:03:40.190 --> 02:03:42.415
AUDIENCE: [INAUDIBLE]

02:03:42.415 --> 02:03:45.110
DAVID MALAN: Yeah, it's longer,
different types of words,

02:03:45.110 --> 02:03:47.513
there's commas now in
phrases, and so forth.

02:03:47.513 --> 02:03:49.680
So there's just some kind
of sophistication to this.

02:03:49.680 --> 02:03:52.280
So it turns out for the
upcoming problem set,

02:03:52.280 --> 02:03:55.370
among the things you'll do is
take, as input, texts like this

02:03:55.370 --> 02:03:56.510
and analyze them.

02:03:56.510 --> 02:03:59.072
Considering, well, how
many words are in the text?

02:03:59.072 --> 02:04:00.530
How many sentences are in the text?

02:04:00.530 --> 02:04:02.375
How many letters are in the text?

02:04:02.375 --> 02:04:06.170
And use those according to a
well-defined formula to prescribe what,

02:04:06.170 --> 02:04:09.680
exactly, the grade level of some
actual text-- there's the third--

02:04:09.680 --> 02:04:10.582
might actually be.
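
NOTE
A rough sketch of the counting described above. The formula is not named at this point in the lecture; this sketch assumes the Coleman-Liau index, 0.0588 * L - 0.296 * S - 15.8, where L is letters per 100 words and S is sentences per 100 words. The names counts, analyze, and grade are hypothetical. Treating every '.', '!' or '?' as a sentence end and every run of non-whitespace as a word is a simplification.
```c
#include <ctype.h>
#include <stdio.h>
// Hypothetical container for the three counts the lecture mentions.
typedef struct
{
    int letters;
    int words;
    int sentences;
} counts;
// Count letters, words, and sentences in one pass over the text.
counts analyze(const char *text)
{
    counts c = {0, 0, 0};
    int in_word = 0;
    for (int i = 0; text[i] != '\0'; i++)
    {
        unsigned char ch = (unsigned char) text[i];
        if (isalpha(ch))
        {
            c.letters++;
        }
        if (isspace(ch))
        {
            in_word = 0; // whitespace ends the current word
        }
        else if (!in_word)
        {
            in_word = 1; // first non-space character starts a new word
            c.words++;
        }
        if (ch == '.' || ch == '!' || ch == '?')
        {
            c.sentences++;
        }
    }
    return c;
}
// Assumed formula (Coleman-Liau index), rounded to the nearest whole grade.
int grade(counts c)
{
    if (c.words == 0)
    {
        return 0; // avoid dividing by zero on empty input
    }
    double L = 100.0 * c.letters / c.words;
    double S = 100.0 * c.sentences / c.words;
    double index = 0.0588 * L - 0.296 * S - 15.8;
    return (int) (index < 0 ? index - 0.5 : index + 0.5);
}
```
Calling grade(analyze("One fish. Two fish. Red fish. Blue fish.")) gives a value below 1 under these assumptions, which matches the "before grade 1" answer above.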

02:04:10.582 --> 02:04:12.790
Well what else are we going
to do in the coming days?

02:04:12.790 --> 02:04:15.410
Well I've alluded to this notion
of cryptography in the past.

02:04:15.410 --> 02:04:18.350
This notion of scrambling
information in such a way

02:04:18.350 --> 02:04:21.422
that you can hide the
contents of a message

02:04:21.422 --> 02:04:23.630
from someone who might
otherwise intercept it, right?

02:04:23.630 --> 02:04:26.130
The earliest form of this might
also be when you're younger,

02:04:26.130 --> 02:04:29.390
and you're in class, and you're passing
a note from one person to another,

02:04:29.390 --> 02:04:30.650
from yourself to someone else.

02:04:30.650 --> 02:04:32.960
You don't want to necessarily
write a note in English,

02:04:32.960 --> 02:04:35.120
or some other written
language. You might want

02:04:35.120 --> 02:04:37.430
to scramble it somehow, or encrypt it.

02:04:37.430 --> 02:04:40.460
Maybe you change the As
to a B, and the Bs to a C.

02:04:40.460 --> 02:04:42.770
So that if the teacher snaps
it up and intercepts it,

02:04:42.770 --> 02:04:45.200
they can't actually
understand what it is you've

02:04:45.200 --> 02:04:47.160
written because it's encrypted.

02:04:47.160 --> 02:04:49.610
So long as your friend,
the recipient of this note,

02:04:49.610 --> 02:04:51.890
knows how you manipulated it.

02:04:51.890 --> 02:04:55.640
How you added or subtracted
letters to each other,

02:04:55.640 --> 02:04:58.850
they can decrypt it, which
is to reverse that process.

02:04:58.850 --> 02:05:02.070
So formally, in the world of
cryptography and computer science,

02:05:02.070 --> 02:05:04.130
this is another problem to solve.

02:05:04.130 --> 02:05:07.173
Your input, though, when you have a
message you want to send securely,

02:05:07.173 --> 02:05:08.840
is what's generally known as plaintext.

02:05:08.840 --> 02:05:12.980
There's some algorithm that's
going to then encipher, or encrypt

02:05:12.980 --> 02:05:16.100
that information, into what's
called ciphertext, which

02:05:16.100 --> 02:05:18.650
is the scrambled version that
theoretically can get safely

02:05:18.650 --> 02:05:21.110
intercepted and your message
has not been spoiled,

02:05:21.110 --> 02:05:24.620
unless that interceptor
actually knows what algorithm

02:05:24.620 --> 02:05:27.150
you used inside of this process.

02:05:27.150 --> 02:05:29.720
So that would be generally
known as a cipher.

02:05:29.720 --> 02:05:33.080
The ciphers typically take,
though, not one input, but two.

02:05:33.080 --> 02:05:37.685
If, for instance, your cipher
is as simple as A becomes B,

02:05:37.685 --> 02:05:41.420
B becomes C, C becomes D,
dot dot dot, Z becomes A,

02:05:41.420 --> 02:05:45.140
you're essentially adding one to
every letter and encrypting it.

02:05:45.140 --> 02:05:47.750
Now that would be
what we call the key.

02:05:47.750 --> 02:05:51.470
You and the recipient both have to
agree, presumably, before class,

02:05:51.470 --> 02:05:55.280
in advance, what number you're
going to use that day to rotate,

02:05:55.280 --> 02:05:56.960
or change all of these letters by.

02:05:56.960 --> 02:06:00.410
Because when you add 1, they,
upon receiving your ciphertext,

02:06:00.410 --> 02:06:03.090
have to subtract 1 to
get back the answer.

02:06:03.090 --> 02:06:07.730
For instance, if the input,
plaintext, is HI!, as before,

02:06:07.730 --> 02:06:13.010
and the key is 1, the ciphertext using
this simple rotational algorithm,

02:06:13.010 --> 02:06:17.720
otherwise known as the Caesar cipher,
might be IJ, exclamation point.

02:06:17.720 --> 02:06:21.408
So it's similar, but it's at
least scrambled at first glance.

02:06:21.408 --> 02:06:23.450
And unless the teacher
really cares to figure out

02:06:23.450 --> 02:06:26.420
what algorithm are they using today,
or what key are they using today,

02:06:26.420 --> 02:06:29.700
it's probably sufficiently
secure for your purposes.

02:06:29.700 --> 02:06:31.160
How do you reverse the process?

02:06:31.160 --> 02:06:34.190
Well, your friend gets this
and reverses it by negative 1.

02:06:34.190 --> 02:06:38.630
So I becomes H, J becomes I,
and things like punctuation

02:06:38.630 --> 02:06:41.060
remain untouched at
least in this scheme.

02:06:41.060 --> 02:06:43.580
So let's consider one
final example here.

02:06:43.580 --> 02:06:51.080
If the input to the algorithm
is Uijt xbt DT50, and the key

02:06:51.080 --> 02:06:53.090
this time is negative 1.

02:06:53.090 --> 02:06:59.510
Such that now B should become A, and C
should become B, and A should become A.

02:06:59.510 --> 02:07:01.130
So we're going in the other direction.

02:07:01.130 --> 02:07:03.030
How might we analyze this?

02:07:03.030 --> 02:07:06.000
Well if we spread all the letters
out, and we start from left to right,

02:07:06.000 --> 02:07:11.780
and we start subtracting one from each letter,
U becomes T, I becomes H, J becomes I,

02:07:11.780 --> 02:07:17.220
T becomes S, X becomes W, A, was, D, T--

02:07:17.220 --> 02:07:18.270
this was CS50.

02:07:18.270 --> 02:07:19.470
We'll see you next time.

02:07:19.470 --> 02:07:21.320
[APPLAUSE]

02:07:20.000 --> 02:07:56.000
[MUSIC PLAYING]