WEBVTT X-TIMESTAMP-MAP=LOCAL:00:00:00.000,MPEGTS:900000 00:00:00.000 --> 00:00:03.493 [MUSIC PLAYING] 00:00:49.420 --> 00:00:51.760 DAVID MALAN: All right, so this is CS50. 00:00:51.760 --> 00:00:55.450 And this is week 2, wherein we're going to dive in a little more deeply 00:00:55.450 --> 00:00:56.720 to see this new language. 00:00:56.720 --> 00:00:58.720 And we're also going to take a look back at some 00:00:58.720 --> 00:01:02.350 of the concepts we looked at last week so that you can better understand some 00:01:02.350 --> 00:01:04.750 of the features of C and some of the steps 00:01:04.750 --> 00:01:06.830 you've been taking to make your code work. 00:01:06.830 --> 00:01:09.880 So we'll peel back some of the layers of abstraction from last week 00:01:09.880 --> 00:01:11.950 so that you better understand really what's going 00:01:11.950 --> 00:01:14.540 on underneath the hood of the computer. 00:01:14.540 --> 00:01:18.907 So, of course, last week, we began with perhaps the most canonical of programs 00:01:18.907 --> 00:01:20.740 in C, the most canonical of programs you can 00:01:20.740 --> 00:01:23.140 write pretty much in any language, which is that which 00:01:23.140 --> 00:01:25.030 says, quite simply, "hello, world." 00:01:25.030 --> 00:01:28.600 But recall that before actually running this program, 00:01:28.600 --> 00:01:31.600 we have to convert it into the language that computers themselves speak, 00:01:31.600 --> 00:01:35.080 which we defined last week as binary, 0's and 1's, otherwise known 00:01:35.080 --> 00:01:37.640 as machine language in this context. 00:01:37.640 --> 00:01:40.120 So we have to go somehow from this source code to something 00:01:40.120 --> 00:01:44.380 more like this machine code, the 0's and 1's that the computer actually 00:01:44.380 --> 00:01:45.250 understands. 00:01:45.250 --> 00:01:48.160 Now, you may recall too that we introduced a command for this. 00:01:48.160 --> 00:01:49.840 And that command was called make. 
00:01:49.840 --> 00:01:53.770 And literally via this command, "make hello," could we make a program 00:01:53.770 --> 00:01:54.490 called hello. 00:01:54.490 --> 00:01:55.840 And make was a little fancy. 00:01:55.840 --> 00:01:59.140 It assumed that if you want to make a program called hello, 00:01:59.140 --> 00:02:01.720 it would look for a file called hello.c. 00:02:01.720 --> 00:02:03.850 That just happens automatically for you. 00:02:03.850 --> 00:02:07.240 And the end result, of course, was an additional file called hello 00:02:07.240 --> 00:02:11.390 that would end up getting put into your current directory. 00:02:11.390 --> 00:02:14.360 So you could then do ./hello and be on your way. 00:02:14.360 --> 00:02:17.620 But it turns out that make is actually automating 00:02:17.620 --> 00:02:20.650 a more specific set of steps for us that we'll 00:02:20.650 --> 00:02:22.510 see a little more closely now instead. 00:02:22.510 --> 00:02:24.640 So on the screen here is exactly the same code 00:02:24.640 --> 00:02:27.370 that we wrote last week to say, quite simply, "hello, world." 00:02:27.370 --> 00:02:31.810 And recall that any time you run "make hello" or "make mario" 00:02:31.810 --> 00:02:34.120 or "make cash" or "make credit," any of the problems 00:02:34.120 --> 00:02:35.890 that you might have tackled more recently, 00:02:35.890 --> 00:02:38.260 you see some cryptic output on the screen. 00:02:38.260 --> 00:02:42.040 Hopefully, no red or yellow error messages, but even when all is well, 00:02:42.040 --> 00:02:45.770 you see this white text which is indicative of all having been well. 00:02:45.770 --> 00:02:49.240 And last week, we just kind of ignored this and moved on and immediately did 00:02:49.240 --> 00:02:51.400 something like ./hello. 
00:02:51.400 --> 00:02:53.528 But today, let's actually better understand 00:02:53.528 --> 00:02:55.570 what it is that we've been turning a blind eye to 00:02:55.570 --> 00:03:00.820 so that each week, as it passes, there's less and less that you don't understand 00:03:00.820 --> 00:03:04.220 the entirety of with respect to what's going on on your screen. 00:03:04.220 --> 00:03:08.020 So again, if I do ls here, we'll see not only hello.c, but also 00:03:08.020 --> 00:03:12.910 the executable program called hello that I actually created via make. 00:03:12.910 --> 00:03:14.450 But look at this output. 00:03:14.450 --> 00:03:16.930 There's some mention of something called Clang here. 00:03:16.930 --> 00:03:21.730 And then there are a lot of other words or cryptic phrases, something in computer 00:03:21.730 --> 00:03:24.500 speak here that have all of these hyphens in front of them. 00:03:24.500 --> 00:03:26.770 And it turns out that what make is doing for us 00:03:26.770 --> 00:03:31.870 is it's automating execution of a command more specifically called clang. 00:03:31.870 --> 00:03:35.860 Clang is actually the compiler that we alluded to last week, a compiler 00:03:35.860 --> 00:03:39.010 being a program that converts source code to machine code. 00:03:39.010 --> 00:03:41.590 We've actually been using Clang this whole time. 00:03:41.590 --> 00:03:44.860 But notice that Clang requires a bit more sophistication. 00:03:44.860 --> 00:03:48.200 You have to understand a bit more about what's going on in order to use it. 00:03:48.200 --> 00:03:51.190 So let me go ahead and remove the program called hello. 00:03:51.190 --> 00:03:54.160 I'm going to use the rm command that we saw briefly last time. 00:03:54.160 --> 00:03:55.900 I'm going to confirm by hitting y. 00:03:55.900 --> 00:04:00.010 And if I type ls again now, hello.c is the only file that remains. 00:04:00.010 --> 00:04:04.600 Well, temporarily, let me take away the ability to use make.
00:04:04.600 --> 00:04:07.000 And let's now use Clang directly. 00:04:07.000 --> 00:04:10.210 Clang is another program installed in CS50 IDE. 00:04:10.210 --> 00:04:13.540 It's a very popular compiler that you can download onto your own Macs and PCs 00:04:13.540 --> 00:04:14.320 as well. 00:04:14.320 --> 00:04:16.550 But running it is a little different. 00:04:16.550 --> 00:04:19.930 I'm going to go ahead and say clang and then the name of the file 00:04:19.930 --> 00:04:23.380 that I want to compile, hello.c being this one. 00:04:23.380 --> 00:04:24.970 I'm going to go ahead and hit Enter. 00:04:24.970 --> 00:04:27.340 And now nothing happens, seemingly. 00:04:27.340 --> 00:04:29.410 But frankly, as you've probably gleaned already, 00:04:29.410 --> 00:04:31.698 when nothing bad seems to happen, that implicitly 00:04:31.698 --> 00:04:33.490 tends to mean that something good happened. 00:04:33.490 --> 00:04:35.710 Your program compiled successfully. 00:04:35.710 --> 00:04:39.790 But curiously, if I type ls now, you don't see the program, hello. 00:04:39.790 --> 00:04:42.700 You see this weird file name called a.out. 00:04:42.700 --> 00:04:44.620 And this is actually a historical remnant. 00:04:44.620 --> 00:04:48.220 Years ago, when humans would use a compiler to compile their code, 00:04:48.220 --> 00:04:51.520 the default file name that every program was given 00:04:51.520 --> 00:04:54.460 was a.out for assembler output. 00:04:54.460 --> 00:04:55.670 More on that in a moment. 00:04:55.670 --> 00:04:57.670 But this is kind of a stupid name for a program. 00:04:57.670 --> 00:04:59.590 It's not at all descriptive of what it does. 00:04:59.590 --> 00:05:05.140 So it turns out that programs like Clang can be configured at the command line. 00:05:05.140 --> 00:05:08.140 The command line, again, refers to the blinking prompt where you can 00:05:08.140 --> 00:05:09.280 type commands.
00:05:09.280 --> 00:05:14.200 So indeed, I'm going to go ahead and remove this file now-- rm space a.out, 00:05:14.200 --> 00:05:15.550 and then confirm with y. 00:05:15.550 --> 00:05:18.520 And now I'm back to where I began with just hello.c. 00:05:18.520 --> 00:05:21.140 And let me go ahead now and do something a little different. 00:05:21.140 --> 00:05:27.700 I'm going to do "clang -o hello" and then the word "hello.c." 00:05:27.700 --> 00:05:29.950 And what I'm doing here is actually providing 00:05:29.950 --> 00:05:33.080 what we're going to start calling a command-line argument. 00:05:33.080 --> 00:05:37.330 So these commands, like make and rm, sometimes 00:05:37.330 --> 00:05:39.460 can just be run all by themselves. 00:05:39.460 --> 00:05:41.500 You just type a single word and hit Enter. 00:05:41.500 --> 00:05:44.980 But very often, we've seen that they take inputs in some sense. 00:05:44.980 --> 00:05:46.660 You type, "make hello." 00:05:46.660 --> 00:05:48.870 You type, "rm hello." 00:05:48.870 --> 00:05:51.030 And the second word, "hello," in those cases, 00:05:51.030 --> 00:05:53.910 is kind of an input to the command, otherwise 00:05:53.910 --> 00:05:56.980 now known as a command-line argument. 00:05:56.980 --> 00:05:58.480 It's an input to the command. 00:05:58.480 --> 00:06:01.710 So here, we have more command-line arguments. 00:06:01.710 --> 00:06:06.300 We've got the word "clang," which is the compiler we're about to run, "-o," 00:06:06.300 --> 00:06:09.230 which it turns out is shorthand notation for "output," 00:06:09.230 --> 00:06:10.875 so please output the following. 00:06:10.875 --> 00:06:12.000 What do you want to output? 00:06:12.000 --> 00:06:13.830 Well, the next word is "hello." 00:06:13.830 --> 00:06:16.210 And then the final word is "hello.c." 
00:06:16.210 --> 00:06:19.530 So long story short, this command, more verbose 00:06:19.530 --> 00:06:24.210 though it is, is saying, run Clang, output a file called hello, 00:06:24.210 --> 00:06:27.020 and take as input a file called hello.c. 00:06:27.020 --> 00:06:30.270 So when I run this command after hitting Enter, nothing again seems to happen. 00:06:30.270 --> 00:06:34.560 But if I type ls, I don't see that stupid default file name of a.out. 00:06:34.560 --> 00:06:37.590 Now I see the file name, hello. 00:06:37.590 --> 00:06:41.310 So this is how ultimately Clang is helping me compile my code. 00:06:41.310 --> 00:06:43.770 It's kind of automating all of those processes. 00:06:43.770 --> 00:06:48.210 But recall that that's not the only type of program we ran last week 00:06:48.210 --> 00:06:49.290 or wrote last week. 00:06:49.290 --> 00:06:52.425 Rather, we took code like this and began to enhance it 00:06:52.425 --> 00:06:53.550 with some additional lines. 00:06:53.550 --> 00:06:56.040 So version 2 of Hello, World actually involved 00:06:56.040 --> 00:06:59.730 prompting the user for input using CS50's get_string function, 00:06:59.730 --> 00:07:02.940 storing the output in a variable called name. 00:07:02.940 --> 00:07:07.028 But recall that we also had to add cs50.h at the top of the file. 00:07:07.028 --> 00:07:08.320 So let me go ahead and do that. 00:07:08.320 --> 00:07:12.060 Let me go ahead and remove hello because that's now the old version. 00:07:12.060 --> 00:07:18.510 Let me go in now and start updating my code here and go into my hello.c file, 00:07:18.510 --> 00:07:23.010 include cs50.h, now get myself a string called name, 00:07:23.010 --> 00:07:25.890 but we could call it anything, call the function get_string, 00:07:25.890 --> 00:07:30.990 and ask, "What's your name," question mark with a space at the very end 00:07:30.990 --> 00:07:32.340 just to create a gap.
00:07:32.340 --> 00:07:36.090 And then down here, instead of printing out "hello, world" always, 00:07:36.090 --> 00:07:40.110 let me print out "hello, %s," which, recall, is a placeholder, 00:07:40.110 --> 00:07:41.910 and output the person's name. 00:07:41.910 --> 00:07:44.760 So last week, the way we compiled this program was just 00:07:44.760 --> 00:07:47.140 "make hello," no different from now. 00:07:47.140 --> 00:07:52.200 But this week, suppose I were to instead get rid of make, only 00:07:52.200 --> 00:07:54.570 because it's sort of automating steps for me that I now 00:07:54.570 --> 00:07:56.250 want to understand in more detail. 00:07:56.250 --> 00:08:01.380 I could compile this program again with clang -o hello hello.c, so just 00:08:01.380 --> 00:08:06.600 a reapplication of that same idea of passing in three arguments, -o, hello, 00:08:06.600 --> 00:08:08.130 and hello.c. 00:08:08.130 --> 00:08:11.070 But the catch now is that I'm actually going 00:08:11.070 --> 00:08:12.880 to see one of these red error messages. 00:08:12.880 --> 00:08:14.880 And let's consider what this is actually saying. 00:08:14.880 --> 00:08:17.440 There's still going to be a bunch of cryptic stuff here. 00:08:17.440 --> 00:08:19.992 But notice, as always, we're going to see, hopefully, 00:08:19.992 --> 00:08:21.450 something that's a little familiar. 00:08:21.450 --> 00:08:23.520 So "undefined reference to get_string." 00:08:23.520 --> 00:08:26.790 I don't yet know what an undefined reference is, necessarily. 00:08:26.790 --> 00:08:28.440 I don't know what a linker command is. 00:08:28.440 --> 00:08:31.680 But I at least recognize there's something going on with get_string. 00:08:31.680 --> 00:08:33.100 And there's a reason for this.
00:08:33.100 --> 00:08:37.140 It turns out that when using a library, whether it's CS50's library or others' 00:08:37.140 --> 00:08:41.159 as well, it's sometimes not sufficient only to include the header 00:08:41.159 --> 00:08:43.650 file at the top of your own code. 00:08:43.650 --> 00:08:46.230 Sometimes, you additionally have to tell the computer 00:08:46.230 --> 00:08:52.350 where to find the 0's and 1's that someone has written to implement 00:08:52.350 --> 00:08:54.290 a function like get_string. 00:08:54.290 --> 00:08:58.860 So the header file, like cs50.h, just tells the compiler 00:08:58.860 --> 00:09:00.540 that the function exists. 00:09:00.540 --> 00:09:02.910 But there's a second mechanism that, up until now, 00:09:02.910 --> 00:09:05.730 has been automated for us, that tells the computer where 00:09:05.730 --> 00:09:10.560 to find the actual 0's and 1's that implement the functions in that header 00:09:10.560 --> 00:09:11.470 file. 00:09:11.470 --> 00:09:15.180 So with that said, I'm going to need to actually add another command-line 00:09:15.180 --> 00:09:16.710 argument to this command. 00:09:16.710 --> 00:09:22.140 And instead of doing clang -o hello hello.c, I'm going to additionally, 00:09:22.140 --> 00:09:26.250 and admittedly, cryptically, do -lcs50 at the end 00:09:26.250 --> 00:09:31.440 of this command, which quite simply means link in the CS50 library. 00:09:31.440 --> 00:09:33.750 So "link" is a term of art whose meaning 00:09:33.750 --> 00:09:35.560 we'll see in more detail in just a moment. 00:09:35.560 --> 00:09:39.330 But this additional final command-line argument tells Clang, 00:09:39.330 --> 00:09:42.660 you already know that a function like get_string exists. 00:09:42.660 --> 00:09:47.100 -lcs50 means when compiling hello.c, make 00:09:47.100 --> 00:09:51.570 sure to incorporate all of the machine code from CS50's library 00:09:51.570 --> 00:09:53.190 into your program as well.
00:09:53.190 --> 00:09:56.880 In short, it's something you have to do when you use certain libraries. 00:09:56.880 --> 00:10:00.610 So now when I hit Enter, all seems to be well because nothing bad got printed. 00:10:00.610 --> 00:10:02.520 If I type ls, I see hello. 00:10:02.520 --> 00:10:06.300 And voila, I can do ./hello, type in my name, David. 00:10:06.300 --> 00:10:08.520 And voila, "hello, David." 00:10:08.520 --> 00:10:10.740 So why didn't we do all of this last week? 00:10:10.740 --> 00:10:13.020 And frankly, we've made no fundamental progress. 00:10:13.020 --> 00:10:15.960 All we've done is reveal what's going on underneath the hood. 00:10:15.960 --> 00:10:19.650 But I'll claim that, frankly, compiling your code by typing out 00:10:19.650 --> 00:10:24.450 all of these verbose command-line arguments just gets tedious quickly. 00:10:24.450 --> 00:10:27.330 And so computer scientists and programmers, more specifically, 00:10:27.330 --> 00:10:29.370 tend to automate monotonous steps. 00:10:29.370 --> 00:10:33.360 So what's happening ultimately with make is that all of this 00:10:33.360 --> 00:10:34.800 is being automated for us. 00:10:34.800 --> 00:10:37.590 So when you typed "make hello" last week-- and henceforth, 00:10:37.590 --> 00:10:40.030 you're welcome to continue using make as well-- 00:10:40.030 --> 00:10:43.710 notice that it generates this extra-long command, some of which 00:10:43.710 --> 00:10:45.000 we haven't even talked about. 00:10:45.000 --> 00:10:47.400 But I do recognize clang at the beginning. 00:10:47.400 --> 00:10:51.170 I recognize hello.c here. 00:10:51.170 --> 00:10:54.330 I recognize -lcs50 here. 00:10:54.330 --> 00:10:56.490 But notice there's a bunch of other stuff as well, 00:10:56.490 --> 00:11:00.470 not only the -o hello, but also -lm, which 00:11:00.470 --> 00:11:03.500 refers to a math library, -lcrypt, which refers 00:11:03.500 --> 00:11:05.900 to a cryptography or an encryption library.
00:11:05.900 --> 00:11:08.810 In short, we the staff have preconfigured 00:11:08.810 --> 00:11:11.450 make to just make sure that when you compile your code, 00:11:11.450 --> 00:11:15.410 all of the requisite dependencies, libraries, and so forth, 00:11:15.410 --> 00:11:18.650 are available to you without having to worry about all 00:11:18.650 --> 00:11:20.100 of these command-line arguments. 00:11:20.100 --> 00:11:22.700 So henceforth, you can certainly compile your code 00:11:22.700 --> 00:11:24.650 in this way using Clang directly. 00:11:24.650 --> 00:11:27.740 Or you can come back full circle to where we were last week 00:11:27.740 --> 00:11:29.240 and just run "make hello." 00:11:29.240 --> 00:11:33.170 But there's a reason we run "make hello": executing all of those steps 00:11:33.170 --> 00:11:35.900 manually tends to just get tedious quickly. 00:11:35.900 --> 00:11:38.992 And so indeed, what we've done here is compile our code. 00:11:38.992 --> 00:11:41.450 And compiling means going from source code to machine code. 00:11:41.450 --> 00:11:44.660 But today, we revealed that there's a little more, indeed, 00:11:44.660 --> 00:11:46.610 going on underneath the hood, this "linking" 00:11:46.610 --> 00:11:49.800 that I referred to and a couple of other steps as well. 00:11:49.800 --> 00:11:53.900 So it turns out when you compile your code from source code to machine code, 00:11:53.900 --> 00:11:56.900 there are a few more steps that are ultimately involved. 00:11:56.900 --> 00:11:59.960 And when we say "compiling," we actually mean these four steps. 00:11:59.960 --> 00:12:03.350 And we're not going to dwell on these kinds of low-level details. 00:12:03.350 --> 00:12:05.690 But it's perhaps enlightening just to see 00:12:05.690 --> 00:12:09.830 a brief tour of what's going on when you start with your source code 00:12:09.830 --> 00:12:11.680 and end up trying to produce machine code. 00:12:11.680 --> 00:12:12.638 So let's consider this.
00:12:12.638 --> 00:12:14.660 This is step 1 that the computer is doing 00:12:14.660 --> 00:12:17.150 for you when you compile your code. 00:12:17.150 --> 00:12:19.640 So step 1 takes your own source code that 00:12:19.640 --> 00:12:21.260 looks a little something like this. 00:12:21.260 --> 00:12:24.650 And it preprocesses your code, top to bottom, left to right. 00:12:24.650 --> 00:12:27.170 And to preprocess your code essentially means 00:12:27.170 --> 00:12:30.500 that it looks for any lines that start with a hash symbol, so 00:12:30.500 --> 00:12:35.300 #include cs50.h, #include stdio.h. 00:12:35.300 --> 00:12:39.230 And what the preprocessing step does is it's kind of like a find and replace. 00:12:39.230 --> 00:12:42.330 It notices, oh, here's a #include line. 00:12:42.330 --> 00:12:49.790 Let me go ahead and copy the contents of that file, cs50.h, into your own code. 00:12:49.790 --> 00:12:54.290 Similarly, when I encounter #include stdio.h, let me, 00:12:54.290 --> 00:12:58.760 the so-called preprocessor, open that file, stdio.h, and copy/paste 00:12:58.760 --> 00:13:04.650 the contents of that file so that what's in the file now looks more like this. 00:13:04.650 --> 00:13:06.290 So this is happening automatically. 00:13:06.290 --> 00:13:08.240 You never have to do this manually. 00:13:08.240 --> 00:13:12.290 But why is there this preprocessing step? 00:13:12.290 --> 00:13:16.880 If you recall our discussion last week of these lines of code that 00:13:16.880 --> 00:13:19.910 tend to go at the top of your file, does anyone 00:13:19.910 --> 00:13:24.580 perceive what the preprocessor is doing for me and why? 00:13:24.580 --> 00:13:29.720 Why do I write code that has these hash symbols, like #include cs50.h 00:13:29.720 --> 00:13:33.350 and #include stdio.h, but this preprocessor apparently 00:13:33.350 --> 00:13:37.415 is automatically replacing those lines with the actual contents 00:13:37.415 --> 00:13:38.040 of those files? 
00:13:38.040 --> 00:13:42.740 What are these things here in yellow now? 00:13:42.740 --> 00:13:44.168 Yeah, Jack, what do you think? 00:13:44.168 --> 00:13:46.960 JACK: Is it defining all the functions for you to use in your code, 00:13:46.960 --> 00:13:48.740 otherwise the computer wouldn't know what to do? 00:13:48.740 --> 00:13:49.340 DAVID MALAN: Exactly. 00:13:49.340 --> 00:13:51.065 It's defining all of the functions in my code 00:13:51.065 --> 00:13:52.648 so that the computer knows what to do. 00:13:52.648 --> 00:13:56.113 Because remember that we ran into that sort of annoying bug last week, 00:13:56.113 --> 00:13:58.280 whereby I was trying to implement a function called, 00:13:58.280 --> 00:13:59.960 I think, get_positive_int. 00:13:59.960 --> 00:14:04.350 And recall that when I implemented that function at the bottom of my file, 00:14:04.350 --> 00:14:07.610 the compiler was kind of dumb in that it didn't realize 00:14:07.610 --> 00:14:09.440 that it existed because it was implemented 00:14:09.440 --> 00:14:11.220 all the way at the bottom of my file. 00:14:11.220 --> 00:14:16.040 So to Jack's point, by putting a mention of this function, a hint, if you will, 00:14:16.040 --> 00:14:18.950 at the very top, it's like training the compiler 00:14:18.950 --> 00:14:22.160 to know in advance that I don't know how it's implemented yet, 00:14:22.160 --> 00:14:23.850 but I know get_string is going to exist. 00:14:23.850 --> 00:14:27.540 I don't know how it's implemented yet, but I know printf is going to exist. 00:14:27.540 --> 00:14:31.400 So these header files that we've been including for the past week essentially 00:14:31.400 --> 00:14:34.190 contain all of the prototypes-- 00:14:34.190 --> 00:14:38.240 that is, all of the hints for all the functions that exist in the library-- 00:14:38.240 --> 00:14:42.710 so that the compiler, when compiling your code, knows from the top down 00:14:42.710 --> 00:14:45.690 that those functions will indeed exist.
00:14:45.690 --> 00:14:47.690 So the preprocessor just saves us the trouble 00:14:47.690 --> 00:14:50.480 of having to copy and paste all of these prototypes, if you will, 00:14:50.480 --> 00:14:52.830 all of these hints, ourselves. 00:14:52.830 --> 00:14:54.950 So what happens after that step there? 00:14:54.950 --> 00:14:55.777 What comes next? 00:14:55.777 --> 00:14:57.860 Well, there might very well be other header files. 00:14:57.860 --> 00:15:00.152 There might very well be other contents in those files. 00:15:00.152 --> 00:15:03.800 But for now, let's just assume that all that's in there is the prototype. 00:15:03.800 --> 00:15:06.770 So now compiling actually has a more precise meaning 00:15:06.770 --> 00:15:08.000 that we'll define today. 00:15:08.000 --> 00:15:11.690 To compile your code now means to take this C code 00:15:11.690 --> 00:15:17.215 and to convert it from source code here to another type of source code here. 00:15:17.215 --> 00:15:20.090 Now, this is probably going to be the most cryptic stuff we ever see. 00:15:20.090 --> 00:15:22.190 And this is not code you need to understand. 00:15:22.190 --> 00:15:25.460 But what's on the screen here is what's called assembly code. 00:15:25.460 --> 00:15:28.550 So long story short, there are a lot of different computers in the world. 00:15:28.550 --> 00:15:30.650 And specifically, there are a lot of different types 00:15:30.650 --> 00:15:35.730 of CPUs, Central Processing Units, the brains of a computer. 00:15:35.730 --> 00:15:39.680 And a CPU understands certain commands. 00:15:39.680 --> 00:15:43.880 And those commands tend to be expressed in this language called assembly code. 00:15:43.880 --> 00:15:46.597 Now, I honestly don't really understand most of this myself. 00:15:46.597 --> 00:15:49.680 It's certainly been a while since I even thought hard about assembly code.
00:15:49.680 --> 00:15:53.460 But if I highlight a few operative characters here, 00:15:53.460 --> 00:15:56.570 notice that there's mention of main, get_string, printf. 00:15:56.570 --> 00:16:00.170 So this is sort of like a lower-level implementation of main, 00:16:00.170 --> 00:16:03.420 of get_string and printf, in a different language called assembly. 00:16:03.420 --> 00:16:04.820 So you write the C code. 00:16:04.820 --> 00:16:08.630 The computer, though, converts it to a more computer-friendly language 00:16:08.630 --> 00:16:09.960 called assembly code. 00:16:09.960 --> 00:16:12.320 And decades ago, humans wrote this stuff. 00:16:12.320 --> 00:16:14.210 Humans wrote assembly code. 00:16:14.210 --> 00:16:17.585 But nowadays, we have C. And nowadays, we have languages like Python-- 00:16:17.585 --> 00:16:20.210 more on that in a few weeks-- that are just more user-friendly, 00:16:20.210 --> 00:16:22.310 even if it didn't feel like that this past week. 00:16:22.310 --> 00:16:26.180 Assembly code is a little closer to what the computer itself understands. 00:16:26.180 --> 00:16:27.740 But there's still another step. 00:16:27.740 --> 00:16:29.240 There's this step called assembling. 00:16:29.240 --> 00:16:31.910 And again, all of this is happening when you simply run 00:16:31.910 --> 00:16:34.580 make and, in turn, this command, clang. 00:16:34.580 --> 00:16:39.350 To assemble your code means to take this assembly code and finally convert it 00:16:39.350 --> 00:16:41.720 to machine code, 0's and 1's. 00:16:41.720 --> 00:16:43.460 So you write the source code. 00:16:43.460 --> 00:16:46.700 The compiler preprocesses it. 00:16:46.700 --> 00:16:49.550 Then it compiles it into assembly code. 00:16:49.550 --> 00:16:54.650 Then it assembles it into machine code until we have the actual 0's and 1's. 00:16:54.650 --> 00:16:56.610 But there's actually one final step.
00:16:56.610 --> 00:17:00.380 Even though the code that you wrote has been converted into 0's and 1's, it 00:17:00.380 --> 00:17:04.369 still needs to be linked in with the 0's and 1's that CS50 wrote 00:17:04.369 --> 00:17:07.280 and that the designers of the C language wrote years ago 00:17:07.280 --> 00:17:09.680 when implementing the CS50 library in our case, 00:17:09.680 --> 00:17:12.470 and the printf function in their case. 00:17:12.470 --> 00:17:15.950 So this is to say that when you have code like this, which includes 00:17:15.950 --> 00:17:20.270 the prototypes for functions like get_string and printf at the very 00:17:20.270 --> 00:17:24.440 top, these lines here in yellow are what are ultimately converted 00:17:24.440 --> 00:17:27.440 into 0's and 1's. 00:17:27.440 --> 00:17:32.270 We now have to combine those 0's and 1's with the 0's and 1's from cs50.c, 00:17:32.270 --> 00:17:35.030 which the staff wrote some time ago, and even a file 00:17:35.030 --> 00:17:38.588 called stdio.c, which the designers of C wrote years ago. 00:17:38.588 --> 00:17:40.880 And technically, it might be called something different 00:17:40.880 --> 00:17:41.802 underneath the hood. 00:17:41.802 --> 00:17:43.760 But there are really three files that are getting 00:17:43.760 --> 00:17:45.530 combined when you write your program. 00:17:45.530 --> 00:17:51.920 The first, I just claimed, once it's preprocessed and compiled 00:17:51.920 --> 00:17:55.760 and assembled, is then in this form of all 0's and 1's. 00:17:55.760 --> 00:17:58.130 Somewhere in CS50 IDE, there's a whole bunch 00:17:58.130 --> 00:18:00.800 of 0's and 1's representing cs50.c. 00:18:00.800 --> 00:18:03.410 Somewhere in CS50 IDE, there's another file 00:18:03.410 --> 00:18:08.840 representing the 0's and 1's for stdio.c. So this final fourth step, a.k.a.
00:18:08.840 --> 00:18:13.280 linking, just takes all of my 0's and 1's, all of CS50's 0's and 1's, all 00:18:13.280 --> 00:18:18.800 of printf's 0's and 1's, and links them all together into one big blob, 00:18:18.800 --> 00:18:23.870 if you will, that collectively represents your program, hello. 00:18:23.870 --> 00:18:26.960 So, my god, like, that's quite a mouthful and so many steps. 00:18:26.960 --> 00:18:31.250 And none of the steps I have described are really germane to you implementing 00:18:31.250 --> 00:18:35.090 Mario's pyramid or cash or credit, because what we've really 00:18:35.090 --> 00:18:37.340 been doing over the past week is taking all four 00:18:37.340 --> 00:18:40.880 of these fairly low-level, sophisticated concepts and, if you will, 00:18:40.880 --> 00:18:44.720 abstracting them away so that we just refer to this whole process 00:18:44.720 --> 00:18:46.310 as compiling. 00:18:46.310 --> 00:18:48.380 So even though, yes, technically, compiling 00:18:48.380 --> 00:18:51.320 is just one of the four steps, what a programmer typically 00:18:51.320 --> 00:18:54.470 means when saying compiling is, just with a wave of the hand, 00:18:54.470 --> 00:18:58.400 all of those lower-level details. 00:18:58.400 --> 00:19:01.700 But it is the case that there are multiple steps happening underneath the hood. 00:19:01.700 --> 00:19:04.610 And this is what make and, in turn, Clang are doing for you, 00:19:04.610 --> 00:19:08.810 automating this process of going from source code to assembly code 00:19:08.810 --> 00:19:13.153 to machine code and then linking it all together with any libraries you 00:19:13.153 --> 00:19:13.820 might have used. 00:19:13.820 --> 00:19:15.800 So no longer take for granted what's happening. 00:19:15.800 --> 00:19:17.990 Hopefully, that offers you a bit more of a glimpse 00:19:17.990 --> 00:19:21.860 of what's actually happening when you compile your own code.
00:19:21.860 --> 00:19:24.800 Well, let me pause there, because that's quite a mouthful, 00:19:24.800 --> 00:19:29.660 and see if there are any questions on preprocessing, compiling, 00:19:29.660 --> 00:19:33.050 or assembling, or linking, a.k.a. 00:19:33.050 --> 00:19:35.120 compiling. 00:19:35.120 --> 00:19:37.550 And again, we won't dwell at this low level. 00:19:37.550 --> 00:19:40.640 We'll tend to now just abstract this all away if we can sort of agree 00:19:40.640 --> 00:19:42.540 that, OK, yes, there are those steps. 00:19:42.540 --> 00:19:45.290 But what's really important is the whole process, not the minutiae. 00:19:45.290 --> 00:19:46.260 Sophia? 00:19:46.260 --> 00:19:50.060 SOPHIA: I had a question about the first step, when 00:19:50.060 --> 00:19:53.720 we're replacing all the information at the top, 00:19:53.720 --> 00:19:56.790 is that information contained within the IDE? 00:19:56.790 --> 00:19:58.010 Or where do we-- 00:19:58.010 --> 00:20:00.375 are there files saved somewhere in that IDE, like, where 00:20:00.375 --> 00:20:02.000 it's getting all this information from? 00:20:02.000 --> 00:20:03.020 DAVID MALAN: Yeah, really good question. 00:20:03.020 --> 00:20:04.603 Where are all these files coming from? 00:20:04.603 --> 00:20:07.320 So yes, when you are using CS50 IDE, or frankly, 00:20:07.320 --> 00:20:09.830 if you're using your own Mac or your own PC, 00:20:09.830 --> 00:20:13.810 and you have preinstalled a compiler on your Mac or PC just like we have 00:20:13.810 --> 00:20:18.500 in CS50 IDE, what you get is a whole bunch of .h files somewhere 00:20:18.500 --> 00:20:19.700 on the computer system. 00:20:19.700 --> 00:20:23.950 You might also have a whole bunch of .c files, or compiled versions thereof, 00:20:23.950 --> 00:20:24.950 somewhere on the system. 00:20:24.950 --> 00:20:28.370 So yes, when you download and install a compiler, 00:20:28.370 --> 00:20:31.280 you are getting all of these libraries added for you.
00:20:31.280 --> 00:20:35.720 And we preinstalled an additional library called CS50's library that 00:20:35.720 --> 00:20:40.180 additionally comes with its own .h file and its own machine code as well. 00:20:40.180 --> 00:20:43.250 So all of those files are somewhere in CS50 IDE, 00:20:43.250 --> 00:20:46.460 or equivalently, in your own Mac or PC if you're working locally. 00:20:46.460 --> 00:20:48.620 And the compiler, Clang, in this case, just 00:20:48.620 --> 00:20:52.370 knows how to find that because one of the steps involved in installing 00:20:52.370 --> 00:20:55.130 your own compiler is making sure it's configured to know, 00:20:55.130 --> 00:20:58.010 per Sophia's question, where all those files are. 00:21:00.770 --> 00:21:02.990 [? Basili? ?] I'm sorry if I'm mispronouncing it. 00:21:02.990 --> 00:21:04.010 [? Basili? ?] 00:21:04.010 --> 00:21:06.800 [? BASILI: ?] So whenever we're compiling hello, 00:21:06.800 --> 00:21:11.960 for example, is the compiler also compiling, for example, CS50? 00:21:11.960 --> 00:21:16.387 Or does CS50 already exist in machine code somewhere beneath? 00:21:16.387 --> 00:21:18.220 DAVID MALAN: Yeah, really good question too. 00:21:18.220 --> 00:21:20.570 So I was kind of skirting this part of Sophia's question 00:21:20.570 --> 00:21:25.640 because technically speaking, probably cs50.c is not installed on the system. 00:21:25.640 --> 00:21:29.550 And technically, stdio.c is probably not installed in the system. 00:21:29.550 --> 00:21:30.050 Why? 00:21:30.050 --> 00:21:31.160 It just doesn't need to be. 00:21:31.160 --> 00:21:32.868 It would be kind of inefficient, that is, 00:21:32.868 --> 00:21:35.600 slow, if every time you compiled your own program, 00:21:35.600 --> 00:21:39.050 you had to additionally compile CS50's program, and stdio's program, 00:21:39.050 --> 00:21:40.020 and so forth. 
00:21:40.020 --> 00:21:42.740 So it actually stands to reason that what computers typically do 00:21:42.740 --> 00:21:46.490 is they precompile all of those library files for you 00:21:46.490 --> 00:21:48.823 so that more efficiently they can just be linked in. 00:21:48.823 --> 00:21:50.990 And you don't have to keep preprocessing, compiling, 00:21:50.990 --> 00:21:53.330 and assembling third-party code. 00:21:53.330 --> 00:21:57.560 You only perform those steps on your own code and then link everything together. 00:21:57.560 --> 00:21:59.270 And indeed, that's the case. 00:21:59.270 --> 00:22:01.490 It's all done in advance. 00:22:01.490 --> 00:22:03.800 Iris, question from you. 00:22:03.800 --> 00:22:07.070 IRIS: When we replace the header files with prototypes, 00:22:07.070 --> 00:22:10.440 are we only replacing it with the prototypes that get used? 00:22:10.440 --> 00:22:12.777 Or are all the prototypes technically substituted? 00:22:12.777 --> 00:22:15.110 DAVID MALAN: Yeah, so I was kind of sweeping that detail 00:22:15.110 --> 00:22:16.535 under the rug with my dot, dot, dot. 00:22:16.535 --> 00:22:18.618 There's a whole lot of other stuff in those files. 00:22:18.618 --> 00:22:21.110 You're getting the entire contents of those files, 00:22:21.110 --> 00:22:24.710 even if the only thing you need is the prototype. 00:22:24.710 --> 00:22:27.710 But, and this is why I alluded to the fact too that technically, 00:22:27.710 --> 00:22:30.860 there probably isn't a stdio.c file, because there 00:22:30.860 --> 00:22:32.630 would be so much stuff in it. 00:22:32.630 --> 00:22:36.140 There's probably not just one stdio.h file with everything in it. 00:22:36.140 --> 00:22:40.070 There's probably some smaller files that get magically included as well. 00:22:40.070 --> 00:22:44.300 But yes, there are many more lines of code in those files. 00:22:44.300 --> 00:22:47.330 But that's OK. 
00:22:47.330 --> 00:22:51.920 Your compiler is only going to use the lines that it actually cares about. 00:22:51.920 --> 00:22:53.120 Good question. 00:22:53.120 --> 00:22:56.450 All right, so with that said, this past week 00:22:56.450 --> 00:22:58.850 undoubtedly was a bit frustrating in some ways 00:22:58.850 --> 00:23:00.980 because you probably ran into problems. 00:23:00.980 --> 00:23:03.560 You ran into bugs, mistakes in your own code. 00:23:03.560 --> 00:23:06.165 You probably saw one or more yellow or red error messages. 00:23:06.165 --> 00:23:09.290 And you might have struggled a little bit just to get your code to compile. 00:23:09.290 --> 00:23:10.670 And again, that's normal. 00:23:10.670 --> 00:23:12.390 That will go away over time. 00:23:12.390 --> 00:23:16.320 But honestly, whenever I write C, let's say 20% of the time, 00:23:16.320 --> 00:23:20.400 I still have a compilation error, let alone logical errors, in my own code. 00:23:20.400 --> 00:23:23.240 So this is just part of the experience of writing code. 00:23:23.240 --> 00:23:25.370 Humans make mistakes in all forms of life. 00:23:25.370 --> 00:23:28.130 And that's ever more true in the context of code, where again, 00:23:28.130 --> 00:23:32.180 per our first two weeks, precision is important, as is correctness. 00:23:32.180 --> 00:23:35.520 And it's hard sometimes to achieve both of those goals. 00:23:35.520 --> 00:23:38.060 So let's consider now how you might be more 00:23:38.060 --> 00:23:42.590 empowered to debug your own code-- that is, find problems in your own code. 00:23:42.590 --> 00:23:44.750 And this word actually has some etymology. 00:23:44.750 --> 00:23:46.670 This isn't necessarily the first bug.
00:23:46.670 --> 00:23:49.130 But perhaps the most famous bug is this one pictured 00:23:49.130 --> 00:23:53.060 here from the research notebook of Grace Hopper, 00:23:53.060 --> 00:23:56.090 a famous computer scientist, who had discovered 00:23:56.090 --> 00:23:59.810 that there were some problems with the Harvard Mark II computer, a very 00:23:59.810 --> 00:24:03.440 famous computer nowadays that soon will actually live over 00:24:03.440 --> 00:24:05.240 in the new engineering school on campus-- 00:24:05.240 --> 00:24:06.830 used to live in the Science Center. 00:24:06.830 --> 00:24:08.330 The computer was having problems. 00:24:08.330 --> 00:24:12.770 And sure enough, when the engineers took a look inside of this big mainframe 00:24:12.770 --> 00:24:15.770 computer, there was actually a bug, pictured here 00:24:15.770 --> 00:24:17.900 and taped to Grace Hopper's notebook. 00:24:17.900 --> 00:24:20.840 So this wasn't necessarily the first use of the term "bug," 00:24:20.840 --> 00:24:25.110 but it is a very well-known example of an actual bug in an actual computer. 00:24:25.110 --> 00:24:27.860 Nowadays, we speak a little more metaphorically that a bug is just 00:24:27.860 --> 00:24:29.760 a mistake in one's program. 00:24:29.760 --> 00:24:33.020 And we did give you a few tools last week for troubleshooting bugs. 00:24:33.020 --> 00:24:37.135 Help50 allows you to better understand some of the cryptic error messages. 00:24:37.135 --> 00:24:39.510 And that's just because the staff wrote this program that 00:24:39.510 --> 00:24:41.610 analyzes the problem you're having, and we try 00:24:41.610 --> 00:24:44.250 to translate it to just more human-friendly speak.
00:24:44.250 --> 00:24:47.400 We saw a tool called style50, which helps you not with your correctness, 00:24:47.400 --> 00:24:49.470 but just with the aesthetics of your code, 00:24:49.470 --> 00:24:52.020 helping you better indent things and add white space-- that 00:24:52.020 --> 00:24:55.050 is, blank lines or space characters-- so it's a little more user 00:24:55.050 --> 00:24:56.760 friendly to the human to read. 00:24:56.760 --> 00:24:59.130 And then check50, which, of course, the staff 00:24:59.130 --> 00:25:01.560 wrote so that we can give you immediate feedback on 00:25:01.560 --> 00:25:05.230 whether or not your code is correct per the problem sets or the lab 00:25:05.230 --> 00:25:06.450 specification. 00:25:06.450 --> 00:25:09.323 But there are some other tools that you should have in your toolkit. 00:25:09.323 --> 00:25:10.740 And we'll give those to you today. 00:25:10.740 --> 00:25:14.790 And one, frankly, is this universal debugging tool just called, 00:25:14.790 --> 00:25:16.928 in the context of C, printf. 00:25:16.928 --> 00:25:18.720 So printf, of course, is just this function 00:25:18.720 --> 00:25:20.470 that prints stuff out onto the screen. 00:25:20.470 --> 00:25:24.270 But that in and of itself is a wonderfully powerful tool 00:25:24.270 --> 00:25:26.820 via which you can chase down problems in your code. 00:25:26.820 --> 00:25:29.940 And even after we leave C in a few weeks and introduce 00:25:29.940 --> 00:25:33.690 Python and other languages, almost every programming language out there 00:25:33.690 --> 00:25:35.460 has some form of printf. 00:25:35.460 --> 00:25:36.480 Maybe it's called print. 00:25:36.480 --> 00:25:38.940 Maybe it's called say, as it was in Scratch, 00:25:38.940 --> 00:25:43.780 but some ability to display information or present information to a human. 00:25:43.780 --> 00:25:47.700 So let's try to use this primitive, this notion of printf, 00:25:47.700 --> 00:25:49.760 to chase down a bug in one's code.
00:25:49.760 --> 00:25:52.950 So let me go ahead and deliberately write a buggy program. 00:25:52.950 --> 00:25:56.570 I'm going to even call the file buggy0.c. 00:25:56.570 --> 00:26:01.230 And at the top of this file, I'm going to go ahead and #include stdio.h. 00:26:01.230 --> 00:26:03.810 No need for the CS50 library for this one. 00:26:03.810 --> 00:26:06.960 And then I'm going to do int main(void), which we saw last week, 00:26:06.960 --> 00:26:08.700 and we'll explain in more detail today. 00:26:08.700 --> 00:26:10.260 And then I'm going to give myself a quick loop. 00:26:10.260 --> 00:26:12.990 I just want to go ahead and print out, oh, I don't know, like, 00:26:12.990 --> 00:26:14.580 10 hashes on the screen. 00:26:14.580 --> 00:26:17.430 So I want to print a vertical column, kind of like one 00:26:17.430 --> 00:26:20.190 of those screenshots from Super Mario Bros., not a pyramid, 00:26:20.190 --> 00:26:23.020 just a single column of hashes, and 10 of them. 00:26:23.020 --> 00:26:25.830 So I'm going to do something like, int i = 0, 00:26:25.830 --> 00:26:28.140 because I feel like I learned in class that I generally 00:26:28.140 --> 00:26:29.562 should start counting from 0. 00:26:29.562 --> 00:26:31.770 Then I'm going to have my condition in this for loop. 00:26:31.770 --> 00:26:33.190 And I want to do this 10 times. 00:26:33.190 --> 00:26:35.200 I'm going to do it less than or equal to 10. 00:26:35.200 --> 00:26:37.242 Then I'm going to go ahead and have my increment, 00:26:37.242 --> 00:26:39.478 which quite simply can be expressed as i++. 00:26:39.478 --> 00:26:42.270 And then inside this loop, I'm just going to go ahead and print out 00:26:42.270 --> 00:26:44.970 a single hash followed by a new line. 00:26:44.970 --> 00:26:46.680 I'm going to save the program. 00:26:46.680 --> 00:26:51.870 I'm going to compile it with clang -o buggy0 buggy0-- 00:26:51.870 --> 00:26:52.768 I mean, no. 
00:26:52.768 --> 00:26:54.810 You don't have to use Clang manually in this way. 00:26:54.810 --> 00:26:58.453 It's a lot simpler to just abstract that away-- 00:26:58.453 --> 00:26:59.370 that's not a command-- 00:26:59.370 --> 00:27:03.330 to abstract that away and run make buggy0. 00:27:03.330 --> 00:27:07.475 And make will take care of the process of invoking Clang for you. 00:27:07.475 --> 00:27:08.850 I'm going to go ahead and run it. 00:27:08.850 --> 00:27:12.290 Seems to be compiling successfully, so no need for help50. 00:27:12.290 --> 00:27:13.890 It's already pretty well styled. 00:27:13.890 --> 00:27:18.420 In fact, if I run style50 on this buggy0, I don't have any comments yet. 00:27:18.420 --> 00:27:20.490 But at least it looks very nicely indented. 00:27:20.490 --> 00:27:22.000 So I think I'm OK with that. 00:27:22.000 --> 00:27:25.530 But let me add that comment and do "Print 10 hashes" just 00:27:25.530 --> 00:27:27.120 to remind myself of my goal. 00:27:27.120 --> 00:27:31.290 And now let me go ahead and run this, ./buggy0, Enter. 00:27:31.290 --> 00:27:32.670 And I see, OK, good. 00:27:32.670 --> 00:27:38.147 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, I think. 00:27:38.147 --> 00:27:39.480 All right, so it's a stupid bug. 00:27:39.480 --> 00:27:41.970 And maybe it's jumped out obviously to some of you. 00:27:41.970 --> 00:27:44.220 But maybe it's a little more subtle to others of you. 00:27:44.220 --> 00:27:45.250 But where do you begin? 00:27:45.250 --> 00:27:46.830 Suppose I were to run check50. 00:27:46.830 --> 00:27:51.570 And check50 were to say, nope, you printed out 11 hashes instead of 10. 00:27:51.570 --> 00:27:54.210 But my code looks right to me, at least at first glance. 00:27:54.210 --> 00:27:57.220 Well, how can I go about debugging this or solving this? 00:27:57.220 --> 00:27:58.780 Well, again, printf is your friend. 
00:27:58.780 --> 00:28:01.560 If you want to understand more about your own program, 00:28:01.560 --> 00:28:05.460 use printf to temporarily print more information to the screen, 00:28:05.460 --> 00:28:08.790 not output we want in the final version, not output your TF wants to see, 00:28:08.790 --> 00:28:11.340 but output that you, the programmer, can temporarily see. 00:28:11.340 --> 00:28:14.040 So before I print this hash, let me print something 00:28:14.040 --> 00:28:15.900 a little more pedantic like this-- 00:28:15.900 --> 00:28:19.950 "i is now %i backslash n." 00:28:19.950 --> 00:28:25.440 So I literally want to know, just for my own mental math, what is the value of i 00:28:25.440 --> 00:28:28.230 at this point before I print that hash? 00:28:28.230 --> 00:28:30.480 Now I'm going to go ahead and paste in the value of i. 00:28:30.480 --> 00:28:32.640 So I'm using %i as a placeholder. 00:28:32.640 --> 00:28:35.100 I'm plugging in the value of the variable i. 00:28:35.100 --> 00:28:36.870 I'm going to save my code now. 00:28:36.870 --> 00:28:39.750 I'm going to recompile it with make buggy0. 00:28:39.750 --> 00:28:41.380 And I'm going to rerun it now. 00:28:41.380 --> 00:28:43.680 And let me go ahead and increase the size of my window 00:28:43.680 --> 00:28:46.390 just so we can focus now on the output. 00:28:46.390 --> 00:28:50.440 And I'm going to go ahead and ./buggy0, Enter. 00:28:50.440 --> 00:28:53.670 OK, so now I see not only my output, but also 00:28:53.670 --> 00:28:56.670 commingled with that output, some diagnostic output, if you will, 00:28:56.670 --> 00:28:58.080 some debugging output. 00:28:58.080 --> 00:29:02.430 And it's just more pedantically telling me, "i is now 0," "i is now 1," 00:29:02.430 --> 00:29:08.490 "i is now 2," dot, dot, dot, "i is now 9," "i is now 10." 00:29:08.490 --> 00:29:11.040 OK, I don't hate the fact that i is 10.
00:29:11.040 --> 00:29:15.630 But I'm not loving the fact that if I started at 0 and printed a hash, 00:29:15.630 --> 00:29:19.140 and I'm hitting 10 and printing another hash, well, obviously, 00:29:19.140 --> 00:29:20.180 there's my problem. 00:29:20.180 --> 00:29:22.620 So it might not have been all that much more obvious 00:29:22.620 --> 00:29:24.030 than looking at the code itself. 00:29:24.030 --> 00:29:27.090 But by using printf, you can just make it a lot clearer 00:29:27.090 --> 00:29:28.900 to yourself what's going on. 00:29:28.900 --> 00:29:32.490 So now I see, OK, well, if I start at 0, I have to stop before 10. 00:29:32.490 --> 00:29:35.100 I could change my code to use less than 10 instead. 00:29:35.100 --> 00:29:38.040 Or I could leave that alone and instead count from 1 through 10. 00:29:38.040 --> 00:29:41.920 But again, programmer convention would be to go from 0 up to, but not including, 10. 00:29:41.920 --> 00:29:43.140 So I think I'm good now. 00:29:43.140 --> 00:29:46.662 And in fact, now I'll go ahead and recompile this, make buggy0. 00:29:46.662 --> 00:29:49.620 Let me go ahead and increase the size of the window again just so I can 00:29:49.620 --> 00:29:53.980 temporarily see this and ./buggy0. 00:29:53.980 --> 00:29:57.460 OK, I start now at 0, 1, 2, dot, dot, dot. 00:29:57.460 --> 00:29:59.160 Now I stop at 9. 00:29:59.160 --> 00:30:01.080 And that, of course, gives me 10 hashes. 00:30:01.080 --> 00:30:03.343 So again, I don't need this in the final output. 00:30:03.343 --> 00:30:05.010 And I'm going to go ahead and delete this now. 00:30:05.010 --> 00:30:06.510 It's temporary output.
00:30:06.510 --> 00:30:08.760 But again, having those instincts-- if you don't quite 00:30:08.760 --> 00:30:12.120 understand why your code is compiling but not running properly, 00:30:12.120 --> 00:30:15.360 and you want to better see what the computer is clearly seeing, 00:30:15.360 --> 00:30:18.930 in its mind's eye, use printf to just tell yourself 00:30:18.930 --> 00:30:23.790 what the values of some variable or variables are, anywhere in your code 00:30:23.790 --> 00:30:26.732 where you want to see a little more detail. 00:30:26.732 --> 00:30:28.440 All right, let me pause for just a moment 00:30:28.440 --> 00:30:32.220 to see if there's any questions on this technique of just using printf 00:30:32.220 --> 00:30:37.830 to begin to debug your code and to see the values of variables 00:30:37.830 --> 00:30:40.560 in a way that's a little more explicit. 00:30:43.980 --> 00:30:44.580 No? 00:30:44.580 --> 00:30:45.670 All right. 00:30:45.670 --> 00:30:50.130 Well, let me propose an even more powerful tool that admittedly 00:30:50.130 --> 00:30:51.480 takes a little getting used to. 00:30:51.480 --> 00:30:54.000 But this is kind of one of those lessons, 00:30:54.000 --> 00:30:58.350 trust me, if you will, that if you spend a few more minutes, maybe even 00:30:58.350 --> 00:31:01.320 an hour or so this week, learning the following tool, 00:31:01.320 --> 00:31:04.500 you will save yourself hours, plural, maybe even 00:31:04.500 --> 00:31:07.440 tens of hours over the course of the next many weeks 00:31:07.440 --> 00:31:12.520 because this tool can help you truly see what's going on inside of your code. 00:31:12.520 --> 00:31:15.870 So this tool we're going to add to the list today is called debug50.
00:31:15.870 --> 00:31:20.130 And while this one does end with 50, implying that it's a CS50 tool, 00:31:20.130 --> 00:31:24.450 it's built on top of an industry-standard tool known as GDB, the GNU 00:31:24.450 --> 00:31:27.960 Debugger, a standard tool that a lot of different computer systems 00:31:27.960 --> 00:31:32.520 use to provide you with the ability to debug your code in a more sophisticated 00:31:32.520 --> 00:31:35.530 way than just using printf alone. 00:31:35.530 --> 00:31:36.780 So let's go ahead and do this. 00:31:36.780 --> 00:31:39.360 Let me go back to the buggy version of this program 00:31:39.360 --> 00:31:43.620 which, recall, had me going from 0 through 10, which was too many steps. 00:31:43.620 --> 00:31:47.850 A moment ago, I proposed that we just use printf to see the value of i. 00:31:47.850 --> 00:31:50.640 But frankly, the bigger our programs get, the more complicated 00:31:50.640 --> 00:31:53.730 they get, the more output they need to have on the screen. 00:31:53.730 --> 00:31:56.250 It's just going to get very messy quickly 00:31:56.250 --> 00:31:58.800 if you're printing out stuff that shouldn't be there, right? 00:31:58.800 --> 00:31:59.910 Think back to Mario. 00:31:59.910 --> 00:32:03.060 Mario's pyramid is this sort of graphical output. 00:32:03.060 --> 00:32:07.860 And it would very quickly get ugly and kind of hard to understand your pyramid 00:32:07.860 --> 00:32:11.520 if you're commingling that pyramid with actual textual output from printf 00:32:11.520 --> 00:32:12.430 as well. 00:32:12.430 --> 00:32:16.560 So debug50, and more generally a debugger in any language, 00:32:16.560 --> 00:32:20.580 is a tool that allows you to run your code step by step 00:32:20.580 --> 00:32:26.550 and look inside of variables and other pieces of memory inside of the computer 00:32:26.550 --> 00:32:28.080 while your program is running. 00:32:28.080 --> 00:32:31.800 Right now, pretty much every program we run takes a split second to run.
00:32:31.800 --> 00:32:34.170 That's way too fast for me, the human, to wrap my mind 00:32:34.170 --> 00:32:36.330 around what's going on step by step. 00:32:36.330 --> 00:32:38.550 A debugger allows you to run your program, 00:32:38.550 --> 00:32:42.970 but much more slowly, step by step, so you can see what's going on. 00:32:42.970 --> 00:32:48.030 So I'm going to go ahead now and run debug50 ./hello. 00:32:48.030 --> 00:32:52.380 No, sorry, debug50 ./buggy0. 00:32:52.380 --> 00:32:54.900 So I write debug50 first, a space, and then 00:32:54.900 --> 00:32:56.910 dot slash and the name of the program that's 00:32:56.910 --> 00:32:59.785 already compiled that I want to debug. 00:32:59.785 --> 00:33:01.410 So I'm going to go ahead and hit Enter. 00:33:01.410 --> 00:33:03.240 And notice that, oh, it was smart. 00:33:03.240 --> 00:33:05.100 It noticed that I changed my code. 00:33:05.100 --> 00:33:06.060 And I did a moment ago. 00:33:06.060 --> 00:33:07.740 I reverted it back to the buggy version. 00:33:07.740 --> 00:33:10.380 So let me fix this-- make buggy0. 00:33:10.380 --> 00:33:11.620 All right, no errors. 00:33:11.620 --> 00:33:13.500 Now let me go ahead and run debug50 again. 00:33:13.500 --> 00:33:17.280 And if you haven't noticed this already, sometimes I seem to type crazy fast. 00:33:17.280 --> 00:33:19.180 I'm not necessarily typing that fast. 00:33:19.180 --> 00:33:21.960 I'm going through my history in CS50 IDE. 00:33:21.960 --> 00:33:25.470 Using your arrow keys, Up and Down, you can scroll back 00:33:25.470 --> 00:33:29.070 in time for all of the commands you've typed over the past few minutes 00:33:29.070 --> 00:33:30.430 or hours or even days. 00:33:30.430 --> 00:33:32.430 And this will just start to save you keystrokes. 00:33:32.430 --> 00:33:33.870 So I'm going to go ahead and hit Up. 00:33:33.870 --> 00:33:36.495 And now I don't have to bother typing this whole command again. 00:33:36.495 --> 00:33:38.320 It's a helpful way to just save time. 
00:33:38.320 --> 00:33:40.800 I'm going to go ahead now and hit Enter. 00:33:40.800 --> 00:33:43.650 And now notice this error message-- 00:33:43.650 --> 00:33:45.050 I haven't set any breakpoints. 00:33:45.050 --> 00:33:48.300 "Set at least one breakpoint by clicking to the left of a line number and then 00:33:48.300 --> 00:33:49.500 re-run debug50!" 00:33:49.500 --> 00:33:51.420 Well, what's going on here? 00:33:51.420 --> 00:33:55.620 Well, debug50 needs me to tell the computer in advance at what line 00:33:55.620 --> 00:33:59.910 I want to break in so that I can step through my code step by step. 00:33:59.910 --> 00:34:01.020 So, I can do that. 00:34:01.020 --> 00:34:03.780 I'm going to go over to the side of the file here, as it says. 00:34:03.780 --> 00:34:04.530 And you know what? 00:34:04.530 --> 00:34:08.460 The first interesting line is this one here, line 6. 00:34:08.460 --> 00:34:12.060 So I clicked in the so-called gutter, the left-hand side of the screen, 00:34:12.060 --> 00:34:13.170 on line 6. 00:34:13.170 --> 00:34:16.139 And that automatically put a red dot there, like a stop sign. 00:34:16.139 --> 00:34:21.420 Now, one last time, I'm going to go ahead and run debug50 ./buggy0 and hit 00:34:21.420 --> 00:34:21.960 Enter. 00:34:21.960 --> 00:34:25.887 And now notice this fancy new panel opens up on the right-hand side. 00:34:25.887 --> 00:34:27.929 And it's going to look a little cryptic at first. 00:34:27.929 --> 00:34:30.219 But let's consider what has changed on the screen. 00:34:30.219 --> 00:34:34.440 Notice now that highlighted in this sort of off-yellow color is line 6. 00:34:34.440 --> 00:34:37.949 And that's because what debug50 is doing is it's running my program, 00:34:37.949 --> 00:34:41.610 but it has paused execution on line 6. 00:34:41.610 --> 00:34:44.100 So it's done everything from line 1 through 5, 00:34:44.100 --> 00:34:46.860 but now it's waiting for me on line 6.
00:34:46.860 --> 00:34:49.620 And what's interesting over here is this-- let 00:34:49.620 --> 00:34:51.929 me zoom in on this window over here. 00:34:51.929 --> 00:34:54.150 And there's a lot going on here, admittedly. 00:34:54.150 --> 00:34:59.190 But let's focus for just a moment not on Watch Expressions, not on Call Stack, 00:34:59.190 --> 00:35:00.850 but only on Local Variables. 00:35:00.850 --> 00:35:04.380 And notice, I have a variable called i whose initial value is 0, 00:35:04.380 --> 00:35:05.820 and it's of type int. 00:35:05.820 --> 00:35:09.150 Now, this is kind of interesting because watch what I can do via these icons 00:35:09.150 --> 00:35:09.930 up here. 00:35:09.930 --> 00:35:15.360 I can click on this Step Over line and start to step through my code line 00:35:15.360 --> 00:35:16.007 by line. 00:35:16.007 --> 00:35:17.340 So let me go ahead and zoom out. 00:35:17.340 --> 00:35:18.870 Let me go ahead and click Step Over. 00:35:18.870 --> 00:35:21.180 And watch what happens to the yellow highlighting. 00:35:21.180 --> 00:35:23.140 It moves down to the next line. 00:35:23.140 --> 00:35:27.090 But notice, if I zoom in again up here, the value of i has not changed. 00:35:27.090 --> 00:35:29.460 Now let me go ahead and step over again. 00:35:29.460 --> 00:35:31.740 And notice the yellow highlighting doubles back. 00:35:31.740 --> 00:35:33.790 That makes sense because I'm in a loop. 00:35:33.790 --> 00:35:36.760 So it should be going back and forth, back and forth. 00:35:36.760 --> 00:35:38.123 But what next happens in a loop? 00:35:38.123 --> 00:35:40.290 Every time you go back to the beginning of the loop, 00:35:40.290 --> 00:35:43.770 remember that your incrementation happens, like the i++. 00:35:43.770 --> 00:35:46.530 So watch now closely in the top right-hand corner, 00:35:46.530 --> 00:35:52.110 when I Step Over now, notice that the value of i in my debugger 00:35:52.110 --> 00:35:54.058 has just been changed to 1. 
00:35:54.058 --> 00:35:55.350 So I didn't have to use printf. 00:35:55.350 --> 00:35:57.400 I didn't have to mess up the output of my screen. 00:35:57.400 --> 00:35:59.850 I can literally see in this GUI, this Graphical User 00:35:59.850 --> 00:36:02.790 Interface on the right-hand side, what the value of i is. 00:36:02.790 --> 00:36:05.310 Now if I just start clicking a little more quickly, 00:36:05.310 --> 00:36:09.900 notice that as the loop is executing, again and again, the value of i 00:36:09.900 --> 00:36:11.070 keeps getting updated. 00:36:11.070 --> 00:36:11.820 And you know what? 00:36:11.820 --> 00:36:15.930 I bet, even though we started at 0, if I do this enough times, 00:36:15.930 --> 00:36:18.990 I will see that the value is 10 now, thereby 00:36:18.990 --> 00:36:25.110 giving me another printf at the bottom, thereby explaining the 11 total hashes 00:36:25.110 --> 00:36:25.950 that I saw. 00:36:25.950 --> 00:36:28.450 So I haven't gotten any new information here. 00:36:28.450 --> 00:36:30.960 But notice I've gotten unperturbed information. 00:36:30.960 --> 00:36:35.370 I've not messily and sloppily printed out all of these printf statements 00:36:35.370 --> 00:36:36.100 on the screen. 00:36:36.100 --> 00:36:38.430 I'm just kind of watching a little more methodically 00:36:38.430 --> 00:36:43.230 what's happening to the state of my variable over on the top right there. 00:36:43.230 --> 00:36:47.700 All right, let me pause here too to see if there's any questions on what 00:36:47.700 --> 00:36:49.230 this debugger does. 00:36:49.230 --> 00:36:51.150 Again, you compile your code. 00:36:51.150 --> 00:36:56.340 You run debug50 on your code, but only after setting a so-called breakpoint, 00:36:56.340 --> 00:37:00.575 where you decide in advance where do you want to pause execution of your code. 
00:37:00.575 --> 00:37:03.450 Even though here I did it pretty much at the beginning of my program, 00:37:03.450 --> 00:37:05.242 for bigger programs, it's going to be super 00:37:05.242 --> 00:37:07.718 convenient to be able to pause halfway through your code 00:37:07.718 --> 00:37:09.510 and not have to go through the whole thing. 00:37:09.510 --> 00:37:11.430 Peter, question. 00:37:11.430 --> 00:37:16.350 PETER: About the debugger, what's the difference between Step Over 00:37:16.350 --> 00:37:18.813 and Step Into and Step Out and-- 00:37:18.813 --> 00:37:20.230 DAVID MALAN: Really good question. 00:37:20.230 --> 00:37:21.980 Let me come back to that in just a moment, 00:37:21.980 --> 00:37:25.800 because we'll do one other example where Step Into and Step Out actually 00:37:25.800 --> 00:37:27.520 are germane. 00:37:27.520 --> 00:37:28.490 But before we do that, 00:37:28.490 --> 00:37:33.520 any other questions about debug50 before we reveal what Step Into and Step 00:37:33.520 --> 00:37:35.335 Out do for us as well? 00:37:38.940 --> 00:37:39.910 Oh, all right. 00:37:39.910 --> 00:37:42.310 Well, let's take Peter's question right there. 00:37:42.310 --> 00:37:44.705 Let me go ahead now and get out of the debugger. 00:37:44.705 --> 00:37:46.830 And honestly, I don't see an obvious way to get out 00:37:46.830 --> 00:37:48.490 of the debugger at the moment. 00:37:48.490 --> 00:37:51.240 But Control-C is your new friend today too. 00:37:51.240 --> 00:37:53.700 Pretty much any time you lose control of a program, 00:37:53.700 --> 00:37:56.880 because the debugger's running and you've lost interest in it, 00:37:56.880 --> 00:37:58.770 or because maybe last week you wrote a program that 00:37:58.770 --> 00:38:01.800 has an infinite loop that just keeps going and going and going, 00:38:01.800 --> 00:38:04.110 Control-C will break out of that program.
00:38:04.110 --> 00:38:07.290 But let's now write quickly another program that, this time, 00:38:07.290 --> 00:38:08.430 has a second function. 00:38:08.430 --> 00:38:10.800 And we'll see one other feature of the debugger today. 00:38:10.800 --> 00:38:14.520 I'm going to go ahead and create a new file now called buggy1.c. 00:38:14.520 --> 00:38:16.470 Again, it's going to be deliberately flawed. 00:38:16.470 --> 00:38:20.280 But first I'm going to go ahead and #include cs50.h this time. 00:38:20.280 --> 00:38:22.830 And I'm going to #include stdio.h. 00:38:22.830 --> 00:38:24.590 I'm going to do int main(void). 00:38:24.590 --> 00:38:27.090 And I'm going to go ahead and do the following-- give myself 00:38:27.090 --> 00:38:28.380 a variable called i. 00:38:28.380 --> 00:38:31.290 And I'm going to try to get a negative int by calling 00:38:31.290 --> 00:38:33.180 a function called get_negative_int. 00:38:33.180 --> 00:38:37.740 And then quite simply, I'm going to print out this value, "%i backslash n", 00:38:37.740 --> 00:38:39.210 i, semicolon. 00:38:39.210 --> 00:38:40.860 Now, there's only one problem-- 00:38:40.860 --> 00:38:43.210 get_negative_int does not exist. 00:38:43.210 --> 00:38:45.870 So like last week, where we implemented get_positive_int, 00:38:45.870 --> 00:38:47.790 this week, I'll implement get_negative_int. 00:38:47.790 --> 00:38:49.890 But I'm going to do it incorrectly at first. 00:38:49.890 --> 00:38:54.300 Now, get_negative_int, as the name implies, needs to return an integer. 00:38:54.300 --> 00:38:57.330 And even though we only spent brief time on this last week, 00:38:57.330 --> 00:39:00.210 recall that you can specify the output of a function, 00:39:00.210 --> 00:39:03.720 a custom function that you wrote, by putting its so-called return 00:39:03.720 --> 00:39:05.555 type first on this line.
00:39:05.555 --> 00:39:08.430 And then you can put the name of the function, like get_negative_int, 00:39:08.430 --> 00:39:11.940 and then in parentheses, you can put the input to the function. 00:39:11.940 --> 00:39:15.030 But if it takes no input, you can literally write the word "void," 00:39:15.030 --> 00:39:17.965 which is a term of art that just means, nothing goes here. 00:39:17.965 --> 00:39:20.340 I'm going to go ahead now and implement get_negative_int. 00:39:20.340 --> 00:39:22.920 And frankly, I think it's going to be pretty similar to last week. 00:39:22.920 --> 00:39:24.212 But my memory is a little hazy. 00:39:24.212 --> 00:39:26.310 So again, it will be deliberately flawed. 00:39:26.310 --> 00:39:29.130 But I'm going to go ahead and declare a variable called n. 00:39:29.130 --> 00:39:31.420 Then I'm going to do the following-- 00:39:31.420 --> 00:39:34.170 I'm going to set n equal to get_int. 00:39:34.170 --> 00:39:39.000 And I'm just going to explicitly ask the user for "Negative integer" followed 00:39:39.000 --> 00:39:39.900 by a space. 00:39:39.900 --> 00:39:44.220 And then I'm going to keep doing this while n is less than 0. 00:39:44.220 --> 00:39:48.540 And then at the very last line, I'm going to return n. 00:39:48.540 --> 00:39:51.120 So again, I claim that this function will 00:39:51.120 --> 00:39:53.340 get me a negative int from the user. 00:39:53.340 --> 00:39:57.810 And it's going to keep doing it again and again until the user cooperates. 00:39:57.810 --> 00:40:00.720 However, there is a bug. 00:40:00.720 --> 00:40:02.730 And there's a couple of bugs, in fact. 00:40:02.730 --> 00:40:06.720 Right now, let me go ahead and make a deliberate mistake-- make buggy1, 00:40:06.720 --> 00:40:07.740 Enter. 00:40:07.740 --> 00:40:10.020 And I see a whole bunch of errors here. 00:40:10.020 --> 00:40:12.300 I could use help50 on this. 00:40:12.300 --> 00:40:16.290 But based on last week, does anyone recall what the error here might be? 
00:40:16.290 --> 00:40:20.100 "Error-- implicit declaration of function 'get_negative_int' 00:40:20.100 --> 00:40:21.930 is invalid in C99." 00:40:21.930 --> 00:40:24.992 So I don't know all of that, but implicit declaration of function 00:40:24.992 --> 00:40:26.700 is something you're going to start to see 00:40:26.700 --> 00:40:28.770 more often if you make this mistake. 00:40:28.770 --> 00:40:35.030 Anyone recall what this means and what the fix is without resorting to help50? 00:40:35.030 --> 00:40:37.760 Yeah, Jasmine, what do you think? 00:40:37.760 --> 00:40:40.370 JASMINE: So basically, since you declared it 00:40:40.370 --> 00:40:42.830 after you already used it in your code, it 00:40:42.830 --> 00:40:46.050 doesn't know what to read that as when it's processing it. 00:40:46.050 --> 00:40:49.825 So you have to move the first line above when you actually start the code. 00:40:49.825 --> 00:40:50.700 DAVID MALAN: Perfect. 00:40:50.700 --> 00:40:53.690 And this is the only time I will claim that copy/paste 00:40:53.690 --> 00:40:55.730 is acceptable and encouraged. 00:40:55.730 --> 00:40:59.180 I'm going to copy the very first line only of that function. 00:40:59.180 --> 00:41:02.840 And as Jasmine proposed, I'm going to paste it at the very top of the file, 00:41:02.840 --> 00:41:05.990 thereby giving myself a hint otherwise known as a prototype. 00:41:05.990 --> 00:41:09.290 So I'll even label it as such to remind myself why it's there-- 00:41:09.290 --> 00:41:11.720 prototype of that function. 00:41:11.720 --> 00:41:16.790 And here, I'm going to go ahead and "Get negative integer from user." 00:41:16.790 --> 00:41:20.720 And then this function is as left as written. 00:41:20.720 --> 00:41:23.340 So I now have this prototype at the very top 00:41:23.340 --> 00:41:25.840 of my file, which I think will indeed get rid of this error. 00:41:25.840 --> 00:41:27.950 Let me go to make buggy1 again. 00:41:27.950 --> 00:41:29.960 Now I see that indeed compiled OK. 
00:41:29.960 --> 00:41:33.110 But when I run it now, ./buggy1-- 00:41:33.110 --> 00:41:36.270 let me go ahead and input a negative integer, negative 1. 00:41:36.270 --> 00:41:36.770 Hm. 00:41:36.770 --> 00:41:38.685 Negative 2, negative 3-- 00:41:38.685 --> 00:41:41.810 I feel like the function should be happy with this, and it's obviously not. 00:41:41.810 --> 00:41:42.650 So there's a bug. 00:41:42.650 --> 00:41:45.470 I'm going to go ahead and hit Control-C to get out of my program 00:41:45.470 --> 00:41:47.810 because otherwise, it would run potentially forever. 00:41:47.810 --> 00:41:49.610 And now I'm going to use debug50. 00:41:49.610 --> 00:41:53.090 But debug50 just got really interesting, to Peter's question 00:41:53.090 --> 00:41:56.180 earlier, because now I have things I can step into. 00:41:56.180 --> 00:41:58.070 I'm not writing all of my code in main. 00:41:58.070 --> 00:42:00.570 There's this other function now called get_negative_int. 00:42:00.570 --> 00:42:02.390 So let's see what happens now. 00:42:02.390 --> 00:42:05.930 Let me go ahead and set a breakpoint on the first interesting line of code, 00:42:05.930 --> 00:42:06.532 line 10. 00:42:06.532 --> 00:42:08.990 And it's interesting only in the sense that everything else 00:42:08.990 --> 00:42:10.670 is kind of boilerplate at this point. 00:42:10.670 --> 00:42:13.460 You just have to do it to get your program started. 00:42:13.460 --> 00:42:15.020 I'm going to now go down here. 00:42:15.020 --> 00:42:18.770 And I'm going to do debug50 ./buggy1. 00:42:18.770 --> 00:42:22.220 And in a moment, it's going to open up that sidebar. 00:42:22.220 --> 00:42:25.640 And I'm going to focus now not only on local variables-- 00:42:25.640 --> 00:42:29.810 like I did before, notice that i is again equal to 0 here by default. 00:42:29.810 --> 00:42:33.560 But I'm also going to reveal this option here, Call Stack. 
00:42:33.560 --> 00:42:38.120 So Call Stack is a fancy way of referring to all of the functions 00:42:38.120 --> 00:42:43.560 that your program at this point in time has executed and not yet returned from. 00:42:43.560 --> 00:42:45.890 So right now, there's only one thing on the call stack 00:42:45.890 --> 00:42:49.700 because the only function that is currently executing is, of course, 00:42:49.700 --> 00:42:50.930 main, because why? 00:42:50.930 --> 00:42:55.040 I set a breakpoint at line 10, which is, by definition, inside of main. 00:42:55.040 --> 00:42:59.720 But to Peter's question earlier, I feel like lines 10 and 11-- 00:42:59.720 --> 00:43:01.550 frankly, they look pretty correct, right? 00:43:01.550 --> 00:43:03.470 It's hard at this point to have screwed up 00:43:03.470 --> 00:43:07.400 lines 10 and 11 except syntactically, because I'm getting a negative int. 00:43:07.400 --> 00:43:10.730 I'm storing it in i, and then I'm printing out the value of i 00:43:10.730 --> 00:43:12.230 on those two lines. 00:43:12.230 --> 00:43:16.370 But what if instead, I'm curious about get_negative_int? 00:43:16.370 --> 00:43:18.350 I feel like the bug-- logically, it's got 00:43:18.350 --> 00:43:21.170 to be in there because that's the harder code that I wrote. 00:43:21.170 --> 00:43:24.530 Notice this time, instead of clicking Step Over, 00:43:24.530 --> 00:43:28.640 let me go ahead and click on Step Into, which is one of the buttons Peter 00:43:28.640 --> 00:43:29.240 alluded to. 00:43:29.240 --> 00:43:33.440 And when I click Step Into, notice that you sort of go down the rabbit hole. 00:43:33.440 --> 00:43:38.460 And debug50 jumps into the function get_negative_int, 00:43:38.460 --> 00:43:41.460 and it focuses on the first interesting line of code. 00:43:41.460 --> 00:43:44.070 So do, in and of itself, really isn't that interesting. 00:43:44.070 --> 00:43:46.160 Int n isn't that interesting because it's not 00:43:46.160 --> 00:43:48.020 assigning a value to it even yet. 
00:43:48.020 --> 00:43:50.930 The first juicy line of code seems to be line 19. 00:43:50.930 --> 00:43:53.150 And that's why the debugger has jumped to that line. 00:43:53.150 --> 00:43:57.350 Now, n = get_int feels pretty correct. 00:43:57.350 --> 00:43:59.300 It's hard to misuse get_int. 00:43:59.300 --> 00:44:02.420 But notice now on the right-hand side what has happened. 00:44:02.420 --> 00:44:06.500 Under Call Stack, you now see two things, not only main, 00:44:06.500 --> 00:44:08.930 but also get_negative_int in a stack. 00:44:08.930 --> 00:44:11.030 It's like a stack of trays in a cafeteria. 00:44:11.030 --> 00:44:13.250 The first tray at the bottom is like main. 00:44:13.250 --> 00:44:17.750 The second tray on the stack in the cafeteria is now get_negative_int. 00:44:17.750 --> 00:44:21.680 And what's cool about this is that notice that right now, I 00:44:21.680 --> 00:44:23.630 can see my local variables, n. 00:44:23.630 --> 00:44:25.380 And that's indeed the variable I used. 00:44:25.380 --> 00:44:26.750 So I no longer see i. 00:44:26.750 --> 00:44:30.780 I see n because I'm into the get_negative_int function. 00:44:30.780 --> 00:44:35.030 And now if I keep clicking Step Over again and again 00:44:35.030 --> 00:44:36.140 after typing in a number. 00:44:36.140 --> 00:44:38.210 Let me type in negative 1 here. 00:44:38.210 --> 00:44:41.540 Now notice on the top right of the screen, you can see in the debugger 00:44:41.540 --> 00:44:43.280 that n equals negative 1. 00:44:43.280 --> 00:44:45.830 I'm going to now go ahead and click Step Over. 00:44:45.830 --> 00:44:48.680 And I think I'm going to end up in line 22. 00:44:48.680 --> 00:44:51.920 If the human has typed in a negative integer like negative 1, 00:44:51.920 --> 00:44:53.480 obviously, that's negative. 00:44:53.480 --> 00:44:55.160 Let's proceed to line 22. 00:44:55.160 --> 00:44:58.310 But watch what happens when I click Step Over. 
00:44:58.310 --> 00:45:03.740 It actually seems to be going back to the do loop again and again 00:45:03.740 --> 00:45:06.750 and again, as it will if I keep providing negative integers. 00:45:06.750 --> 00:45:10.670 So my logic then should be, well, OK, if n is negative 1, 00:45:10.670 --> 00:45:17.030 but my loop is still running, what should your logical takeaway here be? 00:45:17.030 --> 00:45:20.710 If n is negative 1, and that is by definition a negative integer, 00:45:20.710 --> 00:45:25.720 but my loop is still running, what could be your diagnostic conclusion 00:45:25.720 --> 00:45:29.860 if the debugger is essentially revealing this hint to you? n is negative 1, 00:45:29.860 --> 00:45:31.420 but the loop is still going. 00:45:31.420 --> 00:45:33.730 Omar, what would you conclude? 00:45:33.730 --> 00:45:36.850 OMAR: Either the condition is wrong, or maybe some sort of Boolean logic 00:45:36.850 --> 00:45:37.755 could be flawed. 00:45:37.755 --> 00:45:38.630 DAVID MALAN: Perfect. 00:45:38.630 --> 00:45:40.463 So obviously, either the condition is wrong, 00:45:40.463 --> 00:45:42.505 or there's something wrong with my Boolean logic. 00:45:42.505 --> 00:45:44.540 And Boolean logic just refers to true or false. 00:45:44.540 --> 00:45:46.930 So somewhere, I'm saying true instead of false, 00:45:46.930 --> 00:45:48.850 or I'm saying false instead of true. 00:45:48.850 --> 00:45:52.060 And frankly, the only place where I have code 00:45:52.060 --> 00:45:56.050 that's going to make this loop go again and again must logically be on line 21. 00:45:56.050 --> 00:45:59.350 So even if you're not quite sure how to fix it yet, just by deduction, 00:45:59.350 --> 00:46:02.215 you should realize that, OK, negative 1 is what's in the variable. 00:46:02.215 --> 00:46:03.340 But that's not good enough. 00:46:03.340 --> 00:46:04.340 The loop is still going. 00:46:04.340 --> 00:46:05.680 I must have screwed up the loop.
00:46:05.680 --> 00:46:08.080 And indeed, let me just now call it out. 00:46:08.080 --> 00:46:11.290 Line 21 is indeed the source of the bug. 00:46:11.290 --> 00:46:12.520 So we've isolated it. 00:46:12.520 --> 00:46:15.160 Out of 23 lines, we've at least found the one line 00:46:15.160 --> 00:46:18.520 where I know the solution has to be. 00:46:18.520 --> 00:46:19.610 What's the solution? 00:46:19.610 --> 00:46:26.020 How do I fix the logic now thanks to the debugger having led me down this road? 00:46:26.020 --> 00:46:29.230 How do I fix line 21 here? 00:46:29.230 --> 00:46:31.350 What's the fix? 00:46:31.350 --> 00:46:33.960 What do you propose? 00:46:33.960 --> 00:46:35.220 Yeah, Jacob? 00:46:35.220 --> 00:46:38.700 JACOB: You would have to change it from while n is less than 0 00:46:38.700 --> 00:46:40.345 to while n is greater than 0. 00:46:40.345 --> 00:46:41.220 DAVID MALAN: Exactly. 00:46:41.220 --> 00:46:44.640 So instead of n less than 0, I want to say n greater than 0. 00:46:44.640 --> 00:46:46.860 And I think-- slight clarification, I think 00:46:46.860 --> 00:46:50.328 I want to include 0 here because 0 is not negative. 00:46:50.328 --> 00:46:52.620 And if I want a negative int, I think what I'm probably 00:46:52.620 --> 00:46:56.070 going to want to say is while n is greater than or equal to 0, 00:46:56.070 --> 00:46:57.120 keep doing the loop. 00:46:57.120 --> 00:46:59.970 So I very understandably sort of just inverted the logic. 00:46:59.970 --> 00:47:00.490 No big deal. 00:47:00.490 --> 00:47:02.323 I'm thinking negatives, and I did less than. 00:47:02.323 --> 00:47:03.670 But the fix is easy. 00:47:03.670 --> 00:47:06.300 The point is the debugger led you to this point. 00:47:06.300 --> 00:47:08.730 Now, those of you who have programmed before probably 00:47:08.730 --> 00:47:10.290 saw the bug jumping out at you. 
00:47:10.290 --> 00:47:13.123 Those of you who haven't programmed before, probably with some time, 00:47:13.123 --> 00:47:15.940 would have figured out what the bug was, because out of 23 lines, 00:47:15.940 --> 00:47:17.580 it's got to be one of those. 00:47:17.580 --> 00:47:19.830 But as our programs get more sophisticated, 00:47:19.830 --> 00:47:25.020 and we start writing more lines of code, debug50 and debuggers in general 00:47:25.020 --> 00:47:26.020 will be your friend. 00:47:26.020 --> 00:47:29.865 And I realize that this is easier said than done because at first, 00:47:29.865 --> 00:47:32.240 when using a debugger, you're going to feel like, ah, I'm 00:47:32.240 --> 00:47:33.282 just going to use printf. 00:47:33.282 --> 00:47:35.700 Ah, I'm just going to fight through this. 00:47:35.700 --> 00:47:37.500 Though there's a bit of a learning curve, 00:47:37.500 --> 00:47:41.640 you will gain back that time and more by just 00:47:41.640 --> 00:47:47.490 using a debugger as your first instinct when chasing down problems like this. 00:47:47.490 --> 00:47:51.660 All right, so that's it for debug50, a new tool in your toolkit in addition 00:47:51.660 --> 00:47:52.800 to printf. 00:47:52.800 --> 00:47:55.730 But debug50 is hands down the more powerful of the two. 00:47:55.730 --> 00:47:58.230 Now, some of you have wondered over the past couple of weeks 00:47:58.230 --> 00:48:00.207 why there's this little rubber duck here. 00:48:00.207 --> 00:48:02.040 And there actually is a reason for this too. 00:48:02.040 --> 00:48:05.280 And there's one final debugging technique that, in all seriousness, 00:48:05.280 --> 00:48:08.390 we'll introduce you to today, known as rubber duck debugging. 00:48:08.390 --> 00:48:09.390 And you can google this. 00:48:09.390 --> 00:48:11.288 There's a whole Wikipedia article about it.
00:48:11.288 --> 00:48:14.580 And this is kind of a thing in computer science circles for computer scientists 00:48:14.580 --> 00:48:16.860 or programmers to have rubber ducks on their desk. 00:48:16.860 --> 00:48:19.290 And the point here is that sometimes, when 00:48:19.290 --> 00:48:22.710 trying to understand what is wrong in your code, 00:48:22.710 --> 00:48:24.420 it helps to just talk it through. 00:48:24.420 --> 00:48:28.620 And in an ideal world, we would just talk to our colleague or our partner 00:48:28.620 --> 00:48:29.610 on some project. 00:48:29.610 --> 00:48:33.060 And just in hearing yourself vocalize what it is your code 00:48:33.060 --> 00:48:36.810 is supposed to do, very often, that proverbial light bulb goes off. 00:48:36.810 --> 00:48:39.330 And you're like, oh, wait a minute, never mind, I got it, 00:48:39.330 --> 00:48:42.600 just because you heard yourself speaking illogically when 00:48:42.600 --> 00:48:44.910 you intended something actually logical. 00:48:44.910 --> 00:48:49.410 Now, we don't often all have colleagues or partners or friends with whom we're 00:48:49.410 --> 00:48:50.698 working on a project. 00:48:50.698 --> 00:48:52.740 And we don't often have family members or friends 00:48:52.740 --> 00:48:55.270 who want to hear about our code of all things. 00:48:55.270 --> 00:48:58.590 And so a wonderful proxy for that conversant partner 00:48:58.590 --> 00:49:00.300 would be literally a rubber duck. 00:49:00.300 --> 00:49:03.900 And so here in healthier times, we would be giving all of you rubber ducks. 00:49:03.900 --> 00:49:07.080 Here on stage, we brought a larger one for us all to share. 00:49:07.080 --> 00:49:09.900 If you've noticed in some of the wide shots on camera, 00:49:09.900 --> 00:49:12.100 there's a duck who's been watching this whole time. 00:49:12.100 --> 00:49:13.975 So that any time I screw up, I literally have 00:49:13.975 --> 00:49:17.850 someone I can sort of talk to nonverbally, in this case.
00:49:17.850 --> 00:49:20.880 But we can't emphasize enough that in addition to printf, in addition to 00:49:20.880 --> 00:49:25.230 the more sophisticated debug50, talking through your problems with code 00:49:25.230 --> 00:49:26.940 is a wonderfully valuable thing. 00:49:26.940 --> 00:49:29.010 And if your friends or family are willing to hear 00:49:29.010 --> 00:49:31.650 about some low-level code you're writing and some bug you're 00:49:31.650 --> 00:49:33.000 trying to solve, great. 00:49:33.000 --> 00:49:36.210 But in the absence of that, talk to a stuffed animal in your room. 00:49:36.210 --> 00:49:38.400 Talk to an actual rubber duck if you have one. 00:49:38.400 --> 00:49:39.960 Talk even aloud or think aloud. 00:49:39.960 --> 00:49:42.120 It's just a wonderful compelling habit to get 00:49:42.120 --> 00:49:46.440 into because just in hearing yourself vocalize what you think is logical 00:49:46.440 --> 00:49:51.750 will the illogical very often jump out at you instead. 00:49:51.750 --> 00:49:55.620 All right, so with that said, that's been a lot. 00:49:55.620 --> 00:49:57.690 Let's go ahead here and take a five-minute break, 00:49:57.690 --> 00:49:59.130 give everyone a bit of a breather. 00:49:59.130 --> 00:50:01.050 And when we come back, we'll take a look now 00:50:01.050 --> 00:50:02.880 at some of the more powerful features of C 00:50:02.880 --> 00:50:05.850 now that we can trust that we can solve any problems with all 00:50:05.850 --> 00:50:06.730 of these new tools. 00:50:06.730 --> 00:50:08.700 So we'll be back in five. 00:50:08.700 --> 00:50:10.440 All right, we are back. 00:50:10.440 --> 00:50:13.320 So let's take a look underneath the hood, so to speak, 00:50:13.320 --> 00:50:15.480 of a computer, because as fancy as these devices 00:50:15.480 --> 00:50:17.460 are and as powerful as they seem, they're 00:50:17.460 --> 00:50:21.630 relatively simple in their capabilities and what they can actually do. 
00:50:21.630 --> 00:50:24.570 And let's reveal as much by way of last week's discussion of type. 00:50:24.570 --> 00:50:27.700 So recall that C supports different data types. 00:50:27.700 --> 00:50:31.060 So we saw char, and string, and int, and so forth. 00:50:31.060 --> 00:50:32.730 So to recap, we had all of these. 00:50:32.730 --> 00:50:35.310 Well, it turns out that each of these data types 00:50:35.310 --> 00:50:40.800 is defined on a typical computer system as taking up a fixed amount of space. 00:50:40.800 --> 00:50:44.280 And it depends on the computer, whether it's Mac or PC, or old or new, 00:50:44.280 --> 00:50:47.400 just how much space is used typically by these data types. 00:50:47.400 --> 00:50:51.400 But on CS50 IDE, the sizes of all of these types are as follows-- 00:50:51.400 --> 00:50:54.510 a bool, true or false, uses just 1 byte. 00:50:54.510 --> 00:50:58.320 Now, that's actually a little wasteful because 1 byte is 8 bits, and gosh, 00:50:58.320 --> 00:51:00.090 for a bool, you should only need 1 bit. 00:51:00.090 --> 00:51:04.200 You can't work at the single-bit level easily in C. 00:51:04.200 --> 00:51:07.440 And so we just typically spend 1 whole byte on a bool. 00:51:07.440 --> 00:51:09.640 Char is going to be 1 byte as well. 00:51:09.640 --> 00:51:13.110 And that might sound familiar, because last week when we talked about ASCII, 00:51:13.110 --> 00:51:15.450 we proposed that the total number of possible characters 00:51:15.450 --> 00:51:20.190 you can represent with a char was 256 because of 8 bits and 2 00:51:20.190 --> 00:51:21.340 to the eighth power. 00:51:21.340 --> 00:51:23.620 So one char is 1 byte. 00:51:23.620 --> 00:51:25.383 And that's fixed in C, no matter what. 00:51:25.383 --> 00:51:27.300 Then there were all of these other data types. 00:51:27.300 --> 00:51:29.910 There was float, which is a real number with a decimal point. 00:51:29.910 --> 00:51:31.410 That happens to use 4 bytes. 
00:51:31.410 --> 00:51:34.000 A double is also a real number with a decimal point, 00:51:34.000 --> 00:51:36.820 but it uses 8 bytes, which gives you even more precision. 00:51:36.820 --> 00:51:40.450 You can have more significant digits after the decimal point, for instance. 00:51:40.450 --> 00:51:41.830 Ints, we've used a bunch. 00:51:41.830 --> 00:51:43.540 Those are 4 bytes, typically. 00:51:43.540 --> 00:51:45.490 A long is twice as big, and that just allows 00:51:45.490 --> 00:51:47.115 you to represent an even bigger number. 00:51:47.115 --> 00:51:49.780 And some of you might have done that exactly on credit when 00:51:49.780 --> 00:51:51.400 storing a whole credit card number. 00:51:51.400 --> 00:51:54.310 Strings, for now, are a variable number of bytes. 00:51:54.310 --> 00:51:57.780 It could be a short string of text, a long string of text, a whole paragraph. 00:51:57.780 --> 00:51:58.780 So that's going to vary. 00:51:58.780 --> 00:52:01.870 So we'll come back to this notion of string next time. 00:52:01.870 --> 00:52:05.380 But today, focus on just these primitive types, if you will. 00:52:05.380 --> 00:52:08.960 And here is a picture of what is inside of your computer. 00:52:08.960 --> 00:52:12.285 So this is a piece of memory or RAM, Random Access Memory. 00:52:12.285 --> 00:52:13.660 And it might be a little smaller. 00:52:13.660 --> 00:52:15.820 It might be a little bigger depending on whether it's a laptop, 00:52:15.820 --> 00:52:17.450 or desktop, or phone, or the like. 00:52:17.450 --> 00:52:20.710 But it's in memory, or RAM, that programs 00:52:20.710 --> 00:52:22.780 are stored while they're running. 00:52:22.780 --> 00:52:25.840 And it's where files are stored when they are open. 
00:52:25.840 --> 00:52:29.440 So typically, if you save, install programs, or save files, 00:52:29.440 --> 00:52:32.740 those are saved on what's generally called your hard drive, or hard disk, 00:52:32.740 --> 00:52:37.090 or solid-state disk, or CD, or some other physical medium. 00:52:37.090 --> 00:52:40.450 And that, the [INAUDIBLE] of which is that they don't require electricity 00:52:40.450 --> 00:52:42.160 to store your data long term. 00:52:42.160 --> 00:52:43.180 RAM is different. 00:52:43.180 --> 00:52:44.830 It's volatile, so to speak. 00:52:44.830 --> 00:52:48.500 But it's much faster than a hard disk or a solid-state disk, even. 00:52:48.500 --> 00:52:50.500 It's much faster because it's purely electronic. 00:52:50.500 --> 00:52:52.300 And indeed, there are no moving parts. 00:52:52.300 --> 00:52:54.640 It's purely electronic, as pictured here. 00:52:54.640 --> 00:52:59.410 And so with RAM, you have the ability to open files and run programs 00:52:59.410 --> 00:53:02.290 more quickly because when you double-click a program to run it, 00:53:02.290 --> 00:53:04.960 or you open a file in order to view or edit it, 00:53:04.960 --> 00:53:06.820 it's stored temporarily in RAM. 00:53:06.820 --> 00:53:11.290 And long story short, if your laptop battery has ever died, 00:53:11.290 --> 00:53:13.990 or your computer's gotten unplugged, or your phone dies, 00:53:13.990 --> 00:53:17.770 the reason that you and I tend to lose data, the paragraph that you just 00:53:17.770 --> 00:53:19.960 wrote in the essay that you hadn't yet saved, 00:53:19.960 --> 00:53:23.020 is because RAM, memory, is volatile. 00:53:23.020 --> 00:53:26.360 That is, it requires electricity to continue powering it. 
00:53:26.360 --> 00:53:30.100 But for our purposes, we're only going to focus on RAM, 00:53:30.100 --> 00:53:33.340 not so much long-term disk space yet, because when 00:53:33.340 --> 00:53:36.820 you're running a program in C, it is indeed, by definition, 00:53:36.820 --> 00:53:38.680 running in your computer's memory. 00:53:38.680 --> 00:53:41.530 But the funny thing about something as simple as this picture 00:53:41.530 --> 00:53:44.290 is that each of these black rectangles is kind of a chip. 00:53:44.290 --> 00:53:47.710 And in those chips are stored all of the 0's and 1's, the little 00:53:47.710 --> 00:53:49.690 switches that we alluded to in week 0. 00:53:49.690 --> 00:53:53.410 So let's focus on and just zoom in on just one of these chips. 00:53:53.410 --> 00:53:57.130 Now, it stands to reason that I don't know how big this stick of RAM is. 00:53:57.130 --> 00:53:59.440 Maybe it's 1 gigabyte, a billion bytes. 00:53:59.440 --> 00:54:01.000 Maybe it's 4 gigabytes. 00:54:01.000 --> 00:54:02.800 Maybe it's even smaller or bigger. 00:54:02.800 --> 00:54:07.250 There's some number of bytes represented physically by this hardware. 00:54:07.250 --> 00:54:10.228 So if we zoom in further, let me propose that, all right, 00:54:10.228 --> 00:54:11.770 I don't know how many bytes are here. 00:54:11.770 --> 00:54:15.130 But if there's some number of bytes, whether it's a billion or 2 billion, 00:54:15.130 --> 00:54:17.440 or fewer or more, it stands to reason that we 00:54:17.440 --> 00:54:19.240 could just number all of these bytes. 00:54:19.240 --> 00:54:22.670 We could sort of think of this physical device, this memory, 00:54:22.670 --> 00:54:24.970 as just being a grid, top to bottom, left to right. 00:54:24.970 --> 00:54:28.270 And each of the squares I've just overlaid on this physical device 00:54:28.270 --> 00:54:30.057 might represent an individual byte. 00:54:30.057 --> 00:54:32.140 And again, in reality, maybe there's more of them. 
00:54:32.140 --> 00:54:33.410 Maybe there's fewer of them. 00:54:33.410 --> 00:54:35.920 But it stands to reason, no matter how many there are, 00:54:35.920 --> 00:54:38.650 we can think of each of these as having a location. 00:54:38.650 --> 00:54:42.200 Like, this is the first byte, second byte, third byte, and so forth. 00:54:42.200 --> 00:54:45.940 Well, what does it mean, then, for a char to take up 1 byte? 00:54:45.940 --> 00:54:49.750 That means that if your computer's memory is running a program maybe 00:54:49.750 --> 00:54:53.890 that you wrote or I wrote that's using a char variable somewhere in it, 00:54:53.890 --> 00:54:56.500 the char you're storing in that variable may very well 00:54:56.500 --> 00:55:00.940 be stored in the top left-hand corner physically of this piece of RAM. 00:55:00.940 --> 00:55:01.660 Maybe it's there. 00:55:01.660 --> 00:55:02.535 Maybe it's elsewhere. 00:55:02.535 --> 00:55:04.840 But it's just one physical square. 00:55:04.840 --> 00:55:08.110 If you're storing something like an int, which takes up 4 bytes, 00:55:08.110 --> 00:55:11.830 well, that frankly might take up all four squares along the top there 00:55:11.830 --> 00:55:12.670 or somewhere else. 00:55:12.670 --> 00:55:15.610 If you're using a long, that's going to take up twice as much space. 00:55:15.610 --> 00:55:18.250 So representing an even bigger number in your computer's memory 00:55:18.250 --> 00:55:21.340 is going to require that you use all of the 0's and 1's 00:55:21.340 --> 00:55:25.030 comprising these 8 bytes instead. 00:55:25.030 --> 00:55:27.170 But let's now move away from physical hardware. 00:55:27.170 --> 00:55:30.530 Let's abstract it away, if you will, and just now start to think of our memory 00:55:30.530 --> 00:55:31.390 as just this grid. 00:55:31.390 --> 00:55:33.280 And technically, it's not a two-dimensional structure. 00:55:33.280 --> 00:55:35.860 I could just as easily draw all of these bytes from left to right.
00:55:35.860 --> 00:55:37.790 I could just fit fewer of them on the screen. 00:55:37.790 --> 00:55:39.832 So we'll take the physical metaphor a bit further 00:55:39.832 --> 00:55:44.020 and just think of our computer's memory as this grid, this grid of bytes. 00:55:44.020 --> 00:55:46.510 And those bytes are each 8 bits. 00:55:46.510 --> 00:55:48.340 Those bits are just 0's and 1's. 00:55:48.340 --> 00:55:52.840 So what we've really done is zoom in metaphorically on our computer's memory 00:55:52.840 --> 00:55:57.460 to start thinking about where things are going to end up in memory when you 00:55:57.460 --> 00:56:01.030 double-click on a program on your Mac or PC or, in CS50 IDE, 00:56:01.030 --> 00:56:04.720 when you do ./hello or ./buggy0 or ./buggy1, 00:56:04.720 --> 00:56:07.990 it's these bytes in your computer's memory that are filled with all 00:56:07.990 --> 00:56:09.440 of your variables' values. 00:56:09.440 --> 00:56:10.940 So let's consider an example here. 00:56:10.940 --> 00:56:14.800 Suppose I had written some code that involved declaring three scores. 00:56:14.800 --> 00:56:17.680 Maybe it's a class that's got, like, three tests. 00:56:17.680 --> 00:56:23.343 And you want to average the student's grade across all three of those tests. 00:56:23.343 --> 00:56:26.260 Well, let's go ahead and write a quick program that does exactly this. 00:56:26.260 --> 00:56:30.640 In CS50 IDE, I'm going to create a program called scores.c. 00:56:30.640 --> 00:56:35.920 And in scores.c, I'm going to go ahead and #include stdio.h. 00:56:35.920 --> 00:56:38.703 I'm going to then do my int main(void) as usual. 00:56:38.703 --> 00:56:41.120 And then inside of here, I'm going to keep it very simple. 00:56:41.120 --> 00:56:43.750 I'm going to give myself one int called score1. 00:56:43.750 --> 00:56:46.180 And just to be a little playful, I'm going 00:56:46.180 --> 00:56:48.242 to set it equal to 72, like last week. 
00:56:48.242 --> 00:56:50.200 I'm going to give myself a second score and set 00:56:50.200 --> 00:56:55.120 it equal to 73, and then a third score whose value is going to be 33. 00:56:55.120 --> 00:56:59.770 And then let me go ahead and print out the average of those three values 00:56:59.770 --> 00:57:03.100 by plugging in a placeholder for a floating-point value, right? 00:57:03.100 --> 00:57:07.330 If you add three integers together and divide them by 3, 00:57:07.330 --> 00:57:10.730 I may very well get a fraction or a real number with a decimal point. 00:57:10.730 --> 00:57:14.110 So I'm going to use %f instead of %i because I don't want to truncate 00:57:14.110 --> 00:57:15.010 someone's grade. 00:57:15.010 --> 00:57:19.210 Otherwise, if they have, like, a 99.9%, they're not being rounded up to 100%. 00:57:19.210 --> 00:57:22.880 They're going to get the 99% because of truncation, as we discussed last week. 00:57:22.880 --> 00:57:25.030 So how do I do now the math of an average? 00:57:25.030 --> 00:57:27.220 Well, it's pretty straightforward-- score1 00:57:27.220 --> 00:57:30.160 plus score2 plus score3 in parentheses, just 00:57:30.160 --> 00:57:33.610 like in math, divided by 3, semicolon. 00:57:33.610 --> 00:57:35.000 Let me save that file. 00:57:35.000 --> 00:57:36.655 Let me do make scores at the bottom. 00:57:36.655 --> 00:57:38.530 Again, we're not going to use Clang manually. 00:57:38.530 --> 00:57:41.230 No need to, because it's a lot easier to run make. 00:57:41.230 --> 00:57:42.670 But I did mess up here. 00:57:42.670 --> 00:57:46.420 "Format specifies type 'double', but the argument has type 'int'." 00:57:46.420 --> 00:57:48.500 So I don't quite understand that. 00:57:48.500 --> 00:57:52.540 But it's drawing my attention to the %f and the fact that my math looks like 00:57:52.540 --> 00:57:53.930 this. 00:57:53.930 --> 00:57:56.110 So any thoughts here?
00:57:56.110 --> 00:57:59.650 I don't think printf is going to help me here because the bug is 00:57:59.650 --> 00:58:01.810 within the printf line. 00:58:01.810 --> 00:58:06.160 I don't think that debug50 is going to really help me here because I already 00:58:06.160 --> 00:58:09.130 know what line of code the bug is in. 00:58:09.130 --> 00:58:13.570 This feels like an opportunity to talk to the physical duck 00:58:13.570 --> 00:58:15.280 or some other inanimate object. 00:58:15.280 --> 00:58:21.070 Or we can perhaps think about what errors we ran into even last week. 00:58:21.070 --> 00:58:23.410 [? Arpan, ?] what do you think? 00:58:23.410 --> 00:58:27.520 [? ARPAN: ?] I think it's telling you this because it's 00:58:27.520 --> 00:58:31.390 receiving in all the values are integer type, 00:58:31.390 --> 00:58:33.557 but you are telling it to be in float. 00:58:33.557 --> 00:58:34.390 DAVID MALAN: Indeed. 00:58:34.390 --> 00:58:37.450 So score1, score2, score3 are all integers, 00:58:37.450 --> 00:58:40.570 and the number 3 is literally an integer. 00:58:40.570 --> 00:58:43.720 And so this time, the compiler is smart enough to realize, wait a minute, 00:58:43.720 --> 00:58:48.850 you're trying to coerce an integer result into a floating point value, 00:58:48.850 --> 00:58:51.592 but you haven't done any floating point arithmetic, if you will. 00:58:51.592 --> 00:58:52.300 So you know what? 00:58:52.300 --> 00:58:53.592 There's a few ways to fix this. 00:58:53.592 --> 00:58:56.680 Last week, recall we proposed that you could use a cast, 00:58:56.680 --> 00:59:00.880 and you could explicitly cast one or more of those values to a float. 00:59:00.880 --> 00:59:02.830 So I could do this, for instance. 00:59:02.830 --> 00:59:06.728 Or I could cast all of these to floats or one of these to floats. 00:59:06.728 --> 00:59:08.270 There's many different possibilities. 00:59:08.270 --> 00:59:12.250 But frankly, the simplest fix is just to divide, for instance, by 3.0. 
00:59:12.250 --> 00:59:16.263 I can avoid some of the headaches of casting from one to another by just 00:59:16.263 --> 00:59:18.430 making sure that there's at least one floating point 00:59:18.430 --> 00:59:20.680 value involved in this arithmetic. 00:59:20.680 --> 00:59:23.020 So now let me recompile scores. 00:59:23.020 --> 00:59:24.460 This time, it compiles OK. 00:59:24.460 --> 00:59:31.810 Let me do ./scores, and voila, my average isn't so high, 59.333333. 00:59:31.810 --> 00:59:34.750 All right, so what is actually going on inside 00:59:34.750 --> 00:59:38.980 of the computer irrespective of the floating point arithmetic, which was, 00:59:38.980 --> 00:59:40.760 again, a topic of last week? 00:59:40.760 --> 00:59:44.470 Well, let's consider these three variables, score1, score2, score3-- 00:59:44.470 --> 00:59:47.500 where are they actually being stored in the computer's memory? 00:59:47.500 --> 00:59:49.120 Well, let's consider that grid again. 00:59:49.120 --> 00:59:51.550 And again, I'm going to start at top left for convenience. 00:59:51.550 --> 00:59:53.925 But technically speaking-- we'll see this down the road-- 00:59:53.925 --> 00:59:56.410 your computer's memory is just like this big canvas. 00:59:56.410 --> 00:59:59.150 And values can end up in all different places. 00:59:59.150 --> 01:00:00.700 But for today, we'll keep it clean. 01:00:00.700 --> 01:00:03.830 The first variable, score1, I claim is going to be here, 01:00:03.830 --> 01:00:05.420 top left, for simplicity. 01:00:05.420 --> 01:00:08.290 But what's important about where score1-- 01:00:08.290 --> 01:00:13.120 that is, 72-- is being stored, is it's taking up four of these boxes. 01:00:13.120 --> 01:00:15.670 Each of these boxes, recall, represents 1 byte. 01:00:15.670 --> 01:00:19.600 And an integer, recall, in CS50 IDE is 4 bytes. 01:00:19.600 --> 01:00:24.250 Therefore, I have used 4 bytes of space to represent the number 72. 
01:00:24.250 --> 01:00:27.430 The number 73 in score2 similarly is going 01:00:27.430 --> 01:00:32.150 to take up four boxes, as is score3 going to take up four boxes as well. 01:00:32.150 --> 01:00:34.750 But what's really going on underneath the hood here? 01:00:34.750 --> 01:00:37.450 Well, if each of these squares represents a byte, 01:00:37.450 --> 01:00:42.070 and each of those bytes is 8 bits, and a bit is just a 0 or 1, 01:00:42.070 --> 01:00:45.130 what's really going on underneath the hood is something like this. 01:00:45.130 --> 01:00:47.980 Somehow, this electronic memory is storing 01:00:47.980 --> 01:00:50.920 electricity in just the right way so that it's storing 01:00:50.920 --> 01:00:53.650 this pattern of 0's and 1's, a.k.a. 01:00:53.650 --> 01:00:57.370 72 in decimal, this pattern of 0's and 1's, a.k.a. 01:00:57.370 --> 01:01:01.180 73 in decimal, this pattern of 0's and 1's, a.k.a. 01:01:01.180 --> 01:01:02.682 33 in decimal. 01:01:02.682 --> 01:01:05.140 But again, we don't have to keep thinking about or dwelling 01:01:05.140 --> 01:01:06.190 on the binary level. 01:01:06.190 --> 01:01:09.100 But this is only to say that everything we've discussed thus far 01:01:09.100 --> 01:01:11.560 is coming together now in this one picture 01:01:11.560 --> 01:01:14.320 because a computer is just storing these patterns for us, 01:01:14.320 --> 01:01:16.930 and we are allocating space now thanks to our programming 01:01:16.930 --> 01:01:20.050 language via code like this. 01:01:20.050 --> 01:01:26.770 But this code, correct though it may be, indeed 59.333333 and so forth 01:01:26.770 --> 01:01:31.240 was my average if my test scores were 72, 73, and 33. 01:01:31.240 --> 01:01:34.330 But I feel like there's an opportunity for better design here. 01:01:34.330 --> 01:01:37.570 So not just correctness, not just style, recall that design 01:01:37.570 --> 01:01:40.000 is this other metric of code quality. 
01:01:40.000 --> 01:01:42.100 And it's a little more subjective, and it's 01:01:42.100 --> 01:01:45.490 a little more subject to debate among reasonable people. 01:01:45.490 --> 01:01:50.258 But I don't really love what I was doing with this naming scheme. 01:01:50.258 --> 01:01:52.300 And in fact, if we look at the code, there really 01:01:52.300 --> 01:01:55.240 wasn't much more to my program than these three lines. 01:01:55.240 --> 01:01:58.840 I worry this program isn't particularly well designed. 01:01:58.840 --> 01:02:04.360 What rubs you the wrong way, perhaps, about those three lines of code? 01:02:04.360 --> 01:02:06.255 What could be better? 01:02:06.255 --> 01:02:08.380 And even if you don't know the solution, especially 01:02:08.380 --> 01:02:13.688 if you've never programmed before, what kind of smells about those three lines? 01:02:13.688 --> 01:02:14.980 This is actually a term of art. 01:02:14.980 --> 01:02:18.820 "Code smell" is like something-- not loving that for some reason. 01:02:18.820 --> 01:02:22.270 If you can't put your finger on it, it's not the best design. 01:02:22.270 --> 01:02:23.380 The code smells. 01:02:23.380 --> 01:02:26.950 What's smelly, if you will, about score1, score2, score3? 01:02:26.950 --> 01:02:28.405 Ryan, what do you think? 01:02:28.405 --> 01:02:30.280 RYAN: If you're doing an average calculation, 01:02:30.280 --> 01:02:33.170 you don't need to add them up all together in the code. 01:02:33.170 --> 01:02:35.970 You can add them up beforehand and store it as one variable. 01:02:35.970 --> 01:02:36.970 DAVID MALAN: Absolutely. 01:02:36.970 --> 01:02:39.803 If I'm computing the average, I don't need to keep all three around. 01:02:39.803 --> 01:02:42.940 I can just keep a sum and then divide the whole sum by the total number. 01:02:42.940 --> 01:02:45.070 I like that, that instinct. 01:02:45.070 --> 01:02:49.480 What else might you not like about the design of this code now? 
01:02:49.480 --> 01:02:51.340 Score1, score2, score3. 01:02:54.110 --> 01:02:56.150 Score1, score2, score3. 01:02:56.150 --> 01:02:59.030 Might there be opportunity still for improvement? 01:02:59.030 --> 01:03:02.090 I feel like any time you start to see this repetition, maybe. 01:03:02.090 --> 01:03:03.600 Andrew, your thoughts? 01:03:03.600 --> 01:03:06.715 ANDREW: Not hard code the three scores together? 01:03:06.715 --> 01:03:08.840 DAVID MALAN: OK, so not hard code the three scores. 01:03:08.840 --> 01:03:10.500 And what would you do instead? 01:03:10.500 --> 01:03:14.060 ANDREW: Maybe take an input, or I would-- 01:03:14.060 --> 01:03:16.562 yeah, I wouldn't write out the scores themselves. 01:03:16.562 --> 01:03:18.270 DAVID MALAN: Yeah, another good instinct. 01:03:18.270 --> 01:03:21.470 It's kind of stupid that I've written a program, compiled a program, 01:03:21.470 --> 01:03:25.847 that only computes the average for some student who literally got those three 01:03:25.847 --> 01:03:26.930 test scores and no others. 01:03:26.930 --> 01:03:28.490 Like, there's no dynamism here. 01:03:28.490 --> 01:03:31.550 Moreover, it's a little lazy too that I called 01:03:31.550 --> 01:03:33.890 my variables score1, score2, score3. 01:03:33.890 --> 01:03:35.450 I mean, where does it end after that? 01:03:35.450 --> 01:03:37.790 If I want to have a fourth test next semester, now 01:03:37.790 --> 01:03:39.140 I have to go and have score4. 01:03:39.140 --> 01:03:40.760 If I've got a fifth, score5. 01:03:40.760 --> 01:03:44.270 That starts to be reminiscent of last week's copy/paste, which 01:03:44.270 --> 01:03:45.990 really wasn't the best practice. 01:03:45.990 --> 01:03:48.590 And so let me propose that we clean this up. 01:03:48.590 --> 01:03:50.900 And it turns out we can clean this up by way 01:03:50.900 --> 01:03:53.210 of another topic, another feature of C that's 01:03:53.210 --> 01:03:56.100 also present in other languages, known as arrays. 
01:03:56.100 --> 01:03:58.880 And if you happened to use something called a list in Scratch, 01:03:58.880 --> 01:04:01.760 very similar in spirit to Scratch's lists. 01:04:01.760 --> 01:04:05.060 But we didn't see those in lecture that first week. 01:04:05.060 --> 01:04:11.030 An array in C, as in other languages, is a sequence 01:04:11.030 --> 01:04:14.870 of values stored in memory back to back to back, 01:04:14.870 --> 01:04:19.020 a sequence of contiguous values, so to speak, back to back to back. 01:04:19.020 --> 01:04:22.185 So in that sense, it's like a list of values from left to right 01:04:22.185 --> 01:04:24.560 if we use the metaphor of the picture we've been drawing. 01:04:24.560 --> 01:04:27.480 So how might this be germane here? 01:04:27.480 --> 01:04:30.710 Well, it turns out that if you want to store a whole bunch of values, 01:04:30.710 --> 01:04:33.320 but they're all kind of interrelated, like they're all scores, 01:04:33.320 --> 01:04:37.340 you don't have to resort to this sort of lazy, score1, score2, score3, score4, 01:04:37.340 --> 01:04:40.850 score5, up to score99, depending on how many scores there are. 01:04:40.850 --> 01:04:44.630 Why don't you just call all of those numbers scores, 01:04:44.630 --> 01:04:46.520 but use a slightly different syntax? 01:04:46.520 --> 01:04:49.550 And that syntax gives you access to what are called arrays. 01:04:49.550 --> 01:04:52.730 So the syntax here on the screen is an example 01:04:52.730 --> 01:04:56.930 of declaring space for three integers all at once 01:04:56.930 --> 01:05:00.980 and collectively referring to all of them as the word "scores." 01:05:00.980 --> 01:05:03.320 So there's no more scores 1, 2, and 3. 01:05:03.320 --> 01:05:06.260 All three of those scores are in a variable called "scores." 
01:05:06.260 --> 01:05:09.830 And what's new here is the square brackets, inside of which 01:05:09.830 --> 01:05:14.180 is a number that literally connotes how many integers do you 01:05:14.180 --> 01:05:18.150 want to store under the name "scores." 01:05:18.150 --> 01:05:19.940 So what does this allow me to do? 01:05:19.940 --> 01:05:24.240 It allows me still to define three integers in that array. 01:05:24.240 --> 01:05:26.330 So this array is going to be a chunk of memory 01:05:26.330 --> 01:05:29.030 back to back to back that I can put values in. 01:05:29.030 --> 01:05:32.240 And the way I put those values is going to look syntactically like this. 01:05:32.240 --> 01:05:36.050 I still use numbers, but now I'm using a new notation. 01:05:36.050 --> 01:05:39.170 And it's similar to what I resorted to before, 01:05:39.170 --> 01:05:41.390 but it's a little more generalized now and dynamic. 01:05:41.390 --> 01:05:44.960 Now if I want to update the very first score in that array, 01:05:44.960 --> 01:05:47.540 I literally write the name of the variable scores, 01:05:47.540 --> 01:05:51.020 bracket[0] and then assign it the value. 01:05:51.020 --> 01:05:54.200 If I want to get at the second score, I do scores[1]. 01:05:54.200 --> 01:05:56.570 If I want the third score, it's scores[2]. 01:05:56.570 --> 01:06:00.050 And the only thing that's a little weird and takes some getting used to is 01:06:00.050 --> 01:06:04.100 the fact that we are "zero-indexing" our arrays. 01:06:04.100 --> 01:06:06.980 So in past examples, like for loops and while loops, 01:06:06.980 --> 01:06:09.680 I've sort of said, eh, it's a convention in programming 01:06:09.680 --> 01:06:11.180 to start counting from 0. 01:06:11.180 --> 01:06:15.140 When it comes to arrays, which are contiguous 01:06:15.140 --> 01:06:20.292 sequences of values in a computer's memory, they have to start at 0. 
01:06:20.292 --> 01:06:22.250 So otherwise, if you don't start counting at 0, 01:06:22.250 --> 01:06:26.280 you're literally going to be wasting space by overlooking one value. 01:06:26.280 --> 01:06:29.100 So now if we were to rename things on the screen, 01:06:29.100 --> 01:06:34.340 instead of calling these three rectangles score1, score2, score3, 01:06:34.340 --> 01:06:35.810 they're all called scores. 01:06:35.810 --> 01:06:38.270 But if you want to refer specifically to the first one, 01:06:38.270 --> 01:06:42.140 you use this fancy bracket notation, and the second one, this bracket notation, 01:06:42.140 --> 01:06:44.150 and the third one, this bracket notation. 01:06:44.150 --> 01:06:45.740 But notice the dichotomy. 01:06:45.740 --> 01:06:51.770 When declaring the array, when creating the array, saying, give me three ints, 01:06:51.770 --> 01:06:56.270 you use [3] where [3] is the total number of values. 01:06:56.270 --> 01:06:59.300 When you index into the array-- 01:06:59.300 --> 01:07:03.260 that is, when you go to a specific location in that chunk of memory-- 01:07:03.260 --> 01:07:05.130 you similarly use numbers. 01:07:05.130 --> 01:07:08.510 But now those are referring to their relative positions, position 0, 01:07:08.510 --> 01:07:10.460 position 1, position 2. 01:07:10.460 --> 01:07:13.160 This is the total number of spaces. 01:07:13.160 --> 01:07:17.480 This is the specific space first, second, and third. 01:07:17.480 --> 01:07:20.000 All right, so pictorially, nothing has changed, 01:07:20.000 --> 01:07:21.872 just our nomenclature really has. 01:07:21.872 --> 01:07:24.080 So let me go ahead and start to improve this program, 01:07:24.080 --> 01:07:28.220 taking in the advice that was offered too on how we can improve the design 01:07:28.220 --> 01:07:30.290 and get rid of the smelliness of it. 
01:07:30.290 --> 01:07:32.060 Let me take the first-- 01:07:32.060 --> 01:07:34.650 let me take the easiest of these approaches 01:07:34.650 --> 01:07:37.340 first by just getting rid of these three separate variables 01:07:37.340 --> 01:07:42.590 and instead giving me one variable called scores, an array of size 3. 01:07:42.590 --> 01:07:45.710 And then I don't need to declare score1, score2. 01:07:45.710 --> 01:07:47.420 Again, that's all going away. 01:07:47.420 --> 01:07:48.453 That's all going away. 01:07:48.453 --> 01:07:49.370 That's all going away. 01:07:49.370 --> 01:07:52.520 Now if I want to initialize that array with these three values, 01:07:52.520 --> 01:07:54.440 I say scores[0]. 01:07:54.440 --> 01:07:56.510 And down here, I say scores[1]. 01:07:56.510 --> 01:07:59.060 And down here, I say scores[2]. 01:07:59.060 --> 01:08:01.200 So I've added one line of code. 01:08:01.200 --> 01:08:02.810 But notice the dynamism now. 01:08:02.810 --> 01:08:05.720 If I want to have a fourth one, I can just allocate here and then 01:08:05.720 --> 01:08:09.742 put in the value with another line of code, or 5, or 6, or 7, or 8. 01:08:09.742 --> 01:08:11.450 I don't have to start copying and pasting 01:08:11.450 --> 01:08:13.930 all of these different variable names by convention. 01:08:13.930 --> 01:08:16.930 But I think if we take some of the advice that was offered a moment ago, 01:08:16.930 --> 01:08:20.370 we can also clean this up by way of a loop or such as well. 01:08:20.370 --> 01:08:21.380 So let's do that. 01:08:21.380 --> 01:08:26.140 Let me go ahead and give myself, actually, first the CS50 library so 01:08:26.140 --> 01:08:27.609 that I can use get_int. 01:08:27.609 --> 01:08:30.100 And let's take this first piece of advice, which is, 01:08:30.100 --> 01:08:33.370 let's start asking for a score using get_int. 01:08:33.370 --> 01:08:35.950 And I'm going to do this three times. 01:08:35.950 --> 01:08:37.750 And yeah, I'm getting a little lazy. 
01:08:37.750 --> 01:08:38.859 I'm getting a little bored already. 01:08:38.859 --> 01:08:40.029 So I'm going to copy/paste. 01:08:40.029 --> 01:08:41.946 And again, that does not bode well in general. 01:08:41.946 --> 01:08:44.649 When copying and pasting, we can probably do better still. 01:08:44.649 --> 01:08:47.770 But now I think I need to change just one more thing here. 01:08:47.770 --> 01:08:54.010 When doing the math, I want scores[0] plus scores[1] plus scores[2]. 01:08:54.010 --> 01:08:57.580 But before I solve this problem here-- the logic is still the same, 01:08:57.580 --> 01:09:00.310 but I'm now taking in dynamically three integers-- 01:09:00.310 --> 01:09:02.740 there's still a smell to it as well. 01:09:02.740 --> 01:09:04.550 It's still not as well designed. 01:09:04.550 --> 01:09:10.640 And so just to make clear, what could I be doing better now? 01:09:10.640 --> 01:09:14.359 How could I clean up this code and make it not just correct, not just 01:09:14.359 --> 01:09:17.310 well styled, but better designed? 01:09:17.310 --> 01:09:18.240 What remains here? 01:09:18.240 --> 01:09:18.740 Nina? 01:09:18.740 --> 01:09:20.450 What do you think? 01:09:20.450 --> 01:09:24.170 NINA: The code is specific for only three scores. 01:09:24.170 --> 01:09:27.859 So you could, as an input, [INAUDIBLE] how many scores 01:09:27.859 --> 01:09:30.109 it wants at the very beginning. 01:09:30.109 --> 01:09:33.735 And then instead of having scores[0], scores[1], 01:09:33.735 --> 01:09:40.560 you could use a for loop that goes through from 0 to n minus 1 or less 01:09:40.560 --> 01:09:45.078 than n that will ask, and it should be one line of code instead. 01:09:45.078 --> 01:09:46.370 DAVID MALAN: Yeah, really good. 01:09:46.370 --> 01:09:48.649 It's the fact that we have get_int, get_int, get_int. 01:09:48.649 --> 01:09:51.649 That's the first sign that you're probably doing something suboptimally.
01:09:51.649 --> 01:09:53.510 It might be correct, but it's probably not well designed 01:09:53.510 --> 01:09:55.682 because I did literally resort to copy/paste. 01:09:55.682 --> 01:09:57.890 There's sort of a pattern here that I could certainly 01:09:57.890 --> 01:09:59.580 integrate into something like a loop. 01:09:59.580 --> 01:10:00.470 So let me do that. 01:10:00.470 --> 01:10:03.830 Let me actually get rid of two of these lines of code. 01:10:03.830 --> 01:10:08.870 Let me go up here and do something like for int i gets 0, i less than 3 for now, 01:10:08.870 --> 01:10:10.100 i++. 01:10:10.100 --> 01:10:11.600 Let me open up this for loop. 01:10:11.600 --> 01:10:14.120 Let me indent that remaining line of code. 01:10:14.120 --> 01:10:16.520 And instead of scores[0]-- 01:10:16.520 --> 01:10:18.930 this is where arrays get really powerful-- 01:10:18.930 --> 01:10:22.760 you can use a variable to index into an array-- 01:10:22.760 --> 01:10:24.740 that is, to go to a specific location. 01:10:24.740 --> 01:10:26.780 What do I want to use for my variable? 01:10:26.780 --> 01:10:29.400 Well, I would think i here. 01:10:29.400 --> 01:10:33.980 So now I've whittled my lines of code down from three 01:10:33.980 --> 01:10:36.140 nearly identical lines into really just one 01:10:36.140 --> 01:10:39.290 inside of a loop that's going to do the same thing for me again and again. 01:10:39.290 --> 01:10:42.230 And as Nina proposed too, I don't have to hard code 01:10:42.230 --> 01:10:43.860 these 3's all over the place. 01:10:43.860 --> 01:10:46.050 Maybe I could do something like this. 01:10:46.050 --> 01:10:50.240 I could say something like, int total gets get_int. 01:10:50.240 --> 01:10:53.690 And I might ask, "Total number of scores." 01:10:53.690 --> 01:10:56.660 And I could literally ask the human from the get-go 01:10:56.660 --> 01:10:58.490 how many total scores there are.
01:10:58.490 --> 01:11:04.680 Then I can even more powerfully use this variable, total, in multiple places 01:11:04.680 --> 01:11:07.820 so that now, I'm doing my math much more dynamically. 01:11:07.820 --> 01:11:11.300 This, though-- I'm afraid, Nina, this broke a bit. 01:11:11.300 --> 01:11:13.280 I'm going to be a little more-- 01:11:13.280 --> 01:11:16.280 I need to exert a little more effort here on line 14 because now I 01:11:16.280 --> 01:11:21.500 can't hard code scores[0], [1], and [2] because if the total number of scores 01:11:21.500 --> 01:11:23.820 is more than that, I need to do more addition. 01:11:23.820 --> 01:11:26.010 If it's fewer than that, I need to do less addition. 01:11:26.010 --> 01:11:28.698 So I think we've introduced a bug, but we can fix that. 01:11:28.698 --> 01:11:30.240 But let me propose something for just a moment. 01:11:30.240 --> 01:11:33.470 Let's not make it dynamic because I worry that's just made my life harder. 01:11:33.470 --> 01:11:36.620 Let's at least introduce one other feature here first. 01:11:36.620 --> 01:11:41.090 I'm going to go ahead up here and define a new feature of C today, which 01:11:41.090 --> 01:11:42.290 is known as a constant. 01:11:42.290 --> 01:11:46.070 If I know in advance that I want to declare a number that I want 01:11:46.070 --> 01:11:49.670 to use again and again and again without copying and pasting 01:11:49.670 --> 01:11:53.180 literally that number 3, I can give myself a constant int 01:11:53.180 --> 01:11:58.160 by writing const int total = 3. 01:11:58.160 --> 01:12:01.320 This declares what's called a constant in programming, 01:12:01.320 --> 01:12:05.600 which is a feature of many languages whereby you declare a variable of sorts 01:12:05.600 --> 01:12:07.370 whose value can never change. 01:12:07.370 --> 01:12:09.870 Once you set it, you cannot change it.
01:12:09.870 --> 01:12:12.020 And that's a good thing because, one, it shouldn't 01:12:12.020 --> 01:12:13.603 change in the context of this program. 01:12:13.603 --> 01:12:16.460 And two, just in case you, the human, are fallible, 01:12:16.460 --> 01:12:19.103 you don't want to accidentally change it when you don't intend to. 01:12:19.103 --> 01:12:21.020 So this is a feature of a programming language 01:12:21.020 --> 01:12:23.850 that sort of protects you from yourself. 01:12:23.850 --> 01:12:28.280 So now I can sort of take an amalgam of my instincts and Nina's and use 01:12:28.280 --> 01:12:29.780 this variable, total. 01:12:29.780 --> 01:12:32.570 And actually, another convention when declaring constants 01:12:32.570 --> 01:12:35.960 is to capitalize them just to make visually clear that there's 01:12:35.960 --> 01:12:38.820 something different or special about this variable. 01:12:38.820 --> 01:12:42.320 So I'm going to change this to TOTAL, and I'm going to use that value here 01:12:42.320 --> 01:12:45.260 and here and also down here. 01:12:45.260 --> 01:12:49.550 But I'm afraid both Nina and I have a little bit of cleanup here to do 01:12:49.550 --> 01:12:54.680 in that I still have hard coded scores[0], scores[1], and scores[2]. 01:12:54.680 --> 01:12:58.552 And I want to add a changing number of values together. 01:12:58.552 --> 01:12:59.260 So you know what? 01:12:59.260 --> 01:13:00.270 I've got an idea. 01:13:00.270 --> 01:13:03.260 Let me go ahead and create a function that's 01:13:03.260 --> 01:13:05.070 going to compute an average for me. 01:13:05.070 --> 01:13:08.780 So if I want to create my own function that computes an average, 01:13:08.780 --> 01:13:10.650 I want it to return a floating point value, 01:13:10.650 --> 01:13:13.400 just so that we don't truncate any math. 01:13:13.400 --> 01:13:15.020 I'm going to call this average.
01:13:15.020 --> 01:13:18.560 And the input to this function is going to be the length 01:13:18.560 --> 01:13:21.710 of an array and the actual array. 01:13:21.710 --> 01:13:24.320 And this is the last piece of funky syntax for now. 01:13:24.320 --> 01:13:31.700 It turns out that when you want to pass an array as input to a custom function, 01:13:31.700 --> 01:13:35.360 you literally use those square brackets again, but you don't specify the size. 01:13:35.360 --> 01:13:37.940 And the upside of this is that your function then 01:13:37.940 --> 01:13:42.050 can support an array that's got one space in it, two spaces, three, 01:13:42.050 --> 01:13:42.800 a hundred. 01:13:42.800 --> 01:13:45.030 It's more dynamic this way. 01:13:45.030 --> 01:13:47.090 So how do I compute an average here? 01:13:47.090 --> 01:13:48.590 I can do this a few different ways. 01:13:48.590 --> 01:13:50.975 But I think what was suggested earlier makes 01:13:50.975 --> 01:13:52.850 sense, where I can do some kind of summation. 01:13:52.850 --> 01:13:54.898 So let me do int sum = 0. 01:13:54.898 --> 01:13:57.440 Because how do you compute the average of a bunch of numbers? 01:13:57.440 --> 01:14:00.170 Well, you add them all together, and you divide by the total. 01:14:00.170 --> 01:14:01.670 Well, let's see how I might do that. 01:14:01.670 --> 01:14:06.050 Let me do for int i gets 0, i less than-- 01:14:06.050 --> 01:14:06.920 what should this be? 01:14:06.920 --> 01:14:11.990 Well, if this custom function is being passed the length of the array 01:14:11.990 --> 01:14:16.280 and the actual array, I think I can iterate from i up to length, 01:14:16.280 --> 01:14:18.870 and then i++ on each iteration. 01:14:18.870 --> 01:14:20.720 And then on each iteration, I think I want 01:14:20.720 --> 01:14:27.450 to do sum plus-equals whatever is in the array's i-th location, so to speak. 01:14:27.450 --> 01:14:31.160 So again, this is shorthand notation per last week for this.
01:14:31.160 --> 01:14:38.330 Sum equals whatever sum is plus whatever is in location i of the array. 01:14:38.330 --> 01:14:40.670 And once I've done all of that, I think what 01:14:40.670 --> 01:14:47.400 I can do is return the total sum divided by the length of the array. 01:14:47.400 --> 01:14:50.510 And what I like about this whole approach-- assuming my code's correct, 01:14:50.510 --> 01:14:54.440 and I don't think it is just yet-- notice what I can do back up in main. 01:14:54.440 --> 01:14:58.400 Now I can abstract away the notion of calculating an average 01:14:58.400 --> 01:15:04.863 and just do something like this with this line of code here. 01:15:04.863 --> 01:15:05.780 So what did I just do? 01:15:05.780 --> 01:15:09.080 A lot's going on, but let's focus for a moment on line 14 here. 01:15:09.080 --> 01:15:12.230 On line 14, I'm still just printing the average of some floating point 01:15:12.230 --> 01:15:13.230 placeholder. 01:15:13.230 --> 01:15:17.570 But what I'm passing as input now is this function, average, 01:15:17.570 --> 01:15:20.210 whose inputs are going to be TOTAL, which again is just 01:15:20.210 --> 01:15:23.210 this constant at the very top-- oh, sorry, I goofed. 01:15:23.210 --> 01:15:26.420 I should have capitalized it, which is just that constant at the very top. 01:15:26.420 --> 01:15:29.840 And I'm passing in scores, which again, is just 01:15:29.840 --> 01:15:32.750 this array of all of those scores. 01:15:32.750 --> 01:15:36.180 Meanwhile, in the function, in the context of the function, 01:15:36.180 --> 01:15:39.140 notice that the names of the inputs to a function 01:15:39.140 --> 01:15:41.870 do not need to match the names of the variables being 01:15:41.870 --> 01:15:43.410 passed into that function. 
01:15:43.410 --> 01:15:46.820 So even though in main, they're called TOTAL and scores, 01:15:46.820 --> 01:15:48.890 in the context of my function, average, I 01:15:48.890 --> 01:15:54.140 can call them x and y, a and b, or more generically, length and array. 01:15:54.140 --> 01:15:56.660 I don't know what the array is, but it's an array of ints. 01:15:56.660 --> 01:16:01.280 And I don't know how long it is, but that answer is going to be in length. 01:16:01.280 --> 01:16:03.560 But there's still a bug here. 01:16:03.560 --> 01:16:04.640 There's still a bug. 01:16:04.640 --> 01:16:07.940 And if we ignore main for a moment, this is a subtle one. 01:16:07.940 --> 01:16:11.840 Does anyone see a mistake that I've made probably for the third time 01:16:11.840 --> 01:16:14.330 now over the past two weeks? 01:16:14.330 --> 01:16:18.490 What subtle mistake have I made here with my code 01:16:18.490 --> 01:16:21.850 only in this average function? 01:16:21.850 --> 01:16:23.740 This one's a little more subtle. 01:16:23.740 --> 01:16:27.250 But the goal is to compute the average of a whole bunch of integers 01:16:27.250 --> 01:16:28.460 and return the answer. 01:16:28.460 --> 01:16:29.770 Nicholas? 01:16:29.770 --> 01:16:33.940 NICHOLAS: You've declared the variable within the function. 01:16:33.940 --> 01:16:37.420 DAVID MALAN: I've declared the variable within the function. 01:16:37.420 --> 01:16:42.460 That's OK because I've declared my variable sum here, I think you mean. 01:16:42.460 --> 01:16:45.430 But that's inside of the average function. 01:16:45.430 --> 01:16:49.870 And I'm using sum inside of the outermost curly braces 01:16:49.870 --> 01:16:50.890 in which it was defined. 01:16:50.890 --> 01:16:52.300 And so that's OK. 01:16:52.300 --> 01:16:53.590 That's OK. 01:16:53.590 --> 01:16:56.650 Let's take another thought here. 01:16:56.650 --> 01:17:00.085 Olivia, where might the bug still be?
01:17:00.085 --> 01:17:01.960 OLIVIA: The return type's a float, but you're 01:17:01.960 --> 01:17:03.425 returning an int divided by an int. 01:17:03.425 --> 01:17:04.300 DAVID MALAN: Perfect. 01:17:04.300 --> 01:17:06.400 So I again made that same stupid mistake that's 01:17:06.400 --> 01:17:08.710 just going to get more obvious as time goes on 01:17:08.710 --> 01:17:12.760 that if I want to do floating point arithmetic, just like the Ariane rocket 01:17:12.760 --> 01:17:15.550 discussion, the Patriot missile-- like, these kinds of details 01:17:15.550 --> 01:17:16.870 matter in a program. 01:17:16.870 --> 01:17:18.790 Now it's correct because I'm actually going 01:17:18.790 --> 01:17:22.000 to ensure that even though the context here 01:17:22.000 --> 01:17:24.740 is much less important than those real-world contexts, 01:17:24.740 --> 01:17:28.960 just computing some average of scores, I'm not going to accidentally truncate 01:17:28.960 --> 01:17:30.212 any of my values. 01:17:30.212 --> 01:17:32.170 So again, in the context here of this function, 01:17:32.170 --> 01:17:34.540 average is just applying some of last week's principles. 01:17:34.540 --> 01:17:35.500 I've got a variable. 01:17:35.500 --> 01:17:36.310 I've got a loop. 01:17:36.310 --> 01:17:39.070 And I'm doing some floating point arithmetic, ultimately. 01:17:39.070 --> 01:17:42.790 And I'm now creating a function that takes two inputs. 01:17:42.790 --> 01:17:44.890 One is length, and one is the length-- 01:17:44.890 --> 01:17:48.100 one is the array itself, and the return type, as Olivia notes, 01:17:48.100 --> 01:17:51.790 is a float so that my output is also well defined. 01:17:51.790 --> 01:17:53.590 But what's nice about this is, again, you 01:17:53.590 --> 01:17:55.660 can think of these functions as abstractions. 
01:17:55.660 --> 01:18:00.760 Now I don't need to worry about how I calculate an average because I now 01:18:00.760 --> 01:18:03.400 have this helper function, a custom function 01:18:03.400 --> 01:18:05.930 I wrote that can help me answer that question. 01:18:05.930 --> 01:18:09.010 And here, notice that the output of this average function 01:18:09.010 --> 01:18:12.842 will become an input into printf. 01:18:12.842 --> 01:18:15.050 And the only other feature I've added to the mix here 01:18:15.050 --> 01:18:18.380 now are not only arrays, which allow us to create 01:18:18.380 --> 01:18:21.650 multiple variables, a variable number of variables, if you will, 01:18:21.650 --> 01:18:23.420 but also this notion of a constant. 01:18:23.420 --> 01:18:26.960 If I find myself using the same number again and again and again, 01:18:26.960 --> 01:18:29.570 this constant can help me keep my code clean. 01:18:29.570 --> 01:18:30.440 And notice this. 01:18:30.440 --> 01:18:33.710 If next year, maybe another semester, there's four scores or four tests, 01:18:33.710 --> 01:18:34.940 I change it in one place. 01:18:34.940 --> 01:18:35.870 I recompile. 01:18:35.870 --> 01:18:37.610 Boom, I'm done. 01:18:37.610 --> 01:18:39.980 A well-designed program does not require that you 01:18:39.980 --> 01:18:43.130 go reading through the entirety of it, fixing numbers here, numbers there. 01:18:43.130 --> 01:18:46.010 Changing it in one place can allow me to improve this program, 01:18:46.010 --> 01:18:49.520 make it support four tests next year instead of just the three. 01:18:49.520 --> 01:18:52.760 But better still would be to take, I think, 01:18:52.760 --> 01:18:56.900 Nina's advice before, which was to maybe just use get_int and ask the human 01:18:56.900 --> 01:18:58.910 for how many tests they actually have. 01:18:58.910 --> 01:19:00.562 That too would work. 
01:19:00.562 --> 01:19:02.270 Well, let me pause here to see if there's 01:19:02.270 --> 01:19:07.460 any questions then about arrays or about constants 01:19:07.460 --> 01:19:13.770 or passing them around as inputs and outputs in this way. 01:19:13.770 --> 01:19:16.740 Yeah, over to Sophia. 01:19:16.740 --> 01:19:21.570 SOPHIA: I had a question about the use of float and why the use of one float 01:19:21.570 --> 01:19:23.790 causes the whole output to be a float. 01:19:23.790 --> 01:19:24.870 Why does that occur? 01:19:24.870 --> 01:19:25.920 DAVID MALAN: Yeah, really good question. 01:19:25.920 --> 01:19:27.340 That's just how C behaves. 01:19:27.340 --> 01:19:30.840 So long as there is one or more floating point values involved 01:19:30.840 --> 01:19:35.820 in a mathematical formula, it is going to use that data type, which 01:19:35.820 --> 01:19:39.610 is the more powerful one, if you will, rather than risk truncating anything. 01:19:39.610 --> 01:19:41.970 So you just need one float to be participating 01:19:41.970 --> 01:19:44.490 in the formula in question. 01:19:44.490 --> 01:19:45.900 Good question. 01:19:45.900 --> 01:19:53.550 Other questions on arrays or constants or this passing around of them? 01:19:53.550 --> 01:19:57.150 Yeah, over to Alexandra. 01:19:57.150 --> 01:20:03.240 ALEXANDRA: I have a question about the declaring of the array, scores. 01:20:03.240 --> 01:20:08.370 When you declared it in main, you said int scores. 01:20:08.370 --> 01:20:11.890 And in the brackets, you have TOTAL. 01:20:11.890 --> 01:20:16.313 Can you declare it without the TOTAL-- 01:20:16.313 --> 01:20:17.730 DAVID MALAN: Really good question. 01:20:17.730 --> 01:20:18.600 ALEXANDRA: --only the brackets? 01:20:18.600 --> 01:20:19.530 DAVID MALAN: Short answer, no. 01:20:19.530 --> 01:20:21.940 So the way I did it is the way you do have to do it. 01:20:21.940 --> 01:20:25.810 And in fact, if I highlight what I did here, now it currently says TOTAL.
01:20:25.810 --> 01:20:29.400 If I get rid of that, and I go back to our first version where I said 01:20:29.400 --> 01:20:36.360 something like 3 and 3 and 3 over here, you cannot do this, which I think, 01:20:36.360 --> 01:20:38.010 Alexandra, is what you were proposing. 01:20:38.010 --> 01:20:41.640 The computer needs to know how big the array is when you are creating it. 01:20:41.640 --> 01:20:44.160 The exception to that is that when you're 01:20:44.160 --> 01:20:47.070 passing an array from one function to another, 01:20:47.070 --> 01:20:49.350 you do not need to tell that custom function 01:20:49.350 --> 01:20:51.990 how big the array is because, again, you don't know in advance. 01:20:51.990 --> 01:20:55.410 You're writing a fairly generic, dynamic function whose purpose in life 01:20:55.410 --> 01:21:00.750 is to take any array as input of integers and any length 01:21:00.750 --> 01:21:05.640 and respond accordingly with an average that matches the size of that thing. 01:21:05.640 --> 01:21:09.870 And those of you, as an aside, who have programmed before, especially in Java, 01:21:09.870 --> 01:21:13.890 unlike in Java and certain other languages, the length of an array 01:21:13.890 --> 01:21:16.320 is not built into the array itself. 01:21:16.320 --> 01:21:20.590 If you do not pass in the length of an array to another function, 01:21:20.590 --> 01:21:24.280 there is no way to determine how big the array is. 01:21:24.280 --> 01:21:26.850 This is different from Java and other languages, 01:21:26.850 --> 01:21:29.880 where you can ask the array, in some sense, what is its length. 01:21:29.880 --> 01:21:32.490 In C, you have to pass both the array itself 01:21:32.490 --> 01:21:35.610 and its length around separately. [? Sina? ?] 01:21:35.610 --> 01:21:38.880 [? SINA: ?] I just-- I'm still a little bit confused about how, 01:21:38.880 --> 01:21:44.740 when we write that second command, when is it void in the parentheses? 
01:21:44.740 --> 01:21:47.520 And when do we define the int? 01:21:47.520 --> 01:21:50.918 Because as I remember when we did the-- 01:21:50.918 --> 01:21:53.460 get a negative number, we get a positive number, it was void, 01:21:53.460 --> 01:21:55.500 but we still kind of gave it an input. 01:21:55.500 --> 01:21:57.705 I'm just not completely sold on that. 01:21:57.705 --> 01:21:59.080 DAVID MALAN: Sure, good question. 01:21:59.080 --> 01:22:01.570 Let me go ahead and open up that previous example, 01:22:01.570 --> 01:22:04.960 which was a little buggy, but it has the right syntax here. 01:22:04.960 --> 01:22:07.620 So here was the get_negative_int function from before. 01:22:07.620 --> 01:22:10.620 And, [? Sina, ?] you know it was void as input. 01:22:10.620 --> 01:22:13.170 So there was one comment you made where it still took input. 01:22:13.170 --> 01:22:14.010 That was not so. 01:22:14.010 --> 01:22:17.070 So get_negative_int did not take any input. 01:22:17.070 --> 01:22:19.620 And case in point, if we scroll up to main, 01:22:19.620 --> 01:22:22.530 notice that when I called it on line 10, I 01:22:22.530 --> 01:22:25.920 said get_negative_int, open parenthesis, close parenthesis, 01:22:25.920 --> 01:22:29.040 with no inputs inside of those parentheses. 01:22:29.040 --> 01:22:32.220 This keyword "void," which we've seen a few times now last week 01:22:32.220 --> 01:22:35.880 and this week, is just an explicit keyword in C that says, 01:22:35.880 --> 01:22:41.340 do not put anything here, which is to say, it would be incorrect for me up 01:22:41.340 --> 01:22:44.970 here to do something like this, like to pass in a number, 01:22:44.970 --> 01:22:48.990 or to pass in a prompt, or anything inside of those parentheses. 01:22:48.990 --> 01:22:51.630 The fact that this function, get_negative_int 01:22:51.630 --> 01:22:56.340 takes void as its input means it does not take any inputs whatsoever. 01:22:56.340 --> 01:22:56.942 That's fine. 
01:22:56.942 --> 01:22:59.400 For get_negative_int, the name of the function says it all. 01:22:59.400 --> 01:23:02.367 Like, there's no need to parameterize or customize 01:23:02.367 --> 01:23:04.200 the behavior of getting negative int itself. 01:23:04.200 --> 01:23:06.180 You just want to get a negative int. 01:23:06.180 --> 01:23:09.300 By contrast, though, with the function we just wrote, 01:23:09.300 --> 01:23:14.940 average, this function does make conceptual sense to take inputs, 01:23:14.940 --> 01:23:17.490 because you can't just say, give me the average. 01:23:17.490 --> 01:23:18.930 Like, average of what? 01:23:18.930 --> 01:23:22.110 Like, it needs to take input so as to answer that question for you. 01:23:22.110 --> 01:23:24.840 And the input, in this case, is the array itself of numbers 01:23:24.840 --> 01:23:28.425 and the length of that array so you can do the arithmetic. 01:23:28.425 --> 01:23:31.050 And so, [? Sina, ?] hopefully, that helps make the distinction. 01:23:31.050 --> 01:23:33.930 You use void when you don't want to take input. 01:23:33.930 --> 01:23:38.340 And you actually specify a comma-separated list of arguments 01:23:38.340 --> 01:23:42.000 when you do want to take input. 01:23:42.000 --> 01:23:46.170 All right, so we focused up until now on integers, really. 01:23:46.170 --> 01:23:49.020 But let's simplify a little bit because it turns out 01:23:49.020 --> 01:23:52.020 that arrays and memory actually intersect 01:23:52.020 --> 01:23:55.740 to create some very familiar features of most any computer program, namely 01:23:55.740 --> 01:23:57.970 text or strings more generally. 01:23:57.970 --> 01:24:03.010 So suppose we simplify further, no more integers, no more arrays of integers. 01:24:03.010 --> 01:24:05.490 Let's just start for a moment with a single character 01:24:05.490 --> 01:24:09.840 and write a program that just creates a single brick from that Mario game. 
01:24:09.840 --> 01:24:13.540 Let me go ahead and create a program here called brick.c. 01:24:13.540 --> 01:24:15.900 And in brick.c, I'm just going to #include 01:24:15.900 --> 01:24:21.570 stdio.h, int main(void). And more on this void a little later today. 01:24:21.570 --> 01:24:25.170 char c gets, quote unquote, '#'. 01:24:25.170 --> 01:24:29.730 And then down here, let me just go ahead and print very simply a placeholder, 01:24:29.730 --> 01:24:32.800 %c, backslash n, and then output c. 01:24:32.800 --> 01:24:34.380 So this is a pretty stupid program. 01:24:34.380 --> 01:24:37.530 Its sole purpose in life is to print a single hash 01:24:37.530 --> 01:24:41.940 as you might have in a Mario pyramid of height 1, so very simple. 01:24:41.940 --> 01:24:44.040 Let me go ahead and make brick. 01:24:44.040 --> 01:24:45.480 It seems to compile OK. 01:24:45.480 --> 01:24:47.040 Let me run it with ./brick. 01:24:47.040 --> 01:24:48.750 And voila, we get a single brick. 01:24:48.750 --> 01:24:54.150 But let's consider for just a moment exactly what just happened here 01:24:54.150 --> 01:24:58.237 and what actually was going on underneath the hood. 01:24:58.237 --> 01:24:59.070 Well, you know what? 01:24:59.070 --> 01:25:00.030 I'm kind of curious. 01:25:00.030 --> 01:25:03.990 I remember from last week, we could cast values from one thing to another. 01:25:03.990 --> 01:25:07.290 What if I got a little curious, and I didn't print out c, 01:25:07.290 --> 01:25:12.480 which is this hash character, as %c, which is a placeholder for a character? 01:25:12.480 --> 01:25:15.250 What if I got a little crazy and said %i? 01:25:15.250 --> 01:25:21.370 I think I could probably coerce this char by casting it to an int 01:25:21.370 --> 01:25:23.830 so I can see its decimal equivalent. 01:25:23.830 --> 01:25:25.960 I could see its actual ASCII code. 01:25:25.960 --> 01:25:28.350 So let me rebuild this with make brick. 01:25:28.350 --> 01:25:30.330 Now let me do ./brick.
01:25:30.330 --> 01:25:32.430 And what number might we see? 01:25:32.430 --> 01:25:36.840 Last week, we saw 72 a lot, 73, and 33 for "HI!" 01:25:36.840 --> 01:25:39.000 This week, you can see 35. 01:25:39.000 --> 01:25:43.140 It turns out it's the code for an ASCII hash. 01:25:43.140 --> 01:25:47.730 And you can see this, for instance, if I go to a website like-- 01:25:47.730 --> 01:25:52.020 let's go to asciichart.com. 01:25:52.020 --> 01:25:55.170 And sure enough, if I go to the same chart from last week, 01:25:55.170 --> 01:25:58.560 and I look for the hash symbol here, its ASCII code is 35. 01:25:58.560 --> 01:26:02.340 And it turns out, in C, if it's pretty straightforward to the computer 01:26:02.340 --> 01:26:05.390 that, yes, if this is a character, I know I can convert it to an int, 01:26:05.390 --> 01:26:07.440 you don't have to explicitly cast it. 01:26:07.440 --> 01:26:12.990 You can instead implicitly cast one data type to another just from context here. 01:26:12.990 --> 01:26:16.950 So printf and C are smart enough here to know, OK, you're giving me 01:26:16.950 --> 01:26:19.050 a character in the form of variable c. 01:26:19.050 --> 01:26:23.083 But you want to display it as a %i, an integer. 01:26:23.083 --> 01:26:24.000 That's going to be OK. 01:26:24.000 --> 01:26:25.990 And indeed, I still see the number 35. 01:26:25.990 --> 01:26:27.392 So that's just simple casting. 01:26:27.392 --> 01:26:29.850 But let's now put this into the context of today's picture. 01:26:29.850 --> 01:26:31.372 How is that character laid out? 01:26:31.372 --> 01:26:33.330 Well, quite simply, if this is my memory again, 01:26:33.330 --> 01:26:36.150 and we've gotten rid of all of the numbers, c, 01:26:36.150 --> 01:26:41.250 otherwise storing this hash, is just being stored in one of these bytes. 01:26:41.250 --> 01:26:47.370 It only requires one square because, again, a char is a single byte.
01:26:47.370 --> 01:26:52.240 But equivalently, 35 is the number that's actually being stored there. 01:26:52.240 --> 01:26:53.790 But I wonder, I wonder. 01:26:53.790 --> 01:26:55.890 Last week, we spent quite a bit of time storing 01:26:55.890 --> 01:27:01.060 not just single characters, but actual words like "hi" and other expressions. 01:27:01.060 --> 01:27:03.490 And so what if I were to do something like this? 01:27:03.490 --> 01:27:04.960 Let me go back to my code. 01:27:04.960 --> 01:27:07.530 And let me not quite yet practice what I just preached. 01:27:07.530 --> 01:27:11.910 And let me give myself three variables this time-- c1, c2, and c3. 01:27:11.910 --> 01:27:16.980 And let me deliberately store in those three variables H, I, in all caps, 01:27:16.980 --> 01:27:18.720 followed by an exclamation point. 01:27:18.720 --> 01:27:22.170 And per last week, when you're dealing with individual characters, 01:27:22.170 --> 01:27:24.630 you must, in C, use single quotes. 01:27:24.630 --> 01:27:26.520 When you're dealing with multiple characters, 01:27:26.520 --> 01:27:29.080 otherwise known last week as strings, use double quotes. 01:27:29.080 --> 01:27:31.830 But that's why I'm using single quotes, because we're only playing 01:27:31.830 --> 01:27:34.060 at the moment with single characters. 01:27:34.060 --> 01:27:37.080 Now let me go ahead and print these values out. 01:27:37.080 --> 01:27:43.320 Let me print out %c, %c, %c, and output c1, c2, c3. 01:27:43.320 --> 01:27:49.590 So this is perhaps the stupidest way you could print out a full word like "HI!" 01:27:49.590 --> 01:27:54.360 in C by storing every single character in its own variable, but so be it. 01:27:54.360 --> 01:27:57.090 I'm just using these first principles here. 01:27:57.090 --> 01:27:58.493 I'm using %c as my placeholder. 01:27:58.493 --> 01:27:59.910 I'm printing out these characters. 01:27:59.910 --> 01:28:01.950 So let me do make brick now. 01:28:01.950 --> 01:28:03.000 Compiles OK. 
01:28:03.000 --> 01:28:04.678 And if I do a dot slash-- 01:28:04.678 --> 01:28:06.720 you know, I really should have renamed this file, 01:28:06.720 --> 01:28:08.095 but we'll rename it in a moment-- 01:28:08.095 --> 01:28:09.630 ./brick, "HI!" 01:28:09.630 --> 01:28:11.190 And let me go ahead and do this. 01:28:11.190 --> 01:28:14.490 Let me go ahead now and actually close the file. 01:28:14.490 --> 01:28:18.820 And recall from last week, if I want to rename my file from brick.c, 01:28:18.820 --> 01:28:22.620 let's say, to hi.c, I can use the move command, mv. 01:28:22.620 --> 01:28:26.730 And now if I open up this file, sure enough, there's hi.c. 01:28:26.730 --> 01:28:29.850 And I've fixed my renaming mistake. 01:28:29.850 --> 01:28:35.040 All right, so again, if I now do make hi, and I do ./hi, voila, 01:28:35.040 --> 01:28:36.000 I see the "HI!" 01:28:36.000 --> 01:28:40.052 But again, this is kind of a stupid way of implementing a string. 01:28:40.052 --> 01:28:41.760 But let's still look underneath the hood. 01:28:41.760 --> 01:28:43.093 Let me go ahead and get curious. 01:28:43.093 --> 01:28:46.311 Let me print out %i, %i, and %i. 01:28:46.311 --> 01:28:48.480 And let me include spaces this time just so I 01:28:48.480 --> 01:28:51.700 can see separation between the numbers. 01:28:51.700 --> 01:28:54.750 Let me make hi again, ./hi. 01:28:54.750 --> 01:28:56.760 OK, there's that 72. 01:28:56.760 --> 01:28:57.900 There's that 73. 01:28:57.900 --> 01:29:00.340 And there's that 33 from last week. 01:29:00.340 --> 01:29:01.653 So that's interesting too. 01:29:01.653 --> 01:29:04.320 So what's going on underneath the hood in the computer's memory? 01:29:04.320 --> 01:29:06.237 Well, when I'm storing these three characters, 01:29:06.237 --> 01:29:11.040 now I'm just storing them in three different boxes, so c1, c2, c3.
01:29:11.040 --> 01:29:14.970 And when you look at it collectively, it kind of looks like a whole word 01:29:14.970 --> 01:29:17.610 even though it's, of course, just these individual characters. 01:29:17.610 --> 01:29:20.850 So what's underneath the hood, of course, though, is 72, 73, 33. 01:29:20.850 --> 01:29:23.160 Or equivalently, in binary, just this. 01:29:23.160 --> 01:29:25.410 So the story is the same even though we're now talking 01:29:25.410 --> 01:29:28.540 about chars instead of integers. 01:29:28.540 --> 01:29:31.110 But what happens when I do this? 01:29:31.110 --> 01:29:35.040 What happens when I do string s gets, quote unquote, "HI!" 01:29:35.040 --> 01:29:36.450 using double quotes? 01:29:36.450 --> 01:29:38.850 Well, let's change this program accordingly. 01:29:38.850 --> 01:29:42.390 Let me go ahead and do what we would have done last week, string-- 01:29:42.390 --> 01:29:44.760 I'll call it s just for s for string-- 01:29:44.760 --> 01:29:45.300 "HI!" 01:29:45.300 --> 01:29:46.410 in all caps. 01:29:46.410 --> 01:29:47.925 I can simplify this next line. 01:29:47.925 --> 01:29:52.170 I'm going to use %s as a placeholder for string s. 01:29:52.170 --> 01:29:54.300 But let's, for now, reveal what a string really 01:29:54.300 --> 01:29:55.800 is, because string is a term of art. 01:29:55.800 --> 01:29:59.370 Every programming language has "strings" even if it doesn't technically 01:29:59.370 --> 01:30:01.260 have a data type called string. 01:30:01.260 --> 01:30:04.560 C does not technically have a data type called string. 01:30:04.560 --> 01:30:08.850 We have added this type to C by way of CS50's library. 01:30:08.850 --> 01:30:12.720 But now if I do make hi, notice that my code compiles OK. 01:30:12.720 --> 01:30:17.230 And if I do ./hi Enter, voila, I still see "HI!", 01:30:17.230 --> 01:30:19.570 which is what I would have seen last week as well. 01:30:19.570 --> 01:30:23.700 And if we depict this in the computer's memory, because "HI!" 
is three letters, 01:30:23.700 --> 01:30:26.040 it's kind of like saying, well, give me three boxes, 01:30:26.040 --> 01:30:27.930 and let me call this string s. 01:30:27.930 --> 01:30:30.510 So this feels like a reasonable artist's rendition 01:30:30.510 --> 01:30:35.070 of what s is if it's storing a three-letter word like "HI!" 01:30:35.070 --> 01:30:39.840 But any time we have sequences of characters like this, 01:30:39.840 --> 01:30:44.190 I feel like we're now seeing the capability of a proper programming 01:30:44.190 --> 01:30:44.760 language. 01:30:44.760 --> 01:30:48.250 We introduced a little bit ago the notion of a string. 01:30:48.250 --> 01:30:52.190 So maybe could someone redefine string as we've 01:30:52.190 --> 01:30:56.360 been using it in terms of some of today's nomenclature? 01:30:56.360 --> 01:30:57.860 Like, what is a string? 01:30:57.860 --> 01:31:02.730 There's an example of one, "HI!", taking up three boxes. 01:31:02.730 --> 01:31:06.720 But how did we, CS50, maybe implement string underneath the hood, 01:31:06.720 --> 01:31:09.140 would you say? 01:31:09.140 --> 01:31:09.650 What is it? 01:31:09.650 --> 01:31:11.270 Tucker? 01:31:11.270 --> 01:31:14.848 TUCKER: Well, it's an array of characters and integers. 01:31:14.848 --> 01:31:16.640 Well, it's integers are used in the string, 01:31:16.640 --> 01:31:19.575 but it's an array of basically single characters. 01:31:19.575 --> 01:31:20.450 DAVID MALAN: Perfect. 01:31:20.450 --> 01:31:22.640 If we now have the ability to express-- 01:31:22.640 --> 01:31:23.810 very nicely done, Tucker. 01:31:23.810 --> 01:31:27.560 If we now have the ability to represent sequences of things, integers, 01:31:27.560 --> 01:31:29.360 for instance, like scores, well, it stands 01:31:29.360 --> 01:31:33.410 to reason that we can take another primitive, a very basic data type 01:31:33.410 --> 01:31:34.340 like a char.
01:31:34.340 --> 01:31:38.030 And if we want to spell things with those chars, like English words, 01:31:38.030 --> 01:31:41.180 well, let's just think of a string really as an array 01:31:41.180 --> 01:31:43.820 of characters, an array of chars. 01:31:43.820 --> 01:31:47.850 And indeed, that's exactly what string actually is. 01:31:47.850 --> 01:31:54.180 So this thing here, "HI!", technically speaking is an array called s. 01:31:54.180 --> 01:31:57.080 And this is s[0]. This is s[1]. 01:31:57.080 --> 01:31:58.310 This is s[2]. 01:31:58.310 --> 01:31:59.878 It's just an array called s. 01:31:59.878 --> 01:32:01.670 Now, we didn't use the word array last week 01:32:01.670 --> 01:32:04.610 because it's not as familiar as the notion of a "string of text," 01:32:04.610 --> 01:32:05.570 for instance. 01:32:05.570 --> 01:32:08.720 But a string is apparently just an array. 01:32:08.720 --> 01:32:12.380 And if it's an array, that means we can access, if we want to, 01:32:12.380 --> 01:32:16.610 the individual characters of that array by way of the square bracket 01:32:16.610 --> 01:32:18.170 notation from today. 01:32:18.170 --> 01:32:23.180 But it turns out there's something a little special about strings 01:32:23.180 --> 01:32:24.440 as they're implemented. 01:32:24.440 --> 01:32:28.190 Recall in our example involving scores, the only way 01:32:28.190 --> 01:32:32.930 we knew how long that array was was because I 01:32:32.930 --> 01:32:36.740 had a second variable called length or TOTAL 01:32:36.740 --> 01:32:41.900 that stored the total number of integers in that array. 01:32:41.900 --> 01:32:44.480 That is to say in our scores example, not only did we 01:32:44.480 --> 01:32:45.860 allocate the array itself. 01:32:45.860 --> 01:32:51.390 We also kept track of how many things were in that array with two variables.
01:32:51.390 --> 01:32:56.810 However, up until now, every time you and I have used the printf function, 01:32:56.810 --> 01:33:01.040 and we have passed to that printf function a string like s, 01:33:01.040 --> 01:33:05.420 we have only provided printf with the string itself. 01:33:05.420 --> 01:33:08.030 Or logically, we have only provided printf 01:33:08.030 --> 01:33:11.670 with the array of characters itself. 01:33:11.670 --> 01:33:17.870 And yet somehow, printf is magically figuring out how long the string is. 01:33:17.870 --> 01:33:20.660 After all, when printf prints the value of s, 01:33:20.660 --> 01:33:23.780 it is printing H, I, exclamation point, and that's it. 01:33:23.780 --> 01:33:27.643 It's not going and printing 4 characters or 5 or 20, right? 01:33:27.643 --> 01:33:30.560 It stands to reason that there's other stuff in your computer's memory 01:33:30.560 --> 01:33:32.960 if you've got other variables or other programs running. 01:33:32.960 --> 01:33:35.480 Yet printf seems to be smart enough to know, 01:33:35.480 --> 01:33:39.320 given an array, how long the array is because, quite simply, it 01:33:39.320 --> 01:33:42.480 only prints out that single word. 01:33:42.480 --> 01:33:48.440 So how then does a computer know where a string ends in memory if all a string 01:33:48.440 --> 01:33:49.910 is is a sequence of characters? 01:33:49.910 --> 01:33:54.500 Well, it turns out that if your string is length 3, as is this one, H, I, 01:33:54.500 --> 01:34:00.680 exclamation point, technically a string, implemented underneath the hood, 01:34:00.680 --> 01:34:02.390 uses 4 bytes. 01:34:02.390 --> 01:34:04.280 It uses 4 bytes. 01:34:04.280 --> 01:34:07.760 It uses a fourth byte to be initialized to what 01:34:07.760 --> 01:34:11.850 we would describe as backslash 0, which is a weird way of describing it. 
01:34:11.850 --> 01:34:14.870 But this just represents a special character, otherwise known 01:34:14.870 --> 01:34:18.890 as the null character, which is just a special value that 01:34:18.890 --> 01:34:20.880 represents the end of a string. 01:34:20.880 --> 01:34:23.960 So that is to say when you create a string, quote 01:34:23.960 --> 01:34:26.750 unquote with double quotes, "HI!"-- 01:34:26.750 --> 01:34:28.400 yes, the string is length 3. 01:34:28.400 --> 01:34:31.580 But you're wasting or spending 4 total bytes on it. 01:34:31.580 --> 01:34:32.240 Why? 01:34:32.240 --> 01:34:36.380 Because this is a clue to the computer as to where "HI!" 01:34:36.380 --> 01:34:39.800 ends and where the next string maybe begins. 01:34:39.800 --> 01:34:43.010 It is not sufficient to just start printing characters inside 01:34:43.010 --> 01:34:45.117 of printf one at a time, left to right. 01:34:45.117 --> 01:34:47.450 There needs to be this sort of equivalent of a stop sign 01:34:47.450 --> 01:34:50.150 at the end of the string, saying, that's it for this string. 01:34:50.150 --> 01:34:51.540 Well, what are these values? 01:34:51.540 --> 01:34:53.290 Well, let's convert them back to decimal-- 01:34:53.290 --> 01:34:54.800 72, 73, 33. 01:34:54.800 --> 01:35:00.560 That fancy backslash 0 was just a way of saying, in character form, it's 0. 01:35:00.560 --> 01:35:06.740 More specifically, it is eight 0 bits inside of that square. 01:35:06.740 --> 01:35:09.470 So to store a string, the computer, unbeknownst to you, 01:35:09.470 --> 01:35:15.260 has been using one extra byte, all 0 bits, otherwise written as backslash 0, 01:35:15.260 --> 01:35:19.340 but otherwise known as literally the value 0. 01:35:19.340 --> 01:35:23.180 So this thing, otherwise colloquially known as null, 01:35:23.180 --> 01:35:24.685 is just a special character. 01:35:24.685 --> 01:35:26.060 And we can actually see it again.
01:35:26.060 --> 01:35:30.260 If I go back to my asciichart.com from before, 01:35:30.260 --> 01:35:35.480 notice number 0 is known as NUL, N-U-L in all caps. 01:35:35.480 --> 01:35:40.580 All right, so with that said, what is powerful then about strings 01:35:40.580 --> 01:35:42.060 once we have this capability? 01:35:42.060 --> 01:35:43.640 Well, let me go ahead and do this. 01:35:43.640 --> 01:35:46.130 Let me go back into my code from a moment ago. 01:35:46.130 --> 01:35:48.830 And let me go ahead and enhance this program a little bit 01:35:48.830 --> 01:35:51.965 just to get a little curious as to what's going on. 01:35:51.965 --> 01:35:53.250 You know what I can do? 01:35:53.250 --> 01:35:57.200 I bet what I can do here in this version here is this. 01:35:57.200 --> 01:35:57.800 You know what? 01:35:57.800 --> 01:36:00.440 If I want to print out all of these characters of s, 01:36:00.440 --> 01:36:06.590 I can get a little curious again and print out %c, %c, %c. 01:36:06.590 --> 01:36:11.340 And if s is an array, per today's syntax, I can technically do s[0], 01:36:11.340 --> 01:36:14.940 s[1], s[2]. 01:36:14.940 --> 01:36:21.720 And then if I save this, recompile my code with make hi, OK, ./hi, 01:36:21.720 --> 01:36:23.070 I still see "HI!" 01:36:23.070 --> 01:36:23.820 But you know what? 01:36:23.820 --> 01:36:25.195 Let me get a little more curious. 01:36:25.195 --> 01:36:28.740 Let me use %i so I can actually see those ASCII codes. 01:36:28.740 --> 01:36:31.950 Let me go ahead and recompile with make hi, ./hi. 01:36:31.950 --> 01:36:35.190 There's the 72, 73, 33. 01:36:35.190 --> 01:36:37.090 Now let me get even more curious. 01:36:37.090 --> 01:36:42.270 Let me print a fourth value like this here, s[3], 01:36:42.270 --> 01:36:44.430 which is the fourth location, mind you. 01:36:44.430 --> 01:36:50.850 So if I now do make hi and ./hi, voila, now you see 0. 01:36:50.850 --> 01:36:55.110 And what this hints at is actually a very dangerous feature of C. 
You know, 01:36:55.110 --> 01:36:57.750 suppose I'm curious at seeing what's beyond that. 01:36:57.750 --> 01:37:01.290 I could technically do s[4], the fifth location, 01:37:01.290 --> 01:37:04.830 even though according to my picture, there really shouldn't be anything 01:37:04.830 --> 01:37:08.010 at the fifth location, at least not that I know about just yet. 01:37:08.010 --> 01:37:10.980 But I can do it in C. Nothing's stopping me. 01:37:10.980 --> 01:37:13.710 So let me do make hi, ./hi. 01:37:13.710 --> 01:37:15.490 And that's interesting. 01:37:15.490 --> 01:37:17.560 Apparently there's the number 37. 01:37:17.560 --> 01:37:19.110 What is the number 37? 01:37:19.110 --> 01:37:21.150 Well, let me go back to my ASCII chart. 01:37:21.150 --> 01:37:25.102 And let me conclude that number 37 is a percent sign. 01:37:25.102 --> 01:37:28.060 So that's kind of weird because I didn't print out an explicit percent. 01:37:28.060 --> 01:37:31.290 Now I'm kind of poking around the computer's memory in places 01:37:31.290 --> 01:37:33.370 I shouldn't be looking, in some sense. 01:37:33.370 --> 01:37:36.510 In fact, if I get really curious, let's look not at location 4. 01:37:36.510 --> 01:37:40.140 How about location 40, like way off into that picture? 01:37:40.140 --> 01:37:44.400 Make hi, ./hi, 24, whatever that is. 01:37:44.400 --> 01:37:52.470 I can look at location 400, recompile my code, make hi, ./hi. 01:37:52.470 --> 01:37:54.090 And now it's 0 again. 01:37:54.090 --> 01:37:57.060 So this is what's both powerful and also dangerous about C. 01:37:57.060 --> 01:38:01.088 You can touch, look at, change any memory you want. 01:38:01.088 --> 01:38:02.880 You're essentially just on the honor system 01:38:02.880 --> 01:38:04.838 not to touch memory that does it belong to you. 01:38:04.838 --> 01:38:06.960 And invariably, especially next week, are 01:38:06.960 --> 01:38:10.290 we going to start accidentally touching memory that doesn't belong to you. 
01:38:10.290 --> 01:38:13.380 And you'll see that it actually can cause computer programs to crash, 01:38:13.380 --> 01:38:18.330 including programs on your own Mac and PC, yet another source of common bugs. 01:38:18.330 --> 01:38:22.350 But now that we have this ability to store different strings 01:38:22.350 --> 01:38:24.362 or to think about strings as arrays, well, 01:38:24.362 --> 01:38:26.070 let's go ahead and consider how you might 01:38:26.070 --> 01:38:27.670 have multiple strings in a program. 01:38:27.670 --> 01:38:30.900 So for instance, if you were to store two strings in a program-- let's call 01:38:30.900 --> 01:38:32.677 them s and t respectively. 01:38:32.677 --> 01:38:35.010 Another programmer convention-- if you need two strings, 01:38:35.010 --> 01:38:37.110 call the first one s then the second one t. 01:38:37.110 --> 01:38:38.400 Maybe I'm storing "HI!" 01:38:38.400 --> 01:38:39.280 then "BYE!" 01:38:39.280 --> 01:38:41.530 Well, what's the computer's memory going to look like? 01:38:41.530 --> 01:38:43.950 Well, let's do some digging. 01:38:43.950 --> 01:38:46.202 "HI!", as before, is going to be stored here. 01:38:46.202 --> 01:38:47.910 So this whole thing refers to s, and it's 01:38:47.910 --> 01:38:52.080 taking 4 bytes because the last one is that special null character that 01:38:52.080 --> 01:38:55.440 just is the stop sign that demarcates the end of the string. 01:38:55.440 --> 01:38:59.760 "BYE!", meanwhile, is going to take up another B, Y, E, exclamation point, 01:38:59.760 --> 01:39:04.650 five bytes because I need a fifth byte to represent another null character. 01:39:04.650 --> 01:39:06.600 And this one deliberately wraps around. 01:39:06.600 --> 01:39:08.820 Though again, this is just an artist's rendition. 01:39:08.820 --> 01:39:11.580 There's not necessarily a grid in reality. 01:39:11.580 --> 01:39:16.770 B, Y, E, exclamation point, backslash 0 now represents t. 
01:39:16.770 --> 01:39:21.690 So this is to say, if I had a program like this, where I had "HI!" 01:39:21.690 --> 01:39:25.200 and then "BYE!", and I started poking around the computer's memory 01:39:25.200 --> 01:39:27.360 just using the square bracket notation, I 01:39:27.360 --> 01:39:31.020 bet I could start accessing the value of B or Y 01:39:31.020 --> 01:39:34.710 or E just by looking a little past the string s. 01:39:34.710 --> 01:39:37.380 So again, as complicated as our programs get, 01:39:37.380 --> 01:39:40.320 all that's going on underneath the hood is you just plop things down 01:39:40.320 --> 01:39:44.070 in memory in locations like these. 01:39:44.070 --> 01:39:47.310 And so now that we have this ability or maybe this mental model 01:39:47.310 --> 01:39:49.710 for what's going on inside of a computer, 01:39:49.710 --> 01:39:53.490 we can consider some of the features that you might want 01:39:53.490 --> 01:39:55.740 to now use in programs that you write. 01:39:55.740 --> 01:39:59.190 So let me go ahead here and whip up a quick program, 01:39:59.190 --> 01:40:05.400 for instance, that goes ahead and, let's say, 01:40:05.400 --> 01:40:09.310 prints out the total length of a string. 01:40:09.310 --> 01:40:10.540 Let me go ahead and do this. 01:40:10.540 --> 01:40:14.730 I'm going to go ahead and create a new program here in CS50's IDE. 01:40:14.730 --> 01:40:17.870 And I'm going to call this one string.c. 01:40:17.870 --> 01:40:22.080 And I'm going to very quickly at the top include as usual cs50.h. 01:40:22.080 --> 01:40:24.735 And I'm going to go ahead and #include stdio.h. 01:40:24.735 --> 01:40:27.185 And I'm going to give myself int main(void). 01:40:27.185 --> 01:40:29.310 And then in here, I'm going to get myself a string. 01:40:29.310 --> 01:40:32.280 So string s equals get_string. 01:40:32.280 --> 01:40:35.220 Let me just ask the human for some input, whatever it is. 
01:40:35.220 --> 01:40:39.270 Then let me go ahead and print out literally the word "Output" 01:40:39.270 --> 01:40:41.730 just so that I can actually see the result. 01:40:41.730 --> 01:40:47.250 And then down here, let me go ahead and print out that string, for int i gets 0, 01:40:47.250 --> 01:40:49.792 i is less than-- 01:40:49.792 --> 01:40:52.240 huh, I don't know what the length of the string is yet. 01:40:52.240 --> 01:40:54.990 So let me just put a question mark there, which is not valid code, 01:40:54.990 --> 01:40:57.068 but we'll come back to this-- i++. 01:40:57.068 --> 01:40:59.610 And then inside of the loop, I want to go ahead and print out 01:40:59.610 --> 01:41:03.432 every character one at a time by using my new array notation. 01:41:03.432 --> 01:41:05.140 And then at the very end of this program, 01:41:05.140 --> 01:41:06.890 I'm going to print a new line just to make 01:41:06.890 --> 01:41:08.460 sure the cursor is on its own line. 01:41:08.460 --> 01:41:11.000 So this is a complete program that is now, 01:41:11.000 --> 01:41:15.950 as of this week, going to treat a string as an array, ergo, my syntax in line 10 01:41:15.950 --> 01:41:18.830 that's using my new fancy square bracket notation. 01:41:18.830 --> 01:41:21.920 But the only question I haven't answered yet is this-- 01:41:21.920 --> 01:41:25.100 how do I know when to stop printing the string? 01:41:25.100 --> 01:41:26.390 How do I know when to stop? 01:41:26.390 --> 01:41:28.850 Well, it turns out, thus far, when we're using for loops, 01:41:28.850 --> 01:41:34.040 we've typically done something like just count from 0 on up to some number. 01:41:34.040 --> 01:41:36.620 This condition, though, can be any Boolean expression. 01:41:36.620 --> 01:41:39.300 I just need to have a yes/no or a true/false answer. 01:41:39.300 --> 01:41:40.850 So you know what I could do?
01:41:40.850 --> 01:41:45.620 Keep looping so long as the character at location i 01:41:45.620 --> 01:41:50.030 in s does not equal backslash 0. 01:41:50.030 --> 01:41:52.170 So this is now definitely some new syntax. 01:41:52.170 --> 01:41:53.510 Let me zoom in here. 01:41:53.510 --> 01:41:58.700 But s[i] just means the i-th character in s, or more specifically, 01:41:58.700 --> 01:42:01.820 the character at position i in s. 01:42:01.820 --> 01:42:05.000 Bang equals-- so bang is how a programmer pronounces 01:42:05.000 --> 01:42:08.150 exclamation point because it's a little faster-- bang equals 01:42:08.150 --> 01:42:09.597 means does not equal. 01:42:09.597 --> 01:42:12.680 So this is how you would write an equal sign with a slash through it in math. 01:42:12.680 --> 01:42:15.920 It's, in code, exclamation point, equals sign. 01:42:15.920 --> 01:42:18.230 And then notice this funkiness-- backslash 01:42:18.230 --> 01:42:22.100 0 is, again, the "null character," but it's in single quotes 01:42:22.100 --> 01:42:24.500 because, again, it is by definition a character. 01:42:24.500 --> 01:42:26.480 And for reasons we'll get into another time, 01:42:26.480 --> 01:42:28.760 backslash 0 is how you express it. 01:42:28.760 --> 01:42:32.600 Just like backslash n is kind of a weird escape character for a new line, 01:42:32.600 --> 01:42:36.710 backslash 0 is the character whose bits are all 0's. 01:42:36.710 --> 01:42:38.570 So this is kind of a different for loop. 01:42:38.570 --> 01:42:41.870 I'm still starting at 0 for i. 01:42:41.870 --> 01:42:43.880 I'm still incrementing i as always. 01:42:43.880 --> 01:42:46.400 But I'm now not checking for some preordained length 01:42:46.400 --> 01:42:50.990 because just like a computer, I do not know a priori where these strings end. 01:42:50.990 --> 01:42:55.580 I only know that they end once I see backslash 0.
01:42:55.580 --> 01:42:59.150 So when I now go down here and do make string-- 01:42:59.150 --> 01:43:05.570 it compiles OK-- ./string, let me type in something like "HELLO" in all caps. 01:43:05.570 --> 01:43:07.460 Voila, the output is "HELLO" again. 01:43:07.460 --> 01:43:08.450 Let me do it again-- 01:43:08.450 --> 01:43:11.030 "BYE" in all caps, and the output is "BYE." 01:43:11.030 --> 01:43:13.580 So it's kind of a useless program in that it's just printing 01:43:13.580 --> 01:43:15.350 the same thing that I typed in. 01:43:15.350 --> 01:43:19.490 But I'm conditionally using this Boolean expression 01:43:19.490 --> 01:43:22.170 to decide whether or not to keep printing characters. 01:43:22.170 --> 01:43:25.280 Now thankfully, C comes with a function that can answer this for me. 01:43:25.280 --> 01:43:29.210 It turns out there is a function called strlen 01:43:29.210 --> 01:43:31.850 so I can literally just say, well, figure out 01:43:31.850 --> 01:43:33.500 what the length of the string is. 01:43:33.500 --> 01:43:36.110 The function is called strlen for string length. 01:43:36.110 --> 01:43:40.730 And it exists in a file called, not surprisingly, perhaps, 01:43:40.730 --> 01:43:43.610 string.h, string.h. 01:43:43.610 --> 01:43:47.660 So now let me go ahead down here and do make string-- 01:43:47.660 --> 01:43:50.300 compiles OK-- ./string. 01:43:50.300 --> 01:43:52.950 Type in "HELLO," and it still works. 01:43:52.950 --> 01:43:58.400 So this function strlen that does exist in a library via the header file 01:43:58.400 --> 01:43:59.523 string.h already exists. 01:43:59.523 --> 01:44:00.440 Someone else wrote it. 01:44:00.440 --> 01:44:01.710 But how did they write it? 01:44:01.710 --> 01:44:04.040 Odds are they wrote the first version that I 01:44:04.040 --> 01:44:06.980 did by checking for that backslash 0. 01:44:06.980 --> 01:44:09.235 But let me ask a subtle question here. 01:44:09.235 --> 01:44:10.235 This program is correct. 
01:44:10.235 --> 01:44:12.235 It iterates over the whole length of the string, 01:44:12.235 --> 01:44:14.870 and it prints out every character therein. 01:44:14.870 --> 01:44:20.510 Can anyone observe a poor design decision in this function? 01:44:20.510 --> 01:44:24.200 This one's subtle, but there's something I don't 01:44:24.200 --> 01:44:26.660 like about my for loop in particular. 01:44:26.660 --> 01:44:28.640 And I'll isolate it to line 9. 01:44:28.640 --> 01:44:31.230 I've not done something optimally on line 9. 01:44:31.230 --> 01:44:34.700 There's an opportunity for better design. 01:44:34.700 --> 01:44:40.830 Any thoughts here on what I might do better? 01:44:40.830 --> 01:44:42.426 Yeah, Jonathan? 01:44:42.426 --> 01:44:46.770 JONATHAN: Yeah, to create basically another variable for the string length 01:44:46.770 --> 01:44:48.455 and to remember it. 01:44:48.455 --> 01:44:50.580 DAVID MALAN: Yeah, and why are you suggesting that? 01:44:50.580 --> 01:44:53.670 JONATHAN: If you want to use a different value for the string length, 01:44:53.670 --> 01:44:55.710 or if it might fluctuate or change, you want 01:44:55.710 --> 01:44:59.370 to just have a different variable as a sort of placeholder value for it. 01:44:59.370 --> 01:45:00.670 DAVID MALAN: OK, potentially. 01:45:00.670 --> 01:45:03.210 But I will claim in this case that because the human has 01:45:03.210 --> 01:45:07.090 typed in the word, once you type in the word, it's not going to change. 01:45:07.090 --> 01:45:11.520 But I think you're going down the right direction because 01:45:11.520 --> 01:45:15.570 in this Boolean expression here, i less than the string length of s, 01:45:15.570 --> 01:45:19.350 recall that this expression gets evaluated again and again and again. 01:45:19.350 --> 01:45:22.050 Every time through a for loop, recall that you're constantly 01:45:22.050 --> 01:45:23.290 checking the condition. 
01:45:23.290 --> 01:45:26.460 The condition in this case is i less than the length of s. 01:45:26.460 --> 01:45:30.382 The problem is that strlen in this case is a function, which 01:45:30.382 --> 01:45:32.340 means there's some piece of code someone wrote, 01:45:32.340 --> 01:45:35.593 probably similar to what I wrote a few minutes ago, that you're constantly 01:45:35.593 --> 01:45:37.260 asking, what's the length of the string? 01:45:37.260 --> 01:45:38.593 What's the length of the string? 01:45:38.593 --> 01:45:41.880 And recall from our picture, the way you figure out the length of a string 01:45:41.880 --> 01:45:44.070 is you start at the beginning of the string, and you keep checking, 01:45:44.070 --> 01:45:45.300 am I at backslash 0? 01:45:45.300 --> 01:45:46.020 OK. 01:45:46.020 --> 01:45:47.700 Am I at backslash 0? 01:45:47.700 --> 01:45:48.540 OK. 01:45:48.540 --> 01:45:52.600 So to figure out the length of "HI!", it's going to take me 1, 2, 3, 4 steps, 01:45:52.600 --> 01:45:54.600 right, because I have to start at the beginning. 01:45:54.600 --> 01:45:57.267 And I iterate from location 0 on to the end. 01:45:57.267 --> 01:45:59.100 To find out the length of "BYE!", it's going 01:45:59.100 --> 01:46:01.350 to take me five steps because that's how long it's 01:46:01.350 --> 01:46:04.740 going to take me from left to right to find that backslash 0. 01:46:04.740 --> 01:46:07.080 So what I don't like about this line of code is, 01:46:07.080 --> 01:46:10.680 why are you asking for the string length of s again and again 01:46:10.680 --> 01:46:11.790 and again and again? 01:46:11.790 --> 01:46:14.230 It's not going to change in this context. 01:46:14.230 --> 01:46:17.887 So Jonathan's point is taken if we keep asking the user for more input. 01:46:17.887 --> 01:46:19.970 But in this case, we've only asked the human once. 01:46:19.970 --> 01:46:20.920 So you know what? 
01:46:20.920 --> 01:46:26.700 Let's take Jonathan's advice and do int n equals the string length of s. 01:46:26.700 --> 01:46:28.950 And then maybe you know what we could do? 01:46:28.950 --> 01:46:32.170 Put n in this condition instead. 01:46:32.170 --> 01:46:35.520 So now I'm asking the same question, but I'm not foolishly, 01:46:35.520 --> 01:46:39.030 inefficiently asking the same question again and again, 01:46:39.030 --> 01:46:42.720 whereby the same question requires a good amount of work 01:46:42.720 --> 01:46:45.940 to find the backslash 0 again and again and again. 01:46:45.940 --> 01:46:48.470 Now, there's some cleaning up we can do here too. 01:46:48.470 --> 01:46:50.970 It turns out there's this other subtle feature of for loops. 01:46:50.970 --> 01:46:54.660 If you want to initialize another variable to a value, 01:46:54.660 --> 01:46:56.370 you can actually do this all at once. 01:46:56.370 --> 01:46:59.130 And you can do so before the semicolon. 01:46:59.130 --> 01:47:04.530 You can do comma n equals strlen of s. 01:47:04.530 --> 01:47:07.150 And then you can use n, just as I have here. 01:47:07.150 --> 01:47:09.210 So it's not all that much better, but it's 01:47:09.210 --> 01:47:11.790 a little cleaner in that now I've taken two lines of code 01:47:11.790 --> 01:47:13.710 and collapsed them into one. 01:47:13.710 --> 01:47:15.750 They both have to be of the same data types, 01:47:15.750 --> 01:47:19.380 but that's OK here because both i and n are. 01:47:19.380 --> 01:47:21.750 So again, the inefficiency here is that it was foolish 01:47:21.750 --> 01:47:26.100 before that I kept asking the same question again and again and again. 01:47:26.100 --> 01:47:30.810 But now I'm asking the question once, remembering it in a variable called n, 01:47:30.810 --> 01:47:36.720 and only comparing i against that integer which does not actually change. 01:47:36.720 --> 01:47:38.370 All right, I know that too was a lot. 
01:47:38.370 --> 01:47:41.910 Let's go ahead here and take a 3-minute break just to stretch legs and whatnot. 01:47:41.910 --> 01:47:44.880 In 3 minutes, we'll come back and start to see applications 01:47:44.880 --> 01:47:48.030 now of all of these features ultimately to some problems that 01:47:48.030 --> 01:47:51.030 are going to lie ahead this week on the readability of language 01:47:51.030 --> 01:47:52.510 and also on cryptography. 01:47:52.510 --> 01:47:54.750 So we'll see you in 3 minutes. 01:47:54.750 --> 01:47:57.240 All right, so we're back. 01:47:57.240 --> 01:48:00.885 And this has been a whole bunch of low-level details, admittedly. 01:48:00.885 --> 01:48:03.510 And where we're going with this ultimately this week and beyond 01:48:03.510 --> 01:48:05.562 is applications of some of these building blocks. 01:48:05.562 --> 01:48:08.520 And one of those applications this coming week and the next problem set 01:48:08.520 --> 01:48:11.580 is going to be that of cryptography, the art of scrambling or encrypting 01:48:11.580 --> 01:48:12.597 information. 01:48:12.597 --> 01:48:14.430 And if you're trying to encrypt information, 01:48:14.430 --> 01:48:16.830 like messages, well, those messages might very well 01:48:16.830 --> 01:48:19.260 be written in English or in ASCII, if you will. 01:48:19.260 --> 01:48:23.250 And you might want to convert some of those ASCII characters from one thing 01:48:23.250 --> 01:48:27.480 to another so that if your message is intercepted by some third party, 01:48:27.480 --> 01:48:30.990 they can't actually decipher or figure out what it is that you've sent. 01:48:30.990 --> 01:48:33.360 So I feel like we're almost toward-- 01:48:33.360 --> 01:48:35.550 we're almost at the ability where, in code, we 01:48:35.550 --> 01:48:39.270 can start to convert one word to another or to scramble our text. 01:48:39.270 --> 01:48:41.490 But we do need a couple of more building blocks. 
01:48:41.490 --> 01:48:44.040 So recall that we left off with this picture 01:48:44.040 --> 01:48:47.160 here, where we had two words in the computer's memory, "HI!" and "BYE!", 01:48:47.160 --> 01:48:50.610 both with exclamation points, but also both with these backslash 0's 01:48:50.610 --> 01:48:52.800 that you and I do not put there explicitly. 01:48:52.800 --> 01:48:56.370 They just happen for you any time you use the double quotes and any time 01:48:56.370 --> 01:48:58.990 you use the get_string function. 01:48:58.990 --> 01:49:03.720 So once we have those in memory, you can think of them as s and t respectively. 01:49:03.720 --> 01:49:06.480 But a string, s or t, is just an array. 01:49:06.480 --> 01:49:11.040 So again, you can also refer to all of these individual characters or chars 01:49:11.040 --> 01:49:15.420 via the new square bracket notation of today, s[0], s[1], s[2], s[3], 01:49:15.420 --> 01:49:21.210 and then t[0], t[1], [2], [3], and [4], and then whatever else is 01:49:21.210 --> 01:49:22.470 in the computer's memory. 01:49:22.470 --> 01:49:26.880 But you know what you can even do is this-- suppose that instead we 01:49:26.880 --> 01:49:28.980 wanted to have an array of words. 01:49:28.980 --> 01:49:32.650 So before, we had an array of scores, an array of integers. 01:49:32.650 --> 01:49:35.370 But now suppose we wanted in the context of some other program 01:49:35.370 --> 01:49:36.780 to have an array of words. 01:49:36.780 --> 01:49:37.800 You can totally do that. 01:49:37.800 --> 01:49:40.560 There's nothing stopping you from having an array of words. 01:49:40.560 --> 01:49:42.240 And the syntax is going to be identical. 01:49:42.240 --> 01:49:48.150 Notice, if I want an array called words that has room for two strings, 01:49:48.150 --> 01:49:51.180 I literally just say, string words[2]. 
01:49:51.180 --> 01:49:56.540 This means, hey, computer, give me an array of size 2, each of whose members 01:49:56.540 --> 01:49:57.540 is going to be a string. 01:49:57.540 --> 01:49:58.920 How do I populate that array? 01:49:58.920 --> 01:50:00.510 Same as before with the scores-- 01:50:00.510 --> 01:50:02.790 words[0] gets, quote unquote, "HI!" 01:50:02.790 --> 01:50:05.280 Words[1] gets, quote unquote, "BYE!" 01:50:05.280 --> 01:50:09.540 So that is to say with this code, could we create a picture similar to the one 01:50:09.540 --> 01:50:10.380 previously? 01:50:10.380 --> 01:50:12.540 But I'm not calling these strings s and t. 01:50:12.540 --> 01:50:16.890 Now I'm calling them both "words" at two different locations, 0 and 1 01:50:16.890 --> 01:50:17.830 respectively. 01:50:17.830 --> 01:50:20.040 So we could redraw that same picture like this. 01:50:20.040 --> 01:50:23.790 Now this word is technically named words[0]. 01:50:23.790 --> 01:50:26.640 And this one is referred to by words[1]. 01:50:26.640 --> 01:50:29.310 But again, what is a string? 01:50:29.310 --> 01:50:30.990 A string is an array. 01:50:30.990 --> 01:50:34.360 And yet, here we have an array of strings. 01:50:34.360 --> 01:50:37.510 So we kind of sort of have an array of arrays. 01:50:37.510 --> 01:50:40.440 So we've got an array of words, but a word is just a string. 01:50:40.440 --> 01:50:42.850 And a string is an array of characters. 01:50:42.850 --> 01:50:47.430 So what I really have on the board is an array of arrays. 01:50:47.430 --> 01:50:51.190 And so here-- and this will be the last weird syntax for today-- 01:50:51.190 --> 01:50:55.050 you can actually have multiple square brackets back to back. 01:50:55.050 --> 01:50:58.650 So if your variable's called words, and that variable's an array, 01:50:58.650 --> 01:51:03.240 if you want to get the first word in the array, you do words[0]. 
01:51:03.240 --> 01:51:06.090 Once you're at that word, "HI!", and you want 01:51:06.090 --> 01:51:10.860 to get the first character in that word, you can similarly do [0]. 01:51:10.860 --> 01:51:14.230 So the first bracket refers to which word you want in the array. 01:51:14.230 --> 01:51:18.060 The second bracket refers to which character you want in that word. 01:51:18.060 --> 01:51:22.320 So now the I is at words[0][1]. 01:51:22.320 --> 01:51:25.500 The exclamation point is at words[0][2]. 01:51:25.500 --> 01:51:28.810 And the null character's at words[0][3]. 01:51:28.810 --> 01:51:37.508 Meanwhile, the B is at words[1][0], and the rest of "BYE!" is at [1][1], [1][2], [1][3], [1][4]. 01:51:37.508 --> 01:51:40.050 So it's almost kind of like a coordinate system, if you will. 01:51:40.050 --> 01:51:43.200 It's a two-dimensional array, or an array of arrays. 01:51:43.200 --> 01:51:49.080 So this is only to say that if we wanted to think of arrays of strings 01:51:49.080 --> 01:51:53.280 as individual characters, we can. 01:51:53.280 --> 01:51:56.680 We have that expressiveness now in code. 01:51:56.680 --> 01:52:00.460 So what more can I do now that I can manipulate things at this level? 01:52:00.460 --> 01:52:03.263 Let me do a program that'll be pretty applicable, 01:52:03.263 --> 01:52:05.430 I think, to some of our upcoming programs as well. 01:52:05.430 --> 01:52:06.960 Let me call this one uppercase. 01:52:06.960 --> 01:52:09.240 Let me quickly write a program whose purpose in life 01:52:09.240 --> 01:52:12.120 is just to convert an input word to uppercase. 01:52:12.120 --> 01:52:13.540 And let's see how we can do this. 01:52:13.540 --> 01:52:16.380 So let me go ahead and #include cs50.h. 01:52:16.380 --> 01:52:20.050 Let me go ahead and #include stdio.h. 01:52:20.050 --> 01:52:23.160 This time, let me also include string.h, which is 01:52:23.160 --> 01:52:24.990 going to give us functions like strlen. 01:52:24.990 --> 01:52:27.670 And then let me do int main(void).
01:52:27.670 --> 01:52:31.280 And then let me go ahead here and get a string from the user like before. 01:52:31.280 --> 01:52:34.030 So I'm just going to ask the user for a string. 01:52:34.030 --> 01:52:36.370 And I want them to give me whatever the string should 01:52:36.370 --> 01:52:38.950 be before I uppercase everything. 01:52:38.950 --> 01:52:41.890 Then I'm just going to go ahead and print out literally "After," 01:52:41.890 --> 01:52:46.330 just so I can see what happens after I capitalize everything in the string. 01:52:46.330 --> 01:52:49.690 And now let me go ahead and do this-- for int i gets 0, 01:52:49.690 --> 01:52:53.110 i less than string length of s, i++. 01:52:53.110 --> 01:52:55.180 Wait a minute, I made that mistake before. 01:52:55.180 --> 01:52:57.200 Let's not repeat this question. 01:52:57.200 --> 01:53:02.740 Let me give myself a second variable-- n gets string length of s, i less than n, 01:53:02.740 --> 01:53:04.090 i++. 01:53:04.090 --> 01:53:06.340 So again, this is now becoming boilerplate. 01:53:06.340 --> 01:53:09.400 Any time you want to iterate over all of the characters in the string, 01:53:09.400 --> 01:53:11.913 this probably is a reasonable place to start. 01:53:11.913 --> 01:53:13.330 And then let me ask the question-- 01:53:13.330 --> 01:53:15.673 I want to iterate over every character in the string 01:53:15.673 --> 01:53:16.840 that the human has typed in. 01:53:16.840 --> 01:53:20.470 And I want to ask myself a question, just as we've done with any algorithm. 01:53:20.470 --> 01:53:23.980 Specifically, if the current letter is lowercase, 01:53:23.980 --> 01:53:26.080 let me somehow convert it to uppercase. 01:53:26.080 --> 01:53:28.260 Else, let me just print it out unchanged. 01:53:28.260 --> 01:53:31.540 So how can I express that using last week's and this week's building blocks?
01:53:31.540 --> 01:53:33.280 Well, let me say something like this-- 01:53:33.280 --> 01:53:39.670 if the character at location i in s, or if the i-th character in s 01:53:39.670 --> 01:53:47.710 is greater than or equal to a lowercase a, and the i-th character in s 01:53:47.710 --> 01:53:52.270 is less than or equal to a lowercase z, what do I want to do? 01:53:52.270 --> 01:53:55.750 Let me go ahead and print out a character. 01:53:55.750 --> 01:53:59.320 But that character should be what? s bracket i, 01:53:59.320 --> 01:54:01.960 but I'm not sure what to do here yet. 01:54:01.960 --> 01:54:03.460 But let me come back to that. 01:54:03.460 --> 01:54:09.250 Else, let me go ahead and just print out that character unchanged, s[i]. 01:54:09.250 --> 01:54:14.270 So minus the placeholder, the question marks I've put, I'm kind of all the way 01:54:14.270 --> 01:54:14.770 there. 01:54:14.770 --> 01:54:16.872 Line 10 initializes i to 0. 01:54:16.872 --> 01:54:20.080 It's going to count up to, but not including, n, where n is the length of the string. 01:54:20.080 --> 01:54:21.310 And it's going to keep incrementing i. 01:54:21.310 --> 01:54:22.393 So we've seen that before. 01:54:22.393 --> 01:54:25.330 And again, that's going to become muscle memory before long. 01:54:25.330 --> 01:54:28.480 Line 12 is a little new, but it uses building blocks 01:54:28.480 --> 01:54:29.532 from last week and this. 01:54:29.532 --> 01:54:31.240 This week, we have the new square bracket 01:54:31.240 --> 01:54:34.810 notation to get the i-th character in the string s. 01:54:34.810 --> 01:54:37.870 Greater than or equal to, less than or equal to-- we saw at least one 01:54:37.870 --> 01:54:38.770 of those last week. 01:54:38.770 --> 01:54:41.860 That just means greater than or equal to, less than or equal to. 01:54:41.860 --> 01:54:46.370 I mentioned && last week, which is the logical AND operator, 01:54:46.370 --> 01:54:49.150 which means you can check one condition and another.
01:54:49.150 --> 01:54:52.540 And the whole thing is true if both of those are true. 01:54:52.540 --> 01:54:54.440 This is a bit weird today. 01:54:54.440 --> 01:54:57.100 But if you want to express, is the current character 01:54:57.100 --> 01:55:01.930 between lowercase a and lowercase z, totally fine 01:55:01.930 --> 01:55:07.750 to implicitly treat a and z as numbers, which they really are. 01:55:07.750 --> 01:55:11.180 Because again, if we come back to our favorite ASCII chart, 01:55:11.180 --> 01:55:16.600 you'll see again that lowercase a has a number associated with it, 97. 01:55:16.600 --> 01:55:20.410 Lowercase z has a number associated with it, 122. 01:55:20.410 --> 01:55:25.000 So if I really wanted to be pedantic, I could go back into my code 01:55:25.000 --> 01:55:28.540 and do something like, well, if this is greater than or equal to 97, 01:55:28.540 --> 01:55:32.320 and it's less than or equal to 122, but bad design. 01:55:32.320 --> 01:55:35.272 Like, I'm never going to remember that lowercase z is 122. 01:55:35.272 --> 01:55:36.730 Like, no one is going to know that. 01:55:36.730 --> 01:55:38.320 It makes the code less obvious. 01:55:38.320 --> 01:55:41.080 Go ahead and write it in a way that's a little more 01:55:41.080 --> 01:55:43.730 friendly to humans like this. 01:55:43.730 --> 01:55:45.070 But notice this question mark. 01:55:45.070 --> 01:55:46.720 How do I fill in this blank? 01:55:46.720 --> 01:55:48.970 Well, let me go back to the ASCII chart. 01:55:48.970 --> 01:55:51.520 This is subtle, but this is kind of cool. 01:55:51.520 --> 01:55:53.560 And humans were definitely thinking ahead. 01:55:53.560 --> 01:55:56.590 Notice that lowercase a is 97. 01:55:56.590 --> 01:55:58.900 Capital A is 65. 01:55:58.900 --> 01:56:01.000 Lowercase b is 98. 01:56:01.000 --> 01:56:03.430 Capital B is 66. 01:56:03.430 --> 01:56:05.965 And notice these two numbers-- 01:56:05.965 --> 01:56:13.330 65 to 97, 66 to 98, 67 to 99. 
01:56:13.330 --> 01:56:17.320 It would seem that no matter what letters we compare, lowercase 01:56:17.320 --> 01:56:20.543 and uppercase, they're always 32 apart. 01:56:20.543 --> 01:56:21.460 And that's consistent. 01:56:21.460 --> 01:56:24.290 The same holds for all 26 English letters. 01:56:24.290 --> 01:56:27.520 So if they're always 32 apart, you know what I could do-- 01:56:27.520 --> 01:56:30.730 if I want to take a lowercase letter, which 01:56:30.730 --> 01:56:33.790 is what I'm thinking about in line 14, I could just 01:56:33.790 --> 01:56:36.102 subtract off 32 in this case. 01:56:36.102 --> 01:56:37.810 It's not the cleanest, because again, I'm 01:56:37.810 --> 01:56:39.640 probably going to forget that math at some point. 01:56:39.640 --> 01:56:41.598 But at least mathematically, I think that'll do 01:56:41.598 --> 01:56:44.050 the trick because 97 will become 65. 01:56:44.050 --> 01:56:47.922 98 will become 66, which is forcing those characters to uppercase. 01:56:47.922 --> 01:56:49.630 But they're not being printed as numbers. 01:56:49.630 --> 01:56:52.870 I'm still using %c to print each one as a char. 01:56:52.870 --> 01:56:56.780 So if I didn't mess any syntax up here, let me make uppercase. 01:56:56.780 --> 01:56:59.260 OK, ./uppercase. 01:56:59.260 --> 01:57:03.580 And let me go ahead and type in, for instance, my name in all lowercase. 01:57:03.580 --> 01:57:05.490 And voila, uppercase. 01:57:05.490 --> 01:57:06.490 Now, it's a little ugly. 01:57:06.490 --> 01:57:08.530 I forgot my backslash n, so let me go ahead 01:57:08.530 --> 01:57:11.620 and add one of those real quick just to fix the cursor. 01:57:11.620 --> 01:57:14.590 Let me recompile the code with make uppercase. 01:57:14.590 --> 01:57:17.650 Let me rerun the program with ./uppercase and now type in my name, 01:57:17.650 --> 01:57:18.400 David. 01:57:18.400 --> 01:57:20.050 Let me do it again with Brian.
01:57:20.050 --> 01:57:23.770 And notice that it's capitalizing everything character by character 01:57:23.770 --> 01:57:26.470 using only today's building blocks. 01:57:26.470 --> 01:57:27.530 This is correct. 01:57:27.530 --> 01:57:30.350 It's pretty well styled because everything's nicely indented. 01:57:30.350 --> 01:57:33.890 It's very readable even though it might look a little cryptic at first glance. 01:57:33.890 --> 01:57:35.430 But I think I can do better. 01:57:35.430 --> 01:57:37.940 And I can do better by using yet another library. 01:57:37.940 --> 01:57:41.270 And here's where C, and really programming in general, gets powerful. 01:57:41.270 --> 01:57:43.340 The whole point of using popular languages 01:57:43.340 --> 01:57:46.742 is that so many other people before you have solved problems 01:57:46.742 --> 01:57:48.200 that you don't need to solve again. 01:57:48.200 --> 01:57:51.230 And I'm sure over the past, like, 50 years, someone has probably 01:57:51.230 --> 01:57:54.770 written a function that capitalizes letters for me. 01:57:54.770 --> 01:57:56.690 I don't have to do this myself. 01:57:56.690 --> 01:58:00.770 And indeed, there is another library that I'm going 01:58:00.770 --> 01:58:02.540 to include by way of its header file. 01:58:02.540 --> 01:58:07.050 It's ctype.h, "c type," as in the language C and a bunch of type-related things. 01:58:07.050 --> 01:58:11.270 And in ctype.h, it turns out there's a function called-- 01:58:11.270 --> 01:58:12.650 there are a couple of functions. 01:58:12.650 --> 01:58:15.990 Specifically, let me get rid of all of this code. 01:58:15.990 --> 01:58:21.980 And let me call a function called islower and pass to islower s[i]. 01:58:21.980 --> 01:58:24.560 And islower, as you might guess, its purpose in life 01:58:24.560 --> 01:58:27.230 is to return essentially a Boolean value, true or false, 01:58:27.230 --> 01:58:28.770 indicating whether that character is lowercase.
01:58:28.770 --> 01:58:31.610 And if so, well, let me go ahead and print out a placeholder 01:58:31.610 --> 01:58:34.280 followed by the capitalization of that letter. 01:58:34.280 --> 01:58:37.670 Now, before I had to do that annoying math with minus 32 and figure it out, 01:58:37.670 --> 01:58:44.120 uh-uh, toupper of parentheses s[i]. 01:58:44.120 --> 01:58:48.110 And now I can otherwise just print out that character unchanged, 01:58:48.110 --> 01:58:50.990 just as before, s[i]. 01:58:50.990 --> 01:58:52.400 But now notice my program-- 01:58:52.400 --> 01:58:54.540 honestly, it's definitely a little shorter. 01:58:54.540 --> 01:58:56.900 It's a little simpler in that there's just less code. 01:58:56.900 --> 01:59:00.800 And hopefully, if the person that wrote islower and toupper did a good job, 01:59:00.800 --> 01:59:01.898 I know it's correct. 01:59:01.898 --> 01:59:03.440 I'm just standing on their shoulders. 01:59:03.440 --> 01:59:07.010 And frankly, my code's more readable because I understand what islower 01:59:07.010 --> 01:59:11.450 means, whereas that crazy && syntax and all of the additional code-- 01:59:11.450 --> 01:59:14.360 that was just a lot harder to wrap your mind around, arguably. 01:59:14.360 --> 01:59:19.510 So now if I go ahead and compile this-- make uppercase. 01:59:19.510 --> 01:59:21.460 OK, that seemed to work well. 01:59:21.460 --> 01:59:24.870 And now I'm going to go ahead and do ./uppercase and type in my name in all 01:59:24.870 --> 01:59:25.800 lowercase again. 01:59:25.800 --> 01:59:26.810 David seems to work. 01:59:26.810 --> 01:59:27.690 Brian seems to work. 01:59:27.690 --> 01:59:29.065 And I could do this all day long. 01:59:29.065 --> 01:59:30.390 It seems to still work. 01:59:30.390 --> 01:59:31.350 But you know what? 01:59:31.350 --> 01:59:33.407 I don't think I have to be even this explicit. 01:59:33.407 --> 01:59:33.990 You know what? 
01:59:33.990 --> 01:59:36.660 I bet if the human who wrote toupper was smart, 01:59:36.660 --> 01:59:41.700 I bet I can just blindly pass in any character to toupper, 01:59:41.700 --> 01:59:46.948 and it's only going to uppercase it if it can be converted to uppercase. 01:59:46.948 --> 01:59:48.740 Otherwise, it'll pass it through unchanged. 01:59:48.740 --> 01:59:49.448 So you know what? 01:59:49.448 --> 01:59:53.340 Let me get rid of all of this stuff and really tighten this program up 01:59:53.340 --> 01:59:59.760 and print out a placeholder for c and then toupper of s[i]. 01:59:59.760 --> 02:00:02.760 And sure enough, if you read the documentation for this function, 02:00:02.760 --> 02:00:07.380 it will handle the case where it's either lowercase or not lowercase. 02:00:07.380 --> 02:00:09.270 And it will do the right thing. 02:00:09.270 --> 02:00:14.070 So now if I recompile my code, make uppercase, so far so good. 02:00:14.070 --> 02:00:15.780 ./uppercase, David again. 02:00:15.780 --> 02:00:17.370 Voila, it still works. 02:00:17.370 --> 02:00:21.300 And notice truly just how much tighter, how much cleaner, 02:00:21.300 --> 02:00:23.100 how much shorter my code is. 02:00:23.100 --> 02:00:26.790 And it's more readable in the sense that this function is pretty well named. 02:00:26.790 --> 02:00:29.070 Toupper is what it's indeed called. 02:00:29.070 --> 02:00:31.140 But there is an important detail here. 02:00:31.140 --> 02:00:34.140 Toupper expects as input a character. 02:00:34.140 --> 02:00:36.090 You cannot pass a whole word to it. 02:00:36.090 --> 02:00:39.480 It is still necessary at this point for me to be using this loop 02:00:39.480 --> 02:00:41.542 and doing it character by character. 02:00:41.542 --> 02:00:42.750 Now, how would you know this? 02:00:42.750 --> 02:00:45.910 Well, you'll see multiple examples of this over the weeks to come. 
02:00:45.910 --> 02:00:50.000 But if I go to what's called the manual pages for the language C, 02:00:50.000 --> 02:00:51.750 we have our own web-based version of them. 02:00:51.750 --> 02:00:54.210 And we'll link this for you in the course's labs 02:00:54.210 --> 02:00:55.690 and problem sets as needed. 02:00:55.690 --> 02:00:58.680 You can see a list of all of the available functions in C 02:00:58.680 --> 02:01:00.570 at least that are frequently used in CS50. 02:01:00.570 --> 02:01:03.510 And if we uncheck a box at the top, we can see even more functions. 02:01:03.510 --> 02:01:06.660 There's dozens, maybe hundreds of functions, most of which 02:01:06.660 --> 02:01:08.715 we will not need or use in CS50. 02:01:08.715 --> 02:01:10.590 But this is going to be true in any language. 02:01:10.590 --> 02:01:13.623 You sort of pick up the building blocks that you need over time. 02:01:13.623 --> 02:01:15.540 So we'll refer you to these kinds of resources 02:01:15.540 --> 02:01:18.930 so that you don't rely only on what we show in section and lecture, 02:01:18.930 --> 02:01:24.010 but you have at your disposal these other functions and toolkits as well. 02:01:24.010 --> 02:01:28.120 And we'll do the same with Python and SQL and other languages as well. 02:01:28.120 --> 02:01:32.040 So those are what we call, again, manual pages. 02:01:32.040 --> 02:01:34.440 All right, a final feature before we even 02:01:34.440 --> 02:01:38.880 think about cryptography and scrambling information as for problem set 2. 02:01:38.880 --> 02:01:41.520 So a command-line argument I mentioned by name before-- 02:01:41.520 --> 02:01:44.460 it's like a word you can type after a program's name 02:01:44.460 --> 02:01:46.960 in order to provide it input at the command line. 02:01:46.960 --> 02:01:52.140 So make hello-- hello is a command-line argument to the program, make.
02:01:52.140 --> 02:01:58.470 rm space a.out-- a.out was an argument, a command-line argument to the program 02:01:58.470 --> 02:02:00.130 rm when I wanted to remove it. 02:02:00.130 --> 02:02:02.790 So we've already seen command-line arguments in action. 02:02:02.790 --> 02:02:05.520 But we haven't actually written any programs 02:02:05.520 --> 02:02:11.460 that allow you to accept words or other inputs from the so-called command line. 02:02:11.460 --> 02:02:14.430 Up until now, all of the input you and I have gotten in our programs 02:02:14.430 --> 02:02:16.440 comes from get_string, get_int, and so forth. 02:02:16.440 --> 02:02:20.490 We have never been able to look at words that the human might very well have 02:02:20.490 --> 02:02:23.610 typed at the prompt when running your program. 02:02:23.610 --> 02:02:25.350 But that's all about to change now. 02:02:25.350 --> 02:02:28.350 Let me go ahead and create a program called argv.c, 02:02:28.350 --> 02:02:31.140 and it'll become clear why in just a moment. 02:02:31.140 --> 02:02:36.270 I'm going to go ahead and include, shall we say, stdio.h. 02:02:36.270 --> 02:02:39.120 And then I'm going to give myself int main(void). 02:02:39.120 --> 02:02:43.650 And then I'm just going to very simply go back and change the void. 02:02:43.650 --> 02:02:47.160 So just as our own custom functions can take inputs-- 02:02:47.160 --> 02:02:49.290 and we saw that with get_negative_int. 02:02:49.290 --> 02:02:52.020 We saw that with average today-- 02:02:52.020 --> 02:02:54.600 so does main potentially take inputs. 02:02:54.600 --> 02:02:57.120 Up till now though, we've been saying void. 02:02:57.120 --> 02:02:58.770 And we told you to say void last week. 02:02:58.770 --> 02:03:01.380 And we told you to say void in problem set 1. 02:03:01.380 --> 02:03:06.780 But now it turns out that C does allow you to put other inputs into main. 02:03:06.780 --> 02:03:10.720 You can either say, nope, main does not take any command-line arguments.
02:03:10.720 --> 02:03:15.270 But if it does, you can say literally, int argc 02:03:15.270 --> 02:03:19.150 and string argv with square brackets. 02:03:19.150 --> 02:03:20.220 So it's a little cryptic. 02:03:20.220 --> 02:03:22.803 And technically, you don't have to type it precisely this way. 02:03:22.803 --> 02:03:26.220 But human convention would have you do it, at least for now, in this way. 02:03:26.220 --> 02:03:29.010 This says that main, your function, main, 02:03:29.010 --> 02:03:33.360 takes an integer as one input and not a string 02:03:33.360 --> 02:03:36.570 but an array of strings as input. 02:03:36.570 --> 02:03:40.480 And argc is shorthand notation for argument count. 02:03:40.480 --> 02:03:43.860 Argument count is an integer that's going to represent the number of words 02:03:43.860 --> 02:03:45.720 that your users type at the prompt. 02:03:45.720 --> 02:03:48.330 Argv is short for argument vector. 02:03:48.330 --> 02:03:50.430 Vector is a fancy way of saying list. 02:03:50.430 --> 02:03:55.470 It is a variable that's going to store in an array all of the strings 02:03:55.470 --> 02:03:59.940 that a human types at the prompt after your own program's name. 02:03:59.940 --> 02:04:02.710 So we can use this, for instance, as follows. 02:04:02.710 --> 02:04:06.330 Suppose that I want to let the user type their own name at the command prompt. 02:04:06.330 --> 02:04:06.960 I don't want to use get_string. 02:04:06.960 --> 02:04:09.583 I don't want to have to prompt the human later for their name. 02:04:09.583 --> 02:04:12.750 I want them to be able to run my program and give me their name all at once, 02:04:12.750 --> 02:04:17.080 just like make, just like rm, and Clang, and other programs we've seen. 02:04:17.080 --> 02:04:20.850 So I'm going to do this-- if argc == 2-- 02:04:20.850 --> 02:04:24.390 so if the number of arguments to my program is 2-- 02:04:24.390 --> 02:04:31.420 go ahead and print out, "hello, %s", and plug in whatever is that argv[1]. 
02:04:31.420 --> 02:04:33.450 So more on this in just a moment. 02:04:33.450 --> 02:04:37.770 Else, if argc is not equal to 2, let's just go with last week's default, 02:04:37.770 --> 02:04:39.190 "hello, world." 02:04:39.190 --> 02:04:41.250 So what is this program's purpose in life? 02:04:41.250 --> 02:04:43.680 If the human types two words at the prompt, 02:04:43.680 --> 02:04:47.310 I want to say, "hello, David," "hello, Brian," "hello, so-and-so." 02:04:47.310 --> 02:04:50.310 Otherwise, if they don't type two words at the prompt, 02:04:50.310 --> 02:04:52.630 I'm just going to say the default "hello, world." 02:04:52.630 --> 02:04:55.780 So let me compile this, make argv. 02:04:55.780 --> 02:05:00.350 And, hm, I didn't get it right here-- unknown type string, unknown type 02:05:00.350 --> 02:05:00.850 string. 02:05:00.850 --> 02:05:01.820 All right, I goofed. 02:05:01.820 --> 02:05:07.150 If I'm using string, recall that now I need to start using the CS50 library. 02:05:07.150 --> 02:05:09.730 And again, we'll see all the more why in the coming weeks as 02:05:09.730 --> 02:05:11.440 we take those training wheels off. 02:05:11.440 --> 02:05:13.870 But now I'm going to do this again, make argv. 02:05:13.870 --> 02:05:14.440 There we go. 02:05:14.440 --> 02:05:18.280 Now it works-- ./argv, Enter, "hello, world." 02:05:18.280 --> 02:05:20.710 That's pretty much equivalent to what we did last week. 02:05:20.710 --> 02:05:26.030 But notice if I type in, for instance, ./argv David, Enter, it says, "hello, 02:05:26.030 --> 02:05:26.530 David." 02:05:26.530 --> 02:05:29.500 If I type in ./argv Brian, it says that. 02:05:29.500 --> 02:05:33.710 If I type in Brian Yu, it says "hello, world." 02:05:33.710 --> 02:05:35.200 So what's going on?
02:05:35.200 --> 02:05:40.990 Well, the way you write programs in C that accept zero or more command-line 02:05:40.990 --> 02:05:44.620 arguments-- that is, words at the prompt after your program's name-- 02:05:44.620 --> 02:05:48.910 is you change what we have been doing all this time from void 02:05:48.910 --> 02:05:52.748 to be this, int argc, string argv with square brackets. 02:05:52.748 --> 02:05:55.165 And what the computer is going to do for you automatically 02:05:55.165 --> 02:05:59.170 is it's going to store in argc the total number of words 02:05:59.170 --> 02:06:01.690 that the human typed in, not just the arguments, technically 02:06:01.690 --> 02:06:04.420 all of the words, including your own program's name. 02:06:04.420 --> 02:06:08.650 It's then going to fill this array of strings, a.k.a. argv, 02:06:08.650 --> 02:06:11.890 with all of the words the human typed at the prompt, so not just 02:06:11.890 --> 02:06:16.340 the arguments like Brian or David, but also the name of your program. 02:06:16.340 --> 02:06:20.560 So if the human typed in two total words, which they did, argv Brian, 02:06:20.560 --> 02:06:24.160 argv David, then I want to print out, "hello" 02:06:24.160 --> 02:06:27.790 followed by a placeholder and then whatever value is at argv[1]. 02:06:27.790 --> 02:06:29.770 And I'm deliberately not doing 0. 02:06:29.770 --> 02:06:33.340 If I did 0, based on the verbal definition I just gave, 02:06:33.340 --> 02:06:38.260 if I recompile this program, I don't want to see this, "hello, ./argv." 02:06:38.260 --> 02:06:43.030 So the program's own name is automatically always stored for you 02:06:43.030 --> 02:06:45.190 at the first location in that array. 02:06:45.190 --> 02:06:48.070 But if you want the first useful piece of information, 02:06:48.070 --> 02:06:53.860 you actually would, after recompiling the code here, access it at [1].
02:06:53.860 --> 02:06:58.030 And so in this way do we see in argv that we can actually 02:06:58.030 --> 02:06:59.350 access individual words. 02:06:59.350 --> 02:07:00.520 But notice this too-- 02:07:00.520 --> 02:07:05.410 suppose I want to print out all of the individual characters in someone's 02:07:05.410 --> 02:07:06.010 input. 02:07:06.010 --> 02:07:06.593 You know what? 02:07:06.593 --> 02:07:08.083 I bet I could even do this. 02:07:08.083 --> 02:07:09.250 Let me go ahead and do this. 02:07:09.250 --> 02:07:13.330 Instead of just printing out "hello," let me do for int i get 0, 02:07:13.330 --> 02:07:17.620 n equals the string length of argv[1]. 02:07:20.270 --> 02:07:24.800 And then over here, I'm going to do i is less than n, i++. 02:07:24.800 --> 02:07:27.770 All right, so I'm going to iterate over all of the characters 02:07:27.770 --> 02:07:30.930 in the first real word in argv. 02:07:30.930 --> 02:07:32.160 And what am I going to do? 02:07:32.160 --> 02:07:37.310 Well, let me go ahead and print out a character that's at argv[1] 02:07:37.310 --> 02:07:38.900 but at location i. 02:07:38.900 --> 02:07:41.300 So I said a moment ago with our picture that we 02:07:41.300 --> 02:07:47.090 could think of an array of strings as really just being an array of arrays. 02:07:47.090 --> 02:07:53.570 And so I can employ that syntax here by going into argv[1] to get me the word 02:07:53.570 --> 02:07:57.440 like "David" or "Brian" or so forth, and then further index into it with more 02:07:57.440 --> 02:08:02.100 square brackets that get me the D, the A, the V, the I, the D, and so forth. 02:08:02.100 --> 02:08:05.300 And just to be super clear, let me put a new line character there 02:08:05.300 --> 02:08:08.070 just so we can see explicitly what's going on. 02:08:08.070 --> 02:08:10.580 And let me go ahead now and just delete this "hello, world" 02:08:10.580 --> 02:08:12.205 because I don't want to see any hellos. 
02:08:12.205 --> 02:08:14.240 I just want to see the word the human typed in. 02:08:14.240 --> 02:08:19.500 Make argv-- whoops, what did I do wrong? 02:08:19.500 --> 02:08:25.260 Oh, I used strlen when I shouldn't have because I haven't included string.h 02:08:25.260 --> 02:08:26.550 at the top. 02:08:26.550 --> 02:08:31.230 OK, now if I recompile this code and recompile make argv-- 02:08:31.230 --> 02:08:36.010 there we go-- ./argv David, you'll see one character per line. 02:08:36.010 --> 02:08:38.940 And if I do the same with Brian's name or anyone's name 02:08:38.940 --> 02:08:42.352 and change it to Brian, I'm printing one character at a time. 02:08:42.352 --> 02:08:44.560 So again, I'm not sure why you would want to do that. 02:08:44.560 --> 02:08:47.760 But in this case, my goal simply was to not only iterate over 02:08:47.760 --> 02:08:51.970 the characters in that first word, but print them out. 02:08:51.970 --> 02:08:56.520 So again, just by applying twice over this time this principle, 02:08:56.520 --> 02:09:00.570 can we actually see that a program has access 02:09:00.570 --> 02:09:03.600 to the individual characters in each of these strings. 02:09:03.600 --> 02:09:06.090 All right, and one last explanation before we 02:09:06.090 --> 02:09:08.880 introduce the crypto and application thereof. 02:09:08.880 --> 02:09:11.790 This thing here, this thing here-- does anyone 02:09:11.790 --> 02:09:15.660 have any idea as to why main, last week and this week, 02:09:15.660 --> 02:09:19.320 seems to return an int even though it's not an average function? 02:09:19.320 --> 02:09:21.000 It's not a get_positive_int function. 02:09:21.000 --> 02:09:22.470 It's not get_negative_int. 02:09:22.470 --> 02:09:26.040 Somehow, for some reason, main keeps returning an int even though we 02:09:26.040 --> 02:09:29.410 have never seen this int in action. 02:09:29.410 --> 02:09:31.040 What might this mean? 
02:09:31.040 --> 02:09:33.340 This is the one last piece that we promised 02:09:33.340 --> 02:09:37.090 last week we would eventually explain. 02:09:37.090 --> 02:09:38.800 What might this mean? 02:09:38.800 --> 02:09:41.420 And this one's a tough one. 02:09:41.420 --> 02:09:43.870 Brian, who do we have? 02:09:43.870 --> 02:09:47.230 How about [? Gred, ?] is it? 02:09:47.230 --> 02:09:51.810 [? GRED: ?] Usually, the functions in the end have returned 0. 02:09:51.810 --> 02:09:54.060 And that means that the function stops. 02:09:54.060 --> 02:10:00.270 And the 0 is the integer that pops out of the main function. 02:10:00.270 --> 02:10:03.810 DAVID MALAN: Yeah, and this one's subtle in that if you had programmed before, 02:10:03.810 --> 02:10:06.390 odds are-- and I'm guessing you have, [? Gred-- ?] you've seen this in use 02:10:06.390 --> 02:10:07.020 before. 02:10:07.020 --> 02:10:10.350 We humans, though, in the real world of using Macs and PCs-- 02:10:10.350 --> 02:10:13.320 you've actually seen numbers, integers in weird places. 02:10:13.320 --> 02:10:17.220 Frankly, almost any time your computer freezes or you see an error message, 02:10:17.220 --> 02:10:21.280 odds are you see English or some other spoken language in the error message. 02:10:21.280 --> 02:10:23.307 But you very often see a numeric code. 02:10:23.307 --> 02:10:25.140 For instance, if you're having Zoom trouble, 02:10:25.140 --> 02:10:29.700 you'll often see the number 5 in the error window in Zoom's program. 02:10:29.700 --> 02:10:31.710 And 5 just means you're having network issues. 02:10:31.710 --> 02:10:34.710 So programmers often associate integers with things 02:10:34.710 --> 02:10:36.540 that can go wrong in a program. 02:10:36.540 --> 02:10:42.210 And as [? Gred ?] notes, they use 0 to connote that nothing has gone wrong, 02:10:42.210 --> 02:10:43.660 that all is well.
02:10:43.660 --> 02:10:48.285 So let me write one final program here just called exit.c 02:10:48.285 --> 02:10:49.950 that puts this to the test. 02:10:49.950 --> 02:10:54.640 Let me go ahead and write a program in a file called exit.c 02:10:54.640 --> 02:10:57.870 that's going to introduce what we're going to call an exit status. 02:10:57.870 --> 02:11:01.230 This is a subtlety that will be useful as our programs get 02:11:01.230 --> 02:11:02.580 a little more complicated. 02:11:02.580 --> 02:11:06.360 I'm going to go in here and do #include cs50.h. 02:11:06.360 --> 02:11:09.360 And I'm going to go ahead and #include stdio.h. 02:11:09.360 --> 02:11:14.970 And I'm going to give myself the longer version of main, so int argc, string 02:11:14.970 --> 02:11:17.140 argv with the square brackets. 02:11:17.140 --> 02:11:21.690 And in here, I'm going to say, if argc does not equal 2, 02:11:21.690 --> 02:11:24.290 uh-uh, the human is not doing what I want them to, 02:11:24.290 --> 02:11:26.040 and I'm going to yell at them in some way. 02:11:26.040 --> 02:11:28.580 I'm going to say missing command-line arguments. 02:11:28.580 --> 02:11:31.960 So any kind of error message that I want the human to see on the screen, 02:11:31.960 --> 02:11:33.900 I'm just going to tell them with that message. 02:11:33.900 --> 02:11:37.650 But I'm going to very subtly return the number 1. 02:11:37.650 --> 02:11:39.090 I'm going to return an error code. 02:11:39.090 --> 02:11:41.830 And the human is not necessarily going to see this code. 02:11:41.830 --> 02:11:45.150 But if we were to have a graphical user interface or some other feature 02:11:45.150 --> 02:11:47.130 to this program, that would be the number 02:11:47.130 --> 02:11:49.110 they see in the error window that pops up, 02:11:49.110 --> 02:11:52.320 just like Zoom might show you the number 5 if something has gone wrong. 
02:11:52.320 --> 02:11:54.870 Similarly, if you've ever visited a page, frankly, 02:11:54.870 --> 02:11:59.130 and the web page doesn't exist, you see the integer 404. 02:11:59.130 --> 02:12:01.890 That's not technically the exact same incarnation of this, 02:12:01.890 --> 02:12:05.440 but it is representative of programmers using numbers to represent errors. 02:12:05.440 --> 02:12:07.230 So that one, you probably have seen. 02:12:07.230 --> 02:12:11.160 Here, I'm going to go ahead, though, and by default, say, "hello, %s," 02:12:11.160 --> 02:12:14.250 just like before, passing in whatever's in argv[1]. 02:12:14.250 --> 02:12:17.940 So same program as before, but I'm not going to do any of this lame, "hello, 02:12:17.940 --> 02:12:21.580 world" if the human doesn't type in their name as I expect. 02:12:21.580 --> 02:12:25.110 Instead, I am going to check, did the human 02:12:25.110 --> 02:12:27.180 give me two words at the command line? 02:12:27.180 --> 02:12:30.210 If not, I'm going to print, "missing command-line argument," 02:12:30.210 --> 02:12:32.220 and then return this exit code. 02:12:32.220 --> 02:12:36.750 Otherwise, if all is well, I'm going to go ahead and return explicitly 0. 02:12:36.750 --> 02:12:40.200 This is another number that the human, you and I, are never going to see, 02:12:40.200 --> 02:12:42.060 but we could have access to it. 02:12:42.060 --> 02:12:46.200 And frankly, for course purposes, check50 can have access to this. 02:12:46.200 --> 02:12:48.570 And graphical user interfaces, when we get to those, 02:12:48.570 --> 02:12:50.980 can have access to these values. 02:12:50.980 --> 02:12:54.160 So 0, as [? Gred ?] notes, just means all is well. 02:12:54.160 --> 02:12:56.235 But 1 would mean that something went wrong. 02:12:56.235 --> 02:12:58.860 So let me go ahead and make exit, which is kind of appropriate, 02:12:58.860 --> 02:13:00.270 as we're wrapping up here. 02:13:00.270 --> 02:13:02.760 And let me go ahead and do ./exit.
02:13:02.760 --> 02:13:05.700 "Missing command-line argument" is what's displayed. 02:13:05.700 --> 02:13:09.120 If I go ahead and say, ./exit David, now I see "hello, David." 02:13:09.120 --> 02:13:12.570 Or ./exit Brian, I'll see "hello, Brian." 02:13:12.570 --> 02:13:15.120 Now, this is not a technique you'll need to use often, 02:13:15.120 --> 02:13:19.110 but you can actually see these return values if you want. 02:13:19.110 --> 02:13:23.970 If I run ./exit, and I see this error message, I can very weirdly say, 02:13:23.970 --> 02:13:28.260 echo $?, which is a very admittedly cryptic way of saying, 02:13:28.260 --> 02:13:30.120 what was my exit status? 02:13:30.120 --> 02:13:32.640 And if you hit Enter, you'll see 1. 02:13:32.640 --> 02:13:35.370 By contrast, if I run ./exit David, and I actually 02:13:35.370 --> 02:13:42.060 see "hello, David," and I do echo $?, now I will see 0. 02:13:42.060 --> 02:13:45.030 So again, this is not a technique you and I will use very frequently. 02:13:45.030 --> 02:13:48.480 But it's a capability of a program, and it's a capability of C, 02:13:48.480 --> 02:13:49.920 that you do now have access to. 02:13:49.920 --> 02:13:52.140 And so in writing programs moving forward, 02:13:52.140 --> 02:13:55.380 what we will often do in labs and in problem sets and the like 02:13:55.380 --> 02:14:02.430 is ask you to return from main either 0 or 1 or maybe 2 or 3 or 4 02:14:02.430 --> 02:14:06.060 based on the problems that might have gone wrong in your program 02:14:06.060 --> 02:14:09.420 that you have detected and responded to appropriately. 02:14:09.420 --> 02:14:13.530 So it's a very effective way of handling errors in a standard way 02:14:13.530 --> 02:14:18.180 so that you know that you are being proactive about detecting mistakes. 02:14:18.180 --> 02:14:20.540 So what kinds of mistakes might we handle this week? 02:14:20.540 --> 02:14:22.290 And what kinds of problems might we solve?
02:14:22.290 --> 02:14:26.100 Well, today was entirely about deconstructing what a string is. 02:14:26.100 --> 02:14:29.220 Last week, it was just a sequence of text, a chunk of text. 02:14:29.220 --> 02:14:31.740 Today, it's now an array of characters. 02:14:31.740 --> 02:14:34.950 And we have new syntax in C for accessing those characters. 02:14:34.950 --> 02:14:38.370 We also today have access to more libraries, more header files, 02:14:38.370 --> 02:14:41.460 the documentation, therefore, so that we can actually solve problems 02:14:41.460 --> 02:14:43.290 without writing as much code ourselves. 02:14:43.290 --> 02:14:46.630 We can use other people's code in the form of these libraries. 02:14:46.630 --> 02:14:49.890 So one problem we will solve this coming week by way of problem set 2 02:14:49.890 --> 02:14:51.120 is that of readability. 02:14:51.120 --> 02:14:54.150 Like, when you're reading a book or an essay or a paper or anything, 02:14:54.150 --> 02:14:56.370 what is it that makes it like a 3rd-grade reading 02:14:56.370 --> 02:14:59.917 level or a 12th-grade reading level or university reading level? 02:14:59.917 --> 02:15:02.250 Well, all of us probably have an intuitive sense, right? 02:15:02.250 --> 02:15:05.940 Like, if it's big font and short words, it's probably for younger kids. 02:15:05.940 --> 02:15:09.000 And if it's really complicated words with big vocabulary and things 02:15:09.000 --> 02:15:12.460 we don't know, maybe it's meant for university audiences. 02:15:12.460 --> 02:15:16.440 But we can quantify this a little more formulaically, 02:15:16.440 --> 02:15:19.328 not necessarily the only way, but we'll give you a few definitions. 02:15:19.328 --> 02:15:21.120 So for instance, here's a famous sentence-- 02:15:21.120 --> 02:15:23.370 "Mr. and Mrs. Dursley, of number four, Privet Drive, 02:15:23.370 --> 02:15:26.412 were proud to say that they were perfectly normal, thank you very much," 02:15:26.412 --> 02:15:27.420 and so forth.
02:15:27.420 --> 02:15:32.070 Well, what is it about this text that puts Harry Potter at grade seven 02:15:32.070 --> 02:15:32.940 reading level? 02:15:32.940 --> 02:15:35.520 Well, it probably has to do with the vocabulary words. 02:15:35.520 --> 02:15:38.760 But it probably also has to do with the lengths of the sentences, the amount 02:15:38.760 --> 02:15:44.550 of punctuation perhaps, the total number of characters that you might count up. 02:15:44.550 --> 02:15:48.518 You can imagine quantifying it just based generically on the look 02:15:48.518 --> 02:15:49.810 and the aesthetics of the text. 02:15:49.810 --> 02:15:50.670 What about this? 02:15:50.670 --> 02:15:53.010 "In computational linguistics, authorship attribution 02:15:53.010 --> 02:15:55.590 is the task of predicting the author of a document of unknown authorship. 02:15:55.590 --> 02:15:58.673 This task is generally performed by the analysis of stylometric features-- 02:15:58.673 --> 02:16:00.750 particular"-- this is Brian's senior thesis. 02:16:00.750 --> 02:16:02.650 So this is not a seventh-grade reading level. 02:16:02.650 --> 02:16:04.860 This was actually rated at grade 16. 02:16:04.860 --> 02:16:08.130 So Brian's pretty sophisticated when it comes to writing theses. 02:16:08.130 --> 02:16:11.160 But there too, you could perhaps glean from the sophistication 02:16:11.160 --> 02:16:14.010 of the sentences, the length thereof, and the words therein-- 02:16:14.010 --> 02:16:17.010 there's something we could perhaps quantify so as to apply numbers. 02:16:17.010 --> 02:16:21.720 And indeed, that's one way you could assess the readability of a text 02:16:21.720 --> 02:16:24.480 even if you don't have access to a dictionary with which 02:16:24.480 --> 02:16:27.360 to figure out which are the actual big or small words. 02:16:27.360 --> 02:16:28.820 And what about cryptography?
02:16:28.820 --> 02:16:32.160 So it's incredibly common these days and so important 02:16:32.160 --> 02:16:37.020 these days for you and I to use cryptography, not necessarily using 02:16:37.020 --> 02:16:39.389 algorithms we ourselves come up with, but rather using 02:16:39.389 --> 02:16:43.469 software, like WhatsApp and Signal and Telegram and Messenger and others, 02:16:43.469 --> 02:16:48.340 that support encryption between you and the third party or friend or family, 02:16:48.340 --> 02:16:51.090 or at least minimally the website with which you're interacting. 02:16:51.090 --> 02:16:55.590 So cryptography is the art of scrambling information, or hiding information. 02:16:55.590 --> 02:16:59.430 And if that information is text, well, frankly, as of this third week of CS50, 02:16:59.430 --> 02:17:03.059 we already have the requisite building blocks for not only representing text, 02:17:03.059 --> 02:17:05.040 but we saw today manipulating it. 02:17:05.040 --> 02:17:09.330 Even just uppercasing characters allows us to start mutating text. 02:17:09.330 --> 02:17:11.459 Well, what does it mean to encrypt information? 02:17:11.459 --> 02:17:13.650 Well, it's like our black box from last week. 02:17:13.650 --> 02:17:14.520 You have some input. 02:17:14.520 --> 02:17:15.395 You want some output. 02:17:15.395 --> 02:17:18.040 The input, we're going to start calling plaintext. 02:17:18.040 --> 02:17:20.969 The message you want to send from yourself to someone else. 02:17:20.969 --> 02:17:22.976 Ciphertext is the output that you want. 02:17:22.976 --> 02:17:24.809 And so in between there, there's going to be 02:17:24.809 --> 02:17:26.226 what we're going to call a cipher. 02:17:26.226 --> 02:17:30.270 A cipher is an algorithm that encrypts or scrambles 02:17:30.270 --> 02:17:34.177 its input so as to produce output that a third party can't understand.
02:17:34.177 --> 02:17:35.969 And hopefully, that cipher, that algorithm, 02:17:35.969 --> 02:17:40.020 is a reversible process so that when you receive the scrambled ciphertext, 02:17:40.020 --> 02:17:44.830 you can figure out what it was that the person sent to you. 02:17:44.830 --> 02:17:48.030 But the key to using cryptography-- pun intended-- 02:17:48.030 --> 02:17:49.282 is to also have a secret key. 02:17:49.282 --> 02:17:51.240 So if you think back to grade school, maybe you 02:17:51.240 --> 02:17:53.549 were flirting with someone in class, and you sent them 02:17:53.549 --> 02:17:55.082 a note on a piece of paper. 02:17:55.082 --> 02:17:58.290 Well, hopefully, you didn't just say, like, I love you, on the piece of paper 02:17:58.290 --> 02:18:00.165 and then pass it through all of your friends, 02:18:00.165 --> 02:18:02.910 or let alone the teacher, to the ultimate recipient. 02:18:02.910 --> 02:18:05.340 Maybe you did something like, an A becomes 02:18:05.340 --> 02:18:08.459 a B. A B becomes a C. A C becomes a D. Like, 02:18:08.459 --> 02:18:11.740 you kind of apply an algorithm to add 1 to all of the letters 02:18:11.740 --> 02:18:14.219 so that if the teacher does intercept it and look at it, 02:18:14.219 --> 02:18:17.070 they probably don't have enough care in the world to figure out what this is. 02:18:17.070 --> 02:18:18.690 It's just going to look like nonsense. 02:18:18.690 --> 02:18:21.840 But if your friend knows that you changed A to B, B 02:18:21.840 --> 02:18:26.010 to C by adding 1 to every letter, they could reverse that process 02:18:26.010 --> 02:18:27.610 and decrypt it. 02:18:27.610 --> 02:18:30.270 So the key, for instance, might be literally the number 1. 02:18:30.270 --> 02:18:32.610 The message literally might be, "I LOVE YOU." 02:18:32.610 --> 02:18:35.080 But what would the ciphertext be, or the output? 
02:18:35.080 --> 02:18:38.610 Well, let's consider "I LOVE YOU" is a string which, as of today, 02:18:38.610 --> 02:18:40.240 is an array of characters. 02:18:40.240 --> 02:18:42.000 So what use is that? 02:18:42.000 --> 02:18:45.123 Well, let's consider exactly that phrase as though it's an array. 02:18:45.123 --> 02:18:46.290 It's an array of characters. 02:18:46.290 --> 02:18:50.969 We know from last week, characters are just integers, decimal integers, 02:18:50.969 --> 02:18:53.190 thanks to ASCII, and in turn, Unicode. 02:18:53.190 --> 02:18:55.770 So it turns out I, we already know, is 73. 02:18:55.770 --> 02:19:04.920 And if we looked up all the others on a chart, L is 76, 79, 86, 69, 89, 79, 85. 02:19:04.920 --> 02:19:08.400 So we could relatively easily and see-- you might have to check your notes 02:19:08.400 --> 02:19:10.240 and check my sample code and so forth-- 02:19:10.240 --> 02:19:15.750 but relatively easily in C convert "I LOVE YOU" to the corresponding integers 02:19:15.750 --> 02:19:19.290 by just casting, so to speak, chars to integers. 02:19:19.290 --> 02:19:23.340 I could very easily mathematically, using the plus operator in C, 02:19:23.340 --> 02:19:26.910 start to add 1 to every one of these characters, 02:19:26.910 --> 02:19:29.309 thereby encrypting my message. 02:19:29.309 --> 02:19:31.398 But I could send my friend these numbers. 02:19:31.398 --> 02:19:33.690 But I might as well make it a little more user friendly 02:19:33.690 --> 02:19:36.209 and cast it back from integers to chars. 02:19:36.209 --> 02:19:42.930 So now it would seem that the ciphertext for "I LOVE YOU," if using a key of 1-- 02:19:42.930 --> 02:19:47.910 and 1 just means change A to B, not A to C, just move it by one place-- 02:19:47.910 --> 02:19:52.740 this is the ciphertext for an encrypted message of, "I LOVE YOU." 02:19:52.740 --> 02:19:55.840 And so the whole process becomes 1 is the input as the key. 
02:19:55.840 --> 02:19:57.810 "I LOVE YOU" is the input as the plaintext. 02:19:57.810 --> 02:20:00.990 And the output ultimately is this unpronounceable phrase 02:20:00.990 --> 02:20:03.630 that, again, if the teacher or some friend intercepts, 02:20:03.630 --> 02:20:06.060 they probably don't know what's going on. 02:20:06.060 --> 02:20:08.520 And indeed, this is the essence of cryptography. 02:20:08.520 --> 02:20:12.027 The algorithms that protect our emails and texts and financial information 02:20:12.027 --> 02:20:13.860 and health information are hopefully way more 02:20:13.860 --> 02:20:17.160 sophisticated than that particular algorithm as it is. 02:20:17.160 --> 02:20:19.350 But it reduces to the same process-- 02:20:19.350 --> 02:20:23.640 an input key and an input text followed by some output, 02:20:23.640 --> 02:20:25.050 the so-called ciphertext. 02:20:25.050 --> 02:20:28.500 And this has been with us for decades now in some form, sometimes even 02:20:28.500 --> 02:20:29.400 mechanical form. 02:20:29.400 --> 02:20:32.760 Back in the day, you could actually get these little circular devices 02:20:32.760 --> 02:20:35.343 that have letters of the alphabet on one side, other letters 02:20:35.343 --> 02:20:36.760 of the alphabet on the other side. 02:20:36.760 --> 02:20:39.720 And if you rotate one or the other, A might line up 02:20:39.720 --> 02:20:41.310 with B, B might line up with C. 02:20:41.310 --> 02:20:44.760 So you can have even a physical incarnation of cryptography, 02:20:44.760 --> 02:20:49.920 just as was popular in a movie that seems to play endlessly on TV, 02:20:49.920 --> 02:20:52.930 at least here in the US around Christmas time. 02:20:52.930 --> 02:20:56.980 And you might recognize if you've seen A Christmas Story one such look.
02:20:56.980 --> 02:20:59.460 So we'll use just a couple of minutes of our final moments 02:20:59.460 --> 02:21:02.910 together to take a look at this real-world incarnation of cryptography 02:21:02.910 --> 02:21:06.533 that you can undoubtedly see on TV this fall. 02:21:06.533 --> 02:21:07.200 [VIDEO PLAYBACK] 02:21:07.200 --> 02:21:09.810 - "Be it known to all and sundry that Ralph Parker is hereby 02:21:09.810 --> 02:21:12.720 appointed a member of the Little Orphan Annie secret circle 02:21:12.720 --> 02:21:16.300 and is entitled to all the honors and benefits accruing thereto." 02:21:16.300 --> 02:21:18.810 - "Signed, Little Orphan Annie." 02:21:18.810 --> 02:21:22.920 "Countersigned, Pierre Andre," in ink. 02:21:22.920 --> 02:21:25.620 Honors and benefits already at the age of nine. 02:21:25.620 --> 02:21:27.942 [RADIO CHATTER] 02:21:27.942 --> 02:21:28.900 - (ON RADIO) Attention! 02:21:28.900 --> 02:21:29.710 [INAUDIBLE] overboard! 02:21:29.710 --> 02:21:30.165 [CLANGING] 02:21:30.165 --> 02:21:31.530 - (ON RADIO) Come [INAUDIBLE] Gone overboard! 02:21:31.530 --> 02:21:32.440 - (ON RADIO) [INAUDIBLE] 02:21:32.440 --> 02:21:33.808 - Come on, let's get on with it. 02:21:33.808 --> 02:21:36.100 I don't need all that jazz about smugglers and pirates. 02:21:36.100 --> 02:21:36.976 [BARKING] 02:21:37.645 --> 02:21:40.270 - (ON RADIO) Listen tomorrow night for the concluding adventure 02:21:40.270 --> 02:21:42.450 of the Black Pirate Ship. 02:21:42.450 --> 02:21:48.430 Now it's time for Annie's secret message for you members of the secret circle. 02:21:48.430 --> 02:21:52.150 Remember, kids, only members of Annie's secret circle 02:21:52.150 --> 02:21:54.730 can decode Annie's secret message. 02:21:54.730 --> 02:21:58.900 Remember, Annie is depending on you. 02:21:58.900 --> 02:22:01.480 Set your pins to B-2. 02:22:01.480 --> 02:22:03.760 Here is the message.
02:22:03.760 --> 02:22:05.710 12, 11, 2, 8-- 02:22:05.710 --> 02:22:07.540 - I am in my first secret meeting. 02:22:07.540 --> 02:22:12.250 - (ON RADIO) --25, 14, 11, 18, 16, 23-- 02:22:12.250 --> 02:22:14.110 - Old Pierre was in great voice tonight. 02:22:14.110 --> 02:22:14.350 - (ON RADIO) --12, 23-- 02:22:14.350 --> 02:22:16.767 - I could tell that tonight's message was really important. 02:22:16.767 --> 02:22:19.660 - (ON RADIO) --21, 3, 25. 02:22:19.660 --> 02:22:21.400 That's a message from Annie herself. 02:22:21.400 --> 02:22:22.620 Remember, don't tell anyone. 02:22:22.620 --> 02:22:25.584 [FOOTSTEPS AND PANTING] 02:22:27.560 --> 02:22:31.550 - 90 seconds later, I'm in the only room in the house where a boy of nine 02:22:31.550 --> 02:22:33.635 could sit in privacy and decode. 02:22:33.635 --> 02:22:43.070 [CHUCKLES] Aha, B. [CHUCKLES] I went to the next, E. The first word is "be." 02:22:43.070 --> 02:22:45.680 S, it was coming easier now. 02:22:45.680 --> 02:22:47.747 U. [CHUCKLES] 25, that's R. 02:22:47.747 --> 02:22:50.192 - Aw, come on, Ralphie, I gotta go. 02:22:50.192 --> 02:22:51.170 - Come on. 02:22:51.170 --> 02:22:53.126 - I'll be right down, Ma! 02:22:53.126 --> 02:22:54.104 - Gee whiz. 02:22:57.040 --> 02:23:01.120 - T, O. "Be sure to." 02:23:01.120 --> 02:23:02.380 Be sure to what? 02:23:02.380 --> 02:23:04.513 What was Little Orphan Annie trying to say? 02:23:04.513 --> 02:23:05.180 Be sure to what? 02:23:05.180 --> 02:23:06.970 - Ralphie, Randy has got to go. 02:23:06.970 --> 02:23:08.350 Will you please come out? 02:23:08.350 --> 02:23:09.580 - All right, Ma! 02:23:09.580 --> 02:23:11.470 I'll be right out! 02:23:11.470 --> 02:23:13.370 - I was getting closer now. 02:23:13.370 --> 02:23:15.300 The tension was terrible. 02:23:15.300 --> 02:23:16.310 What was it? 02:23:16.310 --> 02:23:18.776 The fate of the planet may hang in the balance. 02:23:18.776 --> 02:23:19.276 [KNOCKING] 02:23:19.276 --> 02:23:19.776 - Ralphie!
02:23:19.776 --> 02:23:21.666 Randy's got to go! 02:23:21.666 --> 02:23:25.012 - I'll be right out, for crying out loud! 02:23:25.012 --> 02:23:26.860 - [CHUCKLES] Almost there. 02:23:26.860 --> 02:23:27.930 My fingers flew. 02:23:27.930 --> 02:23:31.560 My mind was a steel trap. Every pore vibrated. 02:23:31.560 --> 02:23:33.524 It was almost clear. 02:23:33.524 --> 02:23:35.894 Yes, yes, yes, yes. 02:23:35.894 --> 02:23:41.700 - "Be sure to drink your Ovaltine." 02:23:41.700 --> 02:23:42.630 Ovaltine? 02:23:46.510 --> 02:23:47.750 A crummy commercial? 02:23:47.750 --> 02:23:50.890 [MUSIC PLAYING] 02:23:50.890 --> 02:23:52.317 Son of a bitch. 02:23:52.317 --> 02:23:52.900 [END PLAYBACK] 02:23:52.900 --> 02:23:55.030 DAVID MALAN: All right, that's it for CS50. 02:23:55.030 --> 02:23:57.310 We will see you next time. 02:23:57.310 --> 02:24:00.660 [MUSIC PLAYING]