WEBVTT X-TIMESTAMP-MAP=LOCAL:00:00:00.000,MPEGTS:900000 00:00:00.000 --> 00:00:00.000 [MUSIC PLAYING] 00:01:18.000 --> 00:01:20.825 DAVID MALAN: This is CS50 and this is week 2. 00:01:20.825 --> 00:01:23.450 Now that you have some programming experience under your belts, 00:01:23.450 --> 00:01:25.910 in this more arcane language called C. 00:01:25.910 --> 00:01:28.790 Among our goals today is to help you understand exactly what you have 00:01:28.790 --> 00:01:30.650 been doing these past several days. 00:01:30.650 --> 00:01:33.955 Wrestling with your first programs in C, so that you have more of a bottom-up 00:01:33.955 --> 00:01:36.080 understanding of what some of these commands do. 00:01:36.080 --> 00:01:38.580 And, ultimately, what more we can do with this language. 00:01:38.580 --> 00:01:41.750 So this, recall, was the very first program you wrote, 00:01:41.750 --> 00:01:44.870 I wrote in this language called C, much more textual, 00:01:44.870 --> 00:01:46.970 certainly, than the Scratch equivalent. 00:01:46.970 --> 00:01:51.200 But at the end of the day, computers, your Mac, your PC, 00:01:51.200 --> 00:01:54.555 VS Code doesn't understand this actual code. 00:01:54.555 --> 00:01:57.680 What's the format into which we need to get any program that we write, just 00:01:57.680 --> 00:01:58.180 to recap? 00:01:58.180 --> 00:01:59.202 AUDIENCE: [INAUDIBLE] 00:01:59.202 --> 00:02:01.790 DAVID MALAN: So binary, otherwise known as machine code. 00:02:01.790 --> 00:02:02.290 Right? 00:02:02.290 --> 00:02:05.870 The 0s and 1s that your computer actually does understand. 00:02:05.870 --> 00:02:08.030 So somehow we need to get to this format. 00:02:08.030 --> 00:02:10.730 And up until now, we've been using this command called make, 00:02:10.730 --> 00:02:13.670 which is aptly named, because it lets you make programs. 00:02:13.670 --> 00:02:16.430 And the invocation of that has been pretty simple.
00:02:16.430 --> 00:02:20.450 Make hello looks in your current directory or folder for a file called 00:02:20.450 --> 00:02:25.100 hello.c, implicitly, and then it compiles that into a file called hello, 00:02:25.100 --> 00:02:27.650 which itself is executable, which just means runnable, 00:02:27.650 --> 00:02:29.900 so that you can then do ./hello. 00:02:29.900 --> 00:02:34.190 But it turns out that make is actually not a compiler itself. 00:02:34.190 --> 00:02:35.840 It does help you make programs. 00:02:35.840 --> 00:02:40.520 But make is this utility that comes on a lot of systems that makes it easier 00:02:40.520 --> 00:02:44.060 to actually compile code by using an actual compiler, 00:02:44.060 --> 00:02:48.290 the program that converts source code to machine code, on your own Mac, or PC, 00:02:48.290 --> 00:02:50.660 or whatever cloud environment you might be using. 00:02:50.660 --> 00:02:53.330 In fact, what make is doing for us is 00:02:53.330 --> 00:02:57.230 automatically running a command known as clang, for C language. 00:02:57.230 --> 00:03:01.590 And so here, for instance, in VS Code, is that very first program again, 00:03:01.590 --> 00:03:03.470 this time in the context of a text editor, 00:03:03.470 --> 00:03:06.680 and I could compile this with make hello. 00:03:06.680 --> 00:03:09.567 Let me go ahead and use the compiler itself manually. 00:03:09.567 --> 00:03:12.650 And we'll see in a moment why we've been automating the process with make. 00:03:12.650 --> 00:03:15.060 I'm going to run clang instead. 00:03:15.060 --> 00:03:17.340 And then I'm going to type hello.c. 00:03:17.340 --> 00:03:19.490 So it's a little different how the compiler's used. 00:03:19.490 --> 00:03:22.160 It needs to know, explicitly, what the file is called. 00:03:22.160 --> 00:03:25.280 I'll go ahead and run clang hello.c, Enter. 00:03:25.280 --> 00:03:28.415 Nothing seems to happen, which, generally speaking, is a good thing.
00:03:28.415 --> 00:03:29.790 Because no errors have popped up. 00:03:29.790 --> 00:03:36.140 And if I do ls for list, you'll see there is not a file called hello. 00:03:36.140 --> 00:03:39.230 But there is a curiously-named file called a.out. 00:03:39.230 --> 00:03:42.620 This is a historical convention that stands for assembler output. 00:03:42.620 --> 00:03:45.380 And this is, just, the default file name for a program 00:03:45.380 --> 00:03:49.400 that you might compile yourself, manually, using clang itself. 00:03:49.400 --> 00:03:51.830 Let me go ahead now and point out that that's 00:03:51.830 --> 00:03:53.340 kind of a stupid name for a program. 00:03:53.340 --> 00:03:56.435 Even though it works, ./a.out would work. 00:03:56.435 --> 00:03:59.060 But if you actually want to customize the name of your program, 00:03:59.060 --> 00:04:02.720 we could just resort to make, or we could do explicitly 00:04:02.720 --> 00:04:03.920 what make is doing for us. 00:04:03.920 --> 00:04:06.770 It turns out, some programs, among them make, 00:04:06.770 --> 00:04:08.990 support what are called command line arguments, 00:04:08.990 --> 00:04:10.310 and more on those later today. 00:04:10.310 --> 00:04:13.670 But these are literally words or numbers that you type at your prompt 00:04:13.670 --> 00:04:17.330 after the name of a program that just influence its behavior in some way. 00:04:17.330 --> 00:04:20.040 It modifies its behavior. 00:04:20.040 --> 00:04:22.940 And it turns out, if you read the documentation for clang, 00:04:22.940 --> 00:04:28.040 you can actually pass a -o, for output, command line argument, that 00:04:28.040 --> 00:04:30.260 lets you specify explicitly what you want 00:04:30.260 --> 00:04:31.795 your outputted program to be called. 00:04:31.795 --> 00:04:34.670 And then you go ahead and type the name of the file that you actually 00:04:34.670 --> 00:04:37.110 want to compile, from source code to machine code.
00:04:37.110 --> 00:04:38.720 Let me hit Enter now. 00:04:38.720 --> 00:04:41.990 Again, nothing seems to happen, and I type ls and voila. 00:04:41.990 --> 00:04:45.010 Now we still have the old a.out, because I didn't delete it yet. 00:04:45.010 --> 00:04:46.010 And I do have hello now. 00:04:46.010 --> 00:04:50.420 So ./hello, voila, runs hello, world again. 00:04:50.420 --> 00:04:52.160 And let me go ahead and remove this file. 00:04:52.160 --> 00:04:56.593 I could, of course, resort to using the Explorer, on the left hand side. 00:04:56.593 --> 00:04:59.510 Which, I am in the habit of closing, just to give us more room to see. 00:04:59.510 --> 00:05:02.240 But I could go ahead and right-click or control-click on a.out 00:05:02.240 --> 00:05:03.365 if I want to get rid of it. 00:05:03.365 --> 00:05:06.300 Or again, let me focus on the command line interface. 00:05:06.300 --> 00:05:07.250 And I can use-- 00:05:07.250 --> 00:05:08.030 anyone recall? 00:05:08.030 --> 00:05:11.000 We didn't really use it much, but what command removes a file? 00:05:11.000 --> 00:05:12.665 AUDIENCE: rm. 00:05:12.665 --> 00:05:16.430 DAVID MALAN: So rm for remove. rm, a.out, Enter. 00:05:16.430 --> 00:05:20.060 Remove regular file, a.out, y for yes, enter. 00:05:20.060 --> 00:05:22.640 And now, if I do ls again, voila, it's gone. 00:05:22.640 --> 00:05:24.650 All right, so, let's now enhance this program 00:05:24.650 --> 00:05:30.290 to do the second version we ever did, which was to also include cs50.h, 00:05:30.290 --> 00:05:33.149 so that we have access to functions like, get string, and the like. 00:05:33.149 --> 00:05:40.340 Let me do string, name, gets, get string, what's your name, 00:05:40.340 --> 00:05:41.550 question mark. 00:05:41.550 --> 00:05:46.010 And now, let me go ahead and say hello to that name with our %s placeholder, 00:05:46.010 --> 00:05:46.920 comma, name. 
00:05:46.920 --> 00:05:49.160 So this was version 2 of our program last time, 00:05:49.160 --> 00:05:53.300 that very easily compiled with make hello, but notice the difference now. 00:05:53.300 --> 00:05:56.360 If I want to compile this thing myself with clang, using 00:05:56.360 --> 00:05:58.520 that same lesson learned, all right, let's do it. 00:05:58.520 --> 00:06:05.300 clang -o hello, just so I get a better name for the program, hello.c, Enter. 00:06:05.300 --> 00:06:09.750 And a new error pops up that some of you might have encountered on your own. 00:06:09.750 --> 00:06:13.580 So it's a bit arcane here, and there's this mention of a cryptic-looking path 00:06:13.580 --> 00:06:15.330 with temp for temporary there. 00:06:15.330 --> 00:06:18.560 But somehow, my issue's in main, as we can see here. 00:06:18.560 --> 00:06:20.257 It somehow relates to hello.c. 00:06:20.257 --> 00:06:23.090 We might not have seen this language last time in class, 00:06:23.090 --> 00:06:25.970 but there's an undefined reference to get string. 00:06:25.970 --> 00:06:27.800 As though get string doesn't exist. 00:06:27.800 --> 00:06:31.340 Now, your first instinct might be, well maybe I forgot cs50.h, but of course, 00:06:31.340 --> 00:06:32.180 I didn't. 00:06:32.180 --> 00:06:34.310 That's the very first line of my program. 00:06:34.310 --> 00:06:37.910 But it turns out, make is doing something else for us, all this time. 00:06:37.910 --> 00:06:41.930 Just putting cs50.h, or any header file at the top of your code, 00:06:41.930 --> 00:06:46.730 for that matter, just teaches the compiler that a function will exist. 00:06:46.730 --> 00:06:49.310 It, sort of, asks the compiler to-- it asks the compiler 00:06:49.310 --> 00:06:52.610 to trust that I will, eventually, get around to implementing functions, 00:06:52.610 --> 00:06:58.130 like get string in cs50.h and printf in stdio.h.
00:06:58.130 --> 00:07:03.830 But this error here, some kind of linker command, relates to the fact 00:07:03.830 --> 00:07:05.960 that there's a separate process for actually 00:07:05.960 --> 00:07:10.280 finding the 0s and 1s that CS50 compiled long ago for you. 00:07:10.280 --> 00:07:13.850 That the authors of this operating system compiled for you, long ago, 00:07:13.850 --> 00:07:14.900 in the form of printf. 00:07:14.900 --> 00:07:17.840 We need to, somehow, tell the compiler that we 00:07:17.840 --> 00:07:20.450 need to link in code that someone else wrote, 00:07:20.450 --> 00:07:23.750 the actual machine code that someone else wrote and then compiled. 00:07:23.750 --> 00:07:27.497 So to do that, you'd have to type -lcs50, for instance, 00:07:27.497 --> 00:07:28.580 at the end of the command. 00:07:28.580 --> 00:07:31.548 That additionally tells clang that not only do you want to output 00:07:31.548 --> 00:07:34.340 a file called hello and compile a file called hello.c, 00:07:34.340 --> 00:07:39.200 you also want to quote-unquote link in a bunch of 0s and 1s 00:07:39.200 --> 00:07:43.010 that collectively implement get string and printf. 00:07:43.010 --> 00:07:47.220 So now, if I hit enter, this time it compiled OK. 00:07:47.220 --> 00:07:53.142 And now if I run ./hello, it works as it did last week, just like that. 00:07:53.142 --> 00:07:56.100 But honestly, this is just going to get really tedious, really quickly. 00:07:56.100 --> 00:07:57.930 Notice, already, just to compile my code, 00:07:57.930 --> 00:08:01.417 I have to run clang -o hello hello.c -lcs50, 00:08:01.417 --> 00:08:03.500 and you're going to have to type more things, too. 00:08:03.500 --> 00:08:06.890 If you wanted to use the math library, like, to use that round function, 00:08:06.890 --> 00:08:09.440 you would also have to do -lm, typically, 00:08:09.440 --> 00:08:12.890 to specify, give me the math bits that someone else compiled.
00:08:12.890 --> 00:08:14.970 And the commands just get longer and longer. 00:08:14.970 --> 00:08:19.520 So moving forward, we won't have to resort to running clang itself, 00:08:19.520 --> 00:08:21.330 but clang is, indeed, the compiler. 00:08:21.330 --> 00:08:24.380 That is the program that converts from source code to machine code. 00:08:24.380 --> 00:08:28.438 But we'll continue to use make because it just automates that process. 00:08:28.438 --> 00:08:30.230 And the commands are only going to get more 00:08:30.230 --> 00:08:34.640 cryptic the more sophisticated and more full-featured your programs get. 00:08:34.640 --> 00:08:39.620 And make, again, is just a tool that makes all that happen. 00:08:39.620 --> 00:08:44.300 Let me pause there to see if there are any questions before we 00:08:44.300 --> 00:08:45.890 take a look further under the hood. 00:08:45.890 --> 00:08:47.185 Yeah, in front. 00:08:47.185 --> 00:08:50.185 AUDIENCE: Can you explain again what the -lcs50-- just why you put that? 00:08:50.185 --> 00:08:52.518 DAVID MALAN: Sure, let me come back to that in a moment. 00:08:52.518 --> 00:08:53.750 What does the -lcs50 mean? 00:08:53.750 --> 00:08:55.917 We'll come back to that, visually, in just a moment. 00:08:55.917 --> 00:08:58.850 But it means to link in the 0s and 1s that collectively 00:08:58.850 --> 00:09:00.435 implement get string and printf. 00:09:00.435 --> 00:09:02.060 But we'll see that, visually, in a sec. 00:09:02.060 --> 00:09:03.341 Yeah, behind you. 00:09:03.341 --> 00:09:07.073 AUDIENCE: [INAUDIBLE] 00:09:07.073 --> 00:09:08.490 DAVID MALAN: Really good question. 00:09:08.490 --> 00:09:10.850 How come I didn't have to link in standard I/O? 00:09:10.850 --> 00:09:12.950 Because I used printf in version 1. 00:09:12.950 --> 00:09:16.280 Standard I/O is just, literally, so standard that it's built in, 00:09:16.280 --> 00:09:17.480 it just works for free. 00:09:17.480 --> 00:09:18.800 CS50, of course, is not.
00:09:18.800 --> 00:09:21.080 It did not come with the language C or the compiler. 00:09:21.080 --> 00:09:22.250 We ourselves wrote it. 00:09:22.250 --> 00:09:26.600 And other libraries, even though they might come with the language C, 00:09:26.600 --> 00:09:30.600 might not be enabled by default, generally for efficiency purposes. 00:09:30.600 --> 00:09:33.470 So you're not loading more 0s and 1s into the computer's memory 00:09:33.470 --> 00:09:34.280 than you need to. 00:09:34.280 --> 00:09:37.250 So standard I/O is special, if you will. 00:09:37.250 --> 00:09:38.510 Other questions? 00:09:38.510 --> 00:09:39.500 Yeah? 00:09:39.500 --> 00:09:41.420 AUDIENCE: [INAUDIBLE] 00:09:41.420 --> 00:09:43.160 DAVID MALAN: Oh, what does the -o mean? 00:09:43.160 --> 00:09:46.190 So -o is shorthand for the English word output, 00:09:46.190 --> 00:09:51.260 and so -o is telling clang to please output a file called hello, 00:09:51.260 --> 00:09:53.850 because, recall, the command line I typed 00:09:53.850 --> 00:09:59.929 was clang -o hello, then the name of the file, then -lcs50. 00:09:59.929 --> 00:10:03.407 And this is where these commands do get and stay fairly arcane. 00:10:03.407 --> 00:10:05.240 It's just through muscle memory and practice 00:10:05.240 --> 00:10:07.610 that you'll start to remember, oh what are the other commands that you-- 00:10:07.610 --> 00:10:10.277 what are the command line arguments you can provide to programs? 00:10:10.277 --> 00:10:11.570 But we've seen this before. 00:10:11.570 --> 00:10:14.780 Technically, when you run make hello, the program is called make, 00:10:14.780 --> 00:10:16.980 hello is the command line argument. 00:10:16.980 --> 00:10:19.040 It's an input to the make program, albeit, 00:10:19.040 --> 00:10:22.250 typed at the prompt, that tells make what you want to make.
00:10:22.250 --> 00:10:26.180 Even when I used rm a moment ago, and did rm of a.out, 00:10:26.180 --> 00:10:28.280 the command line argument there was a.out 00:10:28.280 --> 00:10:30.740 and it's telling rm what to delete. 00:10:30.740 --> 00:10:35.270 It is entirely dependent on the programs to decide what their conventions are, 00:10:35.270 --> 00:10:38.090 whether you use dash this or dash that, but we'll 00:10:38.090 --> 00:10:40.805 see over time which ones actually matter in practice. 00:10:40.805 --> 00:10:46.220 So to come back to the first question about what actually is happening there, 00:10:46.220 --> 00:10:48.562 let's consider the code more closely. 00:10:48.562 --> 00:10:50.270 So here is that first version of the code 00:10:50.270 --> 00:10:54.590 again, with stdio.h and only printf, so no CS50 stuff yet, 00:10:54.590 --> 00:10:56.840 until we added it back in and had the second version, 00:10:56.840 --> 00:10:59.630 where we actually get the human's name. 00:10:59.630 --> 00:11:02.783 When you run this command, there are a few things 00:11:02.783 --> 00:11:04.700 that are happening underneath the hood, and we 00:11:04.700 --> 00:11:06.650 won't dwell on these kinds of details; indeed, 00:11:06.650 --> 00:11:08.870 we'll abstract them away by using make. 00:11:08.870 --> 00:11:10.940 But it's worth understanding from the get-go, 00:11:10.940 --> 00:11:13.880 how much automation is going on, so that when you run these commands, 00:11:13.880 --> 00:11:14.850 it's not magic. 00:11:14.850 --> 00:11:17.940 You have this bottom-up understanding of what's going on. 00:11:17.940 --> 00:11:21.530 So when we say you've been compiling your code with make, 00:11:21.530 --> 00:11:23.600 that's a bit of an oversimplification. 00:11:23.600 --> 00:11:26.780 Technically, every time you compile your code, 00:11:26.780 --> 00:11:29.570 you're having the computer do four distinct things for you.
00:11:29.570 --> 00:11:33.020 And these aren't four distinct things that you need to memorize and remember 00:11:33.020 --> 00:11:35.180 every time you run your program, what's happening, 00:11:35.180 --> 00:11:37.820 but it helps to break it down into building blocks, 00:11:37.820 --> 00:11:42.110 as to how we're getting from source code, like C, into 0s and 1s. 00:11:42.110 --> 00:11:46.640 It turns out that when you compile, quote-unquote, "your code," technically 00:11:46.640 --> 00:11:50.510 speaking, you're doing four things automatically, and all at once. 00:11:50.510 --> 00:11:53.960 Preprocessing it, compiling it, assembling it, and linking it. 00:11:53.960 --> 00:11:57.350 Humans just decided, let's just call the whole process compiling. 00:11:57.350 --> 00:12:00.230 But for a moment, let's consider what these steps are. 00:12:00.230 --> 00:12:02.690 So preprocessing refers to this. 00:12:02.690 --> 00:12:06.710 If we look at our source code, version 2 that uses the cs50 library 00:12:06.710 --> 00:12:10.442 and therefore get string, notice that we have these include lines at top. 00:12:10.442 --> 00:12:12.650 And they're kind of special versus all the other code 00:12:12.650 --> 00:12:15.710 we've written, because they start with hash symbols, specifically. 00:12:15.710 --> 00:12:17.660 And that's sort of a special syntax that means 00:12:17.660 --> 00:12:20.600 that these are, technically, called preprocessor directives. 00:12:20.600 --> 00:12:25.290 Fancy way of saying they're handled specially versus the rest of your code. 00:12:25.290 --> 00:12:29.870 In fact, if we focus on cs50.h, recall from last week 00:12:29.870 --> 00:12:35.870 that I provided a hint as to what's actually in cs50.h, among other things. 00:12:35.870 --> 00:12:40.580 What was the one salient thing that I said was in cs50.h and therefore, 00:12:40.580 --> 00:12:43.475 why we were including it in the first place? 00:12:43.475 --> 00:12:44.350 AUDIENCE: Get string?
00:12:44.350 --> 00:12:46.850 DAVID MALAN: So get string, specifically, 00:12:46.850 --> 00:12:49.160 the prototype for get string. 00:12:49.160 --> 00:12:51.410 We haven't made many of our own functions yet, 00:12:51.410 --> 00:12:53.840 but recall that any time we've made our own functions, 00:12:53.840 --> 00:12:56.330 and we've written them below main in a file, 00:12:56.330 --> 00:12:58.790 we've also had to, somewhat stupidly, copy-paste 00:12:58.790 --> 00:13:01.370 the prototype of the function at the top of the file, 00:13:01.370 --> 00:13:05.210 just to teach the compiler that this function doesn't exist yet, 00:13:05.210 --> 00:13:07.430 it does down there, but it will exist. 00:13:07.430 --> 00:13:08.300 Just trust me. 00:13:08.300 --> 00:13:10.980 So again, that's what these prototypes are doing for us. 00:13:10.980 --> 00:13:13.340 So therefore, in my code, if I want to use 00:13:13.340 --> 00:13:16.760 a function like get string, or printf, for that matter, 00:13:16.760 --> 00:13:19.150 they're clearly not implemented in the same file, 00:13:19.150 --> 00:13:20.400 they're implemented elsewhere. 00:13:20.400 --> 00:13:22.692 So I need to tell the compiler to trust me that they're 00:13:22.692 --> 00:13:24.000 implemented somewhere else. 00:13:24.000 --> 00:13:26.810 And so technically, inside of cs50.h, which 00:13:26.810 --> 00:13:30.410 is installed somewhere in the cloud's hard drive, so to speak, 00:13:30.410 --> 00:13:34.820 that you all are accessing via VS Code, there's a line that looks like this. 00:13:34.820 --> 00:13:38.870 A prototype for the get string function that says the name of the function is 00:13:38.870 --> 00:13:42.830 get string, it takes one input, or argument, called prompt, 00:13:42.830 --> 00:13:45.710 and the type of that prompt is a string. 00:13:45.710 --> 00:13:51.150 Get string, not surprisingly, has a return value and it returns a string.
00:13:51.150 --> 00:13:54.800 So literally, that line and a bunch of others are in cs50.h. 00:13:54.800 --> 00:13:58.280 So rather than you all having to copy-paste the prototype, 00:13:58.280 --> 00:14:01.160 you can just trust that CS50 figured out what it is. 00:14:01.160 --> 00:14:04.970 You can include cs50.h and the compiler is going 00:14:04.970 --> 00:14:07.420 to go find that prototype for you. 00:14:07.420 --> 00:14:09.480 Same thing in standard I/O. Someone else-- what 00:14:09.480 --> 00:14:13.620 must clearly be in stdio.h, among other stuff, that 00:14:13.620 --> 00:14:17.590 motivates our including stdio.h, too? 00:14:17.590 --> 00:14:18.090 Yeah? 00:14:18.090 --> 00:14:18.798 AUDIENCE: Printf. 00:14:18.798 --> 00:14:21.030 DAVID MALAN: Printf, the prototype for printf, 00:14:21.030 --> 00:14:24.010 and I'll just change it here in yellow, to be the same. 00:14:24.010 --> 00:14:25.410 And it turns out, the format-- 00:14:25.410 --> 00:14:28.590 the prototype for printf is, actually, pretty fancy, 00:14:28.590 --> 00:14:31.740 because, as you might have noticed, printf can take one argument, just 00:14:31.740 --> 00:14:35.910 something to print, 2, if you want to plug a value into it, 3 or more. 00:14:35.910 --> 00:14:38.620 So the dot dot dot just represents exactly that. 00:14:38.620 --> 00:14:42.330 It's not quite as simple a prototype as get string, but more on that 00:14:42.330 --> 00:14:43.115 another time. 00:14:43.115 --> 00:14:46.050 So what does it mean to preprocess your code? 00:14:46.050 --> 00:14:49.860 The very first thing the compiler, clang, in this case, 00:14:49.860 --> 00:14:54.270 is doing for you when it reads your code top-to-bottom, left-to-right, is it 00:14:54.270 --> 00:14:57.960 notices, oh, here is hash include, oh, here's another hash include.
00:14:57.960 --> 00:15:03.090 And it, essentially, finds those files on the hard drive, cs50.h, stdio.h, 00:15:03.090 --> 00:15:06.990 and does the equivalent of copying and pasting them automatically 00:15:06.990 --> 00:15:09.360 into your code at the very top. 00:15:09.360 --> 00:15:12.450 Thereby teaching the compiler that get string and printf 00:15:12.450 --> 00:15:14.430 will eventually exist somewhere. 00:15:14.430 --> 00:15:18.480 So that's the preprocessing step, whereby, again, it's 00:15:18.480 --> 00:15:22.080 just doing a find-and-replace of anything that starts with hash include. 00:15:22.080 --> 00:15:24.510 It's plugging in the files there so that you, essentially, 00:15:24.510 --> 00:15:27.780 get all the prototypes you need automatically. 00:15:27.780 --> 00:15:28.830 OK. 00:15:28.830 --> 00:15:31.230 What does it mean, then, to compile the results? 00:15:31.230 --> 00:15:33.450 Because at this point in the story, your code 00:15:33.450 --> 00:15:35.678 now looks like this in the computer's memory. 00:15:35.678 --> 00:15:37.470 It doesn't change your file; it's doing all 00:15:37.470 --> 00:15:39.990 of this in the computer's memory, or RAM, for you. 00:15:39.990 --> 00:15:42.070 But it, essentially, looks like this. 00:15:42.070 --> 00:15:45.600 Well, the next step is what's, technically, really compiling. 00:15:45.600 --> 00:15:48.420 Even though again, we use compile as an umbrella term. 00:15:48.420 --> 00:15:51.510 Compiling code in C means to take code that 00:15:51.510 --> 00:15:53.740 now looks like this in the computer's memory 00:15:53.740 --> 00:15:56.890 and turn it into something that looks like this. 00:15:56.890 --> 00:15:58.350 Which is way more cryptic.
00:15:58.350 --> 00:16:00.990 But it was just a few decades ago that, if you 00:16:00.990 --> 00:16:03.930 were taking a class like CS50 in its earlier form, 00:16:03.930 --> 00:16:07.740 we wouldn't be using C, because it didn't exist yet; we would actually be using this, 00:16:07.740 --> 00:16:09.690 something called assembly language. 00:16:09.690 --> 00:16:13.230 And there are different types of, or flavors of, assembly language. 00:16:13.230 --> 00:16:17.010 But this is about as low level as you can get to what a computer really 00:16:17.010 --> 00:16:19.410 understands, be it a Mac, or PC, or a phone, 00:16:19.410 --> 00:16:22.650 before you start getting into actual 0s and 1s. 00:16:22.650 --> 00:16:24.013 And most of this is cryptic. 00:16:24.013 --> 00:16:27.180 I couldn't tell you what this is doing unless I thought it through carefully 00:16:27.180 --> 00:16:30.300 and rewound mentally, years ago, from having studied it, 00:16:30.300 --> 00:16:32.880 but let's highlight a few keywords in yellow. 00:16:32.880 --> 00:16:37.380 Notice that this assembly language that the computer is outputting 00:16:37.380 --> 00:16:40.530 for you automatically still has mention of main 00:16:40.530 --> 00:16:43.290 and it has mention of get string, and it has mention of printf. 00:16:43.290 --> 00:16:46.358 So there's some relationship to the C code we saw a moment ago. 00:16:46.358 --> 00:16:48.150 And then if I highlight these other things, 00:16:48.150 --> 00:16:50.430 these are what are called computer instructions. 00:16:50.430 --> 00:16:52.740 At the end of the day, your Mac, your PC, 00:16:52.740 --> 00:16:56.340 your phone actually only understands very basic instructions, 00:16:56.340 --> 00:17:01.020 like addition, subtraction, division, multiplication, move into memory, 00:17:01.020 --> 00:17:06.190 load from memory, print something to the screen, very basic operations. 00:17:06.190 --> 00:17:07.755 And that's what you're seeing here.
00:17:07.755 --> 00:17:12.750 These assembly instructions are what the computer actually 00:17:12.750 --> 00:17:16.870 feeds into the brains of the computer, the CPU, the central processing unit. 00:17:16.870 --> 00:17:19.770 And it's that Intel CPU, or whatever you have, 00:17:19.770 --> 00:17:23.220 that understands this instruction, and this one, and this one, and this one. 00:17:23.220 --> 00:17:25.860 And collectively, long story short, all they do 00:17:25.860 --> 00:17:28.620 is print hello, world on the screen, but in a way 00:17:28.620 --> 00:17:31.910 that the machine understands how to do. 00:17:31.910 --> 00:17:34.500 So let me pause here. 00:17:34.500 --> 00:17:37.010 Are there any questions on what we mean by preprocessing? 00:17:37.010 --> 00:17:40.850 Which finds and replaces the hash include lines, among others, 00:17:40.850 --> 00:17:44.450 and compiling, which technically takes your source code, 00:17:44.450 --> 00:17:48.170 once preprocessed, and converts it to that stuff called assembly language. 00:17:48.170 --> 00:17:50.342 AUDIENCE: [INAUDIBLE] each CPU has-- 00:17:50.342 --> 00:17:51.290 DAVID MALAN: Correct. 00:17:51.290 --> 00:17:54.710 Each type of CPU has its own instruction set. 00:17:54.710 --> 00:17:55.280 Indeed. 00:17:55.280 --> 00:17:58.970 And as a teaser, this is why, at least back in the day, when 00:17:58.970 --> 00:18:02.900 we used to install software from CD-ROMs, or some other type of media, 00:18:02.900 --> 00:18:08.222 this is why you can't take a program that was sold for a Windows computer 00:18:08.222 --> 00:18:09.680 and run it on a Mac, or vice versa. 00:18:09.680 --> 00:18:14.420 Because the commands, the instructions that those two products understand, 00:18:14.420 --> 00:18:15.500 are actually different.
00:18:15.500 --> 00:18:20.150 Now Microsoft, or any company, could generally write code in one language, 00:18:20.150 --> 00:18:24.109 like C or another, and they can compile it twice, saving a PC version 00:18:24.109 --> 00:18:25.790 and saving a Mac version. 00:18:25.790 --> 00:18:30.109 It's twice as much work and sometimes you get into some incompatibilities, 00:18:30.109 --> 00:18:33.140 but that's why these steps are somewhat distinct. 00:18:33.140 --> 00:18:36.710 You can now use the same code and support even different platforms, 00:18:36.710 --> 00:18:37.940 or systems, if you'd want. 00:18:37.940 --> 00:18:38.440 All right. 00:18:38.440 --> 00:18:39.650 Assembly, assembling. 00:18:39.650 --> 00:18:42.800 Thankfully, this part is fairly straightforward, at least, in concept. 00:18:42.800 --> 00:18:46.250 To assemble code, which is step three of four, that is just 00:18:46.250 --> 00:18:50.360 happening for you every time you run make or, in turn, clang, 00:18:50.360 --> 00:18:53.570 this assembly language, which the computer generated automatically 00:18:53.570 --> 00:18:57.080 for you from your source code, is turned into 0s and 1s. 00:18:57.080 --> 00:19:00.783 So that's the step that, last week, I simplified and said, 00:19:00.783 --> 00:19:03.950 when you compile your code, you convert it to source code-- from source code 00:19:03.950 --> 00:19:04.970 to machine code. 00:19:04.970 --> 00:19:07.685 Technically, that happens when you assemble your code. 00:19:07.685 --> 00:19:10.940 But no one in normal conversations says that, they just 00:19:10.940 --> 00:19:13.280 say compile for all of these terms. 00:19:13.280 --> 00:19:14.310 All right. 00:19:14.310 --> 00:19:17.450 So that's assembling. 00:19:17.450 --> 00:19:19.070 There's one final step. 
00:19:19.070 --> 00:19:22.400 Even in this simple program of getting the user's name 00:19:22.400 --> 00:19:27.120 and then plugging it into printf, I'm using three different people's code, 00:19:27.120 --> 00:19:27.620 if you will. 00:19:27.620 --> 00:19:30.200 My own, which is in hello.c. 00:19:30.200 --> 00:19:35.600 Some of CS50's, which is in hello.c, sorry-- which 00:19:35.600 --> 00:19:39.080 is in cs50.c, which is not a file I've mentioned yet, 00:19:39.080 --> 00:19:43.220 but it stands to reason that if there's a cs50.h that has prototypes, 00:19:43.220 --> 00:19:45.380 it turns out the actual implementation of get string 00:19:45.380 --> 00:19:47.600 and other things is in cs50.c. 00:19:47.600 --> 00:19:51.290 And there's a third file somewhere on the hard drive 00:19:51.290 --> 00:19:54.260 that's involved in compiling even this simple program. 00:19:54.260 --> 00:19:59.971 hello.c, cs50.c, and by that logic, what might the other be? 00:19:59.971 --> 00:20:00.471 Yeah? 00:20:00.471 --> 00:20:02.275 AUDIENCE: stdio? 00:20:02.275 --> 00:20:03.600 DAVID MALAN: stdio.c. 00:20:03.600 --> 00:20:06.690 And that's a bit of a white lie, because that's such a big, fancy library 00:20:06.690 --> 00:20:09.750 that there are actually multiple files that compose it, but the same idea, 00:20:09.750 --> 00:20:11.380 and we'll take the simplification. 00:20:11.380 --> 00:20:16.200 So when I have this code, and I compile my code, 00:20:16.200 --> 00:20:21.300 I get those 0s and 1s that end up taking hello.c and turning it, effectively, 00:20:21.300 --> 00:20:26.830 into 0s and 1s that are combined with cs50.c, followed by stdio.c as well. 00:20:26.830 --> 00:20:27.840 So let me rewind here. 00:20:27.840 --> 00:20:33.300 Here might be the 0s and 1s for my code, the two lines of code that I wrote. 00:20:33.300 --> 00:20:37.920 Here might be the 0s and 1s for what CS50 wrote some years ago in cs50.c.
00:20:37.920 --> 00:20:42.210 Here might be the 0s and 1s that someone wrote for standard I/O decades ago. 00:20:42.210 --> 00:20:45.720 The last and final step is that linking command 00:20:45.720 --> 00:20:48.330 that links all of these 0s and 1s together, 00:20:48.330 --> 00:20:53.820 essentially stitching them together into one single file called hello, 00:20:53.820 --> 00:20:56.385 or called a.out, whatever you name it. 00:20:56.385 --> 00:21:01.650 That last step is what combines all of these different programmers' 0s and 1s. 00:21:01.650 --> 00:21:04.050 And my God, now we're really in the weeds. 00:21:04.050 --> 00:21:07.020 Who wants to even think about running code at this level? 00:21:07.020 --> 00:21:08.160 You shouldn't need to. 00:21:08.160 --> 00:21:09.180 But it's not magic. 00:21:09.180 --> 00:21:11.748 When you're running make, there are some very concrete steps 00:21:11.748 --> 00:21:14.290 that are happening that humans have developed over the years, 00:21:14.290 --> 00:21:17.700 over the decades, that break down this big problem of source code going 00:21:17.700 --> 00:21:22.410 to 0s and 1s, or machine code, into these very specific steps. 00:21:22.410 --> 00:21:26.100 But henceforth, you can call all of this compiling. 00:21:26.100 --> 00:21:27.120 Questions? 00:21:27.120 --> 00:21:27.780 Or confusion? 00:21:27.780 --> 00:21:28.596 Yeah? 00:21:28.596 --> 00:21:30.804 AUDIENCE: Can you explain again what a.out signifies? 00:21:30.804 --> 00:21:31.770 DAVID MALAN: Sure. 00:21:31.770 --> 00:21:33.270 What does a.out signify? 00:21:33.270 --> 00:21:37.890 a.out is just the conventional, default file name for any program 00:21:37.890 --> 00:21:41.280 that you compile directly with a compiler, like clang. 00:21:41.280 --> 00:21:43.680 It's a meaningless name, though. 00:21:43.680 --> 00:21:47.250 It stands for assembler output, and assembler might now sound familiar 00:21:47.250 --> 00:21:48.690 from this assembling process.
00:21:48.690 --> 00:21:51.150 It's a lame name for a computer program, and we 00:21:51.150 --> 00:21:56.450 can override it by outputting something like hello, instead. 00:21:56.450 --> 00:21:57.317 Yeah? 00:21:57.317 --> 00:22:03.426 AUDIENCE: [INAUDIBLE] 00:22:03.426 --> 00:22:07.860 DAVID MALAN: To recap, there are other prototypes in those files, 00:22:07.860 --> 00:22:11.910 cs50.h, stdio.h, technically, they're all included on top of your file, 00:22:11.910 --> 00:22:14.460 even though you, strictly speaking, don't need most of them, 00:22:14.460 --> 00:22:18.190 but they are there, just in case you might want them. 00:22:18.190 --> 00:22:19.660 And finally, any other questions? 00:22:19.660 --> 00:22:20.160 Yeah? 00:22:20.160 --> 00:22:23.878 AUDIENCE: [INAUDIBLE] 00:22:23.878 --> 00:22:26.920 DAVID MALAN: Does it matter what order we're telling the computer to run? 00:22:26.920 --> 00:22:29.140 Sometimes with libraries, yes, it matters 00:22:29.140 --> 00:22:31.520 what order they are linked in together. 00:22:31.520 --> 00:22:34.330 But for our purposes, it's really not going to matter. 00:22:34.330 --> 00:22:38.750 It's going to-- make is going to take care of automating that process for us. 00:22:38.750 --> 00:22:39.250 All right. 00:22:39.250 --> 00:22:41.795 So with that said, henceforth, compiling, technically, 00:22:41.795 --> 00:22:42.670 is these four things. 00:22:42.670 --> 00:22:46.690 But we'll focus on it as a higher level concept, an abstraction, 00:22:46.690 --> 00:22:49.880 known as compiling itself. 00:22:49.880 --> 00:22:52.510 So another process that we'll now begin to focus on all the 00:22:52.510 --> 00:22:55.690 more this week because, invariably, this past week you ran against-- 00:22:55.690 --> 00:22:57.160 ran up against some challenges. 00:22:57.160 --> 00:23:00.550 You probably created your very first bugs, or mistakes, in a program 00:23:00.550 --> 00:23:03.940 and so let's focus for a moment on actual techniques for debugging. 
00:23:03.940 --> 00:23:07.060 As you spend more time this semester, and in the years 00:23:07.060 --> 00:23:10.270 to come if you continue to program, you're never, frankly, probably, 00:23:10.270 --> 00:23:13.577 going to write bug-free code, ultimately. 00:23:13.577 --> 00:23:16.660 Your programs are going to get more featureful, more sophisticated, 00:23:16.660 --> 00:23:20.230 and we're all going to start to make more sophisticated mistakes. 00:23:20.230 --> 00:23:22.570 And to this day, I write buggy code all the time. 00:23:22.570 --> 00:23:24.520 And I'm always horrified when I do it up here. 00:23:24.520 --> 00:23:26.620 But hopefully, that won't happen too often. 00:23:26.620 --> 00:23:30.100 But when it does, it's a process, now, of debugging, trying 00:23:30.100 --> 00:23:32.230 to find the mistakes in your program. 00:23:32.230 --> 00:23:35.600 You don't have to stare at your code, or shake your fist at your code. 00:23:35.600 --> 00:23:38.590 There are actual tools that real-world programmers 00:23:38.590 --> 00:23:41.860 use to help debug their code and find these faults. 00:23:41.860 --> 00:23:44.455 So what are some of the techniques and tools that folks use? 00:23:44.455 --> 00:23:49.440 Well, as an aside, 00:23:49.440 --> 00:23:52.840 calling a mistake in a program a bug has been around for some time. 00:23:52.840 --> 00:23:58.010 You may have heard this tale from some 50-plus years ago, in 1947. 00:23:58.010 --> 00:24:02.770 This is an entry in a logbook written by a famous computer scientist 00:24:02.770 --> 00:24:05.230 named Grace Hopper, who happened to be the one 00:24:05.230 --> 00:24:09.345 to record the very first discovery of a quote-unquote actual bug in a computer.
00:24:09.345 --> 00:24:11.860 This was a moth that had flown into, 00:24:11.860 --> 00:24:17.080 at the time, a very sophisticated system known as the Harvard Mark II computer, 00:24:17.080 --> 00:24:20.050 one of those very large, refrigerator-sized systems, 00:24:20.050 --> 00:24:24.160 in which an actual bug caused an issue. 00:24:24.160 --> 00:24:27.190 The etymology of bug, though, predates this particular instance, 00:24:27.190 --> 00:24:30.580 but here you have, as any computer scientist might know, the example 00:24:30.580 --> 00:24:32.845 of a first physical bug in a computer. 00:24:32.845 --> 00:24:35.322 How, though, do you go about removing such a thing? 00:24:35.322 --> 00:24:37.780 Well, let's consider a very simple scenario from last time, 00:24:37.780 --> 00:24:40.780 for instance, when we were trying to print out various aspects of Mario, 00:24:40.780 --> 00:24:42.970 like this column of 3 bricks. 00:24:42.970 --> 00:24:46.660 Let's consider how I might go about implementing a program like this. 00:24:46.660 --> 00:24:51.130 Let me switch back over to VS Code here, and I'm going to run-- 00:24:51.130 --> 00:24:52.750 write a program. 00:24:52.750 --> 00:24:54.640 And I'm not going to trust myself, so I'm 00:24:54.640 --> 00:24:56.507 going to call it buggy.c from the get-go, 00:24:56.507 --> 00:24:58.340 knowing that I'm going to mess something up. 00:24:58.340 --> 00:25:01.150 But I'm going to go ahead and include stdio.h. 00:25:01.150 --> 00:25:03.940 And I'm going to define main, as usual. 00:25:03.940 --> 00:25:05.950 So hopefully, no mistakes just yet. 00:25:05.950 --> 00:25:08.710 And now, I want to print those 3 bricks on the screen using 00:25:08.710 --> 00:25:10.270 just hashes for bricks. 00:25:10.270 --> 00:25:16.420 So how about for int i get 0, i less than or equal to 3, i plus plus.
00:25:16.420 --> 00:25:18.280 Now, inside of my curly braces, I'm going 00:25:18.280 --> 00:25:23.960 to go ahead and print out a hash followed by a backslash n, semicolon. 00:25:23.960 --> 00:25:27.975 All right, saving the file, doing make buggy, Enter, it compiles. 00:25:27.975 --> 00:25:33.340 So there are no syntax errors; my code is syntactically correct. 00:25:33.340 --> 00:25:36.640 But some of you have probably seen the logical error already, 00:25:36.640 --> 00:25:39.370 because when I run this program I don't get 00:25:39.370 --> 00:25:45.430 this picture, which was 3 bricks high; I seem to have 4 bricks instead. 00:25:45.430 --> 00:25:47.930 Now, this might be jumping out at you, why it's happening, 00:25:47.930 --> 00:25:49.930 but I've kept the program simple just so that we 00:25:49.930 --> 00:25:54.010 don't have to find an actual bug; we can use a tool to find one that we already 00:25:54.010 --> 00:25:55.970 know about, in this case. 00:25:55.970 --> 00:25:59.050 What might be the first strategy for finding a bug like this, 00:25:59.050 --> 00:26:03.292 rather than staring at your code, asking a question, trying to think 00:26:03.292 --> 00:26:04.125 through the problem? 00:26:04.125 --> 00:26:07.690 Well, let's actually try to diagnose the problem more proactively. 00:26:07.690 --> 00:26:10.420 And the simplest way to do this now, and years from now, 00:26:10.420 --> 00:26:13.870 is, honestly, going to be to use a function like printf. 00:26:13.870 --> 00:26:15.790 Printf is a wonderfully useful function, not 00:26:15.790 --> 00:26:18.550 just for formatting-- printing formatted strings and all that, but for 00:26:18.550 --> 00:26:21.430 just looking inside the values of variables 00:26:21.430 --> 00:26:24.352 that you might be curious about to see what's going on. 00:26:24.352 --> 00:26:25.060 So you know what? 00:26:25.060 --> 00:26:26.320 Let me do this. 00:26:26.320 --> 00:26:29.110 I see that there are 4 coming out, but I intended 3.
00:26:29.110 --> 00:26:31.740 So clearly, something's wrong with my i variable. 00:26:31.740 --> 00:26:34.090 So let me be a little more pedantic. 00:26:34.090 --> 00:26:37.300 Let me go inside of this loop and, temporarily, 00:26:37.300 --> 00:26:40.480 say something explicit, like, "i is 00:26:40.480 --> 00:26:45.200 %i\n", and then just plug in the value of i. 00:26:45.200 --> 00:26:45.700 Right? 00:26:45.700 --> 00:26:48.970 This is not the program I want to write; it's the program I'm temporarily 00:26:48.970 --> 00:26:54.400 writing, because now I'm going to say make buggy, ./buggy. 00:26:54.400 --> 00:26:56.500 And if I look, now, at the output, I have 00:26:56.500 --> 00:27:01.090 some helpful diagnostic information. i is 0, and I get a hash, i is 1, 00:27:01.090 --> 00:27:03.610 and I get a hash, 2 and I get a hash, 3 and I get a hash. 00:27:03.610 --> 00:27:04.527 OK, wait a minute. 00:27:04.527 --> 00:27:06.610 I'm clearly going too many steps because, maybe, I 00:27:06.610 --> 00:27:09.250 forgot that computers are, essentially, counting from 0, 00:27:09.250 --> 00:27:11.450 and now, oh, it's less than or equal to. 00:27:11.450 --> 00:27:13.030 Now you see it, right? 00:27:13.030 --> 00:27:15.940 Again, a trivial example, but just by using printf, 00:27:15.940 --> 00:27:18.910 you can see inside of the computer's memory 00:27:18.910 --> 00:27:21.130 by just printing stuff out like this. 00:27:21.130 --> 00:27:25.770 And now, once you've figured it out, oh, so this should probably be less than 3, 00:27:25.770 --> 00:27:28.140 or I should start counting from 1, there are 00:27:28.140 --> 00:27:29.640 any number of ways I could fix this. 00:27:29.640 --> 00:27:32.655 But the most conventional is probably just to say less than 3. 00:27:32.655 --> 00:27:39.180 Now, I can delete my temporary print statement, rerun make buggy, ./buggy. 00:27:39.180 --> 00:27:41.790 And, voila, problem solved. 00:27:41.790 --> 00:27:43.830 All right, and to this day, I do this.
00:27:43.830 --> 00:27:46.860 Whether it's making a command-line application, or a web application, 00:27:46.860 --> 00:27:49.050 or a mobile application, it's very common to use 00:27:49.050 --> 00:27:51.270 printf, or some equivalent in any language, 00:27:51.270 --> 00:27:55.350 just to poke around and see what's inside the computer's memory. 00:27:55.350 --> 00:27:58.570 Thankfully, there are more sophisticated tools than this. 00:27:58.570 --> 00:28:00.930 Let me go ahead and reintroduce the bug here. 00:28:00.930 --> 00:28:04.620 And let me reopen my sidebar at left here. 00:28:04.620 --> 00:28:08.550 Let me now recompile the code to make sure it's current. 00:28:08.550 --> 00:28:11.310 And I'm going to run a command called debug50, 00:28:11.310 --> 00:28:15.090 which is a command that's representative of a type of program 00:28:15.090 --> 00:28:16.740 known as a debugger. 00:28:16.740 --> 00:28:19.680 And this debugger is actually built into VS Code. 00:28:19.680 --> 00:28:23.700 And all debug50 is doing for us is automating the process of starting 00:28:23.700 --> 00:28:25.650 VS Code's built-in debugger. 00:28:25.650 --> 00:28:28.260 So this isn't even a CS50-specific tool; we've 00:28:28.260 --> 00:28:31.170 just given you a debug50 command to make it easier 00:28:31.170 --> 00:28:32.855 to start it up from the get-go. 00:28:32.855 --> 00:28:37.560 And the way you run this debugger is you say debug50, space, and then 00:28:37.560 --> 00:28:40.120 the name of the program that you want to debug. 00:28:40.120 --> 00:28:42.210 So, in this case, ./buggy. 00:28:42.210 --> 00:28:44.010 So you don't mention your .c file. 00:28:44.010 --> 00:28:46.650 You mention your already-compiled code. 00:28:46.650 --> 00:28:52.230 And what this debugger is going to let me do is, most powerfully, 00:28:52.230 --> 00:28:54.930 walk through my code step-by-step.
00:28:54.930 --> 00:28:58.930 Because every program we've written thus far runs from start to finish, 00:28:58.930 --> 00:29:02.325 even if I'm not done thinking through each step at a time. 00:29:02.325 --> 00:29:05.850 With a debugger, I can actually click on a line number 00:29:05.850 --> 00:29:09.180 and say pause execution here, and the debugger 00:29:09.180 --> 00:29:14.130 will let me walk through my code one step at a time, one second at a time, 00:29:14.130 --> 00:29:16.740 one minute at a time, at my own human pace. 00:29:16.740 --> 00:29:19.470 Which is super compelling when the programs get more complicated 00:29:19.470 --> 00:29:22.600 and they might, otherwise, fly by on the screen. 00:29:22.600 --> 00:29:25.860 So I'm going to click to the left of line 5. 00:29:25.860 --> 00:29:27.970 And notice that these little red dots appear. 00:29:27.970 --> 00:29:31.290 And if I click on one, it stays and gets even redder. 00:29:31.290 --> 00:29:34.230 And I'm going to run debug50 on ./buggy. 00:29:34.230 --> 00:29:39.090 And in just a moment, you'll see that a new panel opens on the left-hand side. 00:29:39.090 --> 00:29:41.910 It's doing some configuration of the screen. 00:29:41.910 --> 00:29:46.690 Let me zoom out a little bit here so we can see more on the screen at once. 00:29:46.690 --> 00:29:50.440 And sometimes, you'll see in VS Code that the debug console opens up, 00:29:50.440 --> 00:29:54.480 which looks very cryptic; just go back to the terminal window if that happens, 00:29:54.480 --> 00:29:57.875 because the terminal window is where you can still interact with your code. 00:29:57.875 --> 00:30:00.120 And let's now take a look at what's going on. 00:30:00.120 --> 00:30:04.650 If I zoom in on my buggy.c code here, you'll 00:30:04.650 --> 00:30:10.890 notice that we have the same program as before, but highlighted in yellow 00:30:10.890 --> 00:30:11.820 is line 5.
00:30:11.820 --> 00:30:15.660 Not a coincidence, that's the line I set a so-called breakpoint at. 00:30:15.660 --> 00:30:20.400 The little red dot means break here, pause execution here. 00:30:20.400 --> 00:30:23.716 And the yellow line has not yet been executed. 00:30:23.716 --> 00:30:27.600 But if I, now, at the top of my screen, notice these little arrows. 00:30:27.600 --> 00:30:28.750 There's one for Play. 00:30:28.750 --> 00:30:30.750 There's one for this, which, if I hover over it, 00:30:30.750 --> 00:30:34.140 says Step Over, there's another that's going to say Step Into, 00:30:34.140 --> 00:30:35.820 there's a third that says Step Out. 00:30:35.820 --> 00:30:38.520 I'm just going to use the first of these, Step Over. 00:30:38.520 --> 00:30:41.580 And I'm going to do this, and you'll see that the yellow highlight 00:30:41.580 --> 00:30:45.660 moved from line 5 to line 7 because now it's ready, 00:30:45.660 --> 00:30:47.955 but hasn't yet printed out that hash. 00:30:47.955 --> 00:30:51.817 But the most powerful thing here, notice, is that top left here. 00:30:51.817 --> 00:30:54.150 It's a little cryptic, because there's a bunch of things 00:30:54.150 --> 00:30:56.910 going on that will make more sense over time, but at the top 00:30:56.910 --> 00:30:58.470 there's a section called variables. 00:30:58.470 --> 00:31:00.750 Below that, something called locals, which means 00:31:00.750 --> 00:31:02.820 local to my current function, main. 00:31:02.820 --> 00:31:07.410 And notice, there's my variable called i, and its current value is 0. 00:31:07.410 --> 00:31:12.810 So now, once I click Step Over again, watch what happens. 00:31:12.810 --> 00:31:15.660 We go from line 7 back to line 5. 00:31:15.660 --> 00:31:19.455 But look in the terminal window, one of the hashes has printed. 00:31:19.455 --> 00:31:22.050 But now, it's printed at my own pace. 00:31:22.050 --> 00:31:24.030 I can think through this step-by-step. 
00:31:24.030 --> 00:31:26.340 Notice that i has not changed yet. 00:31:26.340 --> 00:31:29.700 It's still 0 because the yellow highlighted line hasn't yet executed. 00:31:29.700 --> 00:31:34.140 But the moment I click Step Over, it's going to execute line 5. 00:31:34.140 --> 00:31:41.010 Now, notice at top left, i has become 1, and nothing has printed yet, 00:31:41.010 --> 00:31:43.290 because now, highlighted is line 7. 00:31:43.290 --> 00:31:48.000 So if I click Step Over again, we'll see the hash. 00:31:48.000 --> 00:31:51.930 If I repeat this process at my own human, comfortable pace, 00:31:51.930 --> 00:31:57.040 I can see my variables changing, I can see output changing on the screen, 00:31:57.040 --> 00:31:59.902 and I can just think about whether that should have just happened. 00:31:59.902 --> 00:32:01.860 I can pause and give thought to what's actually 00:32:01.860 --> 00:32:06.240 going on without trying to race the computer and figure it all out at once. 00:32:06.240 --> 00:32:08.490 I'm going to go ahead and stop here because we already 00:32:08.490 --> 00:32:11.430 know what this particular problem is, and that brings me back 00:32:11.430 --> 00:32:12.720 to my default terminal window. 00:32:12.720 --> 00:32:16.180 But this debugger-- let me disable the breakpoint now 00:32:16.180 --> 00:32:18.570 so it doesn't keep breaking-- this debugger 00:32:18.570 --> 00:32:20.760 will be your friend moving forward in order 00:32:20.760 --> 00:32:25.290 to step through your code step-by-step, at your own pace, to figure out 00:32:25.290 --> 00:32:26.820 where something has gone wrong. 00:32:26.820 --> 00:32:30.397 Printf is great, but it gets annoying if you have to constantly add print this, 00:32:30.397 --> 00:32:33.480 print this, print this, print this, recompile, rerun it, oh, wait a minute, 00:32:33.480 --> 00:32:34.980 print this, print this. 00:32:34.980 --> 00:32:39.780 The debugger lets you do the equivalent, but automatically.
00:32:39.780 --> 00:32:45.960 Questions on this debugger, which you'll see all the more hands-on over time? 00:32:45.960 --> 00:32:47.430 Questions on debugger? 00:32:47.430 --> 00:32:48.554 Yeah? 00:32:48.554 --> 00:32:50.560 AUDIENCE: You were using a Step Over feature. 00:32:50.560 --> 00:32:53.303 What do the other features in the debugger-- 00:32:53.303 --> 00:32:54.720 DAVID MALAN: Really good question. 00:32:54.720 --> 00:32:57.720 We'll see this before long, but those other buttons that I glossed over, 00:32:57.720 --> 00:33:02.460 step into and step out of, actually let you step into specific functions 00:33:02.460 --> 00:33:04.200 if I had any more than main. 00:33:04.200 --> 00:33:06.960 So if main called a function called something, 00:33:06.960 --> 00:33:10.380 and something called a function called something else, instead of just 00:33:10.380 --> 00:33:14.730 stepping over the entire execution of that function, I could step into it 00:33:14.730 --> 00:33:17.105 and walk through its lines of code one by one. 00:33:17.105 --> 00:33:19.020 So any time you have a problem set you're 00:33:19.020 --> 00:33:22.140 working on that has multiple functions, you can set a breakpoint in main, 00:33:22.140 --> 00:33:26.250 if you want, or you can set it inside of one of your additional functions 00:33:26.250 --> 00:33:29.130 to focus your attention only on that. 00:33:29.130 --> 00:33:32.640 And we'll see examples of that over time. 00:33:32.640 --> 00:33:33.780 All right, so what else? 00:33:33.780 --> 00:33:38.100 And what's the sort of, elephant in the room, so to speak, 00:33:38.100 --> 00:33:39.750 is actually a duck in this case. 00:33:39.750 --> 00:33:42.160 Why is there this duck and all of these ducks here? 00:33:42.160 --> 00:33:46.440 Well, it turns out, a third, genuinely recommended, debugging technique 00:33:46.440 --> 00:33:50.055 is talking through problems, talking through code with someone else. 
00:33:50.055 --> 00:33:52.620 Now, in the absence of having a family member, or a friend, 00:33:52.620 --> 00:33:56.520 or a roommate who actually wants to hear you talk about code, of all things, 00:33:56.520 --> 00:34:01.320 generally, programmers turn to a rubber duck, or other inanimate objects 00:34:01.320 --> 00:34:03.360 if something animate is not available. 00:34:03.360 --> 00:34:06.760 The idea behind rubber duck debugging, so to speak, 00:34:06.760 --> 00:34:12.750 is that simply by looking at your code and talking it through, OK, on line 3, 00:34:12.750 --> 00:34:17.040 I'm starting a for loop and I'm initializing i to 0. 00:34:17.040 --> 00:34:18.990 OK, then, I'm printing out a hash. 00:34:18.990 --> 00:34:24.112 Just talking through your code, step-by-step, invariably 00:34:24.112 --> 00:34:26.820 finds you having the proverbial light bulb go off over your head, 00:34:26.820 --> 00:34:29.040 because you realize, wait a minute, I just said something stupid, 00:34:29.040 --> 00:34:30.510 or I just said something wrong. 00:34:30.510 --> 00:34:34.500 And this is really just a proxy for any other human, teaching fellow, teacher, 00:34:34.500 --> 00:34:36.060 friend, or colleague. 00:34:36.060 --> 00:34:38.440 But in the absence of any of those people in the room, 00:34:38.440 --> 00:34:40.357 you're welcome to take, on your way out today, 00:34:40.357 --> 00:34:44.280 one of these little rubber ducks and consider using it, for real, any time 00:34:44.280 --> 00:34:47.820 you want to talk through one of your problems in CS50, 00:34:47.820 --> 00:34:49.140 or maybe life more generally. 00:34:49.140 --> 00:34:51.480 But having it there on your desk is just a way 00:34:51.480 --> 00:34:55.140 to help you hear illogic in what you think 00:34:55.140 --> 00:34:57.790 might, otherwise, be logical code.
00:34:57.790 --> 00:35:02.400 So printf, the debugger, and rubber duck debugging are just three of the ways, 00:35:02.400 --> 00:35:05.207 you'll see over time, to get to the source of mistakes 00:35:05.207 --> 00:35:06.790 in the code that you will write. 00:35:06.790 --> 00:35:08.880 Mistakes are going to happen, but these tools will empower you 00:35:08.880 --> 00:35:12.000 all the more to solve them. 00:35:12.000 --> 00:35:17.440 All right, any questions on debugging, in general, or these three techniques? 00:35:17.440 --> 00:35:17.940 Yeah? 00:35:17.940 --> 00:35:19.740 AUDIENCE: [INAUDIBLE] 00:35:19.740 --> 00:35:22.650 DAVID MALAN: What's the difference between Step Over and Step Into? 00:35:22.650 --> 00:35:25.980 At the moment, the only one that's applicable to the code I just wrote 00:35:25.980 --> 00:35:29.340 is Step Over, because it means step over each line of code. 00:35:29.340 --> 00:35:34.050 If, though, I had other functions that I had written in this program, 00:35:34.050 --> 00:35:39.300 maybe lower down in the file, I could step into those function calls 00:35:39.300 --> 00:35:41.469 and walk through them one at a time. 00:35:41.469 --> 00:35:43.650 So we'll come back to this with an actual example, 00:35:43.650 --> 00:35:46.230 but Step Into will allow me to do exactly that. 00:35:46.230 --> 00:35:49.210 In fact, this is a perfect segue to doing a little something like this. 00:35:49.210 --> 00:35:51.632 Let me go ahead and open up another file here. 00:35:51.632 --> 00:35:53.340 And, actually, we'll use the same one, buggy.c. 00:35:53.340 --> 00:35:56.320 And we're going to write one other thing that's buggy, as well. 00:35:56.320 --> 00:36:00.000 Let me go up here and include, as before, cs50.h. 00:36:00.000 --> 00:36:03.780 Let me include stdio.h. 00:36:03.780 --> 00:36:05.520 Let me do int main(void). 00:36:05.520 --> 00:36:08.050 So all of this, I think, is correct, so far.
00:36:08.050 --> 00:36:11.280 And let's do this: let me give myself an int called i, 00:36:11.280 --> 00:36:14.530 and let's ask the user for a negative integer. 00:36:14.530 --> 00:36:17.300 This is not a function that exists, technically, yet. 00:36:17.300 --> 00:36:20.050 But I'm going to assume, for the sake of discussion, that it does. 00:36:20.050 --> 00:36:23.700 Then, I'm just going to print out, with %i and a new line, 00:36:23.700 --> 00:36:25.360 whatever the human typed in. 00:36:25.360 --> 00:36:28.320 So at this point in the story, my program, I think, is correct, 00:36:28.320 --> 00:36:30.930 except for the fact that get negative int is not 00:36:30.930 --> 00:36:33.690 a function in the CS50 library or anywhere else. 00:36:33.690 --> 00:36:35.460 I'm going to need to invent it myself. 00:36:35.460 --> 00:36:41.310 So suppose, in this case, that I declare a function called get negative int. 00:36:41.310 --> 00:36:45.630 Its return type, so to speak, should be int, because, as its name suggests, 00:36:45.630 --> 00:36:48.360 I want to hand back to the user an integer, and it's going 00:36:48.360 --> 00:36:50.310 to take no input to keep it simple. 00:36:50.310 --> 00:36:51.810 So I'm just going to say void there. 00:36:51.810 --> 00:36:54.810 No inputs, no special prompts, nothing like that. 00:36:54.810 --> 00:36:57.600 Let me, now, give myself some curly braces. 00:36:57.600 --> 00:37:00.510 And let me do something familiar, perhaps, from problem set 1. 00:37:00.510 --> 00:37:05.550 Let me give myself a variable, like n, and let me do the following 00:37:05.550 --> 00:37:07.320 within this block of code. 00:37:07.320 --> 00:37:13.590 Assign n the value of get int, asking the user for a negative integer using 00:37:13.590 --> 00:37:14.850 get int's own prompt. 00:37:14.850 --> 00:37:18.750 And I want to do this while n is less than 0, because I 00:37:18.750 --> 00:37:20.390 want to get a negative from the user.
00:37:20.390 --> 00:37:24.140 And recall, from having used this block in the past, 00:37:24.140 --> 00:37:27.770 I can now return n as the very last step to hand back 00:37:27.770 --> 00:37:31.790 whatever the user has typed in, so long as they cooperated and gave me 00:37:31.790 --> 00:37:33.750 an actual negative integer. 00:37:33.750 --> 00:37:36.710 Now, I've deliberately made a mistake here, 00:37:36.710 --> 00:37:39.080 and it's a subtle, silly, mathematical one, 00:37:39.080 --> 00:37:43.910 but let me compile this program after copying the prototype up to the top, 00:37:43.910 --> 00:37:45.380 so I don't make that mistake again. 00:37:45.380 --> 00:37:48.470 Let me do make buggy, Enter. 00:37:48.470 --> 00:37:50.720 And now, let me do ./buggy. 00:37:50.720 --> 00:37:54.020 I'll give it a negative integer, like negative 50. 00:37:54.020 --> 00:37:55.370 Uh-huh. 00:37:55.370 --> 00:37:59.330 That did not take. 00:37:59.330 --> 00:38:00.860 How about negative 5? 00:38:00.860 --> 00:38:02.060 No. 00:38:02.060 --> 00:38:04.500 How about 0? 00:38:04.500 --> 00:38:05.000 All right. 00:38:05.000 --> 00:38:09.080 So it's, clearly, working backwards, or incorrectly here, logically. 00:38:09.080 --> 00:38:10.800 So how could I go about debugging this? 00:38:10.800 --> 00:38:12.425 Well, I could do what I've done before. 00:38:12.425 --> 00:38:18.920 I could use my printf technique and say something explicit like n is %i, 00:38:18.920 --> 00:38:25.310 new line, comma n, just to print it out. Let me recompile buggy, 00:38:25.310 --> 00:38:28.640 let me rerun buggy, let me type in negative 50. 00:38:28.640 --> 00:38:30.630 OK, n is negative 50. 00:38:30.630 --> 00:38:33.173 So that didn't really help me at this point, 00:38:33.173 --> 00:38:34.590 because that's the same as before. 00:38:34.590 --> 00:38:38.030 So let me do this, debug50, ./buggy. 00:38:38.030 --> 00:38:39.870 Oh, but I've made a mistake. 00:38:39.870 --> 00:38:41.700 I didn't set my breakpoint yet.
00:38:41.700 --> 00:38:44.930 So let me do this, and I'll set a breakpoint this time. 00:38:44.930 --> 00:38:47.330 I could set it here, on line 8. 00:38:47.330 --> 00:38:49.340 Let's do it in main, as before. 00:38:49.340 --> 00:38:51.530 Let me rerun debug50, now, 00:38:51.530 --> 00:38:52.970 on ./buggy. 00:38:52.970 --> 00:38:55.190 That fancy user interface is going to pop up. 00:38:55.190 --> 00:38:58.310 It's going to highlight the line that I set the breakpoint on. 00:38:58.310 --> 00:39:01.250 Notice that, on the left-hand side of the screen, 00:39:01.250 --> 00:39:04.650 i is defaulting, at the moment, to 0, because I haven't typed anything in 00:39:04.650 --> 00:39:05.150 yet. 00:39:05.150 --> 00:39:10.815 But let me, now, Step Over this line that's highlighted in yellow, 00:39:10.815 --> 00:39:12.440 and you'll see that I'm being prompted. 00:39:12.440 --> 00:39:16.220 So let's type in my negative 50, Enter. 00:39:16.220 --> 00:39:21.470 Notice now that I'm stuck in that function. 00:39:21.470 --> 00:39:22.250 All right. 00:39:22.250 --> 00:39:26.520 So clearly, the issue seems to be in my get negative int function. 00:39:26.520 --> 00:39:30.120 So, OK, let me stop this execution. 00:39:30.120 --> 00:39:33.175 My problem doesn't seem to be in main, per se; maybe it's down here. 00:39:33.175 --> 00:39:33.800 So that's fine. 00:39:33.800 --> 00:39:35.990 Let me set my same breakpoint at line 8. 00:39:35.990 --> 00:39:38.510 Let me rerun debug50 one more time. 00:39:38.510 --> 00:39:43.110 But this time, instead of just stepping over that line, let's step into it. 00:39:43.110 --> 00:39:45.410 So notice line 8 is, again, highlighted in yellow. 00:39:45.410 --> 00:39:47.690 In the past, I've been clicking Step Over. 00:39:47.690 --> 00:39:50.180 Let's click Step Into, now. 00:39:50.180 --> 00:39:53.480 When I click Step Into, boom, now, the debugger 00:39:53.480 --> 00:39:56.390 jumps into that specific function.
00:39:56.390 --> 00:39:59.330 Now, I can step through these lines of code, again and again. 00:39:59.330 --> 00:40:01.700 I can see what the value of n is as I'm typing it in. 00:40:01.700 --> 00:40:03.500 I can think through my logic, and voila. 00:40:03.500 --> 00:40:07.640 Hopefully, once I've solved the issue, I can exit the debugger, fix my code, 00:40:07.640 --> 00:40:09.180 and move on. 00:40:09.180 --> 00:40:12.050 So Step Over just goes over the line, but executes it; 00:40:12.050 --> 00:40:17.210 Step Into lets you go into other functions you've written. 00:40:17.210 --> 00:40:19.400 So let's go ahead and do this. 00:40:19.400 --> 00:40:23.550 We've got a bunch of possible approaches that we 00:40:23.550 --> 00:40:25.550 can take to solving some problems. Let's go ahead 00:40:25.550 --> 00:40:26.730 and pace ourselves today, though. 00:40:26.730 --> 00:40:27.900 Let's take a five-minute break here. 00:40:27.900 --> 00:40:30.688 And when we come back, we'll take a look at that computer's memory 00:40:30.688 --> 00:40:31.730 we've been talking about. 00:40:31.730 --> 00:40:32.950 See you in five. 00:40:32.950 --> 00:40:36.380 All right. 00:40:36.380 --> 00:40:41.000 So let's dive back in. 00:40:41.000 --> 00:40:46.860 Up until now, by way of both week 1 and problem set 1, for the most part, 00:40:46.860 --> 00:40:50.660 we've just translated from Scratch into C all of these basic building blocks, 00:40:50.660 --> 00:40:53.700 like loops and conditionals, Boolean expressions, variables. 00:40:53.700 --> 00:40:54.950 So sort of, more of the same. 00:40:54.950 --> 00:40:58.430 But there are features in C that we've stumbled across already, 00:40:58.430 --> 00:41:02.300 like data types, the types of variables that don't exist in Scratch, 00:41:02.300 --> 00:41:04.450 but that, in fact, do exist in other languages. 00:41:04.450 --> 00:41:06.200 In fact, a few that we'll see before long.
00:41:06.200 --> 00:41:10.670 So to summarize the types we saw last week, recall this little list here. 00:41:10.670 --> 00:41:15.050 We had ints, and floats, and longs, and doubles, and chars; 00:41:15.050 --> 00:41:18.510 there are also bools and also strings, which we've seen a few times. 00:41:18.510 --> 00:41:21.830 But today, let's actually start to formalize what these things are, 00:41:21.830 --> 00:41:25.760 and actually what your Mac and PC are doing when you manipulate bits 00:41:25.760 --> 00:41:29.170 as an int versus a char, versus a string, versus something else. 00:41:29.170 --> 00:41:31.920 And see if we can't put more tools into your toolkit, so to speak, 00:41:31.920 --> 00:41:35.630 so we can start quickly writing more featureful, more sophisticated 00:41:35.630 --> 00:41:36.800 programs in C. 00:41:36.800 --> 00:41:40.640 So it turns out that, on most systems nowadays, 00:41:40.640 --> 00:41:43.010 though this can vary by actual computer, this 00:41:43.010 --> 00:41:46.040 is how large each of the data types, typically, 00:41:46.040 --> 00:41:51.590 is in C. When you store a Boolean value, a 0 or 1, a false or a true, 00:41:51.590 --> 00:41:52.850 it actually uses 1 byte. 00:41:52.850 --> 00:41:55.100 That's a little excessive, because, strictly speaking, 00:41:55.100 --> 00:41:58.580 you only need 1 bit, which is 1/8 of this size. 00:41:58.580 --> 00:42:01.190 But for simplicity, computers use a whole byte 00:42:01.190 --> 00:42:03.740 to represent a bool, true or false. 00:42:03.740 --> 00:42:08.040 A char, we saw last week, is only 1 byte, or 8 bits. 00:42:08.040 --> 00:42:12.950 And this is why ASCII, which uses 1 byte, or, technically, only 7 bits early 00:42:12.950 --> 00:42:17.600 on, was confined to at most 256 possible characters. 00:42:17.600 --> 00:42:21.940 Notice that an int is 4 bytes, or 32 bits. 00:42:21.940 --> 00:42:24.580 A float is also 4 bytes, or 32 bits.
00:42:24.580 --> 00:42:27.850 But the thing that we call a long is, literally, twice as long, 00:42:27.850 --> 00:42:29.710 8 bytes, or 64 bits. 00:42:29.710 --> 00:42:30.430 So is a double. 00:42:30.430 --> 00:42:33.900 A double is 64 bits of precision for floating-point values. 00:42:33.900 --> 00:42:37.215 And a string, for today, we're going to leave as a question mark. 00:42:37.215 --> 00:42:39.340 We'll come back to that, later today and next week, 00:42:39.340 --> 00:42:42.520 as to how much space a string takes up, but, suffice it to say, 00:42:42.520 --> 00:42:45.488 it's going to take up a variable amount of space, 00:42:45.488 --> 00:42:47.530 depending on whether the string is short or long. 00:42:47.530 --> 00:42:50.470 But we'll see exactly what that means, before long. 00:42:50.470 --> 00:42:55.030 So here's a photograph of a typical piece of memory 00:42:55.030 --> 00:42:57.760 inside of your Mac, or PC, or phone. 00:42:57.760 --> 00:43:00.160 Odds are, it might be a little smaller in some devices. 00:43:00.160 --> 00:43:02.950 This is known as RAM, or random access memory. 00:43:02.950 --> 00:43:05.410 Each of these little black chips on this circuit 00:43:05.410 --> 00:43:07.720 board, the green thing, these little black chips 00:43:07.720 --> 00:43:10.630 are where 0s and 1s are actually stored. 00:43:10.630 --> 00:43:12.670 Each of those stores some number of bytes. 00:43:12.670 --> 00:43:15.130 Maybe megabytes, maybe even gigabytes, nowadays. 00:43:15.130 --> 00:43:21.430 So let's focus on one of those chips, to give us a zoomed-in version thereof.
00:43:21.430 --> 00:43:25.390 Let's consider the fact that, even though we don't have to care, exactly, 00:43:25.390 --> 00:43:29.470 how this kind of thing is made, if this is, like, 1 gigabyte of memory, 00:43:29.470 --> 00:43:31.930 for the sake of discussion, it stands to reason that, 00:43:31.930 --> 00:43:35.830 if this thing is storing 1 billion bytes, 1 gigabyte, 00:43:35.830 --> 00:43:38.110 then we can number them, arbitrarily. 00:43:38.110 --> 00:43:41.590 Maybe this will be byte 0, 1, 2, 3, 4, 5, 6, 7, 8. 00:43:41.590 --> 00:43:45.000 Then, maybe, way down here in the bottom-right corner is byte number 1 billion. 00:43:45.000 --> 00:43:48.760 We can just number these things, as might be our convention. 00:43:48.760 --> 00:43:50.710 Let's draw that graphically. 00:43:50.710 --> 00:43:53.090 Not with a billion squares, but fewer than those. 00:43:53.090 --> 00:43:55.410 And let's zoom in further, and consider that. 00:43:55.410 --> 00:43:57.160 At this point in the story, let's abstract 00:43:57.160 --> 00:43:59.380 away all the hardware, and all the little wires, 00:43:59.380 --> 00:44:03.730 and just think of memory as taking up-- or, rather, just think of data 00:44:03.730 --> 00:44:06.170 as taking up some number of bytes. 00:44:06.170 --> 00:44:09.820 So, for instance, if you were to store a char in a computer's memory, which 00:44:09.820 --> 00:44:14.230 is 1 byte, it might be stored at this top left-hand location 00:44:14.230 --> 00:44:16.195 of this black chip of memory. 00:44:16.195 --> 00:44:20.290 If you were to store something like an integer that uses 4 bytes, well, 00:44:20.290 --> 00:44:23.560 it might use four of those bytes, but they're going to be contiguous, 00:44:23.560 --> 00:44:25.220 back-to-back-to-back, in this case. 00:44:25.220 --> 00:44:29.270 If you were to store a long or a double, you might actually need 8 bytes.
00:44:29.270 --> 00:44:31.390 So I'm filling in these squares to represent 00:44:31.390 --> 00:44:36.160 how much memory a given variable of some data type would take up: 00:44:36.160 --> 00:44:39.230 1, or 4, or 8, in this case, here. 00:44:39.230 --> 00:44:42.160 Well, from here, let's abstract away from all of the hardware 00:44:42.160 --> 00:44:44.320 and really focus on memory as being a grid. 00:44:44.320 --> 00:44:47.650 Or, really, like a canvas that we can paint any types of data 00:44:47.650 --> 00:44:48.850 onto that we want. 00:44:48.850 --> 00:44:52.600 At the end of the day, all of this data is just going to be 0s and 1s. 00:44:52.600 --> 00:44:56.500 But it's up to you and me to build abstractions on top of that. 00:44:56.500 --> 00:45:00.130 Things like actual numbers, colors, images, movies, and beyond. 00:45:00.130 --> 00:45:02.440 But we'll start lower-level, here, first. 00:45:02.440 --> 00:45:05.950 Suppose I had a program that needs three integers. 00:45:05.950 --> 00:45:08.800 A simple program whose purpose in life is to average your three 00:45:08.800 --> 00:45:12.400 scores on an exam, or some such thing. 00:45:12.400 --> 00:45:17.020 Suppose that your three scores were these: 72, 73, not too bad, and 33, 00:45:17.020 --> 00:45:18.145 which is particularly low. 00:45:18.145 --> 00:45:23.030 Let's write a program that does this kind of averaging for us. 00:45:23.030 --> 00:45:24.860 Let me go back to VS Code, here. 00:45:24.860 --> 00:45:28.270 Let me open up a file called scores.c. 00:45:28.270 --> 00:45:30.830 Let me implement this as follows. 00:45:30.830 --> 00:45:35.860 Let me include stdio.h at the top, int main(void) as before. 00:45:35.860 --> 00:45:41.320 Then, inside of main, let me declare score 1, which is 72. 00:45:41.320 --> 00:45:43.990 Give me another score, 73. 00:45:43.990 --> 00:45:47.140 Then, a third score, called score 3, which is going to be 33.
00:45:47.140 --> 00:45:50.740 Now, I'm going to use printf to print out the average of those things, 00:45:50.740 --> 00:45:52.520 and I can do this in a few different ways. 00:45:52.520 --> 00:45:57.850 But I'm going to print out %f, and I'm going to do score 1, plus score 2, 00:45:57.850 --> 00:46:03.760 plus score 3, divided by 3, close parenthesis, semicolon. 00:46:03.760 --> 00:46:07.300 Some relatively simple arithmetic to compute the average of three scores, 00:46:07.300 --> 00:46:10.570 if I'm curious what my average grade is in the class with these three 00:46:10.570 --> 00:46:11.620 assessments. 00:46:11.620 --> 00:46:15.616 Let me, now, do make scores. 00:46:15.616 --> 00:46:19.240 All right, so I've somehow made an error already. 00:46:19.240 --> 00:46:25.150 But this one is, actually, germane to a problem we, hopefully, 00:46:25.150 --> 00:46:26.860 won't encounter too frequently. 00:46:26.860 --> 00:46:27.860 What's going on here? 00:46:27.860 --> 00:46:31.360 So underlined is score 1, plus score 2, plus score 3, divided by 3. 00:46:31.360 --> 00:46:36.250 Format specifies type double, but the argument has type int. Well, 00:46:36.250 --> 00:46:38.530 what's going on here? 00:46:38.530 --> 00:46:40.430 Because the arithmetic seems to check out. 00:46:40.430 --> 00:46:40.930 Yeah? 00:46:40.930 --> 00:46:44.560 AUDIENCE: So the computer is doing the math, but they basically [INAUDIBLE] 00:46:44.560 --> 00:46:49.260 just gives out a value at the end because, well [INAUDIBLE] 00:46:49.260 --> 00:46:50.210 DAVID MALAN: Correct. 00:46:50.210 --> 00:46:51.640 And we'll come back to this in more detail, 00:46:51.640 --> 00:46:54.522 but, indeed, what's happening here is I'm adding three ints together, 00:46:54.522 --> 00:46:56.480 obviously, because I defined them right up here.
00:46:56.480 --> 00:46:59.470 And I'm dividing by another int, 3, but the catch 00:46:59.470 --> 00:47:03.890 is, recall that when C performs math on integers, it treats the result as an integer, too. 00:47:03.890 --> 00:47:05.810 But integers are not floating-point values. 00:47:05.810 --> 00:47:08.890 So if you actually want to get a precise average for your score 00:47:08.890 --> 00:47:12.760 without throwing away the remainder, everything after the decimal point, 00:47:12.760 --> 00:47:15.430 it turns out, we're going to have to-- 00:47:15.430 --> 00:47:17.410 we're going to-- aww-- 00:47:17.410 --> 00:47:18.430 we're going to have to-- 00:47:18.430 --> 00:47:22.720 [LAUGHTER] we're going to have to convert this whole expression, somehow, 00:47:22.720 --> 00:47:23.350 to a float. 00:47:23.350 --> 00:47:26.230 And there are a few ways to do this, but the easiest way, 00:47:26.230 --> 00:47:28.540 for now, is to go ahead and do this up here: 00:47:28.540 --> 00:47:31.360 I'm going to change the divide by 3 to divide by 3.0. 00:47:31.360 --> 00:47:35.440 Because it turns out, long story short, in C, so long as one of the values 00:47:35.440 --> 00:47:37.300 participating in an arithmetic expression 00:47:37.300 --> 00:47:39.730 like this is something like a float, the rest 00:47:39.730 --> 00:47:44.210 will be promoted to a floating-point value as well. 00:47:44.210 --> 00:47:49.495 So let me, now, recompile this code with make scores, Enter. 00:47:49.495 --> 00:47:53.500 This time it worked OK, because I'm now giving %f a floating-point value. 00:47:53.500 --> 00:47:55.600 Let me do ./scores, Enter. 00:47:55.600 --> 00:48:00.150 All right, my average is 59.33333 and so forth. 00:48:00.150 --> 00:48:00.650 All right. 00:48:00.650 --> 00:48:03.340 So the math, presumably, checks out. 00:48:03.340 --> 00:48:06.220 Floating-point imprecision, per last week, aside. 00:48:06.220 --> 00:48:09.280 But let's consider the design of this program.
00:48:09.280 --> 00:48:16.680 What is, kind of, bad about it? Or, if we maintain this program longer term, 00:48:16.680 --> 00:48:19.480 are we going to regret the design of this program? 00:48:19.480 --> 00:48:20.990 What might not be ideal here? 00:48:20.990 --> 00:48:21.490 Yeah? 00:48:21.490 --> 00:48:30.364 AUDIENCE: [INAUDIBLE] 00:48:30.364 --> 00:48:34.220 DAVID MALAN: Yeah, so in this case, I have hard-coded my three scores. 00:48:34.220 --> 00:48:37.140 So, if I'm hearing you correctly, this program 00:48:37.140 --> 00:48:39.600 is only ever going to tell me this specific average. 00:48:39.600 --> 00:48:41.730 I'm not even using something like get_int 00:48:41.730 --> 00:48:44.790 or get_float to get three different scores, so that's not good. 00:48:44.790 --> 00:48:46.942 And suppose that we wait until later in the semester; 00:48:46.942 --> 00:48:48.400 I think other problems could arise. 00:48:48.400 --> 00:48:48.900 Yeah? 00:48:48.900 --> 00:48:51.020 AUDIENCE: Just thinking also somewhat of an issue 00:48:51.020 --> 00:48:52.900 that you can't reuse that number. 00:48:52.900 --> 00:48:55.450 DAVID MALAN: I can't reuse the number because I 00:48:55.450 --> 00:48:59.088 haven't stored the average in some variable, which, in this program, is not 00:48:59.088 --> 00:49:01.630 a big deal, but certainly, if I wanted to reuse it elsewhere, 00:49:01.630 --> 00:49:02.650 that's a problem. 00:49:02.650 --> 00:49:05.025 Let's fast-forward again: a little later in the semester, 00:49:05.025 --> 00:49:07.390 I don't just have three test scores or exam scores; 00:49:07.390 --> 00:49:09.430 maybe I have 4, or 5, or 6. 00:49:09.430 --> 00:49:10.690 Where might this take us? 00:49:10.690 --> 00:49:12.301 AUDIENCE: Yeah, if you ever want to have to take 00:49:12.301 --> 00:49:14.900 the average of any number of scores other than 3, [INAUDIBLE] 00:49:14.900 --> 00:49:18.110 DAVID MALAN: Yeah, I've, sort of, capped this program at 3.
00:49:18.110 --> 00:49:20.942 And honestly, this is, kind of, bordering on copy-paste. 00:49:20.942 --> 00:49:23.900 Even though the variables, yes, have different names: score 1, score 2, 00:49:23.900 --> 00:49:24.800 score 3. 00:49:24.800 --> 00:49:27.230 Imagine doing this for a whole gradebook for a class. 00:49:27.230 --> 00:49:32.990 Having score 4, 5, 6, 10, 11, 12, 20, 30: that's a lot of variables. 00:49:32.990 --> 00:49:35.420 You can imagine just how ugly the code starts 00:49:35.420 --> 00:49:38.635 to get if you're just defining variable after variable, after variable. 00:49:38.635 --> 00:49:42.740 So it turns out, there are better ways, in languages like C, 00:49:42.740 --> 00:49:47.240 if you want to have multiple values stored in memory that 00:49:47.240 --> 00:49:49.040 happen to be of the same data type. 00:49:49.040 --> 00:49:50.420 Let's take a look back at this memory, here, 00:49:50.420 --> 00:49:52.545 to see what these things might look like in memory. 00:49:52.545 --> 00:49:54.170 Here's that grid of memory. 00:49:54.170 --> 00:49:56.450 Each of these, recall, represents a byte. 00:49:56.450 --> 00:49:59.690 To be clear, if I store score 1 in memory first, 00:49:59.690 --> 00:50:01.130 how many bytes will it take up? 00:50:01.130 --> 00:50:02.520 AUDIENCE: [INAUDIBLE] 00:50:02.520 --> 00:50:03.650 DAVID MALAN: So 4, a.k.a. 00:50:03.650 --> 00:50:04.430 32 bits. 00:50:04.430 --> 00:50:08.578 So I might draw score 1 as filling up this part of the memory. 00:50:08.578 --> 00:50:11.870 It's up to the computer as to whether it goes here, or down there, or wherever. 00:50:11.870 --> 00:50:15.290 I'm just keeping the pictures clean for today, from the top-left on down. 00:50:15.290 --> 00:50:18.080 If I, then, declare another variable, called score 2, 00:50:18.080 --> 00:50:20.730 it might end up over there, also taking up 4 bytes. 00:50:20.730 --> 00:50:23.330 And then score 3 might end up here.
00:50:23.330 --> 00:50:26.880 So that's just representing what's going on inside of the computer's memory. 00:50:26.880 --> 00:50:30.680 But technically speaking, to be clear, per week 0, what's 00:50:30.680 --> 00:50:34.580 really being stored in the computer's memory are patterns of 0s and 1s. 00:50:34.580 --> 00:50:39.350 32 total, in this case, because 32 bits is 4 bytes. 00:50:39.350 --> 00:50:43.280 But again, it gets boring quickly to think in and look 00:50:43.280 --> 00:50:44.760 at binary all the time. 00:50:44.760 --> 00:50:47.120 So we'll, generally, abstract this away as just using 00:50:47.120 --> 00:50:49.550 decimal numbers, in this case, instead. 00:50:49.550 --> 00:50:54.170 But there might be a better way to store not just three of these things, 00:50:54.170 --> 00:50:57.500 but maybe four, maybe five, maybe 10, maybe more, 00:50:57.500 --> 00:51:03.110 by declaring one variable to store all of them, instead of 3, or 4, or 5, 00:51:03.110 --> 00:51:05.750 or more individual variables. 00:51:05.750 --> 00:51:10.250 The way to do this is by way of something known as an array. 00:51:10.250 --> 00:51:18.320 An array is another type of data that allows you to store multiple values 00:51:18.320 --> 00:51:20.980 of the same type back-to-back-to-back. 00:51:20.980 --> 00:51:22.230 That is to say, contiguously. 00:51:22.230 --> 00:51:29.840 So an array can let you create memory for one int, or two, or three, 00:51:29.840 --> 00:51:32.600 or even more than that, but describe them 00:51:32.600 --> 00:51:36.390 all using the same variable name, just one name. 00:51:36.390 --> 00:51:40.740 So for instance, if, for one program, I only need three integers, 00:51:40.740 --> 00:51:45.800 but I don't want to messily declare them as score 1, score 2, score 3, 00:51:45.800 --> 00:51:46.960 I can do this, instead. 00:51:46.960 --> 00:51:49.130 This is today's first new piece of syntax, 00:51:49.130 --> 00:51:51.290 the square brackets that we're now seeing.
00:51:51.290 --> 00:51:57.140 This line of code, here, is similar to int score 1 semicolon, 00:51:57.140 --> 00:52:00.360 or int score 1 equals 72 semicolon. 00:52:00.360 --> 00:52:05.780 This line of code is declaring for me, so to speak, an array of size 3. 00:52:05.780 --> 00:52:09.260 And that array is going to store three integers. 00:52:09.260 --> 00:52:09.770 Why? 00:52:09.770 --> 00:52:14.990 Because the type of that array is int, here. 00:52:14.990 --> 00:52:18.110 The square brackets tell the computer how many ints you want. 00:52:18.110 --> 00:52:18.980 In this case, 3. 00:52:18.980 --> 00:52:21.140 And the name is, of course, scores, 00:52:21.140 --> 00:52:23.540 which, in English, I've deliberately pluralized 00:52:23.540 --> 00:52:28.100 so that I can describe this array as storing multiple scores, indeed. 00:52:28.100 --> 00:52:32.970 So if I now want to assign values to this variable, called scores, 00:52:32.970 --> 00:52:34.760 I can write code like this. 00:52:34.760 --> 00:52:40.160 I can say, scores bracket 0 equals 72, scores bracket 1 equals 73, 00:52:40.160 --> 00:52:42.190 and scores bracket 2 equals 33. 00:52:42.190 --> 00:52:43.940 The only thing weird there is, admittedly, 00:52:43.940 --> 00:52:45.830 the square brackets, which are still new. 00:52:45.830 --> 00:52:49.820 But we're also, notice, zero-indexing things. 00:52:49.820 --> 00:52:52.345 To zero-index means to start counting at 0. 00:52:52.345 --> 00:52:54.470 When we've talked about that before, our for loops 00:52:54.470 --> 00:52:56.000 have, generally, been zero-indexed. 00:52:56.000 --> 00:52:59.870 Arrays in C are zero-indexed. 00:52:59.870 --> 00:53:01.430 And you do not have a choice over that. 00:53:01.430 --> 00:53:04.550 You can't start counting at 1 in arrays just because you prefer to; 00:53:04.550 --> 00:53:06.830 you'd be sacrificing one of the elements. 00:53:06.830 --> 00:53:09.620 You have to start in arrays counting from 0.
00:53:09.620 --> 00:53:13.130 So out of context, this doesn't solve a problem, 00:53:13.130 --> 00:53:15.200 but it, definitely, is going to once we have more 00:53:15.200 --> 00:53:16.910 than, even, three scores here. 00:53:16.910 --> 00:53:19.750 In fact, let me change this program a little bit. 00:53:19.750 --> 00:53:21.450 Let me go back to VS Code. 00:53:21.450 --> 00:53:24.020 And delete these three lines, here. 00:53:24.020 --> 00:53:27.080 And replace them with a scores variable that's 00:53:27.080 --> 00:53:30.140 ready to store three total integers. 00:53:30.140 --> 00:53:34.130 And then, initialize them as follows: scores bracket 0 is 72, 00:53:34.130 --> 00:53:38.300 as before, scores bracket 1 is going to be 73, scores bracket 2 00:53:38.300 --> 00:53:39.740 is going to be 33. 00:53:39.740 --> 00:53:44.068 Notice, I do not need to say int before any of these lines, 00:53:44.068 --> 00:53:45.860 because that's been taken care of, already, 00:53:45.860 --> 00:53:50.570 for me on line 5, where I already specified that everything in this array 00:53:50.570 --> 00:53:53.330 is going to be an int. 00:53:53.330 --> 00:53:57.020 Now, down here, this code needs to change because I no longer have 00:53:57.020 --> 00:53:59.300 three variables: score 1, 2, and 3. 00:53:59.300 --> 00:54:03.950 I have one variable, but one that I can index into. 00:54:03.950 --> 00:54:08.750 I'm going to, here, then, do scores bracket 0, plus scores bracket 1, 00:54:08.750 --> 00:54:13.370 plus scores bracket 2, which is equivalent to what I did earlier, 00:54:13.370 --> 00:54:14.900 giving me back those three integers. 00:54:14.900 --> 00:54:17.860 But notice, I'm using the same variable name, every time. 00:54:17.860 --> 00:54:21.070 And again, I'm using this new square bracket notation to, quote-unquote, 00:54:21.070 --> 00:54:26.590 index into the array to get at the first int, the second int, and the third, 00:54:26.590 --> 00:54:28.840 and then, to do it again down here.
00:54:28.840 --> 00:54:31.907 Now, this program still doesn't really solve all the problems we described; 00:54:31.907 --> 00:54:34.240 I still can only store three scores, but we'll come back 00:54:34.240 --> 00:54:35.930 to something like that before long. 00:54:35.930 --> 00:54:38.950 But for now, we're just introducing a new syntax and a new feature, 00:54:38.950 --> 00:54:44.980 whereby I can now store multiple values in the same variable. 00:54:44.980 --> 00:54:47.110 Well, let's enhance this a bit more. 00:54:47.110 --> 00:54:50.660 Instead of hard-coding these scores, as was identified as a problem, 00:54:50.660 --> 00:54:54.790 let's use get_int to ask the user for a score. 00:54:54.790 --> 00:54:58.330 Let's, then, use get_int to ask the user for another score. 00:54:58.330 --> 00:55:01.540 Let's use get_int to ask the user for a third score, 00:55:01.540 --> 00:55:04.400 storing them in those respective locations. 00:55:04.400 --> 00:55:09.820 And, now, if I go ahead and save this program, recompile scores, huh. 00:55:09.820 --> 00:55:10.900 I've messed up, here. 00:55:10.900 --> 00:55:13.990 Now these errors should be getting a little familiar. 00:55:13.990 --> 00:55:16.750 What mistake did I make? 00:55:16.750 --> 00:55:17.875 Let me give folks a moment. 00:55:17.875 --> 00:55:18.970 AUDIENCE: cs50.h 00:55:18.970 --> 00:55:21.100 DAVID MALAN: cs50.h. 00:55:21.100 --> 00:55:24.220 That was not intentional, so still making mistakes all these years later. 00:55:24.220 --> 00:55:26.320 I need to include cs50.h. 00:55:26.320 --> 00:55:29.570 Now, I'm going to go back to the bottom in the terminal window, make scores. 00:55:29.570 --> 00:55:30.070 OK. 00:55:30.070 --> 00:55:31.670 We're back in business, ./scores. 00:55:31.670 --> 00:55:33.920 Now, the program is getting a little more interesting. 00:55:33.920 --> 00:55:38.020 So maybe this year was better, and I got a 100, and a 99, and a 98, and there, 00:55:38.020 --> 00:55:40.900 my average is 99.0000.
00:55:40.900 --> 00:55:42.370 So now, it's a little more dynamic. 00:55:42.370 --> 00:55:43.270 It's a little more interesting. 00:55:43.270 --> 00:55:45.978 But it's still capping the number of scores at three, admittedly. 00:55:45.978 --> 00:55:50.740 But now, I've introduced another, sort of, symptom of bad programming. 00:55:50.740 --> 00:55:54.108 There's this expression in programming, too, called code smell, where like-- 00:55:54.108 --> 00:55:55.900 [SNIFFS AIR] something smells a little off. 00:55:55.900 --> 00:56:00.550 And there's something off here in that I could do better with this code. 00:56:00.550 --> 00:56:05.080 Does anyone see an opportunity to improve the design of this code, here, 00:56:05.080 --> 00:56:08.230 if my goal, still, is to get three scores from the user but [SNIFF SNIFF] 00:56:08.230 --> 00:56:10.430 without it smelling [SNIFF] kind of bad? 00:56:10.430 --> 00:56:10.930 Yeah? 00:56:10.930 --> 00:56:12.940 AUDIENCE: [INAUDIBLE] use a for loop? 00:56:12.940 --> 00:56:15.958 That way you don't have to copy and paste all of those scores. 00:56:15.958 --> 00:56:17.160 DAVID MALAN: Yeah, exactly. 00:56:17.160 --> 00:56:19.022 Those lines of code are almost identical. 00:56:19.022 --> 00:56:21.480 And honestly, the only thing that's changing is the number, 00:56:21.480 --> 00:56:23.100 and it's just incrementing by 1. 00:56:23.100 --> 00:56:25.330 We have all of the building blocks to do this better. 00:56:25.330 --> 00:56:27.130 So let me go ahead and improve this. 00:56:27.130 --> 00:56:29.560 Let me delete that code. 00:56:29.560 --> 00:56:31.720 Let me, now, have a for loop. 00:56:31.720 --> 00:56:36.150 So for int i gets 0, i less than 3, i plus plus.
00:56:36.150 --> 00:56:39.060 Then, inside of this for loop, I can distill all three 00:56:39.060 --> 00:56:40.860 of those lines into something more generic, 00:56:40.860 --> 00:56:46.530 like scores bracket i equals get_int, and now, ask the user, just 00:56:46.530 --> 00:56:48.905 once, via get_int, for a score. 00:56:48.905 --> 00:56:52.000 So this is where arrays start to get pretty powerful. 00:56:52.000 --> 00:56:54.000 You don't have to hard-code, that is, literally, 00:56:54.000 --> 00:56:56.462 type in all of these magic numbers like 0, 1, and 2. 00:56:56.462 --> 00:56:58.170 You can start to do it programmatically, 00:56:58.170 --> 00:56:59.770 as you proposed, with a loop. 00:56:59.770 --> 00:57:01.350 So now, I've tightened things up. 00:57:01.350 --> 00:57:04.230 I'm now, dynamically, getting three different scores 00:57:04.230 --> 00:57:06.766 and putting them in three different locations. 00:57:06.766 --> 00:57:10.470 And so this program, ultimately, is going to work, pretty much, the same. 00:57:10.470 --> 00:57:17.520 Make scores, ./scores, and 100, 99, 98, and we're back to the same answer. 00:57:17.520 --> 00:57:19.440 But it's a little better designed, too. 00:57:19.440 --> 00:57:21.360 If I really want to nitpick, there's something 00:57:21.360 --> 00:57:23.100 that still smells, a little bit, here: 00:57:23.100 --> 00:57:27.540 the fact that I have, indeed, this magic number 3, which really 00:57:27.540 --> 00:57:29.890 has to be the same as this number here. 00:57:29.890 --> 00:57:32.170 Otherwise, who knows what's going to go wrong. 00:57:32.170 --> 00:57:34.380 So what might be a solution, per last week, 00:57:34.380 --> 00:57:36.960 to cleaning that code up further, too? 00:57:36.960 --> 00:57:39.750 AUDIENCE: [INAUDIBLE] the user's discretion 00:57:39.750 --> 00:57:41.742 how many input scores [INAUDIBLE]. 00:57:41.742 --> 00:57:44.790 DAVID MALAN: OK, so we could leave it up to the user's discretion.
00:57:44.790 --> 00:57:47.500 And so we could, actually, do something like this. 00:57:47.500 --> 00:57:49.200 Let me take this a few steps ahead. 00:57:49.200 --> 00:57:56.230 Let me say something like int n gets get_int of "How many scores?", 00:57:56.230 --> 00:58:00.600 then I could actually change this to an n, and then this to an n, 00:58:00.600 --> 00:58:02.970 and, indeed, make the whole program dynamic. 00:58:02.970 --> 00:58:05.670 Ask the human how many tests there have been this semester. 00:58:05.670 --> 00:58:07.500 Then, you can type in each of those scores 00:58:07.500 --> 00:58:09.708 because the loop is going to iterate that many times. 00:58:09.708 --> 00:58:13.020 And then you'll get the average of one test, two tests, three-- 00:58:13.020 --> 00:58:17.520 well, lost another-- or however many scores that were actually 00:58:17.520 --> 00:58:20.760 specified by the user. Yeah, question? 00:58:20.760 --> 00:58:25.765 AUDIENCE: How many bits or bytes get used in an array? 00:58:25.765 --> 00:58:28.060 DAVID MALAN: How many bytes are used in an array? 00:58:28.060 --> 00:58:32.524 AUDIENCE: [INAUDIBLE] point of doing this is to save [INAUDIBLE] 00:58:32.524 --> 00:58:35.500 DAVID MALAN: So the purpose of an array is not to save space. 00:58:35.500 --> 00:58:39.010 It's to eliminate having multiple variable names 00:58:39.010 --> 00:58:40.900 because that gets very messy quickly. 00:58:40.900 --> 00:58:44.980 If you have score 1, score 2, score 3, dot, dot, dot, score 99, 00:58:44.980 --> 00:58:48.100 that's, like, 99 different variables, potentially, 00:58:48.100 --> 00:58:54.160 that you could collapse into one variable that has 99 locations, 00:58:54.160 --> 00:58:56.230 at different indices, or indexes. 00:58:56.230 --> 00:58:58.570 As someone would say, the index for an array 00:58:58.570 --> 00:59:00.756 is whatever is in the square brackets.
00:59:00.756 --> 00:59:11.560 AUDIENCE: [INAUDIBLE] 00:59:11.560 --> 00:59:13.280 DAVID MALAN: So it's a good question. 00:59:13.280 --> 00:59:15.370 So if you-- I'm using ints for everything-- 00:59:15.370 --> 00:59:17.560 and honestly, we don't really need ints for scores 00:59:17.560 --> 00:59:21.770 because I'm not likely to get a 2 billion on a test anytime soon. 00:59:21.770 --> 00:59:23.620 And so you could use different data types. 00:59:23.620 --> 00:59:26.287 And that list we had on the screen, earlier, is not all of them. 00:59:26.287 --> 00:59:29.770 There's a data type called short, which is shorter than an int; 00:59:29.770 --> 00:59:34.850 you could, technically, use char in some form, or other data types as well. 00:59:34.850 --> 00:59:36.940 Generally speaking, in the year 2021, these 00:59:36.940 --> 00:59:40.990 tend to be over optima-- overly optimized decisions. 00:59:40.990 --> 00:59:42.940 Everyone just uses ints, even though no one 00:59:42.940 --> 00:59:46.300 is going to get a test score that's 2 billion, or more, because int is just, 00:59:46.300 --> 00:59:47.260 kind of, the go-to. 00:59:47.260 --> 00:59:50.252 Years ago, memory was expensive. 00:59:50.252 --> 00:59:52.210 And every one of your instincts would have been 00:59:52.210 --> 00:59:54.700 spot on because memory was so tight. 00:59:54.700 --> 00:59:56.930 But, nowadays, we don't worry as much about it. 00:59:56.930 --> 00:59:57.430 Yeah? 00:59:57.430 --> 01:00:02.556 AUDIENCE: I have a question about the error [INAUDIBLE]. 01:00:02.556 --> 01:00:06.605 Could it-- when you're doing a cash problem on the problem set-- 01:00:06.605 --> 01:00:10.010 DAVID MALAN: So what is the difference between dividing two ints 01:00:10.010 --> 01:00:12.380 and not getting an error, as you might have encountered 01:00:12.380 --> 01:00:15.920 in a program like cash, versus dividing two ints 01:00:15.920 --> 01:00:18.150 and getting an error like I did a moment ago?
01:00:18.150 --> 01:00:22.280 The problem with the scenario I created a moment ago was that printf was involved. 01:00:22.280 --> 01:00:27.980 And I was telling printf to use a %f, but I was giving printf the result 01:00:27.980 --> 01:00:30.580 of dividing an integer by another integer. 01:00:30.580 --> 01:00:32.930 So it was printf that was yelling at me. 01:00:32.930 --> 01:00:35.930 I'm guessing in the scenario you're describing, for something like cash, 01:00:35.930 --> 01:00:39.180 printf was not involved in that particular line of code. 01:00:39.180 --> 01:00:40.865 So that's the difference, there. 01:00:40.865 --> 01:00:41.660 All right. 01:00:41.660 --> 01:00:45.110 So we, now, have this ability to create an array. 01:00:45.110 --> 01:00:47.510 And an array can store multiple values. 01:00:47.510 --> 01:00:51.450 What, then, might we do that's more interesting than just storing numbers 01:00:51.450 --> 01:00:51.950 in memory? 01:00:51.950 --> 01:00:54.230 Well, let's take this one step further. 01:00:54.230 --> 01:01:01.130 As opposed to just storing 72, 73, 33, or 100, 99, 98, at these given locations, 01:01:01.130 --> 01:01:05.930 because again, an array gives you one variable name, but multiple locations, 01:01:05.930 --> 01:01:08.360 or indices therein, bracket 0, bracket 1, 01:01:08.360 --> 01:01:11.330 bracket 2 on up, if it were even bigger than that. 01:01:11.330 --> 01:01:16.100 Let's, now, start to consider something more modest, like simple chars. 01:01:16.100 --> 01:01:18.830 Chars, being 1 byte each, are even smaller, 01:01:18.830 --> 01:01:20.090 so they take up much less space. 01:01:20.090 --> 01:01:22.048 And, indeed, if I wanted to say a message like 01:01:22.048 --> 01:01:24.200 hi, I could use three variables. 01:01:24.200 --> 01:01:28.520 If I wanted a program to print hi, H-I exclamation point, 01:01:28.520 --> 01:01:33.230 I could, of course, store those in three variables, like c1, c2, c3.
01:01:33.230 --> 01:01:36.710 And, for the sake of discussion, let's whip this up real quickly. 01:01:36.710 --> 01:01:39.680 Let me create a new program, now, in VS Code. 01:01:39.680 --> 01:01:42.920 This time, I'm going to call it hi.c. 01:01:42.920 --> 01:01:45.650 And I'm not going to bother with the CS50 library. 01:01:45.650 --> 01:01:47.660 I just need the standard I/O one, for now. 01:01:47.660 --> 01:01:49.220 int main(void). 01:01:49.220 --> 01:01:52.400 And then, inside of main, I'm going to, simply, create three variables. 01:01:52.400 --> 01:01:55.760 And this is already, hopefully, striking you as a bad idea. 01:01:55.760 --> 01:01:58.310 But we'll go down this road, temporarily, 01:01:58.310 --> 01:02:02.300 with c1, and c2, and, finally, c3. 01:02:02.300 --> 01:02:05.660 Storing each character in the phrase I want to print, 01:02:05.660 --> 01:02:09.450 and I'm going to print this in a different way than usual. 01:02:09.450 --> 01:02:10.880 Now I'm dealing with chars. 01:02:10.880 --> 01:02:14.480 And we've, generally, dealt with strings, which was easier last week. 01:02:14.480 --> 01:02:21.600 But %c, %c, %c will let me print out three chars, namely c1, c2, and c3. 01:02:21.600 --> 01:02:24.420 So, kind of, a stupid way of printing out a string. 01:02:24.420 --> 01:02:26.940 So we already had a solution to this problem last week. 01:02:26.940 --> 01:02:30.540 But let's poke around at what's going on underneath the hood, here. 01:02:30.540 --> 01:02:33.350 So let's make hi, ./hi. 01:02:33.350 --> 01:02:34.475 And, voila, no surprise. 01:02:34.475 --> 01:02:36.350 But we, again, could have done this last week 01:02:36.350 --> 01:02:39.530 with a string and just one variable, or even, 0, at that. 01:02:39.530 --> 01:02:43.220 But let's start converting these characters 01:02:43.220 --> 01:02:47.750 to their apparent numeric equivalents like we talked about in week 0, too.
01:02:47.750 --> 01:02:52.310 Let me modify these %c's, just to be fun, to be %i's. 01:02:52.310 --> 01:02:56.180 And let me add some spaces so there are gaps between each of them. 01:02:56.180 --> 01:03:00.350 Let me, now, recompile hi, and let me rerun it. 01:03:00.350 --> 01:03:02.900 Just to guess, what should I see on the screen now? 01:03:05.690 --> 01:03:06.200 Any guesses? 01:03:06.200 --> 01:03:06.700 Yeah? 01:03:06.700 --> 01:03:08.036 AUDIENCE: The ASCII values? 01:03:08.036 --> 01:03:09.760 DAVID MALAN: The ASCII values. 01:03:09.760 --> 01:03:12.220 And it's intentional that I keep using the same word, 01:03:12.220 --> 01:03:18.250 hi, because it should be, hopefully, the old friends, 72, 73, and 33. 01:03:18.250 --> 01:03:22.120 Which is to say that C knows about ASCII, or equivalently, Unicode, 01:03:22.120 --> 01:03:24.320 and can do this conversion for us automatically. 01:03:24.320 --> 01:03:27.670 And it seems to be doing it implicitly for us, so to speak. 01:03:27.670 --> 01:03:31.000 Notice that c1, c2 and c3 are, obviously, chars, 01:03:31.000 --> 01:03:34.420 but printf is able to tolerate printing them as integers. 01:03:34.420 --> 01:03:38.870 If I really wanted to be pedantic, I could use this technique, again, 01:03:38.870 --> 01:03:41.320 known as typecasting, where I can actually 01:03:41.320 --> 01:03:46.610 convert one data type to another, if it makes logical sense to do so. 01:03:46.610 --> 01:03:49.900 And we saw in week 0, chars, or characters, 01:03:49.900 --> 01:03:53.500 are just numbers, like 72, 73, and 33. 01:03:53.500 --> 01:03:57.680 So I can use this parenthetical expression to convert, incorrectly, 01:03:57.680 --> 01:04:02.623 [LAUGHTER] three chars to three integers, instead. 01:04:02.623 --> 01:04:04.540 So that's what I meant to type the first time. 01:04:04.540 --> 01:04:05.040 There we go. 01:04:05.040 --> 01:04:05.800 Strike two, today.
01:04:05.800 --> 01:04:09.280 So open parenthesis, int, close parenthesis says 01:04:09.280 --> 01:04:14.840 take whatever variable comes after this, c1, c2, or c3 and convert it to an int. 01:04:14.840 --> 01:04:18.640 The effect is going to be no different, make hi, and then rerunning whoops-- 01:04:18.640 --> 01:04:24.910 then running ./hi still works the same, but now I'm explicitly converting chars 01:04:24.910 --> 01:04:25.660 to ints. 01:04:25.660 --> 01:04:29.260 And we can do this all day long, chars to ints, floats to ints, 01:04:29.260 --> 01:04:30.250 ints to floats. 01:04:30.250 --> 01:04:31.888 Sometimes, it's equivalent. 01:04:31.888 --> 01:04:33.805 Other times, you're going to lose information. 01:04:33.805 --> 01:04:37.270 Taking a float to an int, just intuitively, 01:04:37.270 --> 01:04:39.790 is going to throw away everything after the decimal point, 01:04:39.790 --> 01:04:42.680 because an int has no decimal point. 01:04:42.680 --> 01:04:45.100 But, for now, I'm going to rewind to the version of this 01:04:45.100 --> 01:04:49.150 that just did implicit type conversion, or implicit casting, 01:04:49.150 --> 01:04:53.350 just to demonstrate that we can, indeed, see the values underneath the hood. 01:04:53.350 --> 01:04:53.950 All right. 01:04:53.950 --> 01:04:56.370 Let me go ahead and do this, now, the week 1 way. 01:04:56.370 --> 01:04:57.370 This was kind of stupid. 01:04:57.370 --> 01:05:00.205 Let's just do printf, quote-unquote-- 01:05:00.205 --> 01:05:04.630 Actually, let's do this, string s equals quote-unquote hi, 01:05:04.630 --> 01:05:09.680 and then let's do a simple printf with %s, printing out s there. 01:05:09.680 --> 01:05:12.520 So now I've rewound to last week, where we began this story, 01:05:12.520 --> 01:05:16.660 but you'll notice that, if we keep playing around with this-- 01:05:16.660 --> 01:05:18.860 whoops, what did I do here?
01:05:18.860 --> 01:05:23.470 Oh, and let me introduce the CS50 library here; more on that before long. 01:05:23.470 --> 01:05:26.260 Let me go ahead and recompile and rerun this. 01:05:26.260 --> 01:05:28.268 We seem to be coding in circles, here. 01:05:28.268 --> 01:05:30.810 Like, I've just done the same thing in multiple, different ways. 01:05:30.810 --> 01:05:33.400 But there's clearly an equivalence, then, 01:05:33.400 --> 01:05:36.978 between sequences of chars and strings. 01:05:36.978 --> 01:05:38.770 And if you do it the real pedantic way, you 01:05:38.770 --> 01:05:43.390 have three different variables, c1, c2, c3, representing H-I exclamation point, 01:05:43.390 --> 01:05:47.870 or you can just treat them all together like this: H, I, exclamation point. 01:05:47.870 --> 01:05:52.030 But it turns out that strings are actually 01:05:52.030 --> 01:05:58.060 implemented by the computer in a now pretty familiar way. 01:05:58.060 --> 01:06:04.382 What might a string actually be as of this point in the story? 01:06:04.382 --> 01:06:05.590 Where are we going with this? 01:06:05.590 --> 01:06:06.923 Let me try to look further back. 01:06:06.923 --> 01:06:07.850 Yeah, in the way back? 01:06:07.850 --> 01:06:08.350 Yeah? 01:06:08.350 --> 01:06:10.600 AUDIENCE: Can a string like this be an array of chars? 01:06:10.600 --> 01:06:13.410 DAVID MALAN: Yeah, a string might be, and indeed is, just 01:06:13.410 --> 01:06:14.800 an array of characters. 01:06:14.800 --> 01:06:17.190 So last week we took for granted that strings exist. 01:06:17.190 --> 01:06:19.530 Technically, strings exist, but they're implemented 01:06:19.530 --> 01:06:23.070 as arrays of characters, which actually opens up 01:06:23.070 --> 01:06:25.770 some interesting possibilities for us. 01:06:25.770 --> 01:06:28.300 Because, let me see, let me see if I can do this. 01:06:28.300 --> 01:06:31.560 Let me try to print out, now, three integers again.
01:06:31.560 --> 01:06:37.530 But if string s is but an array, as you propose, maybe I can do s bracket 0, 01:06:37.530 --> 01:06:39.760 s bracket 1, and s bracket 2. 01:06:39.760 --> 01:06:43.650 So maybe I can start poking around inside of strings, 01:06:43.650 --> 01:06:45.630 even though we didn't do this last week, so I 01:06:45.630 --> 01:06:47.260 can get at those individual values. 01:06:47.260 --> 01:06:51.270 So make hi, ./hi and, voila, there we go again. 01:06:51.270 --> 01:06:56.208 It's the same 72, 73, 33, but now, I'm sort of, hopefully, 01:06:56.208 --> 01:06:58.500 like, wrapping my mind around the fact that, all right, 01:06:58.500 --> 01:07:01.650 a string is just an array of characters, and arrays, you 01:07:01.650 --> 01:07:04.960 can index into them using this new square bracket notation. 01:07:04.960 --> 01:07:08.040 So I can get at any one of these individual characters, 01:07:08.040 --> 01:07:14.055 and, heck, convert it to an integer like we did in week 0. 01:07:14.055 --> 01:07:17.010 Let me get a little curious now. 01:07:17.010 --> 01:07:20.020 What else might be in the computer's memory? 01:07:20.020 --> 01:07:23.550 Well, let's-- I'll go back to the depiction of these same things. 01:07:23.550 --> 01:07:25.860 Here might be how we originally implemented hi 01:07:25.860 --> 01:07:28.800 with three variables, c1, c2, c3. 01:07:28.800 --> 01:07:31.500 Of course, those map to these decimal numbers or, equivalently, 01:07:31.500 --> 01:07:32.880 these binary values. 01:07:32.880 --> 01:07:35.310 But what was this looking like in memory? 01:07:35.310 --> 01:07:38.250 Literally, when you create a string in memory, like this, 01:07:38.250 --> 01:07:41.240 string s equals quote-unquote hi, let's consider what's going on 01:07:41.240 --> 01:07:42.615 underneath the hood, so to speak. 01:07:42.615 --> 01:07:47.490 Well, as an abstraction, a string, it's H-I exclamation point taking up, 01:07:47.490 --> 01:07:48.917 it would seem, 3 bytes, right?
01:07:48.917 --> 01:07:51.000 I've gotten rid of the bars, there, because if you 01:07:51.000 --> 01:07:55.650 think of a string as a type, I'm just going to use one big box of size 3. 01:07:55.650 --> 01:08:00.210 But technically, a string, we've just revealed, is an array, 01:08:00.210 --> 01:08:01.830 and the array is of size 3. 01:08:01.830 --> 01:08:03.750 So technically, if the string is called s, 01:08:03.750 --> 01:08:05.970 s bracket 0 will give you the first character, 01:08:05.970 --> 01:08:09.810 s bracket 1, the second, and s bracket 2, the third. 01:08:09.810 --> 01:08:13.290 But let me ask this question now, if this, at the end of the day, 01:08:13.290 --> 01:08:16.560 is all that's in your computer's memory, 01:08:16.560 --> 01:08:20.790 like a canvas on which to draw 0s and 1s, or numbers, or characters, 01:08:20.790 --> 01:08:22.620 or whatever, but that's it, like this 01:08:22.620 --> 01:08:25.770 is what your Mac, and PC, and phone ultimately reduce to. 01:08:25.770 --> 01:08:29.730 Suppose that I'm running a piece of software, like a text messenger, 01:08:29.730 --> 01:08:33.000 and now I write down bye exclamation point. 01:08:33.000 --> 01:08:34.860 Well, where might that go in memory? 01:08:34.860 --> 01:08:35.845 Well, it might go here. 01:08:35.845 --> 01:08:39.333 B-Y-E. And then the next thing I type might go here, here, here and so forth. 01:08:39.333 --> 01:08:41.250 My memory just might get filled up, over time, 01:08:41.250 --> 01:08:44.310 with things that you or someone else are typing. 01:08:44.310 --> 01:08:50.580 But then how does the computer know, if, potentially, B-Y-E exclamation point 01:08:50.580 --> 01:08:56.150 is right after H-I exclamation point, where one string ends and the next one 01:08:56.150 --> 01:08:56.650 begins? 01:08:58.930 --> 01:08:59.430 Right? 01:08:59.430 --> 01:09:03.070 All we have are bytes, or 0s and 1s.
01:09:03.070 --> 01:09:05.730 So if you were designing this, how would you 01:09:05.730 --> 01:09:08.280 implement some kind of delimiter between the two? 01:09:08.280 --> 01:09:10.260 Or figure out what the length of a string is? 01:09:10.260 --> 01:09:11.010 What do you think? 01:09:11.010 --> 01:09:12.148 AUDIENCE: A nul character. 01:09:12.148 --> 01:09:15.107 DAVID MALAN: OK, so the right answer is to use a nul character, 01:09:15.107 --> 01:09:17.190 and for those who don't know, what does that mean? 01:09:17.190 --> 01:09:19.492 AUDIENCE: It's special. 01:09:19.492 --> 01:09:21.450 DAVID MALAN: Yeah, so it's a special character. 01:09:21.450 --> 01:09:23.520 Let me describe it as a sentinel character. 01:09:23.520 --> 01:09:25.575 Humans decided some time ago that, you know 01:09:25.575 --> 01:09:28.560 what, if we want to delineate where one string ends 01:09:28.560 --> 01:09:32.010 and where the next one begins, we just need some special symbol. 01:09:32.010 --> 01:09:35.189 And the symbol they used is generally written as backslash 0. 01:09:35.189 --> 01:09:39.555 This is just shorthand notation for literally eight 0 bits. 01:09:39.555 --> 01:09:42.540 0, 0, 0, 0, 0, 0, 0, 0. 01:09:42.540 --> 01:09:46.140 And the nickname for eight 0 bits, in this context, 01:09:46.140 --> 01:09:48.930 is nul, N-U-L, so to speak. 01:09:48.930 --> 01:09:51.910 And we can actually see this as follows. 01:09:51.910 --> 01:09:53.913 If you look at the corresponding decimal numbers, 01:09:53.913 --> 01:09:56.580 like you could do by doing out the math or doing the conversion, 01:09:56.580 --> 01:10:01.560 like we've done in code, you would see, for storing hi, 72, 73, 33, 01:10:01.560 --> 01:10:06.600 but then 1 extra byte that's sort of invisibly there, but that is all 0s. 01:10:06.600 --> 01:10:09.120 And now I've just written it as the decimal number 0.
01:10:09.120 --> 01:10:12.120 The implication of this is that the computer is apparently 01:10:12.120 --> 01:10:16.695 using not 3 bytes to store a word like hi, but 4 bytes: 01:10:16.695 --> 01:10:22.050 whatever the length of the string is, plus 1 for this special sentinel value 01:10:22.050 --> 01:10:24.640 that demarcates the end of the string. 01:10:24.640 --> 01:10:26.680 So we might draw it like this instead. 01:10:26.680 --> 01:10:31.350 And this character is, again, pronounced nul, or written N-U-L. 01:10:31.350 --> 01:10:32.319 So that's all, right? 01:10:32.319 --> 01:10:35.069 If humans, at the end of the day, just have this canvas of memory, 01:10:35.069 --> 01:10:36.902 they just needed to decide, all right, well, 01:10:36.902 --> 01:10:39.990 how do we distinguish one string from another? 01:10:39.990 --> 01:10:42.660 It's a lot easier with chars, individually, 01:10:42.660 --> 01:10:45.450 or with ints, or with floats. Why? 01:10:45.450 --> 01:10:49.620 Because, per that chart earlier, every character is always 1 byte. 01:10:49.620 --> 01:10:51.810 Every int is always 4 bytes. 01:10:51.810 --> 01:10:54.750 Every long is always 8 bytes. 01:10:54.750 --> 01:10:56.279 How long is a string? 01:10:56.279 --> 01:10:59.760 Well, hi is 1, 2, 3 with an exclamation point. 01:10:59.760 --> 01:11:03.029 Bye is 1, 2, 3, 4 with an exclamation point. 01:11:03.029 --> 01:11:06.450 David is D-A-V-I-D, five without an exclamation point. 01:11:06.450 --> 01:11:10.210 And so a string can be any number of bytes long, 01:11:10.210 --> 01:11:12.700 so you somehow need to draw a line in the sand 01:11:12.700 --> 01:11:16.706 to separate in memory one string from another. 01:11:16.706 --> 01:11:19.412 So what's the implication of this? 01:11:19.412 --> 01:11:20.870 Well, let me go back to code, here. 01:11:20.870 --> 01:11:22.210 Let's actually poke around.
01:11:22.210 --> 01:11:27.130 This is a bit dangerous, but I'm going to start looking at the memory location 01:11:27.130 --> 01:11:29.210 just past the visible characters of my string here. 01:11:29.210 --> 01:11:33.250 So let me go ahead and recompile, make hi. 01:11:33.250 --> 01:11:35.110 Whoops, what did I do here? 01:11:35.110 --> 01:11:36.680 I forgot a format code. 01:11:36.680 --> 01:11:38.620 Let me add one more %i. 01:11:38.620 --> 01:11:42.550 Now let me go ahead and rerun make hi, ./hi, Enter. 01:11:42.550 --> 01:11:43.580 There it is. 01:11:43.580 --> 01:11:46.660 So you can actually see in the computer, unbeknownst to you 01:11:46.660 --> 01:11:49.830 previously, that there's indeed something else going on there. 01:11:49.830 --> 01:11:52.880 And if I were to make one other variant of this program-- 01:11:52.880 --> 01:11:55.630 let's get rid of just this one word and let's have two. 01:11:55.630 --> 01:11:57.550 So let me give myself another string called t, 01:11:57.550 --> 01:12:01.810 for instance, just a common convention, with bye exclamation point. 01:12:01.810 --> 01:12:04.900 Let me then print out s with %s. 01:12:04.900 --> 01:12:10.785 And let me also print out with %s, whoops, printf, print out t, as well. 01:12:10.785 --> 01:12:14.320 Let me recompile this program, and obviously the out-- 01:12:14.320 --> 01:12:17.470 ugh-- this is what happens when I go too fast. 01:12:17.470 --> 01:12:20.740 All right, third mistake today, a close quote, 01:12:20.740 --> 01:12:22.030 which I was missing. 01:12:22.030 --> 01:12:23.590 Make hi. 01:12:23.590 --> 01:12:25.000 Fourth mistake today. 01:12:25.000 --> 01:12:26.200 Make hi. 01:12:26.200 --> 01:12:27.490 Dot slash hi. 01:12:27.490 --> 01:12:28.210 OK, voila. 01:12:28.210 --> 01:12:30.610 Now we have a program that's printing both hi and bye, 01:12:30.610 --> 01:12:34.720 only so that we can consider what's going on in the computer's memory.
01:12:34.720 --> 01:12:40.210 If s is storing hi and apparently one bonus byte that 01:12:40.210 --> 01:12:43.240 demarcates the end of that string, bye is apparently 01:12:43.240 --> 01:12:46.413 going to fit into the location directly after. 01:12:46.413 --> 01:12:49.330 And it's wrapping around, but that's just an artist's rendition, here. 01:12:49.330 --> 01:12:52.000 But bye, B-Y-E exclamation point, is taking up 01:12:52.000 --> 01:12:58.948 1, 2, 3, 4, plus a fifth byte, as well. 01:12:58.948 --> 01:13:03.580 All right, any questions on this underlying representation of strings? 01:13:03.580 --> 01:13:05.560 And we'll contextualize this, before long, 01:13:05.560 --> 01:13:07.840 so that this isn't just like, OK, who really cares? 01:13:07.840 --> 01:13:10.730 This is going to be the source of actually implementing things. 01:13:10.730 --> 01:13:13.510 In fact, for problem set 2, like cryptography, and encryption, 01:13:13.510 --> 01:13:15.468 and scrambling actual human messages. 01:13:15.468 --> 01:13:16.510 But some questions first. 01:13:16.510 --> 01:13:20.650 AUDIENCE: So normally if you were to not use string, 01:13:20.650 --> 01:13:23.480 you would just make a character array that would declare 01:13:23.480 --> 01:13:26.580 how many characters there are so you know how many characters are 01:13:26.580 --> 01:13:27.330 going to be there. 01:13:27.330 --> 01:13:29.480 DAVID MALAN: A good question, too, and let 01:13:29.480 --> 01:13:32.115 me summarize it as: if we were instead to use chars all the time, 01:13:32.115 --> 01:13:35.240 we would indeed have to know in advance how many chars you want for a given 01:13:35.240 --> 01:13:38.750 string that you're storing, so how, then, does something like get string work, 01:13:38.750 --> 01:13:41.000 because when we at CS50 wrote the get string function, 01:13:41.000 --> 01:13:43.190 we obviously didn't know how long the words are 01:13:43.190 --> 01:13:45.020 going to be that you all are typing in.
01:13:45.020 --> 01:13:48.560 It turns out, as we'll see two weeks from now, that get string 01:13:48.560 --> 01:13:51.320 uses a technique known as dynamic memory allocation. 01:13:51.320 --> 01:13:55.770 And it's going to grow or shrink the array automatically for you. 01:13:55.770 --> 01:13:57.050 But more on that soon. 01:13:57.050 --> 01:13:57.920 Other questions? 01:13:57.920 --> 01:14:01.450 AUDIENCE: Why are we using a nul value? 01:14:01.450 --> 01:14:02.725 Isn't that wasting a byte? 01:14:02.725 --> 01:14:03.850 DAVID MALAN: Good question. 01:14:03.850 --> 01:14:06.880 Why are we using a nul value? Isn't it wasting a byte? 01:14:06.880 --> 01:14:07.630 Yes. 01:14:07.630 --> 01:14:13.210 But I claim there's really no other way to distinguish the end of one string 01:14:13.210 --> 01:14:19.748 from the start of another, unless we make some sort of notation in memory. 01:14:19.748 --> 01:14:22.540 All we have, at the end of the day, inside of a computer, are bits. 01:14:22.540 --> 01:14:25.900 Therefore, all we can do is spin those bits in some creative way 01:14:25.900 --> 01:14:27.520 to solve this problem. 01:14:27.520 --> 01:14:30.710 So we're minimally going to spend 1 byte to solve this problem. 01:14:30.710 --> 01:14:31.210 Yeah? 01:14:31.210 --> 01:14:35.897 AUDIENCE: How does our memory device know to start a new line when you type 01:14:35.897 --> 01:14:39.270 the \n if we don't have it stored as a char? 01:14:39.270 --> 01:14:40.910 DAVID MALAN: If you don't-- 01:14:40.910 --> 01:14:44.690 how does the computer know to move to a next line when you have a \n? 01:14:44.690 --> 01:14:47.990 So \n, even though it looks like two characters, 01:14:47.990 --> 01:14:51.890 is actually stored as just 1 byte in the computer's memory. 01:14:51.890 --> 01:14:54.357 There's a mapping between it and an actual number. 01:14:54.357 --> 01:14:57.440 And you can see that, for instance, on the ASCII chart from the other day.
01:14:57.440 --> 01:15:01.224 AUDIENCE: So with that being stored would be the [INAUDIBLE]. 01:15:01.224 --> 01:15:02.420 DAVID MALAN: It would be. 01:15:02.420 --> 01:15:08.210 If I had put a \n in my code here, right after the exclamation point here 01:15:08.210 --> 01:15:11.840 and here, that would actually shift everything in memory because we would 01:15:11.840 --> 01:15:16.740 need to make room for a \n here and another one over here. 01:15:16.740 --> 01:15:18.913 So it would take two more bytes, exactly. 01:15:18.913 --> 01:15:19.580 Other questions? 01:15:19.580 --> 01:15:26.050 AUDIENCE: So if hi exclamation point is written in binary, and in ASCII, 01:15:26.050 --> 01:15:32.630 too, as 72, 73, 33, if we are to write those numbers in the string, 01:15:32.630 --> 01:15:39.090 and convert them into binary, how would the computer know what's 72 01:15:39.090 --> 01:15:40.390 and what's 8? 01:15:40.390 --> 01:15:42.390 DAVID MALAN: And what's the last thing you said? 01:15:42.390 --> 01:15:43.806 AUDIENCE: 8, for example. 01:15:43.806 --> 01:15:45.700 DAVID MALAN: It's context-sensitive. 01:15:45.700 --> 01:15:48.450 So if, at the end of the day, all we're storing is these numbers, 01:15:48.450 --> 01:15:52.380 like 72, 73, 33, recall that it's up to the program 01:15:52.380 --> 01:15:55.470 to decide, based on context, how to interpret them. 01:15:55.470 --> 01:15:59.310 And I simplified this story in week 0 saying that Photoshop interprets them 01:15:59.310 --> 01:16:02.910 as RGB colors, and iMessage or a text messaging program 01:16:02.910 --> 01:16:07.440 interprets them as letters, and Excel interprets them as numbers. 01:16:07.440 --> 01:16:12.540 How those programs do it is by way of variables of types like string, and int, 01:16:12.540 --> 01:16:13.080 and float.
01:16:13.080 --> 01:16:14.872 And in fact, later this semester, we'll see 01:16:14.872 --> 01:16:19.500 a data type via which you can represent a color as a triple of numbers, 01:16:19.500 --> 01:16:22.240 a red value, a green value, and a blue value. 01:16:22.240 --> 01:16:24.600 So we'll see other data types as well. 01:16:24.600 --> 01:16:25.100 Yeah? 01:16:25.100 --> 01:16:29.320 AUDIENCE: It seems easy enough to just add a nul thing at the end of the word, 01:16:29.320 --> 01:16:32.190 so why do we have integers and long integers? 01:16:32.190 --> 01:16:35.192 Why can't we make everything variable in its data size? 01:16:35.192 --> 01:16:36.900 DAVID MALAN: Really interesting question. 01:16:36.900 --> 01:16:40.110 Why could we not just make all data types variable in size? 01:16:40.110 --> 01:16:43.560 And some languages, some libraries do exactly this. 01:16:43.560 --> 01:16:47.100 C is an older language, from when memory was expensive 01:16:47.100 --> 01:16:48.300 and memory was limited. 01:16:48.300 --> 01:16:50.640 The reality was you gained benefits from just 01:16:50.640 --> 01:16:53.010 standardizing the size of these things. 01:16:53.010 --> 01:16:55.410 You also get performance increases in the sense 01:16:55.410 --> 01:16:59.620 that if you know every int is 4 bytes, you can very quickly, 01:16:59.620 --> 01:17:02.220 and we'll see this next week, jump from one integer to another, 01:17:02.220 --> 01:17:06.600 to another in memory just by adding 1 inside of those square brackets, which moves 4 bytes at a time. 01:17:06.600 --> 01:17:08.430 You can very quickly poke around. 01:17:08.430 --> 01:17:11.522 Whereas, if you had variable-length numbers, you would have to, 01:17:11.522 --> 01:17:13.980 kind of, follow, follow, follow, looking for the end of it. 01:17:13.980 --> 01:17:16.780 Follow, follow-- you would have to look at more locations in memory. 01:17:16.780 --> 01:17:18.322 So that's a topic we'll come back to. 01:17:18.322 --> 01:17:20.700 But it was generally for efficiency.
01:17:20.700 --> 01:17:22.170 Another question, yeah? 01:17:22.170 --> 01:17:27.942 AUDIENCE: Why not store the nul character [INAUDIBLE] 01:17:27.942 --> 01:17:31.520 DAVID MALAN: Good question. Why not store the-- 01:17:31.520 --> 01:17:35.540 why not store the nul character at the beginning? 01:17:35.540 --> 01:17:41.890 You could-- let's see, why not store it at the beginning? 01:17:41.890 --> 01:17:45.080 You could do that. 01:17:45.080 --> 01:17:48.325 You could absolutely-- well, could you do this? 01:17:51.580 --> 01:17:56.380 If you were to do that at the beginning-- 01:17:56.380 --> 01:17:57.400 short answer, no. 01:17:57.400 --> 01:17:58.420 OK, now I retract that. 01:17:58.420 --> 01:18:00.628 No, because I finally thought of a problem with this. 01:18:00.628 --> 01:18:02.483 If you store it at the beginning instead, 01:18:02.483 --> 01:18:04.900 we'll see in just a moment how you can actually write code 01:18:04.900 --> 01:18:07.150 to figure out where the end of a string is, 01:18:07.150 --> 01:18:09.550 and the problem there is you wouldn't necessarily 01:18:09.550 --> 01:18:13.000 know, if you eventually hit a 0, whether it's the end of the string, 01:18:13.000 --> 01:18:16.810 or the number 0 in the context of Excel using some memory, 01:18:16.810 --> 01:18:20.180 or part of some other data type, altogether. 01:18:20.180 --> 01:18:22.600 So the fact that we've standardized-- 01:18:22.600 --> 01:18:26.560 the fact that we've standardized strings as ending with nul 01:18:26.560 --> 01:18:30.655 means that we can reliably distinguish one variable from another in memory. 01:18:30.655 --> 01:18:32.560 And that's actually a perfect segue, now, 01:18:32.560 --> 01:18:35.693 to actually using this primitive to build up 01:18:35.693 --> 01:18:38.360 our own code that manipulates these things that are lower level. 01:18:38.360 --> 01:18:39.560 So let me do this. 01:18:39.560 --> 01:18:41.650 Let me create a new file called length.
01:18:41.650 --> 01:18:46.000 And let's use this basic idea to figure out what the length of a string 01:18:46.000 --> 01:18:50.720 is after it's been stored in a variable. 01:18:50.720 --> 01:18:51.860 So let's do this. 01:18:51.860 --> 01:18:56.530 Let me include both the CS50 header and the standard I/O header, 01:18:56.530 --> 01:19:01.250 give myself int main(void) again here, and inside of main, do this. 01:19:01.250 --> 01:19:04.060 Let me prompt the user for a string s and I'll ask them 01:19:04.060 --> 01:19:08.170 for a string like their name, here. 01:19:08.170 --> 01:19:13.420 And then let me call it, more verbosely, name this time. 01:19:13.420 --> 01:19:15.170 Now let me go ahead and do this. 01:19:15.170 --> 01:19:20.260 Let me iterate over every character in this string 01:19:20.260 --> 01:19:22.180 in order to figure out what its length is. 01:19:22.180 --> 01:19:25.060 So initially, I'm going to go ahead and say this, 01:19:25.060 --> 01:19:28.040 int length equals 0, because I don't know what it is yet. 01:19:28.040 --> 01:19:29.290 So we're going to start at 0. 01:19:29.290 --> 01:19:32.410 And then while the following is true-- 01:19:32.410 --> 01:19:37.370 while-- let me-- do I want to do this? 01:19:37.370 --> 01:19:40.060 Let me change this to i, just for clarity. Let me do 01:19:40.060 --> 01:19:45.790 this: while name bracket i does not equal that special nul character. 01:19:45.790 --> 01:19:49.180 So I typed it on the slide as N-U-L, but you don't write N-U-L in code; 01:19:49.180 --> 01:19:53.665 you actually use its escape-sequence equivalent, which is \0 in single quotes. 01:19:53.665 --> 01:19:58.930 While name bracket i does not equal the nul character, I'm going to go ahead 01:19:58.930 --> 01:20:02.470 and increment i with i plus plus. 01:20:02.470 --> 01:20:05.470 And then down here I'm going to print out the value of i 01:20:05.470 --> 01:20:09.270 to see what we actually get, printing out the value of i.
01:20:09.270 --> 01:20:11.020 All right, so what's going to happen here? 01:20:11.020 --> 01:20:13.420 Let me run make length. 01:20:13.420 --> 01:20:14.740 Fortunately, no errors. 01:20:14.740 --> 01:20:19.570 ./length and let me type in something like H-I, exclamation point, Enter. 01:20:19.570 --> 01:20:20.740 And I get 3. 01:20:20.740 --> 01:20:23.950 Let me try bye, exclamation point, Enter. 01:20:23.950 --> 01:20:25.870 And I get 4. 01:20:25.870 --> 01:20:28.510 Let me try my own name, David, Enter. 01:20:28.510 --> 01:20:29.970 5, and so forth. 01:20:29.970 --> 01:20:31.880 So what's actually going on here? 01:20:31.880 --> 01:20:34.490 Well, it seems that, with this while loop, 01:20:34.490 --> 01:20:36.622 we start with a local variable called 01:20:36.622 --> 01:20:39.580 i initialized to 0, because we're figuring out the length of the string 01:20:39.580 --> 01:20:40.580 as we go. 01:20:40.580 --> 01:20:44.050 I'm then asking the question, does location 0, 01:20:44.050 --> 01:20:49.300 that is, bracket i in the name string, which we now know is an array, 01:20:49.300 --> 01:20:51.700 does it not equal \0? 01:20:51.700 --> 01:20:55.645 Because if it doesn't, that means it's an actual character like H, or B, or D. 01:20:55.645 --> 01:20:57.640 So let's increment i. 01:20:57.640 --> 01:21:00.910 Then, let's come back around to line 9 and let's ask the question again. 01:21:00.910 --> 01:21:02.590 Now i equals 1. 01:21:02.590 --> 01:21:06.420 So does name bracket 1 not equal \0? 01:21:06.420 --> 01:21:12.070 Well, if it doesn't, and it won't if it's an i, or a y, or an a, 01:21:12.070 --> 01:21:15.490 based on what I typed in, we're going to increment i once more. 01:21:15.490 --> 01:21:18.940 Fast-forward to the end of the story, once I get to the end of the string, 01:21:18.940 --> 01:21:22.420 technically, one byte past the last visible character, 01:21:22.420 --> 01:21:25.510 name bracket i will equal \0.
01:21:25.510 --> 01:21:29.960 So I don't increment i anymore; I end up just printing the result. 01:21:29.960 --> 01:21:34.510 So what we seem to have here with some low-level C code, just this while loop, 01:21:34.510 --> 01:21:39.070 is a program that figures out the length of a given string that's been typed in. 01:21:39.070 --> 01:21:41.860 Let's practice our abstraction and decompose this into, 01:21:41.860 --> 01:21:43.270 maybe, a helper function here. 01:21:43.270 --> 01:21:47.110 Let me grab all of this code here, and assume, 01:21:47.110 --> 01:21:51.580 for the sake of discussion for a moment, that I can call a function now called 01:21:51.580 --> 01:21:53.740 string length. 01:21:53.740 --> 01:21:56.830 And the string whose length I want to get is name, 01:21:56.830 --> 01:22:01.000 and then I'll go ahead and print out, just as before with %i, 01:22:01.000 --> 01:22:02.398 the length of that string. 01:22:02.398 --> 01:22:04.690 So now I'm abstracting away this notion of figuring out 01:22:04.690 --> 01:22:05.732 the length of the string. 01:22:05.732 --> 01:22:08.470 That's an opportunity for me to create my own function. 01:22:08.470 --> 01:22:11.515 If I want to create a function called string length, 01:22:11.515 --> 01:22:15.610 I'll claim that I want to take a string as input, 01:22:15.610 --> 01:22:20.860 and what should I have this function return as its return type? 01:22:20.860 --> 01:22:26.090 What should string length presumably return? 01:22:26.090 --> 01:22:26.590 Yeah? 01:22:26.590 --> 01:22:27.430 AUDIENCE: Int. 01:22:27.430 --> 01:22:28.270 DAVID MALAN: An int, right? 01:22:28.270 --> 01:22:29.020 An int makes sense. 01:22:29.020 --> 01:22:30.937 Float really wouldn't make sense because we're 01:22:30.937 --> 01:22:33.377 measuring things that are integers, 01:22:33.377 --> 01:22:34.960 in this case, the length of something. 01:22:34.960 --> 01:22:36.640 So indeed, let's have it return an int.
01:22:36.640 --> 01:22:39.380 I can use the same code as before, so I'm 01:22:39.380 --> 01:22:42.175 going to paste what I cut earlier in the file. 01:22:42.175 --> 01:22:46.660 The only thing I have to change is the name of the variable. 01:22:46.660 --> 01:22:50.240 Because now, in this function, I decided arbitrarily 01:22:50.240 --> 01:22:53.130 that I'm going to call the input s, just to be more generic. 01:22:53.130 --> 01:22:55.915 So I'm going to look at s bracket i at each location. 01:22:55.915 --> 01:22:58.790 And I don't want to print it at the end; that would be a side effect. 01:22:58.790 --> 01:23:01.250 What's the line of code I should include here if I actually 01:23:01.250 --> 01:23:04.005 want to hand back the total length? 01:23:04.005 --> 01:23:04.505 Yeah? 01:23:04.505 --> 01:23:05.362 AUDIENCE: Return i. 01:23:05.362 --> 01:23:06.320 DAVID MALAN: Say again? 01:23:06.320 --> 01:23:07.112 AUDIENCE: Return i. 01:23:07.112 --> 01:23:09.270 DAVID MALAN: Return i, in this case. 01:23:09.270 --> 01:23:11.540 So I'm going to return i, not print it. 01:23:11.540 --> 01:23:16.490 Because now, my main function can use the return value stored in length 01:23:16.490 --> 01:23:18.530 and print it on the next line itself. 01:23:18.530 --> 01:23:22.520 I just need a prototype, so that's my one forgivable copy-paste here. 01:23:22.520 --> 01:23:24.170 I'm going to rerun make length. 01:23:24.170 --> 01:23:25.640 Hopefully I didn't screw up. 01:23:25.640 --> 01:23:29.330 I didn't. ./length, I'll type in hi-- oops-- 01:23:29.330 --> 01:23:31.340 I'll type in hi, again. 01:23:31.340 --> 01:23:31.880 That works. 01:23:31.880 --> 01:23:34.970 I'll type in bye again, and so forth. 01:23:34.970 --> 01:23:38.703 So now we have a function that determines the length of a string. 01:23:38.703 --> 01:23:41.120 Well, it turns out we didn't actually need this all along. 01:23:41.120 --> 01:23:46.042 It turns out that we can get rid of my own custom string length function here.
01:23:46.042 --> 01:23:48.500 I can definitely delete the whole implementation down here. 01:23:48.500 --> 01:23:52.160 Because it turns out, in a file called string.h, 01:23:52.160 --> 01:23:55.520 which is a new header file today, we actually have access to a function 01:23:55.520 --> 01:23:59.690 called, more succinctly, strlen, S-T-R-L-E-N, which 01:23:59.690 --> 01:24:01.130 literally does that. 01:24:01.130 --> 01:24:05.240 This is a function that comes with C, albeit in the string.h header file, 01:24:05.240 --> 01:24:09.450 and it does what we just implemented manually. 01:24:09.450 --> 01:24:13.340 So here's an example of, admittedly, a wheel we just reinvented, but no more. 01:24:13.340 --> 01:24:14.480 We don't have to do that. 01:24:14.480 --> 01:24:16.850 And how do you know what kinds of functions exist? 01:24:16.850 --> 01:24:21.260 Well, let me pop over to my browser here, to a website that 01:24:21.260 --> 01:24:24.455 is CS50's incarnation of what are called manual pages. 01:24:24.455 --> 01:24:28.070 It turns out that in a lot of systems, Macs, and Unix, 01:24:28.070 --> 01:24:31.100 and Linux systems, including the Visual Studio Code 01:24:31.100 --> 01:24:33.020 instance that we have in the cloud, there 01:24:33.020 --> 01:24:36.290 are publicly accessible manual pages for functions. 01:24:36.290 --> 01:24:39.770 They tend to be written very expertly, in a way that's 01:24:39.770 --> 01:24:41.160 not very beginner-friendly. 01:24:41.160 --> 01:24:45.650 So what we have here, at manual.cs50.io, is CS50's version 01:24:45.650 --> 01:24:48.740 of manual pages, with this less-comfortable mode that 01:24:48.740 --> 01:24:51.290 gives you, sort of, a cheat sheet of very frequently used, 01:24:51.290 --> 01:24:55.010 helpful functions in C. And we've translated the expert 01:24:55.010 --> 01:24:58.075 notation to things that a beginner can understand. 01:24:58.075 --> 01:25:02.190 So, for instance, let me go ahead and search for string up at the top here.
01:25:02.190 --> 01:25:06.200 You'll see that there's documentation for our own get string function, 01:25:06.200 --> 01:25:08.510 but more interestingly down here, there's 01:25:08.510 --> 01:25:10.850 a whole bunch of string-related functions 01:25:10.850 --> 01:25:12.620 that we haven't even seen most of, yet. 01:25:12.620 --> 01:25:14.660 But there's indeed one here called strlen, 01:25:14.660 --> 01:25:16.620 calculate the length of a string. 01:25:16.620 --> 01:25:22.160 And so if I go to strlen here, I'll see some less-comfortable documentation 01:25:22.160 --> 01:25:22.970 for this function. 01:25:22.970 --> 01:25:25.400 And the way a manual page typically works, 01:25:25.400 --> 01:25:28.310 whether in CS50's format or any other system, 01:25:28.310 --> 01:25:30.950 is that you see, typically, a synopsis of what header 01:25:30.950 --> 01:25:33.330 files you need to use the function. 01:25:33.330 --> 01:25:35.960 So you would copy-paste these couple of lines here. 01:25:35.960 --> 01:25:39.530 You see what the prototype is of the function so 01:25:39.530 --> 01:25:42.533 that you know what its inputs are, if any, and its outputs are, if any. 01:25:42.533 --> 01:25:45.200 Then down below you might see a description, which in this case, 01:25:45.200 --> 01:25:46.320 is pretty straightforward. 01:25:46.320 --> 01:25:48.170 This function calculates the length of s. 01:25:48.170 --> 01:25:51.110 Then you see what the return value is, if any, 01:25:51.110 --> 01:25:54.310 and you might even see an example, like this one that we've whipped up here. 01:25:54.310 --> 01:25:57.012 So these manual pages, which are, again, accessible 01:25:57.012 --> 01:25:59.720 here, and we'll link to these in the problem sets moving forward, 01:25:59.720 --> 01:26:02.510 are pretty much the place to start when you want to figure out: 01:26:02.510 --> 01:26:05.210 has a wheel been invented already?
01:26:05.210 --> 01:26:08.490 Is there a function that might help me solve some problem set problems 01:26:08.490 --> 01:26:11.900 so that I don't have to really get into the weeds of doing all 01:26:11.900 --> 01:26:13.712 of those lower-level steps as I've had to? 01:26:13.712 --> 01:26:16.670 Sometimes the answer is going to be yes, sometimes it's going to be no. 01:26:16.670 --> 01:26:19.160 But again, the point of our having just done this together 01:26:19.160 --> 01:26:21.950 is to reveal that even the functions you start taking for 01:26:21.950 --> 01:26:26.135 granted, they all reduce to some of these basic building blocks. 01:26:26.135 --> 01:26:29.600 At the end of the day, all that's inside of your computer 01:26:29.600 --> 01:26:30.950 is 0s and 1s. 01:26:30.950 --> 01:26:33.060 We're just learning, now, how to harness those 01:26:33.060 --> 01:26:37.220 and how to manipulate them ourselves. 01:26:37.220 --> 01:26:41.510 Any questions here on this? 01:26:41.510 --> 01:26:43.305 Any questions at all? 01:26:43.305 --> 01:26:43.805 Yeah. 01:26:43.805 --> 01:26:51.779 AUDIENCE: We did just see [INAUDIBLE] Is that so common 01:26:51.779 --> 01:26:54.035 that we would have to specify it, or is it not? 01:26:54.035 --> 01:26:55.160 DAVID MALAN: Good question. 01:26:55.160 --> 01:26:57.920 Is it so common that you would have to specify it or not? 01:26:57.920 --> 01:27:00.170 You do need to include its header file because that's 01:27:00.170 --> 01:27:01.670 where all of those prototypes are. 01:27:01.670 --> 01:27:05.190 You don't need to worry about linking it in with -l anything. 01:27:05.190 --> 01:27:07.340 And in fact, moving forward, you do not ever 01:27:07.340 --> 01:27:10.910 need to worry about linking in libraries when compiling your code. 01:27:10.910 --> 01:27:14.940 We, the staff, have configured make to do all of that for you automatically.
01:27:14.940 --> 01:27:17.030 We want you to understand that it is doing it, 01:27:17.030 --> 01:27:19.340 but we'll take care of all of the -l's for you. 01:27:19.340 --> 01:27:23.360 But the onus is on you for the prototypes and the header files. 01:27:23.360 --> 01:27:27.150 Other questions on these representations or techniques? 01:27:27.150 --> 01:27:27.650 Yeah? 01:27:27.650 --> 01:27:35.920 AUDIENCE: [INAUDIBLE] exclamation mark. 01:27:35.920 --> 01:27:40.524 How does it actually define the spaces [INAUDIBLE]? 01:27:40.524 --> 01:27:41.920 DAVID MALAN: A good question. 01:27:41.920 --> 01:27:45.700 If you were to have a string with actual spaces in it, that is, multiple words, 01:27:45.700 --> 01:27:47.530 what would the computer actually do? 01:27:47.530 --> 01:27:49.960 Well, for this, let me go to asciichart.com, 01:27:49.960 --> 01:27:54.880 which is just a random website that's my go-to for the first 128 characters 01:27:54.880 --> 01:27:55.930 of ASCII. 01:27:55.930 --> 01:27:58.520 This is, in fact, what we had a screenshot of the other day. 01:27:58.520 --> 01:28:02.088 And if you look here, it's a little non-obvious, but S-P is space. 01:28:02.088 --> 01:28:05.380 If a computer were to store a space, it would actually store the decimal number 01:28:05.380 --> 01:28:10.430 32, or technically, the pattern of 0s and 1s that represents the number 32. 01:28:10.430 --> 01:28:13.240 All of the US English keys that you might type on a keyboard 01:28:13.240 --> 01:28:16.390 can be represented with a number, and using Unicode you 01:28:16.390 --> 01:28:18.920 can express even things like emojis and other languages. 01:28:18.920 --> 01:28:19.420 Yeah? 01:28:19.420 --> 01:28:23.130 AUDIENCE: Are only strings followed by a nul, 01:28:23.130 --> 01:28:26.516 or let's say we had a series of numbers, would each one of them 01:28:26.516 --> 01:28:27.845 be accompanied by nuls? 01:28:27.845 --> 01:28:28.970 DAVID MALAN: Good question.
01:28:28.970 --> 01:28:31.790 Only strings are accompanied by nuls at the end 01:28:31.790 --> 01:28:34.760 because every other data type we've talked about thus far 01:28:34.760 --> 01:28:37.130 is of well-defined, finite length. 01:28:37.130 --> 01:28:40.190 1 byte for a char, 4 bytes for an int, and so forth. 01:28:40.190 --> 01:28:44.240 If we think back to last week, we did end the week with a couple of problems. 01:28:44.240 --> 01:28:48.080 Integer overflow, because 4 bytes, heck, even 8 bytes, is sometimes not enough. 01:28:48.080 --> 01:28:50.270 We also talked about floating-point imprecision. 01:28:50.270 --> 01:28:53.480 Thankfully, in the world of scientific computing and financial computing, 01:28:53.480 --> 01:28:56.930 there are libraries you can use that draw inspiration 01:28:56.930 --> 01:28:58.820 from this idea of a string, and they might 01:28:58.820 --> 01:29:02.640 use 9 bytes for an integer value or maybe 20 bytes 01:29:02.640 --> 01:29:04.170 so that you can count really high. 01:29:04.170 --> 01:29:06.680 But they will then start to manage that memory for you 01:29:06.680 --> 01:29:09.960 and what they're really probably doing is just grabbing a whole bunch of bytes 01:29:09.960 --> 01:29:13.070 and somehow remembering how long the sequence of bytes is. 01:29:13.070 --> 01:29:16.190 That's how these higher-level libraries work, too. 01:29:16.190 --> 01:29:17.700 All right, this has been a lot. 01:29:17.700 --> 01:29:19.080 Let's take one more break here. 01:29:19.080 --> 01:29:20.670 We'll do a seven-minute break here. 01:29:20.670 --> 01:29:23.465 And when we come back, we'll flesh out a few more details. 01:29:23.465 --> 01:29:26.390 All right. 01:29:26.390 --> 01:29:31.400 So we just saw strlen as an example of a function that 01:29:31.400 --> 01:29:32.898 comes in the string library. 01:29:32.898 --> 01:29:35.690 Let's start to take more of these library functions out for a spin.
01:29:35.690 --> 01:29:39.530 So we're not relying only on the built-ins that we saw last week. 01:29:39.530 --> 01:29:41.660 Let me switch over to VS Code 01:29:41.660 --> 01:29:46.040 and create a file called, say, string.c, 01:29:46.040 --> 01:29:48.115 to apply this lesson learned, as follows. 01:29:48.115 --> 01:29:54.770 Let me include cs50.h, stdio.h, and this new thing, 01:29:54.770 --> 01:29:57.260 string.h as well, at the top. 01:29:57.260 --> 01:29:59.698 I'm going to do the usual int main(void) here. 01:29:59.698 --> 01:30:02.240 And then in this program suppose, for the sake of discussion, 01:30:02.240 --> 01:30:05.540 that I didn't know about %s for printf or, heck, 01:30:05.540 --> 01:30:09.300 maybe early on there was no %s format code. 01:30:09.300 --> 01:30:12.420 And so there was no easy way to print strings. 01:30:12.420 --> 01:30:15.830 Well, at least if we know that strings are just arrays of characters, 01:30:15.830 --> 01:30:19.820 we could use %c as a workaround, a solution to that, 01:30:19.820 --> 01:30:21.420 sort of, contrived problem. 01:30:21.420 --> 01:30:24.920 So let me ask myself for a string s by using get string here, 01:30:24.920 --> 01:30:27.500 and I'll ask the user for some input. 01:30:27.500 --> 01:30:33.260 And then, let me print out, say, output, and all I want to do is print back out 01:30:33.260 --> 01:30:34.460 what the user typed. 01:30:34.460 --> 01:30:38.000 Now, the simplest way to do this, of course, is going to be like last week, 01:30:38.000 --> 01:30:40.960 printf %s, and plug in the s, and we're done. 01:30:40.960 --> 01:30:43.730 But again, for the sake of discussion, I forgot about, 01:30:43.730 --> 01:30:47.820 or someone didn't implement %s, so how else could we do this? 01:30:47.820 --> 01:30:51.800 Well, in pseudocode, or in English, what's the gist of how we could solve 01:30:51.800 --> 01:30:58.910 this problem, printing out the string s on the screen without using %s?
01:30:58.910 --> 01:31:02.420 How might we go about solving this? 01:31:02.420 --> 01:31:04.147 Just in English, high-level? 01:31:04.147 --> 01:31:05.730 What would your pseudocode look like? 01:31:05.730 --> 01:31:06.230 Yeah? 01:31:06.230 --> 01:31:09.568 AUDIENCE: You could just print each letter. 01:31:09.568 --> 01:31:11.360 DAVID MALAN: OK, so just print each letter. 01:31:11.360 --> 01:31:13.490 And maybe, more precisely, some kind of loop. 01:31:13.490 --> 01:31:17.030 Like, let's iterate over all of the characters in s 01:31:17.030 --> 01:31:18.150 and print one at a time. 01:31:18.150 --> 01:31:19.290 So how can I do that? 01:31:19.290 --> 01:31:24.050 Well, for int i gets 0 is kind of the go-to starting point for most loops, 01:31:24.050 --> 01:31:25.580 i is less than-- 01:31:25.580 --> 01:31:27.365 OK, how long do I want to iterate? 01:31:27.365 --> 01:31:29.240 Well, it's going to depend on what I type in, 01:31:29.240 --> 01:31:31.300 but that's why we have strlen now. 01:31:31.300 --> 01:31:36.080 So iterate up to the length of s, and then increment i with plus 01:31:36.080 --> 01:31:37.075 plus on each iteration. 01:31:37.075 --> 01:31:40.670 And then let's just print out %c with no new line, 01:31:40.670 --> 01:31:43.010 because I want everything on the same line, 01:31:43.010 --> 01:31:47.780 whatever the character is at s bracket i. 01:31:47.780 --> 01:31:49.790 And then at the very end, I'll give myself 01:31:49.790 --> 01:31:52.350 that new line, just to move the cursor down to the next line 01:31:52.350 --> 01:31:54.350 so the dollar sign is not in a weird place. 01:31:54.350 --> 01:31:57.230 All right, so let's see if I didn't screw up any of the code, 01:31:57.230 --> 01:32:02.690 make string, Enter, so far so good, ./string, and let me type in something 01:32:02.690 --> 01:32:04.520 like, hi, Enter. 01:32:04.520 --> 01:32:06.020 And I see output of hi, too.
01:32:06.020 --> 01:32:09.680 Let me do it once more with bye, Enter, and that works, too. 01:32:09.680 --> 01:32:12.410 Notice I very deliberately and quickly gave myself 01:32:12.410 --> 01:32:15.260 two spaces here and one space here just because I, literally, 01:32:15.260 --> 01:32:18.620 wanted these things to line up properly, and input is shorter than output. 01:32:18.620 --> 01:32:21.830 But that was just a deliberate formatting detail. 01:32:21.830 --> 01:32:23.520 So this code is correct, 01:32:23.520 --> 01:32:29.240 which is a claim I've made before, but is it well-designed? 01:32:29.240 --> 01:32:33.170 It is well-designed in that I'm using someone else's library function, 01:32:33.170 --> 01:32:35.660 like, I've not reinvented a wheel, there's no line 15 01:32:35.660 --> 01:32:38.270 or below, I didn't implement string length myself. 01:32:38.270 --> 01:32:43.640 So I'm at least practicing what I've preached. 01:32:43.640 --> 01:32:48.360 But there's still an imperfection, a suboptimality. 01:32:48.360 --> 01:32:50.910 This one's really subtle, though. 01:32:50.910 --> 01:32:54.330 And you have to think about how loops work. 01:32:54.330 --> 01:32:58.640 What am I doing that's not super efficient? 01:32:58.640 --> 01:32:59.870 Yeah, in back? 01:32:59.870 --> 01:33:03.178 AUDIENCE: [INAUDIBLE] over and over again. 01:33:03.178 --> 01:33:04.970 DAVID MALAN: Yeah, this is a little subtle. 01:33:04.970 --> 01:33:07.460 But if you think back to the basic definition of a for loop 01:33:07.460 --> 01:33:10.070 and recall when I highlighted things last week, what happens? 01:33:10.070 --> 01:33:12.830 Well, the first thing is that i gets set to 0. 01:33:12.830 --> 01:33:14.310 Then we check the condition. 01:33:14.310 --> 01:33:15.560 How do we check the condition?
01:33:15.560 --> 01:33:18.380 We call strlen on s, we get back an answer 01:33:18.380 --> 01:33:24.810 like 3 if it's H-I, exclamation point, and 0 is less than 3, so that's fine, 01:33:24.810 --> 01:33:26.570 and then we print out the character. 01:33:26.570 --> 01:33:29.060 Then we increment i from 0 to 1. 01:33:29.060 --> 01:33:30.468 We recheck the condition. 01:33:30.468 --> 01:33:31.760 How do I recheck the condition? 01:33:31.760 --> 01:33:34.100 I call strlen of s. 01:33:34.100 --> 01:33:36.890 Get back the same answer, 3. 01:33:36.890 --> 01:33:38.720 Compare 1 against 3. 01:33:38.720 --> 01:33:39.800 We're still good. 01:33:39.800 --> 01:33:44.690 So we print out another character. i gets incremented again, i is now 2. 01:33:44.690 --> 01:33:46.035 We check the condition. 01:33:46.035 --> 01:33:46.910 What's the condition? 01:33:46.910 --> 01:33:47.960 Well, what's the string length of s? 01:33:47.960 --> 01:33:48.980 It's still 3. 01:33:48.980 --> 01:33:51.860 2 is still less than 3. 01:33:51.860 --> 01:33:55.430 So I keep asking the same question, sort of stupidly, 01:33:55.430 --> 01:33:58.220 because the string is, presumably, never changing in length. 01:33:58.220 --> 01:34:00.158 And indeed, every time I check that condition, 01:34:00.158 --> 01:34:01.700 that function is going to get called. 01:34:01.700 --> 01:34:04.380 And every time, the answer for hi is going to be 3. 01:34:04.380 --> 01:34:04.880 3. 01:34:04.880 --> 01:34:06.095 3. 01:34:06.095 --> 01:34:10.850 So it's a marginal suboptimality, but I could do better, right? 01:34:10.850 --> 01:34:15.560 Don't ask the same question multiple times when you can remember the answer. 01:34:15.560 --> 01:34:20.960 So how could I remember the answer to this question and ask it just once? 01:34:20.960 --> 01:34:24.750 How could I remember the answer to this question? 01:34:24.750 --> 01:34:25.250 Let me see. 01:34:25.250 --> 01:34:26.030 Yeah, back there?
01:34:26.030 --> 01:34:27.446 AUDIENCE: Store it in a variable. 01:34:27.446 --> 01:34:29.180 DAVID MALAN: So store it in a variable, right? 01:34:29.180 --> 01:34:32.097 That's been our answer most any time we want to keep something around. 01:34:32.097 --> 01:34:33.120 So how could I do this? 01:34:33.120 --> 01:34:37.880 Well, I could do something like this, int, maybe, length equals strlen of s. 01:34:37.880 --> 01:34:41.200 Then I can just change this function call. 01:34:41.200 --> 01:34:43.160 Let me fix my spelling here. 01:34:43.160 --> 01:34:47.360 Let me fix this to be comparing against length, and this is now OK. 01:34:47.360 --> 01:34:50.240 Because now strlen is only called once on line 9. 01:34:50.240 --> 01:34:52.740 And I'm reusing the value of that variable, a.k.a. 01:34:52.740 --> 01:34:54.240 length, again, and again, and again. 01:34:54.240 --> 01:34:55.282 So that's more efficient. 01:34:55.282 --> 01:34:59.760 It turns out that for loops let you declare multiple variables of the same type at once, 01:34:59.760 --> 01:35:04.020 so we can do this a little more elegantly, all in one line. 01:35:04.020 --> 01:35:06.770 And this is just some syntactic improvement. 01:35:06.770 --> 01:35:11.930 I could actually do something like this, n equals strlen of s, 01:35:11.930 --> 01:35:14.750 and then I could just say n here or I could call it length. 01:35:14.750 --> 01:35:17.667 But heck, while I'm being succinct, I'm just going to use n for number. 01:35:17.667 --> 01:35:22.100 So now it's just a marginal change, but I've now 01:35:22.100 --> 01:35:26.030 declared two variables inside of my loop, i and n. 01:35:26.030 --> 01:35:29.300 i is set to 0. n is set to the string length of s. 01:35:29.300 --> 01:35:33.380 But now, hereafter, all of my condition checks are just, i less than n, 01:35:33.380 --> 01:35:36.170 i less than n, and n is never changing. 01:35:36.170 --> 01:35:38.008 All right, so a marginal improvement there.
01:35:38.008 --> 01:35:39.800 Now that I've used this new function, let's 01:35:39.800 --> 01:35:41.925 use some other functions that might be of interest. 01:35:41.925 --> 01:35:48.680 Let me write a quick program here that capitalizes the beginning of-- 01:35:48.680 --> 01:35:51.810 changes to uppercase some string that the user types in. 01:35:51.810 --> 01:35:55.490 So let me code a file called uppercase.c. 01:35:55.490 --> 01:36:01.520 Up here I'll use my new friends, cs50.h, and standard I/O, and string.h. 01:36:01.520 --> 01:36:07.070 So, just as before, int main(void). 01:36:07.070 --> 01:36:09.620 And then inside of main, what I'm going to do this time 01:36:09.620 --> 01:36:14.390 is ask the user for a string s, using get string, asking them 01:36:14.390 --> 01:36:15.680 for the before value. 01:36:15.680 --> 01:36:20.130 And then let me print out something like after. 01:36:20.130 --> 01:36:24.410 So that it-- just so I can see what the uppercase version thereof is. 01:36:24.410 --> 01:36:28.610 And then after this, let me do the following, for int, i 01:36:28.610 --> 01:36:32.030 equals 0, oh, let's practice that same lesson, 01:36:32.030 --> 01:36:37.790 so n equals the string length of s, i is less than n, i plus plus. 01:36:37.790 --> 01:36:41.600 So really, nothing fundamentally new yet. 01:36:41.600 --> 01:36:47.270 How do I now convert characters from lowercase, if they are, to uppercase? 01:36:47.270 --> 01:36:50.000 In other words, if I type in hi, H-I in lowercase, 01:36:50.000 --> 01:36:55.490 I want my program, now, to uppercase everything to capital H, capital I. 01:36:55.490 --> 01:36:58.770 Well, how can I go about doing this? 01:36:58.770 --> 01:37:01.010 Well, you might recall that there is this-- 01:37:01.010 --> 01:37:03.900 you might recall that there is this ASCII chart. 01:37:03.900 --> 01:37:06.855 So let's just consult this real quick on asciichart.com.
01:37:06.855 --> 01:37:11.510 We looked at this last week. Notice that a-- capital A is 65, 01:37:11.510 --> 01:37:15.440 capital B is 66, capital C is 67, and heck, here's 01:37:15.440 --> 01:37:19.640 lowercase a, lowercase b, lowercase c, and that's 97, 98, 99. 01:37:19.640 --> 01:37:22.980 And if I actually do some math, there's a distance of 32. 01:37:22.980 --> 01:37:23.480 Right? 01:37:23.480 --> 01:37:25.640 So if I want to go from uppercase to lowercase, 01:37:25.640 --> 01:37:30.788 I can do 65 plus 32, which will give me 97, and that actually works out 01:37:30.788 --> 01:37:32.330 across the board for everything else. 01:37:32.330 --> 01:37:36.020 66 plus 32 gets me to 98 or lowercase b. 01:37:36.020 --> 01:37:40.640 Or conversely, if you have a lowercase a, and its value is 97, 01:37:40.640 --> 01:37:46.850 subtract 32 and boom, you have capital A. So there's some arithmetic involved. 01:37:46.850 --> 01:37:49.460 But now that we know that strings are just arrays, 01:37:49.460 --> 01:37:53.330 and we know that characters, which are in those arrays, 01:37:53.330 --> 01:37:56.450 are just binary representations of numbers, 01:37:56.450 --> 01:37:59.297 I think we can manipulate a few of these things as follows. 01:37:59.297 --> 01:38:01.130 Let me go back to my program here, and first 01:38:01.130 --> 01:38:05.360 ask the question, if the current character in the array during this loop 01:38:05.360 --> 01:38:08.930 is lowercase, let's force it to uppercase. 01:38:08.930 --> 01:38:10.250 So how am I going to do that?
01:38:10.250 --> 01:38:16.460 If the character at s bracket i, the current location in the array, 01:38:16.460 --> 01:38:21.320 is greater than or equal to lowercase a, and s bracket 01:38:21.320 --> 01:38:26.660 i is less than or equal to lowercase z, kind of a weird Boolean 01:38:26.660 --> 01:38:31.460 expression, but it's completely legitimate, because in this array 01:38:31.460 --> 01:38:34.230 s is a whole bunch of characters that the humans typed in, 01:38:34.230 --> 01:38:37.520 because that's what a string is, greater than or equal to a might 01:38:37.520 --> 01:38:39.680 be a little nonsensical because when have you ever 01:38:39.680 --> 01:38:41.330 compared numbers to letters? 01:38:41.330 --> 01:38:47.568 But we know from week 0 that lowercase a is 97, lowercase z is, what is it, 1? 01:38:47.568 --> 01:38:48.485 I don't even remember. 01:38:48.485 --> 01:38:49.065 AUDIENCE: 132. 01:38:49.065 --> 01:38:49.850 DAVID MALAN: What's that? 01:38:49.850 --> 01:38:50.590 AUDIENCE: 132? 01:38:50.590 --> 01:38:52.590 DAVID MALAN: 132, we know. 01:38:52.590 --> 01:38:56.390 And so that would allow us to answer the question, is the current letter 01:38:56.390 --> 01:38:57.410 lowercase? 01:38:57.410 --> 01:39:00.530 All right, so let me answer that question. 01:39:00.530 --> 01:39:03.140 If it is, what do I want to print out? 01:39:03.140 --> 01:39:05.870 I don't want to print out the letter itself, 01:39:05.870 --> 01:39:09.290 I want to print out the letter minus 32, right? 01:39:09.290 --> 01:39:13.160 Because if it happens to be a lowercase a, 97, 97 minus 32 01:39:13.160 --> 01:39:15.530 gives me 65, which is uppercase A, and I know that 01:39:15.530 --> 01:39:18.860 just from having stared at that chart in the past. 01:39:18.860 --> 01:39:24.172 Else, if the character is not between little a and little z, 01:39:24.172 --> 01:39:25.880 I'm just going to print out the character 01:39:25.880 --> 01:39:28.550 itself by printing s bracket i.
01:39:28.550 --> 01:39:31.580 And at the very end of this, I'm going to print out a new line just 01:39:31.580 --> 01:39:33.480 to move the cursor to the next line. 01:39:33.480 --> 01:39:34.930 So again, it's a little wordy. 01:39:34.930 --> 01:39:39.020 But this loop here, which I borrowed from our code previously, 01:39:39.020 --> 01:39:41.510 just iterates over the string, a.k.a. 01:39:41.510 --> 01:39:44.630 array, character by character, through its length. 01:39:44.630 --> 01:39:47.360 This line 11 here is just asking the question 01:39:47.360 --> 01:39:50.870 of whether that current character, the i-th character of s, 01:39:50.870 --> 01:39:53.900 is greater than or equal to little a and less 01:39:53.900 --> 01:39:59.240 than or equal to little z, that is between 97 and 132, then 01:39:59.240 --> 01:40:04.940 we're going to go ahead and force it to uppercase instead. 01:40:04.940 --> 01:40:09.290 All right, and let me zoom out here for just a second. 01:40:09.290 --> 01:40:14.270 And sorry, I misspoke: 122, which is what you might have said. 01:40:14.270 --> 01:40:15.630 There are only 26 letters. 01:40:15.630 --> 01:40:17.270 So 122 is little z. 01:40:17.270 --> 01:40:20.280 Let me go ahead now and compile and run this program. 01:40:20.280 --> 01:40:26.210 So make uppercase, ./uppercase, and let me type in hi in lowercase, Enter. 01:40:26.210 --> 01:40:28.520 And there's the capitalized version thereof. 01:40:28.520 --> 01:40:30.920 Let me do it again, with my own name in lowercase, 01:40:30.920 --> 01:40:33.100 and now it's capitalized as well. 01:40:33.100 --> 01:40:34.860 Well, what could we do to improve this? 01:40:34.860 --> 01:40:35.360 Well. 01:40:35.360 --> 01:40:35.960 You know what? 01:40:35.960 --> 01:40:37.640 Let's stop reinventing wheels. 01:40:37.640 --> 01:40:39.840 Let's go to the manual pages. 01:40:39.840 --> 01:40:43.490 So let me go here and search for something like, I don't know, 01:40:43.490 --> 01:40:44.540 lowercase.
01:40:44.540 --> 01:40:45.620 And there I go. 01:40:45.620 --> 01:40:48.470 I did some autocomplete here; our little search box 01:40:48.470 --> 01:40:50.720 is saying that, OK, there's an islower function, 01:40:50.720 --> 01:40:52.550 check whether a character is lowercase. 01:40:52.550 --> 01:40:53.640 Well, how do I use this? 01:40:53.640 --> 01:40:59.150 Well, let me check islower; now I see the actual man page for this function. 01:40:59.150 --> 01:41:01.850 Now we see, include ctype.h. 01:41:01.850 --> 01:41:02.902 So that's the protot-- 01:41:02.902 --> 01:41:04.610 that's the header file I need to include. 01:41:04.610 --> 01:41:08.570 This is the prototype for islower; it apparently takes a char as input 01:41:08.570 --> 01:41:10.330 and returns an int. 01:41:10.330 --> 01:41:11.330 Which is a little weird. 01:41:11.330 --> 01:41:14.400 I feel like islower should return true or false. 01:41:14.400 --> 01:41:18.680 So let's scroll down to the description and return value. 01:41:18.680 --> 01:41:20.810 It returns, oh, this is interesting. 01:41:20.810 --> 01:41:25.370 And this is a convention in C. This function returns a non-zero int 01:41:25.370 --> 01:41:30.820 if c is a lowercase letter and 0 if c is not a lowercase letter. 01:41:30.820 --> 01:41:33.230 So it returns non-zero. 01:41:33.230 --> 01:41:38.330 So like 1, negative 1, something that's not 0, if c is a lowercase letter, 01:41:38.330 --> 01:41:41.400 and 0 if it is not a lowercase letter. 01:41:41.400 --> 01:41:43.160 So how can we use this building block? 01:41:43.160 --> 01:41:45.230 Let me go back to my code here. 01:41:45.230 --> 01:41:49.610 Let me add this header file, include ctype.h. 01:41:49.610 --> 01:41:53.120 And down here, let me get rid of this cryptic expression, which 01:41:53.120 --> 01:41:59.060 was kind of painful to come up with, and just ask this: islower of s bracket i? 01:42:01.970 --> 01:42:05.390 That should actually work, but why?
01:42:05.390 --> 01:42:10.520 Well, islower, again, returns a non-zero value if the letter is lowercase. 01:42:10.520 --> 01:42:12.150 Well, what does that mean? 01:42:12.150 --> 01:42:13.415 That means it could return 1. 01:42:13.415 --> 01:42:14.540 It could return negative 1. 01:42:14.540 --> 01:42:16.370 It could return 50 or negative 50. 01:42:16.370 --> 01:42:18.650 It's actually not precisely defined. Why? 01:42:18.650 --> 01:42:19.700 Just because. 01:42:19.700 --> 01:42:23.750 This was a common convention to use 0 to represent false and use 01:42:23.750 --> 01:42:26.120 any other value to represent true. 01:42:26.120 --> 01:42:30.140 And so it turns out that inside of Boolean expressions, 01:42:30.140 --> 01:42:34.755 if you put a value, like a function call like this, that returns 0, 01:42:34.755 --> 01:42:36.380 that's going to be equivalent to false. 01:42:36.380 --> 01:42:38.975 It's like the answer being no, it is not lower. 01:42:38.975 --> 01:42:41.990 But you can also, in parentheses, put the name 01:42:41.990 --> 01:42:45.920 of the function and its arguments, and not compare it against anything. 01:42:45.920 --> 01:42:51.230 Because we could do something like this: well, if it's not equal to 0, then 01:42:51.230 --> 01:42:52.247 it must be lowercase. 01:42:52.247 --> 01:42:54.830 Because that's the definition: if it returns a non-zero value, 01:42:54.830 --> 01:42:55.760 it's lowercase. 01:42:55.760 --> 01:42:59.210 But a more succinct way to do that is just a bit more like English. 01:42:59.210 --> 01:43:04.110 If it's islower, then print out the character minus 32. 01:43:04.110 --> 01:43:06.590 So this would be the common way of using one of these 01:43:06.590 --> 01:43:10.025 is- functions to check if the answer is true or false. 01:43:10.025 --> 01:43:12.810 AUDIENCE: [INAUDIBLE] 01:43:12.810 --> 01:43:14.670 DAVID MALAN: OK, well, we might be done. 01:43:14.670 --> 01:43:15.170 OK.
01:43:15.170 --> 01:43:16.922 AUDIENCE: [INAUDIBLE] 01:43:16.922 --> 01:43:17.900 DAVID MALAN: No. 01:43:17.900 --> 01:43:19.520 So it's not necessarily 1. 01:43:19.520 --> 01:43:23.180 It would be incorrect to check for 1, or negative 1, or anything else. 01:43:23.180 --> 01:43:25.550 You want to check for the opposite of 0. 01:43:25.550 --> 01:43:26.870 So not equal 0. 01:43:26.870 --> 01:43:31.820 Or more succinctly, like I did by just putting it into parentheses. 01:43:31.820 --> 01:43:34.560 Let me see what happens here. 01:43:34.560 --> 01:43:38.690 So this is great, but some of you might have spotted a better solution 01:43:38.690 --> 01:43:39.680 to this problem. 01:43:39.680 --> 01:43:42.230 A moment ago when we were on the manual pages searching 01:43:42.230 --> 01:43:45.380 for things related to lowercase, what might be another building 01:43:45.380 --> 01:43:46.475 block we can employ here? 01:43:49.160 --> 01:43:50.700 Based on what's on the screen here? 01:43:50.700 --> 01:43:51.200 Yeah? 01:43:51.200 --> 01:43:52.888 AUDIENCE: To-upper. 01:43:52.888 --> 01:43:54.140 DAVID MALAN: So to-upper. 01:43:54.140 --> 01:43:57.098 There's a function that would literally do the uppercasing thing for me 01:43:57.098 --> 01:44:00.032 so I don't have to get into the weeds of negative 32, plus 32. 01:44:00.032 --> 01:44:01.490 I don't have to consult that chart. 01:44:01.490 --> 01:44:05.120 Someone has solved this problem for me in the past. 01:44:05.120 --> 01:44:09.680 And let's see if I can actually get back to it. 01:44:09.680 --> 01:44:10.520 There we go. 01:44:10.520 --> 01:44:12.540 Let me go ahead, now, and use this. 01:44:12.540 --> 01:44:15.230 So instead of doing s bracket i minus 32, 01:44:15.230 --> 01:44:19.880 let's use a function that someone else wrote, and just say to-upper, s bracket 01:44:19.880 --> 01:44:20.420 i. 01:44:20.420 --> 01:44:23.250 And now it's going to do the solution for me. 
01:44:23.250 --> 01:44:30.530 So if I rerun make uppercase, and then do, slowly, ./uppercase, type in hi, 01:44:30.530 --> 01:44:32.120 now it's working as expected. 01:44:32.120 --> 01:44:35.870 And honestly, if I read the documentation for to-upper 01:44:35.870 --> 01:44:39.170 by going back to its man page, or manual page, what you'll see 01:44:39.170 --> 01:44:44.420 is that it says if it's lowercase, it will return the uppercase version 01:44:44.420 --> 01:44:45.050 thereof. 01:44:45.050 --> 01:44:48.913 If it's not lowercase, say it's already uppercase or it's punctuation, 01:44:48.913 --> 01:44:50.705 it will just return the original character. 01:44:50.705 --> 01:44:53.900 Which means, thanks to this function, I can actually 01:44:53.900 --> 01:44:57.650 tighten this up significantly, get rid of all of my conditionals 01:44:57.650 --> 01:45:02.030 there, and just print out the to-upper return value, 01:45:02.030 --> 01:45:05.060 and leave it to whoever wrote that function to figure out 01:45:05.060 --> 01:45:09.470 if something's uppercase or lowercase. 01:45:09.470 --> 01:45:13.820 All right, questions on these kinds of tricks? 01:45:13.820 --> 01:45:17.090 Again, it all reduces to week 0 basics, but we're just 01:45:17.090 --> 01:45:18.750 building these abstractions on top. 01:45:18.750 --> 01:45:19.250 Yeah? 01:45:19.250 --> 01:45:21.208 AUDIENCE: I'm wondering if there's any way just 01:45:21.208 --> 01:45:25.110 to import all packages under a certain subdomain instead 01:45:25.110 --> 01:45:27.120 of having to do multiple [INAUDIBLE] statements, 01:45:27.120 --> 01:45:28.412 kind of like a star [INAUDIBLE] 01:45:28.412 --> 01:45:29.340 DAVID MALAN: Yes. 01:45:29.340 --> 01:45:30.180 Unfortunately, no. 01:45:30.180 --> 01:45:33.120 There is no easy way in C to say, give me everything. 01:45:33.120 --> 01:45:35.670 That was for, historically, performance reasons. 01:45:35.670 --> 01:45:38.940 They want you to be explicit as to what you want to include.
01:45:38.940 --> 01:45:41.730 In other languages, like Python and Java, one of which 01:45:41.730 --> 01:45:44.513 we'll see later this term, you can say, give me everything. 01:45:44.513 --> 01:45:47.430 But that, actually, tends not to be best practice because it can slow down 01:45:47.430 --> 01:45:50.000 execution or compilation of your code. 01:45:50.000 --> 01:45:50.500 Yeah? 01:45:50.500 --> 01:45:52.845 AUDIENCE: Does to-upper accommodate for special characters? 01:45:52.845 --> 01:45:53.340 DAVID MALAN: Ah. 01:45:53.340 --> 01:45:55.980 Does to-upper accommodate special characters like punctuation? 01:45:55.980 --> 01:45:56.480 Yes. 01:45:56.480 --> 01:45:58.440 If I read the documentation more pedantically, 01:45:58.440 --> 01:45:59.710 we would see exactly that. 01:45:59.710 --> 01:46:02.940 It will properly hand me back an exclamation point 01:46:02.940 --> 01:46:04.600 if I pass one in. 01:46:04.600 --> 01:46:08.970 So if I do make uppercase here, and let me do ./upper, sorry-- 01:46:08.970 --> 01:46:13.620 ./uppercase, hi with an exclamation point, it's going to handle that, too, 01:46:13.620 --> 01:46:15.810 and pass it through unchanged. Yeah? 01:46:15.810 --> 01:46:19.200 AUDIENCE: Do we have access to a function that would do all of that 01:46:19.200 --> 01:46:21.590 but just to the screen rather than to [INAUDIBLE] 01:46:21.590 --> 01:46:23.550 DAVID MALAN: Really good question, too. 01:46:23.550 --> 01:46:28.110 No, we do not have access to a function, at least one that comes with C or comes 01:46:28.110 --> 01:46:31.740 with CS50's library, that will just force the whole thing to uppercase. 01:46:31.740 --> 01:46:34.170 In C, that's actually easier said than done. 01:46:34.170 --> 01:46:35.550 In Python, it's trivial. 01:46:35.550 --> 01:46:39.810 So stay tuned for another language that will let us do exactly that. 01:46:39.810 --> 01:46:42.510 All right, so what does this leave us with?
01:46:42.510 --> 01:46:44.520 There's just a-- let's come full circle now 01:46:44.520 --> 01:46:47.490 to where we began today, where we were talking about those command line 01:46:47.490 --> 01:46:48.090 arguments. 01:46:48.090 --> 01:46:51.810 Recall that we talked about rm taking a command line argument: 01:46:51.810 --> 01:46:54.470 the file you want to delete. We talked about clang 01:46:54.470 --> 01:46:56.220 taking command line arguments that, again, 01:46:56.220 --> 01:46:58.140 modify the behavior of the program. 01:46:58.140 --> 01:47:01.680 How is it that maybe you and I can start to write programs that 01:47:01.680 --> 01:47:03.840 actually take command line arguments? 01:47:03.840 --> 01:47:07.620 Well, here is where I can finally explain why 01:47:07.620 --> 01:47:10.740 we've been typing int main(void) for the past week 01:47:10.740 --> 01:47:14.490 and just asking that you take on faith that it's just the way you do things. 01:47:14.490 --> 01:47:20.820 Well, by default in C, at least the most recent versions thereof, 01:47:20.820 --> 01:47:24.010 there are only two official ways to write main functions. 01:47:24.010 --> 01:47:26.460 You might see other formats online, but they're generally 01:47:26.460 --> 01:47:28.870 not consistent with the current specification. 01:47:28.870 --> 01:47:32.160 This, again, was sort of the boilerplate for the simplest 01:47:32.160 --> 01:47:34.770 function we might write last week, and recall that we've 01:47:34.770 --> 01:47:36.210 been doing this the whole time. 01:47:36.210 --> 01:47:40.990 (void). What that (void) means, for all of the programs I have written thus far 01:47:40.990 --> 01:47:43.890 and you have written thus far, is that none of our programs 01:47:43.890 --> 01:47:47.040 that we've written take command line arguments. 01:47:47.040 --> 01:47:49.110 That's what the void there means.
01:47:49.110 --> 01:47:53.950 It turns out that main is also where you can specify that your program does, 01:47:53.950 --> 01:47:55.740 in fact, take command line arguments, that 01:47:55.740 --> 01:47:59.760 is, words after the command in your terminal window. 01:47:59.760 --> 01:48:02.220 If you want to actually not use get int or get string, 01:48:02.220 --> 01:48:05.970 and you want the human to be able to say something like hello David 01:48:05.970 --> 01:48:06.840 and hit Enter, 01:48:06.840 --> 01:48:09.940 and just print hello, David on the screen, 01:48:09.940 --> 01:48:14.460 you can use command line arguments, words after the program name 01:48:14.460 --> 01:48:16.750 on your command line. 01:48:16.750 --> 01:48:20.460 So we're going to change this in a moment to be something more verbose, 01:48:20.460 --> 01:48:23.930 but something that's now a bit more familiar syntactically. 01:48:23.930 --> 01:48:28.440 If you change that (void) in main to be this incantation instead, 01:48:28.440 --> 01:48:33.480 int, argc, comma, string, argv, open bracket, close bracket, 01:48:33.480 --> 01:48:36.630 you are now giving yourself access to writing programs 01:48:36.630 --> 01:48:38.910 that take command line arguments. 01:48:38.910 --> 01:48:42.120 Argc, which stands for argument count, is going 01:48:42.120 --> 01:48:46.410 to be an integer that stores how many words the human typed at the prompt. 01:48:46.410 --> 01:48:49.050 C automatically gives that to you. 01:48:49.050 --> 01:48:52.710 String argv, which stands for argument vector, is 01:48:52.710 --> 01:48:57.100 going to be an array of all of the words that the human typed at the prompt. 01:48:57.100 --> 01:48:59.130 So with today's building block of an array, 01:48:59.130 --> 01:49:01.980 we have the ability now to let the humans type as many words, 01:49:01.980 --> 01:49:03.900 or as few words, as they want at the prompt.
01:49:03.900 --> 01:49:06.900 C is going to automatically put them in an array called argv, 01:49:06.900 --> 01:49:12.360 and it's going to tell us how many words there are in an int called argc. 01:49:12.360 --> 01:49:16.060 The int, as the return type here, we'll come back to in just a moment. 01:49:16.060 --> 01:49:19.350 Let's use this definition to make, maybe, 01:49:19.350 --> 01:49:20.970 just a couple of simple programs. 01:49:20.970 --> 01:49:23.070 But in problem set 2 we will actually use 01:49:23.070 --> 01:49:26.470 this to control the behavior of your own code. 01:49:26.470 --> 01:49:33.120 Let me code up a file called argv.0 just to keep it aptly named. 01:49:33.120 --> 01:49:35.700 Let me include cs50.h. 01:49:35.700 --> 01:49:37.240 Let me go ahead and include-- 01:49:37.240 --> 01:49:37.740 oops. 01:49:37.740 --> 01:49:40.950 That is not the right name for a program; let's start that over. 01:49:40.950 --> 01:49:45.450 Let's go ahead and code up argv.c. 01:49:45.450 --> 01:49:46.800 And here we have-- 01:49:46.800 --> 01:49:52.890 include cs50.h, include stdio.h, int, main, not void, 01:49:52.890 --> 01:50:00.025 let's actually say int, argc, string, argv, open bracket, close bracket. 01:50:00.025 --> 01:50:02.400 No numbers in between, because you don't know, in advance, 01:50:02.400 --> 01:50:05.310 how many words the human's going to type at their prompt. 01:50:05.310 --> 01:50:06.760 Now let's go ahead and do this. 01:50:06.760 --> 01:50:10.800 Let's write a very simple program that just says hello, David, hello, Carter, 01:50:10.800 --> 01:50:12.660 whatever the name is that gets typed. 01:50:12.660 --> 01:50:16.260 But instead of using get string, let's have the human just 01:50:16.260 --> 01:50:19.890 type their name at the prompt, just like rm, just like clang, just like make, 01:50:19.890 --> 01:50:22.170 so it's just one and done when you hit Enter. 01:50:22.170 --> 01:50:23.610 No additional prompts.
01:50:23.610 --> 01:50:28.380 Let me go ahead then and do this: printf, quote-unquote, hello, 01:50:28.380 --> 01:50:31.500 comma, and instead of world today, I want to print out 01:50:31.500 --> 01:50:33.370 whatever the human typed in. 01:50:33.370 --> 01:50:38.850 So let's go ahead and do this, argv, bracket 0 for now. 01:50:38.850 --> 01:50:43.080 But I don't think this is quite what I want because, of course, 01:50:43.080 --> 01:50:48.370 that's going to literally print out argv, bracket, 0, bracket. 01:50:48.370 --> 01:50:52.510 I need a placeholder, so let me put %s here and then put that here. 01:50:52.510 --> 01:50:56.520 So if argv is an array, but it's an array of strings, 01:50:56.520 --> 01:51:00.480 then argv bracket 0 is itself a single string. 01:51:00.480 --> 01:51:03.450 And so it can be plugged into that %s placeholder. 01:51:03.450 --> 01:51:05.740 Let me go ahead and save my program 01:51:05.740 --> 01:51:09.340 and compile argv. So far, so good. 01:51:09.340 --> 01:51:13.170 Let me now type in my name after the name of the program. 01:51:13.170 --> 01:51:13.980 So no get string. 01:51:13.980 --> 01:51:18.280 I'm literally typing an extra word, my own name, at the prompt. Enter. 01:51:18.280 --> 01:51:21.290 OK, it's apparently a little buggy in a couple of ways. 01:51:21.290 --> 01:51:24.500 I forgot my \n, but that's not a huge deal. 01:51:24.500 --> 01:51:28.960 But apparently, inside of argv is literally everything 01:51:28.960 --> 01:51:31.270 that the human typed in, including the name of the program. 01:51:31.270 --> 01:51:36.250 So logically, how do I print out hello, David, or hello, so-and-so, and not 01:51:36.250 --> 01:51:37.720 the actual name of the program? 01:51:37.720 --> 01:51:38.960 What needs to change here? 01:51:38.960 --> 01:51:39.460 Yeah? 01:51:39.460 --> 01:51:41.050 AUDIENCE: Change the index to 1. 01:51:41.050 --> 01:51:41.800 DAVID MALAN: Yeah.
01:51:41.800 --> 01:51:45.940 So presumably index 1, if that's the second thing I, or whichever human, 01:51:45.940 --> 01:51:46.940 has typed at the prompt. 01:51:46.940 --> 01:51:51.410 So let's do make argv again, ./argv, Enter. 01:51:51.410 --> 01:51:52.090 Huh. 01:51:52.090 --> 01:51:53.630 Hello, (null). 01:51:53.630 --> 01:51:55.690 So this is another form of null. 01:51:55.690 --> 01:51:59.320 But this is user error, now, on my part. 01:51:59.320 --> 01:52:01.070 I didn't do exactly what I said I would. 01:52:01.070 --> 01:52:01.570 Yeah? 01:52:01.570 --> 01:52:02.530 AUDIENCE: You forgot the parameter. 01:52:02.530 --> 01:52:04.430 DAVID MALAN: Yeah, I forgot the parameter. 01:52:04.430 --> 01:52:05.700 So that's actually, hm. 01:52:05.700 --> 01:52:07.450 I should probably deal with that, somehow, 01:52:07.450 --> 01:52:09.292 so that people aren't breaking my program 01:52:09.292 --> 01:52:11.000 and printing out random things, like null. 01:52:11.000 --> 01:52:14.770 But if I do say ./argv David, now you see hello, David. 01:52:14.770 --> 01:52:18.070 I can get a little curious, like what's at location 2? 01:52:18.070 --> 01:52:23.410 Well, we can see: argv bracket 2, make argv, ./argv David, Enter. 01:52:23.410 --> 01:52:24.910 All right, so just nothing is there. 01:52:24.910 --> 01:52:28.202 But it turns out, in a couple of weeks, we'll start really poking around memory 01:52:28.202 --> 01:52:30.310 and see if we can't crash programs deliberately, 01:52:30.310 --> 01:52:32.800 because nothing is stopping me from saying, 01:52:32.800 --> 01:52:36.470 oh, what's at location 2 million, for instance? 01:52:36.470 --> 01:52:38.350 We could really start to get curious. 01:52:38.350 --> 01:52:40.420 But for now, we'll do the right thing. 01:52:40.420 --> 01:52:44.360 But let's now make sure the human has typed in the right number of words.
01:52:44.360 --> 01:52:50.920 So let's say this: if argc equals 2, that is, the name of the program 01:52:50.920 --> 01:52:54.760 and one more word after that, go ahead and trust that argv 1, 01:52:54.760 --> 01:52:56.980 as you proposed, is the person's name. 01:52:56.980 --> 01:53:01.810 Else, let's go ahead and default here to something simple and basic, 01:53:01.810 --> 01:53:05.860 like, well, if we don't get a name from the user, just say hello, world, 01:53:05.860 --> 01:53:07.300 like always. 01:53:07.300 --> 01:53:10.045 So now we're programming defensively. 01:53:10.045 --> 01:53:13.090 This time, even if the human screws up, if they don't give us a name 01:53:13.090 --> 01:53:15.965 or they give us too many names, we're just going to say hello, world, 01:53:15.965 --> 01:53:17.890 because I now have some error handling here. 01:53:17.890 --> 01:53:22.030 Because, again, argc is argument count, the number of words, total, 01:53:22.030 --> 01:53:23.990 typed at the command line. 01:53:23.990 --> 01:53:26.740 So make argv, ./argv. 01:53:26.740 --> 01:53:28.540 Let me make the same mistake as before. 01:53:28.540 --> 01:53:29.050 OK. 01:53:29.050 --> 01:53:30.910 I don't get this weird null behavior. 01:53:30.910 --> 01:53:32.350 I get something well-defined. 01:53:32.350 --> 01:53:33.610 I could now do David. 01:53:33.610 --> 01:53:36.850 I could do David Malan, but that's not currently supported. 01:53:36.850 --> 01:53:41.290 I would need to alter my logic to support more than just two words 01:53:41.290 --> 01:53:42.345 at the prompt. 01:53:42.345 --> 01:53:43.770 So what's the point of this? 01:53:43.770 --> 01:53:45.520 At the moment, it's just a simple exercise 01:53:45.520 --> 01:53:50.702 to actually give myself a way of taking user input when they run the program. 01:53:50.702 --> 01:53:52.660 Because, consider, it's just more convenient in 01:53:52.660 --> 01:53:54.670 this new, command-line-interface world.
01:53:54.670 --> 01:53:58.857 If you had to use get string every time you compile your code, 01:53:58.857 --> 01:54:00.190 it'd be kind of annoying, right? 01:54:00.190 --> 01:54:03.940 You type make, then you might get a prompt, what would you like to make? 01:54:03.940 --> 01:54:07.690 Then you type in hello, or cash, or something else, then you hit Enter, 01:54:07.690 --> 01:54:09.330 it just really slows the process. 01:54:09.330 --> 01:54:11.440 But in this command-line-interface world, 01:54:11.440 --> 01:54:14.770 if you support command line arguments, then you can use these little tricks. 01:54:14.770 --> 01:54:18.170 Like, scrolling up and down in your history with your arrow keys. 01:54:18.170 --> 01:54:22.430 You can just type commands more quickly because you can do it all at once. 01:54:22.430 --> 01:54:25.000 And you don't have to keep prompting the user, more 01:54:25.000 --> 01:54:27.760 pedantically, for more and more info. 01:54:27.760 --> 01:54:30.280 So any questions then on command line arguments? 01:54:30.280 --> 01:54:34.000 Which, finally, reveals why we had (void) initially, 01:54:34.000 --> 01:54:36.610 but what more we can now put in main. 01:54:36.610 --> 01:54:39.070 That's how you take command line arguments. 01:54:39.070 --> 01:54:40.500 Yeah? 01:54:40.500 --> 01:54:42.610 AUDIENCE: If you were to put-- 01:54:42.610 --> 01:54:47.320 if you were to use argv, and you were to put integers inside of it, 01:54:47.320 --> 01:54:49.923 would it still give you, like, a string? 01:54:49.923 --> 01:54:51.506 Would that still be considered string? 01:54:51.506 --> 01:54:52.923 Or would you consider [INAUDIBLE]? 01:54:52.923 --> 01:54:53.760 DAVID MALAN: Yes. 01:54:53.760 --> 01:54:56.550 If you were to type at the command line something 01:54:56.550 --> 01:55:00.660 like, not a word, but something like the number 42, 01:55:00.660 --> 01:55:03.450 that would actually be treated as a string. 01:55:03.450 --> 01:55:04.290 Why? 
01:55:04.290 --> 01:55:06.220 Because again, context matters. 01:55:06.220 --> 01:55:08.940 So if your program is currently manipulating memory 01:55:08.940 --> 01:55:12.510 as though it's characters or strings, whatever those patterns of 0s and 1s 01:55:12.510 --> 01:55:16.800 are, they will be interpreted as ASCII text, or Unicode text. 01:55:16.800 --> 01:55:20.640 If we therefore go to the chart here, that might make you wonder, well, 01:55:20.640 --> 01:55:24.510 then how do you distinguish numbers from letters in the context of something 01:55:24.510 --> 01:55:25.890 like chars and strings? 01:55:25.890 --> 01:55:34.380 Well, notice 65 is capital A, 97 is lowercase a, but also 49 is 1, and 50 is 2. 01:55:34.380 --> 01:55:37.500 So the designers of ASCII, and then later Unicode, 01:55:37.500 --> 01:55:40.680 realized, well, wait a minute, if we want to support programs 01:55:40.680 --> 01:55:43.440 that let you type things that look like numbers, 01:55:43.440 --> 01:55:46.350 even though they're not technically ints or floats, 01:55:46.350 --> 01:55:50.620 we need a way in ASCII and Unicode to represent numbers, too. 01:55:50.620 --> 01:55:51.870 So here are your numbers. 01:55:51.870 --> 01:55:55.210 And it's a little silly that we have numbers representing other numbers. 01:55:55.210 --> 01:55:57.863 But again, if you're in the world of letters and characters, 01:55:57.863 --> 01:56:00.030 you've got to come up with a mapping for everything. 01:56:00.030 --> 01:56:01.790 And notice here, here's the dot. 01:56:01.790 --> 01:56:06.390 Even if you were to represent 1.23 as a string, or as characters, 01:56:06.390 --> 01:56:10.840 even the dot now is going to be represented as an ASCII character. 01:56:10.840 --> 01:56:12.930 So again, context here matters. 01:56:12.930 --> 01:56:17.370 All right, one final example to tease apart what this int is 01:56:17.370 --> 01:56:19.840 and what it's been doing here for so long.
01:56:19.840 --> 01:56:24.780 So I'm going to add one bit of logic to a new file 01:56:24.780 --> 01:56:27.750 that I'm going to call exit.c. 01:56:27.750 --> 01:56:29.130 So an exit.c. 01:56:29.130 --> 01:56:32.880 We're going to introduce something generally known as exit statuses. 01:56:32.880 --> 01:56:34.980 It turns out this is not a feature we've used yet, 01:56:34.980 --> 01:56:37.240 but it's just useful to know about, 01:56:37.240 --> 01:56:40.350 especially when automating tests of your own code, 01:56:40.350 --> 01:56:44.115 when it comes to figuring out if a program succeeded or failed. 01:56:44.115 --> 01:56:48.870 It turns out that main has one more feature we haven't leveraged: 01:56:48.870 --> 01:56:54.330 an ability to signal to the user whether something was successful or not. 01:56:54.330 --> 01:56:57.760 And that's by way of main's return value. 01:56:57.760 --> 01:57:02.060 So I'm going to modify this program as follows, like this. 01:57:02.060 --> 01:57:04.920 Suppose I want to write a similar program that 01:57:04.920 --> 01:57:07.900 requires that the user type a word at the prompt, 01:57:07.900 --> 01:57:12.450 so that argc has to be 2 for whatever design purpose. 01:57:12.450 --> 01:57:18.990 If argc does not equal 2, I want to quit out of my program prematurely. 01:57:18.990 --> 01:57:22.590 I want to insist that the user operate the program correctly. 01:57:22.590 --> 01:57:28.800 So I might give them an error message like, missing command line argument \n. 01:57:28.800 --> 01:57:31.180 But now I want to quit out of the program. 01:57:31.180 --> 01:57:32.310 Now how can I do that? 01:57:32.310 --> 01:57:37.260 The right way, quote-unquote, to do that is to return a value from main.
01:57:37.260 --> 01:57:40.590 Now, it's a little weird, because no one called main yet, 01:57:40.590 --> 01:57:42.990 right? main just gets called automatically, 01:57:42.990 --> 01:57:45.300 but the convention is, anytime something goes 01:57:45.300 --> 01:57:50.100 wrong in a program, you should return a non-zero value from main. 01:57:50.100 --> 01:57:51.780 1 is fine as a go-to. 01:57:51.780 --> 01:57:55.470 We don't need to get into the weeds of having many different exit statuses, 01:57:55.470 --> 01:57:56.220 so to speak. 01:57:56.220 --> 01:58:01.770 But if you return 1, that is a clue to the system, the Mac, the PC, the cloud 01:58:01.770 --> 01:58:03.430 device, that something went wrong. 01:58:03.430 --> 01:58:03.930 Why? 01:58:03.930 --> 01:58:05.670 Because 1 is not 0. 01:58:05.670 --> 01:58:11.460 If everything works fine, let's go ahead and print out, quote-unquote, hello, %s, like 01:58:11.460 --> 01:58:16.620 before, with argv bracket 1. 01:58:16.620 --> 01:58:19.080 So this is just a version of the program without an else. 01:58:19.080 --> 01:58:21.390 So this is the same as doing, essentially, 01:58:21.390 --> 01:58:23.580 an else here like I did earlier. 01:58:23.580 --> 01:58:26.740 I want to signal to the computer that all is well. 01:58:26.740 --> 01:58:28.290 And so I return 0. 01:58:28.290 --> 01:58:31.650 But strictly speaking, if I'm already returning here, 01:58:31.650 --> 01:58:34.560 I don't technically need, if I really want to be nitpicky, 01:58:34.560 --> 01:58:36.870 I don't technically need the else, because the only way 01:58:36.870 --> 01:58:41.486 I'm going to get to line 11 is if I didn't already return. 01:58:41.486 --> 01:58:43.180 So what's going on here? 01:58:43.180 --> 01:58:46.530 The only new thing here, logically, is that for the first time ever, 01:58:46.530 --> 01:58:48.810 I'm returning a value from main.
01:58:48.810 --> 01:58:50.730 That's something I could always have done, 01:58:50.730 --> 01:58:55.290 because main has always been defined by us as having an int as its return value. 01:58:55.290 --> 01:58:59.880 By default, main automatically, sort of secretly, returns 0 for you. 01:58:59.880 --> 01:59:02.850 If you've never once used the return keyword, which you probably 01:59:02.850 --> 01:59:05.370 haven't in main, it just automatically returns 0 01:59:05.370 --> 01:59:07.295 and the system assumes that all went well. 01:59:07.295 --> 01:59:09.390 But now that we're starting to get a little more 01:59:09.390 --> 01:59:11.520 sophisticated with our code, if you, 01:59:11.520 --> 01:59:15.480 the programmer, know something went wrong, you can abort programs early. 01:59:15.480 --> 01:59:20.610 You can exit out of them by returning some other value, besides 0, from main. 01:59:20.610 --> 01:59:23.040 And it's fortuitous that it's an int, right? 01:59:23.040 --> 01:59:25.110 0 means everything worked. 01:59:25.110 --> 01:59:29.250 Unfortunately, in programming, there are, seemingly, an infinite number of things 01:59:29.250 --> 01:59:30.240 that can go wrong. 01:59:30.240 --> 01:59:33.210 And an int gives you some 4 billion possible values 01:59:33.210 --> 01:59:36.455 that you could, in principle, use as codes, a.k.a. exit statuses, to signify errors. 01:59:36.455 --> 01:59:39.930 So if you've ever, on your Mac or PC, gotten some weird pop-up 01:59:39.930 --> 01:59:43.320 that an error happened, sometimes there's a cryptic number in it. 01:59:43.320 --> 01:59:45.420 Maybe it's positive, maybe it's negative. 01:59:45.420 --> 01:59:50.170 It might say error code 123, or negative 49, or something like that.
01:59:50.170 --> 01:59:54.310 What you're generally seeing are these exit statuses, these return 01:59:54.310 --> 01:59:57.610 values from main in a program that someone at Microsoft, 01:59:57.610 --> 02:00:01.120 or Apple, or somewhere else wrote. Something went wrong, and 02:00:01.120 --> 02:00:05.980 they are, unnecessarily, showing you, the user, what the error code is, 02:00:05.980 --> 02:00:09.100 if only so that when you call customer support or submit a ticket, 02:00:09.100 --> 02:00:12.190 you can tell them what exit status you encountered, 02:00:12.190 --> 02:00:15.070 what error code you encountered. 02:00:15.070 --> 02:00:19.390 All right, any questions on exit statuses, 02:00:19.390 --> 02:00:24.580 which is the last of our new building blocks, for now? 02:00:24.580 --> 02:00:25.540 Any questions at all? 02:00:25.540 --> 02:00:26.040 Yeah? 02:00:26.040 --> 02:00:33.540 AUDIENCE: [INAUDIBLE] You know how if you have get string or get int, 02:00:33.540 --> 02:00:35.418 if you want to make [INAUDIBLE] 02:00:35.418 --> 02:00:36.085 DAVID MALAN: No. 02:00:36.085 --> 02:00:39.265 The question is whether you can do things again and again 02:00:39.265 --> 02:00:41.890 at the command line like you could with get string and get int, 02:00:41.890 --> 02:00:43.870 which, by default, recall, are automatically 02:00:43.870 --> 02:00:46.420 designed to keep prompting the user in their own loop 02:00:46.420 --> 02:00:49.960 until they give you an int, or a float, or the like. With command line 02:00:49.960 --> 02:00:50.740 arguments, no. 02:00:50.740 --> 02:00:52.210 You're going to get an error message, but then 02:00:52.210 --> 02:00:54.002 you're going to be returned to your prompt. 02:00:54.002 --> 02:00:57.387 And it's up to you to type it correctly the next time. 02:00:57.387 --> 02:00:57.970 Good question. 02:00:57.970 --> 02:00:58.470 Yeah? 02:00:58.470 --> 02:01:03.435 AUDIENCE: [INAUDIBLE] automatically for you.
02:01:03.435 --> 02:01:05.310 DAVID MALAN: If you do not return a value 02:01:05.310 --> 02:01:08.730 explicitly, main will automatically return 0 for you; 02:01:08.730 --> 02:01:12.640 that is the way C simply works, so it's not strictly necessary. 02:01:12.640 --> 02:01:15.510 But now that we're starting to return values explicitly 02:01:15.510 --> 02:01:18.090 if something goes wrong, it would be good practice 02:01:18.090 --> 02:01:21.480 to also start returning a value from main when something goes right 02:01:21.480 --> 02:01:23.775 and there are no errors. 02:01:23.775 --> 02:01:27.810 So let's now get out of the weeds and contextualize 02:01:27.810 --> 02:01:31.200 this for some actual problems that we'll be solving in the coming days 02:01:31.200 --> 02:01:33.130 by way of problem set 2 and beyond. 02:01:33.130 --> 02:01:35.740 So here, for instance-- 02:01:35.740 --> 02:01:39.990 So here, for instance, is a problem that you might think back 02:01:39.990 --> 02:01:43.980 to from when you were a kid: the readability of some text or some book, 02:01:43.980 --> 02:01:46.230 the grade level at which some book is written. 02:01:46.230 --> 02:01:49.740 If you're a young student, you might read at first-grade level 02:01:49.740 --> 02:01:51.240 or third-grade level in the US. 02:01:51.240 --> 02:01:53.032 Or, if you're in college, presumably you're 02:01:53.032 --> 02:01:54.945 reading at a university level. 02:01:54.945 --> 02:01:58.073 But what does it mean for text, like in a book, 02:01:58.073 --> 02:02:00.240 or in an essay, or something like that, to correspond 02:02:00.240 --> 02:02:01.590 to some kind of grade level? 02:02:01.590 --> 02:02:04.950 Well, here's a quote-- a title of a childhood book. 02:02:04.950 --> 02:02:07.590 One Fish, Two Fish, Red Fish, Blue Fish. 02:02:07.590 --> 02:02:10.840 What might the grade level be for a book that has words like this?
02:02:10.840 --> 02:02:13.590 Maybe when you were a kid, or if you have a sibling still reading 02:02:13.590 --> 02:02:16.260 these things, what might the grade level of this thing be? 02:02:18.800 --> 02:02:19.590 Any guesses? 02:02:19.590 --> 02:02:20.090 Yeah? 02:02:20.090 --> 02:02:21.257 AUDIENCE: Before grade 1. 02:02:21.257 --> 02:02:22.340 DAVID MALAN: Sorry, again? 02:02:22.340 --> 02:02:23.382 AUDIENCE: Before grade 1. 02:02:23.382 --> 02:02:25.650 DAVID MALAN: Before grade 1 is, in fact, correct. 02:02:25.650 --> 02:02:27.290 So that's for really young kids. 02:02:27.290 --> 02:02:28.230 Why is that? 02:02:28.230 --> 02:02:29.180 Well, let's consider. 02:02:29.180 --> 02:02:32.210 These are pretty simple phrases, right? 02:02:32.210 --> 02:02:33.500 One fish, two fish, red-- 02:02:33.500 --> 02:02:35.960 I mean, there aren't even verbs in these sentences; 02:02:35.960 --> 02:02:40.040 they're just nouns and adjectives, and very short sentences. 02:02:40.040 --> 02:02:42.200 And so that might be a heuristic we could use. 02:02:42.200 --> 02:02:44.810 When analyzing text, well, if the words are kind of short, 02:02:44.810 --> 02:02:47.240 the sentences are kind of short, everything's very simple, 02:02:47.240 --> 02:02:50.250 that's probably a very young, or early, grade level. 02:02:50.250 --> 02:02:53.665 And so by one formulation, it might indeed be even before grade 1, 02:02:53.665 --> 02:02:54.665 for someone quite young. 02:02:54.665 --> 02:02:55.670 How about this? 02:02:55.670 --> 02:02:58.022 Mr. and Mrs. Dursley, of number four, Privet Drive, 02:02:58.022 --> 02:03:00.980 were proud to say that they were perfectly normal, thank you very much. 02:03:00.980 --> 02:03:02.960 They were the last people you would expect 02:03:02.960 --> 02:03:05.120 to be involved in anything strange or mysterious, 02:03:05.120 --> 02:03:07.850 because they just didn't hold with such nonsense. 02:03:07.850 --> 02:03:08.782 And, onward.
02:03:08.782 --> 02:03:10.490 All right, what grade level is this book? 02:03:10.490 --> 02:03:11.778 AUDIENCE: Third. 02:03:11.778 --> 02:03:13.070 DAVID MALAN: OK, I heard third. 02:03:13.070 --> 02:03:14.585 AUDIENCE: What? 02:03:14.585 --> 02:03:15.980 DAVID MALAN: Seventh, fifth. 02:03:15.980 --> 02:03:17.150 OK, all over the place. 02:03:17.150 --> 02:03:20.540 But grade 7, according to one particular measure. 02:03:20.540 --> 02:03:24.802 And we can debate exactly what age you were when you read this, 02:03:24.802 --> 02:03:27.260 and maybe you're feeling ahead of your time, or behind, now. 02:03:27.260 --> 02:03:31.470 But here, we have a snippet of text. 02:03:31.470 --> 02:03:36.560 What makes this text assume an older audience, a more mature audience, 02:03:36.560 --> 02:03:39.690 a higher grade level, would you think? 02:03:39.690 --> 02:03:40.190 Yeah? 02:03:40.190 --> 02:03:42.415 AUDIENCE: [INAUDIBLE] 02:03:42.415 --> 02:03:45.110 DAVID MALAN: Yeah, it's longer, different types of words, 02:03:45.110 --> 02:03:47.513 there are commas now in phrases, and so forth. 02:03:47.513 --> 02:03:49.680 So there's just some kind of sophistication to this. 02:03:49.680 --> 02:03:52.280 So it turns out, for the upcoming problem set, 02:03:52.280 --> 02:03:55.370 among the things you'll do is take, as input, texts like this 02:03:55.370 --> 02:03:56.510 and analyze them. 02:03:56.510 --> 02:03:59.072 Considering, well, how many words are in the text? 02:03:59.072 --> 02:04:00.530 How many sentences are in the text? 02:04:00.530 --> 02:04:02.375 How many letters are in the text? 02:04:02.375 --> 02:04:06.170 And use those according to a well-defined formula to determine what, 02:04:06.170 --> 02:04:09.680 exactly, the grade level of some actual text-- there's the third-- 02:04:09.680 --> 02:04:10.582 might actually be. 02:04:10.582 --> 02:04:12.790 Well, what else are we going to do in the coming days?
02:04:12.790 --> 02:04:15.410 Well, I've alluded to this notion of cryptography in the past: 02:04:15.410 --> 02:04:18.350 this notion of scrambling information in such a way 02:04:18.350 --> 02:04:21.422 that you can hide the contents of a message 02:04:21.422 --> 02:04:23.630 from someone who might otherwise intercept it, right? 02:04:23.630 --> 02:04:26.130 The earliest form of this might also be when you're younger, 02:04:26.130 --> 02:04:29.390 and you're in class, and you're passing a note from one person to another, 02:04:29.390 --> 02:04:30.650 from yourself to someone else. 02:04:30.650 --> 02:04:32.960 You don't necessarily want to write a note in English, 02:04:32.960 --> 02:04:35.120 or some other written language; you might want 02:04:35.120 --> 02:04:37.430 to scramble it somehow, or encrypt it. 02:04:37.430 --> 02:04:40.460 Maybe you change the As to Bs, and the Bs to Cs, 02:04:40.460 --> 02:04:42.770 so that if the teacher snaps it up and intercepts it, 02:04:42.770 --> 02:04:45.200 they can't actually understand what it is you've 02:04:45.200 --> 02:04:47.160 written, because it's encrypted. 02:04:47.160 --> 02:04:49.610 So long as your friend, the recipient of this note, 02:04:49.610 --> 02:04:51.890 knows how you manipulated it, 02:04:51.890 --> 02:04:55.640 how much you added to or subtracted from each letter, 02:04:55.640 --> 02:04:58.850 they can decrypt it, which is to reverse that process. 02:04:58.850 --> 02:05:02.070 So formally, in the world of cryptography and computer science, 02:05:02.070 --> 02:05:04.130 this is another problem to solve. 02:05:04.130 --> 02:05:07.173 Your input, when you have a message you want to send securely, 02:05:07.173 --> 02:05:08.840 is what's generally known as plaintext.
02:05:08.840 --> 02:05:12.980 There's some algorithm that's going to then encipher, or encrypt, 02:05:12.980 --> 02:05:16.100 that information into what's called ciphertext, which 02:05:16.100 --> 02:05:18.650 is the scrambled version that, theoretically, can safely 02:05:18.650 --> 02:05:21.110 be intercepted without your message being spoiled, 02:05:21.110 --> 02:05:24.620 unless the interceptor actually knows what algorithm 02:05:24.620 --> 02:05:27.150 you used inside of this process. 02:05:27.150 --> 02:05:29.720 That algorithm is generally known as a cipher. 02:05:29.720 --> 02:05:33.080 Ciphers typically take, though, not one input, but two. 02:05:33.080 --> 02:05:37.685 If, for instance, your cipher is as simple as A becomes B, 02:05:37.685 --> 02:05:41.420 B becomes C, C becomes D, dot dot dot, Z becomes A, 02:05:41.420 --> 02:05:45.140 you're essentially adding one to every letter and encrypting it. 02:05:45.140 --> 02:05:47.750 Now that 1 would be what we call the key. 02:05:47.750 --> 02:05:51.470 You and the recipient both have to agree, presumably before class, 02:05:51.470 --> 02:05:55.280 in advance, on what number you're going to use that day to rotate, 02:05:55.280 --> 02:05:56.960 or change, all of these letters by. 02:05:56.960 --> 02:06:00.410 Because when you add 1, they, upon receiving your ciphertext, 02:06:00.410 --> 02:06:03.090 have to subtract 1 to get back the original message. 02:06:03.090 --> 02:06:07.730 For instance, if the input, the plaintext, is HI!, as before, 02:06:07.730 --> 02:06:13.010 and the key is 1, the ciphertext, using this simple rotational algorithm, 02:06:13.010 --> 02:06:17.720 otherwise known as the Caesar cipher, would be IJ!. 02:06:17.720 --> 02:06:21.408 So it's similar, but it's at least scrambled at first glance.
02:06:21.408 --> 02:06:23.450 And unless the teacher really cares to figure out 02:06:23.450 --> 02:06:26.420 what algorithm they're using today, or what key they're using today, 02:06:26.420 --> 02:06:29.700 it's probably sufficiently secure for your purposes. 02:06:29.700 --> 02:06:31.160 How do you reverse the process? 02:06:31.160 --> 02:06:34.190 Well, your friend gets this and reverses it by negative 1. 02:06:34.190 --> 02:06:38.630 So I becomes H, J becomes I, and things like punctuation 02:06:38.630 --> 02:06:41.060 remain untouched, at least in this scheme. 02:06:41.060 --> 02:06:43.580 So let's consider one final example here. 02:06:43.580 --> 02:06:51.080 Suppose the input to the algorithm is UIJT XBT DT50, and the key 02:06:51.080 --> 02:06:53.090 this time is negative 1, 02:06:53.090 --> 02:06:59.510 such that now B should become A, C should become B, and A should become Z. 02:06:59.510 --> 02:07:01.130 So we're going in the other direction. 02:07:01.130 --> 02:07:03.030 How might we analyze this? 02:07:03.030 --> 02:07:06.000 Well, if we spread all the letters out, and we start from left to right, 02:07:06.000 --> 02:07:11.780 and we start subtracting one from each letter, U becomes T, I becomes H, J becomes I, 02:07:11.780 --> 02:07:17.220 T becomes S, X becomes W, A, was, D, T-- 02:07:17.220 --> 02:07:18.270 this was CS50. 02:07:18.270 --> 02:07:19.470 We'll see you next time. 02:07:19.470 --> 02:07:21.320 [APPLAUSE] 02:07:20.000 --> 02:07:56.000 [MUSIC PLAYING]