WEBVTT X-TIMESTAMP-MAP=LOCAL:00:00:00.000,MPEGTS:900000 00:00:02.982 --> 00:00:06.461 [MUSIC PLAYING] 00:01:12.600 --> 00:01:13.590 DAVID MALAN: All right. 00:01:13.590 --> 00:01:17.130 This is CS50, and this is week 2 wherein we're 00:01:17.130 --> 00:01:20.610 going to take a look at a lower level at how things work, 00:01:20.610 --> 00:01:24.120 and indeed, among the goals of the course is this bottom-up understanding 00:01:24.120 --> 00:01:26.670 so that in a couple of weeks' time, even a few years' time, 00:01:26.670 --> 00:01:29.920 when you encounter some new technology, you'll be able to think back hopefully 00:01:29.920 --> 00:01:33.180 on some of this week's basic building blocks and primitives 00:01:33.180 --> 00:01:36.060 and really just deduce how tomorrow's technologies work. 00:01:36.060 --> 00:01:37.685 But along the way, it's going to seem-- 00:01:37.685 --> 00:01:40.727 it's going to be a little hard, perhaps, to see the forest for the trees, 00:01:40.727 --> 00:01:41.380 so to speak. 00:01:41.380 --> 00:01:44.783 And so the goal at the end of the day still is going to be problem-solving. 00:01:44.783 --> 00:01:47.700 And so we thought we'd begin today with a look at some of the problems 00:01:47.700 --> 00:01:50.405 we'll talk about or solve this coming week, 00:01:50.405 --> 00:01:53.280 and for that, we have some brave volunteers who have already come up. 00:01:53.280 --> 00:01:58.320 If we could turn on some dramatic lighting and meet today's volunteers. 00:01:58.320 --> 00:02:00.430 So on my left here, we have-- 00:02:00.430 --> 00:02:00.930 ALEX: Hi. 00:02:00.930 --> 00:02:01.960 My name is Alex. 00:02:01.960 --> 00:02:05.340 I'm a first-year at the college and I'm from Chapel Hill, North Carolina. 00:02:05.340 --> 00:02:07.080 DAVID MALAN: Welcome to Alex. 00:02:07.080 --> 00:02:09.180 And to Alex's right. 00:02:09.180 --> 00:02:10.050 SARAH: I'm Sarah. 
00:02:10.050 --> 00:02:13.230 I'm from Toronto, Canada, and I'm also a first-year student at the college. 00:02:13.230 --> 00:02:14.188 DAVID MALAN: Wonderful. 00:02:14.188 --> 00:02:15.869 Well, welcome to both Alex and Sarah. 00:02:15.869 --> 00:02:18.577 So one of the problems you'll perhaps solve this week for problem 00:02:18.577 --> 00:02:22.442 set 2 is to analyze the reading level of a body of text, 00:02:22.442 --> 00:02:25.650 whether someone reads at a first grade level, second grade level, third grade 00:02:25.650 --> 00:02:28.570 level, all the way up to 12 or 13 or beyond. 00:02:28.570 --> 00:02:32.250 What you perhaps never quite thought about, certainly in terms of code, 00:02:32.250 --> 00:02:35.310 like how you would analyze some text, some book and figure 00:02:35.310 --> 00:02:36.750 out what reading level is it at. 00:02:36.750 --> 00:02:40.330 And yet, surely our teachers growing up knew or had an intuitive sense of this. 00:02:40.330 --> 00:02:42.450 So let's consider some sample text. 00:02:42.450 --> 00:02:45.960 For instance, Alex, what have you been reading lately? 00:02:45.960 --> 00:02:52.502 ALEX: One fish, two fish, red fish, blue fish. 00:02:52.502 --> 00:02:53.460 DAVID MALAN: Wonderful. 00:02:53.460 --> 00:02:58.890 So given that, what grade level would you say Alex is currently reading at? 00:02:58.890 --> 00:03:01.500 Feel free to just shout it out. 00:03:01.500 --> 00:03:02.730 First, first? 00:03:02.730 --> 00:03:07.200 So indeed, you'll see this week, if you run your code on Alex's text, 00:03:07.200 --> 00:03:10.410 it actually turns out he reads below a first grade reading level. 00:03:10.410 --> 00:03:12.400 But why might that be? 00:03:12.400 --> 00:03:16.410 What might your intuition be for why we've 00:03:16.410 --> 00:03:19.020 accused Alex of reading at this level? 00:03:19.020 --> 00:03:20.990 Feel free to shout out. 00:03:20.990 --> 00:03:21.490 Yeah. 
00:03:21.490 --> 00:03:24.520 So very few syllables, short words, short sentences. 00:03:24.520 --> 00:03:27.828 And so there's some heuristics, perhaps, we can infer from that short text, 00:03:27.828 --> 00:03:30.370 that that probably means that it's best for younger children. 00:03:30.370 --> 00:03:33.370 Now Sarah, by contrast, what have you been reading? 00:03:33.370 --> 00:03:35.470 SARAH: Mr. and Mrs. Dursley, of number 00:03:35.470 --> 00:03:38.890 four, Privet Drive, were proud to say that they were 00:03:38.890 --> 00:03:41.050 perfectly normal, thank you very much. 00:03:41.050 --> 00:03:43.480 They were the last people you'd expect to be involved 00:03:43.480 --> 00:03:46.390 in anything strange or mysterious because they just 00:03:46.390 --> 00:03:47.952 didn't hold with much nonsense. 00:03:47.952 --> 00:03:48.910 DAVID MALAN: All right. 00:03:48.910 --> 00:03:50.950 Now irrespective of what grade you were in when 00:03:50.950 --> 00:03:53.283 you might have read that text, what grade level does Sarah 00:03:53.283 --> 00:03:55.230 seem to be reading at? 00:03:55.230 --> 00:03:57.570 So eighth grade, second grade. 00:03:57.570 --> 00:03:58.080 OK. 00:03:58.080 --> 00:04:01.125 So hearing a bit of everything, so with that, at least according to code, 00:04:01.125 --> 00:04:03.240 it would actually be seventh grade. 00:04:03.240 --> 00:04:05.130 And what might the intuition there be? 00:04:05.130 --> 00:04:07.620 Why is that a higher grade level even though we might 00:04:07.620 --> 00:04:09.917 disagree exactly which grade it is? 00:04:09.917 --> 00:04:11.250 AUDIENCE: Complicated sentences. 00:04:11.250 --> 00:04:12.000 DAVID MALAN: Yeah. 00:04:12.000 --> 00:04:14.218 So complicated sentences, longer sentences. 00:04:14.218 --> 00:04:17.010 So indeed a lot more words were being spoken by Sarah because there 00:04:17.010 --> 00:04:18.519 was so much more there on the page. 
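The heuristics mentioned here (short words, short sentences) can be turned into simple counts. What follows is a minimal sketch in C, not the course's own solution. The names count_letters, count_words, and count_sentences are hypothetical, words are assumed to be runs of non-whitespace characters, and every '.', '!', or '?' is assumed to end a sentence, so an abbreviation such as "Mr." is miscounted as a sentence.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

// Count the alphabetic characters in text.
int count_letters(const char *text)
{
    assert(text != NULL);
    int letters = 0;
    for (size_t i = 0; text[i] != '\0'; i++)
    {
        if (isalpha((unsigned char) text[i]))
        {
            letters++;
        }
    }
    return letters;
}

// Count words, assuming a word is any run of non-whitespace characters.
int count_words(const char *text)
{
    assert(text != NULL);
    int words = 0;
    int in_word = 0;
    for (size_t i = 0; text[i] != '\0'; i++)
    {
        if (isspace((unsigned char) text[i]))
        {
            in_word = 0;
        }
        else if (!in_word)
        {
            // First character of a new word
            in_word = 1;
            words++;
        }
    }
    return words;
}

// Count sentences, assuming (naively) that '.', '!', and '?' each end one.
int count_sentences(const char *text)
{
    assert(text != NULL);
    int sentences = 0;
    for (size_t i = 0; text[i] != '\0'; i++)
    {
        if (text[i] == '.' || text[i] == '!' || text[i] == '?')
        {
            sentences++;
        }
    }
    return sentences;
}
```

Short words and few words per sentence would push a reading-level estimate down; longer words and longer sentences would push it up. How the counts are combined into a grade is left to the problem set.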
00:04:18.519 --> 00:04:22.079 So we'll translate these ideas this coming week in problem set 2, 00:04:22.079 --> 00:04:25.170 if you tackle this one, through code so that you can ultimately 00:04:25.170 --> 00:04:26.910 infer things like these quantitatively. 00:04:26.910 --> 00:04:29.190 But to do so, we're going to have to understand text. 00:04:29.190 --> 00:04:32.610 So let's first thank our volunteers and then we'll dive in to that lower level. 00:04:32.610 --> 00:04:35.337 [APPLAUSE] 00:04:39.910 --> 00:04:40.600 Sorry. 00:04:40.600 --> 00:04:41.490 You can keep those. 00:04:41.490 --> 00:04:42.222 SARAH: Oh, OK. 00:04:42.222 --> 00:04:43.180 DAVID MALAN: All right. 00:04:43.180 --> 00:04:45.970 So besides that, let's consider one other body of text 00:04:45.970 --> 00:04:48.010 perhaps that you might see this week, which 00:04:48.010 --> 00:04:50.210 is namely a little something like this. 00:04:50.210 --> 00:04:53.860 What I have here on the screen is what we'll start calling today ciphertext. 00:04:53.860 --> 00:04:56.530 It's the result of encrypting some piece of information. 00:04:56.530 --> 00:05:00.190 And encryption, or more generally, the art and science of cryptography 00:05:00.190 --> 00:05:00.908 is all around us. 00:05:00.908 --> 00:05:03.700 It's what you're using on the web, on your phones, with your banks. 00:05:03.700 --> 00:05:07.000 And anything that tries to keep data secure is using encryption. 00:05:07.000 --> 00:05:10.390 But there's going to be different levels of encryption-- strong encryption, 00:05:10.390 --> 00:05:11.140 weak encryption. 00:05:11.140 --> 00:05:14.590 And what you see here on the screen isn't all that strong, 00:05:14.590 --> 00:05:18.190 but we'll see later today how we might decrypt this and actually reveal 00:05:18.190 --> 00:05:22.030 what the plaintext is that corresponds to that ciphertext. 
00:05:22.030 --> 00:05:25.670 But in order to do so, we have to start taking off some training wheels, 00:05:25.670 --> 00:05:26.197 so to speak. 00:05:26.197 --> 00:05:28.030 And believe it or not, even though your time 00:05:28.030 --> 00:05:30.100 with C this past week for the first time, 00:05:30.100 --> 00:05:32.230 probably, might have been rather in the weeds 00:05:32.230 --> 00:05:36.072 and much more complicated seemingly than Scratch, it turns out that along the way, 00:05:36.072 --> 00:05:37.780 we have been providing and we'll continue 00:05:37.780 --> 00:05:39.760 to provide certain training wheels. 00:05:39.760 --> 00:05:42.190 For instance, the CS50 Library is one of them, 00:05:42.190 --> 00:05:46.240 and even some of the explanations we give of topics for now 00:05:46.240 --> 00:05:49.120 in these early weeks will be somewhat simplified-- abstracted away, 00:05:49.120 --> 00:05:49.730 if you will. 00:05:49.730 --> 00:05:51.730 But the goal ultimately is for you to understand 00:05:51.730 --> 00:05:55.060 each and every one of those details so that after CS50, you really 00:05:55.060 --> 00:05:58.210 can stand on your own and understand and wrap your mind 00:05:58.210 --> 00:06:01.040 around any future technologies as well. 00:06:01.040 --> 00:06:05.318 So let's consider first the very first program with which we began last week, 00:06:05.318 --> 00:06:06.110 which was this one. 00:06:06.110 --> 00:06:09.215 So "hello, world" in C. At the end of the day, it was really the printf 00:06:09.215 --> 00:06:11.590 function that was doing the interesting part of the work, 00:06:11.590 --> 00:06:14.890 but there was a lot of technical stuff above and below it. 00:06:14.890 --> 00:06:19.900 The curly braces, the parentheses, words like void and include, and then 00:06:19.900 --> 00:06:21.730 of course, the angled brackets and more. 
00:06:21.730 --> 00:06:25.870 But at the end of the day, we needed to convert that source code in C 00:06:25.870 --> 00:06:30.190 to machine code, the 0's and 1's in binary that the computer understood. 00:06:30.190 --> 00:06:32.500 And to do that, of course, we ran-- 00:06:32.500 --> 00:06:33.700 we compiled the code. 00:06:33.700 --> 00:06:37.400 We ran make and then we were able to actually run that code there. 00:06:37.400 --> 00:06:39.370 So let me actually go over here to VS Code 00:06:39.370 --> 00:06:44.510 and really quickly recreate that hello.c pretty much by transcribing the same. 00:06:44.510 --> 00:06:51.970 So I might have here include stdio.h, int main void. 00:06:51.970 --> 00:06:54.460 And then in here, I had quite simply, hello, 00:06:54.460 --> 00:06:57.430 comma, world with my backslash n, end quotes, and more. 00:06:57.430 --> 00:07:01.693 Now last time, to compile this, I indeed ran make hello, followed by Enter. 00:07:01.693 --> 00:07:03.860 Hopefully you see no errors and that's a good thing. 00:07:03.860 --> 00:07:05.980 And if you do dot, slash, hello, you see, 00:07:05.980 --> 00:07:07.840 in fact, the results of that program. 00:07:07.840 --> 00:07:11.470 But it turns out that make is not actually a compiler 00:07:11.470 --> 00:07:12.950 as I alluded to last week. 00:07:12.950 --> 00:07:15.520 It's a program that clearly makes your program, 00:07:15.520 --> 00:07:19.030 but it itself just automates the process of using an actual compiler. 00:07:19.030 --> 00:07:21.290 And there's lots of different compilers out there, 00:07:21.290 --> 00:07:24.190 and the one that it's actually using underneath the hood 00:07:24.190 --> 00:07:27.640 is a little something called Clang for C Language. 00:07:27.640 --> 00:07:30.190 And Clang is a pretty popular compiler nowadays. 
00:07:30.190 --> 00:07:33.520 There's another one that's been around for ages called GCC, 00:07:33.520 --> 00:07:36.330 but these are just specific names for types of compilers 00:07:36.330 --> 00:07:38.830 that different people, different companies, different groups 00:07:38.830 --> 00:07:40.310 have actually created. 00:07:40.310 --> 00:07:44.800 But if you'd used in week 1 a compiler yourself manually, 00:07:44.800 --> 00:07:47.170 you'd have had to understand a little more about what's 00:07:47.170 --> 00:07:50.703 going on because it's even more cryptic than just make alone. 00:07:50.703 --> 00:07:53.620 So in fact, let me go back to my terminal window here, let me go ahead 00:07:53.620 --> 00:07:58.690 and clear the screen a little bit and just run really the raw compiler 00:07:58.690 --> 00:07:59.360 command. 00:07:59.360 --> 00:08:01.450 So what make is automating for me, let me 00:08:01.450 --> 00:08:03.620 actually do this manually for just a moment. 00:08:03.620 --> 00:08:10.450 So if I want to compile hello.c into an executable program I can run, 00:08:10.450 --> 00:08:12.220 I can do this. 00:08:12.220 --> 00:08:17.110 clang, space, hello.c, and then Enter. 00:08:17.110 --> 00:08:20.980 And now there's no output, which is a good thing in this case, no errors, 00:08:20.980 --> 00:08:22.010 but notice this. 00:08:22.010 --> 00:08:25.450 If I go ahead and type ls, it turns out there's 00:08:25.450 --> 00:08:32.140 a file that's been created suddenly in my current folder weirdly called a.out. 00:08:32.140 --> 00:08:33.580 That stands for Assembler Output. 00:08:33.580 --> 00:08:35.980 And long story short, that's actually the default name 00:08:35.980 --> 00:08:39.440 of a program that's created when you just run Clang by itself. 00:08:39.440 --> 00:08:41.830 Now that's a pretty bad name for a program 00:08:41.830 --> 00:08:44.000 because it doesn't describe what it does. 
00:08:44.000 --> 00:08:49.870 So better would be here to perhaps do, well, instead of a.out, which, yes, 00:08:49.870 --> 00:08:53.950 still prints hello, world, but isn't really a clearly-named program, 00:08:53.950 --> 00:08:55.420 it'd be nice to name this hello. 00:08:55.420 --> 00:08:56.240 So what could I do? 00:08:56.240 --> 00:08:59.740 I could do like we learned last week-- well, I could rename a.out to hello 00:08:59.740 --> 00:09:01.820 by using Linux's mv command. 00:09:01.820 --> 00:09:04.480 So I'm going to move a.out to become hello. 00:09:04.480 --> 00:09:06.370 But that, too, seems kind of tedious. 00:09:06.370 --> 00:09:07.720 Now I have three steps. 00:09:07.720 --> 00:09:10.750 Like write my code, compile my code, and then rename it 00:09:10.750 --> 00:09:12.190 before I can even run it. 00:09:12.190 --> 00:09:13.580 We can do better than that. 00:09:13.580 --> 00:09:15.580 And so it turns out that certain commands 00:09:15.580 --> 00:09:18.220 like clang support what we're going to start today 00:09:18.220 --> 00:09:20.380 calling command line arguments. 00:09:20.380 --> 00:09:24.010 A command line argument, unlike an argument to a function, 00:09:24.010 --> 00:09:27.040 is just an additional word or key phrase that you 00:09:27.040 --> 00:09:30.400 type after a command at your prompt in your terminal 00:09:30.400 --> 00:09:33.440 window that just modifies the behavior of that command. 00:09:33.440 --> 00:09:35.600 It configures it a little more specifically. 00:09:35.600 --> 00:09:39.220 So what you're seeing here on the screen is somewhat of a better command with which 00:09:39.220 --> 00:09:45.220 to run clang so that now I can specify the output of this command per this -o. 00:09:45.220 --> 00:09:46.610 So what do I mean by that? 00:09:46.610 --> 00:09:48.943 Well, let me go ahead and clear my terminal window again 00:09:48.943 --> 00:09:54.955 and more explicitly type clang -o hello hello.c and then Enter. 
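To make the idea of a command line argument concrete, here is a hedged sketch of how a program could interpret a -o argument. It is not how Clang itself is implemented. The helper output_name is a hypothetical name, and the sketch relies on C's convention, not covered in this passage, that a program receives the words typed at the prompt as an array of strings (argv) together with their count (argc).

```c
#include <assert.h>
#include <string.h>

// Return the word that follows "-o" among the arguments,
// or the default name "a.out" if no -o is given.
// argv[0] is the command's own name, so the scan starts at index 1.
const char *output_name(int argc, char *argv[])
{
    assert(argv != NULL);
    for (int i = 1; i < argc - 1; i++)
    {
        if (strcmp(argv[i], "-o") == 0)
        {
            return argv[i + 1];
        }
    }
    return "a.out";
}
```

Given the words clang, -o, hello, hello.c, this helper returns hello; given only clang and hello.c, it falls back to a.out, mirroring the default described here.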
00:09:54.955 --> 00:09:57.580 Nothing, again, appears to happen, but that's a good thing when 00:09:57.580 --> 00:10:02.860 you see no errors and now the program I just created is indeed called hello. 00:10:02.860 --> 00:10:07.280 So it achieves really the same exact effect as make did, but what 00:10:07.280 --> 00:10:09.820 I don't have to do with make is type and remember something 00:10:09.820 --> 00:10:11.075 as long as this command. 00:10:11.075 --> 00:10:12.700 And this, too, is a bit of a white lie. 00:10:12.700 --> 00:10:16.420 It turns out, we have preconfigured VS Code in the cloud for you 00:10:16.420 --> 00:10:21.310 to also use some other features of Clang that would be even more 00:10:21.310 --> 00:10:22.840 tedious for you to write yourselves. 00:10:22.840 --> 00:10:28.130 And so really, this is why we distill this as ultimately just running make. 00:10:28.130 --> 00:10:31.900 So let me pause here to see first if there's any questions on what I've 00:10:31.900 --> 00:10:34.540 done by taking my very first program in C 00:10:34.540 --> 00:10:37.720 and just now compiling it first with make, but then starting over 00:10:37.720 --> 00:10:40.780 and now manually compiling it with clang with what 00:10:40.780 --> 00:10:44.500 we'll call command line arguments. -o, space, hello, 00:10:44.500 --> 00:10:46.820 and then the name of the file. 00:10:46.820 --> 00:10:47.320 Yeah? 00:10:47.320 --> 00:10:48.780 AUDIENCE: What is a.out? 00:10:48.780 --> 00:10:49.530 DAVID MALAN: Yeah. 00:10:49.530 --> 00:10:51.870 So a.out is a historical name. 00:10:51.870 --> 00:10:55.240 It refers to assembler output-- more on that soon. 00:10:55.240 --> 00:10:58.080 And it's just the default file name that you get automatically 00:10:58.080 --> 00:11:01.350 if you just run the compiler on any file so that you 00:11:01.350 --> 00:11:02.970 have just a standard name for it. 00:11:02.970 --> 00:11:05.213 But it's not a very well-named program. 
00:11:05.213 --> 00:11:07.380 Instead of running Microsoft Word on your Mac or PC, 00:11:07.380 --> 00:11:09.880 it would be like double-clicking on a.out. 00:11:09.880 --> 00:11:11.880 So instead with these command line arguments, 00:11:11.880 --> 00:11:17.370 you can customize the output of Clang and call it hello or anything you want. 00:11:17.370 --> 00:11:23.020 Other questions on what I've done here with Clang itself, the compiler? 00:11:23.020 --> 00:11:23.520 Yeah? 00:11:23.520 --> 00:11:25.510 AUDIENCE: What is -o? 00:11:25.510 --> 00:11:26.565 DAVID MALAN: So -o-- 00:11:26.565 --> 00:11:29.440 and you would only know this from reading the manual, taking a class, 00:11:29.440 --> 00:11:30.500 means output. 00:11:30.500 --> 00:11:35.890 So -o means change Clang's output to be a file called hello 00:11:35.890 --> 00:11:38.680 instead of the default, which is a.out. 00:11:38.680 --> 00:11:42.400 And this, too, is, again, a detail you would have to look up on a web page, 00:11:42.400 --> 00:11:44.810 read the manual, hear someone like me tell you about it. 00:11:44.810 --> 00:11:46.893 And in fact, there's even more than these options, 00:11:46.893 --> 00:11:48.890 but we'll just scratch the surface here. 00:11:48.890 --> 00:11:49.390 All right. 00:11:49.390 --> 00:11:53.530 So if we now know this, what more is actually happening underneath the hood? 00:11:53.530 --> 00:11:57.250 Well, let's take a closer look at not just this version of my code, 00:11:57.250 --> 00:12:01.190 but my slightly more complicated version last week, 00:12:01.190 --> 00:12:03.430 which looked a little something like this, wherein 00:12:03.430 --> 00:12:07.330 I added in some dynamic input from the user so I could say not hello, world 00:12:07.330 --> 00:12:11.810 to everyone, but hello, David or hello to whoever actually runs this program. 
00:12:11.810 --> 00:12:15.880 So in fact, let me go ahead and change my code here in VS Code just 00:12:15.880 --> 00:12:17.770 to match that same code from last week. 00:12:17.770 --> 00:12:19.190 So no new code yet. 00:12:19.190 --> 00:12:22.820 I'm just going to, in a moment, compile it in a slightly different way. 00:12:22.820 --> 00:12:29.020 So I did last week string, I think, answer equals get_string, quote-unquote, 00:12:29.020 --> 00:12:30.100 "What's your name?" 00:12:30.100 --> 00:12:31.540 Just like in Scratch. 00:12:31.540 --> 00:12:35.920 And then down here, instead of doing world, I initially wrote answer, 00:12:35.920 --> 00:12:37.450 but that didn't go well. 00:12:37.450 --> 00:12:41.530 What did I ultimately do instead to print out hello, David or hello, 00:12:41.530 --> 00:12:42.940 so-and-so? 00:12:42.940 --> 00:12:44.722 Yeah? 00:12:44.722 --> 00:12:45.680 Sorry, a little louder? 00:12:45.680 --> 00:12:46.430 AUDIENCE: %s? 00:12:46.430 --> 00:12:50.478 DAVID MALAN: Yeah, so %s, the so-called format code that printf just knows how 00:12:50.478 --> 00:12:51.020 to deal with. 00:12:51.020 --> 00:12:52.470 And I had to add one other thing. 00:12:52.470 --> 00:12:54.350 Someone else besides %s-- 00:12:54.350 --> 00:12:54.850 yeah? 00:12:54.850 --> 00:12:56.050 AUDIENCE: The name of the variable. 00:12:56.050 --> 00:12:58.870 DAVID MALAN: The name of the variable that I want to plug into that 00:12:58.870 --> 00:13:00.190 placeholder %s. 00:13:00.190 --> 00:13:01.630 And in this case, it's answer. 00:13:01.630 --> 00:13:04.363 Now let me make one refinement only because now we're in week 2 00:13:04.363 --> 00:13:06.530 and we're going to start writing more lines of code, 00:13:06.530 --> 00:13:10.360 even though Scratch called the return value of the ask puzzle piece 00:13:10.360 --> 00:13:11.560 answer always, 00:13:11.560 --> 00:13:14.480 in C, we have full control over what our variables are called. 
00:13:14.480 --> 00:13:17.410 And now it's probably good not to just generically always call 00:13:17.410 --> 00:13:19.870 my variable answer if I'm using get_string. 00:13:19.870 --> 00:13:21.050 Let's call it what it is. 00:13:21.050 --> 00:13:23.680 So this is now just a matter of style, if you will. 00:13:23.680 --> 00:13:26.620 Let me change the variable to be name just so 00:13:26.620 --> 00:13:29.980 that it's a little clearer to me, to you, to a TF or TA 00:13:29.980 --> 00:13:34.000 exactly what that variable represents instead of more generically answer. 00:13:34.000 --> 00:13:37.030 All right, so that said, let me go down to my terminal window, 00:13:37.030 --> 00:13:41.050 and last week again, I ran make to compile this exact same program. 00:13:41.050 --> 00:13:43.270 Now, though, let me go ahead and just use clang. 00:13:43.270 --> 00:13:45.490 So clang -o-- 00:13:45.490 --> 00:13:47.500 I'll still call this version hello-- 00:13:47.500 --> 00:13:49.330 space, hello.c. 00:13:49.330 --> 00:13:51.080 So exact same command as before. 00:13:51.080 --> 00:13:54.640 The only thing that's different is I've added a couple of more lines of code 00:13:54.640 --> 00:13:56.330 to get the user's input. 00:13:56.330 --> 00:13:59.960 Let me hit Enter, and now, darn it, our first error. 00:13:59.960 --> 00:14:02.750 So output from clang and make is not a good thing, 00:14:02.750 --> 00:14:05.420 and here, we're seeing something particularly cryptic. 00:14:05.420 --> 00:14:09.010 So something in function 'main'-- undefined reference 00:14:09.010 --> 00:14:13.480 to 'get_string,' and then linker command failed with exit code 1. 00:14:13.480 --> 00:14:16.540 So there's actually a lot of jargon in there that we'll tease apart today, 00:14:16.540 --> 00:14:20.338 but my hint is that clearly my problem's in main, although that's not surprising 00:14:20.338 --> 00:14:22.130 because there's nothing else going on here. 
00:14:22.130 --> 00:14:26.830 get_string is an issue, and the issue is that it's an undefined reference. 00:14:26.830 --> 00:14:28.990 And yet, notice, I was pretty good. 00:14:28.990 --> 00:14:32.920 I added the CS50 header file and I said last week that that's 00:14:32.920 --> 00:14:35.920 enough to teach the compiler that functions exist, 00:14:35.920 --> 00:14:39.070 but the problem is that even though this does, in fact, 00:14:39.070 --> 00:14:43.090 teach Clang that get_string exists, it is not 00:14:43.090 --> 00:14:47.530 sufficient information for Clang to go find on the hard drive of the computer 00:14:47.530 --> 00:14:51.860 the 0's and 1's that actually implement get_string itself. 00:14:51.860 --> 00:14:54.250 So in other words, this include line, per last week, 00:14:54.250 --> 00:14:55.333 is a little bit of a hint. 00:14:55.333 --> 00:14:59.560 It's a teaser to Clang that you're about to see and use this function somewhere. 00:14:59.560 --> 00:15:05.710 But if you actually want to use the 0's and 1's that CS50 wrote some time ago 00:15:05.710 --> 00:15:08.740 and bake those into your program so your program actually 00:15:08.740 --> 00:15:11.470 knows how to get input from the user, well then, 00:15:11.470 --> 00:15:15.440 I'm going to have to go ahead and run a slightly different command. 00:15:15.440 --> 00:15:16.250 So let me do this. 00:15:16.250 --> 00:15:18.917 Let me clear my terminal window just to get rid of that distraction 00:15:18.917 --> 00:15:23.020 and let me propose now that we run this command instead. 00:15:23.020 --> 00:15:28.510 Almost the same as before, clang -o, space, hello, then hello.c, 00:15:28.510 --> 00:15:34.210 but with one additional command line argument at the end, and this is a -l-- 00:15:34.210 --> 00:15:35.050 not a number 1. 00:15:35.050 --> 00:15:39.370 So -lcs50 with no space in between those two. 
00:15:39.370 --> 00:15:43.540 Now the l is going to result in all of those 0's and 1's that actually 00:15:43.540 --> 00:15:48.350 were written by CS50 being linked into your code, your few lines of code or mine 00:15:48.350 --> 00:15:48.850 here. 00:15:48.850 --> 00:15:53.530 But that's the second step that the compiler requires in order to know how 00:15:53.530 --> 00:15:58.537 to actually execute, or rather compile, your code and CS50's. 00:15:58.537 --> 00:16:00.370 And CS50 is not the only one that does this. 00:16:00.370 --> 00:16:04.750 If you use any third party library in C that doesn't come with the language, 00:16:04.750 --> 00:16:08.333 you would do -l such and such where whoever-- 00:16:08.333 --> 00:16:10.000 however they've named their own library. 00:16:10.000 --> 00:16:14.298 But you don't have to do it for built in things like we've been using thus far. 00:16:14.298 --> 00:16:16.090 All right, so let me go ahead and try this. 00:16:16.090 --> 00:16:19.000 I'll go back to VS Code here, and let me go ahead now 00:16:19.000 --> 00:16:23.620 and run clang -o hello, then hello.c. 00:16:23.620 --> 00:16:26.560 And now instead of just hitting Enter, -lcs50 00:16:26.560 --> 00:16:29.590 with no space between the l and the cs50, Enter. 00:16:29.590 --> 00:16:33.310 Now nothing bad happens, and now I can do ./hello. 00:16:33.310 --> 00:16:34.180 What's your name? 00:16:34.180 --> 00:16:37.633 I'll type in David, Enter, and now we see hello, David. 00:16:37.633 --> 00:16:40.300 Now honestly, this is where we're really getting into the weeds, 00:16:40.300 --> 00:16:42.130 and now this is taking-- 00:16:42.130 --> 00:16:45.730 this is really just adding nuisance to the process of compiling and running 00:16:45.730 --> 00:16:46.460 your code. 
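The difference between a header's hint and the 0's and 1's that get linked in can be sketched within a single file. This is an illustration only: get_greeting and greeting_length are hypothetical names, and in a real library the declaration would live in a .h file while the definition would live in separately compiled code that an argument like -l links in.

```c
#include <assert.h>
#include <string.h>

// Declaration (prototype): the kind of line a header file provides.
// It tells the compiler that the function exists and what types it uses.
const char *get_greeting(void);

// Code that calls the function compiles with only the declaration in view.
size_t greeting_length(void)
{
    const char *greeting = get_greeting();
    assert(greeting != NULL);
    return strlen(greeting);
}

// Definition: the kind of code the library itself provides.
// If no definition is available at link time, the linker reports
// an "undefined reference" error like the one seen for get_string.
const char *get_greeting(void)
{
    return "hello, world";
}
```

Deleting the definition at the bottom would leave a file that still compiles but cannot be linked into a program, which is the situation that arises when cs50.h is included without -lcs50.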
00:16:46.460 --> 00:16:49.960 And so the reality is, even though this is indeed what is happening, 00:16:49.960 --> 00:16:51.880 this is why we used last week and we're going 00:16:51.880 --> 00:16:55.240 to continue using this week onward make because it just 00:16:55.240 --> 00:16:57.130 automates that whole process for you. 00:16:57.130 --> 00:17:00.130 But it's ideal to understand what's going wrong because any of the error 00:17:00.130 --> 00:17:02.770 messages you saw for problem set 1, any of the error messages 00:17:02.770 --> 00:17:05.859 you see for the next few weeks probably aren't coming from make, 00:17:05.859 --> 00:17:08.560 they're coming from Clang underneath the hood 00:17:08.560 --> 00:17:10.780 because make is just automating the process. 00:17:10.780 --> 00:17:14.060 But with make, you literally just write make and then the name of the program, 00:17:14.060 --> 00:17:17.560 you don't have to worry about any of those command line arguments. 00:17:17.560 --> 00:17:22.240 Questions, then, on compiling with -lcs50 or anything else? 00:17:22.240 --> 00:17:23.043 Yeah? 00:17:23.043 --> 00:17:24.960 AUDIENCE: What is the benefit of [INAUDIBLE]? 00:17:24.960 --> 00:17:26.220 DAVID MALAN: Sorry, what is the benefit of-- 00:17:26.220 --> 00:17:27.512 AUDIENCE: Using Clang manually. 00:17:27.512 --> 00:17:30.000 DAVID MALAN: What is the benefit of using Clang manually? 00:17:30.000 --> 00:17:30.870 None, really. 00:17:30.870 --> 00:17:33.450 In fact, all main is doing is just say-- make is doing 00:17:33.450 --> 00:17:35.055 is saving us some keystrokes. 00:17:35.055 --> 00:17:37.680 If you prefer, though, and you just like to be more in control, 00:17:37.680 --> 00:17:41.130 you can totally run Clang manually if you remember the various command line 00:17:41.130 --> 00:17:42.090 arguments. 00:17:42.090 --> 00:17:42.660 Yeah? 
00:17:42.660 --> 00:17:47.335 AUDIENCE: So why did you have to explain [INAUDIBLE] 00:17:47.335 --> 00:17:48.210 DAVID MALAN: Exactly. 00:17:48.210 --> 00:17:49.560 Why did I have to explain-- 00:17:49.560 --> 00:17:53.220 that is, provide a hint to CS50 with the cs50.h header file, 00:17:53.220 --> 00:17:55.470 but I didn't have to do that with stdio.h? 00:17:55.470 --> 00:17:56.400 Just because. 00:17:56.400 --> 00:18:00.990 stdio.h comes with C, just like a few other libraries come 00:18:00.990 --> 00:18:03.060 with C that we'll start seeing today. 00:18:03.060 --> 00:18:05.410 CS50, though, is not built into C everywhere, 00:18:05.410 --> 00:18:07.890 and so you do have to explicitly add that one there. 00:18:07.890 --> 00:18:08.767 Yeah? 00:18:08.767 --> 00:18:11.970 AUDIENCE: Can you define what command line argument [INAUDIBLE]? 00:18:11.970 --> 00:18:15.210 DAVID MALAN: A command line argument is a word or phrase 00:18:15.210 --> 00:18:17.740 that you type at the command line-- 00:18:17.740 --> 00:18:22.200 a.k.a., your terminal-- in order to influence the behavior of a program. 00:18:22.200 --> 00:18:22.742 AUDIENCE: OK. 00:18:22.742 --> 00:18:24.430 So it's a term for whatever you're giving it. 00:18:24.430 --> 00:18:24.565 DAVID MALAN: Yeah. 00:18:24.565 --> 00:18:25.660 It changes the defaults. 00:18:25.660 --> 00:18:27.790 In our GUI world, Graphical User Interface, 00:18:27.790 --> 00:18:29.680 you and I would probably click some boxes, 00:18:29.680 --> 00:18:32.350 we would select some menu options to configure a program 00:18:32.350 --> 00:18:33.460 to behave in the same way. 00:18:33.460 --> 00:18:36.850 At a command line interface, you have to just say everything all at once, 00:18:36.850 --> 00:18:39.600 and that's why we have command line arguments. 00:18:39.600 --> 00:18:40.605 Yeah? 00:18:40.605 --> 00:18:43.243 AUDIENCE: Is make [INAUDIBLE] 00:18:43.243 --> 00:18:43.910 DAVID MALAN: No. 
00:18:43.910 --> 00:18:45.470 Make is not just for CS50. 00:18:45.470 --> 00:18:50.480 It's used globally in any project really nowadays using C, C++, 00:18:50.480 --> 00:18:52.020 even other languages as well. 00:18:52.020 --> 00:18:54.140 In fact, most every command you see in this class, 00:18:54.140 --> 00:18:57.530 unless it has 5-0 at the end of it, is globally used. 00:18:57.530 --> 00:19:00.758 Only those suffixed with 50 are, indeed, course-specific. 00:19:00.758 --> 00:19:03.050 And even those we'll gradually take training wheels off 00:19:03.050 --> 00:19:06.890 of so that you understand exactly what those commands are doing as well. 00:19:06.890 --> 00:19:09.053 All right, so what is it that we've just done? 00:19:09.053 --> 00:19:11.720 Everything we've just done, of course, I keep calling compiling, 00:19:11.720 --> 00:19:13.580 but let's just go down one rabbit hole so 00:19:13.580 --> 00:19:15.967 that you understand that when you compile code, 00:19:15.967 --> 00:19:18.050 there's actually a whole bunch of steps happening, 00:19:18.050 --> 00:19:21.800 and this is going to enable a lot of features, like companies can 00:19:21.800 --> 00:19:26.060 write code and then convert it to run it on Macs and PCs alike 00:19:26.060 --> 00:19:27.240 or phones or the like. 00:19:27.240 --> 00:19:30.320 So it's not just a matter of converting source code to machine code, 00:19:30.320 --> 00:19:34.610 there's actually four steps involved in what you and I, as of last week, 00:19:34.610 --> 00:19:35.840 know as compiling. 00:19:35.840 --> 00:19:39.033 And these aren't terms that you'll have to keep in mind constantly 00:19:39.033 --> 00:19:41.450 because again, we're going to abstract a lot of this away. 
00:19:41.450 --> 00:19:43.492 But just so we've gone down the rabbit hole once, 00:19:43.492 --> 00:19:45.890 let's consider each of these four steps that 00:19:45.890 --> 00:19:49.850 have been happening for you for a week automatically, the first of which 00:19:49.850 --> 00:19:51.080 is called preprocessing. 00:19:51.080 --> 00:19:52.260 So what does this mean? 00:19:52.260 --> 00:19:54.450 Well, let's consider that same program as before. 00:19:54.450 --> 00:19:57.830 So notice that two of the lines of code start with a hash mark. 00:19:57.830 --> 00:20:02.338 That is a special symbol in C, and it's a so-called preprocessor directive. 00:20:02.338 --> 00:20:04.130 You don't need to memorize terms like that, 00:20:04.130 --> 00:20:07.005 but it just means that it's a little different from every other line. 00:20:07.005 --> 00:20:08.960 And anything with a hash symbol here should 00:20:08.960 --> 00:20:13.315 be preprocessed-- that is, analyzed initially before anything else happens. 00:20:13.315 --> 00:20:17.100 So let's consider these two lines up top, what exactly is happening. 00:20:17.100 --> 00:20:19.220 Well, it turns out with these two lines, you 00:20:19.220 --> 00:20:23.390 have two header files, of course, cs50.h and stdio.h. 00:20:23.390 --> 00:20:27.980 Where are those files, because they've never been in VS Code for you, 00:20:27.980 --> 00:20:28.550 seemingly. 00:20:28.550 --> 00:20:31.940 If you type LS-- if you open up the File Explorer in the GUI, 00:20:31.940 --> 00:20:35.900 you have never seen, probably, cs50.h or stdio.h. 00:20:35.900 --> 00:20:39.620 They just work, but that's because there's a folder somewhere 00:20:39.620 --> 00:20:43.340 on the hard drive that you're using on your Mac or PC 00:20:43.340 --> 00:20:45.690 or somewhere in the cloud, as in our case. 00:20:45.690 --> 00:20:50.210 And inside of this folder, traditionally called /usr/include. 00:20:50.210 --> 00:20:51.857 And user is deliberately misspelled. 
00:20:51.857 --> 00:20:54.440 It's just slightly more succinct, although it's a little weird 00:20:54.440 --> 00:20:55.760 why we drop that one letter. 00:20:55.760 --> 00:21:01.760 But usr/include is just a folder on the server that contains cs50.h, stdio.h, 00:21:01.760 --> 00:21:03.990 and a bunch of other things as well. 00:21:03.990 --> 00:21:08.030 So in fact, if you type in VS Code, in your terminal window, 00:21:08.030 --> 00:21:13.310 when you're using code spaces in the cloud and type LS space /usr/include, 00:21:13.310 --> 00:21:15.470 you can see all of the files in that folder. 00:21:15.470 --> 00:21:17.580 But we've preinstalled all of that stuff for you. 00:21:17.580 --> 00:21:20.390 So let's consider what's actually in those files here. 00:21:20.390 --> 00:21:25.370 If I highlight these two lines up top that start with hash include, well, 00:21:25.370 --> 00:21:30.530 I kind of hinted last week that what's in that first file is a hint as to what 00:21:30.530 --> 00:21:32.660 functions CS50 wrote for you. 00:21:32.660 --> 00:21:35.540 So you can kind of think of these include lines 00:21:35.540 --> 00:21:38.300 as being temporary placeholders for what's 00:21:38.300 --> 00:21:41.000 going to become like a global find and replace. 00:21:41.000 --> 00:21:44.270 That is the first thing clang is going to do is to preprocess this file. 00:21:44.270 --> 00:21:47.300 It's going to look for any line that starts with hash include. 00:21:47.300 --> 00:21:50.960 And if it sees that, it's going to essentially go into that file, 00:21:50.960 --> 00:21:55.190 like cs50.h, and then just copy and paste the contents of that file 00:21:55.190 --> 00:21:56.443 magically there for you. 00:21:56.443 --> 00:21:58.110 You don't see it visually on the screen. 00:21:58.110 --> 00:22:00.060 But it's happening behind the scenes. 
00:22:00.060 --> 00:22:03.230 And so really, what's happening with this first line 00:22:03.230 --> 00:22:09.380 is that somewhere in cs50.h is the declaration of getString 00:22:09.380 --> 00:22:11.690 like we talked about last week, and it probably 00:22:11.690 --> 00:22:13.215 looks a little something like this. 00:22:13.215 --> 00:22:15.590 And we didn't spend much time on this yet this past week, 00:22:15.590 --> 00:22:17.030 but we will more in time. 00:22:17.030 --> 00:22:21.470 Notice that this is how a function is declared. 00:22:21.470 --> 00:22:23.677 That is, it is decreed to exist. 00:22:23.677 --> 00:22:25.760 The name of the function, of course, is getString. 00:22:25.760 --> 00:22:28.310 Inside of the parentheses are its arguments. 00:22:28.310 --> 00:22:31.580 In this case, there's one argument to getString, I claim today, 00:22:31.580 --> 00:22:33.080 but you've known this implicitly. 00:22:33.080 --> 00:22:34.160 And it's a prompt. 00:22:34.160 --> 00:22:36.860 It's the prompt that the human sees when you use getString. 00:22:36.860 --> 00:22:37.790 What is that prompt? 00:22:37.790 --> 00:22:41.060 Well, it's a string of text, like quote unquote, "what's your name?" 00:22:41.060 --> 00:22:43.080 or anything else that I asked last week. 00:22:43.080 --> 00:22:46.610 Meanwhile, getString, as we know from last week, has a return value. 00:22:46.610 --> 00:22:48.140 It returns something to you. 00:22:48.140 --> 00:22:49.610 And that, too, is a string. 00:22:49.610 --> 00:22:52.120 So again, this is also called a function's prototype. 00:22:52.120 --> 00:22:53.870 It's the thing toward the end of last week 00:22:53.870 --> 00:22:57.560 that I just copied and pasted from the bottom of my file to the top, 00:22:57.560 --> 00:23:02.030 just so that it was like this teaser for clang as to what would exist later.
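A minimal sketch of the prototype idea described above, offered as an illustration and not as the contents of CS50's actual header. The helper names count_letters and report_letters are hypothetical and do not come from the lecture. The point is only that a prototype near the top of a file lets the compiler accept a call before it has seen the function's definition. (The library function the transcript renders as getString is spelled get_string in code.)

```c
#include <ctype.h>
#include <stdio.h>

// Prototype: announces the name, return type, and parameter type.
// There is no body yet; this is the "teaser" for the compiler.
int count_letters(const char *text);

// Because the prototype appears above, this call compiles even though
// the definition of count_letters only comes later in the file.
// (report_letters is a hypothetical helper for illustration.)
void report_letters(const char *text)
{
    printf("%i letters\n", count_letters(text));
}

// Definition: the implementation that the prototype promised.
int count_letters(const char *text)
{
    int letters = 0;
    for (int i = 0; text[i] != '\0'; i++)
    {
        // Count only alphabetic characters, skipping spaces and punctuation.
        if (isalpha((unsigned char) text[i]))
        {
            letters++;
        }
    }
    return letters;
}
```

Calling report_letters from a main function prints the count. Removing the prototype line while keeping this ordering would make the compiler complain at the call, which is the situation the copy-to-the-top trick avoids.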
00:23:02.030 --> 00:23:07.670 So you can think, then, of these include lines as just kind of combining all 00:23:07.670 --> 00:23:11.360 of those function declarations in some separate file called cs50.h, 00:23:11.360 --> 00:23:14.780 so that you yourself don't have to type them every time you use the library-- 00:23:14.780 --> 00:23:18.470 or worse, so that you, yourself, don't have to copy and paste those lines. 00:23:18.470 --> 00:23:22.520 This is what clang is doing for you in its first step of preprocessing. 00:23:22.520 --> 00:23:27.470 Second, and last in this example, what happens when clang preprocesses 00:23:27.470 --> 00:23:29.175 this second include line? 00:23:29.175 --> 00:23:31.550 Well, the only other function we care about in this story 00:23:31.550 --> 00:23:33.650 is printf, of course, which comes with C. 00:23:33.650 --> 00:23:39.440 So essentially, you can think of printf's prototype or declaration 00:23:39.440 --> 00:23:40.820 as just being this. 00:23:40.820 --> 00:23:42.870 Printf is the name of the function. 00:23:42.870 --> 00:23:47.370 It takes a string that you want to format like, Hello comma world, 00:23:47.370 --> 00:23:49.110 or Hello comma %s. 00:23:49.110 --> 00:23:52.120 And then with dot, dot, dot, this actually has technical meaning. 00:23:52.120 --> 00:23:55.770 It means, of course, that you can plug-in 0 variables, 1 variable, 2 00:23:55.770 --> 00:23:56.340 or 10. 00:23:56.340 --> 00:23:58.530 So dot, dot, dot means some number of variables. 00:23:58.530 --> 00:24:00.072 Now we haven't talked about this yet. 00:24:00.072 --> 00:24:01.410 And we won't really, in general. 00:24:01.410 --> 00:24:05.490 printf actually returns a value, a number, that is an integer. 00:24:05.490 --> 00:24:07.420 But more on that perhaps another time. 00:24:07.420 --> 00:24:10.920 It's generally not something the programmer tends to look at. 
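A short sketch of the two points just made about printf: the dot, dot, dot in a prototype means "some number of additional arguments," and printf returns an integer, the number of characters it printed. The functions sum_all and greeting_length are hypothetical names chosen for this illustration. The first uses the standard stdarg.h macros, which are one way a function can read its extra arguments.

```c
#include <stdarg.h>
#include <stdio.h>

// For reference, printf is declared in stdio.h essentially as:
//     int printf(const char *format, ...);

// A hypothetical variadic function: adds up `count` int arguments.
// The "..." works the same way as in printf's prototype.
int sum_all(int count, ...)
{
    va_list args;
    va_start(args, count);
    int total = 0;
    for (int i = 0; i < count; i++)
    {
        total += va_arg(args, int);
    }
    va_end(args);
    return total;
}

// printf's return value is the number of characters written,
// so this returns the length of the printed greeting.
int greeting_length(const char *name)
{
    return printf("hello, %s\n", name);
}
```

Unlike printf, which works out how many extra arguments to expect from the format string, this sketch asks the caller to pass the count explicitly as the first argument.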
00:24:10.920 --> 00:24:14.250 But that's all we mean by preprocessing, so that at the end of this process, 00:24:14.250 --> 00:24:18.030 even though there's more lines of code in cs50.h and stdio.h, 00:24:18.030 --> 00:24:21.330 what's really just happening is that clang, in preprocessing 00:24:21.330 --> 00:24:25.380 the file, copies and pastes the contents of those files into your code 00:24:25.380 --> 00:24:29.160 so that now your code knows about everything-- getString, printf, 00:24:29.160 --> 00:24:31.060 and anything else. 00:24:31.060 --> 00:24:35.230 Any questions, then, on that first step, preprocessing? 00:24:35.230 --> 00:24:35.920 Yes? 00:24:35.920 --> 00:24:49.195 AUDIENCE: [INAUDIBLE] 00:24:49.195 --> 00:24:50.320 DAVID MALAN: Good question. 00:24:50.320 --> 00:24:52.720 When you include a file, does it only include what 00:24:52.720 --> 00:24:54.880 you need or does it include everything? 00:24:54.880 --> 00:24:56.420 Think of it as including everything. 00:24:56.420 --> 00:24:59.020 So if it's a big file, that's a lot of code at the very top. 00:24:59.020 --> 00:25:01.880 And that's why, if you think back to all of the zeros and ones 00:25:01.880 --> 00:25:03.880 I showed a little bit ago, as well as last week, 00:25:03.880 --> 00:25:06.130 there's a lot of zeros and ones that end up 00:25:06.130 --> 00:25:08.892 on the screen as a result of just writing, Hello, world. 00:25:08.892 --> 00:25:10.600 A lot of those zeros and ones are perhaps 00:25:10.600 --> 00:25:13.390 coming from code that you didn't actually, necessarily need. 00:25:13.390 --> 00:25:15.340 But some of it is perhaps there, but there 00:25:15.340 --> 00:25:17.740 are ways to optimize that as well. 00:25:17.740 --> 00:25:22.395 All right, so step two of compiling is, confusingly, called compiling. 00:25:22.395 --> 00:25:24.520 It's just, this is the term that most everyone uses 00:25:24.520 --> 00:25:27.940 to describe the whole process, instead of just this one step. 
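As a rough picture of what preprocessing leaves behind, here is a sketch in which the include line has been replaced by hand with the single declaration the code needs. This is a simplification: the real stdio.h contains far more than one line, and its declaration of printf carries extra qualifiers. The function name greet is hypothetical.

```c
// Instead of writing #include <stdio.h>, paste in by hand the one
// declaration this code relies on. The preprocessor does the same
// kind of thing automatically, with the whole header file.
int printf(const char *format, ...);

// A hypothetical function that uses printf exactly as before.
// The compiler is satisfied because it has seen printf's prototype.
int greet(void)
{
    return printf("hello, world\n");
}
```

The declaration only tells the compiler that printf exists and what it looks like. The code that implements printf still has to be supplied separately, which is the job of the linking step discussed later in the lecture.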
00:25:27.940 --> 00:25:32.140 But once a program has been preprocessed behind the scenes 00:25:32.140 --> 00:25:35.865 by the compiler for you, it looks now a little something like this. 00:25:35.865 --> 00:25:38.740 And I've put dot, dot, dot just to imply that, yes, to your question, 00:25:38.740 --> 00:25:39.820 there's more stuff above it. 00:25:39.820 --> 00:25:40.987 There's more stuff below it. 00:25:40.987 --> 00:25:43.070 It's just not interesting right now for us. 00:25:43.070 --> 00:25:44.860 So now we have just C code. 00:25:44.860 --> 00:25:46.960 There's no more preprocessor directives. 00:25:46.960 --> 00:25:49.840 At this point, all of the hash symbols and those lines of code 00:25:49.840 --> 00:25:52.670 have been preprocessed and converted to something else. 00:25:52.670 --> 00:25:56.380 And so now-- and this is where things get a little spooky looking. 00:25:56.380 --> 00:26:00.370 Here now is what happens when clang, or any compiler, 00:26:00.370 --> 00:26:03.310 literally compiles code like this. 00:26:03.310 --> 00:26:08.720 It converts it from this in C to this in assembly code. 00:26:08.720 --> 00:26:10.720 So this is among the scarier languages. 00:26:10.720 --> 00:26:12.580 I, myself, don't really have fond memories. 00:26:12.580 --> 00:26:14.805 This is not a language that many people program in. 00:26:14.805 --> 00:26:16.930 If you take a subsequent class in computer science, 00:26:16.930 --> 00:26:19.600 in systems, a higher level class, you might actually 00:26:19.600 --> 00:26:21.430 learn this or some variant thereof. 00:26:21.430 --> 00:26:23.232 But there's at least a few people out there 00:26:23.232 --> 00:26:24.940 that need to know this stuff because this 00:26:24.940 --> 00:26:29.320 is closer to what the computers themselves, nowadays, understand. 
00:26:29.320 --> 00:26:34.600 The Intel CPUs or the AMD CPUs, the brains of today's computers and phones 00:26:34.600 --> 00:26:37.960 understand stuff that looks more like this and less like C. 00:26:37.960 --> 00:26:42.430 Now it's completely esoteric, but let me just highlight a few phrases. 00:26:42.430 --> 00:26:44.630 There's some stuff that's a little familiar. 00:26:44.630 --> 00:26:47.620 There is mention of main at the top there in yellow. 00:26:47.620 --> 00:26:49.750 There is mention of getString toward the bottom. 00:26:49.750 --> 00:26:52.070 There is mention of printf down below. 00:26:52.070 --> 00:26:55.600 So this is just another programming language called assembly language, 00:26:55.600 --> 00:26:57.010 that decades ago, humans-- 00:26:57.010 --> 00:26:58.450 myself included in school-- 00:26:58.450 --> 00:27:00.130 did write code in. 00:27:00.130 --> 00:27:02.630 And absolutely, some people still write this code, 00:27:02.630 --> 00:27:06.070 especially since you can write very, very efficient code. 00:27:06.070 --> 00:27:08.590 But it's a lot more arcane. 00:27:08.590 --> 00:27:11.380 It's a lot less user friendly. 00:27:11.380 --> 00:27:14.650 So you'll see in yellow now, these are the so-called instructions 00:27:14.650 --> 00:27:18.460 that a computer's brain or CPU understands, pushing values 00:27:18.460 --> 00:27:23.630 around, moving them, subtracting values, calling functions, and move, move, 00:27:23.630 --> 00:27:24.130 move. 00:27:24.130 --> 00:27:27.400 So really, the low-level operations that computers understand 00:27:27.400 --> 00:27:31.030 tend to be arithmetic operations-- subtraction, addition, 00:27:31.030 --> 00:27:34.120 and the like-- moving things in and out of memory. 00:27:34.120 --> 00:27:37.510 It's just a lot more tedious for folks like us to write code like this. 00:27:37.510 --> 00:27:40.450 This is why you and I tend to write stuff like this. 
00:27:40.450 --> 00:27:44.080 And ideally, still, people like you and I tend to drag and drop puzzle pieces 00:27:44.080 --> 00:27:46.520 that sort of abstract all of that away further. 00:27:46.520 --> 00:27:49.420 But for now, this is, again, called assembly language. 00:27:49.420 --> 00:27:54.310 It is what happens when the compiler literally compiles your code. 00:27:54.310 --> 00:27:57.010 But of course, this, still not zeros and ones. 00:27:57.010 --> 00:27:58.580 So we got two steps to go. 00:27:58.580 --> 00:28:02.270 So when a compiler proceeds to step three, 00:28:02.270 --> 00:28:05.530 this is where things get converted to machine code. 00:28:05.530 --> 00:28:08.500 And when a compiler assembles your code for you, 00:28:08.500 --> 00:28:14.260 it converts what we just saw on the screen here to actual zeros and ones-- 00:28:14.260 --> 00:28:18.550 the so-called machine code that your phone or your computer understands. 00:28:18.550 --> 00:28:22.120 But it's worth noting that these are not necessarily all 00:28:22.120 --> 00:28:24.280 of the zeros and ones of your program. 00:28:24.280 --> 00:28:29.980 Yes, they are the zeros and ones that correspond to your Hello program 00:28:29.980 --> 00:28:33.250 or printf and getString and the like, but notice 00:28:33.250 --> 00:28:36.940 that here, we need one final step. 00:28:36.940 --> 00:28:40.100 In those zeros and ones are only your lines of code. 00:28:40.100 --> 00:28:43.540 But what about CS50's lines of code that we wrote to implement getString? 00:28:43.540 --> 00:28:46.990 What about the lines of code that humans wrote decades ago to implement printf? 00:28:46.990 --> 00:28:50.020 Those are somewhere on this hard drive, like on my Mac, my PC, 00:28:50.020 --> 00:28:54.460 or somewhere in the cloud, but we need to combine all of those zeros and ones 00:28:54.460 --> 00:29:01.390 together and link my code with CS50's code with standard I/O's code, 00:29:01.390 --> 00:29:02.420 all together. 
00:29:02.420 --> 00:29:05.110 And so what happens in the last step, ultimately, 00:29:05.110 --> 00:29:07.960 is that if we have my code here in yellow, 00:29:07.960 --> 00:29:11.440 and then the code that CS50 wrote, and the code that the authors of C 00:29:11.440 --> 00:29:15.940 itself wrote, what really is happening is that somewhere, we have not only 00:29:15.940 --> 00:29:19.960 hello.c, which, obviously, I wrote, and wrote with us live here, 00:29:19.960 --> 00:29:24.550 there's also, let's assume, somewhere on the computer, a cs50.c file 00:29:24.550 --> 00:29:28.210 that, coincidentally, I and CS50 staff wrote years ago. 00:29:28.210 --> 00:29:30.790 And also, somewhere on the computer, there's another file. 00:29:30.790 --> 00:29:34.120 Let me oversimplify by just calling it stdio.c. 00:29:34.120 --> 00:29:36.850 In practice, it's probably specifically called printf.c. 00:29:36.850 --> 00:29:39.460 But they're somewhere, these two other files. 00:29:39.460 --> 00:29:44.110 And so this last step called linking takes my zeros and ones 00:29:44.110 --> 00:29:48.100 from the code I just wrote, namely this code on the screen here. 00:29:48.100 --> 00:29:50.810 It then grabs the zeros and ones that CS50 wrote. 00:29:50.810 --> 00:29:53.480 And it grabs the zeros and ones that the authors of C wrote, 00:29:53.480 --> 00:29:56.240 in order to implement the standard I/O library. 00:29:56.240 --> 00:30:00.750 And lastly, voila, links them all together. 00:30:00.750 --> 00:30:03.980 And this is the same blob of zeros and ones that we saw earlier. 00:30:03.980 --> 00:30:08.090 It's just now the result of preprocessing your code, 00:30:08.090 --> 00:30:12.620 compiling your code, assembling your code, linking your code, and my God, 00:30:12.620 --> 00:30:15.830 at this point, like if there were any fun in programming for you yet, 00:30:15.830 --> 00:30:19.620 we've just taken it all away, we just call this whole process compiling. 00:30:19.620 --> 00:30:20.120 Why? 
00:30:20.120 --> 00:30:22.490 Because now that we know those steps exist-- 00:30:22.490 --> 00:30:25.370 and smart people solve that problem for us-- 00:30:25.370 --> 00:30:27.890 you and I can kind of operate at this level of abstraction 00:30:27.890 --> 00:30:32.420 and just assume that compiling converts source code to machine code. 00:30:32.420 --> 00:30:36.350 Questions, though, on any of these intermediate steps? 00:30:36.350 --> 00:30:37.360 Yeah? 00:30:37.360 --> 00:30:41.958 AUDIENCE: For linking, are different parts, like [INAUDIBLE]? 00:30:50.072 --> 00:30:51.280 DAVID MALAN: A good question. 00:30:51.280 --> 00:30:53.238 So where are all of these zeros and ones stored? 00:30:53.238 --> 00:30:56.400 Because you and I, we've been using a browser, right? code.cs50.io, 00:30:56.400 --> 00:30:58.330 of course, is this web-based user interface. 00:30:58.330 --> 00:31:00.497 But again, recall from last week, even though you're 00:31:00.497 --> 00:31:05.640 using a web browser to access VS Code, that web-based version of VS Code 00:31:05.640 --> 00:31:09.000 is connected to an actual server somewhere in the cloud. 00:31:09.000 --> 00:31:13.170 And on that server, you have your own account and your own files, and really, 00:31:13.170 --> 00:31:15.360 your own hard drive, virtually in the cloud. 00:31:15.360 --> 00:31:18.872 Think of it a little like Dropbox or Box or Google Drive or OneDrive 00:31:18.872 --> 00:31:19.830 or something like that. 00:31:19.830 --> 00:31:23.310 So you have a hard drive somewhere out there that we've provisioned for you. 00:31:23.310 --> 00:31:27.930 And it's on that hard drive that you have your code that you just wrote, 00:31:27.930 --> 00:31:32.700 or I just wrote, cs50.c, stdio.c, and all of the other code 00:31:32.700 --> 00:31:36.967 that implements the math functions and everything else that C supports. 00:31:36.967 --> 00:31:37.550 Good question. 00:31:37.550 --> 00:31:38.964 Yeah?
00:31:38.964 --> 00:31:45.425 AUDIENCE: So, say in the CS50 library, the line [INAUDIBLE] 00:31:45.425 --> 00:31:49.401 do we do the same exact thing [INAUDIBLE] 00:31:49.401 --> 00:31:51.935 copy paste them all the way over? 00:31:51.935 --> 00:31:53.060 DAVID MALAN: Good question. 00:31:53.060 --> 00:31:57.110 That hash include cs50.h line at the top of my code. 00:31:57.110 --> 00:32:01.310 If I just replace that with the contents of cs50.c, would that work? 00:32:01.310 --> 00:32:03.590 Short answer, yes, that would work. 00:32:03.590 --> 00:32:05.400 You could copy all of the code there. 00:32:05.400 --> 00:32:08.577 However, there's some order of operations that might come into play. 00:32:08.577 --> 00:32:10.910 And so it's probably not quite as simple as copy, paste. 00:32:10.910 --> 00:32:13.190 But conceptually, yes, that's what's happening. 00:32:13.190 --> 00:32:19.370 Now with that said, in cs50.h, are only the prototypes of the functions, 00:32:19.370 --> 00:32:23.628 the hints as to how the functions look, what their return type is, 00:32:23.628 --> 00:32:25.670 what their name is, and what their arguments are. 00:32:25.670 --> 00:32:29.867 It's in the dot c file that actual code tends to be written. 00:32:29.867 --> 00:32:32.450 And this is a little confusing now because you and I have only 00:32:32.450 --> 00:32:33.920 written code in dot c files. 00:32:33.920 --> 00:32:35.690 But in the next few weeks, you'll actually 00:32:35.690 --> 00:32:37.940 start writing some of your own dot h files 00:32:37.940 --> 00:32:40.460 as well, just like CS50, just like standard I/O. 00:32:40.460 --> 00:32:44.150 But in essence, that line of code just makes it easier to use and reuse 00:32:44.150 --> 00:32:46.020 code that's already been written. 00:32:46.020 --> 00:32:47.750 And that's the whole point of a library. 00:32:47.750 --> 00:32:50.327 AUDIENCE: Does linking them [INAUDIBLE]? 00:32:50.327 --> 00:32:51.910 DAVID MALAN: Say that a little louder.
00:32:51.910 --> 00:32:54.472 AUDIENCE: Does linking happen when you use the compiler? 00:32:54.472 --> 00:32:55.180 DAVID MALAN: Yes. 00:32:55.180 --> 00:32:56.980 Does linking happen when you compile your code? 00:32:56.980 --> 00:32:57.480 Yes. 00:32:57.480 --> 00:33:02.320 When you run make, as we have been doing the past week now, 00:33:02.320 --> 00:33:04.570 all four of these steps are happening. 00:33:04.570 --> 00:33:07.780 Preprocessing converts the hash include lines to something else. 00:33:07.780 --> 00:33:10.600 Compiling technically converts it to assembly 00:33:10.600 --> 00:33:14.290 code, which the Mac, the PC, the server more closely understands. 00:33:14.290 --> 00:33:18.850 Assembly converts that language to binary machine code that this computer 00:33:18.850 --> 00:33:20.080 actually understands. 00:33:20.080 --> 00:33:22.540 And then linking combines everything together. 00:33:22.540 --> 00:33:27.550 And in fact, if you think back a few minutes ago to when I did this -lcs50, 00:33:27.550 --> 00:33:30.070 the reason I had to add that, and the reason 00:33:30.070 --> 00:33:32.860 my code did not compile at first, was because I 00:33:32.860 --> 00:33:38.650 forgot to tell clang to link in CS50's zeros and ones per that last step. 00:33:38.650 --> 00:33:42.147 I don't need to do -lstdio because it comes with C, 00:33:42.147 --> 00:33:44.480 so that would just be tedious for everyone in the world. 00:33:44.480 --> 00:33:47.140 But CS50 does not come with C, so we link that in. 00:33:47.140 --> 00:33:49.780 And to be clear, too, we won't always use CS50's library. 00:33:49.780 --> 00:33:53.072 That'll be yet another pair of training wheels we take off in the coming weeks. 00:33:53.072 --> 00:33:55.000 But for now, it makes a few things simpler. 00:33:55.000 --> 00:33:57.284 Yeah? 00:33:57.284 --> 00:33:59.750 AUDIENCE: What is the [INAUDIBLE]? 00:34:08.878 --> 00:34:10.170 DAVID MALAN: Short answer, yes. 
00:34:10.170 --> 00:34:12.870 So what do the zeros and ones, the machine code, translate to? 00:34:12.870 --> 00:34:15.690 Yes, there is a one-to-one relationship between the machine 00:34:15.690 --> 00:34:17.340 code and the assembly code. 00:34:17.340 --> 00:34:21.510 Assembly code, it's not really English, but at least it's symbols I recognize. 00:34:21.510 --> 00:34:22.800 It's not zeros and ones. 00:34:22.800 --> 00:34:24.810 Machine code, of course, is just zeros and ones. 00:34:24.810 --> 00:34:27.960 So back in the day, before C existed, people 00:34:27.960 --> 00:34:30.630 were programming only in assembly code. 00:34:30.630 --> 00:34:34.469 Before assembly code existed, people were coding in zeros and ones. 00:34:34.469 --> 00:34:36.719 And you can imagine just how painful that was, 00:34:36.719 --> 00:34:39.027 and so each of these languages makes life, for us, 00:34:39.027 --> 00:34:40.110 sort of easier and easier. 00:34:40.110 --> 00:34:42.330 In a few weeks, we'll transition to Python, which 00:34:42.330 --> 00:34:45.300 will, in turn, make C even simpler-- 00:34:45.300 --> 00:34:48.090 or coding, in general, simpler to do too. 00:34:48.090 --> 00:34:53.346 All right, so with that said, what now can we-- 00:34:53.346 --> 00:34:55.060 what could go wrong with this? 00:34:55.060 --> 00:34:58.140 Well, it turns out that besides compiling, technically speaking, 00:34:58.140 --> 00:34:59.233 there's decompiling. 00:34:59.233 --> 00:35:01.150 And we've not done this, and we won't do this. 00:35:01.150 --> 00:35:04.080 But it's worth considering for just a moment. 00:35:04.080 --> 00:35:07.560 If you were to not compile your code, but decompile it-- 00:35:07.560 --> 00:35:11.340 as the word suggests, this just means reversing the process, converting it, 00:35:11.340 --> 00:35:14.580 ideally, from machine code-- zeros and ones-- 00:35:14.580 --> 00:35:19.870 maybe back to C. 
Now this would be cool, perhaps, if all you have is a program, 00:35:19.870 --> 00:35:22.080 you can convert it and see the actual source code. 00:35:22.080 --> 00:35:25.320 What might a downside be, if anyone on the internet 00:35:25.320 --> 00:35:28.650 is able to decompile code on their machine? 00:35:28.650 --> 00:35:29.160 Yeah? 00:35:29.160 --> 00:35:30.270 AUDIENCE: [INAUDIBLE] 00:35:30.270 --> 00:35:34.130 DAVID MALAN: OK, so it's easier to find bugs in the code that-- 00:35:34.130 --> 00:35:35.430 oh, to exploit. 00:35:35.430 --> 00:35:38.417 So it might be easier to hack into the software 00:35:38.417 --> 00:35:41.000 by finding mistakes you and I made because, literally, they're 00:35:41.000 --> 00:35:43.370 staring at you in code, whereas the zeros and ones make 00:35:43.370 --> 00:35:45.080 it way less obvious. 00:35:45.080 --> 00:35:48.140 Other downsides of what I called decompiling? 00:35:48.140 --> 00:35:49.970 Yeah? 00:35:49.970 --> 00:35:53.690 AUDIENCE: If stuff is copyrighted or you don't even know how to get it-- 00:35:53.690 --> 00:35:54.440 DAVID MALAN: Yeah. 00:35:54.440 --> 00:35:55.948 AUDIENCE: [INAUDIBLE] 00:35:55.948 --> 00:35:57.740 DAVID MALAN: Yeah, if your code, your work, 00:35:57.740 --> 00:36:00.950 is your intellectual property, copyrighted or otherwise, that's 00:36:00.950 --> 00:36:03.660 kind of obnoxious that someone can just run a command, and boom, 00:36:03.660 --> 00:36:05.577 they can see the original code that you wrote. 00:36:05.577 --> 00:36:08.490 Now, it turns out it's not quite as simple as that. 
00:36:08.490 --> 00:36:11.720 And so even though, yes, you could take a program like Hello, 00:36:11.720 --> 00:36:15.080 or even Microsoft Word, and convert it from zeros and ones 00:36:15.080 --> 00:36:19.400 back to some form of source code-- be it in C or Java 00:36:19.400 --> 00:36:22.820 or Python or something else, whatever it was originally written in-- odds 00:36:22.820 --> 00:36:25.800 are it's going to be an utter mess to look at. 00:36:25.800 --> 00:36:26.300 Why? 00:36:26.300 --> 00:36:30.390 Because things like variable names are not retained in the zeros and ones, 00:36:30.390 --> 00:36:30.890 typically. 00:36:30.890 --> 00:36:33.980 Function names might not be retained in the zeros and ones. 00:36:33.980 --> 00:36:36.350 The code is, the logic is, but the computer 00:36:36.350 --> 00:36:38.510 doesn't care what pretty variables you chose 00:36:38.510 --> 00:36:41.060 and how nicely named your functions were, it just 00:36:41.060 --> 00:36:42.890 needs to know them as zeros and ones. 00:36:42.890 --> 00:36:46.370 Moreover, if you think about last week, we introduced things like loops in C. 00:36:46.370 --> 00:36:49.745 And besides for loops, there's what other kind of loop, for instance? 00:36:49.745 --> 00:36:50.620 AUDIENCE: [INAUDIBLE] 00:36:50.620 --> 00:36:53.412 DAVID MALAN: So, a while loop-- and even though they look different 00:36:53.412 --> 00:36:55.920 and you have to write different code, they achieve exactly 00:36:55.920 --> 00:36:59.910 the same functionality, which is to say, when you compile a for loop 00:36:59.910 --> 00:37:04.140 or you compile a while loop, if they logically do the same thing, 00:37:04.140 --> 00:37:07.420 they might end up looking identical as zeros and ones. 00:37:07.420 --> 00:37:09.780 And so, therefore, it's not necessarily predictable 00:37:09.780 --> 00:37:11.820 that you'll get back the original code, why?
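To make the for-versus-while point concrete, here is a small sketch of two functions that are written differently but do the same work. The names sum_with_for and sum_with_while are hypothetical. Whether a particular compiler emits identical machine code for the two is not guaranteed and depends on the compiler and its settings; the sketch only shows that the logic is the same.

```c
// Adds the integers 1 through n using a for loop.
int sum_with_for(int n)
{
    int total = 0;
    for (int i = 1; i <= n; i++)
    {
        total += i;
    }
    return total;
}

// Adds the integers 1 through n using a while loop.
// Same initialization, same condition, same update, different syntax.
int sum_with_while(int n)
{
    int total = 0;
    int i = 1;
    while (i <= n)
    {
        total += i;
        i++;
    }
    return total;
}
```

Since both functions initialize a counter, test it, add, and increment in the same order, a tool working backward from the compiled result has no reliable way to tell which form the programmer originally typed.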
00:37:11.820 --> 00:37:15.110 Because the zeros and ones might not know, so to speak, 00:37:15.110 --> 00:37:16.860 whether it was a for loop or a while loop, 00:37:16.860 --> 00:37:19.350 so maybe decompiling will show you one or the other. 00:37:19.350 --> 00:37:21.870 And honestly, decompiling, while possible-- and it's 00:37:21.870 --> 00:37:24.570 one way of reverse engineering someone's product. 00:37:24.570 --> 00:37:28.662 Odds are, if you're good enough to start reading code that's been decompiled 00:37:28.662 --> 00:37:30.870 and reading through the messiness of it, odds are you 00:37:30.870 --> 00:37:34.020 have the talent probably to just write that same program from scratch 00:37:34.020 --> 00:37:34.650 yourself. 00:37:34.650 --> 00:37:36.870 Now, that's an overstatement, perhaps, but it's not 00:37:36.870 --> 00:37:40.410 quite as easy or threatening as you might first think. 00:37:40.410 --> 00:37:43.290 So in general, once code is compiled, it's 00:37:43.290 --> 00:37:48.290 pretty challenging, time consuming, costly to reverse engineer it, much 00:37:48.290 --> 00:37:50.040 like it would be in the real world, right? 00:37:50.040 --> 00:37:52.860 Like all of us have some kind of phone, probably, nowadays in our pocket. 00:37:52.860 --> 00:37:55.193 There's nothing stopping you from opening it up somehow, 00:37:55.193 --> 00:37:57.060 poking around, recreating what's there. 00:37:57.060 --> 00:37:59.130 That's a huge amount of effort, most likely. 00:37:59.130 --> 00:38:01.880 And at that point, maybe you should just invent the phone, instead 00:38:01.880 --> 00:38:03.310 of trying to reverse engineer it. 00:38:03.310 --> 00:38:06.330 So same kind of idea in the physical world. 00:38:06.330 --> 00:38:13.050 Any questions, then, on compiling, or even decompiling in these forms? 00:38:13.050 --> 00:38:17.160 All right, so odds are, at this point, not only I, but you have made mistakes.
00:38:17.160 --> 00:38:19.050 And you've written buggy code-- 00:38:19.050 --> 00:38:22.350 a bug in a code is just a mistake, a logical error 00:38:22.350 --> 00:38:26.490 or otherwise, where the code just does not behave correctly as you intend. 00:38:26.490 --> 00:38:29.880 And up until now, odds are, your debugging techniques 00:38:29.880 --> 00:38:32.910 have been to maybe look back at what I did in class, maybe 00:38:32.910 --> 00:38:35.320 ask a question online or in-person. 00:38:35.320 --> 00:38:38.190 But ultimately, it'd be nice if you had some tools of your own 00:38:38.190 --> 00:38:39.570 with which to debug code. 00:38:39.570 --> 00:38:41.587 And this, honestly, is a lifelong skill. 00:38:41.587 --> 00:38:43.170 You're not going to emerge from CS50-- 00:38:43.170 --> 00:38:44.490 and even 20 years from now, you're not going 00:38:44.490 --> 00:38:47.910 to be writing-- if you're writing code at all-- correct code all of the time. 00:38:47.910 --> 00:38:50.820 Like, all of us on the staff continue to write bugs. 00:38:50.820 --> 00:38:54.120 Hopefully, they get a little more sophisticated, and not sort of like, 00:38:54.120 --> 00:38:55.540 oops, I missed a semicolon. 00:38:55.540 --> 00:38:57.660 But even those kinds of mistakes, we make too. 00:38:57.660 --> 00:39:00.150 But there's tools out there and techniques 00:39:00.150 --> 00:39:03.550 that can make your life easier when it comes to solving those problems. 00:39:03.550 --> 00:39:06.360 Now, the term bug has actually been around for decades. 00:39:06.360 --> 00:39:11.790 But a fun story to tell is that the first documented actual bug was 00:39:11.790 --> 00:39:13.650 actually somehow connected to Harvard. 
00:39:13.650 --> 00:39:18.870 In fact, this is the logbook relating to the Harvard Mark II computer 00:39:18.870 --> 00:39:22.890 from 1947, whereby if you read the notes here-- and I'll Zoom in-- this 00:39:22.890 --> 00:39:27.630 was an actual moth discovered inside of this big mainframe computer that 00:39:27.630 --> 00:39:29.160 was causing some kind of problems. 00:39:29.160 --> 00:39:30.450 And the engineers there at the time actually 00:39:30.450 --> 00:39:33.610 thought it was funny that, wow, physical bug actually explains the issue. 00:39:33.610 --> 00:39:36.450 And it's been forever taped to this sheet of paper, which I believe 00:39:36.450 --> 00:39:39.090 now is on display in the Smithsonian. 00:39:39.090 --> 00:39:43.260 With that said, this is just representative, too, of a logical bug. 00:39:43.260 --> 00:39:45.390 And that story is actually-- 00:39:45.390 --> 00:39:49.170 that story was often retold by a famous mathematician, then computer scientist 00:39:49.170 --> 00:39:53.640 really, Dr. Grace Hopper, who actually worked not only on the Harvard Mark II 00:39:53.640 --> 00:39:57.210 computer, but its predecessor, the Harvard Mark I. 00:39:57.210 --> 00:40:01.020 And if you ever spent time, yet, in the engineering building across the river 00:40:01.020 --> 00:40:04.103 here, you can actually see much of this computer, which 00:40:04.103 --> 00:40:07.020 is along the wall when you first walk into the Science and Engineering 00:40:07.020 --> 00:40:07.530 Complex. 00:40:07.530 --> 00:40:09.530 And indeed, as you've probably heard growing up, 00:40:09.530 --> 00:40:11.070 this is a mainframe computer. 00:40:11.070 --> 00:40:15.210 This is what Macs and PCs, so to speak, looked like back in the day, 00:40:15.210 --> 00:40:18.240 with very physical things that essentially implemented the zeros 00:40:18.240 --> 00:40:21.900 and ones that you and I take for granted now being miniaturized in our laptops 00:40:21.900 --> 00:40:22.410 and phones. 
00:40:22.410 --> 00:40:23.910 So there's a piece of history there. 00:40:23.910 --> 00:40:27.390 If you visit that side of campus sometime, do take a look. 00:40:27.390 --> 00:40:30.480 But let's consider, then, how we solve not, of course, physical bugs, 00:40:30.480 --> 00:40:31.350 but logical bugs. 00:40:31.350 --> 00:40:33.600 And let's consider something like this from last week, 00:40:33.600 --> 00:40:38.820 whereby, we were trying very simply to print like this column of three bricks 00:40:38.820 --> 00:40:40.320 using hashtags of sorts. 00:40:40.320 --> 00:40:44.400 So let me go over here in just a moment to VS Code. 00:40:44.400 --> 00:40:47.080 And I'm going to go ahead and open a program I wrote in advance. 00:40:47.080 --> 00:40:49.455 And I'm bringing it to class because there's a bug in it, 00:40:49.455 --> 00:40:51.510 and I'd like to figure out how to solve this bug. 00:40:51.510 --> 00:40:56.160 So let me open up a buggy0.c, which is version 0 of my code. 00:40:56.160 --> 00:40:58.200 And let's just take a quick peek at what's here. 00:40:58.200 --> 00:40:58.950 It's pretty short. 00:40:58.950 --> 00:41:03.750 It includes only stdio.h, it uses printf, it uses a for loop, 00:41:03.750 --> 00:41:07.797 and the goal, quite simply, is to print out that column of three bricks. 00:41:07.797 --> 00:41:11.130 Now, it's short enough that some of you, if you're getting comfy already with C, 00:41:11.130 --> 00:41:13.360 you might already see the logical bug. 00:41:13.360 --> 00:41:16.200 It's not a syntax error, like it will compile and run. 00:41:16.200 --> 00:41:17.280 But there's a bug there. 00:41:17.280 --> 00:41:22.320 And suppose that I'm very new to C, I'm very uncomfortable with C, it's 2:00 AM 00:41:22.320 --> 00:41:26.130 and I just can't see the bug, what are my recourses here for actually 00:41:26.130 --> 00:41:27.745 finding a mistake like this? 00:41:27.745 --> 00:41:29.370 Well, first, let's look at the symptom. 
00:41:29.370 --> 00:41:31.740 Let me go down to my terminal window. 00:41:31.740 --> 00:41:36.120 I'm going to use make buggy0 because, again, the file is called buggy0.c. 00:41:36.120 --> 00:41:37.260 I'm not going to use clang. 00:41:37.260 --> 00:41:39.880 In fact, I'm never really going to use clang manually here on out. 00:41:39.880 --> 00:41:42.430 I'm just going to use make because it makes our lives easier. 00:41:42.430 --> 00:41:43.560 It does compile. 00:41:43.560 --> 00:41:45.390 No errors, so it's not syntax. 00:41:45.390 --> 00:41:47.670 It's not something silly like a missing semicolon. 00:41:47.670 --> 00:41:53.190 But when I run ./buggy0, I, of course, see one, two, three, four-- 00:41:53.190 --> 00:41:57.990 and this, of course, does not match the one, two, three bricks that I actually 00:41:57.990 --> 00:41:59.610 intended for that column. 00:41:59.610 --> 00:42:02.970 And yet, I'm starting counting at 0, as I usually do. 00:42:02.970 --> 00:42:03.930 I've got three. 00:42:03.930 --> 00:42:05.280 I'm going up to three. 00:42:05.280 --> 00:42:06.780 So where is my logical error? 00:42:06.780 --> 00:42:10.150 If it hasn't obviously jumped out at you already, well, how can I solve this? 00:42:10.150 --> 00:42:13.080 Well, first and foremost, perhaps the best technique 00:42:13.080 --> 00:42:16.080 for solving bugs, at least early on, is just use printf. 00:42:16.080 --> 00:42:20.020 Like thus far, we've used printf to say, Hello, and other things on the screen. 00:42:20.020 --> 00:42:22.530 But printf is just a function for printing anything. 00:42:22.530 --> 00:42:24.570 And there's no reason you can't temporarily 00:42:24.570 --> 00:42:27.900 use printf to print out the contents of variables, 00:42:27.900 --> 00:42:29.850 what's going on inside of your program, just 00:42:29.850 --> 00:42:31.350 to figure out where your mistake is. 00:42:31.350 --> 00:42:32.940 And then you can delete that line of code later.
00:42:32.940 --> 00:42:34.600 It doesn't have to stay there forever. 00:42:34.600 --> 00:42:35.740 So let me do this. 00:42:35.740 --> 00:42:39.450 Instead of just printing out in VS Code the hash symbol, 00:42:39.450 --> 00:42:45.690 let me do a little safety check here and print out the value of i. 00:42:45.690 --> 00:42:49.170 So let me go ahead and say something like, i is-- 00:42:49.170 --> 00:42:51.610 now I want to say i is this. 00:42:51.610 --> 00:42:54.540 But, of course, this is not how I print out the value of i. 00:42:54.540 --> 00:42:58.930 If I want to print out the value of i, what should I put here? 00:42:58.930 --> 00:43:02.160 So %i for integer, instead of %s for string. 00:43:02.160 --> 00:43:03.410 So they're still placeholders. 00:43:03.410 --> 00:43:04.930 But we use %i for integers. 00:43:04.930 --> 00:43:08.450 And now if I want to print out i, I just need the comma as the second argument, 00:43:08.450 --> 00:43:09.250 and then i. 00:43:09.250 --> 00:43:13.000 All right, let me go ahead and back to my terminal window. 00:43:13.000 --> 00:43:15.760 Let me recompile the program because I've changed it. 00:43:15.760 --> 00:43:18.880 That still works fine, ./buggy0. 00:43:18.880 --> 00:43:22.540 And now, let me increase the size of my terminal window here. 00:43:22.540 --> 00:43:25.510 You just see some diagnostic information, if you will. 00:43:25.510 --> 00:43:26.560 This is not the goal. 00:43:26.560 --> 00:43:29.393 This is not what you should be submitting for this homework problem, 00:43:29.393 --> 00:43:30.070 were it one. 00:43:30.070 --> 00:43:33.730 But it is helping us diagnostically know that, OK, when i is zero, 00:43:33.730 --> 00:43:34.450 here's a hash. 00:43:34.450 --> 00:43:36.182 When i is 1, here's a hash. 00:43:36.182 --> 00:43:37.390 When i is two, here's a hash. 00:43:37.390 --> 00:43:39.017 When i is 3, here's a hash. 00:43:39.017 --> 00:43:39.850 Well, wait a minute.
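The temporary diagnostic described here can be sketched roughly as follows. This is a reconstruction under assumptions, not the actual buggy0.c: the helper name print_column and its return value are hypothetical, added only so that the number of hashes can be counted, whereas the lecture's loop sits directly inside main.

```c
#include <stdio.h>

// Hypothetical helper standing in for the loop in buggy0.c.
// Returns how many hashes were printed so the symptom can be counted.
int print_column(int height)
{
    int printed = 0;

    // Bug under investigation: <= makes the loop run height + 1 times.
    for (int i = 0; i <= height; i++)
    {
        // Temporary diagnostic line: %i is the placeholder for an int.
        // It is meant to be deleted once the bug is found.
        printf("i is %i\n", i);
        printf("#\n");
        printed++;
    }

    return printed;
}
```

Calling print_column(3) from main would print "i is 0" through "i is 3", each followed by a hash, which is four rows rather than the intended three.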
00:43:39.850 --> 00:43:41.530 That's one, two, three, four. 00:43:41.530 --> 00:43:44.360 So clearly, I'm printing it one too many times. 00:43:44.360 --> 00:43:48.130 So let me look back at the code here by shrinking my terminal window. 00:43:48.130 --> 00:43:53.080 And let me just ask the group, where is, in fact, the mistake? 00:43:53.080 --> 00:43:56.080 Or what, equivalently, would be the solution? 00:43:56.080 --> 00:43:57.561 Yeah, in the middle. 00:43:57.561 --> 00:44:00.020 AUDIENCE: [INAUDIBLE] 00:44:00.020 --> 00:44:03.550 DAVID MALAN: Yeah, instead of less than or equal to, use just less than. 00:44:03.550 --> 00:44:05.300 So you've got to kind of pick a lane here. 00:44:05.300 --> 00:44:08.630 If you're going to start counting from 0, you generally use less than, 00:44:08.630 --> 00:44:10.880 and go up to, but not through the value. 00:44:10.880 --> 00:44:13.970 Or if you prefer, like in the human world, counting from 1 on up, 00:44:13.970 --> 00:44:17.300 you can use less than or equal to, but you have to be consistent. 00:44:17.300 --> 00:44:19.790 And in general, as a programmer, just always start 00:44:19.790 --> 00:44:22.610 counting from 0 if you're doing something canonical like this. 00:44:22.610 --> 00:44:25.160 But the solution is, indeed, just to change this 00:44:25.160 --> 00:44:27.860 by changing the less than or equal to to just less than. 00:44:27.860 --> 00:44:34.340 If I recompile this program with make buggy0, and then do ./buggy0 again-- 00:44:34.340 --> 00:44:36.500 and let me increase the size of my terminal window. 00:44:36.500 --> 00:44:39.050 Now, you see, OK, almost the same output. 00:44:39.050 --> 00:44:44.330 But indeed, i starts at 0 and goes up to, but not through, three. 00:44:44.330 --> 00:44:48.920 All right, so printf, in short, can be your first diagnostic tool.
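The two consistent conventions mentioned here, counting from 0 with less than or counting from 1 with less than or equal to, might look like this side by side. It is a minimal sketch: both function names and the returned counts are hypothetical and exist only to make the comparison easy to check.

```c
#include <stdio.h>

// Zero-based convention: start at 0 and go up to, but not through, height.
int print_column_from_zero(int height)
{
    int printed = 0;
    for (int i = 0; i < height; i++)
    {
        printf("#\n");
        printed++;
    }
    return printed;
}

// One-based convention: start at 1 and go up through height.
int print_column_from_one(int height)
{
    int printed = 0;
    for (int i = 1; i <= height; i++)
    {
        printf("#\n");
        printed++;
    }
    return printed;
}
```

Either version prints exactly height hashes. The bug in the original came from mixing the two: starting at 0 while still using less than or equal to.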
00:44:48.920 --> 00:44:51.500 Instead of just staring at the screen or raising your hand-- 00:44:51.500 --> 00:44:55.490 I mean, use printf to see, literally, what's going on inside of your program 00:44:55.490 --> 00:44:57.287 by just printing out things of interest. 00:44:57.287 --> 00:44:59.120 And then once you've solved the problem, you 00:44:59.120 --> 00:45:02.840 can go back into your code, as I'll do here, by shrinking my terminal window. 00:45:02.840 --> 00:45:04.610 I'll delete the printf line. 00:45:04.610 --> 00:45:07.100 And now I'm ready to share this program with the world 00:45:07.100 --> 00:45:08.870 or submit it as homework or the like. 00:45:08.870 --> 00:45:11.390 It's just meant there to be temporary. 00:45:11.390 --> 00:45:15.440 Any questions on printf as a debugging tool? 00:45:18.010 --> 00:45:18.510 No? 00:45:18.510 --> 00:45:20.970 All right, well, that only gets us so far. 00:45:20.970 --> 00:45:23.430 And honestly, as your programs grow and grow and grow, 00:45:23.430 --> 00:45:25.180 it's going to actually get really annoying 00:45:25.180 --> 00:45:28.860 to start going in and adding printf's, then removing them, and figuring out, 00:45:28.860 --> 00:45:31.860 if you've got multiple printf's, well, which one printed what? 00:45:31.860 --> 00:45:34.560 It just gets messy, eventually, to rely on printf alone. 00:45:34.560 --> 00:45:37.740 So being a computer scientist, computer scientists 00:45:37.740 --> 00:45:41.040 have written software to make it easier to debug code. 00:45:41.040 --> 00:45:44.040 That software is what we would generally call a debugger, which 00:45:44.040 --> 00:45:47.040 would be the second tool of the trade that you can use to actually solve 00:45:47.040 --> 00:45:48.610 problems in your code. 00:45:48.610 --> 00:45:52.690 Now, in the world of VS code, there's actually a debugger built in. 
00:45:52.690 --> 00:45:54.840 So the graphical user interface you're about to see 00:45:54.840 --> 00:45:58.260 in VS Code isn't specific to CS50, it actually comes with VS Code. 00:45:58.260 --> 00:46:01.230 And it supports C, and C++, and Java, and Python, 00:46:01.230 --> 00:46:03.030 and lots of other languages too. 00:46:03.030 --> 00:46:05.640 But it's, admittedly, a little complicated 00:46:05.640 --> 00:46:07.650 to just start using the debugger. 00:46:07.650 --> 00:46:10.200 You have to create a configuration file and do 00:46:10.200 --> 00:46:13.480 some annoying steps that just get in the way of solving real problems. 00:46:13.480 --> 00:46:17.070 So we have automated the process for you of just starting the debugger. 00:46:17.070 --> 00:46:19.680 And thereafter, it's sort of industry standard how you use it. 00:46:19.680 --> 00:46:23.380 But we save you the headache of having to create those configuration files. 00:46:23.380 --> 00:46:25.330 So, suppose I want to do this. 00:46:25.330 --> 00:46:27.600 Suppose I want to try to debug this program 00:46:27.600 --> 00:46:30.330 step by step using special software. 00:46:30.330 --> 00:46:31.810 Well, how can I do that? 00:46:31.810 --> 00:46:36.240 Well, let me propose that if I revert this back to the original version 00:46:36.240 --> 00:46:40.530 where i was less than or equal to 3, I'm pretty sure that I 00:46:40.530 --> 00:46:41.790 was printing too many hashes. 00:46:41.790 --> 00:46:43.350 So I'm going to do this-- and you might have done this 00:46:43.350 --> 00:46:45.160 accidentally or never at all. 00:46:45.160 --> 00:46:49.500 But notice if you hover over the gutter, so to speak, in VS Code, the part of it 00:46:49.500 --> 00:46:52.590 all the way to the left of the editor, you see this sort of grayed 00:46:52.590 --> 00:46:54.390 out red dot. 00:46:54.390 --> 00:46:57.240 If you click there, it becomes a brighter red dot. 
00:46:57.240 --> 00:46:59.670 And this represents what we're going to call a breakpoint. 00:46:59.670 --> 00:47:03.090 And this is just a visual indicator that you've put like a stop sign equivalent 00:47:03.090 --> 00:47:06.270 there, and you're telling the debugger in a moment, stop 00:47:06.270 --> 00:47:07.350 running my code there. 00:47:07.350 --> 00:47:07.920 Why? 00:47:07.920 --> 00:47:11.610 Because I prefer to step through my code at sort of a human speed, 00:47:11.610 --> 00:47:14.380 and not at computer speed where it runs all at once. 00:47:14.380 --> 00:47:16.750 So I've set my breakpoint, which is step one. 00:47:16.750 --> 00:47:18.580 And then step two is quite simply this. 00:47:18.580 --> 00:47:23.190 Instead of running the program itself, run the command called debug50, 00:47:23.190 --> 00:47:26.010 and then ./buggy0. 00:47:26.010 --> 00:47:29.220 And now this will start your program, but inside 00:47:29.220 --> 00:47:31.200 of the debugger, which is a special program 00:47:31.200 --> 00:47:33.060 that smart people wrote that will empower 00:47:33.060 --> 00:47:38.190 you to now step through your code line by line, and again, at your own comfortable 00:47:38.190 --> 00:47:38.970 pace. 00:47:38.970 --> 00:47:43.080 I'm going to hit Enter, some stuff's going to happen on the screen-- whoops. 00:47:43.080 --> 00:47:45.767 Notice, this is a common mistake that I made accidentally here. 00:47:45.767 --> 00:47:47.100 Looks like I've changed my code. 00:47:47.100 --> 00:47:49.892 I did because I went in and changed the less than or equal to sign. 00:47:49.892 --> 00:47:52.860 So let me go ahead and rerun make buggy0-- 00:47:52.860 --> 00:47:53.520 Enter. 00:47:53.520 --> 00:47:55.590 Good, now let me rerun debug50-- 00:47:55.590 --> 00:47:57.810 Enter.
00:47:57.810 --> 00:47:59.760 And now some stuff just happened on the screen, 00:47:59.760 --> 00:48:03.270 and it takes a moment to get started, but once it's started, you'll 00:48:03.270 --> 00:48:06.010 see this-- you'll still see your code. 00:48:06.010 --> 00:48:09.410 But you'll see this yellow highlight, which you've probably not seen before. 00:48:09.410 --> 00:48:11.910 And notice that it's specifically highlighting the same line 00:48:11.910 --> 00:48:13.440 that I set a breakpoint on. 00:48:13.440 --> 00:48:13.950 Why? 00:48:13.950 --> 00:48:18.870 That just means the debugger has executed all of these lines, 00:48:18.870 --> 00:48:20.670 except for line 7. 00:48:20.670 --> 00:48:23.340 It has broken at-- not in a bad way. 00:48:23.340 --> 00:48:27.580 But it has paused execution on line 7, so it hasn't yet printed any hashes. 00:48:27.580 --> 00:48:30.450 And you can see that-- no hashes in the terminal window yet. 00:48:30.450 --> 00:48:31.980 It's paused execution. 00:48:31.980 --> 00:48:35.190 But what's interesting with the debugger is the stuff 00:48:35.190 --> 00:48:37.410 over here on the left-hand side. 00:48:37.410 --> 00:48:39.960 In the debugger here, you'll see, under variables, 00:48:39.960 --> 00:48:41.910 all of your so-called local variables. 00:48:41.910 --> 00:48:44.160 And we haven't really made a distinction between local 00:48:44.160 --> 00:48:45.327 and something called global. 00:48:45.327 --> 00:48:48.000 But for now, local variables just means all of the variables 00:48:48.000 --> 00:48:49.390 that exist in your function. 00:48:49.390 --> 00:48:52.110 So i currently has a value of 0. 00:48:52.110 --> 00:48:53.410 OK, and that makes sense. 00:48:53.410 --> 00:48:57.360 So now, how do I step through my code and see what it's doing?
00:48:57.360 --> 00:48:59.610 Well, at the top of the screen here, you'll 00:48:59.610 --> 00:49:02.250 see some playback icons, kind of like a video player, 00:49:02.250 --> 00:49:03.630 but they have special meaning. 00:49:03.630 --> 00:49:07.892 This first one will just play the rest of your program all the way to the end. 00:49:07.892 --> 00:49:10.350 So you only click that if you've sort of solved the problem 00:49:10.350 --> 00:49:13.110 and you just want to run it to completion like before. 00:49:13.110 --> 00:49:14.370 But the next three-- 00:49:14.370 --> 00:49:16.920 or next two, really, are really the juiciest. 00:49:16.920 --> 00:49:19.710 The second one here, if you hover over it, eventually, 00:49:19.710 --> 00:49:21.930 you'll see that it's called Step Over. 00:49:21.930 --> 00:49:25.170 Step Over means that the debugger will run 00:49:25.170 --> 00:49:28.630 this currently highlighted line of code, but it's not going to dive into it. 00:49:28.630 --> 00:49:30.660 So if it's a function like printf, it's not 00:49:30.660 --> 00:49:32.827 going to start stepping through printf line by line. 00:49:32.827 --> 00:49:33.327 Why? 00:49:33.327 --> 00:49:36.420 Because I can pretty much assume printf, written decades ago, is correct. 00:49:36.420 --> 00:49:38.050 Problem's probably with me. 00:49:38.050 --> 00:49:42.690 But this next line, if I did really want to step into the printf code 00:49:42.690 --> 00:49:46.110 to figure out how it works or find some problem in it all these years later, 00:49:46.110 --> 00:49:48.810 you can step into printf, and then the screen would change, 00:49:48.810 --> 00:49:50.910 and you'd see each of the lines for printf, 00:49:50.910 --> 00:49:54.250 line by line-- at least if you have the source code for printf installed. 00:49:54.250 --> 00:49:56.490 All right, I'm going to use the first one, Step Over. 00:49:56.490 --> 00:49:59.130 And watch as the yellow highlight moves. 
00:49:59.130 --> 00:50:03.060 And watch as, in the terminal window, there's a hash symbol. 00:50:03.060 --> 00:50:03.780 Here we go. 00:50:03.780 --> 00:50:05.130 There's one hash. 00:50:05.130 --> 00:50:07.230 Now, notice line 5 is highlighted. 00:50:07.230 --> 00:50:09.480 That means it has paused on line 5. 00:50:09.480 --> 00:50:11.350 Line 5 has not yet been executed. 00:50:11.350 --> 00:50:12.600 So what does that mean? 00:50:12.600 --> 00:50:16.320 The value of i, per the top left-hand corner, is still 0. 00:50:16.320 --> 00:50:18.920 But as soon as I click Step Over again, watch 00:50:18.920 --> 00:50:24.470 what happens at the top left, where i is a variable on the screen. 00:50:24.470 --> 00:50:26.420 Now i-- and it flashed briefly-- 00:50:26.420 --> 00:50:27.920 has a value of 1. 00:50:27.920 --> 00:50:30.650 And now if I step over again, watch the terminal window. 00:50:30.650 --> 00:50:32.120 There's my second hash. 00:50:32.120 --> 00:50:36.380 Now, let me click Step Over on for loop, watch the variable at top left. 00:50:36.380 --> 00:50:38.567 Now 1 goes to 2. 00:50:38.567 --> 00:50:39.650 Now let me click it again. 00:50:39.650 --> 00:50:43.220 Third hash-- and here's where the logical error is perhaps revealed. 00:50:43.220 --> 00:50:45.210 Let me go ahead and step over the loop. 00:50:45.210 --> 00:50:46.520 Now i is 3. 00:50:46.520 --> 00:50:49.280 Wait a minute, I'm still going to print out a hash. 00:50:49.280 --> 00:50:49.810 There it is. 00:50:49.810 --> 00:50:50.810 There's the fourth hash. 00:50:50.810 --> 00:50:53.852 And at this point, hopefully, the light bulb, proverbially, has gone off. 00:50:53.852 --> 00:50:55.020 I realize, oh, I screwed up. 00:50:55.020 --> 00:50:58.580 I can either stop the program altogether with the red square, 00:50:58.580 --> 00:51:01.100 or I can just let it run all the way to the end, which 00:51:01.100 --> 00:51:02.493 just terminates everything. 
00:51:02.493 --> 00:51:05.660 At this point, I just want to get back into my code and start fixing things. 00:51:05.660 --> 00:51:07.700 And you can close, for instance, as I will here, 00:51:07.700 --> 00:51:10.670 the File Explorer, just to hide the panel that opened. 00:51:10.670 --> 00:51:12.320 So that's debug50. 00:51:12.320 --> 00:51:15.920 But it's not a CS50 thing, that just starts the debugger for you, which 00:51:15.920 --> 00:51:19.520 is something you'd find in most any programming environment nowadays. 00:51:19.520 --> 00:51:23.670 Questions on debugging? 00:51:23.670 --> 00:51:24.170 Questions? 00:51:24.170 --> 00:51:24.670 Yeah? 00:51:24.670 --> 00:51:27.295 AUDIENCE: Where does it tell you where it went wrong? 00:51:27.295 --> 00:51:28.420 DAVID MALAN: Good question. 00:51:28.420 --> 00:51:30.310 Where does it tell you where it went wrong? 00:51:30.310 --> 00:51:33.190 So, sadly, it does not tell you any of that. 00:51:33.190 --> 00:51:37.570 The onus is still on you, the human, to use this tool productively to walk 00:51:37.570 --> 00:51:39.580 through your code at a saner pace. 00:51:39.580 --> 00:51:42.070 But your brain is the one that still needs to solve it. 00:51:42.070 --> 00:51:45.190 And I don't doubt, down the line, with artificial intelligence and more, 00:51:45.190 --> 00:51:47.350 programs like this will get all the more helpful, 00:51:47.350 --> 00:51:49.160 and start answering questions like that for us. 00:51:49.160 --> 00:51:51.340 And there are other tools we'll introduce you this semester 00:51:51.340 --> 00:51:52.990 that are even more powerful than this. 00:51:52.990 --> 00:51:56.770 But for now, it's just a tool, really, to slow things down and not 00:51:56.770 --> 00:51:57.820 have to change your code. 
00:51:57.820 --> 00:52:01.420 The fact that I had that panel on the left that just showed me i's changing 00:52:01.420 --> 00:52:04.150 value is just an alternative to printf, and I can 00:52:04.150 --> 00:52:06.820 step through it a little more slowly. 00:52:06.820 --> 00:52:10.580 Other questions on debugging? 00:52:10.580 --> 00:52:11.080 No? 00:52:11.080 --> 00:52:14.950 Let me show you one final example with this debugger here. 00:52:14.950 --> 00:52:16.750 And this one, too, I wrote in advance. 00:52:16.750 --> 00:52:18.730 Let me close buggy0.c. 00:52:18.730 --> 00:52:22.327 And let me open up buggy1.c, my second version thereof. 00:52:22.327 --> 00:52:24.160 Let me close my terminal window for a second 00:52:24.160 --> 00:52:26.350 and give you a quick tour of this program, which 00:52:26.350 --> 00:52:28.030 similarly, has a mistake. 00:52:28.030 --> 00:52:32.830 Now, at the top of this program, some familiar includes, cs50.h and stdio.h. 00:52:32.830 --> 00:52:34.730 This is not something we've seen before. 00:52:34.730 --> 00:52:36.190 It's specific to this example-- 00:52:36.190 --> 00:52:38.830 a function called getNegativeInt. 00:52:38.830 --> 00:52:41.043 Takes no arguments, and it returns an integer. 00:52:41.043 --> 00:52:41.710 What does it do? 00:52:41.710 --> 00:52:45.040 It literally gets a negative integer, ideally, from the user. 00:52:45.040 --> 00:52:47.200 Fun fact, though, it doesn't do so correctly. 00:52:47.200 --> 00:52:50.090 That's the bug. getNegativeInt is broken at the moment. 00:52:50.090 --> 00:52:51.470 So what does main do? 00:52:51.470 --> 00:52:54.130 Well, main just calls this function, passing in nothing 00:52:54.130 --> 00:52:55.690 in parentheses, no inputs. 00:52:55.690 --> 00:52:58.240 And it stores the return value in i. 00:52:58.240 --> 00:53:00.260 And then it just prints out i on the screen.
00:53:00.260 --> 00:53:03.910 So honestly, just by eyeballing this, I feel comfortable enough 00:53:03.910 --> 00:53:06.365 with programming in C, I think main is correct. 00:53:06.365 --> 00:53:07.990 Let me just stipulate, main is correct. 00:53:07.990 --> 00:53:09.698 But there is going to be a bug down here. 00:53:09.698 --> 00:53:11.210 Now, what's the bug down here? 00:53:11.210 --> 00:53:14.830 Well, let me look at getNegativeInt's implementation. 00:53:14.830 --> 00:53:18.970 Notice, this first line, 12, is identical to the prototype up here. 00:53:18.970 --> 00:53:22.690 The prototype is sort of stupidly required up here 00:53:22.690 --> 00:53:25.300 because C reads things top to bottom, left to right-- 00:53:25.300 --> 00:53:26.690 the compiler technically does. 00:53:26.690 --> 00:53:29.680 So if you reference getNegativeInt here, but you 00:53:29.680 --> 00:53:33.490 don't implement it until down here, and you haven't told C in advance 00:53:33.490 --> 00:53:36.820 that it will exist, again, you get the error we saw last week. 00:53:36.820 --> 00:53:39.010 All right, so how does getNegativeInt work? 00:53:39.010 --> 00:53:40.960 We declare a variable called n. 00:53:40.960 --> 00:53:43.540 We've got a do while loop that does what? 00:53:43.540 --> 00:53:47.110 It uses getInt, which comes with the cs50 library, per last week. 00:53:47.110 --> 00:53:49.480 It prompts the user for a negative integer, quote unquote, 00:53:49.480 --> 00:53:51.670 and stores the value in n. 00:53:51.670 --> 00:53:56.800 I then do all of this while n is less than 0, right? 00:53:56.800 --> 00:54:00.400 Remember, we used a do while loop last week to make sure the human cooperates 00:54:00.400 --> 00:54:03.970 and doesn't give us the wrong type of value, be it positive or negative 00:54:03.970 --> 00:54:04.970 or something else. 00:54:04.970 --> 00:54:06.400 And then we return n. 00:54:06.400 --> 00:54:07.570 And there's some subtleties.
00:54:07.570 --> 00:54:12.970 Anyone recall-- or have an intuition for why I've declared n on line 14, 00:54:12.970 --> 00:54:15.790 instead of line 17? 00:54:15.790 --> 00:54:17.620 This is a C specific thing. 00:54:17.620 --> 00:54:23.465 AUDIENCE: [INAUDIBLE] 00:54:23.465 --> 00:54:24.340 DAVID MALAN: Exactly. 00:54:24.340 --> 00:54:27.610 There's this notion of scope in C. And we'll continue to see this over time, 00:54:27.610 --> 00:54:32.590 whereby, a variable only exists inside of the most recent curly braces 00:54:32.590 --> 00:54:33.560 that you've opened. 00:54:33.560 --> 00:54:36.910 So if I've declared n here on line 14, I can use it 00:54:36.910 --> 00:54:40.900 anywhere between lines 13 and 21 because those are the nearest curly braces. 00:54:40.900 --> 00:54:43.540 If by contrast, as you note, if I instead said this, 00:54:43.540 --> 00:54:49.180 int n equals getInt and so forth, and didn't have the current line 14, 00:54:49.180 --> 00:54:53.470 well, n would exist inside of these curly braces, but not here, which 00:54:53.470 --> 00:54:55.340 is too late, and definitely not here. 00:54:55.340 --> 00:54:59.480 So you just have to declare it first, and then use and reuse it as such. 00:54:59.480 --> 00:55:01.545 Now, let me just show you how I can debug this. 00:55:01.545 --> 00:55:03.170 But let me show you the symptoms first. 00:55:03.170 --> 00:55:04.930 Let me open my terminal window. 00:55:04.930 --> 00:55:06.970 Let me run make buggy1. 00:55:06.970 --> 00:55:11.710 Compiles OK, so it's not something silly like a semicolon. ./buggy1, 00:55:11.710 --> 00:55:13.660 and I'm asked for a negative integer. 00:55:13.660 --> 00:55:15.280 All right, let me give it negative 1-- 00:55:15.280 --> 00:55:16.710 Enter. 00:55:16.710 --> 00:55:19.920 Well, the main function is supposed to print out what I typed, 00:55:19.920 --> 00:55:20.880 but it clearly didn't. 00:55:20.880 --> 00:55:21.880 It's prompting me again. 
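To make both the scope point and the symptom concrete, here is a rough sketch of the buggy loop. It is not the actual buggy1.c: so that it compiles without the CS50 library, it assumes the user's typed values arrive in an array, and the name get_negative_int_buggy, the parameters inputs and count, and the extra bounds check are all hypothetical.

```c
#include <stdio.h>

// Hypothetical stand-in for the buggy function in buggy1.c.
// inputs holds the values the user "types", in order; count must be >= 1.
int get_negative_int_buggy(const int inputs[], int count)
{
    // Declared before the do block so that n is still in scope in the
    // while condition and in the return statement below.
    int n;
    int next = 0;

    do
    {
        // Stand-in for asking the user via the CS50 library.
        n = inputs[next];
        next++;
        printf("user typed: %i\n", n);
    }
    while (n < 0 && next < count); // bug: repeats while n IS negative

    return n;
}
```

Fed the values -1, -2, -3, and 50, as in the demonstration, this sketch keeps looping past every negative value and only returns once it sees 50, which is the opposite of what the function's name promises.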
00:55:21.880 --> 00:55:23.830 All right, so maybe it'll like negative 2. 00:55:23.830 --> 00:55:24.330 No? 00:55:24.330 --> 00:55:26.380 Maybe negative 3. 00:55:26.380 --> 00:55:27.570 50? 00:55:27.570 --> 00:55:29.160 OK, so it's definitely broken, right? 00:55:29.160 --> 00:55:31.528 It kind of seems logically to be doing the opposite. 00:55:31.528 --> 00:55:33.820 Now, you can perhaps see why this is happening already. 00:55:33.820 --> 00:55:37.170 These are deliberately simple programs for demonstration's sake. 00:55:37.170 --> 00:55:38.470 But let's do this. 00:55:38.470 --> 00:55:41.037 Let me go ahead and set a breakpoint in main, 00:55:41.037 --> 00:55:42.870 even though I'm pretty sure main is correct. 00:55:42.870 --> 00:55:45.810 But it just helps me start my thought process-- start with main, 00:55:45.810 --> 00:55:47.010 and then take it from there. 00:55:47.010 --> 00:55:51.840 Let me run now, debug50 ./buggy1-- 00:55:51.840 --> 00:55:52.920 Enter. 00:55:52.920 --> 00:55:53.700 And let's see. 00:55:53.700 --> 00:55:56.880 With that breakpoint now, the GUI is going to reconfigure itself. 00:55:56.880 --> 00:56:00.360 It's going to pause on line 8 because that's the first interesting line 00:56:00.360 --> 00:56:01.260 inside of main. 00:56:01.260 --> 00:56:03.780 So I could have just put the breakpoint on line 8 too. 00:56:03.780 --> 00:56:06.480 It's smart enough to know that if I set it on 6, 00:56:06.480 --> 00:56:09.570 you really mean line 8 because that's the first actual line of code. 00:56:09.570 --> 00:56:11.280 And watch, now, what happens. 00:56:11.280 --> 00:56:15.780 If I step over this line, notice that i, which at the moment 00:56:15.780 --> 00:56:18.090 seems to have a default value of 0-- 00:56:18.090 --> 00:56:19.470 more on that another time. 00:56:19.470 --> 00:56:24.750 But if I click Step Over like before, I'm prompted for a negative integer. 00:56:24.750 --> 00:56:25.750 Let me type negative 1-- 00:56:25.750 --> 00:56:27.300 Enter.
00:56:27.300 --> 00:56:32.470 And now, notice, there's no additional yellow highlight. 00:56:32.470 --> 00:56:32.970 Why? 00:56:32.970 --> 00:56:35.160 Where am I currently stuck, logically? 00:56:35.160 --> 00:56:37.937 AUDIENCE: [INAUDIBLE] 00:56:37.937 --> 00:56:40.770 DAVID MALAN: Yeah, just logically, I must be in that do, while loop. 00:56:40.770 --> 00:56:43.560 And even if you don't understand it, like that's the only explanation. 00:56:43.560 --> 00:56:46.143 If you keep getting prompted, surely, there's a loop going on. 00:56:46.143 --> 00:56:49.270 There's only one loop in my code, so there's probably a problem there. 00:56:49.270 --> 00:56:52.900 So I can't just set a breakpoint in main, and then wait for this to work. 00:56:52.900 --> 00:56:53.610 So let me just-- 00:56:53.610 --> 00:56:56.280 let me stop this with the red square. 00:56:56.280 --> 00:56:58.860 And let me think, all right, instead of-- 00:56:58.860 --> 00:57:02.770 I can still set my breakpoint in main, but let me rerun the debugger instead. 00:57:02.770 --> 00:57:05.470 And this time, not step over that line of code, 00:57:05.470 --> 00:57:07.930 let me step into that line of code. 00:57:07.930 --> 00:57:09.270 So watch what happens now. 00:57:09.270 --> 00:57:11.430 Instead of clicking the second icon here, 00:57:11.430 --> 00:57:14.610 let me click the third, whose name is, indeed, Step Into. 00:57:14.610 --> 00:57:17.880 And watch as the yellow highlight does not move to line 9. 00:57:17.880 --> 00:57:21.930 It dives into line 8-- the function on line 8, 00:57:21.930 --> 00:57:25.170 thereby, bringing me down to line 17. 00:57:25.170 --> 00:57:28.270 It's kind of going down into that next function. 00:57:28.270 --> 00:57:31.422 Now, it didn't bother pausing on line 12 or 13 or 14 00:57:31.422 --> 00:57:34.380 because there's nothing intellectually interesting there happening yet. 00:57:34.380 --> 00:57:37.080 The juicy part really starts, it would seem, in line 17. 
00:57:37.080 --> 00:57:40.980 So, now notice, n is my variable at the top left. 00:57:40.980 --> 00:57:42.270 If I click-- 00:57:42.270 --> 00:57:45.420 I don't want to click Step Into now, though. 00:57:45.420 --> 00:57:48.090 What would go wrong if I click on Step Into-- 00:57:48.090 --> 00:57:52.480 or what would it do that I don't think I want to do? 00:57:52.480 --> 00:57:52.990 Yeah? 00:57:52.990 --> 00:57:54.755 AUDIENCE: [INAUDIBLE] 00:57:54.755 --> 00:57:56.630 DAVID MALAN: Yeah, it would step into get_int. 00:57:56.630 --> 00:57:59.620 But I'd like to think that the staff's version of get_int is correct, 00:57:59.620 --> 00:58:02.120 and that's not our problem today, so I want to step over it. 00:58:02.120 --> 00:58:06.710 And watch now at top left that nothing happens yet to the value of n 00:58:06.710 --> 00:58:09.530 until I go to the terminal window now, and I type in something 00:58:09.530 --> 00:58:10.670 like negative 1. 00:58:10.670 --> 00:58:14.600 Now notice, it jumps to line 19, which is the next interesting line. 00:58:14.600 --> 00:58:17.240 Top left, n, indeed, is negative 1. 00:58:17.240 --> 00:58:19.160 And here's where I can now pause as a human 00:58:19.160 --> 00:58:22.760 and think, all right, so while n is less than 0. 00:58:22.760 --> 00:58:25.280 All right, n, per the top left corner, is negative 1. 00:58:25.280 --> 00:58:27.830 So all right, while negative 1 is less than 0, 00:58:27.830 --> 00:58:29.780 well, obviously that's true mathematically. 00:58:29.780 --> 00:58:30.930 So what's going to happen? 00:58:30.930 --> 00:58:32.130 It's a do-while loop. 00:58:32.130 --> 00:58:37.285 So when I click on Step Over again, it's going to go to this line 00:58:37.285 --> 00:58:39.410 because it's at the end of the inside of that loop. 00:58:39.410 --> 00:58:42.710 And now here, it's looping through again and again. 00:58:42.710 --> 00:58:44.240 All right, let me do this once more.
00:58:44.240 --> 00:58:45.980 I'm going to step over, all right? 00:58:45.980 --> 00:58:48.777 I'm going to type in negative 2, and it's the exact same thing. 00:58:48.777 --> 00:58:50.360 Now is my chance, on the yellow line-- 00:58:50.360 --> 00:58:51.260 OK, wait a minute. 00:58:51.260 --> 00:58:53.450 Negative 2 is obviously less than 0. 00:58:53.450 --> 00:58:56.080 Let me try this one more time. 00:58:56.080 --> 00:58:57.570 Click it once here. 00:58:57.570 --> 00:58:59.040 All right, let me give it 50. 00:58:59.040 --> 00:59:05.020 And now, OK, while 50 is less than 0, that's not true, 00:59:05.020 --> 00:59:08.970 so the loop is over because it's not going to do it while 50 is less than 0. 00:59:08.970 --> 00:59:09.730 That's not true. 00:59:09.730 --> 00:59:12.240 So now watch, when I click Step Over once more, 00:59:12.240 --> 00:59:15.810 it then finishes the loop, even though there's nothing more to do. 00:59:15.810 --> 00:59:17.610 It's now about to return n. 00:59:17.610 --> 00:59:21.360 It jumps back up to main, where I left off on line 9. 00:59:21.360 --> 00:59:23.778 It now prints, in my terminal window, the number 50. 00:59:23.778 --> 00:59:26.070 And hopefully, at this point, to your question earlier, 00:59:26.070 --> 00:59:30.700 my human brain has realized, oh, I'm an idiot, like I flipped my sign there. 00:59:30.700 --> 00:59:32.460 So I probably-- let me stop this. 00:59:32.460 --> 00:59:34.780 I probably want to do something like this. 00:59:34.780 --> 00:59:38.860 If the goal is to get a negative integer, I probably want to say, 00:59:38.860 --> 00:59:45.070 while n is, for instance, greater than or equal to 0 would work. 00:59:45.070 --> 00:59:48.630 So while n is greater than or equal to 0, keep doing this. 00:59:48.630 --> 00:59:50.430 And that's the logic I wanted to express. 00:59:50.430 --> 00:59:53.733 So the debugger just saves me from staring at the screen, raising a hand, 00:59:53.733 --> 00:59:54.900 sort of asking someone else. 
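A minimal sketch of the fix just described, under stated assumptions: the excerpt never names the buggy helper, so first_negative and its parameters are hypothetical, and a scripted array of inputs stands in for get_int so that the sketch compiles without the CS50 library. The only line that matters is the loop condition.

```c
#include <stddef.h>

// Hypothetical stand-in for the lecture's helper. Instead of calling
// get_int, it reads "typed" values from a scripted array.
// Assumes count >= 1. If the script runs out before a negative value
// appears, the last value read is returned.
int first_negative(const int inputs[], size_t count)
{
    size_t next = 0;
    int n;
    do
    {
        n = inputs[next]; // stands in for: n = get_int(...);
        next++;
    }
    while (n >= 0 && next < count); // the buggy version had: while (n < 0)
    return n;
}
```

With the scripted inputs 50, 0, -3, the loop keeps "prompting" past 50 and 0 and stops at -3, which is the behavior the original condition got backwards.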
00:59:54.900 --> 00:59:58.650 At least in this case, it allows me to go through it at a healthier pace. 00:59:58.650 --> 01:00:03.000 Questions now on debug50, which should be your new friend, even if it's not 01:00:03.000 --> 01:00:04.940 your first instinct after printf? 01:00:07.690 --> 01:00:09.190 Any questions on debug50? 01:00:09.190 --> 01:00:09.730 No? 01:00:09.730 --> 01:00:13.960 All right, well, there's one last technique we can equip you with here. 01:00:13.960 --> 01:00:17.470 And that is, in addition to printf and a debugger, no joke, 01:00:17.470 --> 01:00:21.400 a rubber duck is actually a reasonably recommended solution 01:00:21.400 --> 01:00:22.720 to finding bugs in your code. 01:00:22.720 --> 01:00:24.640 To your question earlier, the duck, too, is not 01:00:24.640 --> 01:00:26.390 going to solve the problem for you. 01:00:26.390 --> 01:00:29.710 But if you've wondered why this little guy has been here for so long, 01:00:29.710 --> 01:00:32.080 there's this technique, which has its own Wikipedia article, 01:00:32.080 --> 01:00:33.760 called rubber duck debugging. 01:00:33.760 --> 01:00:37.390 The idea of which is that if you're home in your dorm room, 01:00:37.390 --> 01:00:39.520 wrestling with some bug in your code, printf 01:00:39.520 --> 01:00:42.820 didn't quite reveal the source to you, the debugger isn't really helping, 01:00:42.820 --> 01:00:46.960 honestly, maybe it would help to just sound out what problem you're having. 01:00:46.960 --> 01:00:50.260 Similar to going to office hours, talking to a TA or a professor, 01:00:50.260 --> 01:00:52.030 just walking through your problems because 01:00:52.030 --> 01:00:54.730 in sort of talking to the duck about the fact 01:00:54.730 --> 01:01:00.550 that you're doing this while n is less than 0, and then if it is-- 01:01:00.550 --> 01:01:01.180 wait a minute. 01:01:01.180 --> 01:01:03.820 I'm an idiot, not just for talking to the rubber duck.
01:01:03.820 --> 01:01:05.980 You realize, hopefully, in expressing yourself, 01:01:05.980 --> 01:01:09.910 literally verbally, you probably will hear with non-zero probability, 01:01:09.910 --> 01:01:11.860 like some illogic in your statement. 01:01:11.860 --> 01:01:16.430 And just by sounding things out, you'll realize like, oh, that's my problem. 01:01:16.430 --> 01:01:19.720 And so, frankly, if you have roommates, you can also use a roommate for this. 01:01:19.720 --> 01:01:21.700 But the rubber duck is just sort of a go-to 01:01:21.700 --> 01:01:24.700 when your roommates have no interest in your C problem set, 01:01:24.700 --> 01:01:28.150 talking something through that as such. 01:01:28.150 --> 01:01:29.933 And this is an invaluable technique. 01:01:29.933 --> 01:01:32.350 I admittedly tend not to do it so much with a rubber duck, 01:01:32.350 --> 01:01:34.510 but ideally with colleagues, human colleagues. 01:01:34.510 --> 01:01:38.260 But just talking through things often will help you just realize, 01:01:38.260 --> 01:01:40.360 oh, I said something illogical. 01:01:40.360 --> 01:01:41.860 Now I can go back to the code. 01:01:41.860 --> 01:01:44.650 So don't solve problems by staring at your screen 01:01:44.650 --> 01:01:46.240 endlessly for minutes, for hours. 01:01:46.240 --> 01:01:48.100 At that point, it's time for a break, time 01:01:48.100 --> 01:01:50.475 to walk away, time to talk to the duck, if you've already 01:01:50.475 --> 01:01:52.900 exhausted some of those other tools. 01:01:52.900 --> 01:01:55.330 As an aside, on your way out today at the end of class, 01:01:55.330 --> 01:01:59.020 we have, clearly, plenty of rubber ducks for you. 01:01:59.020 --> 01:02:01.600 And it's become a thing over the years, at least 01:02:01.600 --> 01:02:05.770 among some, to bring the duck with them when they travel and send us photos. 01:02:05.770 --> 01:02:10.480 Here, for instance, is CS50's rubber duck debugger, A.K.A. 
DDB, 01:02:10.480 --> 01:02:15.940 for Duck Debugger, which is a pun on a geekier program called GDB, the GNU 01:02:15.940 --> 01:02:18.740 Debugger, which is an actual piece of software for debugging. 01:02:18.740 --> 01:02:25.270 This is CS50's debugger in the hills of Puerto Rico, also, here on the sea. 01:02:25.270 --> 01:02:28.310 It made its way to San Francisco here. 01:02:28.310 --> 01:02:30.640 Also, down by Fisherman's Wharf by the sea lions. 01:02:30.640 --> 01:02:31.660 Familiar? 01:02:31.660 --> 01:02:34.570 Here at Stanford, where there's a William Gates Computer Science 01:02:34.570 --> 01:02:38.950 building for computer science, down the road in SF at Google. 01:02:38.950 --> 01:02:41.650 And this is the Trevi Fountain in Rome. 01:02:41.650 --> 01:02:43.810 And lastly, the Colosseum. 01:02:43.810 --> 01:02:46.990 So we'll be curious to see in the coming years where your duck, too, travels. 01:02:46.990 --> 01:02:49.120 So that, then, was quite a bit. 01:02:49.120 --> 01:02:51.850 Why don't we go ahead here and take a short 5-minute break? 01:02:51.850 --> 01:02:52.760 No snacks yet. 01:02:52.760 --> 01:02:54.400 You're welcome to get up or sit down. 01:02:54.400 --> 01:02:56.620 We'll return in about five. 01:02:56.620 --> 01:03:00.020 All right, so we are back. 01:03:00.020 --> 01:03:04.000 And if the goal, ultimately, today is to have a better understanding of things 01:03:04.000 --> 01:03:06.940 like strings so that we can solve problems with text, 01:03:06.940 --> 01:03:09.190 let's consider some simpler types of data 01:03:09.190 --> 01:03:11.290 first, how we might represent those, and then 01:03:11.290 --> 01:03:14.290 see if that doesn't lead us to a discovery as to how strings, 01:03:14.290 --> 01:03:17.330 and just today's modern software is using things like that.
01:03:17.330 --> 01:03:21.850 So when we talked on week zero about representation of data, 01:03:21.850 --> 01:03:25.930 we had different ways of doing it, in terms of binary and decimal, 01:03:25.930 --> 01:03:27.640 and unary even. 01:03:27.640 --> 01:03:30.520 When we started talking about the same last week in code, 01:03:30.520 --> 01:03:33.980 we started talking about data types instead. 01:03:33.980 --> 01:03:36.820 And these data types were a way of telling 01:03:36.820 --> 01:03:40.000 the computer, like do you want an integer, do you want a character, 01:03:40.000 --> 01:03:44.260 do you want a floating point value, like a real number, or even a string, 01:03:44.260 --> 01:03:45.070 as we've seen? 01:03:45.070 --> 01:03:47.350 But it turns out that computers, of course, 01:03:47.350 --> 01:03:49.930 only have finite amounts of resources. 01:03:49.930 --> 01:03:53.740 Your computer only has a fixed amount of memory or RAM. 01:03:53.740 --> 01:03:55.910 And that actually has very real world implications. 01:03:55.910 --> 01:03:59.630 So for instance, here are some of the data types we've seen thus far. 01:03:59.630 --> 01:04:04.090 And it turns out that each of these in C has a specific number 01:04:04.090 --> 01:04:05.650 of bits allocated to it. 01:04:05.650 --> 01:04:08.350 Now, admittedly, this can vary by system. 01:04:08.350 --> 01:04:10.850 It's not so much the case nowadays, but for many years, 01:04:10.850 --> 01:04:13.100 for decades, computers were getting better and better. 01:04:13.100 --> 01:04:15.392 The earliest computers might have used fewer bits 01:04:15.392 --> 01:04:16.600 for some of these data types. 01:04:16.600 --> 01:04:18.663 More modern computers might use more bits. 01:04:18.663 --> 01:04:21.830 So the numbers you're about to see are pretty much where we are present day. 
01:04:21.830 --> 01:04:25.030 So when it comes to these data types, a bool, 01:04:25.030 --> 01:04:29.020 which is true or false, somewhat curiously, uses a whole byte, 01:04:29.020 --> 01:04:32.380 even though that's way overkill because for a bool, true or false, 01:04:32.380 --> 01:04:33.940 you, of course, only need one bit. 01:04:33.940 --> 01:04:36.520 But it turns out, even though it's wasteful to use 01:04:36.520 --> 01:04:39.938 eight bits, or one byte, just to represent true or false, 01:04:39.938 --> 01:04:41.230 it's just easier for computers. 01:04:41.230 --> 01:04:42.820 So a bool tends to be one byte. 01:04:42.820 --> 01:04:47.590 An int, which we've been using a lot, uses 4 bytes, typically, or 32 bits. 01:04:47.590 --> 01:04:50.590 And if I do some quick math from week zero, with 32 bits, 01:04:50.590 --> 01:04:54.040 you have 4 billion possible values, roughly. 01:04:54.040 --> 01:04:56.290 But if you want to represent positive and negative, 01:04:56.290 --> 01:04:59.710 that means you can represent roughly negative 2 billion, all the way up 01:04:59.710 --> 01:05:01.020 to positive 2 billion. 01:05:01.020 --> 01:05:02.770 So that's the range, typically, with ints. 01:05:02.770 --> 01:05:06.820 If that's too few numbers for you, turns out there's things called longs. 01:05:06.820 --> 01:05:10.120 And longs use 64 bits, which allow you to have 01:05:10.120 --> 01:05:13.220 like a quintillion number of possibilities, 01:05:13.220 --> 01:05:15.730 which is a lot, certainly, a lot more than 4 billion. 01:05:15.730 --> 01:05:17.410 So sometimes you might use a long. 01:05:17.410 --> 01:05:18.670 But even that's finite. 01:05:18.670 --> 01:05:21.640 And so as we discussed at the end of last week, 01:05:21.640 --> 01:05:23.980 bad things can happen if you make certain assumptions 01:05:23.980 --> 01:05:27.220 as to the data because of things like integer overflow or the like, 01:05:27.220 --> 01:05:28.330 where things wrap around. 
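To make the "roughly 4 billion" and "roughly 2 billion" figures concrete, here is a small sketch. The function names count_values and signed_max are illustrative only and are not from the lecture. The sketch assumes the usual two's-complement representation of signed integers.

```c
#include <stdint.h>

// Number of distinct bit patterns that fit in `bits` bits, that is,
// 2 to the power of bits. Only valid for bits < 64, because the
// result itself must fit in a 64-bit unsigned integer.
uint64_t count_values(unsigned bits)
{
    return (uint64_t) 1 << bits;
}

// Largest value of a signed two's-complement integer with `bits` bits:
// half of the patterns go to negative numbers, and zero uses one more.
// Valid for 1 <= bits <= 64.
int64_t signed_max(unsigned bits)
{
    return (int64_t) (count_values(bits - 1) - 1);
}
```

For 32 bits, count_values gives 4,294,967,296 patterns and signed_max gives 2,147,483,647. Because sizes can vary by system, as noted above, sizeof(int) and sizeof(long) are the way to check what a particular machine actually uses.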
01:05:28.330 --> 01:05:31.538 Then there's a float, which is a real number, something with a decimal point. 01:05:31.538 --> 01:05:36.040 By convention, it's 4 bytes or 32 bits, which gives you, in short, 01:05:36.040 --> 01:05:37.810 only a specific amount of precision. 01:05:37.810 --> 01:05:41.620 It doesn't necessarily dictate how many numbers to the left or to the right. 01:05:41.620 --> 01:05:45.250 In the aggregate, ultimately, you have though, 01:05:45.250 --> 01:05:47.650 4 billion possible permutations still. 01:05:47.650 --> 01:05:50.110 If you need more precision for scientific, for medical, 01:05:50.110 --> 01:05:54.790 for financial applications, you might use 8 bytes, A.K.A. a double, 01:05:54.790 --> 01:05:57.700 which just gives you more digits of precision. 01:05:57.700 --> 01:06:01.360 They eventually get imprecise per the example we looked at last week, 01:06:01.360 --> 01:06:03.610 but it at least gets you further down the line. 01:06:03.610 --> 01:06:07.930 As an aside, in really, really important applications, in finance, 01:06:07.930 --> 01:06:10.030 in medicine, in military operations, and the 01:06:10.030 --> 01:06:12.640 like where you really can't have rounding errors-- 01:06:12.640 --> 01:06:17.470 long story short, humans have developed libraries in C and other languages 01:06:17.470 --> 01:06:19.317 that use more, even, than 8 bytes. 01:06:19.317 --> 01:06:22.150 So there are solutions to these problems, but they're always finite. 01:06:22.150 --> 01:06:24.070 You have to pick an upper bound. 01:06:24.070 --> 01:06:27.070 Then there's char, which we saw briefly last week when I asked 01:06:27.070 --> 01:06:29.470 the user for y or n, for yes or no. 01:06:29.470 --> 01:06:32.470 And then there's a string, which I'm going to propose as a question mark 01:06:32.470 --> 01:06:34.360 because a string totally depends. 01:06:34.360 --> 01:06:35.380 Like, Hi! 01:06:35.380 --> 01:06:38.890 H-I, exclamation point, would seem to be three bytes. 
01:06:38.890 --> 01:06:41.140 D-A-V-I-D, would seem to be five. 01:06:41.140 --> 01:06:45.400 So the strings, clearly, are variable based on what you or the human type in. 01:06:45.400 --> 01:06:48.140 So we'll see what this means, though, in just a bit. 01:06:48.140 --> 01:06:51.580 This though, is the thing inside of your Mac, your PC, your phone. 01:06:51.580 --> 01:06:53.680 It might not look exactly like this, but this is 01:06:53.680 --> 01:06:56.187 a memory module for a modern computer. 01:06:56.187 --> 01:06:57.520 And let's go ahead and use this. 01:06:57.520 --> 01:06:59.920 Really, it's just representative of the finite amount of memory 01:06:59.920 --> 01:07:01.360 that any computer, indeed, has. 01:07:01.360 --> 01:07:06.160 Let's zoom in on one of these little black chips on the circuit board here. 01:07:06.160 --> 01:07:10.180 Zoom in, and let me propose that this rectangle really represents 01:07:10.180 --> 01:07:14.380 some number of bytes, like tucked inside of this little black circuit 01:07:14.380 --> 01:07:16.750 on the board is maybe, I don't know, a gigabyte, 01:07:16.750 --> 01:07:19.300 a billion bytes, maybe it's 100 bytes-- some number of bytes. 01:07:19.300 --> 01:07:21.258 It totally depends on the computer and how much 01:07:21.258 --> 01:07:22.700 you paid for the stick of memory. 01:07:22.700 --> 01:07:27.850 But if there's a finite number of bytes physically implemented somehow 01:07:27.850 --> 01:07:30.327 digitally inside of this hardware, well, then it 01:07:30.327 --> 01:07:32.410 stands to reason that we could number those bytes. 01:07:32.410 --> 01:07:36.940 We can just arbitrarily decide that the top left corner is byte number 01:07:36.940 --> 01:07:38.800 one, or really byte number zero. 01:07:38.800 --> 01:07:41.170 The one next to it is number one, then number two, 01:07:41.170 --> 01:07:43.450 number 3, dot, dot, dot, number 2 billion 01:07:43.450 --> 01:07:46.090 or whatever it is, however big this memory is. 
01:07:46.090 --> 01:07:50.530 So if you use a variable in a C program, that's only one byte. 01:07:50.530 --> 01:07:54.190 Like a char, it might literally be stored in that top left-hand corner 01:07:54.190 --> 01:07:55.120 of the memory. 01:07:55.120 --> 01:07:57.760 In practice, you don't care where, physically, it is. 01:07:57.760 --> 01:07:59.830 But really, the artist's rendition would be 01:07:59.830 --> 01:08:02.872 this-- a char might use one of those single bytes 01:08:02.872 --> 01:08:04.330 somewhere in the computer's memory. 01:08:04.330 --> 01:08:07.450 If you use an int, which is 4 bytes, it would give you 01:08:07.450 --> 01:08:10.840 4 bytes, contiguous-- that is left to right, top to bottom. 01:08:10.840 --> 01:08:13.274 But all 32 bits would be next to each other 01:08:13.274 --> 01:08:16.149 so the computer knows that those, indeed, all belong to the same int. 01:08:16.149 --> 01:08:18.680 If you need a long, or a double for that matter, 01:08:18.680 --> 01:08:21.140 then you might use a full 8 bytes in this case. 01:08:21.140 --> 01:08:23.439 And you just keep using and using this memory, 01:08:23.439 --> 01:08:26.170 kind of like a canvas, almost in Photoshop 01:08:26.170 --> 01:08:29.845 or a spreadsheet where you can just move pixels or you can move data around, 01:08:29.845 --> 01:08:31.720 that's really what your computer's memory is, 01:08:31.720 --> 01:08:36.702 a canvas for storing information in units of bytes or 8 bits. 01:08:36.702 --> 01:08:39.160 Now, we don't need to keep looking at these circuit boards. 01:08:39.160 --> 01:08:41.287 We can abstract it away, as we often do. 01:08:41.287 --> 01:08:43.120 And let's go ahead and zoom in on this grid, 01:08:43.120 --> 01:08:45.740 just to consider some very specific variables. 01:08:45.740 --> 01:08:49.180 So let me zoom in, and now I see fewer, but larger boxes 01:08:49.180 --> 01:08:51.580 on the screen, each of which, again, represents a byte. 
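A hedged sketch of the idea that an int is simply a run of contiguous bytes. It uses memcpy and the address-of operator (&), neither of which the lecture has introduced at this point, and the function names are hypothetical. It copies the bytes of an int out into a byte array and then rebuilds the int from them.

```c
#include <string.h>

// Copies the bytes that make up `value` into `out`, which must have
// room for sizeof(int) bytes, and returns how many bytes were copied.
size_t int_to_bytes(int value, unsigned char out[])
{
    memcpy(out, &value, sizeof(int));
    return sizeof(int);
}

// Rebuilds an int from sizeof(int) contiguous bytes.
int bytes_to_int(const unsigned char in[])
{
    int value;
    memcpy(&value, in, sizeof(int));
    return value;
}
```

On a typical present-day system, int_to_bytes reports 4 bytes, and passing those same bytes to bytes_to_int gives back the original number.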
01:08:51.580 --> 01:08:55.130 And now let me propose that we play with some actual code. 01:08:55.130 --> 01:08:58.029 So here in C, albeit without a full program, 01:08:58.029 --> 01:09:01.060 are three ints-- score1, score2, score3. 01:09:01.060 --> 01:09:07.359 I have, coincidentally, given myself two scores around 72 and 73, 01:09:07.359 --> 01:09:09.040 and then a pretty low score at 33. 01:09:09.040 --> 01:09:12.048 Of course, last week or two weeks ago, this would have been high. 01:09:12.048 --> 01:09:13.840 But now we're dealing with actual integers. 01:09:13.840 --> 01:09:17.750 So these are three so-so scores on my quizzes or tests or the like. 01:09:17.750 --> 01:09:19.250 So let me go to VS Code here. 01:09:19.250 --> 01:09:22.210 And let's make a program called scores.c. 01:09:22.210 --> 01:09:24.399 So I'm going to write, code scores.c. 01:09:24.399 --> 01:09:26.149 That's going to give me my new file. 01:09:26.149 --> 01:09:28.420 And let me go ahead and implement something like this. 01:09:28.420 --> 01:09:34.149 Include stdio.h, int main(void), and then inside of here, 01:09:34.149 --> 01:09:37.689 let me do int score1 will be 72. 01:09:37.689 --> 01:09:40.029 Int score2 will be 73. 01:09:40.029 --> 01:09:43.149 And int score3 will be 33. 01:09:43.149 --> 01:09:45.460 And then let me just do something like write a program 01:09:45.460 --> 01:09:48.043 to average my three test scores together, something like that. 01:09:48.043 --> 01:09:52.240 So let me do printf, quote unquote, my average is-- 01:09:52.240 --> 01:09:56.470 and I'm going to go ahead and do, say, %i, \n. 01:09:56.470 --> 01:09:58.290 And now, let me plug in the results. 01:09:58.290 --> 01:10:00.040 And this is kind of grade school math now. 01:10:00.040 --> 01:10:02.210 How do I compute the average of three values?
01:10:02.210 --> 01:10:09.110 Well, just like on paper, I can do score1 plus score2 plus score3 01:10:09.110 --> 01:10:12.830 in parentheses, because of order of operations, divided by 3, 01:10:12.830 --> 01:10:14.457 since there's three total scores. 01:10:14.457 --> 01:10:16.040 All right, so I think this checks out. 01:10:16.040 --> 01:10:19.040 And indeed, you can use parentheses and operators like plus in your code 01:10:19.040 --> 01:10:23.180 like this in C. Let me go ahead now and do make scores. 01:10:23.180 --> 01:10:24.327 No syntax error. 01:10:24.327 --> 01:10:25.910 So that's good, nothing missing there. 01:10:25.910 --> 01:10:28.850 And now let me do ./scores and see what my test average is. 01:10:28.850 --> 01:10:32.270 All right, it's not great, but I think I still passed. 01:10:32.270 --> 01:10:36.050 And indeed, my average here is 59. 01:10:36.050 --> 01:10:38.360 Is it precisely 59 though? 01:10:38.360 --> 01:10:39.140 Well, let's see. 01:10:39.140 --> 01:10:42.110 Let's actually, instead of using an int, how about we go ahead 01:10:42.110 --> 01:10:44.870 and use something like a floating point value here? 01:10:44.870 --> 01:10:46.250 And let me go ahead and do this. 01:10:46.250 --> 01:10:48.710 So let me recompile my code, make scores. 01:10:48.710 --> 01:10:50.600 Huh, all right, I've got an issue. 01:10:50.600 --> 01:10:52.340 Let me zoom in on my terminal window. 01:10:52.340 --> 01:10:54.710 We've not seen this one, necessarily, before. 01:10:54.710 --> 01:10:56.510 But error on line 9. 01:10:56.510 --> 01:11:00.410 Format specifies type double, which is a lot of precision, 01:11:00.410 --> 01:11:02.180 but the argument has type int. 01:11:02.180 --> 01:11:03.300 So what does this mean? 01:11:03.300 --> 01:11:06.508 Well, it's showing me with these green squiggles that something's bad between 01:11:06.508 --> 01:11:09.060 the %f and this thing over here. 
01:11:09.060 --> 01:11:13.020 Well, on the left, I'm implying a float, or a double for that matter. 01:11:13.020 --> 01:11:16.835 On the right, though, what data type are score1, score2, score3? 01:11:16.835 --> 01:11:17.960 All right, so they're ints. 01:11:17.960 --> 01:11:19.583 So clang does not like this. 01:11:19.583 --> 01:11:22.250 The compiler just doesn't like that I'm using ints on the right, 01:11:22.250 --> 01:11:24.170 but I want floats on the left. 01:11:24.170 --> 01:11:26.670 So there's going to be different ways of solving this. 01:11:26.670 --> 01:11:29.870 One way would be to just ignore the problem like I originally did, 01:11:29.870 --> 01:11:32.450 and just go back to %i. 01:11:32.450 --> 01:11:38.330 Or as an aside, %d is often an alternative to %i for a decimal number. 01:11:38.330 --> 01:11:42.358 But we use %i because it sounds like int, so %i is fine here too. 01:11:42.358 --> 01:11:44.150 But I don't want to just avoid the problem. 01:11:44.150 --> 01:11:46.500 I want to actually display a floating point value. 01:11:46.500 --> 01:11:47.730 So how can I fix this? 01:11:47.730 --> 01:11:50.272 Well, it turns out, I can solve this in a few different ways. 01:11:50.272 --> 01:11:53.990 The simplest is just to make sure that at least one number on the right 01:11:53.990 --> 01:11:59.330 is a floating point value, like 3.0 instead of just 3. 01:11:59.330 --> 01:12:01.700 Now I think clang will be happier. 01:12:01.700 --> 01:12:03.320 Let me do make scores-- 01:12:03.320 --> 01:12:04.400 Enter. 01:12:04.400 --> 01:12:05.330 And indeed, it's OK. 01:12:05.330 --> 01:12:05.930 Why? 01:12:05.930 --> 01:12:10.050 As soon as you have at least one more precise data type on the right, 01:12:10.050 --> 01:12:13.170 it just treats everything, at that point, as floating point value 01:12:13.170 --> 01:12:14.330 so that the math works out. 01:12:14.330 --> 01:12:17.720 So ./scores, Enter-- and now, there we go, right? 
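The difference between the truncated and the fractional average can be sketched as three small functions. The function names are illustrative and not from the lecture's scores.c. The third variant uses an explicit cast, which is an equivalent way to force floating-point division.

```c
// Integer division: the sum and 3 are both ints, so the fractional
// part of the result is discarded.
int average_truncated(int a, int b, int c)
{
    return (a + b + c) / 3;
}

// Dividing by 3.0 (a double) converts the int sum to floating point
// before the division, so the fraction survives.
double average_literal(int a, int b, int c)
{
    return (a + b + c) / 3.0;
}

// Same idea with an explicit cast: (float) 3 converts the int 3 to a
// float, and the int sum is then converted to match.
float average_cast(int a, int b, int c)
{
    return (a + b + c) / (float) 3;
}
```

For 72, 73, and 33 the sum is 178, so average_truncated returns 59 while the other two return approximately 59.33. A value of either floating-point type prints with %f, since printf receives a float as a double. That is why the compiler's error message mentions double.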
01:12:17.720 --> 01:12:20.390 Some of us might really want that 1/3 of a point. 01:12:20.390 --> 01:12:21.980 Our average was not 59. 01:12:21.980 --> 01:12:25.010 It's 59 1/3, as in this case here. 01:12:25.010 --> 01:12:26.750 All right, so we've solved that there. 01:12:26.750 --> 01:12:30.890 As an aside, though, there's one other technique to show here. 01:12:30.890 --> 01:12:33.320 If you didn't want to change it to 3.0 because that's 01:12:33.320 --> 01:12:36.410 a little weird, because there were literally three scores, 01:12:36.410 --> 01:12:38.760 it's not like that needs to have a decimal point, 01:12:38.760 --> 01:12:43.970 you could also explicitly convert the 3 to a float 01:12:43.970 --> 01:12:46.230 by saying, in parentheses, float. 01:12:46.230 --> 01:12:48.050 This is what's called typecasting. 01:12:48.050 --> 01:12:51.840 And this will just convert the thing right after it to that data type, 01:12:51.840 --> 01:12:52.560 if it's possible. 01:12:52.560 --> 01:12:56.970 So if I do this again, make scores, no errors now. ./scores, and I get, 01:12:56.970 --> 01:12:59.960 in fact, the same result. There's a bit of a rounding issue here, 01:12:59.960 --> 01:13:03.650 but we know the rounding relates to the imprecision from last week. 01:13:03.650 --> 01:13:06.980 For now, let me just be happy with my 59.3 something. 01:13:06.980 --> 01:13:08.360 I'll take that for now. 01:13:08.360 --> 01:13:14.660 But this is as close to a good enough correct answer for me now. 01:13:14.660 --> 01:13:15.942 But how do I-- 01:13:15.942 --> 01:13:18.650 think about now, what's going on inside of the computer's memory? 01:13:18.650 --> 01:13:19.310 Well, let's consider. 01:13:19.310 --> 01:13:20.643 Here's that same grid of memory. 01:13:20.643 --> 01:13:22.490 Each box represents a byte. 01:13:22.490 --> 01:13:25.790 Where are score1, score2, and score3 in my memory? 01:13:25.790 --> 01:13:28.790 Well, score1, let me just propose, is at the top left. 
01:13:28.790 --> 01:13:32.060 But it's taking up four boxes for 4 bytes. 01:13:32.060 --> 01:13:34.842 Score2 probably ends up right next to it in memory, 01:13:34.842 --> 01:13:36.800 though, this isn't always going to be the case, 01:13:36.800 --> 01:13:38.180 but I've chosen simple examples. 01:13:38.180 --> 01:13:40.910 73 is next to it, also taking up 4 bytes. 01:13:40.910 --> 01:13:45.320 And then lastly, 33 is in score3, down there underneath. 01:13:45.320 --> 01:13:48.343 Now, if we really look at the computer's memory, 01:13:48.343 --> 01:13:50.510 look at it with some kind of microscope or the like, 01:13:50.510 --> 01:13:54.110 there's actually 32 bits, 32 bits, 32 bits 01:13:54.110 --> 01:13:59.308 in each of those three groups of four bytes representing those values. 01:13:59.308 --> 01:14:01.100 But again, for today's purposes onwards, we 01:14:01.100 --> 01:14:03.308 don't really need to think again and again in binary. 01:14:03.308 --> 01:14:05.940 It's just, indeed, these decimal numbers being stored there. 01:14:05.940 --> 01:14:08.240 But I claim now, this isn't the best design. 01:14:08.240 --> 01:14:11.300 Even if you have never programmed before CS50, 01:14:11.300 --> 01:14:13.220 what you're looking at here on the screen, 01:14:13.220 --> 01:14:16.970 as an excerpt, in what sense is this perhaps bad design, even though it's 01:14:16.970 --> 01:14:19.960 a correct way of storing three test scores? 01:14:19.960 --> 01:14:20.960 What's kind of bad here? 01:14:20.960 --> 01:14:21.882 Yeah? 01:14:21.882 --> 01:14:26.220 AUDIENCE: The more scores you have, the more you [INAUDIBLE]. 01:14:26.220 --> 01:14:28.950 DAVID MALAN: Yeah, always do exactly what you did-- extrapolate 01:14:28.950 --> 01:14:31.740 to 4 scores, 5 scores, 50 scores.
01:14:31.740 --> 01:14:34.020 This can't be that well-designed because now you're 01:14:34.020 --> 01:14:36.300 going to have 4 lines of code, 5 lines of code, 01:14:36.300 --> 01:14:38.550 50 lines of code that are almost identical, 01:14:38.550 --> 01:14:40.770 except for this like arbitrary number that we're 01:14:40.770 --> 01:14:42.430 updating at the end of the variable. 01:14:42.430 --> 01:14:44.940 So indeed, there's probably going to be a better 01:14:44.940 --> 01:14:48.690 way, even though, at least in C, we haven't yet seen that technique. 01:14:48.690 --> 01:14:52.440 But the solution, today onward, is going to be something called an array. 01:14:52.440 --> 01:14:57.180 An array is a way of storing your data back 01:14:57.180 --> 01:15:00.630 to back to back in the computer's memory in such a way 01:15:00.630 --> 01:15:03.960 that you can access each individual member easily. 01:15:03.960 --> 01:15:08.530 Put another way, with an array, you can instead do something like this. 01:15:08.530 --> 01:15:12.300 Instead of saying int score1, int score2, int score3, 01:15:12.300 --> 01:15:15.790 giving each a value, you can first tell the computer, 01:15:15.790 --> 01:15:18.330 please give me a variable called scores-- 01:15:18.330 --> 01:15:20.700 plural, though you can call it anything you want-- 01:15:20.700 --> 01:15:24.090 of size three, each of which will be an integer. 01:15:24.090 --> 01:15:28.680 That is to say, this is how you declare an array in C that will have 01:15:28.680 --> 01:15:30.930 enough room to store three integers. 01:15:30.930 --> 01:15:34.540 Put another way, this is the technical way of telling the computer, 01:15:34.540 --> 01:15:38.880 please give me 12 bytes in total-- 01:15:38.880 --> 01:15:42.660 3 times 4 each for an int, so give me 12 bytes in total. 01:15:42.660 --> 01:15:44.640 And what the computer will do is guarantee 01:15:44.640 --> 01:15:47.350 that they're back to back to back in the computer's memory. 
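The "12 bytes, back to back" claim can be checked with a short sketch. The function names are hypothetical, and the pointer casts used to measure the distance between elements go beyond what the lecture has covered so far. The results are expressed in terms of sizeof(int) rather than a hard-coded 4, since the size of an int can vary by system.

```c
#include <stddef.h>

// Total number of bytes reserved for an array of three ints
// (12 on a system where an int is 4 bytes).
size_t scores_bytes(void)
{
    int scores[3];
    return sizeof(scores);
}

// Distance in bytes between two neighboring elements. Because array
// elements are stored contiguously, this equals sizeof(int).
size_t gap_between_neighbors(void)
{
    int scores[3] = {72, 73, 33};
    return (size_t) ((char *) &scores[1] - (char *) &scores[0]);
}
```

That guaranteed contiguity is what lets scores[1] be found simply by moving one int's width past scores[0].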
01:15:47.350 --> 01:15:49.360 And that'll be useful in just a moment. 01:15:49.360 --> 01:15:51.820 So let me go ahead and do something useful with this. 01:15:51.820 --> 01:15:53.640 Let me store three actual scores. 01:15:53.640 --> 01:15:58.500 Here's how I could now store those same numeric scores in this array. 01:15:58.500 --> 01:16:03.040 Syntax is a little different, but there's one variable called scores. 01:16:03.040 --> 01:16:05.010 But if you want to go to its first location, 01:16:05.010 --> 01:16:08.520 starting today, you use square brackets and go to location 0 01:16:08.520 --> 01:16:13.080 first, which because things in C are 0 indexed, so to speak, 01:16:13.080 --> 01:16:14.280 you start counting at 0. 01:16:14.280 --> 01:16:16.410 The first int is at [0]. 01:16:16.410 --> 01:16:18.030 Second int is at [1]. 01:16:18.030 --> 01:16:19.530 Third int is at [2]. 01:16:19.530 --> 01:16:20.730 So it's not one, two, three. 01:16:20.730 --> 01:16:22.090 It's literally 0, 1, 2. 01:16:22.090 --> 01:16:24.090 And this is not something you have control over. 01:16:24.090 --> 01:16:26.250 You must start at 0. 01:16:26.250 --> 01:16:29.940 So these lines now create an array of size three, 01:16:29.940 --> 01:16:33.510 and then insert one, two, three values into that array. 01:16:33.510 --> 01:16:37.770 But the upside now is that you only have one name of the variable to remember. 01:16:37.770 --> 01:16:39.240 It's just called scores. 01:16:39.240 --> 01:16:43.380 Yes, you need to go into the array to get individual values. 01:16:43.380 --> 01:16:46.618 You need to index into it using those square brackets. 01:16:46.618 --> 01:16:48.660 But at least you don't have this hackish approach 01:16:48.660 --> 01:16:53.050 of declaring a separate variable for each and every one of these values. 01:16:53.050 --> 01:16:56.070 So let me go back to scores.c here. 01:16:56.070 --> 01:16:57.580 And let me propose that I do this. 
01:16:57.580 --> 01:17:00.580 Let me just use that same idea to do the following. 01:17:00.580 --> 01:17:02.580 Let me get rid of these three separate integers. 01:17:02.580 --> 01:17:06.210 Let me give myself an int scores array of size 3. 01:17:06.210 --> 01:17:10.470 And then scores[0] will, as before, be 72. 01:17:10.470 --> 01:17:14.070 Scores[1] will be 73. 01:17:14.070 --> 01:17:16.830 And scores[2] will be 33. 01:17:16.830 --> 01:17:18.780 And let me get rid of the little dot there. 01:17:18.780 --> 01:17:23.490 All right, so now, if I go ahead and run this again with make scores-- 01:17:23.490 --> 01:17:24.642 Enter. 01:17:24.642 --> 01:17:29.060 Huh, what did I do wrong here? 01:17:29.060 --> 01:17:31.680 I think I got a little too ahead of myself. 01:17:31.680 --> 01:17:36.100 Let me increase my terminal window. 01:17:36.100 --> 01:17:38.830 Let's focus on line 10 here, first. 01:17:38.830 --> 01:17:42.310 Error, use of undeclared identifier, score1. 01:17:42.310 --> 01:17:44.170 What did I do here that was dumb? 01:17:44.170 --> 01:17:45.430 Yeah? 01:17:45.430 --> 01:17:47.440 AUDIENCE: You didn't declare it a variable. 01:17:47.440 --> 01:17:49.420 DAVID MALAN: Right, so I didn't declare score1. 01:17:49.420 --> 01:17:50.530 I've got old code. 01:17:50.530 --> 01:17:53.798 So I just kind of, honestly, got ahead of myself here, not even intentionally. 01:17:53.798 --> 01:17:56.090 So let me go ahead and shrink my terminal window again. 01:17:56.090 --> 01:17:57.740 I need to finish my thought here. 01:17:57.740 --> 01:17:58.960 So let me clear my terminal. 01:17:58.960 --> 01:18:04.960 And let me change this now to be scores[0] plus scores[1] plus 01:18:04.960 --> 01:18:05.610 scores[2]. 01:18:05.610 --> 01:18:07.360 So it's a little more verbose because I've 01:18:07.360 --> 01:18:10.040 got these square brackets, so to speak. 01:18:10.040 --> 01:18:12.220 But I think now my code is consistent. 01:18:12.220 --> 01:18:13.870 So let me make scores now. 
01:18:13.870 --> 01:18:14.950 It now compiles. 01:18:14.950 --> 01:18:19.870 ./scores gives me, indeed, the same rough average with those same values. 01:18:19.870 --> 01:18:24.280 All right, so let me go ahead and maybe enhance this a little bit. 01:18:24.280 --> 01:18:26.920 It's a little silly to have to write a special program just 01:18:26.920 --> 01:18:31.610 to check your average of three test scores like 72, 73, 33. 01:18:31.610 --> 01:18:33.550 Why don't I actually make the program dynamic 01:18:33.550 --> 01:18:37.250 and ask the human for those scores? 01:18:37.250 --> 01:18:39.140 So instead, let me do this. 01:18:39.140 --> 01:18:43.480 How about we get rid of the 72, and change this to get_int. 01:18:43.480 --> 01:18:46.300 And I'll just prompt the user for a score. 01:18:46.300 --> 01:18:52.510 Let me get rid of the 73 and get this to be get_int score, quote unquote. 01:18:52.510 --> 01:18:56.560 And then lastly, get rid of the 33, and replace it with get_int, quote unquote, 01:18:56.560 --> 01:18:57.670 score. 01:18:57.670 --> 01:19:03.680 get_int is a CS50 thing for now, so I need to include cs50.h, as always. 01:19:03.680 --> 01:19:05.650 But I think now, it's sort of a better program 01:19:05.650 --> 01:19:08.680 because now I can compile it once, I can even share it with my friends. 01:19:08.680 --> 01:19:12.490 And now any of us can average three scores on some class's test. 01:19:12.490 --> 01:19:15.190 They don't need to know the code or rewrite the code just 01:19:15.190 --> 01:19:16.910 to type in their scores. 01:19:16.910 --> 01:19:19.150 So make scores worked. 01:19:19.150 --> 01:19:25.120 ./scores, now I can type anything I want-- maybe it's a 72, 73, 33, 01:19:25.120 --> 01:19:26.320 still get the same answer. 01:19:26.320 --> 01:19:31.210 Or maybe I'm having a better semester, 100, 100, maybe 99, 01:19:31.210 --> 01:19:33.520 and now we get still a pretty high score there. 01:19:33.520 --> 01:19:34.600 But now it's dynamic.
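A minimal sketch of averaging three scores through array indexes, under stated assumptions: the function name average_of_three is invented, and its three parameters stand in for the values that get_int would return from the user, so that the sketch compiles without cs50.h.

```c
// Hypothetical helper: the parameters stand in for three get_int("Score: ") calls.
float average_of_three(int first, int second, int third)
{
    int scores[3];
    scores[0] = first;
    scores[1] = second;
    scores[2] = third;

    // Dividing by 3.0 rather than 3 avoids integer division,
    // so the fractional part of the average survives.
    return (scores[0] + scores[1] + scores[2]) / 3.0;
}
```

With 72, 73 and 33 the sum is 178, so the result is a little over 59.3.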
01:19:34.600 --> 01:19:36.080 Now you don't need the source code. 01:19:36.080 --> 01:19:37.747 You don't need to recompile the program. 01:19:37.747 --> 01:19:39.670 It's just going to work again and again. 01:19:39.670 --> 01:19:41.090 But this, too. 01:19:41.090 --> 01:19:43.660 Let me propose that this code is correct if I 01:19:43.660 --> 01:19:45.910 want to get three scores from the user. 01:19:45.910 --> 01:19:50.950 But these highlighted lines now, 6 through 9, are they well-designed, 01:19:50.950 --> 01:19:53.170 would you say? 01:19:53.170 --> 01:19:53.680 Yeah? 01:19:53.680 --> 01:19:54.898 AUDIENCE: Can you loop? 01:19:54.898 --> 01:19:55.940 DAVID MALAN: Yeah, right? 01:19:55.940 --> 01:19:58.220 This is-- we can use a loop, is the spoiler here. 01:19:58.220 --> 01:19:58.820 Why? 01:19:58.820 --> 01:20:01.590 I mean, my God, it's like the same code again and again and again. 01:20:01.590 --> 01:20:03.465 The only thing that's changing is the number. 01:20:03.465 --> 01:20:06.170 And this should have kind of had some code smell again, 01:20:06.170 --> 01:20:09.080 because if I keep typing the same thing again and again, 01:20:09.080 --> 01:20:11.810 that's clearly an opportunity to better design something. 01:20:11.810 --> 01:20:13.650 So let me do this. 01:20:13.650 --> 01:20:18.590 Let me go ahead and still create my array of size three. 01:20:18.590 --> 01:20:23.270 But let me use our old friend, the for loop, for int i equals 0, 01:20:23.270 --> 01:20:26.610 i less than 3, i++. 01:20:26.610 --> 01:20:29.510 And then in here, let me do scores bracket-- 01:20:29.510 --> 01:20:32.920 we haven't seen this before, but any intuition? 01:20:32.920 --> 01:20:34.220 Scores bracket-- 01:20:34.220 --> 01:20:34.720 AUDIENCE: i. 01:20:34.720 --> 01:20:39.730 DAVID MALAN: i, because that will use whatever i is, be it 0 or 1 or 2 01:20:39.730 --> 01:20:40.720 in iteration. 
01:20:40.720 --> 01:20:43.780 And then I can get an int, asking the user for score, 01:20:43.780 --> 01:20:47.000 without having to repeat myself again and again. 01:20:47.000 --> 01:20:50.560 So hopefully, if I didn't make any typos, make scores, all good. 01:20:50.560 --> 01:20:54.665 ./scores, 72, 73, 33, and we're back in business. 01:20:54.665 --> 01:20:56.540 But the code is arguably now better designed, 01:20:56.540 --> 01:21:01.240 because now, I haven't actually hardcoded the scores, 01:21:01.240 --> 01:21:04.940 and I haven't actually copied and pasted any of that code. 01:21:04.940 --> 01:21:08.230 Well, if we consider now what's going on inside of the computer's memory, 01:21:08.230 --> 01:21:10.510 it's pretty much the same in terms of the values. 01:21:10.510 --> 01:21:15.490 But instead of the variables being, literally, score1, score2, score3, 01:21:15.490 --> 01:21:17.210 there's just one variable. 01:21:17.210 --> 01:21:19.030 It's an array called scores. 01:21:19.030 --> 01:21:24.550 But you can index into its three locations by using scores[0] to get 01:21:24.550 --> 01:21:28.810 the first, scores[1] to get the second, scores[2] to get the third. 01:21:28.810 --> 01:21:29.990 But this is key. 01:21:29.990 --> 01:21:33.040 The memory is contiguous. 01:21:33.040 --> 01:21:35.380 The screen is only so large, so it wraps around. 01:21:35.380 --> 01:21:38.950 But physically, digitally, the memory is contiguous-- top 01:21:38.950 --> 01:21:40.270 to bottom, left to right. 01:21:40.270 --> 01:21:41.530 And that's important, why? 01:21:41.530 --> 01:21:46.060 Because the brackets indicate 0, 1, 2, that each of these integers 01:21:46.060 --> 01:21:48.790 is just one integer away from the next. 01:21:48.790 --> 01:21:51.220 It can't be randomly down here all of a sudden. 01:21:51.220 --> 01:21:54.070 It's got to be back to back to back. 
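The loop and the contiguity claim can be sketched together. This is an illustration only: gap_between_neighbors is a made-up name, and the input parameter stands in for values typed by the user.

```c
#include <stddef.h>

// Hypothetical helper: fills an array with a loop, then reports how many
// bytes separate two neighboring elements.
size_t gap_between_neighbors(const int input[3])
{
    int scores[3];
    for (int i = 0; i < 3; i++)
    {
        scores[i] = input[i];  // i is 0, then 1, then 2
    }

    // Contiguity: each element starts exactly one int after the previous one.
    return (size_t) ((char *) &scores[1] - (char *) &scores[0]);
}
```

The returned gap equals sizeof(int), which is what "back to back to back" means in practice.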
01:21:54.070 --> 01:21:57.130 All right, now equipped with that paradigm, 01:21:57.130 --> 01:22:00.710 what more could we actually do here? 01:22:00.710 --> 01:22:04.270 Well, it turns out, it's worth knowing that it's possible in code 01:22:04.270 --> 01:22:06.850 to even pass arrays around as arguments. 01:22:06.850 --> 01:22:09.100 And let me just whip this program up somewhat quickly, 01:22:09.100 --> 01:22:11.320 just so you've seen it before long. 01:22:11.320 --> 01:22:13.190 But let me go ahead and do this. 01:22:13.190 --> 01:22:18.130 Let me propose that I create a function that does this averaging for me. 01:22:18.130 --> 01:22:22.510 So I'm going to create a function called average that returns a float. 01:22:22.510 --> 01:22:26.860 And the arguments this thing is going to take-- 01:22:26.860 --> 01:22:28.640 let's see, it's going to be the array. 01:22:28.640 --> 01:22:31.480 So it turns out, if you want to take in an array of numbers-- 01:22:31.480 --> 01:22:33.050 you can call it anything you want. 01:22:33.050 --> 01:22:36.970 This is how you tell C that a function takes, not 01:22:36.970 --> 01:22:39.790 an integer, but an array of integers. 01:22:39.790 --> 01:22:41.290 And you don't have to call it array. 01:22:41.290 --> 01:22:42.790 I'm doing that just for the sake of discussion. 01:22:42.790 --> 01:22:43.660 It can be called x. 01:22:43.660 --> 01:22:44.490 It can be numbers. 01:22:44.490 --> 01:22:45.490 It can be anything else. 01:22:45.490 --> 01:22:49.060 I'm just calling it array to be super explicit as to what it is there. 01:22:49.060 --> 01:22:51.730 Now, how do I change my code down here? 01:22:51.730 --> 01:22:55.130 What I think I'm going to do for the moment is just this. 01:22:55.130 --> 01:22:59.110 I'm going to get rid of this code here, where I manually computed the average. 01:22:59.110 --> 01:23:01.480 And let me just call the average function here 01:23:01.480 --> 01:23:05.000 by passing in the whole array of scores.
01:23:05.000 --> 01:23:07.030 So this is just an example of abstraction, 01:23:07.030 --> 01:23:08.890 like now I have a function called average. 01:23:08.890 --> 01:23:09.670 I don't care. 01:23:09.670 --> 01:23:12.490 I don't have to remember how it works once I implement it. 01:23:12.490 --> 01:23:15.010 It just kind of tightens up my main code a little bit. 01:23:15.010 --> 01:23:17.030 But I do still have to implement this. 01:23:17.030 --> 01:23:19.360 So later in my file-- let me repeat myself before, 01:23:19.360 --> 01:23:22.270 the only time it's OK in C to repeat yourself again and again, 01:23:22.270 --> 01:23:27.010 by typing out again, average, and then int array open bracket-- 01:23:27.010 --> 01:23:28.580 but now not a semicolon. 01:23:28.580 --> 01:23:30.250 Now I have to implement this thing. 01:23:30.250 --> 01:23:33.400 And I can implement this in a bunch of different ways, 01:23:33.400 --> 01:23:37.630 but I don't know in advance-- 01:23:37.630 --> 01:23:39.040 I can't just do this. 01:23:39.040 --> 01:23:48.400 I can't just do array[0] plus array[1] plus array[2], 01:23:48.400 --> 01:23:52.130 unless this program's only ever going to work on three numbers. 01:23:52.130 --> 01:23:55.460 So let me go ahead and do this. 01:23:55.460 --> 01:23:58.570 Let me first propose that there's a poor design here. 01:23:58.570 --> 01:24:01.930 In my main function, what value have I repeated twice? 01:24:05.050 --> 01:24:07.550 Among the highlighted lines, what jumps out at you as twice? 01:24:07.550 --> 01:24:09.020 AUDIENCE: The length of the array? 01:24:09.020 --> 01:24:11.520 DAVID MALAN: Yeah, the length of the array, it's just three. 01:24:11.520 --> 01:24:14.720 Now it's not a huge deal that I typed the number three on line 8 and line 9, 01:24:14.720 --> 01:24:17.120 but this is exactly the kind of like shortcut 01:24:17.120 --> 01:24:18.440 that's going to get you in trouble eventually. 01:24:18.440 --> 01:24:18.860 Why? 
01:24:18.860 --> 01:24:20.240 Because, eventually, you or someone else is 01:24:20.240 --> 01:24:22.407 going to go in and make the array bigger or smaller, 01:24:22.407 --> 01:24:24.410 and you're not going to realize that magically, 01:24:24.410 --> 01:24:26.270 that same number is in two places. 01:24:26.270 --> 01:24:29.270 And indeed, this is what a programmer would often call a magic number. 01:24:29.270 --> 01:24:31.940 A magic number is one that just kind of appears magically. 01:24:31.940 --> 01:24:35.210 And you're on the honor system to change it here, if you change it here, 01:24:35.210 --> 01:24:36.688 and then you change it over here. 01:24:36.688 --> 01:24:39.230 That's not going to end well if the onus is on the programmer 01:24:39.230 --> 01:24:43.190 to remember where they hardcoded-- that is, wrote out three explicitly. 01:24:43.190 --> 01:24:46.250 So any time you reuse a value like this, you know what? 01:24:46.250 --> 01:24:50.690 We should probably do what we did last week, which was to declare a variable, 01:24:50.690 --> 01:24:53.510 perhaps at the very top of my program, so it's super obvious 01:24:53.510 --> 01:24:56.990 what it is, called, maybe n, and set that equal to 3. 01:24:56.990 --> 01:24:59.030 Better yet, what did I do last week to make sure 01:24:59.030 --> 01:25:02.390 that I can't screw up and accidentally change that value? 01:25:02.390 --> 01:25:03.440 Yeah, constant. 01:25:03.440 --> 01:25:05.810 And the keyword there was just const for short. 01:25:05.810 --> 01:25:09.110 And now I have a global variable-- global in the sense that I can 01:25:09.110 --> 01:25:11.870 access it anywhere-- that is called n. 01:25:11.870 --> 01:25:12.680 It's an int. 01:25:12.680 --> 01:25:14.450 And it's always going to be 3. 
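A minimal sketch of replacing the magic number with a global constant. The function name total_of_scores and the placeholder data are assumptions for illustration. Note that in C, sizing an array with a const int (rather than a literal) technically makes it a variable-length array, so it cannot be given an initializer list.

```c
const int N = 3;  // the only place the number of scores is written down

// Hypothetical helper: fills N scores with placeholder data and totals them.
int total_of_scores(void)
{
    int scores[N];  // sized by N, not by a literal 3
    for (int i = 0; i < N; i++)
    {
        scores[i] = 70 + i;  // placeholder values: 70, 71, 72
    }

    int total = 0;
    for (int i = 0; i < N; i++)  // the loop bound follows N as well
    {
        total += scores[i];
    }
    return total;
}
```

Changing N in one place now changes both the array size and every loop that walks it.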
01:25:14.450 --> 01:25:18.500 And now I can improve my main function a little bit by just changing 01:25:18.500 --> 01:25:22.662 the 3's to n, so now if I, if a colleague realized, oh, wait a minute, 01:25:22.662 --> 01:25:23.870 there's four tests this year. 01:25:23.870 --> 01:25:25.610 You change n to four, recompile the code, 01:25:25.610 --> 01:25:31.190 and it just works everywhere else, except in my average function. 01:25:31.190 --> 01:25:33.830 Let me change it back to 3, just for consistency. 01:25:33.830 --> 01:25:39.770 This is not going to fly now, to just sum up things like this, for instance, 01:25:39.770 --> 01:25:43.610 and then return this divided by 3. 01:25:43.610 --> 01:25:51.130 Why will this not work now as I've defined it? 01:25:51.130 --> 01:25:52.159 Yeah? 01:25:52.159 --> 01:25:58.030 AUDIENCE: [INAUDIBLE] 01:25:58.030 --> 01:26:00.980 DAVID MALAN: OK, I might be returning an integer value when 01:26:00.980 --> 01:26:02.870 I intend to return a float per this. 01:26:02.870 --> 01:26:05.870 But I think I'm OK because I used that little trick where I made sure 01:26:05.870 --> 01:26:08.810 that at least one of the numbers in my arithmetic expression 01:26:08.810 --> 01:26:11.010 is, in fact, a floating point value. 01:26:11.010 --> 01:26:14.180 And just by adding the point 0, make sure that everything 01:26:14.180 --> 01:26:15.650 gets treated as a float. 01:26:15.650 --> 01:26:17.864 So I think that's OK. 01:26:17.864 --> 01:26:19.034 AUDIENCE: [INAUDIBLE] 01:26:19.034 --> 01:26:20.701 DAVID MALAN: I'm sorry, a little louder. 01:26:20.701 --> 01:26:24.385 AUDIENCE: It just seems like you're [INAUDIBLE].. 01:26:24.385 --> 01:26:25.260 DAVID MALAN: Exactly. 01:26:25.260 --> 01:26:27.093 So left hand's not talking to the right hand 01:26:27.093 --> 01:26:30.210 here, in that my current implementation of average 01:26:30.210 --> 01:26:33.510 is still assuming that there's only going to be three tests or whatever. 
01:26:33.510 --> 01:26:35.670 But wait a minute, I just went through the trouble 01:26:35.670 --> 01:26:39.480 of modifying this to be n, generically. 01:26:39.480 --> 01:26:43.205 And if I change this to 4, I'm not going to be happy, perhaps, 01:26:43.205 --> 01:26:46.080 with my average because now I'm going to ignore one of my test scores 01:26:46.080 --> 01:26:46.690 altogether. 01:26:46.690 --> 01:26:48.450 So let me change this back to 3. 01:26:48.450 --> 01:26:51.180 And unfortunately, if it's a variable now, 01:26:51.180 --> 01:26:55.500 n, and therefore, I have literally a variable number of scores, 01:26:55.500 --> 01:27:00.920 how do I take the average of a variable number of things? 01:27:00.920 --> 01:27:02.630 I mean, what's my building block there? 01:27:02.630 --> 01:27:03.170 Yeah? 01:27:03.170 --> 01:27:10.100 AUDIENCE: [INAUDIBLE] 01:27:10.100 --> 01:27:10.850 DAVID MALAN: Yeah. 01:27:10.850 --> 01:27:14.880 Why don't I use a loop that goes through the array and adds things up as you go? 01:27:14.880 --> 01:27:17.360 I mean, kind of like grade school, as you take the average on your calculator 01:27:17.360 --> 01:27:19.730 or paper and pencil, you just keep adding the numbers together, 01:27:19.730 --> 01:27:22.380 and then you divide at the end by the total number of things. 01:27:22.380 --> 01:27:23.520 So how can I do this? 01:27:23.520 --> 01:27:25.730 Well, let me change my implementation of average 01:27:25.730 --> 01:27:30.515 to first declare a variable called sum, or whatever, set it equal to 0. 01:27:30.515 --> 01:27:33.140 So this is like me on my piece of paper getting ready to count, 01:27:33.140 --> 01:27:36.590 or my calculator, of course, when you turn it on, typically defaults to zero. 01:27:36.590 --> 01:27:41.570 And now, let me do for, int i equals 0. i is less than a-- 01:27:41.570 --> 01:27:43.700 well, no, I didn't do that. 01:27:43.700 --> 01:27:46.730 i is less than n, i++. 
01:27:46.730 --> 01:27:52.640 And now in here, let me go ahead and add to the current sum, whatever 01:27:52.640 --> 01:27:55.910 is in the array's location, i. 01:27:55.910 --> 01:28:00.740 And then down here, I think I can just return sum divided by 3.0-- 01:28:00.740 --> 01:28:04.560 not 3.0, n, perhaps here. 01:28:04.560 --> 01:28:08.492 And actually, I think I'm going to get-- let's make sure it's a float. 01:28:08.492 --> 01:28:11.450 Let's use the type casting trick just to make sure I don't accidentally 01:28:11.450 --> 01:28:15.540 shortchange someone and throw away everything after the decimal point. 01:28:15.540 --> 01:28:17.300 So it just escalated quickly, right? 01:28:17.300 --> 01:28:18.990 Average just got a lot more involved. 01:28:18.990 --> 01:28:22.130 It's not just a single one line of code, but now it's dynamic. 01:28:22.130 --> 01:28:25.070 I initialize a variable called sum to 0. 01:28:25.070 --> 01:28:30.920 In this loop, I go through and just keep adding to sum, which is initially 0, 01:28:30.920 --> 01:28:33.200 whatever's in array[i]-- 01:28:33.200 --> 01:28:36.740 or specifically array[0], array[1], array[2]. 01:28:36.740 --> 01:28:40.970 That gives me a total sum that I return, divided by the total number of things. 01:28:40.970 --> 01:28:42.560 Now, this I can tighten slightly. 01:28:42.560 --> 01:28:45.650 Recall that this is syntactic sugar for just adding things. 01:28:45.650 --> 01:28:48.620 I can't use plus plus because that only literally adds one. 01:28:48.620 --> 01:28:52.630 But I can use here, plus equals. 01:28:52.630 --> 01:28:54.880 Questions on this implementation here? 01:28:54.880 --> 01:28:58.000 Really the only takeaway-- or the most important takeaway 01:28:58.000 --> 01:29:00.730 is that this is the syntax for how you tell 01:29:00.730 --> 01:29:04.210 a function that it expects a whole array, not 01:29:04.210 --> 01:29:06.450 a single variable like an int or the like.
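The version of average built up over the last few minutes might look roughly like this. It is a sketch reconstructed from the spoken description, not necessarily the exact code on screen, and it still relies on the global N.

```c
const int N = 3;  // global constant: how many scores there are

// Prototype: the parameter is an array of ints, with no length in the brackets.
float average(int array[]);

float average(int array[])
{
    int sum = 0;  // like a calculator that starts at zero
    for (int i = 0; i < N; i++)
    {
        sum += array[i];  // shorthand for sum = sum + array[i]
    }

    // Casting N to float forces floating-point division,
    // so nothing after the decimal point is thrown away.
    return sum / (float) N;
}
```

A caller in main would declare int scores[N], fill it, and then call average(scores).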
01:29:06.450 --> 01:29:08.200 You literally use square brackets, but you 01:29:08.200 --> 01:29:11.530 don't specify the length inside there. 01:29:11.530 --> 01:29:12.748 Yeah? 01:29:12.748 --> 01:29:16.410 AUDIENCE: What variable [INAUDIBLE] at the top? 01:29:16.410 --> 01:29:18.410 DAVID MALAN: What about the variable at the top? 01:29:18.410 --> 01:29:22.205 AUDIENCE: [INAUDIBLE] 01:29:22.205 --> 01:29:23.330 DAVID MALAN: Good question. 01:29:23.330 --> 01:29:25.220 What do I have it defined as at the top? 01:29:25.220 --> 01:29:31.280 This variable, N, it must be an integer if you're going to use it inside 01:29:31.280 --> 01:29:33.840 of an array's square brackets here. 01:29:33.840 --> 01:29:38.360 So this line 10, notice, no longer says 3, it says N. 01:29:38.360 --> 01:29:42.350 And so whatever N is, 3 or 4 or something else, that's how many 01:29:42.350 --> 01:29:43.970 integers I will get in that array. 01:29:43.970 --> 01:29:47.070 And it must be, by definition of an array, an integer that 01:29:47.070 --> 01:29:48.320 goes in those square brackets. 01:29:48.320 --> 01:29:50.000 And here's a common source of confusion. 01:29:50.000 --> 01:29:52.350 When you create the array, that is declare it, 01:29:52.350 --> 01:29:54.350 you use square brackets like this, where you put 01:29:54.350 --> 01:29:56.210 the total number of elements you want. 01:29:56.210 --> 01:29:59.820 When you subsequently use the array, like I'm doing here, 01:29:59.820 --> 01:30:02.690 you don't mention int again-- just like you don't mention int 01:30:02.690 --> 01:30:04.610 again and again once a variable exists. 01:30:04.610 --> 01:30:10.220 You use the square brackets still, but you don't use N. You use 0 or 1 or 2 01:30:10.220 --> 01:30:11.990 or, generically here, i. 01:30:11.990 --> 01:30:14.810 So when C was designed, they sometimes used the same syntax 01:30:14.810 --> 01:30:17.060 for two different ideas or contexts. 01:30:17.060 --> 01:30:17.984 Yeah?
01:30:17.984 --> 01:30:22.645 AUDIENCE: Do you have to include line 6 [INAUDIBLE]? 01:30:22.645 --> 01:30:23.770 DAVID MALAN: Good question. 01:30:23.770 --> 01:30:25.900 Do I have to include line 6? 01:30:25.900 --> 01:30:29.290 Short answer, yes, because of the reason we ran into last week. 01:30:29.290 --> 01:30:32.750 C, or clang really, reads your code top to bottom, left to right. 01:30:32.750 --> 01:30:38.890 And so if the compiler sees some mention of this function average on line 16, 01:30:38.890 --> 01:30:41.800 but you haven't told the compiler that average exists, 01:30:41.800 --> 01:30:43.610 you're going to get an error on the screen. 01:30:43.610 --> 01:30:45.490 So the conventional way to do that is you 01:30:45.490 --> 01:30:48.670 just copy paste the first line of code from the function, 01:30:48.670 --> 01:30:51.260 its so-called prototype or declaration. 01:30:51.260 --> 01:30:51.760 Yeah? 01:30:51.760 --> 01:30:55.662 AUDIENCE: Is there a library if you don't know the size of the array? 01:30:55.662 --> 01:30:58.120 DAVID MALAN: Really good question, and a perfect segue. 01:30:58.120 --> 01:31:01.078 Is there a library you can use if you don't know the size of the array? 01:31:01.078 --> 01:31:01.720 No. 01:31:01.720 --> 01:31:07.660 And so if any of you have programmed in Java or Python or other languages, 01:31:07.660 --> 01:31:11.020 you can actually just ask the array, how big is it? 01:31:11.020 --> 01:31:13.778 In C, you and I, the programmers, have to remember it. 01:31:13.778 --> 01:31:15.820 And so short answer, no, there's no function that 01:31:15.820 --> 01:31:17.445 will just automatically do this for us. 01:31:17.445 --> 01:31:20.230 And in fact, let me make a more subtle claim 01:31:20.230 --> 01:31:23.950 that it's fine to use global variables like this if they're really 01:31:23.950 --> 01:31:25.160 for configuration options. 01:31:25.160 --> 01:31:25.660 Why? 
01:31:25.660 --> 01:31:28.160 It's just convenient to put them at the very top of the file 01:31:28.160 --> 01:31:30.565 because everyone, you, your colleagues, your TAs 01:31:30.565 --> 01:31:32.440 are going to see them at the top of the code. 01:31:32.440 --> 01:31:36.130 But you really shouldn't be using them everywhere throughout your code. 01:31:36.130 --> 01:31:38.380 It'd be better if the average function, itself, were 01:31:38.380 --> 01:31:40.610 independent of that special variable. 01:31:40.610 --> 01:31:42.025 So by that, I mean this. 01:31:42.025 --> 01:31:46.240 You know what I should really do, if I really want to be well-designed? 01:31:46.240 --> 01:31:51.400 I should pass in the length of the array to the average function. 01:31:51.400 --> 01:31:54.310 I should give the average function a second argument-- 01:31:54.310 --> 01:31:57.800 I'll call it length, for instance, but I could call it anything I want. 01:31:57.800 --> 01:32:02.500 And so rather than putting N all the way down here at the bottom of my file, 01:32:02.500 --> 01:32:05.745 let me just dynamically say length instead. 01:32:05.745 --> 01:32:08.620 And this is a subtlety-- and no need to get too tripped up over this. 01:32:08.620 --> 01:32:11.830 But this, now, is just an example of how the same function can 01:32:11.830 --> 01:32:13.690 take not one, but two arguments. 01:32:13.690 --> 01:32:19.400 But indeed, in C, you must remember, yourself, what the length of an array 01:32:19.400 --> 01:32:19.900 is. 01:32:19.900 --> 01:32:22.810 You can't just ask the array via some syntax 01:32:22.810 --> 01:32:26.560 like you can, those of you who've programmed before in Java or Python. 01:32:26.560 --> 01:32:27.070 Yeah? 01:32:27.070 --> 01:32:35.115 AUDIENCE: [INAUDIBLE] 01:32:35.115 --> 01:32:36.240 DAVID MALAN: Good question. 01:32:36.240 --> 01:32:39.198 Would it be better designed to write a function that computes the size? 
01:32:39.198 --> 01:32:42.570 Short answer, can't do that in C. As soon as you pass an array 01:32:42.570 --> 01:32:47.263 into a function in C, you cannot figure out its size if it's a generic array 01:32:47.263 --> 01:32:48.180 like that of integers. 01:32:48.180 --> 01:32:51.040 There are special cases that you can do that. 01:32:51.040 --> 01:32:53.283 But in general, no, it's just not possible in C. 01:32:53.283 --> 01:32:55.200 And if that's some frustration, honestly, this 01:32:55.200 --> 01:32:57.180 is why more modern languages add that feature. 01:32:57.180 --> 01:32:57.680 Why? 01:32:57.680 --> 01:32:59.910 Because it was really annoying, as I'm alluding here 01:32:59.910 --> 01:33:01.560 to not having that information. 01:33:01.560 --> 01:33:03.643 Now, just to make sure I didn't screw up anywhere, 01:33:03.643 --> 01:33:07.540 let me compile this final version of scores. 01:33:07.540 --> 01:33:08.620 Suspense. 01:33:08.620 --> 01:33:14.030 All good. ./scores, 72, 73, 33, and we're still back in business. 01:33:14.030 --> 01:33:15.530 So this version is more complicated. 01:33:15.530 --> 01:33:18.738 And as always, we'll have this version on the course's website for reference. 01:33:18.738 --> 01:33:20.740 But the point, really, is that arrays, not only 01:33:20.740 --> 01:33:23.290 can be used as containers to store multiple values-- 01:33:23.290 --> 01:33:25.490 three or more in this case-- 01:33:25.490 --> 01:33:30.440 you can also even pass them around as arguments, as such. 01:33:30.440 --> 01:33:34.300 All right, now besides that, let's simplify for just a moment, 01:33:34.300 --> 01:33:36.100 and consider now the world of chars. 01:33:36.100 --> 01:33:39.200 If we've just got single bytes, where does this lead us? 01:33:39.200 --> 01:33:41.200 And how does this get us, ultimately, to strings 01:33:41.200 --> 01:33:44.170 to solve problems like readability and cryptography and the like? 
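Before leaving scores.c behind, the final design described above (the length travels alongside the array) can be sketched like this. The order of the two parameters is an assumption; the transcript only says that a second argument called length is added.

```c
// Prototype near the top of the file, so the compiler knows average exists
// before main calls it. It is a copy of the function's first line plus a semicolon.
float average(int length, int array[]);

// Definition further down: no global variable is needed any more.
float average(int length, int array[])
{
    int sum = 0;
    for (int i = 0; i < length; i++)
    {
        sum += array[i];
    }

    // The array cannot report its own size, so the caller must supply length.
    return sum / (float) length;
}
```

A caller that still keeps a constant N would write average(N, scores), and the same function now works unchanged for four tests, five tests, and so on.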
01:33:44.170 --> 01:33:46.390 Well here, for instance, are three lines of code, 01:33:46.390 --> 01:33:48.967 out of context, that simply store three chars. 01:33:48.967 --> 01:33:50.800 And you can already see where this is going. 01:33:50.800 --> 01:33:53.920 Having three variables called c1, c2, c3 is clearly 01:33:53.920 --> 01:33:57.470 going to end up being bad design because of all the silly redundancy here. 01:33:57.470 --> 01:33:59.650 But notice, I'm using single quotes like last week 01:33:59.650 --> 01:34:01.330 because these are single chars. 01:34:01.330 --> 01:34:03.647 What does this look like in the computer's memory? 01:34:03.647 --> 01:34:05.480 Well, it looks a little something like this. 01:34:05.480 --> 01:34:09.730 If we clear out the old memory, c1, c2, c3 probably 01:34:09.730 --> 01:34:12.562 will end up here, maybe not literally in the top left-hand corner. 01:34:12.562 --> 01:34:14.020 This is just an artist's rendition. 01:34:14.020 --> 01:34:18.440 But c1, c2, c3 will probably end up like that. 01:34:18.440 --> 01:34:20.020 Now, what's really there? 01:34:20.020 --> 01:34:21.730 It's really those same three numbers-- 01:34:21.730 --> 01:34:23.350 72, 73, 33. 01:34:23.350 --> 01:34:27.920 But how many bits does a byte have? 01:34:27.920 --> 01:34:28.880 Just eight. 01:34:28.880 --> 01:34:33.830 So if we were to look at the binary representation of these characters, 01:34:33.830 --> 01:34:35.330 it would only be eight bits each. 01:34:35.330 --> 01:34:39.140 That's enough to store small numbers like 72, 73, 33. 01:34:39.140 --> 01:34:41.580 We're not dealing with Unicode and emoji and the like. 01:34:41.580 --> 01:34:42.837 But the point is the same. 01:34:42.837 --> 01:34:45.170 You don't have to use four bytes to store these numbers. 01:34:45.170 --> 01:34:48.087 You can use a different data type like chars, and underneath the hood, 01:34:48.087 --> 01:34:51.420 it's, indeed, going to use just single bytes for each. 
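To see that the three chars really are the numbers 72, 73 and 33, here is a small sketch. The helper name hi_codes is invented, and the values assume an ASCII-compatible character set, which is what the lecture's chart shows.

```c
// Hypothetical helper: copies the numeric codes of three chars into codes.
void hi_codes(int codes[3])
{
    char c1 = 'H';  // single quotes: a single char, one byte each
    char c2 = 'I';
    char c3 = '!';

    codes[0] = (int) c1;  // 72 in ASCII
    codes[1] = (int) c2;  // 73 in ASCII
    codes[2] = (int) c3;  // 33 in ASCII
}
```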
01:34:51.420 --> 01:34:55.850 But this is sort of like a-- this isn't really how we implement strings, right? 01:34:55.850 --> 01:34:59.270 When you wanted to say, hi, last week, or this, we used double quotes. 01:34:59.270 --> 01:35:02.400 And we wrote all of the things together and used one variable, not three, 01:35:02.400 --> 01:35:02.900 right? 01:35:02.900 --> 01:35:06.260 When I typed in David, I didn't have a variable for D-A-V-I-D. 01:35:06.260 --> 01:35:09.750 I had one variable called name that stored the whole thing. 01:35:09.750 --> 01:35:13.310 So in C, we keep talking about these things called strings. 01:35:13.310 --> 01:35:17.427 We'll see, eventually, that strings are not necessarily what they seem to be. 01:35:17.427 --> 01:35:19.760 But for now, the key thing about strings is that they're 01:35:19.760 --> 01:35:22.070 variable length, so to speak, right? 01:35:22.070 --> 01:35:25.250 They might be three characters, Hi, or five characters, David, 01:35:25.250 --> 01:35:28.250 or anything smaller or larger. 01:35:28.250 --> 01:35:30.980 So how do we go about implementing strings, 01:35:30.980 --> 01:35:33.110 if all we have at the end of the day is my memory? 01:35:33.110 --> 01:35:36.290 Well, here is an example of just creating, declaring, 01:35:36.290 --> 01:35:39.650 and defining a string called s. s because it's just a simple string, 01:35:39.650 --> 01:35:41.900 and quote unquote, HI!, in double quotes. 01:35:41.900 --> 01:35:44.090 What does this look like in the computer's memory? 01:35:44.090 --> 01:35:45.230 Well, let's clear it again. 01:35:45.230 --> 01:35:48.110 And here, now, because it's technically stored in one variable, 01:35:48.110 --> 01:35:50.960 s, here is how I might draw it as an artist. 01:35:50.960 --> 01:35:52.520 It's three bytes in total-- 01:35:52.520 --> 01:35:53.990 H-I exclamation point. 01:35:53.990 --> 01:35:59.630 But there's no c1, c2, c3, it's just, the whole thing is s. 
01:35:59.630 --> 01:36:03.800 But it turns out that a string, fun fact, 01:36:03.800 --> 01:36:06.990 is really just what underneath the hood? 01:36:06.990 --> 01:36:09.610 Kind of leading up to this-- 01:36:09.610 --> 01:36:12.090 what is a string, if this is how it's laid out in memory? 01:36:12.090 --> 01:36:13.190 AUDIENCE: An array. 01:36:13.190 --> 01:36:15.830 DAVID MALAN: Literally, it's just an array of characters. 01:36:15.830 --> 01:36:18.590 And we didn't have to know about arrays last week to use strings. 01:36:18.590 --> 01:36:21.382 This is where, again, the training wheels are starting to come off. 01:36:21.382 --> 01:36:23.730 But a string is just an array of characters. 01:36:23.730 --> 01:36:26.040 H-I exclamation point, for instance. 01:36:26.040 --> 01:36:28.370 So technically, an array-- 01:36:28.370 --> 01:36:33.890 or a string called s is really a variable called s that allows you 01:36:33.890 --> 01:36:38.150 to get at the first character with s[0], if you want-- s[1], s[2]. 01:36:38.150 --> 01:36:40.340 You can literally get individual characters 01:36:40.340 --> 01:36:43.820 just by treating s as though it's an array, which it really 01:36:43.820 --> 01:36:47.000 is underneath the hood, in this case. 01:36:47.000 --> 01:36:48.560 But there's a catch. 01:36:48.560 --> 01:36:51.500 How do you know where strings end? 01:36:51.500 --> 01:36:54.560 In the past, when I drew some integers on the screen, 01:36:54.560 --> 01:36:57.080 I know, I claim they always take up 4 bytes. 01:36:57.080 --> 01:37:00.200 If I had drawn a long, it always takes up 8 bytes. 01:37:00.200 --> 01:37:03.530 If I had drawn a character, it always takes up 1 byte. 01:37:03.530 --> 01:37:06.533 But how many bytes does a string take up? 01:37:06.533 --> 01:37:08.450 Yeah, I mean, that's kind of the right answer. 01:37:08.450 --> 01:37:10.490 In this case, three, it would seem. 01:37:10.490 --> 01:37:13.490 But if it's David, that's a good five characters. 
01:37:13.490 --> 01:37:16.173 But where do we put the number three? 01:37:16.173 --> 01:37:17.840 Where do you put the number five, right? 01:37:17.840 --> 01:37:20.190 This is literally all that's inside your computer. 01:37:20.190 --> 01:37:23.430 This is all our building blocks in front of us. 01:37:23.430 --> 01:37:25.490 So how can we-- where does the three go? 01:37:25.490 --> 01:37:26.540 Where does the five go? 01:37:26.540 --> 01:37:29.420 Well, it turns out you can solve this in a couple of different ways. 01:37:29.420 --> 01:37:34.160 But the way humans decided to implement strings years ago is, indeed, an array, 01:37:34.160 --> 01:37:38.960 but they added one extra byte at the end of every such string array, 01:37:38.960 --> 01:37:41.840 just to make clear, with a so-called sentinel value, 01:37:41.840 --> 01:37:44.480 that the string ends here. 01:37:44.480 --> 01:37:45.050 Why? 01:37:45.050 --> 01:37:47.930 So that if you have two strings in the computer's memory like, HI! 01:37:47.930 --> 01:37:52.760 and bye, you know where the barrier is between the exclamation point of one 01:37:52.760 --> 01:37:54.590 and the letter B in the next, right? 01:37:54.590 --> 01:37:56.000 You need some kind of delimiter. 01:37:56.000 --> 01:38:00.110 And so what really is underneath the hood is this. 01:38:00.110 --> 01:38:04.460 When you store a string in memory, when you type in a string-- as the user, 01:38:04.460 --> 01:38:07.040 if you type in 3 characters, it's going to use 01:38:07.040 --> 01:38:10.280 3 plus 1 equals 4 bytes in total. 01:38:10.280 --> 01:38:14.130 If you type in David, it's going to use 5 plus 1 equals 6 bytes in total. 01:38:14.130 --> 01:38:14.630 Why? 01:38:14.630 --> 01:38:20.210 Because C automatically adds this special 0 at the end of the string. 01:38:20.210 --> 01:38:24.710 I've drawn it with backslash 0 because this is how you represent 0 as a char, 01:38:24.710 --> 01:38:25.710 as a character. 
01:38:25.710 --> 01:38:28.230 But this is literally just 0, as we'll soon see. 01:38:28.230 --> 01:38:31.100 So any time there's a string in memory, it always takes up 01:38:31.100 --> 01:38:36.197 one more byte than you, yourself, as the programmer or human typed in. 01:38:36.197 --> 01:38:38.780 In fact, if we convert this again, just for discussion's sake, 01:38:38.780 --> 01:38:41.572 to those integers, what's literally stored in the computer's memory 01:38:41.572 --> 01:38:45.170 is going to be 72, 73, 33, and now a 0. 01:38:45.170 --> 01:38:48.240 And the computer, because of C and how it was invented, 01:38:48.240 --> 01:38:51.350 it's just smart enough to know that when you print out a string, 01:38:51.350 --> 01:38:54.530 it prints out every character until it sees a 0, 01:38:54.530 --> 01:38:56.150 and then it just stops printing. 01:38:56.150 --> 01:38:58.470 In particular, printf knows how this works. 01:38:58.470 --> 01:39:02.050 And this is why printf knows when to stop printing. 01:39:02.050 --> 01:39:03.800 Decimal numbers are not that enlightening. 01:39:03.800 --> 01:39:05.940 We'll generally write the characters like this. 01:39:05.940 --> 01:39:09.350 And again, backslash 0 is just special symbology. 01:39:09.350 --> 01:39:13.190 It's what the programmer types to make clear that you're not saying, HI!, 0. 01:39:13.190 --> 01:39:15.980 You're saying HI!, and then it's a special 0. 01:39:15.980 --> 01:39:20.887 Specifically, it is eight 0 bits that indicate 01:39:20.887 --> 01:39:22.220 that it's the end of the string. 01:39:22.220 --> 01:39:26.330 Technically, that backslash zero, if you want to be fancy, it's called null, 01:39:26.330 --> 01:39:27.320 N-U-L-L. 01:39:27.320 --> 01:39:30.320 And it turns out, you've seen this before, though we didn't call it out. 01:39:30.320 --> 01:39:33.230 Here's that same ASCII chart from the past couple of weeks. 01:39:33.230 --> 01:39:39.080 If I highlight this, what is decimal number 0 mapping to? 
01:39:39.080 --> 01:39:42.830 NUL, which is just programmer speak for the special null character. 01:39:42.830 --> 01:39:46.550 All 0 bits, which means the string ends here. 01:39:46.550 --> 01:39:48.510 This all happens automatically for you. 01:39:48.510 --> 01:39:53.420 You do not need to create these null characters or these zeros. 01:39:53.420 --> 01:40:00.030 Any questions then, on this implementation thus far? 01:40:00.030 --> 01:40:01.820 Any questions here? 01:40:01.820 --> 01:40:02.320 No? 01:40:02.320 --> 01:40:03.195 Well, let me do this. 01:40:03.195 --> 01:40:05.310 Let me go back to VS Code in a second. 01:40:05.310 --> 01:40:07.770 And let's actually corroborate this with some code. 01:40:07.770 --> 01:40:10.830 Let me go ahead and create a small program called hi.c. 01:40:10.830 --> 01:40:12.070 And how about we do this? 01:40:12.070 --> 01:40:14.550 Let me include stdio.h. 01:40:14.550 --> 01:40:18.670 Let me include-- let me type out int main void, as always. 01:40:18.670 --> 01:40:20.910 And now let me do something simple and kind of bad, 01:40:20.910 --> 01:40:24.960 but char c1 equals quote unquote, H, in single quotes. 01:40:24.960 --> 01:40:28.590 Char c2 equals quote unquote, I, in single quotes. 01:40:28.590 --> 01:40:32.830 And lastly, char c3 equals exclamation point, in single quotes. 01:40:32.830 --> 01:40:34.500 And now, let me just print this out. 01:40:34.500 --> 01:40:36.960 I can't use %s because that is not a string. 01:40:36.960 --> 01:40:40.290 That's literally three chars, because that's the design decision I made. 01:40:40.290 --> 01:40:41.430 But I could do this-- 01:40:41.430 --> 01:40:48.600 %c, %c, %c, which we haven't seen before, but %s is string, %i is int, 01:40:48.600 --> 01:40:51.060 %c is, indeed, char. 01:40:51.060 --> 01:40:54.150 So let me put a backslash n at the end for cleanliness, 01:40:54.150 --> 01:40:56.280 and now do, c1, c2, c3.
01:40:56.280 --> 01:41:00.430 So this is like a char-based version of printing a string. 01:41:00.430 --> 01:41:01.650 So let me make hi. 01:41:01.650 --> 01:41:05.880 And then let me do ./hi, and it looks like I used printf with %s. 01:41:05.880 --> 01:41:09.750 But I did things very manually by printing out each individual character. 01:41:09.750 --> 01:41:11.700 What's cool now, though, is that once you 01:41:11.700 --> 01:41:15.270 know that characters are just numbers and strings are just characters, 01:41:15.270 --> 01:41:16.560 you can kind of poke around. 01:41:16.560 --> 01:41:21.970 Let me change all three placeholders to %i instead. 01:41:21.970 --> 01:41:23.860 And this is totally fine, too. 01:41:23.860 --> 01:41:26.310 Let me rerun this, make hi. 01:41:26.310 --> 01:41:31.570 Actually, let me make one change, just so we can see this. 01:41:31.570 --> 01:41:37.710 Let me add spaces, just for aesthetics' sake, let me do make hi, ./hi, Enter, 01:41:37.710 --> 01:41:40.350 and voila, like now, you can actually see the numbers, 01:41:40.350 --> 01:41:44.085 that I claimed back in week zero, were in fact happening underneath the hood. 01:41:44.085 --> 01:41:45.960 Well, this is not how you would make strings. 01:41:45.960 --> 01:41:49.457 It'd be incredibly tedious to have three variables for three letter words, five 01:41:49.457 --> 01:41:50.790 variables for five letter words. 01:41:50.790 --> 01:41:52.998 We've been using, of course, strings since last week, 01:41:52.998 --> 01:41:54.450 so let's do that instead. 01:41:54.450 --> 01:41:59.370 String s equals quote unquote, double quotes "HI!" 01:41:59.370 --> 01:42:02.520 For this, though, because of these training wheels, 01:42:02.520 --> 01:42:04.560 I need to include the CS50 library. 01:42:04.560 --> 01:42:06.580 But we'll come back to that in the coming weeks. 01:42:06.580 --> 01:42:10.530 But for now, I'm going to go ahead and create a string s called quote unquote, 01:42:10.530 --> 01:42:11.580 "HI!"
01:42:11.580 --> 01:42:14.760 And now I'm going to change this to be my familiar %s, 01:42:14.760 --> 01:42:17.610 and now just print out s itself. 01:42:17.610 --> 01:42:20.430 This, of course, is the same thing as last week, ./hi, 01:42:20.430 --> 01:42:24.750 gives me the exact same thing, but now, we're dealing, of course, with strings. 01:42:24.750 --> 01:42:27.610 But how can we see a little beyond that? 01:42:27.610 --> 01:42:28.810 Well, how about this? 01:42:28.810 --> 01:42:31.530 Let's poke around further with today's primitives. 01:42:31.530 --> 01:42:35.580 Even though s is a string, I could technically print out its first 01:42:35.580 --> 01:42:39.000 character with %c by doing s[0]. 01:42:39.000 --> 01:42:43.110 I could technically print out its second character with %c by doing s[1]. 01:42:43.110 --> 01:42:47.820 I could print out its third character with %c and printing out s[2]. 01:42:47.820 --> 01:42:50.430 So again, this just derives logically from my understanding 01:42:50.430 --> 01:42:52.770 now that strings are arrays, as you note. 01:42:52.770 --> 01:42:54.540 Let me do make-- 01:42:54.540 --> 01:42:57.300 let me do make hi, ./hi. 01:42:57.300 --> 01:43:00.760 And no visual change, but I'm just kind of now tinkering around. 01:43:00.760 --> 01:43:03.400 And in fact, if you're really curious, let me do this. 01:43:03.400 --> 01:43:06.870 Let me change these back to i, back to i-- 01:43:06.870 --> 01:43:08.250 oops, back to i. 01:43:08.250 --> 01:43:11.310 And let me add a fourth one because if I'm really curious now, 01:43:11.310 --> 01:43:14.490 let's see what's in s[3]. 01:43:14.490 --> 01:43:16.020 This is the fourth byte. 01:43:16.020 --> 01:43:18.990 And even though the string itself is H-I, 01:43:18.990 --> 01:43:21.840 I think we can corroborate this whole null thing. 01:43:21.840 --> 01:43:26.248 Make hi, ./hi, Enter, and there it is. 
01:43:26.248 --> 01:43:28.290 You could have done this last week, if you really 01:43:28.290 --> 01:43:29.580 wanted to geek out on strings. 01:43:29.580 --> 01:43:33.060 But for now, it's just revealing what's going on underneath the hood. 01:43:33.060 --> 01:43:36.480 Questions then, on what these strings are? 01:43:36.480 --> 01:43:37.498 Yeah? 01:43:37.498 --> 01:43:41.293 AUDIENCE: [INAUDIBLE] 01:43:41.293 --> 01:43:42.960 DAVID MALAN: Why do we need the bracket? 01:43:42.960 --> 01:43:45.430 AUDIENCE: [INAUDIBLE] 01:43:45.430 --> 01:43:47.180 DAVID MALAN: Why do you not need brackets? 01:43:47.180 --> 01:43:47.780 Good question. 01:43:47.780 --> 01:43:51.620 Why do I not need brackets on line 6? 01:43:51.620 --> 01:43:53.300 Because s is a string. 01:43:53.300 --> 01:43:56.930 We'll see in a couple of weeks that s is, essentially, 01:43:56.930 --> 01:44:00.200 implemented underneath the hood, indeed, as an array, 01:44:00.200 --> 01:44:02.240 but that happens automatically for you. 01:44:02.240 --> 01:44:06.800 You can treat s as just a variable name without square brackets. 01:44:06.800 --> 01:44:09.500 You will use square brackets when you have arrays of ints 01:44:09.500 --> 01:44:13.730 or you manually create arrays of chars or doubles or floats or anything else. 01:44:13.730 --> 01:44:14.900 But strings are special. 01:44:14.900 --> 01:44:15.440 Why? 01:44:15.440 --> 01:44:19.190 I mean, every program you write seems to use strings, text in some form. 01:44:19.190 --> 01:44:21.930 We're humans we like text, not just numbers and such. 01:44:21.930 --> 01:44:25.910 So this is just treated a little specially in C and many other languages 01:44:25.910 --> 01:44:28.580 as well. 01:44:28.580 --> 01:44:31.170 Other questions on this here? 01:44:31.170 --> 01:44:31.670 No? 01:44:31.670 --> 01:44:33.530 Let's add then, one other string to the mix. 
01:44:33.530 --> 01:44:36.290 So instead of just saying, HI!, why don't we consider a version 01:44:36.290 --> 01:44:38.660 of the program that says both, HI! and BYE!. 01:44:38.660 --> 01:44:41.420 And I claim now that that backslash zero, 01:44:41.420 --> 01:44:44.270 that null character is going to be ever more important now 01:44:44.270 --> 01:44:46.820 if we've got two strings in memory, so that C knows 01:44:46.820 --> 01:44:48.570 how to distinguish one from the other. 01:44:48.570 --> 01:44:51.487 So let me go ahead and just get rid of these two lines for the moment. 01:44:51.487 --> 01:44:55.430 Let me recreate string s equals, quote unquote double quotes, "HI!" 01:44:55.430 --> 01:44:56.780 Let me give myself another one. 01:44:56.780 --> 01:44:59.905 And because I'm just playing around, I'll choose very short variable names. 01:44:59.905 --> 01:45:04.410 String t equals quote unquote, "BYE!" 01:45:04.410 --> 01:45:06.470 And then let me just print them both out. 01:45:06.470 --> 01:45:11.300 Let me go ahead and print out %s, backslash n, comma s, 01:45:11.300 --> 01:45:16.910 and then printf %s backslash n, and then t. 01:45:16.910 --> 01:45:19.970 So very simple demonstration of just these two variables. 01:45:19.970 --> 01:45:26.090 Make hi, ./hi, and of course, it prints out two lines, one after the other. 01:45:26.090 --> 01:45:27.980 What's actually going on underneath the hood? 01:45:27.980 --> 01:45:29.510 Well, let's go back to the computer's memory. 01:45:29.510 --> 01:45:32.160 HI!, I think, is going to be, I claim, pretty much the same. 01:45:32.160 --> 01:45:36.170 So s, I'll claim, is in the top left, followed by the backslash zero. 01:45:36.170 --> 01:45:40.035 And that's important now because BYE! probably is going to end up there. 01:45:40.035 --> 01:45:43.160 And visually, it wraps just by nature of how I've drawn this grid of bytes, 01:45:43.160 --> 01:45:44.330 but it's contiguous. 01:45:44.330 --> 01:45:46.340 B-Y-E-! 
01:45:46.340 --> 01:45:51.470 null, A.K.A. backslash zero, this is now helpful to printf 01:45:51.470 --> 01:45:55.550 because now printf knows where one begins and ends 01:45:55.550 --> 01:45:58.580 by way of that special null character. 01:45:58.580 --> 01:46:00.230 But we can poke around now, too. 01:46:00.230 --> 01:46:01.620 What else can I do here? 01:46:01.620 --> 01:46:02.840 How about this? 01:46:02.840 --> 01:46:08.870 How about I go into my code here, back to VS code, and let me go ahead 01:46:08.870 --> 01:46:13.790 and say something like, well, if I've got two of these strings, 01:46:13.790 --> 01:46:15.410 you know, let's put them in an array. 01:46:15.410 --> 01:46:20.520 Let's kind of do this sort of arrays in arrays, sort of inception-style here. 01:46:20.520 --> 01:46:23.060 So string words[2]. 01:46:23.060 --> 01:46:25.100 So give me an array of two strings is what 01:46:25.100 --> 01:46:28.100 I'm saying here in code, even though we've not done it with strings yet. 01:46:28.100 --> 01:46:29.270 We only did it with ints. 01:46:29.270 --> 01:46:30.770 And now let me do this. 01:46:30.770 --> 01:46:35.480 The first word A.K.A. words[0] will equal, as before, HI! 01:46:35.480 --> 01:46:40.940 And now words[1] will equal quote unquote, "BYE!" 01:46:40.940 --> 01:46:43.760 And now I've done the exact same thing, but again, I'm 01:46:43.760 --> 01:46:48.650 just avoiding having s, t, q, r, and all these different variables in my code. 01:46:48.650 --> 01:46:52.790 I just now am treating them as one single array of strings. 01:46:52.790 --> 01:46:54.750 How do I change my code down here? 01:46:54.750 --> 01:46:57.380 Well, if I want to print the first word, I do words[0]. 01:46:57.380 --> 01:46:59.900 And if I want to print the second word, I do words[1]. 01:46:59.900 --> 01:47:02.088 This is not a useful exercise at the moment 01:47:02.088 --> 01:47:04.130 because I'm just making my code more complicated. 
01:47:04.130 --> 01:47:06.830 But again, it allows us to poke around and see what's 01:47:06.830 --> 01:47:08.690 going on because there is that HI! 01:47:08.690 --> 01:47:09.530 and BYE!. 01:47:09.530 --> 01:47:10.700 But watch this. 01:47:10.700 --> 01:47:14.670 If I really want to be cool, I can do this. 01:47:14.670 --> 01:47:24.380 Let's print out %c, %c, %c, backslash n, and then here, %c, %c, %c, %c, 01:47:24.380 --> 01:47:25.700 so four of those. 01:47:25.700 --> 01:47:28.430 And now here's where things get interesting. 01:47:28.430 --> 01:47:30.620 Words is an array of strings. 01:47:30.620 --> 01:47:33.400 Again, if I may, what's a string? 01:47:33.400 --> 01:47:35.060 An array of characters. 01:47:35.060 --> 01:47:36.790 So just use the same logic. 01:47:36.790 --> 01:47:41.110 If words is an array of strings, you get at the first string with words[0]. 01:47:41.110 --> 01:47:44.530 How do you get at the first character in the first string? 01:47:44.530 --> 01:47:52.150 Bracket 0 again, so words[0][0], then words[0][1], and lastly, words[0][2]. 01:47:52.150 --> 01:47:57.460 And now down here, words[1][0], the first character, is there. 01:47:57.460 --> 01:48:00.400 Words[1][1], the second character, is here. 01:48:00.400 --> 01:48:03.190 Words[1][2], the third character, is here-- 01:48:03.190 --> 01:48:04.720 whoops-- third character's here. 01:48:04.720 --> 01:48:07.898 And words[1][3], the fourth character, is here. 01:48:07.898 --> 01:48:09.190 This is not how people program. 01:48:09.190 --> 01:48:10.840 This is only for demonstration's sake. 01:48:10.840 --> 01:48:13.060 My God, it's so tedious and verbose already. 01:48:13.060 --> 01:48:20.410 But if I make hi now, ./hi, now, I'm manually reinventing %s, 01:48:20.410 --> 01:48:22.990 if I forgot it existed, using %c alone. 01:48:22.990 --> 01:48:25.900 But you can indeed manipulate arrays in this way.
01:48:25.900 --> 01:48:28.300 But because strings are arrays of characters, 01:48:28.300 --> 01:48:32.200 you can manipulate strings in this way too. 01:48:32.200 --> 01:48:34.675 Any question now on this syntax? 01:48:37.210 --> 01:48:38.800 Any questions here? 01:48:38.800 --> 01:48:39.460 No? 01:48:39.460 --> 01:48:39.970 No? 01:48:39.970 --> 01:48:42.070 All right, well, let's go ahead and propose 01:48:42.070 --> 01:48:45.830 that we solve a couple of other problems we might not have as before. 01:48:45.830 --> 01:48:49.150 But first, a quick visual of what's been going on underneath the hood here. 01:48:49.150 --> 01:48:52.420 If here, again, is where we left off on the screen, HI! and BYE! 01:48:52.420 --> 01:48:56.470 back to back, here is really how I just treated these things. 01:48:56.470 --> 01:49:00.880 s bracket 0, 1, 2, 3 and then t 0, 1, 2, 3, 4. 01:49:00.880 --> 01:49:04.840 But really, once I put them in an array, the picture becomes this. 01:49:04.840 --> 01:49:07.030 Words[0] is the whole HI!. 01:49:07.030 --> 01:49:08.680 Words[1] is the whole BYE!. 01:49:08.680 --> 01:49:11.470 But if I really get into the weeds and start indexing 01:49:11.470 --> 01:49:14.980 into individual characters in those strings, all I'm using 01:49:14.980 --> 01:49:20.710 is new syntax in order to represent these same values here. 01:49:20.710 --> 01:49:28.710 Questions then, on these representations before we forge ahead? 01:49:28.710 --> 01:49:29.430 No? 01:49:29.430 --> 01:49:30.030 Yeah? 01:49:30.030 --> 01:49:33.390 AUDIENCE: Does the new line character not [INAUDIBLE]?? 01:49:33.390 --> 01:49:36.030 DAVID MALAN: Does the new line character-- say that once more? 01:49:36.030 --> 01:49:38.597 AUDIENCE: Does the new line character take up any space? 01:49:38.597 --> 01:49:40.180 DAVID MALAN: Ah, really good question. 01:49:40.180 --> 01:49:42.730 Does the new line character take up any space? 01:49:42.730 --> 01:49:45.340 It does, so far as printf is concerned. 
01:49:45.340 --> 01:49:48.790 But I'm not storing the backslash n in my strings; 01:49:48.790 --> 01:49:53.460 printf is being manually handed that thing instead. 01:49:53.460 --> 01:49:55.520 All right, so let's go ahead then and consider 01:49:55.520 --> 01:49:58.970 how we might solve some problems that have arisen now with these strings, 01:49:58.970 --> 01:50:00.680 as follows here. 01:50:00.680 --> 01:50:02.760 Suppose I-- let's do this. 01:50:02.760 --> 01:50:04.400 Let me go back to VS Code here. 01:50:04.400 --> 01:50:09.980 And let me go ahead and open up a new file called, how about, length.c. 01:50:09.980 --> 01:50:12.680 And let's consider for a moment how I might actually figure out 01:50:12.680 --> 01:50:16.130 what the length of a string is, which is distinct from the length of an array. 01:50:16.130 --> 01:50:19.680 I claimed earlier, you cannot figure out dynamically what the length of an array 01:50:19.680 --> 01:50:20.180 is. 01:50:20.180 --> 01:50:24.020 But I can figure out the length of a string, specifically, because 01:50:24.020 --> 01:50:26.960 of this implementation detail of that null character. 01:50:26.960 --> 01:50:28.500 So let me go ahead and do this. 01:50:28.500 --> 01:50:31.940 Let me include cs50.h in this second program here. 01:50:31.940 --> 01:50:35.090 Let me include stdio.h, as before. 01:50:35.090 --> 01:50:38.120 And let me do this, int main void-- 01:50:38.120 --> 01:50:40.970 and the first thing I'll do is just get a string from the user. 01:50:40.970 --> 01:50:43.250 I'll ask the user, as always, for their name. 01:50:43.250 --> 01:50:48.170 So I'll call get_string, and say, what's your name, question mark, as always.
01:50:48.170 --> 01:50:51.620 And then down here, if I want to figure out the length of this string 01:50:51.620 --> 01:50:56.210 and print the length out on the screen, well, I 01:50:56.210 --> 01:50:58.465 can kind of do this similar in spirit to the average, 01:50:58.465 --> 01:50:59.840 where I'm accumulating something. 01:50:59.840 --> 01:51:02.600 Let me go ahead and initialize n to 0. 01:51:02.600 --> 01:51:05.120 Let me give myself-- 01:51:05.120 --> 01:51:07.035 it's not a for loop because I don't have a-- 01:51:07.035 --> 01:51:08.660 I don't know in advance how long it is. 01:51:08.660 --> 01:51:09.980 But what if I do this? 01:51:09.980 --> 01:51:20.600 While the value at name[n] does not equal '\0'-- 01:51:20.600 --> 01:51:23.390 crazy syntax at the moment, but it's just the culmination 01:51:23.390 --> 01:51:25.590 of these various building blocks. 01:51:25.590 --> 01:51:28.970 Let me just finish the thought here, n++. 01:51:28.970 --> 01:51:33.656 And then down here, let's just print out, with printf and %i, 01:51:33.656 --> 01:51:38.930 that value of n. So I claim this is going to show me the length of any 01:51:38.930 --> 01:51:43.220 string I type in, whether it's hi or bye or David or anything else. 01:51:43.220 --> 01:51:45.410 I initialize a variable to zero, and that's good 01:51:45.410 --> 01:51:47.535 because that's where you start counting in general. 01:51:47.535 --> 01:51:50.990 While name[n] does not equal backslash zero. 01:51:50.990 --> 01:51:51.930 What is this saying? 01:51:51.930 --> 01:51:55.580 Well, if name is the string the user typed in-- and name is just an array, 01:51:55.580 --> 01:51:56.460 as you noted-- 01:51:56.460 --> 01:51:59.390 then name[0] is going to be the first character. 01:51:59.390 --> 01:52:02.750 And I'm asking the question, well, does the first character not equal 01:52:02.750 --> 01:52:03.680 backslash zero? 01:52:03.680 --> 01:52:08.750 And if I type in David, D, it's not, so I keep going and I add 1 to n.
01:52:08.750 --> 01:52:10.750 Then I'm going to check name[1]. 01:52:10.750 --> 01:52:13.895 Well, if I typed in David, name[1] is going to be A. 01:52:13.895 --> 01:52:18.020 A does not equal backslash zero, and so it's going to go again and again 01:52:18.020 --> 01:52:18.740 and again. 01:52:18.740 --> 01:52:23.090 But five steps in total later, it's going to get to the byte after 01:52:23.090 --> 01:52:26.480 D-A-V-I-D, realize, wait a minute, that is a backslash zero. 01:52:26.480 --> 01:52:29.750 The loop finishes, and I print out the total length. 01:52:29.750 --> 01:52:33.050 Arrays, in general, do not have this null character. 01:52:33.050 --> 01:52:34.910 However, strings do. 01:52:34.910 --> 01:52:38.150 Again, strings are special versus all of the other data types 01:52:38.150 --> 01:52:39.590 we've talked about thus far. 01:52:39.590 --> 01:52:43.220 But how could I, for instance, do this differently? 01:52:43.220 --> 01:52:47.220 Well, let's actually factor this out as a function, as I've commonly done. 01:52:47.220 --> 01:52:50.540 But rather than implement it myself, you know what? 01:52:50.540 --> 01:52:54.140 It turns out what's nice about strings being so common, 01:52:54.140 --> 01:52:57.260 there are many other people who have solved these problems before. 01:52:57.260 --> 01:53:00.290 And in fact, there's a whole string library in C. 01:53:00.290 --> 01:53:04.190 It is used by way of a header file called string.h. 01:53:04.190 --> 01:53:08.400 And what string.h is, is a library of string-related functions. 01:53:08.400 --> 01:53:10.760 In fact, you can see in CS50's manual pages 01:53:10.760 --> 01:53:16.217 for C, the string.h functions, at least those that we recommend as most useful, 01:53:16.217 --> 01:53:18.050 and in particular, if you poke around there, 01:53:18.050 --> 01:53:20.290 you'll see that there's a function called strlen. 01:53:20.290 --> 01:53:22.055 It means string length.
01:53:22.055 --> 01:53:24.680 It was named very succinctly, just because it's a little easier 01:53:24.680 --> 01:53:25.850 to type than string length. 01:53:25.850 --> 01:53:28.800 But strlen tells you the length of a string. 01:53:28.800 --> 01:53:30.990 So how might I use this in my code here? 01:53:30.990 --> 01:53:34.020 Well, it turns out, I can simplify this quite a bit. 01:53:34.020 --> 01:53:37.700 Let me get rid of my loop, get rid of my counting 01:53:37.700 --> 01:53:40.880 manually, and do something like this-- int n 01:53:40.880 --> 01:53:45.630 equals strlen of the human's name, name. 01:53:45.630 --> 01:53:49.430 And now I'll just use printf, as before, with %i backslash n, 01:53:49.430 --> 01:53:51.290 and output the value of n. 01:53:51.290 --> 01:53:54.380 But there's a bug at the moment. 01:53:54.380 --> 01:53:58.480 What have I forgotten to do? 01:53:58.480 --> 01:54:01.670 Yeah, I have to include the header file at the top of the screen, 01:54:01.670 --> 01:54:03.260 so let me-- at the top of the code. 01:54:03.260 --> 01:54:07.640 So let me also include string.h at the top of my file, 01:54:07.640 --> 01:54:10.970 so that C knows that, in fact, strlen exists. 01:54:10.970 --> 01:54:14.170 Let me go ahead and make length, as before. 01:54:14.170 --> 01:54:18.670 ./length-- or actually, really for the first time, what's your name? 01:54:18.670 --> 01:54:22.360 D-A-V-I-D. And hopefully, I'm going to see, in fact, 5. 01:54:22.360 --> 01:54:26.950 By contrast, if I run it again and type in HI!, now I see three. 01:54:26.950 --> 01:54:29.785 So strlen is just one of the functions in that library. 01:54:29.785 --> 01:54:30.910 And there are so many more. 01:54:30.910 --> 01:54:33.700 In fact, yet another library that might be useful moving forward 01:54:33.700 --> 01:54:37.570 is this one, ctype, which relates to C character 01:54:37.570 --> 01:54:40.580 types and lots of functions therein that can be useful.
01:54:40.580 --> 01:54:43.690 For instance, if you review its documentation in the manual pages 01:54:43.690 --> 01:54:46.930 online, you'll see that there are functions via which 01:54:46.930 --> 01:54:49.460 we can solve problems like this. 01:54:49.460 --> 01:54:52.480 Let me go ahead and propose here-- 01:54:52.480 --> 01:54:53.680 let me see. 01:54:53.680 --> 01:54:59.080 Let's do an example here involving-- 01:54:59.080 --> 01:55:03.250 how about checking if something is uppercase or lowercase, 01:55:03.250 --> 01:55:06.700 and converting it to uppercase only. 01:55:06.700 --> 01:55:10.810 Let me go back to VS Code, and code a program called uppercase.c. 01:55:10.810 --> 01:55:15.220 In this file, I'm going to start by including now, as always, cs50.h. 01:55:15.220 --> 01:55:17.710 I'm going to include stdio.h. 01:55:17.710 --> 01:55:21.670 And I'm going to add one other to the mix, which 01:55:21.670 --> 01:55:26.230 is string.h now too, so I can access the length of things as needed. 01:55:26.230 --> 01:55:28.570 Int main void comes next. 01:55:28.570 --> 01:55:30.460 And then within my main function, I'm going 01:55:30.460 --> 01:55:32.230 to go ahead and declare a string called s. 01:55:32.230 --> 01:55:34.240 I'm going to call get_string, as before. 01:55:34.240 --> 01:55:38.170 And I'm going to go ahead and just ask the user for a string with the prompt, before. 01:55:38.170 --> 01:55:39.670 I want to do a before and after. 01:55:39.670 --> 01:55:41.350 Whatever the user types in is before. 01:55:41.350 --> 01:55:44.770 But I want to force everything to uppercase, thereafter. 01:55:44.770 --> 01:55:48.740 Let me now, in this loop here, do this. 01:55:48.740 --> 01:55:53.800 Let me printf quote unquote, "After," just so we can see this on the screen. 01:55:53.800 --> 01:56:02.440 And let me do for int i gets 0, i is less than strlen of s, i++. 01:56:02.440 --> 01:56:03.610 What am I about to do?
01:56:03.610 --> 01:56:06.190 I'm about to iterate over every character in the string 01:56:06.190 --> 01:56:11.230 from left to right, from 0 on up to, but not through, the length of s. 01:56:11.230 --> 01:56:13.990 And how do I check if something is lowercase, 01:56:13.990 --> 01:56:16.990 so that I can actually force it to uppercase? 01:56:16.990 --> 01:56:19.630 Well, it turns out, I could do this literally. 01:56:19.630 --> 01:56:27.436 If the character in s at location i is greater than or equal to little a, 01:56:27.436 --> 01:56:31.780 ampersand, ampersand, which means and instead of or, which we saw 01:56:31.780 --> 01:56:37.930 in the past, s[i] is less than or equal to little z, that means, 01:56:37.930 --> 01:56:41.800 logically in English, that this is indeed lowercase. 01:56:41.800 --> 01:56:44.830 How do I now convert it to uppercase, this character? 01:56:44.830 --> 01:56:48.160 Well, I could just literally print out the same character. 01:56:48.160 --> 01:56:52.280 But that would not be the answer here because that's not changing the value. 01:56:52.280 --> 01:56:54.470 But what could I do instead? 01:56:54.470 --> 01:56:59.890 Well, let me actually pull up here real fast the ASCII chart as before, 01:56:59.890 --> 01:57:03.220 and let's see if we can't glean some insight. 01:57:03.220 --> 01:57:05.710 If I pull up the same ASCII chart, and suppose 01:57:05.710 --> 01:57:09.790 the human has typed in a lowercase a, that's 97. 01:57:09.790 --> 01:57:13.240 What letter-- I want to convert it to uppercase 01:57:13.240 --> 01:57:18.660 A, so what number do I want to convert the 97 to, per week zero? 01:57:18.660 --> 01:57:21.000 So 65, we keep coming back to that one. 01:57:21.000 --> 01:57:23.010 What if the user types in lowercase b? 01:57:23.010 --> 01:57:27.550 I want to change the 98 value to 66, and so forth. 01:57:27.550 --> 01:57:30.130 And any quick math, how far apart are those?
01:57:30.130 --> 01:57:33.120 So it's always 32, like uppercase to lowercase 01:57:33.120 --> 01:57:37.990 is always, wonderfully, good design, 32 away, one from the other. 01:57:37.990 --> 01:57:39.100 So what does this mean? 01:57:39.100 --> 01:57:41.350 Well, I think we saw earlier that underneath the hood, 01:57:41.350 --> 01:57:42.600 a char is just a number. 01:57:42.600 --> 01:57:44.340 You can certainly do arithmetic on it. 01:57:44.340 --> 01:57:46.507 And here, again, if you understand these lower level 01:57:46.507 --> 01:57:48.180 primitives, what if I do this? 01:57:48.180 --> 01:57:53.940 Whatever s[i] is, if I know on line 13 that it's lowercase, 01:57:53.940 --> 01:57:57.048 do I want to add or subtract 32? 01:57:57.048 --> 01:57:57.840 AUDIENCE: Subtract. 01:57:57.840 --> 01:58:01.910 DAVID MALAN: So I want to subtract because I want to go from like 97 to 65 01:58:01.910 --> 01:58:06.560 or 98 to 66, so indeed, if you do some quick math, that gives you 32. 01:58:06.560 --> 01:58:10.970 So it suffices to just treat chars as numbers, subtract the 32, 01:58:10.970 --> 01:58:16.370 and print it with %c, which, I think, will just convert lowercase to uppercase. 01:58:16.370 --> 01:58:19.795 If you now fast forward to the real world, Microsoft Word or Google Docs, 01:58:19.795 --> 01:58:22.670 if you've ever chosen the menu option that forces things to uppercase 01:58:22.670 --> 01:58:24.980 or lowercase on occasion, literally, that's 01:58:24.980 --> 01:58:26.480 what Microsoft and Google have done. 01:58:26.480 --> 01:58:29.605 They iterate over every character in the document, check if it's lowercase, 01:58:29.605 --> 01:58:33.810 and if so, they subtract 32 from it and show you the new value. 01:58:33.810 --> 01:58:36.650 What if, though, it is not a lowercase letter?
01:58:36.650 --> 01:58:40.520 I think I can keep it easy and just print out the current letter unchanged, 01:58:40.520 --> 01:58:44.850 if my goal is to simply force things to all uppercase, and that letter, 01:58:44.850 --> 01:58:46.490 then would be s[i]. 01:58:46.490 --> 01:58:50.750 So let me go ahead now and make uppercase, hopefully, no errors. 01:58:50.750 --> 01:58:55.670 ./uppercase, and I'll now type in David with an uppercase D, 01:58:55.670 --> 01:58:57.120 but lowercase everything else. 01:58:57.120 --> 01:59:00.020 But now the after version is DAVID-- 01:59:00.020 --> 01:59:01.190 an aesthetic bug. 01:59:01.190 --> 01:59:04.400 Notice here, I forgot to include, just for prettiness sake, 01:59:04.400 --> 01:59:05.930 a backslash n at the end. 01:59:05.930 --> 01:59:07.640 No problem, I'll add that. 01:59:07.640 --> 01:59:08.870 Let me fix my mistake. 01:59:08.870 --> 01:59:12.050 Make uppercase, ./uppercase, Enter. 01:59:12.050 --> 01:59:14.240 D-A-V-I-D, Enter, and voila. 01:59:14.240 --> 01:59:16.820 And I deliberately added another space after, 01:59:16.820 --> 01:59:19.130 just so they would line up pretty, even though before 01:59:19.130 --> 01:59:22.070 and after have different numbers of letters. 01:59:22.070 --> 01:59:25.630 Questions then, on this implementation of forcing something 01:59:25.630 --> 01:59:28.380 to uppercase, which in and of itself is not all that enlightening, 01:59:28.380 --> 01:59:33.990 but is representative now of how you can leverage these low level primitives. 01:59:33.990 --> 01:59:35.880 Question? 01:59:35.880 --> 01:59:36.380 No? 01:59:36.380 --> 01:59:38.633 All right, well, this honestly is tedious. 01:59:38.633 --> 01:59:40.550 My God, like does Microsoft, Google, everyone, 01:59:40.550 --> 01:59:43.550 you have to literally write out this code just to do something simple? 01:59:43.550 --> 01:59:46.310 Well, no, that's, again, why we have things like libraries. 
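
The by-hand conversion described above can be sketched roughly as follows. This is a minimal sketch rather than the exact program typed in lecture: the function names `to_upper_manual` and `print_upper` are hypothetical, plain `char *` stands in for the CS50 library's `string` so that no `cs50.h` is needed, and the arithmetic assumes ASCII, where each lowercase letter sits exactly 32 above its uppercase counterpart.

```c
#include <stdio.h>
#include <string.h>

// Convert one character to uppercase by hand, assuming ASCII.
// Lowercase 'a'..'z' are 97..122; uppercase 'A'..'Z' are 65..90.
char to_upper_manual(char c)
{
    if (c >= 'a' && c <= 'z')
    {
        return c - 32; // e.g. 'a' (97) becomes 'A' (65)
    }
    return c; // anything that is not lowercase is left unchanged
}

// Print s with every lowercase letter forced to uppercase,
// followed by a newline (hypothetical helper, not from the lecture).
void print_upper(const char *s)
{
    for (int i = 0; i < (int) strlen(s); i++)
    {
        printf("%c", to_upper_manual(s[i]));
    }
    printf("\n");
}
```

Calling `print_upper` on a name typed with only its first letter capitalized would print that name entirely in capitals, since the capital letter falls through the range check untouched.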
01:59:46.310 --> 01:59:49.220 And increasingly now, for problem sets, projects, and beyond, 01:59:49.220 --> 01:59:52.040 well, you just use libraries more often off-the-shelf 01:59:52.040 --> 01:59:55.940 so as to solve problems that, surely, other people have had before you. 01:59:55.940 --> 01:59:59.570 So how can I now use this library, ctype.h? 01:59:59.570 --> 02:00:01.320 Well, let me go back into my code. 02:00:01.320 --> 02:00:05.090 Let me include this among my header files here. 02:00:05.090 --> 02:00:08.030 Just so I can skim things easily, I tend to alphabetize my headers. 02:00:08.030 --> 02:00:11.238 But that's not strictly necessary, but it allows me, at a glance, to realize, 02:00:11.238 --> 02:00:13.400 did I or did I not include something I need? 02:00:13.400 --> 02:00:15.570 Now, let me go ahead and do this. 02:00:15.570 --> 02:00:20.390 It turns out if you read the documentation for the ctype library, 02:00:20.390 --> 02:00:24.710 there's a function, wonderfully called islower, 02:00:24.710 --> 02:00:28.910 that takes in a character as its argument, essentially, so s[i]. 02:00:28.910 --> 02:00:32.182 And if that returns true, a Boolean value, if you will, 02:00:32.182 --> 02:00:33.890 well, I'm going to force it to uppercase. 02:00:33.890 --> 02:00:36.560 But I don't have to do this math anymore. 02:00:36.560 --> 02:00:40.610 Turns out, in the ctype library, there's also a function called toupper 02:00:40.610 --> 02:00:43.130 that takes a character as input, like s[i], 02:00:43.130 --> 02:00:45.060 and it just does the math for you. 02:00:45.060 --> 02:00:47.270 So that you can abstract away the 32 thing, 02:00:47.270 --> 02:00:50.400 and just know that someone else has solved that problem for you. 02:00:50.400 --> 02:00:53.030 Otherwise, I can leave my code unchanged down below 02:00:53.030 --> 02:00:55.200 because I'm not changing anything else.
02:00:55.200 --> 02:01:00.410 So if I do make uppercase now, and then ./uppercase, D-a-v-i-d, 02:01:00.410 --> 02:01:03.710 with just a capital D, and now it still works. 02:01:03.710 --> 02:01:06.890 But if you read the documentation further, it turns out that toupper 02:01:06.890 --> 02:01:07.520 is smart. 02:01:07.520 --> 02:01:10.220 If you pass in a character to toupper that's lowercase, 02:01:10.220 --> 02:01:13.040 it obviously converts it to uppercase by doing that math. 02:01:13.040 --> 02:01:17.240 But if you pass in a character to toupper that's already uppercase, 02:01:17.240 --> 02:01:21.540 the documentation you would see tells you that it leaves it unchanged. 02:01:21.540 --> 02:01:23.910 So I can tighten all of this up. 02:01:23.910 --> 02:01:25.880 I can get rid of the whole else. 02:01:25.880 --> 02:01:29.150 I can get rid of the whole if, and arguably now, 02:01:29.150 --> 02:01:33.620 implement a program that's just as correct, but better designed. 02:01:33.620 --> 02:01:34.250 Why? 02:01:34.250 --> 02:01:38.000 Fewer lines of code, easier to read, lower probability of mistakes, 02:01:38.000 --> 02:01:39.740 assuming the library is correct. 02:01:39.740 --> 02:01:43.160 It just makes it easier and faster for me, now, to write code. 02:01:43.160 --> 02:01:47.960 So if I now do, one last time, make uppercase, Enter, ./uppercase, 02:01:47.960 --> 02:01:50.190 and type in my name, still working. 02:01:50.190 --> 02:01:53.810 But now notice, we've whittled this down to far fewer lines of code, 02:01:53.810 --> 02:01:57.740 albeit, using now this additional library. 02:01:57.740 --> 02:02:00.140 Questions then on how we did this? 02:02:03.930 --> 02:02:06.230 Well, even though this code, I daresay, is correct, 02:02:06.230 --> 02:02:09.120 it's not necessarily well-designed just yet.
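
The library-based version, with the if and else removed, might look roughly like this. It is a sketch under a few assumptions: the function name `force_upper` is hypothetical, it rewrites the string in place instead of printing each character (which makes the result easy to inspect), and it uses plain `char *` in place of the CS50 `string` type. As in the lecture's code at this point, `strlen` is still called in the loop condition.

```c
#include <ctype.h>
#include <string.h>

// Force every letter in s to uppercase, in place.
// toupper returns characters that are not lowercase letters unchanged,
// so no if/else is needed around the call.
void force_upper(char *s)
{
    for (int i = 0; i < (int) strlen(s); i++)
    {
        // The cast to unsigned char keeps the argument in the range
        // that toupper is defined for.
        s[i] = (char) toupper((unsigned char) s[i]);
    }
}
```

Digits, punctuation, and letters that are already uppercase pass through `toupper` untouched, which is why the explicit `islower` check is no longer required.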
02:02:09.120 --> 02:02:12.590 In fact, there's one line of code, one function 02:02:12.590 --> 02:02:14.690 call in this current implementation that's 02:02:14.690 --> 02:02:17.900 more inefficient than it needs to be. 02:02:17.900 --> 02:02:20.630 And allow me to draw your attention to this here, 02:02:20.630 --> 02:02:24.320 line 10, wherein we're calling strlen. 02:02:24.320 --> 02:02:27.350 But we're calling it inside of this for loop, specifically, 02:02:27.350 --> 02:02:29.000 inside of the condition. 02:02:29.000 --> 02:02:33.720 And why might that not necessarily be the best idea? 02:02:33.720 --> 02:02:36.810 Well, is the length of the string changing, ever? 02:02:36.810 --> 02:02:38.950 I mean, certainly not within the span of this loop. 02:02:38.950 --> 02:02:42.840 And so here we are within our for loop on line 10, 11, 12, and 13, 02:02:42.840 --> 02:02:45.242 asking on every iteration that same question. 02:02:45.242 --> 02:02:46.200 What's the length of s? 02:02:46.200 --> 02:02:47.190 What's the length of s? 02:02:47.190 --> 02:02:48.330 What's the length of s? 02:02:48.330 --> 02:02:50.702 And in turn, we're calling strlen every time, 02:02:50.702 --> 02:02:52.660 even though we're getting back the same answer. 02:02:52.660 --> 02:02:54.960 So I daresay a better solution here would 02:02:54.960 --> 02:02:58.230 be to maybe figure out the length of s earlier on in my code, 02:02:58.230 --> 02:02:59.490 and maybe declare a variable. 02:02:59.490 --> 02:03:02.580 Or perhaps do something that's syntactically a little more elegant, 02:03:02.580 --> 02:03:05.070 and in fact, a very common design in a loop like this, 02:03:05.070 --> 02:03:07.860 would be to declare not just one variable like i, 02:03:07.860 --> 02:03:12.060 but to actually declare a second variable called n, for instance, where 02:03:12.060 --> 02:03:16.530 n is just some number, set n equal to the length of s.
02:03:16.530 --> 02:03:18.900 But thereafter, inside of this condition, 02:03:18.900 --> 02:03:24.540 instead of calling strlen of s again and again and again, what might I now do? 02:03:24.540 --> 02:03:28.110 I could instead just compare i against n itself, 02:03:28.110 --> 02:03:31.080 because n now will only be calculated once when it's initialized, 02:03:31.080 --> 02:03:32.730 just as i is initialized to zero. 02:03:32.730 --> 02:03:36.000 And thereafter, we're going to be comparing i, which is changing, 02:03:36.000 --> 02:03:37.350 against n, which will not be. 02:03:37.350 --> 02:03:40.330 So it's going to be marginally more efficient by design. 02:03:40.330 --> 02:03:42.900 Now with that said, a good compiler could also 02:03:42.900 --> 02:03:46.080 recognize that there is this optimization possibility, 02:03:46.080 --> 02:03:47.100 and maybe do it for us. 02:03:47.100 --> 02:03:49.080 But for now, best to get into the habit, best 02:03:49.080 --> 02:03:52.260 to develop the muscle memory for making those better design decisions 02:03:52.260 --> 02:03:54.010 yourselves. 02:03:54.010 --> 02:03:56.380 Questions, then, on how we did this? 02:03:58.900 --> 02:03:59.650 No? 02:03:59.650 --> 02:04:03.050 All right, a few final building blocks for the day. 02:04:03.050 --> 02:04:07.870 So we started by talking about those command line arguments that clang uses, 02:04:07.870 --> 02:04:13.090 whereby, anything after the command that you type at a prompt, be it make 02:04:13.090 --> 02:04:18.160 or clang or even cd in Linux, any word thereafter, or something 02:04:18.160 --> 02:04:21.350 cryptic like -o is a command line argument. 02:04:21.350 --> 02:04:22.840 It's an input to the command. 02:04:22.840 --> 02:04:26.132 It's different from a function argument because a function argument, of course, 02:04:26.132 --> 02:04:27.280 is an input to a function. 02:04:27.280 --> 02:04:28.345 But it's the same idea.
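
Returning briefly to the loop discussed a moment ago, the cost of asking for the length in the condition can be made visible with a small experiment. This is only an illustrative sketch: `counted_strlen`, `calls_in_condition`, and `calls_hoisted` are hypothetical names, and the counter exists purely to show how often the length is requested in each style of loop.

```c
#include <string.h>

// Hypothetical counter: how many times has the length been requested?
static int calls = 0;

// Wrapper around strlen that records each call.
static int counted_strlen(const char *s)
{
    calls++;
    return (int) strlen(s);
}

// Length requested in the loop condition: one call per condition check.
int calls_in_condition(const char *s)
{
    calls = 0;
    for (int i = 0; i < counted_strlen(s); i++)
    {
        // Body intentionally empty; only the condition matters here.
    }
    return calls;
}

// Length stored once in n when the loop is initialized.
int calls_hoisted(const char *s)
{
    calls = 0;
    for (int i = 0, n = counted_strlen(s); i < n; i++)
    {
        // Body intentionally empty.
    }
    return calls;
}
```

For a five-letter string, the first loop checks its condition six times (five passes plus the final failing check) and so asks for the length six times, while the second asks exactly once.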
02:04:28.345 --> 02:04:30.970 It's just different syntax after the dollar sign at the prompt. 02:04:30.970 --> 02:04:33.880 Well, it turns out that command line arguments 02:04:33.880 --> 02:04:37.660 are something you can now use in your own programs 02:04:37.660 --> 02:04:41.800 by accessing words after the prompt. 02:04:41.800 --> 02:04:45.410 And let me propose that we invent this as follows. 02:04:45.410 --> 02:04:49.540 Let me propose that we switch back to VS Code here, 02:04:49.540 --> 02:04:53.560 and I'll open a new file here called greet.c. 02:04:53.560 --> 02:04:56.410 So in greet.c, it's going to be a program that very simply greets 02:04:56.410 --> 02:04:57.070 the user. 02:04:57.070 --> 02:04:59.440 Had we written this last week, we would have done this. 02:04:59.440 --> 02:05:08.200 Include cs50.h, and then include stdio.h, and then int main void, 02:05:08.200 --> 02:05:13.060 and then we might do something simple like string name equals get_string, 02:05:13.060 --> 02:05:15.980 quote unquote, "What's your name?" 02:05:15.980 --> 02:05:20.020 And then we would have printed out, as always, Hello, %s, 02:05:20.020 --> 02:05:21.490 and then plugging in that name. 02:05:21.490 --> 02:05:25.300 So this is the same program we've implemented many times, just 02:05:25.300 --> 02:05:26.590 to make sure it works-- 02:05:26.590 --> 02:05:29.140 although, nope, that's not quite the same program. 02:05:29.140 --> 02:05:30.940 Semicolon's in the wrong place. 02:05:30.940 --> 02:05:32.960 This now is the same program. 02:05:32.960 --> 02:05:37.610 So make greet, ./greet, and I'll type in my own name. hello, David. 02:05:37.610 --> 02:05:38.770 So we're back there.
02:05:38.770 --> 02:05:41.770 Now, what's arguably a little annoying about this program, 02:05:41.770 --> 02:05:44.110 if I type in something else like, Carter, 02:05:44.110 --> 02:05:48.130 Enter, I have to run the program, wait for the prompt, type in my name, 02:05:48.130 --> 02:05:48.910 hit Enter. 02:05:48.910 --> 02:05:52.360 And that's fine, but imagine if every program worked like this. 02:05:52.360 --> 02:05:55.415 Like make, suppose you could only type make, then you wait for a prompt, 02:05:55.415 --> 02:05:58.540 then you type the name of the program you want to make, then you hit Enter. 02:05:58.540 --> 02:06:01.720 Or worse, in Linux when you have to change directories, 02:06:01.720 --> 02:06:05.263 as you might have for problem set one, what if you had to type CD, Enter, 02:06:05.263 --> 02:06:07.930 now type the name of the folder you want to change into, Enter-- 02:06:07.930 --> 02:06:09.710 I mean, it just slows life down. 02:06:09.710 --> 02:06:11.470 And so it just gets annoying quickly. 02:06:11.470 --> 02:06:16.070 So command line arguments just let you express your whole thought all at once. 02:06:16.070 --> 02:06:18.200 So how can I do this? 02:06:18.200 --> 02:06:22.450 Well, if I want to express the notion of command line arguments in my code, 02:06:22.450 --> 02:06:25.640 I could do something like this. 02:06:25.640 --> 02:06:28.750 I could, for the very first time, go up and get 02:06:28.750 --> 02:06:33.730 rid of this void, which as of today means, this program takes no command 02:06:33.730 --> 02:06:34.780 line arguments. 02:06:34.780 --> 02:06:37.540 And I can change it to exactly this. 02:06:37.540 --> 02:06:43.490 Int argc, string argv, with brackets. 02:06:43.490 --> 02:06:44.950 Now it's cryptic, admittedly. 02:06:44.950 --> 02:06:46.150 And let me zoom in. 02:06:46.150 --> 02:06:49.300 But I think we can perhaps infer now, what's going on. 
02:06:49.300 --> 02:06:52.750 If main now does not have void as its input, which 02:06:52.750 --> 02:06:55.600 means it takes no arguments, surely, the spoiler 02:06:55.600 --> 02:06:59.230 here is that now main will take command line arguments somehow. 02:06:59.230 --> 02:07:05.180 Any guesses as to what argv is or will be? 02:07:05.180 --> 02:07:08.330 What might this represent? 02:07:08.330 --> 02:07:11.390 It's an array of strings, right, by way of the syntax. 02:07:11.390 --> 02:07:13.223 Yeah? 02:07:13.223 --> 02:07:15.480 AUDIENCE: All the characters will be typed out. 02:07:15.480 --> 02:07:16.050 DAVID MALAN: Exactly. 02:07:16.050 --> 02:07:18.550 It will be all of the characters, or really all of the words 02:07:18.550 --> 02:07:19.830 that you type at the prompt. 02:07:19.830 --> 02:07:21.765 Argc, as an int, any guess? 02:07:24.360 --> 02:07:28.700 Argument count is what it generally stands for, though technically, 02:07:28.700 --> 02:07:30.290 you could call these things anything. 02:07:30.290 --> 02:07:31.520 But this is the convention. 02:07:31.520 --> 02:07:35.780 Because I claimed earlier that arrays don't keep track of their own length, 02:07:35.780 --> 02:07:38.930 if you want to know how many words the human typed at the prompt 02:07:38.930 --> 02:07:41.420 after your program's name, you have to be told, 02:07:41.420 --> 02:07:45.650 not just the array of the words, but the length of that array. 02:07:45.650 --> 02:07:48.530 The strings, you can figure out the length of using strlen, 02:07:48.530 --> 02:07:53.360 but you can't figure out the length of the array of strings, the collection 02:07:53.360 --> 02:07:55.020 of words that the human typed in. 02:07:55.020 --> 02:07:56.760 So how can I now use this? 02:07:56.760 --> 02:07:59.190 Well, let me go ahead and do this. 02:07:59.190 --> 02:08:04.190 Let me go ahead and change this program now just to be printf, quote unquote, 02:08:04.190 --> 02:08:11.630 "hello, %s\n", then argv[1].
02:08:11.630 --> 02:08:14.780 So this is not the best version of my code yet, but it's my first. 02:08:14.780 --> 02:08:21.020 Make greet, and now let me do ./greet, David all at once. 02:08:21.020 --> 02:08:23.210 Enter, hello, David. 02:08:23.210 --> 02:08:25.820 Now let me run it again, ./greet, Carter. 02:08:25.820 --> 02:08:27.620 Enter, hello, Carter. 02:08:27.620 --> 02:08:29.840 It's a marginal improvement, but I don't have 02:08:29.840 --> 02:08:32.330 to wait for get_string to prompt me to hit Enter. 02:08:32.330 --> 02:08:34.370 It's just speeding things up, twice as fast. 02:08:34.370 --> 02:08:36.890 One less command to type in. 02:08:36.890 --> 02:08:41.390 But I deliberately did [1], but what's the beginning of argv? 02:08:41.390 --> 02:08:42.170 It would be [0]. 02:08:44.730 --> 02:08:45.780 Well, what's that? 02:08:45.780 --> 02:08:48.840 This is sometimes useful, though for now, it's not. 02:08:48.840 --> 02:08:54.110 Suppose I recompile my code and run this program now, greet David. 02:08:54.110 --> 02:08:58.598 Anyone want to guess what's in argv[0]? 02:08:58.598 --> 02:08:59.530 AUDIENCE: [INAUDIBLE] 02:08:59.530 --> 02:09:00.220 DAVID MALAN: Say again? 02:09:00.220 --> 02:09:01.230 AUDIENCE: Greet, hello. 02:09:01.230 --> 02:09:04.530 DAVID MALAN: Greet, Enter, hello, ./greet. 02:09:04.530 --> 02:09:08.280 So if you want to sort of inception style your program to figure out what 02:09:08.280 --> 02:09:11.910 its own name is, or at least how it was executed at the command line, 02:09:11.910 --> 02:09:14.460 at the terminal, you can look at argv[0]. 02:09:14.460 --> 02:09:17.160 In general, probably not that useful, probably better 02:09:17.160 --> 02:09:21.900 to start looking at [1], which was the first word after the program name. 02:09:21.900 --> 02:09:25.320 And if there were more, I could do this-- how about argv[2], 02:09:25.320 --> 02:09:27.690 let me add in a second %s. 02:09:27.690 --> 02:09:29.550 Let me recompile greet.
02:09:29.550 --> 02:09:35.490 Let me do ./greet David Malan, Enter, and that, too, now works, 02:09:35.490 --> 02:09:37.112 taking in two words at the prompt. 02:09:37.112 --> 02:09:38.820 If I really want to be smart at this now, 02:09:38.820 --> 02:09:40.445 I could do something like this, though. 02:09:40.445 --> 02:09:44.700 How about if the count of arguments, A.K.A. argc, 02:09:44.700 --> 02:09:49.890 equals equals 2, then assume that the human typed in only their first name, 02:09:49.890 --> 02:09:58.440 and do printf hello comma %s\n, and then argv[1]. 02:09:58.440 --> 02:10:01.470 Else, if the human did not provide exactly two 02:10:01.470 --> 02:10:04.920 arguments, the name of the program and their own name, 02:10:04.920 --> 02:10:07.890 let's just print out a default value, lest they forgot their name 02:10:07.890 --> 02:10:09.990 or they typed in two names or three names. 02:10:09.990 --> 02:10:13.110 Let's just do, hello comma world as a default. 02:10:13.110 --> 02:10:15.270 And we'll just ignore what the human typed in. 02:10:15.270 --> 02:10:20.850 If I recompile this, make greet, I can do ./greet and David again, Enter. 02:10:20.850 --> 02:10:24.840 Oops-- sorry, what am I missing? 02:10:24.840 --> 02:10:26.640 Yeah, so newbie mistake. 02:10:26.640 --> 02:10:30.090 Else, all right, make greet again. 02:10:30.090 --> 02:10:34.050 ./greet, David, Enter, there's my hello, David. 02:10:34.050 --> 02:10:37.870 But if I omit my name, I just get the generic, like a default value. 02:10:37.870 --> 02:10:41.590 And if I get a little curious and I type in both names, then I get ignored too. 02:10:41.590 --> 02:10:42.090 Why? 02:10:42.090 --> 02:10:44.880 Because I just haven't built in support for argc of three. 02:10:44.880 --> 02:10:47.610 I could do anything I want, but now we have access 02:10:47.610 --> 02:10:50.730 to these kinds of building blocks. 02:10:50.730 --> 02:10:52.780 All right, what else might I do here?
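
The argc check just walked through can be sketched as a small function. This is a hedged sketch, not the file from lecture: `make_greeting` is a hypothetical name, it writes the greeting into a caller-supplied buffer instead of printing it (so the result can be inspected), and `char *argv[]` stands in for the CS50-style `string argv[]`.

```c
#include <stdio.h>

// Build the greeting that a greet-style program would print.
// argc counts every word at the prompt, including the program's own name
// in argv[0], so "./greet David" gives argc == 2 and argv[1] == "David".
void make_greeting(int argc, char *argv[], char *out, size_t size)
{
    if (argc == 2)
    {
        // Exactly one word after the program name: greet that name.
        snprintf(out, size, "hello, %s\n", argv[1]);
    }
    else
    {
        // No name, or more than one word: fall back to the default.
        snprintf(out, size, "hello, world\n");
    }
}
```

In a complete program, main would be declared as `int main(int argc, char *argv[])`, pass its two parameters straight through to `make_greeting` along with a buffer, and print the result with `printf("%s", buffer)`.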
02:10:52.780 --> 02:10:57.660 Well, it turns out there might be some final features for us to now execute. 02:10:57.660 --> 02:11:00.090 Notice, though, that in C, despite what you 02:11:00.090 --> 02:11:02.820 might see in books or online tutorials, nowadays, 02:11:02.820 --> 02:11:06.180 the two official formats for defining a main function 02:11:06.180 --> 02:11:11.130 are either this, which we've been using now for two plus weeks or now this, 02:11:11.130 --> 02:11:14.250 whereby, you change the void to int argc, 02:11:14.250 --> 02:11:17.880 and then for now, string argv, and then empty brackets. 02:11:17.880 --> 02:11:20.608 And we'll see that this, too, is a simplification, some training 02:11:20.608 --> 02:11:21.400 wheels if you will. 02:11:21.400 --> 02:11:23.550 But for now, those are the two forms, even 02:11:23.550 --> 02:11:26.550 though you will see in online tutorials and even books, some people 02:11:26.550 --> 02:11:27.840 use main in different ways. 02:11:27.840 --> 02:11:30.142 These are the two now to keep in mind. 02:11:30.142 --> 02:11:32.100 And I'll note that these command line arguments 02:11:32.100 --> 02:11:33.360 are kind of all over the place. 02:11:33.360 --> 02:11:35.590 Didn't probably expect to see this word on the screen here. 02:11:35.590 --> 02:11:36.490 And what does it mean? 02:11:36.490 --> 02:11:37.920 Well, it turns out that for decades-- there's 02:11:37.920 --> 02:11:40.080 actually this program that comes with Linux systems 02:11:40.080 --> 02:11:41.880 in particular called cowsay. 02:11:41.880 --> 02:11:42.510 Why? 02:11:42.510 --> 02:11:45.300 Probably because someone had too much free time once and decided 02:11:45.300 --> 02:11:49.920 to write a program that creates ASCII art out of a cow saying something 02:11:49.920 --> 02:11:51.520 textually on the screen. 02:11:51.520 --> 02:11:55.780 But you use cowsay, just for fun, by way of command line arguments. 
02:11:55.780 --> 02:12:00.660 So for instance, let me propose that I go back to VS Code 02:12:00.660 --> 02:12:03.020 here, not because I want to write any code, 02:12:03.020 --> 02:12:04.770 but I just want to use my terminal window. 02:12:04.770 --> 02:12:07.320 And let me maximize my terminal window here. 02:12:07.320 --> 02:12:11.880 And let me go ahead and type in something like, how about cowsay, 02:12:11.880 --> 02:12:13.170 space moo? 02:12:13.170 --> 02:12:14.822 So cowsay is not a program I wrote. 02:12:14.822 --> 02:12:16.030 It's been around for decades. 02:12:16.030 --> 02:12:18.870 But we installed it in VS Code for you in the cloud. 02:12:18.870 --> 02:12:21.330 It takes at least one command line argument. 02:12:21.330 --> 02:12:23.070 What do you want the cow to say? 02:12:23.070 --> 02:12:26.190 I can say, cowsay moo, and hit Enter, and voila, there 02:12:26.190 --> 02:12:29.490 is my ASCII art of a cow saying moo on the screen. 02:12:29.490 --> 02:12:31.090 It can say multiple words. 02:12:31.090 --> 02:12:33.960 So I can say, Hello, world, Enter. 02:12:33.960 --> 02:12:35.800 And now it says, Hello, world. 02:12:35.800 --> 02:12:38.730 So this is just an example of a silly program that uses command line 02:12:38.730 --> 02:12:40.470 arguments, but it takes others too. 02:12:40.470 --> 02:12:43.650 Just like clang, use this convention of hyphens 02:12:43.650 --> 02:12:45.750 to change the output of the program. 02:12:45.750 --> 02:12:49.350 Dash something is just a super common convention with command line arguments 02:12:49.350 --> 02:12:53.520 when you want a very terse notation for some option like output. 02:12:53.520 --> 02:12:56.460 In cowsay, I read the documentation, and it turns out 02:12:56.460 --> 02:12:59.040 there's a dash f command line argument that 02:12:59.040 --> 02:13:03.460 allows you to change the appearance of the cow, if you will. 
02:13:03.460 --> 02:13:10.170 So if I do cowsay dash f, duck, and then some other word like quack, 02:13:10.170 --> 02:13:11.640 it's no longer a cow. 02:13:11.640 --> 02:13:15.850 That command line argument turns it into a tiny, adorable duck instead. 02:13:15.850 --> 02:13:19.020 And then lastly, just for fun, because I spent way too much time 02:13:19.020 --> 02:13:20.790 playing with command line arguments. 02:13:20.790 --> 02:13:25.260 Cowsay dash f, dragon, and then how about, rawr, Enter, 02:13:25.260 --> 02:13:27.910 you can even get this on the screen here. 02:13:27.910 --> 02:13:30.150 So this, too, is just an example of what you 02:13:30.150 --> 02:13:34.230 can do with these command line arguments now that we have this building block. 02:13:34.230 --> 02:13:36.960 And there's one final thing we can now do with code. 02:13:36.960 --> 02:13:39.150 There's one last feature today that we'll 02:13:39.150 --> 02:13:41.610 introduce before we now connect all of these dots 02:13:41.610 --> 02:13:47.520 to readability and encryption by talking, lastly, about something called 02:13:47.520 --> 02:13:48.450 exit status. 02:13:48.450 --> 02:13:52.380 It turns out that whenever your main function exits, 02:13:52.380 --> 02:13:55.590 it returns a secret integer that you can figure out, 02:13:55.590 --> 02:13:58.260 as the programmer or an advanced user, what it was. 02:13:58.260 --> 02:14:02.398 And these exit codes, exit statuses, are typically used to indicate errors. 02:14:02.398 --> 02:14:05.190 So for instance, over the past couple of years, if you've used Zoom 02:14:05.190 --> 02:14:08.560 and you ever got some kind of error, you might have seen a screen like this. 02:14:08.560 --> 02:14:11.040 It's usually not that helpful, maybe tells you to click 02:14:11.040 --> 02:14:13.050 Report Problem or Contact Support.
02:14:13.050 --> 02:14:16.980 But very often in our human world on Macs, PCs, and phones, 02:14:16.980 --> 02:14:20.010 you see cryptic error codes, like literally numbers 02:14:20.010 --> 02:14:23.640 that probably only Zoom knows, or Microsoft or Google or whatever company 02:14:23.640 --> 02:14:25.050 wrote the software you're using. 02:14:25.050 --> 02:14:28.260 But that number corresponds to a specific error 02:14:28.260 --> 02:14:32.070 that some human somewhere knows might very well happen. 02:14:32.070 --> 02:14:34.950 These are used similarly, although under a different name 02:14:34.950 --> 02:14:38.260 that we'll talk about later in the term, on the web as well. 02:14:38.260 --> 02:14:41.350 Have you ever seen this-- maybe not character, but number? 02:14:41.350 --> 02:14:43.485 So, 404 means what? 02:14:43.485 --> 02:14:44.880 AUDIENCE: Error. 02:14:44.880 --> 02:14:47.790 DAVID MALAN: So error, yes, but really, not found. 02:14:47.790 --> 02:14:48.410 So, why? 02:14:48.410 --> 02:14:49.993 I mean, this is the most arcane thing. 02:14:49.993 --> 02:14:53.000 And we'll talk in a few weeks about what this and other numbers mean, 02:14:53.000 --> 02:14:54.917 but numbers are all around us in technology, 02:14:54.917 --> 02:14:57.500 and they very often mean something to the technical people who 02:14:57.500 --> 02:15:00.270 wrote the software, less so to humans like you and me. 02:15:00.270 --> 02:15:03.230 Why so many of us recognize 404 is kind of weird, 02:15:03.230 --> 02:15:05.900 that like that's been around long enough that we all know it. 02:15:05.900 --> 02:15:10.250 But it really is just a special number that represents an error of some sort. 02:15:10.250 --> 02:15:13.100 So it turns out, the last thing we'll reveal today 02:15:13.100 --> 02:15:15.530 about what we've been taking for granted for two weeks, 02:15:15.530 --> 02:15:18.200 is what the int is in main. 
02:15:18.200 --> 02:15:21.650 We've seen, just a moment ago, that the thing in the parentheses, which 02:15:21.650 --> 02:15:24.680 up until now has been void, which means no command line arguments. 02:15:24.680 --> 02:15:29.690 Now int argc, string argv brackets just means, yes, command line arguments. 02:15:29.690 --> 02:15:31.290 And we've seen how to access them. 02:15:31.290 --> 02:15:33.620 So the last piece of the puzzle, honestly, 02:15:33.620 --> 02:15:37.460 of all the cryptic syntax the past two weeks, is just what int means. 02:15:37.460 --> 02:15:40.610 Int is always there for main, and it indicates 02:15:40.610 --> 02:15:44.300 that main will always return an integer, even though you and I have never 02:15:44.300 --> 02:15:46.010 done so explicitly. 02:15:46.010 --> 02:15:50.450 Usually, main returns 0, by default. But it 02:15:50.450 --> 02:15:53.928 would be weird if you saw an error message saying 0, so 0 is just hidden. 02:15:53.928 --> 02:15:55.470 You would never see it on the screen. 02:15:55.470 --> 02:15:58.670 But it's happening automatically by way of how C is designed. 02:15:58.670 --> 02:16:01.550 So let me write one final program here. 02:16:01.550 --> 02:16:05.750 I'll call it, for instance, status.c to show you these exit statuses. 02:16:05.750 --> 02:16:10.790 Code of status.c, and then up here, let me do something simple like include 02:16:10.790 --> 02:16:18.020 cs50.h, then include stdio.h, and then int main-- 02:16:18.020 --> 02:16:21.350 actually, let's use a command line argument. int argc, string argv[], 02:16:21.350 --> 02:16:23.180 so that's copy, paste. 02:16:23.180 --> 02:16:26.000 But now let's do this. 02:16:26.000 --> 02:16:29.280 If argc does not equal 2-- 02:16:29.280 --> 02:16:30.780 why don't we do something like this? 02:16:30.780 --> 02:16:33.740 Let's not just default to hello, world like last time. 02:16:33.740 --> 02:16:34.770 Let's yell at the user.
02:16:34.770 --> 02:16:38.802 So let's say something like printf missing command line argument, 02:16:38.802 --> 02:16:40.760 so that they know they screwed up and they need 02:16:40.760 --> 02:16:43.160 to run the program again correctly. 02:16:43.160 --> 02:16:51.320 Else, let's go ahead and say, print out, as before, Hello, comma %s, 02:16:51.320 --> 02:16:56.730 and then plug in argv[1], so the human's name from the prompt. 02:16:56.730 --> 02:17:01.910 Now at this point, let me go ahead and run status, ./status, 02:17:01.910 --> 02:17:03.590 and I'll type nothing first. 02:17:03.590 --> 02:17:04.700 I get yelled at. 02:17:04.700 --> 02:17:10.170 This time, I'll type it again. ./status David, and it works properly. 02:17:10.170 --> 02:17:14.090 But now let me show you a somewhat secret, cryptic command. 02:17:14.090 --> 02:17:17.330 You can type this at your prompt, and it's just a coincidence 02:17:17.330 --> 02:17:18.740 that there's another dollar sign. 02:17:18.740 --> 02:17:22.400 Echo $?, totally arcane, but it allows you 02:17:22.400 --> 02:17:25.490 to see what exit status your program has ended with. 02:17:25.490 --> 02:17:27.559 So let me run this again the wrong way. 02:17:27.559 --> 02:17:31.040 ./status, I get the error message. 02:17:31.040 --> 02:17:32.780 What was secretly returned? 02:17:32.780 --> 02:17:33.440 I can't see it. 02:17:33.440 --> 02:17:37.280 There's obviously no error screen, but by typing echo $?, 02:17:37.280 --> 02:17:41.420 I can see that, oh, my program automatically, by default, returns 02:17:41.420 --> 02:17:42.170 zero. 02:17:42.170 --> 02:17:46.879 However, if I run it again correctly, ./status David, Enter, 02:17:46.879 --> 02:17:48.690 this is the correct version. 02:17:48.690 --> 02:17:50.629 But if I run echo $? 02:17:50.629 --> 02:17:52.879 again, it still exited with 0. 02:17:52.879 --> 02:17:55.879 And long story short, this is just a missed opportunity.
02:17:55.879 --> 02:17:59.570 When something goes wrong, why don't I return a value other than 0? 02:17:59.570 --> 02:18:01.070 0, by default, means success. 02:18:01.070 --> 02:18:02.690 And it's always there automatically. 02:18:02.690 --> 02:18:04.940 But you can control this. 02:18:04.940 --> 02:18:11.160 I can go into my code here and return 1, else, if something works fine, 02:18:11.160 --> 02:18:14.870 I can return 0, by default. And honestly, if I omit the return zero, 02:18:14.870 --> 02:18:17.129 again, zero automatically is returned. 02:18:17.129 --> 02:18:20.719 So let me go ahead and go be explicit, just so I know what's going on. 02:18:20.719 --> 02:18:26.360 Make status again, ./status, and let's do this correctly with David. 02:18:26.360 --> 02:18:28.520 Enter, hello, David. 02:18:28.520 --> 02:18:32.059 Echo $?, zero. 02:18:32.059 --> 02:18:33.270 So all is well. 02:18:33.270 --> 02:18:38.240 But now if I do ./status and nothing, or multiple things, but not just David, 02:18:38.240 --> 02:18:40.530 Enter, I get the error message. 02:18:40.530 --> 02:18:45.230 But now if I do echo $?, voila, there now is the one. 02:18:45.230 --> 02:18:47.330 So what does this now mean? 02:18:47.330 --> 02:18:49.490 This is, in the graphical world, we would just 02:18:49.490 --> 02:18:51.020 show something like this on the screen, which is 02:18:51.020 --> 02:18:52.459 a little more informative to the user. 02:18:52.459 --> 02:18:54.469 But even in the Linux world where you don't have a GUI, 02:18:54.469 --> 02:18:56.690 necessarily, even for the programs we've written, 02:18:56.690 --> 02:18:58.549 you can check these exit statuses. 
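
The return values just described can be sketched as follows. This is a minimal sketch under stated assumptions: the logic is placed in a hypothetical function named `status` so it can be called directly, `char *argv[]` replaces the CS50-style `string argv[]`, and the exact wording of the error message is a guess at what was typed on screen.

```c
#include <stdio.h>

// Same decision as in the status example, written as a function.
// The return value is what main would hand back as the exit status:
// 1 when the program was run incorrectly, 0 on success.
int status(int argc, char *argv[])
{
    if (argc != 2)
    {
        printf("Missing command-line argument\n");
        return 1; // any nonzero value signals that something went wrong
    }
    printf("hello, %s\n", argv[1]);
    return 0; // zero signals success
}
```

In a full program, main would be declared as `int main(int argc, char *argv[])` and would simply `return status(argc, argv);`. Running that program and then typing `echo $?` at the prompt would show 1 after a run with no name and 0 after a run with exactly one.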
02:18:58.549 --> 02:19:01.070 And in fact, more comfortable, more advanced programmers, 02:19:01.070 --> 02:19:03.889 when they write code that calls programs, 02:19:03.889 --> 02:19:07.340 be it cowsay or anything else, you can, in code, 02:19:07.340 --> 02:19:11.030 check what the exit status is of a program, and then decide, 02:19:11.030 --> 02:19:13.170 did my program work or did it not? 02:19:13.170 --> 02:19:16.219 And now let's connect the final dots before we 02:19:16.219 --> 02:19:19.070 adjourn for some fruit snacks. 02:19:19.070 --> 02:19:22.100 Cryptography, namely one of the applications this week 02:19:22.100 --> 02:19:24.770 via which you'll be able to send, if you will, 02:19:24.770 --> 02:19:27.650 secret messages, and better yet, decrypt secret messages. 02:19:27.650 --> 02:19:29.780 This will be in addition to perhaps analyzing 02:19:29.780 --> 02:19:32.120 the readability of text using heuristics, like we 02:19:32.120 --> 02:19:34.040 identified at the start of class, too. 02:19:34.040 --> 02:19:38.299 So cryptography is just the art, the science of encrypting information, 02:19:38.299 --> 02:19:41.330 scrambling information so that if you have a secret message 02:19:41.330 --> 02:19:45.980 to send in so-called plaintext, you can run it through some algorithm 02:19:45.980 --> 02:19:49.910 and turn it into what's called ciphertext, thereby encrypting it. 02:19:49.910 --> 02:19:53.150 And only someone who knows what algorithm you've used 02:19:53.150 --> 02:19:55.880 and what input you've used to the algorithm, theoretically, 02:19:55.880 --> 02:19:59.880 can reverse that process and convert it back to the original message. 02:19:59.880 --> 02:20:03.030 So if we use our mental model from last week, here is a problem. 02:20:03.030 --> 02:20:04.910 Here is an input and output.
02:20:04.910 --> 02:20:08.120 The goal I claim here is to take some plaintext, like the message 02:20:08.120 --> 02:20:10.250 you want to send, think back to grade school 02:20:10.250 --> 02:20:13.640 if you ever passed a note to a friend or to your crush saying, I love you, 02:20:13.640 --> 02:20:16.910 it's a little awkward if the teacher or someone else intercepts the paper. 02:20:16.910 --> 02:20:19.490 And in English, it just says, I love you, or whatever it is. 02:20:19.490 --> 02:20:22.350 It'd be nice if you had at least encrypted it in some way. 02:20:22.350 --> 02:20:25.220 But the other person needs to know what algorithm you used 02:20:25.220 --> 02:20:27.230 and what inputs you used to that algorithm 02:20:27.230 --> 02:20:31.100 so that, ultimately, they can decode the so-called ciphertext, which 02:20:31.100 --> 02:20:32.040 is the output. 02:20:32.040 --> 02:20:34.190 So what goes inside of the box today? 02:20:34.190 --> 02:20:37.970 Well, an algorithm, as it relates to cryptography, is called a cipher. 02:20:37.970 --> 02:20:41.390 And a cipher is a fancy name for an algorithm that encrypts text 02:20:41.390 --> 02:20:43.250 from plaintext to ciphertext. 02:20:43.250 --> 02:20:46.760 The catch is, there needs to be not just the algorithm, 02:20:46.760 --> 02:20:48.750 there needs to be an input to it. 02:20:48.750 --> 02:20:52.590 And so, for instance, you might draw the picture like this for the first time 02:20:52.590 --> 02:20:53.090 today. 02:20:53.090 --> 02:20:54.257 And we've seen this in code. 02:20:54.257 --> 02:20:57.180 You can give multiple inputs or arguments to functions. 02:20:57.180 --> 02:20:59.960 So in this black box, you can imagine passing in the message 02:20:59.960 --> 02:21:02.510 you want to send, and then some secret.
02:21:02.510 --> 02:21:05.300 So for instance, suppose that, the simplest 02:21:05.300 --> 02:21:08.750 thing I could think of as a kid was instead of sending the letter A, 02:21:08.750 --> 02:21:10.310 why don't I write the letter B? 02:21:10.310 --> 02:21:13.070 Instead of the letter B, why don't I write the letter C? 02:21:13.070 --> 02:21:16.280 So I can kind of shift the English alphabet by one space. 02:21:16.280 --> 02:21:18.740 So A becomes B, B becomes C, dot, dot, dot, 02:21:18.740 --> 02:21:21.690 Z becomes A. You can wrap around at the end. 02:21:21.690 --> 02:21:24.120 And let's assume no punctuation in this part of the story. 02:21:24.120 --> 02:21:29.420 So that's a very simple algorithm-- add a value to each letter 02:21:29.420 --> 02:21:32.090 and send the value as the ciphertext. 02:21:32.090 --> 02:21:35.540 And now the teacher, the classmate, they have to know that you use, 02:21:35.540 --> 02:21:39.410 not only this rotational algorithm, also known as a Caesar cipher, 02:21:39.410 --> 02:21:41.300 they also need to know what number you use. 02:21:41.300 --> 02:21:45.200 Did you add 1 to every letter, 2 to every letter, 25 to every letter? 02:21:45.200 --> 02:21:49.310 Now if they're super smart and probably not the young age in this story, 02:21:49.310 --> 02:21:51.165 they could also just try all possibilities. 02:21:51.165 --> 02:21:53.040 And that would be an attack on the algorithm. 02:21:53.040 --> 02:21:55.310 This is not a sophisticated algorithm, but it's 02:21:55.310 --> 02:21:56.970 enough to send a message in class. 02:21:56.970 --> 02:21:58.940 So if the two inputs now are HI! 02:21:58.940 --> 02:22:04.280 as the plain text message, and 1 as the so-called key, the secret number 02:22:04.280 --> 02:22:06.950 that only you and the other person know, you 02:22:06.950 --> 02:22:11.040 might be able to encrypt a message from one way to the other. 02:22:11.040 --> 02:22:13.400 And so in this case, for instance, HI! 
02:22:13.400 --> 02:22:16.198 would become I-J-!. 02:22:16.198 --> 02:22:17.990 In this version of the algorithm, we're not 02:22:17.990 --> 02:22:19.823 going to bother with numbers or punctuation. 02:22:19.823 --> 02:22:23.090 We'll only operate on A through Z, be it uppercase or lowercase. 02:22:23.090 --> 02:22:28.250 So now if you were to receive a slip of paper in class with I-J on it, 02:22:28.250 --> 02:22:31.290 you, the recipient, would know what it is 02:22:31.290 --> 02:22:33.440 so long as you know that the sender used one, 02:22:33.440 --> 02:22:36.500 because you just reverse the algorithm and you subtract one instead. 02:22:36.500 --> 02:22:39.110 The teacher, they probably don't know what this means, 02:22:39.110 --> 02:22:41.443 and they're not going to spend time hacking the message, 02:22:41.443 --> 02:22:42.975 so it just looks scrambled to them. 02:22:42.975 --> 02:22:44.600 And that's what we get from encryption. 02:22:44.600 --> 02:22:47.430 Someone who intercepts it, be it in class or in the real world, 02:22:47.430 --> 02:22:51.080 on the internet or anywhere else, can't actually figure out, ideally, 02:22:51.080 --> 02:22:52.700 what it is you have sent. 02:22:52.700 --> 02:22:55.130 The opposite, of course, is indeed called decryption, 02:22:55.130 --> 02:22:56.300 but the process is the same. 02:22:56.300 --> 02:22:58.370 We now pass in negative 1. 02:22:58.370 --> 02:23:00.300 And so how about this? 02:23:00.300 --> 02:23:02.840 Why don't we end with a demonstration here? 02:23:02.840 --> 02:23:08.360 UIJT XBT DT50-- there's a bit of a tell there. 02:23:08.360 --> 02:23:11.060 If we pass that in and do negative 1, well, 02:23:11.060 --> 02:23:14.180 how do we get out the plaintext originally? 
02:23:14.180 --> 02:23:18.200 Well, if this is the ciphertext, and we subtract 1 from each letter, 02:23:18.200 --> 02:23:28.010 I think U becomes T, I becomes H, J becomes I, T becomes S, X becomes W, 02:23:28.010 --> 02:23:37.580 B becomes A, T becomes S, D becomes C, T becomes S, and this was, indeed, CS50. 02:23:37.580 --> 02:23:40.250 Have a duck on your way out, and some snacks in the lobby. 02:23:40.250 --> 02:23:42.350 [APPLAUSE] 02:23:42.350 --> 02:23:43.850 [FILM ROLLING] 02:23:43.850 --> 02:23:47.500 [MUSIC PLAYING]